Posted by: tamaragc79 | May 30, 2008

Characteristics of translation

The FEMTI report focuses on the evaluation of MT and other language processing applications. According to the FEMTI report, the main characteristics of the translation task are the following:

1. Assimilation: The ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.

·        Document routing or sorting: The purpose of document routing / sorting is to scan incoming translated documents quickly in order to send them to the appropriate points for further processing or storage.

·        Information extraction or summarization: The purpose of information extraction or summarization is to extract some portion(s) of the translated text, either manually or automatically, for subsequent processing or storage. Information extraction is typically concerned with filling templates by identifying atomic elements of events. In contrast, summarization aims to provide a self-contained and internally cohesive text which serves as a selective account of the original.

·        Search: The goal of a search process is to identify a set of documents that together can satisfy an information need. Subtasks include refinement of the searcher’s understanding of their need, refinement of the expression of that need as a query, and recognition of relevant documents. Automated components of search systems typically accomplish only portions of the required task, leaving the searcher to assess factors (e.g., veracity and completeness) that would be difficult to detect by automated means. Searchers with limited proficiency in the languages in which the documents are written will require translation support to accomplish information need refinement, query reformulation, and relevant document recognition.

 2. Dissemination: The ultimate purpose of dissemination is to deliver to others a translation of documents produced inside the organization.

·        Internal or in-house dissemination: In the case of internal / in-house dissemination the translations are sent to other people in the same organization, who share aspects of the culture, terminology, and domain knowledge to some extent.

·        External dissemination – publication: In the case of external dissemination / export / publication the translations are sent to other people in other organizations, who may not share aspects of the culture, terminology, and domain knowledge.

 3. Communication: The ultimate purpose of the communication task is to support multi-turn dialogues between people who speak different languages. The translation quality must be high enough for painless conversation, despite possible syntactically ill-formed input and idiosyncratic word and format usage.

·        Synchronous communication: In the case of synchronous or interactive communication, the interaction between the participants occurs in real time.

·        Asynchronous communication: In the case of asynchronous or delayed communication the interaction between participants occurs with interruption, for example by email.

 


Posted by: tamaragc79 | May 28, 2008

Concepts related to translation

Several common concepts are related to the world of translation.

Machine Translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
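
The "simple substitution of words" described above can be illustrated with a minimal sketch. The lexicon and the function name are hypothetical and were written for this example; they do not come from any real MT system.

```python
# Hypothetical toy lexicon (Spanish -> English), invented for this sketch.
LEXICON = {
    "el": "the",
    "gato": "cat",
    "come": "eats",
    "pescado": "fish",
}


def translate_word_by_word(sentence, lexicon):
    """Replace each source word with its lexicon entry; keep unknown words as-is."""
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


print(translate_word_by_word("El gato come pescado", LEXICON))
# prints "the cat eats fish"
```

Word order, idioms and unknown words are not handled: "el gato negro" comes out as "the cat negro", which is the kind of problem the corpus techniques mentioned above try to address.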

Computer-Assisted Translation (CAT) is a form of translation in which a human translator translates texts using computer software designed to support and facilitate the translation process. Computer-assisted translation is sometimes called machine-assisted, or machine-aided, translation.

Translation, the activity that translation technology supports, is the interpretation of the meaning of a text and the subsequent production of an equivalent text, also called a translation, that communicates the same message in another language. The text to be translated is called the source text, and the language it is to be translated into is called the target language; the final product is sometimes called the “target text”.

 

Management, in simple terms, means the act of getting people together to accomplish desired goals. Management comprises planning, organizing, resourcing, leading or directing, and controlling an organization (a group of one or more people or entities) or effort for the purpose of accomplishing a goal. Resourcing encompasses the deployment and manipulation of human resources, financial resources, technological resources, and natural resources. Management can also refer to the person or people who perform the act(s) of management.

Sources:

·        Wikipedia, the free encyclopedia. Last modified: 07-05-08. Accessed: 28-05-08, from http://en.wikipedia.org/wiki/Machine_translation

·        Wikipedia, the free encyclopedia. Last modified: 09-05-08. Accessed: 28-05-08, from http://en.wikipedia.org/wiki/Computer-assisted_translation

·        Wikipedia, the free encyclopedia. Last modified: 17-05-08. Accessed: 28-05-08, from http://en.wikipedia.org/wiki/Translation_technology

·        Wikipedia, the free encyclopedia. Last modified: 16-05-08. Accessed: 28-05-08, from http://en.wikipedia.org/wiki/Management

 

Posted by: tamaragc79 | May 19, 2008

Grammar Induction

Grammar induction, also known as grammatical inference, is the process of learning (inferring) a grammar from a set of sample strings. Because equivalences can be established between grammars and automata, the task of learning an automaton from a set of sample strings may also be called grammatical inference.

 

Grammatical inference is usually formulated in terms of learning recognition or classification tasks from sets in which all strings are labelled as belonging to one class (language) or another. However, tasks such as learning a finite-state machine that transduces (translates) strings from one language into strings from another language, or learning a probabilistic finite-state machine that generates strings following a certain probability distribution, may also be formulated as grammatical inference tasks.
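
As a small illustration of learning an automaton from sample strings, the sketch below builds a prefix tree acceptor, a common starting point for state-merging inference algorithms. The post does not name this technique; it is used here only as an assumption-laden example, and the function names and sample strings are hypothetical.

```python
def build_prefix_tree_acceptor(samples):
    """Build a deterministic automaton that accepts exactly the sample strings.

    States are integers and state 0 is the start state.
    Returns (transitions, accepting), where transitions maps
    (state, symbol) to the next state.
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for sample in samples:
        state = 0
        for symbol in sample:
            key = (state, symbol)
            if key not in transitions:
                # Unseen prefix: create a fresh state for it.
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting


def accepts(automaton, string):
    """Run the automaton on the string and report whether it ends in an accepting state."""
    transitions, accepting = automaton
    state = 0
    for symbol in string:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


pta = build_prefix_tree_acceptor(["ab", "abab", "b"])
print(accepts(pta, "abab"))    # True: the string was in the sample
print(accepts(pta, "ababab"))  # False: the acceptor does not generalize yet
```

The acceptor only memorizes the sample. An actual inference algorithm would go on to merge states so that the automaton generalizes to unseen strings.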

Approaches to grammar induction include learning recursive transition networks (which works by converting grammatically correct sentences into transition networks that are similar to finite-state diagrams), learning context-free grammars (CFGs) using version spaces, learning nondeterministic pushdown automata (NPDAs) using genetic search, and learning deterministic CFGs using connectionist networks.

It should also be mentioned that grammar induction has been studied under different models and from different angles: learning from examples, learning using examples and queries, incremental versus non-incremental learning, distribution-free models of learning, learning under various distributional assumptions, impossibility results, complexity results and, finally, characterizations of the representational and search biases of grammar induction algorithms.

 


Posted by: tamaragc79 | May 7, 2008

Word Sense Disambiguation (WSD)

Word Sense Disambiguation (WSD) is one of the topics that the Stanford Natural Language Processing Group focuses on. It is closely related to translation.

In computational linguistics, word sense disambiguation (WSD) is the process of identifying which sense of a word (having a number of distinct senses) is used in a given sentence. For example, consider the word bass, two distinct senses of which are:

1. A type of fish

2. Tones of low frequency

The automatic disambiguation of word senses has been an interest and concern since the earliest days of computer treatment of language in the 1950s. Sense disambiguation is an “intermediate task” (Wilks and Stevenson, 1996) which is not an end in itself, but rather is necessary at one level or another to accomplish most natural language processing tasks. It is obviously essential for language understanding applications such as message understanding, man-machine communication, etc.; it is at least helpful, and in some instances required, for applications whose aim is not language understanding:

·        Machine translation: sense disambiguation is essential for the proper translation of words.

·         Information retrieval and hypertext navigation: when searching for specific keywords, it is desirable to eliminate occurrences in documents where the word or words are used in an inappropriate sense.

·        Content and thematic analysis: a common approach to content and thematic analysis is to analyze the distribution of pre-defined categories of words across a text.

·        Grammatical analysis: sense disambiguation is useful for part of speech tagging.

·        Speech processing: sense disambiguation is required for correct phonetization of words in speech synthesis.

·        Text processing: sense disambiguation is necessary for spelling correction.

Several WSD paradigms have been proposed for machine translation (MT):

·        Knowledge-based approaches: depend on manual linguistic knowledge and disambiguation rules.

·        Corpus-based approaches: make use of knowledge taken from text using machine learning techniques.

·        Hybrid approaches: mix characteristics of the two previous ones.

In recent work, the corpus-based and hybrid techniques are the most widely used because they give very good results.
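
To make the knowledge-based paradigm more concrete, here is a minimal sketch of a simplified Lesk-style heuristic applied to the "bass" example above: it picks the sense whose dictionary gloss shares the most words with the context. The post does not mention this method; the tiny sense inventory, the stopword list and the function names are assumptions written for this example, not data from a real dictionary.

```python
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "with", "to", "is", "he", "she"}

# Hypothetical mini sense inventory for "bass", invented for this sketch.
SENSES = {
    "fish": "a type of fish found in a river lake or sea and caught by fishing",
    "music": "tones of low frequency in music played on a guitar or sung by a voice",
}


def content_words(text):
    """Lowercase the text, split on whitespace and drop stopwords."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def simplified_lesk(context, senses):
    """Return the sense whose gloss overlaps most with the context words."""
    context_set = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context_set & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("he caught a huge bass while fishing in the lake", SENSES))  # fish
print(simplified_lesk("the bass guitar played low tones", SENSES))                 # music
```

A corpus-based approach would replace the hand-written glosses with statistics learned from sense-tagged or parallel text, and a hybrid approach would combine both sources of evidence.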

 

 

Sources:

·        Wikipedia, the free encyclopedia. Last modified: 03-05-08. Accessed: 07-05-08, from http://en.wikipedia.org/wiki/Word_sense_disambiguation

·        Ide, Nancy; Veronis, Jean (1998). Word Sense Disambiguation: The State of the Art. Retrieved 17:28, May 13th, 2008, from http://sites.univ-provence.fr/veronis/pdf/1998wsd.pdf

Posted by: tamaragc79 | May 5, 2008

Topics about Machine Learning

As a broad subfield of artificial intelligence, machine learning is concerned with the design and development of algorithms and techniques that allow computers to “learn”. At a general level, there are two types of learning: inductive and deductive. Inductive machine learning methods extract rules and patterns out of massive data sets.

The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods. Hence, machine learning is closely related not only to data mining and statistics, but also to theoretical computer science.
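
As a minimal sketch of what extracting a rule from data can look like, the example below learns a single threshold rule from labelled numbers by trying every observed value and keeping the one with the fewest errors. The data set and the function name are hypothetical, and a realistic inductive learner would be far more elaborate.

```python
def learn_threshold_rule(examples):
    """Learn a rule 'predict True when value >= threshold' from (value, label) pairs."""
    best_threshold, best_errors = None, len(examples) + 1
    for candidate in sorted({value for value, _ in examples}):
        # Count how many examples the candidate rule would misclassify.
        errors = sum((value >= candidate) != label for value, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Hypothetical data: (message length in words, is the message spam?)
DATA = [(3, False), (5, False), (8, False), (12, True), (15, True), (20, True)]

threshold = learn_threshold_rule(DATA)
print(threshold)  # 12
```

The learned rule, "flag messages of 12 words or more", is induced from the examples rather than written by hand, which is the sense of "inductive" used above.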

Applications

Machine learning has a wide spectrum of applications including natural language processing, syntactic pattern recognition, search engines, medical diagnosis, bioinformatics, brain-machine interfaces and cheminformatics, detecting credit card fraud, stock market analysis, classifying DNA sequences, speech and handwriting recognition, object recognition in computer vision, game playing and robot locomotion.

Human interaction

Some machine learning systems attempt to eliminate the need for human intuition in the analysis of the data, while others adopt a collaborative approach between human and machine. Human intuition cannot be entirely eliminated since the designer of the system must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data. Machine learning can be viewed as an attempt to automate parts of the scientific method.

Some statistical machine learning researchers create methods within the framework of Bayesian statistics.

There are many references on this topic. One of them is the Edinburgh Machine Learning Group.

The Edinburgh Machine Learning Group is part of the Institute for Adaptive and Neural Computation (ANC), School of Informatics. The group focuses on probabilistic and information theoretic approaches to machine learning problems. There is a significant emphasis on problems in image modelling/interpretation and stochastic processes, as well as an interest in manifold learning, clustering methods and signal analysis. Much of the group’s work is applied to problems of scientific inference, including research in areas of astronomy, remote sensing, meteorology, medical imaging, medical signal processing, neuro-informatics and bio-informatics.

The group has strong associations with other groups within and outside the School of Informatics. It also holds a paper discussion meeting approximately fortnightly, called the Probabilistic Inference Group (PIGS). As we can see, it is an interesting machine learning group that produces important and useful work.


Posted by: tamaragc79 | April 28, 2008

Three Research Centres

Three of the most important research centres related to Human Language Technologies are the following:

1. The German Research Centre

 The German Competence Centre for Language Technology was founded as a public service to the global R&D community in language technology and neighbouring technology areas, to the German IT industry and to German companies and other organizations planning to employ language technology applications. In addition to a wide variety of free service functions, the Centre also offers highly specialized professional services to private customers.

The Competence Centre comprises the following components:

·        The Virtual Information Centre “Language Technology World”, the world’s most comprehensive information resource about speech and language technology.

·        The Demonstration Centre in Saarbrücken, which offers interested parties the possibility to play and experiment with different speech and language technologies, or to attend guided demonstrations.

·        The Evaluation Centre, which conducts evaluations of the overall usability of language technology systems and advances knowledge of relevant usability issues and evaluation methods.

2. The Edinburgh Language Technology Group

The group's projects include the following:

·        Combining Shallow Semantics and Domain knowledge (EASIE).

·        Text Mining for Biomedical Content Curation (TXM).

·        Cross-retail multi-agent Retail Comparison (CROSSMARC).

·        Smart Qualitative Data: Methods and Community tools for Data Mark-up (SQUAD).

·        Machine learning for named Entity Recognition (SEER).

·        Joint Action Science and Technology (JAST).

·        Study of how pairs collaborate when planning a route on a map (Collaborating Using Diagrams).

 

3. The Austrian Research Institute for Artificial Intelligence (OFAI)

In the area of language and speech processing, OFAI conducts both basic and applied research. It develops linguistic resources and processes as well as application prototypes.

 

      3.1 Linguistic resources and processes

 

·        Typed unification-based grammar formalisms 

·        Development of an HPSG-based grammar for German

·        Natural Language Generation 

·        Speech Synthesis 

·        Computational Morphology 

      3.2 Application Prototypes

 

·        Natural language interfaces and advisory systems

·        Concept-to-speech systems

 

 

 

Sources:

·     The German Research Centre for Artificial Intelligence from http://www.dfki.de/lt/lt-general.php

·     The Edinburgh Language Technology Group from http://www.ltg.ed.ac.uk/projects

·     The Austrian Research Institute for Artificial Intelligence (OFAI) from http://www.ofai.at/research/nlu/

 

Posted by: tamaragc79 | April 23, 2008

Computer-Assisted Language Learning (CALL)

Computer-assisted language learning (CALL) is a form of computer-based learning with two important features: bidirectional learning and individualized learning. It is not a method; CALL materials are tools for learning, and the focus of CALL is learning rather than teaching. CALL materials are used in teaching to facilitate the language learning process. They are student-centered and promote self-paced learning. However, there are different definitions of the term. Some people describe CALL as courseware, that is, a computerized educational program. Others describe it as an approach to teaching and learning foreign languages in which the computer and computer-based resources such as the Internet are used to present, reinforce and assess the material to be learned.

The integration of computers into foreign language learning is increasingly common among suppliers of language materials. There is now a tendency to use CALL teaching materials for independent learning, without the supervision of a teacher.

In addition, the gradual integration of technology into classrooms over the last twenty years reflects the pace of technological development. CALL tools make teaching and learning more interactive for both teacher and students, because they rely on the Internet and on digital platforms where students can practise and assess their own knowledge of, and progress in, the language they are studying.

Moreover, CALL is highly developed in Europe and the USA. Several organizations focus their research on this topic, such as APACALL (the Asia-Pacific Association for CALL), CALICO (a US-based professional association devoted to CALL, which runs a regular annual conference), EUROCALL (a Europe-based professional association devoted to CALL, which also runs a regular annual conference) and PacCALL (a professional CALL association in the Pacific, from East to Southeast Asia, Oceania, and across to the Americas).

This new teaching tool continues to improve the learning of foreign languages and to make them more accessible to those who use it.

 

 

 

     


Posted by: tamaragc79 | April 16, 2008

HLT Research Centres

There are many research centres, both European and international, that tell us more about Human Language Technologies. These research centres can be found online. Here are some of them:

  • National Centre for Language Technology (NCLT): Language is the key modality in communication. The National Centre for Language Technology conducts research into the processing of human language by computers such as speech recognition and synthesis, machine translation, human-computer interfaces, information retrieval and extraction, the teaching and learning of languages using computers and software localization and globalization. Research in Human Language Technology (HLT) is interdisciplinary and includes Natural Language Processing (NLP) and Computational Linguistics (CL). HLT has substantial economic implications and potential. The centre carries out basic research and develops applications.

 

  • HKUST Human Language Technology Centre (HLTC): It is a multidisciplinary research centre at the Hong Kong University of Science and Technology (HKUST) whose mission is to lead state-of-the-art research directions that drive the development of new applications in both text and spoken language technology. HLTC is led by seven faculty members from the EE and CS departments: Oscar Au, Roland Chin, Pascale Fung, Brian Mak, Bertram Shi, Manhung Siu, and Dekai Wu, specializing in speech and signal processing, statistical and corpus-based natural language processing, machine translation, text mining, information extraction, Chinese language processing, knowledge management, and related fields. Special emphasis is given to machine processing of Chinese language and Chinese information. Systems built at HLTC include automated language translation for the Internet, speech-based web browsing, and speech recognition for the telephone.

 


Posted by: tamaragc79 | April 14, 2008

Hans Uszkoreit

Hans Uszkoreit is a Professor of Computational Linguistics at the Dept. of Computational Linguistics and Phonetics of Saarland University at Saarbrücken. He is also Scientific Director at the German Research Center for Artificial Intelligence (DFKI) and Head of the DFKI Language Technology Lab.

Uszkoreit studied Linguistics and Computer Science at the Technical University of Berlin and the University of Texas. He began working on a translation research project at the Linguistics Research Center, where he started his successful career. He also worked as a computer scientist at the Artificial Intelligence Center of SRI International in Menlo Park.

In 1988 Uszkoreit was appointed to a newly created chair of Computational Linguistics at Saarland University and started the Department of Computational Linguistics and Phonetics. In 1989 he became the head of the newly founded Language Technology Lab at DFKI. He has been a co-founder and principal investigator of the Special Collaborative Research Division (SFB 378) “Resource-Adaptive Cognitive Processes” of the DFG (German Science Foundation). He is also co-founder and professor of the “European Postgraduate Program Language Technology and Cognitive Systems”, a joint Ph.D. program with the University of Edinburgh.

Uszkoreit is a Permanent Member of the International Committee of Computational Linguistics (ICCL), Member of the European Academy of Sciences, Past President of the European Association for Logic, Language and Information, Member of the Executive Board of the European Network of Language and Speech, Member of the Board of the European Language Resources Association (ELRA), and serves on several international editorial and advisory boards. He is co-founder and Board Member of XtraMind Technologies GmbH, Saarbruecken, acrolinx gmbh, Berlin and Yocoy Technologies GmbH, Berlin. Since 2006, he has served as Chairman of the Board of Directors of the international initiative dropping knowledge.

His current research interests are computer models of natural language understanding and production, advanced applications of language and knowledge technologies such as semantic information systems, translingual technologies, cognitive foundations of language and knowledge, deep linguistic processing of natural language, syntax and semantics of natural language and the grammar of German.

 

 

 


Posted by: tamaragc79 | April 7, 2008

Human Language Technologies

Language technology is often called human language technology (HLT) or natural language processing (NLP). It has computational linguistics (CL) and speech technology at its core, but also includes many application-oriented aspects of them. Language technology is closely connected to computer science and general linguistics.

According to Hans Uszkoreit, language technology (sometimes also referred to as human language technology) comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing or modifying texts and speech. These systems must be based on some knowledge of human language. Language technology is therefore the engineering branch of computational linguistics.

In “Language Technology: A First Overview”, Hans Uszkoreit says:
“Language Technologies are information technologies that are specialized for dealing with the most complex information medium in our world: human language. Therefore these technologies are also subsumed under the term Human Language Technology. Human language occurs in spoken and written form. Whereas speech is the oldest and most natural mode of language communication, complex information and most of human knowledge is maintained and transmitted in written texts. Speech and text technologies process or produce language in these two modes of realization. But language also has aspects that are shared between speech and text such as dictionaries, most of grammar and the meaning of sentences. Thus large parts of language technology cannot be subsumed under speech and text technologies. Among those are technologies that link language to knowledge. We do not know how language, knowledge and thought are represented in the human brain. Nevertheless, language technology had to create formal representation systems that link language to concepts and tasks in the real world. This provides the interface to the fast growing area of knowledge technologies.

In our communication we mix language with other modes of communication and other information media. We combine speech with gesture and facial expressions. Digital texts are combined with pictures and sounds. Movies may contain language in spoken and written form. Thus speech and text technologies overlap and interact with many other technologies that facilitate processing of multimodal communication and multimedia documents”.
