Posted by: tamaragc79 | November 9, 2009

Terminology

  • Authenticity: Assurance that certain digital materials are genuine and trustworthy, that is, that they are what they are claimed to be, whether as an original object or as a faithful and reliable copy of an original produced through fully documented processes.
  • Certification: The process of assessing the degree to which a preservation programme meets a previously agreed set of minimum standards or practices.
  • Data protection: Operations intended to safeguard the binary digits that make up digital objects against loss or unauthorized modification.
  • Conceptual objects: Digital objects that humans interact with in a way they can understand.
  • Digital heritage: The set of digital materials valuable enough to be preserved so that they can be consulted and used in the future.
  • Digital preservation: Actions intended to keep digital objects accessible over the long term.
  • Identity of digital objects: The characteristic that distinguishes a digital object from all others, including other versions or copies of the same content.
  • Ingest: The operation of storing digital objects, and their related documentation, in a secure and orderly manner.
  • Integrity of digital objects: The state of objects that are complete and have undergone no unauthorized or undocumented corruption or alteration.
  • Preservation metadata: Metadata intended to support the management of the preservation of digital materials by documenting their identity, technical characteristics, means of access, responsibility, history, context and preservation objectives.
  • Rights: Legal entitlements or powers held or exercised over digital materials, such as copyright, privacy, confidentiality and national or corporate restrictions imposed for security reasons.
  • Verification: The act of checking whether a digital object, in a given file format, is complete and complies with the format specification.
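The integrity definition above can be illustrated with a small fixity check. This is only a sketch under my own assumptions: the function names and the sample bytes are hypothetical, and SHA-256 is just one digest a preservation programme might choose to record at ingest.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """Return the SHA-256 digest of an object's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, recorded_digest: str) -> bool:
    """True if the bytes still match the digest recorded at ingest."""
    return fixity_digest(data) == recorded_digest


# Hypothetical object: the digest is recorded when the object is ingested.
original = b"digital object contents"
recorded = fixity_digest(original)

print(is_intact(original, recorded))         # unchanged copy
print(is_intact(original + b"!", recorded))  # modified copy
```

A matching digest only shows that the bits have not changed; it says nothing about whether the file complies with its format specification, which is what verification checks.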

Sources:

Posted by: tamaragc79 | November 6, 2009

Traditional documentation vs. digital documentation

When we speak of a document, we mean any unit of information that has been recorded on a medium that allows it to be stored and later retrieved and, therefore, consulted without limit. This type of document is called traditional or analogue documentation.

When we refer to digital documentation, however, three fundamental properties must be kept in mind:

  1. Computability: The information can be processed or “computed” by a computer.
  2. Virtuality: Digital information is not subject to the limitations inherent in analogue information.
  3. Capacity: Absence of practical limits on the volume of information that can be accessed online through unified interfaces.

Traditional or analogue documentation does not share these characteristics. It has no computability, since no device is needed to read it, and it lacks virtuality because of its limitations: it offers only text and still images, not sound or moving images. Finally, unlike digital documentation, traditional documentation does have limits on the volume of information that can be accessed. Two advantages of traditional publishing are nevertheless worth noting: comfort and practicality, since it can always be carried and read anywhere.

In light of the above, digital documentation has a number of advantages over traditional documentation:

  • It offers multimedia information (text, sound and image).
  • Information can be retrieved.
  • Interactivity (a relationship between the reader and the system).
  • The amount of information per unit of volume is greater.
  • It allows the publisher to adapt to the habits and tastes of younger readers.
  • It provides access to titles.
  • Virtuality (ease of reproduction, transmission and storage).

Even so, digital documentation has the following drawbacks:

  • Difficulties in distribution and retail sale.
  • Mediation (the need for a computer).
  • A considerable, though not yet mass-market, number of reading devices.
  • Poor ergonomics.
  • Little information about the contents.
  • Prices that are not very competitive compared with traditional publishing.

Sources:

Posted by: tamaragc79 | October 14, 2009

Social Bookmarking

Social bookmarking is a method for Internet users to share, organize, search and manage bookmarks of web resources. Unlike file sharing, the resources themselves are not shared, only the bookmarks that reference them.

In a social bookmarking system, users store lists of Internet resources that they consider useful. They also categorize the resources with ‘tags’ or ‘labels’, which are words that users associate with a resource. Most social bookmarking services let users search for bookmarks associated with certain ‘tags’ and rank resources according to the number of users who have bookmarked them.
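The tag search and user-count ranking described above can be sketched in a few lines. Everything here is hypothetical (the user names, the example.org URLs and the `rank_by_tag` function are made up for illustration); it is not how any particular service works.

```python
from collections import defaultdict

# Hypothetical bookmarks: (user, url, tags)
BOOKMARKS = [
    ("ana", "http://example.org/mt", {"translation", "nlp"}),
    ("ben", "http://example.org/mt", {"translation"}),
    ("eva", "http://example.org/call", {"learning", "translation"}),
    ("ana", "http://example.org/call", {"learning"}),
    ("ben", "http://example.org/wsd", {"nlp"}),
]


def rank_by_tag(bookmarks, tag):
    """Rank resources carrying `tag` by how many distinct users applied it."""
    users_per_url = defaultdict(set)
    for user, url, tags in bookmarks:
        if tag in tags:
            users_per_url[url].add(user)
    # Most-bookmarked first; ties broken alphabetically for a stable order.
    return sorted(
        ((url, len(users)) for url, users in users_per_url.items()),
        key=lambda item: (-item[1], item[0]),
    )


print(rank_by_tag(BOOKMARKS, "translation"))
```

In this toy data two users tagged the first URL with ‘translation’ and one user tagged the second, so the first URL ranks higher.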

Nevertheless, social bookmarking has both advantages and disadvantages. On the one hand, some of the advantages are the following:

  • Advantages over traditional tools such as search engines: the ranking of resources is done by humans rather than by machines that process information automatically according to a program.
  • The most useful resources are bookmarked by more users. In this way, the system creates a ranking of resources based on user criteria, which serves as a measure of their usefulness.

On the other hand, these are the disadvantages:

  • There is no pre-established system of keywords or categories.
  • Users can create ‘tags’ that are too personal and mean little to others.

As a result of competition, services now offer more than bookmark sharing: they support votes, comments, import and export, notes, emailing links, automatic notifications, RSS, and the creation of groups and social networks.

Sources:



Posted by: tamaragc79 | May 30, 2008

Characteristics of translation

The FEMTI report focuses on the evaluation of MT and other language processing applications. According to the FEMTI report, the main characteristics of the translation task are the following:

1. Assimilation: The ultimate purpose of the assimilation task (of which translation forms a part) is to monitor a (relatively) large volume of texts produced by people outside the organization, in (usually) several languages.

·        Document routing or sorting: The purpose of document routing / sorting is to scan incoming translated documents quickly in order to send them to the appropriate points for further processing or storage.

·        Information extraction or summarization: The purpose of information extraction or summarization is to extract some portion(s) of the translated text, either manually or automatically, for subsequent processing or storage. Information extraction is typically concerned with filling templates by identifying atomic elements of events. In contrast, summarization aims to provide a self-contained and internally cohesive text which serves as a selective account of the original.

·        Search: The goal of a search process is to identify a set of documents that together can satisfy an information need. Subtasks include refinement of the searcher’s understanding of their need, refinement of the expression of that need as a query, and recognition of relevant documents. Automated components of search systems typically accomplish only portions of the required task, leaving the searcher to assess factors (e.g., veracity and completeness) that would be difficult to detect by automated means. Searchers with limited proficiency in the languages in which the documents are written will require translation support to accomplish information need refinement, query reformulation, and relevant document recognition.

 2. Dissemination: The ultimate purpose of dissemination is to deliver to others a translation of documents produced inside the organization.

·        Internal or in-house dissemination: In the case of internal / in-house dissemination the translations are sent to other people in the same organization, who share aspects of the culture, terminology, and domain knowledge to some extent.

·        External dissemination – publication: In the case of external dissemination / export / publication the translations are sent to other people in other organizations, who may not share aspects of the culture, terminology, and domain knowledge.

3. Communication: The ultimate purpose of the communication task is to support multi-turn dialogues between people who speak different languages. The translation quality must be high enough for painless conversation, despite possible syntactically ill-formed input and idiosyncratic word and format usage.

·        Synchronous communication: In the case of synchronous or interactive communication, the interaction between the participants occurs in real time.

·        Asynchronous communication: In the case of asynchronous or delayed communication the interaction between participants occurs with interruption, for example by email.

 

Sources:

Posted by: tamaragc79 | May 28, 2008

Concepts related to translation

There are some common concepts related to the world of translation.

Machine Translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
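The “simple substitution of words” described above can be sketched with a toy dictionary lookup. The Spanish-to-English lexicon and the function name are hypothetical and exist only for illustration; real MT systems do far more than this.

```python
# Hypothetical toy Spanish-to-English lexicon.
LEXICON = {
    "el": "the",
    "gato": "cat",
    "negro": "black",
    "come": "eats",
    "pescado": "fish",
}


def translate_word_for_word(sentence, lexicon):
    """Replace each source word with its dictionary entry; keep unknown words as-is."""
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


print(translate_word_for_word("El gato negro come pescado", LEXICON))
# the cat black eats fish
```

The output keeps the Spanish noun-adjective order (“cat black”), which is one small example of why plain substitution cannot handle differences in linguistic typology.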

Computer-Assisted Translation (CAT) is a form of translation in which a human translator translates texts using computer software designed to support and facilitate the translation process. Computer-assisted translation is sometimes called machine-assisted, or machine-aided, translation.

Translation Technology: Translation is the interpretation of the meaning of a text and the subsequent production of an equivalent text, also called a translation, that communicates the same message in another language. The text to be translated is called the source text, and the language it is to be translated into is called the target language; the final product is sometimes called the “target text”.

 

Management, in simple terms, means the act of getting people together to accomplish desired goals. Management comprises planning, organizing, resourcing, leading or directing, and controlling an organization (a group of one or more people or entities) or effort for the purpose of accomplishing a goal. Resourcing encompasses the deployment and manipulation of human resources, financial resources, technological resources, and natural resources. Management can also refer to the person or people who perform the act(s) of management.

Sources:

·        Wikipedia, the free encyclopedia. Last modified: 07-05-08. Accessed: 28-05-08, from http://en.wikipedia.org/wiki/Machine_translation

·        Wikipedia, the free encyclopedia. Last modified: 09-05-08. Accessed: 28-05-08, from http://en.wikipedia.org/wiki/Computer-assisted_translation

·        Wikipedia, the free encyclopedia. Last modified: 17-05-08. Accessed: 28-05-08, from http://en.wikipedia.org/wiki/Translation_technology

·        Wikipedia, the free encyclopedia. Last modified: 16-05-08. Accessed: 28-05-08, from http://en.wikipedia.org/wiki/Management

Posted by: tamaragc79 | May 19, 2008

Grammar Induction

Grammar Induction, also known as Grammatical Inference, is the usual name given to the process of learning (inferring) a grammar from a set of sample strings, and, in view of the equivalences that may be established between grammars and automata, the task of learning an automaton from a set of sample strings may also be called grammatical inference.

 

Grammatical inference is usually formulated in terms of learning recognition or classification tasks from sets in which all strings are labelled as belonging to one or another class (language). However, tasks such as learning a finite-state machine that transduces (translates) strings from one language into strings from another language, or learning a probabilistic finite-state machine that generates strings following a certain probability distribution, may also be formulated as grammatical inference tasks.

Approaches to grammar induction include learning recursive transition networks (which works by converting grammatically correct sentences into transition networks that are similar to finite-state diagrams), learning CFGs using version spaces, learning NPDAs using genetic search, and learning deterministic CFGs using connectionist networks.

It should also be mentioned that there are different models of grammar induction, such as learning from examples, learning using examples and queries, incremental versus non-incremental learning, distribution-free models of learning, and learning under various distributional assumptions. Related topics include impossibility results, complexity results and, finally, characterizations of the representational and search biases of grammar induction algorithms.
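As a very small illustration of learning an automaton from sample strings, the sketch below builds a prefix tree acceptor, a trie-shaped automaton that accepts exactly the positive samples it was given. The function names and sample strings are my own. This step only memorizes the samples; state-merging algorithms usually start from such a tree and generalize by merging states, which is not shown here.

```python
def build_prefix_tree(samples):
    """Build a prefix tree acceptor: an automaton accepting exactly `samples`."""
    transitions = {0: {}}  # state -> {symbol: next_state}; state 0 is the start
    accepting = set()
    for string in samples:
        state = 0
        for symbol in string:
            if symbol not in transitions[state]:
                new_state = len(transitions)
                transitions[state][symbol] = new_state
                transitions[new_state] = {}
            state = transitions[state][symbol]
        accepting.add(state)
    return transitions, accepting


def accepts(automaton, string):
    """Run the automaton on `string` and report whether it ends in an accepting state."""
    transitions, accepting = automaton
    state = 0
    for symbol in string:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return state in accepting


pta = build_prefix_tree(["ab", "abb", "b"])
print(accepts(pta, "abb"), accepts(pta, "a"))
```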

 

Sources:

 

 

Posted by: tamaragc79 | May 7, 2008

Word Sense Disambiguation (WSD)

Word Sense Disambiguation (WSD) is one of the topics that the Stanford Natural Language Processing Group focuses on. It is closely related to translation.

In computational linguistics, word sense disambiguation (WSD) is the process of identifying which sense of a word (having a number of distinct senses) is used in a given sentence. For example, consider the word bass, two distinct senses of which are:

1. A type of fish

2. Tones of low frequency
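A simplified Lesk-style overlap count is one way to picture how a program might choose between these two senses. The glosses, the stopword list and the function names below are hand-written assumptions for this example, not taken from any real dictionary or toolkit.

```python
# Hypothetical, hand-written glosses for the two senses of "bass".
SENSES = {
    "fish": "a type of fish found in a river lake or the sea",
    "music": "tones of low frequency in music sound or voice",
}

STOPWORDS = {"a", "an", "the", "of", "in", "and", "or", "to", "on", "he", "she"}


def content_words(text):
    """Lowercase, split on whitespace and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    return max(senses, key=lambda s: len(context & content_words(senses[s])))


print(disambiguate("He caught a huge bass in the lake"))
print(disambiguate("Turn up the bass to hear the low tones"))
```

The first sentence shares “lake” with the fish gloss, and the second shares “low” and “tones” with the music gloss, so each picks the expected sense. With no overlap at all, the function simply falls back to the first sense listed.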

The automatic disambiguation of word senses has been an interest and concern since the earliest days of computer treatment of language in the 1950s. Sense disambiguation is an “intermediate task” (Wilks and Stevenson, 1996) which is not an end in itself, but rather is necessary at one level or another to accomplish most natural language processing tasks. It is obviously essential for language understanding applications such as message understanding, man-machine communication, etc.; it is at least helpful, and in some instances required, for applications whose aim is not language understanding:

·        Machine translation: sense disambiguation is essential for the proper translation of words.

·         Information retrieval and hypertext navigation: when searching for specific keywords, it is desirable to eliminate occurrences in documents where the word or words are used in an inappropriate sense.

·        Content and thematic analysis: a common approach to content and thematic analysis is to analyze the distribution of pre-defined categories of words across a text.

·        Grammatical analysis: sense disambiguation is useful for part of speech tagging.

·        Speech processing: sense disambiguation is required for correct phonetization of words in speech synthesis.

·        Text processing: sense disambiguation is necessary for spelling correction.

Several WSD paradigms have been proposed for machine translation (MT):

·        Knowledge-based approaches: depend on manual linguistic knowledge and disambiguation rules.

·        Corpus-based approaches: make use of knowledge taken from text using machine learning techniques.

·        Hybrid approaches: mix characteristics of the two previous ones.

Nowadays, corpus-based and hybrid techniques are the most widely used in recent work because they achieve very good results.

 

 

Sources:

·        Wikipedia, the free encyclopedia. Last modified: 03-05-08. Accessed: 07-05-08, from http://en.wikipedia.org/wiki/Word_sense_disambiguation

·        Ide, Nancy; Veronis, Jean (1998). Word Sense Disambiguation: The State of the Art. Retrieved 17:28, May 13th, 2008, from http://sites.univ-provence.fr/veronis/pdf/1998wsd.pdf

Posted by: tamaragc79 | May 5, 2008

Topics about Machine Learning

As a broad subfield of artificial intelligence, machine learning is concerned with the design and development of algorithms and techniques that allow computers to “learn”. At a general level, there are two types of learning: inductive, and deductive. Inductive machine learning methods extract rules and patterns out of massive data sets.
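To make “extracting rules out of data” concrete, here is a toy inductive learner that picks a single threshold rule from labelled examples. The data, the rule form (“label is True when value >= t”) and the function name are all assumptions made for this sketch; real systems learn far richer rules from far larger data sets.

```python
def learn_threshold_rule(examples):
    """Induce a rule 'label is True when value >= t' from (value, label) pairs."""
    candidates = sorted({value for value, _ in examples})
    best_t, best_correct = None, -1
    for t in candidates:
        # Count how many examples the candidate rule classifies correctly.
        correct = sum((value >= t) == label for value, label in examples)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


# Hypothetical data: message length in words, and whether it was labelled "long".
DATA = [(3, False), (5, False), (8, False), (12, True), (15, True), (20, True)]

threshold = learn_threshold_rule(DATA)
print(f"Learned rule: long if length >= {threshold}")
```

On this data the learner settles on a threshold of 12, the smallest value that separates the two labels without error.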

The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods. Hence, machine learning is closely related not only to data mining and statistics, but also theoretical computer science.

Applications

Machine learning has a wide spectrum of applications including natural language processing, syntactic pattern recognition, search engines, medical diagnosis, bioinformatics, brain-machine interfaces and cheminformatics, detecting credit card fraud, stock market analysis, classifying DNA sequences, speech and handwriting recognition, object recognition in computer vision, game playing and robot locomotion.

Human interaction

Some machine learning systems attempt to eliminate the need for human intuition in the analysis of the data, while others adopt a collaborative approach between human and machine. Human intuition cannot be entirely eliminated since the designer of the system must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data. Machine learning can be viewed as an attempt to automate parts of the scientific method.

Some statistical machine learning researchers create methods within the framework of Bayesian statistics.

We can find a lot of references about this topic. One of them is the Edinburgh Machine Learning Group.

The Edinburgh Machine Learning Group is part of the Institute for Adaptive and Neural Computation (ANC), School of Informatics. The group focuses on probabilistic and information theoretic approaches to machine learning problems. There is a significant emphasis on problems in image modelling/interpretation and stochastic processes, as well as an interest in manifold learning, clustering methods and signal analysis. Much of the group’s work is applied to problems of scientific inference, including research in areas of astronomy, remote sensing, meteorology, medical imaging, medical signal processing, neuro-informatics and bio-informatics.

The group has strong associations with other groups within and outside the School of Informatics. It also holds a paper discussion meeting approximately fortnightly, called the Probabilistic Inference Group (PIGS). As we can see, it is an interesting machine learning group that produces important and useful work.

Sources:

Posted by: tamaragc79 | April 28, 2008

Three Research Centres

Three of the most important research centres related to Human Language Technologies are the following:

1. The German Research Centre

 The German Competence Centre for Language Technology was founded as a public service to the global R&D community in language technology and neighbouring technology areas, to the German IT industry and to German companies and other organizations planning to employ language technology applications. In addition to a wide variety of free service functions, the Centre also offers highly specialized professional services to private customers.

The Competence Centre comprises the following components:

·        The Virtual Information Centre “Language Technology World”, the world’s most comprehensive information resource about speech and language technology.

·        The Demonstration Centre in Saarbrücken, which offers interested parties the possibility to play and experiment with different speech and language technologies, or to attend guided demonstrations.

·        The Evaluation Centre, which conducts evaluations of the overall usability of language technology systems and advances knowledge of relevant usability issues and evaluation methods.

2. The Edinburgh Language Technology Group

·        Combining Shallow Semantics and Domain knowledge (EASIE).

·        Text Mining for Biomedical Content Curation (TXM).

·        Cross-lingual Multi-Agent Retail Comparison (CROSSMARC).

·        Smart Qualitative Data: Methods and Community tools for Data Mark-up (SQUAD).

·        Machine learning for named Entity Recognition (SEER).

·        Joint Action Science and Technology (JAST).

·        Study of how pairs collaborate when planning a route on a map (Collaborating using diagrams).

 

3. The Austrian Research Institute for Artificial Intelligence (OFAI)

In the area of language and speech processing, OFAI conducts both basic and applied research. It develops linguistic resources and processes as well as application prototypes.

 

      3.1 Linguistic resources and processes

 

·        Typed unification-based grammar formalisms 

·        Development of a HPSG-based grammar for German 

·        Natural Language Generation 

·        Speech Synthesis 

·        Computational Morphology 

      3.2 Application Prototypes

 

·        Natural language interfaces and advisory systems

·        Concept-to-speech systems

 

 

 

Sources:

·     The German Research Centre for Artificial Intelligence from http://www.dfki.de/lt/lt-general.php

·     The Edinburgh Language Technology Group from http://www.ltg.ed.ac.uk/projects

·     The Austrian Research Institute for Artificial Intelligence (OFAI) from http://www.ofai.at/research/nlu/

 

Posted by: tamaragc79 | April 23, 2008

Computer-Assisted Language Learning (CALL)

Computer-assisted language learning (CALL) is a form of computer-based learning with two important features: bidirectional learning and individualized learning. It is not a method. CALL materials are tools for learning, and the focus of CALL is learning rather than teaching. CALL materials are used in teaching to facilitate the language learning process; they are student-centered and promote self-paced learning. However, there are different definitions of this term. Some people call it courseware, that is, an educational computer program. Others call CALL an approach to teaching and learning foreign languages in which the computer and computer-based resources such as the Internet are used to present, reinforce and assess the material to be learned.

The integration of computers into foreign language learning is increasingly common among suppliers of language materials. Nowadays there is a tendency to use CALL teaching materials for independent learning, without the supervision of a teacher.

On the other hand, the gradual integration of technology into classrooms over the last twenty years reflects the development of those technologies. The tools used in CALL enable teachers and students to teach and learn more interactively, because they use the Internet and digital platforms where students can work on and evaluate their own knowledge and progress in the language they are studying.

Moreover, CALL has developed considerably in Europe and the USA. There are organizations that focus their research on this topic, such as APACALL (Asia-Pacific Association for CALL), CALICO (a US-based professional association devoted to CALL that runs an annual conference), EUROCALL (a Europe-based professional association devoted to CALL that runs an annual conference) and PacCALL (a professional CALL association in the Pacific, covering East and Southeast Asia, Oceania and the Americas).

This new teaching tool continues to improve the learning of foreign languages and to make them more accessible to those who use it.

 

 

 

     

Sources:
