Machine Translation Archive

(www.MT-Archive.info)

Introduction and guide to usage

This electronic repository includes copies of articles, books and papers in the field of machine translation and computer-based translation technology. In a few cases publications are accessed by links to other sites. The archive also includes index entries for publications which for copyright reasons cannot be hosted on this site.

Coverage. The archive contains only English-language publications – although, in due course, some publications in other languages may be included selectively. It covers publications on all aspects of machine translation and computer-assisted translation, translation memories, and translation tools; it also includes publications in related areas of interest to researchers in the field, such as controlled languages, cross-language information retrieval, information extraction, multilingual resources, and terminology. The proceedings of conferences devoted to machine translation (and computer-assisted translation) are being covered in full; for other conferences, papers are being included selectively.

The ultimate aim is completeness. In the first instance an effort will be made to cover comprehensively publications since 2000. The next priorities are publications from the mid-1980s to the late 1990s and then, selectively, publications from the earliest years of MT in the 1950s and 1960s. The goal of comprehensiveness means that some papers included are known to be inaccurate, misleading or ill-informed – particularly some articles from popular magazines and the internet. Caveat lector!

The Machine Translation Archive does not include information about current commercial systems (except when described in papers). For such information see the Compendium of translation software (http://www.hutchinsweb.me.uk/Compendium.htm).

Copyright. All publications are the copyright of their authors (except where the copyright is held by a publisher). In general, any material may be used and copied for teaching and research purposes, and no material may be used for commercial purposes without the permission of the authors. Items copied from the ACL Anthology are copyright of the Association for Computational Linguistics and subject to a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Licence.

Citations. Every effort has been made to ensure the correctness and completeness of the bibliographical details. When citing articles it is recommended that these full details are given, together with this source (http://www.mt-archive.info) and the file name where appropriate. File names generally consist of an abbreviation for the conference name, for the organization holding the conference, or for the journal title, followed by the year and the name of the first author. For example, ‘http://www.mt-archive.info/AMTA-1994-White’ refers to a paper given by John White at the 1994 conference of the Association for Machine Translation in the Americas. Users are assured that these file names will not be changed, so that references to them will always be valid. Note that file names are case-sensitive, so that using e.g. ‘amta’ instead of ‘AMTA’ may result in failure to access a file.
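
The naming pattern described above can be illustrated with a short Python sketch. It is an illustration only: the function name archive_url and its parameters are hypothetical, and since file names are said to follow this pattern only "generally", a generated address is not guaranteed to match an actual file.

```python
# Base address of the archive, as given in the text.
BASE_URL = "http://www.mt-archive.info/"


def archive_url(source: str, year: int, first_author: str) -> str:
    """Build an address of the form <source>-<year>-<first author>.

    `source` is the abbreviation for the conference, organization or
    journal. Case is preserved deliberately, because the text notes
    that file names are case-sensitive.
    """
    return f"{BASE_URL}{source}-{year}-{first_author}"


# The example from the text: John White's paper at the 1994 AMTA conference.
print(archive_url("AMTA", 1994, "White"))
```

Note that the sketch never changes the case of its arguments; passing ‘amta’ would produce a different address from ‘AMTA’.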

Format. Most publications are provided in PDF format (sometimes converted from PostScript or PowerPoint); a few are in HTML format. As far as possible, publications have been scanned (and checked for typographical mistakes) from original hard copies. However, the reproduction quality and legibility of some PDF files taken from other websites are poor. In due course some of these will be re-scanned.

Indexes. All publications are listed in six indexes. In the index of authors they are entered under the names of all authors (in as full forms as can be ascertained). In the index of organizations, they are under the names of the institutions and organizations with which authors are affiliated and/or where or for which the research is undertaken. In the index of systems, they are entered under the names of projects or systems which are mentioned. Other indexes are those of languages and language pairs treated in publications, of methods, techniques, and other computational and linguistic topics, and of applications and other issues affecting the use of systems.

Indexes for institutions/organizations, for applications, and for methods/techniques/etc. are divided into appropriate time periods (currently: 2000 to the present, 1990-1999, and pre-1990), with future divisions into 5- and 10-year periods as necessary.

The index of authors is divided alphabetically (with subdivisions introduced as necessary); note that names beginning Mc are ‘spelled out’ as Mac, and diacritics are ignored (i.e. ü is filed as if u, ø as if o, å as if a, č as if c, etc.). Indexes for languages and for systems are also divided alphabetically.
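
For readers who want to reproduce this filing order, the two rules above (Mc filed as Mac, diacritics ignored) might be sketched as a Python sort key. This is an assumption-laden illustration, not the procedure actually used to compile the index: the function name filing_key and the small table of letters handled explicitly are hypothetical, and the sample surnames are arbitrary.

```python
import unicodedata

# Letters such as ø have no Unicode decomposition, so they are mapped
# explicitly (an illustrative, incomplete table).
_NO_DECOMPOSITION = str.maketrans({"ø": "o", "Ø": "O", "ł": "l", "Ł": "L"})


def filing_key(surname: str) -> str:
    """Return a sort key that ignores diacritics and files Mc as Mac."""
    # Split accented letters into base letter + combining mark (ü -> u + ¨).
    decomposed = unicodedata.normalize(
        "NFKD", surname.translate(_NO_DECOMPOSITION)
    )
    # Drop the combining marks, keeping only the base letters.
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Names beginning Mc are 'spelled out' as Mac.
    if folded.startswith("Mc"):
        folded = "Mac" + folded[2:]
    return folded.casefold()


names = ["Macklovitch", "McCord", "Maegaard"]
print(sorted(names, key=filing_key))
```

With this key, McCord is filed as if spelled MacCord and therefore sorts before Macklovitch.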

Note that in all the indexes the publications under a heading or name are listed in reverse chronological order.

Conferences and journals. For those conference proceedings which are included in full in the Archive, users will find tables of contents accessed from the index of conferences. From this index users can also find a list of conferences devoted wholly or partly to machine translation, whether or not the proceedings have yet been included in the Archive. There are also tables of contents for those journals which have been included so far; these are accessed from the index of journals.

Websites. Links to the personal websites of individual researchers may be useful for tracing publications which have not (yet) been included in the Archive. No assurance can be given that these links will remain constant; the compiler welcomes news of any changes and suggestions for additions. The same caveat applies to the links for individual publications.