Machine Translation Archive

(www.MT-Archive.info)

Introduction and guide to usage

The aim of this electronic archive is to provide a permanent on-line location for a comprehensive collection of articles, books and papers in the field of machine translation and computer-based translation technology. A particular priority is publications that are difficult to find or obtain from the usual sources, especially the proceedings of conferences that have not been published by well-known commercial book publishers. For the sake of completeness, the archive also includes index entries for publications that, for copyright reasons, cannot be made available on this site.

Coverage. At present the archive contains only English-language publications, although in due course some publications in other languages may be included selectively (a preliminary index of some French-language publications is now available). The archive covers publications on all aspects of machine translation and computer-assisted translation, translation memories and translation tools. It also includes publications in related areas of interest to researchers in the field, such as controlled languages, cross-language information retrieval, information extraction, multilingual resources and terminology. The proceedings of conferences devoted to machine translation (and computer-assisted translation) are being covered in full; papers from other conferences are being included selectively (for details, see below).

The ultimate aim is completeness. In the first instance, an effort will be made to cover publications since 1990 comprehensively. The next priorities are publications from the mid-1970s to the late 1980s, and then a selection from the earliest years of MT in the 1950s and 1960s. Because the goal is comprehensiveness, some of the papers included are known to be inaccurate, misleading or ill-informed, particularly some articles from popular magazines and the internet. Caveat lector!

The Machine Translation Archive does not include information about current commercial systems (except when described in papers). For such information see the Compendium of translation software on the EAMT website (http://www.eamt.org/soft_comp.php).

Copyright. All publications remain the copyright of their authors (except where the copyright is held by a publisher). In general, any material may be used and copied for teaching and research purposes under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. Permission to download is therefore not given to any individuals or organizations that charge, or intend to charge, users (by fees or subscriptions) for materials on their databases.

Citations. Every effort has been made to ensure that the bibliographical details are correct and complete. When citing articles, give these full details. Where appropriate, also give this source (http://www.mt-archive.info) and the file name. File names generally consist of an abbreviation for the journal title, the conference name or the organization holding the conference, followed by the year and the surname of the first author. For example, ‘http://www.mt-archive.info/AMTA-1994-White’ refers to a paper given by John White at the 1994 conference of the Association for Machine Translation in the Americas. These file names will not be changed, so references to them will always remain valid. Note that file names are case-sensitive: using e.g. ‘amta’ instead of ‘AMTA’ will fail to access the file.
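The naming pattern described above can be sketched in Python. This is a minimal illustration and assumes that a file name follows the simple Abbreviation-Year-Author form. The Archive says names "generally" follow this form, so some files may not. The helper names `archive_url` and `parse_file_name` are hypothetical. They are not a service offered by the Archive.

```python
import re
from typing import Dict, Optional

BASE_URL = "http://www.mt-archive.info/"

# Assumed pattern (hypothetical): <source abbreviation>-<four-digit year>-<first author's surname>.
# The non-greedy source group stops at the first "-YYYY-", so a hyphenated abbreviation still parses.
_FILE_NAME = re.compile(r"^(?P<source>.+?)-(?P<year>\d{4})-(?P<author>.+)$")


def archive_url(source: str, year: int, first_author: str) -> str:
    """Build an archive address; case is kept exactly because file names are case-sensitive."""
    return f"{BASE_URL}{source}-{year}-{first_author}"


def parse_file_name(name: str) -> Optional[Dict[str, str]]:
    """Split a file name into source, year and author, or return None if it does not fit the pattern."""
    match = _FILE_NAME.match(name)
    return match.groupdict() if match else None


print(archive_url("AMTA", 1994, "White"))  # http://www.mt-archive.info/AMTA-1994-White
print(parse_file_name("AMTA-1994-White"))  # {'source': 'AMTA', 'year': '1994', 'author': 'White'}
```

Because the abbreviation is copied through unchanged, passing "amta" would produce a different address from the one the Archive serves.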

Format. Publications are provided in PDF format (sometimes converted from PostScript or PowerPoint). Wherever possible, they have been scanned from the original hard copies and checked by the compiler for typographical errors. However, the reproduction quality and legibility of some PDF files are poor; in due course, some of these will be re-scanned.

Indexes. All publications are listed in six indexes. In the index of authors, they are entered under the names of all authors (in the fullest form that can be ascertained). In the index of organizations, they are entered under the names of the institutions and organizations with which the authors are affiliated and/or at which, or for which, the research was undertaken. Institutions, organizations and companies are grouped by country. In the index of systems, they are entered under the names of the projects or systems mentioned. The remaining three indexes cover the following:
- the languages and language pairs treated;
- methods, techniques and other computational and linguistic topics;
- applications and other issues affecting the use of systems.

Indexes for institutions/organizations, for applications, and for methods/techniques/etc. are divided into appropriate time periods (currently: 2010 to the present, 2005-2009, 2000-2004, 1990-1999, and pre-1990).

The index of authors is divided alphabetically. Names beginning with Mc are filed as if spelled Mac, and diacritics are ignored (i.e. ü is filed as u, ø as o, å as a, č as c, etc.). The indexes of languages and of systems are also divided alphabetically.
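The author-index filing rules can be approximated by a sort key. The following Python sketch is an illustration only and is not the compiler's actual procedure. The function name `filing_key` is hypothetical, and the sample surnames are arbitrary. Accented letters such as ü, å and č decompose under Unicode normalization, so their marks can simply be dropped. A letter such as ø has no decomposition and needs an explicit mapping. Only ø is mapped in this sketch.

```python
import unicodedata

# Letters with no Unicode decomposition must be folded by hand (partial, hypothetical table).
_EXTRA_FOLDS = str.maketrans({"ø": "o", "Ø": "O"})


def filing_key(surname: str) -> str:
    """Return a sort key approximating the author-index filing rules."""
    name = surname.translate(_EXTRA_FOLDS)
    # NFD splits e.g. "ü" into "u" plus a combining diaeresis; drop the combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # File names beginning "Mc" as if they were spelled "Mac".
    if stripped.startswith("Mc"):
        stripped = "Mac" + stripped[2:]
    return stripped.casefold()


names = ["Macklovitch", "McCord", "Møller", "Müller", "Mueller", "Čmejrek"]
print(sorted(names, key=filing_key))
# ['Čmejrek', 'McCord', 'Macklovitch', 'Møller', 'Mueller', 'Müller']
```

Under these rules Müller is filed as Muller, after Mueller, rather than alongside it.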

Note that in all the indexes the publications under a heading or name are listed in reverse chronological order.

Conferences and journals. For conference proceedings included in full in the Archive, tables of contents can be accessed from the index of conferences. A second index lists conferences from which articles have been included selectively. A third lists conferences, devoted wholly or partly to machine translation, that have yet to be included. Tables of contents are also provided for the journals included so far; these are accessed from the index of journals.

Web sites. Links to the personal websites of individual researchers may be useful for tracing publications that have not (yet) been included in the Archive. No assurance can be given that these links will remain up to date. The compiler welcomes help from collaborators, news of any changes, and suggestions for additions.