Wednesday, September 14, 2005

Machine translation (computer-based translation) Publications by John Hutchins


Articles, books and papers about machine translation and computer-based translation tools, the historical development and current use of computers for the translation of natural languages. The collection includes general articles, surveys of contemporary developments, and historical works. At a later date more may be added. Most are in PDF format, but some are HTML.


All publications © W.John Hutchins (except where otherwise indicated). References and citations should be made to original sources and not only to the website copies.


Directory of current systems


Compendium of translation software: commercial machine translation systems and computer-based translation support tools. [Geneva]: European Association for Machine Translation. [PDF files]

General works

Towards a definition of example-based machine translation. To be presented at the MT Summit X workshop on Example-Based Machine Translation, Phuket, Thailand, 16 September 2005. [PDF, 170KB]
Machine translation today and tomorrow. In Computerlinguistik: was geht, was kommt? Festschrift für Winfried Lenders, hrsg. Gerd Willée, Bernhard Schröder, Hans-Christian Schmitz. Sankt Augustin: Gardez! Verlag, 2002, pp.159-162. [PDF, 81KB]

Towards a new vision for MT. Introductory speech at the 'MT Summit VIII' conference, 18-22 September 2001, Santiago de Compostela, Galicia, Spain. [Also available on the EAMT website] [PDF, 96KB]
Machine translation and human translation: in competition or in complementation? International Journal of Translation, vol.13, no.1-2, Jan-Dec 2001, pp. 5-20. Special theme issue on machine translation, [editor] Michael S. Blekhman. Also in: Machine translation theory & practice, edited by Michael S. Blekhman. New Delhi: Bahri Publications, 2001. (BP Series in Translation Studies, 8). [PDF, 113KB]

Retrospect and prospect in computer-based translation. In: Machine Translation Summit VII, 13th-17th September 1999, Kent Ridge Digital Labs, Singapore. Proceedings of MT Summit VII "MT in the great translation era", 30-34. [Tokyo]: Asia-Pacific Association for Machine Translation, 1999. [PDF, 125KB]

The IAMT Certification initiative and defining translation system categories. Paper given at EAMT Workshop, Ljubljana, May 2000. [Description of current work on developing a categorisation of systems for vendors and purchasers.] [PDF] [Available also via the EAMT archives.]

Evaluation of machine translation and translation tools. In: Survey of the state of the art in human language technology. Managing editors: Giovanni Battista Varile [and] Antonio Zampolli. Editorial board: Ronald A.Cole (editor in chief), Joseph Mariani, Hans Uszkoreit, Annie Zaenen, Victor Zue. Cambridge: Cambridge University Press; [and] Pisa: Giardini Editori e Stampatori, 1997 [i.e. 1998], pp. 418-419. (Linguistica Computazionale XII-XIII.) [PDF, 85KB]

Reflections on the history and present state of machine translation. In: MT Summit V proceedings, Luxembourg, July 10-13, 1995. [pp. 89-96] [PDF, 106KB]

An introduction to machine translation [jointly with Harold L.Somers]. London: Academic Press, 1992. xxi, 362pp. [ISBN: 0-12-362830-x.] All chapters are now available on this website.

Why computers do not translate better. In: Translating and the Computer 13: the theory and the practice of machine translation - a marriage of convenience? Papers presented at a conference jointly sponsored by Aslib, The Association for Information Management; Aslib Technical Translation Group; Institute of Translation and Interpreting, 18-19 November 1991, CBI Conference Centre, London WC1. London: Aslib, 1992. p.3-16. {Reprinted in: Aslib Proceedings 44 (10), October 1992, p.351-359.} [PDF, 151KB]

Prospects in machine translation. In: Machine Translation Summit [Proceedings of conference held September 17-19, 1987, Hakone Prince Hotel, Japan.] Editor-in-chief: Makoto Nagao. Tokyo: Ohmsha Ltd., 1989. pp. 7-12. [PDF, 111KB]

Brief overviews
Machine translation: general overview. In: The Oxford Handbook of Computational Linguistics. Edited by Ruslan Mitkov (Oxford: University Press, 2003), pp. 501-511.

Commercial systems: the state of the art. In: Computers and translation: a translator's guide. Edited by Harold Somers (Amsterdam: John Benjamins, 2003), pp. 161-174.

Commercial systems: the state of the art in 1999. [Unpublished earlier version of the paper in Somers' collection, reflecting the situation in mid 1999; PDF, 136KB]

Commercial systems: the state of the art in 2001. [Unpublished earlier version of the paper in Somers' collection, reflecting the situation in mid 2001; PDF, 132KB]

Translation technology and the translator. In: ITI Conference 11. Proceedings [of] International conference, exhibition & AGM, 8-10 May 1997. Compiled by Catherine Greensmith & Marilyn Vandamme. (London: Institute of Translation & Interpreting, 1997), pp. 113-120. [Also: Machine Translation Review, issue no. 7 (April 1998), pp. 7-14] [PDF, 105KB]

Contemporary surveys

Current commercial machine translation systems and computer-based translation tools: system types and their uses. To appear in International Journal of Translation. [PDF, 191KB]

Machine translation and computer-based translation tools: what's available and how it's used. A new spectrum of translation studies, ed. José Maria Bravo (Valladolid: Univ. Valladolid, 2004), pp.13-48. [PDF, 186KB]

The development and use of machine translation systems and computer-based translation tools. International Conference on Machine Translation & Computer Language Information Processing, 26-28 June 1999, Beijing, China. Proceedings of the conference, editor: Chen Zhaoxiong, 1-16. [Beijing: Research Center of Computer & Language Engineering, Chinese Academy of Sciences.] Reprinted in: International Journal of Translation vol.15, no.1, Jan-June 2003, 5-26 [PDF, 165KB] [also HTML]

Computer-based translation systems and tools. (revised January 2005) [Available on British Computer Society website.]

The state of machine translation in Europe and future prospects. HLT Central, January 2002. [Available on HLT Central website.]

Computer-based translation tools, terminology and documentation in the organizational workflow: a report from recent EAMT workshops. Proceedings of the International Conference on Professional Communication and Knowledge Transfer, Vienna, 24-26 August 1998, vol.II: 4th Infoterm Symposium: Terminology work and knowledge transfer - Best practice in terminology management and terminography. (Vienna: TermNet, 1998), pp. 255-268. [HTML, 29KB]

PowerPoint presentations [temporary]

The Georgetown-IBM experiment demonstrated in January 1954. Presentation on 30 September at AMTA-2004, Washington DC. [PPT, 952KB] {For full paper see below.}

Machine translation and computer-based translation aids. Presentation in January 2005 at the University of East Anglia. [PPT, 159KB]

Machine translation: past imperfect, future indefinite. Presentation in November 2003 at the University of Leeds. [PPT, 87KB]

Machine translation and computer-based translation aids. Presentation in November 2003 at the University of East Anglia, Norwich [PPT (175KB), available on request]

Machine translation: current research problems and issues, and future prospects. Presentation in March 2003 at the University of Valladolid, Spain [PPT (131KB) available on request]

Machine translation and computer-based translation aids: systems and usage. Presentation given in December 2002 at Libera Università degli Studi "S.Pio V", Rome, Italy [PPT (178KB) available on request.]

Machine translation and translation aids: systems, problems, uses, prospects. Presentation given in December 2002 at Università di Bologna, SSLMIT, Forlí, Italy. [PPT (265KB) available on request.]

Machine translation in the real world. A presentation surveying current and possible future uses of computer-based translation systems and tools. Given in July 2002, Barcelona, Spain. [PPT file (186KB) available on request.]

Evolution of machine translation: systems and use. Short presentation at MT Summit VIII, Santiago de Compostela, September 2001. [PPT, 35KB]

Historical works: general

Machine translation: half a century of research and use. Paper for UNED summer school, Avila 2003. To be published. [PDF, 195KB]

Has machine translation improved? MT Summit IX: proceedings of the Ninth Machine Translation Summit, New Orleans, USA, September 23-27, 2003, 181-188. [East Stroudsburg, PA: AMTA.] [PDF, 191KB]

An expanded version [PDF, 288KB] illustrates further aspects, and a database gives longer examples of MT texts from the past, with comparative output from some current systems.

Machine translation over fifty years. Histoire Epistémologie Langage vol. 23 (1), 2001, 7-31. [PDF, 160KB] [also HTML available on request]

Machine translation: a brief history. In: The concise history of the language sciences: from the Sumerians to the cognitivists. Ed. E.F.K.Koerner and R.E.Asher (Pergamon, 1995), pp. 431-445. [HTML, 84KB]

Machine translation: history and general principles. In: The encyclopedia of languages and linguistics. Editor-in-chief: R.E.Asher. Oxford: Pergamon Press, 1994. vol. 5, pp. 2322-2332. [Available on request.]

Machine translation: past, present, future. (Ellis Horwood Series in Computers and their Applications.) Chichester, Ellis Horwood, 1986. 382p. ISBN: 0-85312-788-3. {Chinese translation: Ji chifan yi: guo chyu, shan zai, wei lai. [Taipei], Zhi-Wen Publication Company, 1993. ISBN: 957-8759-01-0. 487pp.} All chapters are now available on this website.

The origins of the translator's workstation. Machine Translation vol.13 (4), 1998, 287-307. [PDF, 170KB]

"The whiskey was invisible", or persistent myths of MT. MT News International 11 (June 1995), 17-18. [PDF, 92KB]

History of MT in a nutshell. A two-page sketch, from the beginnings to the present. [Not published.] [HTML, 11KB]

Historical works: 1980s and 1990s

Twenty years of Translating and the Computer. In: Translating and the Computer 20: proceedings of the Twentieth International Conference on Translating and the Computer, 12-13 November, 1998, London. (London: Aslib, 1998). 16pp. [A historical survey of MT as reflected in the proceedings of the series of annual Aslib conferences, from 1978 to 1998.] [PDF, 169KB]

The state of machine translation in Europe. In: Expanding MT horizons: proceedings of the Second Conference of the Association for Machine Translation in the Americas, 2-5 October 1996, Montreal, Quebec, Canada, pp. 198-205. [PDF, 103KB]

Research methods and system designs in machine translation: a ten-year review, 1984-1994. In: Proceedings of the second international conference organised by Cranfield University [and] British Computer Society,... 12-14 November 1994. Cranfield, Bedford: Cranfield University Press, 1998. 16pp. [PDF, 149KB] [also HTML]

A new era in machine translation. Aslib Proceedings 47 (10), October 1995, pp. 211-219 [Paper presented at Translating and the Computer 17, ... 10-11 November 1994, Institution of Civil Engineers, London.] [PDF, 162KB]

Vers une nouvelle époque en traduction automatique. In: Clas, André et Bouillon, Pierrette (eds.) TA-TAO: recherches de pointe et applications immédiates. Troisièmes Journées Scientifiques du réseau thématique "Lexicologie, Terminologie, Traduction" Montréal, 30 septembre, 1er et 2 octobre 1993. Montréal (Québec): AUPELF/UREF, 1994, pp.3-16. [PDF, 133KB]

Latest developments in machine translation technology: beginning a new era in MT research. In: The Fourth Machine Translation Summit: MT Summit IV. Proceedings: International cooperation for global communication, July 20-22, 1993, Kobe, Japan. [Tokyo: AAMT, 1993], pp. 11-34. [PDF, 184KB]

Out of the shadows: a retrospect of machine translation in the eighties. Paper presented at Computer & Translation '89, Tbilisi (Georgia), November-December 1989. In: Terminologie et Traduction no.3, 1990, pp. 275-292. [HTML, 54KB]

Recent developments in machine translation: a review of the last five years. In: New directions in machine translation: conference proceedings, Budapest 18-19 August 1988. Editors: Dan Maxwell, Klaus Schubert, and Toon Witkam. Dordrecht: Foris Publications, 1988. pp. 7-62. [Substantial survey of developments since my 1986 book, with full bibliography.] [PDF, 329KB]

Historical works: 1960s and 1970s

Bar-Hillel, Yehoshua. In: Encyclopedia of linguistics, ed. Philipp Strazny. (New York: Fitzroy Dearborn, 2005), vol.1, pp.124-126. [PDF, 89KB]

ALPAC: the (in)famous report. MT News International 14, June 1996, pp. 9-12. Reprinted in: Readings in machine translation, ed. Sergei Nirenburg, Harold Somers, and Yorick Wilks (Cambridge, Mass.: The MIT Press, 2003), pp. 131-135. [PDF, 102KB; also HTML, 26KB]

The evolution of machine translation systems. In: Practical experience of machine translation: Proceedings of a conference, London 5-6 November 1981, ed. by Veronica Lawson. Amsterdam, North-Holland Publ.Co., 1982. pp. 21-37. [PDF, 190KB]

Linguistic models in machine translation. UEA Papers in Linguistics 9, January 1979, pp.29-52. [PDF, 197KB]

Machine translation and machine-aided translation. Journal of Documentation 34(2), June 1978, 119-159 (Progress in Documentation). [Reprinted in: Translation: literary, linguistic, and philosophical perspectives. Edited by William Frawley. (Newark: University of Delaware Press, 1984); pp. 93-149] [PDF, 378KB]

Historical works: pioneer decades

The Georgetown-IBM experiment demonstrated in January 1954. In: Machine translation: from real users to research. 6th Conference of the Association for Machine Translation in the Americas, AMTA-2004, Washington DC, USA, September 28 - October 2, 2004. Proceedings, ed. Robert E.Frederking [and] Kathryn B.Taylor. (Berlin: Springer, 2004), 102-114. [PDF, 168KB]

An expanded version will be made available in due course, with information about (and some copies of) contemporary reports. {See also PPT file of the presentation, and the original sentences and translations.}

From first conception to first demonstration: the nascent years of machine translation, 1947-1954. A chronology. Machine Translation vol. 12 (3), 1997, pp. 195-252. [PDF, 326KB]. Also a corrected version, with minor additions (2005) [PDF, 328KB]

Looking back to 1952: the first MT conference. In TMI-97: proceedings of the 7th International Conference on Theoretical and Methodological Issues in Machine Translation, July 23-25, 1997, St. John's College, Santa Fe, New Mexico, USA. [Las Cruces: Computing Research Laboratory, New Mexico State University] pp. 19-30. [HTML, 43KB]

First steps in mechanical translation. In: MT Summit VI: past, present, future. Proceedings, 29 October - 1 November 1997, San Diego, California. Edited by Virginia Teller and Beth Sundheim. [Washington, D.C.: Association for Machine Translation in the Americas, 1997] pp. 14-23. [PDF, 142KB]

Two precursors of machine translation: Artsrouni and Trojanskij. International Journal of Translation 16(1), Jan-June 2004, 11-31. [PDF, 289KB]

Petr Petrovich Troyanskii (1894-1950): a forgotten pioneer of mechanical translation. Machine Translation vol. 15(3), 2000, pp. 187-221. [PDF, 277KB]

(editor) Early years in machine translation: memoirs and biographies of pioneers. (Studies in the History of the Language Sciences 97) Amsterdam: John Benjamins, 2000. xii,400 pp. Illus. [ISBN: 1-58811-013-3] See contents and publishers' flyer [PDF, 53KB.]

The Georgetown-IBM demonstration, 7th January 1954. MT News International, no.8, May 1994, pp.15-18 [PDF]

Other papers in linguistics, information retrieval, summarization, etc.



Language Weaver Awarded Rising Star in Deloitte's Technology Fast 50 Program for Los Angeles


Wednesday September 14, 11:42 am ET

Leader in Statistical Machine Translation Software Recognized for Meteoric Growth

LOS ANGELES--(BUSINESS WIRE)--Sept. 14, 2005--Language Weaver, Inc., a developer of statistical machine translation software (SMTS) for the automation of human language translation, has been named a Rising Star in Deloitte's Technology Fast 50 program for Los Angeles. The Rising Star award is a special designation for fast-growth technology companies that have been in business just three or four years. It is part of the Los Angeles Technology Fast 50 program, which ranks the 50 fastest growing technology companies headquartered in Los Angeles, Ventura, Riverside, San Bernardino, and Santa Barbara counties. Rankings are based on percentage revenue growth over three years (2002-2004). In 2005, only four companies in the Los Angeles metro area received this award and Language Weaver ranks as the second fastest growing young technology company in the region.

Language Weaver's translation software facilitates customers' global market reach and information gathering activities through computer automation of language translation. SMTS technology offers a significant departure from traditional rule-based machine translation systems by producing fluent, natural sounding language translations and can enable cost effective, real-time translations of previously unseen text, like Web based customer communications and text-based research information. Its customers include both government and commercial enterprises.

"To achieve Deloitte's Technology Fast 50 Rising Star award, companies within Los Angeles must have had tremendous revenue growth from 2002-2004," said Tim Lovoy, partner, Technology, Media & Telecommunications, Deloitte & Touche LLP. "Deloitte applauds Language Weaver for having the vision to develop technologies that change the way we work and communicate."

To qualify for the Technology Fast 50 Rising Star program, companies must have had operating revenues of at least $50,000 in 2002 and $1,000,000 in 2004, must be public or private companies headquartered in North America, and be a "technology company," defined as a company that owns proprietary technology that contributes to a significant portion of the company's operating revenues; or devotes a significant proportion of revenues to the research and development of technology. Using other companies' technology in a unique way does not qualify.

"We are honored to be recognized by Deloitte for our strong performance," said CEO Bryce Benjamin of Language Weaver. "Our vision for the future calls for even faster revenue growth in the years ahead. We expect our automated translation software to greatly promote communication and understanding throughout our global neighborhoods through applications in cross-lingual search, multi-lingual email and chat and, eventually, voice-to-voice communication."

Rising Star winners are automatically entered in Deloitte's Technology Fast 500 Rising Star category. Deloitte's Technology Fast 500 program ranks North America's top 500 fastest growing technology companies based on percentage revenue growth over five years (2000-2004). Its Rising Star ranking is based on percentage revenue growth over three years (2002-2004). For more information on Deloitte's Technology Fast 50 or Technology Fast 500 programs, visit www.fast500.com.

About Language Weaver, Inc.

Language Weaver was founded in 2002 to commercialize a unique approach to automatic language translation using proprietary statistical translation algorithms that resulted from 20 person-years of invention and development at the University of Southern California's Information Sciences Institute (USC/ISI). Its resulting product, statistical machine translation software (SMTS), provides the highest quality output to date in automated translation, using the best of traditional and new translation technologies. Language Weaver's translation systems produce fluent, natural sounding translations and save customers money and time through automation of the translation process.

The company has more than 50 patents pending on its SMTS technology worldwide. Bidirectional language pairs available include: Arabic/English, Chinese/English, French/English, and Spanish/English; unidirectional languages include Somali to English and Hindi to English. Language Weaver's software is available as licensed software in server and standalone versions. Website: www.languageweaver.com.

About Deloitte

Deloitte refers to one or more of Deloitte Touche Tohmatsu, a Swiss Verein, its member firms and their respective subsidiaries and affiliates. As a Swiss Verein (association), neither Deloitte Touche Tohmatsu nor any of its member firms has any liability for each other's acts or omissions. Each of the member firms is a separate and independent legal entity operating under the names "Deloitte", "Deloitte & Touche", "Deloitte Touche Tohmatsu" or other related names. Services are provided by the member firms or their subsidiaries or affiliates and not by the Deloitte Touche Tohmatsu Verein. Deloitte & Touche USA LLP is the US member firm of Deloitte Touche Tohmatsu. In the US, services are provided by the subsidiaries of Deloitte & Touche USA LLP (Deloitte & Touche LLP, Deloitte Consulting LLP, Deloitte Financial Advisory Services LLP, Deloitte Tax LLP and their subsidiaries), and not by Deloitte & Touche USA LLP.

Friday, September 09, 2005

Grammar Lost Translation Machine In Researchers Fix Try Will

The makers of a University of Southern California computer translation system consistently rated among the world's best are teaching their software something new: English grammar.

Most modern "machine translation" systems, including the highly rated one created by USC's Information Sciences Institute, rely on brute force correlation of vast bodies of pre-translated text from such sources as newspapers that publish in multiple languages.

Software matches up phrases that consistently show up in parallel fashion (the English "my brother's pants" and Spanish "los pantalones de mi hermano") and then uses these matches to piece together translations of new material.

It works, but only to a point. ISI machine translation expert Daniel Marcu says that when such a system is "trained on enough relevant bilingual text ... it can break a foreign language up into phrasal units, translate each of them fairly well into English, and do some re-ordering. However, even in this good scenario, the output is still clearly not English. It takes too long to read, and it is unsatisfactory for commercial use."
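The phrase-matching approach can be sketched in a few lines of Python. This is only a toy illustration, not the ISI system: the phrase table entries below are invented by hand rather than learned from bilingual text, the function name translate_greedy is hypothetical, and no re-ordering step is included.

```python
# Toy phrase table. Entries are invented for illustration; a real system
# would learn them (with probabilities) from large bilingual corpora.
PHRASE_TABLE = {
    ("los", "pantalones"): "the pants",
    ("de",): "of",
    ("mi", "hermano"): "my brother",
}

def translate_greedy(words, table, max_len=3):
    """Translate left to right, always taking the longest matching source phrase."""
    out = []
    i = 0
    while i < len(words):
        # Try the longest span first, then progressively shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in table:
                out.append(table[phrase])
                i += n
                break
        else:
            # No phrase matched: pass the unknown word through unchanged.
            out.append(words[i])
            i += 1
    return " ".join(out)

result = translate_greedy("los pantalones de mi hermano".split(), PHRASE_TABLE)
print(result)
```

Each phrase is translated acceptably on its own, yet the result keeps the Spanish word order instead of producing "my brother's pants", which is the kind of not-quite-English output Marcu describes.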

So Marcu and colleague Kevin Knight, both ISI project leaders who also hold appointments in the USC Viterbi School of Engineering department of computer science, have begun an intensive $285,000 effort, called the Advanced Language Modeling for Machine Translation project, to improve the system they created at ISI by subjecting the texts that come out of their translation engine to a follow-on step: grammatical processing.

The step seems simple, but is actually imposingly difficult. "For example, there is no robust algorithm that returns 'grammatical' or 'ungrammatical' or 'sensible' or 'nonsense' in response to a user-typed sequence of words," Marcu notes.

The problem grows out of a natural language feature noted by M.I.T. language theorist Noam Chomsky decades ago. Language users have a literally limitless ability to nest and cross-nest phrases and ideas into intricate referential structures: "I was looking for the stirrups from the saddle that my ex-wife's oldest daughter took with her when she went to Jack's new place in Colorado three years ago, but all she had were Louise's second-hand saddle shoes, the ones Ethel's dog chewed during the fire."

Unraveling these verbal cobwebs (or, in the more common description, tracing branching "trees" of connections) is such a daunting task that programmers long ago went in the brute force direction of matching phrases and hoping that the relation of the phrases would become clear to readers.

With the limits of this approach becoming clear, researchers have now begun applying computing power to assembling grammatical rules. According to Knight, one crucial step has been the creation of a large database of English text whose syntax has been hand-decoded by humans, the "Penn Treebank."
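Treebank annotations are commonly written as bracketed trees, with a label followed by its children inside each pair of parentheses. The sketch below reads such a string into nested Python lists. The example sentence and its tags are made up for illustration and are not taken from the Penn Treebank, and the helper names parse_brackets and leaves are hypothetical.

```python
def parse_brackets(text):
    """Parse a bracketed tree such as '(NP (DT the) (NN dog))' into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # skip the closing ")"
        return node

    return read()

def leaves(tree):
    """Return the words of the tree in order, skipping the label at index 0."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]

# Invented example annotation.
tree = parse_brackets("(S (NP (PRP$ my) (NN brother)) (VP (VBD slept)))")
print(tree)
print(leaves(tree))
```

The nested lists make the "branching tree" explicit: the first element of each list is the phrase or word category, and the remaining elements are its sub-phrases or words.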

Using this and other sources, computer scientists have begun developing ways to model the observed rules. A preliminary study by Knight and two colleagues in 2003 showed that this approach might be able to improve translations.

Accordingly, their project proposal states: "We propose to implement a trainable tree-based language model and parser, and to carry out empirical machine-translation experiments with them. USC/ISI's state-of-the-art machine translation system already has the ability to produce, for any input sentence, a list of 25,000 candidate English outputs. This list can be manipulated in a post-processing step. We will re-rank these lists of candidate string translations with our tree-based language model, and we plan for better translations to rise to the top of the list."
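The re-ranking step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tree-based language model is replaced by a tiny bigram table with invented log-probabilities, the candidate list has three entries rather than 25,000, and the scores and the names lm_score and rerank are hypothetical.

```python
# Stand-in for the tree-based language model: a tiny bigram table with
# invented log-probabilities. A real model would score syntactic trees.
BIGRAM_LOGPROB = {
    ("my", "brother's"): -0.5,
    ("brother's", "pants"): -0.7,
    ("the", "pants"): -0.9,
    ("pants", "of"): -2.5,
    ("of", "my"): -1.5,
    ("my", "brother"): -0.6,
}
UNSEEN_LOGPROB = -6.0  # assumed penalty for word pairs missing from the table

def lm_score(sentence):
    """Sum the log-probabilities of adjacent word pairs in the sentence."""
    words = sentence.split()
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
               for pair in zip(words, words[1:]))

def rerank(candidates, lm_weight=1.0):
    """Sort (sentence, translation_score) pairs by combined score, best first."""
    return sorted(candidates,
                  key=lambda c: c[1] + lm_weight * lm_score(c[0]),
                  reverse=True)

# Invented candidate list with invented translation-model scores.
nbest = [
    ("the pants of my brother", -1.0),
    ("my brother's pants", -1.8),
    ("pants my brother the of", -1.5),
]
reranked = rerank(nbest)
for sentence, score in reranked:
    print(sentence, score)
```

In this toy setup the translation engine prefers "the pants of my brother", but once the language-model score is added the more fluent "my brother's pants" moves to the top and the scrambled candidate falls to the bottom. A simple sum of bigram scores also favors shorter sentences, which a real system would need to correct for.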

One crucial trick that the system must be able to do is to pick out separate trees from the endless strings of words. But this is doable, Knight believes -- and in the short, not the long term.

Referring to the annual review of translation systems by the National Institute of Standards and Technology, in which ISI consistently gains top scores, "we want to have the grammar module installed and working by the next evaluation, in August 2006," he said.

Knight and Marcu are cofounders and, respectively, chief scientist and chief technology and operating officer of a spinoff company, Language Weaver.