Apertium is a free and open-source rule-based machine translation platform, released under the terms of the GNU General Public License.
Stable release | 3.8.3[1] / 1 November 2022 |
---|---|
Repository | github |
Written in | C++ |
Operating system | POSIX compatible and Windows NT (limited support) |
Available in | 35 languages, see below |
Type | Rule-based machine translation |
License | GNU General Public License |
Website | www |
Apertium is a transfer-based machine translation system. It uses finite-state transducers for all of its lexical transformations, and Constraint Grammar taggers together with hidden Markov models or perceptrons for part-of-speech tagging (word category disambiguation).[2] A structural transfer component is responsible for word movement and agreement. Most Apertium language pairs to date have used "chunking" or shallow transfer rules, though newer pairs use (possibly recursive) rules defined in a context-free grammar.[3]
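The stages named above (lexical analysis, disambiguation, lexical transfer, structural transfer) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the three-word lexicon, the tag names, the hand-written disambiguation heuristic and the single adjective-noun reordering rule are invented, plain dictionaries stand in for finite-state transducers, and agreement is not modelled (the target forms are simply stored already inflected). It is not Apertium's code or API.

```python
# Toy illustration only: the dictionaries and the single reordering rule below
# are invented for this example and are not Apertium data or Apertium's API.

# Morphological analysis: surface form -> list of possible (lemma, tag) readings.
ANALYSES = {
    "the": [("the", "det")],
    "white": [("white", "adj")],
    "house": [("house", "n"), ("house", "vblex")],  # ambiguous: noun or verb
}

# Bilingual lexicon: (source lemma, tag) -> target form (already inflected).
BILINGUAL = {
    ("the", "det"): "la",
    ("white", "adj"): "blanca",
    ("house", "n"): "casa",
}


def analyse(sentence):
    """Return every possible reading for each whitespace-separated token."""
    return [ANALYSES[token] for token in sentence.lower().split()]


def disambiguate(readings):
    """Pick one reading per token.

    A real tagger would be statistical or rule-based; this stand-in prefers
    a noun reading after a determiner or adjective and otherwise takes the
    first reading listed.
    """
    chosen = []
    for options in readings:
        pick = options[0]
        if chosen and chosen[-1][1] in ("det", "adj"):
            for option in options:
                if option[1] == "n":
                    pick = option
        chosen.append(pick)
    return chosen


def lexical_transfer(tagged):
    """Replace each source lemma with its target-language equivalent."""
    return [(BILINGUAL[(lemma, tag)], tag) for lemma, tag in tagged]


def structural_transfer(tagged):
    """Swap each adjective-noun pair into noun-adjective order."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "adj" and out[i + 1][1] == "n":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def translate(sentence):
    """Run the four toy stages in order and join the target words."""
    tagged = disambiguate(analyse(sentence))
    target = structural_transfer(lexical_transfer(tagged))
    return " ".join(word for word, _ in target)


print(translate("the white house"))  # la casa blanca
```

The point of the sketch is the separation of concerns: the linguistic data (the two dictionaries and the reordering rule) can be swapped out without touching the functions that apply them.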
Many existing machine translation systems are commercial or use proprietary technologies, which makes them very hard to adapt to new uses. Apertium's code and data are free software and use a language-independent specification, which makes it easier to contribute to Apertium, speeds up development, and supports the project's overall growth.
As of December 2020, Apertium has released 51 stable language pairs,[4] delivering fast translation with reasonably intelligible results (errors are easily corrected). Being an open-source project, Apertium provides tools for potential developers to build their own language pairs and contribute to the project.
Apertium originated as one of the machine translation engines in the OpenTrad project, which was funded by the Spanish government and developed by the Transducens research group at the Universitat d'Alacant. It was originally designed to translate between closely related languages, although it has since been expanded to handle more divergent language pairs. To create a new machine translation system, one only needs to develop linguistic data (dictionaries, rules) in well-specified XML formats.
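To give a sense of what "linguistic data in XML" can look like, the sketch below parses a small bilingual dictionary fragment with Python's standard library. The fragment is hypothetical and heavily simplified: its element names (`e`, `p`, `l`, `r`, `s`) are modelled loosely on Apertium's dictionary files but should be treated as assumptions, and the two entries are invented for the example. The helper functions `side` and `load_entries` are likewise illustrative names, not part of any Apertium tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified bilingual dictionary fragment. Each <e> entry pairs
# a left (source) side with a right (target) side; <s n="..."/> marks a tag.
DICTIONARY_XML = """
<dictionary>
  <section id="main" type="standard">
    <e><p><l>house<s n="n"/></l><r>casa<s n="n"/></r></p></e>
    <e><p><l>dog<s n="n"/></l><r>perro<s n="n"/></r></p></e>
  </section>
</dictionary>
"""


def side(element):
    """Return (lemma, tags) for one <l> or <r> element."""
    lemma = (element.text or "").strip()
    tags = tuple(s.get("n") for s in element.findall("s"))
    return lemma, tags


def load_entries(xml_text):
    """Build a lookup table from source (lemma, tags) to target (lemma, tags)."""
    root = ET.fromstring(xml_text)
    table = {}
    for pair in root.iter("p"):
        table[side(pair.find("l"))] = side(pair.find("r"))
    return table


entries = load_entries(DICTIONARY_XML)
print(entries[("house", ("n",))])  # ('casa', ('n',))
```

Because the data lives in a declarative file rather than in program code, a new language pair can in principle be added by writing new entries without modifying the engine that reads them.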
Language data developed for it (in collaboration with the Universidade de Vigo, the Universitat Politècnica de Catalunya and the Universitat Pompeu Fabra) currently support (in stable version) the Arabic, Aragonese, Asturian, Basque, Belarusian, Breton, Bulgarian, Catalan, Crimean Tatar, Danish, English, Esperanto, French, Galician, Hindi, Icelandic, Indonesian, Italian, Kazakh, Macedonian, Malaysian, Maltese, Northern Sami, Norwegian (Bokmål and Nynorsk), Occitan, Polish, Portuguese, Romanian, Russian, Sardinian, Serbo-Croatian, Silesian, Slovene, Spanish, Swedish, Tatar, Ukrainian, Urdu, and Welsh languages. A full list is available below. Several companies are also involved in the development of Apertium, including Prompsit Language Engineering, Imaxin Software and Eleka Ingeniaritza Linguistikoa.
The project has taken part in the 2009,[5] 2010,[6] 2011,[7] 2012,[8] 2013[9] and 2014[10] editions of Google Summer of Code and the 2010,[11] 2011,[12] 2012,[13] 2013,[14] 2014,[15] 2015,[16] 2016[17] and 2017[18] editions of Google Code-In.
This is an overall, step-by-step view of how Apertium works.
The diagram displays the steps that Apertium takes to translate a source-language text (the text we want to translate) into a target-language text (the translated text).
List of currently stable language pairs. The column headers are language codes, listed in the same order as the languages in the rows.
Language | af | ar | an | ast | eu | br | bg | ca | da | nl | en | eo | fi | fr | gl | de | hin | is | id | it | kaz | mk | ms | mt | sme | nb | nn | oc | pt | ro | sc | hbs | slv | es | sv | tat | urd | cy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Afrikaans | — | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Arabic | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (←) | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Aragonese | No | No | — | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No |
Asturian | No | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No |
Basque | No | No | No | No | — | No | No | No | No | No | Yes (→) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (→) | No | No | No | No |
Breton | No | No | No | No | No | — | No | No | No | No | No | No | No | Yes (→) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Bulgarian | No | No | No | No | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Catalan | No | No | Yes (⇄) | No | No | No | No | — | No | No | Yes (⇄) | Yes (→) | No | Yes (⇄) | No | No | No | No | No | Yes (←) | No | No | No | No | No | No | No | Yes (⇄) | Yes (⇄) | No | Yes (→) | No | No | Yes (⇄) | No | No | No | No |
Danish | No | No | No | No | No | No | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | Yes (⇄) | No | No | No | No | No | No | No | Yes (←) | No | No | No |
Dutch | Yes (⇄) | No | No | No | No | No | No | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
English | No | No | No | No | Yes (←) | No | No | Yes (⇄) | No | No | — | Yes (⇄) | No | No | Yes (⇄) | No | No | Yes (←) | No | No | No | Yes (←) | No | No | No | No | No | No | No | No | No | Yes (←) | No | Yes (⇄) | No | No | No | Yes (←) |
Esperanto | No | No | No | No | No | No | No | Yes (←) | No | No | Yes (⇄) | — | No | Yes (←) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Finnish | No | No | No | No | No | No | No | No | No | No | No | No | — | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
French | No | No | No | No | No | Yes (←) | No | Yes (⇄) | No | No | No | Yes (→) | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (→) | No | No | No | No | No | No | Yes (⇄) | No | No | No |
Galician | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | Yes (⇄) | No | No | No | No |
German | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Hindi | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No |
Icelandic | No | No | No | No | No | No | No | No | No | No | Yes (→) | No | No | No | No | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No |
Indonesian | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | — | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Italian | No | No | No | No | No | No | No | Yes (→) | No | No | No | No | No | No | No | No | No | No | No | — | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No |
Kazakh | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No |
Macedonian | No | No | No | No | No | No | Yes (⇄) | No | No | No | Yes (→) | No | No | No | No | No | No | No | No | No | No | — | No | No | No | No | No | No | No | No | No | Yes (←) | No | No | No | No | No | No |
Malaysian | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Maltese | No | Yes (→) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | — | No | No | No | No | No | No | No | No | No | No | No | No | No | No |
Northern Sami | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | — | Yes (→) | No | No | No | No | No | No | No | No | No | No | No | No |
Norwegian (Bokmål) | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (←) | — | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No |
Norwegian (Nynorsk) | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | — | No | No | No | No | No | No | No | No | No | No | No |
Occitan | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | Yes (←) | No | No | No | No | No | No | No | No | No | No | No | No | No | — | No | No | No | No | No | Yes (⇄) | No | No | No | No |
Portuguese | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | — | No | No | No | No | Yes (⇄) | No | No | No | No |
Romanian | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | — | No | No | No | Yes (←) | No | No | No | No |
Sardinian | No | No | No | No | No | No | No | Yes (←) | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | — | No | No | No | No | No | No | No |
Serbo-Croatian | No | No | No | No | No | No | No | No | No | No | Yes (→) | No | No | No | No | No | No | No | No | No | No | Yes (→) | No | No | No | No | No | No | No | No | No | — | Yes (⇄) | No | No | No | No | No |
Slovenian | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | — | No | No | No | No | No |
Spanish | No | No | Yes (⇄) | Yes (⇄) | Yes (←) | No | No | Yes (⇄) | No | No | Yes (⇄) | Yes (→) | No | Yes (⇄) | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | Yes (⇄) | Yes (←) | No | No | No | — | No | No | No | No |
Swedish | No | No | No | No | No | No | No | No | Yes (→) | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | — | No | No | No |
Tatar | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | — | No | No |
Urdu | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Yes (⇄) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | — | No |
Welsh | No | No | No | No | No | No | No | No | No | No | Yes (→) | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | No | — |
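The table above is effectively a set of directed translation pairs, which can be sketched as a small lookup structure. The sketch copies only six cells from the table and assumes that "→" means the row language translates into the column language, "←" the reverse, and "⇄" both directions; that reading of the arrows is an assumption about the table's notation. The function names are illustrative.

```python
# A handful of cells copied from the table above: (row code, column code, arrow).
# Assumed reading: "→" is row-to-column, "←" is column-to-row, "⇄" is both.
CELLS = [
    ("eu", "en", "→"),  # Basque / English
    ("eu", "es", "→"),  # Basque / Spanish
    ("ca", "es", "⇄"),  # Catalan / Spanish
    ("en", "es", "⇄"),  # English / Spanish
    ("mt", "ar", "→"),  # Maltese / Arabic
    ("cy", "en", "→"),  # Welsh / English
]


def build_directions(cells):
    """Expand table cells into a set of (source, target) direction tuples."""
    directions = set()
    for row, col, arrow in cells:
        if arrow in ("→", "⇄"):
            directions.add((row, col))
        if arrow in ("←", "⇄"):
            directions.add((col, row))
    return directions


def can_translate(directions, source, target):
    """Return True if a direct stable pair exists from source to target."""
    return (source, target) in directions


DIRECTIONS = build_directions(CELLS)

print(can_translate(DIRECTIONS, "eu", "es"))  # True
print(can_translate(DIRECTIONS, "es", "eu"))  # False
print(can_translate(DIRECTIONS, "es", "ca"))  # True
```

Storing each direction as its own tuple keeps one-way pairs (such as Basque to Spanish) distinct from bidirectional ones (such as Catalan and Spanish).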
(All services are based on the Apertium engine)