BATMAN
Description
BilinguAl TerM AligNer (BATMAN) (Arcan et al., 2014) is an open-source tool for aligning monolingual terminology, extracted from parallel texts, across different languages. BATMAN takes as input the monolingual terms of the source and target languages, together with the parallel documents from which those terms were extracted. It produces a list of aligned bilingual terms.
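The page does not specify BATMAN's input file formats. The following is a minimal sketch that assumes sentence-aligned documents (one whitespace-tokenized sentence per list element) and term lists given as plain strings. It links the two kinds of input by recording which monolingual terms occur in each sentence pair. The function names `term_occurrences` and `pair_terms` are hypothetical and are not part of BATMAN.

```python
def term_occurrences(sentences, terms):
    """Hypothetical helper: map each sentence index to the set of terms found in it.

    Assumes whitespace-tokenized sentences and multi-word terms given as
    space-separated strings; matching is exact and case-sensitive.
    """
    # Pre-split every non-empty term into a token tuple once.
    term_tokens = {t: tuple(t.split()) for t in terms if t.split()}
    index = {}
    for i, sentence in enumerate(sentences):
        tokens = sentence.split()
        found = set()
        for term, toks in term_tokens.items():
            n = len(toks)
            # A term occurs if its token sequence matches any contiguous span.
            if any(tuple(tokens[j:j + n]) == toks for j in range(len(tokens) - n + 1)):
                found.add(term)
        index[i] = found
    return index


def pair_terms(source_sentences, target_sentences, source_terms, target_terms):
    """Hypothetical helper: pair source- and target-side term sets per sentence pair."""
    if len(source_sentences) != len(target_sentences):
        raise ValueError("parallel documents must have the same number of sentences")
    src = term_occurrences(source_sentences, source_terms)
    tgt = term_occurrences(target_sentences, target_terms)
    return [(src[i], tgt[i]) for i in range(len(source_sentences))]
```

For example, with the source sentence "the rotor blade breaks", its translation "das Rotorblatt bricht", and a target term list that lacks "Rotorblatt", the second element of the returned pair is an empty set. The term lookup strategy described below depends on exactly this kind of gap.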
The tool extracts bilingual terms in two phases. In the first phase, a set of possible translations is obtained for each term using a translation system and a word aligner trained on the same parallel data from which the bilingual terminology is extracted. This increases the chance of obtaining good term translations even when only a small amount of parallel data is available.
In the second phase, the best translation is identified. Given the set of candidate translations for each term, the correct one is selected by exploiting the parallelism between source and target sentences. Two methods are investigated for this: sentence lookup and term lookup. With sentence lookup, a candidate translation is accepted as correct if it matches a span in the target sentence. With term lookup, a candidate is accepted only if it has also been identified as a term in the target sentence. Term lookup therefore extracts fewer bilingual terms but yields higher-quality alignments, whereas sentence lookup is more tolerant and identifies more bilingual terms.
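The two phases and the two lookup strategies can be illustrated with a simplified sketch. This is not BATMAN's implementation, and it rests on several assumptions. The translation system is replaced by a hypothetical precomputed `system_candidates` dictionary. Word-alignment links are assumed to be given for each sentence pair. When several candidates pass the lookup, the one accepted in the most sentence pairs is kept, which is an assumed selection rule. All names (`SentencePair`, `alignment_candidates`, `accepted`, `align_terms`) are illustrative.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SentencePair:
    """Hypothetical sentence pair: tokens, terms as token tuples, and
    word-alignment links as (source_index, target_index) pairs."""
    source: list[str]
    target: list[str]
    source_terms: set[tuple[str, ...]]
    target_terms: set[tuple[str, ...]]
    alignment: set[tuple[int, int]] = field(default_factory=set)


def find_span(tokens, term):
    """Return the start index of the first occurrence of `term` in `tokens`, or -1."""
    n = len(term)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == term:
            return i
    return -1


def alignment_candidates(pair, term):
    """Phase 1 (simplified): project a source term through the word alignment
    and return the contiguous target span it covers, or None if unaligned."""
    start = find_span(pair.source, term)
    if start < 0:
        return None
    src_positions = range(start, start + len(term))
    tgt = sorted(t for s, t in pair.alignment if s in src_positions)
    if not tgt:
        return None
    return tuple(pair.target[tgt[0]:tgt[-1] + 1])


def accepted(pair, candidate, strategy):
    """Phase 2: sentence lookup accepts any candidate matching a target span;
    term lookup also requires the span to be a target-side term."""
    if find_span(pair.target, candidate) < 0:
        return False
    if strategy == "term":
        return candidate in pair.target_terms
    return True


def align_terms(pairs, system_candidates, strategy="sentence"):
    """Return {source_term: target_term}. For each source term, keep the candidate
    accepted in the most sentence pairs (assumed rule, not from BATMAN)."""
    if strategy not in ("sentence", "term"):
        raise ValueError("strategy must be 'sentence' or 'term'")
    votes = {}
    for pair in pairs:
        for term in pair.source_terms:
            # Candidates from the (stand-in) translation system plus the aligner.
            cands = set(system_candidates.get(term, []))
            projected = alignment_candidates(pair, term)
            if projected:
                cands.add(projected)
            for cand in sorted(cands):  # sorted for deterministic tie-breaking
                if accepted(pair, cand, strategy):
                    votes.setdefault(term, Counter())[cand] += 1
    return {term: c.most_common(1)[0][0] for term, c in votes.items()}
```

Consider a pair where the source term "rotor blade" aligns to "Rotorblatt" but the target-side term extractor missed "Rotorblatt". Sentence lookup still accepts the pair, whereas term lookup discards it. This mirrors the trade-off between the number of extracted terms and alignment quality described above.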
Acknowledgment
The software development was supported by the EU-funded project MateCat (ICT-2011.4.2-287688).
License
BATMAN is distributed under the GNU Lesser General Public License (LGPL).
Manual
Installation, configuration and usage instructions will be available soon.
Source code
The source code will be available soon.
Reference
If you use BATMAN, please cite:
Arcan, Mihael, Marco Turchi, Sara Tonelli and Paul Buitelaar. 2014. “Enhancing Statistical Machine Translation with Bilingual Terminology in a CAT Environment”. In Proceedings of the Association for Machine Translation in the Americas (AMTA ’14), Vancouver, Canada, pp. 54-68.
Contacts
For questions about BATMAN or for support, please contact: turchi [at] fbk [dot] eu