The MateCat post-edits benchmark contains post-edited documents created within the MateCat project during field tests with professional translators. The original documents were written in English and translated into French and Italian. They consist of texts selected from the body of European Union law (EUR-Lex) and of transcriptions of TED talks as released on the WIT3 website.
The benchmark includes the source text, the human reference translation (where available), the suggestion that translators chose to post-edit, and the final post-edited text. In total, it comprises 13,880 segment tuples (i.e., source, reference, suggestion, and post-edit) and over 320 thousand English tokens.
Details and statistics of the benchmark are given in the "README" file included in the distributed archive.
The benchmark is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives (BY-NC-ND) license.
Acknowledgment
The creation of the benchmark was supported by the EU-funded project MateCat (ICT-2011.4.2-287688).
The benchmark is available for download.
For questions and support regarding this benchmark, please contact Mauro Cettolo (cettolo [at] fbk [dot] eu).