We're looking for an open source Machine Translation Engine that could be incorporated into our localization workflow. We're looking at the options below:
Among these, Moses has the widest community support and has been tried out by many localization companies and researchers. We are actually leaning towards a Java-based engine, since our applications are all in Java. Have any of you used either Joshua or Phrasal as part of your workflow? Could you please share your experiences with them? Or is Moses too far ahead of these in terms of the features it provides and ease of integration?
And, we require that the engine supports:
- Domain-specific training (i.e. it should maintain a separate phrase table for each domain that the input data belongs to).
- Incremental training (i.e. avoiding having to retrain the model from scratch every time we wish to use some new training data).
- Parallelizing the translation process.
A lot has been moving forward, so I thought to give an update on this topic, and leave the previous answer there to document the progress.
Domain-specific training: domain adaptation techniques can be useful if your data is taken from various sources and you need to optimise towards a sub-domain. From our experience, there is no single solution that consistently performs best, so you need to try out as many approaches as possible and compare the results. There is a mail on the Moses mailing list that lists possible methods: http://thread.gmane.org/gmane.comp.nlp.moses.user/9742/focus=9799. The following page also gives an overview of the current research: http://www.statmt.org/survey/Topic/DomainAdaptation
Incremental training: there was an interesting talk at IWSLT 2013: http://www.iwslt2013.org/downloads/Assessing_Quick_Update_Methods_of_Statistical_Translation_Models.pdf. It demonstrated that current incremental methods (1) take your system offline, so you have no real "live update" of your models, and (2) are outperformed by full retrainings. It seems that the problem has not been solved yet.
Parallelizing the translation process: the Moses server lags behind the moses-cmd binary, so if you want to use the latest features, it is better to start from moses-cmd. Also, the community has not kept its promise of never releasing a 1.0 version :-). In fact, you can find the latest release (2.1) here: http://www.statmt.org/moses/?n=Moses.Releases
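Since you mentioned your applications are all in Java, here is a minimal sketch of one way to parallelize translation on the application side, whichever engine you end up with. It fans sentences out to a fixed thread pool and collects the results in input order. Note that `ParallelTranslation`, `translateAll` and the `translator` function are hypothetical names made up for illustration; they are not part of Moses, Joshua or Phrasal, and the sketch assumes your actual engine call is thread-safe (or that each call talks to a separate engine process).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelTranslation {

    // Translates every sentence on a fixed pool of worker threads and
    // returns the translations in the same order as the input.
    // 'translator' is a hypothetical stand-in for the real engine call.
    public static List<String> translateAll(List<String> sentences,
                                            Function<String, String> translator,
                                            int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per sentence; futures keep the input order.
            List<Future<String>> pending = new ArrayList<>();
            for (String sentence : sentences) {
                Callable<String> task = () -> translator.apply(sentence);
                pending.add(pool.submit(task));
            }
            // Block on each future in turn to collect ordered results.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Dummy translator that only wraps the input; replace it with
        // a call into whatever engine you actually use.
        Function<String, String> dummy = s -> "<" + s + ">";
        List<String> out = translateAll(
                Arrays.asList("hello world", "good morning"), dummy, 2);
        System.out.println(out); // prints [<hello world>, <good morning>]
    }
}
```

Submitting one task per sentence keeps the sketch simple; for large batches you would probably want to group sentences into chunks to reduce per-task overhead.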
This question is better asked on the Moses mailing list (moses-support@mit.edu), I think. There are lots of people there working with different types of systems, so you'll get an objective answer. Apart from that, here's my input:
And here's some input on your feature requests:
Hope this helps. Feel free to PM me if you have any more questions.