Neural Machine Translation (NMT) on the Nepali-English language pair.

Contributions of this project: adding to and cleaning the publicly available parallel data, and improving the baseline scores for supervised MT on the pair.

A report on this project is available here. The parallel data we prepared can be found here. The `data_cleaning` directory has the scripts that implement the cleaning methods discussed in the report, and the `translator` directory has a working interface for the translator.

Towards the end of 2019 some additional work was carried out under the project, described here. The models reported in the paper can be found here. I will also add a link to the bigger corpus soon.

As of February 2021 there are a few compatibility issues between the model files and more recent versions of the packages. To fix these, use the following package versions: torch 1.3.0, fairseq 0.9.0, portalocker 2.0.0, sacrebleu 1.4.14, sacremoses 0.0.43, sentencepiece 0.1.91.

## Results

Find the more recent results in the paper linked above. The BLEU scores of 7.6 and 4.3 (for supervised methods) that Guzman et al. report in their paper are on their devtest set. They release two more sets: the validation set, called the dev set, and the recently released (October 2019) test set. In the report linked above we give only the scores on the dev set; here we report scores on both the dev and devtest sets. We reproduce their model using their implementation in order to score it. The results on devtest are from models that use vocabulary sizes of 2500.

## Requirements

fairseq is used for training, sentencepiece to learn BPE over the corpus, sacremoses for processing the English text, sacrebleu for scoring the models, and flask for the interface. For handling the Nepali text we use the Indic NLP Library. All of these libraries can be installed with pip. To run the translator interface, the Indic NLP Library needs to be cloned into `translator/app/modules/`. A few other libraries, such as python-docx and lxml, are used by the cleaning scripts.

## Preparing the translator

After training a model with the fairseq implementation of the Transformer, copy the checkpoint file to `translator/app/models/` and rename it `en-ne.pt` or `ne-en.pt` according to the translation direction of the checkpoint. The checkpoint files that produce the results in the report are available here; copy the `.pt` files to `translator/app/models`. Details on the training itself can be found in the fairseq repo and documentation. Once the requirements and models are in place, run `python app/app.py` from the `translator` directory.
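As a complement to the steps above, here is a minimal sketch of loading one of the renamed checkpoints through fairseq's hub interface and translating a single sentence. The `data-bin/ne_en` path (the binarized data with the dictionaries) is an assumption, and the repo's own `app/app.py` may load the model differently.

```python
# Minimal sketch (not the repo's app.py): load a checkpoint via fairseq's
# hub interface and translate one sentence. Paths are assumptions.
from fairseq.models.transformer import TransformerModel

ne_en = TransformerModel.from_pretrained(
    "translator/app/models/",            # directory holding the renamed checkpoint
    checkpoint_file="ne-en.pt",
    data_name_or_path="data-bin/ne_en",  # assumed location of the binarized data/dictionaries
)

# Input is expected in the same form as the training data (i.e. with the same
# BPE segmentation applied), unless BPE arguments are passed to from_pretrained.
print(ne_en.translate("नमस्ते"))
```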
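Since flask is listed as the interface dependency, a minimal Flask wrapper around such a model could look roughly like the sketch below. This is illustrative only and is not the repo's actual `translator/app/app.py`; the route name and JSON shape are made up.

```python
# Illustrative Flask wrapper around a loaded fairseq model; not the actual app.
from flask import Flask, request, jsonify
from fairseq.models.transformer import TransformerModel

app = Flask(__name__)
model = TransformerModel.from_pretrained(
    "translator/app/models/",
    checkpoint_file="en-ne.pt",
    data_name_or_path="data-bin/en_ne",  # assumed path
)

@app.route("/translate", methods=["POST"])
def translate():
    text = (request.get_json(silent=True) or {}).get("text", "")
    return jsonify({"translation": model.translate(text)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```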
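The Requirements section mentions learning BPE over the corpus with sentencepiece, and the Results section mentions a vocabulary size of 2500. A minimal sketch of that step is shown below; the file name and model prefix are placeholders, not the repo's actual scripts.

```python
# Sketch: learn a BPE model over (one side of) the corpus with sentencepiece.
# The input file name and model prefix are placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.Train(
    "--input=corpus.ne-en.ne --model_prefix=bpe_ne "
    "--vocab_size=2500 --model_type=bpe --character_coverage=1.0"
)
# Produces bpe_ne.model / bpe_ne.vocab, which can then be used to encode the
# data before fairseq preprocessing and training.
```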
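For scoring, sacrebleu can be used from the command line or through its Python API. The sketch below shows the Python API on detokenized hypothesis and reference files; the file names are placeholders, and this is not necessarily how the report's numbers were produced.

```python
# Sketch: compute corpus BLEU with sacrebleu's Python API.
# hyp.en / ref.en are placeholder names (detokenized text, one sentence per line).
import sacrebleu

with open("hyp.en", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("ref.en", encoding="utf-8") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```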