University of Macau: Domain Adaptation for Statistical Machine Translation (Master Defense)

Format: PPTX, 84 pages, 2.74 MB

University of Macau (Universidade de Macau)
Domain Adaptation for Statistical Machine Translation
Master Defense
By Longyue WANG (Vincent), MT Group, NLP2CT Lab, FST, UM
Supervised by Prof. Lidia S. Chao and Prof. Derek F. Wong
20/08/2014

Research Scope
Figure 1: Our Research Scope [1][2] (nested diagram of Computational Linguistics, Machine Translation, Speech and Text Translation, and Rule-based, Statistical, and Hybrid MT, with Domain-Specific Statistical MT as the focus of this work)
[1] Daniel Jurafsky and James H. Martin. 2008. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition. Prentice Hall.
[2] Wikipedia. Machine translation. http://en.wikipedia.org/wiki/Machine_translation.
(2/84)

Agenda
◼ Introduction
◼ Proposed Method I: New Criterion
◼ Proposed Method II: Combination
◼ Proposed Method III: Linguistics
◼ Domain-Specific Online Translator
◼ Conclusion
(3/84)

Part I: Introduction (4/84)

The First Question: WHAT IS STATISTICAL MACHINE TRANSLATION? (5/84)

Statistical Machine Translation  SMT translations are generated on the basis of statistical models whose parameters are derived from the analysis of text corpora [3].  Currently, the most successful approach of SMT is phrase-based SMT, where the smallest translation unit is n-gram consecutive words. [3] Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics. 19:263–311. Figure 2: Phrase-based SMT Framework (6/84)

Statistical Machine Translation  Corpus is a collection of texts. e.g., IWSLT2012 official corpus.  Bilingual corpus is a collection of text paired with translation into another language. Monolingual corpus, in one (mostly are the target side) language.  Corpus may come from different genres, topics etc. Figure 2: Phrase-based SMT Framework Parallel Corpus Monolingual Corpus (7/84)

Statistical Machine Translation  Word alignment can be mined by the help of EM algorithm.  Then extract phrase pairs from word alignment to generate translation table.  Distance-based reordering model is a penalty of changing position of translated phrases. Figure 2: Phrase-based SMT Framework Translation Table Word Alignment Reordering Model (8/84)

Statistical Machine Translation  Language model assigns a probability to a sequence of words. (n-gram) [4] Figure 2: Phrase-based SMT Framework Language Model [4] F Song and W B Croft (1999). "A General Language Model for Information Retrieval". Research and Development in Information Retrieval. pp. 279–280.. 1 1 1 1 ( ) ( | ) l i LM i i n i p s p w w + − − + = = (9/84) (1)

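Statistical Machine Translation
◼ The decoding function combines three components: the phrase translation table, which ensures that foreign phrases match their target-language counterparts; the reordering model, which reorders the phrases appropriately; and the language model, which ensures that the output is fluent.
$$e_{best} = \arg\max_{e} \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\, d(start_i - end_{i-1} - 1) \prod_{i=1}^{|e|} p_{LM}(e_i \mid e_1 \ldots e_{i-1}) \qquad (2)$$
Here $\bar{f}_i$ and $\bar{e}_i$ are the i-th foreign and target phrases, I is the number of phrases, $start_i$ and $end_i$ are the first and last source positions covered by $\bar{f}_i$ (with $end_0 = 0$), and d is the distance-based reordering penalty.
Figure 2: Phrase-based SMT Framework (highlighted: Source Text → Decoding, i.e., searching translation candidates → Target Text)
(10/84)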
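The sketch below scores complete candidate translations term by term with Eq. (2) and returns the arg max over an explicit candidate list. It is illustrative only, under several assumptions: d(x) = α^{|x|} is a common parameterization of the distance-based penalty, but the value of `ALPHA`, the toy phrase table, the bigram table standing in for $p_{LM}$, and the `1e-3` floor for unseen bigrams are all hypothetical. Real decoders search partial hypotheses with beam search instead of enumerating full candidates.

```python
ALPHA = 0.5  # hypothetical base for the reordering penalty d(x) = ALPHA ** |x|


def distortion(x, alpha=ALPHA):
    """Distance-based reordering penalty d(x) = alpha^|x|."""
    return alpha ** abs(x)


def score(derivation, phrase_table, lm_bigram, floor=1e-3):
    """Score one hypothesis with Eq. (2).

    derivation: list of (start, end, f_phrase, e_phrase) in *target* order,
    where start/end are 1-based source positions of the foreign phrase.
    """
    prob, prev_end = 1.0, 0  # end_0 = 0
    for start, end, f, e in derivation:
        prob *= phrase_table[(f, e)]              # phi(f_i | e_i)
        prob *= distortion(start - prev_end - 1)  # d(start_i - end_{i-1} - 1)
        prev_end = end
    # Bigram stand-in for p_LM(e_i | e_1 ... e_{i-1}), one factor per target word.
    words = ["<s>"] + " ".join(step[3] for step in derivation).split()
    for prev, cur in zip(words, words[1:]):
        prob *= lm_bigram.get((prev, cur), floor)
    return prob


def decode(candidates, phrase_table, lm_bigram):
    """Arg max of Eq. (2) over an explicit list of candidate derivations."""
    return max(candidates, key=lambda d: score(d, phrase_table, lm_bigram))


if __name__ == "__main__":
    table = {("maison", "house"): 0.8, ("bleue", "blue"): 0.9,
             ("maison bleue", "blue house"): 0.6}
    lm = {("<s>", "blue"): 0.2, ("blue", "house"): 0.5,
          ("<s>", "house"): 0.1, ("house", "blue"): 0.01}
    candidates = [
        [(1, 1, "maison", "house"), (2, 2, "bleue", "blue")],  # monotone: "house blue"
        [(2, 2, "bleue", "blue"), (1, 1, "maison", "house")],  # swapped: "blue house"
        [(1, 2, "maison bleue", "blue house")],                # one phrase pair
    ]
    for cand in candidates:
        print(round(score(cand, table, lm), 5))
    print(decode(candidates, table, lm))
```

With these toy numbers the monotone word-by-word candidate scores 0.00072 (the language model rejects "house blue"), the swapped one scores 0.009 (fluent, but penalized by d for two jumps), and the single phrase pair scores 0.06, illustrating how the three factors of Eq. (2) trade off.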