Dialectal Chinese Speech Recognition
Thomas Fang Zheng
Center for Speech and Language Technologies, Tsinghua University
Aug. 24, 2007 @ Cambridge University, UK
Outline
❑ Motivation
❑ Dialectal Chinese database collection
  ❖ Wu
  ❖ Min
  ❖ Chuan
❑ Approaches
  ❖ Chinese syllable mapping
  ❖ Lexicon adaptation
  ❖ State-dependent phoneme-based model merging (SDPBMM)
  ❖ Integration of SDPBMM with adaptation
❑ Remarks
Motivation
❑ Chinese ASR faces an issue that is larger than in any other language: dialect.
❑ There are 8 major dialectal regions in addition to Mandarin (Northern China):
  ❖ Wu (Southern Jiangsu, Zhejiang, and Shanghai);
  ❖ Yue (Guangdong, Hong Kong, Nanning Guangxi);
  ❖ Min (Fujian, Shantou Guangdong, Haikou Hainan, Taipei Taiwan);
  ❖ Hakka (Meixian Guangdong, Hsin-chu Taiwan);
  ❖ Gan (Jiangxi);
  ❖ Xiang (Hunan);
  ❖ Hui (Anhui);
  ❖ Jin (Shanxi, Hohhot Inner Mongolia).
❑ These can be further divided into over 40 sub-categories.
[Figure: Map of Chinese dialects (中国汉语方言图)]
❑ Chinese dialects share the same written language:
  ❖ The same Chinese pinyin set (canonically),
  ❖ The same Chinese character set (canonically), and
  ❖ The same vocabulary (canonically).
❑ Standard Chinese (known as Putonghua, or PTH) is widely spoken in most regions of China.
❑ However, speech is strongly influenced by the native dialect. Most Chinese people speak both standard Chinese and their own dialect, resulting in dialectal Chinese: Putonghua influenced by the native dialect.
❑ In dialectal Chinese:
  ❖ Word usage, pronunciation, syntax, and grammar vary depending on the speaker's dialect.
  ❖ ASR relies to a great extent on the consistent pronunciation and usage of words within a language.
  ❖ ASR systems built to process PTH perform poorly for the great majority of the population.
Research Goal
❑ To develop a general framework for modeling, in dialectal Chinese ASR tasks:
  ❖ Phonetic variability,
  ❖ Lexical variability, and
  ❖ Pronunciation variability.
❑ To find suitable methods to modify the baseline PTH recognizer to obtain a dialectal Chinese recognizer for the specific dialect of interest, employing:
  ❖ dialect-related knowledge (syllable mapping, cross-dialect synonyms, …), and
  ❖ training/adaptation data (in relatively small quantities).
❑ Expectation: the resulting recognizer should also work for PTH; in other words, it should perform well on a mixture of PTH and dialectal Chinese.
❑ This proposal was selected as one of three projects for the 2003 Johns Hopkins University Summer Workshop from dozens of proposals submitted by universities and companies around the world; the workshop was postponed to 2004 due to SARS.
Dialectal Chinese Speech Recognition Framework
[Diagram: Standard Chinese Speech Recognizer + Dialectal Chinese Related Knowledge & Resources → Dialectal Chinese Speech Recognizer]
❑ For practical reasons, during the summer we focused on only one specific dialect, the Wu dialect (Shanghai area), and the target language was Wu dialectal Chinese (WDC for short).
❑ Why the Wu dialect?
  ❖ Population: more than 70 million people use the Wu dialect, the second most widely used dialect in China.
  ❖ Economy: Shanghai is one of the most advanced cities in China.
  ❖ The Wu dialect is a fully developed language:
    ▪ The syntax of the Wu dialect is very complex;
    ▪ The vocabulary is even larger than that of Mandarin;
    ▪ Many literary masterpieces were historically influenced by the Wu dialect.

            Wu    Mandarin   Cantonese
Phoneme #   50    37         <33
Useful Dialect-Related Knowledge
❑ Chinese Syllable Mapping (CSM)
  ❖ The CSM is dialect-related.
  ❖ Two types:
    ▪ Word-independent CSM: e.g. in Southern Chinese, Initial mappings include zh→z, ch→c, sh→s, n→l, and so on, and Final mappings include eng→en, ing→in, and so on;
    ▪ Word-dependent CSM: e.g. in dialectal Chuan Chinese, the pinyin 'guo2' is changed into 'gui0' in the word '中国' (China), but only the tone is changed in the word '过去' (past).
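The two CSM types above can be illustrated with a minimal Python sketch. It is not the system described in the talk: the tables hold only the mappings named on the slide, the function names and the initial list are assumptions made for illustration, and every mapping is applied deterministically, whereas real dialectal speech applies such changes variably.

```python
# Word-independent mappings named on the slide (Southern Chinese examples).
INITIAL_MAP = {"zh": "z", "ch": "c", "sh": "s", "n": "l"}
FINAL_MAP = {"eng": "en", "ing": "in"}

# Word-dependent mapping named on the slide (dialectal Chuan Chinese).
# Keyed by (word, canonical pinyin syllable); this layout is an assumption.
WORD_DEPENDENT = {("中国", "guo2"): "gui0"}

# Pinyin initials, two-letter ones first so "zh" is not read as "z" + "h".
# Treating "y" and "w" as initials is a simplification for this sketch.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable):
    """Split a toned pinyin syllable such as 'zhong1' into (initial, final, tone)."""
    tone = syllable[-1] if syllable[-1].isdigit() else ""
    base = syllable[:-1] if tone else syllable
    for initial in INITIALS:
        if base.startswith(initial):
            return initial, base[len(initial):], tone
    return "", base, tone  # zero-initial syllable, e.g. 'an1'


def map_syllable(syllable, word=None):
    """Apply a word-dependent mapping if one exists, else the word-independent ones."""
    if word is not None and (word, syllable) in WORD_DEPENDENT:
        return WORD_DEPENDENT[(word, syllable)]
    initial, final, tone = split_syllable(syllable)
    initial = INITIAL_MAP.get(initial, initial)
    final = FINAL_MAP.get(final, final)
    return initial + final + tone


print([map_syllable(s) for s in ["zhong1", "sheng1", "ning2"]])
print(map_syllable("guo2", word="中国"))
```

The word-dependent table is consulted first because it is the more specific rule; the 过去 case is left out because the slide does not say which tone results.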
❖ The CSM is not exact. For any mapping A→B, the resulting pronunciation is usually not exactly B, but something quite similar to B, more similar to B than to any other syllable.
  ▪ Each realized variant Bi (B1, B2, B3, B4, ...) is a variation of B, such as nasalization, centralization, voicing, devoicing, rounding, syllabic, pharyngealization, or aspiration.
❖ The CSM could be N→1, 1→N, or crossed.
[Figure: example mappings between the Standard Chinese syllable set and the Chuan dialect, with syllable labels kei, ke, kuo, kui and example words [克]服 (overcome), 上[课] (attend class), [扩]大 (expand), [魁]梧 (burly)]
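The N→1, 1→N, and crossed cases can be told apart by counting how many targets each source syllable has and how many sources each target has. The sketch below is only an illustration: the labels A1, B1, and so on are hypothetical placeholders rather than syllables from the talk, the function name is invented, and reading "crossed" as "the source has several targets and the target has several sources" is an assumption, since the slide does not define the term.

```python
from collections import defaultdict


def classify_mappings(pairs):
    """Label each (source, target) pair as '1->1', '1->N', 'N->1', or 'crossed'."""
    targets = defaultdict(set)  # source -> set of targets it maps to
    sources = defaultdict(set)  # target -> set of sources mapping to it
    for src, dst in pairs:
        targets[src].add(dst)
        sources[dst].add(src)

    kinds = {}
    for src, dst in pairs:
        fan_out = len(targets[src]) > 1  # this source has several targets
        fan_in = len(sources[dst]) > 1   # this target has several sources
        if fan_out and fan_in:
            kinds[(src, dst)] = "crossed"  # assumed reading of "crossed"
        elif fan_out:
            kinds[(src, dst)] = "1->N"
        elif fan_in:
            kinds[(src, dst)] = "N->1"
        else:
            kinds[(src, dst)] = "1->1"
    return kinds


# Hypothetical mapping pairs, for illustration only.
pairs = [
    ("A1", "B1"), ("A2", "B1"),                              # two sources, one target
    ("A3", "B2"), ("A3", "B3"),                              # one source, two targets
    ("A4", "B4"), ("A4", "B5"), ("A5", "B4"), ("A5", "B5"),  # many-to-many
    ("A6", "B6"),                                            # plain one-to-one
]
for pair, kind in classify_mappings(pairs).items():
    print(pair, kind)
```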