Tsinghua University: Making Full Use of Chinese Speech Corpora (PPT lecture slides)

Purpose of speech corpora; Factors to be considered in data creation; Data creation; Data transcription; Learning from corpora; Chinese Corpus Consortium (CCC)

Document format: PPT; 67 pages; file size: 999 KB


Making Full Use of Chinese Speech Corpora
Thomas Fang Zheng
Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, http://sp.cs.tsinghua.edu.cn/
Beijing d-Ear Technologies Co., Ltd., http://www.d-Ear.com
Oct. 2, 2003
O-COCOSDA, Oct. 1-3, 2003, Sentosa, Singapore


Your Partner in the Century of Speech

Outline
❑ Purpose of speech corpora
❑ Factors to be considered in data creation
❑ Data creation
❑ Data transcription
❑ Learning from corpora
❑ Chinese Corpus Consortium (CCC)


Purpose of Speech Corpora
Item | Description | Percentage
1. Speech/speaker recognition | system development, evaluation, sentence comprehension and summarization, speech recognition, speaker recognition | 73%
2. Speech synthesis | system development, prosodic analysis | 11%
3. Acoustic analysis | acoustic analysis, speech coding | 9%
4. Sentence analysis | syntactic and semantic analysis | 5%
5. Speech/language education | speech and language education | 2%




Factors to be considered in data creation (1)
❑ The language
  ❖ Language: e.g., Chinese or English
  ❖ Dialectal background (e.g., for Chinese):
    ▪ Putonghua or Standard Chinese (普通话);
    ▪ Mandarin (官话, Northern China);
    ▪ Wu (吴语, Southern Jiangsu, Zhejiang, and Shanghai);
    ▪ Yue (粤语, Guangdong, Hong Kong, Nanning Guangxi);
    ▪ Min (闽南话, Fujian, Shantou Guangdong, Haikou Hainan, Taipei Taiwan);
    ▪ Hakka (客家话, Meixian Guangdong, Hsin-Chu Taiwan);
    ▪ Xiang (湘, Hunan);
    ▪ Gan (赣, Jiangxi);
    ▪ Hui (徽, Anhui); and
    ▪ Jin (晋, Shanxi).
  ❖ Specific to Chinese (character set):
    ▪ Simplified Chinese
    ▪ Traditional Chinese

[Figure: Distribution map of Mandarin dialects (官话方言分布图)]

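[Figure: Map of modern Wu dialect subdivisions (现代吴语方言分区图). Labeled areas: Taihu (太湖片), with the Su-Hu-Jia (苏沪嘉小片), Hangzhou (杭州小片), and Lin-Shao (临绍小片) subgroups; Taizhou (台州片); Oujiang (瓯江片); Chuqu (处衢片); Xuanzhou (宣州片); and the neighboring Hui (徽语) and Jianghuai Mandarin (江淮官话) areas.]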


Factors to be considered in data creation (2)
❑ Speaking style:
  ❖ Read: for ASR in earlier research, or for TTS
  ❖ Spontaneous/conversational: for ASR nowadays
❑ Recording channel:
  ❖ Depends on the goal of the task or application, or on the application environment:
    ▪ Close-talk microphones: for personal computers (PCs)
    ▪ Telephone and/or cellular phone: for telephony applications
    ▪ Specific channels: for embedded applications (PDAs, digital recorders, ...), or for broadcast news and TV news
  ❖ Normally a mono channel rather than stereo.
  ❖ However, microphone arrays may be used for some research purposes.
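Corpora are normally recorded in mono. When a source recording is stereo, one common way to bring it in line is to average the two channels. The following is a minimal sketch that uses only Python's standard `wave` and `array` modules. It assumes 16-bit little-endian PCM WAV input, and the function names are hypothetical. Averaging is only one option; keeping just the channel nearest the talker is another.

```python
import array
import sys
import wave


def downmix_stereo_to_mono(frames: bytes) -> bytes:
    """Average the left/right channels of interleaved 16-bit PCM (little-endian, as in WAV)."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores samples little-endian
    # Interleaved layout is L0, R0, L1, R1, ...; an odd trailing sample is dropped.
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()
    return mono.tobytes()


def wav_stereo_to_mono(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: rewrite a 16-bit stereo WAV file as a 16-bit mono WAV file."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM input")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)  # bytes per sample
        dst.setframerate(rate)  # sampling rate is kept unchanged
        dst.writeframes(downmix_stereo_to_mono(frames))
```

For example, `wav_stereo_to_mono("session01_stereo.wav", "session01_mono.wav")` would convert one file; the file names here are placeholders. The average of two 16-bit values always fits in 16 bits, so this downmix cannot clip.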


Factors to be considered in data creation (3)
❑ Sampling rate:
  ❖ 8 kHz: for the telephone/mobile-phone channel, whose bandwidth (about 3.4 kHz) lies below the 4 kHz Nyquist limit
  ❖ 16 kHz: for the close-talk microphone (PC) channel, even though that channel's bandwidth is higher than the 8 kHz Nyquist limit
❑ Sampling precision:
  ❖ 16 bits, normally.
  ❖ 8-bit A-law or μ-law (G.711), which expands to 13-bit (A-law) or 14-bit (μ-law) linear samples after decompression.
❑ Signal-to-noise ratio (SNR) level:
  ❖ Speech was, and still is, often collected in a quiet environment (clean speech database).
  ❖ For noise-related research, noisy data is obtained either by:
    ▪ mixing noise recordings (NOISEX-92) with clean speech; or
    ▪ recording in real-world noisy environments.
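When noise is mixed with clean speech, the usual definition is SNR (dB) = 10·log10(P_speech / P_noise), where P is mean signal power. To reach a target SNR, the noise is scaled by g = sqrt(P_speech / (P_noise · 10^(SNR/10))) before it is added. The sketch below assumes three things:
- Both signals are already loaded as float samples at the same sampling rate (for example, NOISEX-92 noise resampled to 16 kHz).
- `mix_at_snr` and `snr_db` are hypothetical names.
- SNR is computed over the whole utterance rather than only over speech-active frames, which is a simplification.

```python
import math
import random
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Mean of squared sample values."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], target_db: float) -> List[float]:
    """Add scaled noise to clean speech so the mixture has the requested SNR in dB.

    The noise is repeated (looped) if it is shorter than the speech, then trimmed.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    reps = -(-len(clean) // len(noise))  # ceiling division
    noise = (list(noise) * reps)[: len(clean)]
    p_speech = mean_power(clean)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise segment is silent")
    # Solve 10*log10(p_speech / (g**2 * p_noise)) == target_db for the gain g.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_db / 10)))
    return [s + gain * n for s, n in zip(clean, noise)]


def snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Measure SNR in dB, treating (noisy - clean) as the noise component."""
    residual = [y - s for s, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


if __name__ == "__main__":
    random.seed(0)
    # Stand-ins for real data: a 1 s, 440 Hz tone at 16 kHz and 0.25 s of Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [random.gauss(0.0, 1.0) for _ in range(4000)]
    noisy = mix_at_snr(speech, noise, target_db=10.0)
    print(f"measured SNR: {snr_db(speech, noisy):.2f} dB")
```

Before the mixture is written back to 16-bit PCM, it should be checked for clipping. Scaling the whole mixture down leaves the SNR unchanged, because speech and noise are scaled by the same factor.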


Factors to be considered in data creation (4)
❑ Number of speakers and speaker balance:
  ❖ The more, the better, with good speaker diversity in terms of:
    ▪ Gender;
    ▪ Age;
    ▪ Education;
    ▪ Birthplace (or dialectal background);
    ▪ Occupation;
    ▪ and so on.
❑ Corpus size:
  ❖ Measured by the number of speakers, the hours of valid speech, or both.
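Both corpus-size measures and the speaker-balance criteria can be computed from per-utterance metadata. The sketch below assumes a hypothetical record layout (`Utterance`) holding speaker ID, gender, age group, dialect, and the duration of valid speech in seconds; real corpora store this information in their own formats. Balance is counted per distinct speaker rather than per utterance, so a speaker with many recordings is not over-weighted.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Utterance:
    """Hypothetical per-utterance metadata record."""
    speaker_id: str
    gender: str        # e.g. "F" or "M"
    age_group: str     # e.g. "16-25"
    dialect: str       # e.g. "Wu", "Yue", "Min"
    duration_s: float  # length of valid speech, in seconds


def corpus_size(utts: Iterable[Utterance]) -> Dict[str, float]:
    """Return the number of distinct speakers and the total hours of valid speech."""
    utts = list(utts)
    speakers = {u.speaker_id for u in utts}
    hours = sum(u.duration_s for u in utts) / 3600
    return {"speakers": len(speakers), "hours": hours}


def speaker_balance(utts: Iterable[Utterance], attribute: str) -> Counter:
    """Count distinct speakers per value of one attribute (gender, age_group, dialect, ...)."""
    per_speaker = {}
    for u in utts:
        # Each speaker contributes once, using the value from their first utterance.
        per_speaker.setdefault(u.speaker_id, getattr(u, attribute))
    return Counter(per_speaker.values())


if __name__ == "__main__":
    demo = [
        Utterance("s1", "F", "16-25", "Wu", 1800.0),
        Utterance("s2", "M", "26-40", "Yue", 3600.0),
    ]
    print(corpus_size(demo), speaker_balance(demo, "gender"))
```

Comparing `speaker_balance` across attributes shows which groups (for example, older speakers or a particular dialect) are under-represented before more speakers are recruited.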
