Speech recognition (briefly)

Chapter 15, Section 6

Outline

♦ Speech as probabilistic inference
♦ Speech sounds
♦ Word pronunciation
♦ Word sequences

Speech as probabilistic inference

It's not easy to wreck a nice beach

Speech signals are noisy, variable, ambiguous

What is the most likely word sequence, given the speech signal?
I.e., choose Words to maximize P(Words | signal)

Use Bayes' rule:

    P(Words | signal) = α P(signal | Words) P(Words)

I.e., decomposes into acoustic model + language model

Words are the hidden state sequence, signal is the observation sequence
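
The Bayes' rule decomposition can be illustrated with a minimal Python sketch. The two candidate transcriptions and all of their scores are made-up numbers chosen for illustration, not outputs of a real acoustic or language model; the dictionary keys `acoustic` and `language` are hypothetical names for P(signal | Words) and P(Words).

```python
# Hypothetical candidates with invented model scores (assumptions, not real data).
candidates = {
    "recognize speech":   {"acoustic": 0.0008, "language": 0.0100},
    "wreck a nice beach": {"acoustic": 0.0010, "language": 0.0001},
}


def posterior(cands):
    """Return P(Words | signal) for each candidate via Bayes' rule."""
    # Unnormalized posterior: P(signal | Words) * P(Words).
    unnorm = {w: s["acoustic"] * s["language"] for w, s in cands.items()}
    # alpha is the normalizing constant from the slide.
    alpha = 1.0 / sum(unnorm.values())
    return {w: alpha * p for w, p in unnorm.items()}


post = posterior(candidates)
best = max(post, key=post.get)
print(best)
```

With these numbers the acoustic model slightly prefers the second candidate, but the language model prior outweighs it, so the first candidate has the higher posterior.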

Phones

All human speech is composed from 40-50 phones, determined by the
configuration of articulators (lips, teeth, tongue, vocal cords, air flow)

Form an intermediate level of hidden states between words and signal
⇒ acoustic model = pronunciation model + phone model

ARPAbet designed for American English

    [iy]  beat      [b]   bet     [p]   pet
    [ih]  bit       [ch]  Chet    [r]   rat
    [ey]  bet       [d]   debt    [s]   set
    [ao]  bought    [hh]  hat     [th]  thick
    [ow]  boat      [hv]  high    [dh]  that
    [er]  Bert      [l]   let     [w]   wet
    [ix]  roses     [ng]  sing    [en]  button
    ...             ...           ...

E.g., "ceiling" is [s iy l ih ng] / [s iy l ix ng] / [s iy l en]

Speech sounds

Raw signal is the microphone displacement as a function of time;
processed into overlapping 30ms frames, each described by features

[Figure: analog acoustic signal → sampled, quantized digital signal → frames with features; example feature values: 10 15 38 52 47 82 22 63 24 89 94 11 10 12 73]

Frame features are typically formants (peaks in the power spectrum)
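
A minimal sketch of cutting a sampled signal into overlapping 30ms frames follows. The slide gives only the frame length; the 10ms step between frame starts, the 8 kHz sampling rate, and the function name `split_into_frames` are assumptions made for this example. Feature extraction itself is not shown.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=30, step_ms=10):
    """Cut a list of samples into overlapping fixed-length frames.

    frame_ms is the 30ms frame length from the slide; step_ms (assumed)
    is the offset between consecutive frame starts.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    step = sample_rate_hz * step_ms // 1000
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += step
    return frames


# 100 ms of placeholder samples at an assumed 8 kHz sampling rate.
signal = list(range(800))
frames = split_into_frames(signal, sample_rate_hz=8000)
print(len(frames), len(frames[0]))
```

With a step shorter than the frame length, neighbouring frames share samples (here 20ms out of every 30ms), which is what makes the frames overlapping.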

Phone models

Frame features in P(features | phone) summarized by
– an integer in [0 ... 255] (using vector quantization); or
– the parameters of a mixture of Gaussians

Three-state phones: each phone has three phases (Onset, Mid, End)
E.g., [t] has silent Onset, explosive Mid, hissing End
⇒ P(features | phone, phase)

Triphone context: each phone becomes n^2 distinct phones, depending on
the phones to its left and right
E.g., [t] in "star" is written [t(s,aa)] (different from "tar"!)

Triphones useful for handling coarticulation effects: the articulators have
inertia and cannot switch instantaneously between positions
E.g., [t] in "eighth" has tongue against front teeth
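
The triphone notation can be sketched as a small relabeling function. The helper name `to_triphones` and the use of a `sil` (silence) symbol at word boundaries are assumptions for this example; the slide only defines the `[t(s,aa)]` notation.

```python
def to_triphones(phones, boundary="sil"):
    """Relabel each phone as phone(left,right) using its neighbours.

    `boundary` is an assumed placeholder for the context at either end.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{center}({left},{right})"
        for left, center, right in zip(padded, padded[1:], padded[2:])
    ]


star = to_triphones(["s", "t", "aa", "r"])
tar = to_triphones(["t", "aa", "r"])
print(star)  # ['s(sil,t)', 't(s,aa)', 'aa(t,r)', 'r(aa,sil)']
print(tar)   # ['t(sil,aa)', 'aa(t,r)', 'r(aa,sil)']

# With n base phones, each phone has n * n possible (left, right) contexts.
n = 50
contexts_per_phone = n ** 2
```

The [t] in "star" and the [t] in "tar" get different labels because their left contexts differ, which is the distinction the slide points out.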

Phone model example

Phone HMM for [m] (figure): Onset → Mid → End → FINAL

    Onset:  stay 0.3,  to Mid   0.7
    Mid:    stay 0.9,  to End   0.1
    End:    stay 0.4,  to FINAL 0.6

Output probabilities for the phone HMM:

    Onset:      Mid:        End:
    C1: 0.5     C3: 0.2     C4: 0.1
    C2: 0.2     C4: 0.7     C6: 0.5
    C3: 0.3     C5: 0.1     C7: 0.4
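
As a worked example, the phone HMM for [m] can be written down as two tables and used to score one particular state path. The sketch assumes the figure's numbers are read as self-loops 0.3 / 0.9 / 0.4 on Onset / Mid / End, with the remaining mass on the forward arrows; the function names are hypothetical.

```python
# Transition and output tables for the [m] phone HMM (edge assignment assumed).
TRANSITIONS = {
    "Onset": {"Onset": 0.3, "Mid": 0.7},
    "Mid": {"Mid": 0.9, "End": 0.1},
    "End": {"End": 0.4, "FINAL": 0.6},
}
OUTPUTS = {
    "Onset": {"C1": 0.5, "C2": 0.2, "C3": 0.3},
    "Mid": {"C3": 0.2, "C4": 0.7, "C5": 0.1},
    "End": {"C4": 0.1, "C6": 0.5, "C7": 0.4},
}


def path_probability(states, outputs):
    """Joint probability of one state path and its per-frame outputs.

    The path must start in Onset; after the last frame the model is
    assumed to move to FINAL.
    """
    if not states or states[0] != "Onset" or len(states) != len(outputs):
        raise ValueError("path must start in Onset and match the outputs")
    prob = 1.0
    for i, (state, out) in enumerate(zip(states, outputs)):
        next_state = states[i + 1] if i + 1 < len(states) else "FINAL"
        prob *= OUTPUTS[state].get(out, 0.0)
        prob *= TRANSITIONS[state].get(next_state, 0.0)
    return prob


def expected_frames(state):
    """Mean number of frames spent in a state with a geometric self-loop."""
    return 1.0 / (1.0 - TRANSITIONS[state][state])


p = path_probability(["Onset", "Mid", "Mid", "End"], ["C1", "C4", "C4", "C6"])
print(p)
print(expected_frames("Mid"))
```

For this path the product is (0.5 · 0.7)(0.7 · 0.9)(0.7 · 0.1)(0.5 · 0.6) = 0.0046305. The high self-loop probability on Mid means the model expects about ten frames there.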

Word pronunciation models

Each word is described as a distribution over phone sequences

Distribution represented as an HMM transition model

[Figure: pronunciation model for "tomato": [t] → [ow] (0.2) or [ah] (0.8) → [m] → [ey] (0.5) or [aa] (0.5) → [t] → [ow]; all other transitions 1.0]

    P([towmeytow] | "tomato") = P([towmaatow] | "tomato") = 0.1
    P([tahmeytow] | "tomato") = P([tahmaatow] | "tomato") = 0.4

Structure is created manually, transition probabilities learned from data
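
The four pronunciation probabilities can be checked by enumerating the paths through the "tomato" model. Because this particular model is a chain with two independent branch points, the sketch represents it as a list of stages rather than a general transition matrix; that representation and the name `pronunciations` are simplifying assumptions.

```python
from itertools import product

# Each stage maps the possible phones at that position to their probability.
TOMATO = [
    {"t": 1.0},
    {"ow": 0.2, "ah": 0.8},
    {"m": 1.0},
    {"ey": 0.5, "aa": 0.5},
    {"t": 1.0},
    {"ow": 1.0},
]


def pronunciations(stages):
    """Enumerate every phone sequence with its path probability."""
    result = {}
    for choice in product(*(stage.items() for stage in stages)):
        phones = " ".join(phone for phone, _ in choice)
        prob = 1.0
        for _, p in choice:
            prob *= p
        result[phones] = prob
    return result


for phones, prob in pronunciations(TOMATO).items():
    print(f"[{phones}]  {prob:.1f}")
```

The two sequences through [ow] come out at 0.2 · 0.5 = 0.1 and the two through [ah] at 0.8 · 0.5 = 0.4, matching the values on the slide.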

Isolated words

Phone models + word models fix likelihood P(e_{1:t} | word) for isolated word

    P(word | e_{1:t}) = α P(e_{1:t} | word) P(word)

Prior probability P(word) obtained simply by counting word frequencies

P(e_{1:t} | word) can be computed recursively: define

    ℓ_{1:t} = P(X_t, e_{1:t})

and use the recursive update

    ℓ_{1:t+1} = Forward(ℓ_{1:t}, e_{t+1})

and then P(e_{1:t} | word) = Σ_{x_t} ℓ_{1:t}(x_t)

Isolated-word dictation systems with training reach 95–99% accuracy
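
The recursion for P(e_{1:t} | word) and the final Bayes step can be sketched as follows. The two word models ("yes" and "no"), their two-state HMMs, the observation symbols, and the word counts are all invented for illustration; a real system would compose phone and pronunciation models into each word HMM.

```python
def forward_likelihood(hmm, observations):
    """Return P(e_{1:t} | word) using the forward recursion.

    ell[x] holds P(X_t = x, e_{1:t}) for the current t.
    """
    states = hmm["states"]
    ell = {
        x: hmm["initial"].get(x, 0.0) * hmm["emit"][x].get(observations[0], 0.0)
        for x in states
    }
    for e in observations[1:]:
        ell = {
            x: hmm["emit"][x].get(e, 0.0)
            * sum(ell[prev] * hmm["trans"][prev].get(x, 0.0) for prev in states)
            for x in states
        }
    # Sum out the final hidden state.
    return sum(ell.values())


def make_word_hmm(emit_a, emit_b):
    """Build a hypothetical two-state left-to-right word model."""
    return {
        "states": ["A", "B"],
        "initial": {"A": 1.0},
        "trans": {"A": {"A": 0.5, "B": 0.5}, "B": {"B": 1.0}},
        "emit": {"A": emit_a, "B": emit_b},
    }


WORD_MODELS = {
    "yes": make_word_hmm({"C1": 0.8, "C2": 0.2}, {"C1": 0.1, "C2": 0.9}),
    "no": make_word_hmm({"C1": 0.3, "C2": 0.7}, {"C1": 0.6, "C2": 0.4}),
}
WORD_COUNTS = {"yes": 60, "no": 40}  # made-up corpus counts for the prior

evidence = ["C1", "C2"]
total = sum(WORD_COUNTS.values())
unnorm = {
    w: forward_likelihood(m, evidence) * WORD_COUNTS[w] / total
    for w, m in WORD_MODELS.items()
}
alpha = 1.0 / sum(unnorm.values())
word_posterior = {w: alpha * p for w, p in unnorm.items()}
print(word_posterior)
```

For this evidence the likelihoods are 0.44 for "yes" and 0.165 for "no"; combined with priors 0.6 and 0.4 the posterior is 0.8 versus 0.2.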

Continuous speech

Not just a sequence of isolated-word recognition problems!
– Adjacent words highly correlated
– Sequence of most likely words ≠ most likely sequence of words
– Segmentation: there are few gaps in speech
– Cross-word coarticulation, e.g., "next thing"

Continuous speech systems manage 60–80% accuracy on a good day
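
The second point in the list, that the sequence of most likely words need not be the most likely sequence, can be seen in a tiny example. The joint distribution below over two-word sequences is invented for illustration, and the word labels A, B, X, Y are placeholders.

```python
# Hypothetical posterior over two-word sequences (sums to 1).
joint = {
    ("A", "X"): 0.4,
    ("A", "Y"): 0.3,
    ("B", "X"): 0.0,
    ("B", "Y"): 0.3,
}


def marginal(joint_dist, position):
    """Marginal distribution of the word at one position."""
    result = {}
    for words, p in joint_dist.items():
        result[words[position]] = result.get(words[position], 0.0) + p
    return result


# Most likely sequence: maximize the joint probability directly.
best_sequence = max(joint, key=joint.get)

# Sequence of most likely words: maximize each position separately.
per_word = tuple(
    max(marginal(joint, i), key=marginal(joint, i).get) for i in range(2)
)

print(best_sequence, joint[best_sequence])  # ('A', 'X') 0.4
print(per_word, joint[per_word])            # ('A', 'Y') 0.3
```

Choosing each word independently picks A (0.7) and Y (0.6), but that pair has joint probability 0.3, lower than the 0.4 of the best whole sequence.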