Self-attention Hung-yi Lee 李宏毅
Sophisticated Input
• Input is a vector → Model → Scalar or Class
• Input is a set of vectors (may change length) → Model → Scalars or Classes
Vector Set as Input
Example: this is a cat
One-hot Encoding
apple = [ 1 0 0 0 0 …… ]
bag = [ 0 1 0 0 0 …… ]
cat = [ 0 0 1 0 0 …… ]
dog = [ 0 0 0 1 0 …… ]
elephant = [ 0 0 0 0 1 …… ]
Word Embedding
[Figure: word embedding space containing dog, cat, rabbit, jump, run, flower, tree]
To learn more: https://youtu.be/X7PH3NuYW0Q (in Mandarin)
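The one-hot table above can be sketched in a few lines of Python. This is a minimal illustration, assuming the five-word vocabulary shown on the slide; the helper names `one_hot` and `encode` are hypothetical and not part of the lecture.

```python
# Vocabulary copied from the slide; a real vocabulary would be far larger.
vocab = ["apple", "bag", "cat", "dog", "elephant"]
index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a vector with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec


def encode(sentence):
    """Map a sentence to a list of one-hot vectors, one per word."""
    return [one_hot(w) for w in sentence.split()]


# A three-word sentence becomes a set of three 5-dim vectors.
vectors = encode("cat dog cat")
```

A longer sentence simply yields more vectors, which is why the model input is a set of vectors whose length may change. Note that any two different one-hot vectors are equally far apart, so this encoding by itself carries no notion of word similarity.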
Vector Set as Input
• Frame: a 25 ms window of audio
• One frame as a vector: 400 sample points (16 kHz), 39-dim MFCC, or 80-dim filter bank output
• The window moves 10 ms at a time: 1 s → 100 frames
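The numbers on this slide follow from simple arithmetic, sketched below. The function `split_into_frames` and its `pad` flag are hypothetical names introduced for illustration; zero-padding the tail is an assumption made here so that the count matches the slide's "100 frames".

```python
SAMPLE_RATE = 16_000  # samples per second (16 kHz, from the slide)
WINDOW_MS = 25        # frame length in milliseconds
STRIDE_MS = 10        # shift between consecutive frames in milliseconds

window = SAMPLE_RATE * WINDOW_MS // 1000  # samples per frame
stride = SAMPLE_RATE * STRIDE_MS // 1000  # samples between frame starts


def split_into_frames(samples, pad=True):
    """Slice a waveform into overlapping frames of `window` samples.

    With pad=True (an assumption), the tail is zero-padded so that every
    stride step produces a frame. With pad=False, only frames that fit
    entirely inside the signal are kept.
    """
    if pad:
        n_frames = len(samples) // stride
    else:
        n_frames = max(0, (len(samples) - window) // stride + 1)
    frames = []
    for i in range(n_frames):
        start = i * stride
        frame = list(samples[start:start + window])
        frame += [0.0] * (window - len(frame))  # no-op for full frames
        frames.append(frame)
    return frames


one_second = [0.0] * SAMPLE_RATE
frames = split_into_frames(one_second)
```

Here `window` is 400 and `stride` is 160, and one second of audio gives 100 frames of 400 samples each. Without padding, only 98 frames fit completely, so the slide's figure is best read as a round number. In practice each 400-sample frame would then be converted to an MFCC or filter bank vector, which this sketch does not attempt.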
Vector Set as Input
• Graph is also a set of vectors (consider each node as a vector)
• Each profile is a vector
Source: https://medium.com/analytics-vidhya/social-network-analytics-f082f4e21b16
Vector Set as Input
• Graph is also a set of vectors (consider each node as a vector)
One-hot vector
H = [ 1 0 0 0 0 …… ]
C = [ 0 1 0 0 0 …… ]
O = [ 0 0 1 0 0 …… ]
……
Source: http://www.twword.com/wiki/%E5%88%86%E5%AD%90
What is the output?
• Each vector has a label. (N vectors → Model → N labels)
Example Applications
• POS tagging: I saw a saw → N V DET N
• HW2: a a b b
• buy / not buy
What is the output?
• Each vector has a label. (N vectors → Model → N labels)
• The whole sequence has a label. (N vectors → Model → one label)
Example Applications
• Sentiment analysis: this is good → positive
• HW4: speaker
• hydrophilicity
What is the output?
• Each vector has a label. (N vectors → Model → N labels) (focus of this lecture)
• The whole sequence has a label. (N vectors → Model → one label)
• Model decides the number of labels itself. (N vectors → Model → N' labels) seq2seq, e.g. Translation (HW5)
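The difference between the first two output types is mainly one of shape, which the toy sketch below tries to make concrete. Everything in it is an assumption made for illustration: `argmax_label` is a hypothetical stand-in for a trained classifier, and mean pooling is just one simple way to reduce a sequence to a single vector. The seq2seq case is left out because the output length is decided by the model itself.

```python
def argmax_label(v):
    """Hypothetical stand-in classifier: index of the largest component."""
    return max(range(len(v)), key=lambda d: v[d])


def label_each_vector(vectors, classify):
    """Sequence labeling: N input vectors -> N labels."""
    return [classify(v) for v in vectors]


def label_whole_sequence(vectors, classify):
    """Sequence classification: N input vectors -> 1 label.

    Mean pooling is an assumption chosen for brevity.
    """
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return classify(mean)


seq = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]  # three toy 2-dim vectors
per_vector = label_each_vector(seq, argmax_label)   # one label per vector
whole = label_whole_sequence(seq, argmax_label)     # a single label
```

With this input, `per_vector` has three entries while `whole` is a single value, regardless of how long `seq` is.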
Sequence Labeling
Example: I saw a saw, with one FC (Fully-connected) network applied at each position
• Is it possible to consider the context?
• FC can consider the neighbors by taking a window of vectors as input.
• How to consider the whole sequence? A window that covers the whole sequence?
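The windowed approach can be sketched as follows. This is a rough illustration under stated assumptions: the weights are random and untrained, the toy 2-dim input vectors are made up, and `make_fc` and `windowed_inputs` are hypothetical helpers, not code from the lecture.

```python
import random


def make_fc(in_dim, out_dim, seed=0):
    """Build a fully-connected layer with random, untrained weights."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim

    def fc(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]

    return fc


def windowed_inputs(vectors, radius):
    """For each position, concatenate the vectors within `radius` steps.

    Positions outside the sequence are filled with zero vectors.
    """
    zero = [0.0] * len(vectors[0])
    out = []
    for t in range(len(vectors)):
        window = []
        for k in range(t - radius, t + radius + 1):
            window += vectors[k] if 0 <= k < len(vectors) else zero
        out.append(window)
    return out


# Toy vectors for "I saw a saw": both occurrences of "saw" share a vector.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 1.0]]
radius = 1
inputs = windowed_inputs(seq, radius)
fc = make_fc(in_dim=(2 * radius + 1) * 2, out_dim=3)
scores = [fc(x) for x in inputs]  # one score vector per position
```

The two occurrences of "saw" have identical vectors, but their windows differ, so the FC layer can in principle assign them different labels. The input size is `(2 * radius + 1) * dim`, so a window wide enough to cover every sequence would have to grow with the longest sequence, which is the difficulty the last question on the slide points at.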