Research context 2024/5/13 4 Masked s_中国高校课件下载中心

点击下载：广东工业大学：《机器学习》课程教学资源（课件讲义）第20讲预训练模型 Pre-training of Deep Bidirectional Transformers for Language Understanding（授课：周郭许）

正在加载图片...

Research context Masked self-attention layer yo y1 y2 Outputs: mul(-)+add (t) context vectors:y (shape:D) Vo Operations: Prevent vectors from Key vectors:k=xW 2 looking at future vectors. Value vectors:v =xW Query vectors:g =xW Manually set alignment softmax (1) Alignment:e.=q·k/VD scores to-infinity Attention:a softmax(e) a. Output::y=∑ay Inputs: q1 Input vectors:x(shape:N x D) 重)亲大学 2024/5/13 4Research context 2024/5/13 4 Masked self-attention layer mul(→) + add (↑) Alignment q0 Attention Inputs: Input vectors: x (shape: N x D) softmax (↑) y1 Outputs: context vectors: y (shape: Dv) Operations: Key vectors: k = xWk Value vectors: v = xW v Query vectors: q = xWq Alignment: ei,j = qj ᐧ ki / √D Attention: a = softmax(e) Output: yj = ∑i ai,j vi x2 x1 x0 -∞ -∞ 0 0 a0,0 -∞ e2,2 e1,1 e1,2 e0,0 e0,1 e0,2 0 a2,2 a0,1 a0,2 a1,1 a1,2 q1 q2 y2 y0 Input vectors k2 k1 k0 v2 v1 v0 - Prevent vectors from looking at future vectors. - Manually set alignment scores to -infinity

<<向上翻页向下翻页>>