Research context
2024/5/13

Multi-head self-attention layer

[Figure: Multi-head self-attention layer. The inputs x0, x1, x2 are split across H heads (head0, head1, ..., headH-1). Each head applies self-attention to produce y0, y1, y2, and the per-head outputs are concatenated to form the layer outputs y0, y1, y2.]
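The figure's split / self-attention / concatenate structure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method from the slides. The split is assumed to run along the feature dimension, so each head receives an equal slice of every token's vector. The learned query, key, value, and output projections that practical layers usually include are omitted, so within each head Q = K = V = that head's slice. The function names `self_attention` and `multi_head_self_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections.

    Assumption (simplification): Q = K = V = xs, with no learned weights.
    """
    d = len(xs[0])
    out = []
    for q in xs:
        # Score this query against every token (including itself).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return out


def multi_head_self_attention(xs: List[Vector], num_heads: int) -> List[Vector]:
    """Split features into heads, attend within each head, then concatenate.

    Assumption: the split is along the feature dimension; no output projection.
    """
    d_model = len(xs[0])
    if d_model % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    d_head = d_model // num_heads
    # Split: head h sees feature slice [h*d_head, (h+1)*d_head) of every token.
    head_inputs = [[x[h * d_head:(h + 1) * d_head] for x in xs] for h in range(num_heads)]
    # Self-attention runs independently inside each head.
    head_outputs = [self_attention(hx) for hx in head_inputs]
    # Concatenate: token i's output joins its per-head outputs in head order.
    return [sum((head_outputs[h][i] for h in range(num_heads)), []) for i in range(len(xs))]


if __name__ == "__main__":
    xs = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, -1.0, 0.0]]
    ys = multi_head_self_attention(xs, num_heads=2)
    for i, y in enumerate(ys):
        print(f"y{i} =", [round(v, 3) for v in y])
```

Running the script prints the three concatenated outputs y0, y1, y2. Each head attends only over its own slice, so changing the second half of every token's features leaves the first half of every output unchanged. This is the independence between heads that the split in the figure implies.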