
Natural Language Processing with Deep Learning
Language Model & Distributed Representation (5)
Chen Li  cli@xjtu.edu.cn
Xi'an Jiaotong University, 2023

Outline
1. Self-attention
2. Transformer
3. Pre-training LM


Self-Attention

• $y_t = f(x_t, A, B)$, where $A$ and $B$ are another sequence (matrix).
• If we take $A$ (key) $= B$ (value) $= X$ (query), then it is called self-attention.
• That is, each $x_t$ is compared with all the original words, and $y_t$ is computed from them at last!
• Completely outside the traditional RNN or CNN framework.
• Faster, and can directly get global information!
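As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this idea. It assumes the similarity function $f$ is the scaled dot product used by the Transformer; all sizes and variable names are illustrative. Setting the keys and values equal to the query sequence $X$ turns ordinary attention into self-attention.

```python
import numpy as np

def attention(query, keys, values):
    """y_t = f(x_t, A, B): attend from each query position over (keys, values).

    query:  (n, d) matrix X in the slides
    keys:   (m, d) matrix A
    values: (m, d) matrix B
    """
    d = query.shape[-1]
    # Similarity between every query and every key
    # (scaled dot product, one common choice of f).
    scores = query @ keys.T / np.sqrt(d)            # (n, m)
    # Normalize the scores into weights with softmax over the keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output y_t is a weighted sum of the values.
    return weights @ values                          # (n, d)

# Self-attention: take A (key) = B (value) = X (query).
X = np.random.randn(5, 8)    # 5 words, 8-dim embeddings (toy sizes)
Y = attention(X, X, X)       # every y_t sees every word at once
print(Y.shape)               # (5, 8)
```

Because every query position attends to every key position in one matrix product, there is no sequential recurrence, which is why self-attention can be computed in parallel and sees global context directly.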

Self-Attention

[Figure: attention as querying a Source of (Key1, Value1) … (Key4, Value4) pairs: a Query is matched against the keys, and the corresponding values are combined into the Attention Value.]

Self-Attention

Calculation process:
• Step 1: calculate the similarity $F(Q, K)$ between the query and each key to get the scores $s_1, \dots, s_4$.
• Step 2: apply SoftMax to the scores to get the normalized weights.
• Step 3: take the weighted sum of the values with these weights to obtain the final Attention Value.

[Figure: the three steps applied to a Source of Key1–Key4 and Value1–Value4.]
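The three steps map directly onto code. Below is a small sketch for a single query over the four (key, value) pairs from the figure; the plain dot product stands in for $F(Q, K)$ (one common choice), and all tensors are toy random data.

```python
import numpy as np

# Toy Source with four (key, value) pairs, matching the slide's
# Key1..Key4 / Value1..Value4; sizes are illustrative.
keys   = np.random.randn(4, 8)
values = np.random.randn(4, 8)
query  = np.random.randn(8)

# Step 1: similarity F(Q, K) between the query and each key -> scores s1..s4.
scores = keys @ query                  # shape (4,)

# Step 2: SoftMax turns the scores into weights that sum to 1.
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Step 3: weighted sum of the values gives the Attention Value.
attention_value = weights @ values     # shape (8,)
print(weights, attention_value.shape)
```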