Natural Language Processing, Topic Modeling and Neural Text Generation
Yueshen Xu (lecturer), ysxu@xidian.edu.cn
Software Engineering, Xidian University
NLP & Text Mining & Machine Learning
Outline
2017/10/25, Software Engineering, Xidian University
■ Natural Language Processing
  ➢ Language Understanding, Language Modeling and Language Generation
■ Topic Modeling
  ➢ Basic Topic Modeling
  ➢ Hierarchical Topic Modeling
■ Neural Text Generation (Ali, Xiaomi)
■ Supplement & Reference
Keywords: natural language processing, topic modeling, Bayesian model, neural network
Natural Language Processing (NLP)
■ Language Understanding: understand the structure, relations and constitution of linguistic elements → Computational Linguistics
■ Language Modeling: find latent structures, relations and rules in a text corpus → Text Mining (not always automatic)
■ Language Generation: generate different types of linguistic text → Artificial Intelligence
Related areas: Speech, Graphics, Video ↔ Artificial Intelligence, Multimedia
Natural Language Processing → Language Understanding
Language Understanding (a few tasks)
■ Stemming / lemmatization: runs, ran, running → run (suffix stripping handles runs and running; the irregular form ran needs a dictionary-based lemmatizer)
■ Segmentation: 我是一名大学老师 ("I am a university teacher") → 我 / 是 / 一名 / 大学 / 老师
■ Part of speech (POS): I am a teacher → I (pronoun) am (copula) a (article) teacher (noun)
■ Dependency parsing: -Root- This time around they're moving even faster (arcs labelled nsubj, advmod, det, tmod, punct, …)
■ Coreference: 小明和小江去吃饭,他说饭很好吃 ("Xiaoming and Xiaojiang went out to eat; he said the food was delicious") → who is 他 ("he")?
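To make segmentation and stemming concrete, here is a minimal sketch. It assumes a tiny hand-made dictionary (`TOY_DICT`) and a hypothetical irregular-form table (`IRREGULAR`). Segmentation uses forward maximum matching, one simple dictionary-based baseline; practical Chinese segmenters are usually statistical or neural. The normalizer is a toy that combines suffix stripping (stemming-style) with a lookup table (lemmatization-style). It is not the Porter stemmer. All names are illustrative.

```python
# Hypothetical toy dictionary for forward maximum matching (FMM).
TOY_DICT = {"我", "是", "一名", "大学", "老师", "一", "名", "大", "学", "老", "师"}

def fmm_segment(text, vocab, max_len=4):
    """Greedily take the longest dictionary word starting at each position."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character;
        # an unknown single character is emitted as its own token.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical irregular-form table (the lemmatization part).
IRREGULAR = {"ran": "run", "went": "go"}

def toy_lemmatize(word):
    """Reduce a few English verb forms to a base form (toy rules only)."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    # Length guard leaves short words such as "sing" alone.
    if w.endswith("ing") and len(w) > 5:
        stem = w[:-3]
        # Undo consonant doubling: running -> runn -> run.
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w

print("/".join(fmm_segment("我是一名大学老师", TOY_DICT)))  # prints 我/是/一名/大学/老师
print([toy_lemmatize(w) for w in ["runs", "ran", "running"]])  # prints ['run', 'run', 'run']
```

Greedy longest match is fast, but it fails on ambiguous strings where a shorter first word is correct. Statistical segmenters resolve those cases by scoring whole segmentations.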
Natural Language Processing → Language Understanding
[Figure: screenshot of an online Chinese text-analysis demo. Example inputs include 妈妈把旧窗帘撕成了抹布 ("Mom tore the old curtains into rags") and 这价格比我预料的高 ("The price is higher than I expected"). Analysis options: POS tagging, named entity recognition, syntactic parsing, semantic role labeling, and semantic dependency (tree / graph) parsing. The first sentence is shown with its dependency arcs (ADV, VOB, Root, …).]
Natural Language Processing → Language Modeling
Language Modeling (a few tasks) ≈ Text Mining
■ Text/Document Clustering
■ Text/Document Classification
■ Topic Modeling
  ➢ Hierarchical topic modeling
■ Sentiment Classification
  ➢ Aspect-level sentiment classification
■ Entity (Relation) Extraction
■ …etc.
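Text classification and sentiment classification can both be illustrated with multinomial Naive Bayes, one classic baseline; the slide does not prescribe a method. This minimal sketch assumes pre-tokenized input and a made-up four-review training set. `train_nb` and `predict_nb` are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Collect class counts and per-class word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label maximizing log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_docs)
        total = sum(word_counts[label].values())
        for w in tokens:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing so unseen (word, class) pairs are not zero.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical laptop-review training data.
train = [
    ("great battery and fast screen".split(), "pos"),
    ("love the keyboard great value".split(), "pos"),
    ("battery died fast terrible support".split(), "neg"),
    ("screen cracked terrible value".split(), "neg"),
]
model = train_nb([t for t, _ in train], [l for _, l in train])
print(predict_nb(model, "great keyboard".split()))   # prints pos
print(predict_nb(model, "terrible screen".split()))  # prints neg
```

The model ignores word order, which is why aspect-level sentiment ("great screen, terrible battery") needs richer models.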
Natural Language Processing → Language Generation
Language Generation (a few tasks)
■ Machine Translation
■ Document Summarization
■ Q&A (小冰 XiaoIce, 小娜 Cortana)
■ Poetry Generation
■ News Generation
■ Short Text Generation (sentences, Weibo posts)
■ …etc.
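A bigram Markov chain gives a non-neural point of comparison for the generation tasks above. It is about the simplest statistical generator: each next word is sampled in proportion to how often it followed the previous word in training. This sketch rests on that assumption. It is not one of the neural generation methods covered later in the lecture, and the training sentences and helper names are hypothetical.

```python
import random
from collections import defaultdict

def train_bigrams(sentences):
    """Map each word to the list of words that followed it in training.

    Repeated followers are kept, so uniform sampling from the list
    reproduces the bigram counts.
    """
    followers = defaultdict(list)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]  # sentence boundary markers
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev].append(nxt)
    return followers

def generate(followers, rng, max_len=20):
    """Sample words one at a time until the end marker or max_len words."""
    word, output = "<s>", []
    for _ in range(max_len):
        word = rng.choice(followers[word])
        if word == "</s>":
            break
        output.append(word)
    return output

# Hypothetical training sentences.
corpus = [
    "the model generates short text",
    "the model writes poetry",
    "neural models generate news text",
]
bigram_model = train_bigrams(corpus)
print(" ".join(generate(bigram_model, random.Random(42))))
```

Every generated word pair occurred somewhere in training, so the output is locally fluent but has no long-range coherence. That gap is what neural generators try to close.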
Topic Modeling
Information Overloading
With Big Data, Cloud Computing, Artificial Intelligence, Deep Learning, …etc., we need:
■ Summarization
■ Visualization
■ Dimensional Reduction
Background
Dimensional Reduction (Text)
■ Document Summarization: What do these docs (or this doc) talk about?
■ Sentiment Analysis (e.g., laptop reviews): What do these consumers care about or complain about?
■ Short Text/Tweet Mining: What are people discussing?
Basic tool
■ Topic modeling: learn latent semantic topics from a corpus / text collection
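Topic modeling is often done with Latent Dirichlet Allocation (LDA). The minimal sketch below assumes LDA fitted by collapsed Gibbs sampling, one common inference method. Each token's topic is resampled from p(z_i = t | rest) ∝ (n_{d,t} + α)(n_{t,w} + β) / (n_t + Vβ), where n_{d,t} counts tokens of document d in topic t, n_{t,w} counts word w in topic t, n_t is the total number of tokens in topic t, and V is the vocabulary size. The hyperparameters `alpha` and `beta`, the iteration count and the tokenized toy corpus are all hypothetical choices.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (phi, theta): phi[t][w] = P(word w | topic t),
    theta[d][t] = P(topic t | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # tokens of doc d in topic t
    nkw = [defaultdict(int) for _ in range(n_topics)]  # count of word w in topic t
    nk = [0] * n_topics                                # tokens assigned to topic t
    z = []                                             # current topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Point estimates from the final sample.
    phi = [{w: (nkw[t][w] + beta) / (nk[t] + V * beta) for w in vocab}
           for t in range(n_topics)]
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha)
              for t in range(n_topics)]
             for d, doc in enumerate(docs)]
    return phi, theta

# Hypothetical toy corpus (already tokenized).
docs = [
    "monetary policy price policy currency".split(),
    "military reform army structure reform".split(),
    "dollar currency monetary imf economy".split(),
    "army troops military structure modernization".split(),
]
phi, theta = lda_gibbs(docs, n_topics=2)
for t, row in enumerate(phi):
    print("topic", t, sorted(row, key=row.get, reverse=True)[:3])
```

Each row of `phi` is a topic's word distribution, and each row of `theta` is a document's topic mixture. With so little data, the topics found can vary with `seed`.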
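Topic Modeling
Topic modeling: an example in Chinese (from my doctoral thesis)
Corpus: four documents, Doc1 to Doc4
■ 继续实施稳健的货币政策,保持松紧适度适时预调微调,做好与供给侧结构,并综合运用数量、价格等多种货币政策
  ("Continue to implement a prudent monetary policy, keep it appropriately balanced with timely anticipatory adjustment and fine-tuning, coordinate with the supply-side structure, and make combined use of quantity, price and other monetary policies")
■ 从员额上来看,这次改革远远超过了裁军的数量,它是一种结构性的改革,是军队组织结构现代化的一个关键步骤
  ("In terms of personnel numbers, this reform goes far beyond a troop reduction; it is a structural reform and a key step in modernizing the military's organizational structure")
■ 美元作为主要国际货币的地位在可预见的将来仍无可取代,唯一的出路是推动全球治理向更均衡的方向发展。国际货币基金组织总裁拉加德日前在美国马里兰大学演讲时就呼吁,国际治理改革应认清新兴经济体越来越重要这一现实。
  ("The US dollar's status as the main international currency will remain irreplaceable for the foreseeable future; the only way forward is to push global governance in a more balanced direction. In a recent speech at the University of Maryland, IMF Managing Director Lagarde called on international governance reform to recognize the reality that emerging economies are increasingly important.")
■ 独立学院从母体高校"断奶"后,可能会面临品牌、招生等方面阵痛,但是在国家和省市鼓励民间资本进入教育领域的实施意见发布后,一些独立学院果断切割连接母体大学的"脐带",自立门户发展。
  ("After being 'weaned' from their parent universities, independent colleges may face growing pains in branding, enrollment and other areas. However, after the state, provinces and cities issued implementation guidelines encouraging private capital to enter education, some independent colleges decisively cut the 'umbilical cord' to their parent universities and set up on their own.")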