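University of Chinese Academy of Social Sciences: Syllabus for the General Education Elective "Web Crawler and Data Collection" (《网络爬虫与数据采集》)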

University of Chinese Academy of Social Sciences, "Web Crawler and Data Collection" Course Syllabus

Course Information

Course ID:
*Credit Hours: 48
*Credits: 2
*Course Name: 网络爬虫与数据采集 (Web Crawler and Data Collection)
Prerequisite Courses:

*Description:

In the information age, data is growing geometrically and accumulating into massive data sets, so finding valuable information in that data has become very important. Big data and data analysis have become a key part of information technology. As Internet+ continues to develop, data analysis has a role to play in every industry.

Socialism with Chinese characteristics has entered a new era, and a new journey has begun toward realizing the Chinese dream of the great rejuvenation of the Chinese nation. The Party Central Committee decided to implement the national big data strategy, sounding the clarion call to accelerate the development of the digital economy and build a digital China. In his important speech at the second collective study session of the Political Bureau of the 19th CPC Central Committee, General Secretary Xi Jinping pointed out that "big data is a new stage of informatization development" and set out the strategic deployment of "promoting the innovation and development of the big data technology industry, building a digital economy with data as the key element, using big data to raise the level of modernization of national governance, using big data to safeguard and improve people's livelihood, and effectively safeguarding national data security", pointing the way for China to build a new comprehensive national competitive advantage in the era of big data.

In recent years, data has played a huge role in moving the Internet toward the era of intelligence. The Internet generates a large amount of data every day, and people have come to recognize the great value this data contains. To make full use of data, one must first acquire it. Web crawlers are one of the main means of obtaining data from the Internet. The Python crawler framework Scrapy is easy to use, flexible and extensible, well documented, and backed by an active developer community, so web crawler applications can be developed efficiently with it.

The course "Python Web Crawler" (《Python 网络爬虫》) aims to help students understand how web crawlers work and how to use the Scrapy framework to obtain data from the web. It covers in depth the core technology of the popular Python framework Scrapy and the techniques of web crawler development. The course is logically divided into a basic part and an advanced part. The basic part focuses on the core elements of Scrapy, such as spider, selector, item, and link. The advanced part covers advanced crawler topics, such as login authentication, file download, executing JavaScript, crawling dynamic web pages, and using HTTP proxies, each explained alongside project case studies. The course is well suited to students who have some foundation in Python and want to learn to write complex web crawlers.

*Textbooks: 《精通 Scrapy 网络爬虫》, by 刘硕 (Liu Shuo), 清华大学出版社 (Tsinghua University Press), October 2017, 1st edition, ISBN 9787302484936.
Other References:

*Course Category:
□ Public foundation course / university-wide required course
√ General education course
□ Major foundation course
□ Major core course / major required course
□ Major extension course / major elective course
□ Other

*Target Students: All undergraduates of the university

*Mode of Instruction:
□ Online (teaching platform: )
√ In person
□ Blended
□ Other
□ Practice-based (over 70% of class hours spent in the field at the grassroots level)

*School: 计算机教研部 (Computer Teaching and Research Department)

*Language of Instruction:
√ Chinese
□ Entirely in a foreign language
□ Bilingual: Chinese + (foreign-language instruction no less than 50%)

*Teacher Information:
Course lead, name and profile: 徐卫克 (Xu Weike), born 1980, male, instructor in the Computer Teaching and Research Department of the University of Chinese Academy of Social Sciences and member of the Computational Social Science Research Center. His main research areas are data analysis and artificial intelligence. Courses taught include 《大学计算机》 (University Computing), 《Python 数据分析》 (Python Data Analysis), and 《Python 深度学习》 (Python Deep Learning).
Team members, names and profiles: None