A More Complete Correct Algorithm PRO_中国高校课件下载中心

点击下载：北京大学：《信息检索》课程教学资源（PPT课件讲稿）Crawling the Web

正在加载图片...

A More Complete Correct Algorithm PROCEDURE SPIDER(G,{SEEDS } Initialize COLLECTION <big fil STACK Initialize VISITED <big hash-t 用disk-based heap结构实现 For every ROOT in SEEDS Initialize STACK <stack data structure> Let STACK push (ROOT,STACK) While STACK is not empty, Do URLcurr :pop (STACK) Until URLcurr is not in VISITED insert-hash (URLcurr,VISITED) PAGE look-up (URLcurr) STORE (<URLcur:PAGE>,COLLECTION) For every URL:in PAGE, Push(URL生，STACK) Return COLLECTIONA More Complete Correct Algorithm PROCEDURE SPIDER4(G, {SEEDS}) Initialize COLLECTION <big file of URL-page pairs> Initialize VISITED <big hash-table> For every ROOT in SEEDS Initialize STACK <stack data structure> Let STACK := push(ROOT, STACK) While STACK is not empty, Do URLcurr := pop(STACK) Until URLcurr is not in COLLECTION insert-hash(URLcurr, VISITED) PAGE := look-up(URLcurr) STORE(<URLcurr, PAGE>, COLLECTION) For every URLi in PAGE, push(URLi, STACK) Return COLLECTION Until URLcurr is not in VISITED STACK 用disk-based heap结构实现

<<向上翻页向下翻页>>

点击下载：北京大学：《信息检索》课程教学资源（PPT课件讲稿）Crawling the Web