Mining Massive Datasets
Lecture 6: Search Engines
Wu-Jun Li
Department of Computer Science and Engineering
Shanghai Jiao Tong University
Outline
▪ Architecture of Search Engines
▪ Index Construction
▪ Boolean Retrieval
▪ Vector Space Model for Ranked Retrieval
[Figure: Google search results page for the query "nigritude ultramarine" (results 1-10 of about 185,000, 0.35 seconds), with two regions labeled: algorithmic results and paid search ads.]
Architecture of Search Engines (SE)
▪ How do search engines like Google work?
Architecture
[Figure: search engine architecture. Components shown: User, Search, Web spider, Indexer, Indexes, Ad indexes, The Web. The example output is a Google results page for the query "miele" (results 1-10 of about 7,310,000, 0.12 seconds), showing web results alongside sponsored links.]
Indexing Process
[Figure: indexing process. Components shown: Text Acquisition, Document data store, Text Transformation, Index Creation, Index. Example document sources: e-mail, web pages, news articles, memos, letters.]
Indexing Process
▪ Text acquisition
  ▪ identifies and stores documents for indexing
▪ Text transformation
  ▪ transforms documents into index terms
▪ Index creation
  ▪ takes index terms and creates data structures (indexes) to support fast searching
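The three indexing stages can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the `DOCUMENTS` store, the function names, and the choice of lowercase alphanumeric tokens as index terms are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "document data store": doc id -> raw text.
DOCUMENTS = {
    1: "Web crawlers follow links to find documents.",
    2: "Index creation builds data structures for fast searching.",
    3: "Crawlers keep documents fresh.",
}

def acquire_text(store):
    """Text acquisition: yield (doc_id, raw_text) pairs in doc id order."""
    for doc_id in sorted(store):
        yield doc_id, store[doc_id]

def transform_text(raw_text):
    """Text transformation: lowercase the text and split it into index terms.

    Assumption: an index term is any run of letters or digits.
    """
    return re.findall(r"[a-z0-9]+", raw_text.lower())

def create_index(store):
    """Index creation: map each term to the sorted doc ids that contain it."""
    postings = defaultdict(set)
    for doc_id, raw_text in acquire_text(store):
        for term in transform_text(raw_text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

index = create_index(DOCUMENTS)
print(index["documents"])  # doc ids whose text contains "documents"
```

The resulting dictionary is a simple inverted index: looking up a term returns the documents containing it without scanning the document texts again, which is what makes searching fast.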
Query Process
[Figure: query process. Components shown: User interaction, Ranking, Evaluation, Document data store, Index, Log data.]
Query Process
▪ User interaction
  ▪ supports creation and refinement of query, display of results
▪ Ranking
  ▪ uses query and indexes to generate ranked list of documents
▪ Evaluation
  ▪ monitors and measures effectiveness and efficiency (primarily offline)
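A small sketch of the query side, under stated assumptions: the toy collection `DOCS`, the helper names, and the scoring rule (sum of query-term frequencies per document) are all hypothetical choices for illustration, not the ranking method of any real search engine. The query log stands in for the log data that offline evaluation would later analyze.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy collection: doc id -> text.
DOCS = {
    "d1": "miele vacuum cleaners and miele dishwashers",
    "d2": "vacuum cleaners with free shipping",
    "d3": "coffee system",
}

def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build an index of the form term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

QUERY_LOG = []  # log data, kept for later (offline) evaluation

def search(query, index):
    """Ranking: score each document by summed query-term frequency."""
    QUERY_LOG.append(query)
    scores = Counter()
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by doc id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

INDEX = build_index(DOCS)
results = search("miele vacuum", INDEX)
print(results)
```

Only the index is consulted at query time; the document texts are never rescanned. Documents that match no query term receive no score and are left out of the ranked list.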
Details: Text Acquisition
▪ Crawler
  ▪ Identifies and acquires documents for search engine
  ▪ Many types: web, enterprise, desktop
  ▪ Web crawlers follow links to find documents
    ▪ Must efficiently find huge numbers of web pages (coverage) and keep them up-to-date (freshness)
    ▪ Single site crawlers for site search
    ▪ Topical or focused crawlers for vertical search
  ▪ Document crawlers for enterprise and desktop search
    ▪ Follow links and scan directories
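The link-following behavior of a web crawler can be sketched as a breadth-first traversal. To keep the example runnable without network access, the hypothetical `LINKS` dictionary stands in for the web and `fetch_links` stands in for downloading a page and extracting its links; both are assumptions made for illustration, as is the `max_pages` limit.

```python
from collections import deque

# Hypothetical link graph standing in for the web: URL -> outgoing links.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}

def fetch_links(url):
    """Stand-in for fetching a page and extracting the links it contains."""
    return LINKS.get(url, [])

def crawl(seed, max_pages=100):
    """Follow links breadth-first from the seed, visiting each URL once."""
    frontier = deque([seed])  # URLs discovered but not yet visited
    seen = {seed}             # every URL ever placed on the frontier
    visited = []              # URLs in the order they were crawled
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

pages = crawl("http://example.com/")
print(pages)
```

The `seen` set keeps the crawler from revisiting pages that link back to each other. This sketch addresses coverage only; keeping pages up-to-date (freshness) would require revisiting URLs over time, which it does not attempt.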