Web Search Overview & Crawling (IR)

What is a Document?
▪ Examples:
  ▪ web pages, email, books, news stories, scholarly papers, text messages, Word™, PowerPoint™, PDF, forum postings, patents, etc.
▪ Common properties
  ▪ Significant text content
  ▪ Some structure (e.g., title, author, date for papers; subject, sender, destination for email)
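The two common properties (significant text plus some structure) can be sketched as a small data type. This is only an illustration, not something from the slides: the `Document` class, its `text` and `fields` attributes, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Document:
    """Hypothetical container: free text plus optional structured fields."""
    text: str                                              # the significant text content
    fields: Dict[str, str] = field(default_factory=dict)   # e.g. title, author, date


# Made-up sample values, for illustration only.
paper = Document(
    text="An example abstract about ranking.",
    fields={"title": "Example Paper", "author": "A. Author", "date": "2020-01-01"},
)
email = Document(
    text="See you at noon.",
    fields={"subject": "Lunch", "sender": "a@example.com", "destination": "b@example.com"},
)
```

Keeping the structured part as a free-form mapping reflects that different document types carry different fields (papers have title/author/date, email has subject/sender/destination), while the text content is always present.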