Hong Kong University of Science and Technology: Record Linkage for Big Data

Document format: PPTX, 84 pages, 1.86 MB

Record Linkage for Big Data
Slides from Luna Dong’s VLDB Tutorial

Record Linkage
• Matching based on identifying content: color, pattern

Record Linkage: Three Steps [EIV07, GM12]
• Record linkage: blocking + pairwise matching + clustering
  – Scalability, similarity, semantics
(Diagram: Blocking → Pairwise Matching → Clustering)
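
The scalability concern is easy to quantify. The short Python sketch below counts how many pairs pairwise matching must compare with and without blocking; the record and block counts are hypothetical, chosen only for illustration.

```python
from math import comb

def exhaustive_pairs(n: int) -> int:
    """Pairs compared if every record is matched against every other record."""
    return comb(n, 2)

def blocked_pairs(block_sizes: list[int]) -> int:
    """Pairs compared when matching happens only inside each block."""
    return sum(comb(b, 2) for b in block_sizes)

# Hypothetical numbers: 1,000,000 records split into 10,000 blocks of 100.
n = 1_000_000
print(exhaustive_pairs(n))            # 499999500000
print(blocked_pairs([100] * 10_000))  # 49500000
```

Even with perfectly even blocks, the count drops by roughly four orders of magnitude here (about 5 × 10^11 pairs down to about 5 × 10^7). At the scale of billions of records, exhaustive comparison is out of reach.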

Record Linkage: Three Steps
• Blocking: efficiently create small blocks of similar records
  – Ensures scalability
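
A minimal sketch of standard key-based blocking follows. It assumes each record is a Python dict and uses a hypothetical blocking key, the first three letters of the name. Only pairs that share a block become candidates for pairwise matching.

```python
from collections import defaultdict
from itertools import combinations

def block(records, key):
    """Group records into blocks that share the same blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    return blocks

def candidate_pairs(blocks):
    """Yield only the record pairs that fall into the same block."""
    for members in blocks.values():
        yield from combinations(members, 2)

# Hypothetical records; hypothetical key = first three letters of the name.
records = [
    {"id": 1, "name": "Smith, John"},
    {"id": 2, "name": "Smyth, Jon"},
    {"id": 3, "name": "Smith, J."},
    {"id": 4, "name": "Brown, Ann"},
]
blocks = block(records, key=lambda r: r["name"][:3].lower())
pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(blocks)]
print(pairs)  # [(1, 3)]
```

"Smyth, Jon" lands in its own block and is never compared with "Smith, John". Choosing a blocking key always trades missed matches like this one against the number of comparisons.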

Record Linkage: Three Steps
• Pairwise matching: compares all record pairs in a block
  – Computes similarity
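
Pairwise matching needs some similarity function. The sketch below uses one common, simple choice, which is not necessarily the one in the cited work: it scores pairs by the Jaccard similarity of their word tokens and keeps those at or above a hypothetical threshold of 0.5.

```python
import re
from itertools import combinations

def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over token sets; 0.0 if both are empty."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def match_block(block: list[str], threshold: float = 0.5) -> list[tuple[int, int, float]]:
    """Compare every pair of records in one block and keep the similar ones."""
    matches = []
    for i, j in combinations(range(len(block)), 2):
        sim = jaccard(block[i], block[j])
        if sim >= threshold:
            matches.append((i, j, sim))
    return matches

block = ["John Smith, Seattle", "Smith John Seattle WA", "Ann Brown, Boston"]
print(match_block(block))  # [(0, 1, 0.75)]
```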

Record Linkage: Three Steps
• Clustering: groups sets of records into entities
  – Ensures semantics
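
The simplest way to turn pairwise matches into entities is to take the transitive closure of the match graph, that is, its connected components. The sketch below does this with union-find. It numbers records 0..n-1 and assumes `matched_pairs` comes from the pairwise matching step.

```python
def cluster(n: int, matched_pairs) -> list[list[int]]:
    """Group record ids 0..n-1 into entities: the connected components of
    the match graph (the transitive closure of the pairwise matches)."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two components

    groups: dict[int, list[int]] = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())

print(cluster(5, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4]]
```

Transitive closure can over-merge. If A matches B and B matches C, all three records end up in one entity even when A and C are clearly different. More careful clustering methods exist, and this is only the simplest baseline.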

BDI (Big Data Integration): Record Linkage
• Volume: dealing with billions of records
  – Map-reduce based record linkage [VCL10, KTR12]
  – Adaptive record blocking [DNS+12, MKB12, VN12]
  – Blocking in heterogeneous data spaces [PIP+12, PKP+13]
• Velocity
  – Incremental record linkage [WGM10, WGM13]
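
The general shape of map-reduce based record linkage can be simulated in plain Python. The map phase emits a blocking key per record, the framework groups records by key, and each reduce call matches the pairs inside one block. This is a sketch of that pattern only, not the specific algorithms of [VCL10] or [KTR12]. The blocking key and the exact-zip match rule are hypothetical stand-ins.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(record):
    # Hypothetical blocking key: the first word of the name, lower-cased.
    yield record["name"].split()[0].lower(), record

def shuffle(mapped_pairs):
    # Group emitted (key, value) pairs by key, as the framework would do.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, block):
    # Compare every pair inside one block. Equal zip codes stand in for a
    # real similarity function; `key` is unused but mirrors a reducer's input.
    for a, b in combinations(block, 2):
        if a["zip"] == b["zip"]:
            yield a["id"], b["id"]

records = [
    {"id": 1, "name": "Smith John", "zip": "98101"},
    {"id": 2, "name": "Smith J", "zip": "98101"},
    {"id": 3, "name": "Smith Jane", "zip": "10001"},
    {"id": 4, "name": "Brown Ann", "zip": "98101"},
]
mapped = (kv for rec in records for kv in map_phase(rec))
matches = [m for key, blk in shuffle(mapped).items() for m in reduce_phase(key, blk)]
print(matches)  # [(1, 2)]
```

Each block is handled by a single reduce call, so one very large block can dominate the running time. Balancing work across reducers when block sizes are skewed is a real concern at this scale.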

BDI: Record Linkage
• Variety
  – Matching structured and unstructured data [KGA+11, KTT+12]
  – Matching Web tables and catalogs [LSC10]
• Veracity
  – Linking temporal records [LDM+11]
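
One simple way to match an unstructured offer against structured catalog entries is to check how many of an entry's attribute values appear in the offer's text. The sketch below does exactly that. The catalog, the attribute names, and the scoring rule are illustrative assumptions and are not the method of [KGA+11] or [KTT+12].

```python
import re

def tokens(text: str) -> set[str]:
    """Lower-cased tokens; keeps values like "16.1" and "dmc-zs25" whole."""
    return set(re.findall(r"[a-z0-9]+(?:[.-][a-z0-9]+)*", text.lower()))

def attribute_coverage(entity: dict[str, str], offer: str) -> float:
    """Fraction of the entity's attribute values whose tokens all occur in the offer."""
    offer_tokens = tokens(offer)
    value_tokens = [tokens(v) for v in entity.values()]
    hits = sum(1 for vt in value_tokens if vt and vt <= offer_tokens)
    return hits / len(value_tokens) if value_tokens else 0.0

# Hypothetical structured catalog entries.
catalog = [
    {"brand": "Panasonic", "model": "DMC-ZS25", "resolution": "16.1 MP"},
    {"brand": "Panasonic", "model": "DMC-ZS8", "resolution": "14.1 MP"},
]
offer = "Panasonic Lumix DMC-ZS25 16.1 MP Digital Camera - Silver"

scores = {e["model"]: attribute_coverage(e, offer) for e in catalog}
best = max(catalog, key=lambda e: attribute_coverage(e, offer))
print(scores)         # {'DMC-ZS25': 1.0, 'DMC-ZS8': 0.3333333333333333}
print(best["model"])  # DMC-ZS25
```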

Matching with Unstructured Data
• Matching product offers: 1000s of stores, millions of products
  – Product offers are terse, unstructured text
  – Many similar but different product offers
(Figure: three example shopping-site offers for the Panasonic Lumix DMC-SZ3 16.1 MP, DMC-ZS25 16.1 MP, and DMC-ZS8 14.1 MP digital cameras. Each is a terse title plus a short attribute list such as megapixels, compact sensor, optical zoom, SD card, built-in flash, weight, and ISO.)
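
The "similar but different" problem shows up even with the example offers on this slide, because two different Lumix models share most of their title tokens. The sketch below shows token Jaccard alone accepting such a pair, while a simple model-code check rejects it. The first two strings are shortened from the slide and the third is a made-up listing. The 0.4 threshold and the model-code regular expression are hypothetical.

```python
import re

def tokens(text: str) -> set[str]:
    """Lower-cased tokens; keeps "16.1" and "dmc-zs25" as single tokens."""
    return set(re.findall(r"[a-z0-9]+(?:[.-][a-z0-9]+)*", text.lower()))

def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

# Hypothetical pattern for model codes such as "DMC-ZS25" (run on original case).
MODEL_CODE = re.compile(r"\b[A-Z]{2,4}-[A-Z]{1,3}\d+\b")

def same_product(a: str, b: str, threshold: float = 0.4) -> bool:
    """Text similarity must pass AND extracted model codes must not conflict."""
    codes_a, codes_b = set(MODEL_CODE.findall(a)), set(MODEL_CODE.findall(b))
    if codes_a and codes_b and codes_a.isdisjoint(codes_b):
        return False  # both offers name a model, and the models differ
    return jaccard(a, b) >= threshold

zs25 = "Panasonic Lumix DMC-ZS25 16.1 MP Digital Camera - Silver"
zs8 = "Panasonic Lumix DMC-ZS8 14.1 MP Digital Camera - Black"
zs25_other_store = "Panasonic Lumix DMC-ZS25 16.1 MP Compact Digital Camera (Silver)"

print(round(jaccard(zs25, zs8), 3))          # 0.455, above the 0.4 threshold
print(same_product(zs25, zs8))               # False
print(same_product(zs25, zs25_other_store))  # True
```

Relying on model codes only works when offers include them. Many terse offers omit or mangle the code, which is part of why matching unstructured offers is hard.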
