Motivation
– Why do we need big data integration?
– How has “small” data integration been done?
– Challenges in big data integration
Schema alignment
Record linkage
Data fusion
Emerging topics
Parallel DBMS technologies
– Proposed in the late eighties
– Matured over the last two decades
– Multi-billion dollar industry: proprietary DBMS engines intended as data warehousing solutions for very large enterprises
Hadoop
Spark (UC Berkeley)