Data Cleaning as a Process
◼ Data discrepancy detection
  ◼ Use metadata (e.g., domain, range, dependency, distribution)
  ◼ Check field overloading
  ◼ Check the uniqueness rule, consecutive rule, and null rule
  ◼ Use commercial tools
    ◼ Data scrubbing: use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections
    ◼ Data auditing: analyze the data to discover rules and relationships, then detect violators (e.g., correlation and clustering to find outliers)
◼ Data migration and integration
  ◼ Data migration tools: allow transformations to be specified
  ◼ ETL (Extraction/Transformation/Loading) tools: allow users to specify transformations through a graphical user interface
◼ Integration of the two processes
  ◼ Iterative and interactive (e.g., Potter’s Wheel)
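
The uniqueness, consecutive, and null rules named under discrepancy detection can be illustrated with a few small checks. The following is a minimal sketch in plain Python, assuming records are dictionaries; the function names, the set of null placeholder strings, and the sample records are all hypothetical and not taken from the slides.

```python
from collections import Counter

# Hypothetical placeholder strings treated as "null"; adjust per data set.
NULL_MARKERS = {"", "?", "N/A", "NULL", "null", "-"}

def is_null(value):
    """Null rule: None, or a string that is a known null placeholder."""
    return value is None or (isinstance(value, str) and value.strip() in NULL_MARKERS)

def null_violations(records, field):
    """Return indices of records whose field is null under the null rule."""
    return [i for i, r in enumerate(records) if is_null(r.get(field))]

def uniqueness_violations(records, field):
    """Uniqueness rule: return the non-null values that occur more than once."""
    counts = Counter(r.get(field) for r in records if not is_null(r.get(field)))
    return sorted(v for v, n in counts.items() if n > 1)

def consecutive_gaps(records, field):
    """Consecutive rule: return integers missing between the field's min and max."""
    values = {r[field] for r in records if not is_null(r.get(field))}
    if not values:
        return []
    return [v for v in range(min(values), max(values) + 1) if v not in values]

# Made-up sample data for illustration only.
records = [
    {"check_no": 101, "zip": "61801"},
    {"check_no": 102, "zip": "?"},
    {"check_no": 102, "zip": "61820"},
    {"check_no": 105, "zip": None},
]

print(null_violations(records, "zip"))             # [1, 3]
print(uniqueness_violations(records, "check_no"))  # [102]
print(consecutive_gaps(records, "check_no"))       # [103, 104]
```

Each check only reports violators and does not correct them, which keeps detection separate from the transformation step; in an iterative workflow the reported records would be reviewed before any fix is applied.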