Data Cleaning as a Process
◼ Data discrepancy detection
  ◆ Use metadata (e.g., domain, range, dependency, distribution)
  ◆ Check field overloading
  ◆ Check uniqueness rule, consecutive rule, and null rule
  ◆ Use commercial tools
    ◆ Data scrubbing: use simple domain knowledge (e.g., postal codes, spell-checking) to detect errors and make corrections
    ◆ Data auditing: analyze the data to discover rules and relationships and to detect violators (e.g., correlation and clustering to find outliers)
◼ Data migration and integration
  ◆ Data migration tools: allow transformations to be specified
  ◆ ETL (Extraction/Transformation/Loading) tools: allow users to specify transformations through a graphical user interface
◼ Integration of the two processes
  ◆ Iterative and interactive (e.g., Potter's Wheel)
School of Software Engineering, Tongji University
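
The discrepancy checks listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not part of the slide: the sample records, the field names (`order_id`, `zip`, `age`), the set of null markers, the age range of 0 to 120, and the helper function names. It reads the uniqueness rule as "no value may repeat", the consecutive rule as "no gaps between the lowest and highest value", and the null rule as "a fixed set of markers denotes a missing value".

```python
from collections import Counter

# Hypothetical sample records; field names and values are invented for illustration.
records = [
    {"order_id": 101, "zip": "10001", "age": 34},
    {"order_id": 102, "zip": "?", "age": 29},
    {"order_id": 102, "zip": "94105", "age": 210},
    {"order_id": 105, "zip": "", "age": 41},
]

# Assumed null rule: each of these markers stands for "missing".
NULL_MARKERS = {None, "", "?", "N/A"}


def uniqueness_violations(rows, field):
    """Return the values of `field` that occur more than once (uniqueness rule)."""
    counts = Counter(row[field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def consecutive_gaps(rows, field):
    """Return the integers missing between the lowest and highest value (consecutive rule)."""
    seen = {row[field] for row in rows}
    return [v for v in range(min(seen), max(seen) + 1) if v not in seen]


def null_violations(rows, field):
    """Return the indices of rows whose `field` holds a null marker (null rule)."""
    return [i for i, row in enumerate(rows) if row[field] in NULL_MARKERS]


def range_violations(rows, field, low, high):
    """Return the indices of rows whose `field` lies outside the metadata range [low, high]."""
    return [i for i, row in enumerate(rows) if not (low <= row[field] <= high)]


print(uniqueness_violations(records, "order_id"))  # duplicated order ids
print(consecutive_gaps(records, "order_id"))       # order ids missing from the sequence
print(null_violations(records, "zip"))             # rows with a missing postal code
print(range_violations(records, "age", 0, 120))    # rows with an implausible age
```

Each helper only reports violators and leaves the records untouched, which keeps detection separate from correction, in line with the iterative and interactive process the slide describes.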