Data Cleaning
◼ Data in the real world is dirty: lots of potentially incorrect data, e.g., faulty instruments, human or computer error, transmission error
  ◼ incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
    ◼ e.g., Occupation=“ ” (missing data)
  ◼ noisy: containing noise, errors, or outliers
    ◼ e.g., Salary=“−10” (an error)
  ◼ inconsistent: containing discrepancies in codes or names, e.g.,
    ◼ Age=“42”, Birthday=“03/07/2010”
    ◼ Was rating “1, 2, 3”, now rating “A, B, C”
    ◼ discrepancy between duplicate records
  ◼ intentional (e.g., disguised missing data)
    ◼ Jan. 1 as everyone’s birthday?
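The categories above can be turned into simple automated checks. The following is a minimal sketch, not a method given on the slide. It assumes each record is a dict of strings with the slide's attribute names (Occupation, Salary, Age, Birthday) and that birthdays are written MM/DD/YYYY. The function names and the reference date passed as `today` are hypothetical, introduced only for illustration.

```python
from datetime import date, datetime

DATE_FORMAT = "%m/%d/%Y"  # assumption: the slide's dates are MM/DD/YYYY


def check_record(rec, today):
    """Return a list of issue labels for one record (a dict of strings)."""
    issues = []

    # Incomplete: the attribute value is absent or blank.
    if not (rec.get("Occupation") or "").strip():
        issues.append("incomplete: Occupation missing")

    # Noisy: a salary that is negative or not a number at all.
    try:
        if float(rec.get("Salary", "")) < 0:
            issues.append("noisy: negative Salary")
    except (TypeError, ValueError):
        issues.append("noisy: non-numeric Salary")

    # Inconsistent: the stated age disagrees with the age implied by Birthday.
    try:
        born = datetime.strptime(rec.get("Birthday", ""), DATE_FORMAT).date()
        age = int(rec.get("Age", ""))
    except (TypeError, ValueError):
        return issues  # cannot compare; leave the consistency check out
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    expected = today.year - born.year - (0 if had_birthday else 1)
    if age != expected:
        issues.append("inconsistent: Age does not match Birthday")
    return issues


def mixed_rating_codes(values):
    """True if a rating column mixes numeric codes with letter codes."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    return any(v.isdigit() for v in cleaned) and any(v.isalpha() for v in cleaned)


def jan1_share(birthdays):
    """Fraction of parseable birthdays that fall on Jan. 1."""
    parsed = []
    for text in birthdays:
        try:
            parsed.append(datetime.strptime(text, DATE_FORMAT).date())
        except (TypeError, ValueError):
            continue
    if not parsed:
        return 0.0
    return sum(d.month == 1 and d.day == 1 for d in parsed) / len(parsed)


if __name__ == "__main__":
    dirty = {"Occupation": " ", "Salary": "-10", "Age": "42", "Birthday": "03/07/2010"}
    for issue in check_record(dirty, today=date(2024, 1, 1)):
        print(issue)
```

A Jan. 1 share far above roughly 1/365 suggests that the date is being used as a default value, i.e., disguised missing data. What share counts as suspicious is a judgment call that depends on the dataset.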