Tongji University, School of Software Engineering

Data Cleaning
◼ Data in the real world is dirty: there is a lot of potentially incorrect data, e.g., from faulty instruments, human or computer error, or transmission errors.
◆ Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
  • e.g., Occupation=“ ” (missing data)
◆ Noisy: containing noise, errors, or outliers
  • e.g., Salary=“−10” (an error)
◆ Inconsistent: containing discrepancies in codes or names, e.g.,
  • Age=“42”, Birthday=“03/07/2010”
  • Was rating “1, 2, 3”, now rating “A, B, C”
  • Discrepancy between duplicate records
◆ Intentional (e.g., disguised missing data)
  • Jan. 1 as everyone’s birthday?
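The slide names the kinds of dirty data but not how to detect them. Below is a minimal Python sketch, using only the standard library, of one simple check per category, run over hypothetical toy records modeled on the slide's examples. The field names, the MM/DD/YYYY birthday format, the reference date, and the 1→A, 2→B, 3→C rating correspondence are all assumptions made for illustration, not part of the original material.

```python
from datetime import date, datetime

# Hypothetical toy records modeled on the slide's examples.
RECORDS = [
    {"id": 1, "occupation": "", "salary": 52000, "age": 42,
     "birthday": "03/07/2010", "rating": "2"},
    {"id": 2, "occupation": "engineer", "salary": -10, "age": 30,
     "birthday": "01/01/1995", "rating": "B"},
    {"id": 3, "occupation": "teacher", "salary": 48000, "age": 29,
     "birthday": "01/01/1996", "rating": "A"},
    {"id": 3, "occupation": "lecturer", "salary": 48000, "age": 29,
     "birthday": "01/01/1996", "rating": "A"},
]

# Assumed mapping from the old numeric scale to the new letter scale.
RATING_MAP = {"1": "A", "2": "B", "3": "C"}


def incomplete(records, field):
    """Incomplete: attribute missing or blank (e.g., Occupation=" ")."""
    return [r["id"] for r in records
            if r.get(field) is None or str(r[field]).strip() == ""]


def noisy_salary(records):
    """Noisy: value outside its valid domain (e.g., Salary=-10)."""
    return [r["id"] for r in records if r["salary"] < 0]


def age_on(birthday, today):
    """Age in whole years; assumes birthdays are written MM/DD/YYYY."""
    born = datetime.strptime(birthday, "%m/%d/%Y").date()
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - before_birthday


def age_birthday_conflicts(records, today):
    """Inconsistent: stated age disagrees with the age implied by birthday."""
    return [r["id"] for r in records if age_on(r["birthday"], today) != r["age"]]


def conflicting_duplicates(records, key="id"):
    """Inconsistent: records sharing a key but differing in other fields."""
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r)
    return [k for k, g in groups.items()
            if len(g) > 1 and any(other != g[0] for other in g[1:])]


def normalize_rating(value):
    """Map old numeric ratings onto the letter scale; leave others unchanged."""
    return RATING_MAP.get(value, value)


def jan1_share(records):
    """Disguised missing data: fraction of birthdays that fall on Jan. 1."""
    hits = sum(r["birthday"].startswith("01/01/") for r in records)
    return hits / len(records)


if __name__ == "__main__":
    today = date(2025, 6, 1)  # assumed reference date
    print("incomplete occupation:", incomplete(RECORDS, "occupation"))
    print("negative salary:", noisy_salary(RECORDS))
    print("age/birthday conflict:", age_birthday_conflicts(RECORDS, today))
    print("conflicting duplicates:", conflicting_duplicates(RECORDS))
    print("ratings:", [normalize_rating(r["rating"]) for r in RECORDS])
    print("share of Jan. 1 birthdays:", jan1_share(RECORDS))
```

Each check only flags suspect records; deciding whether to fill, correct, or drop them is a separate cleaning step. For the last check, real birthdays would put roughly 1/365 of records on Jan. 1, so a much larger share hints that a default value is standing in for missing data.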