Quiz: What are they?
1. Data
   1. Bit
   2. Byte
2. Data types
3. Information
Data
◼ The term "data" refers to groups of information that represent the qualitative or quantitative attributes of a variable or set of variables.
◼ Data (the plural of "datum", which is seldom used) are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables.
◼ Data are often viewed as the lowest level of abstraction from which information and knowledge are derived.
◼ Raw data are unprocessed: a collection of numbers, characters, images, or other outputs from devices that convert physical quantities into symbols.
Bit
◼ A bit, also called a binary digit, is a single digit of a binary number and the smallest unit of information. "Bit" is short for "binary digit".
◼ Suppose an event occurs as either A or B, each with probability 0.5. One bit is then enough to record which of A or B occurred. For example, one bit can represent:
◼ a simple sign (positive or negative)
◼ a two-state switch (such as a light switch)
◼ a transistor being on or off
◼ the presence or absence of voltage on a wire
◼ an abstract logical yes or no

Multiples of bits

SI decimal prefixes                            IEC binary prefixes
Name (Symbol)      Standard SI   Binary usage  Name (Symbol)      Value
kilobit (kbit)     10^3          2^10          kibibit (Kibit)    2^10
megabit (Mbit)     10^6          2^20          mebibit (Mibit)    2^20
gigabit (Gbit)     10^9          2^30          gibibit (Gibit)    2^30
terabit (Tbit)     10^12         2^40          tebibit (Tibit)    2^40
petabit (Pbit)     10^15         2^50          pebibit (Pibit)    2^50
exabit (Ebit)      10^18         2^60          exbibit (Eibit)    2^60
zettabit (Zbit)    10^21         2^70          zebibit (Zibit)    2^70
yottabit (Ybit)    10^24         2^80          yobibit (Yibit)    2^80
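The decimal (SI) and binary (IEC) multiples of a bit drift apart as the prefixes grow. The following Python sketch illustrates this; the helper names `si_bits`, `iec_bits`, and `prefix_gap` are chosen here for illustration and do not come from the slides.

```python
# SI prefixes step by factors of 10^3; IEC prefixes step by factors of 2^10.
SI_PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]
IEC_PREFIXES = ["kibi", "mebi", "gibi", "tebi", "pebi", "exbi", "zebi", "yobi"]


def si_bits(step: int) -> int:
    """Number of bits in the step-th SI multiple (step 1 = kilobit)."""
    return 10 ** (3 * step)


def iec_bits(step: int) -> int:
    """Number of bits in the step-th IEC multiple (step 1 = kibibit)."""
    return 2 ** (10 * step)


def prefix_gap(step: int) -> float:
    """Relative excess of the IEC unit over the SI unit, as a fraction."""
    return iec_bits(step) / si_bits(step) - 1


if __name__ == "__main__":
    for step, (si, iec) in enumerate(zip(SI_PREFIXES, IEC_PREFIXES), start=1):
        print(f"{si}bit = 10^{3 * step} bits, "
              f"{iec}bit = 2^{10 * step} bits, "
              f"gap = {prefix_gap(step):.1%}")
```

The gap is about 2.4% at the kilo/kibi level and exceeds 20% at the yotta/yobi level, which is why the two prefix families should not be used interchangeably when quoting large data sizes.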
Byte
◼ The English term is "byte". It is often described as an abbreviation of "binary term", although the word was coined by Werner Buchholz at IBM as a deliberate respelling of "bite".
◼ One byte is eight bits.
◼ The byte is the usual unit for measuring information in a computer, regardless of the type of data being stored.
[Image: cover of BYTE, "the small systems journal", issue #1]
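To make "one byte is eight bits, whatever the data type" concrete, the sketch below uses only Python built-ins to show text and an integer as bytes and as bit strings. The helper names `bit_count` and `to_bits` are illustrative, and UTF-8 is an assumed encoding; the slides do not name one.

```python
def bit_count(data: bytes) -> int:
    """Total number of bits needed to store the given bytes."""
    return len(data) * 8


def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


if __name__ == "__main__":
    # Text: each ASCII character occupies one byte in UTF-8.
    ascii_text = "data".encode("utf-8")
    print(len(ascii_text), "bytes,", bit_count(ascii_text), "bits")
    print(to_bits(ascii_text))

    # Text: each of these CJK characters occupies three bytes in UTF-8.
    cjk_text = "数据".encode("utf-8")
    print(len(cjk_text), "bytes,", bit_count(cjk_text), "bits")

    # Number: the integer 1000 stored in two bytes, most significant first.
    number = (1000).to_bytes(2, "big")
    print(to_bits(number))
```

The same unit measures all three values: the byte count depends on the encoding chosen, not on whether the content is text or a number.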
History of "Information"
◼ Latin origin: a representation implanted in the mind -> idea
◼ Language and coding: hide information in messages and then decode them, as in Morse code.
◼ Mathematics: in his work on channel transmission, Shannon defined the amount of information in a message as the negative base-2 logarithm of the probability with which the message occurs in the source, measured in "bits".
◼ Logic and linguistics: the communication-oriented sense of information involves semantic meaning and knowledge.
◼ Society: information as something that is contained in the message used to inform. "Information is the tennis ball of communication."
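Shannon's measure can be tried out directly. The sketch below takes the information content of a message with probability p to be -log2(p) bits; the minus sign makes the result non-negative, since probabilities are at most 1. The function name `self_information` is chosen here for illustration.

```python
import math


def self_information(p: float) -> float:
    """Information content, in bits, of a message that occurs with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(p)


if __name__ == "__main__":
    # Two equally likely outcomes (A or B, each 0.5): one bit per message.
    print(self_information(0.5))
    # One of four equally likely outcomes: two bits per message.
    print(self_information(0.25))
    # A rarer message carries more information than a common one.
    print(self_information(0.01) > self_information(0.9))
```

This matches the bit slide's example: an event that occurs as A or B with probability 0.5 each carries exactly one bit.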
Information Age & World Wide Web
Outline
◼ What is large-scale data processing?
◼ What is cloud computing?
◼ What are the goals and content of this course?
NC&IS
Large-Scale Data
[Figure: examples of large-scale data sets and their approximate sizes]
◼ Human Genomics (7000PB)
◼ Wikipedia (10GB)
◼ World Wide Web (1PB)
◼ Particle Physics: Large Hadron Collider (15PB)
◼ Personal Digital Photos (1000PB+)
◼ Annual Email Traffic, no spam (300PB+)
◼ Internet Archive (1PB+)
◼ Estimated On-line RAM in Google (8PB)
◼ 200 of London's Traffic Cams (8TB/day)
◼ 2004 Walmart Transaction DB (500TB)
◼ Typical Oil Company (350TB+)
◼ Merck Bio Research DB (1.5TB/qtr)
◼ UPMC Hospitals Imaging Data (500TB/yr)
◼ MIT Babytalk Speech Experiment (1.4PB)
◼ Terashake Earthquake Model of LA Basin (1PB)
◼ One Day of Instant Messaging in 2002 (750GB)
◼ Growth annotations on the figure: 100% and 200% CAGR
Total digital data to be created this year: 270,000PB (IDC)
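Sizes quoted in GB, TB, and PB are easier to compare once converted to a common unit. The sketch below is one hedged way to do that in Python. It assumes decimal (SI) units, which the figure does not state, and the names `UNITS` and `to_bytes` are illustrative.

```python
from decimal import Decimal

# Assumption: the figure uses decimal units (1 PB = 10^15 bytes).
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15}


def to_bytes(size: str) -> int:
    """Convert a size such as '500TB' or '1.4PB' to a number of bytes."""
    for unit, factor in UNITS.items():
        if size.endswith(unit):
            number = Decimal(size[: -len(unit)])
            return int(number * factor)
    raise ValueError(f"unknown unit in {size!r}")


if __name__ == "__main__":
    total = to_bytes("270000PB")   # total digital data created in the year (IDC)
    lhc = to_bytes("15PB")         # Large Hadron Collider
    print("total / LHC =", total // lhc)

    # A daily rate scaled to one year: 200 London traffic cams at 8TB/day.
    cams_per_year = to_bytes("8TB") * 365
    print("traffic cams per year:", cams_per_year / UNITS["PB"], "PB")
```

On these figures the year's total is 18,000 times the LHC data set, and the traffic cameras alone produce just under 3PB per year.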
Happening everywhere!
◼ Molecular biology (cancer): microarray chips
◼ Network traffic (spam): fiber optics, 300M/day
◼ Simulations (Millennium): microprocessors, 1B
◼ Particle events (LHC): particle colliders, 1M/sec