Ch. 2 Probability Theory

1 Descriptive Study of Data

1.1 Histograms and Their Numerical Characteristics

By descriptive study of data we refer to the summarization and exposition (tabulation, grouping, graphical representation) of observed data, as well as the derivation of numerical characteristics such as measures of location, dispersion and shape. Although the descriptive study of data is an important facet of modeling with real data in its own right, in the present study it is mainly used to motivate the need for probability theory and statistical inference proper.

To make the discussion more specific, let us consider the after-tax personal income data of 23,000 households for 1999-2000 in the US. These data in raw form constitute 23,000 numbers between $5,000 and $100,000. This presents us with a formidable task if we attempt to understand how income is distributed among the 23,000 households represented in the data. The purpose of descriptive statistics is to help us make some sense of such data.

A natural way to proceed is to summarize the data by allocating the numbers into classes (intervals). The number of intervals is chosen a priori, and it depends on the degree of summarization needed. This yields the "Table of the personal income in the US". The first column of the table shows the income intervals, the second column shows the number of incomes falling into each interval, and the third column shows the relative frequency for each interval.
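The grouping step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used to build the US table: the function name `frequency_table`, the choice of half-open classes [lower, upper) with the last class closed on the right, and the sample incomes and class edges are all hypothetical. Each row gives the interval, the count, the relative frequency (count divided by the total number of observations) and the cumulative relative frequency.

```python
from bisect import bisect_right


def frequency_table(data, edges):
    """Group raw observations into classes and return one row per class:
    (lower, upper, count, relative frequency, cumulative relative frequency).

    Assumption: classes are [edges[i], edges[i+1]); the last class also
    includes the top edge edges[-1].
    """
    k = len(edges) - 1
    if k < 1 or any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing, with at least two values")
    counts = [0] * k
    for x in data:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"observation {x} lies outside [{edges[0]}, {edges[-1]}]")
        # bisect_right locates the class whose lower edge is the largest edge <= x;
        # min(...) places x == edges[-1] into the last class.
        i = min(bisect_right(edges, x) - 1, k - 1)
        counts[i] += 1
    n = sum(counts)
    if n == 0:
        raise ValueError("no observations to tabulate")
    rows, running = [], 0
    for i, c in enumerate(counts):
        running += c  # cumulative count up to and including class i
        rows.append((edges[i], edges[i + 1], c, c / n, running / n))
    return rows


if __name__ == "__main__":
    # Hypothetical incomes in dollars, NOT the 23,000-household US data.
    incomes = [7_500, 12_000, 18_500, 24_000, 31_000, 45_000, 52_000, 88_000]
    edges = [5_000, 20_000, 40_000, 60_000, 100_000]
    for lo, hi, c, rel, cum in frequency_table(incomes, edges):
        print(f"{lo:>7,} to {hi:>7,}  {c:3d}  {rel:6.3f}  {cum:6.3f}")
```

Computing the cumulative column from running counts rather than by adding up rounded relative frequencies guarantees that the last entry is exactly 1.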
The relative frequency is calculated by dividing the number of observations in each interval by the total number of observations. The fourth column is the cumulative frequency, i.e., the running total of the relative frequencies up to and including each interval. Summarizing the data in this table enables us to get some idea of how income is distributed among the various classes. If we plot the relative (cumulative) frequencies in a bar graph, we get what is known as the (cumulative) histogram.

For further information on the distribution of income we could calculate various numerical characteristics describing the histogram's location, dispersion and shape. Such measures can be calculated directly from the raw data. However, in the present case it is more convenient for expositional purposes to use the grouped data. The main reason for this is to introduce various concepts which will be reinterpreted in the context of probability.
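As a sketch of what working with grouped data can look like, the Python function below uses a common convention (an assumption here, since the text has not yet said which measures it will use): every observation in class i is treated as if it sat at the class midpoint m_i, and each midpoint is weighted by the relative frequency f_i. This gives the mean x̄ = Σ f_i m_i (location), the variance Σ f_i (m_i - x̄)² (dispersion) and the skewness Σ f_i (m_i - x̄)³ / variance^(3/2) (shape). The name `grouped_moments` and the example counts are hypothetical.

```python
import math


def grouped_moments(classes):
    """Approximate mean, variance and skewness from grouped data.

    classes: list of (lower, upper, count) tuples. Assumption: every
    observation in a class is located at the class midpoint (lower + upper) / 2.
    """
    n = sum(c for _, _, c in classes)
    if n == 0:
        raise ValueError("no observations")
    mids = [(lo + hi) / 2 for lo, hi, _ in classes]   # class midpoints m_i
    rel = [c / n for _, _, c in classes]              # relative frequencies f_i
    mean = sum(f * m for f, m in zip(rel, mids))                 # location
    var = sum(f * (m - mean) ** 2 for f, m in zip(rel, mids))    # dispersion
    if var == 0:
        return mean, var, math.nan  # skewness is undefined without any spread
    third = sum(f * (m - mean) ** 3 for f, m in zip(rel, mids))
    return mean, var, third / var ** 1.5                         # shape


if __name__ == "__main__":
    # Hypothetical grouped counts, NOT the US income table.
    table = [(5_000, 20_000, 3), (20_000, 40_000, 4), (40_000, 100_000, 1)]
    mean, var, skew = grouped_moments(table)
    print(f"mean={mean:,.0f}  sd={math.sqrt(var):,.0f}  skewness={skew:.3f}")
```

The midpoint convention discards the detail within each class, which is the price of computing from the table instead of the raw data; the variance here divides by the total count rather than by the count minus one.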