
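Introduction to Social Network Research (6): Classical Statistical Analysis for Social Networks*
Du Haifeng (杜海峰), He Xiaochen (何晓晨)
(Parts of these slides draw on Liu Jun's (刘军) training materials from January 2007.)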

Outline
• Introduction: statistical analysis in networks
• Principle of the QAP method
• Analysis of correlation between networks
• Analysis of causality between networks
• A brief introduction to the P* model

Introduction
• Conventional statistical analysis requires the independent variables to be mutually independent; otherwise "collinearity" arises and causes problems.
– For example, under perfect collinearity the parameters cannot be estimated at all; under near collinearity the ordinary least squares (OLS) estimates can still be computed but are very imprecise. Multicollinearity inflates the variance of the parameter estimates, so significance tests on individual variables lose their meaning.
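The collinearity problem can be made concrete with the variance inflation factor (VIF), a standard diagnostic that the slides do not name. The minimal Python sketch below assumes a model with exactly two predictors, where the VIF of either predictor is 1 / (1 - r²) with r their correlation; the function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF of either predictor in a regression with exactly two predictors.

    Var(b_j) grows in proportion to 1 / (1 - r**2); at |r| = 1 (perfect
    collinearity) the normal equations have no unique solution.
    """
    r = pearson(x1, x2)
    if abs(r) >= 1.0 - 1e-12:
        return math.inf  # perfect collinearity: parameters cannot be estimated
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
uncorrelated = [1.0, -2.0, 0.0, 2.0, -1.0]     # r = 0
nearly_collinear = [1.1, 1.9, 3.2, 3.9, 5.1]   # r close to 1
perfectly_collinear = [2.0 * v for v in x1]    # r = 1

for name, x2 in [("uncorrelated", uncorrelated),
                 ("nearly collinear", nearly_collinear),
                 ("perfectly collinear", perfectly_collinear)]:
    print(f"{name}: VIF = {vif_two_predictors(x1, x2):.1f}")
```

With these data the loop prints VIFs of 1.0, about 139.9 and inf: the same predictor becomes roughly 140 times less precise once it is nearly collinear with the other one.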

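Introduction
• Network research studies "relations", which can hardly satisfy the statistical independence assumption.
– Example: suppose we study whether the "friendship" relation and the "social support" relation among network members are related, and conventional statistical analysis finds that they are. This correlation is in fact spurious. On the one hand, it arises from the geographic proximity of network members; on the other hand, the study design itself is seriously flawed, because the two relations are inherently correlated.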

Principle of the QAP method
• QAP (Quadratic Assignment Procedure)
– QAP compares the similarity of two networks, that is, of the values in their square matrices. It compares the corresponding cells of the two matrices, reports the correlation coefficient between them, and tests that coefficient nonparametrically. The test is based on permutations of the matrix data.
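The permutation that QAP relies on can be sketched in a few lines of Python. Applying the same permutation to rows and columns (B = P A Pᵀ) only relabels the nodes, so each network keeps its own structure while its alignment with the other network is scrambled. The function names and the 4-node network are hypothetical.

```python
def permute_matrix(a, perm):
    """Permute rows and columns together: B[i][j] = a[perm[i]][perm[j]].

    New node i is old node perm[i]; the same relabeling is applied to
    rows and columns, so this is B = P A P^T for a permutation matrix P.
    """
    n = len(a)
    return [[a[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def degrees(a):
    """Row sums excluding the diagonal (node degrees of a binary network)."""
    return [sum(v for j, v in enumerate(row) if j != i) for i, row in enumerate(a)]


# Hypothetical undirected network on 4 nodes (edges 0-1, 0-2, 2-3).
a = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
perm = [2, 0, 3, 1]
b = permute_matrix(a, perm)
for row in b:
    print(row)
# Relabeling preserves structure: same degree sequence, still symmetric.
print(sorted(degrees(a)) == sorted(degrees(b)))
```

The old edges 0-1, 0-2 and 2-3 become the new edges 1-3, 0-1 and 0-2: the same network under new labels. Permuting rows alone would not have this property, which is why QAP always moves rows and columns together.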

Principle of the QAP method
• Basic steps:
– Step 1: compute the correlation coefficient between the two observed matrices.
• Treat all values in each matrix as one long vector; for n nodes each vector contains n(n-1) entries, because the diagonal is ignored. Then compute the correlation between the two vectors exactly as for any two variables.
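Step 1 can be sketched in plain Python. The helper names and the 3-node example are hypothetical: a path 0-1-2 is correlated with a star centred on node 0, so each off-diagonal vector has n(n-1) = 6 entries.

```python
import math


def offdiag_vector(m):
    """Flatten an n x n matrix row by row, skipping the diagonal: n(n-1) values."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined: constant off-diagonal values")
    return sxy / math.sqrt(sxx * syy)


def matrix_correlation(a, b):
    """Step 1 of QAP: correlate the off-diagonal cells of two n x n matrices."""
    return pearson(offdiag_vector(a), offdiag_vector(b))


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # ties 0-1, 1-2
star = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # ties 0-1, 0-2
print(offdiag_vector(path))                      # [1, 0, 1, 1, 0, 1]
print(offdiag_vector(star))                      # [1, 1, 1, 0, 1, 0]
print(round(matrix_correlation(path, star), 6))  # -0.5
```

The two networks share only the tie 0-1, so the matrix correlation is negative (-0.5).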

– Step 2: randomly permute the rows of one matrix together with the corresponding columns, then compute the correlation between the permuted matrix and the other matrix.
• Repeat this several thousand times to obtain a distribution of correlation coefficients. From it, read off the proportion of these permuted correlations that are greater than or equal to the observed correlation computed in Step 1.
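Step 2 can be sketched as a Monte Carlo loop. Assumptions: Python's `random.Random` supplies the permutations, `qap_permutation_test`, `reps` and `seed` are hypothetical names, the 4-node matrices are made up, and a tiny tolerance guards the "greater than or equal" comparison against floating-point round-off. Because rows and columns move together, every permuted matrix keeps the same multiset of off-diagonal values, so its variance never drops to zero.

```python
import math
import random


def offdiag_vector(m):
    """Off-diagonal cells of an n x n matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def qap_permutation_test(a, b, reps=2000, seed=None, tol=1e-12):
    """Correlate a with `reps` row/column-permuted copies of b.

    Returns (observed, null distribution, share of permuted r >= observed).
    """
    rng = random.Random(seed)
    x = offdiag_vector(a)
    observed = pearson(x, offdiag_vector(b))
    n = len(b)
    order = list(range(n))
    null = []
    for _ in range(reps):
        rng.shuffle(order)  # one random relabeling of b's nodes
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        null.append(pearson(x, offdiag_vector(permuted)))
    share = sum(r >= observed - tol for r in null) / reps
    return observed, null, share


# Hypothetical 4-node networks.
a = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
b = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0]]
observed, null, share = qap_permutation_test(a, b, reps=2000, seed=1)
print(f"observed r = {observed:.4f}, share of permuted r >= observed = {share:.3f}")
```

Fixing `seed` makes the run reproducible; leaving it as `None` draws a fresh set of permutations each time.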

Principle of the QAP method
• Basic steps:
– Final step: compare the observed correlation from Step 1 with the distribution of correlations produced by the random permutations, check whether it falls in the rejection region or the acceptance region, and decide accordingly. In other words, if the proportion from Step 2 is below 0.05 (assuming the researcher has chosen a significance level of 0.05), the two matrices are statistically significantly correlated. This proportion is a one-sided p-value for a positive correlation.
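The decision rule of the final step is a one-sided permutation p-value compared with the significance level α. In the minimal sketch below, the function name `qap_decision` is hypothetical and the null distribution is an artificial, evenly spaced list chosen only to make the arithmetic easy to follow; in practice it would come from the permutations of Step 2.

```python
def qap_decision(observed, null_dist, alpha=0.05, tol=1e-12):
    """Return (p, significant).

    p is the share of permuted correlations that are >= the observed one
    (a one-sided test for positive association); `tol` absorbs float
    round-off. The result is significant when p < alpha.
    """
    if not null_dist:
        raise ValueError("null_dist must not be empty")
    p = sum(r >= observed - tol for r in null_dist) / len(null_dist)
    return p, p < alpha


# Artificial null distribution: 1000 values evenly spaced over [-0.5, 0.5).
null_dist = [-0.5 + k / 1000 for k in range(1000)]
print(qap_decision(0.47, null_dist))  # 30 of 1000 values >= 0.47
print(qap_decision(0.20, null_dist))  # 300 of 1000 values >= 0.20
```

Here 0.47 gives p = 0.03 (significant at 0.05) and 0.20 gives p = 0.3 (not significant). Some implementations report (count + 1) / (reps + 1) instead, counting the observed labeling as one of the permutations; this sketch follows the plain proportion described in the slides.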

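Principle of the QAP method (example)
Two 5 × 5 symmetric binary matrices; the diagonal cells, which QAP ignores, are shown as "·".
Matrix 1:
· 0 0 1 1
0 · 1 1 1
0 1 · 0 1
1 1 0 · 1
1 1 1 1 ·
Matrix 2:
· 0 0 1 1
0 · 1 1 1
0 1 · 1 0
1 1 1 · 0
1 1 0 0 ·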

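Principle of the QAP method (example, continued)
• Dropping the diagonal of each of the two example matrices and reading the rest row by row gives two vectors of n(n-1) = 5 × 4 = 20 values (grouped here by row):
– Matrix 1: 0011 0111 0101 1101 1111
– Matrix 2: 0011 0111 0110 1110 1100
• The correlation between the two vectors, i.e. the observed QAP correlation of the two matrices, is 0.356348322549899 (≈ 0.356).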
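The numbers in this example can be reproduced, and because there are only 5 nodes, the permutation test can be run exactly over all 5! = 120 relabelings instead of thousands of random draws. A minimal Python sketch, assuming the matrices as reconstructed from the example with the ignored diagonal set to 0; exhaustive enumeration is a substitution for the random sampling described in the slides and is feasible only for very small n.

```python
import math
from itertools import permutations

# The two 5 x 5 example matrices; the diagonal (ignored) is set to 0 here.
A = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
]
B = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
]


def offdiag_vector(m):
    """Off-diagonal cells of an n x n matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


x = offdiag_vector(A)
y = offdiag_vector(B)
print("".join(map(str, x)))  # 00110111010111011111
print("".join(map(str, y)))  # 00110111011011101100
observed = pearson(x, y)
print(round(observed, 4))    # 0.3563

# Exact null distribution: every simultaneous row/column permutation of B.
n = len(B)
null = [
    pearson(x, offdiag_vector([[B[p[i]][p[j]] for j in range(n)] for i in range(n)]))
    for p in permutations(range(n))
]
p_exact = sum(r >= observed - 1e-12 for r in null) / len(null)
print(len(null), p_exact)
```

The last line prints the number of permutations (120) and the exact one-sided p-value under the decision rule of the final step; the identity permutation is included, so the p-value is at least 1/120.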