Shapley Value Regression comes from the game theory concept developed by Lloyd Shapley in the 1950s. Its aim is to fairly allocate predictor importance in regression analysis. Given n independent variables (IVs), we run every combination of linear regression models of these IVs against the dependent variable (DV) and record each model's R-squared. The importance measure of each IV is then its average marginal contribution to the total R-squared, obtained by decomposing the total R-squared over all orderings of the IVs.

Say we have two IVs, A and B, and a dependent variable Y. We can build three models: 1) Y ~ A, 2) Y ~ B, 3) Y ~ A + B, each with its own R-squared. To get the Shapley Value of A, we decompose the R-squared of the third model and average A's marginal contribution over the two possible orderings:

ShapleyValue(A) = ([R-squared(AB) - R-squared(B)] + R-squared(A)) / 2

We have used the calc.relimp() function from the relaimpo package to determine the Shapley Value of our predictors.

sum(ins_model2_shapley$lmg)

As we can see, the Shapley Values of our attributes sum to the R-squared of our second regression model. As noted earlier, Shapley Value Regression is a variance decomposition method that works by computing the marginal contribution of each attribute.
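As a sanity check on the two-variable formula above, the decomposition can be verified on made-up data (A, B, and Y below are simulated for illustration, not taken from the insurance dataset):

```r
# Illustrative data only -- A, B, Y are simulated, not the insurance data
set.seed(42)
A <- rnorm(100)
B <- 0.5 * A + rnorm(100)            # deliberately correlated with A
Y <- 2 * A + 3 * B + rnorm(100)

r2 <- function(fit) summary(fit)$r.squared
r2_A  <- r2(lm(Y ~ A))               # model 1: Y ~ A
r2_B  <- r2(lm(Y ~ B))               # model 2: Y ~ B
r2_AB <- r2(lm(Y ~ A + B))           # model 3: Y ~ A + B

# Average each variable's marginal contribution over the two orderings
shapley_A <- ((r2_AB - r2_B) + r2_A) / 2
shapley_B <- ((r2_AB - r2_A) + r2_B) / 2

# The two Shapley Values recompose the full model's R-squared
all.equal(shapley_A + shapley_B, r2_AB)   # TRUE
```

For any number of predictors, these are the same shares that calc.relimp() reports in its $lmg slot (the LMG metric is exactly this averaging over orderings).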
barplot(sort(ins_model2_shapley$lmg, decreasing = TRUE), col = c(2:10), main = "Relative Importance of Predictors", xlab = "Predictor Labels", ylab = "Shapley Value Regression", font.lab = 2)

The Shapley Value score of each attribute shows its marginal contribution to the overall R-squared (0.8664) of the second model. So we can conclude that, of the 86.64% of total variance explained by our model, a little over 60% is due to the attribute smoker. The results also cement our earlier hypothesis that smoker is the single most important variable in predicting medical charges. Notice too that smoker is followed by bmi30:smoker, age2, age, and bmi30, the majority of which are variables we derived ourselves rather than ones included in the original dataset. Glad we engineered those variables! :)
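One way to read off each predictor's percentage share of the explained variance is to normalize the lmg vector (this assumes ins_model2_shapley is the calc.relimp() result computed earlier):

```r
# Each predictor's share of the explained variance, as a percentage
# (ins_model2_shapley is the calc.relimp() output from the earlier step)
round(100 * ins_model2_shapley$lmg / sum(ins_model2_shapley$lmg), 2)
```

Sorting this vector in decreasing order gives the same ranking as the barplot.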
Summary

In this analysis, we used Shapley Value Regression to derive the key drivers of medical charges. It is especially useful when dealing with multicollinearity, since it decomposes the R-squared proportionally among the predictors (although multicollinearity was not an issue in this dataset). We also saw how important feature engineering is for improving model accuracy. And remember: smoking is hazardous to your health!