【Regression Analysis】[6] -- Residual Analysis - WMN7Q
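Residual analysis checks the assumptions behind the linear regression model:
1. Model form: the model is linear in its parameters. Check this with a scatter plot of Y against X.
2. Errors: a. the errors are normally distributed; b. they have mean 0; c. they have equal variance; d. they are mutually independent. Normality can be checked with a P-P plot.
3. Predictors: a. the predictors are non-random; b. they are measured without error; c. the predictors are linearly independent.
4. Observations: all observations have equal influence on the fitted model.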
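Assumptions 1 to 3 describe the standard linear regression setup below. The notation is textbook notation, not taken from the original code:
- n is the number of observations.
- p is the number of predictors, plus an intercept.
- X is the n × (p + 1) design matrix.

For the simple regressions fitted here, p = 1.

```latex
\begin{aligned}
&y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n,\\
&\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad
\boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0}, \sigma^2 I_n\right), \qquad
\operatorname{rank}(X) = p + 1.
\end{aligned}
```

The covariance σ²I_n encodes assumptions 2b to 2d at once: zero mean, equal variances, and no correlation, which for normal errors is the same as independence. The rank condition is assumption 3c.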
(*Anscombe's quartet: four data sets of 11 {x, y} points each, all with almost the same fitted line y ≈ 3 + 0.5x*) datafrash = {{{10., 8.04}, {8., 6.95}, {13., 7.58}, {9., 8.81}, {11., 8.33}, {14., 9.96}, {6., 7.24}, {4., 4.26}, {12., 10.84}, {7., 4.82}, {5., 5.68}}, {{10., 9.14}, {8., 8.14}, {13., 8.74}, {9., 8.77}, {11., 9.26}, {14., 8.1}, {6., 6.13}, {4., 3.1}, {12., 9.13}, {7., 7.26}, {5., 4.74}}, {{10., 7.46}, {8., 6.77}, {13., 12.74}, {9., 7.11}, {11., 7.81}, {14., 8.84}, {6., 6.08}, {4., 5.39}, {12., 8.15}, {7., 6.42}, {5., 5.73}}, {{8., 6.58}, {8., 5.76}, {8., 7.71}, {8., 8.84}, {8., 8.47}, {8., 7.04}, {8., 5.25}, {19., 12.5}, {8., 5.56}, {8., 7.91}, {8., 6.89}}};
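(*Sort each data set by x, so the point indices used below follow increasing x, then fit y = b0 + b1 x to each*) data = SortBy[datafrash[[#]], First] & /@ {1, 2, 3, 4}; lm = LinearModelFit[#, x, x] & /@ data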
(*Scatter plot of each data set with its fitted regression line*) Show[ListPlot[data[[#]], ImageSize -> Medium], Plot[lm[[#]][x], {x, 0, 20}, ImageSize -> Medium]] & /@ {1, 2, 3, 4}
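Next, compare three kinds of residual plots:
- ordinary residuals;
- standardized residuals;
- deleted (studentized) residuals, which are computed with the i-th observation left out.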
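These are the usual definitions of the three residuals. H is the hat matrix and SSE is the error sum of squares. The mapping to LinearModelFit's "FitResiduals", "StandardizedResiduals" and "StudentizedResiduals" properties is assumed here; check it against the documentation for your Mathematica version.

```latex
\begin{aligned}
H &= X\left(X^\top X\right)^{-1}X^\top, \qquad h_{ii} = H_{ii}, \qquad
e_i = y_i - \hat y_i, \qquad \hat\sigma^2 = \frac{\mathrm{SSE}}{n-p-1},\\
r_i &= \frac{e_i}{\hat\sigma\sqrt{1-h_{ii}}} \quad \text{(standardized residual)},\\
t_i &= \frac{e_i}{\hat\sigma_{(i)}\sqrt{1-h_{ii}}}, \qquad
\hat\sigma_{(i)}^2 = \frac{\mathrm{SSE} - e_i^2/(1-h_{ii})}{n-p-2} \quad \text{(deleted, studentized residual)}.
\end{aligned}
```

Since Var(e_i) = σ²(1 − h_ii), a high-leverage point pulls the line toward itself and its raw residual looks small. Dividing by √(1 − h_ii) corrects for this. In t_i, the scale σ̂_(i) is estimated without point i, so an outlier cannot inflate the quantity it is divided by.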
(*Ordinary, standardized, and studentized (deleted) residuals for each of the four fits*) cc = lm[[#]]["FitResiduals"] & /@ {1, 2, 3, 4}; Row[ListPlot[#, ImageSize -> Medium, Filling -> Axis] & /@ lm[[#]][{"FitResiduals", "StandardizedResiduals", "StudentizedResiduals"}]] & /@ {1, 2, 3, 4}
(*Normality check of the ordinary residuals: P-P plot and Q-Q plot*) Row[{ProbabilityPlot[cc[[#]], PlotLabel -> "P-P plot", PlotRange -> All, ImageSize -> Medium], QuantilePlot[cc[[#]], PlotLabel -> "Q-Q plot", PlotRange -> All, ImageSize -> Medium]}] & /@ {1, 2, 3, 4}
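Both plots compare the residuals with a normal distribution, but along different axes. The sketch below shows the points each plot draws under these assumptions:
- the plotting position is the common choice (i − 0.5)/n;
- the normal reference uses the residuals' sample mean ē and standard deviation s.

Mathematica's ProbabilityPlot and QuantilePlot may use a different convention. Here e_(i) is the i-th smallest residual and Φ is the standard normal CDF.

```latex
\text{P-P: } \left(\frac{i-0.5}{n},\ \Phi\!\left(\frac{e_{(i)}-\bar e}{s}\right)\right),
\qquad
\text{Q-Q: } \left(\bar e + s\,\Phi^{-1}\!\left(\frac{i-0.5}{n}\right),\ e_{(i)}\right),
\qquad i = 1,\dots,n.
```

If the residuals are normal, both sets of points lie near the diagonal y = x. The P-P plot compares probabilities, so it is most sensitive in the middle of the distribution. The Q-Q plot compares values, so it shows departures in the tails more clearly.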
(*Cook's distance of every observation in each fit*) ListPlot[lm[[#]]["CookDistances"], Filling -> Axis, ImageSize -> Medium, PlotRange -> Full] & /@ {1, 2, 3, 4}
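Points with large values are the problematic (influential) observations. Cook's distance successfully picks out two points (indices refer to the x-sorted data):
- the 10th point of the third data set;
- the 11th point of the fourth data set.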
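Cook's distance measures how far all n fitted values move when observation i is deleted. In the notation above, ŷ_j(i) is the fitted value of point j from the fit without point i, and r_i is the standardized residual. The standard definition and its shortcut form are:

```latex
D_i = \frac{\sum_{j=1}^{n}\left(\hat y_j - \hat y_{j(i)}\right)^2}{(p+1)\,\hat\sigma^2}
    = \frac{r_i^2}{p+1}\cdot\frac{h_{ii}}{1-h_{ii}}
```

A point therefore gets a large D_i through a large standardized residual, a high leverage, or both. The fourth data set is an extreme case. The point at x = 19 is the only one not at x = 8, so h_ii = 1/11 + (19 − 9)²/110 = 1 and its residual is exactly 0. The shortcut formula becomes 0/0, so the value reported for that point depends on floating-point rounding.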
(*Hadi's influence measure. hat: leverages h_ii; cc: residuals e_i; degreeoffree: degrees of freedom of the x row of the ANOVA table (number of predictors p = 1); SSE: error sum of squares*)
hat = lm[[#]]["HatDiagonal"] & /@ {1, 2, 3, 4};
cc = lm[[#]]["FitResiduals"] & /@ {1, 2, 3, 4};
degreeoffree = lm[[#]]["ANOVATableDegreesOfFreedom"][[-3]] & /@ {1, 2, 3, 4};
SSE = lm[[#]]["ANOVATableSumsOfSquares"][[-2]] & /@ {1, 2, 3, 4};
hadi = hat[[#]]/(1 - hat[[#]]) + ((degreeoffree[[#]] + 1)/(1 - hat[[#]]))*((cc[[#]]^2)/(SSE[[#]] - cc[[#]]^2)) & /@ {1, 2, 3, 4};
ListPlot[hadi[[#]], Filling -> Axis, ImageSize -> Medium, PlotRange -> All] & /@ {1, 2, 3, 4}
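The hadi line above implements Hadi's influence measure. Here p is the number of predictors and d_i is the residual normalized by √SSE:

```latex
H_i = \frac{h_{ii}}{1-h_{ii}} + \frac{p+1}{1-h_{ii}}\cdot\frac{d_i^2}{1-d_i^2},
\qquad d_i = \frac{e_i}{\sqrt{\mathrm{SSE}}},
\qquad \frac{d_i^2}{1-d_i^2} = \frac{e_i^2}{\mathrm{SSE}-e_i^2}
```

The first term depends only on the leverage, which is the point's potential to influence the fit. The second term depends on the residual.

The code maps onto the formula as follows:
- degreeoffree is p = 1, so degreeoffree + 1 is the p + 1 factor.
- The last identity is why the code can divide by SSE − e_i² directly.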
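(*Locate the outliers: positions {data set, point index} where Hadi's measure exceeds 1*) pos = Position[hadi, _?(# > 1 &)]
(*Extract the corresponding data points*) Extract[data, pos]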