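A statistical test is the task of reaching a judgment by comparing a sample result against a sampling distribution. It has five main steps:
- State the hypotheses
- Find the sampling distribution
- Choose the significance level and the rejection region
- Compute the test statistic
- Make the decision
(Source: Baidu Baike)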
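The five steps can be traced by hand for a one-sample t test. The sketch below is only an illustration: the data vector `x`, the hypothesized mean `mu0`, and the level `alpha` are made-up values, not taken from the source.

```r
# Illustrative data and settings (assumptions, not from the original post)
x     <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.4)
mu0   <- 5.0    # step 1: H0: mu = mu0  versus  H1: mu != mu0
alpha <- 0.05   # step 3: significance level

n  <- length(x)
df <- n - 1     # step 2: under H0 the statistic follows a t distribution with n - 1 df

# step 3 (continued): two-sided rejection region is |t| > t_crit
t_crit <- qt(1 - alpha / 2, df)

# step 4: test statistic
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# step 5: decision, by rejection region and equivalently by P value
reject <- abs(t_stat) > t_crit
p_val  <- 2 * pt(-abs(t_stat), df)
```

Comparing `t_stat` and `p_val` with `t.test(x, mu = mu0)` is a quick way to confirm the hand calculation.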
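Hypothesis testing, also called significance testing, is another major part of statistical inference. Its purpose is to determine whether population parameters differ. In essence, a hypothesis test judges whether an observed "difference" is caused by sampling error or reflects a real difference between populations. It evaluates how strong the evidence is that two treatments produce different effects, and the strength of that evidence is measured and expressed by the probability P. Besides the t distribution, other test statistics and distributions exist for other kinds of data, such as the F distribution and the χ² distribution. The steps for testing hypotheses with these distributions are the same for every data type; the only difference is the test statistic that has to be computed.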
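To illustrate that only the statistic and its reference distribution change, the sketch below converts three statistics into P values with base R's `pt`, `pf`, and `pchisq`. The statistic values are copied (rounded) from R outputs reported further down in this post; the variable names are illustrative.

```r
# t statistic from the Welch test on the sleep data (two-sided)
p_t <- 2 * pt(-abs(-1.8608), df = 17.776)

# F statistic from the variance-ratio test with 9 and 9 df (two-sided)
f_lower <- pf(1.4945, df1 = 9, df2 = 9)
p_f <- 2 * min(f_lower, 1 - f_lower)

# chi-squared statistic from the barley goodness-of-fit test (upper tail)
p_chisq <- pchisq(1.362, df = 2, lower.tail = FALSE)

round(c(t = p_t, F = p_f, chisq = p_chisq), 4)
```

Each P value is a tail probability of the statistic under H0; the three calls differ only in which distribution supplies that tail.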
t.test() => Student's t-Test

    require(graphics)

    t.test(1:10, y = c(7:20))       # P = .00001855
    t.test(1:10, y = c(7:20, 200))  # P = .1245 -- no longer significant

    ## Classical example: Student's sleep data
    plot(extra ~ group, data = sleep)

    ## Traditional interface
    with(sleep, t.test(extra[group == 1], extra[group == 2]))

            Welch Two Sample t-test

    data:  extra[group == 1] and extra[group == 2]
    t = -1.8608, df = 17.776, p-value = 0.07939
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.3654832  0.2054832
    sample estimates:
    mean of x mean of y
         0.75      2.33

    ## Formula interface
    t.test(extra ~ group, data = sleep)

            Welch Two Sample t-test

    data:  extra by group
    t = -1.8608, df = 17.776, p-value = 0.07939
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.3654832  0.2054832
    sample estimates:
    mean in group 1 mean in group 2
               0.75            2.33
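The fractional degrees of freedom in the Welch output come from the Welch-Satterthwaite approximation. As a minimal sketch, the statistic and the degrees of freedom can be recomputed from the built-in `sleep` data; the helper names (`g1`, `v1`, `df_welch`, and so on) are illustrative.

```r
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]
n1 <- length(g1)
n2 <- length(g2)

# per-group variance of the sample mean
v1 <- var(g1) / n1
v2 <- var(g2) / n2

# Welch t statistic: no pooling of the two variances
t_welch <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# two-sided P value
p_welch <- 2 * pt(-abs(t_welch), df_welch)
```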
One population

    X <- c(159, 280, 101, 212, 224, 379, 179, 264, 222, 362, 168, 250, 149, 260, 485, 170)
    t.test(X, alternative = "greater", mu = 225)

            One Sample t-test

    data:  X
    t = 0.66852, df = 15, p-value = 0.257
    alternative hypothesis: true mean is greater than 225
    95 percent confidence interval:
     198.2321      Inf
    sample estimates:
    mean of x
        241.5

Two populations

    X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
    Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)
    t.test(X, Y, var.equal = TRUE, alternative = "less")

            Two Sample t-test

    data:  X and Y
    t = -4.2957, df = 18, p-value = 0.0002176
    alternative hypothesis: true difference in means is less than 0
    95 percent confidence interval:
          -Inf -1.908255
    sample estimates:
    mean of x mean of y
        76.23     79.43
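With `var.equal = TRUE` the test pools the two sample variances. The sketch below recomputes the pooled statistic by hand from the same X and Y; the names `sp2`, `t_pooled`, and `p_less` are illustrative.

```r
X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)
n1 <- length(X)
n2 <- length(Y)

# pooled variance estimate, weighted by degrees of freedom
sp2 <- ((n1 - 1) * var(X) + (n2 - 1) * var(Y)) / (n1 + n2 - 2)

# two-sample t statistic with n1 + n2 - 2 degrees of freedom
t_pooled <- (mean(X) - mean(Y)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# one-sided P value for the alternative "less": lower tail
p_less <- pt(t_pooled, df = n1 + n2 - 2)
```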
Paired-data t test

    X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
    Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)
    t.test(X - Y, alternative = "less")

            One Sample t-test

    data:  X - Y
    t = -4.2018, df = 9, p-value = 0.00115
    alternative hypothesis: true mean is less than 0
    95 percent confidence interval:
          -Inf -1.803943
    sample estimates:
    mean of x
         -3.2
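As a cross-check, `t.test` also accepts `paired = TRUE` in its two-vector form, which should reproduce the one-sample test on the differences. The sketch reuses the same X and Y; `by_diff` and `by_pair` are illustrative names.

```r
X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)

# one-sample test on the pairwise differences
by_diff <- t.test(X - Y, alternative = "less")

# the same test requested through the paired interface
by_pair <- t.test(X, Y, paired = TRUE, alternative = "less")

c(by_diff$p.value, by_pair$p.value)
```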
var.test() => F Test to Compare Two Variances

    x <- rnorm(50, mean = 0, sd = 2)
    y <- rnorm(30, mean = 1, sd = 1)
    var.test(x, y)                    # Do x and y have the same variance?
    var.test(lm(x ~ 1), lm(y ~ 1))    # The same.

    X <- scan()
    136 144 143 157 137 159 135 158 147 165
    158 142 159 150 156 152 140 149 148 155

    # Y is the 10-value vector defined in the two-population example above
    var.test(X, Y)

            F test to compare two variances

    data:  X and Y
    F = 34.945, num df = 19, denom df = 9, p-value = 6.721e-06
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
       9.487287 100.643093
    sample estimates:
    ratio of variances
              34.94489

    X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
    Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)
    var.test(X, Y)

            F test to compare two variances

    data:  X and Y
    F = 1.4945, num df = 9, denom df = 9, p-value = 0.559
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.3712079 6.0167710
    sample estimates:
    ratio of variances
              1.494481
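The F statistic is simply the ratio of the two sample variances. A minimal sketch of the hand calculation for the last X and Y follows; `f_stat`, `p_lower`, and `p_two_sided` are illustrative names.

```r
X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)

# ratio of sample variances, with n - 1 degrees of freedom each
f_stat <- var(X) / var(Y)
df1 <- length(X) - 1
df2 <- length(Y) - 1

# two-sided P value: twice the smaller tail of the F distribution
p_lower <- pf(f_stat, df1, df2)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)
```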
binom.test() => Exact Binomial Test

    binom.test(445, 500, p = 0.85)

            Exact binomial test

    data:  445 and 500
    number of successes = 445, number of trials = 500, p-value = 0.01207
    alternative hypothesis: true probability of success is not equal to 0.85
    95 percent confidence interval:
     0.8592342 0.9160509
    sample estimates:
    probability of success
                      0.89

    binom.test(1, 400, p = 0.01, alternative = "less")

            Exact binomial test

    data:  1 and 400
    number of successes = 1, number of trials = 400, p-value = 0.09048
    alternative hypothesis: true probability of success is less than 0.01
    95 percent confidence interval:
     0.0000000 0.0118043
    sample estimates:
    probability of success
                    0.0025
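For the one-sided case the exact P value is just a binomial tail probability. As a small sketch, the second result above can be reproduced with `pbinom`; the name `p_less` is illustrative.

```r
# P(X <= 1) when X ~ Binomial(400, 0.01): the lower tail tested by alternative = "less"
p_less <- pbinom(1, size = 400, prob = 0.01)

# the same tail written out term by term
p_terms <- dbinom(0, 400, 0.01) + dbinom(1, 400, 0.01)

round(p_less, 5)
```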
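chisq.test() => Pearson's Chi-squared Test for Count Data

    X <- c(210, 312, 170, 85, 223)
    chisq.test(X)

            Chi-squared test for given probabilities

    data:  X
    X-squared = 136.49, df = 4, p-value < 2.2e-16

    X <- scan()
    25 45 50 54 55 61 64 68 72 75 75 78 79 81 83 84
    84 84 85 86 86 86 87 89 89 89 90 91 91 92 100

    # cut() splits the range of X into intervals; table() counts the values in each interval
    A <- table(cut(X, breaks = c(0, 69, 79, 89, 100)))
    # interval probabilities under a normal distribution fitted to X
    p <- pnorm(c(70, 80, 90, 100), mean(X), sd(X))
    p <- c(p[1], p[2] - p[1], p[3] - p[2], 1 - p[3])
    chisq.test(A, p = p)

            Chi-squared test for given probabilities

    data:  A
    X-squared = 8.334, df = 3, p-value = 0.03959

    # Goodness-of-fit test: do the data follow a normal distribution?

In the hybrid offspring of barley, the theoretical ratio of awn traits is awnless : long-awned : short-awned = 9:3:4, while the observed counts are 335:125:160. Test whether the observed counts agree with the theoretical ratio.

    chisq.test(c(335, 125, 160), p = c(9, 3, 4)/16)

            Chi-squared test for given probabilities

    data:  c(335, 125, 160)
    X-squared = 1.362, df = 2, p-value = 0.5061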
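The barley result can be checked by hand: the statistic is the sum of (observed - expected)² / expected over the three classes. A minimal sketch, with illustrative variable names:

```r
observed <- c(335, 125, 160)
ratio    <- c(9, 3, 4) / 16

# expected counts under the 9:3:4 hypothesis
expected <- sum(observed) * ratio

# Pearson chi-squared statistic and its upper-tail P value (df = classes - 1)
x2    <- sum((observed - expected)^2 / expected)
p_val <- pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)

round(c(X2 = x2, p = p_val), 4)
```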
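    # Goodness of fit to a Poisson distribution
    x <- 0:6
    y <- c(7, 10, 12, 8, 3, 2, 0)
    lambda <- mean(rep(x, y))    # sample mean, used as the Poisson parameter
    q <- ppois(x, lambda)
    n <- length(y)
    p <- numeric(n)
    p[1] <- q[1]
    p[n] <- 1 - q[n - 1]
    for (i in 2:(n - 1)) p[i] <- q[i] - q[i - 1]    # cell probability P(X = i - 1)
    # Note: the call below passes equal probabilities rather than the Poisson p
    # computed above, so the printed result tests y against a uniform distribution.
    chisq.test(y, p = rep(1/length(y), length(y)))

            Chi-squared test for given probabilities

    data:  y
    X-squared = 19.667, df = 6, p-value = 0.003174

    Z <- c(7, 10, 12, 8)
    n <- length(Z); p <- p[1:(n - 1)]; p[n] <- 1 - q[n - 1]
    # Again p is not passed to the test; the result is for equal probabilities.
    chisq.test(Z, p = rep(1/length(Z), length(Z)))

            Chi-squared test for given probabilities

    data:  Z
    X-squared = 1.5946, df = 3, p-value = 0.6606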
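The `chisq.test` calls in the Poisson example pass equal cell probabilities, so they do not actually test the Poisson fit. A variant that passes the Poisson probabilities might look like the sketch below. It is an assumption about the intent, not the source's code: the merged last cell sums the counts for x >= 3, and the names `lambda`, `z`, and `pz` are illustrative. Note also that `chisq.test` does not reduce the degrees of freedom for the estimated mean.

```r
x <- 0:6
y <- c(7, 10, 12, 8, 3, 2, 0)
lambda <- mean(rep(x, y))     # estimated Poisson mean

q <- ppois(x, lambda)         # cumulative probabilities P(X <= x)
n <- length(y)
p <- diff(c(0, q))            # P(X = 0), ..., P(X = 6)
p[n] <- 1 - q[n - 1]          # last cell becomes P(X >= 6) so that p sums to 1

# full table: small expected counts trigger an approximation warning
fit_all <- suppressWarnings(chisq.test(y, p = p))

# merge the sparse cells x >= 3 into one cell
z  <- c(y[1:3], sum(y[4:n]))
pz <- c(p[1:3], 1 - q[3])
fit_merged <- chisq.test(z, p = pz)
```

Inspecting `fit_merged$expected` shows whether the merged cells are large enough for the chi-squared approximation.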
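The smaller the P value, the stronger the grounds for rejecting the null hypothesis, and the stronger the statistical evidence that the populations differ. Note that failing to reject H0 is not the same as supporting H0; it only means that the available sample information is insufficient to reject H0.
By convention, P > 0.05 is called "not significant", 0.01 < P ≤ 0.05 "significant", and P ≤ 0.01 "highly significant".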
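The labelling convention can be written as a small helper. `p_label` is a hypothetical function name introduced here for illustration; the sample P values are taken from outputs earlier in this post.

```r
# Map P values to the conventional labels (vectorised over p)
p_label <- function(p) {
  ifelse(p <= 0.01, "highly significant",
         ifelse(p <= 0.05, "significant", "not significant"))
}

labels <- p_label(c(0.07939, 0.03959, 0.0002176))
labels
# "not significant" "significant" "highly significant"
```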
Note: this article draws on Zhang Jinlong's blog on ScienceNet (科学网).