8.2 Relationship between the t-test and simple linear regression

A t.test comparing the means of two groups is in fact equivalent to a simple linear regression of a continuous outcome (dependent) variable on a binary explanatory (independent) variable coded as 0 and 1.
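To see why the two coincide, the regression can be written out explicitly. The notation below (\(\bar{y}_0\), \(\bar{y}_1\), \(n_0\), \(n_1\), \(s_p\)) is introduced here for illustration and is not part of the original text; it is a sketch under the equal-variance assumption.

\[
\text{mpg}_i = \beta_0 + \beta_1\,\text{am}_i + \varepsilon_i, \qquad \text{am}_i \in \{0, 1\}
\]

\[
E[\text{mpg} \mid \text{am} = 0] = \beta_0, \qquad E[\text{mpg} \mid \text{am} = 1] = \beta_0 + \beta_1
\]

\[
\hat{\beta}_0 = \bar{y}_0, \qquad \hat{\beta}_1 = \bar{y}_1 - \bar{y}_0, \qquad \mathrm{SE}(\hat{\beta}_1) = s_p \sqrt{\frac{1}{n_0} + \frac{1}{n_1}}
\]

Here \(s_p\) is the residual standard error, which equals the pooled standard deviation of the two groups on \(n_0 + n_1 - 2\) degrees of freedom. The ratio \(\hat{\beta}_1 / \mathrm{SE}(\hat{\beta}_1)\) is therefore the equal-variance two-sample t statistic.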

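In the output below, the p-value is 0.000285 (in the regression, the p-value of \(\beta_1\), the coefficient of am), and the 95% confidence interval for the difference in means (in the regression, the confidence interval of \(\beta_1\)) is (3.64151, 10.84837). Both analyses give identical values.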

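x = mtcars[mtcars$am == 0, "mpg"] # mpg of automatic-transmission cars (am == 0)
y = mtcars[mtcars$am == 1, "mpg"] # mpg of manual-transmission cars (am == 1)
# In the output, "mean of x" labels the first argument (here y) and "mean of y" the second (here x)
t.test(y, x, var.equal=TRUE) # pooled-variance t-test of mean(y) - mean(x)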


    Two Sample t-test

data:  y and x
t = 4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  3.64151 10.84837
sample estimates:
mean of x mean of y 
 24.39231  17.14737

r1 = lm(mpg ~ am, mtcars)
summary(r1) # beta1 (the am coefficient) estimates the difference in group means


Call:
lm(formula = mpg ~ am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.3923 -3.0923 -0.2974  3.2439  9.5077 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   17.147      1.125  15.247 1.13e-15 ***
am             7.245      1.764   4.106 0.000285 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.902 on 30 degrees of freedom
Multiple R-squared:  0.3598,    Adjusted R-squared:  0.3385 
F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

confint(r1) # confidence intervals of the betas; compare the am row with the t-test result

               2.5 %   97.5 %
(Intercept) 14.85062 19.44411
am           3.64151 10.84837
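The agreement between the two outputs can also be checked programmatically rather than by eye. The following is a minimal sketch; the object names tt, fit, and coefs are introduced here for illustration.

```r
# Two groups of mpg values, split by transmission type
x <- mtcars[mtcars$am == 0, "mpg"]  # automatic (am == 0)
y <- mtcars[mtcars$am == 1, "mpg"]  # manual (am == 1)

tt  <- t.test(y, x, var.equal = TRUE)  # pooled-variance two-sample t-test
fit <- lm(mpg ~ am, data = mtcars)     # regression on the 0/1 indicator
coefs <- summary(fit)$coefficients     # estimates, SEs, t values, p-values

# stopifnot() is silent when every condition holds and stops with an error otherwise
stopifnot(
  isTRUE(all.equal(unname(coef(fit)["(Intercept)"]), mean(x))),      # beta0 = mean of group 0
  isTRUE(all.equal(unname(coef(fit)["am"]), mean(y) - mean(x))),     # beta1 = difference in means
  isTRUE(all.equal(unname(tt$statistic), coefs["am", "t value"])),   # same t statistic
  isTRUE(all.equal(tt$p.value, coefs["am", "Pr(>|t|)"])),            # same p-value
  isTRUE(all.equal(as.numeric(tt$conf.int),
                   as.numeric(confint(fit)["am", ])))                # same 95% CI
)
```

all.equal() is used instead of == because the two functions reach the same quantities through different arithmetic, so tiny floating-point differences are possible.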

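The results above assume a common (equal) variance for the errors. If unequal variances are assumed instead, the two groups are compared using a separate variance estimate for each group together with the Satterthwaite approximation for the degrees of freedom (Welch's t-test, which is what t.test performs by default when var.equal is not set to TRUE).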
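As a rough illustration of the unequal-variance case, the sketch below runs t.test with its default settings and then reproduces the statistic, the Satterthwaite degrees of freedom, and the p-value by hand. The variable names (welch, vx, vy, se, df_satt, t_stat, p_val) are chosen here for illustration.

```r
x <- mtcars[mtcars$am == 0, "mpg"]  # automatic (am == 0)
y <- mtcars[mtcars$am == 1, "mpg"]  # manual (am == 1)

# Default var.equal = FALSE gives Welch's t-test
welch <- t.test(y, x)

# Per-group variance of the sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Standard error of the difference, without pooling the variances
se <- sqrt(vx + vy)

# Satterthwaite approximation to the degrees of freedom
df_satt <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

t_stat <- (mean(y) - mean(x)) / se
p_val  <- 2 * pt(-abs(t_stat), df_satt)  # two-sided p-value

# Silent when the hand computation matches t.test
stopifnot(
  isTRUE(all.equal(unname(welch$statistic), t_stat)),
  isTRUE(all.equal(unname(welch$parameter), df_satt)),
  isTRUE(all.equal(welch$p.value, p_val))
)
```

The approximate degrees of freedom are generally not an integer and never exceed the pooled value of 30 used in the equal-variance analysis.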