apdata <- read.csv('../data/lab3/aps.csv')
head(apdata, 3)
  commod concen   lod
1     AP  0.016 0.001
2     AP  0.066 0.003
3     AP  0.130 0.025
USDA pesticide limit: 0.1477
commod --> commodity. “AP” stands for apples.
concen --> the concentration of pesticide residue on each apple tested.
lod --> the limit of detection for the given pesticide detected on each apple (i.e. the lowest level of pesticide that can be detected by the lab).
Write out the null and alternative hypotheses for a one-sample, two-tailed t-test using the example above. What did your results tell you? Can you reject the null hypothesis (at the 0.05 level)?
Null Hypothesis: There is no difference between the mean pesticide concentration of the apples (\(\mu_{apple}\)) and the USDA’s limit for the amount of pesticide allowed in any food (0.1477). \[\mu_{apple} = 0.1477\]
Alternative Hypothesis: There is a difference between the mean pesticide concentration of the apples (\(\mu_{apple}\)) and the USDA’s limit for the amount of pesticide allowed in any food (0.1477). \[\mu_{apple} \neq 0.1477\]
t.test(apdata[,'concen'], mu=0.1477) # one-sample two-tail t-test
One Sample t-test
data: apdata[, "concen"]
t = -1.6837, df = 15775, p-value = 0.09225
alternative hypothesis: true mean is not equal to 0.1477
95 percent confidence interval:
0.1387950 0.1483754
sample estimates:
mean of x
0.1435852
Result Interpretation: The p-value is 0.092, which is greater than \(\alpha = 0.05\). Therefore, we fail to reject the null hypothesis: the data do not provide sufficient evidence that the mean pesticide concentration of the apples differs from the USDA’s pesticide limit.
Write out the null and alternative hypotheses for a one-sample, one-tailed t-test using the example above. What did your results tell you? Can you reject the null hypothesis (at the 0.05 level)?
Null Hypothesis: The mean pesticide concentration of the apples (\(\mu_{apple}\)) is greater than or equal to the USDA’s pesticide limit (0.1477). \[\mu_{apple} \ge 0.1477\]
Alternative Hypothesis: The mean pesticide concentration of the apples is less than the USDA’s pesticide limit (0.1477). \[\mu_{apple} < 0.1477\]
t.test(apdata[,'concen'], alternative='less', mu=0.1477) # one-sample one-tail t test
One Sample t-test
data: apdata[, "concen"]
t = -1.6837, df = 15775, p-value = 0.04613
alternative hypothesis: true mean is less than 0.1477
95 percent confidence interval:
-Inf 0.1476052
sample estimates:
mean of x
0.1435852
Result Interpretation: The p-value is 0.046, which is less than \(\alpha = 0.05\). Therefore, we reject the null hypothesis and conclude that the mean pesticide concentration of the apples is less than the USDA’s pesticide limit (0.1477).
# function for computing z distribution CIs
cifunz <- function(means, zcrit, sem) {
  cilower <- means - zcrit*sem
  ciupper <- means + zcrit*sem
  civals <- c(cilower, ciupper)
  return(civals)
}
# function for computing t distribution CIs
cifunt <- function(means, tcrit, sem) {
  cilower <- means - tcrit*sem
  ciupper <- means + tcrit*sem
  civals <- c(cilower, ciupper)
  return(civals)
}
# one-tailed, one-sample t-test, alpha = 0.05
meanval <- mean(apdata$concen)
tcritival <- qt(0.95, df = length(apdata$concen) - 1)
semval <- (sd(apdata$concen) / sqrt(length(apdata$concen)))
cifunt(meanval, tcritival, semval)
[1] 0.1395652 0.1476052
Why did you use 0.95 in your qt function instead of 0.975 (which is what you did last week)?
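Because a one-tailed t-test places the entire \(\alpha\) area in one tail of the distribution instead of splitting it between the two tails. The critical value is therefore the \(1-0.05=0.95\) quantile rather than the \(1-0.05/2=0.975\) quantile used in a two-tailed test. For the alternative \(\mu_{apple} < 0.1477\), only the upper bound of the resulting interval is relevant.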
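As a quick illustration of the point above, the two critical values can be compared directly. This is a minimal sketch that takes the degrees of freedom (15775) from the t.test output shown earlier; the variable names are illustrative only.

```r
# Critical values for alpha = 0.05, using the df reported by t.test() above
alpha <- 0.05
df <- 15775

tcrit_one <- qt(1 - alpha, df = df)      # all of alpha in one tail
tcrit_two <- qt(1 - alpha / 2, df = df)  # alpha split across both tails

# The one-tailed critical value is the smaller of the two, so the
# one-sided bound sits closer to the sample mean than a two-sided bound.
print(c(one_tailed = tcrit_one, two_tailed = tcrit_two))
```

The smaller one-tailed critical value is why the one-sided upper bound (0.1476052) is tighter than the two-sided upper bound (0.1483754).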
Can you reject the null hypothesis based on the confidence intervals that you calculated?
Null Hypothesis: The mean pesticide concentration of the apples (\(\mu_{apple}\)) is greater than or equal to the USDA’s pesticide limit (0.1477). \[\mu_{apple} \ge 0.1477\]
Alternative Hypothesis: The mean pesticide concentration of the apples is less than the USDA’s pesticide limit (0.1477). \[\mu_{apple} < 0.1477\]
Answer: I reject the null hypothesis. Because this is a one-tailed test with the alternative \(\mu_{apple} < 0.1477\), only the upper bound of the interval matters. The upper bound is 0.1476, which is less than 0.1477, so 0.1477 falls outside the CI and we reject the null hypothesis.
How would you change the code above if you ran a two-tailed, one-sample t-test? Please calculate 95% confidence intervals for a two-tailed, one-sample t-test and tell us whether you can reject the null hypothesis.
# two-tailed, one-sample t-test, alpha = 0.05
meanval <- mean(apdata$concen)
tcritival <- qt(0.975, df = length(apdata$concen) - 1)
semval <- (sd(apdata$concen) / sqrt(length(apdata$concen)))
cifunt(meanval, tcritival, semval)
[1] 0.1387950 0.1483754
Null Hypothesis: There is no difference between the pesticide value of the apples (\(\mu_{apple}\)) and USDA’s pesticide limit (0.1477). \[\mu_{apple} = 0.1477\]
Alternative Hypothesis: There is a difference between the pesticide value of the apples (\(\mu_{apple}\)) and USDA’s pesticide limit (0.1477). \[\mu_{apple} \neq 0.1477\]
Answer: I fail to reject the null hypothesis because 0.1477 falls inside the 95% CI (0.1388, 0.1484).
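The link between the confidence interval and the test decision used in this answer can be checked numerically. The sketch below is only an illustration: it uses the built-in iris data rather than the apple data, and the grid of reference values (mu_grid) is hypothetical. For each reference value it checks that "inside the 95% CI" and "p-value above 0.05" give the same answer.

```r
# Illustration on the built-in iris data (not the apple data)
x <- iris$Sepal.Length[iris$Species == "setosa"]

# Hypothetical grid of reference values to test against
mu_grid <- seq(4.8, 5.2, by = 0.01)

agree <- sapply(mu_grid, function(mu0) {
  res <- t.test(x, mu = mu0)  # two-tailed, one-sample t-test
  inside <- res$conf.int[1] <= mu0 && mu0 <= res$conf.int[2]
  fail_to_reject <- res$p.value > 0.05
  inside == fail_to_reject    # the two decisions should coincide
})

all(agree)
```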
Let’s use the cifunz function written above and qnorm to calculate 95% confidence intervals of a one-tailed, one-sample test using the normal distribution. Can you reject the null hypothesis? How do the confidence intervals you calculated here (using the standard normal distribution) compare to those calculated originally (using the t distribution)?
# one-tailed, one-sample z-test, alpha = 0.05
meanval <- mean(apdata$concen)
zcritival <- qnorm(0.95)
semval <- (sd(apdata$concen) / sqrt(length(apdata$concen)))
cifunz(meanval, zcritival, semval)
[1] 0.1395654 0.1476050
Null Hypothesis: The mean pesticide concentration of the apples (\(\mu_{apple}\)) is greater than or equal to the USDA’s pesticide limit (0.1477). \[\mu_{apple} \ge 0.1477\]
Alternative Hypothesis: The mean pesticide concentration of the apples is less than the USDA’s pesticide limit (0.1477). \[\mu_{apple} < 0.1477\]
Answer: The upper bound of the CI is 0.1476, which is less than 0.1477 (i.e. 0.1477 falls outside the CI). Therefore, I reject the null hypothesis. The CI calculated here using the z-distribution is almost identical to the one calculated using the t-distribution: the bounds differ only in the seventh decimal place. This is due to our large sample size (df = 15775), which makes the t-distribution nearly normal.
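The comparison can be reproduced without the data file. This is a rough sketch under one assumption: the standard error is backed out from the rounded mean and t statistic printed by t.test above, so the last digits may differ slightly from the values computed on the raw data.

```r
# Values copied from the t.test() output above
xbar <- 0.1435852          # sample mean
mu0 <- 0.1477              # USDA limit
tstat_reported <- -1.6837  # rounded t statistic
df <- 15775

# Assumption: recover the standard error from the rounded t statistic
sem <- (xbar - mu0) / tstat_reported

upper_t <- xbar + qt(0.95, df = df) * sem  # t-based upper bound
upper_z <- xbar + qnorm(0.95) * sem        # z-based upper bound

print(c(t = upper_t, z = upper_z, difference = upper_t - upper_z))
```

The t-based bound is very slightly larger because the t distribution has heavier tails than the normal, but with this many degrees of freedom the gap is negligible.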
# get the t statistic
# samplem --> mean of the sample
# refval --> the reference value
# sem --> the standard error of the mean
tstat <- function(samplem, refval, sem) {
  val <- (samplem - refval)/sem
  return(val)
}
t.test(apdata[,'concen'], alternative = 'less', mu = 0.1477)
One Sample t-test
data: apdata[, "concen"]
t = -1.6837, df = 15775, p-value = 0.04613
alternative hypothesis: true mean is less than 0.1477
95 percent confidence interval:
-Inf 0.1476052
sample estimates:
mean of x
0.1435852
samplem <- mean(apdata$concen)
refval <- 0.1477
sem <- sd(apdata$concen) / sqrt(length(apdata$concen))
tval <- tstat(samplem, refval, sem)
pval <- pt(tval, df = length(apdata$concen) - 1)
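How would you use the above results to calculate the t-statistic, df, and p-value of a two-tailed, one-sample t-test?
Answer: The t-statistic and the df are unchanged. The two-tailed p-value is twice the one-tailed p-value, because the same tail area is counted on both sides of the distribution: \(2 \times 0.04613 \approx 0.092\), which agrees with the two-tailed t.test output earlier. Doubling pval directly works here because the t-statistic is negative and pt returns the lower-tail area; in general the two-tailed p-value is twice the area beyond \(-|t|\).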
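The doubling can be checked with the rounded values printed by t.test above. This is a minimal sketch; because the t statistic is rounded, the results should match the reported p-values only to about four decimal places.

```r
# Rounded values from the t.test() output above
tval <- -1.6837
df <- 15775

p_one <- pt(tval, df = df)            # lower-tail area (alternative = 'less')
p_two <- 2 * pt(-abs(tval), df = df)  # area in both tails

print(c(one_tailed = p_one, two_tailed = p_two))
```

Using -abs(tval) makes the two-tailed calculation correct whether the t statistic is positive or negative.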
iris_sub <- subset(iris, Species %in% c('setosa', 'versicolor'))
t.test(Sepal.Length ~ Species, data=iris_sub)
Welch Two Sample t-test
data: Sepal.Length by Species
t = -10.521, df = 86.538, p-value < 2.2e-16
alternative hypothesis: true difference in means between group setosa and group versicolor is not equal to 0
95 percent confidence interval:
-1.1057074 -0.7542926
sample estimates:
mean in group setosa mean in group versicolor
5.006 5.936
Please write out the null and alternative hypotheses for the t-test above. Are you able to reject the null hypothesis? What does your result mean in non-technical terms?
Null Hypothesis: There is no difference between the sepal length of setosa and that of versicolor.
Alternative Hypothesis: There is a difference between the sepal length of setosa and that of versicolor.
Result Interpretation: The p-value is far smaller than 0.05, so I reject the null hypothesis. In non-technical terms, a gap in average sepal length as large as the one observed (5.006 for setosa versus 5.936 for versicolor) would be very unlikely if the two species truly had the same average. Versicolor sepals are longer on average than setosa sepals.
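The "Welch Two Sample t-test" output above can be reproduced by hand, which also explains the non-integer df. The sketch below uses the built-in iris data and the standard Welch-Satterthwaite approximation; the variable names are illustrative.

```r
setosa <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]

# Squared standard error of each group mean
v1 <- var(setosa) / length(setosa)
v2 <- var(versicolor) / length(versicolor)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(setosa) - mean(versicolor)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(setosa) - 1) + v2^2 / (length(versicolor) - 1))

# Two-tailed p-value
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

print(c(t = t_manual, df = df_manual, p = p_manual))
```

The t and df values should agree with those printed by t.test (t = -10.521, df = 86.538).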
Please repeat the above analysis for Sepal.Width, Petal.Length, and Petal.Width for the iris_sub dataset. Please interpret the results of these 3 t-tests in non-technical terms.
t.test(Sepal.Width ~ Species, data=iris_sub)
Welch Two Sample t-test
data: Sepal.Width by Species
t = 9.455, df = 94.698, p-value = 2.484e-15
alternative hypothesis: true difference in means between group setosa and group versicolor is not equal to 0
95 percent confidence interval:
0.5198348 0.7961652
sample estimates:
mean in group setosa mean in group versicolor
3.428 2.770
Null Hypothesis: There is no difference between the sepal width of setosa and that of versicolor.
Alternative Hypothesis: There is a difference between the sepal width of setosa and that of versicolor.
Result Interpretation: The p-value is far smaller than 0.05, so I reject the null hypothesis. In non-technical terms, a gap in average sepal width as large as the one observed (3.428 for setosa versus 2.770 for versicolor) would be very unlikely if the two species truly had the same average. Setosa sepals are wider on average than versicolor sepals.
t.test(Petal.Length ~ Species, data=iris_sub)
Welch Two Sample t-test
data: Petal.Length by Species
t = -39.493, df = 62.14, p-value < 2.2e-16
alternative hypothesis: true difference in means between group setosa and group versicolor is not equal to 0
95 percent confidence interval:
-2.939618 -2.656382
sample estimates:
mean in group setosa mean in group versicolor
1.462 4.260
Null Hypothesis: There is no difference between the petal length of setosa and that of versicolor.
Alternative Hypothesis: There is a difference between the petal length of setosa and that of versicolor.
Result Interpretation: The p-value is far smaller than 0.05, so I reject the null hypothesis. In non-technical terms, a gap in average petal length as large as the one observed (1.462 for setosa versus 4.260 for versicolor) would be very unlikely if the two species truly had the same average. Versicolor petals are longer on average than setosa petals.
t.test(Petal.Width ~ Species, data=iris_sub)
Welch Two Sample t-test
data: Petal.Width by Species
t = -34.08, df = 74.755, p-value < 2.2e-16
alternative hypothesis: true difference in means between group setosa and group versicolor is not equal to 0
95 percent confidence interval:
-1.143133 -1.016867
sample estimates:
mean in group setosa mean in group versicolor
0.246 1.326
Null Hypothesis: There is no difference between the petal width of setosa and that of versicolor.
Alternative Hypothesis: There is a difference between the petal width of setosa and that of versicolor.
Result Interpretation: The p-value is far smaller than 0.05, so I reject the null hypothesis. In non-technical terms, a gap in average petal width as large as the one observed (0.246 for setosa versus 1.326 for versicolor) would be very unlikely if the two species truly had the same average. Versicolor petals are wider on average than setosa petals.
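The four tests can also be run in one pass and collected into a single table, which makes the direction of each difference easier to compare. This is one possible sketch, not part of the original lab; the helper names (vars, results) are illustrative, and reformulate is used only to build each formula from a column name.

```r
iris_sub <- subset(iris, Species %in% c("setosa", "versicolor"))
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Run a Welch two-sample t-test for each measurement and keep the key numbers
results <- do.call(rbind, lapply(vars, function(v) {
  res <- t.test(reformulate("Species", response = v), data = iris_sub)
  data.frame(
    variable = v,
    mean_setosa = unname(res$estimate[1]),
    mean_versicolor = unname(res$estimate[2]),
    t = unname(res$statistic),
    p_value = res$p.value
  )
}))

print(results)
```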