Confidence Intervals Part 4: Chi Squared Distribution
0.1 Libraries
library(data.table)
library(ggplot2)
library(tigerstats)
1 Intro
Just as there is variability in a sample mean, there is also variability in a sample standard deviation. When the population is approximately normal, the chi-square distribution can be used to find a confidence interval for the population standard deviation or variance.
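To make that variability concrete, here is a small sketch that repeatedly draws samples of 20 from the setosa sepal lengths in the built-in iris data and computes the standard deviation of each. The seed, the number of repetitions, and the variable names are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Treat the 50 setosa sepal lengths as the "population"
setosa <- iris[iris$Species == "setosa", "Sepal.Length"]

# Standard deviation of 1000 different samples of size 20
sample_sds <- replicate(1000, sd(sample(setosa, 20)))

range(sample_sds)  # smallest and largest sample SD seen
sd(setosa)         # SD of all 50 values, for comparison
```

Each sample gives a somewhat different standard deviation, which is why an interval is more informative than a single estimate.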
2 Formula
Variance
\[\frac{(n-1)s^2}{\chi^{2}_{right}}<\sigma^2<\frac{(n-1)s^2}{\chi^2_{left}}\]
Standard Deviation
\[\sqrt{\frac{(n-1)s^2}{\chi^{2}_{right}}}<\sigma<\sqrt{\frac{(n-1)s^2}{\chi^2_{left}}}\]
For both formulas:
- n = sample size
- s = sample standard deviation
- \(\chi^2_{left} \text{ and } \chi^2_{right}\) = the left and right critical values of the chi-square distribution. Unlike a normal distribution, the chi-square distribution is not symmetric, so both values have to be found separately.
Also
degrees of freedom = n-1
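As a quick worked sketch of both formulas, assume a made-up sample with n = 10 and s = 1.5 (these numbers are hypothetical, chosen only so that the degrees of freedom come out to 9).

```r
# Hypothetical summary statistics, not taken from any real data set
n <- 10
s <- 1.5
df <- n - 1  # degrees of freedom

# Critical values for a 95% interval
chi_left  <- qchisq(0.025, df)  # small value, cuts off 2.5% on the left
chi_right <- qchisq(0.975, df)  # large value, cuts off 2.5% on the right

# Variance interval: dividing by the larger critical value gives the lower limit
var_ci <- c(lower = (n - 1) * s^2 / chi_right,
            upper = (n - 1) * s^2 / chi_left)

# Standard deviation interval: square root of the variance limits
sd_ci <- sqrt(var_ci)

var_ci
sd_ci
```

Note that the interval is not centered on s, because the chi-square distribution is not symmetric.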
3 Finding \(\chi^2_{left} \text{ and } \chi^2_{right}\)
Because the chi-square distribution isn't symmetric, both the left and right critical values must be found.
For a 95% confidence interval, 2.5% of the distribution is excluded in each tail, so we'll be looking for the quantiles at probabilities 0.025 and 0.975.
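The same arithmetic works for any confidence level. A minimal sketch, where conf_level, alpha, and probs are names introduced here for illustration:

```r
conf_level <- 0.95
alpha <- 1 - conf_level               # total area excluded from the interval
probs <- c(alpha / 2, 1 - alpha / 2)  # area to the left of each critical value
probs

# For a 99% interval the tails shrink to 0.5% each
probs_99 <- c((1 - 0.99) / 2, 1 - (1 - 0.99) / 2)
probs_99
```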
Using a Table
Go to the table (below), find the row for 9 degrees of freedom (a sample of size 10), and read off the values in the .975 and .025 columns. The column headings are areas to the right of the critical value, so .975 gives us a \(\chi^2_{left}\) of 2.700 and .025 gives us a \(\chi^2_{right}\) of 19.023.
In R:
qchisq(c(.025,.975),df=9, lower.tail=FALSE)
## [1] 19.022768 2.700389
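lower.tail=FALSE means the probabilities are treated as areas to the right of each value, P(X > x), which matches the column headings of the table. We can get the same values working from the left side of the distribution with lower.tail=TRUE, where the probabilities are areas to the left, P(X ≤ x); the results then come back in the opposite order.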
qchisq(c(.025,.975),df=9, lower.tail=TRUE)
## [1] 2.700389 19.022768
4 Degrees of Freedom
The distribution is right skewed, and its shape changes with the number of degrees of freedom: as they increase, the peak moves to the right and the curve becomes more symmetric.
# One density curve per degrees-of-freedom value, evaluated on a grid from 0 to 30
ChiSquare <- data.table(
  DF = c(rep("DF1",301),rep("DF3",301),rep("DF6",301),rep("DF9",301),rep("DF12",301),rep("DF15",301)),
  X = rep(seq(0,30, by = 0.1),6),
  Density = c(
    dchisq(seq(0,30, by = .1),df=1),
    dchisq(seq(0,30, by = .1),df=3),
    dchisq(seq(0,30, by = .1),df=6),
    dchisq(seq(0,30, by = .1),df=9),
    dchisq(seq(0,30, by = .1),df=12),
    dchisq(seq(0,30, by = .1),df=15)
  )
)
# Keep the legend in numeric rather than alphabetical order
ChiSquare$DF <- factor(ChiSquare$DF, levels=c("DF1", "DF3","DF6", "DF9","DF12","DF15"))
ggplot(ChiSquare, aes(X,Density,colour=DF)) + geom_line() + ylim(0,0.3)
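The skew can also be put into numbers. This sketch relies on standard properties of the chi-square distribution (mean = df, variance = 2·df, skewness = sqrt(8/df)) and compares the mean with the median; summary_by_df is a name made up for this illustration.

```r
df <- c(1, 3, 6, 9, 12, 15)

summary_by_df <- data.frame(
  df       = df,
  mean     = df,               # mean of a chi-square equals its df
  variance = 2 * df,           # variance is twice the df
  skewness = sqrt(8 / df),     # shrinks toward 0 as df grows
  median   = qchisq(0.5, df)   # 50th percentile
)
summary_by_df
```

The median sits below the mean in every row, which is what right skew looks like numerically, and the skewness falls as the degrees of freedom rise.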
5 Rounding Rule for Confidence Interval for Variance or SD
When computing a confidence interval using raw data:
- Round off to one more decimal place than the original data.
When computing a confidence interval using a sample variance or standard deviation:
- Round off to the same number of decimal places as the given sample variance or standard deviation.
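In R the rule amounts to choosing the digits argument of round(). A small sketch, using the interval limits reported in the Example section of this document (computed from raw data with one decimal place); the variable names are illustrative.

```r
# Limits computed from raw data recorded to 1 decimal place
ci_raw <- c(0.2538801, 0.4875933)
data_decimals <- 1

# Raw data: keep one more decimal place than the data
ci_rounded <- round(ci_raw, digits = data_decimals + 1)
ci_rounded  # 0.25 0.49

# Summary statistic given to 1 decimal place (hypothetical limits):
# keep the same number of decimal places as the statistic
ci_from_summary <- round(c(1.031753, 2.738416), digits = 1)
ci_from_summary
```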
6 Example
IrisSepalSample <- sample(iris[iris$Species=="setosa","Sepal.Length"], 20)
IrisSepalSample
## [1] 5.4 5.0 4.8 4.8 5.0 4.7 5.1 4.9 5.7 4.8 5.0 5.1 5.5 5.4 4.9 4.9 4.6
## [18] 4.3 5.0 4.6
Now we plug the sample size (n = 20, so 19 degrees of freedom) and the sample standard deviation into this formula to get a 95% confidence interval.
\[\sqrt{\frac{(n-1)s^2}{\chi^{2}_{right}}}<\sigma<\sqrt{\frac{(n-1)s^2}{\chi^2_{left}}}\]
c(
sqrt(((20-1)*sd(IrisSepalSample)^2)/qchisq(c(.025),df=19, lower.tail=FALSE)),
sqrt(((20-1)*sd(IrisSepalSample)^2)/qchisq(c(.975),df=19, lower.tail=FALSE))
)
## [1] 0.2538801 0.4875933
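So we are 95% confident that the population standard deviation lies between those two values (0.25 and 0.49 after applying the rounding rule). Treating all 50 setosa flowers as the population, its standard deviation does fall inside the interval: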
sd(iris[iris$Species=="setosa","Sepal.Length"])
## [1] 0.3524897
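The calculation above can be wrapped in a small reusable function. This is only a sketch: sd_conf_int is a hypothetical helper, not something provided by R or by the packages loaded here, and it assumes the data come from a roughly normal population.

```r
# Hypothetical helper: chi-square confidence interval for a population SD
sd_conf_int <- function(x, conf_level = 0.95) {
  df <- length(x) - 1  # degrees of freedom = n - 1
  alpha <- 1 - conf_level
  chi_right <- qchisq(alpha / 2, df, lower.tail = FALSE)      # large critical value
  chi_left  <- qchisq(1 - alpha / 2, df, lower.tail = FALSE)  # small critical value
  sqrt(df * var(x) / c(chi_right, chi_left))                  # lower limit first
}

# The 20 values printed in the example above, typed in by hand
printed_sample <- c(5.4, 5.0, 4.8, 4.8, 5.0, 4.7, 5.1, 4.9, 5.7, 4.8,
                    5.0, 5.1, 5.5, 5.4, 4.9, 4.9, 4.6, 4.3, 5.0, 4.6)

ci <- sd_conf_int(printed_sample)
ci

# Does the interval contain the SD of all 50 setosa flowers?
pop_sd <- sd(iris[iris$Species == "setosa", "Sepal.Length"])
ci[1] < pop_sd && pop_sd < ci[2]
```

Run on the printed sample, the function should give the same limits as the hand calculation above.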
7 R Functions
- dchisq(x, df, ncp = 0, log = FALSE)
- pchisq(q, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)
- qchisq(p, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)
- rchisq(n, df, ncp = 0)
7.1 dchisq
This will give the density (the height of the curve) at a given x-axis value.
yaxis <- dchisq(seq(0,25,by=.1), df=9, ncp = 0, log = FALSE)
xaxis <- seq(0,25,by=.1)
SampleData<- data.table(xaxis = xaxis, yaxis=yaxis)
ggplot(SampleData, aes(xaxis,yaxis))+geom_point(size=0.5)
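A density is a curve height rather than a probability; probabilities come from the area under the curve. As a rough check, base R's integrate() can approximate the area under the df = 9 curve from 0 up to 16.91898, the point used in the sections that follow.

```r
# Numerically integrate the chi-square density with 9 degrees of freedom
area <- integrate(dchisq, lower = 0, upper = 16.91898, df = 9)
area$value  # approximately 0.95

# The total area under the whole curve should be close to 1
total <- integrate(dchisq, lower = 0, upper = Inf, df = 9)
total$value
```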
7.2 pchisq
This will give the cumulative probability at a quantile, that is, the area under the curve to its left. Here, 95% of the distribution is below 16.91898.
pchisq(16.91898,df=9)
## [1] 0.95
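pchisq can also confirm that the two critical values found earlier for 9 degrees of freedom really do enclose 95% of the distribution. A short sketch using those values:

```r
chi_left  <- 2.700389
chi_right <- 19.022768

# Area between the two critical values
middle <- pchisq(chi_right, df = 9) - pchisq(chi_left, df = 9)
middle  # approximately 0.95

# Area to the right of the upper critical value
upper_tail <- pchisq(chi_right, df = 9, lower.tail = FALSE)
upper_tail  # approximately 0.025
```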
7.3 qchisq
This will give a quantile based on a probability. It is the inverse of pchisq.
qchisq(.95, df=9)
## [1] 16.91898
ggplot(SampleData, aes(xaxis,yaxis))+geom_point(size=0.5)+geom_vline(xintercept=qchisq(.95,df=9))
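Since qchisq and pchisq undo each other, a round trip through both should return the starting probabilities. A minimal sketch with a few arbitrary probabilities:

```r
p <- c(0.025, 0.5, 0.95, 0.975)

q <- qchisq(p, df = 9)       # probabilities -> quantiles
p_back <- pchisq(q, df = 9)  # quantiles -> probabilities

# Largest difference between the original and recovered probabilities
max(abs(p_back - p))
```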
7.4 rchisq
This will generate a vector of n random numbers drawn from a chi-square distribution.
densityplot(rchisq(1000,df=9))
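A chi-square variable with k degrees of freedom is the sum of k squared independent standard normal variables, and a simulation can illustrate that. This is a sketch with an arbitrary seed and sample size; the two sets of draws should agree only approximately.

```r
set.seed(1)  # arbitrary seed
k <- 9
n_draws <- 10000

# Build chi-square draws by hand: sum of k squared standard normals per row
z <- matrix(rnorm(n_draws * k), ncol = k)
from_normals <- rowSums(z^2)

# Draws straight from rchisq for comparison
from_rchisq <- rchisq(n_draws, df = k)

c(mean(from_normals), mean(from_rchisq))  # both should be near k = 9
quantile(from_normals, c(0.025, 0.975))   # compare with qchisq(c(.025, .975), df = 9)
```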
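8 Chi-Square Distribution Table
Column headings give the area to the right of the critical value. A dash marks values that round to 0.000.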
df | 0.995 | 0.99 | 0.975 | 0.95 | 0.90 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 |
---|---|---|---|---|---|---|---|---|---|---|
1 | — | — | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757 |
12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819 |
14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319 |
15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267 |
17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718 |
18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156 |
19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582 |
20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |
21 | 8.034 | 8.897 | 10.283 | 11.591 | 13.240 | 29.615 | 32.671 | 35.479 | 38.932 | 41.401 |
22 | 8.643 | 9.542 | 10.982 | 12.338 | 14.041 | 30.813 | 33.924 | 36.781 | 40.289 | 42.796 |
23 | 9.260 | 10.196 | 11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181 |
24 | 9.886 | 10.856 | 12.401 | 13.848 | 15.659 | 33.196 | 36.415 | 39.364 | 42.980 | 45.559 |
25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928 |
26 | 11.160 | 12.198 | 13.844 | 15.379 | 17.292 | 35.563 | 38.885 | 41.923 | 45.642 | 48.290 |
27 | 11.808 | 12.879 | 14.573 | 16.151 | 18.114 | 36.741 | 40.113 | 43.195 | 46.963 | 49.645 |
28 | 12.461 | 13.565 | 15.308 | 16.928 | 18.939 | 37.916 | 41.337 | 44.461 | 48.278 | 50.993 |
29 | 13.121 | 14.256 | 16.047 | 17.708 | 19.768 | 39.087 | 42.557 | 45.722 | 49.588 | 52.336 |
30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672 |
40 | 20.707 | 22.164 | 24.433 | 26.509 | 29.051 | 51.805 | 55.758 | 59.342 | 63.691 | 66.766 |
50 | 27.991 | 29.707 | 32.357 | 34.764 | 37.689 | 63.167 | 67.505 | 71.420 | 76.154 | 79.490 |
60 | 35.534 | 37.485 | 40.482 | 43.188 | 46.459 | 74.397 | 79.082 | 83.298 | 88.379 | 91.952 |
70 | 43.275 | 45.442 | 48.758 | 51.739 | 55.329 | 85.527 | 90.531 | 95.023 | 100.425 | 104.215 |
80 | 51.172 | 53.540 | 57.153 | 60.391 | 64.278 | 96.578 | 101.879 | 106.629 | 112.329 | 116.321 |
90 | 59.196 | 61.754 | 65.647 | 69.126 | 73.291 | 107.565 | 113.145 | 118.136 | 124.116 | 128.299 |
100 | 67.328 | 70.065 | 74.222 | 77.929 | 82.358 | 118.498 | 124.342 | 129.561 | 135.807 | 140.169 |
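Any row of the table can be regenerated with qchisq, which is handy for degrees of freedom the table skips. In this sketch, tail_areas and chisq_row are names invented for illustration.

```r
# Right-tail areas, in the same order as the table's column headings
tail_areas <- c(0.995, 0.99, 0.975, 0.95, 0.90, 0.10, 0.05, 0.025, 0.01, 0.005)

# Critical values for one row of the table, rounded to 3 decimal places
chisq_row <- function(df) {
  round(qchisq(tail_areas, df = df, lower.tail = FALSE), 3)
}

chisq_row(9)   # compare with the df = 9 row above
chisq_row(35)  # a row the table does not list
```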