Hypergeometric Distribution
The Hypergeometric Distribution is used for sampling without replacement.
Example
A committee of 4 people is to be selected from 7 women and 5 men. What is the probability that the committee will consist of 3 women and 1 man? Letting X be the number of women on the committee:
\[ P(X = 3)= \frac{\text{Combinations of 3 women from 7 total} \cdot \text{Combinations of 1 man from 5 total}}{\text{Combinations of 4 people selected from 12 total}}\]
First find all the ways that a committee of 3 women and 1 man can be formed from a group of 7 women and 5 men.
\[_{7}C_{3} \cdot{_5}C_{1}=35 \cdot 5=175\]
Now find the total number of ways that a committee of 4 people can be selected from 12 people.
\[_{12}C_4 = 495\]
The probability of getting a committee of 3 women and 1 man from 7 women and 5 men is therefore:
\[P(X = 3) = \frac{_{7}C_{3} \cdot{_5}C_{1}}{_{12}C_{4}} = \frac{175}{495} = \frac {35}{99} \approx 0.3535\]
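The same hand calculation can be reproduced in R with base R's choose(n, k), which returns the binomial coefficient nCk. This is a small sketch that mirrors the counts above step by step rather than calling dhyper.

```r
# Count committees with choose(n, k), the binomial coefficient nCk
favorable <- choose(7, 3) * choose(5, 1)  # 3 of 7 women and 1 of 5 men: 35 * 5 = 175
total <- choose(12, 4)                     # any 4 of the 12 people: 495
p_hand <- favorable / total                # 175 / 495 = 35 / 99
print(p_hand)
```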
dhyper
To do the same calculation in R, use the dhyper function, which is called like this:
dhyper(x, m, n, k)
Where:
x = number of women chosen without replacement from a group that contains both men and women.
m = the total number of women in the group.
n = the total number of men in the group.
k = the total number of people selected from the group.
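For reference, the probability dhyper returns is the same counting ratio used in the example, written in terms of its arguments (this is the standard hypergeometric formula given in R's documentation for dhyper, for x = 0, ..., k):

\[P(X = x) = \frac{_{m}C_{x} \cdot {_n}C_{k-x}}{_{m+n}C_{k}}\]

With x = 3, m = 7, n = 5 and k = 4 this reduces to the committee calculation worked out by hand above.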
dhyper(3,7,5,4)
## [1] 0.3535354
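As an optional sanity check, the committee draw can be simulated with base R's sample(), which samples without replacement by default. The seed and the number of repetitions below are arbitrary choices for this sketch, so the simulated proportion will only be close to dhyper's exact answer, not equal to it.

```r
set.seed(1)  # arbitrary seed so the run is reproducible
people <- c(rep("W", 7), rep("M", 5))  # 7 women and 5 men
# Draw 4 people without replacement many times and count the women in each draw
women_drawn <- replicate(100000, sum(sample(people, 4) == "W"))
sim_p <- mean(women_drawn == 3)  # share of committees with exactly 3 women
print(sim_p)
```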
The same probability can be computed by counting men instead of women: put the number of men selected in x and swap the roles of m and n.
x = number of men chosen without replacement from a group that contains both men and women.
m = the total number of men in the group.
n = the total number of women in the group.
k = the total number of people selected from the group.
dhyper(1, 5, 7, 4)
## [1] 0.3535354
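The swap works for every possible committee, not just this one: choosing x women out of k people is the same event as choosing k - x men, so dhyper(x, m, n, k) equals dhyper(k - x, n, m, k). This sketch checks that for all possible counts and also confirms that the probabilities over every outcome add up to 1.

```r
x <- 0:4  # every possible number of women on a 4-person committee
by_women <- dhyper(x, 7, 5, 4)     # count women: m = 7 women, n = 5 men
by_men <- dhyper(4 - x, 5, 7, 4)   # count the matching number of men instead
print(all.equal(by_women, by_men)) # same probabilities either way
print(sum(by_women))               # all outcomes together: total probability 1
```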
New Example
Ten boxes have been shipped to a store from a warehouse; 5 were damaged in shipment and 5 were not. If the store manager randomly selects 3 boxes, what is the probability that all three boxes are damaged?
dhyper(x, m, n, k)
x = number of damaged boxes selected without replacement from a stack containing both damaged and undamaged boxes = 3
m = total number of damaged boxes in the stack = 5
n = total number of undamaged boxes in the stack = 5
k = number of boxes pulled from the stack = 3
dhyper(3,5,5,3)
## [1] 0.08333333
The same result can be computed by hand:
\[ P(X = 3)= \frac{_{5}C_{3} \cdot{_5}C_{0}}{_{10}C_{3}}= \frac{10}{120}=\frac{1}{12} \approx 0.0833\]
Or to put it another way:
\[ P(X = 3)= \frac{\text{Combinations of 3 damaged boxes from 5 total} \cdot \text{Combinations of 0 good boxes from 5 total}}{\text{Combinations of 3 boxes selected from 10 total}}\]
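Another way to see why "without replacement" matters is to draw the boxes one at a time: each draw leaves one fewer box, and one fewer damaged box, in the stack. This sketch multiplies those step-by-step probabilities, compares the product with the combination formula and dhyper, and, for contrast only, shows what the answer would be if each box were put back before the next draw.

```r
# Draw one box at a time; each damaged box drawn is not returned to the stack
p_seq <- (5 / 10) * (4 / 9) * (3 / 8)                  # damaged, damaged, damaged: 60 / 720
p_comb <- choose(5, 3) * choose(5, 0) / choose(10, 3)  # 10 / 120
print(c(p_seq, p_comb, dhyper(3, 5, 5, 3)))           # all three equal 1/12

# For contrast only: with replacement each draw is damaged with probability 5/10
p_with_repl <- (5 / 10)^3
print(p_with_repl)
```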
Plotting
We can also plot the whole distribution. In this scenario there are 50 damaged and 25 undamaged boxes, and the manager pulls 10 boxes from the total of 75. The plot shows the probability of selecting each possible number of damaged boxes, from 0 to 10.
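library(ggplot2)
damaged <- 0:10  # every possible count of damaged boxes in a sample of 10
probs <- data.frame(damaged, probability = dhyper(damaged, 50, 25, 10))
ggplot(probs, aes(damaged, probability)) + geom_point() +
  labs(x = "Number of Damaged Boxes", y = "Probability",
       title = "Probability of selecting between 0 and 10\ndamaged boxes from 50 damaged and 25 undamaged boxes")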
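The plotted values can also be inspected numerically. This sketch prints the same eleven probabilities, picks out the most likely number of damaged boxes with which.max(), and checks that the values sum to 1, since 0 through 10 covers every possible outcome of a 10-box draw.

```r
damaged <- 0:10
probs <- dhyper(damaged, 50, 25, 10)  # P(X = x) for x = 0, ..., 10
print(round(probs, 4))
most_likely <- damaged[which.max(probs)]  # count with the highest probability
print(most_likely)
print(sum(probs))  # every outcome is covered, so this is 1
```

Running this, the peak falls at 7 damaged boxes, which is where the highest point in the plot sits.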