Hypergeometric Distribution


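The hypergeometric distribution gives the probability of drawing a given number of items of one type when sampling without replacement from a finite group that contains two types of items.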
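In general, suppose the group has m items of one type and n items of the other, and k items are drawn without replacement. The probability of getting exactly x items of the first type is shown below. The letters m, n, k and x are chosen here to match the arguments of R's dhyper function, which is introduced later in this section.

\[ P(X = x) = \frac{{_m}C_{x} \cdot {_n}C_{k-x}}{{_{m+n}}C_{k}} \]

In the committee example, m = 7 women, n = 5 men, k = 4 people selected and x = 3 women.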

Example

A committee of 4 people is to be selected from 7 women and 5 men. What is the probability that the committee will consist of 3 women and 1 man?


\[ P(X = 3)= \frac{\text{Combinations of 3 women from 7 total} \cdot \text{Combinations of 1 man from 5 total}}{\text{Combinations of 4 people selected from 12 total}}\]
Or, step by step:

  • First find all the ways that a committee of 3 women and 1 man can be formed from a group of 7 women and 5 men. \[_{7}C_{3} \cdot{_5}C_{1}=35 \cdot 5=175\]

  • Now find the total number of ways that a committee of 4 people can be selected from 12 people.

\[_{12}C_4 = 495\]

  • The probability of getting a committee of 3 women and 1 man from 7 women and 5 men is:

\[P(X = 3) = \frac{_{7}C_{3} \cdot{_5}C_{1}}{_{12}C_{4}} = \frac{175}{495} = \frac {35}{99} \approx 0.3535\]
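The counting steps above can be checked directly in R with base R's `choose(n, k)`, which returns the binomial coefficient "n choose k". This is only a sketch of the arithmetic, and the variable names are illustrative.

```r
# Count the favourable committees and all possible committees.
favorable <- choose(7, 3) * choose(5, 1)  # 3 of 7 women and 1 of 5 men: 35 * 5
total     <- choose(12, 4)                # any 4 of the 12 people
p_committee <- favorable / total

print(c(favorable = favorable, total = total, probability = p_committee))
```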


dhyper


To do the same thing in R, use the dhyper function, which works like this:

dhyper(x, m, n, k)

Where:

  • x = number of women chosen without replacement from a group that contains both men and women.

  • m = the total number of women in the group.

  • n = the total number of men in the group.

  • k = the total number of people selected from the group.

dhyper(3,7,5,4)
## [1] 0.3535354
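Because dhyper is vectorised over x, passing every possible number of women (0 through 4) returns the whole distribution in one call. The sketch below does this with named arguments. It also multiplies the probabilities by ${}_{12}C_4$ to recover the raw committee counts, and checks that the probabilities sum to 1.

```r
# Every possible number of women on a 4-person committee.
women <- 0:4
probs <- dhyper(x = women, m = 7, n = 5, k = 4)
print(data.frame(women, probs))

# Multiplying by choose(12, 4) turns probabilities back into committee counts.
counts <- probs * choose(12, 4)
print(round(counts))

# The probabilities over all possible outcomes should add up to 1.
print(sum(probs))
```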


The same probability can also be computed by treating men as the group of interest: put the number of men selected in x and swap the roles of m and n.

  • x = number of men chosen without replacement from a group that contains both men and women.

  • m = the total number of men in the group.

  • n = the total number of women in the group.

  • k = the total number of people selected from the group.

dhyper(1, 5, 7, 4)
## [1] 0.3535354
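The two calls agree because a committee with x women out of k people has k - x men. So dhyper(x, m, n, k) should equal dhyper(k - x, n, m, k) for every possible x, not just x = 3. This short sketch checks that identity across the whole committee example.

```r
k <- 4
x <- 0:k  # possible numbers of women on the committee

# Probability of x women, counting women as the group of interest ...
as_women <- dhyper(x, m = 7, n = 5, k = k)
# ... and of the matching k - x men, counting men as the group of interest.
as_men <- dhyper(k - x, m = 5, n = 7, k = k)

print(all.equal(as_women, as_men))
```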


New Example

Ten boxes have been shipped to a store from a warehouse; 5 were damaged in shipment and 5 were not. If the store manager randomly selects 3 boxes, what is the probability that all three boxes are damaged?
dhyper(x, m, n, k)

  • x = number of damaged boxes selected without replacement from a stack containing both damaged and undamaged boxes = 3

  • m = total number of damaged boxes in the stack = 5

  • n = total number of undamaged boxes in the stack = 5

  • k = number of boxes pulled from the stack = 3


dhyper(3,5,5,3)
## [1] 0.08333333
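As a rough sanity check, the exact value can be compared with a simulation. The sketch below repeatedly draws 3 of the 10 boxes without replacement, using `sample()` with its default `replace = FALSE`. The seed and the 100,000 repetitions are arbitrary choices, so the simulated value will only be close to 1/12, not equal to it.

```r
set.seed(1)  # arbitrary seed so the result is reproducible

# 1 marks a damaged box, 0 an undamaged one: 5 of each.
boxes <- rep(c(1, 0), times = c(5, 5))

# For each repetition, draw 3 boxes without replacement and check if all are damaged.
all_damaged <- replicate(100000, sum(sample(boxes, size = 3)) == 3)
estimate <- mean(all_damaged)

print(c(simulated = estimate, exact = dhyper(3, 5, 5, 3)))
```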


This can also be worked out by hand:

\[ P(X = 3)= \frac{_{5}C_{3} \cdot{_5}C_{0}}{_{10}C_{3}}= \frac{10}{120}=\frac{1}{12} \approx 0.0833\]
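Because a box is not put back after it is picked, the same answer also follows from multiplying the probabilities one draw at a time. Each draw leaves one fewer damaged box and one fewer box overall:

\[ P(X = 3) = \frac{5}{10} \cdot \frac{4}{9} \cdot \frac{3}{8} = \frac{60}{720} = \frac{1}{12} \approx 0.0833\]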


Or to put it another way:

\[ P(X = 3)= \frac{\text{Combinations of 3 damaged boxes from 5 total} \cdot \text{Combinations of 0 undamaged boxes from 5 total}}{\text{Combinations of 3 boxes selected from 10 total}}\]

Plotting


We can also plot the whole distribution. Suppose there are 50 damaged and 25 undamaged boxes, and the manager pulls 10 boxes from the 75 in total. What is the probability of selecting each possible number of damaged boxes, from 0 to 10?

library(ggplot2)
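damaged <- 0:10  # possible numbers of damaged boxes in a sample of 10
probs <- data.frame(damaged, prob = dhyper(damaged, m = 50, n = 25, k = 10))
ggplot(probs, aes(x = damaged, y = prob)) +
  geom_point() +
  labs(x = "Number of Damaged Boxes", y = "Probability", title = "Probability of selecting between 0 and 10\ndamaged boxes from 50 damaged and 25 undamaged boxes.")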
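The numbers behind the plot can also be printed as a table. The sketch below finds the most likely count of damaged boxes. It then compares the mean of the distribution with the standard hypergeometric mean, k * m / (m + n) = 10 * 50 / 75 = 20/3. The variable names are illustrative.

```r
# Probability of each possible number of damaged boxes in a sample of 10.
damaged <- 0:10
prob <- dhyper(damaged, m = 50, n = 25, k = 10)
print(data.frame(damaged, prob = round(prob, 4)))

# Most likely count, and the expected count computed from the distribution.
most_likely <- damaged[which.max(prob)]
expected <- sum(damaged * prob)
print(c(most_likely = most_likely, expected = expected))
```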