Multinomial Distribution
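The multinomial distribution lets you calculate the probability of a particular combination of outcomes when each trial has more than two possible outcomes. It extends the binomial distribution, which covers exactly two outcomes.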
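With exactly two outcomes, the multinomial distribution reduces to the binomial distribution. This short R sketch checks that with base R's dmultinom() and dbinom(). The counts (7 of 10) and the probability 0.3 are arbitrary values chosen only for illustration.

```r
# Two categories: 7 "successes" and 3 "failures" out of 10 trials (illustrative numbers)
p_multi <- dmultinom(c(7, 3), size = 10, prob = c(0.3, 0.7))
# The same event expressed as a binomial probability
p_binom <- dbinom(7, size = 10, prob = 0.3)
all.equal(p_multi, p_binom)  # TRUE, up to floating-point tolerance
```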
Example
A hypothetical survey found that 50% of people want a dog as a pet, 30% want a cat, and 20% want fish. If 5 people are selected at random, what is the probability that 3 want a dog, 1 wants a cat, and 1 wants fish?
dmultinom(c(3,1,1),5,c(.5,.3,.2))
## [1] 0.15
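One way to sanity-check this answer is a Monte Carlo simulation with base R's rmultinom(): draw many random groups of 5 people and count how often the split is exactly 3 dogs, 1 cat, and 1 fish. The seed and the number of simulated groups (100,000) are arbitrary choices for this sketch; the estimate should land close to 0.15 but will not match it exactly.

```r
set.seed(1)                       # arbitrary seed so the sketch is reproducible
n_sims <- 100000                  # number of simulated groups of 5 people
# rmultinom() returns a 3 x n_sims matrix: one column per simulated group,
# rows are the counts for dog, cat, and fish
draws <- rmultinom(n_sims, size = 5, prob = c(0.5, 0.3, 0.2))
# TRUE for groups with exactly 3 dogs, 1 cat, and 1 fish
hits <- draws[1, ] == 3 & draws[2, ] == 1 & draws[3, ] == 1
estimate <- mean(hits)            # share of simulated groups matching the target split
estimate
```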
Worked out by hand, the calculation is:
\[ P(X) = \frac{5!}{3! \cdot 1! \cdot 1!} \cdot (0.50)^3 \cdot (0.30)^1 \cdot (0.20)^1 = 0.15\]
\[ P(X) = \frac{(n)!}{(X_1)!(X_2)!(X_3)!} \cdot (p_1)^{X_1} \cdot (p_2)^{X_2} \cdot (p_3)^{X_3}\]
where:
n = total number of people randomly selected
X_1, X_2, X_3 = number of selected people who want a dog, a cat, and fish, respectively
p_1, p_2, p_3 = probability that a person wants a dog, a cat, and fish, respectively
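Note that the probabilities p_1, p_2, p_3 must sum to 1 and the counts X_1, X_2, X_3 must sum to n. The same formula extends to any number of categories by adding one factorial in the denominator and one power term for each extra category.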
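The formula can be written directly as a small R function for any number of categories. This is a minimal sketch: multinom_prob() is a hypothetical helper written for illustration (it is not part of base R), and it assumes the counts are non-negative whole numbers and the probabilities sum to 1. Its result can be compared against dmultinom().

```r
# Hypothetical helper: multinomial probability computed straight from the formula
multinom_prob <- function(x, p) {
  # Check the requirements stated above: one probability per count, probabilities sum to 1
  stopifnot(length(x) == length(p), abs(sum(p) - 1) < 1e-8, all(x >= 0))
  n <- sum(x)                                # n is the sum of the counts
  coef <- factorial(n) / prod(factorial(x))  # n! / (X_1! X_2! ... X_k!)
  coef * prod(p^x)                           # times p_1^X_1 * ... * p_k^X_k
}

multinom_prob(c(3, 1, 1), c(0.5, 0.3, 0.2))       # pet example
dmultinom(c(3, 1, 1), 5, c(0.5, 0.3, 0.2))        # built-in function, same value
```

Because factorial() overflows for large n, this direct version is only practical for small samples like the ones in this section.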
Example
The survey data set in the MASS package contains the responses of 237 people, including how much each person smokes. The Smoke variable has four categories, Heavy, Never, Occas (occasionally), and Regul (regularly), plus one missing response recorded as NA.
library(MASS)
survey$Smoke[1:20]
## [1] Never Regul Occas Never Never Never Never Never Never Never Never
## [12] Never Never Never Never Never Never Never Never Never
## Levels: Heavy Never Occas Regul
summary(survey$Smoke)
## Heavy Never Occas Regul NA's
## 11 189 19 17 1
length(survey$Smoke)
## [1] 237
Assuming that this survey accurately represents the entire population, if you were to ask 30 random people about smoking, what is the probability that:
2 are heavy smokers
19 never smoke
5 occasionally smoke
4 regularly smoke
0 give no answer (NA)
library(MASS)
library(plyr)
smoke_counts <- count(survey$Smoke) #Frequency table: Heavy, Never, Occas, Regul, then NA
n_total <- length(survey$Smoke) #Total responses (237), including the NA
Pheavy <- smoke_counts[1,2]/n_total #Probability of heavy smoker
Pnever <- smoke_counts[2,2]/n_total #Probability of never smoking
Poccas <- smoke_counts[3,2]/n_total #Probability of occasional smoker
Pregul <- smoke_counts[4,2]/n_total #Probability of regular smoker
Pna <- smoke_counts[5,2]/n_total #Probability of NA (no answer)
dmultinom(c(2,19,5,4,0), 30, c(Pheavy,Pnever,Poccas, Pregul, Pna))
## [1] 0.0009700987
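The same probabilities can also be estimated without plyr. This sketch uses base R's table() with useNA = "ifany", so the missing response is kept as its own category, and prop.table() to turn counts into proportions. It assumes only that the MASS package (which ships with standard R installations) is available.

```r
library(MASS)  # provides the survey data set
# Share of each answer among all 237 responses; NA is kept as its own category
p_smoke <- prop.table(table(survey$Smoke, useNA = "ifany"))
round(p_smoke, 3)
# Counts in the order Heavy, Never, Occas, Regul, NA (same question as above)
prob_smoke <- dmultinom(c(2, 19, 5, 4, 0), size = 30, prob = as.numeric(p_smoke))
prob_smoke     # should match the value computed with plyr
```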
Written out with the formula (probabilities rounded to three decimal places), this is:
\[ P(X) = \frac{30!}{2! \cdot 19! \cdot 5! \cdot 4! \cdot 0!} \cdot (0.046)^2 \cdot (0.797)^{19} \cdot (0.080)^5 \cdot (0.072)^4 \cdot (0.004)^0 \approx 0.00097\]
Computing the formula term by term in R gives the same result:
factorial(30)/(factorial(2)*factorial(19)*factorial(5)*factorial(4)*factorial(0))*Pheavy^2*Pnever^19*Poccas^5*Pregul^4*Pna^0
## [1] 0.0009700987
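The term-by-term approach breaks down for larger samples, because factorial() returns Inf for arguments above 170. A common workaround is to work on the log scale with lfactorial() and exponentiate at the end. This is a sketch: log_multinom_prob() is a hypothetical helper, and the larger sample (150 dogs, 90 cats, 60 fish out of 300 people) is an illustrative value. dmultinom(..., log = TRUE) gives the built-in log-scale result for comparison.

```r
# Hypothetical helper: log of the multinomial probability, using lfactorial() to avoid overflow
log_multinom_prob <- function(x, p) {
  n <- sum(x)
  terms <- ifelse(x > 0, x * log(p), 0)   # treat 0 * log(0) as 0
  lfactorial(n) - sum(lfactorial(x)) + sum(terms)
}

p_pets <- c(0.5, 0.3, 0.2)
x_big <- c(150, 90, 60)                   # illustrative sample of 300 people

# Direct formula: factorial(300) overflows to Inf, so the result is NaN
naive <- suppressWarnings(factorial(300) / prod(factorial(x_big)) * prod(p_pets^x_big))
naive

log_multinom_prob(x_big, p_pets)          # finite log-probability
dmultinom(x_big, prob = p_pets, log = TRUE)  # built-in log-scale value, same result
exp(log_multinom_prob(c(3, 1, 1), p_pets))   # small pet example, back on the probability scale
```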