Mapping PM 2.5 Emissions (Part 1)
Ian Maddaus
This project will map PM 2.5 emissions divided by square miles by county across the United States for the years 1999, 2002, 2005, and 2008.
Part 2: Here.
The data that we’re using comes from these sources:
- NEI (National Emissions Inventory) data from the EPA: https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2FNEI_data.zip
- Fips county code and county area data (ACS_09_5YR_G001_with_ann.csv) came from factfinder.census.gov
Just grab the data here:
Loading Data and Libraries
Start by loading the relevant libraries.
library("data.table")
library("colorspace")
library("maps")
library("mapproj")
The maps library has fips county data, load that using the data command. Each county in the country has a unique fips code.
data(county.fips)
head(county.fips)
## fips polyname
## 1 1001 alabama,autauga
## 2 1003 alabama,baldwin
## 3 1005 alabama,barbour
## 4 1007 alabama,bibb
## 5 1009 alabama,blount
## 6 1011 alabama,bullock
This first step reads the emissions data into a data.table. This will give emissions for each source within each county in the country. This is a large data.table since each county has many emissions sources but you can see that the table has the fips code for each emissions source, the year, and the level of emissions.
NEI <- readRDS('summarySCC_PM25.rds')
NEI <- data.table(NEI)
head(NEI)
## fips SCC Pollutant Emissions type year
## 1: 09001 10100401 PM25-PRI 15.714 POINT 1999
## 2: 09001 10100404 PM25-PRI 234.178 POINT 1999
## 3: 09001 10100501 PM25-PRI 0.128 POINT 1999
## 4: 09001 10200401 PM25-PRI 2.036 POINT 1999
## 5: 09001 10200504 PM25-PRI 0.388 POINT 1999
## 6: 09001 10200602 PM25-PRI 1.490 POINT 1999
nrow(NEI)
## [1] 6497651
This step reads the county data into R which includes FIPS codes and county area by square mile. It also has a number of columns with no data in them and which we don’t need which is why it’s 38 columns instead of just two or three that would be useful for this project.
countyfips <- fread('ACS_09_5YR_G001_with_ann.csv', header = TRUE)
setnames(countyfips, old = colnames(countyfips), new = as.character(countyfips[1,]))
countyfips <- countyfips [-1, ]
dim(countyfips)
## [1] 3221 38
Shape the Data
This next step aggregates the NEI data returning total tons by fips/county and year. Each county will have 4 rows, one for each year in the data set. More info about data.table aggregation
NEI_fips_year_aggregate <- as.data.table(NEI[, j = list(Emissions = sum(Emissions, na.rm = TRUE)), by = list(fips, year)])
head(NEI_fips_year_aggregate)
## fips year Emissions
## 1: 09001 1999 5205.503
## 2: 09003 1999 5791.588
## 3: 09005 1999 2168.551
## 4: 09007 1999 1905.879
## 5: 09009 1999 4377.132
## 6: 09011 1999 4123.281
The county with the highest amount of PM2.5 emissions was the Yukon-Koyukuk Census Area in AK. This county has an area of 147,000 sq miles which is 14 times the size of Massachusetts. It doesn’t make sense for an area that large to indicate that there are more emissions than a county like Los Angeles which has lower total emissions but probably more per square mile, so now I take the emissions data and divide it by the area of each county to give emissions per square mile.
In this step I create data table’s for each year of pollution data.
NinetyNine <- NEI_fips_year_aggregate[NEI_fips_year_aggregate$year == 1999, ]
ZeroTwo <- NEI_fips_year_aggregate[NEI_fips_year_aggregate$year == 2002, ]
ZeroFive <- NEI_fips_year_aggregate[NEI_fips_year_aggregate$year == 2005, ]
ZeroEight <- NEI_fips_year_aggregate[NEI_fips_year_aggregate$year == 2008, ]
head(NinetyNine)
## fips year Emissions
## 1: 09001 1999 5205.503
## 2: 09003 1999 5791.588
## 3: 09005 1999 2168.551
## 4: 09007 1999 1905.879
## 5: 09009 1999 4377.132
## 6: 09011 1999 4123.281
Now I create a function called divideEmissionsBySqMiles. This function takes emissions data for a particular fips, finds the area in square miles for that fips in the countyfips file, and returns a vector with emissions/square miles. This function does not work with data sets where a fips code is listed more than once. It can’t be used with NEI_fips_year_aggregate because there are four emission records and four years for each fips code and so each fips code appears four times.
divideEmissionsBySqMiles <- function (emissionsdata, SqMilesData = countyfips) {
vec <- numeric()
x <- 0
for (i in emissionsdata$fips){
x <- x + 1
vec <- append(vec, emissionsdata[emissionsdata$fips == i, Emissions] / as.numeric(SqMilesData[SqMilesData$Id2 == i, 38, with = FALSE ]))
}
return (vec)
}
Now run divideEmissionsBySqMiles and cbind the vector results to each years data.table
NinetyNine <- cbind(NinetyNine, emissionsSqMiles = divideEmissionsBySqMiles(NinetyNine))
ZeroTwo <- cbind(ZeroTwo, emissionsSqMiles = divideEmissionsBySqMiles(ZeroTwo))
ZeroFive <- cbind(ZeroFive, emissionsSqMiles = divideEmissionsBySqMiles(ZeroFive))
ZeroEight <- cbind(ZeroEight, emissionsSqMiles = divideEmissionsBySqMiles(ZeroEight))
head(NinetyNine)
## fips year Emissions emissionsSqMiles
## 1: 09001 1999 5205.503 8.330844
## 2: 09003 1999 5791.588 7.878115
## 3: 09005 1999 2168.551 2.355625
## 4: 09007 1999 1905.879 5.161123
## 5: 09009 1999 4377.132 7.239105
## 6: 09011 1999 4123.281 6.201430
Showing the Data
Create a color palette. I ended up using one I found in http://colorbrewer2.org https://gka.github.io/palettes/
Emissions_palette4 <- c("#ffe4e1", "#ffc9bd", "#ffad98", "#ff8f73", "#ff6d4e", "#ff3d21", "#ee0001", "#cb0002", "#ab0002", "#8b0000")
I’m putting each county into one of 10 different emissions brackets depending on the level of emissions. This step will create one long list of all the Emissions/SqMile for the four years in the data set, then divide the full list into ten equal groups.
TotalDecile <- c(NinetyNine$emissionsSqMiles, ZeroTwo$emissionsSqMiles, ZeroFive$emissionsSqMiles, ZeroEight$emissionsSqMiles)
EmissionDecile <- quantile(TotalDecile, probs=seq(0,1, by=0.1), na.rm = TRUE)
EmissionDecile
## 0% 10% 20% 30% 40%
## 0.00271437 0.38921226 0.65833289 0.95219084 1.25523447
## 50% 60% 70% 80% 90%
## 1.58979619 1.99416136 2.51791664 3.37836938 5.47532188
## 100%
## 218.87191643
Now create a column in each table called colorbuckets that sorts each fips county into one of the deciles
NinetyNine$colorbuckets <- as.numeric(cut(NinetyNine$emissionsSqMiles, EmissionDecile))
ZeroTwo$colorbuckets <- as.numeric(cut(ZeroTwo$emissionsSqMiles, EmissionDecile))
ZeroFive$colorbuckets <- as.numeric(cut(ZeroFive$emissionsSqMiles, EmissionDecile))
ZeroEight$colorbuckets <- as.numeric(cut(ZeroEight$emissionsSqMiles, EmissionDecile))
head(NinetyNine)
## fips year Emissions emissionsSqMiles colorbuckets
## 1: 09001 1999 5205.503 8.330844 10
## 2: 09003 1999 5791.588 7.878115 10
## 3: 09005 1999 2168.551 2.355625 7
## 4: 09007 1999 1905.879 5.161123 9
## 5: 09009 1999 4377.132 7.239105 10
## 6: 09011 1999 4123.281 6.201430 10
Match the fips codes from the NEI data set and the data map set that lists county square miles.
cnty.fips <- county.fips$fips[match(map("county", plot=FALSE)$names,
county.fips$polyname)]
head(county.fips$fips[match(map("county", plot = FALSE)$names, county.fips$polyname)])
## [1] 1001 1003 1005 1007 1009 1011
cnty.fips[1] == as.numeric(NinetyNine[67,1, with=FALSE])
## [1] TRUE
NinetyNineColorsMatched <- NinetyNine$colorbuckets[match(cnty.fips, as.numeric(NinetyNine$fips))]
ZeroTwoColorsMatched <- ZeroTwo$colorbuckets[match(cnty.fips, as.numeric(ZeroTwo$fips))]
ZeroFiveColorsMatched <- ZeroFive$colorbuckets[match(cnty.fips, as.numeric(ZeroFive$fips))]
ZeroEightColorsMatched <- ZeroEight$colorbuckets[match(cnty.fips, as.numeric(ZeroEight$fips))]
## Warning in match(cnty.fips, as.numeric(ZeroEight$fips)): NAs introduced by
## coercion
head(NinetyNineColorsMatched)
## [1] 7 9 8 8 8 1
This will create text that will go in the legend of the maps.
leg.txt <- c("0-.39", ".39-.66", ".66-.95", ".95-1.26",
"1.26-1.59", "1.59-1.99", "1.99-2.51", "2.51-3.38", "3.38-5.47", ">5.47")
1999 Emissions Map
This will output the 1999 map as a png file.
png (file = 'PM25Emissions1999.png', width = 800, height = 800, pointsize = 12)
map("county", col = Emissions_palette4[NinetyNineColorsMatched], fill = TRUE, resolution = 0, lty = 0, projection = "polyconic")
map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2, projection="polyconic")
title("PM2.5 Emissions in Tons / SQ Mile / Year - 1999")
legend("top", fill = (Emissions_palette4), legend = (leg.txt), ncol = 2)
dev.off()
## quartz_off_screen
## 2
2002 Emissions Map
png (file = 'PM25Emissions2002.png', width = 800, height = 800, pointsize = 12)
map("county", col = Emissions_palette4[ZeroTwoColorsMatched], fill = TRUE, resolution = 0,
lty = 0, projection = "polyconic")
map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2,
projection="polyconic")
title("PM2.5 Emissions in Tons / SQ Mile / Year - 2002")
legend("top", fill = (Emissions_palette4),
legend = (leg.txt), ncol = 2)
dev.off()
## quartz_off_screen
## 2
2005 Emissions Map
png (file = 'PM25Emissions2005.png', width = 800, height = 800, pointsize = 12)
map("county", col = Emissions_palette4[ZeroFiveColorsMatched], fill = TRUE, resolution = 0,
lty = 0, projection = "polyconic")
map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2,
projection="polyconic")
title("PM2.5 Emissions in Tons / SQ Mile / Year - 2005")
legend("top", fill = (Emissions_palette4),
legend = (leg.txt), ncol = 2)
dev.off()
## quartz_off_screen
## 2
2008 Emissions Map
png (file = 'PM25Emissions2008.png', width = 800, height = 800, pointsize = 12)
map("county", col = Emissions_palette4[ZeroEightColorsMatched], fill = TRUE, resolution = 0,
lty = 0, projection = "polyconic")
map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2,
projection="polyconic")
title("PM2.5 Emissions in Tons / SQ Mile / Year - 2008")
legend("top", fill = (Emissions_palette4),
legend = (leg.txt), ncol = 2)
dev.off()
## quartz_off_screen
## 2
These maps are fine as far as they go, they do show declines in emissions nationwide from 1999 to 2008, but they also show you where people live in the United States. Pet Peeve #208 Geographic Profile Maps Which Are Basically Just Population Maps. From XKCD: