Mapping PM 2.5 Emissions (Part 1)


This project will map PM 2.5 emissions divided by square miles by county across the United States for the years 1999, 2002, 2005, and 2008.


Part 2: Here.


The data that we’re using comes from these sources:


Just grab the data here:


Loading Data and Libraries


Start by loading the relevant libraries.


library("data.table")
library("colorspace")
library("maps")
library("mapproj")


The maps library has fips county data, load that using the data command. Each county in the country has a unique fips code.

data(county.fips)
head(county.fips)
##   fips        polyname
## 1 1001 alabama,autauga
## 2 1003 alabama,baldwin
## 3 1005 alabama,barbour
## 4 1007    alabama,bibb
## 5 1009  alabama,blount
## 6 1011 alabama,bullock


This first step reads the emissions data into a data.table. This will give emissions for each source within each county in the country. This is a large data.table since each county has many emissions sources but you can see that the table has the fips code for each emissions source, the year, and the level of emissions.

NEI <- readRDS('summarySCC_PM25.rds')
NEI <- data.table(NEI)
head(NEI)
##     fips      SCC Pollutant Emissions  type year
## 1: 09001 10100401  PM25-PRI    15.714 POINT 1999
## 2: 09001 10100404  PM25-PRI   234.178 POINT 1999
## 3: 09001 10100501  PM25-PRI     0.128 POINT 1999
## 4: 09001 10200401  PM25-PRI     2.036 POINT 1999
## 5: 09001 10200504  PM25-PRI     0.388 POINT 1999
## 6: 09001 10200602  PM25-PRI     1.490 POINT 1999
nrow(NEI)
## [1] 6497651


This step reads the county data into R which includes FIPS codes and county area by square mile. It also has a number of columns with no data in them and which we don’t need which is why it’s 38 columns instead of just two or three that would be useful for this project.


countyfips <- fread('ACS_09_5YR_G001_with_ann.csv', header = TRUE)
setnames(countyfips, old = colnames(countyfips), new = as.character(countyfips[1,]))
countyfips <- countyfips [-1, ]
dim(countyfips)
## [1] 3221   38


Shape the Data


This next step aggregates the NEI data returning total tons by fips/county and year. Each county will have 4 rows, one for each year in the data set. More info about data.table aggregation


NEI_fips_year_aggregate <- as.data.table(NEI[, j = list(Emissions = sum(Emissions, na.rm = TRUE)), by = list(fips, year)])
head(NEI_fips_year_aggregate)
##     fips year Emissions
## 1: 09001 1999  5205.503
## 2: 09003 1999  5791.588
## 3: 09005 1999  2168.551
## 4: 09007 1999  1905.879
## 5: 09009 1999  4377.132
## 6: 09011 1999  4123.281


The county with the highest amount of PM2.5 emissions was the Yukon-Koyukuk Census Area in AK. This county has an area of 147,000 sq miles which is 14 times the size of Massachusetts. It doesn’t make sense for an area that large to indicate that there are more emissions than a county like Los Angeles which has lower total emissions but probably more per square mile, so now I take the emissions data and divide it by the area of each county to give emissions per square mile.


In this step I create data table’s for each year of pollution data.


NinetyNine <- NEI_fips_year_aggregate[NEI_fips_year_aggregate$year == 1999, ]
ZeroTwo <- NEI_fips_year_aggregate[NEI_fips_year_aggregate$year == 2002, ]
ZeroFive <- NEI_fips_year_aggregate[NEI_fips_year_aggregate$year == 2005, ] 
ZeroEight <- NEI_fips_year_aggregate[NEI_fips_year_aggregate$year == 2008, ]
head(NinetyNine)
##     fips year Emissions
## 1: 09001 1999  5205.503
## 2: 09003 1999  5791.588
## 3: 09005 1999  2168.551
## 4: 09007 1999  1905.879
## 5: 09009 1999  4377.132
## 6: 09011 1999  4123.281


Now I create a function called divideEmissionsBySqMiles. This function takes emissions data for a particular fips, finds the area in square miles for that fips in the countyfips file, and returns a vector with emissions/square miles. This function does not work with data sets where a fips code is listed more than once. It can’t be used with NEI_fips_year_aggregate because there are four emission records and four years for each fips code and so each fips code appears four times.


divideEmissionsBySqMiles <- function (emissionsdata,  SqMilesData = countyfips) {
  vec <- numeric()
  x <- 0
  for (i in emissionsdata$fips){  
    x <- x + 1
    vec <- append(vec, emissionsdata[emissionsdata$fips == i, Emissions] / as.numeric(SqMilesData[SqMilesData$Id2 == i, 38, with = FALSE ]))        
  } 
  return (vec)
}


Now run divideEmissionsBySqMiles and cbind the vector results to each years data.table


NinetyNine <- cbind(NinetyNine, emissionsSqMiles = divideEmissionsBySqMiles(NinetyNine))
ZeroTwo <- cbind(ZeroTwo, emissionsSqMiles = divideEmissionsBySqMiles(ZeroTwo))
ZeroFive <- cbind(ZeroFive, emissionsSqMiles = divideEmissionsBySqMiles(ZeroFive))
ZeroEight <- cbind(ZeroEight, emissionsSqMiles = divideEmissionsBySqMiles(ZeroEight))
head(NinetyNine)
##     fips year Emissions emissionsSqMiles
## 1: 09001 1999  5205.503         8.330844
## 2: 09003 1999  5791.588         7.878115
## 3: 09005 1999  2168.551         2.355625
## 4: 09007 1999  1905.879         5.161123
## 5: 09009 1999  4377.132         7.239105
## 6: 09011 1999  4123.281         6.201430


Showing the Data


Create a color palette. I ended up using one I found in http://colorbrewer2.org https://gka.github.io/palettes/

Emissions_palette4 <- c("#ffe4e1", "#ffc9bd", "#ffad98", "#ff8f73", "#ff6d4e", "#ff3d21", "#ee0001", "#cb0002", "#ab0002", "#8b0000")


I’m putting each county into one of 10 different emissions brackets depending on the level of emissions. This step will create one long list of all the Emissions/SqMile for the four years in the data set, then divide the full list into ten equal groups.


TotalDecile <- c(NinetyNine$emissionsSqMiles, ZeroTwo$emissionsSqMiles, ZeroFive$emissionsSqMiles, ZeroEight$emissionsSqMiles)
EmissionDecile <- quantile(TotalDecile, probs=seq(0,1, by=0.1), na.rm = TRUE)
EmissionDecile
##           0%          10%          20%          30%          40% 
##   0.00271437   0.38921226   0.65833289   0.95219084   1.25523447 
##          50%          60%          70%          80%          90% 
##   1.58979619   1.99416136   2.51791664   3.37836938   5.47532188 
##         100% 
## 218.87191643


Now create a column in each table called colorbuckets that sorts each fips county into one of the deciles


NinetyNine$colorbuckets <- as.numeric(cut(NinetyNine$emissionsSqMiles, EmissionDecile))
ZeroTwo$colorbuckets <- as.numeric(cut(ZeroTwo$emissionsSqMiles, EmissionDecile))
ZeroFive$colorbuckets <- as.numeric(cut(ZeroFive$emissionsSqMiles, EmissionDecile))
ZeroEight$colorbuckets <- as.numeric(cut(ZeroEight$emissionsSqMiles, EmissionDecile))
head(NinetyNine)
##     fips year Emissions emissionsSqMiles colorbuckets
## 1: 09001 1999  5205.503         8.330844           10
## 2: 09003 1999  5791.588         7.878115           10
## 3: 09005 1999  2168.551         2.355625            7
## 4: 09007 1999  1905.879         5.161123            9
## 5: 09009 1999  4377.132         7.239105           10
## 6: 09011 1999  4123.281         6.201430           10


Match the fips codes from the NEI data set and the data map set that lists county square miles.


cnty.fips <- county.fips$fips[match(map("county", plot=FALSE)$names,
                                    county.fips$polyname)]
head(county.fips$fips[match(map("county", plot = FALSE)$names, county.fips$polyname)])
## [1] 1001 1003 1005 1007 1009 1011
cnty.fips[1] == as.numeric(NinetyNine[67,1, with=FALSE])
## [1] TRUE



NinetyNineColorsMatched <- NinetyNine$colorbuckets[match(cnty.fips, as.numeric(NinetyNine$fips))]
ZeroTwoColorsMatched <- ZeroTwo$colorbuckets[match(cnty.fips, as.numeric(ZeroTwo$fips))]
ZeroFiveColorsMatched <- ZeroFive$colorbuckets[match(cnty.fips, as.numeric(ZeroFive$fips))]
ZeroEightColorsMatched <- ZeroEight$colorbuckets[match(cnty.fips, as.numeric(ZeroEight$fips))]
## Warning in match(cnty.fips, as.numeric(ZeroEight$fips)): NAs introduced by
## coercion
head(NinetyNineColorsMatched)
## [1] 7 9 8 8 8 1


This will create text that will go in the legend of the maps.


leg.txt <- c("0-.39", ".39-.66", ".66-.95", ".95-1.26",
             "1.26-1.59", "1.59-1.99", "1.99-2.51", "2.51-3.38", "3.38-5.47", ">5.47")



1999 Emissions Map


This will output the 1999 map as a png file.


png (file = 'PM25Emissions1999.png', width = 800, height = 800, pointsize = 12)  
map("county", col = Emissions_palette4[NinetyNineColorsMatched], fill = TRUE, resolution = 0, lty = 0, projection = "polyconic") 
map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2, projection="polyconic")  
title("PM2.5 Emissions in Tons / SQ Mile / Year - 1999")  
legend("top", fill = (Emissions_palette4), legend = (leg.txt), ncol = 2)  
dev.off() 
## quartz_off_screen 
##                 2

PM 2.5 Emissions


2002 Emissions Map

png (file = 'PM25Emissions2002.png', width = 800, height = 800, pointsize = 12)
map("county", col = Emissions_palette4[ZeroTwoColorsMatched], fill = TRUE, resolution = 0,
    lty = 0, projection = "polyconic")

map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2,
    projection="polyconic")

title("PM2.5 Emissions in Tons / SQ Mile / Year - 2002")
legend("top", fill = (Emissions_palette4),
       legend = (leg.txt), ncol = 2)
dev.off()
## quartz_off_screen 
##                 2

PM 2.5 Emissions 2002

2005 Emissions Map

png (file = 'PM25Emissions2005.png', width = 800, height = 800, pointsize = 12)
map("county", col = Emissions_palette4[ZeroFiveColorsMatched], fill = TRUE, resolution = 0,
    lty = 0, projection = "polyconic")

map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2,
    projection="polyconic")

title("PM2.5 Emissions in Tons / SQ Mile / Year - 2005")
legend("top", fill = (Emissions_palette4),
       legend = (leg.txt), ncol = 2)
dev.off()
## quartz_off_screen 
##                 2

PM 2.5 Emissions 2005

2008 Emissions Map

png (file = 'PM25Emissions2008.png', width = 800, height = 800, pointsize = 12)
map("county", col = Emissions_palette4[ZeroEightColorsMatched], fill = TRUE, resolution = 0,
    lty = 0, projection = "polyconic")

map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2,
    projection="polyconic")

title("PM2.5 Emissions in Tons / SQ Mile / Year - 2008")
legend("top", fill = (Emissions_palette4),
       legend = (leg.txt), ncol = 2)
dev.off()
## quartz_off_screen 
##                 2

PM 2.5 Emissions 2008


These maps are fine as far as they go, they do show declines in emissions nationwide from 1999 to 2008, but they also show you where people live in the United States. Pet Peeve #208 Geographic Profile Maps Which Are Basically Just Population Maps. From XKCD:

XKCD



Part 2: Here.