Reshape

Melt

Melt converts several variable columns into two columns: one holding the variable name and the other holding the value of that variable. Here is an example with the mtcars data.

library(reshape2)

# Copy the row names into a regular column so they can serve as an id variable
mtcars$cars <- rownames(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
##                                cars
## Mazda RX4                 Mazda RX4
## Mazda RX4 Wag         Mazda RX4 Wag
## Datsun 710               Datsun 710
## Hornet 4 Drive       Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Valiant                     Valiant

Notice the variables “wt”, “gear”, and “carb”; these are the columns that get melted next.

carsData <- melt(mtcars, id.vars = c("cars", "mpg", "cyl", "disp", "hp"), measure.vars = c("wt","gear", "carb"))

After melting, notice the new variable and value columns, and that each car now has three rows: one for “wt”, one for “gear”, and one for “carb”.

head(carsData)
##                cars  mpg cyl disp  hp variable value
## 1         Mazda RX4 21.0   6  160 110       wt 2.620
## 2     Mazda RX4 Wag 21.0   6  160 110       wt 2.875
## 3        Datsun 710 22.8   4  108  93       wt 2.320
## 4    Hornet 4 Drive 21.4   6  258 110       wt 3.215
## 5 Hornet Sportabout 18.7   8  360 175       wt 3.440
## 6           Valiant 18.1   6  225 105       wt 3.460
carsData[carsData$cars == "Mazda RX4",]
##         cars mpg cyl disp  hp variable value
## 1  Mazda RX4  21   6  160 110       wt  2.62
## 33 Mazda RX4  21   6  160 110     gear  4.00
## 65 Mazda RX4  21   6  160 110     carb  4.00
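As a smaller illustration of the same idea, here is a minimal sketch on a made-up two-row data frame (the `toy` data and the column names `measure` and `reading` are hypothetical, chosen only for this example). It shows two things the mtcars call does not: when measure.vars is omitted, melt treats every non-id column as a measured variable, and the variable.name and value.name arguments rename the two output columns.

```r
library(reshape2)

# Hypothetical toy data: two cars, two measured columns
toy <- data.frame(
  car  = c("A", "B"),
  wt   = c(2.6, 3.2),
  gear = c(4, 3)
)

# measure.vars is omitted, so every column other than "car" is melted.
# variable.name / value.name replace the default "variable" / "value" names.
toyLong <- melt(toy, id.vars = "car",
                variable.name = "measure", value.name = "reading")

toyLong
nrow(toyLong)  # one row per car per measured column
```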


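The same approach works with the airquality data. The column names are converted to lower case first, and na.rm=TRUE drops the rows whose value is missing.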

names(airquality) <- tolower(names(airquality))
head(airquality)
##   ozone solar.r wind temp month day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6
aqm <- melt(airquality, id.vars = c("month", "day"), na.rm = TRUE)
head(aqm)
##   month day variable value
## 1     5   1    ozone    41
## 2     5   2    ozone    36
## 3     5   3    ozone    12
## 4     5   4    ozone    18
## 6     5   6    ozone    28
## 7     5   7    ozone    23
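The row labels in the output above jump from 4 to 6 because the fifth ozone measurement is missing and na.rm=TRUE removed it. The following sketch, which works on a local copy named `aq` (a name chosen here, not in the original), compares the melted row counts with and without na.rm to show that the difference is the number of missing measurements.

```r
library(reshape2)

# Work on a local copy so the built-in dataset is left untouched
aq <- datasets::airquality
names(aq) <- tolower(names(aq))

aqmAll  <- melt(aq, id.vars = c("month", "day"))                # keeps NA rows
aqmNoNA <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)  # drops NA rows

# Number of missing measurements across the four measured columns
nMissing <- sum(is.na(aq[, c("ozone", "solar.r", "wind", "temp")]))

# The two melted data sets differ by exactly the number of missing values
nrow(aqmAll) - nrow(aqmNoNA) == nMissing
```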


acast & dcast


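acast and dcast do the same job and differ only in what they return: acast returns a vector, matrix, or array, and dcast returns a data.frame.

Both take a melted data set and reshape it back into wide form. When several values land in the same cell, an aggregation function (mean in the examples here) combines them.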
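To make the difference in return type concrete, this sketch runs the same cast through both functions. The object names `asFrame` and `asMatrix` are chosen for this example only.

```r
library(reshape2)

aq <- datasets::airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

asFrame  <- dcast(aqm, month ~ variable, mean)
asMatrix <- acast(aqm, month ~ variable, mean)

is.data.frame(asFrame)  # month is an ordinary column
is.matrix(asMatrix)     # month values become the row names instead
rownames(asMatrix)
```

The matrix form can be handy when the result feeds into matrix-oriented functions, while the data.frame form keeps the grouping variable available as a column.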

levels(aqm$variable)
## [1] "ozone"   "solar.r" "wind"    "temp"
airqualityDcast <- dcast(aqm, formula = month ~ variable, mean, margins = c("month", "variable"))
class(airqualityDcast) ; airqualityDcast
## [1] "data.frame"
##   month    ozone  solar.r      wind     temp    (all)
## 1     5 23.61538 181.2963 11.622581 65.54839 68.70696
## 2     6 29.44444 190.1667 10.266667 79.10000 87.38384
## 3     7 59.11538 216.4839  8.941935 83.90323 93.49748
## 4     8 59.96154 171.8571  8.793548 83.96774 79.71207
## 5     9 31.44828 167.4333 10.180000 76.90000 71.82689
## 6 (all) 42.12931 185.9315  9.957516 77.88235 80.05722


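The formula argument in dcast/acast, e.g. “month ~ variable”, controls the layout of the result: the variables on the left of ~ define the rows (the groups), and the variable on the right defines the columns. The aggregation function, mean here, is applied to the values that fall into each cell. In the example above, month ~ variable returns the mean of ozone, solar.r, wind, and temp for each month.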
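As a quick check of what month ~ variable computes, the sketch below compares the ozone column of the cast result with the monthly means calculated directly from the wide data using base R's tapply. The names `monthly` and `byHand` are introduced here for illustration.

```r
library(reshape2)

aq <- datasets::airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

# One row per month, one column per measured variable
monthly <- dcast(aqm, month ~ variable, mean)

# The same monthly ozone means computed straight from the wide data
byHand <- tapply(aq$ozone, aq$month, mean, na.rm = TRUE)

all.equal(monthly$ozone, as.vector(byHand))
```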

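Changing the formula to day ~ variable returns the mean of the same four variables for each day of the month, averaged across months. Because “month” is not part of this formula, that margins entry has nothing to apply to; the (all) column comes from “variable”.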

aqmDayDcast <- dcast(aqm, day ~ variable, mean, margins = c("month", "variable"))
head(aqmDayDcast)
##   day    ozone  solar.r  wind temp    (all)
## 1   1 77.75000 199.0000  6.78 80.2 91.62632
## 2   2 43.00000 174.8000  9.16 80.8 78.72632
## 3   3 33.25000 177.4000  9.62 79.4 77.11053
## 4   4 62.33333 197.2500  8.62 81.8 84.00588
## 5   5 48.66667 163.3333  8.46 79.2 67.14375
## 6   6 41.50000 223.3333 12.04 79.8 76.18824


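The margins argument adds summaries computed with the same function (mean here). Naming the column variable (“variable”) adds an (all) column that summarizes each row, and naming the row variable (“month”) adds an (all) row that summarizes each column. For example: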

dcast(aqm, month ~ variable, mean, margins = c("variable")) 
##   month    ozone  solar.r      wind     temp    (all)
## 1     5 23.61538 181.2963 11.622581 65.54839 68.70696
## 2     6 29.44444 190.1667 10.266667 79.10000 87.38384
## 3     7 59.11538 216.4839  8.941935 83.90323 93.49748
## 4     8 59.96154 171.8571  8.793548 83.96774 79.71207
## 5     9 31.44828 167.4333 10.180000 76.90000 71.82689
dcast(aqm, month ~ variable, mean, margins = c("month")) 
##   month    ozone  solar.r      wind     temp
## 1     5 23.61538 181.2963 11.622581 65.54839
## 2     6 29.44444 190.1667 10.266667 79.10000
## 3     7 59.11538 216.4839  8.941935 83.90323
## 4     8 59.96154 171.8571  8.793548 83.96774
## 5     9 31.44828 167.4333 10.180000 76.90000
## 6 (all) 42.12931 185.9315  9.957516 77.88235
dcast(aqm, month ~ variable, mean)
##   month    ozone  solar.r      wind     temp
## 1     5 23.61538 181.2963 11.622581 65.54839
## 2     6 29.44444 190.1667 10.266667 79.10000
## 3     7 59.11538 216.4839  8.941935 83.90323
## 4     8 59.96154 171.8571  8.793548 83.96774
## 5     9 31.44828 167.4333 10.180000 76.90000

Notice that the mean of all ozone values in the airquality dataset matches the ozone entry in the (all) row that dcast produces with the margins = c("month") argument. The same holds for solar.r, wind, and temp.

mean(airquality$ozone, na.rm=TRUE); mean(airquality$solar.r, na.rm=TRUE); mean(airquality$wind, na.rm=TRUE); mean(airquality$temp, na.rm=TRUE)
## [1] 42.12931
## [1] 185.9315
## [1] 9.957516
## [1] 77.88235
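The (all) column deserves a similar check, since it is easy to misread as the average of the four means in its row. The sketch below suggests it is instead the mean of every May measurement pooled together, regardless of variable, which is consistent with the 68.70696 shown in the output above. The names `withMargin`, `pooledMay`, and `meanOfMeans` are introduced only for this example.

```r
library(reshape2)

aq <- datasets::airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

withMargin <- dcast(aqm, month ~ variable, mean, margins = "variable")

# Mean of every May measurement, whatever the variable
pooledMay <- mean(aqm$value[aqm$month == 5])

# Mean of the four May column means, for comparison
meanOfMeans <- mean(unlist(withMargin[1, c("ozone", "solar.r", "wind", "temp")]))

withMargin[["(all)"]][1]  # the margin value for May
pooledMay                 # matches the margin
meanOfMeans               # does not match
```

Because the variables are on very different scales and have different numbers of missing values, this pooled figure is rarely meaningful for data like airquality; the row margin per variable is usually the more useful one.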


Plotting reshaped data

Melted data is much easier to plot with lattice and ggplot2, because a single column (variable) can be mapped to colour, fill, or facets.

library(ggplot2)
ggplot(aqm, aes(x=month, y=value, fill=variable)) + geom_bar(stat = "identity")

Cast data is less convenient for showing multiple variables on one plot, because each variable sits in its own column and has to be mapped separately.

ggplot(airqualityDcast, aes(x=month, y=ozone)) + geom_bar(stat = "identity")

head(aqm)
##   month day variable value
## 1     5   1    ozone    41
## 2     5   2    ozone    36
## 3     5   3    ozone    12
## 4     5   4    ozone    18
## 6     5   6    ozone    28
## 7     5   7    ozone    23
head(airqualityDcast)
##   month    ozone  solar.r      wind     temp    (all)
## 1     5 23.61538 181.2963 11.622581 65.54839 68.70696
## 2     6 29.44444 190.1667 10.266667 79.10000 87.38384
## 3     7 59.11538 216.4839  8.941935 83.90323 93.49748
## 4     8 59.96154 171.8571  8.793548 83.96774 79.71207
## 5     9 31.44828 167.4333 10.180000 76.90000 71.82689
## 6 (all) 42.12931 185.9315  9.957516 77.88235 80.05722
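One way to get the best of both shapes is to cast first to compute the monthly means and then melt the result back into long form for plotting. This is a sketch rather than the original author's method; the names `monthlyWide`, `monthlyLong`, and `monthlyMean` are assumptions made for the example, and the cast is done without margins so that no (all) entries end up in the plot.

```r
library(reshape2)
library(ggplot2)

aq <- datasets::airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

# Aggregate to one row per month (wide), then return to long form
monthlyWide <- dcast(aqm, month ~ variable, mean)
monthlyLong <- melt(monthlyWide, id.vars = "month",
                    value.name = "monthlyMean")

# All four monthly means side by side in one plot; print(p) draws it
p <- ggplot(monthlyLong,
            aes(x = factor(month), y = monthlyMean, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge")
```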

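Error: Mapping a variable to y and also using stat=“bin”

This error appears when y is mapped but geom_bar is called without a stat argument or with stat=“bin”. Because the y values are already computed, stat needs to be “identity”. In newer ggplot2 releases the default stat for geom_bar is “count” and the error wording differs, but the fix is the same.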
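The fix can be sketched as follows, using monthly mean ozone computed with base R's aggregate so the example does not depend on objects created earlier. The names `monthly`, `p1`, and `p2` are chosen for this example; geom_col is only available in more recent ggplot2 versions.

```r
library(ggplot2)

# Monthly mean ozone from the built-in data; the formula interface of
# aggregate drops rows with missing Ozone by default
monthly <- aggregate(Ozone ~ Month, data = datasets::airquality, FUN = mean)

# y is already computed, so tell geom_bar to use the values as they are
p1 <- ggplot(monthly, aes(x = Month, y = Ozone)) +
  geom_bar(stat = "identity")

# geom_col() is shorthand for the same thing in newer ggplot2 releases
p2 <- ggplot(monthly, aes(x = Month, y = Ozone)) +
  geom_col()
```

Printing either object draws the same bar chart, with bar heights equal to the values in `monthly$Ozone`.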


