How to create a Pivot Table in R
Why I make this?
Pivot Tables are beautiful. Even Steve Jobs thought so. But they’re a pain to create in R.
Not anymore!
How to do this?
If you’ve got data that looks like this –
S.No. | UID | District | Block | Age | Gender | Social Category |
---|---|---|---|---|---|---|
1 | 239140001 | HOOGHLY | ARAMBAGH | 7 | GIRL | GENERAL |
2 | 239140002 | HOWRAH | SHYAMPUR-I | 2 | BOY | GENERAL |
3 | 239140003 | KOLKATA | WARD 72 | 7 | BOY | SC |
4 | 239140004 | HOOGHLY | BALAGARH | 3 | GIRL | ST |
5 | 239140005 | BIRBHUM | DUBRAJPUR | 5 | BOY | SC |
6 | 239140006 | KOLKATA | WARD 101 | 8 | TRANS | OBC |
… | … | … | … | … | … | … |
and you want a pivot table that looks like this –
District | BOY | GIRL | TRANS |
---|---|---|---|
BIRBHUM | 23 | 19 | 0 |
HOOGHLY | 36 | 49 | 2 |
HOWRAH | 13 | 19 | 4 |
KOLKATA | 83 | 69 | 1 |
… | … | … | … |
This is what you need to do –
|
|
That’s it!!! And it’s a million times faster than running creating an empty data frame and then populating it with for
loops.
You can now create a pivot table for any two pairs by simply switching the columns!
Here’s what’s happening in the function –
- You segment the original data frame (
df
) with just the two columns that you need, i.e., District and Gender. You can do it by referring the column numbers directly (df[,c(3,6)]
) or by running awhich
call using the column name (df[,c(which(colnames(df)=="District"),which(colnames(df)=="Gender"))]
). - Next, you run the
table
function on it get the counts for each Gender against each District. We wrap this inside aas.data.frame
function because that’s just easier. - Now we use the
%>%
(pipe) operator to run thepivot_wider
function on the data frame we’ve just created. - We tell pivot_wider to take the column names from the column called “Gender” and the values for that [District,Gender] entity from the “Freq” column. Note that the “Freq” column is automatically created when we run the
table
function to get the counts.
Note: Make sure you’ve run library(dplyr)
and library(tidyr)
first so you can use the pipe operator and the pivot_wider
function.
So elegant, isn’t it?
The pivot_wider
function is helpful when you want to turn your rows into columns. In this example, we get a lot of rows from running the table
function. (The # of rows in Table = # of unique Districts x # of unique Genders)
A sister function is the pivot_longer
function, which is useful for the opposite case – when you have a lot of columns that you want to turn into rows.