We start by loading the package
ggplot2
.
library(ggplot2)
ggplot2
ggplot2
is a plotting package that
makes it simple to create complex plots from data in a data frame. It
provides a more programmatic interface for specifying what variables to
plot, how they are displayed, and general visual properties. Therefore,
we only need minimal changes if the underlying data change or if we
decide to change from a bar plot to a scatter plot. This helps in
creating publication quality plots with minimal amounts of adjustments
and tweaking.
ggplot2
functions like data in the
‘long’ format, i.e., a column for every dimension, and a row for every
observation. Well-structured data will save you lots of time when making
figures with ggplot2
ggplot graphics are built step by step by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.
To build a ggplot, we will use the following basic template that can be used for different types of plots:
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
ggplot()
function and bind the plot to a
specific data frame using the data
argumentggplot(data = gapminder)
aes
) function),
by selecting the variables to be plotted and specifying how to present
them in the graph, e.g. as x/y positions or characteristics such as
size, shape, color, etc.ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp))
ggplot2
offers many
different geoms; we will use some common ones today, including:
geom_point()
for scatter plots, dot plots, etc.geom_line()
for trend lines, time series, etc.geom_barplot()
for, well, boxplots!geom_boxplot()
distributions.To add a geom to the plot use the +
operator. Because we
have two continuous variables, let’s use geom_point()
first:
ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp)) +
geom_point()
The +
in the ggplot2
package is particularly useful because it allows you to modify existing
ggplot
objects. This means you can easily set up plot
templates and conveniently explore different types of plots, so the
above plot can also be generated with code like this:
# Assign plot to a variable
gdp_exp_plot <- ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp))
# Draw the plot
gdp_exp_plot +
geom_point()
Notes
ggplot()
function can be seen
by any geom layers that you add (i.e., these are universal plot
settings). This includes the x- and y-axis mapping you set up in
aes()
.ggplot()
function.+
sign used to add new layers must be placed at the
end of the line containing the previous layer. If, instead, the
+
sign is added at the beginning of the line containing the
new layer, ggplot2
will not add the new
layer and will return an error message.# This is the correct syntax for adding layers
gdp_exp_plot +
geom_point()
# This will not add the new layer and will return an error message
gdp_exp_plot + geom_point()
Building plots with ggplot2
is
typically an iterative process. We start by defining the dataset we’ll
use, lay out the axes, and choose a geom:
ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp)) +
geom_point()
Then, we start modifying this plot to extract more information from
it. For instance, we can add transparency (alpha
) to avoid
overplotting:
ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp)) +
geom_point(alpha = 0.5)
We can also add colors for all the points:
ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp)) +
geom_point(alpha = 0.5, color = "blue")
Or to color each continent in the plot differently, you could use a
vector as an input to the argument color.
ggplot2
will provide a different color
corresponding to different values in the vector. Here is an example
where we color with continent
:
ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp, color = continent)) +
geom_point(alpha = 0.5)
Notice that we can change the geom layer and colors will be still
determined by continent
ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp, color = continent)) +
geom_jitter(alpha = 0.5)
To make our plot more readable, we can add axis labels:
ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp, color = continent)) +
geom_point(alpha = 0.5) +
labs(x = "GDP per Capita",
y = "Life Expectancy")
Challenge
Use what you just learned to create a scatter plot of population over life expectancy with the continents showing in different colors. Make sure to give your plot relevant axis labels.
ggplot(data = gapminder, aes(x = pop, y = life_exp, color = continent)) +
geom_point() +
labs(x = "Population",
y = "Life Expectancy")
ggplot2
has a special technique called
faceting that allows the user to split one plot into multiple
plots based on a factor included in the dataset. We will use it to split
our plot into five panels, one for each continent.
ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp, color = continent)) +
geom_point() +
labs(x = "GDP per Capita",
y = "Life Expectancy") +
facet_grid(. ~ continent)
We can also experiment with stacking the facets vertically, rather
than horizontally. The facet_grid
geometry allows you to
explicitly specify how you want your plots to be arranged via formula
notation (rows ~ columns
; a .
can be used as a
placeholder that indicates only one row or column).
ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp, color = continent)) +
geom_point() +
labs(x = "GDP per Capita",
y = "Life Expectancy") +
facet_grid(continent ~ .)
Usually plots with white background look more readable when printed.
We can set the background to white using the function
theme_bw()
. Additionally, you can remove the grid:
ggplot(data = gapminder, aes(x = gdp_per_cap, y = life_exp, color = continent)) +
geom_point() +
labs(x = "GDP per Capita",
y = "Life Expectancy") +
facet_grid(continent ~ .) +
theme_bw() +
theme(panel.grid = element_blank())
Let’s try to plot the life expectancy of each country over each year.
ggplot(data = gapminder, aes(x = year, y = life_exp, color = continent)) +
geom_line()
This doesn’t look right at all. This is giving us a line per
continent, but we really want the lines to be per country, but colored
by the continent… there is a parameter of aes()
called
group
.
ggplot(data = gapminder, aes(x = year, y = life_exp, color = continent, group = country)) +
geom_line()
We can create barplots using the geom_bar
geom. Let’s
make a barplot showing the number of data points per continent.
ggplot(data = gapminder, aes(x = continent)) + geom_bar()
Notice, this looks a lot like our plot way back in episode 2!
We can create boxplots using the geom_boxplot
geom.
Let’s look at the boxplot of population split per continent.
ggplot(data = gapminder, aes(x = continent, y = pop)) + geom_boxplot()
This doesn’t look great because the populations of countries have
such a large range. We an fix this by changing the scale of the y-axis
to a logarithmic scale using the scale_y_log10()
function.
We may also want to change the axis labels to make it clear to the
reader that this was done:
ggplot(data = gapminder, aes(x = continent, y = pop)) +
geom_boxplot() +
scale_y_log10() +
labs(
x = 'Continent',
y = 'Population (log10 scale)'
)
Challenge
How can we add the individual data points to the boxplot?
ggplot(data = gapminder, aes(x = continent, y = pop)) +
geom_boxplot() +
scale_y_log10() +
geom_jitter() +
labs(
x = 'Continent',
y = 'Population (log10 scale)'
)
ggplot2
themesIn addition to theme_bw()
, which changes the plot
background to white, ggplot2
comes with
several other themes which can be useful to quickly change the look of
your visualization. The complete list of themes is available at https://ggplot2.tidyverse.org/reference/ggtheme.html.
theme_minimal()
and theme_light()
are popular,
and theme_void()
can be useful as a starting point to
create a new hand-crafted theme.
The ggthemes
package provides a wide variety of options (including an Excel 2003
theme). The ggplot2
extensions website provides a list of packages that extend the
capabilities of ggplot2
, including
additional themes.
Challenge
With all of this information in hand, please take another five minutes to either improve one of the plots generated in this exercise or create a beautiful graph of your own. Use the RStudio
ggplot2
cheat sheet for inspiration. Here are some ideas:
- See if you can change the size or shape of the plotting symbol.
- Can you find a way to change the name of the legend? What about its labels?
- Try using a different color palette (see http://www.cookbook-r.com/Graphs/Colors_(ggplot2)/).