9 Customization
You now have a number of different graphs in your toolkit to use when visualizing different types of relationships or comparisons in your data. You also learned a lot of ways to tinker with the geoms and their aesthetics. Now it is time to add some techniques and skills to customize the other elements of the graphs.
This will serve as a base plot to customize:
base_plot = penguins %>%
ggplot(aes(y = body_mass_g, x = island)) +
geom_jitter(height = 0, width = 0.1,
size = 2, alpha = 0.3,
aes(color=island)) +
stat_summary(fun.data = "mean_se",
geom = "pointrange",
color = "black",
aes(fill = island),
shape = 21,
size = 0.5)
base_plot
Saving the visualization to a variable prevents the need to copy/paste the entire code each time.
9.1 Scaling The Axes
By default, ggplot will use axes settings that it thinks best highlight the data being visualized. Here, the y-axis runs from ~2000 - 7000 and has intervals of 1000 starting at 3000.
This can be changed using a scale_*_*
function. The first *
is replaced by which axis you want to modify, and the second with what type of data or scale you want it to have. Here, the y-axis is continuous, so scale_y_continuous()
would be used. Its arguments can be used to modify what is shown on the axis. There are two reasons why you might want to change this.
Generally speaking, it is good practice to have your axes start at 0 and show the full scale. It can be misleading when parts of a scale are obscured and removed.
You might have an issue with points being jittered past the scale and result in the Warning message covered in previous lessons.
base_plot +
scale_y_continuous(breaks = seq(from = 0, to = 7000, by = 500),
labels = seq(from = 0, to = 7000, by = 500))
Hm. So it DID change the intervals from 1000 to 500, but it still only shows values from 3000 to 6000. This is because the actual range of the axis was not changed. Remember that all graphs are plotted with (x,y) cartesian coordinates. With this in mind, what needs to be modified are the limits of that coordinate system. This can be done with coord_cartesian()
.
base_plot +
coord_cartesian(ylim = c(0,7000)) +
scale_y_continuous(breaks = seq(from = 0, to = 7000, by = 500),
labels = seq(from = 0, to = 7000, by = 500))
Changing the limits of the coordinate system changes the zoom of your visualization and not the underlying data.
There are a number of other scale commands that override the default, such as scale_y_log10()
, scale_y_sqrt()
, and scale_y_reverse()
, to quickly and easily change the scale. Instead of having to completely transform all your data, just adding a transform to the axis in your plot can be a big time saver and much safer!
9.1.1 Get Rid Of Padding
Even though the y-axis starts at 0 now, there is a bit of space between the actual start of the graph and the 0. By default there will be a bit of padding. This can be particularly egregious when you have bars:
penguins %>%
ggplot(aes(y = body_mass_g, x = island)) +
stat_summary(fun = "mean", geom = "bar")
This can be eliminated with the expand
argument.
base_plot +
coord_cartesian(ylim = c(0,7000)) +
scale_y_continuous(breaks = seq(from = 0, to = 7000, by = 500),
labels = seq(from = 0, to = 7000, by = 500),
expand = c(0,0))
9.3 Themes
The theme of a visualization is where most of the nuanced customization will take place.
First, all the tweaks made thus far will be saved, again to prevent the need from more copy/pasting:
Again, to prevent the need for more copy/pasting and long code strings, all subsequent changes will just be added and saved to one variable. The full code will be displayed at the end.
modified_plot = base_plot +
coord_cartesian(ylim = c(0,7000)) +
scale_y_continuous(breaks = seq(from = 0, to = 7000, by = 500),
labels = seq(from = 0, to = 7000, by = 500),
expand = c(0,0)) +
labs(title = "How Does Penguin Weight Vary Across Islands?",
subtitle = "Islands Contain Different Penguin Species",
y = "Body Mass (grams)",
x = "Island",
tag = "Figure 1",
caption = "Note: Error bars represent +/- 1 SEM")
ggplot comes with a number of built in themes that have preset settings. Among the most common are theme_minimal()
, theme_bw()
, and theme_classic()
.
modified_plot +
theme_minimal()
modified_plot +
theme_bw()
modified_plot +
theme_classic()
You should start to develop your own personal taste for which visualizations you find the most pleasing and informative. For the sake of illustration, subsequent figures will use theme_classic()
.
While using one of ggplot’s built in themes serves as a good starting point, to make manual changes to the theme’s settings, you use the theme()
function. This function works as follows, generally speaking, you first specify the specific element of the visualization you want to modify, set that equal to what type of element it is with an element_*()
call (e.g., text, line, blank, etc.), and then specify what part of that element you want to change and what value it should be equal to.
For example, if you wanted to change the title so the text had a size 18 font, you would add the following to your code:
theme(plot.title = element_text(size = 18))
You do not have to, and should not, commit to memory all the different visualization element names and arguments to modify! In fact, just check out this handy cheat sheet!
A few of the most common elements and ways in which you may want to modify them are highlighted below. These should be pretty intuitive, so there will not be a ton of explanation.
9.3.1 Titles
Changing some of the the size, font face (by making some bold), and color, of text in your visualization will help improve its overall legibility.
modified_plot = modified_plot +
theme_classic() +
theme(plot.title = element_text(size = 18),
plot.subtitle = element_text(size = 10,
color = "grey50",
face = "bold"),
plot.tag = element_text(face = "bold"))
modified_plot
By default, most text in ggplot2 is left-aligned. To change the location of the text, you need to adjust its vertical or horizontal location in the coordinate system, using vjust
or hjust
, respectively. When adjusting:
- 0 == left-aligned
- 0.5 == centered
- 1 == right-aligned
modified_plot = modified_plot +
theme(plot.title = element_text(size = 18,
hjust = 0.5,
face = "bold"),
plot.subtitle = element_text(size = 10,
color = "grey50",
hjust = 0.5),
plot.tag = element_text(face = "bold"))
modified_plot
9.3.3 Axes Titles
When there are multiple elements in your visualization (e.g., multiple axis titles), you can either change both simultaneously (e.g., using axis.title
) or specify each individually (e.g., using axis.title.x
and axis.title.y
).
modified_plot +
theme(axis.title.x = element_text(size = 14,
face = "bold"),
axis.title.y = element_text(size = 14,
face = "bold"))
As the text gets bigger, the axes titles seem too close to the tick mark values. There needs to be some breathing room to keep the different elements visually distinct and easy to parse. Otherwise, your visualization will be in trouble This can be achieved by modifying the text’s margins – the area to the top, right, bottom, and left of it.
modified_plot = modified_plot +
theme(plot.subtitle = element_text(size = 10,
color = "grey50",
hjust = 0.5,
margin = margin(b = 0.2,
unit = "inch")),
axis.title.x = element_text(size = 16,
face = "bold",
margin = margin(t = 0.2,
unit = "inch")),
axis.title.y = element_text(size = 16,
face = "bold",
margin = margin(r = 0.2,
unit = "inch")))
modified_plot
Notice that the margins are in reference to the overall orientation of the graph. To add space between the y-axis title and the values, even though it is the “bottom” of the element, it is to the right of the overall graph. Thus, some “r” space is added in margin()
.
9.3.4 Axes Values
You saw previously how the actual values on the axes could be modified with a scale_*_*
function. The appearance of those values can also be customized. The values on the axis, as well as the tick marks and lines themselves!
modified_plot = modified_plot +
theme(axis.line.x = element_line(color = "gray40", size = 0.5),
axis.ticks = element_line(color = "gray40", size = 0.5),
axis.text = element_text(face = "bold",
color = "black",
size = 12))
modified_plot
Note: It is generally not recommended to use different colors for your axis lines or tick marks, only using black to make them darker/more consistent and punchy. So, while you can technically set them to be any color, avoid using anything but black.
Now THAT is looking like a publication-worthy visualization!
9.3.5 Setting a Default
So you have gone through and learned all these ways to customize your visualizations. You may have started to develop a sense of what you think looks best generally – e.g., particular font sizes, spacing, etc. While individual visualizations will of course need some tweaks (like the correct titles), you may have some settings you think can serve as a good baseline.
Instead of having to include all the same theme()
code for every single visualization you make:
theme(plot.title = element_text(size = 18,
face = "bold",
hjust = 0.5),
plot.subtitle = element_text(size = 10,
color = "grey50",
face = "bold",
hjust = 0.5,
margin = margin(b = 0.2,
unit = "inch")),
plot.tag = element_text(face = "bold"),
plot.caption = element_text(hjust = 0,
face = 'italic'),
axis.title.x = element_text(size = 16,
face = "bold",
margin = margin(t = 0.2,
unit = "inch")),
axis.title.y = element_text(size = 16,
face = "bold",
margin = margin(r = 0.2,
unit = "inch")),
axis.line.x = element_line(color = "gray40", size = 0.5),
axis.ticks = element_line(color = "gray40", size = 0.5),
axis.text = element_text(face = "bold",
color = "black",
size = 12))
You can just set a default that will automatically be applied to all visualizations you make with ggplot until you restart your R session. You can do this with theme_set()
theme_set(theme_classic() +
theme(plot.title = element_text(size = 18,
face = "bold",
hjust = 0.5),
plot.subtitle = element_text(size = 10,
color = "grey50",
face = "bold",
hjust = 0.5,
margin = margin(b = 0.2,
unit = "inch")),
plot.tag = element_text(face = "bold"),
plot.caption = element_text(hjust = 0,
face = 'italic'),
axis.title.x = element_text(size = 16,
face = "bold",
margin = margin(t = 0.2,
unit = "inch")),
axis.title.y = element_text(size = 16,
face = "bold",
margin = margin(r = 0.2,
unit = "inch")),
axis.line.x = element_line(color = "gray40", size = 0.5),
axis.ticks = element_line(color = "gray40", size = 0.5),
axis.text = element_text(face = "bold",
color = "black",
size = 12)))
Now all new visualizations will automatically have these settings applied by default!
penguins %>%
ggplot(aes(y = body_mass_g, x = island)) +
geom_violin(aes(fill = island),
alpha = 0.2) +
geom_jitter(alpha = 0.4, width = 0.05) +
stat_summary(fun.data = "mean_se",
geom = "pointrange",
aes(color = island),
size = 0.5)
9.3.6 Legends
Modifying the visualization’s legend can be a bit tricky because there are a lot of changes you can make. One of the first options is to get rid of it altogether, since your graphs should not rely on a legend for comprehension:
modified_plot +
theme(legend.position = "none")
However, in certain cases, particularly if you are displaying information from another variable (e.g., indicating penguin sex
by point shape
), a legend is needed and customization may be useful.
penguins %>%
drop_na() %>%
ggplot(aes(y = body_mass_g, x = island)) +
geom_point(size = 2, alpha = 0.3,
shape = 21,
aes(fill = sex),
position = position_jitterdodge()) +
stat_summary(fun.data = "mean_se",
geom = "pointrange",
color = "black",
aes(fill = sex),
shape = 21,
size = 0.5,
position = position_dodge(width = 0.75)) +
coord_cartesian(ylim = c(0,7000)) +
scale_y_continuous(breaks = seq(from = 0, to = 7000, by = 500),
labels = seq(from = 0, to = 7000, by = 500),
expand = c(0,0)) +
labs(title = "How Does Penguin Weight Vary Across Islands?",
subtitle = "Islands Contain Different Penguin Species",
y = "Body Mass (grams)",
x = "Island",
tag = "Figure 1",
caption = "Note: Error bars represent +/- 1 SEM",
# New code starts here:
fill = "Sex" # Change Legend label with the name of the mapped aesthetic.
) +
theme(legend.box.background = element_rect(color = "black", # Add a defined box around legend
size = 2),
legend.title = element_text(face = "bold"), # Modify the title
legend.text = element_text(face = "bold"), # Modify text
legend.title.align = 0.5, # Change alignment of legend title. Think of this like an hjust.
legend.key = element_rect(fill = "white", colour = "black") # Add defined boxes around legend key graphic
)
9.4 Combining Graphs
With these newly acquired data visualization skills, you will likely be creating more than a single visualization at a time. To combine and arrange multiple graphs into a single figure, you can use the patchwork
package!
Artwork by Horst (2022)
Take the following 3 graphs for example:
a = penguins %>%
ggplot(aes(y = body_mass_g, x = island)) +
geom_jitter(height = 0, width = 0.1,
size = 2, alpha = 0.3,
aes(color=island)) +
stat_summary(fun.data = "mean_se",
geom = "pointrange",
color = "black",
aes(fill = island),
shape = 21,
size = 0.5)
a
b = penguins %>%
ggplot(aes(y = body_mass_g, x = species)) +
geom_jitter(height = 0, width = 0.1,
size = 2, alpha = 0.3,
aes(color=species)) +
stat_summary(fun.data = "mean_se",
geom = "pointrange",
color = "black",
aes(fill = species),
shape = 21,
size = 0.5)
b
c = penguins %>%
drop_na() %>%
ggplot(aes(y = body_mass_g, x = sex)) +
geom_point(size = 2, alpha = 0.3,
shape = 21,
aes(fill = sex),
position = position_jitter(width = 0.1)) +
stat_summary(fun.data = "mean_se",
geom = "pointrange",
color = "black",
aes(fill = sex),
shape = 21,
size = 0.5)
c
Patchwork uses a pretty intuitive system to create different layouts when combining graphs by using the +
, /
, and |
operators.
+
combines graphs using the default layout (creating as square a grid as possible)
a + b
/
will stack graphs on top of each other
b / c
|
places graphs next to each other and can nest arrangements created with the other operators. For up to 3 graphs the result from using |
is identical to +
. Using |
best communicates the intended layout.
(c | (a / b))
Note: The format settings defined for a single graph may not be best for an arrangement of multiple graphs combined into a single visualization. In the visualizations above, the axes titles and labels are a little too big, causing some of them to overlap and be unreadable.
9.5 Saving Graphs
Once you have created your masterpieces, you may want to export them so they can be added to manuscripts created outside of R, posters, or anything else. To save your ggplot graphs, you can use ggsave()
. You have several file type options, but I would recommend pdf. It generally maintains a high resolution AND aspect ratio. In ggsave()
, you must specify:
- the
filename
you want your graph to be saved with (which can/should include a relative file path) - which
plot
you want to be saving - the
width
andheight
of the file
ggsave(plot = a,
filename = "visualizations/test_plot.pdf",
width = 12,
height = 8)