ggplot2, data visualization

Mikhail Dozmorov

Virginia Commonwealth University

09-20-2020

1 / 47

Why visualize data?

  • Anscombe's quartet comprises four datasets that have nearly identical simple descriptive statistics, yet appear very different when graphed. (See Wikipedia link below)

  • 11 observations (x, y) per group

https://en.wikipedia.org/wiki/Anscombe%27s_quartet
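The claim about near-identical descriptive statistics can be checked directly, since the quartet ships with base R as the anscombe data frame (columns x1..x4 and y1..y4). A minimal sketch; the names stats and set1..set4 are chosen here for illustration only:

```r
# anscombe is part of the base R 'datasets' package
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x  = var(x),  var_y  = var(y),
    cor_xy = cor(x, y))
})
colnames(stats) <- paste0("set", 1:4)

# One column per dataset; the rows are nearly constant across columns
round(stats, 2)
```

The summary rows agree to about two decimal places across the four sets, which is why only a plot reveals how different they are.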

2 / 47

Why visualize data?

  • Four groups

  • 11 observations (x, y) per group

https://en.wikipedia.org/wiki/Anscombe%27s_quartet

3 / 47

Why visualize data?

https://github.com/stephlocke/datasauRus

4 / 47

Evolution of R graphics

  • Base graphics

  • Trellis plots

    • lattice package
    • Better design principles, choice of colors, symbol shapes, line styles
    • Known for distinct multi-panel layout
  • ggplot2 package

    • Implements Grammar of Graphics, developed by Leland Wilkinson
5 / 47

ggplot2 - the grammar of graphics

6 / 47

ggplot2 package

  • ggplot2 is a widely used R package that extends R's visualization capabilities. It takes the hassle out of things like creating legends, mapping other variables to scales like color, or faceting plots

  • Where does the "gg" in ggplot2 come from? The ggplot2 package provides an R implementation of Leland Wilkinson's Grammar of Graphics (1999)

    • The Grammar of Graphics allows you to think beyond the garden variety plot types (e.g. scatterplot, barplot) and consider the components that make up a plot or graphic, such as how data are represented on the plot (as lines, points, etc.), how variables are mapped to coordinates or plotting shape or color, what transformation or statistical summary is required, and so on

https://ggplot2.tidyverse.org/

7 / 47

The basics of ggplot2 graphics

Specifically, ggplot2 allows you to build a plot layer-by-layer by specifying:

  • aesthetics that map variables in the data to axes on the plot or to plotting size, shape, color, etc.,

  • a geom, which specifies how the data are represented on the plot (points, lines, bars, etc.),

  • a stat, a statistical transformation or summary of the data applied prior to plotting,

  • facets, which allow the data to be divided into chunks on the basis of other (usually categorical) variables and the same plot drawn for each chunk.
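The four components above can be seen in one short plot. This is a sketch, assuming the mpg dataset bundled with ggplot2; the choice of variables (displ, hwy, drv, year) is illustrative:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +  # aesthetics: columns -> x, y, color
  geom_point() +                                          # geom: draw each row as a point
  stat_smooth(method = "lm", se = FALSE) +                # stat: fit a linear model per group
  facet_wrap(~ year)                                      # facets: one panel per model year
p
```

Each line after the first adds one layer or modifier to the same plot object, so components can be added or removed independently.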

8 / 47

The basics of ggplot2 graphics

  • Data mapped to graphical elements
  • Add graphical layers and transformations
  • Layers are chained with the "+" sign

Object              Description
Data                The raw data that you want to plot
Aesthetics aes()    How to map your data to the x and y axes, color, size, shape
Geometries geom_    The geometric shapes that will represent the data

data +
aesthetic mappings of data to plot coordinates +
geometry to represent the data
9 / 47

Basic ggplot2 syntax

Specify data, aesthetics and geometric shapes

ggplot(data, aes(x=, y=, color=, shape=, size=, fill=)) +
geom_point(), or geom_histogram(), or geom_boxplot(), etc.

  • This combination is very effective for exploratory graphs.

  • The data must be a data frame in a long (not wide) format

  • The aes() function maps columns of the data frame to aesthetic properties of geometric shapes to be plotted.

  • ggplot() defines the plot; the geoms show the data; layers are added with +
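The long-format requirement is easiest to see with a small reshaping example. The sketch below assumes the tidyr package is installed (it is not used elsewhere in these slides); the data frame wide and its columns are made up for illustration:

```r
library(ggplot2)
library(tidyr)

# Hypothetical wide table: one column per experimental group
wide <- data.frame(id      = 1:3,
                   control = c(1.2, 1.5, 1.1),
                   treated = c(2.3, 2.9, 2.6))

# Long format: one row per observation, with the group stored in a column
long <- pivot_longer(wide, cols = c(control, treated),
                     names_to = "group", values_to = "value")

# Now 'group' can be mapped to an aesthetic like any other column
p <- ggplot(long, aes(x = group, y = value, color = group)) +
  geom_point()
p
```

In the wide table there is no single column to map to x or color; after reshaping, group and value are ordinary columns that aes() can use.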

10 / 47

Examples of ggplot2 graphics

library(ggplot2)
library(dplyr) # provides %>% and filter()
diamonds %>% filter(cut == "Good", color == "E") %>%
ggplot(aes(x = price, y = carat)) +
geom_point() # also try geom_point(aes(size = price))

Try other geoms

geom_smooth() # method = lm
geom_line()
geom_boxplot()
geom_bar(stat="identity")
geom_histogram()
11 / 47

Moving beyond ggplot + geoms

Customizing scales

  • Scales control the mapping from data to aesthetics and provide tools to read the plot (i.e., axes and legends).

  • Every aesthetic has a default scale. To add or modify a scale, use a scale function.

  • All scale functions have a common naming scheme: scale_<name of aesthetic>_<name of scale>

  • Examples: scale_y_continuous, scale_color_discrete, scale_fill_manual

12 / 47

ggplot2 example - update scale for y-axis

ggplot(iris, aes(x = Petal.Width, y = Sepal.Width,
color=Species)) + geom_point() +
scale_y_continuous(limits=c(0,5), breaks=seq(0,5,0.5))

13 / 47

ggplot2 example - update scale for color

ggplot(iris, aes(x = Petal.Width, y = Sepal.Width,
color=Species)) + geom_point() +
scale_color_manual(name="Iris Species",
values=c("red","blue","black"))

14 / 47

Split plots

  • Sometimes, one needs to create separate plots of subsets of data. These are called facets in ggplot2. Think of par(mfrow=c(2,2)) in base graphics as an analogy

  • Use facet_wrap() if you want to facet by one variable and have ggplot2 control the layout. Think 1D ribbon wrapped into 2D. Example:

    • facet_wrap( ~ var)
  • Use facet_grid() if you want to facet by one or two variables and control the layout yourself. Think 2D grid. Examples:
    • facet_grid(. ~ var1) - facets in columns
    • facet_grid(var1 ~ .) - facets in rows
    • facet_grid(var1 ~ var2) - facets in rows and columns
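A possible facet_grid() example with two faceting variables, assuming the mpg dataset bundled with ggplot2 (drv and cyl are picked only for illustration):

```r
library(ggplot2)

# Rows are drive types (drv), columns are cylinder counts (cyl)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
p
```

Unlike facet_wrap(), the grid keeps a panel for every row/column combination, so combinations with no data show up as empty panels.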
15 / 47

ggplot2 example - facet_wrap

Note free x scales

ggplot(iris, aes(x = Petal.Width, y = Sepal.Width)) +
geom_point() + geom_smooth(method="lm") +
facet_wrap(~ Species, scales = "free_x")

16 / 47

gridExtra R package for more custom plot arrangement

library(gridExtra)
p1 <- iris %>% filter(Species == "setosa") %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width)) + geom_smooth()
p2 <- iris %>% filter(Species == "versicolor") %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width)) + geom_smooth()
grid.arrange(p1, p2, ncol = 2)

17 / 47

patchwork for simple plot arrangement

library(patchwork)
p1 <- iris %>% filter(Species == "setosa") %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width)) + geom_smooth()
p2 <- iris %>% filter(Species == "versicolor") %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width)) + geom_smooth()
p3 <- iris %>% filter(Species == "virginica") %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width)) + geom_smooth()
p1 + p2 + p3

https://patchwork.data-imaginist.com/

18 / 47

patchwork for simple plot arrangement

library(patchwork)
p1 <- iris %>% filter(Species == "setosa") %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width)) + geom_smooth()
p2 <- iris %>% filter(Species == "versicolor") %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width)) + geom_smooth()
p3 <- iris %>% filter(Species == "virginica") %>% ggplot(aes(x = Sepal.Length, y = Sepal.Width)) + geom_smooth()
(p1 | p2) / p3

https://patchwork.data-imaginist.com/

19 / 47

stat functions

  • All geoms perform a default statistical transformation.

  • For example, geom_histogram() bins the data before plotting. geom_smooth() fits a line through the data according to a specified method.

  • In some cases the transformation is the "identity", which just means plot the raw data. For example, geom_point()

  • These transformations are done by stat functions. The naming scheme is stat_ followed by the name of the transformation. For example, stat_bin, stat_smooth, stat_boxplot

  • Every geom has a default stat, every stat has a default geom.
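The geom/stat pairing can be illustrated by drawing the same bar chart from either side. A sketch assuming the mpg dataset from ggplot2; layer_data() is used only to inspect what the stat computed:

```r
library(ggplot2)

# geom_bar() uses stat = "count" by default ...
p_geom <- ggplot(mpg, aes(x = class)) + geom_bar()

# ... and stat_count() uses geom = "bar" by default
p_stat <- ggplot(mpg, aes(x = class)) + stat_count(geom = "bar")

# Both layers compute the same per-class counts before drawing
d_geom <- layer_data(p_geom)
d_stat <- layer_data(p_stat)
d_geom[, c("x", "count")]
```

Starting from the stat is mostly useful when you want to keep the computation but swap the geom, for example stat_count(geom = "point").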

20 / 47

Example of stat="identity"

# ToothGrowth describes the effect of Vitamin C on Tooth growth in Guinea pigs
df <- data.frame(dose = c("D0.5", "D1", "D2"),
len = c(4.2, 10, 29.5))
ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity")

21 / 47

Rotating plots

# Horizontal bar plot
ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity") +
coord_flip()

ggplot2 barplots : Quick start guide - R software and data visualization

22 / 47

Update themes and labels

  • The default ggplot2 theme is excellent. It follows the advice of several landmark papers regarding statistics and visual perception. (Wickham 2009, p. 141)

  • However, you can change the theme using ggplot2's theming system. Built-in themes include theme_gray (default), theme_bw, theme_linedraw, theme_light, theme_dark, theme_minimal, theme_classic, and theme_void

  • Explore the cowplot R package by Claus Wilke, and its themes theme_cowplot(), theme_half_open(), theme_minimal_grid(), etc.

  • You can also update axis labels and titles using the labs function

https://wilkelab.org/cowplot/index.html

23 / 47

ggplot2 example - update labels

ggplot(iris, aes(x = Petal.Width, y = Sepal.Width,
color=Species)) + geom_point() +
labs(title="Sepal vs. Petal",
x="Petal Width (cm)", y="Sepal Width (cm)")

24 / 47

ggplot2 example - change theme

ggplot(iris, aes(x = Petal.Width, y = Sepal.Width,
shape=Species)) + geom_point() +
theme_bw()

25 / 47

cowplot - publication-quality plots

library(cowplot)
ggplot(iris, aes(x = Petal.Width, y = Sepal.Width,
shape=Species, color = Species)) + geom_point() +
theme_cowplot()

https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html

26 / 47

cowplot - publication-quality plots

library(cowplot)
ggplot(iris, aes(x = Petal.Width, y = Sepal.Width,
shape=Species, color = Species)) + geom_point() +
theme_minimal_grid()

https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html

27 / 47

cowplot - publication-quality plots

library(cowplot)
p1 <- ggplot(mtcars, aes(disp, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(qsec, mpg)) + geom_point()
plot_grid(p1, p2, labels = c('A', 'B'), label_size = 12)

28 / 47

Barplot

data(mpg)
ggplot(mpg, aes(x = class)) + geom_bar()

29 / 47

Barplot

class_agg <- data.frame(table(mpg$class))
names(class_agg) <- c("class", "count")
ggplot(class_agg, aes(x = class, y = count)) +
geom_bar(aes(fill = class), stat = "identity")

30 / 47

Horizontal Barplot

class_agg <- data.frame(table(mpg$class))
names(class_agg) <- c("class", "count")
ggplot(class_agg, aes(x = count, y = class)) +
geom_bar(aes(fill = class), stat = "identity")

Map the variables directly to the desired axes (supported since ggplot2 3.3.0). Previously, coord_flip() was needed.

31 / 47

Reorder levels using forcats

library(forcats)
class_agg <- data.frame(table(mpg$class))
names(class_agg) <- c("class", "count")
ggplot(class_agg, aes(x = count, y = fct_reorder(class, count))) +
geom_bar(aes(fill = class), stat = "identity")

https://forcats.tidyverse.org/reference/index.html

32 / 47

Density plot

ggplot(mpg, aes(x = hwy)) + geom_density()

33 / 47

Histogram

ggplot(mpg, aes(x = hwy)) +
geom_histogram() +
geom_density(aes(y = 2 * after_stat(count))) # after_stat() replaces the deprecated ..count.. notation
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
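The message about bins = 30 goes away once a bin width is set explicitly, and the same width can then be used to put the density curve on the count scale. A sketch assuming ggplot2 >= 3.3.0 (for after_stat()); the value bw = 2 is an arbitrary choice for illustration:

```r
library(ggplot2)

bw <- 2  # bin width in mpg units; an illustrative value, not a recommendation

p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = bw) +
  # count density integrates to n; multiplying by the bin width matches bar heights
  geom_density(aes(y = after_stat(count * bw)))
p
```

Tying the scaling factor to the bin width keeps the two layers comparable if the width is changed later.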

34 / 47

Stacked histogram

ggplot(mpg, aes(x = hwy, fill = class)) + geom_histogram(position = "stack")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

35 / 47

Side-by-side histogram

ggplot(mpg, aes(x = hwy, fill = class)) + geom_histogram(position = "dodge")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

36 / 47

Proportions

# useful for assessing percentages
ggplot(mpg, aes(x = hwy, fill = class)) + geom_histogram(position = "fill")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 35 rows containing missing values (geom_bar).

37 / 47

Smoothing

ggplot(iris, aes(x = Sepal.Width, y = Petal.Width)) +
geom_smooth() +
geom_smooth(method = "lm", color = "seagreen", se=FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

38 / 47

Additional aesthetic mapping

ggplot(iris, aes(x = Sepal.Width, y = Petal.Width)) +
geom_point(aes(color = Species))

39 / 47

Summary: Fine tuning ggplot2 graphics

Parameter                            Description
Facets, facet_                       Split one plot into multiple plots based on a grouping variable
Scales, scale_                       Maps between the data ranges and the dimensions of the plot
Visual themes, theme                 The overall visual defaults of a plot: background, grids, axes, default typeface, sizes, colors, etc.
Statistical transformations, stat_   Statistical summaries of the data that can be plotted, such as quantiles, fitted curves (loess, linear models, etc.), sums, etc.
Coordinate systems, coord_           Expressing coordinates in a system other than Cartesian
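Coordinate systems are the only entry in this summary without a non-Cartesian example elsewhere in the deck. One possible sketch, assuming the mpg dataset from ggplot2: a single stacked bar becomes a pie chart under polar coordinates.

```r
library(ggplot2)

# One stacked bar (x is a constant), with segments colored by vehicle class
p <- ggplot(mpg, aes(x = factor(1), fill = class)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")  # map the stacked counts to the angle
p
```

Dropping the coord_polar() line gives back the ordinary stacked bar, which shows that only the coordinate system changed, not the data or the geom.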
40 / 47

Putting it all together

diamonds %>% # Start with the 'diamonds' dataset
filter(cut == "Ideal") %>% # Then, filter rows where cut == Ideal
ggplot(aes(price)) + # Then, plot using ggplot
geom_histogram() + # and plot histograms
facet_wrap(~ color) + # in a 'small multiple' plot, broken out by 'color'
ggtitle("Diamond price distribution per color") +
labs(x="Price", y="Count") +
theme(panel.background = element_rect(fill="lightblue")) +
theme(plot.title = element_text(family="Trebuchet MS", size=28, face="bold", hjust=0, color="#777777")) +
theme(axis.title.y = element_text(angle=0)) +
theme(panel.grid.minor = element_blank())
41 / 47

Saving ggplot2 plots

  • pdf() (or any other graphical device, e.g., jpeg, png, svg) followed by dev.off() works

  • ggsave() saves the ggplot object (or, the latest plot) into a file. File extension defines the graphical device

p <- ggplot(mpg, aes(x = class)) + geom_bar()
ggsave(filename = "test.jpg", plot = p, width = 7, height = 10, units = c("in"), dpi = 300)
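Both saving routes can be compared side by side. A sketch that writes to a temporary directory; the file names are placeholders chosen for illustration:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class)) + geom_bar()

# Route 1: open a device, print the plot, close the device
out_device <- file.path(tempdir(), "test_device.pdf")
pdf(out_device, width = 7, height = 5)
print(p)            # explicit print() is needed inside scripts, loops, and functions
dev.off()

# Route 2: ggsave() picks the device from the file extension
out_ggsave <- file.path(tempdir(), "test_ggsave.pdf")
ggsave(filename = out_ggsave, plot = p, width = 7, height = 5, units = "in")

file.exists(c(out_device, out_ggsave))
```

With the device route, forgetting dev.off() leaves the file open and incomplete; ggsave() handles opening and closing on its own.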
42 / 47

Plotly - interactive ggplots. plot.ly/ggplot2/

suppressMessages(library(plotly))
p <- ggplot(iris, aes(x = Sepal.Width, y = Petal.Width)) + geom_point(aes(color = Species))
pp <- ggplotly(p)
pp
# Save with
# htmlwidgets::saveWidget(pp, "test_plotly.html")

Plotly

43 / 47

Interactive heatmaps

# BiocManager::install("talgalili/heatmaply")
suppressMessages(library(heatmaply))
suppressMessages(library(RColorBrewer))
heatmaply(scale(mtcars), colors = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(100))

https://github.com/talgalili/heatmaply

44 / 47

ggplots in a loop

p1 <- ggplot(mtcars, aes(disp, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(qsec, mpg)) + geom_point()
x <- list(p1, p2)
lapply(x, print)

## [[1]]

##
## [[2]]

# Equivalent for loop; use [[i]] to extract the plot itself rather than a one-element list
# for (i in seq_along(x)) {
# print(x[[i]])
# }
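Plots can also be generated inside the loop rather than listed by hand. A sketch assuming the mtcars dataset and the .data pronoun that ggplot2 supports inside aes(); the vector vars is an arbitrary selection of columns:

```r
library(ggplot2)

vars <- c("disp", "qsec", "wt")  # columns of mtcars to plot against mpg

# Build one scatterplot per variable; .data[[v]] looks up the column named in v
plots <- lapply(vars, function(v) {
  ggplot(mtcars, aes(x = .data[[v]], y = mpg)) +
    geom_point() +
    labs(x = v)
})
names(plots) <- vars

# Inside a loop, plots are drawn only when printed explicitly
for (p in plots) print(p)
```

Keeping the plots in a named list makes it easy to print them, save them one by one, or pass them to an arrangement function later.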
45 / 47

Graphic editors

  • Inkscape - vector graphics editor. Works with Scalable Vector Graphics (SVG) format. Export in any format, at any resolution.

    • Note svg graphic device in R. ggsave() also saves graphs in svg format
  • GIMP - raster graphics editor. Think Photoshop.

46 / 47

But wait... There's more

  • gganimate - animated ggplots
  • ggridges - ridgeline plots
  • ggrepel - nonoverlapping text labels
  • GGally - ggplot2 extension with pairwise plot, scatterplot, parallel coordinates plot, survival plot, network plots, and more.
  • Awesome ggplot2
47 / 47
