
R Workshop

Module 4: Data visualization with R (1)

2018-04-11
Bobae Kang
(Bobae.Kang@illinois.gov)

Agenda

  • Part 1: The Grammar of Graphics
  • Part 2: Maps and Interactive Plots

The Grammar of Graphics

[Figure]

Source: Wickham, H. (2010). “A Layered Grammar of Graphics.”

"The Grammar of Graphics"

“The grammar of graphics takes us beyond a limited set of charts (words) to an almost unlimited world of graphical forms (statements).”
-Wilkinson, L. (2005), p.1

Wilkinson's "grammar"

Wilkinson's idea is implemented in the Graphics Production Language (GPL) and has the following components:

  • Data
  • Transformation
  • Element
  • Scale
  • Guide
  • Coordinate system

ggplot2 package

[Figure]

Source: tidyverse.org

Motivation

“This article proposes an alternative parameterization of the [graphical] grammar, based around the idea of building up a graphic from multiple layers of data. The grammar differs from Wilkinson's in its arrangement of the components, the development of a hierarchy of defaults, and in that it is embedded inside another programming language.”
-Wickham, H. (2010), p.4

Comparison

[Figure: comparison of the layered grammar with Wilkinson's grammar]

Source: Wickham, H. (2010). “A Layered Grammar of Graphics.”

Basic components

  • Data and aesthetic mappings
  • Geometric objects
  • Labels

Data and aesthetic mappings

# data and aesthetics
ggplot(data, mapping = aes(x, y, ...))
  • data is a data frame (or a variant of one, such as a tibble)
  • mapping defines the aesthetic mappings of data
    • its input is the output of the aes() function
    • x and y are the columns in data to be mapped to the x-axis and y-axis
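
As a minimal sketch of the call above, assuming the built-in mtcars data frame as stand-in data (it is not part of the workshop material): wt is mapped to the x-axis and mpg to the y-axis. The geom_point() layer is added only so that something is drawn.

```r
library(ggplot2)

# mtcars ships with R; it is used here purely as stand-in data
p <- ggplot(mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point()

print(p)
```

Without a geom, the same ggplot() call would produce only an empty panel with the two axes.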

aes components

aes component   Description                    Input
colour          Line, point, or border color   Name ("red"), RGB specification ("#FF0000"), or NA
fill            Fill color                     Name ("red"), RGB specification ("#FF0000"), or NA
shape           Shape of a point               An integer value 0 to 25, or NA
linetype        Line type                      An integer value 0 to 6, or a string
size            Size of line/point             A non-negative numeric value
alpha           Transparency                   A numeric value 0 to 1
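
The components in the table can be either mapped to a column (inside aes()) or set to a constant (outside aes()). The sketch below illustrates the difference; it assumes the built-in mtcars data, and the variable names p_mapped and p_set are only illustrative.

```r
library(ggplot2)

# Mapped: colour varies with a data column, so it goes inside aes()
p_mapped <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point(size = 3)

# Set: constant values go outside aes(), as arguments to the geom.
# Shape 21 is a filled circle, so both colour (border) and fill apply.
p_set <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(colour = "red", fill = "#FF0000", shape = 21,
             size = 3, alpha = 0.5)

print(p_mapped)
print(p_set)
```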


ggplot2 aesthetics

[Figure: ggplot2 aesthetics]

Source: Figure 1.1 in Wilke, C. (n.d.). Data Visualization.


shape values

[Figure: shape values]

Source: Tidyverse. (n.d.). “Aesthetic specifications”. ggplot2.tidyverse.org.


linetype values

[Figure: linetype values]

Source: Tidyverse. (n.d.). “Aesthetic specifications”. ggplot2.tidyverse.org.

Geometric objects

# adding one or more geometric objects
ggplot(data, aes(x, y, ...)) +
  geom_*()

# with geom-specific `aes`
ggplot(data) +
  geom_*(aes(x, y, ...))
  • There are many geometric objects, or geoms, for different graph types
  • Each geom can take its own mapping input
    • By default, a geom inherits the mapping from ggplot()
    • Any aes specification can also be given directly to an individual geom
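
A small sketch of this inheritance, assuming the built-in mtcars data: x and y are declared once in ggplot() and shared by both layers, while the colour mapping is given to geom_point() only, so the fitted line is not split by colour.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +        # x and y are inherited by every layer
  geom_point(aes(colour = factor(cyl))) +  # colour mapping for this layer only
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # one line for all points

print(p)
```

Moving colour = factor(cyl) into the ggplot() call would instead make geom_smooth() inherit it and fit one line per group.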

Basic geoms

geom                     Description                 Input
geom_histogram           Histograms                  Continuous x
geom_bar                 Bar plot with frequencies   Discrete x
geom_col                 Bar plot with values        Discrete x and continuous y
geom_point               Points/scatterplots         Discrete/continuous x and y
geom_jitter              Jittered points             Discrete/continuous x and y
geom_line                Line plots                  Discrete/continuous x and y
geom_abline              Reference line              intercept and slope values
geom_hline, geom_vline   Reference lines             yintercept or xintercept
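
The difference between geom_bar (frequencies) and geom_col (values) can be sketched as follows. This assumes the built-in mtcars data rather than the workshop data, and the summary table avg is a hypothetical helper built for the illustration.

```r
library(ggplot2)

# geom_bar() counts the rows in each category itself
p_bar <- ggplot(mtcars, aes(factor(cyl))) +
  geom_bar()

# geom_col() draws the y values it is given, here a precomputed mean
avg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
p_col <- ggplot(avg, aes(factor(cyl), mpg)) +
  geom_col()

print(p_bar)
print(p_col)
```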
# geom histogram example
data <- ispcrime %>% filter(year == 2015, county != "Cook")
ggplot(data, aes(violentCrime)) +
  geom_histogram()

# geom col example
data <- ispcrime %>% filter(county == "Cook") %>% gather("type", "count", murder:aggAssault)
ggplot(data, aes(type, count, fill = type)) +
  geom_col(width = 0.8)

# geom point example
data <- ispcrime %>% filter(county != "Cook") %>% left_join(regions)
ggplot(data, aes(violentCrime, propertyCrime, color = region)) +
  geom_point(aes(size = violentCrime + propertyCrime), alpha = .5)

# geom line example
data <- ispcrime %>% filter(county == "Cook")
ggplot(data, aes(year, violentCrime)) +
  geom_line(color = "maroon", size = 1.5) +
  geom_hline(yintercept = mean(data$violentCrime), linetype = "longdash")

Other geoms

geom                    Description                        Input
geom_density            Smoothed density estimates         Continuous x
geom_density2d          Contours of a 2-D density estimate Continuous x and y
geom_boxplot            Box plots                          Discrete x and continuous y
geom_smooth             Smoothed conditional means         Continuous x and y
geom_text, geom_label   Text                               x, y, and label
geom_polygon            Polygons                           x and y

Labels

# adding labels
plot + labs(title, subtitle, caption, x, y, ...)
  • Each argument of labs() takes a single string (a character vector of length 1)
    • title and subtitle appear at the top-left
    • caption appears at the bottom-right
    • x and y set the x-axis and y-axis names
  • The position and style of labels are adjusted via theme()

alternatively …

plot +
  xlab(label) +
  ylab(label) +
  ggtitle(label, subtitle = NULL)
  • The main labels set in labs() can also be added with separate functions
    • xlab() is for x-axis name
    • ylab() is for y-axis name
    • ggtitle() is for plot title and subtitle
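
The two styles can be put side by side. The sketch below assumes the built-in mtcars data; the label texts and the names p_labs and p_funs are made up for illustration. Both plots should carry the same title, subtitle, and axis names.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# All labels in a single labs() call
p_labs <- base +
  labs(title = "Fuel economy by weight", subtitle = "mtcars data",
       x = "Weight (1000 lbs)", y = "Miles per gallon")

# The same labels with the separate helper functions
p_funs <- base +
  ggtitle("Fuel economy by weight", subtitle = "mtcars data") +
  xlab("Weight (1000 lbs)") +
  ylab("Miles per gallon")

print(p_labs)
print(p_funs)
```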
# a generic example with title, subtitle, and axes names
plot +
  labs(
    title = "This is plot title", subtitle = "This is plot subtitle",
    x = "x-axis here", y = "y-axis here",
    caption = "(and caption...)"
  )

# a title with mathematical expressions
plot +
  ggtitle(label = expression(paste("Another plot title with math expressions like ", pi, " and ", sigma^{2})))

Additional components

  • Scales
  • Guides
  • Facets
  • Coordinate systems
  • Themes

Scales

  • Scales control “the details of how data values are translated to visual properties”
  • Scale limits
  • Position scales (discrete, continuous, datetime)
  • Others

Scale limits

plot +
  xlim(...) +
  ylim(...) +
  lims(...)
  • xlim() changes x-axis limits
  • ylim() changes y-axis limits
  • lims() is a general function to change limits
  • ... in xlim() and ylim() are numeric values to set lower and upper limit for the corresponding axis
  • ... in lims() is a name-value pair, where the name is an aesthetic and the value is either a length-2 numeric, a character, a factor, or a datetime
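
A sketch of how scale limits behave, assuming the built-in mtcars data (the cut-off values are arbitrary). Setting limits on a scale turns out-of-range values into NA before drawing, whereas coord_cartesian() only zooms the view and keeps all data; the latter is included here for contrast.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Scale limit: points with wt > 4 become NA and are dropped when drawn
p_lim <- base + xlim(NA, 4)

# lims() sets limits for several aesthetics in one call
p_lims <- base + lims(x = c(2, 4), y = c(10, 35))

# Zoom only: the view is restricted but no data are removed
p_zoom <- base + coord_cartesian(xlim = c(1.5, 4))

# Count the x values that were censored by each approach
sum(is.na(layer_data(p_lim)$x))
sum(is.na(layer_data(p_zoom)$x))
```

The distinction matters most for layers that compute statistics, such as geom_smooth() or geom_boxplot(), because censored points are excluded from the computation.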
# limit x axis to 2000 at the top;
# this removes points with violentCrime > 2000
plot + xlim(NA, 2000)

Position scales (continuous)

scale_x_continuous(..., expand = waiver(), trans = "identity", position = "bottom")
scale_y_continuous(..., expand = waiver(), trans = "identity", position = "left")

# shortcuts for common transformations
scale_x_log10(...)
scale_y_log10(...)

scale_x_sqrt(...)
scale_y_sqrt(...)

scale_x_reverse(...)
scale_y_reverse(...)

Common scale_* arguments

Argument   Description
name       Name of the scale, used as the axis label or the legend title
breaks     Positions of the breaks in the guide: a numeric vector for continuous scales, a character vector for discrete scales
labels     Label for each break; its input must be the same length as the breaks input
limits     Data range of the scale: a length-2 numeric vector for continuous scales, a character vector of allowed values for discrete scales
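
The four common arguments can be seen together in one call. This is a sketch on the built-in mtcars data, with break positions and label texts chosen only for illustration; a log10 shortcut is applied to the y-axis at the same time.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  scale_x_continuous(
    name   = "Weight",
    breaks = c(2, 3, 4, 5),                             # where ticks are drawn
    labels = c("2k lbs", "3k lbs", "4k lbs", "5k lbs"), # one label per break
    limits = c(1, 6)                                    # lower and upper bound
  ) +
  scale_y_log10(name = "Miles per gallon (log scale)")

print(p)
```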

Position scales (discrete)

scale_x_discrete(..., expand = waiver(), position = "bottom")
scale_y_discrete(..., expand = waiver(), position = "left")

Position scales (datetime)

scale_x_date(...)
scale_y_date(...)

scale_x_datetime(...)
scale_y_datetime(...)

scale_x_time(...)
scale_y_time(...)
# apply the log 10 scale to the y-axis 
plot + scale_y_log10()

Custom scale “manuals”

scale_*_manual(name, breaks, labels, limits, ..., values)
  • Manual scales let you define your own discrete scale by supplying the values directly
  • “Manual” is available for:
    • colour
    • fill
    • size
    • shape
    • linetype
    • alpha
plot + scale_color_manual(
  name = "",
  breaks = c("Central", "Northern", "Southern"),
  labels = c("Central region", "Northern region", "Southern region"),
  values = c("#00ffff", "#ffff00", "#ff00ff")
)
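
For a version that runs without the workshop data, the sketch below applies the same idea to the built-in mtcars data. The named vector cyl_colours is a hypothetical helper; naming the values ties each colour to a specific level rather than relying on their order.

```r
library(ggplot2)

# One colour per level of factor(cyl); the names match the levels
cyl_colours <- c("4" = "#1b9e77", "6" = "#d95f02", "8" = "#7570b3")

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  scale_colour_manual(
    name   = "Cylinders",
    breaks = c("4", "6", "8"),
    labels = c("Four", "Six", "Eight"),
    values = cyl_colours
  )

print(p)
```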

Other custom scales

ggplot2 offers many more functions to customize scales.

See the scales section of the ggplot2 reference at ggplot2.tidyverse.org for the full documentation.

Guides

guides(...)
guide_legend(...)
guide_colourbar() # equivalent to guide_colorbar()
  • guides() sets (or removes) the guide for each scale
  • guide_legend() specifies the legend for a visual property (e.g. colour, size, alpha)
  • guide_colourbar(), or guide_colorbar(), controls the continuous colour bar
  • guide_legend() and guide_colourbar() can be passed to guides() as the value for each scale's argument
plot + guides(
  colour = guide_legend(title = "Region", title.position = "bottom"),
  size = "none"  # removes the size legend; FALSE is deprecated in newer ggplot2 versions
)

Coordinate systems

plot + coord_cartesian()
  • The default system is coord_cartesian()
    • Variants and alternatives include coord_fixed(), coord_flip(), coord_map(), and coord_trans()
  • An alternative, the polar coordinate system, is available through coord_polar()
    • Most commonly used for creating a pie chart
# default plot
plot

# with coord_flip()
plot + coord_flip()

# pie chart with coord_polar()
cook <- ispcrime %>%
  filter(county == "Cook") %>%
  gather("type", "count", murder:aggAssault)
ggplot(cook, aes("", count, fill = type)) +
  geom_col(width = 1) +
  coord_polar("y")

Facets

plot + facet_grid(facets, scales, ...)
plot + facet_wrap(facets, nrow, ncol, scales, ...)
  • A great way to visualize multi-dimensional data as a series of 2D graphs
  • facets input takes a “formula” according to which the faceting is applied

facet_grid vs facet_wrap

  • facet_grid() and facet_wrap() both split a plot into panels by the values of one or more variables
  • They differ in how the panels are laid out:
    • facet_grid() arranges the panels in a grid, with one variable defining the rows and/or another defining the columns
    • facet_wrap() places the panels one after another and wraps them according to the provided number of columns and/or rows

facet formulas

Type   Formula                Description
Grid   facet_grid(. ~ x)      Facet horizontally across x values
Grid   facet_grid(y ~ .)      Facet vertically across y values
Grid   facet_grid(y ~ x)      Facet 2-dimensionally
Wrap   facet_wrap(~ x)        Facet across x values
Wrap   facet_wrap(~ x + y)    Facet across x and y values
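
As a self-contained sketch of the two layouts, assuming the built-in mtcars data in place of the workshop data: the same two variables are used once as rows and columns of a grid, and once as a wrapped sequence of panels.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Grid: rows are defined by am, columns by cyl
p_grid <- base + facet_grid(am ~ cyl)

# Wrap: one panel per cyl/am combination, laid out in 3 columns
p_wrap <- base + facet_wrap(~ cyl + am, ncol = 3)

print(p_grid)
print(p_wrap)
```

In the grid, row and column strips label the two variables separately; in the wrapped version, each panel carries both values in its own strip.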
# facet_grid horizontal
plot + facet_grid(. ~ region)

# facet_grid horizontal with free scales
plot + facet_grid(. ~ region, scales = "free")

# facet_grid vertical
plot + facet_grid(year ~ .)

# facet_grid two-dimensional
plot + facet_grid(year ~ region)

# facet wrap
plot + facet_wrap(~ year)

# facet wrap with specified nrow/ncol
plot + facet_wrap(~ year, ncol = 3)

# facet wrap with multiple variables
plot + facet_wrap(~ year + region, ncol = 3)

Themes

# themes
plot + theme_gray(base_size = 11, base_family = "")
  • ggplot2 offers several predefined themes
    • the default theme is theme_gray() (or theme_grey())
    • base_size controls the base font size
    • base_family controls the base font family (“serif”, “sans”, “mono”)
  • The ggthemes package offers additional predefined themes
plot + theme_gray() # this is the default

plot + theme_bw()

plot + theme_linedraw()

plot + theme_light()

plot + theme_dark()

plot + theme_minimal()

plot + theme_classic()

plot + theme_void()

plot + ggthemes::theme_economist()

plot + ggthemes::theme_fivethirtyeight()

plot + ggthemes::theme_hc()

plot + ggthemes::theme_solarized()

plot + theme(...)
  • theme() has arguments to control and modify individual components of a plot theme:
    • all line, rectangular, text and title elements
    • axis title, text, ticks, and lines
    • legend background, margin, text, title, position, and more
    • panel aspect ratio, border, and grid lines
    • and more
  • The full documentation is in the theme() entry of the ggplot2 reference at ggplot2.tidyverse.org
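
A brief sketch of adjusting individual theme components on top of a predefined theme, assuming the built-in mtcars data. The particular settings are arbitrary choices for illustration, not recommendations from the workshop.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  labs(title = "Fuel economy by weight", colour = "Cylinders") +
  theme_minimal(base_size = 12) +
  theme(
    plot.title       = element_text(face = "bold", hjust = 0.5), # bold, centred title
    legend.position  = "bottom",        # move the legend below the panel
    panel.grid.minor = element_blank(), # drop minor grid lines
    aspect.ratio     = 1                # square panel
  )

print(p)
```

The theme() call comes after theme_minimal() because a complete theme added later would overwrite the individual settings.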

ggplot2 resources

Questions?

References

  • Grolemund, G. & Wickham, H. (2017). “Data visualization”. R for Data Science.
  • Tidyverse. (n.d.). “References”. ggplot2.tidyverse.org
  • Wickham, H. (2010). “A Layered Grammar of Graphics”. Journal of Computational and Graphical Statistics 19(1):3-28.
  • Wilkinson, L. (2005). The Grammar of Graphics.
  • Wilkinson, L., Rope, D., Carr, D. & Rubin, M. (2000). “The Language of Graphics”. Journal of Computational and Graphical Statistics 9(3):530-543.
