This page contains the notes for the second part of R Workshop Module 4: Data visualization with R, which is part of the R Workshop series prepared by ICJIA Research Analyst Bobae Kang to enable and encourage ICJIA researchers to take advantage of R, a statistical programming language that is one of the most powerful modern research tools.
In the previous part, we studied basic plotting with ggplot2. In this part, we will explore some options for creating maps and interactive plots.
Many criminal justice datasets are geospatial; that is, they are collected for particular geographic units, such as counties. In such cases, maps offer an excellent way to visualize the geographic distribution of certain attributes and trends.
In this section, we begin by learning about one of the most popular geospatial data formats, the shapefile. Then we will touch on how to import shapefile data into R using the rgdal package, an R interface to the Geospatial Data Abstraction Library (GDAL) for reading and writing geospatial data formats. Finally, we will take a quick look at spatial data types in R.
In practice, many of the geospatial datasets we work with come in the shapefile format. So, what is a shapefile? The following two quotes offer brief answers to the question:
“A shapefile is a simple, nontopological format for storing the geometric location and attribute information of geographic features. Geographic features in a shapefile can be represented by points, lines, or polygons (areas).”
- “What is a shapefile?”, Esri
“The shapefile format is a popular geospatial vector data format for geographic information system (GIS) software […] developed and regulated by Esri […]. The shapefile format can spatially describe vector features: points, lines, and polygons […]. Each item usually has attributes that describe it, such as name or temperature.”
- “Shapefile”, Wikipedia
A shapefile in fact consists of a collection of files that share the same base name. The following files are commonly included when using shapefile data in R:

| File extension | Description |
|---|---|
| .shp | The main file that stores the feature geometry; required. |
| .shx | The index file that stores the index of the feature geometry; required. |
| .dbf | The dBASE table that stores the attribute information of features; required. |
| .prj | The file that stores the coordinate system information; used by ArcGIS. |

```r
library(rgdal)
spatial_object <- readOGR(dsn, layer)
# example:
# il_counties <- readOGR(dsn = "shapefiles", layer = "il_counties")
```
In practice, mapping often starts with importing an existing shapefile into the R environment. The rgdal package offers the readOGR() function for this task. dsn is the path to the directory containing the shapefile to import, and layer is the name of the shapefile (its file name without the .shp extension). The output is a spatial vector object.
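The rgdal package has since been retired from CRAN, so readOGR() may not be available in a current R installation. A minimal sketch of an alternative, assuming the sf and sp packages are installed: read the shapefile with sf::st_read() and convert the result to an sp object with as(x, "Spatial"). The nc shapefile bundled with sf stands in for an ICJIA shapefile here.

```r
library(sf)
library(sp)

# Example shapefile bundled with sf (North Carolina counties); swap in your own path
shp_path <- system.file("shape/nc.shp", package = "sf")

# st_read() accepts the .shp path directly (dsn and layer can also be given separately)
nc_sf <- st_read(shp_path, quiet = TRUE)

# Convert the sf object to an sp Spatial*DataFrame, the kind of object readOGR() returns
nc_sp <- as(nc_sf, "Spatial")
class(nc_sp)
```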
The sp package provides multiple spatial vector object types:
- Spatial* classes store only spatial information (coordinates, lines, etc.) with no other attributes. They include SpatialPoints, SpatialMultiPoints, SpatialPixels, SpatialGrid, SpatialLines, and SpatialPolygons.
- Spatial*DataFrame classes additionally store attributes in a data.frame table, which can be accessed using the standard methods for accessing data in a data.frame.
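To make the distinction concrete, here is a small sketch, assuming the sp package is installed, that turns an ordinary data frame into a SpatialPointsDataFrame. The two points and their coordinates are made up for illustration.

```r
library(sp)

# Hypothetical points: approximate longitude/latitude of two Illinois cities
pts <- data.frame(
  lon  = c(-89.65, -87.63),
  lat  = c(39.80, 41.88),
  name = c("Springfield", "Chicago")
)

# Promote the data frame to a SpatialPointsDataFrame; lon/lat become the coordinates
coordinates(pts) <- ~ lon + lat
class(pts)

# The remaining attributes live in a data.frame and use the usual accessors
pts$name
pts@data

# geometry() drops the attributes, leaving a plain Spatial* object
class(geometry(pts))
```

Note that the coordinate columns are moved out of the attribute table, so pts@data keeps only name.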
```r
class(counties)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
```
The icjiar package provides a spatial object named counties for counties in Illinois. Using the class() function, we find that counties is of the SpatialPolygonsDataFrame class.
This section covers two packages:
- tmap package for thematic maps
- leaflet package for interactive maps
tmap
“With the tmap package, thematic maps can be generated with great flexibility. The syntax for creating plots is similar to that of ggplot2. The add-on package tmaptools contains tool functions for reading and processing shape files.”
- “tmap in a nutshell”
While there are a variety of ways to plot maps using R, in my experience tmap is the most accessible R package for generating maps; it makes it easy to produce visually appealing maps from data in the shapefile format.
The qtm() function

```r
qtm(shape_object, ...)
```
tmap offers the qtm() function to generate a “quick thematic map”. qtm() is comparable to qplot() in ggplot2 (we didn’t cover qplot() in the previous part; run ?qplot in your R console to see its documentation).
Although the goal of qtm() is to provide a quick and easy way to generate a tmap object, it offers nearly the same level of flexibility as the main plotting interface using tm_*() functions. The main interface is still recommended for complex plots.
Here is an example of using qtm() to quickly generate a map of Illinois counties, colored by their judicial circuits.

```r
library(tmap)
qtm(counties, fill = "circuit")
```
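The counties object comes from the icjiar package, which may not be available outside ICJIA. A minimal sketch of the same idea, assuming tmap is installed, uses the World dataset that ships with tmap and fills countries by its continent column (tmap's example data, not ICJIA data):

```r
library(tmap)

# World: country polygons with attribute columns, bundled with tmap as example data
data(World)

# Quick thematic map with polygons filled by a categorical column
m <- qtm(World, fill = "continent")
m  # printing the tmap object draws the map
```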
The tm_*() interface

```r
tm_shape(shape_object) +
  tm_*() # add tmap elements as layers
```
tmap’s main interface for creating maps uses tm_*() functions. qtm() is a quick-and-dirty way to create a map and, in fact, is almost as flexible as the tm_*() way. However, as we add more options to tweak our map, qtm() loses its major benefit of simplicity. In contrast, the tm_*() way allows us to organize our code better.
In the tm_*() way, we begin with tm_shape(), which creates a tmap object from a spatial “shape object.” Then we add layers to it, just as we would with ggplot2 plotting.
There are three kinds of layers in tmap: two types of “drawing” layers (base and derived) and “attribute” layers.
Base tmap drawing layers
The following table lists tmap’s base drawing layers, with their descriptions and options for aesthetic mappings.

| Layer | Description | Aesthetics |
|---|---|---|
| tm_polygons | Draws polygons | col |
| tm_symbols | Draws symbols | size, col, shape |
| tm_lines | Draws lines | col, lwd |
| tm_raster | Draws a raster | col |
| tm_text | Adds text labels | text, size, col |

The table is reproduced from a tmap vignette page.
Derived tmap drawing layers
The following table lists tmap’s derived drawing layers, with their descriptions and options for aesthetic mappings.

| Layer | Description | Aesthetics |
|---|---|---|
| tm_fill | Fills the polygons | see tm_polygons |
| tm_borders | Draws polygon borders | none |
| tm_bubbles | Draws bubbles | see tm_symbols |
| tm_squares | Draws squares | see tm_symbols |
| tm_dots | Draws dots | see tm_symbols |
| tm_markers | Draws markers | see tm_symbols and tm_text |
| tm_iso | Draws iso/contour lines | see tm_lines and tm_text |

The table is reproduced from a tmap vignette page.
tmap attribute layers
The following table lists tmap’s attribute layers with their descriptions.

| Layer | Description |
|---|---|
| tm_grid | Adds coordinate grid lines |
| tm_credits | Adds a credits text label |
| tm_compass | Adds a map compass |
| tm_scale_bar | Adds a scale bar |

The table is reproduced from a tmap vignette page.
This example code shows how to create the same map in the tm_*() way.

```r
tm_shape(counties) +
  tm_borders() +
  tm_fill(col = "circuit")
```
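Building on that example, here is a sketch that stacks drawing layers and attribute layers from the tables above. It assumes tmap is installed and uses tmap's bundled World dataset in place of counties; the credits text is a placeholder.

```r
library(tmap)

# World: country polygons bundled with tmap as example data
data(World)

m <- tm_shape(World) +
  tm_fill("continent") +                  # derived drawing layer: fill polygons
  tm_borders() +                          # derived drawing layer: polygon borders
  tm_compass() +                          # attribute layer: map compass
  tm_credits("Data: tmap World dataset")  # attribute layer: credits text
m
```

Layers are drawn in the order they are added, so the borders sit on top of the fill.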
```r
tm_layout(title = NA, scale = 1, title.size = 1.3, bg.color = "white",
  aes.color = c(fill = "grey85", borders = "grey40", symbols = "grey60",
    dots = "black", lines = "red", text = "black", na = "grey75"), ...)
```
tmap offers tm_layout(), a function to control all the layout settings, including title, fonts, colors, backgrounds, and much more.
Using tm_layout() to tweak layout settings until you find the right look can be overwhelming and tedious. As alternatives, there are tm_style_*() and tm_format_*() functions to easily apply predefined layout settings.
More specifically, tm_style_*() functions offer predefined sets of styling-related layout settings, such as background colors, colors, and fonts; these are comparable to ggplot2 themes. On the other hand, tm_format_*() functions offer predefined sets of position-related layout settings, such as margins.
The following table lists the predefined styles tmap offers and their descriptions.

| Style | Description |
|---|---|
| tm_style_white | White background, commonly used colors (default) |
| tm_style_gray | Gray background, useful to highlight sequential palettes (e.g., in choropleths) |
| tm_style_natural | Emulation of natural view: blue waters and green land |
| tm_style_bw | Greyscale |
| tm_style_classic | Classic styled maps |
| tm_style_col_blind | Style for colorblind viewers |
| tm_style_cobalt | Inspired by the LaTeX Beamer style cobalt |
| tm_style_albatross | Inspired by the LaTeX Beamer style albatross |
| tm_style_beaver | Inspired by the LaTeX Beamer style beaver |

As mentioned before, “style” in tmap is roughly comparable to “theme” in ggplot2. Here is one example of applying a style to the same tmap plot we created above.
```r
tm_shape(counties) +
  tm_borders() +
  tm_fill(col = "circuit") +
  tm_style_classic()
```
Try out other “styles” and see how they change your map!
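Newer tmap releases (2.0 onward) replace the individual tm_style_*() functions with a single tm_style() function that takes the style name as a string. If tm_style_classic() is not found in your installation, a sketch of the equivalent call, assuming the "classic" style is available in your tmap version and using tmap's bundled World data in place of counties:

```r
library(tmap)
data(World)

# Newer tmap API: pass the style name as a string instead of calling tm_style_classic()
m <- tm_shape(World) +
  tm_borders() +
  tm_fill("continent") +
  tm_style("classic")
m
```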
The following table lists the predefined formats tmap offers and their descriptions.

| Format | Description |
|---|---|
| tm_format_World | Format specified for world maps |
| tm_format_World_wide | For world maps with more space for the legend |
| tm_format_Europe | For maps of Europe |
| tm_format_Europe_wide | For maps of Europe with more space for the legend |
| tm_format_NLD | For maps of the Netherlands |
| tm_format_NLD_wide | For maps of the Netherlands with more space for the legend |

“Format” has more to do with the spacing and arrangement of plot elements. Take a look at the following plot with a predefined format. Notice that now the plot is made “wider”.
```r
tm_shape(counties) +
  tm_borders() +
  tm_fill(col = "circuit") +
  tm_format_World_wide()
```
Try out other “formats” and see how they change your map!
```r
tmap_mode("plot") # set to static "plot" mode
tmap_mode("view") # set to interactive "view" mode
ttm()             # toggle between modes
```
tmap offers two different “modes” for generating maps. The “plot” mode, which is the default, generates a static map image. The “view” mode generates an interactive leaflet map. We can use the tmap_mode() function to set the mode, and ttm() is a shortcut function to toggle between modes. Note that the selected mode applies to all tmap plots generated in that session until it is reset.
Additionally, tm_view() is a function to specify options for the interactive “view” mode, some of which can also be set with tm_layout().
If we want to create an interactive leaflet map without changing the mode setting for the whole session, we can use tm_leaflet(), which takes a tmap object as its argument.
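A sketch of this per-map conversion, assuming a recent tmap release in which the conversion function is named tmap_leaflet() (check ?tmap_leaflet if your version differs), with tmap's bundled World data standing in for counties:

```r
library(tmap)
data(World)

# Build an ordinary tmap object
m <- qtm(World, fill = "continent")

# Convert just this map into an interactive leaflet widget;
# the session-wide tmap_mode() setting is left unchanged
lf <- tmap_leaflet(m)
lf  # printing the widget opens it in the RStudio viewer or a browser
```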
Here we try the simple qtm() call we saw earlier in the interactive view mode. The default options come with many useful features, including on-click tooltips (try clicking anywhere on the Illinois map) and an option for changing base maps (try hovering your mouse pointer over the button below the zoom buttons).
```r
tmap_mode("view")
qtm(counties, fill = "circuit")
```
leaflet
“Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. It’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB.”
- “Leaflet for R”, RStudio
leaflet is a powerful library for generating interactive maps, perhaps one of the most powerful options that are freely available.
However, it takes much time and practice to get familiar with its API, even when using it from R. There is no function in leaflet equivalent to qtm() that quickly generates a map with default settings. That said, tmap’s interactive view mode can be a great alternative to using leaflet’s API directly.
Nonetheless, if you are serious about creating beautiful interactive maps, it certainly pays to learn leaflet.
The following example creates a simple leaflet map. Note that you have to define pretty much everything you want to see on your map. On the one hand, leaflet provides a great degree of flexibility; on the other hand, it takes many more lines of code to get a relatively simple output.
```r
library(leaflet)

# Map each judicial circuit to a color
pal <- colorFactor(topo.colors(5), counties$circuit)

leaflet(counties) %>%
  addProviderTiles("CartoDB.Positron") %>%  # base map tiles
  addPolygons(fillColor = ~pal(circuit), color = "darkgrey", weight = 2) %>%
  addLegend(pal = pal, values = ~circuit)
```
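Because counties is not publicly available, here is a variation on the example above that anyone can run, assuming the sf and leaflet packages are installed. It uses the North Carolina shapefile bundled with sf and a numeric (rather than categorical) fill, so it swaps colorFactor() for colorNumeric(). It also adds a hover label, which is an illustrative choice rather than part of the original example.

```r
library(sf)
library(leaflet)

# Example shapefile bundled with sf (North Carolina counties), not ICJIA data
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# leaflet expects longitude/latitude coordinates (WGS84, EPSG:4326)
nc <- st_transform(nc, 4326)

# Continuous palette for a numeric attribute (births in 1974)
pal <- colorNumeric("YlOrRd", domain = nc$BIR74)

m <- leaflet(nc) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(
    fillColor = ~pal(BIR74), fillOpacity = 0.7,
    color = "darkgrey", weight = 1,
    label = ~NAME  # county name shown on hover
  ) %>%
  addLegend(pal = pal, values = ~BIR74, title = "Births, 1974")
m
```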
Other options for drawing maps in R include:
- The sp package offers spplot(), which builds on the lattice package. spplot() takes an object of a spatial class as its source data, making it easy to work with imported shapefiles (see Eubank, N. (2015) in Resources for more).
- ggplot2 can use geom_polygon() and/or geom_map() to plot spatial polygons with some modifications (see Kahle, D. & Wickham, H. (2013) in Resources for more).
- ggmap is an R package offering functions to easily download base maps from various sources, which can be used with ggplot2 layers (see Kahle, D. & Wickham, H. (2013) in Resources or the ggmap GitHub repo for more).

Here are some resources on drawing maps in R. I strongly recommend reading through the tmap vignettes to get started with generating good-looking maps with tmap.
Also, if you are interested in digging deeper into spatial data manipulation in R based on the sp package’s spatial classes, make sure you check out the online manual/book by Hijmans (2016).
Finally, a new paradigm for working with spatial objects in R has recently emerged and is implemented by the sf package. To learn more about sf, take a look at Pebesma (n.d.) and the package vignettes listed on that webpage.
- leaflet official documentation page
- tmap GitHub repository
More compelling visualizations can be created by incorporating interactivity. We have already seen some interactive plots with tmap’s view mode and leaflet. Here we will explore some options for creating general-purpose plots with interactive features. Please note that this section is not meant to serve as a comprehensive or exhaustive manual for any of the introduced packages.
This section introduces three packages:
- ggiraph: an htmlwidget package for interactive ggplot2 graphics
- plotly: R API for the plotly.js library
- highcharter: R API for the Highcharts JavaScript library
ggiraph
“ggiraph is an htmlwidget and a ggplot2 extension. It allows ggplot graphics to be animated.”
- Gohel, D. (package author/creator)
ggiraph offers interactive geoms to use in a ggplot2 plot, and it renders a plot containing interactive geoms as an interactive visualization.
```r
p <- plot + geom_*_interactive(...)
ggiraph(code = print(p), ...)
```
To add interactivity to a ggplot2 plot with ggiraph, we first need to create a ggplot object (plot in the code chunk above) with interactive “geom” layers. The ggiraph() function then uses the ggplot object with interactive layers to generate an interactive plot.
ggiraph offers 12 interactive “geom” layers that can be integrated into a ggplot object, including the following: geom_bar_interactive, geom_boxplot_interactive, geom_histogram_interactive, geom_line_interactive, geom_map_interactive, geom_path_interactive, geom_point_interactive, geom_polygon_interactive, geom_rect_interactive, geom_segment_interactive, geom_text_interactive, and geom_tile_interactive.

```r
aes(tooltip, onclick, data_id)
```
Each interactive “geom” has mappings for the following interactive elements:
- tooltip is a column containing information to be displayed as a tooltip.
- onclick is a column containing JavaScript instructions to run on a “click” event.
- data_id is a column containing an id to be associated with elements. Please note that this mapping must be specified to use a customized “hover” effect.

First, we get the data we will use, which is a filtered subset of the ispcrime dataset joined with the regions dataset.
```r
library(dplyr)

# Drop Cook County, then join the regions dataset (adds the region column)
data <- ispcrime %>% filter(county != "Cook") %>% left_join(regions)
```
Since ggiraph requires a ggplot object to convert into an interactive plot, we create one. Note that we use the geom_point_interactive() layer instead of ggplot2’s native geom_point().
```r
library(ggplot2)
library(ggiraph)

p <- ggplot(data, aes(x = violentCrime, y = propertyCrime, color = region)) +
  geom_point_interactive(aes(tooltip = county, data_id = county))
```
Now we can plug the plot into ggiraph() to make it interactive. This example passes a hover_css argument to specify the hover effect.

```r
ggiraph(code = print(p), hover_css = "fill:orange; fill-opacity:.3; cursor:pointer;")
```
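In more recent ggiraph versions, ggiraph() has been superseded by girafe(), which takes the ggplot object through its ggobj argument and the hover styling through opts_hover(). A sketch with R's built-in mtcars data, assuming a recent ggiraph is installed:

```r
library(ggplot2)
library(ggiraph)

# Example data: mtcars with car names moved into a column for tooltips
cars <- mtcars
cars$car <- rownames(mtcars)

p <- ggplot(cars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point_interactive(aes(tooltip = car, data_id = car), size = 3)

# girafe() renders the plot; hover CSS is passed via opts_hover()
w <- girafe(
  ggobj = p,
  options = list(opts_hover(css = "fill:orange;stroke:black;cursor:pointer;"))
)
w
```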
plotly
“Plotly’s R graphing library makes interactive, publication-quality graphs online. Examples of how to make line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, and 3D (WebGL based) charts.”
- “Plotly R Library”, plotly
The ggplotly() function

```r
ggplotly(p = ggplot2::last_plot(), ...)
```
plotly offers the ggplotly() function as a quick and easy way to convert a ggplot object into an interactive plotly object.
The first argument of ggplotly(), p, is the ggplot object to be made interactive. By default, p is the most recently created or modified ggplot object (ggplot2::last_plot()), if there is one.
We use the same subset of ispcrime we created earlier to try out ggplotly(). As with ggiraph, we then create a ggplot object, but this time without any special layer.
```r
# using the same data
p <- ggplot(data, aes(violentCrime, propertyCrime, colour = region)) +
  geom_point() + labs(title = "Using ggplotly()")
```
We then simply plug the ordinary ggplot object into ggplotly() to get an interactive plot with some nice default features. These features include tooltips, the ability to zoom in and out, an option to download the plot as a static image, and many more.
```r
library(plotly)
ggplotly(p)
```
Try the interactive features of this plot to understand them better.
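By default, the tooltips show every mapped variable. A sketch of narrowing them with ggplotly()'s tooltip argument, assuming ggplot2 and plotly are installed and using R's built-in mtcars data instead of ispcrime:

```r
library(ggplot2)
library(plotly)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(title = "ggplotly() with mtcars")

# Show only the x and y aesthetics in tooltips instead of all mapped variables
w <- ggplotly(p, tooltip = c("x", "y"))
w
```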
The plot_ly() interface

```r
plot_ly(data, x, y, color, alpha, symbol, size, ...)
# add_trace() with a given type is equivalent to the add_<type>() shortcuts
add_trace(p, ..., type = "type", inherit = TRUE)
```
Using the plot_ly() function, we can create a native plotly visualization. First, we use plot_ly(), which takes a data frame and defines the aesthetic mappings. Then we add “traces” to the plotly object, which are comparable to “geom” layers in ggplot2.
In fact, we can specify traces within the plot_ly() function itself. However, using add_trace() and its variants makes it possible to use a dplyr-style workflow with pipe operators and to better organize the code. By default, each trace added using add_trace() inherits the mappings from p.
add_*() functions
plotly offers various traces to use. The following table lists some add_*() functions and their descriptions.

| Function | Description | Equivalent to add_trace(...) |
|---|---|---|
| add_trace() | adds traces with options | NA |
| add_markers() | adds a scatterplot | type = "scatter", mode = "markers" |
| add_lines() | adds a line plot | type = "scatter", mode = "lines" |
| add_bars() | adds a bar plot | type = "bar" |
| add_histogram() | adds a histogram | type = "histogram" |
| add_boxplot() | adds a box plot | type = "box" |
| add_pie() | adds a pie chart | type = "pie" |
| add_text() | adds text | type = "scatter", mode = "text" |
| add_polygons() | adds polygons | type = "scatter", mode = "lines", fill = "toself" |

The following code uses the plot_ly() interface to replicate the plot we created above. Here we use add_markers() to generate a scatterplot. The output is a plot with a native plotly layout.
```r
library(plotly)

plot_ly(data, x = ~violentCrime, y = ~propertyCrime, color = ~region) %>%
  add_markers() %>%
  layout(title = "Using plot_ly() interface")
```
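To check the equivalence listed in the add_*() table, a small sketch, assuming plotly is installed and using R's built-in mtcars data: it builds the same scatterplot with add_markers() and with add_trace(type = "scatter", mode = "markers"), then compares the trace type and mode that plotly_build() resolves for each.

```r
library(plotly)

# Two ways to draw the same scatterplot
p1 <- plot_ly(mtcars, x = ~wt, y = ~mpg) %>%
  add_markers()
p2 <- plot_ly(mtcars, x = ~wt, y = ~mpg) %>%
  add_trace(type = "scatter", mode = "markers")

# plotly_build() resolves each object into the trace list sent to plotly.js
t1 <- plotly_build(p1)$x$data[[1]]
t2 <- plotly_build(p2)$x$data[[1]]
c(t1$type, t1$mode)
c(t2$type, t2$mode)
```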
highcharter
“Highcharter is a R wrapper for Highcharts javascript libray and its modules. Highcharts is very mature and flexible javascript charting library and it has a great and powerful API.”
-Kunst, J. (package author)
Highcharts is one of my favorite JavaScript visualization libraries, offering elegant plots with interactive features. In fact, you may have already seen Highcharts plots in ICJIA R&A Unit’s online articles (e.g., Figure 1 and Figure 2 in this article).
The hchart() function

```r
hchart(data, type, hcaes(x, y, ...))
```
highcharter offers the hchart() function to quickly generate a Highcharts plot; it is comparable to qplot() in ggplot2. type defines the type of plot (e.g., “scatter” for a scatterplot), and hcaes() works like aes() in ggplot2 to define aesthetic mappings.
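Like qplot(), hchart() can also pick a sensible chart from the input alone. A small sketch, assuming highcharter is installed and using R's built-in mtcars data: passing a numeric vector with no type draws a histogram of its values.

```r
library(highcharter)

# A numeric vector with no type given is charted as a histogram
h <- hchart(mtcars$mpg)
h
```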
The following example uses hchart() to quickly generate the same scatterplot we made earlier. data is the same subset of ispcrime.
```r
library(highcharter)

hchart(data, type = "scatter", hcaes(x = violentCrime, y = propertyCrime, group = region)) %>%
  hc_title(text = "Using hchart() interface")
```
The highchart() interface

```r
highchart() %>%
  hc_add_series(...) %>% # add a "series"
  hc_xAxis(...) %>%      # define x-axis
  hc_yAxis(...) %>%      # define y-axis
  hc_title(...) %>%      # add the main title
  hc_chart(...) %>%      # modify general plot options
  hc_colors(...) %>%     # control colors
  hc_*(...)              # and more...
```
There is also the highchart() interface, which creates a Highcharts plot in a way similar to the original JavaScript interface. In general, this is a more exacting way of creating a plot, but learning it can be highly rewarding. A potential compromise is to use hchart() to get the basic plot and chain additional hc_*() functions to fine-tune it.
Here is the same scatterplot created using the highchart() interface.
```r
library(highcharter)

highchart() %>%
  hc_add_series(data, type = "scatter", hcaes(x = violentCrime, y = propertyCrime, group = region)) %>%
  hc_title(text = "Using highchart() interface")
```
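The compromise described above, starting from hchart() and chaining hc_*() functions, might look like the following sketch. It assumes highcharter is installed and uses R's built-in mtcars data; the axis titles are illustrative.

```r
library(highcharter)

# Basic plot from hchart(), then fine-tuned with hc_*() functions
h <- hchart(mtcars, type = "scatter", hcaes(x = wt, y = mpg, group = cyl)) %>%
  hc_xAxis(title = list(text = "Weight (1,000 lbs)")) %>%
  hc_yAxis(title = list(text = "Miles per gallon")) %>%
  hc_title(text = "hchart() plus hc_*() options")
h
```

Each hc_*() call sets one block of Highcharts options, so you can add only the pieces you need.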
As usual, we have only scratched the surface of these interactive plotting packages. I recommend picking one package you are interested in and reading through its official documentation to gain a deeper understanding of its API.
- ggiraph official documentation page
- highcharter official documentation page
- plotly official documentation page
- plotly for R

To be honest, the interactive plotting packages we have explored above are only those I have some experience with. As you might have suspected, there are many more R packages for interactive plotting. The following are a few such packages that I find interesting:
- rAmCharts is an R interface to the amCharts JavaScript library that offers interactive options for many common plot types and more. See the rAmCharts online documentation for more.
- chartjs is an R interface to the Chart.js JavaScript library and offers six chart types (bar, line, pie, doughnut, radar, and polar area) for interactive plots. See the chartjs website for more.
- googleVis offers an R API to Google Charts, which provides a rich set of interactive charts and data tools. See the googleVis tutorial slides by the package creator for more information.
- dygraphs is an R interface to the dygraphs JavaScript library for interactive time-series plots. See the dygraphs online documentation for more.

There are also R packages for visualizing specific kinds of data. I find the following three to be particularly interesting and worth exploring:
- visNetwork (a wrapper for vis.js) and networkD3 are two popular packages for interactive visualization of network/graph data in R. Start with the igraph or tidygraph package to learn how to work with network objects in R.
- wordcloud2 (a wrapper for wordcloud2.js) is a package for creating word clouds, a popular way to visualize text data.
- data.tree is an R package for managing as well as visualizing hierarchical data and tree structures.