R Workshop

Module 1: Introduction to R

2018-03-14
Bobae Kang
(Bobae.Kang@illinois.gov)

Introduction to the Workshop

Instructor (me!)

Workshop objectives

This workshop will help you get started and provide you with basic skills and techniques for using R in research and data analysis.

Ultimately, this workshop seeks to help you gain the knowledge and confidence necessary to learn what you need to know for your own research projects.


  • Import and manipulate tabular data files using R
  • Create simple data visualizations (scatterplot, histogram, bar chart, line chart, etc.) to extract insight from data using R
  • Perform basic statistical analysis using R
  • Generate a report on a simple data analysis task using R


  • Understand the basic elements of the R programming language
  • Employ the programmatic approach to research and data analysis projects
  • Leverage online resources to find solutions to specific questions on using R for a given task

A programming approach to research

[Image] Source: pixabay.com

GUI workflow vs. programmatic workflow

GUI workflow

  • Download datasets (to the Download folder)
  • Examine each dataset in MS Excel
  • Copy + paste data into a single spreadsheet
  • Open the combined dataset in SPSS
  • Run a regression analysis in SPSS
  • Write a report in MS Word
  • Submit the report

Programmatic workflow

  • Set up a directory
  • Write an R program to download datasets
  • Write an R program to combine and clean datasets
  • Write an R program to run regression analysis and draw plots
  • Write an R program to generate a report
  • Submit the report
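
As a rough sketch of what the programmatic workflow above can look like, the following script sets up a directory, reads and combines two data files, runs a regression, draws a plot, and saves a plain-text summary. The file names are hypothetical, and the built-in `mtcars` dataset stands in for downloaded data; both are assumptions made only for illustration.

```r
# Set up a project directory (a temporary one here, for illustration)
project_dir <- file.path(tempdir(), "my_project")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# Stand-in for "download datasets": split a built-in dataset into two files
part1_path <- file.path(project_dir, "part1.csv")
part2_path <- file.path(project_dir, "part2.csv")
write.csv(mtcars[1:16, ], part1_path, row.names = FALSE)
write.csv(mtcars[17:32, ], part2_path, row.names = FALSE)

# Combine and clean the datasets
combined <- rbind(read.csv(part1_path), read.csv(part2_path))
combined <- combined[complete.cases(combined), ]

# Run a regression analysis and draw a plot
fit <- lm(mpg ~ wt, data = combined)
plot_path <- file.path(project_dir, "mpg_vs_wt.pdf")
pdf(plot_path)
plot(combined$wt, combined$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
invisible(dev.off())

# Save a plain-text summary of the model as a very simple "report"
report_path <- file.path(project_dir, "report.txt")
writeLines(capture.output(summary(fit)), report_path)
```

Every step lives in the script, so rerunning it repeats the whole analysis without any manual copying and pasting.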

Benefits of a programming approach

  • Automation
  • Modularity
  • Reproducibility
  • Version control

Automation

  • Implementing research tasks as programs that can be run later to carry out the work automatically
  • Producing consistent results
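
A minimal sketch of the idea, using three datasets that ship with R as stand-ins for real research data: the same steps are applied to every dataset in one pass instead of being repeated by hand.

```r
# Built-in datasets used here only as placeholders for real data
datasets <- list(iris = iris, mtcars = mtcars, airquality = airquality)

# Apply the same steps to every dataset instead of repeating them by hand
row_counts <- sapply(datasets, nrow)
column_means <- lapply(datasets, function(df) {
  # Keep numeric columns only and ignore missing values
  colMeans(df[sapply(df, is.numeric)], na.rm = TRUE)
})
```

Because the steps are written down once, running them again on the same data gives the same results.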

Modularity

In software design, modularity refers to a logical partitioning of the “software design” that allows complex software to be manageable for the purpose of implementation and maintenance.
- “Modularity”, Wikipedia

  • Breaking down different stages or steps of research work into smaller but meaningful parts
  • Separate programs for separate tasks
  • Writing custom functions
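
One possible way to picture this is a small analysis split into custom functions, each handling a single step. The function names below are hypothetical, and the built-in `airquality` dataset is used only as an example.

```r
# Each function handles one step; all function names are hypothetical
clean_data <- function(df) {
  # Drop rows that contain missing values
  df[complete.cases(df), ]
}

fit_model <- function(df) {
  # Simple linear regression of ozone on temperature
  lm(Ozone ~ Temp, data = df)
}

report_slope <- function(model) {
  # Extract the estimated temperature coefficient, rounded for reporting
  round(coef(model)[["Temp"]], 2)
}

cleaned <- clean_data(airquality)
model <- fit_model(cleaned)
slope <- report_slope(model)
```

Each piece can be tested, reused, or replaced on its own without touching the others.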

Reproducibility

Reproducibility refers to the ability of a researcher to duplicate the results of a prior study using the same materials and procedures as were used by the original investigator. […] Reproducibility is a minimum necessary condition for a finding to be believable and informative.
- U.S. NSF Subcommittee on Replicability in Science

  • Greater productivity in a collaborative project
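
One small, concrete aspect of reproducibility in R is fixing the random seed, so that a step involving random numbers gives the same result each time the script is run. The seed value below is arbitrary; this is a sketch of the idea rather than a complete reproducibility recipe.

```r
# Fix the seed so that the random draw can be repeated exactly
set.seed(2018)
first_run <- sample(1:100, 5)

# Resetting the same seed reproduces the same draw
set.seed(2018)
second_run <- sample(1:100, 5)

identical(first_run, second_run)  # TRUE
```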

Version control

  • The practice of managing changes in a document or a program in a systematic fashion
  • Protecting the work from (unintentional) corruptions
  • An example of version control system: Git

Introducing ...

[1] "Hello World!"
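
The output above is what the R console prints for a one-line program such as the following:

```r
# Print a character string to the console
print("Hello World!")
```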

[Image] Source: r-project.org

What is R?

“R is a language and environment for statistical computing and graphics.”
- The R Foundation

  • Built for data analysis and visualization
  • One of the most popular programming languages among academic researchers and data scientists

[Figure] Source: David Robinson, 2017, “The Impressive Growth of R”

Why R?

[Image] Source: flickr.com

(Because … DUH!)

[Image] Source: Reaction GIFs

And more reasons

  • Open source (free!)
  • Built for statistical analysis
  • Reproducible and transparent
  • Extensible through powerful third-party libraries
  • Enabling researchers to tackle a variety of tasks using a single platform

Comparisons

R vs MS Excel

  • License cost
  • Speed and scalability
  • Visualization
  • Complex and advanced analysis
  • Reproducibility

R vs IBM SPSS

  • License cost (again)
  • Syntax
  • Visualization
  • Reporting

R vs Tableau

  • License cost (DUH!)
  • Reproducibility
  • Data manipulation
  • Complex and advanced analysis

Conclusion

  • No intention to degrade other tools
    • Ease of use for intended tasks
  • R as a great addition to any researcher's toolbox
    • Highly performant, versatile, and flexible

[Image] Source: RStudio

What is RStudio? Why use it?

  • Best Integrated Development Environment (IDE) for R
  • Powerful and convenient features
  • Interactive workflow
  • Open source (free again!)
  • … and many more!

Basic Setup

[Image] Source: Wikimedia.org

Installing R

  • Visit https://cran.r-project.org/
  • Or simply google “download R” to find the link to the download page

  • Installation requires the Administrator account

    • Talk to DoIT!
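
Once installation finishes, one quick way to confirm that R works is to open the R console and ask which version is running. The exact version string will differ from machine to machine.

```r
# Show the version of R that is currently running
R.version.string
```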

Installing RStudio

Workshop Overview

Module 2

R basics

  • Part 1. Fundamentals of R programming
    • R objects, expressions, functions, environments
  • Part 2. Gearing up for data analysis
    • tidyverse framework
    • Recommended R style guide

Module 3

Data analysis with R

  • Part 1. Getting started with tidyverse
    • Manipulating data with dplyr
    • Tidying up data with tidyr
  • Part 2. More on data analysis
    • Character strings
    • Dates/datetimes
    • Importing/exporting data

Module 4

Data visualization with R

  • Part 1. The Grammar of Graphics
    • ggplot2 package
  • Part 2. Maps and interactive plots
    • Packages for maps
    • Packages for interactive plots

Module 5

Statistical modeling with R

  • Part 1. Basics of statistical modeling
    • Descriptive statistics
    • Linear models and generalized linear models
  • Part 2. Options for advanced modeling
    • Survival analysis
    • Time series analysis
    • Spatial regression analysis
    • Machine learning

Module 6

“To Infinity and Beyond”

  • Part 1. Sharing your work
    • R Markdown documents
    • Presentation slides
    • Shiny applications
    • Websites
  • Part 2. Leveraging online resources
    • Various online resources

Questions?

[Image] Source: tenor.com

References