This page contains the notes for the second part of R Workshop Module 6: “To Infinity and Beyond”, which is part of the R Workshop series prepared by ICJIA Research Analyst Bobae Kang to enable and encourage ICJIA researchers to take advantage of R, a statistical programming language that is one of the most powerful modern research tools.

“To Infinity and Beyond” (2): Leveraging online resources

In this final part, we will explore options for getting help and finding resource materials to answer our questions. R users are blessed with a wealth of information that is freely available online from a variety of sources. Once we know where to look when we need help, we are well equipped to solve most problems in R.


Why online resources?

One of the key reasons we must know how to leverage online resources is simply that we cannot know everything.

In fact, no one knows everything! Many advanced R programmers still rely on the Internet, partly because the ecosystem keeps evolving and partly because they, too, are only human and have imperfect memories.

For us emerging R programmers, one of the greatest benefits of knowing where to look online comes from the fact that “Someone has already done it.” The questions we ask are often not so unique. Even when we are asking truly unique questions, there are many others who have worked on similar questions or on questions that address some part of ours. And, as the R community is very active online, we are likely to find some information on the Internet about what others have done!

Source: AZ Quotes


Before going online

Source: Wikimedia Commons

“The 15 minute rule”

But, wait. Before going online, you should first try to solve the problem on your own. If you cannot solve it after “15 minutes,” then go online looking for advice.

Even though you shouldn’t be reinventing the wheel all the time, “first trying it yourself” will help you become an independent thinker/programmer. In the long run, this will prove to be a critical skill for adapting to the changing R ecosystem.

“Oops, my bad”

When faced with some error, remember that we all make typos. Always. So we should check for typos!

Unfortunately, RStudio does not catch every typo in our code automatically. However, the error message that a typo triggers usually points to what went wrong.

Also, check if a package is loaded before using its functions. A common error message we might see would be the following:

Error in some_function() : could not find function "some_function"

Make sure the intended package is loaded with library() or the function is defined before using it.
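As a minimal sketch of this situation, the snippet below triggers the error on purpose inside tryCatch() so we can inspect the message without stopping the script. The name some_function is just the placeholder from the message above, not a real function.

```r
# Calling a function that is neither defined nor loaded raises an error.
# tryCatch() captures the message instead of halting the script.
msg <- tryCatch(
  some_function(),
  error = function(e) conditionMessage(e)
)
print(msg)

# exists() tells us whether R can currently find a function by that name.
found_before <- exists("some_function", mode = "function")

# Defining the function (or loading its package with library()) fixes it.
some_function <- function() "it works"
found_after <- exists("some_function", mode = "function")
result <- some_function()
print(result)
```

For a function that lives in a package, the equivalent fix is calling library() on that package before using the function.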

Help function

# these are equivalent
?some_function
help(some_function)

Looking into the documentation is often the best way to understand what a function is and how to use it. If documentation is available, we can bring it up by typing ? followed by the function name or by calling help().
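Beyond ? and help(), base R has a few related lookup tools that may be useful when we do not remember the exact function name. The sketch below uses mean and read.csv only as familiar examples.

```r
# Look up a help topic by name; printing `page` would display it.
page <- help("mean")

# Search installed documentation by keyword when the name is unknown.
# (`??keyword` is shorthand for help.search("keyword").)
# help.search("standard deviation")

# List objects on the search path whose names match a pattern.
readers <- apropos("^read\\.")
print(readers)

# Show the arguments a function accepts without opening its help page.
print(args(read.csv))
```

apropos() is handy when we remember only part of a name, while args() gives a quick reminder of a function's parameters.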

Error messages and debugging

When an error is thrown, it comes with an error message. Error messages often have rich information about what went wrong and where it went wrong.

If we are working with custom functions we defined, RStudio’s debugging tools can help us to spot the source of an error in the script and debug it. See this article on debugging with RStudio. Also, see this video by RStudio on introduction to debugging.
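As a rough illustration of reading an error, the sketch below defines a small custom function (to_proportions is a made-up name for this example), triggers an error on purpose, and pulls apart the error object to see what went wrong and where.

```r
# A small custom function with a hidden assumption: x must be numeric.
to_proportions <- function(x) {
  x / sum(x)
}

# Capture the error object instead of letting it stop the script.
err <- tryCatch(
  to_proportions(c("1", "2", "3")),
  error = function(e) e
)

# The message says what went wrong; the call says where.
print(conditionMessage(err))
print(conditionCall(err))

# The fix suggested by the message: convert the input first.
fixed <- to_proportions(as.numeric(c("1", "2", "3")))
print(fixed)

# For interactive digging, base R offers traceback(), debug(), and browser().
```

In an interactive session, calling traceback() right after an error prints the chain of calls that led to it, which complements RStudio's debugging tools.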

Google

Source: Google.com

How to google for questions

“Googling” is a great technique to find answers to our own questions. Here are some tips to effectively take advantage of Google.

  1. Be succinct and specific. The search term should be a set of keywords. Here, package names and/or function names often make good keywords. Also, using the relevant error message as a search term can help.
  2. Look at questions/answers on platforms like Stack Overflow and Quora.
  3. Refer to “official” resources if available. From your search results, first try the links to CRAN, RStudio, package GitHub repositories, and other “official” resource materials.
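The first tip can be sketched in code. The helper below, google_url, is a hypothetical name made up for this example; it simply pastes a set of keywords (here including a quoted error message) onto Google's search address using URLencode() from the utils package.

```r
# Hypothetical helper: turn keywords or an error message into a search URL.
google_url <- function(query) {
  paste0("https://www.google.com/search?q=",
         URLencode(query, reserved = TRUE))
}

# Succinct and specific: language, package, function, and the error text.
query <- 'r dplyr filter "object not found"'
url <- google_url(query)
print(url)

# In an interactive session, browseURL(url) would open it in the browser.
```

Putting the error text in double quotes asks the search engine for that exact phrase, which tends to surface posts from people who hit the same error.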

“Official” resources

Source: R Project

“Official” resources are those provided by authoritative entities, such as CRAN, RStudio, and package authors/maintainers. Though we can get to such “official” resources via Google search, knowing how to find them directly can speed up our search for answers.

CRAN website

The Comprehensive R Archive Network (CRAN) has many resources for R and R packages, including the following:

  • Manuals
  • Task Views
  • Package pages

Manuals

CRAN offers a set of official “manuals”, such as “An Introduction to R” and “Writing R Extensions”.

The Manuals page can be found under “Documentation” on the menu located on the left side of the CRAN website. Each manual can be viewed as an HTML page or downloaded as a PDF or EPUB file.

Task Views

A Task View offers a brief introduction to a particular topic and an annotated list of relevant R packages.

CRAN has task views on a selection of topics, such as Bayesian inference, econometrics, machine learning, and time series analysis.

The CRAN Task Views page can be found under “CRAN” on the menu located on the left side of the CRAN website.

Package pages

Each contributed package that is listed on CRAN has a page on the CRAN website. Here we can find a reference manual and vignettes for the package.

To get to a package page directly, try the following in your browser:

replacing [package-name] with any existing package name.
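As a small sketch, a package page address can also be built from the R console. The address pattern used here, https://cran.r-project.org/package=[package-name], is CRAN's short package link and is an assumption on my part about which form to use; cran_package_url is a made-up helper name.

```r
# Hypothetical helper: build the CRAN page address for a package.
# Assumes the short form https://cran.r-project.org/package=<name>.
cran_package_url <- function(package_name) {
  paste0("https://cran.r-project.org/package=", package_name)
}

dplyr_url <- cran_package_url("dplyr")
print(dplyr_url)

# In an interactive session, browseURL(dplyr_url) opens the page.
```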

Alternatively, we can search for a particular package through the CRAN website itself. The Packages page can be found under “Software” on the menu located on the left side of the CRAN website.

CRAN package page example (dplyr)

Here is an example page for the dplyr package. It offers detailed information about the current version available on CRAN as well as links to its reference manual and vignettes (in the red box).

Package reference manuals

R packages have reference manuals that contain documentation for all their contents, i.e. functions and datasets. Basically, a reference manual is a collection of help() pages in PDF format.

A reference manual can also be found by googling. Just try “package-name pdf” as your Google search term.

Package vignettes

Packages often have vignettes to introduce their contents. Some vignettes can be accessed via vignette() on the R console, for example vignette(package = "package-name") to list the vignettes of an installed package. Other vignettes are found on the package page on CRAN.

Unfortunately, not all packages have vignettes, so don’t be surprised when you cannot find vignettes for certain packages.
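The following sketch shows one way to check for vignettes from the R console. It uses the grid package only because it ships with R; any installed package name could be substituted.

```r
# List the vignettes that ship with an installed package.
# "grid" comes with base R; swap in any package you have installed.
listing <- vignette(package = "grid")
print(listing$results[, c("Item", "Title"), drop = FALSE])

# Open one vignette by its topic name (interactive use):
# vignette("grid", package = "grid")

# Browse all vignettes for a package in the web browser (interactive use):
# browseVignettes("grid")
```

If the printed listing has no rows, the package simply has no vignettes installed.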

RStudio website

RStudio’s website offers many useful resources under the “Resources” menu, including the following:

Cheat sheets

Currently, 13 RStudio cheat sheets are available, including:

  • “Data Transformation with dplyr”
  • “Data Import”
  • “Data Visualization with ggplot2”
  • “Dates and times with lubridate”
  • “Work with strings with stringr”

There are about 15 user-made cheat sheets as well.

Some RStudio cheat sheets can also be found in the RStudio IDE menu at “Help > Cheatsheets”.

The image below shows the Cheat Sheets page on the RStudio website under the “Resources” menu.


And the following is the example cheat sheet for using dplyr to manipulate tabular datasets in R.

Webinars & videos

RStudio’s webinars and videos offer materials covering a variety of subjects. Some materials are organized by topics, including:

  • “RStudio Essentials”
  • “Shiny Essentials” and “Advanced Shiny”
    • Some videos here are also available via the Shiny website
  • “The Essentials of Data Science”
  • “Advanced Data Science”

Materials from RStudio’s annual conference, rstudio::conf, are also made available.

Tidyverse website

The tidyverse has its own website to introduce tidyverse packages, share updates and news on the tidyverse, and offer guides to training materials.

There are also child websites for many of the tidyverse packages with a standardized URL: “[package-name].tidyverse.org”.

Tidyverse child websites

The following table lists tidyverse’s child websites for some of its packages:

Package   | Description                     | URL
ggplot2   | For data visualization          | http://ggplot2.tidyverse.org/
dplyr     | For data manipulation           | http://dplyr.tidyverse.org/
tidyr     | For tidying up data             | http://tidyr.tidyverse.org/
readr     | For data import/export          | http://readr.tidyverse.org/
purrr     | For better loops                | http://purrr.tidyverse.org/
tibble    | For extending data.frame        | http://tibble.tidyverse.org/
stringr   | For working with strings        | http://stringr.tidyverse.org/
forcats   | For working with factors        | http://forcats.tidyverse.org/
readxl    | For importing Excel files       | http://readxl.tidyverse.org/
haven     | For SPSS, SAS, and Stata data   | http://haven.tidyverse.org/
lubridate | For working with datetimes      | http://lubridate.tidyverse.org/
magrittr  | For specialized pipe operators  | http://magrittr.tidyverse.org/

R Markdown website

RStudio has a separate website focused on all things R Markdown.

The R Markdown website has useful resources such as its Articles page, which offers a number of tutorials on creating various sorts of R Markdown documents, and the Formats page, which provides links to reference materials on various R Markdown formats and templates.

Shiny website

RStudio also has a separate website on everything Shiny. Some of the useful resource materials can be found in the following pages:

First, its Video & written tutorials page has links to tutorial videos and articles on Shiny as well as recorded conference presentations and webinars.

Second, the Articles page offers a list of web articles on building Shiny applications.

Finally, the Reference page contains links to upgrade notes and function references for the latest as well as previous versions of the Shiny package.

htmlwidgets website

The htmlwidgets for R website presents brief descriptions and examples for various packages that bring interactive widgets into the R ecosystem.

Currently, there are about 100 widgets registered as htmlwidgets. Visit its “Gallery” page to see what widgets are available.

Some popular htmlwidgets packages include:

  • plotly and highcharter for interactive visualizations
  • leaflet for interactive maps
  • DT for interactive data tables


R Community

Source: “Community (TV series)”, Wikipedia

One of the greatest strengths of R is its highly active and diverse community. Naturally, a lot of quality resource materials on the Internet come from members of the R community.

R-bloggers

R-bloggers is a blog that collects and features articles and blog posts on R and programming in R from a variety of sources.

The blog offers an excellent way to stay up-to-date on new packages and developments in the R community. Its posts cover new updates in R and major R packages, tutorials, information on upcoming events and conferences, and much more.

Online “books”

There are many “books” written by R community members that are freely available online. Some excellent online books are as follows:

Also, visit the bookdown package website to find many more free online books on R!

R for Data Science

Source: R for Data Science

I especially recommend R for Data Science by Hadley Wickham and Garrett Grolemund as your first R book. Much of this workshop is inspired by this book. It is written for beginners and covers key concepts and applications of R programming for data analysis.

Helpful websites

There are many excellent websites providing tutorials and learning materials on R and data analysis with R. The following are some of my personal favorites:

And, of course, take advantage of this workshop’s website! :)


GitHub repositories

Source: GitHub

What is GitHub?

“GitHub is a development platform inspired by the way you work. From open source to business, you can host and review code, manage projects, and build software alongside millions of other developers.” - GitHub.com

Most R packages are available as GitHub repositories, which can be “cloned” and downloaded if desired. Here we can view the source code that shows what the package functions are doing under the hood to get the results they promise. Not only can we better understand what the functions are doing, we can also use the source code as an inspiration for writing our own functions or even packages.
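Even before visiting GitHub, we can peek at a function's source from the R console. The sketch below uses sd() from the stats package as an example because it is short and written in R; functions implemented in compiled code will show less.

```r
# Printing a function's name without parentheses shows its R source.
print(stats::sd)

# deparse() returns the same source as text, handy for searching it.
src <- deparse(stats::sd)
uses_var <- any(grepl("var(", src, fixed = TRUE))
print(uses_var)

# To install a package's development version from GitHub (not run here),
# the remotes package provides install_github():
# remotes::install_github("tidyverse/dplyr")
```

The GitHub repository remains the better place to read a whole package, since it also holds files such as tests and documentation sources.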

Also, many R package authors offer brief explanations and even quick tutorials for their packages on the GitHub repositories.

GitHub repository example (dplyr)

Here is a screenshot of the dplyr GitHub repository:


Online courses

Source: worldview.stanford.edu

If you are the kind of person who learns best by taking courses on the subject matter, you can take advantage of online courses on R.

DataCamp

DataCamp is one of the best websites out there for learning R. It appears that many programmers and package authors at RStudio have courses on DataCamp. This means that you can learn about certain packages directly from the authors!

DataCamp requires registration and log-in to take the courses. There are some free courses available, but most are paid courses with one free chapter. The cost is $25/month with the annual plan or $29/month otherwise. Once you are subscribed, all courses are made available.

DataCamp offers 70+ courses on R. In general, a course is short (~4 hours) and focused on a specific topic. DataCamp’s R courses cover materials that range from basic to intermediate level.

Coursera

Coursera is a MOOC (massive open online course) site that works with universities and offers learning materials that feel like college courses on a variety of subjects. Coursera has courses, specializations, and online degrees. You can find out more about differences between these options here.

Coursera requires registration and log-in. Once you are logged in, you can “audit” any course for free. For a small fee ($29 to $99), you can get a course certificate and online support.

Some notable contents on Coursera include:

  • Data Science Specialization (10 courses)
  • Statistics with R Specialization (5 courses)

edX

edX is another MOOC site that offers university-level courses, which are generally free and self-paced. Like Coursera, edX courses are usually organized in a college-course-like format.

Taking courses on edX requires registration and log-in. For a small fee, edX offers a verified certificate for individual courses and an XSeries certificate for XSeries programs.

edX is perhaps better for learning basics on topics like:

  • Computer science and programming
  • Probability and statistics

References