# template elements
2018-04-18
Bobae Kang
(Bobae.Kang@illinois.gov)
Source: Wikimedia Commons
“Survival analysis is used to analyze data in which the time until the event is of interest. The response variable is the time until that event and is often called a failure time, survival time, or event time.”
- Harrell Jr. (2015).
\[ S(t) = \text{Pr}(T > t),\quad 0 < t < \infty \]
\[ \lambda(t) = \lim_{dt\to0} \frac{\text{Pr}(t \leq T < t + dt)}{S(t)dt} = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)},\quad f(t) = -S'(t) \]
survival package
survminer package for visualization
\[ \hat{S}(t) = \prod_{t_i \leq t} \Big( 1 - \frac{d_i}{n_i} \Big) \]
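As a minimal sketch of the product-limit formula above, the estimate can be computed by hand and compared with `survfit()` from the survival package. The toy follow-up times and status values are hypothetical, made up for illustration only.

```r
library(survival)

# Hypothetical toy data: follow-up time and status (1 = event, 0 = censored)
time   <- c(2, 3, 3, 5, 6, 8, 9, 12)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)

# Distinct event times t_i
event_times <- sort(unique(time[status == 1]))

# n_i: number at risk just before t_i; d_i: number of events at t_i
n_i <- sapply(event_times, function(tt) sum(time >= tt))
d_i <- sapply(event_times, function(tt) sum(time == tt & status == 1))

# Product-limit (Kaplan-Meier) estimate at each event time
km_manual <- cumprod(1 - d_i / n_i)

# Same estimate from survfit(), evaluated at the event times
fit <- survfit(Surv(time, status) ~ 1)
km_survfit <- summary(fit, times = event_times)$surv

print(data.frame(event_times, n_i, d_i, km_manual, km_survfit))
```

The two columns should agree, since `survfit()` with an intercept-only formula computes the same estimator.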
survival::Surv(time, time2, event, type, ...)
For right-censored data, time is the follow-up time
For interval data, time is the starting time and time2 is the ending time
event is the status indicator, where 0 = alive (i.e., event has not occurred) and 1 = dead (i.e., event has occurred)
type is a character string specifying the censoring type
Surv() returns a Surv class object, used to fit survival models
survival::survfit(formula/model, data, ...)
survfit() returns a survival curve
The first argument is either a formula or a previously fitted model (e.g. Cox model)
formula for the KM estimator must have the Surv object as the response variable, e.g. Surv(time, status) ~ x
data is optional; if provided, the columns of the input data frame can be used in the formula
\[ \lambda(t|\boldsymbol{\text{x}}_i) = \lambda_0(t)\psi_i = \lambda_0(t)\text{exp}(\boldsymbol{\text{x}}_i^{\text{T}}\beta) \]
survival::coxph(formula, data, ...)
formula must have the Surv object as the response variable
data is optional; if provided, the columns of the input data frame can be used in the formula
plot(survfit)
survminer::ggsurvplot(survfit, data, ...)
survival package provides a plot method for survfit objects
survminer package offers an alternative way to plot survival curves
ggsurvplot() has a ggplot2-like API and makes ggplot2 themes available
Source: Wikimedia Commons
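Returning to the Cox model and `coxph()` from earlier in this section, a minimal sketch of a fit is given here. It assumes the `lung` data set that ships with the survival package; the choice of `age` and `sex` as covariates is purely illustrative. Because the model is \(\lambda_0(t)\exp(\boldsymbol{\text{x}}^{\text{T}}\beta)\), exponentiated coefficients can be read as hazard ratios.

```r
library(survival)

# lung ships with the survival package. Its status column is coded
# 1 = censored, 2 = dead, which Surv() also accepts.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# exp(beta): hazard ratio for a one-unit increase in each covariate
hazard_ratios <- exp(coef(fit))
print(hazard_ratios)

# Coefficient table with standard errors and Wald tests
print(summary(fit)$coefficients)
```

A hazard ratio below 1 for a covariate indicates a lower estimated hazard as that covariate increases, holding the others fixed.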
Decomposition
Seasonality
Stationarity
Differencing
Source: “Stationary process”, Wikipedia
stats package (part of R “base packages”)
forecast package
decompose(x, type = c("additive", "multiplicative"), ...)
stl(x, s.window, ...)
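A minimal sketch of calling both functions, assuming the built-in monthly `co2` series from base R's datasets package as example data (the data set choice is illustrative only):

```r
# co2: monthly atmospheric CO2 at Mauna Loa, a ts object with frequency 12
dec_fit <- decompose(co2, type = "additive")
stl_fit <- stl(co2, s.window = "periodic")

# decompose() returns a list with seasonal, trend and random components
str(dec_fit[c("seasonal", "trend", "random")], give.attr = FALSE)

# stl() stores its components as columns of a multivariate time series
head(stl_fit$time.series)

# For stl(), seasonal + trend + remainder reproduces the original series
recomposed <- rowSums(stl_fit$time.series)
print(all.equal(as.numeric(recomposed), as.numeric(co2)))

# plot(dec_fit) and plot(stl_fit) draw the component panels
```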
stats offers two functions for time series decomposition
decompose() uses moving averages
stl() uses LOESS (local regression)
stl() is often recommended for time series decomposition
s.window is either "periodic" or the span of the LOESS window for seasonal extraction (must be odd and at least 7)
acf(x, lag.max = NULL, plot = TRUE,
type = c("correlation", "covariance", "partial"), ...)
pacf(x, lag.max = NULL, plot = TRUE, ...)
x is a univariate time series
type = "correlation" is the default for an ACF plot
pacf() is equivalent to acf() with type = "partial"
\[ X_t = Z_t + \sum_{i=1}^p \phi_i X_{t-i}, \] \[ \text{rewritten as }\phi(B)X_t = Z_t \]
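To illustrate `acf()` and `pacf()` on the autoregressive model just defined, the sketch below simulates an AR(1) series. The coefficient 0.7, the series length and the seed are arbitrary assumptions. For such a process the ACF should decay gradually while the PACF should be close to zero beyond lag 1.

```r
set.seed(1)

# Simulate X_t = 0.7 * X_{t-1} + Z_t (AR(1) with an assumed phi = 0.7)
x <- arima.sim(model = list(ar = 0.7), n = 1000)

# plot = FALSE returns the estimates instead of drawing the correlogram
acf_vals  <- drop(acf(x, lag.max = 10, plot = FALSE)$acf)   # lags 0 to 10
pacf_vals <- drop(pacf(x, lag.max = 10, plot = FALSE)$acf)  # lags 1 to 10

# pacf() and acf(type = "partial") give the same estimates
pacf_via_acf <- drop(acf(x, lag.max = 10, type = "partial", plot = FALSE)$acf)

print(round(acf_vals, 2))
print(round(pacf_vals, 2))
```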
\[ X_t = Z_t + \sum_{i=1}^q \theta_i Z_{t-i}, \] \[ \text{rewritten as }X_t =\theta(B)Z_t \]
\[ \phi(B)X_t =\theta(B)Z_t \]
\[ \phi(B)(1-B)^dX_t =\theta(B)Z_t \]
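The role of the differencing term \((1-B)^d\) can be sketched with a simulated series. The ARIMA(1, 1, 0) specification, the coefficient 0.6 and the seed below are assumptions chosen for illustration. Fitting the original series with d = 1 and fitting the once-differenced series as an AR(1) should give similar estimates of \(\phi_1\).

```r
set.seed(42)

# Simulate an ARIMA(1, 1, 0) series with an assumed phi_1 = 0.6
x <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 1000)

# Fit with d = 1: arima() differences the series internally
fit_arima <- arima(x, order = c(1, 1, 0))

# Equivalent view: difference by hand, then fit a zero-mean AR(1)
fit_diff <- arima(diff(x), order = c(1, 0, 0), include.mean = FALSE)

phi_arima <- unname(coef(fit_arima)["ar1"])
phi_diff  <- unname(coef(fit_diff)["ar1"])
print(c(phi_arima = phi_arima, phi_diff = phi_diff))
```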
arima(x, order = c(0L, 0L, 0L), seasonal = list(order, period), ...)
x is a univariate time series
order is a specification of (p, d, q) for the ARIMA model, in that order
seasonal is a specification of the seasonal part of the ARIMA model, consisting of order and period
stats and tseries packages
Source: R Spatial
sp and sf
sp is a mature package but its objects have un-tidy structures
sf is a newer package better suited to the tidy framework
sf is an R implementation of the “Simple Features Access” standard (ISO 19125) for geospatial data
spdep functions use sp spatial objects
See the spdep reference manual.
spdep offers *2nb() functions to create neighbors (nb object)
poly2nb() for contiguity-based neighbors
knn2nb() for distance-based neighbors
tri2nb() for triangulation-based neighbors
cell2nb() for grid neighbors
nb2listw() is used to generate a list of spatial weights (listw object) from an nb object
poly2nb function
spdep::poly2nb(pl, row.names = NULL, queen = TRUE, ...)
pl is a list of polygons (e.g. SpatialPolygons class)
if queen is TRUE, polygons with a single shared boundary point are considered neighbors; if FALSE, more than one shared point is required
nb2listw function
spdep::nb2listw(neighbours, ...)
nb2listw() takes a neighbours object of class nb and returns a spatial weights list (listw)
\[ I = \frac{\boldsymbol{\text{e}}^\text{T}\boldsymbol{\text{W}}\boldsymbol{\text{e}}/S_0}{\boldsymbol{\text{e}}^\text{T}\boldsymbol{\text{e}}/n} = \frac{\boldsymbol{\text{e}}^\text{T}\boldsymbol{\text{W}}\boldsymbol{\text{e}}/S_0}{\hat{\sigma}^2_{ML}} \]
\[ I_z = \frac{I - \text{E}[I]}{\sqrt{\text{Var}[I]}} \sim N(0, 1) \]
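The formula for \(I\) can be sketched in base R without spdep. The assumptions here: a hypothetical 4 x 4 grid with rook neighbors, row-standardized weights, \(S_0\) taken as the sum of all weights, and \(\boldsymbol{\text{e}}\) taken as deviations from the mean (as for a plain variable rather than regression residuals). The helper name `moran_i` is made up for this example.

```r
# Moran's I: (e' W e / S0) / (e' e / n), with e = x - mean(x)
moran_i <- function(x, W) {
  e  <- x - mean(x)
  n  <- length(x)
  S0 <- sum(W)
  drop(crossprod(e, W %*% e) / S0) / (sum(e^2) / n)
}

# Hypothetical 4 x 4 grid; cells are rook neighbors when they share an edge
grid <- expand.grid(row = 1:4, col = 1:4)
A <- (abs(outer(grid$row, grid$row, "-")) +
      abs(outer(grid$col, grid$col, "-")) == 1) * 1
W <- A / rowSums(A)  # row-standardize so every row sums to 1

# Checkerboard: every neighbor has the opposite value
checker <- ifelse((grid$row + grid$col) %% 2 == 0, 1, -1)
# Gradient: values increase smoothly across rows
gradient <- grid$row

I_checker  <- moran_i(checker, W)
I_gradient <- moran_i(gradient, W)
print(c(checker = I_checker, gradient = I_gradient))
```

The checkerboard gives perfect negative spatial autocorrelation, while the gradient gives a positive value because neighboring cells hold similar values.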
spdep::moran.test(x, listw, ...)
spdep::lm.morantest(model, listw, ...)
moran.test() takes a numeric vector of data and a spatial weights list (listw) created by nb2listw()
lm.morantest() takes an lm object and a spatial weights list
a small p-value suggests spatial autocorrelation
spdep::lm.LMtests(model, listw, test = "LMerr")
lm.LMtests() takes an lm model and a spatial weights list
test inputs include:
"LMerr" and "LMlag" for spatial error and spatial lag models
"RLMerr" and "RLMlag" for robust LM tests
"SARMA" for spatial ARMA model
\[ \boldsymbol{\text{y}} = \rho\boldsymbol{\text{W}}\boldsymbol{\text{y}} + \boldsymbol{\text{X}}\beta + \boldsymbol{\text{u}} \]
\[ \boldsymbol{\text{y}} = \boldsymbol{\text{X}}\beta + \boldsymbol{\text{u}}, \text{where} \] \[ \boldsymbol{\text{u}} = \lambda\boldsymbol{\text{W}}\boldsymbol{\text{u}} + \varepsilon \]
spdep::lagsarlm(formula, data, listw, ...)
spdep::errorsarlm(formula, data, listw, ...)
formula and data work as in lm()
listw is a spatial weights list (listw object)
Source: Wikimedia Commons
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
- Tom M. Mitchell
caret package
mlr package

Source: Giphy