# template elements
2018-04-18
Bobae Kang
(Bobae.Kang@illinois.gov)
Source: Wikimedia Commons
“Survival analysis is used to analyze data in which the time until the event is of interest. The response variable is the time until that event and is often called a failure time, survival time, or event time.”
- Harrell Jr. (2015).
\[ S(t) = \text{Pr}(T > t),\quad 0 < t < \infty \]
\[ \lambda(t) = \lim_{dt\to0} \frac{\text{Pr}(t \leq T < t + dt)}{S(t)\,dt} = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)}, \quad f(t) = -S'(t) \]
- `survival` package
- `survminer` package for visualization
\[ \hat{S}(t) = \prod_{t_i \leq t} \Big( 1 - \frac{d_i}{n_i} \Big) \]
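To make the product-limit formula concrete, here is a minimal base R sketch that computes the estimate by hand. The sample values and variable names are hypothetical, chosen only for illustration; here \(n_i\) is taken to be the number of subjects still at risk just before \(t_i\) and \(d_i\) the number of events at \(t_i\).

```r
# Toy right-censored sample (hypothetical values, for illustration only)
time   <- c(2, 3, 3, 5, 6, 8, 8, 9)
status <- c(1, 1, 0, 1, 0, 1, 1, 0)   # 1 = event observed, 0 = censored

# Distinct event times t_i
event_times <- sort(unique(time[status == 1]))

# n_i: subjects still at risk just before t_i; d_i: events at t_i
n_i <- sapply(event_times, function(ti) sum(time >= ti))
d_i <- sapply(event_times, function(ti) sum(time == ti & status == 1))

# Product-limit estimate: cumulative product of (1 - d_i / n_i)
S_hat <- cumprod(1 - d_i / n_i)

km_table <- data.frame(time = event_times, n_risk = n_i,
                       n_event = d_i, survival = S_hat)
print(km_table)
```

Censored observations never add a factor to the product; they only shrink the risk set at later event times.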
survival::Surv(time, time2, event, type, ...)
- For right-censored data, `time` is the follow-up time
- For interval data, `time` is the starting time and `time2` is the ending time
- `event` is the status indicator, where 0 = alive (i.e., the event has not occurred) and 1 = dead (i.e., the event has occurred)
- `type` is a character string specifying the censoring type
- Returns a `Surv` class object, used to fit survival models

survival::survfit(formula/model, data, ...)
- `survfit()` returns a survival curve from either
  - a `formula`, or
  - a previously fitted `model` (e.g. Cox model)
- The `formula` for the KM estimator must have the `Surv` object as the response variable
  - e.g. `Surv(time, status) ~ x`
- `data` is optional; if provided, the columns of the input data frame can be used in the formula

\[ \lambda(t|\boldsymbol{\text{x}}_i) = \lambda_0(t)\psi_i = \lambda_0(t)\exp(\boldsymbol{\text{x}}_i^{\text{T}}\beta) \]
survival::coxph(formula, data, ...)
- `formula` must have the `Surv` object as the response variable
- `data` is optional; if provided, the columns of the input data frame can be used in the formula

plot(survfit)
survminer::ggsurvplot(survfit, data, ...)
- The `survival` package provides a `plot` method for `survfit` objects
- The `survminer` package offers an alternative way to plot survival curves
- `ggsurvplot()` has a `ggplot2`-like API and makes `ggplot2` themes available

Source: Wikimedia Commons
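The survival functions described above can be chained into one short workflow. The sketch below is illustrative only: it assumes the `lung` dataset bundled with the `survival` package, and the object names (`lung_df`, `km_fit`, `cox_fit`) and the choice of covariates are hypothetical, not taken from the original slides.

```r
library(survival)

# `lung` ships with the survival package; status is coded 1 = censored, 2 = dead
lung_df <- survival::lung
lung_df$event <- as.integer(lung_df$status == 2)   # recode to 0/1

# Response object: follow-up time plus event indicator
surv_obj <- Surv(lung_df$time, lung_df$event)
print(head(surv_obj))

# Kaplan-Meier curves by sex (1 = male, 2 = female)
km_fit <- survfit(Surv(time, event) ~ sex, data = lung_df)
print(km_fit)

# Cox proportional hazards model with age and sex as covariates
cox_fit <- coxph(Surv(time, event) ~ age + sex, data = lung_df)
print(summary(cox_fit)$coefficients)

# Base plot method for survfit objects
plot(km_fit, col = c("steelblue", "firebrick"),
     xlab = "Days", ylab = "Estimated survival probability")
```

If `survminer` is installed, `survminer::ggsurvplot(km_fit, data = lung_df)` should draw the same curves with `ggplot2` styling.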
Decomposition
Seasonality
Stationarity
Differencing
Source: “Stationary process”, Wikipedia
- `stats` package (part of R “base packages”)
- `forecast` package
decompose(x, type = c("additive", "multiplicative"), ...)
stl(x, s.window, ...)
- `stats` offers two functions for time series decomposition
  - `decompose()` uses moving averages
  - `stl()` uses LOESS (local regression)
- `stl()` is often recommended for time series decomposition
- `s.window` is either "periodic" or the span of the LOESS window for seasonal extraction (should be odd and at least 7)

acf(x, lag.max = NULL, plot = TRUE,
    type = c("correlation", "covariance", "partial"), ...)
pacf(x, lag.max = NULL, plot = TRUE, ...)
- `x` is a univariate time series
- `type = "correlation"` is the default for an ACF plot
- `pacf()` is equivalent to `acf()` with `type = "partial"`
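As a rough illustration of the decomposition and autocorrelation functions above, the following sketch uses the built-in `AirPassengers` series. The dataset, the log transform, and the `lag.max` value are assumptions made for the example, not choices from the original slides.

```r
# Monthly airline passenger counts, 1949-1960 (built-in ts, frequency 12)
x <- log(AirPassengers)   # log makes the seasonal swings roughly additive

# Classical decomposition via moving averages
dec <- decompose(x, type = "additive")

# LOESS-based decomposition; "periodic" forces a fixed seasonal pattern
fit_stl <- stl(x, s.window = "periodic")
plot(fit_stl)

# Autocorrelation and partial autocorrelation of the differenced series
dx <- diff(x)
acf_out  <- acf(dx, lag.max = 24, plot = FALSE)
pacf_out <- pacf(dx, lag.max = 24, plot = FALSE)

# pacf() and acf(type = "partial") should return the same values
same_pacf <- all.equal(
  as.numeric(pacf_out$acf),
  as.numeric(acf(dx, lag.max = 24, type = "partial", plot = FALSE)$acf)
)
print(same_pacf)
```

Dropping `plot = FALSE` draws the usual correlogram instead of only returning the values.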
\[ X_t = Z_t + \sum_{i=1}^p \phi_i X_{t-i}, \] \[ \text{rewritten as }\phi(B)X_t = Z_t \]
\[ X_t = Z_t + \sum_{i=1}^q \theta_i Z_{t-i}, \] \[ \text{rewritten as }X_t =\theta(B)Z_t \]
\[ \phi(B)X_t =\theta(B)Z_t \]
\[ \phi(B)(1-B)^dX_t =\theta(B)Z_t \]
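The operator shorthand in these models can be unpacked as follows. This assumes the usual conventions, which the source does not spell out: \(B\) is the backshift operator, \(Z_t\) is white noise, and \(\phi(B)\) and \(\theta(B)\) are polynomials in \(B\).

\[ BX_t = X_{t-1}, \quad B^iX_t = X_{t-i} \]
\[ \phi(B) = 1 - \sum_{i=1}^p \phi_i B^i, \quad \theta(B) = 1 + \sum_{i=1}^q \theta_i B^i \]
\[ (1-B)X_t = X_t - X_{t-1}, \quad (1-B)^2X_t = X_t - 2X_{t-1} + X_{t-2} \]

Under these definitions, \((1-B)^d\) is simply differencing applied \(d\) times, so an ARIMA model is an ARMA model for the differenced series.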
arima(x, order = c(0L, 0L, 0L), seasonal = list(order, period), ...)
- `x` is a univariate time series
- `order` is a specification of (p, d, q) for the ARIMA model, in that order
- `seasonal` is a specification of the seasonal part of the ARIMA model, consisting of `order` and `period`
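A minimal sketch of how `order` and `seasonal` fit together, again assuming the built-in `AirPassengers` series. The particular orders below are an illustrative choice, not a recommendation from the original slides.

```r
x <- log(AirPassengers)

# Seasonal ARIMA(0,1,1)(0,1,1)[12]: order = (p, d, q) plus a seasonal part
fit <- arima(x, order = c(0L, 1L, 1L),
             seasonal = list(order = c(0L, 1L, 1L), period = 12))
print(fit)

# Forecast the next 12 months and back-transform from the log scale
fc <- predict(fit, n.ahead = 12)
print(round(exp(fc$pred)))
```

`predict()` also returns standard errors in `fc$se`, which can be used to build rough prediction intervals on the log scale.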
- `stats` and `tseries` packages

Source: R Spatial
`sp` and `sf`
- `sp` is a mature package, but its objects have un-tidy structures
- `sf` is a newer package better suited to the tidy framework
- `sf` is an R implementation of the “Simple Features Access” standard (ISO 19125) for geospatial data
- `spdep` functions use `sp` spatial objects
- See the `spdep` reference manual.
- `spdep` offers `*2nb()` functions to create neighbors (`nb` object)
  - `poly2nb()` for contiguity-based neighbors
  - `knn2nb()` for distance-based (k-nearest) neighbors
  - `tri2nb()` for triangulation-based neighbors
  - `cell2nb()` for grid neighbors
- The `nb2listw()` function is used to generate a list of spatial weights (`listw` object) from an `nb` object

`poly2nb` function
spdep::poly2nb(pl, row.names = NULL, queen = TRUE, ...)
- `pl` is a list of polygons (e.g. `SpatialPolygons` class)
- If `queen` is `TRUE`, polygons with a single shared boundary point are considered neighbors; if `FALSE`, more than one shared point is needed

`nb2listw` function
spdep::nb2listw(neighbours, ...)
- The `nb2listw()` function takes a neighbours object of class `nb` and returns a spatial weights list (`listw`)

\[ I = \frac{\boldsymbol{\text{e}}^\text{T}\boldsymbol{\text{W}}\boldsymbol{\text{e}}/S_0}{\boldsymbol{\text{e}}^\text{T}\boldsymbol{\text{e}}/n} = \frac{\boldsymbol{\text{e}}^\text{T}\boldsymbol{\text{W}}\boldsymbol{\text{e}}/S_0}{\hat{\sigma}^2_{ML}} \]
\[ I_z = \frac{I - \text{E}[I]}{\sqrt{\text{Var}[I]}} \sim N(0, 1) \]
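The Moran's I formula can be checked by hand on a tiny example. The sketch below uses base R only and rests on stated assumptions: a hypothetical 3 x 3 grid, rook contiguity, row-standardized weights, \(\boldsymbol{\text{e}}\) taken as deviations from the mean, and \(S_0\) taken as the sum of all weights.

```r
# Toy 3 x 3 grid of cells (hypothetical data, for illustration only)
cells <- expand.grid(row = 1:3, col = 1:3)
n <- nrow(cells)

# Rook contiguity: two cells are neighbours when they share an edge,
# i.e. their Manhattan distance is exactly 1
d <- as.matrix(dist(cells, method = "manhattan"))
W <- (d == 1) * 1
W <- W / rowSums(W)            # row-standardise so each row sums to 1

# A smooth surface: values rise from one corner to the opposite one
x <- cells$row + cells$col
e <- x - mean(x)               # deviations from the mean play the role of e

S0 <- sum(W)                   # sum of all weights
moran_I <- (as.numeric(t(e) %*% W %*% e) / S0) / (sum(e^2) / n)
print(moran_I)
```

For this surface the statistic comes out at 5/9 (about 0.56), well above the value of \(-1/(n-1)\) expected for mean deviations under no spatial autocorrelation, which matches the visible gradient in the data.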
spdep::moran.test(x, listw, ...)
spdep::lm.morantest(model, listw, ...)
- `moran.test()` takes a numeric vector of data and a spatial weights list (`listw`) created by `nb2listw()`
- `lm.morantest()` takes an `lm` object and a spatial weights list
- A small p-value suggests spatial autocorrelation

spdep::lm.LMtests(model, listw, test = "LMerr")
- `lm.LMtests()` takes an `lm` model and a spatial weights list
- `test` inputs include:
  - `"LMerr"` and `"LMlag"` for the spatial error and spatial lag models
  - `"RLMerr"` and `"RLMlag"` for robust LM tests
  - `"SARMA"` for the spatial ARMA model

\[ \boldsymbol{\text{y}} = \rho\boldsymbol{\text{W}}\boldsymbol{\text{y}} + \boldsymbol{\text{X}}\beta + \boldsymbol{\text{u}} \]
\[ \boldsymbol{\text{y}} = \boldsymbol{\text{X}}\beta + \boldsymbol{\text{u}}, \text{where} \] \[ \boldsymbol{\text{u}} = \lambda\boldsymbol{\text{W}}\boldsymbol{\text{u}} + \varepsilon \]
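Solving each model for \(\boldsymbol{\text{y}}\) shows how the spatial parameter spreads effects across neighbors. This is a sketch under the assumption that the matrices \(\boldsymbol{\text{I}}_n - \rho\boldsymbol{\text{W}}\) and \(\boldsymbol{\text{I}}_n - \lambda\boldsymbol{\text{W}}\) are invertible; \(\boldsymbol{\text{I}}_n\) denotes the \(n \times n\) identity matrix (not Moran's \(I\)) and is not part of the original notation.

\[ \text{Spatial lag: } (\boldsymbol{\text{I}}_n - \rho\boldsymbol{\text{W}})\boldsymbol{\text{y}} = \boldsymbol{\text{X}}\beta + \boldsymbol{\text{u}} \;\Rightarrow\; \boldsymbol{\text{y}} = (\boldsymbol{\text{I}}_n - \rho\boldsymbol{\text{W}})^{-1}(\boldsymbol{\text{X}}\beta + \boldsymbol{\text{u}}) \]
\[ \text{Spatial error: } (\boldsymbol{\text{I}}_n - \lambda\boldsymbol{\text{W}})\boldsymbol{\text{u}} = \varepsilon \;\Rightarrow\; \boldsymbol{\text{y}} = \boldsymbol{\text{X}}\beta + (\boldsymbol{\text{I}}_n - \lambda\boldsymbol{\text{W}})^{-1}\varepsilon \]

In the lag model the covariates of one unit affect the outcomes of its neighbors, whereas in the error model only the disturbances are spatially correlated.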
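spdep::lagsarlm(formula, data, listw, ...)
spdep::errorsarlm(formula, data, listw, ...)
- `formula` and `data` work as in `lm()`
- `listw` is a spatial weights list (`listw` object)
- In recent releases these fitting functions are provided by the `spatialreg` package (`spatialreg::lagsarlm()`, `spatialreg::errorsarlm()`)

Source: Wikimedia Commons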
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
- Tom M. Mitchell
- `caret` package
- `mlr` package
Source: Giphy