Title: | Fitting Flexible Smooth-in-Time Hazards and Risk Functions via Logistic and Multinomial Regression |
---|---|
Description: | Fit flexible and fully parametric hazard regression models to survival data with single event type or multiple competing causes via logistic and multinomial regression. Our formulation allows for arbitrary functional forms of time and its interactions with other predictors for time-dependent hazards and hazard ratios. From the fitted hazard model, we provide functions to readily calculate and plot cumulative incidence and survival curves for a given covariate profile. This approach accommodates any log-linear hazard function of prognostic time, treatment, and covariates, and readily allows for non-proportionality. We also provide a plot method for visualizing incidence density via population time plots. Based on the case-base sampling approach of Hanley and Miettinen (2009) <DOI:10.2202/1557-4679.1125>, Saarela and Arjas (2015) <DOI:10.1111/sjos.12125>, and Saarela (2015) <DOI:10.1007/s10985-015-9352-x>. |
Authors: | Sahir Bhatnagar [aut, cre] (https://sahirbhatnagar.com/), Maxime Turgeon [aut] (https://www.maxturgeon.ca/), Jesse Islam [aut] (https://www.jesseislam.com/), Olli Saarela [aut] (https://www.dlsph.utoronto.ca/faculty-profile/saarela-olli/), James Hanley [aut] (https://jhanley.biostat.mcgill.ca/) |
Maintainer: | Sahir Bhatnagar <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.10.6 |
Built: | 2024-11-15 05:54:05 UTC |
Source: | https://github.com/sahirbhatnagar/casebase |
Using the output of the function fitSmoothHazard
, we can compute
absolute risks by integrating the fitted hazard function over a time period
and then converting this to an estimated survival for each individual.
Plot method for objects returned by the absoluteRisk
function. Current plot types are cumulative incidence and survival
functions.
absoluteRisk.CompRisk( object, time, newdata, method = c("numerical", "montecarlo"), nsamp = 100, onlyMain = TRUE, type = c("CI", "survival"), addZero = TRUE ) absoluteRisk( object, time, newdata, method = c("numerical", "montecarlo"), nsamp = 100, s = c("lambda.1se", "lambda.min"), onlyMain = TRUE, type = c("CI", "survival"), addZero = TRUE, ntimes = 100, ... ) ## S3 method for class 'absRiskCB' print(x, ...) ## S3 method for class 'absRiskCB' plot( x, ..., xlab = "time", ylab = ifelse(attr(x, "type") == "CI", "cumulative incidence", "survival probability"), type = "l", gg = TRUE, id.names, legend.title )
absoluteRisk.CompRisk( object, time, newdata, method = c("numerical", "montecarlo"), nsamp = 100, onlyMain = TRUE, type = c("CI", "survival"), addZero = TRUE ) absoluteRisk( object, time, newdata, method = c("numerical", "montecarlo"), nsamp = 100, s = c("lambda.1se", "lambda.min"), onlyMain = TRUE, type = c("CI", "survival"), addZero = TRUE, ntimes = 100, ... ) ## S3 method for class 'absRiskCB' print(x, ...) ## S3 method for class 'absRiskCB' plot( x, ..., xlab = "time", ylab = ifelse(attr(x, "type") == "CI", "cumulative incidence", "survival probability"), type = "l", gg = TRUE, id.names, legend.title )
object |
Output of function |
time |
A vector of time points at which we should compute the absolute risks. |
newdata |
Optionally, a data frame in which to look for variables with
which to predict. If omitted, the mean absolute risk is returned.
Alternatively, if |
method |
Method used for integration. Defaults to |
nsamp |
Maximal number of subdivisions (if |
onlyMain |
Logical. For competing risks, should we return absolute risks
only for the main event of interest? Defaults to |
type |
Line type. Only used if |
addZero |
Logical. Should we add time = 0 at the beginning of the
output? Defaults to |
s |
Value of the penalty parameter lambda at which predictions are
required (for class |
ntimes |
Number of time points (only used if |
... |
further arguments passed to |
x |
Fitted object of class |
xlab |
xaxis label, Default: 'time' |
ylab |
yaxis label. By default, this will use the |
gg |
Logical for whether the |
id.names |
Optional character vector used as legend key when |
legend.title |
Optional character vector of the legend title. Only used
if |
If newdata = "typical"
, we create a typical covariate profile for the
absolute risk computation. This means that we take the median for numerical
and date variables, and we take the most common level for factor variables.
In general, the output will include a column corresponding to the provided
time points. Some modifications of the time
vector are done:
time=0
is added, the time points are ordered, and duplicates are
removed. All these modifications simplify the computations and give an output
that can easily be used to plot risk curves.
If there is no competing risk, the output is a matrix where each column corresponds to the several covariate profiles, and where each row corresponds to a time point. If there are competing risks, the output will be a 3-dimensional array, with the third dimension corresponding to the different events.
The numerical method should be good enough in most situation, but Monte Carlo integration can give more accurate results when the estimated hazard function is not smooth (e.g. when modeling with time-varying covariates).
If time
was provided, returns the estimated absolute risk for
the user-supplied covariate profiles. This will be stored in a matrix or a
higher dimensional array, depending on the input (see details). If both
time
and newdata
are missing, returns the original data
with a new column containing the risk estimate at failure times.
A plot of the cumulative incidence or survival curve
matplot
,
absoluteRisk
,
as.data.table
, setattr
,
melt.data.table
# Simulate censored survival data for two outcome types library(data.table) set.seed(12345) nobs <- 1000 tlim <- 20 # simulation parameters b1 <- 200 b2 <- 50 # event type 0-censored, 1-event of interest, 2-competing event # t observed time/endpoint # z is a binary covariate DT <- data.table(z = rbinom(nobs, 1, 0.5)) DT[,`:=` ("t_event" = rweibull(nobs, 1, b1), "t_comp" = rweibull(nobs, 1, b2))] DT[,`:=`("event" = 1 * (t_event < t_comp) + 2 * (t_event >= t_comp), "time" = pmin(t_event, t_comp))] DT[time >= tlim, `:=`("event" = 0, "time" = tlim)] out_linear <- fitSmoothHazard(event ~ time + z, DT, ratio = 10) linear_risk <- absoluteRisk(out_linear, time = 10, newdata = data.table("z" = c(0,1))) # Plot CI curves---- library(ggplot2) data("brcancer") mod_cb_tvc <- fitSmoothHazard(cens ~ estrec*log(time) + horTh + age + menostat + tsize + tgrade + pnodes + progrec, data = brcancer, time = "time", ratio = 1) smooth_risk_brcancer <- absoluteRisk(object = mod_cb_tvc, newdata = brcancer[c(1,50),]) class(smooth_risk_brcancer) plot(smooth_risk_brcancer)
# Simulate censored survival data for two outcome types library(data.table) set.seed(12345) nobs <- 1000 tlim <- 20 # simulation parameters b1 <- 200 b2 <- 50 # event type 0-censored, 1-event of interest, 2-competing event # t observed time/endpoint # z is a binary covariate DT <- data.table(z = rbinom(nobs, 1, 0.5)) DT[,`:=` ("t_event" = rweibull(nobs, 1, b1), "t_comp" = rweibull(nobs, 1, b2))] DT[,`:=`("event" = 1 * (t_event < t_comp) + 2 * (t_event >= t_comp), "time" = pmin(t_event, t_comp))] DT[time >= tlim, `:=`("event" = 0, "time" = tlim)] out_linear <- fitSmoothHazard(event ~ time + z, DT, ratio = 10) linear_risk <- absoluteRisk(out_linear, time = 10, newdata = data.table("z" = c(0,1))) # Plot CI curves---- library(ggplot2) data("brcancer") mod_cb_tvc <- fitSmoothHazard(cens ~ estrec*log(time) + horTh + age + menostat + tsize + tgrade + pnodes + progrec, data = brcancer, time = "time", ratio = 1) smooth_risk_brcancer <- absoluteRisk(object = mod_cb_tvc, newdata = brcancer[c(1,50),]) class(smooth_risk_brcancer) plot(smooth_risk_brcancer)
Data on patients who underwent haematopoietic stem cell transplantation for acute leukemia.
bmtcrr
bmtcrr
A dataframe with 177 observations and 7 variables:
Gender of the individual
Disease: lymphoblastic or myeloblastic leukemia, abbreviated as ALL and AML, respectively
Phase at transplant (Relapse, CR1, CR2, CR3)
Age at the beginning of follow-up
Status indicator: 0=censored, 1=relapse, 2=competing event
Source of stem cells: bone marrow and peripheral blood, coded as BM+PB, or peripheral blood only, coded as PB
Failure time in months
Scrucca L, Santucci A, Aversa F. Competing risk analysis using R: an easy guide for clinicians. Bone Marrow Transplant. 2007 Aug;40(4):381-7. doi:10.1038/sj.bmt.1705727.
A data frame containing the observations from the GBSG2 study.
This is taken almost verbatim from the TH.data
package.
brcancer
brcancer
This data frame contains the observations of 686 women:
hormonal therapy, a factor at two levels no
and
yes
.
numeric version of horTh
of the patients in years.
menopausal status, a factor at two
levels pre
(premenopausal) and post
(postmenopausal).
Numeric version of menostat
tumor size (in mm).
tumor grade, a ordered factor at levels I < II <
III
.
number of positive nodes.
progesterone receptor (in fmol).
estrogen receptor (in fmol).
recurrence free survival time (in days).
censoring indicator (0- censored, 1- event).
Torsten Hothorn (2019). TH.data: TH's Data Archive. R package version 1.0-10. https://CRAN.R-project.org/package=TH.data
M. Schumacher, G. Basert, H. Bojar, K. Huebner, M. Olschewski,
W. Sauerbrei, C. Schmoor, C. Beyerle, R.L.A. Neumann and H.F. Rauschecker
for the German Breast Cancer Study Group (1994), Randomized
trial evaluating hormonal treatment and the duration of chemotherapy in
node-positive breast cancer patients. Journal of Clinical Oncology,
12, 2086–2093.
Checks for event categories and gives a warning message indicating which level is assumed to be the reference level.
checkArgsEventIndicator(data, event, censored.indicator)
checkArgsEventIndicator(data, event, censored.indicator)
data |
a |
event |
a character string giving the name of the event variable
contained in |
censored.indicator |
a character string of length 1 indicating which
value in |
A list of length two. The first element is the factored event, and the second element is the numeric representation of the event
if (requireNamespace("survival", quietly = TRUE)) { library(survival) # for veteran data checkArgsEventIndicator(data = veteran, event = "celltype", censored.indicator = "smallcell") checkArgsEventIndicator(data = veteran, event = "status") } data("bmtcrr") # from casebase checkArgsEventIndicator(data = bmtcrr, event = "Sex", censored.indicator = "M") checkArgsEventIndicator(data = bmtcrr, event = "D", censored.indicator = "AML") checkArgsEventIndicator(data = bmtcrr, event = "Status")
if (requireNamespace("survival", quietly = TRUE)) { library(survival) # for veteran data checkArgsEventIndicator(data = veteran, event = "celltype", censored.indicator = "smallcell") checkArgsEventIndicator(data = veteran, event = "status") } data("bmtcrr") # from casebase checkArgsEventIndicator(data = bmtcrr, event = "Sex", censored.indicator = "M") checkArgsEventIndicator(data = bmtcrr, event = "D", censored.indicator = "AML") checkArgsEventIndicator(data = bmtcrr, event = "Status")
This class inherits from vglm-class
.
summary(object, ...) ## S4 method for signature 'CompRisk' summary(object)
summary(object, ...) ## S4 method for signature 'CompRisk' summary(object)
object |
Object of class |
... |
Extra parameters |
originalData
Data.frame containing the original data (i.e. before
case-base sampling). This is used by the absoluteRisk
function.
typeEvents
Numeric factor which encodes the type of events being considered (including censoring).
timeVar
Character string giving the name of the time variable, as
appearing in originalData
eventVar
Character string giving the name of the event variable, as
appearing in originalData
This function uses parametric bootstrap to compute confidence intervals for
the risk estimates. Since it relies on MLE theory for the validity of these
intervals, this function only works for fit_obj
that was fitted using
family = "glm"
(i.e. the default).
## S3 method for class 'absRiskCB' confint(object, parm, level = 0.95, nboot = 500, ...)
## S3 method for class 'absRiskCB' confint(object, parm, level = 0.95, nboot = 500, ...)
object |
Output of function |
parm |
Output of function |
level |
The confidence level required. |
nboot |
The number of bootstrap samples to use. |
... |
Additional arguments for methods. |
If the package progress
is available, the function also reports on
progress of the sampling (which can take some time if there are many
covariate profiles and/or time points).
If there is only one covariate profile, the function returns a matrix with the time points, the risk estimates, and the confidence intervals. If there are more than one covariate profile, the function returns a list with three components.
This data was reconstructed from the curves in figure 2 (Manson 2003).Compares placebo to hormone treatment.
eprchd
eprchd
A dataframe with 16608 observations and 3 variables:
Years (continuous)
0=censored, 1=event
placebo, estPro
Manson, J. E., Hsia, J., Johnson, K. C., Rossouw, J. E., Assaf, A. R., Lasser, N. L., ... & Strickland, O. L. (2003). Estrogen plus progestin and the risk of coronary heart disease. New England Journal of Medicine, 349(6), 523-534.
data("eprchd") fit <- fitSmoothHazard(status ~ time + treatment, data = eprchd)
data("eprchd") fit <- fitSmoothHazard(status ~ time + treatment, data = eprchd)
This data set lists the individual observations for 159,893 men in the core age group between the ages of 55 and 69 years at entry.
ERSPC
ERSPC
A data frame with 159,893 observations on the following 3 variables:
Whether in Screening Arm (1) or non-Screening arm
(0) (numeric
)
The time, measured in years from randomization, at which follow-up was terminated
Whether follow-up was terminated by Death from Prostate Cancer (1) or by death from other causes, or administratively (0)
The men were recruited from seven European countries (centers). Each centre began recruitment at a different time, ranging from 1991 to 1998. The last entry was in December 2003. The uniform censoring date was December 31, 2006. The randomization ratio was 1:1 in six of the seven centres. In the seventh, Finland, the size of the screening group was fixed at 32,000 subjects. Because the whole birth cohort underwent randomization, this led to a ratio, for the screening group to the control group, of approximately 1 to 1.5, and to the non-screening arm being larger than the screening arm.
The randomization of the Finnish cohorts were carried out on January 1 of each of the 4 years 1996 to 1999. This, coupled with the uniform December 31 2006 censoring date, lead to large numbers of men with exactly 11, 10, 9 or 8 years of follow-up.
Tracked backwards in time (i.e. from right to left), the Population-Time plot shows the recruitment pattern from its beginning in 1991, and in particular the Jan 1 entries in successive years.
Tracked forwards in time (i.e. from left to right), the plot for the first 3 years shows attrition due entirely to death (mainly from other causes). Since the Swedish and Belgian centres were the last to close their recruitment - in December 2003 - the minimum potential follow-up is three years. Tracked further forwards in time (i.e. after year 3) the attrition is a combination of deaths and staggered entries.
The individual censored values were recovered by James Hanley from the Postcript code that the NEJM article (Schroder et al., 2009) used to render Figure 2 (see Liu et al., 2014, for details). The uncensored values were more difficult to recover exactly, as the 'jumps' in the Nelson-Aalen plot are not as monotonic as first principles would imply. Thus, for each arm, the numbers of deaths in each 1-year time-bin were estimated from the differences in the cumulative incidence curves at years 1, 2, ... , applied to the numbers at risk within the time-interval. The death times were then distributed at random within each bin.
The interested reader can 'see' the large numbers of individual censored values by zooming in on the original pdf Figure, and watching the Figure being re-rendered, or by printing the graph and watching the printer 'pause' while it superimposes several thousand dots (censored values) onto the curve. Watching these is what prompted JH to look at what lay 'behind' the curve. The curve itself can be drawn using fewer than 1000 line segments, and unless on peers into the PostScript) the almost 160,000 dots generated by Stata are invisible.
Liu Z, Rich B, Hanley JA. Recovering the raw data behind a non-parametric survival curve. Systematic Reviews 2014; 3:151. doi:10.1186/2046-4053-3-151.
Schroder FH, et al., for the ERSPC Investigators. Screening and Prostate-Cancer Mortality in a Randomized European Study. N Engl J Med 2009; 360:1320-8. doi:10.1056/NEJMoa0810084.
data("ERSPC") set.seed(12345) pt_object_strat <- casebase::popTime(ERSPC[sample(1:nrow(ERSPC), 10000),], event = "DeadOfPrCa", exposure = "ScrArm") plot(pt_object_strat, facet.params = list(ncol = 2))
data("ERSPC") set.seed(12345) pt_object_strat <- casebase::popTime(ERSPC[sample(1:nrow(ERSPC), 10000),], event = "DeadOfPrCa", exposure = "ScrArm") plot(pt_object_strat, facet.params = list(ncol = 2))
Miettinen and Hanley (2009) explained how case-base sampling can be used to estimate smooth-in-time parametric hazard functions. The idea is to sample person-moments, which may or may not correspond to an event, and then fit the hazard using logistic regression.
fitSmoothHazard( formula, data, time, family = c("glm", "gam", "glmnet"), censored.indicator, ratio = 100, ... ) fitSmoothHazard.fit( x, y, formula_time, time, event, family = c("glm", "glmnet"), censored.indicator, ratio = 100, ... ) prepareX(formula, data)
fitSmoothHazard( formula, data, time, family = c("glm", "gam", "glmnet"), censored.indicator, ratio = 100, ... ) fitSmoothHazard.fit( x, y, formula_time, time, event, family = c("glm", "glmnet"), censored.indicator, ratio = 100, ... ) prepareX(formula, data)
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under Details. |
data |
a data frame, list or environment containing the variables in the
model. If not found in data, the variables are taken from
|
time |
a character string giving the name of the time variable. See Details. |
family |
a character string specifying the family of regression models used to fit the hazard. |
censored.indicator |
a character string of length 1 indicating which
value in |
ratio |
integer, giving the ratio of the size of the base series to that of the case series. Defaults to 100. |
... |
Additional parameters passed to fitting functions (e.g.
|
x |
Matrix containing covariates. |
y |
Matrix containing two columns: one corresponding to time, the other to the event type. |
formula_time |
A formula describing how the hazard depends on time. Defaults to linear. |
event |
a character string giving the name of the event variable. |
The object data
should either be the output of the function
sampleCaseBase
or the source dataset on which case-base
sampling will be performed. In the latter case, it is assumed that
data
contains the two columns corresponding to the supplied time and
event variables. The variable time
is used for the sampling the base
series, and therefore it should represent the time variable on its original
(i.e. non transformed) scale. If time
is missing, the function looks
for a column named "time"
in the data. Note that the event variable is
inferred from formula
, since it is the left hand side.
For single-event survival analysis, it is also possible to fit the hazard
function using glmnet
or gam
. The choice of fitting family is
controlled by the parameter family
. The default value is glm
,
which corresponds to logistic regression. For competing risk analysis, only
glm
and glmnet
are allowed.
We also provide a matrix interface through fitSmoothHazard.fit
, which
mimics glm.fit
. This is mostly convenient for family =
"glmnet"
, since a formula interface becomes quickly cumbersome as the number
of variables increases. In this setting, the matrix y
should have two
columns and contain the time and event variables (e.g. like the output of
survival::Surv
). We need this linear function of time in order to
perform case-base sampling. Therefore, nonlinear functions of time should be
specified as a one-sided formula through the argument formula_time
(the left-hand side is always ignored).
prepareX
is a slightly modified version of the same function from
the glmnet
package. It can be used to convert a data.frame to a matrix
with categorical variables converted to dummy variables using one-hot
encoding
An object of glm
and lm
when there is only one event of
interest, or of class CompRisk
, which inherits from
vglm
, for a competing risk analysis. As such, functions like
summary
, deviance
and coefficients
give familiar
results.
# Simulate censored survival data for two outcome types from exponential # distributions library(data.table) nobs <- 500 tlim <- 20 # simulation parameters b1 <- 200 b2 <- 50 # event type 0-censored, 1-event of interest, 2-competing event # t observed time/endpoint # z is a binary covariate DT <- data.table(z = rbinom(nobs, 1, 0.5)) DT[, `:=`( "t_event" = rweibull(nobs, 1, b1), "t_comp" = rweibull(nobs, 1, b2) )] DT[, `:=`( "event" = 1 * (t_event < t_comp) + 2 * (t_event >= t_comp), "time" = pmin(t_event, t_comp) )] DT[time >= tlim, `:=`("event" = 0, "time" = tlim)] out_linear <- fitSmoothHazard(event ~ time + z, DT, ratio = 10) out_log <- fitSmoothHazard(event ~ log(time) + z, DT, ratio = 10) # Use GAMs library(mgcv) DT[event == 2, event := 1] out_gam <- fitSmoothHazard(event ~ s(time) + z, DT, ratio = 10, family = "gam")
# Simulate censored survival data for two outcome types from exponential # distributions library(data.table) nobs <- 500 tlim <- 20 # simulation parameters b1 <- 200 b2 <- 50 # event type 0-censored, 1-event of interest, 2-competing event # t observed time/endpoint # z is a binary covariate DT <- data.table(z = rbinom(nobs, 1, 0.5)) DT[, `:=`( "t_event" = rweibull(nobs, 1, b1), "t_comp" = rweibull(nobs, 1, b2) )] DT[, `:=`( "event" = 1 * (t_event < t_comp) + 2 * (t_event >= t_comp), "time" = pmin(t_event, t_comp) )] DT[time >= tlim, `:=`("event" = 0, "time" = tlim)] out_linear <- fitSmoothHazard(event ~ time + z, DT, ratio = 10) out_log <- fitSmoothHazard(event ~ log(time) + z, DT, ratio = 10) # Use GAMs library(mgcv) DT[event == 2, event := 1] out_gam <- fitSmoothHazard(event ~ s(time) + z, DT, ratio = 10, family = "gam")
Visualize estimated hazard curves as a function of time with
confidence intervals. This function takes as input, the result from the
fitSmoothHazard()
function. The user can also specify a
sequence of times at which to estimate the hazard function. These plots are
useful to visualize the non-proportional hazards, i.e., time dependent
interactions with a covariate.
hazardPlot( object, newdata, type = c("hazard"), xlab = NULL, breaks = 100, ci.lvl = 0.95, ylab = NULL, line.col = 1, ci.col = "grey", lty = par("lty"), add = FALSE, ci = !add, rug = !add, s = c("lambda.1se", "lambda.min"), times = NULL, ... )
hazardPlot( object, newdata, type = c("hazard"), xlab = NULL, breaks = 100, ci.lvl = 0.95, ylab = NULL, line.col = 1, ci.col = "grey", lty = par("lty"), add = FALSE, ci = !add, rug = !add, s = c("lambda.1se", "lambda.min"), times = NULL, ... )
object |
Fitted object of class |
newdata |
A data frame in which to look for variables with which to predict. This is required and must contain all the variables used in the model. Only one covariate profile can be used. If more than one row is provided, only the first row will be used. |
type |
Type of plot. Currently, only "hazard" has been implemented. Default: c("hazard") |
xlab |
x-axis label. Default: the name of the time variable from the
fitted |
breaks |
Number of points at which to estimate the hazard. This argument
is only used if argument |
ci.lvl |
Confidence level. Must be in (0,1), Default: 0.95 |
ylab |
y-axis label. Default: NULL which means the function will put sensible defaults. |
line.col |
Line color, Default: 1. See |
ci.col |
Confidence band color. Only used if argument |
lty |
Line type. See |
add |
Logical; if TRUE add to an already existing plot; Default: FALSE |
ci |
Logical; if TRUE confidence bands are calculated. Only available
for |
rug |
Logical. Adds a rug representation (1-d plot) of the event times
(only for |
s |
Value of the penalty parameter lambda at which predictions are
required (for class |
times |
Vector of numeric values at which the hazard should be
calculated. Default: NULL which means this function will use the minimum
and maximum of observed event times with the |
... |
further arguments passed to |
This is an earlier version of a function to plot hazards. We
recommend instead using the plot method for objects returned by
fitSmoothHazard()
. See plot.singleEventCB()
.
a plot of the hazard function and a data.frame of original data used
in the fitting along with the data used to create the plots including
predictedhazard
which is the predicted hazard for a given covariate
pattern and time predictedloghazard
is the predicted hazard on the log
scale. lowerbound
and upperbound
are the lower and upper confidence
interval bounds on the hazard scale (i.e. used to plot the confidence
bands). standarderror
is the standard error of the log hazard (only if
family="glm"
or family="gam"
)
data("simdat") mod_cb <- fitSmoothHazard(status ~ trt * eventtime, time = "eventtime", data = simdat[1:200,], ratio = 1, family = "glm") results0 <- hazardPlot(object = mod_cb, newdata = data.frame(trt = 0), ci.lvl = 0.95, ci = FALSE, lty = 1, line.col = 1, lwd = 2) head(results0) hazardPlot(object = mod_cb, newdata = data.frame(trt = 1), ci = FALSE, ci.lvl = 0.95, add = TRUE, lty = 2, line.col = 2, lwd = 2) legend("topleft", c("trt=0","trt=1"),lty=1:2,col=1:2,bty="y", lwd = 2)
data("simdat") mod_cb <- fitSmoothHazard(status ~ trt * eventtime, time = "eventtime", data = simdat[1:200,], ratio = 1, family = "glm") results0 <- hazardPlot(object = mod_cb, newdata = data.frame(trt = 0), ci.lvl = 0.95, ci = FALSE, lty = 1, line.col = 1, lwd = 2) head(results0) hazardPlot(object = mod_cb, newdata = data.frame(trt = 1), ci = FALSE, ci.lvl = 0.95, add = TRUE, lty = 2, line.col = 2, lwd = 2) legend("topleft", c("trt=0","trt=1"),lty=1:2,col=1:2,bty="y", lwd = 2)
plot
method for objects of class popTime
Create a data frame for population time plots to give a visual representation of incidence density
## S3 method for class 'popTime' plot( x, ..., xlab = "Follow-up time", ylab = "Population", add.case.series = TRUE, add.base.series = FALSE, add.competing.event = FALSE, casebase.theme = TRUE, ribbon.params = list(), case.params = list(), base.params = list(), competing.params = list(), color.params = list(), fill.params = list(), theme.params = list(), facet.params = list(), ratio = 1, censored.indicator, comprisk = FALSE, legend = TRUE, ncol, legend.position, line.width, line.colour, point.size, point.colour ) popTime(data, time, event, censored.indicator, exposure, percentile_number) checkArgsTimeEvent(data, time, event)
## S3 method for class 'popTime' plot( x, ..., xlab = "Follow-up time", ylab = "Population", add.case.series = TRUE, add.base.series = FALSE, add.competing.event = FALSE, casebase.theme = TRUE, ribbon.params = list(), case.params = list(), base.params = list(), competing.params = list(), color.params = list(), fill.params = list(), theme.params = list(), facet.params = list(), ratio = 1, censored.indicator, comprisk = FALSE, legend = TRUE, ncol, legend.position, line.width, line.colour, point.size, point.colour ) popTime(data, time, event, censored.indicator, exposure, percentile_number) checkArgsTimeEvent(data, time, event)
x |
an object of class |
... |
Ignored. |
xlab , ylab
|
The title of the respective axis. Default: 'Follow-up time' for xlab and 'Population' for ylab |
add.case.series |
Logical indicating if the case series should be added to the plot. Default: TRUE |
add.base.series |
Logical indicating if the base series should be added to the plot. Default: FALSE |
add.competing.event |
Logical indicating if the competing event should be added to the plot. Default: FALSE |
casebase.theme |
Logical indication if the casebase theme be used. The
casebase theme uses |
ribbon.params |
A list containing arguments that are passed to
|
case.params , base.params , competing.params
|
A list containing arguments
that are passed to |
color.params |
A list containing arguments that are passed to
|
fill.params |
A list containing arguments that are passed to
|
theme.params |
A list containing arguments that are passed to
|
facet.params |
A list containing arguments that are passed to
|
ratio |
If |
censored.indicator |
a character string of length 1 indicating which
value in |
comprisk |
If |
legend |
Logical indicating if a legend should be added to the plot.
Note that if you want to change the colors of the points, through the
|
ncol |
Deprecated. Use |
legend.position |
Deprecated. Specify the legend.position argument
instead in the |
line.width |
Deprecated. |
line.colour |
Deprecated. specify the fill argument instead in
|
point.size |
Deprecated. specify the size argument instead in the
|
point.colour |
Deprecated. Specify the values argument instead in the
|
data |
a |
time |
a character string giving the name of the time variable. See Details. |
event |
a character string giving the name of the event variable
contained in |
exposure |
a character string of length 1 giving the name of the
exposure variable which must be contained in |
percentile_number |
Default=0.5. Give a value between 0-1. if the percentile number of available subjects at any given point is less than 10, then sample regardless of case status. Depending on distribution of survival times and events event points may not be evenly distributed with default value. |
This function leverages the ggplot2
package to build
population time plots. It builds the plot by adding layers, starting with a
layer for the area representing the population time. It then sequentially
adds points to the plots to show the casebase sampling mechanism. This
function gives user the flexibility to add any combination of the
case.series, base.series and competing events. The case series and
competing events are sampled at random vertically on the plot in order to
visualise the incidence density using the popTime
function.
That is, imagine we draw a vertical line at a specific event time. We then
plot the point at a randomly sampled y-coordinate along this vertical line.
This is done to avoid having all points along the upper edge of the plot
(because the subjects with the least amount of observation time are plotted
at the top of the y-axis). By randomly distributing them, we can get a
better sense of the incidence density. The base series is sampled
horizontally on the plot using the sampleCaseBase
function.
It is assumed that data
contains the two columns
corresponding to the supplied time and event variables. If either the
time
or event
argument is missing, the function looks for
columns that contain the words "time"
, "event"
, or
"status"
in them (case insensitive). The function first looks for
the time variable, then it looks for the event variable. This order of
operation is important if for example the time variable is named
"event time"
and the event variable is named "event
indicator"
. This function will first (automatically) find the time
variable and remove this as a possibility from subsequent searches of the
event variable. The following regular expressions are used for the time and
event variables:
"[\s\W_]+time|^time\b"
"[\s\W_]+event|^event\b|[\s\W_]+status|^status\b"
This allows for "time"
to be preceded or followed by one or more
white space characters, one or more non-word characters or one or more
underscores. For example, the following column names would be recognized by
the function as the "time"
variable: "time of death",
"death_time", "Time", "time", "diagnosis_time", "time.diag", "diag__time"
.
But the following will not be recognized: "diagtime","eventtime",
"Timediag"
The methods for plot
return a population time plot, stratified
by exposure status in the case of popTimeExposure
. Note that these
are ggplot2
objects and can therefore be used in subsequent ggplot2
type plots. See examples and vignette for details.
An object of class popTime
(or popTimeExposure
if
exposure is specified), data.table
and data.frame
in this
order! The output of this function is to be used with the plot method for
objects of class popTime
or of class popTimeExposure
, which
will produce population time plots. This dataset augments the original data
with the following columns:
value of the
time variable in the original dataset - the one specified by the
time
user argument to this function
value of
the event variable in the original dataset - the one specified by the
event
user argument to this function
renames the user specified time column to time
renames the user specified event argument to event
ggplot2::geom_point()
,ggplot2::geom_ribbon()
,ggplot2::theme()
,
ggplot2::scale_colour_manual()
, ggplot2::scale_fill_manual()
,
sampleCaseBase
# change color of points library(ggplot2) data("bmtcrr") popTimeData <- popTime(data = bmtcrr, time = "ftime", event = "Status") fill_cols <- c("Case series" = "black", "Competing event" = "#009E73", "Base series" = "#0072B2") color_cols <- c("Case series" = "black", "Competing event" = "black", "Base series" = "black") plot(popTimeData, add.case.series = TRUE, add.base.series = TRUE, add.competing.event = FALSE, legend = TRUE, comprisk = TRUE, fill.params = list( name = element_blank(), breaks = c("Case series", "Competing event", "Base series"), values = fill_cols ), color.params = list( name = element_blank(), breaks = c("Case series", "Competing event", "Base series"), values = color_cols ) ) data("bmtcrr") popTimeData <- popTime(data = bmtcrr, time = "ftime") class(popTimeData) popTimeData <- popTime(data = bmtcrr, time = "ftime", exposure = "D") attr(popTimeData, "exposure")
# change color of points library(ggplot2) data("bmtcrr") popTimeData <- popTime(data = bmtcrr, time = "ftime", event = "Status") fill_cols <- c("Case series" = "black", "Competing event" = "#009E73", "Base series" = "#0072B2") color_cols <- c("Case series" = "black", "Competing event" = "black", "Base series" = "black") plot(popTimeData, add.case.series = TRUE, add.base.series = TRUE, add.competing.event = FALSE, legend = TRUE, comprisk = TRUE, fill.params = list( name = element_blank(), breaks = c("Case series", "Competing event", "Base series"), values = fill_cols ), color.params = list( name = element_blank(), breaks = c("Case series", "Competing event", "Base series"), values = color_cols ) ) data("bmtcrr") popTimeData <- popTime(data = bmtcrr, time = "ftime") class(popTimeData) popTimeData <- popTime(data = bmtcrr, time = "ftime", exposure = "D") attr(popTimeData, "exposure")
Plot method for objects returned by the fitSmoothHazard
function. Current plot types are hazard function and hazard ratio. The
visreg
package must be installed for type="hazard"
. This
function accounts for the possible time-varying exposure effects.
plotHazardRatio(x, newdata, newdata2, ci, ci.lvl, ci.col, rug, xvar, ...) ## S3 method for class 'singleEventCB' plot( x, ..., type = c("hazard", "hr"), hazard.params = list(), newdata, exposed, increment = 1, var, xvar = NULL, ci = FALSE, ci.lvl = 0.95, rug = !ci, ci.col = "grey" ) incrVar(var, increment = 1)
plotHazardRatio(x, newdata, newdata2, ci, ci.lvl, ci.col, rug, xvar, ...) ## S3 method for class 'singleEventCB' plot( x, ..., type = c("hazard", "hr"), hazard.params = list(), newdata, exposed, increment = 1, var, xvar = NULL, ci = FALSE, ci.lvl = 0.95, rug = !ci, ci.col = "grey" ) incrVar(var, increment = 1)
x |
Fitted object of class |
newdata |
Required for |
newdata2 |
|
ci |
Logical; if TRUE confidence bands are calculated. Only available
for |
ci.lvl |
Confidence level. Must be in (0,1), Default: 0.95. Only used
for |
ci.col |
Confidence band color. Only used if argument |
rug |
Logical. Adds a rug representation (1-d plot) of the event times
(only for |
xvar |
Variable to be used on x-axis for hazard ratio plots. If NULL,
the function defaults to using the time variable used in the call to
|
... |
further arguments passed to |
type |
plot type. Choose one of either |
hazard.params |
Named list of arguments which will override the defaults
passed to |
exposed |
function that takes |
increment |
Numeric value indicating how much to increment (if positive)
or decrement (if negative) the |
var |
specify the variable name for the exposed/unexposed (name is given
as a character variable). If this argument is missing, then the
|
This function has only been thoroughly tested for family="glm"
. If
the user wants more customized plot aesthetics, we recommend saving the
results to a data.frame
and using the graphical package of their choice.
a plot of the hazard function or hazard ratio. For type="hazard"
, a
data.frame
(returned invisibly) of the original data used in the fitting
along with the data used to create the plots including predictedhazard
which is the predicted hazard for a given covariate pattern and time.
predictedloghazard
is the predicted hazard on the log scale. lowerbound
and upperbound
are the lower and upper confidence interval bounds on the
hazard scale (i.e. used to plot the confidence bands). standarderror
is
the standard error of the log hazard or log hazard ratio (only if
family="glm"
or family="gam"
). For type="hr"
, log_hazard_ratio
and
hazard_ratio
is returned, and if ci=TRUE
, standarderror
(on the log
scale) and lowerbound
and upperbound
of the hazard_ratio
are
returned.
Mark Clements and Xing-Rong Liu (2019). rstpm2: Smooth Survival Models, Including Generalized Survival Models. R package version 1.5.1. https://CRAN.R-project.org/package=rstpm2
Breheny P and Burchett W (2017). Visualization of Regression Models Using visreg. The R Journal, 9: 56-71.
modifyList
, fitSmoothHazard()
,
graphics::par()
, visreg::visreg()
if (requireNamespace("splines", quietly = TRUE)) { data("simdat") # from casebase package library(splines) simdat <- transform(simdat[sample(1:nrow(simdat), size = 200),], treat = factor(trt, levels = 0:1, labels = c("control","treatment"))) fit_numeric_exposure <- fitSmoothHazard(status ~ trt*bs(eventtime), data = simdat, ratio = 1, time = "eventtime") fit_factor_exposure <- fitSmoothHazard(status ~ treat*bs(eventtime), data = simdat, ratio = 1, time = "eventtime") newtime <- quantile(fit_factor_exposure[["data"]][[fit_factor_exposure[["timeVar"]]]], probs = seq(0.05, 0.95, 0.01)) par(mfrow = c(1,3)) plot(fit_numeric_exposure, type = "hr", newdata = data.frame(trt = 0, eventtime = newtime), exposed = function(data) transform(data, trt = 1), xvar = "eventtime", ci = TRUE) #by default this will increment `var` by 1 for exposed category plot(fit_factor_exposure, type = "hr", newdata = data.frame(treat = factor("control", levels = c("control","treatment")), eventtime = newtime), var = "treat", increment = 1, xvar = "eventtime", ci = TRUE, ci.col = "lightblue", xlab = "Time", main = "Hazard Ratio for Treatment", ylab = "Hazard Ratio", lty = 5, lwd = 7, rug = TRUE) # we can also decrement `var` by 1 to give hazard ratio for control/treatment result <- plot(fit_factor_exposure, type = "hr", newdata = data.frame(treat = factor("treatment", levels = c("control","treatment")), eventtime = newtime), var = "treat", increment = -1, xvar = "eventtime", ci = TRUE) # see data used to create plot head(result) }
if (requireNamespace("splines", quietly = TRUE)) { data("simdat") # from casebase package library(splines) simdat <- transform(simdat[sample(1:nrow(simdat), size = 200),], treat = factor(trt, levels = 0:1, labels = c("control","treatment"))) fit_numeric_exposure <- fitSmoothHazard(status ~ trt*bs(eventtime), data = simdat, ratio = 1, time = "eventtime") fit_factor_exposure <- fitSmoothHazard(status ~ treat*bs(eventtime), data = simdat, ratio = 1, time = "eventtime") newtime <- quantile(fit_factor_exposure[["data"]][[fit_factor_exposure[["timeVar"]]]], probs = seq(0.05, 0.95, 0.01)) par(mfrow = c(1,3)) plot(fit_numeric_exposure, type = "hr", newdata = data.frame(trt = 0, eventtime = newtime), exposed = function(data) transform(data, trt = 1), xvar = "eventtime", ci = TRUE) #by default this will increment `var` by 1 for exposed category plot(fit_factor_exposure, type = "hr", newdata = data.frame(treat = factor("control", levels = c("control","treatment")), eventtime = newtime), var = "treat", increment = 1, xvar = "eventtime", ci = TRUE, ci.col = "lightblue", xlab = "Time", main = "Hazard Ratio for Treatment", ylab = "Hazard Ratio", lty = 5, lwd = 7, rug = TRUE) # we can also decrement `var` by 1 to give hazard ratio for control/treatment result <- plot(fit_factor_exposure, type = "hr", newdata = data.frame(treat = factor("treatment", levels = c("control","treatment")), eventtime = newtime), var = "treat", increment = -1, xvar = "eventtime", ci = TRUE) # see data used to create plot head(result) }
This function implements the case-base sampling approach described in Hanley and Miettinen (2009). It can be used to fit smooth-in-time parametric functions easily via logistic regression.
sampleCaseBase( data, time, event, ratio = 10, comprisk = FALSE, censored.indicator )
sampleCaseBase( data, time, event, ratio = 10, comprisk = FALSE, censored.indicator )
data |
a data.frame or data.table containing the source dataset. |
time |
a character string giving the name of the time variable. See Details. |
event |
a character string giving the name of the event variable. See Details. |
ratio |
Integer, giving the ratio of the size of the base series to that of the case series. Defaults to 10. |
comprisk |
Logical. Indicates whether we have multiple event types and that we want to consider some of them as competing risks. |
censored.indicator |
a character string of length 1 indicating which
value in |
The base series is sampled using a multinomial scheme: individuals are sampled proportionally to their follow-up time.
It is assumed that data
contains the two columns corresponding to the
supplied time and event variables. If either the time
or event
argument is missing, the function looks for columns with appropriate-looking
names (see checkArgsTimeEvent
).
The function returns a dataset, with the same format as the source dataset, and where each row corresponds to a person-moment sampled from the case or the base series.
The offset is calculated using the total follow-up time for
all individuals in the study. Therefore, we need time
to be on the
original scale, not a transformed scale (e.g. logarithmic). Otherwise, the
offset and the estimation will be wrong.
# Simulate censored survival data for two outcome types from exponential library(data.table) set.seed(12345) nobs <- 500 tlim <- 10 # simulation parameters b1 <- 200 b2 <- 50 # event type 0-censored, 1-event of interest, 2-competing event # t observed time/endpoint # z is a binary covariate DT <- data.table(z = rbinom(nobs, 1, 0.5)) DT[, `:=`( "t_event" = rweibull(nobs, 1, b1), "t_comp" = rweibull(nobs, 1, b2) )] DT[, `:=`( "event" = 1 * (t_event < t_comp) + 2 * (t_event >= t_comp), "time" = pmin(t_event, t_comp) )] DT[time >= tlim, `:=`("event" = 0, "time" = tlim)] out <- sampleCaseBase(DT, time = "time", event = "event", comprisk = TRUE)
# Simulate censored survival data for two outcome types from exponential library(data.table) set.seed(12345) nobs <- 500 tlim <- 10 # simulation parameters b1 <- 200 b2 <- 50 # event type 0-censored, 1-event of interest, 2-competing event # t observed time/endpoint # z is a binary covariate DT <- data.table(z = rbinom(nobs, 1, 0.5)) DT[, `:=`( "t_event" = rweibull(nobs, 1, b1), "t_comp" = rweibull(nobs, 1, b2) )] DT[, `:=`( "event" = 1 * (t_event < t_comp) + 2 * (t_event >= t_comp), "time" = pmin(t_event, t_comp) )] DT[time >= tlim, `:=`("event" = 0, "time" = tlim)] out <- sampleCaseBase(DT, time = "time", event = "event", comprisk = TRUE)
This simulated data is and description is taken verbatim from the
simsurv
.
simdat
simdat
A dataframe with 1000 observations and 4 variables:
patient id
time of event
event indicator (1 = event, 0 = censored)
binary treatment indicator
Simulated data under a standard Weibull survival model that incorporates a time-dependent treatment effect (i.e. non-proportional hazards). For the time-dependent effect we included a single binary covariate (e.g. a treatment indicator) with a protective effect (i.e. a negative log hazard ratio), but we will allow the effect of the covariate to diminish over time. The data generating model will be
where where Xi is the binary
treatment indicator for individual i, and
are the
scale and shape parameters for the Weibull baseline hazard,
is
the log hazard ratio for treatment when t=1 (i.e. when log(t)=0), and
quantifies the amount by which the log hazard ratio for
treatment changes for each one unit increase in log(t). Here we are assuming
the time-dependent effect is induced by interacting the log hazard ratio with
log time. The true parameters are 1.
= -0.5 2.
=
0.15 3.
= 0.1 4.
= 1.5
See simsurv
vignette:
https://cran.r-project.org/package=simsurv/vignettes/simsurv_usage.html
Sam Brilleman (2019). simsurv: Simulate Survival Data. R package version 0.2.3. https://CRAN.R-project.org/package=simsurv
if (requireNamespace("splines", quietly = TRUE)) { library(splines) data("simdat") mod_cb <- casebase::fitSmoothHazard(status ~ trt + ns(log(eventtime), df = 3) + trt:ns(log(eventtime),df=1), time = "eventtime", data = simdat, ratio = 1) }
if (requireNamespace("splines", quietly = TRUE)) { library(splines) data("simdat") mod_cb <- casebase::fitSmoothHazard(status ~ trt + ns(log(eventtime), df = 3) + trt:ns(log(eventtime),df=1), time = "eventtime", data = simdat, ratio = 1) }
The SUPPORT dataset tracks four response variables: hospital death, severe functional disability, hospital costs, and time until death and death itself. The patients are followed for up to 5.56 years. Data included only tracks follow-up time and death.
support
support
A dataframe with 9104 observations and 34 variables after imputation and the removal of response variables like hospital charges, patient ratio of costs to charges and micro-costs. Ordinal variables, namely functional disability and income, were also removed. Finally, Surrogate activities of daily living were removed due to sparsity. There were 6 other model scores in the data-set and they were removed; only aps and sps were kept.
Stores a double representing age.
Death at any time up to NDI date: 31DEC94.
0=female, 1=male.
Days from study entry to discharge.
days of follow-up.
Each level of dzgroup: ARF/MOSF w/Sepsis, COPD, CHF, Cirrhosis, Coma, Colon Cancer, Lung Cancer, MOSF with malignancy.
ARF/MOSF, COPD/CHF/Cirrhosis, Coma and cancer disease classes.
the number of comorbidities.
years of education of patient.
The SUPPORT coma score based on Glasgow D3.
Average TISS, days 3-25.
Indicates race. White, Black, Asian, Hispanic or other.
Day in Hospital at Study Admit
Diabetes (Com 27-28, Dx 73)
Dementia (Comorbidity 6)
Cancer State
Mean Arterial Blood Pressure Day 3.
White blood cell count on day 3.
Heart rate day 3.
Respiration Rate day 3.
Temperature, in Celsius, on day 3.
PaO2/(0.01*FiO2) Day 3.
Serum albumin day 3.
Bilirubin Day 3.
Serum creatinine day 3.
Serum sodium day 3.
Serum pH (in arteries) day 3.
Serum glucose day 3.
BUN day 3.
urine output day 3.
ADL patient day 3.
Imputed ADL calibrated to surrogate, if a surrogate was used for a follow up.
SUPPORT physiology score
Apache III physiology score
Some of the original data was missing. Before imputation, there were
a total of 9105 individuals and 47 variables. Of those variables, a few
were removed before imputation. We removed three response variables:
hospital charges, patient ratio of costs to charge,s and patient
micro-costs. Next, we removed hospital death as it was directly informative
of our event of interest, namely death. We also removed functional
disability and income as they are ordinal covariates. Finally, we removed 8
covariates related to the results of previous findings: we removed SUPPORT
day 3 physiology score (sps
), APACHE III day 3 physiology score
(aps
), SUPPORT model 2-month survival estimate, SUPPORT model
6-month survival estimate, Physician's 2-month survival estimate for pt.,
Physician's 6-month survival estimate for pt., Patient had Do Not
Resuscitate (DNR) order, and Day of DNR order (<0 if before study). Of
these, sps
and aps
were added on after imputation, as they
were missing only 1 observation. First we imputed manually using the normal
values for physiological measures recommended by Knaus et al. (1995). Next,
we imputed a single dataset using mice with default settings. After
imputation, we noted that the covariate for surrogate activities of daily
living was not imputed. This is due to collinearity between the other two
covariates for activities of daily living. Therefore, surrogate activities
of daily living was removed.
Available at the following website: https://biostat.app.vumc.org/wiki/Main/SupportDesc. note: must unzip and process this data before use.
Knaus WA, Harrell FE, Lynn J et al. (1995): The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191-203. doi:10.7326/0003-4819-122-3-199502010-00007.
http://biostat.mc.vanderbilt.edu/wiki/Main/SupportDesc
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/Csupport.html
data("support") # Using the matrix interface and log of time x <- model.matrix(death ~ . - d.time - 1, data = support) y <- with(support, cbind(death, d.time)) fit_cb <- casebase::fitSmoothHazard.fit(x, y, time = "d.time", event = "death", formula_time = ~ log(d.time), ratio = 1)
data("support") # Using the matrix interface and log of time x <- model.matrix(death ~ . - d.time - 1, data = support) y <- with(support, cbind(death, d.time)) fit_cb <- casebase::fitSmoothHazard.fit(x, y, time = "d.time", event = "death", formula_time = ~ log(d.time), ratio = 1)