survival packagesurvival packageIn this instruction you will:
survival
package,We will use a built-in dataset (lung) that ships with
the survival package.
Install and load the package:
Key ideas:
1 if event occurred,
0 if right-censored (we only know the event did not occur
up to a certain time).lung dataset and Surv objectsInspect the built-in lung dataset:
We will use:
time – survival time in days,status – event indicator (1 = censored,
2 = death),sex – 1 = male,
2 = female,age – age in years.Create a Surv object:
lung$event <- ifelse(lung$status == 2, 1, 0) # 1 = death, 0 = censored
surv_obj <- Surv(time = lung$time, event = lung$event)
head(surv_obj)Task 2.1
Check how many censored vs event observations there are
(table(lung$event)).
Task 2.2
Compute the median and quartiles of time separately for men
and women.
Fit a Kaplan–Meier estimator with survfit:
Plot:
Fit separate curves by sex:
lung$sex_factor <- factor(lung$sex, levels = c(1, 2), labels = c("Male", "Female"))
km_sex <- survfit(Surv(time, event) ~ sex_factor, data = lung)
summary(km_sex)Plot both curves:
Task 3.1
Read off from the plot (or summary(km_sex)) the median
survival time for men and for women.
Task 3.2
Add a grid to the plot (e.g. using grid() after
plot) and comment briefly on which group appears to have
better survival.
The Cox model relates the hazard to covariates:
\[h(t \mid x) = h_0(t) \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots),\]
where:
Model with sex only:
Key outputs:
sex_factorFemale,exp(coef) (hazard ratio),Interpretation examples:
exp(coef)[ "sex_factorFemale" ] – relative hazard for
females vs males after adjusting for age.exp(coef)[ "age" ] – multiplicative change in hazard
for a 1-year increase in age.You can plot adjusted survival curves for typical covariate values:
newdata <- data.frame(
sex_factor = factor(c("Male", "Female"), levels = levels(lung$sex_factor)),
age = rep(median(lung$age, na.rm = TRUE), 2)
)
fit_surv <- survfit(cox_sex_age, newdata = newdata)
plot(fit_surv,
col = c("red", "blue"),
lwd = 2,
xlab = "Days",
ylab = "Survival probability",
main = "Adjusted survival curves (median age)")
legend("topright",
legend = c("Male", "Female"),
col = c("red", "blue"),
lwd = 2,
bty = "n")Task 4.1
From summary(cox_sex_age), extract:
exp(10 * coef["age"])).Task 4.2
Write 3–4 sentences interpreting these hazard ratios in plain
language.
The Cox model assumes that hazard ratios are constant over
time. We can check this with cox.zph():
Interpretation:
cox.zph test for deviations from
proportional hazards,Task 5.1
Based on cox.zph(cox_sex_age), comment whether the
proportional hazards assumption appears reasonable for:
sex_factor,age.Task 5.2 (optional)
Try fitting a model with an additional covariate
(e.g. ph.ecog performance status) and repeat the
cox.zph check.
To better understand model behaviour, you can simulate your own survival data.
Example with one binary covariate (treatment) and
exponential baseline hazard:
set.seed(101)
n <- 300
treatment <- rbinom(n, size = 1, prob = 0.5)
lambda0 <- 0.002 # baseline rate
beta_trt <- log(0.6) # hazard ratio ~ 0.6 for treatment vs control
u <- runif(n)
time_sim <- -log(u) / (lambda0 * exp(beta_trt * treatment))
censor_time <- rexp(n, rate = 0.0008)
time_obs <- pmin(time_sim, censor_time)
event_sim <- as.integer(time_sim <= censor_time)
sim_df <- data.frame(
time = time_obs,
event = event_sim,
treatment = factor(treatment, levels = c(0, 1), labels = c("Control", "Treatment"))
)
sim_cox <- coxph(Surv(time, event) ~ treatment, data = sim_df)
summary(sim_cox)
exp(coef(sim_cox))sim_cox with
the true value exp(beta_trt) ≈ 0.6.treatment and comment on
how well they reflect the hazard ratio.In this lab you:
Surv objects,These tools are the foundation of practical survival analysis in R
using the survival package.