In this lab you will fit logistic regression models on synthetic data describing pilot training outcomes in a flight simulator. The response variable is binary:

- success = 1 (pilot passed the simulator task)
- success = 0 (pilot failed the simulator task)

Predictors include:

- exam_score (continuous)
- prev_sim_training (factor: whether the pilot had previous simulator training)

When the response is binary, linear regression is not appropriate because its predicted values can fall outside [0, 1]. Logistic regression instead models the probability of success:
\[P(\text{success} = 1 \mid x) = p,\]
with the logit link:
\[\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{exam\_score} + \beta_2 \cdot \text{prev\_sim\_training}.\]
In R, such a model is fitted with glm() using family = binomial.
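The logit link and its inverse are available in base R as qlogis() and plogis(); a quick numeric illustration:

```r
p <- 0.8
qlogis(p)          # log-odds: log(0.8 / 0.2) = log(4), about 1.386
plogis(qlogis(p))  # inverse link maps the log-odds back to the probability 0.8
```

These are the same transformations written out explicitly in the formulas above.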
We generate a realistic synthetic dataset for trainees.
set.seed(2026)
n <- 400
# Continuous predictor: theoretical exam score (0-100)
exam_score <- pmin(100, pmax(0, rnorm(n, mean = 68, sd = 12)))
# Factor predictor: prior simulator training
prev_sim_training <- sample(c("no", "yes"), size = n, replace = TRUE, prob = c(0.6, 0.4))
prev_sim_training <- factor(prev_sim_training, levels = c("no", "yes"))
# True data-generating logistic model (unknown in practice)
# Higher exam score and prior simulator training increase success chance.
eta <- -7.0 + 0.09 * exam_score + 1.1 * (prev_sim_training == "yes")
p_success <- 1 / (1 + exp(-eta))
# Binary outcome
success <- rbinom(n, size = 1, prob = p_success)
pilot_df <- data.frame(
exam_score = exam_score,
prev_sim_training = prev_sim_training,
success = success
)
head(pilot_df)
str(pilot_df)

Quick check of success rate:
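The success-rate check, and the three models that the tasks below refer to (m_score, m_train, m_both), might look like this sketch (the lab's own fitting code is not shown; this assumes pilot_df from the chunk above):

```r
# Overall observed success rate (should lie well inside (0, 1))
mean(pilot_df$success)

# Three candidate models, from simplest to fullest
m_score <- glm(success ~ exam_score, family = binomial, data = pilot_df)
m_train <- glm(success ~ prev_sim_training, family = binomial, data = pilot_df)
m_both  <- glm(success ~ exam_score + prev_sim_training,
               family = binomial, data = pilot_df)

summary(m_both)
```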
For logistic regression, exp(coef) gives odds ratios.
Interpretation examples:

- exp(beta_exam_score) = multiplicative change in the odds of success for a 1-point increase in exam score.
- exp(beta_prev_sim_trainingyes) = odds ratio for "yes" vs "no" prior training.

To visualise the fitted model (m_both, the model with both predictors), predict success probabilities over a grid of exam scores for each training group:

score_grid <- seq(30, 100, by = 1)
pred_no <- data.frame(
exam_score = score_grid,
prev_sim_training = factor("no", levels = c("no", "yes"))
)
pred_yes <- data.frame(
exam_score = score_grid,
prev_sim_training = factor("yes", levels = c("no", "yes"))
)
p_no <- predict(m_both, newdata = pred_no, type = "response")
p_yes <- predict(m_both, newdata = pred_yes, type = "response")
plot(score_grid, p_no, type = "l", lwd = 2, col = "red",
ylim = c(0, 1), xlab = "Exam score", ylab = "Predicted probability of success",
main = "Predicted success probability")
lines(score_grid, p_yes, lwd = 2, col = "blue")
legend("topleft", legend = c("no prior simulator training", "yes prior simulator training"),
col = c("red", "blue"), lwd = 2, bty = "n")

Task 8.1 - Build and interpret
Fit and interpret the three models m_score (exam score only), m_train (prior training only), and m_both (both predictors).

Task 8.2 - Odds ratios
Compute exp(coef(m_both)) and interpret the odds ratios for exam_score and for prev_sim_trainingyes.

Task 8.3 - Predictions
Use the prediction code above to compare predicted success probabilities across exam scores for the two training groups.
Task 8.4 - Sensitivity experiment
Change the data-generating parameters and re-run the lab, for example:

- the coefficient on exam_score in eta
- the sample size n

For each change, observe how the coefficients, p-values, and predicted probabilities respond.
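A sketch of the odds-ratio computation for Task 8.2, assuming m_both has been fitted as in Task 8.1 (confint() on a glm profiles the likelihood, so it can take a moment):

```r
# Odds ratios with 95% profile-likelihood confidence intervals
or_table <- exp(cbind(OR = coef(m_both), confint(m_both)))
round(or_table, 3)
```

An odds ratio above 1 (with a confidence interval excluding 1) indicates the predictor increases the odds of success.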
Test whether exam score effect differs by training group:
m_int <- glm(success ~ exam_score * prev_sim_training,
family = binomial,
data = pilot_df)
summary(m_int)
anova(m_both, m_int, test = "Chisq")

Interpretation idea: if the interaction term is significant, the effect of exam score on the odds of success differs between trainees with and without prior simulator training.
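As a sketch, the group-specific slopes implied by the interaction model can be read off its coefficients (names follow R's default treatment coding; this assumes m_int from the chunk above):

```r
b <- coef(m_int)
# Slope of exam_score on the log-odds scale for each training group
slope_no  <- b["exam_score"]                                          # "no" group
slope_yes <- b["exam_score"] + b["exam_score:prev_sim_trainingyes"]   # "yes" group
c(no = unname(slope_no), yes = unname(slope_yes))
```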
This lab demonstrates logistic regression with:

- a continuous predictor (exam_score)
- a factor predictor (prev_sim_training)

Using synthetic data lets you control the true mechanism and directly observe how model outputs (coefficients, p-values, and predicted probabilities) respond.