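lm in R

This instruction introduces simple linear regression in R using the
lm() function. You will see what lm() fits (conceptually), fit a model
to simulated data and to the built-in iris dataset, and then add the
categorical predictor Species.

What lm() fits (conceptually)

The simple linear regression model has the form: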
\[y = \beta_0 + \beta_1 x + \varepsilon,\]
where:
- x and y are continuous variables,
- beta0 and beta1 are the intercept and slope,
- eps is random noise.

In R, you specify the model using a formula:
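A minimal sketch of such a call, assuming a small made-up data frame (the names toy and toy_fit and the values are illustrative, not from the original):

```r
# Hypothetical toy data: y is exactly 1 + 2 * x, so the fit is easy to verify
toy <- data.frame(x = c(1, 2, 3, 4, 5))
toy$y <- 1 + 2 * toy$x

# The formula y ~ x reads "model y as a linear function of x"
toy_fit <- lm(y ~ x, data = toy)

# Named vector with the estimated intercept and slope
coef(toy_fit)
```

Because the toy data contain no noise, the estimates reproduce the intercept 1 and slope 2 up to floating-point error.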
Interpretation:
- y ~ x means “model y as a linear function of x”.
- lm() estimates beta0 and beta1 using least squares.

Create a dataset where the true relationship is linear:
set.seed(1)
n <- 60
x <- runif(n, min = 0, max = 10) # continuous predictor
beta0_true <- 2
beta1_true <- 3
sigma_true <- 1.0
eps <- rnorm(n, mean = 0, sd = sigma_true) # noise
y <- beta0_true + beta1_true * x + eps
df <- data.frame(x = x, y = y)

Optional sanity check:
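One possible sanity check is sketched below. The choice of head(), summary() and plot() is an assumption, and the simulation is repeated only so that the block runs on its own:

```r
# Recreate the simulated data from above
set.seed(1)
n <- 60
x <- runif(n, min = 0, max = 10)
beta0_true <- 2
beta1_true <- 3
sigma_true <- 1.0
eps <- rnorm(n, mean = 0, sd = sigma_true)
y <- beta0_true + beta1_true * x + eps
df <- data.frame(x = x, y = y)

head(df)      # first rows of the data frame
summary(df)   # x should lie within [0, 10]
plot(df$x, df$y, xlab = "x", ylab = "y")  # points should look roughly linear
```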
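Fit the model with lm()

Fit the model with lm(y ~ x, data = df) and store the result as fit. Key quantities you should look for in summary(fit):
- the estimated intercept (labelled (Intercept)) and slope (labelled x),
- R-squared (the proportion of the variance in y explained by the linear model).

You can also view coefficients directly: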
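The fitting step can be sketched as follows, using the names fit and df from this document. The data are regenerated so the block is self-contained:

```r
# Recreate the simulated data from above
set.seed(1)
n <- 60
x <- runif(n, min = 0, max = 10)
eps <- rnorm(n, mean = 0, sd = 1.0)
y <- 2 + 3 * x + eps
df <- data.frame(x = x, y = y)

# Fit the simple linear regression by least squares
fit <- lm(y ~ x, data = df)

summary(fit)            # coefficient table, residual standard error, R-squared
coef(fit)               # named vector: (Intercept) and x
summary(fit)$r.squared  # R-squared as a single number
```

With the true values 2 and 3 and a noise standard deviation of 1, the estimates should land close to, but not exactly on, the true coefficients.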
Task 2.1: Compare the estimated coefficients with the true values.
- Print coef(fit) and check how close the estimates are to:
  - beta0_true = 2
  - beta1_true = 3

Task 2.2: Change the noise level and observe what happens.
- Change sigma_true (for example 0.3, 1.5, 3.0).
- Regenerate the data, refit the model, and inspect summary(fit).

Task 2.3: Visual comparison.
- Plot the data and add the fitted line (for example with abline(fit)).

The iris dataset is built into R:
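A short sketch of how the dataset might be inspected (the specific calls are an assumption, not necessarily the ones used originally):

```r
data(iris)   # optional: iris is available without this call
str(iris)    # four numeric measurements plus the factor Species
head(iris)   # first six rows
```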
We will model Sepal.Length as a linear function of
Petal.Length.
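The model described above can be sketched as follows, storing the fit under the name m1 that the tasks below refer to:

```r
# Simple linear regression of Sepal.Length on Petal.Length
m1 <- lm(Sepal.Length ~ Petal.Length, data = iris)

summary(m1)  # coefficient table and R-squared
coef(m1)     # (Intercept) and the Petal.Length slope
```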
Plot observed points and fitted line:
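A minimal plotting sketch, assuming base graphics and the model m1 (refitted here so the block runs on its own):

```r
m1 <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Scatter plot of the observations
plot(iris$Petal.Length, iris$Sepal.Length,
     pch = 19,
     xlab = "Petal.Length", ylab = "Sepal.Length")

# abline() accepts a simple regression fit and draws its line
abline(m1, lwd = 2)
```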
The slope coefficient (Petal.Length) answers:
“By how much does the expected Sepal.Length change when
Petal.Length increases by 1 (one unit)?”
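This interpretation can be checked numerically. The sketch below predicts at two values of Petal.Length that are one unit apart (the values 4 and 5 and the names slope, new_pts and preds are illustrative assumptions) and compares the difference with the slope:

```r
m1 <- lm(Sepal.Length ~ Petal.Length, data = iris)
slope <- coef(m1)[["Petal.Length"]]

# Two hypothetical flowers whose petal lengths differ by exactly one unit
new_pts <- data.frame(Petal.Length = c(4, 5))
preds <- predict(m1, newdata = new_pts)

diff(preds)                             # change in expected Sepal.Length
all.equal(unname(diff(preds)), slope)   # TRUE: the change equals the slope
```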
Task 4.1: Report and interpret.
- Report the coefficients from summary(m1) (or coef(m1)).
- Report the R-squared value.

Task 4.2: Predict at one value.
- Choose a value of Petal.Length, for example x0 <- 4.0.
- Compute the prediction with predict(m1, newdata = data.frame(Petal.Length = x0)).
- Find an observation with Petal.Length close to x0 and compare the observed Sepal.Length to the prediction.

Adding a categorical predictor: Species

Now we extend the model by adding Species (a factor with 3 levels). This is an example where the response (Sepal.Length) is continuous, but one predictor is categorical.
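Fitting the model with Species

Fit the extended model with the formula Sepal.Length ~ Petal.Length + Species and store it as m2.

Notes:
- Species is automatically treated as a factor by R because it is stored as such in iris.
- With R's default treatment contrasts, the first factor level serves as the reference level, and each other level gets its own coefficient.

To see the reference levels explicitly: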
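A sketch of how the levels and their coding might be inspected, together with the fitted m2 (the use of contrasts() is an assumption about what is useful to show here):

```r
levels(iris$Species)     # the first level listed is the reference level
contrasts(iris$Species)  # dummy coding used by default for this factor

# Extended model: common slope, species-specific intercept shifts
m2 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# There is no separate coefficient for the reference level;
# its effect is absorbed into (Intercept)
coef(m2)
```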
Compare models (Species added or not)

Compare the continuous-only model (m1) with the extended model (m2).
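One common way to compare two nested linear models is an F-test via anova(). The sketch below assumes that approach and also prints R-squared and AIC for both fits:

```r
m1 <- lm(Sepal.Length ~ Petal.Length, data = iris)
m2 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# F-test: does adding Species reduce the residual sum of squares
# by more than chance would explain?
anova(m1, m2)

# R-squared of each model (it cannot decrease when a predictor is added)
c(m1 = summary(m1)$r.squared, m2 = summary(m2)$r.squared)

# AIC penalizes the extra parameters; lower is better
AIC(m1, m2)
```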
# Plot the data coloured by species and overlay the fitted m2 line for each species
sp_levels <- levels(iris$Species)
cols <- seq_along(sp_levels)
plot(iris$Petal.Length, iris$Sepal.Length,
pch = 19, col = as.numeric(iris$Species),
xlab = "Petal.Length", ylab = "Sepal.Length")
legend("topleft",
legend = sp_levels,
col = cols, pch = 19, bty = "n")
xx <- seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 100)
for (k in seq_along(sp_levels)) {
sp <- sp_levels[k]
new_df <- data.frame(Petal.Length = xx, Species = sp)
yy <- predict(m2, newdata = new_df)
lines(xx, yy, col = k, lwd = 2)
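}

Task 6.1: Identify species effects.
- Look at summary(m2) and find which Species coefficients have small p-values.
- How do you interpret these coefficients (for a fixed Petal.Length)?

Task 6.2: Compare predictions across species.
- Choose x0 <- 4.5.
- Compute a prediction for each species with predict(m2, ...).
- How do the predicted values of Sepal.Length at Petal.Length = 4.5 differ between species?

With lm() you can:
- fit a simple linear model with y ~ x,
- inspect the fit with summary(),
- add a categorical predictor with y ~ x + Species.

If you want to model different slopes for different factor levels, you can use interactions such as y ~ x * Species (optional extension).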
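A sketch of this optional extension on the iris data. The names m3, b and slopes are hypothetical and introduced only for illustration:

```r
# Interaction model: each species gets its own intercept and its own slope
m3 <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)
b <- coef(m3)

# The reference species (setosa) has slope b["Petal.Length"];
# the interaction terms are differences from that slope
slopes <- c(
  setosa     = b[["Petal.Length"]],
  versicolor = b[["Petal.Length"]] + b[["Petal.Length:Speciesversicolor"]],
  virginica  = b[["Petal.Length"]] + b[["Petal.Length:Speciesvirginica"]]
)
slopes
```

Here y ~ x * Species is shorthand for y ~ x + Species + x:Species, so the model contains six coefficients in total.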