This laboratory introduces you to the R programming environment and basic data analysis techniques. R was specifically designed for statistics and data analysis, so tasks such as loading data, computing summaries, and creating visualisations are native parts of the language and its standard ecosystem.
As a first quick-and-dirty example, run the following code, which reads measurement data directly from the course website and inspects it:
tab = read.table("https://ccfd.github.io/courses/data/info1/przebieg.txt")
str(tab)
summary(tab)
plot(tab)R can be downloaded from CRAN. For this course, we recommend using RStudio, which provides a user-friendly interface for R.
Start RStudio and familiarize yourself with the interface: - Console: Where you type and execute R commands - Script Editor: Where you write and save your R scripts - Environment: Shows objects in your workspace - Plots/Help: Displays plots and help documentation
Execute the following commands in the R console:
# Basic arithmetic operations
2 + 3
5 * 4
10 / 2
2^3
# Variable assignment
x <- 5
y <- 10
x + y
# Creating vectors
numbers <- c(1, 2, 3, 4, 5)
numbersTask: Create a vector containing the first 10 even numbers and calculate their sum.
A data frame is R’s primary structure for storing tabular data. Create a simple data frame:
# Create a data frame
students <- data.frame(
name = c("Alice", "Bob", "Charlie", "Diana"),
age = c(20, 21, 19, 22),
score = c(85, 92, 78, 95)
)
students
# View structure of the data frame
str(students)
# Summary statistics
summary(students)Task: Create a data frame with information about 5 products, including columns for product name, price, and quantity in stock.
R can read data from various file formats. For CSV files:
# Read a CSV file (if it exists)
# data <- read.csv("filename.csv")
# For this exercise, create sample data
set.seed(123)
sample_data <- data.frame(
id = 1:20,
value = rnorm(20, mean = 50, sd = 10)
)Task: Create a data frame with 30 observations of a variable following a normal distribution with mean 100 and standard deviation 15.
Calculate basic descriptive statistics:
# Generate sample data
data <- rnorm(100, mean = 50, sd = 10)
# Mean
mean(data)
# Median
median(data)
# Standard deviation
sd(data)
# Variance
var(data)
# Minimum and maximum
min(data)
max(data)
# Range
range(data)
# Quantiles
quantile(data, c(0.25, 0.5, 0.75))Task: Generate 200 random numbers from a uniform distribution between 0 and 100, and calculate all descriptive statistics mentioned above.
Create basic plots:
# Histogram
hist(data, main = "Histogram of Data", xlab = "Values")
# Boxplot
boxplot(data, main = "Boxplot of Data")
# Scatter plot (if you have two variables)
x <- 1:50
y <- x + rnorm(50, 0, 5)
plot(x, y, main = "Scatter Plot", xlab = "X", ylab = "Y")Task: Create a histogram and boxplot for the uniform distribution data from the previous task.
R comes with many built-in datasets. Explore one:
# Load a built-in dataset
data(mtcars)
# View first few rows
head(mtcars)
# View structure
str(mtcars)
# Summary statistics
summary(mtcars)
# Access specific columns
mtcars$mpg
mtcars[, "mpg"]Task: Using the mtcars dataset: 1.
Calculate the mean and standard deviation of miles per gallon (mpg) 2.
Find the car with the highest horsepower (hp) 3. Create a scatter plot
of mpg vs. hp
Select specific rows and columns:
# Select rows where mpg > 20
high_mpg <- mtcars[mtcars$mpg > 20, ]
high_mpg
# Select specific columns
mpg_hp <- mtcars[, c("mpg", "hp")]
mpg_hp
# Combine conditions
fast_efficient <- mtcars[mtcars$mpg > 20 & mtcars$hp > 100, ]
fast_efficientTask: From the mtcars dataset, create a
subset containing only cars with: - More than 6 cylinders - Weight (wt)
less than 3.5 Display the mean horsepower for this subset.
# Add a new column
mtcars$km_per_liter <- mtcars$mpg * 0.425144
head(mtcars)
# Modify existing column (convert weight to kg)
mtcars$wt_kg <- mtcars$wt * 453.592
head(mtcars)Task: Add a column to mtcars that
categorizes cars as “Light” (wt < 3), “Medium” (3 ≤ wt < 4), or
“Heavy” (wt ≥ 4).
Complete the following exercises:
mtcars dataset:
?function_name or
help(function_name) for help on any function