In this laboratory you will learn how to connect to a MariaDB / MySQL
database from R, send SELECT queries and work with the
results as regular R data frames. We will use the public
world database on the teaching server and repeat most of
the SELECT exercises from the SQL lab, but in the context
of R.
The database server is:
info3.meil.pw.edu.plworldstatisticsstatisticsTo connect from R we will use the DBI interface and the
RMariaDB (or RMySQL) driver.
Then load the packages:
Create a connection object to the world database:
con <- dbConnect(
RMariaDB::MariaDB(),
host = "info3.meil.pw.edu.pl",
dbname = "world",
user = "statistics",
password = "statistics"
)If the connection is successful, con will be a DBI
connection object. You can list tables in the database:
and list columns of a given table, for example
Country:
At the end of the session, always close the connection:
In the rest of this instruction we assume that con is an
open connection to the world database.
SELECT queries from RTo run a SELECT query and get the results as a data
frame, use dbGetQuery(con, "SQL HERE").
SQL task: display all information contained in the
Country table.
Task 2.1
country data frame.dim(country).SQL task: display all values for two columns, for example
Name and Region.
Task 2.2
Modify the query so that it returns only Name and
Continent.
WHERESQL task: display names of all countries in Europe together with life expectancy.
qry <- "
SELECT Name, LifeExpectancy
FROM Country
WHERE Continent = 'Europe';
"
eu_life <- dbGetQuery(con, qry)
head(eu_life)Task 2.3
nrow(eu_life).mean(eu_life$LifeExpectancy, na.rm = TRUE).INSQL task: display names of all countries in Europe and Asia with life expectancy.
qry <- "
SELECT Name, LifeExpectancy
FROM Country
WHERE Continent IN ('Europe', 'Asia');
"
ea_life <- dbGetQuery(con, qry)
head(ea_life)Task 2.4
Using R, compute separate averages of LifeExpectancy for
Europe and for Asia from ea_life.
Hint: you can use
tapply(ea_life$LifeExpectancy, ea_life$Continent, mean, na.rm = TRUE).
SQL task: sort the information from the previous point by life expectancy.
qry <- "
SELECT Name, Continent, LifeExpectancy
FROM Country
WHERE Continent IN ('Europe', 'Asia')
ORDER BY LifeExpectancy DESC;
"
ea_sorted <- dbGetQuery(con, qry)
head(ea_sorted)
tail(ea_sorted)Task 2.5
Identify: - the country with the highest life expectancy in Europe or
Asia, - the country with the lowest life expectancy in these
continents.
SUM, AVGSQL task: display the total population and the average population of countries in Europe.
qry_sum <- "
SELECT SUM(Population) AS total_pop
FROM Country
WHERE Continent = 'Europe';
"
dbGetQuery(con, qry_sum)
qry_avg <- "
SELECT AVG(Population) AS avg_pop
FROM Country
WHERE Continent = 'Europe';
"
dbGetQuery(con, qry_avg)Task 2.6
qry_sum in an object and extract
the numeric value (e.g. res$total_pop).qry_avg.LIKESQL task: display names and codes of all countries whose names start
with "Ch".
Task 2.7
Modify the query to select all countries whose names contain the
substring "land" (use '%land%').
City tableSQL task: display all cities in Finland (country code
"FIN").
qry <- "
SELECT *
FROM City
WHERE CountryCode = 'FIN';
"
fin_cities <- dbGetQuery(con, qry)
head(fin_cities)Task 2.8
ORDER BY or sort in R using
fin_cities[order(-fin_cities$Population), ]).SQL task: display all cities in Poland, sorted by district.
qry <- "
SELECT *
FROM City
WHERE CountryCode = 'POL'
ORDER BY District;
"
pol_cities <- dbGetQuery(con, qry)
head(pol_cities)Task 2.9
Using R, create a table showing, for each district in Poland
(District), how many cities are listed in the table.
Hint: table(pol_cities$District).
SQL task: display names of countries that became independent after 1980.
qry <- "
SELECT Name, IndepYear
FROM Country
WHERE IndepYear > 1980;
"
indep_1980 <- dbGetQuery(con, qry)
head(indep_1980)Task 2.10
IndepYear not
NA?SQL task: display names of North American countries that became
independent between 1800 and 1900, sorted by IndepYear.
qry <- "
SELECT Name, IndepYear
FROM Country
WHERE Continent = 'North America'
AND IndepYear > 1800
AND IndepYear < 1900
ORDER BY IndepYear;
"
na_indep <- dbGetQuery(con, qry)
na_indepTask 2.11
Using R, compute the time span (difference between maximum and minimum
IndepYear) for these countries.
SELECT queriesSQL task: display names of cities with population greater than 3,000,000, together with country codes and population, sorted by country code (descending) and population (descending).
qry <- "
SELECT Name, CountryCode, Population
FROM City
WHERE Population > 3000000
ORDER BY CountryCode DESC, Population DESC;
"
big_cities <- dbGetQuery(con, qry)
head(big_cities)Task 3.1
SQL task: display all cities in Norway, without knowing the
CountryCode in advance.
qry <- "
SELECT *
FROM City
WHERE CountryCode = (
SELECT Code
FROM Country
WHERE Name = 'Norway'
);
"
norway_cities <- dbGetQuery(con, qry)
head(norway_cities)Task 3.2
Repeat the query for Spain and Poland,
changing only the country name in the nested SELECT.
Task 3.3
Display the name and population of the most populous country in South
America. Modify the query to find the least populous country in South
America (use MIN(Population) instead of
MAX).
GROUP BYSQL task: display the number of countries on each continent.
qry <- "
SELECT Continent, COUNT(*) AS NumberOfCountries
FROM Country
GROUP BY Continent;
"
countries_by_continent <- dbGetQuery(con, qry)
countries_by_continentTask 3.4
NumberOfCountries.Now we repeat some join‑based queries in R.
SQL task: display names of capitals of European countries using a
join between City and Country.
qry <- "
SELECT City.Name AS CityName, Country.Name AS CountryName
FROM City
INNER JOIN Country
ON City.ID = Country.Capital
WHERE Country.Continent = 'Europe';
"
eu_capitals <- dbGetQuery(con, qry)
head(eu_capitals)Task 4.1
CountryName and inspect the
first and last few rows.SQL task: display information about languages used in European countries.
qry <- "
SELECT Country.Name AS CountryName,
CountryLanguage.Language,
CountryLanguage.IsOfficial,
CountryLanguage.Percentage
FROM Country
INNER JOIN CountryLanguage
ON Country.Code = CountryLanguage.CountryCode
WHERE Country.Continent = 'Europe';
"
eu_lang <- dbGetQuery(con, qry)
head(eu_lang)Task 4.2
table(eu_lang$Language).IsOfficial = 'T'.SQL task: display the name and surface area of the smallest country in the world.
qry <- "
SELECT Name, SurfaceArea
FROM Country
WHERE SurfaceArea = (
SELECT MIN(SurfaceArea)
FROM Country
);
"
min_country <- dbGetQuery(con, qry)
min_countryTask 4.3
Modify the query so that it returns the smallest country in Africa.
SQL tasks: - display names of countries and names of their capitals, - display only Asian countries and their capitals, - display African countries and their capitals, sorted by country name, using table aliases.
Example with aliases:
qry <- "
SELECT c.Name AS CountryName,
ci.Name AS CapitalName
FROM Country AS c
INNER JOIN City AS ci
ON c.Capital = ci.ID;
"
country_capital <- dbGetQuery(con, qry)
head(country_capital)Task 4.4
c.Continent = 'Asia').CountryName.SQL tasks (adapted):
Example for countries where people speak Polish:
qry <- "
SELECT c.Name AS CountryName
FROM Country AS c
INNER JOIN CountryLanguage AS cl
ON c.Code = cl.CountryCode
WHERE cl.Language = 'Polish';
"
polish_countries <- dbGetQuery(con, qry)
polish_countriesTask 4.5
IndepYear > 1900) and for which
Language = 'Spanish' and
IsOfficial = 'T'.