sqlite3 --version
3.43.2 2023-10-10 13:08:14 1b37c146ee9ebb7acd0160c0ab1fd11017a419fa8a3187386ed8cb32b709aapl (64-bit)
November 12, 2024
For this lecture, we will use Unix shell, plus SQLite3 or DB Browser for SQLite.
You can see if the command-line tool sqlite3
(also known as “SQLite”) is already installed with
3.43.2 2023-10-10 13:08:14 1b37c146ee9ebb7acd0160c0ab1fd11017a419fa8a3187386ed8cb32b709aapl (64-bit)
If not, you can follow the instructions here:
Also, you will need to install these R packages:
In addition, please read through
Please install the packages above and be comfortable with running the functions in the example code here:
Material for this lecture was borrowed and adopted from
At the end of this lesson you will:
DBI
, RSQLite
, dbplyr
packages for making SQL queries in RFor the rest of the time in class, we will practice creating a SQL database and also writing SQL syntax to analyze a Beyoncé discography dataset.
For this in-class activity, you need to find a partner. Work together on the following problems. Use the following datasets related to Beyoncé’s songs for the activity.
songs <- data.frame(
song_id = 1:10,
title = c("Crazy in Love", "Halo", "Single Ladies", "Formation", "Irreplaceable",
"Drunk in Love", "Run the World", "Partition", "Love on Top", "Sorry"),
album_id = c(1, 2, 1, 3, 2, 3, 1, 3, 2, 3),
release_year = c(2003, 2008, 2008, 2016, 2006, 2013, 2011, 2013, 2011, 2016),
streams_millions = c(500, 600, 700, 800, 900, 1000, 450, 350, 750, 900)
)
# Sample data frame for Beyoncé's albums
albums <- data.frame(
album_id = 1:3,
album_title = c("Dangerously in Love", "I Am... Sasha Fierce", "Lemonade"),
release_year = c(2003, 2008, 2016)
)
beyonce.sqlite
using the DBI
and RSQLite
packages using the dbConnect()
function.dbWriteTable()
function, add the songs
and albums
datasets to the SQL database.# This code to ensure that the lecture builds and there no SQL database at the start of the exercise
sql_beyonce <- here::here("lectures","05-database-programming", "data", "beyonce.sqlite")
if(file.exists(sql_beyonce)){
file.remove(sql_beyonce)
}
[1] TRUE
library(DBI)
mydb <- dbConnect(drv = RSQLite::SQLite(),
dbname = sql_beyonce)
dbWriteTable(conn = mydb, name = "songs", value = songs)
dbWriteTable(conn = mydb, name = "albums", value = albums)
dbListTables(conn = mydb)
[1] "albums" "songs"
song_id title album_id release_year streams_millions
1 3 Single Ladies 1 2008 700
2 4 Formation 3 2016 800
3 5 Irreplaceable 2 2006 900
4 6 Drunk in Love 3 2013 1000
5 9 Love on Top 2 2011 750
6 10 Sorry 3 2016 900
results <- dbGetQuery(mydb, "SELECT release_year, title, MIN(streams_millions) AS min_streams
FROM songs
GROUP BY release_year;")
print(results)
release_year title min_streams
1 2003 Crazy in Love 500
2 2006 Irreplaceable 900
3 2008 Halo 600
4 2011 Run the World 450
5 2013 Partition 350
6 2016 Formation 800
Close connection when done
Discuss with your partner:
Here are some additional practice questions to help you think about the material discussed.
Using the survey.db
database discussed in the relational databases and SQL basics pre-reading material:
.schema
to identify column that contains integersname
column from the Site
table.SELECT personal, family FROM person;
or
select Personal, Family from PERSON;
Visited
table.Person
table, ordered by family name.