Introduction
Welcome to this database and SQL tutorial.
The Structured Query Language is somtimes seen as the most important invention; well, right after the washing machine;).
It is custom to begin with a “Hello World” example:
select "Hello World";
“Hello World” |
---|
Hello World |
Now try it yourself and press Run Code
. You can also
Start Over
or have a look at the proposed
Solution
.
select "Hello World"
Now try to display (with the keyword select
) some basic
calculations (1+2+3)*2.
select (1+2+3)*2
In SQLite You can concatenate strings and numbers by using
||
. In MySQL you need to use the concat
function. Concatenate “hello”, “world: 1+2+3 =” and 1+2+3.
select "hello" || " world: 1+2+3 = " || (1+2+3)
In MySQL the solution is:
select concat("hello", " world: 1+2+3 = ", 1+2+3)
Now, after this little “ice-breaker”, let us start learning SQL via an example.
The aim is to build a car sales system, and gain business insights.
First, we will need customers to buy cars. Of course, we must have
cars that can be sold to the customers. The sales transactions need to
be recorded as well. Hence, we need three tables: customer
,
car
and sale
.
We will use the SQLite database system
and assume that an empty database exists. Note, that this tutorial will
work for almost all relational databases such as PostgreSQL and MySQL. I have also
provided a similar tutorial for
Access
. If you are interested in the popularity of
database engines go to db-engines.com.
Table operations
We begin by creating a customer
table.
create table customer (info text)
Executing the above SQL command does not show any output. However, we
can see the table using the sqlite_master
table. The
following select query displays the customer table name.
select name from sqlite_master
name |
---|
customer |
In MySQL use show tables
.
We have not quite thought through this. Let us delete the table again.
drop table customer
What information do we need from a customer? Name, phone number, email and an address seem to be a reasonable starting point. How can we identify a customer? We could use the name as identifier. However, if there are two people with exactly the same name then there is an issue. A number to identify these individuals could resolve this. More generally using a unique integer number to identify a record is common practice. A unique identifier for a record is known as primary key. Note, it could be an integer number, but it could be any field (e.g. the customer’s name assuming its uniqueness). The following SQL statement puts this all together:
create table customer
(
id int primary key,
name text,
phone_number text,
email text,
address_building text,
address_street text,
postcode text
)
Here, the field names are id, name, phone_number, email, etcetera.
These are also known as column names. On the right of each field name is
a data type specifier. Here, we used int (representing whole
numbers, integers) and text (representing characters). Right next to the
field id
we wrote primary key
. This is a
constraint on the field, which means when inserting a record a
unique value needs to be provided. To summarise a field
contains a field name, its data type and a constraint.
Again, we can see the table in the master overview.
select * from sqlite_master
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | customer | customer | 2 | CREATE TABLE customer |
( id int primary key, name text, phone_number text, email text, address_building text, address_street text, postcode text ) | |index |sqlite_autoindex_customer_1 |customer | 3|NA |
Insert data
Now, let us insert a bit of test data.
insert into customer values
(1, 'Wolfgang','0779...','w.garn@surrey.ac.uk',
'20MS02','UoS','GU2 7XH')
The above represents an entire record. A row in a table is known as record or tuple.
To verify that we have really inserted the above values we can
execute the select
query.
select * from customer
id | name | phone_number | address_building | address_street | postcode | |
---|---|---|---|---|---|---|
1 | Wolfgang | 0779… | w.garn@surrey.ac.uk | 20MS02 | UoS | GU2 7XH |
Let us insert the customers with names Dominic, Michael and spook. When inserting incomplete records (i.e. some field values are missing) the column names (especially required fields, such as primary key) have to be specified.
insert into customer (id, name)
values (2,'Dominic'),(3,'Michael'),(4,'spook')
View the previously inserted records.
select * from customer
id | name | phone_number | address_building | address_street | postcode | |
---|---|---|---|---|---|---|
1 | Wolfgang | 0779… | w.garn@surrey.ac.uk | 20MS02 | UoS | GU2 7XH |
2 | Dominic | NA | NA | NA | NA | NA |
3 | Michael | NA | NA | NA | NA | NA |
4 | spook | NA | NA | NA | NA | NA |
Changing data
Okay, we really don’t want to have a “spook” customer. So, let us remove that customer.
delete from customer where name = "spook";
In order to display only the column id
and
name
we replace the asterisk.
select id, name from customer
id | name |
---|---|
1 | Wolfgang |
2 | Dominic |
3 | Michael |
Here, we see the “beauty” of SQL - it is intuitive and natural:
“delete from my customer table where the name is spook” (at least
somewhat natural). Important is that we used the where
clause, otherwise all records would disappear.
Let us assume we need to “correct” (update) the customer with identifier 1 and add the surname “Garn”.
update customer
set name = "Wolfgang Garn"
where id = 1
Observe the result of the update query.
select * from customer;
id | name | phone_number | address_building | address_street | postcode | |
---|---|---|---|---|---|---|
1 | Wolfgang Garn | 0779… | w.garn@surrey.ac.uk | 20MS02 | UoS | GU2 7XH |
2 | Dominic | NA | NA | NA | NA | NA |
3 | Michael | NA | NA | NA | NA | NA |
Now, assume we want a new field in the customer table, which
identifies the status
of being a current (someone in the
middle of purchase), potential (showed some interest) or no-longer
(moved on). This means we will alter (change) the table.
alter table customer
add status text
In MySQL all columns are displayed via
describe customer
.
select * from customer;
id | name | phone_number | address_building | address_street | postcode | status | |
---|---|---|---|---|---|---|---|
1 | Wolfgang Garn | 0779… | w.garn@surrey.ac.uk | 20MS02 | UoS | GU2 7XH | NA |
2 | Dominic | NA | NA | NA | NA | NA | NA |
3 | Michael | NA | NA | NA | NA | NA | NA |
If we decide adding the field was not a good idea, it can be undone
with alter table customer drop status
.
Practice
Add the surname “Garn” to the names Michael and Dominic by using the
update
function twice and show your result. Note you can
comment SQL code by using --
(two hyphens and a space).
update customer set name = 'Dominic Garn' where id=2
-- update customer set name = 'Michael Garn' where id=3
select name from customer
name |
---|
Wolfgang Garn |
Dominic Garn |
Michael |
Update (not insert) the status to current, no-longer and potential for customers with identifier 1, 2 and 3 respectively (Note: update three times, use comments).
update customer set status='current' where id = 1
-- update customer set status='no-longer' where id = 2
-- update customer set status='potential' where id = 3
Display the id, name and status.
select id, name, status from customer
id | name | status |
---|---|---|
1 | Wolfgang Garn | current |
2 | Dominic Garn | NA |
3 | Michael | NA |
Car Table
Now, that we have the customer table. Let us create a car table. So,
that we can sell these type of cars to customers. What fields should we
add? What car do you currently drive? How much was it? What fuel does it
need? This gives us the field names: manufacturer, model, price and fuel
type. We need to decide about the data types - text
seems
to be fine for all but the price
column, where we will use
float
. Again, let us introduce the column id
as primary key.
create table car (
id int primary key,
manufacturer text, -- Mazda
model text, -- CX-5
price float, -- £27,000
fuel_type varchar(255) -- diesel
)
Show the car table using the sqlite_master
.
select * from sqlite_master where name = "car"
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | car | car | 4 | CREATE TABLE car ( |
id int primary key, manufacturer text, – Mazda model text, – CX-5 price float, – £27,000 fuel_type varchar(255) – diesel ) |
Insert a few test-records.
insert into car values
(1,'Mazda' ,'CX-5' ,27000, 'diesel'),
(2,'Mazda' ,'MX-30' ,35000, 'electric'),
(3,'BMW' ,'i3' ,35000, 'electric'),
(4,'BMW' ,'2' ,27000, 'petrol'),
(5,'Volkswagen','e-golf',27000, 'electric')
select * from car
id | manufacturer | model | price | fuel_type |
---|---|---|---|---|
1 | Mazda | CX-5 | 27000 | diesel |
2 | Mazda | MX-30 | 35000 | electric |
3 | BMW | i3 | 35000 | electric |
4 | BMW | 2 | 27000 | petrol |
5 | Volkswagen | e-golf | 27000 | electric |
Insert two more records a Volkswagen - GTI with petrol, which costs £33k; and Ford - Fiesta Van with LPG, which costs £35k.
insert into car values
(6,'Volkswagen','GTI' ,33000, 'petrol'),
(7,'Ford','Fiesta Van' ,35000, 'LPG')
Display all car entries:
select * from car
id | manufacturer | model | price | fuel_type |
---|---|---|---|---|
1 | Mazda | CX-5 | 27000 | diesel |
2 | Mazda | MX-30 | 35000 | electric |
3 | BMW | i3 | 35000 | electric |
4 | BMW | 2 | 27000 | petrol |
5 | Volkswagen | e-golf | 27000 | electric |
6 | Volkswagen | GTI | 33000 | petrol |
7 | Ford | Fiesta Van | 35000 | LPG |
Lookup Tables
In the previous table we use fuel type
with several
repetition. In order to avoid inconsistent spelling (e.g. electric,
electrical or Electric). We could introduce a lookup table.
create table fuel (type varchar(255) primary key)
In MySQL setting a primary key (PK) on an attribute with
datatype text
requires the specification of how many
characters will be used for the PK. That means, we need to set the PK as
follows:
create table fuel (type text, primary key (type(255)));
;
alternatively, we could have used the datatype varchar
instead of text
. Generally, when dealing with keys based on
characters, it is easier to avoid the datatype text
.
A lookup table only has one field, which is a primary key. Let us insert the lookup values:
insert into fuel
values ('diesel'),('electric'),('petrol'),('LPG')
Display the table:
select * from fuel
type |
---|
diesel |
electric |
petrol |
LPG |
Foreign keys
Now the interesting part is, how do we “link” the fuel
table with the car
table. Obviously, it needs to be done
using the common fields car.fuel_type
and
fuel.type
.
By default, foreign keys are disabled in SQLite. hence, we need to enable them.
PRAGMA foreign_keys = ON;
We can check that the foreign keys functionality is on:
PRAGMA foreign_keys;
foreign_keys |
---|
1 |
The easiest way to add the foreign keys is to recreate the table
car
from scratch. That means, we need to drop the table
car
; and then create it again specifying a foreign key.
drop table car
create table car (
id int primary key,
manufacturer text, model text, price float,
fuel_type varchar(255) REFERENCES fuel(type) ON UPDATE CASCADE
)
In other database systems such as MySQL or postgreSQL you could alter the table instead:
alter table car add constraint constraint_fuel_type
foreign key (fuel_type) references fuel(type) on update cascade;
In MySQL, if you decide to add the foreign key in the table creeation then use:
create table car (
id int primary key,
manufacturer text, model text, price float,
fuel_type varchar(255),
foreign key (fuel_type) REFERENCES fuel(type) ON UPDATE CASCADE
);
We need to insert the data again.
insert into car values
(1,'Mazda' ,'CX-5' ,27000, 'diesel'),
(2,'Mazda' ,'MX-30' ,35000, 'electric'),
(3,'BMW' ,'i3' ,35000, 'electric'),
(4,'BMW' ,'2' ,27000, 'petrol'),
(5,'Volkswagen','e-golf',27000, 'electric'),
(6,'Volkswagen','GTI' ,33000, 'petrol'),
(7,'Fiesta Van','Ford' ,35000, 'LPG')
Lookup Table Benefit
Let us change the fuel type diesel
to
Diesel
in the lookup table.
update fuel set type='Diesel' where type = 'diesel'
Observe the updated value in the fuel
table.
select * from fuel
type |
---|
Diesel |
electric |
petrol |
LPG |
Did it update in the car
table automatically?
select * from car
id | manufacturer | model | price | fuel_type |
---|---|---|---|---|
1 | Mazda | CX-5 | 27000 | Diesel |
2 | Mazda | MX-30 | 35000 | electric |
3 | BMW | i3 | 35000 | electric |
4 | BMW | 2 | 27000 | petrol |
5 | Volkswagen | e-golf | 27000 | electric |
6 | Volkswagen | GTI | 33000 | petrol |
7 | Fiesta Van | Ford | 35000 | LPG |
That means, the cascaded update is a great benefit.
However, note that you cannot update the fuel_type in the
car
table:
update car set fuel_type='Electric' where fuel_type = 'electric'
will through an error message.
Practice
Create a lookup table for the customer status.
create table customer_status
(status varchar(255) primary key)
Insert the values: current, potential and no-longer.
insert into customer_status
values ('current'), ('potential'), ('no-longer')
Dropping (e.g. alter table customer drop column status
)
or adding a constraint the current status column in the
customer
table, works for most database systems. However,
SQLite does not support this. Hence, we drop the entire table and
recreate it.
drop table customer
create table customer
(
id int primary key,
name text,
phone_number text,
email text,
address_building text,
address_street text,
postcode text,
status text REFERENCES customer_status(status) ON UPDATE CASCADE
)
Now we insert the test data one more time.
insert into customer values
(1, 'Wolfgang','0779...','w.garn@surrey.ac.uk','20MS02','UoS','GU2 7XH','current'),
(2, 'Dominic','','','','','','no-longer'),
(3, 'Michael','','','','','','potential')
select id, name, status from customer
id | name | status |
---|---|---|
1 | Wolfgang | current |
2 | Dominic | no-longer |
3 | Michael | potential |
Now, we have the customer and car table. Both, are linked to a lookup table via the foreign keys status and fuel_type respectively.
Sale Table
Next, we’d like to create a table for the sales. This table will need a foreign key to the customer table and another one to link to the car table. What other sale’s information do we need. The date of the sale would be definitely good. A quantity field would be good assuming a car record represents “unlimited” supply of this type of car. Let us assume - for simplicity - that the price in the car table is a fixed retail price.
create table sale (
id int, -- sale id
customer_id int REFERENCES customer(id) ON UPDATE CASCADE,
car_id int REFERENCES car(id) ON UPDATE CASCADE,
sale_date text,
quantity int
)
Note, SQLite does not have a specific data type for date/time, but
text
, real
and int
can be used
(see sqlitetutorial.net/sqlite-date
for examples).
Let us add three records to the sale-table. Let us say say Wolfgang purchased a BMW i3 on the 2nd of March 2021, and bought a Mazda CX-5 on 17th November 2014. Dominic bought a GTI on the 15th of August 2020.
insert into sale values
(1, 1, 1, '2014-11-17 10:00:00.0',1),
(2, 1, 3, '2021-03-02 11:00:00.0',1),
(3, 2, 6, '2020-08-15 17:00:00.0',1)
select * from sale
id | customer_id | car_id | sale_date | quantity |
---|---|---|---|---|
1 | 1 | 1 | 2014-11-17 10:00:00.0 | 1 |
2 | 1 | 3 | 2021-03-02 11:00:00.0 | 1 |
3 | 2 | 6 | 2020-08-15 17:00:00.0 | 1 |
Queries
Now, it would be great to display the following information: all the sale records but with customer name, manufacturer, model and sale’s date. This means we have to collect the information from the customer, car and sale table.
select name, manufacturer, model, sale_date
from customer, car, sale
where sale.customer_id = customer.id and
sale.car_id = car.id
name | manufacturer | model | sale_date |
---|---|---|---|
Wolfgang | Mazda | CX-5 | 2014-11-17 10:00:00.0 |
Wolfgang | BMW | i3 | 2021-03-02 11:00:00.0 |
Dominic | Volkswagen | GTI | 2020-08-15 17:00:00.0 |
The function strftime('%Y-%m-%d %H:%M:%S', ...)
can be
used to extract any specific date/time information. date
and time
are two more useful functions.
Now, that we have built a basic car-sales-system it is time to operate it. Okay, some time has passed and the tables have been filled with some data.
Operational Queries
What operational queries would be good to know? How to find a customer record (phone number ,email) by knowing “roughly” the name? How to insert a new type of car? We have done that before. How to add a new customer or sale record. Again we have done that.
So, let us find the customer which contains the characters “Wolf”.
select name, email, phone_number from customer
where name like '%Wolf%'
name | phone_number | |
---|---|---|
Wolfgang | w.garn@surrey.ac.uk | 0779… |
Business Insight Queries
What business insights would be interesting? How much revenue have we generated? Which cars are the top sellers?
For the revenue we need the car retail price and the quantity sold.
select price, quantity
from car, sale
where sale.car_id = car.id
price | quantity |
---|---|
27000 | 1 |
35000 | 1 |
33000 | 1 |
The total revenue is:
select sum(price* quantity) as revenue
from car, sale
where sale.car_id = car.id
revenue |
---|
95000 |
Which cars are the top sellers?
select manufacturer, model, sum(quantity) as nb
from car, sale
where sale.car_id = car.id
group by manufacturer, model
order by nb desc
manufacturer | model | nb |
---|---|---|
BMW | i3 | 1 |
Mazda | CX-5 | 1 |
Volkswagen | GTI | 1 |
Summary
We introduced fundamental SQL statements by building a car-sales-system and showing how to use it. Concepts such as creating a table, inserting and updating its data were introduced. Simple queries and aggregates were mentioned.
Resources
- w3schools.com is a great systematic introduction into SQL.
- Shah (2020) is a hands-on introduction to data science (Chapter 7 introduces MySQL).
Acknowledgment
This tutorial was created using RStudio, R, rmarkdown, and many other
tools and libraries. The packages learnr
and
gradethis
were particularly useful. I’m very grateful to Prof. Andy
Field for sharing his disovr package,
which allowed me to improve the style of this tutorial and get more
familiar with learnr
. Allison Horst wrote a very
instructive blog “Teach
R with learnr: a powerful tool for remote teaching”, which
encouraged me to continue with learnr
. By the way, I find
her statistic
illustrations amazing. Irene Steves’
tutorial is great for learning how to use SQL in RStudio.
References