Getting started with SQLite

SQLite is an embedded relational database engine that is self-contained, meaning that it has everything it needs to run. You don't need to configure anything because it's serverless and it has a transactional SQL database engine. To get started all you need to do is download the relevant binary file. My motivation for checking out SQLite is that I wanted to organise a bunch of Excel files a collaborator has sent me. I've played around with MySQL a while ago but I wanted something much lighter and then I came across SQLite.

Firstly, I'll download the relevant binary:

#I'm using a Mac
wget http://www.sqlite.org/2015/sqlite-shell-osx-x86-3081002.zip
unzip sqlite-shell-osx-x86-3081002.zip

To test out SQLite, I'll create a CSV file using R:

R -q -e 'write.table(iris, file="iris.csv", quote=F, sep=",", col.names=F)'
head -5 iris.csv 
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3,1.4,0.2,setosa
3,4.7,3.2,1.3,0.2,setosa
4,4.6,3.1,1.5,0.2,setosa
5,5,3.6,1.4,0.2,setosa

I will use the row names as the primary key, which can be used to uniquely identify each record in the table. Now I'll import it into SQLite:

#A SQLite database is simply a ordinary disk file that is created as such
sqlite3 iris.db
SQLite version 3.8.10.2 2015-05-20 18:17:19
Enter ".help" for usage hints.

#see https://www.sqlite.org/datatype3.html
create table iris (id INTEGER PRIMARY KEY, sepal_length REAL, sepal_width REAL, petal_length REAL, petal_width REAL, species TEXT);
.separator ","
.import iris.csv iris

There are various differences to how SQLite interprets SQL. As you may have noticed already from above, some commands start with a period. I can no longer use show tables; to show all the tables. Instead to list all the tables in the database you type:

#show attached database
.databases
seq  name             file                                                      
---  ---------------  ----------------------------------------------------------
0    main             /Users/dtang/tmp/iris.db                                  
1    temp

.tables
iris

#no more describe table either
.schema iris
CREATE TABLE iris (id INTEGER PRIMARY KEY, sepal_length REAL, sepal_width REAL, petal_length REAL, petal_width REAL, species TEXT);

The SELECT statement seems to work as expected:

SELECT * from iris LIMIT 5;
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3.0,1.4,0.2,setosa
3,4.7,3.2,1.3,0.2,setosa
4,4.6,3.1,1.5,0.2,setosa
5,5.0,3.6,1.4,0.2,setosa

SELECT * from iris WHERE sepal_length > 7;
103,7.1,3.0,5.9,2.1,virginica
106,7.6,3.0,6.6,2.1,virginica
108,7.3,2.9,6.3,1.8,virginica
110,7.2,3.6,6.1,2.5,virginica
118,7.7,3.8,6.7,2.2,virginica
119,7.7,2.6,6.9,2.3,virginica
123,7.7,2.8,6.7,2.0,virginica
126,7.2,3.2,6.0,1.8,virginica
130,7.2,3.0,5.8,1.6,virginica
131,7.4,2.8,6.1,1.9,virginica
132,7.9,3.8,6.4,2.0,virginica
136,7.7,3.0,6.1,2.3,virginica

SELECT * from iris WHERE sepal_length > 7 AND petal_width > 2;
103,7.1,3.0,5.9,2.1,virginica
106,7.6,3.0,6.6,2.1,virginica
110,7.2,3.6,6.1,2.5,virginica
118,7.7,3.8,6.7,2.2,virginica
119,7.7,2.6,6.9,2.3,virginica
136,7.7,3.0,6.1,2.3,virginica

To make the output more informative, we can change the output mode:

.mode column
.headers on
SELECT * from iris WHERE sepal_length > 7 AND petal_width > 2;
id          sepal_length  sepal_width  petal_length  petal_width  species   
----------  ------------  -----------  ------------  -----------  ----------
103         7.1           3.0          5.9           2.1          virginica 
106         7.6           3.0          6.6           2.1          virginica 
110         7.2           3.6          6.1           2.5          virginica 
118         7.7           3.8          6.7           2.2          virginica 
119         7.7           2.6          6.9           2.3          virginica 
136         7.7           3.0          6.1           2.3          virginica

#show settings
.show
        echo: off
         eqp: off
  explain: off
     headers: on
        mode: column
   nullvalue: ""
      output: stdout
colseparator: ","
rowseparator: "\n"
       stats: off
       width: 

To execute a SQL statement from the shell:

sqlite3 iris.db "SELECT * from iris WHERE sepal_length > 7 AND petal_width > 2;"
103|7.1|3.0|5.9|2.1|virginica
106|7.6|3.0|6.6|2.1|virginica
110|7.2|3.6|6.1|2.5|virginica
118|7.7|3.8|6.7|2.2|virginica
119|7.7|2.6|6.9|2.3|virginica
136|7.7|3.0|6.1|2.3|virginica

SQL joins

The power of relational databases comes from joining tables. There are two types of joins, INNER and OUTER, and you can better understand them by using Venn diagrams to visualise the statements. I use inner joins most of the time.

To demonstrate the inner join, I'll make another table:

R -q -e 'write.table(round(runif(150),2), file="random.csv", col.names=F, sep=",", quote=F)'
head -5 random.csv 
1,0.83
2,0.59
3,0.78
4,0.72
5,0.5

CREATE TABLE random (id INTEGER PRIMARY KEY, random REAL);
.import random.csv random

.tables
iris    random

SELECT * from random limit 5;
id          random    
----------  ----------
1           0.83      
2           0.59      
3           0.78      
4           0.72      
5           0.5

Now to perform the inner join:

SELECT * from random as r JOIN iris as i on r.id=i.id limit 5;
id          random      id          sepal_length  sepal_width  petal_length  petal_width  species   
----------  ----------  ----------  ------------  -----------  ------------  -----------  ----------
1           0.83        1           5.1           3.5          1.4           0.2          setosa    
2           0.59        2           4.9           3.0          1.4           0.2          setosa    
3           0.78        3           4.7           3.2          1.3           0.2          setosa    
4           0.72        4           4.6           3.1          1.5           0.2          setosa    
5           0.5         5           5.0           3.6          1.4           0.2          setosa

SELECT r.id,r.random from random as r JOIN iris as i on r.id=i.id limit 5;
id          random    
----------  ----------
1           0.83      
2           0.59      
3           0.78      
4           0.72      
5           0.5

Summary

I hope that importing the Excel spreadsheets into SQLite will help provide some organisation of the data. If I have time I can normalise the database to reduce data redundancy and circumvent the "Spreadsheet Syndrome".

Additional resources

To get help within SQLite, simply type:

.help

I found this tutorial quite informative.

Using the Perl DBI module to interact with SQLite.




Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.
One comment Add yours

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.