SQLite is an embedded relational database engine that is self-contained, meaning that it has everything it needs to run. You don't need to configure anything because it's serverless and it has a transactional SQL database engine. To get started all you need to do is download the relevant binary file. My motivation for checking out SQLite is that I wanted to organise a bunch of Excel files a collaborator has sent me. I've played around with MySQL a while ago but I wanted something much lighter and then I came across SQLite.
Firstly, I'll download the relevant binary:
#I'm using a Mac wget http://www.sqlite.org/2015/sqlite-shell-osx-x86-3081002.zip unzip sqlite-shell-osx-x86-3081002.zip
To test out SQLite, I'll create a CSV file using R:
R -q -e 'write.table(iris, file="iris.csv", quote=F, sep=",", col.names=F)' head -5 iris.csv 1,5.1,3.5,1.4,0.2,setosa 2,4.9,3,1.4,0.2,setosa 3,4.7,3.2,1.3,0.2,setosa 4,4.6,3.1,1.5,0.2,setosa 5,5,3.6,1.4,0.2,setosa
I will use the row names as the primary key, which can be used to uniquely identify each record in the table. Now I'll import it into SQLite:
#A SQLite database is simply a ordinary disk file that is created as such sqlite3 iris.db SQLite version 3.8.10.2 2015-05-20 18:17:19 Enter ".help" for usage hints. #see https://www.sqlite.org/datatype3.html create table iris (id INTEGER PRIMARY KEY, sepal_length REAL, sepal_width REAL, petal_length REAL, petal_width REAL, species TEXT); .separator "," .import iris.csv iris
There are various differences to how SQLite interprets SQL. As you may have noticed already from above, some commands start with a period. I can no longer use show tables; to show all the tables. Instead to list all the tables in the database you type:
#show attached database .databases seq name file --- --------------- ---------------------------------------------------------- 0 main /Users/dtang/tmp/iris.db 1 temp .tables iris #no more describe table either .schema iris CREATE TABLE iris (id INTEGER PRIMARY KEY, sepal_length REAL, sepal_width REAL, petal_length REAL, petal_width REAL, species TEXT);
The SELECT statement seems to work as expected:
SELECT * from iris LIMIT 5; 1,5.1,3.5,1.4,0.2,setosa 2,4.9,3.0,1.4,0.2,setosa 3,4.7,3.2,1.3,0.2,setosa 4,4.6,3.1,1.5,0.2,setosa 5,5.0,3.6,1.4,0.2,setosa SELECT * from iris WHERE sepal_length > 7; 103,7.1,3.0,5.9,2.1,virginica 106,7.6,3.0,6.6,2.1,virginica 108,7.3,2.9,6.3,1.8,virginica 110,7.2,3.6,6.1,2.5,virginica 118,7.7,3.8,6.7,2.2,virginica 119,7.7,2.6,6.9,2.3,virginica 123,7.7,2.8,6.7,2.0,virginica 126,7.2,3.2,6.0,1.8,virginica 130,7.2,3.0,5.8,1.6,virginica 131,7.4,2.8,6.1,1.9,virginica 132,7.9,3.8,6.4,2.0,virginica 136,7.7,3.0,6.1,2.3,virginica SELECT * from iris WHERE sepal_length > 7 AND petal_width > 2; 103,7.1,3.0,5.9,2.1,virginica 106,7.6,3.0,6.6,2.1,virginica 110,7.2,3.6,6.1,2.5,virginica 118,7.7,3.8,6.7,2.2,virginica 119,7.7,2.6,6.9,2.3,virginica 136,7.7,3.0,6.1,2.3,virginica
To make the output more informative, we can change the output mode:
.mode column .headers on SELECT * from iris WHERE sepal_length > 7 AND petal_width > 2; id sepal_length sepal_width petal_length petal_width species ---------- ------------ ----------- ------------ ----------- ---------- 103 7.1 3.0 5.9 2.1 virginica 106 7.6 3.0 6.6 2.1 virginica 110 7.2 3.6 6.1 2.5 virginica 118 7.7 3.8 6.7 2.2 virginica 119 7.7 2.6 6.9 2.3 virginica 136 7.7 3.0 6.1 2.3 virginica #show settings .show echo: off eqp: off explain: off headers: on mode: column nullvalue: "" output: stdout colseparator: "," rowseparator: "\n" stats: off width:
To execute a SQL statement from the shell:
sqlite3 iris.db "SELECT * from iris WHERE sepal_length > 7 AND petal_width > 2;" 103|7.1|3.0|5.9|2.1|virginica 106|7.6|3.0|6.6|2.1|virginica 110|7.2|3.6|6.1|2.5|virginica 118|7.7|3.8|6.7|2.2|virginica 119|7.7|2.6|6.9|2.3|virginica 136|7.7|3.0|6.1|2.3|virginica
SQL joins
The power of relational databases comes from joining tables. There are two types of joins, INNER and OUTER, and you can better understand them by using Venn diagrams to visualise the statements. I use inner joins most of the time.
To demonstrate the inner join, I'll make another table:
R -q -e 'write.table(round(runif(150),2), file="random.csv", col.names=F, sep=",", quote=F)' head -5 random.csv 1,0.83 2,0.59 3,0.78 4,0.72 5,0.5 CREATE TABLE random (id INTEGER PRIMARY KEY, random REAL); .import random.csv random .tables iris random SELECT * from random limit 5; id random ---------- ---------- 1 0.83 2 0.59 3 0.78 4 0.72 5 0.5
Now to perform the inner join:
SELECT * from random as r JOIN iris as i on r.id=i.id limit 5; id random id sepal_length sepal_width petal_length petal_width species ---------- ---------- ---------- ------------ ----------- ------------ ----------- ---------- 1 0.83 1 5.1 3.5 1.4 0.2 setosa 2 0.59 2 4.9 3.0 1.4 0.2 setosa 3 0.78 3 4.7 3.2 1.3 0.2 setosa 4 0.72 4 4.6 3.1 1.5 0.2 setosa 5 0.5 5 5.0 3.6 1.4 0.2 setosa SELECT r.id,r.random from random as r JOIN iris as i on r.id=i.id limit 5; id random ---------- ---------- 1 0.83 2 0.59 3 0.78 4 0.72 5 0.5
Summary
I hope that importing the Excel spreadsheets into SQLite will help provide some organisation of the data. If I have time I can normalise the database to reduce data redundancy and circumvent the "Spreadsheet Syndrome".
Additional resources
To get help within SQLite, simply type:
.help
I found this tutorial quite informative.
Using the Perl DBI module to interact with SQLite.

This work is licensed under a Creative Commons
Attribution 4.0 International License.