SQLite Scaling

The following code snippet crudely measures how well SQLite performs on large tables, using a database with a single table of simple financial transaction records. The test is highly simplistic and shouldn't be taken as a general benchmark: for example, after the insertions the filesystem cache will hold most of the table, so the queries that follow may run faster than they would on a system with more contention for resources.

sqlite-test.py
#!/usr/bin/python
 
import os
import random
import sqlite3
import string
import time
 
NUM_RECORDS = 100000
 
# Start from a clean slate in case an interrupted previous run left
# the database file behind (CREATE TABLE would otherwise fail).
if os.path.exists("/tmp/testdb"):
    os.remove("/tmp/testdb")

conn = sqlite3.connect("/tmp/testdb")
cur = conn.cursor()
cur.execute("CREATE TABLE transactions ("
            " id INTEGER PRIMARY KEY,"
            " time INTEGER NOT NULL,"
            " recipient TEXT NOT NULL,"
            " amount INTEGER NOT NULL)")
 
print "Generating recipients..."
recipients = ["".join(random.choice(string.ascii_uppercase)
                      for i in xrange(random.randint(5, 10)))
              for j in xrange(100)]
 
print "Generating transactions..."
# One row per record: an increasing timestamp, a random recipient and
# an amount drawn from a triangular distribution over 1-10000.
data = [(int(time.time()) + i, random.choice(recipients),
         int(random.triangular(1, 10000))) for i in xrange(NUM_RECORDS)]
 
print "Inserting %d records in blocks of 100..." % (NUM_RECORDS,)
start = time.time()
for i, (tr_time, recipient, amount) in enumerate(data):
    cur.execute("INSERT INTO transactions (time, recipient, amount)"
                " VALUES (?, ?, ?)", (tr_time, recipient, amount))
    # Commit in batches rather than once per row, so each transaction
    # amortises its overhead over around 100 inserts.
    if i % 100 == 0:
        conn.commit()
conn.commit()
print "Took %.2f secs" % (time.time() - start,)
 
print "Generating limits..."
limits = [random.triangular(100, 1000) for i in xrange(NUM_RECORDS)]
 
print "Performing %d random queries..." % (NUM_RECORDS,)
start = time.time()
for limit in limits:
    # There is no index on amount, so each query is a full table scan.
    cur.execute("SELECT COUNT(*) FROM transactions WHERE amount > ?", (limit,))
    cur.fetchall()
print "Took %.2f secs" % (time.time() - start,)
 
conn.close()
os.remove("/tmp/testdb")

On a standard (not overpowered) Linux desktop, the following representative results were observed:

Number of operations   Time to insert (secs)   Time to select (secs)
                1000                    1.49                    0.15
               10000                   14.84                   13.99
              100000                  159.72                     ???

Insertion time appears to scale roughly linearly with the number of operations, but total query time grows at a steeper rate. This probably isn't surprising: with no index on amount, each of the N queries must scan all N rows, so the total select work grows roughly quadratically, and indeed a tenfold increase in operations raised the select time by close to a hundredfold.
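
As an aside on the insertion side, the per-row loop in the script could be collapsed into a single executemany call inside one transaction; a minimal sketch, assuming the same conn, cur and data as in the script above:

# Sketch only: bulk-insert all generated rows in one statement and a
# single transaction, avoiding the per-row Python call overhead.
cur.executemany("INSERT INTO transactions (time, recipient, amount)"
                " VALUES (?, ?, ?)", data)
conn.commit()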

FIXME Repeat results with an index.
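
Pending those measurements, a minimal sketch of the change; the index name idx_amount and the EXPLAIN QUERY PLAN check are illustrative additions, not part of the test above:

# Sketch only: index the amount column so the range predicate can be
# answered from the index instead of a full table scan.
cur.execute("CREATE INDEX idx_amount ON transactions (amount)")
conn.commit()

# The query planner should now report a search using the index.
cur.execute("EXPLAIN QUERY PLAN"
            " SELECT COUNT(*) FROM transactions WHERE amount > ?", (500,))
print cur.fetchall()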
