The following code snippet crudely measures SQLite's performance on large tables, using a database with a single table of simple financial transaction records. The test is highly simplistic and shouldn't be taken as a general benchmark; for example, after the insertions the filesystem cache will hold most of the table, so the queries that follow may run faster than they would on a system with more contention for resources.
<code python>
#!/usr/bin/env python3

import os
import random
import sqlite3
import string
import time

NUM_RECORDS = 100000

conn = sqlite3.connect("/tmp/testdb")
cur = conn.cursor()
cur.execute("CREATE TABLE transactions ("
            " id INTEGER PRIMARY KEY,"
            " time INTEGER NOT NULL,"
            " recipient TEXT NOT NULL,"
            " amount INTEGER NOT NULL)")

# Build a pool of 100 random recipient names, each 5-10 uppercase letters.
print("Generating recipients...")
recipients = ["".join(random.choice(string.ascii_uppercase)
                      for i in range(random.randint(5, 10)))
              for j in range(100)]

# Pre-generate the transaction rows so data creation isn't timed.
print("Generating transactions...")
data = [(int(time.time()) + i, random.choice(recipients),
         int(random.triangular(1, 10000)))
        for i in range(NUM_RECORDS)]

# Insert the rows, committing every 100 statements.
print("Inserting %d records in blocks of 100..." % (NUM_RECORDS,))
start = time.time()
for i, (tr_time, recipient, amount) in enumerate(data):
    cur.execute("INSERT INTO transactions (time, recipient, amount)"
                " VALUES (?, ?, ?)", (tr_time, recipient, amount))
    if i % 100 == 0:
        conn.commit()
conn.commit()
print("Took %.2f secs" % (time.time() - start,))

# Pre-generate the query thresholds, again so they aren't timed.
print("Generating limits...")
limits = [random.triangular(100, 1000) for i in range(NUM_RECORDS)]

# Run one aggregate query per threshold and time the whole batch.
print("Performing %d random queries..." % (NUM_RECORDS,))
start = time.time()
for limit in limits:
    cur.execute("SELECT COUNT(*) FROM transactions WHERE amount > ?",
                (limit,))
    cur.fetchall()
print("Took %.2f secs" % (time.time() - start,))

conn.close()
os.remove("/tmp/testdb")
</code>
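The commit-every-100-rows batching above keeps any single transaction small. An alternative worth noting (not part of the measured test) is to load everything in one transaction with ''executemany()'', which is typically faster because every commit forces SQLite to flush its journal to disk:

<code python>
# Alternative bulk-load sketch (not part of the measured test): a single
# transaction via executemany() avoids the per-batch commit overhead,
# since each commit forces SQLite to sync its journal to disk.
cur.executemany("INSERT INTO transactions (time, recipient, amount)"
                " VALUES (?, ?, ?)", data)
conn.commit()
</code>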
On a standard (not overpowered) Linux desktop, the following representative results were observed:
^ Number of operations ^ Time to insert (secs) ^ Time to select (secs) ^
Insertion time appears to scale roughly linearly with the number of operations, but query time grows at a steeper rate. This isn't surprising: with no index on ''amount'', every query requires a full table scan, so the total query cost grows with both the number of queries and the size of the table.
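One way to confirm the full scan is to ask SQLite for its query plan. A quick check along these lines could be appended to the script before the connection is closed; the threshold of 500 is arbitrary, and the exact wording of the output varies between SQLite versions:

<code python>
# Ask SQLite how it would execute the aggregate query. On an unindexed
# table this reports a full scan (e.g. "SCAN transactions" or
# "SCAN TABLE transactions", depending on the SQLite version).
cur.execute("EXPLAIN QUERY PLAN "
            "SELECT COUNT(*) FROM transactions WHERE amount > ?", (500,))
print(cur.fetchall())
</code>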
TODO: repeat these measurements with an index on the ''amount'' column, as sketched below.
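A minimal sketch of that follow-up experiment, appended to the script above before the timed query loop; the index name ''idx_amount'' is illustrative, not part of the original test:

<code python>
# Create an index on the queried column, then re-run the same timed
# query loop. The index name "idx_amount" is an arbitrary choice.
cur.execute("CREATE INDEX idx_amount ON transactions (amount)")
conn.commit()

# With the index in place, SQLite can satisfy "amount > ?" by scanning
# only the matching portion of the index instead of the whole table.
start = time.time()
for limit in limits:
    cur.execute("SELECT COUNT(*) FROM transactions WHERE amount > ?",
                (limit,))
    cur.fetchall()
print("Took %.2f secs" % (time.time() - start,))
</code>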