User Tools

Site Tools


notes:sqlite_scaling

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
Next revision Both sides next revision
notes:sqlite_scaling [2013/03/12 18:21]
andy created
notes:sqlite_scaling [2013/03/13 11:53]
andy
Line 1: Line 1:
-====== SQLite Scaling ======+or ====== SQLite Scaling ====== 
 + 
 +The following code snippet is intended to crudely measure the ability of SQLite to perform on large tables assuming a database with a single table of simple financial transaction records. This test is highly simplistic and shouldn'​t be taken as a general benchmark - for example, the filesystem cache should be full of the database table after the insertions so the queries that follow may be faster than on a system with more contention for system resources. Also, the inclusion of the call to ''​fetchall()''​ might be a little contentious as it incurs some Python overhead as well as the SQLite load. However, I wasn't sure whether SQLite would actually perform the entire query without it, and I thought it was fairest to fetch all results at once rather than incurring additional pure Python overhead iterating over results.
  
-The following code snippet is intended to crudely measure the ability of SQLite to perform on large tables assuming a database with a single table of simple financial transaction records. This test is highly simplistic and shouldn'​t be taken as a general benchmark - for example, the filesystem cache should be full of the database table after the insertions so the queries that follow may be faster than on a system with more contention for system resources. 
  
 <code python sqlite-test.py>​ <code python sqlite-test.py>​
Line 21: Line 22:
             " recipient TEXT NOT NULL,"             " recipient TEXT NOT NULL,"
             " amount INTEGER NOT NULL)"​)             " amount INTEGER NOT NULL)"​)
 +            ​
 +# Uncomment to add index to "​amount"​ column
 +# cur.execute("​CREATE INDEX transactions_amount_idx ON transactions (amount)"​)
  
 print "​Generating recipients..."​ print "​Generating recipients..."​
Line 60: Line 64:
 ^ 1000 | 1.49 | 0.15 | ^ 1000 | 1.49 | 0.15 |
 ^ 10000 | 14.84 | 13.99 | ^ 10000 | 14.84 | 13.99 |
-^ 100000 | 159.72 | ??|+^ 100000 | 159.72 | 1539.56 | 
 +^ 1000000 | :?:   | :?:     |
  
 It appears that the insertion time is scaling roughly linearly with the number of operations, but the query time is increasing at a steeper rate. This probably isn't surprising since table scans are required. It appears that the insertion time is scaling roughly linearly with the number of operations, but the query time is increasing at a steeper rate. This probably isn't surprising since table scans are required.
  
-FIXME Repeat results with an index.+Uncommenting the line to create ​an index in the file above has the following effect (the percentage comparisons are relative to no index): 
 + 
 +^ Number of operations ^ Time to insert (secs) ^^ Time to select (secs) ^^ 
 +^ 1000 | 1.82 | +22% | 0.14 | -7%) | 
 +^ 10000 | 21.68 | +46% | 11.66 | -17% | 
 +^ 100000 | 373.52 | +134% | 1150.85 | -25% | 
 +^ 1000000 | :?:   | :?:   | :?:     | :?:  | 
 + 
 +As expected the insertion time increases somewhat and the query time reduces somewhat, but the effects aren't particularly pronounced until the table becomes full, and even then the acceleration of queries is somewhat limited compared to the increased overhead of inserts. In many cases, however, queries are likely to be significantly more frequent than inserts, so the trade-off may still be reasonable. 
notes/sqlite_scaling.txt · Last modified: 2013/03/13 16:37 by andy