This shows you the differences between two versions of the page.

budget:start [2012/10/06 11:36]
andy created
budget:start [2013/03/27 16:25] (current)
andy [Classifying Transactions]
Line 3: Line 3:
 A monthly budgeting application, to plan a budget and stick to it.
 +===== Budget Analyser =====
 +The planning process can easily be done with a spreadsheet for now, but analysing the budget that way would be time-consuming. The most beneficial application to write first is therefore a simple budget tracker, which uses something like a naive Bayes classifier to assign a category to each transaction and then produces monthly totals for each category.
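The monthly totals step could look something like the sketch below. It assumes each transaction has already been given a category, and represents it as a plain ''(date, amount, category)'' tuple; that tuple layout and the ''monthly_totals'' name are illustrative choices, not part of the design above.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def monthly_totals(transactions):
    """Sum amounts per (year, month, category).

    `transactions` is an iterable of (date, Decimal amount, category)
    tuples; this layout is an assumption made for the example.
    """
    totals = defaultdict(Decimal)  # Decimal() is 0, so missing keys start at zero
    for when, amount, category in transactions:
        totals[(when.year, when.month, category)] += amount
    return dict(totals)


example = [
    (date(2013, 3, 27), Decimal("-38.15"), "utilities"),
    (date(2013, 3, 28), Decimal("-20.00"), "utilities"),
    (date(2013, 4, 2), Decimal("-12.50"), "groceries"),
]
totals = monthly_totals(example)
```

Amounts are held as ''Decimal'' rather than floats so that sums of pence do not pick up rounding error.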
 +==== Classifying Transactions ====
 +A transaction can be expected to contain at least the following pieces of information:
 +  * A date.
 +  * An amount of money, which may be positive (for credits) or negative (for debits).
 +  * An identifying string or recipient.
 +Transactions may also optionally include:
 +  * An updated balance.
 +All of these items can be used to classify the transaction, but first they must be transformed into appropriate tokens. This is done separately for each item as follows:
 +^ Date | The month, day of the month, day of the week and the nth occurrence of that weekday within the month are all converted into tokens. For example, **2013-03-27** might yield tokens **''mon-mar''**, **''mday-27''**, **''wday-wed''** and **''nthday-4''**. |
 +^ Amount | Amounts are bucketed on a logarithmic scale, using base 2 for simplicity. So that amounts just either side of a power-of-2 boundary still share a token, the next power up is also included. For example, the amount **£38.15** would yield tokens **''amnt-2^5''** and **''amnt-2^6''**. |
 +^ Description | The description is forced to lowercase and split into alphanumeric tokens, with any other character acting as a separator. For example, the string **''BRGAS-ELEC AC110298738''** would yield tokens **''brgas''**, **''elec''** and **''ac110298738''**. |
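The three tokenising rules in the table can be sketched in Python as follows. The function names are made up for this example. Two details are assumptions the notes do not settle: the sign of the amount is ignored (only the magnitude is bucketed), and a zero amount yields no amount tokens.

```python
import math
import re
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def date_tokens(d):
    """Month, day of month, weekday and nth occurrence of that weekday."""
    nth = (d.day - 1) // 7 + 1  # days 1-7 are the 1st occurrence, 8-14 the 2nd, ...
    return [
        f"mon-{MONTHS[d.month - 1]}",
        f"mday-{d.day}",
        f"wday-{WEEKDAYS[d.weekday()]}",  # weekday() returns 0 for Monday
        f"nthday-{nth}",
    ]


def amount_tokens(amount):
    """Base-2 log bucket of the magnitude, plus the next bucket up."""
    magnitude = abs(amount)  # assumption: credits and debits share buckets
    if magnitude == 0:
        return []  # assumption: log of zero is undefined, so emit nothing
    exponent = math.floor(math.log2(magnitude))
    return [f"amnt-2^{exponent}", f"amnt-2^{exponent + 1}"]


def description_tokens(text):
    """Lowercase runs of alphanumerics; anything else is a separator."""
    return re.findall(r"[a-z0-9]+", text.lower())


def transaction_tokens(d, amount, description):
    """Concatenate all tokens for one transaction into a single list."""
    return date_tokens(d) + amount_tokens(amount) + description_tokens(description)


tokens = transaction_tokens(date(2013, 3, 27), -38.15, "BRGAS-ELEC AC110298738")
```

For the worked examples in the table, ''tokens'' comes out as ''mon-mar'', ''mday-27'', ''wday-wed'', ''nthday-4'', ''amnt-2^5'', ''amnt-2^6'', ''brgas'', ''elec'', ''ac110298738''.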
 +These tokens are concatenated into a single list, and all of them are treated equally by the classifier. This lets the classifier learn for itself which tokens are reliable indicators of a particular category.
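A minimal sketch of a classifier that treats every token equally might look like the following. It is a plain naive Bayes with add-one (Laplace) smoothing; the smoothing and the class and method names are assumptions for illustration, since the notes only say "something like" a naive Bayes classifier.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Token-based naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.category_counts = Counter()          # transactions seen per category
        self.token_counts = defaultdict(Counter)  # token frequencies per category
        self.vocabulary = set()                   # every token seen in training

    def train(self, tokens, category):
        self.category_counts[category] += 1
        self.token_counts[category].update(tokens)
        self.vocabulary.update(tokens)

    def classify(self, tokens):
        total = sum(self.category_counts.values())
        if total == 0:
            return None  # nothing learned yet
        best, best_score = None, -math.inf
        for category, seen in self.category_counts.items():
            counts = self.token_counts[category]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Work in log space so that many small probabilities do not underflow.
            score = math.log(seen / total)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best, best_score = category, score
        return best


classifier = NaiveBayes()
classifier.train(["brgas", "elec", "amnt-2^5", "amnt-2^6"], "utilities")
classifier.train(["tesco", "amnt-2^4", "amnt-2^5"], "groceries")
guess = classifier.classify(["brgas", "amnt-2^5"])
```

Here ''amnt-2^5'' appears in both categories and so carries little weight, while ''brgas'' has only been seen under ''utilities'', which is what tips the result.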
 +===== Attic =====
 +The previous contents of this page, for posterity.
 +FIXME: //Delete this// \\
 Also contains design notes for the [[pysf]] library.
 +**Brief notes:**
 +  * DB layer is basically a factory for classes.
 +  * Calculations based on results of DB layer.
 +  * Use SQLite as initial backend.
 +  * Recurrence: day, week
budget/start.1349519816.txt.gz · Last modified: 2012/10/12 08:32 (external edit)