User Tools

Site Tools



This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
budget:start [2012/10/07 15:03]
budget:start [2013/03/27 16:25] (current)
andy [Classifying Transactions]
Line 3: Line 3:
 A monthly budgeting application,​ to plan a budget and stick to it. A monthly budgeting application,​ to plan a budget and stick to it.
 +===== Budget Analyser =====
 +The planning process can easily be done with a spreadsheet for now, but budget analysis is going to be time-consuming that way. So, the most beneficial application to write would be a simple budget tracker, which uses something like a naive bayesian classifier to allocate a category to each transaction and then produce monthly totals in different categories.
 +==== Classifying Transactions ====
 +A transaction can be expected to consist of the following minimum bits of information:​
 +  * A date.
 +  * An amount of money, which may be positive (for credits) or negative (for debits).
 +  * An identifying string or recipient.
 +Transactions may also optionally include:
 +  * An updated balance.
 +All of these items can be used in the classification of the transaction,​ but first they must be transformed into appropriate tokens. This is done separately for each item as follows:
 +^ Date | The month, day of the month, day of the week and the nth occurrence of that day within the month are all converted into tokens. For example, **2013-03-27** might yield tokens **''​mon-mar''​**,​ **''​mday-27''​**,​ **''​wday-wed''​** and **''​nthday-4''​**. |
 +^ Amount | A logarithmic scale is used to classify transactions,​ using base 2 for simplicity. To prevent weaker indicators around base 2 boundaries, the next log up is also included. For example, the amount **£38.15** would yield tokens **''​amnt-2^5''​** and **''​amnt-2^6''​**. |
 +^ Description | The description is split into tokens of alphanumerics using any other character as a separator and forced to lowercase. For example, the string **''​BRGAS-ELEC AC110298738''​** would yield tokens **''​brgas''​**,​ **''​elec''​** and **''​ac110298738''​**. |
 +These tokens are concatenated into a single list and these are all treated equally for the purposes of the classifier. This allows the classifier to learn which are the reliable indicators of any particular categorisation.
 +===== Attic =====
 +The previous contents of this page, for posterity.
 +FIXME: //Delete this// \\
 Also contains design notes for the [[pysf]] library. Also contains design notes for the [[pysf]] library.
budget/start.1349622226.txt.gz · Last modified: 2012/10/12 07:32 (external edit)