# Pearce Wiki

notes:bayesian_classification

# Differences

This shows you the differences between two versions of the page.

Old: notes:bayesian_classification [2013/03/15 13:12] andy [Combining words]
New: notes:bayesian_classification [2013/03/15 14:11] andy [Combining words]

Line 91:

  Please forgive the slightly loose use of notation; there are a few too many dimensions over which to iterate for clarity.

- One slight simplification to note is that as $P(C_i)$ is presumably determined by dividing a number of trained messages by the total number of messages trained, this means that the total number of messages trained can be cancelled out between the numerator and denominator and the raw number of messages in each category used instead.

+ One slight simplification to note results from the fact that $P(C_i)$ is presumably determined by dividing a number of trained messages by the total number of messages trained. Let $N_{C_i}$ indicate the number of messages trained in category $C_i$, $N$ indicate the number of messages trained overall and $N_{C_i}(W_a)$ indicate the number of messages containing token $W_a$ that were trained in category $C_i$. Thus the equation above becomes:

+ \begin{equation*} P(C_i|W_a \cap W_b \cap ... \cap W_z) = \frac{\frac{1}{N}N_{C_i}\prod\limits_{j=a}^z{\frac{N_{C_i}(W_j)}{N_{C_i}}}}{\frac{1}{N}\sum\limits_{k=1}^n{N_{C_k}\prod\limits_{j=a}^z{\frac{N_{C_k}(W_j)}{N_{C_k}}}}} \end{equation*}

+ $$\Rightarrow P(C_i|W_a \cap W_b \cap ... \cap W_z) = \frac{\prod\limits_{j=a}^z{N_{C_i}(W_j)}}{N_{C_i}^{x-1}\sum\limits_{k=1}^n{\frac{1}{N_{C_k}^{x-1}}\prod\limits_{j=a}^z{N_{C_k}(W_j)}}}$$

+ Where $x$ is the number of words being combined (the number of factors in each product over $W_a$ to $W_z$). This version may help avoid underflow, but may instead be susceptible to overflow due to the exponentiation involved.

  ==== Two-category case ====
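The count-based form from the Combining words hunk above holds for any number of categories, so it can be checked numerically before specialising to two. The following is a minimal sketch, not the page author's implementation: the function names, the dictionary layout and the sample counts are all hypothetical. The second function is an addition that the page does not describe; it evaluates the same ratio in log space (the usual log-sum-exp rescaling), which is one possible way around the overflow concern raised above. Both assume every category has at least one trained message.

```python
import math


def posterior_from_counts(category_totals, token_counts, category):
    """P(category | all tokens), computed from raw training counts.

    category_totals: {category: N_C}, messages trained per category.
    token_counts:    {category: [N_C(W_j) for each token W_j]}.
    (Hypothetical layout, chosen only for this illustration.)
    """
    x = len(token_counts[category])  # number of tokens being combined

    def score(c):
        # prod_j N_C(W_j) / N_C^(x-1); the common 1/N factor has cancelled.
        return math.prod(token_counts[c]) / category_totals[c] ** (x - 1)

    return score(category) / sum(score(c) for c in category_totals)


def posterior_from_counts_log(category_totals, token_counts, category):
    """Same ratio, evaluated in log space (assumption: not from the page)."""
    x = len(token_counts[category])

    def log_score(c):
        if 0 in token_counts[c]:
            return -math.inf  # a zero count makes the whole product zero
        return (sum(math.log(n) for n in token_counts[c])
                - (x - 1) * math.log(category_totals[c]))

    log_scores = {c: log_score(c) for c in category_totals}
    largest = max(log_scores.values())
    if largest == -math.inf:
        raise ZeroDivisionError("every category has a zero token count")
    # Subtracting the largest log score keeps every exponent <= 0.
    weights = {c: math.exp(v - largest) for c, v in log_scores.items()}
    return weights[category] / sum(weights.values())


# Made-up training counts: 40 spam and 60 ham messages, two tokens.
totals = {"spam": 40, "ham": 60}
counts = {"spam": [20, 10], "ham": [3, 6]}

p_spam = posterior_from_counts(totals, counts, "spam")          # about 0.943
p_spam_log = posterior_from_counts_log(totals, counts, "spam")  # same value
```

With these counts the unsimplified form gives $0.4 \times 0.5 \times 0.25 = 0.05$ for spam and $0.6 \times 0.05 \times 0.1 = 0.003$ for ham, so $0.05 / 0.053$ agrees with the count-based result $5 / 5.3$.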