bayes

Author	SHA1	Message	Date
Peter J. Holzer	e6dab8395f	Add option --no-used-evidence	2019-09-14 12:09:36 +02:00
Peter J. Holzer	d96d1fc96e	Improve overlap avoidance (#1). When a feature is used, we use it to split the input string in which it was found and search only the resulting fragments for subsequent features, so overlaps are impossible.	2019-09-14 11:01:24 +02:00
Peter J. Holzer	e51294bca2	Add option --verbose	2019-09-01 15:19:23 +02:00
Peter J. Holzer	c49d6847f3	Write used evidence to database	2019-08-27 22:38:00 +02:00
Peter J. Holzer	631f97abe5	Avoid overlapping tokens. For each used token, record its first, second, and last thirds and exclude all tokens which include any of them.	2019-08-17 11:12:34 +02:00
Peter J. Holzer	f3817c4355	Implement basic idea. I start with tokens of length 1, and add longer tokens iff they extend a previously seen token by one character. Probability computation follows Paul Graham's "A Plan for Spam", except that I haven't implemented some of his tweaks (most importantly, I don't account for frequencies within a message like he does). While selecting tokens for judging a message, I ignore substrings of tokens that have been seen previously. This still leaves the majority of tokens overlapping, which is probably not good.	2019-08-17 09:29:11 +02:00
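
The commit messages above describe two ideas: a per-token probability in the style of "A Plan for Spam", and feature selection that splits the input at each used feature so that later features cannot overlap it. The following is a minimal Python sketch of both, not the repository's actual code. The function names, the clamping bounds, the neutral value 0.5 for unseen tokens, the limit of 15 features, and the rule that each token is used at most once per message are all assumptions made for illustration.

```python
from math import prod


def token_probability(bad, good, nbad, ngood):
    """Spam probability of one token from per-message counts.

    bad/good: number of spam/ham messages containing the token.
    nbad/ngood: total number of spam/ham messages (both > 0).
    Returning 0.5 for unseen tokens is an assumption of this sketch.
    """
    if bad + good == 0:
        return 0.5
    b = min(1.0, bad / nbad)
    g = min(1.0, good / ngood)
    # Clamp so that a single token can never decide the result alone.
    return max(0.01, min(0.99, b / (b + g)))


def combine(probs):
    """Combine per-token probabilities, treating tokens as independent."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def select_features(text, known, limit=15):
    """Pick non-overlapping features from text.

    known maps token -> probability. Tokens furthest from 0.5 are tried
    first. When a token is found in a fragment, that fragment is replaced
    by the pieces to the left and right of the match, so later tokens can
    only match text that has not been used yet.
    """
    ranked = sorted(
        (t for t in known if t),  # skip empty tokens
        key=lambda t: (-abs(known[t] - 0.5), -len(t), t),
    )
    fragments = [text]
    used = []
    for token in ranked:
        if len(used) >= limit:
            break
        for i, frag in enumerate(fragments):
            pos = frag.find(token)
            if pos >= 0:
                left, right = frag[:pos], frag[pos + len(token):]
                fragments[i:i + 1] = [p for p in (left, right) if p]
                used.append(token)
                break  # hypothetical rule: use each token once
    return used
```

With `known = {"viagra": 0.99, "via": 0.9, "now": 0.2, "a n": 0.6}` and the text `"viagra now"`, `select_features` picks `"viagra"` and `"now"`. It skips `"via"` and `"a n"`, because the text they would have matched was already consumed or cut off by the split.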