Better feature extraction algorithm #1

Closed
opened 2019-09-01 15:22:41 +02:00 by hjp · 2 comments
Owner

Avoid overlapping evidence by:

put the evidence in a list.
find all substrings
find the spam probability of all substrings
order by prob, length
for each substring:
    if in evidence:
        keep.
        use it to split each member of the evidence list into new members
    else:
        skip
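
The steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions, not the code from the fix: `select_features`, the `spam_prob` dict (substring to spam probability), and the `min_len`/`max_len` bounds are hypothetical names, and "order by prob" is read here as "most extreme probability first, then longest first".

```python
def select_features(text, spam_prob, max_features=20, min_len=1, max_len=8):
    """Greedily pick non-overlapping substrings of `text` as evidence.

    `spam_prob` is a hypothetical mapping substring -> spam probability.
    """
    # Put the evidence in a list: initially the whole text is one member.
    segments = [text]

    # Find all substrings that have a known spam probability.
    candidates = set()
    for i in range(len(text)):
        for j in range(i + min_len, min(len(text), i + max_len) + 1):
            s = text[i:j]
            if s in spam_prob:
                candidates.add(s)

    # Order by probability (assumption: farthest from 0.5 first), then by
    # length (longest first); the substring itself breaks remaining ties.
    ordered = sorted(
        candidates,
        key=lambda s: (-abs(spam_prob[s] - 0.5), -len(s), s),
    )

    selected = []
    for s in ordered:
        if not any(s in seg for seg in segments):
            continue  # would overlap evidence already used: skip
        selected.append(s)  # keep
        # Split each member of the evidence list on s, so that later
        # substrings can only match text that is still unused.
        segments = [piece for seg in segments for piece in seg.split(s) if piece]
        if len(selected) >= max_features:
            break
    return selected
```

With `"buy cheap pills"` and probabilities for `cheap`, `heap`, `ap p`, `pills` and `buy`, the substrings `heap` and `ap p` are skipped once `cheap` has been kept, because they no longer occur in any remaining member of the list.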
Author
Owner

Downside: We may need to fetch a lot more evidence from the database.

Maybe do it in batches? First get the first 100, then the next 100, etc., until we have enough evidence. Since we only use the first 20, I think there is a good chance that the first 100 will be enough.

Or maybe we are already fetching everything anyway? Need to check.
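
A minimal sketch of the batching idea, assuming a hypothetical `fetch_batch(offset, limit)` callable that returns candidate rows from the database in the desired order, and a hypothetical `accept(item)` predicate that decides whether a candidate is kept as evidence. Neither name comes from the actual code base.

```python
def collect_evidence(fetch_batch, accept, needed=20, batch_size=100):
    """Fetch candidates in batches until `needed` items have been accepted.

    `fetch_batch` and `accept` are hypothetical callables supplied by the
    caller; an empty batch means the database has no more candidates.
    """
    selected = []
    offset = 0
    while len(selected) < needed:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            break  # ran out of candidates before reaching `needed`
        for item in batch:
            if accept(item):
                selected.append(item)
                if len(selected) == needed:
                    break
        offset += len(batch)
    return selected
```

If most candidates are accepted, only the first batch is ever fetched; further queries happen only when the first 100 rows do not yield 20 usable pieces of evidence.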

Author
Owner

Fixed in d96d1fc

hjp closed this issue 2019-09-14 11:07:16 +02:00
Reference: hjp/bayes#1