trying to improve my knn algorithm


hunter.hammond.dev at gmail.com wrote:

> This is a kNN algorithm for articles that I have gotten. It then determines
> which category each article belongs to. I am not getting very good results :/

[snip too much code;)]

- Shouldn't the word frequency vectors be normalized? I don't see that in
  your code. Without normalization, a long article has larger raw counts
  than a short one, so the length of the text may overshadow its contents.

- There are probably words that are completely irrelevant, for example
  very common words like "the", "of" or "and". Getting rid of these should
  improve the signal-to-noise ratio.
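
Both suggestions can be sketched in a few lines. This is only an
illustration, not a patch for the snipped code: the names word_vector,
cosine_similarity and STOP_WORDS are made up here, the stop-word list is
deliberately tiny, and L2 normalization with cosine similarity is one
reasonable choice among several.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}


def word_vector(text, stop_words=STOP_WORDS):
    """Return an L2-normalized word-frequency vector as a dict."""
    words = re.findall(r"[a-z']+", text.lower())
    # Drop irrelevant words before counting.
    counts = Counter(w for w in words if w not in stop_words)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}  # empty text, or nothing but stop words
    # Dividing by the norm removes the influence of text length.
    return {w: c / norm for w, c in counts.items()}


def cosine_similarity(u, v):
    """Dot product of two normalized vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(x * v.get(w, 0.0) for w, x in u.items())
```

With this, a text and the same text repeated ten times map to the same
vector, so the neighbour search compares word distributions rather than
article lengths.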