ECE684-Document vectors Solved

Document classification/TF-IDF

Explore how term-document matrices and weightings can be used for document classification. You will be attempting to distinguish between documents from different categories in the Brown corpus.

Use the provided script as a starting point. Before beginning, read and understand what it’s doing. Then implement three sorts of document vectors:

1.   Raw counts of terms in each document.

2.   TF-IDF weighting, using the specific scheme described by Jurafsky and Martin (ch. 6).

3.   Another weighting of your own invention or discovery. This may be another TF-IDF variant or something else entirely.
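The three kinds of document vectors listed above can be sketched on a toy corpus as follows. This is a minimal illustration, not the provided script: the function names (`count_matrix`, `tfidf`, `normalized_smoothed_tfidf`) and the toy documents are hypothetical. The TF-IDF variant assumed here is tf = log10(count + 1) and idf = log10(N / df); editions of Jurafsky and Martin differ slightly on the tf term, so check it against the version of ch. 6 you are using. The third weighting (smoothed idf plus L2 row normalization) is only one possible choice for the "own invention" option.

```python
import numpy as np


def count_matrix(docs):
    """Return (vocab, counts), where counts[i, j] is how often term j occurs in document i."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc:
            counts[i, index[tok]] += 1
    return vocab, counts


def tfidf(counts):
    """TF-IDF with the assumed variant tf = log10(count + 1), idf = log10(N / df)."""
    n_docs = counts.shape[0]
    # Document frequency: number of documents containing each term.
    # Every vocabulary term occurs in at least one document, so df > 0.
    df = np.count_nonzero(counts, axis=0)
    tf = np.log10(counts + 1)
    idf = np.log10(n_docs / df)
    return tf * idf


def normalized_smoothed_tfidf(counts):
    """Hypothetical third weighting: smoothed idf, then L2-normalize each document vector."""
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)
    tf = np.log10(counts + 1)
    # Smoothing keeps terms that occur in every document from being zeroed out.
    idf = np.log10((n_docs + 1) / (df + 1)) + 1
    weighted = tf * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    # Guard against empty documents (all-zero rows).
    return weighted / np.where(norms == 0, 1, norms)


# Toy tokenized documents standing in for Brown corpus files.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat", "sat"],
    ["the", "election", "results"],
]
vocab, counts = count_matrix(docs)
raw_vectors = counts
tfidf_vectors = tfidf(counts)
custom_vectors = normalized_smoothed_tfidf(counts)
```

Each row of the resulting matrices is one document vector. Under the assumed idf, a term such as "the" that appears in every document gets weight zero, which is what makes TF-IDF down-weight uninformative words relative to raw counts.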
