Document Classification Using the Naïve Bayes Algorithm

The data set contains 19997 documents belonging to 20 different classes (newsgroups). We train the Naïve Bayes algorithm on 50% of the data set, i.e., 9998 documents (approximately 500 from each class), and use the remaining 50% as the test set. For each test document we predict its class (newsgroup), then compare the predictions with the true classes to calculate the accuracy of the Naïve Bayes algorithm.
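The split-train-predict-score procedure described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes a multinomial Naïve Bayes model with add-one (Laplace) smoothing and a plain whitespace tokenizer, and the function names (`split_half`, `train_nb`, `predict`, `accuracy`) and the tiny `toy` corpus are hypothetical stand-ins for the real newsgroup data.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def split_half(docs_by_class, seed=0):
    """Shuffle each class, then send half of its documents to train and half to test."""
    rng = random.Random(seed)
    train, test = [], []
    for label, docs in docs_by_class.items():
        docs = list(docs)
        rng.shuffle(docs)
        mid = len(docs) // 2
        train += [(d, label) for d in docs[:mid]]
        test += [(d, label) for d in docs[mid:]]
    return train, test


def train_nb(train):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in train:
        class_docs[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test):
    """Fraction of test documents whose predicted class matches the true class."""
    correct = sum(predict(model, text) == label for text, label in test)
    return correct / len(test)


# Hypothetical toy corpus standing in for the 20 newsgroups.
toy = {
    "rec.autos": ["car engine wheel", "engine car road",
                  "wheel road car", "car road engine"],
    "sci.space": ["orbit rocket moon", "moon rocket launch",
                  "launch orbit rocket", "rocket moon orbit"],
}
train, test = split_half(toy)
model = train_nb(train)
print("accuracy:", accuracy(model, test))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing keeps a single unseen word from driving a class probability to zero.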

The data comes from the website linked below, which provides approximately 1000 documents for each of the 20 newsgroup categories.
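As a rough sketch of how the documents might be read into memory, the snippet below assumes the downloaded archive has been extracted into one folder per newsgroup, with one file per document. That layout, the `load_documents` name, and the `20_newsgroups` path in the usage comment are assumptions for illustration and are not taken from the project.

```python
from pathlib import Path


def load_documents(root):
    """Return {class_name: [document_text, ...]} from a one-folder-per-class tree."""
    docs_by_class = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        # latin-1 maps every byte to a character, so decoding cannot fail
        # on messages that contain non-UTF-8 bytes.
        docs_by_class[class_dir.name] = [
            f.read_text(encoding="latin-1")
            for f in sorted(class_dir.iterdir())
            if f.is_file()
        ]
    return docs_by_class


# Example usage (the path is an assumption):
# docs_by_class = load_documents("20_newsgroups")
# print({name: len(docs) for name, docs in docs_by_class.items()})
```

The returned dictionary keeps the class name alongside its documents, which makes a per-class 50/50 split straightforward.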

Data: https://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

GitHub Project Link: https://github.com/agx01/doc_NB_classfier