Clickbait Detection

Explore the effectiveness of classification algorithms for clickbait detection
Comparing Multinomial Naive Bayes and Logistic Regression models

About • Results • License

About

This repository contains the code and report for a project that applies classification algorithms to clickbait detection. The objective is to evaluate how well Multinomial Naive Bayes and Logistic Regression classifiers identify clickbait under two approaches: one that maximizes accuracy, and one that minimizes the False Positive Rate (FPR).
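As a rough illustration of how the two approaches differ, the sketch below picks a decision threshold on a classifier's predicted clickbait probability in two ways: once for the best accuracy, and once for the best accuracy subject to a cap on the FPR. It assumes that clickbait is the positive class and that the FPR-oriented approach works by threshold selection; the report may use a different mechanism. The function names and the toy scores are hypothetical and are not taken from the repository.

```python
def accuracy_and_fpr(labels, scores, threshold):
    """Return (accuracy, FPR) when scores >= threshold are flagged as clickbait."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1  # legitimate headline wrongly flagged as clickbait
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(labels)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


def pick_threshold(labels, scores, max_fpr=None):
    """Return (threshold, accuracy, fpr) with the best accuracy.

    If max_fpr is given, only thresholds with FPR <= max_fpr are considered.
    """
    best = None
    # Every distinct score is a candidate; infinity means "flag nothing".
    for threshold in sorted(set(scores)) + [float("inf")]:
        accuracy, fpr = accuracy_and_fpr(labels, scores, threshold)
        if max_fpr is not None and fpr > max_fpr:
            continue
        if best is None or accuracy > best[1]:
            best = (threshold, accuracy, fpr)
    return best


# Hypothetical validation data: 1 = clickbait, 0 = legitimate headline.
labels = [1, 1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.95, 0.80, 0.40, 0.10, 0.60, 0.30, 0.70, 0.20, 0.50]

accuracy_oriented = pick_threshold(labels, scores)
fpr_oriented = pick_threshold(labels, scores, max_fpr=0.0)
print("accuracy-oriented:", accuracy_oriented)
print("FPR-oriented:", fpr_oriented)
```

On this toy data the accuracy-oriented choice is a threshold of 0.40 (accuracy 8/9, FPR 0.25), while capping the FPR at 0 moves the threshold up to 0.70 (accuracy 7/9, FPR 0). The trade-off has the same shape as the one reported in the Results section: removing false positives costs some accuracy.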

The project also examines the composition of clickbait headlines. It identifies the words that most influence the classification models' predictions and the characteristics that make some headlines hard to classify. An analysis of the worst errors highlights the models' limitations.
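One common way to rank word impact for a Multinomial Naive Bayes model is the log ratio of each word's smoothed per-class probability. The sketch below is a minimal version of that idea, assuming whitespace tokenization and Laplace smoothing; it is not necessarily the method used in the report, and the function name and toy headlines are hypothetical.

```python
import math
from collections import Counter


def word_log_ratios(headlines, labels, alpha=1.0):
    """Return {word: log(P(word | clickbait) / P(word | not clickbait))}.

    Probabilities are Laplace-smoothed with pseudo-count alpha, as in
    Multinomial Naive Bayes. Positive values point towards clickbait.
    """
    counts = {0: Counter(), 1: Counter()}
    for text, label in zip(headlines, labels):
        counts[label].update(text.lower().split())

    vocab = set(counts[0]) | set(counts[1])
    totals = {c: sum(counts[c].values()) for c in (0, 1)}

    ratios = {}
    for word in vocab:
        p_clickbait = (counts[1][word] + alpha) / (totals[1] + alpha * len(vocab))
        p_regular = (counts[0][word] + alpha) / (totals[0] + alpha * len(vocab))
        ratios[word] = math.log(p_clickbait / p_regular)
    return ratios


# Hypothetical toy corpus: 1 = clickbait, 0 = legitimate headline.
headlines = [
    "you will not believe this trick",
    "this trick will change you",
    "senate passes budget bill",
    "budget talks stall in senate",
]
labels = [1, 1, 0, 0]

ratios = word_log_ratios(headlines, labels)
ranked = sorted(ratios, key=ratios.get, reverse=True)
print("most clickbait-like:", ranked[:4])
print("least clickbait-like:", ranked[-2:])
```

Words with large positive ratios push a headline towards the clickbait class and large negative ratios push it away. For Logistic Regression, the analogous quantity would be the learned coefficient of each word feature.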

Results

The project report includes detailed analysis and visualizations of the experiment results. Here are some key findings:

Accuracy-oriented scenario:
- Test accuracy: 97.12%

FPR-oriented scenario:
- False Positive Rate (FPR): 0%
- Test accuracy: 84%
- Other alternatives with higher accuracy are proposed in the results folder

For a comprehensive overview of the results, please refer to the full project report.

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute the code under the terms of the license.