TSuryavanshi/Hate-Speech-Detection-Model

Final Hate Speech Classification Project

This project builds a machine learning pipeline for detecting hate, offensive, and neutral speech in short-form text. It includes dataset processing, cleaning, balancing, TF-IDF vectorization, model training, and an example inference step.


πŸ“ Project Structure

.
β”œβ”€β”€ main.py
β”œβ”€β”€ hate_speech.csv
β”œβ”€β”€ cleaned_hate_dataset.csv        # (generated)
└── hate_speech_model.pkl           # (generated)

πŸ” Features

  • Cleans and normalizes tweet text
  • Removes stopwords, URLs, mentions, emojis, and special characters
  • Handles class imbalance via oversampling
  • Trains a neural network classifier (MLP)
  • Uses TF-IDF features with n-grams
  • Saves the trained model + vectorizer as artifacts
  • Includes an inference demo for quick testing

🧠 Model Architecture

Algorithm: MLPClassifier (sklearn)

Key Params

  • hidden_layer_sizes=(100,)
  • activation="relu"
  • solver="adam"
  • max_iter=200
  • early_stopping=True
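For reference, here is how those parameters map onto scikit-learn's constructor. The helper name `build_mlp` and the fixed `random_state=42` are illustrative assumptions, not taken from main.py.

```python
from sklearn.neural_network import MLPClassifier


def build_mlp(random_state=42):
    """Hypothetical helper: an MLPClassifier with the key parameters listed above."""
    return MLPClassifier(
        hidden_layer_sizes=(100,),  # one hidden layer with 100 units
        activation="relu",          # ReLU non-linearity in the hidden layer
        solver="adam",              # stochastic gradient-based optimizer
        max_iter=200,               # upper bound on training epochs
        early_stopping=True,        # hold out a validation split and stop when it stalls
        random_state=random_state,  # assumption: fixed seed for reproducible runs
    )
```

Apart from `early_stopping=True`, these values match scikit-learn's `MLPClassifier` defaults. With early stopping enabled, 10% of the training data (`validation_fraction` defaults to 0.1) is held out, and training halts once the validation score fails to improve by at least `tol` for `n_iter_no_change` (default 10) consecutive epochs, so `max_iter=200` acts only as an upper bound.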

🗂 Dataset Details

Classes:

  • 0 → Hate
  • 1 → Offensive
  • 2 → Neutral

The script balances the dataset by oversampling under-represented labels before training.
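The README doesn't say which oversampler main.py uses. Since imbalanced-learn is listed under Requirements, it may be something like imblearn's `RandomOverSampler`; the sketch below instead reimplements plain random oversampling with `sklearn.utils.resample`, topping up every class with duplicated rows until it matches the largest one. The function name `oversample` and the seed are hypothetical.

```python
from collections import Counter

import numpy as np
from sklearn.utils import resample


def oversample(texts, labels, random_state=42):
    """Random oversampling: duplicate minority-class rows until all classes match the largest."""
    texts = np.asarray(texts, dtype=object)
    labels = np.asarray(labels)
    counts = Counter(labels.tolist())
    target = max(counts.values())  # size of the majority class

    parts_x, parts_y = [texts], [labels]  # keep every original row
    for label, count in counts.items():
        n_extra = target - count
        if n_extra == 0:
            continue  # majority class needs no extra rows
        mask = labels == label
        # Draw n_extra rows with replacement from this class only.
        extra_x, extra_y = resample(
            texts[mask], labels[mask],
            replace=True, n_samples=n_extra, random_state=random_state,
        )
        parts_x.append(extra_x)
        parts_y.append(extra_y)
    return np.concatenate(parts_x), np.concatenate(parts_y)
```

If you adopt this, apply it to the training split only (after `train_test_split`), so duplicated minority rows can't also appear in the evaluation data and inflate the metrics.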


🧹 Text Cleaning Pipeline

The clean_text() function applies:

  • Lowercasing
  • URL + username removal
  • Hashtag stripping
  • Punctuation removal
  • Stopword filtering
  • Token reconstruction
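The `clean_text()` source isn't reproduced here, so this is a minimal sketch of the six listed steps under a few assumptions: removal is regex-based, hashtag stripping drops the `#` but keeps the word, and a small hypothetical `STOPWORDS` set stands in for NLTK's English stopword list so the example runs without downloading corpora.

```python
import re

# Hypothetical stand-in for nltk.corpus.stopwords.words("english"),
# kept small so this sketch runs without NLTK data.
STOPWORDS = {
    "a", "an", "the", "and", "or", "is", "are", "was", "this", "that",
    "to", "of", "in", "on", "at", "for", "out", "i", "you", "my", "so",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    text = text.lower()                 # 1. lowercase
    text = URL_RE.sub(" ", text)        # 2a. drop URLs
    text = MENTION_RE.sub(" ", text)    # 2b. drop @usernames
    text = text.replace("#", " ")       # 3. strip '#' but keep the hashtag word
    text = NON_ALPHA_RE.sub(" ", text)  # 4. drop punctuation, digits, emojis
    tokens = [t for t in text.split() if t not in STOPWORDS]  # 5. stopword filter
    return " ".join(tokens)             # 6. rebuild a single string
```

With this stand-in list, `clean_text("@user Check out this link! http://site.com #Wow")` returns `"check link wow"`. The last regex keeps only `a-z` and whitespace, which is what removes emojis and digits; it would also strip accented letters, a trade-off worth checking for non-English text.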

Example input to the full pipeline (cleaning, TF-IDF vectorization, and classification):

"@user Check out this link! http://site.com #Wow"

Predicted class:

"Neutral"

🚀 Running the Project

1. Install dependencies

pip install -r requirements.txt

(If the NLTK stopwords corpus isn't installed, the script downloads it automatically.)

2. Execute training

python main.py

This will:

  • Load dataset
  • Clean + balance
  • Train model
  • Save artifacts
  • Display metrics
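Put together, the training run might look roughly like the following. This is a sketch, not main.py: `train_and_save`, the 80/20 stratified split, `ngram_range=(1, 2)`, and `random_state=42` are assumptions, and CSV loading plus balancing are left out so it runs on in-memory lists of texts and integer labels.

```python
import pickle
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def train_and_save(texts, labels, out_path="hate_speech_model.pkl"):
    """Hypothetical training routine: split, vectorize, fit, report, save."""
    # Hold out a stratified test split before any balancing.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=42
    )
    # (Oversampling of X_train / y_train would go here.)

    # Assumed n-gram range: unigrams + bigrams.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    X_train_vec = vectorizer.fit_transform(X_train)  # learn vocabulary from training text only
    X_test_vec = vectorizer.transform(X_test)

    model = MLPClassifier(
        hidden_layer_sizes=(100,), activation="relu", solver="adam",
        max_iter=200, early_stopping=True, random_state=42,
    )
    model.fit(X_train_vec, y_train)  # MLPClassifier accepts sparse TF-IDF input

    # Per-class precision, recall, and F1 on the held-out split.
    print(classification_report(y_test, model.predict(X_test_vec), zero_division=0))

    # Same artifact layout as described under Output Artifacts.
    Path(out_path).write_bytes(
        pickle.dumps({"vectorizer": vectorizer, "model": model})
    )
    return vectorizer, model
```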

📦 Output Artifacts

After training completes:

  • cleaned_hate_dataset.csv
  • hate_speech_model.pkl

The .pkl file stores:

{
  "vectorizer": TfidfVectorizer,
  "model": MLPClassifier
}

🧪 Example Prediction

The script ends by running inference on sample inputs such as:

"I hate you so much!"
"Check out my new blog post"
"Had a great time at the concert!"

Outputs include predicted class labels.
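Reloading the artifact and labelling new text could be sketched as follows, assuming the dictionary layout shown under Output Artifacts and the class ids from Dataset Details. `load_artifact`, `predict_labels`, and the `clean` parameter are hypothetical names; pass the same cleaning function used at training time so inputs match the vectorizer's vocabulary.

```python
import pickle

# Class ids as listed under Dataset Details.
LABELS = {0: "Hate", 1: "Offensive", 2: "Neutral"}


def load_artifact(path="hate_speech_model.pkl"):
    """Unpack the {"vectorizer": ..., "model": ...} dict written after training."""
    with open(path, "rb") as fh:
        artifact = pickle.load(fh)
    return artifact["vectorizer"], artifact["model"]


def predict_labels(texts, vectorizer, model, clean=lambda s: s):
    """Clean, vectorize, and classify raw strings, returning class names."""
    features = vectorizer.transform([clean(t) for t in texts])
    return [LABELS[int(c)] for c in model.predict(features)]
```

Typical use: `vectorizer, model = load_artifact()`, then `predict_labels(["I hate you so much!"], vectorizer, model, clean=clean_text)`. Because `pickle.load` can execute arbitrary code, only load .pkl files you produced yourself.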


🧰 Requirements

Recommended environment:

  • Python 3.8+
  • scikit-learn
  • imbalanced-learn
  • pandas
  • nltk

📬 Notes

  • Designed for academic + research use
  • Not production-hardened
  • Requires the dataset file hate_speech.csv to exist locally
