The StackOverflow Assistant Bot is a dialogue chatbot designed to:
- Answer programming-related questions using a StackOverflow dataset.
- Engage in general chit-chat for non-programming questions.
The bot leverages a pre-trained neural network engine from ChatterBot for chit-chat functionality.
The project relies on two main datasets:
- tagged_posts.tsv: StackOverflow posts tagged with programming languages (positive samples).
- dialogues.tsv: Dialogue phrases from movie subtitles (negative samples).
The following models and artifacts are generated during the project:
- intent_recognizer.pkl: A model for recognizing user intent.
- tag_classifier.pkl: A model for classifying programming languages.
- tfidf_vectorizer.pkl: A vectorizer used during training for text preprocessing.
- thread_embeddings_by_tags/: A directory containing thread embeddings, organized by tags.
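Organizing thread embeddings by tag means a query only needs to be compared against threads that share its predicted tag. The sketch below illustrates that idea with plain Python lists. The in-memory layout, the function names `cosine_similarity` and `best_thread_for`, and the toy vectors are assumptions made for illustration, not the project's actual storage format.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_thread_for(query_vec, tag, embeddings_by_tag):
    """Return the id of the most similar thread within one tag, or None if the tag is unknown."""
    candidates = embeddings_by_tag.get(tag, {})
    if not candidates:
        return None
    # Only threads under the predicted tag are scored, not the whole corpus.
    return max(candidates, key=lambda tid: cosine_similarity(query_vec, candidates[tid]))

# Hypothetical toy data: tag -> {thread_id: embedding}
embeddings_by_tag = {
    "python": {101: [1.0, 0.0], 102: [0.6, 0.8]},
    "java": {201: [0.0, 1.0]},
}

print(best_thread_for([0.9, 0.1], "python", embeddings_by_tag))
```

Restricting the comparison to one tag keeps the number of similarity computations proportional to the size of that tag's bucket rather than to the full set of posts.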
- utils.py: Contains reusable functions for the notebook and scripts.
- setup_google_colab.py: Script for setting up the project in Google Colab.
The bot:
- Identifies user intent: distinguishes between programming-related questions and general dialogue.
- Predicts programming language: for programming-related queries, it predicts the relevant language to optimize search speed.
- Applies TF-IDF transformation for text preprocessing.
- Implements the tfidf_features() function to perform the transformation and save the vectorizer.
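To make the TF-IDF step concrete, here is a minimal standard-library sketch of what a function like tfidf_features() might do: fit on the training texts, save the fitted state, and return transformed features. It is a stand-in for illustration only. The signature, the helper names `fit_tfidf` and `transform_tfidf`, the whitespace tokenizer, and the smoothed IDF formula are all assumptions, and the project may well use a library vectorizer instead.

```python
import math
import pickle
from collections import Counter

def fit_tfidf(documents):
    """Learn a vocabulary and smoothed IDF weights from training documents."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vocabulary = {tok: i for i, tok in enumerate(sorted(doc_freq))}
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}
    return {"vocabulary": vocabulary, "idf": idf}

def transform_tfidf(state, documents):
    """Turn documents into dense TF-IDF rows; tokens unseen during fitting are ignored."""
    vocabulary, idf = state["vocabulary"], state["idf"]
    rows = []
    for doc in documents:
        counts = Counter(tok for tok in doc.lower().split() if tok in vocabulary)
        row = [0.0] * len(vocabulary)
        for tok, count in counts.items():
            row[vocabulary[tok]] = count * idf[tok]
        rows.append(row)
    return rows

def tfidf_features(train_texts, test_texts, vectorizer_path):
    """Fit on train_texts only, save the fitted state, and transform both splits."""
    state = fit_tfidf(train_texts)
    with open(vectorizer_path, "wb") as f:
        pickle.dump(state, f)  # assumed target: tfidf_vectorizer.pkl
    return transform_tfidf(state, train_texts), transform_tfidf(state, test_texts)
```

Fitting on the training texts only, then reusing the saved state at inference time, keeps the feature space identical between training and the running bot.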
The project utilizes two datasets:
- Dialogue Data: A sample of 200,000 entries from dialogues.tsv.
- StackOverflow Data: A sample of 200,000 entries from tagged_posts.tsv.
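Drawing a fixed-size sample from each file can be sketched as follows. This assumes each TSV file has a header row and fits in memory; the function name `sample_tsv` and the fixed seed are illustrative choices, not part of the project.

```python
import csv
import random

def sample_tsv(path, sample_size, seed=0):
    """Return up to sample_size rows drawn at random from a TSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE treats quote characters in free text as ordinary characters.
        rows = list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(rows, min(sample_size, len(rows)))

# Intended use (not executed here):
# dialogues = sample_tsv("dialogues.tsv", 200_000)
# posts = sample_tsv("tagged_posts.tsv", 200_000)
```

Sampling the same number of rows from each file keeps the two classes balanced for the intent recognizer.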
To set up the project locally:
# Clone the repository
git clone [repository_url]
# Install dependencies
pip install -r requirements.txt
# Download necessary data files
python download_data.py

Usage
Below is a simple example of how to use the bot:
from stackoverflow_bot import StackOverflowBot
# Initialize the bot
bot = StackOverflowBot()
# Query the bot
response = bot.get_response("How do I use a list comprehension in Python?")
print(response)
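The flow described earlier (recognize intent, then predict the language for programming questions, otherwise fall back to chit-chat) could be sketched like this. The function `route_question`, the intent labels, the return format, and the toy stand-ins for the models are all hypothetical; they only illustrate the order of the steps and are not the bot's actual internals.

```python
def route_question(question, recognize_intent, classify_tag, find_thread, chitchat):
    """Dispatch a question to the programming path or the chit-chat path."""
    if recognize_intent(question) == "programming":
        tag = classify_tag(question)            # predicted language, e.g. "python"
        thread_id = find_thread(question, tag)  # search restricted to that tag
        return {"kind": "programming", "tag": tag, "thread_id": thread_id}
    return {"kind": "chitchat", "reply": chitchat(question)}

# Toy stand-ins for the trained models (assumptions for illustration only).
def toy_intent(question):
    return "programming" if "python" in question.lower() else "dialogue"

def toy_tag(question):
    return "python"

def toy_search(question, tag):
    return 42

def toy_chitchat(question):
    return "Hello!"

print(route_question("How do I sort a list in Python?",
                     toy_intent, toy_tag, toy_search, toy_chitchat))
```

Passing the models in as callables keeps the routing logic testable without loading the pickled artifacts.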