My Projects
Sarcasm Detection Web App
Tools: Python, NLP, Naïve Bayes, Flask, Docker, Google Cloud Run, HTML/CSS, Pickle
• Built a sarcasm classification pipeline using NLP preprocessing (tokenization, TF-IDF vectorization) and a Naïve Bayes model (AUC: 0.88, F1: 78%).
• Serialized the model with pickle and built a Flask-based API with a custom HTML interface.
• Containerized using Docker and deployed to Google Cloud Run for scalable, real-time inference.
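As a rough illustration of the classify-then-serialize flow described above, the following standard-library sketch trains a small multinomial Naïve Bayes model on word counts and round-trips it through pickle. It is a stand-in under stated assumptions: the project used TF-IDF features, and the function names and toy sentences here are hypothetical.

```python
import math
import pickle
from collections import Counter


def train_naive_bayes(texts, labels):
    """Fit a multinomial Naive Bayes model; returns plain, picklable data."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    total_docs = sum(model["class_counts"].values())
    best_label, best_score = None, -math.inf
    for label, doc_count in model["class_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + len(model["vocab"])
        score = math.log(doc_count / total_docs)  # log prior
        for token in text.lower().split():
            score += math.log((counts[token] + 1) / denom)  # smoothed likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, for illustration only.
texts = [
    "oh great another monday",
    "yeah right that will work",
    "i love sunny days",
    "the meeting starts at noon",
]
labels = ["sarcastic", "sarcastic", "literal", "literal"]
model = train_naive_bayes(texts, labels)

# Round-trip through pickle, as a serving process might do at start-up.
restored = pickle.loads(pickle.dumps(model))
print(predict(restored, "oh great that will work"))
```

Keeping the model as plain dictionaries and counters means the pickle file does not depend on a custom class being importable in the serving container.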
Sentiment Analysis on YouTube Comments
Tools: Python, Google Cloud Natural Language API, Pandas, YouTube API
• Extracted and processed multilingual YouTube comments using the YouTube Data API.
• Scored sentiment with the Google Cloud Natural Language API, achieving 76% accuracy.
• Visualized sentiment trends and evaluated API performance, identifying weaknesses in neutral sentiment classification.
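The evaluation step above can be sketched as follows. The Natural Language API returns a sentiment score between -1.0 and 1.0; the width of the neutral band (0.25), the helper names, and the sample scores are assumptions made for illustration, not the project's actual settings or data.

```python
from collections import Counter


def score_to_label(score, neutral_band=0.25):
    """Map a sentiment score in [-1, 1] to a class; the band is an assumed cutoff."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical API scores and hand-assigned labels.
scores = [0.8, 0.1, -0.6, 0.4, -0.3, 0.0]
y_true = ["positive", "neutral", "negative", "neutral", "neutral", "neutral"]
y_pred = [score_to_label(s) for s in scores]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f}", recall)
```

Breaking recall out per class is one way a weakness on neutral comments would show up even when overall accuracy looks acceptable.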
Fashion Image Classification using Transfer Learning
Tools: PyTorch, ResNet18, FashionMNIST, Matplotlib, Scikit-learn
• Fine-tuned a pre-trained ResNet18 on FashionMNIST with 10 fashion categories.
• Applied L2 regularization (weight decay), learning rate scheduling, and dropout to reduce overfitting.
• Achieved high classification accuracy and built a confusion matrix to analyze misclassifications.
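To make the regularization terms above concrete, here is a dependency-free sketch of a step-decay learning rate schedule and an SGD update with weight decay. It mirrors the arithmetic behind PyTorch's `StepLR` and the `weight_decay` option of `torch.optim.SGD`, but the choice of step decay and every hyperparameter value are assumptions; the project's actual scheduler and settings are not stated.

```python
def step_lr(base_lr, epoch, step_size=5, gamma=0.1):
    """Learning rate after multiplying by gamma once every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def sgd_step(weights, grads, lr, weight_decay=1e-4):
    """One SGD update where L2 regularization adds weight_decay * w to each gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# Hypothetical values: weight decay shrinks weights even when the gradient is zero.
weights = [1.0, -2.0]
grads = [0.0, 0.0]
for epoch in (0, 5, 10):
    lr = step_lr(0.01, epoch)
    print(epoch, lr, sgd_step(weights, grads, lr, weight_decay=0.5))
```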
Customer Churn Prediction System
Tools: Python, Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn
• Evaluated multiple ML models (k-NN, Decision Tree, Random Forest, SVM, Gradient Boosting, Neural Networks).
• Tuned hyperparameters with GridSearchCV, applied SMOTE to address class imbalance, and performed feature importance analysis.
• Recommended top-performing model based on precision-recall tradeoffs and ROC curves.
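The idea behind SMOTE, mentioned above, can be sketched without any library: each synthetic minority sample is placed at a random point on the segment between a minority point and one of its nearest minority neighbours. The function name, the value of k, and the sample points are hypothetical, and this is a simplified illustration rather than the library implementation the project used.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class feature vectors (e.g. scaled tenure and monthly charge).
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Oversampling of this kind is normally applied to the training folds only, so that synthetic points never leak into the data used for evaluation.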
Demand Forecasting for Lyft Bike Rentals
Tools: R, ggplot2, dplyr, hypothesis testing, regression modeling
• Built regression models to identify demand drivers (temperature, weather, time-of-day).
• Applied ANOVA and t-tests to assess holiday and seasonal variations.
• Delivered actionable insights, e.g., targeting marketing at clear-weather conditions of 20–34°C.
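A two-sample comparison like the holiday test above can be sketched in Python as follows (the project itself used R). This computes Welch's t statistic and its approximate degrees of freedom; the choice of the Welch variant and the rental counts are assumptions for illustration only.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group variance of the mean (sample variance divided by group size)
    vm_a = statistics.variance(sample_a) / n_a
    vm_b = statistics.variance(sample_b) / n_b
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(vm_a + vm_b)
    dof = (vm_a + vm_b) ** 2 / (vm_a ** 2 / (n_a - 1) + vm_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical daily rental counts.
holiday = [310, 295, 340, 280, 305]
non_holiday = [420, 390, 455, 410, 430, 445]
t_stat, dof = welch_t(holiday, non_holiday)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The statistic would then be compared against a t distribution with the computed degrees of freedom to obtain a p-value.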
Music Popularity Prediction
Tools: R, Logistic Regression, Lasso (glmnet), ANOVA, Chi-square, caret
• Analyzed 114,000 songs to study the interaction between genre, explicitness, and popularity.
• Used ANOVA and Chi-square tests to validate categorical variable impact.
• Developed Lasso-regularized logistic regression model for feature selection and binary classification.
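Why Lasso performs feature selection can be shown with the soft-thresholding operator, which coordinate-descent solvers such as glmnet apply to each coefficient update. The sketch below is in Python rather than R, and the penalty value and coefficient names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficient updates for three predictors.
raw_updates = {"genre_pop": 1.5, "explicit": -2.0, "duration": 0.3}
penalty = 0.5
shrunk = {name: soft_threshold(value, penalty) for name, value in raw_updates.items()}
print(shrunk)
```

Predictors whose update falls inside the penalty band are dropped from the model entirely, which is what distinguishes the L1 penalty from ridge-style shrinkage.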
Transportation and Investment Optimization
Tools: Excel, Solver, Risk Analysis, Portfolio Optimization
• Formulated linear optimization models to compare direct vs. hub shipping routes.
• Solved portfolio allocation problem by minimizing variance under return constraints (10%–13.5%).
• Visualized risk-return trade-offs using least squares trend fitting and scenario analysis.
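The portfolio step above, minimizing variance subject to a minimum return, can be approximated with a plain grid search in place of Excel Solver. The three assets, their expected returns, and the covariance matrix below are hypothetical; only the 10% and 13.5% targets come from the project description.

```python
def portfolio_stats(weights, returns, cov):
    """Expected return and variance of a portfolio with the given weights."""
    n = len(weights)
    exp_return = sum(w * r for w, r in zip(weights, returns))
    variance = sum(weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n))
    return exp_return, variance


def min_variance_grid(returns, cov, target, steps=100):
    """Search long-only three-asset weights (summing to 1) on a 1/steps grid."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            exp_return, variance = portfolio_stats(weights, returns, cov)
            # Keep the lowest-variance portfolio that meets the return target
            if exp_return >= target - 1e-12 and (best is None or variance < best[1]):
                best = (weights, variance, exp_return)
    return best


# Hypothetical inputs for three assets.
returns = [0.08, 0.12, 0.15]
cov = [
    [0.010, 0.002, 0.001],
    [0.002, 0.030, 0.004],
    [0.001, 0.004, 0.060],
]
for target in (0.10, 0.135):
    weights, variance, exp_return = min_variance_grid(returns, cov, target)
    print(target, weights, round(variance, 5), round(exp_return, 4))
```

Repeating the search across the range of targets traces out the risk-return trade-off: a stricter return requirement can only keep the minimum variance the same or raise it.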
Hardware Resource and Inventory Optimization
Tools: Excel, Solver, What-If Analysis, Scenario Manager
• Built a profit-maximizing model to allocate space and budget for 4 product lines.
• Balanced constraints including cost, space, and expected sales, recommending inventory strategy and budget flexibility.
Stock Price Forecasting — AAPL & HON
Tools: Excel, Time Series Forecasting, Exponential Smoothing, Regression
• Compared short- and long-term forecasts using moving averages, trend regression, and exponential smoothing.
• Tuned smoothing parameters (α, β) to minimize MAPE and MAPD.
• Recommended short-term models for volatile stock price behavior.
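The parameter tuning described above can be sketched as a grid search that picks the smoothing constant with the lowest MAPE. This simplified version covers only simple exponential smoothing (α); the trend parameter β is omitted, and the price series and candidate grid are hypothetical.

```python
def ses_forecasts(series, alpha):
    """One-step-ahead simple exponential smoothing forecasts for series[1:]."""
    level = series[0]
    forecasts = []
    for actual in series[1:]:
        forecasts.append(level)  # forecast is made before the actual value is seen
        level = alpha * actual + (1 - alpha) * level
    return forecasts


def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actuals, forecasts)) / len(actuals)


def best_alpha(series, grid):
    """Candidate alpha with the lowest in-sample one-step MAPE."""
    return min(grid, key=lambda a: mape(series[1:], ses_forecasts(series, a)))


# Hypothetical closing prices.
prices = [150.0, 152.5, 151.0, 155.0, 158.5, 157.0, 161.0]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
alpha = best_alpha(prices, grid)
print(alpha, round(mape(prices[1:], ses_forecasts(prices, alpha)), 2))
```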
Dam Project Evaluation via Monte Carlo Simulation
Tools: Excel, Monte Carlo, Probability Distributions, Chi-Square Goodness of Fit
• Simulated 10,000 benefit-cost ratios using triangular and normal distributions.
• Conducted statistical tests and scenario probability modeling to assess investment stability.
• Recommended Dam based on lower variability and higher likelihood of high returns.
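A simulation along the lines described above might look like the following Python sketch (the project used Excel). Which input followed which distribution is not stated, so drawing benefits from a triangular distribution and costs from a normal distribution is an assumption, as are all parameter values.

```python
import random
import statistics


def simulate_bcr(benefit_low, benefit_high, benefit_mode, cost_mean, cost_sd,
                 n_trials=10_000, seed=42):
    """Simulate benefit-cost ratios: triangular benefits over normal costs."""
    rng = random.Random(seed)
    ratios = []
    while len(ratios) < n_trials:
        benefit = rng.triangular(benefit_low, benefit_high, benefit_mode)
        cost = rng.normalvariate(cost_mean, cost_sd)
        if cost <= 0:
            continue  # discard non-physical draws from the normal tail
        ratios.append(benefit / cost)
    return ratios


# Hypothetical parameters for a single project alternative.
ratios = simulate_bcr(80, 200, 120, cost_mean=100, cost_sd=10)
mean_ratio = statistics.mean(ratios)
spread = statistics.stdev(ratios)
prob_above_one = sum(r > 1 for r in ratios) / len(ratios)
print(f"mean={mean_ratio:.2f} sd={spread:.2f} P(ratio > 1)={prob_above_one:.2f}")
```

Comparing the spread and the probability of exceeding a ratio of 1 across alternatives is one way to express "lower variability and higher likelihood of high returns" numerically.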
Healthcare Data Warehouse Design
Tools: MySQL, SQL, Star Schema, 3NF, Data Warehousing
• Normalized raw healthcare data into a 3NF schema and identified the fact and dimension entities for the warehouse.
• Built a star schema ERD to support efficient OLAP queries.
• Wrote advanced SQL queries to extract business KPIs and trends for prescriptions and member behavior.
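The kind of KPI query a star schema supports can be sketched with a small in-memory example. It uses Python's built-in sqlite3 module instead of MySQL so that it runs anywhere, and the table names, columns, and rows are hypothetical rather than the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_member (member_id INTEGER PRIMARY KEY, plan_type TEXT);
CREATE TABLE dim_drug (drug_id INTEGER PRIMARY KEY, drug_name TEXT);
CREATE TABLE fact_prescription (
    fill_id INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES dim_member(member_id),
    drug_id INTEGER REFERENCES dim_drug(drug_id),
    copay REAL
);
INSERT INTO dim_member VALUES (1, 'HMO'), (2, 'PPO');
INSERT INTO dim_drug VALUES (1, 'Atorvastatin'), (2, 'Metformin');
INSERT INTO fact_prescription VALUES
    (1, 1, 1, 10.0), (2, 1, 2, 5.0), (3, 2, 2, 5.0), (4, 2, 2, 7.5);
""")

# Fills and total copay per drug: the fact table joined to one dimension.
rows = conn.execute("""
    SELECT d.drug_name, COUNT(*) AS fills, SUM(f.copay) AS total_copay
    FROM fact_prescription AS f
    JOIN dim_drug AS d ON d.drug_id = f.drug_id
    GROUP BY d.drug_name
    ORDER BY fills DESC, d.drug_name
""").fetchall()
for row in rows:
    print(row)
```

Each additional dimension (member, date, pharmacy) joins to the fact table in the same way, which keeps aggregate queries short.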
Critical Care Bed Resource Allocation Analysis
Tools: SQL, MySQL, Data Aggregation, Ranking, Filtering
• Queried licensed, census, and staffed ICU/SICU beds from integrated hospital data.
• Identified the top 10 hospitals by bed capacity as candidates for nurse staffing interventions.
• Delivered data-driven recommendations to support strategic resource allocation.