MS in Data Science, UMass Dartmouth (August 2026) | Open to Data Scientist / ML Engineer / Data Analyst roles
I build end-to-end machine learning systems, from raw data and feature engineering through deployment, serving, and monitoring. I care about models that are honest, reproducible, and defensible, not just high-scoring.
Languages: Python, SQL
ML / Modeling: LightGBM, XGBoost, scikit-learn, SHAP, Optuna, Isolation Forest
LLM / AI: OpenAI API, LangChain, RAG, FAISS, Text-to-SQL
MLOps: MLflow, Docker, FastAPI, GitHub Actions (CI/CD), Evidently (drift monitoring)
Data: PostgreSQL, Kafka, DuckDB, Parquet, Azure Blob Storage
Apps: Streamlit
End-to-end platform on 6M+ real deliveries (Cainiao LaDe, 5 cities). Two models running side by side: a supervised LightGBM disruption classifier and an unsupervised Isolation Forest anomaly detector. Includes a Kafka streaming pipeline, a FastAPI serving layer with a feature store, a hybrid Text-to-SQL + FAISS RAG assistant, a simulated A/B test (two-proportion z-test), Evidently drift monitoring, and a live operational dashboard deployed on AWS EC2.
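The simulated A/B test compares two proportions. Below is a minimal sketch of a pooled two-proportion z-test. It assumes the A/B metric is a binary outcome per delivery (for example, on time vs. late). The function name and the counts are hypothetical illustrations and are not taken from the project.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for H0: p_a == p_b, using the pooled proportion."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both groups share one rate, so estimate it from the pooled counts
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: on-time deliveries in control (A) vs. treatment (B)
z, p = two_proportion_z_test(successes_a=4200, n_a=5000, successes_b=4310, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The pooled standard error is the textbook choice when the null hypothesis is "no difference". In a real analysis, `statsmodels.stats.proportion.proportions_ztest` gives the same test.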
Key finding: 30 couriers (0.6% of the network) generated 20% of all anomalies, turning 66K raw alerts into a 30-courier action list.
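The key finding is a concentration statistic. To compute it, count anomalies per courier, rank the couriers, and measure the share held by the top k. This is a sketch only. It assumes the anomaly detector's output has already been flattened into one courier ID per flagged delivery. `top_k_share` and the toy alert list are hypothetical.

```python
from collections import Counter


def top_k_share(flagged_courier_ids, k):
    """Return the k couriers with the most anomalies and their share of all anomalies."""
    counts = Counter(flagged_courier_ids)
    top = counts.most_common(k)  # [(courier_id, count), ...] sorted by count, descending
    total = sum(counts.values())
    share = sum(count for _, count in top) / total if total else 0.0
    return [courier for courier, _ in top], share


# Hypothetical anomaly log: one courier ID per flagged delivery
alerts = ["c1", "c2", "c1", "c3", "c1", "c4", "c2", "c5", "c1", "c6"]
action_list, share = top_k_share(alerts, k=2)
print(action_list, f"{share:.0%}")  # ['c1', 'c2'] 60%
```

The network-level figure (0.6%) is a separate ratio: k divided by the total number of couriers, including those with no alerts.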
LightGBM churn model on 970K users × 110 features (time-based validation, AUC 0.9481). CLV-based segmentation into 5 actionable cohorts, deployed on Hugging Face Spaces with CI/CD.
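Time-based validation means every training row precedes every validation row. Future information therefore cannot leak into training, which a random split would allow. The sketch below assumes one row per user snapshot with a `snapshot_date` column and a single cutoff date. The column name, the data, and the cutoff are all hypothetical.

```python
from datetime import date


def time_based_split(rows, cutoff):
    """Train on snapshots before `cutoff`, validate on snapshots at or after it."""
    train = [r for r in rows if r["snapshot_date"] < cutoff]
    valid = [r for r in rows if r["snapshot_date"] >= cutoff]
    return train, valid


# Hypothetical monthly user snapshots with a churn label
rows = [
    {"user_id": 1, "snapshot_date": date(2017, 1, 31), "churned": 0},
    {"user_id": 2, "snapshot_date": date(2017, 2, 28), "churned": 1},
    {"user_id": 1, "snapshot_date": date(2017, 3, 31), "churned": 1},
    {"user_id": 3, "snapshot_date": date(2017, 3, 31), "churned": 0},
]
train, valid = time_based_split(rows, cutoff=date(2017, 3, 1))
print(len(train), len(valid))  # 2 2
```

This split separates rows by time only. The same user can appear on both sides, so a stricter setup may also need to hold out users.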
LightGBM delay classifier on 18.2M US domestic flights with strict temporal validation (AUC 0.8629), SHAP analysis showing real_time_turn_gap as the #1 predictor, cascade/route-risk scoring, and a $2.78B annualized savings estimate. FastAPI + Docker + GitHub Actions CI/CD.
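AUC, the metric quoted for both classifiers, has a direct reading: it is the probability that a randomly chosen positive (for example, a delayed flight) gets a higher score than a randomly chosen negative. The following is a small pairwise sketch of that definition, with ties counted as half (the Mann-Whitney formulation). The labels and scores are made up. At 18.2M rows you would use a rank-based implementation such as scikit-learn's `roc_auc_score` instead of this O(P × N) loop.

```python
def roc_auc(labels, scores):
    """ROC AUC = P(score of random positive > score of random negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical delay labels (1 = delayed) and model scores
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(round(roc_auc(labels, scores), 3))  # 0.889
```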
LightGBM global model (WRMSSE 0.4681) across 30,490 item-store series, beating AutoARIMA and 7 other baselines. PostgreSQL business insights and a LangChain + GPT-4o-mini natural language query interface.
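WRMSSE scales each series' forecast error by that series' in-sample one-step naive error. It then averages the scaled errors with sales-based weights. The simplified sketch below handles a single aggregation level. The official M5 metric also averages over 12 aggregation levels and starts each scaling term at the series' first non-zero sale. The item names, values, and weights here are hypothetical.

```python
import math


def rmsse(train, actual, forecast):
    """Root Mean Squared Scaled Error for one series."""
    # Scale: MSE of the naive forecast y_t = y_{t-1} over the training history
    naive_mse = sum((train[t] - train[t - 1]) ** 2 for t in range(1, len(train))) / (len(train) - 1)
    if naive_mse == 0:
        raise ValueError("constant training series; RMSSE is undefined")
    forecast_mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(forecast_mse / naive_mse)


def wrmsse(series, weights):
    """Weighted sum of per-series RMSSE, with weights normalised to sum to 1."""
    total = sum(weights.values())
    return sum(weights[k] / total * rmsse(*series[k]) for k in series)


# Hypothetical series: (training history, actuals over horizon, forecasts over horizon)
series = {
    "item_a": ([3, 5, 4, 6], [5, 7], [6, 6]),
    "item_b": ([0, 2, 1, 3], [2, 2], [2, 2]),
}
weights = {"item_a": 30.0, "item_b": 10.0}  # hypothetical recent dollar sales
score = wrmsse(series, weights)
print(round(score, 4))
```

Because of the scaling, a score below 1 roughly means the forecasts beat the naive baseline's in-sample error. Because of the weighting, high-revenue series dominate the total.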