Supercharge Your Model Training
Updated Nov 12, 2025 - Python
Efficient Deep Learning Systems course materials (HSE, YSDA)
Designing IT and ML Applications using Systems Thinking Approach at IIT Bhilai (CS559)
Structured notes on designing scalable and fault-tolerant ML systems, to refresh your knowledge and help you prepare for a system design interview. Covers system design, MLOps, and case studies.
End-to-end personalized feed ranking system demonstrating retrieval → ranking pipelines, offline evaluation, realistic simulation, and business-aligned diagnostics inspired by large-scale social platforms.
Experimental web application demonstrating how an offline-trained financial fraud detection model can be exposed through a web interface. Built with Flask and a pre-trained XGBoost model to showcase ML inference flow, feature engineering, and result communication — not a production fraud prevention system.
Introduction to Machine Learning Systems - Educational materials for ML systems architecture, deployment, and production considerations.
Public engineering notes (ML systems, CV, MIT courses). Notes-only; sources linked.
Production-style ML inference system for Pneumonia detection from chest X-rays, featuring custom CNN architectures, versioned model serving, preprocessing parity, observability, drift detection, and rollback using FastAPI and Docker.
End-to-end fraud anomaly detection system using FastAPI, Isolation Forest, Streamlit, Docker, and a CI/CD pipeline.
An automated preprocessing pipeline for Telco Customer Churn data, including cleaning, feature engineering, and CI with GitHub Actions.
Deterministic decision gate for AI/ML systems. Risk-Gate enforces strict, schema-driven admissibility boundaries between AI/LLM intent and real system actions. It provides a fixed, human-owned decision structure with deterministic allow/block outcomes, explicit audit logging, and environment-specific policy via configuration — no ML, no heuristics,
Benchmarking and optimizing transformer inference across PyTorch, ONNXRuntime, and TensorRT with latency/throughput analysis on GPU and CPU.
Failure-first analysis of retrieval-augmented and agentic systems, focused on isolating and attributing failures across retrieval, planning, execution, memory, and policy layers.
A long-term, from-first-principles journey through machine learning and deep learning, centered on Dive into Deep Learning and extended with mathematical, probabilistic, and systems-level understanding.
Scalable Training Telemetry and Metrics Visualization
A lightweight, reverse-mode Automatic Differentiation (AD) engine built from scratch using Python and NumPy. Supports dynamic computational graphs and complex linear algebra operations.