Selected work
Projects
Production systems and internal tools. Each one is anchored to the metric it moved.
SOP-Grounded GenAI Support Assistant
LLM agent that triages production incidents against internal runbooks.
Retrieval-augmented Slack assistant that indexes 200+ internal SOPs and Confluence runbooks. It routes incidents to the correct on-call engineer, summarizes similar past tickets, and suggests next actions. GPT-4o handles reasoning, and an embedding cache keeps p95 latency under 2 s.
Impact: Cut mean time to resolution by ~90% and removed dozens of repetitive lookups from L1/L2 on-call rotations.
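A minimal sketch of the embedding-cache idea, assuming the cache is keyed on a hash of the normalized text so that repeated questions skip the embedding call. `EmbeddingCache` and `fake_embed` are illustrative names, not the production code, and the stand-in embedder replaces whatever embedding model the assistant actually calls.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """In-memory cache in front of an embedding function (hypothetical design)."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, List[float]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str) -> str:
        # Normalize case and whitespace so trivially different phrasings share a key.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: pay for the embedding call once, then reuse the vector.
        self.misses += 1
        vector = self._embed_fn(text)
        self._store[key] = vector
        return vector


def fake_embed(text: str) -> List[float]:
    # Stand-in for a real embedding model call (assumption for this sketch).
    return [float(len(text)), float(text.count(" "))]
```

With this shape, `EmbeddingCache(fake_embed).get(...)` only invokes the embedder the first time a given normalized query is seen. A production cache would likely also need eviction and persistence.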
Real-Time Order Flow Dashboard
React + FastAPI control tower for 10,000+ concurrent orders.
Observability dashboard for a Fortune 100 retailer, built for holiday-scale load. It streams order state changes, surfaces stuck flows, and lets on-call engineers trigger remediation actions inline. Proven under peak Black Friday / Cyber Monday traffic.
Impact: Prevented SEV-1 incidents during peak holiday traffic and gave business stakeholders a live single-pane-of-glass view.
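One way to surface stuck flows is a per-state dwell-time limit: an order is flagged when it has sat in a non-terminal state longer than that state allows. The sketch below assumes this approach. The state names, thresholds, and the `OrderState` and `find_stuck` names are hypothetical and not taken from the dashboard itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# Hypothetical per-state limits; states not listed here are never flagged.
MAX_DWELL: Dict[str, timedelta] = {
    "PAYMENT_PENDING": timedelta(minutes=10),
    "ALLOCATING": timedelta(minutes=30),
    "PICKING": timedelta(hours=4),
}


@dataclass(frozen=True)
class OrderState:
    order_id: str
    state: str
    entered_at: datetime  # when the order entered its current state


def find_stuck(orders: Iterable[OrderState], now: datetime) -> List[str]:
    """Return IDs of orders that have exceeded the dwell limit for their state."""
    stuck = []
    for order in orders:
        limit = MAX_DWELL.get(order.state)
        if limit is not None and now - order.entered_at > limit:
            stuck.append(order.order_id)
    return stuck
```

Passing `now` in explicitly keeps the check deterministic and easy to test against recorded order snapshots.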
Selenium + Python Anomaly Detection Suite
100+ scheduled pipelines for proactive checks across the order stack.
Catalog of headless Selenium and Python jobs that simulate critical user journeys, validate API contracts, and detect data anomalies before they reach customers. Findings are routed to Slack and PagerDuty.
Impact: Saved 32 hours of manual QA / monitoring effort per week and surfaced regressions hours before customers reported them.
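An API contract check can be as simple as comparing a response payload against the fields and types a consumer expects. The sketch below assumes that approach. `ORDER_CONTRACT` and `contract_violations` are hypothetical names, and the fields shown are not the real order schema.

```python
from typing import Any, Dict, List

# Hypothetical contract: field name -> expected Python type.
ORDER_CONTRACT: Dict[str, type] = {
    "order_id": str,
    "status": str,
    "total_cents": int,
}


def contract_violations(payload: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """Return human-readable problems; an empty list means the payload conforms."""
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems
```

A scheduled job could run a check like this against each endpoint's response and forward any non-empty result to an alerting channel.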
PyQt5 Internal Desktop Toolkit
Windows desktop utilities for engineers without web access.
Suite of PyQt5 desktop apps that bundle frequently used scripts (log scraping, environment switching, deploy helpers) into a signed Windows installer for the broader engineering org.
Impact: Standardized day-to-day engineering workflows for 100+ internal users and reduced onboarding setup time from days to hours.