Skills
A complete inventory of tools, frameworks, and techniques drawn from production projects, open-source work, and professional experience.
LLM Evaluation & Benchmarking
Skills developed through RealDataAgentBench (RDAB; 1,180+ evaluation runs across 12 frontier models and 39 tasks) and CostGuard (RDAB-calibrated validity scoring in production).
- LLM evaluation framework design — multi-dimensional scoring (correctness, code quality, efficiency, statistical validity)
- Benchmark construction: task design, ground-truth labeling, 95% confidence intervals
- Statistical validity scoring — uncertainty quantification, p-value detection, failure-mode penalty heuristics
- Correctness vs. statistical-validity gap analysis across frontier models
- Prompt engineering, few-shot evaluation, chain-of-thought benchmarking
- OpenAI API · Anthropic API · Hugging Face Inference API
- Models evaluated: GPT-4o · GPT-4.1 · GPT-4.1-mini · GPT-5 · Claude Sonnet 4.6 · Llama 3.3-70B · DeepSeek-R1 · Gemini 1.5 Pro · and more
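As an illustration of the 95% confidence intervals mentioned above, here is a minimal sketch that computes a normal-approximation interval around a mean benchmark score. The function name `mean_ci95` and the sample scores are hypothetical, and this is not necessarily how RealDataAgentBench computes its intervals.

```python
import math
import statistics


def mean_ci95(scores):
    """Return (mean, lower, upper) for a 95% normal-approximation interval."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate a spread")
    mean = statistics.fmean(scores)
    # Standard error of the mean, from the sample standard deviation
    sem = statistics.stdev(scores) / math.sqrt(n)
    half_width = 1.96 * sem  # z-value for a two-sided 95% interval
    return mean, mean - half_width, mean + half_width


# Hypothetical per-run scores in [0, 1] for one model on one task
scores = [0.82, 0.75, 0.91, 0.68, 0.88, 0.79, 0.85, 0.73]
mean, lower, upper = mean_ci95(scores)
print(f"mean={mean:.3f}, 95% CI=[{lower:.3f}, {upper:.3f}]")
```

With only a handful of runs per task, a t-distribution or bootstrap interval would be somewhat wider and is usually the safer choice.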
LLM Reliability & Production Proxying
Skills developed through CostGuard — a self-hostable reliability layer for LLM-powered agents.
- Real-time LLM response interception and validity filtering
- Per-provider circuit breakers (CLOSED / OPEN / HALF_OPEN state machine)
- Automatic fallback chain routing on rejection
- Exact per-call cost tracking at $0.000001 precision across 12 models and 5 providers
- 6-type alerting engine: validity drops · cost spikes · high failure rates · circuit breaker events · consecutive rejections · custom thresholds
- Slack webhook integration and custom webhook delivery
- Request-ID tracing, rate limiting, structured JSON logging
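The CLOSED / OPEN / HALF_OPEN state machine and fallback routing listed above can be sketched roughly as follows. This is an illustrative sketch, not CostGuard's actual code: the class, the `pick_provider` helper, the thresholds, and the provider names are all assumptions.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"        # normal operation, requests pass through
    OPEN = "OPEN"            # provider is failing, requests are blocked
    HALF_OPEN = "HALF_OPEN"  # cool-down elapsed, trial requests allowed


class CircuitBreaker:
    """Hypothetical per-provider breaker; thresholds are illustrative."""

    def __init__(self, failure_threshold=3, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    @property
    def state(self):
        # An OPEN breaker moves to HALF_OPEN once the cool-down has elapsed
        if (self._state is State.OPEN
                and self._clock() - self._opened_at >= self.recovery_timeout):
            self._state = State.HALF_OPEN
        return self._state

    def allow_request(self):
        return self.state is not State.OPEN

    def record_success(self):
        # Any success closes the breaker and clears the failure count
        self._failures = 0
        self._state = State.CLOSED

    def record_failure(self):
        if self.state is State.HALF_OPEN:
            self._trip()  # a failed trial request re-opens immediately
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self):
        self._state = State.OPEN
        self._opened_at = self._clock()
        self._failures = 0


def pick_provider(chain, breakers):
    """Return the first provider in the fallback chain whose breaker allows traffic."""
    for name in chain:
        if breakers[name].allow_request():
            return name
    return None


# Hypothetical provider names, one breaker each
breakers = {name: CircuitBreaker() for name in ("primary", "fallback")}
print(pick_provider(["primary", "fallback"], breakers))
```

Passing the clock in as a parameter keeps the timing logic testable without real waits.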
LLM Fine-tuning
Skills developed through LoRA fine-tuning of DeepSeek-R1-Distill-Qwen-1.5B on GSM8K mathematical reasoning.
- LoRA (Low-Rank Adaptation) — rank 16, alpha 32, 98.8% parameter reduction (18M / 1.5B trainable)
- PEFT (Parameter-Efficient Fine-Tuning)
- Unsloth — 2× inference throughput on a single consumer GPU
- Hugging Face Transformers · datasets · trl
- GSM8K benchmark evaluation (achieved 58.2% accuracy on a 1.5B model)
- Training on resource-constrained hardware (single consumer GPU)
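The LoRA figures above can be checked with a little arithmetic. In the sketch below, the rank, alpha, and the 18M / 1.5B counts come from the list above, while the 1536 × 1536 layer shape and the helper name `lora_added_params` are assumptions chosen only for illustration.

```python
RANK = 16   # LoRA rank r, from the list above
ALPHA = 32  # LoRA alpha, from the list above


def lora_added_params(d_in, d_out, rank=RANK):
    """Parameters added by LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# The low-rank update B @ A is scaled by alpha / r before being added to W
scaling = ALPHA / RANK

# Hypothetical square projection layer, for illustration only
full = 1536 * 1536
added = lora_added_params(1536, 1536)
print(f"one layer: {added:,} adapter params vs {full:,} frozen params")

# Reported totals: about 18M trainable out of about 1.5B parameters
trainable_fraction = 18_000_000 / 1_500_000_000
reduction_pct = round((1 - trainable_fraction) * 100, 1)
print(f"scaling={scaling}, trainable={trainable_fraction:.1%}, reduction={reduction_pct}%")
```

The per-layer comparison shows where the savings come from: the adapter grows linearly with layer width, while the frozen weight matrix grows quadratically.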
Observability & Monitoring
Skills developed through CostGuard (Prometheus + Grafana) and professional ML monitoring at Infosoft Solutions.
- Prometheus metrics — request counts, latency histograms, token usage, validity rates
- Grafana dashboards — real-time observability for LLM agent traffic
- Model monitoring in Tableau — accuracy drift, data quality metrics, upstream discrepancy detection
- Production classifier monitoring for 50K+ user behavioral records
- ETL pipeline quality monitoring (10K+ daily records, 45% latency reduction achieved)
ML Engineering & Data Pipelines
Skills developed at Infosoft Solutions and ION Technology Solutions.
- ETL pipeline design and optimization — Azure SQL to Python ML, 300K+ records across 180 tables
- Feature store feeding and ML feature engineering
- Random Forest classifier — trained on 50K+ records, 92% accuracy (23% improvement over prior rule-based system)
- Revenue prediction model — R² lifted from 0.44 to 0.80, accurate to within $120
- scikit-learn · XGBoost · PyTorch · MLflow
- Regression · Random Forest · classification pipelines
- Pandas · NumPy · statistical analysis
Engineering & Infrastructure
- Python · FastAPI · Streamlit · Docker · Docker Compose
- CI/CD (GitHub Actions)
- AWS: EC2 · ECR · S3
- Azure · Azure SQL
- SQL · REST APIs
- Pytest — 28+ proxy unit tests, integration test suite
- SQLite (persistent state for circuit breakers and alert history)
- Render · Fly.io · Koyeb (deployment platforms)
Vision AI & Other
Skills developed through AI-Assisted Medical Image Diagnosis.
- LLM-powered vision models for X-ray interpretation (Groq Cloud API)
- Structured findings report generation from radiological images
- Lightweight deployment for resource-constrained environments
- Groq Cloud API · Vision LLMs · Streamlit
RAG & Agent Frameworks
- RAG (Retrieval-Augmented Generation) pipeline design
- LangChain · LangGraph
- MCP (Model Context Protocol)
- Pinecone (vector store)
- CrewAI agent orchestration patterns
- LangGraph multi-step agent evaluation