Skills

A complete inventory of tools, frameworks, and techniques drawn from production projects, open-source work, and professional experience.


LLM Evaluation & Benchmarking

Skills developed through RealDataAgentBench (1,180+ evaluation runs across 12 frontier models and 39 tasks) and CostGuard (RDAB-calibrated validity scoring in production).

  • LLM evaluation framework design — multi-dimensional scoring (correctness, code quality, efficiency, statistical validity)
  • Benchmark construction: task design, ground-truth labeling, 95% confidence intervals
  • Statistical validity scoring — uncertainty quantification, p-value detection, failure-mode penalty heuristics
  • Correctness vs. statistical-validity gap analysis across frontier models
  • Prompt engineering, few-shot evaluation, chain-of-thought benchmarking
  • OpenAI API · Anthropic API · Hugging Face Inference API
  • Models evaluated: GPT-4o · GPT-4.1 · GPT-4.1-mini · GPT-5 · Claude Sonnet 4.6 · Llama 3.3-70B · DeepSeek-R1 · Gemini 1.5 Pro · and more
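
As an illustration of the confidence-interval step in benchmark construction, the sketch below computes a normal-approximation 95% interval around a mean score. The function name, the `z=1.96` constant, and the sample scores are assumptions made for this example; they are not taken from RealDataAgentBench itself.

```python
import math
import statistics


def mean_confidence_interval(scores, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate spread")
    mean = statistics.fmean(scores)
    # Standard error of the mean, based on the sample standard deviation (n - 1).
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - z * sem, mean + z * sem


# Hypothetical per-run scores for one model on one task.
runs = [0.82, 0.79, 0.91, 0.85, 0.77, 0.88]
mean, low, high = mean_confidence_interval(runs)
print(f"mean={mean:.3f}  95% CI=[{low:.3f}, {high:.3f}]")
```

With only a handful of runs per task, a Student's t critical value would give a wider and more honest interval than 1.96; the normal approximation is used here only to keep the sketch dependency-free.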

LLM Reliability & Production Proxying

Skills developed through CostGuard — a self-hostable reliability layer for LLM-powered agents.

  • Real-time LLM response interception and validity filtering
  • Per-provider circuit breakers (CLOSED / OPEN / HALF_OPEN state machine)
  • Automatic fallback chain routing on rejection
  • Exact per-call cost tracking at $0.000001 precision across 12 models and 5 providers
  • 6-type alerting engine: validity drops · cost spikes · high failure rates · circuit breaker events · consecutive rejections · custom thresholds
  • Slack webhook integration and custom webhook delivery
  • Request-ID tracing, rate limiting, structured JSON logging
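
The CLOSED / OPEN / HALF_OPEN state machine and fallback routing listed above can be sketched roughly as follows. This is a minimal illustration, not CostGuard's actual code: the class and function names, the failure threshold, the cooldown, and the choice to treat any raised exception as a failure are all assumptions.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


class CircuitBreaker:
    """Hypothetical per-provider breaker; thresholds are illustrative."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        if self.state is State.OPEN:
            # After the cooldown, move to HALF_OPEN and allow probe traffic.
            if self.clock() - self.opened_at >= self.cooldown_seconds:
                self.state = State.HALF_OPEN
                return True
            return False
        return True

    def record_success(self):
        # Any success resets the failure count and closes the breaker.
        self.failures = 0
        self.state = State.CLOSED

    def record_failure(self):
        self.failures += 1
        # A failed probe in HALF_OPEN reopens the breaker immediately.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()


def route(providers, breakers, call):
    """Try providers in order, skipping any whose breaker refuses traffic."""
    for name in providers:
        breaker = breakers[name]
        if not breaker.allow_request():
            continue
        try:
            result = call(name)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("all providers unavailable")
```

Passing the clock in as a parameter keeps the cooldown logic testable without sleeping. A real proxy would likely also count validity rejections as failures and persist breaker state between restarts.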

LLM Fine-tuning

Skills developed through LoRA fine-tuning of DeepSeek-R1-Distill-Qwen-1.5B on GSM8K mathematical reasoning.

  • LoRA (Low-Rank Adaptation): rank 16, alpha 32; about 18M of 1.5B parameters trainable (roughly 1.2%, a 98.8% reduction versus full fine-tuning)
  • PEFT (Parameter-Efficient Fine-Tuning)
  • Unsloth — 2× inference throughput on a single consumer GPU
  • Hugging Face Transformers · datasets · trl
  • GSM8K benchmark evaluation (achieved 58.2% accuracy on a 1.5B model)
  • Training on resource-constrained hardware (single consumer GPU)
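
The parameter-reduction figure follows from simple arithmetic, sketched below. The 18M and 1.5B counts and the rank and alpha values come from the list above; the 1536 × 1536 layer shape is a hypothetical example chosen for illustration, not a statement about which modules were adapted.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one frozen d_out x d_in weight matrix."""
    # LoRA learns two small matrices, B (d_out x rank) and A (rank x d_in).
    return rank * (d_in + d_out)


# Figures quoted in the list above.
trainable = 18_000_000
total = 1_500_000_000
fraction = trainable / total
print(f"trainable fraction: {fraction:.1%}")   # 1.2%
print(f"reduction:          {1 - fraction:.1%}")  # 98.8%

# Hypothetical single projection layer, for scale.
d = 1536
full = d * d
adapter = lora_params(d, d, rank=16)
print(f"one {d}x{d} layer: {adapter:,} adapter params vs {full:,} frozen")

# The low-rank update is scaled by alpha / rank before being added.
scaling = 32 / 16
print(f"LoRA scaling factor: {scaling}")
```

The total trainable count depends on which modules receive adapters, so the per-layer number above only shows the order of magnitude.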

Observability & Monitoring

Skills developed through CostGuard (Prometheus + Grafana) and professional ML monitoring at Infosoft Solutions.

  • Prometheus metrics — request counts, latency histograms, token usage, validity rates
  • Grafana dashboards — real-time observability for LLM agent traffic
  • Model monitoring in Tableau — accuracy drift, data quality metrics, upstream discrepancy detection
  • Production classifier monitoring for 50K+ user behavioral records
  • ETL pipeline quality monitoring (10K+ daily records, 45% latency reduction achieved)

ML Engineering & Data Pipelines

Skills developed at Infosoft Solutions and ION Technology Solutions.

  • ETL pipeline design and optimization — Azure SQL to Python ML, 300K+ records across 180 tables
  • Feature store feeding and ML feature engineering
  • Random Forest classifier — trained on 50K+ records, 92% accuracy (23% improvement over prior rule-based system)
  • Revenue prediction model — R² lifted from 0.44 to 0.80, accurate to within $120
  • scikit-learn · XGBoost · PyTorch · MLflow
  • Regression · Random Forest · classification pipelines
  • Pandas · NumPy · statistical analysis

Engineering & Infrastructure

  • Python · FastAPI · Streamlit · Docker · Docker Compose
  • CI/CD (GitHub Actions)
  • AWS: EC2 · ECR · S3
  • Azure · Azure SQL
  • SQL · REST APIs
  • Pytest — 28+ proxy unit tests, integration test suite
  • SQLite (persistent state for circuit breakers and alert history)
  • Render · Fly.io · Koyeb (deployment platforms)

Vision AI & Other

Skills developed through AI-Assisted Medical Image Diagnosis.

  • LLM-powered vision models for X-ray interpretation (Groq Cloud API)
  • Structured findings report generation from radiological images
  • Lightweight deployment for resource-constrained environments
  • Groq Cloud API · Vision LLMs · Streamlit

RAG & Agent Frameworks

  • RAG (Retrieval-Augmented Generation) pipeline design
  • LangChain · LangGraph
  • MCP (Model Context Protocol)
  • Pinecone (vector store)
  • CrewAI agent orchestration patterns
  • LangGraph multi-step agent evaluation