skills | Venkata Manideep Patibandla

A complete inventory of tools, frameworks, and techniques drawn from production projects, open-source work, and professional experience.

Agent Engineering & Orchestration

Skills developed through Tether (durable execution for LLM agents), EnterpriseAgentEval (LangGraph-based multi-step agent pipelines), and client workflow-automation deployments as a Forward Deployed AI Engineer at SBL.

Durable agent execution — automatic checkpoint/resume, idempotent replay, cross-provider failover (OpenAI ↔ Anthropic)
LangGraph — multi-step agent workflows, state graphs, conditional edges, human-in-the-loop hooks
Multi-channel outbound and lead-enrichment automation wired into CRMs and operational systems
MCP (Model Context Protocol) — tooling integration for agent systems
Drop-in client wrapping — adding reliability and resilience layers without rewriting agent logic
RAG (Retrieval-Augmented Generation) pipeline design
LangChain · Pinecone (vector store)
Tool use, function calling, and structured output extraction
Multi-agent coordination and agent-to-agent communication patterns
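
A minimal sketch of the checkpoint/resume and idempotent-replay idea listed above, assuming a simple JSON file as the checkpoint store. The class name CheckpointedRun, the step names, and the fetch_lead function are hypothetical illustrations, not Tether's actual API.

```python
import json
import tempfile
from pathlib import Path


class CheckpointedRun:
    """Hypothetical helper: caches each step's result on disk, keyed by step name."""

    def __init__(self, path):
        self.path = Path(path)
        # Resume: load results saved by a previous (possibly crashed) run
        self.results = json.loads(self.path.read_text()) if self.path.exists() else {}

    def step(self, name, fn):
        # Idempotent replay: a step that already completed returns its saved result
        if name in self.results:
            return self.results[name]
        result = fn()
        self.results[name] = result
        # Checkpoint after every completed step (results must be JSON-serializable)
        self.path.write_text(json.dumps(self.results))
        return result


calls = []  # records how many times the side-effecting step really ran


def fetch_lead():
    calls.append("fetch")
    return {"company": "Acme"}


checkpoint_file = Path(tempfile.mkdtemp()) / "run.json"
first = CheckpointedRun(checkpoint_file).step("fetch_lead", fetch_lead)
# Simulate a process restart: a fresh object reads the same checkpoint file
second = CheckpointedRun(checkpoint_file).step("fetch_lead", fetch_lead)
```

After the simulated restart, the step returns the stored result and the side-effecting function is not called a second time. A production version would also need atomic writes and a stable run identifier, which this sketch leaves out.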

LLM Reliability & Production Proxying

Skills developed through CostGuard — a self-hostable reliability layer for LLM-powered agents.

Real-time LLM response interception and validity filtering
Per-provider circuit breakers (CLOSED / OPEN / HALF_OPEN state machine)
Automatic fallback chain routing on rejection
Exact per-call cost tracking at $0.000001 precision across 12 models and 5 providers
6-type alerting engine: validity drops · cost spikes · high failure rates · circuit breaker events · consecutive rejections · custom thresholds
Slack webhook integration and custom webhook delivery
Request-ID tracing, rate limiting, structured JSON logging
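
The CLOSED / OPEN / HALF_OPEN state machine mentioned above can be sketched roughly as follows. This is an illustrative version, not CostGuard's implementation: the class name, the failure_threshold and reset_timeout parameters, and their default values are all assumptions.

```python
import time


class CircuitBreaker:
    """Illustrative per-provider breaker; thresholds and names are assumptions."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the timeout can be tested without sleeping
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        if self.state == "OPEN":
            # After the cool-down, move to HALF_OPEN and let probe traffic through
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"
                return True
            return False
        # CLOSED and HALF_OPEN both allow requests
        return True

    def record_success(self):
        # Any success closes the breaker and clears the failure count
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        # A failed probe, or too many failures while CLOSED, opens the breaker
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

A caller would check allow_request() before each provider call, then report the outcome with record_success() or record_failure(); when a request is refused, the fallback chain would try the next provider.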

ML Engineering & Data Pipelines

Skills developed through the automated valuation model (AVM) contract work and lead-enrichment pipelines at SBL.

Full ML lifecycle from data to deployment — cleaned ~38k subject–comp appraisal pairs, engineered 60+ features
LightGBM regression for property valuation — 5.1% MAPE, 68% of valuations within ±5% on a production-shaped holdout
Live RentCast API ingestion and a condition/quality scoring service
Comparable-selection and condition-based triage logic
Multi-source lead-enrichment pipelines — automated prospecting wired into CRMs and outbound
LightGBM · scikit-learn · XGBoost · PyTorch · MLflow
Regression · Random Forest · classification pipelines
Pandas · NumPy · statistical analysis
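
For reference, the two valuation metrics quoted above (MAPE and the share of valuations within ±5%) can be computed as in this short sketch. The function names and the four sample prices are made up for illustration and are not drawn from the AVM holdout.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def share_within(actual, predicted, tolerance=0.05):
    """Percent of predictions whose relative error is at most `tolerance`."""
    hits = sum(abs(p - a) / abs(a) <= tolerance for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


# Made-up sale prices and model estimates (relative errors: 4%, 2%, 10%, 0%)
actual = [200_000, 300_000, 400_000, 500_000]
predicted = [208_000, 294_000, 440_000, 500_000]

sample_mape = mape(actual, predicted)
sample_within_5 = share_within(actual, predicted)
```

The two numbers answer different questions: MAPE summarizes the average miss, while the within-±5% share shows how many individual valuations are usable, so one large outlier moves the first far more than the second.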

Engineering & Infrastructure

Python · FastAPI · Streamlit · Docker · Docker Compose
CI/CD (GitHub Actions)
AWS: EC2 · ECR · S3 · RDS
Azure
SQL · REST APIs · CRM/API integration
Pytest — 90+ unit tests, integration test suite
SQLite (persistent state for circuit breakers and alert history)
Railway · Render · Fly.io · Koyeb (deployment platforms)

Observability & Monitoring

Skills developed through CostGuard (Prometheus + Grafana), EnterpriseAgentEval (MLflow), and AVM model evaluation on production-shaped holdouts.

Prometheus metrics — request counts, latency histograms, token usage, validity rates
Grafana dashboards — real-time observability for LLM agent traffic
MLflow experiment tracking — agent runs, prompts, scores, token counts, latency, cost
Model evaluation in Tableau — MAPE / within-±5% tracking, error analysis, drift detection
Holdout evaluation on live-API data — benchmarking the AVM against the data vendor’s own model
Per-call cost and quality tracking across 12 models and 5 providers (CostGuard)
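
One way to keep per-call cost exact at $0.000001 precision is to do the arithmetic in decimal rather than binary floating point. The sketch below assumes prices are quoted per million tokens; the function name and the example prices are hypothetical and do not reflect any provider's actual rates or CostGuard's internals.

```python
from decimal import Decimal, ROUND_HALF_UP

MICRO_DOLLAR = Decimal("0.000001")  # smallest unit tracked


def call_cost(input_tokens, output_tokens, input_price_per_mtok, output_price_per_mtok):
    """Cost of one call in dollars, rounded to the nearest micro-dollar.

    Prices are passed as strings so they are parsed exactly by Decimal.
    """
    total = (
        Decimal(input_tokens) * Decimal(input_price_per_mtok)
        + Decimal(output_tokens) * Decimal(output_price_per_mtok)
    ) / Decimal(1_000_000)
    return total.quantize(MICRO_DOLLAR, rounding=ROUND_HALF_UP)


# Hypothetical prices: $2.50 per million input tokens, $10.00 per million output tokens
cost = call_cost(1200, 350, "2.50", "10.00")
```

Summing Decimal values across many calls avoids the drift that accumulates when tiny float amounts are added repeatedly.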

LLM Evaluation & Benchmarking

Skills developed through RealDataAgentBench (1,412+ evaluation runs across 12 frontier models and 39 tasks) and CostGuard (RDAB-calibrated validity scoring in production).

LLM evaluation framework design — multi-dimensional scoring (correctness, code quality, efficiency, statistical validity)
Benchmark construction: task design, ground-truth labeling, 95% confidence intervals
Statistical validity scoring — uncertainty quantification, p-value detection, failure-mode penalty heuristics
Correctness vs. statistical-validity gap analysis across frontier models
Prompt engineering, few-shot evaluation, chain-of-thought benchmarking
OpenAI API · Anthropic API · Hugging Face Inference API
Models evaluated: GPT-4o · GPT-4.1 · GPT-4.1-mini · GPT-5 · Claude Sonnet 4.6 · Llama 3.3-70B · DeepSeek-R1 · Gemini 1.5 Pro · and more
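
As an illustration of the 95% confidence intervals mentioned above, here is a normal-approximation interval for a mean score. The source does not say which interval method the benchmark uses, so treat this as one common choice; the function name and the sample scores are invented for the example.

```python
import math
import statistics


def mean_ci95(scores):
    """Return (mean, lower, upper) using the normal approximation (z = 1.96)."""
    n = len(scores)
    mean = statistics.fmean(scores)
    # Standard error of the mean from the sample standard deviation
    sem = statistics.stdev(scores) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Invented per-run scores for a single model on a single task
scores = [0.6, 0.7, 0.8, 0.9, 1.0]
mean, lower, upper = mean_ci95(scores)
```

With only a handful of runs per task, a t-distribution critical value or a bootstrap would give a wider and more honest interval than z = 1.96.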

LLM Fine-tuning

Skills developed through LoRA Fine-tuning of DeepSeek-R1-Distill-Qwen-1.5B on GSM8K mathematical reasoning.

LoRA (Low-Rank Adaptation) — rank 16, alpha 32, 98.8% parameter reduction (18M / 1.5B trainable)
PEFT (Parameter-Efficient Fine-Tuning)
Unsloth — 2× inference throughput on a single consumer GPU
Hugging Face Transformers · datasets · trl
GSM8K benchmark evaluation (achieved 58.2% accuracy on a 1.5B model)
Training on resource-constrained hardware (single consumer GPU)
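
The parameter-reduction figure above follows from simple arithmetic, sketched here using the rounded counts quoted in the list (18M trainable out of 1.5B). The lora_params helper and the 1024 × 1024 layer shape are illustrative assumptions, not the actual model configuration.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


# Rounded figures quoted in the list above
trainable = 18_000_000
total = 1_500_000_000

trainable_pct = 100 * trainable / total  # share of weights that are trained
reduction_pct = 100 - trainable_pct      # share left frozen

# Standard LoRA scales the low-rank update by alpha / rank
scaling = 32 / 16

# Illustrative layer shape (an assumption, not the real model's dimensions)
example_layer = lora_params(d_in=1024, d_out=1024, rank=16)
full_layer = 1024 * 1024
```

For the illustrative layer, the adapter trains 32,768 parameters instead of the 1,048,576 in the full matrix, which is where the savings come from once this is repeated across every adapted layer.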

Vision AI & Other

Skills developed through AI-Assisted Medical Image Diagnosis.

Vision-capable LLMs for X-ray interpretation (Groq Cloud API)
Structured findings report generation from radiological images
Lightweight deployment for resource-constrained environments
Groq Cloud API · Vision LLMs · Streamlit