AI Engineer
LLM Systems · Agent Workflows · Production ML
General Information
| Full Name | Venkata Manideep Patibandla |
| Focus | LLM Systems · Agent Workflows · Production ML |
| Location | New Haven, CT (open to relocation) |
| Email | venkatamanideep.analytics@gmail.com |
Summary
- AI Engineer specializing in LLM systems — agent workflows, RAG, MCP tooling, evaluation, and reliability — with hands-on production ML experience. Leading development of an automated valuation model (AVM) from inception at an early-stage AI startup, valuing standard-condition homes within ~5% of appraised value from live API data. Creator of RealDataAgentBench, an open-source agent-evaluation benchmark of 1,412+ runs across 12 frontier models, and CostGuard, an LLM reliability and cost-routing proxy built on its findings.
Experience
- Apr 2026 - Present
AI/ML Engineer (Contract)
Stealth AI Startup, Hybrid
- Leading development of an Automated Valuation Model (AVM) from inception to its current production-ready stage on a 4-person team — estimating a home's appraised value from its address so lenders can skip physical appraisals on standard, low-risk properties.
- Own the full ML lifecycle from data to deployment: cleaned ~38k subject–comp appraisal pairs, engineered 60+ features, built live RentCast API ingestion and a condition/quality scoring service, and shipped comparable-selection and condition-based triage logic with LightGBM training and evaluation.
- Achieved 5.1% MAPE with 68% of valuations within ±5% on a production-shaped holdout using live API data only — less than half the 11.5% MAPE of the data vendor's own AVM.
- Apr 2024 - Present
Forward Deployed AI Engineer
SBL, Remote
- Served as primary technical liaison for enterprise and growth-stage clients (Emoha, Topmate, Clinik), translating requirements into deployed RevOps and workflow-automation solutions.
- Engineered multi-source lead-enrichment pipelines that replaced manual prospecting with repeatable automated workflows wired into CRMs and multi-channel outbound.
- Built integrations across CRMs, lead sources, communication channels, and operational systems, with multi-channel outbound automation and human-in-the-loop checkpoints.
- Diagnosed and resolved production issues (integration failures, CRM sync, workflow errors) across multiple concurrent client rollouts.
- Aug 2023 - Dec 2023
AI Engineer & Fractional Developer
AnternData Solutions, Bangalore, India
- Built LLM and AI-powered applications and orchestration workflows for global tech clients including TigerData (formerly Timescale, the $1B+ database company), using modern AI tooling, APIs, and retrieval (RAG) systems.
- Delivered production-ready apps, reference architectures, and POC demos with customer engineering teams, accelerating developer adoption.
- As a fractional developer, built technical demos and reference implementations and produced developer-facing content for customer platforms.
Open Source Projects
- 2025 - Present
RealDataAgentBench
- Benchmark of 1,412+ scored runs across 12 frontier LLMs (GPT-5, GPT-4.1, Claude, Gemini, Grok, Llama) on 39 real data-science tasks — scoring correctness, code quality, efficiency, and statistical validity with per-run cost tracking and 95% confidence intervals.
- Surfaced the core reliability gap — models score 0.84–0.99 on correctness but 0.52–0.90 on statistical validity — and found gpt-4.1-mini statistically tied with top-ranked gpt-4.1 at 65× lower cost per task.
- 2025 - Present
CostGuard
- Production proxy that intercepts LLM calls, validates responses with RDAB-derived scoring, and retries or falls back across 12 models and 5 providers — with per-call cost tracking surfacing routing decisions that cut spend 10–20× vs. GPT-4o defaults.
- Per-provider circuit breakers, a replay endpoint returning quality deltas with 95% bootstrap CIs, Prometheus/Grafana observability, rate limiting, and a six-type alerting engine over Slack and webhooks; 90+ unit tests.
- 2026
RelayOps
- Production-shaped telecom support agent with deterministic access control, scoped tools, tiered intent routing, hybrid RAG with citations, and guardrails that block invented offers and PII. Validated with adversarial evals: a 1.000 safe-route rate on a 100-case suite, with zero billing escapes.
- 2026
Tether & LoRA fine-tuning
- Tether: a durable execution layer (checkpoint/resume, cross-provider failover) for long-running LLM agents.
- LoRA fine-tune of DeepSeek-R1-Distill-Qwen-1.5B reaching 58.2% accuracy on GSM8K while training 98.8% fewer parameters than full fine-tuning.
Technical Skills
LLM Systems
- LLM evaluation & benchmarking, RAG, agents, workflow orchestration, MCP, prompt engineering
- LangChain, LangGraph, LoRA fine-tuning (Unsloth, PEFT), Hugging Face, Pinecone, OpenAI API, Anthropic API
Engineering & Reliability
- Python, SQL, FastAPI, Docker, CI/CD, Pytest, Railway, AWS (EC2, ECR, S3), Azure, REST APIs
- Prometheus, Grafana, circuit breakers, drift detection, structured logging, CRM/API integration, workflow automation
ML & Data
- LightGBM, XGBoost, scikit-learn, PyTorch, Random Forest, regression, MLflow
- Pandas, NumPy, statistical analysis, Tableau
Education
- 2024 - 2025
M.S. Computer Science
Sacred Heart University, Fairfield, CT
- GPA: 3.8/4.0
- Upsilon Pi Epsilon (UPE) Honor Society
- 2019 - 2023
B.Tech Information Technology
GMR Institute of Technology, Vizianagaram, India
- GPA: 8.30/10
Certifications
- IBM Certified in Agentic AI