AI Engineer

LLM Systems · Agent Workflows · Production ML

General Information

Full Name: Venkata Manideep Patibandla
Focus: LLM Systems · Agent Workflows · Production ML
Location: New Haven, CT (open to relocation)
Email: venkatamanideep.analytics@gmail.com

Summary

  • AI Engineer specializing in LLM systems — agent workflows, RAG, MCP tooling, evaluation, and reliability — with hands-on production ML experience. Leading development of an automated valuation model (AVM) from inception at an early-stage AI startup, valuing standard-condition homes within ~5% of appraised value from live API data. Creator of RealDataAgentBench, an open-source agent-evaluation benchmark of 1,412+ runs across 12 frontier models, and CostGuard, an LLM reliability and cost-routing proxy built on its findings.

Experience

  • Apr 2026 - Present
    AI/ML Engineer (Contract)
    Stealth AI Startup, Hybrid
    • Leading development of an Automated Valuation Model (AVM) on a 4-person team, from inception to its current production-ready stage; the model estimates a home's appraised value from its address so lenders can skip physical appraisals on standard, low-risk properties.
    • Own the full ML lifecycle from data to deployment: cleaned ~38k subject–comp appraisal pairs, engineered 60+ features, built live RentCast API ingestion and a condition/quality scoring service, and shipped comparable-selection and condition-based triage logic with LightGBM training and evaluation.
    • Achieved 5.1% MAPE with 68% of valuations within ±5% on a production-shaped holdout using live API data only — less than half the 11.5% MAPE of the data vendor's own AVM.
  • Apr 2024 - Present
    Forward Deployed AI Engineer
    SBL, Remote
    • Served as primary technical liaison for enterprise and growth-stage clients (Emoha, Topmate, Clinik), translating requirements into deployed RevOps and workflow-automation solutions.
    • Engineered multi-source lead-enrichment pipelines that replaced manual prospecting with repeatable automated workflows wired into CRMs and multi-channel outbound.
    • Built integrations across CRMs, lead sources, communication channels, and operational systems, with human-in-the-loop checkpoints.
    • Diagnosed and resolved production issues (integration failures, CRM sync, workflow errors) across multiple concurrent client rollouts.
  • Aug 2023 - Dec 2023
    AI Engineer & Fractional Developer
    AnternData Solutions, Bangalore, India
    • Built LLM and AI-powered applications and orchestration workflows for global tech clients including TigerData (formerly Timescale, the $1B+ database company), using modern AI tooling, APIs, and retrieval-augmented generation (RAG) systems.
    • Delivered production-ready apps, reference architectures, and POC demos with customer engineering teams, accelerating developer adoption.
    • As a fractional developer, built technical demos and reference implementations and produced developer-facing content for customer platforms.

Open Source Projects

  • 2025 - Present
    RealDataAgentBench
    • Benchmark of 1,412+ scored runs across 12 frontier LLMs (GPT-5, GPT-4.1, Claude, Gemini, Grok, Llama) on 39 real data-science tasks — scoring correctness, code quality, efficiency, and statistical validity with per-run cost tracking and 95% confidence intervals.
    • Surfaced the core reliability gap — models score 0.84–0.99 on correctness but 0.52–0.90 on statistical validity — and found gpt-4.1-mini statistically tied with top-ranked gpt-4.1 at 65× lower cost per task.
  • 2025 - Present
    CostGuard
    • Production proxy that intercepts LLM calls, validates responses with scoring derived from RealDataAgentBench (RDAB), and retries or falls back across 12 models and 5 providers, with per-call cost tracking surfacing routing decisions that cut spend 10–20× vs. GPT-4o defaults.
    • Per-provider circuit breakers, a replay endpoint returning quality deltas with 95% bootstrap CIs, Prometheus/Grafana observability, rate limiting, and a six-type alerting engine over Slack and webhooks; 90+ unit tests.
  • 2026
    RelayOps
    • Production-shaped telecom support agent with deterministic access control, scoped tools, tiered intent routing, hybrid RAG with citations, and guardrails blocking invented offers and PII, validated with adversarial evals (safe-route rate of 1.000 on a 100-case suite; zero billing escapes).
  • 2026
    Tether & LoRA fine-tuning
    • Tether: a durable execution layer (checkpoint/resume, cross-provider failover) for long-running LLM agents.
    • LoRA fine-tune of DeepSeek-R1-Distill-Qwen-1.5B reaching 58.2% on GSM8K with 98.8% fewer trainable parameters than full fine-tuning.

Technical Skills

  • LLM Systems
    • LLM evaluation & benchmarking, RAG, agents, workflow orchestration, MCP, prompt engineering
    • LangChain, LangGraph, LoRA fine-tuning (Unsloth, PEFT), Hugging Face, Pinecone, OpenAI API, Anthropic API
  • Engineering & Reliability
    • Python, SQL, FastAPI, Docker, CI/CD, Pytest, Railway, AWS (EC2, ECR, S3), Azure, REST APIs
    • Prometheus, Grafana, circuit breakers, drift detection, structured logging, CRM/API integration, workflow automation
  • ML & Data
    • LightGBM, XGBoost, scikit-learn, PyTorch, Random Forest, regression, MLflow
    • Pandas, NumPy, statistical analysis, Tableau

Education

  • 2024 - 2025
    M.S. Computer Science
    Sacred Heart University, Fairfield, CT
    • GPA: 3.8/4.0
    • Upsilon Pi Epsilon (UPE) Honor Society
  • 2019 - 2023
    B.Tech Information Technology
    GMR Institute of Technology, Vizianagaram, India
    • GPA: 8.30/10

Certifications

  • IBM Certified in Agentic AI