<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://venkatamanideep.com/feed.xml" rel="self" type="application/atom+xml"/><link href="https://venkatamanideep.com/" rel="alternate" type="text/html" hreflang="en"/><updated>2026-06-07T19:18:00+00:00</updated><id>https://venkatamanideep.com/feed.xml</id><title type="html">blank</title><subtitle>AI Engineer specializing in LLM evaluation, benchmarking, and production-grade AI systems. Builder of CostGuard and RealDataAgentBench. </subtitle><entry><title type="html">How to Actually Use Claude: 18 Steps That Unlock 100% of Its Potential</title><link href="https://venkatamanideep.com/blog/2025/how-to-use-claude/" rel="alternate" type="text/html" title="How to Actually Use Claude: 18 Steps That Unlock 100% of Its Potential"/><published>2025-05-19T00:00:00+00:00</published><updated>2025-05-19T00:00:00+00:00</updated><id>https://venkatamanideep.com/blog/2025/how-to-use-claude</id><content type="html" xml:base="https://venkatamanideep.com/blog/2025/how-to-use-claude/"><![CDATA[]]></content><author><name></name></author><category term="ai"/><category term="llm"/><category term="tooling"/><summary type="html"><![CDATA[18 practical setup and usage patterns — Projects, custom instructions, extended thinking, prompt caching — that turn Claude into a high-leverage engineering tool rather than a basic chat interface.]]></summary></entry><entry><title type="html">The Most Expensive Mistake in LLM Engineering (And How to Fix It With Data)</title><link href="https://venkatamanideep.com/blog/2025/expensive-mistake-llm-engineering/" rel="alternate" type="text/html" title="The Most Expensive Mistake in LLM Engineering (And How to Fix It With Data)"/><published>2025-05-14T00:00:00+00:00</published><updated>2025-05-14T00:00:00+00:00</updated><id>https://venkatamanideep.com/blog/2025/expensive-mistake-llm-engineering</id><content type="html" xml:base="https://venkatamanideep.com/blog/2025/expensive-mistake-llm-engineering/"><![CDATA[]]></content><author><name></name></author><category term="llm"/><category term="ai"/><category term="tooling"/><category term="performance"/><summary type="html"><![CDATA[Why teams waste budget by selecting LLMs on benchmark scores instead of production metrics — and how to use real evaluation data to route tasks to the right model at the right cost.]]></summary></entry><entry><title type="html">KV Caching in LLMs</title><link href="https://venkatamanideep.com/blog/2025/kv-caching-llms/" rel="alternate" type="text/html" title="KV Caching in LLMs"/><published>2025-05-10T00:00:00+00:00</published><updated>2025-05-10T00:00:00+00:00</updated><id>https://venkatamanideep.com/blog/2025/kv-caching-llms</id><content type="html" xml:base="https://venkatamanideep.com/blog/2025/kv-caching-llms/"><![CDATA[]]></content><author><name></name></author><category term="llm"/><category term="ai"/><category term="performance"/><summary type="html"><![CDATA[How transformer attention caching eliminates redundant computation during token generation — and what it means for LLM inference latency, memory trade-offs, and serving infrastructure.]]></summary></entry><entry><title type="html">You’re Doing RAG Wrong</title><link href="https://venkatamanideep.com/blog/2025/rag-wrong/" rel="alternate" type="text/html" title="You’re Doing RAG Wrong"/><published>2025-05-09T00:00:00+00:00</published><updated>2025-05-09T00:00:00+00:00</updated><id>https://venkatamanideep.com/blog/2025/rag-wrong</id><content type="html" xml:base="https://venkatamanideep.com/blog/2025/rag-wrong/"><![CDATA[]]></content><author><name></name></author><category term="rag"/><category term="llm"/><category term="ai"/><category term="agents"/><summary type="html"><![CDATA[Rethinking RAG architecture — replacing naive text chunks with structured Q&A packets to improve retrieval accuracy, reduce corpus size, and build more reliable agent pipelines.]]></summary></entry><entry><title type="html">My LLM App Started Silently Getting Worse. I Almost Didn’t Notice. Here’s What I Built to Catch It.</title><link href="https://venkatamanideep.com/blog/2025/llm-silent-drift/" rel="alternate" type="text/html" title="My LLM App Started Silently Getting Worse. I Almost Didn’t Notice. Here’s What I Built to Catch It."/><published>2025-05-04T00:00:00+00:00</published><updated>2025-05-04T00:00:00+00:00</updated><id>https://venkatamanideep.com/blog/2025/llm-silent-drift</id><content type="html" xml:base="https://venkatamanideep.com/blog/2025/llm-silent-drift/"><![CDATA[]]></content><author><name></name></author><category term="llm"/><category term="ai"/><category term="production"/><category term="tooling"/><summary type="html"><![CDATA[Building production observability into LLM apps — drift detection, SQLite audit trails, and Slack alerting to catch silent model degradation before it reaches users.]]></summary></entry><entry><title type="html">Every LLM Has a Superpower and a Blind Spot. I Built a Benchmark Around That Observation</title><link href="https://venkatamanideep.com/blog/2025/llm-superpower-blind-spot/" rel="alternate" type="text/html" title="Every LLM Has a Superpower and a Blind Spot. I Built a Benchmark Around That Observation"/><published>2025-04-24T00:00:00+00:00</published><updated>2025-04-24T00:00:00+00:00</updated><id>https://venkatamanideep.com/blog/2025/llm-superpower-blind-spot</id><content type="html" xml:base="https://venkatamanideep.com/blog/2025/llm-superpower-blind-spot/"><![CDATA[]]></content><author><name></name></author><category term="ai"/><category term="llm"/><category term="testing"/><category term="benchmarking"/><summary type="html"><![CDATA[Building a model-selection engine by mapping where each frontier LLM excels in production — and engineering around the gaps.]]></summary></entry><entry><title type="html">I Prompted 5 Frontier LLMs to ‘Report Uncertainty’ — Here’s What Happened to Their Statistical Validity Scores</title><link href="https://venkatamanideep.com/blog/2025/llm-uncertainty-statistical-validity/" rel="alternate" type="text/html" title="I Prompted 5 Frontier LLMs to ‘Report Uncertainty’ — Here’s What Happened to Their Statistical Validity Scores"/><published>2025-04-18T00:00:00+00:00</published><updated>2025-04-18T00:00:00+00:00</updated><id>https://venkatamanideep.com/blog/2025/llm-uncertainty-statistical-validity</id><content type="html" xml:base="https://venkatamanideep.com/blog/2025/llm-uncertainty-statistical-validity/"><![CDATA[]]></content><author><name></name></author><category term="ai"/><category term="llm"/><category term="benchmark"/><category term="rag"/><summary type="html"><![CDATA[Engineering uncertainty-aware prompting patterns for production LLM agents — and what actually happens to reliability scores when you do.]]></summary></entry><entry><title type="html">I Ran 163 Benchmarks Across 10 LLMs So You Don’t Have To. Here’s What I Found</title><link href="https://venkatamanideep.com/blog/2025/163-benchmarks-10-llms/" rel="alternate" type="text/html" title="I Ran 163 Benchmarks Across 10 LLMs So You Don’t Have To. Here’s What I Found"/><published>2025-04-15T00:00:00+00:00</published><updated>2025-04-15T00:00:00+00:00</updated><id>https://venkatamanideep.com/blog/2025/163-benchmarks-10-llms</id><content type="html" xml:base="https://venkatamanideep.com/blog/2025/163-benchmarks-10-llms/"><![CDATA[]]></content><author><name></name></author><category term="ai"/><category term="llm"/><category term="performance"/><category term="tooling"/><summary type="html"><![CDATA[Systematic evaluation across 163 tasks and 10 LLMs — practical model selection guidance for production deployments with real cost data.]]></summary></entry><entry><title type="html">I Built a Benchmark That Proves Most LLM Agents Are Statistically Blind — And Why That Costs Companies Real Money</title><link href="https://venkatamanideep.com/blog/2025/llm-agents-statistically-blind/" rel="alternate" type="text/html" title="I Built a Benchmark That Proves Most LLM Agents Are Statistically Blind — And Why That Costs Companies Real Money"/><published>2025-04-11T00:00:00+00:00</published><updated>2025-04-11T00:00:00+00:00</updated><id>https://venkatamanideep.com/blog/2025/llm-agents-statistically-blind</id><content type="html" xml:base="https://venkatamanideep.com/blog/2025/llm-agents-statistically-blind/"><![CDATA[]]></content><author><name></name></author><category term="llm"/><category term="ai"/><category term="machinelearning"/><summary type="html"><![CDATA[How a production evaluation system surfaced a critical LLM agent reliability failure — and what it costs when you deploy blind.]]></summary></entry><entry><title type="html">Everyone Is Calling It Prompt Engineering. They’re Already Behind.</title><link href="https://venkatamanideep.com/blog/2025/beyond-prompt-engineering/" rel="alternate" type="text/html" title="Everyone Is Calling It Prompt Engineering. They’re Already Behind."/><published>2025-04-10T00:00:00+00:00</published><updated>2025-04-10T00:00:00+00:00</updated><id>https://venkatamanideep.com/blog/2025/beyond-prompt-engineering</id><content type="html" xml:base="https://venkatamanideep.com/blog/2025/beyond-prompt-engineering/"><![CDATA[]]></content><author><name></name></author><category term="ai"/><category term="llm"/><summary type="html"><![CDATA[Why context engineering is replacing prompt engineering in production AI systems — and what to build instead.]]></summary></entry></feed>