Manideep's Blog

Writing about LLMs, benchmarking, and production AI systems

How to Actually Use Claude: 18 Steps That Unlock 100% of Its Potential

18 practical setup and usage patterns — Projects, custom instructions, extended thinking, prompt caching — that turn Claude into a high-leverage engineering tool rather than a basic chat interface.

1 min read · May 19, 2025

2025 · ai llm tooling
The Most Expensive Mistake in LLM Engineering (And How to Fix It With Data)

Why teams waste budget by selecting LLMs on benchmark scores instead of production metrics — and how to use real evaluation data to route tasks to the right model at the right cost.

1 min read · May 14, 2025

2025 · llm ai tooling performance
KV Caching in LLMs

How transformer attention caching eliminates redundant computation during token generation — and what it means for LLM inference latency, memory trade-offs, and serving infrastructure.

1 min read · May 10, 2025

2025 · llm ai performance
You're Doing RAG Wrong

Rethinking RAG architecture — replacing naive text chunks with structured Q&A packets to improve retrieval accuracy, reduce corpus size, and build more reliable agent pipelines.

1 min read · May 9, 2025

2025 · rag llm ai agents
My LLM App Started Silently Getting Worse. I Almost Didn't Notice. Here's What I Built to Catch It.

Building production observability into LLM apps — drift detection, SQLite audit trails, and Slack alerting to catch silent model degradation before it reaches users.

1 min read · May 4, 2025

2025 · llm ai production tooling