CostGuard
Production-grade LLM reliability proxy with cost tracking, circuit breakers, and observability
CostGuard is an open-source reliability layer for LLM-powered applications. It operates as middleware between your code and LLM providers, offering real-time response validation, automatic fallbacks, precise cost tracking, and dataset benchmarking, so teams do not have to build their own evaluation infrastructure.
Key Features
- Real-time Response Filtering: Every LLM response is assigned a validity score, using heuristics calibrated on RealDataAgentBench (RDAB), before it is returned. Responses scoring below your threshold are rejected automatically and retried through a fallback model chain.
- Per-Provider Circuit Breakers: Automatically stops sending requests to failing providers using CLOSED/OPEN/HALF_OPEN state management.
- Exact Cost Tracking: Per-call token accounting across 12 models and 5 providers, calculated to $0.000001 precision.
- Dataset Benchmarking: Upload CSV or Parquet files to benchmark all available models using RealDataAgentBench (1,180+ runs across 39 tasks).
- 6-Type Alerting Engine: Six alert types, including validity drops, cost spikes, failure rates, and circuit breaker events, route to Slack or custom webhooks.
- Prometheus Metrics + Grafana: Full observability with request-ID tracing and structured logging, plus rate limiting.
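To make the first two features concrete, here is a minimal sketch of how a CLOSED/OPEN/HALF_OPEN breaker and a validity-gated fallback chain could fit together. It is not CostGuard's actual code: the class and function names, the thresholds, the recovery window, and the `score` callable (a stand-in for the RDAB-calibrated heuristics) are all assumptions made for illustration.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"        # requests flow normally
    OPEN = "OPEN"            # requests are rejected without calling the provider
    HALF_OPEN = "HALF_OPEN"  # trial requests are allowed through to probe the provider


class CircuitBreaker:
    """Hypothetical per-provider breaker; the default values are illustrative."""

    def __init__(self, failure_threshold=3, recovery_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self.clock = clock  # injectable so the cool-down can be tested without sleeping
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        # After the cool-down, move OPEN -> HALF_OPEN to probe the provider.
        if self.state is State.OPEN and self.clock() - self.opened_at >= self.recovery_seconds:
            self.state = State.HALF_OPEN
        return self.state is not State.OPEN

    def record_success(self):
        # Any success closes the circuit and resets the failure count.
        self.state = State.CLOSED
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        # A failed probe, or too many consecutive failures, opens the circuit.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()


def call_with_fallback(prompt, chain, breakers, score, threshold=0.7):
    """Try each (name, call) pair in order; return the first response scoring >= threshold.

    `score` is a placeholder for whatever validity heuristic is in use.
    """
    for name, call in chain:
        breaker = breakers[name]
        if not breaker.allow_request():
            continue  # this provider's circuit is open, so skip it
        try:
            response = call(prompt)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        if score(response) >= threshold:
            return name, response
        # A low-scoring response is rejected and the next model is tried.
    raise RuntimeError("no model in the chain produced an acceptable response")
```

In this sketch a low validity score does not count as a provider failure; only exceptions trip the breaker. Whether CostGuard treats the two separately is not stated here, so read that as a design choice of the example.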
Impact
Cuts LLM spend 10–20× compared with defaulting to GPT-4o. Integrates 12 models across 5 providers, with per-call costs tracked to $0.000001 precision.
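As an illustration of what micro-dollar cost accounting can look like, the sketch below computes a per-call cost with `decimal.Decimal` and rounds it to $0.000001. The model names and per-million-token prices are hypothetical placeholders, not real provider rates, and `call_cost` is an assumed helper name rather than part of CostGuard's API.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical prices in USD per 1M tokens; these are NOT real provider rates.
PRICES = {
    "example-large": {"input": Decimal("2.50"), "output": Decimal("10.00")},
    "example-small": {"input": Decimal("0.15"), "output": Decimal("0.60")},
}

MICRO_DOLLAR = Decimal("0.000001")
PER_MILLION = Decimal(1_000_000)


def call_cost(model, input_tokens, output_tokens):
    """Return the cost of one call in USD, rounded to the nearest micro-dollar."""
    price = PRICES[model]
    cost = (price["input"] * input_tokens + price["output"] * output_tokens) / PER_MILLION
    return cost.quantize(MICRO_DOLLAR, rounding=ROUND_HALF_UP)
```

Using `Decimal` rather than `float` avoids binary rounding drift when many small per-call amounts are summed. Note that rounding each call individually discards anything below one micro-dollar, so a tracker that needs exact totals might sum unrounded values and quantize only when reporting.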
Tech Stack
Python · FastAPI · Streamlit · Docker · Prometheus · SQLite · Grafana · Pytest · AWS (EC2, ECR, S3)