CostGuard

Production-grade LLM reliability proxy with cost tracking, circuit breakers, and observability

CostGuard is an open-source reliability layer for LLM-powered applications. It operates as middleware between your code and LLM providers, offering real-time response validation, automatic fallbacks, precise cost tracking, and dataset benchmarking — without requiring teams to build their own evaluation infrastructure.

Key Features

  • Real-time Response Filtering — Every LLM response is scored for validity, using heuristics calibrated on RealDataAgentBench (RDAB), before it is returned. Responses that score below your threshold are rejected automatically and retried through a fallback model chain.
  • Per-Provider Circuit Breakers — Automatically stops requests to failing providers using CLOSED/OPEN/HALF_OPEN state management.
  • Exact Cost Tracking — Per-call token accounting across 12 models and 5 providers, calculated to $0.000001 precision.
  • Dataset Benchmarking — Upload CSV or Parquet files to benchmark all available models using RealDataAgentBench (1,180+ runs across 39 tasks).
  • 6-Type Alerting Engine — Alerts, including those for validity drops, cost spikes, elevated failure rates, and circuit breaker events, are routed to Slack or custom webhooks.
  • Prometheus Metrics + Grafana — Observability through Prometheus metrics and Grafana, together with request-ID tracing, structured logging, and rate limiting.
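
The CLOSED/OPEN/HALF_OPEN cycle named in the circuit breaker feature can be sketched as a small state machine. This is a minimal illustration, not CostGuard's actual code: the class name, method names, and the `failure_threshold`, `reset_timeout`, and `clock` parameters are all assumptions chosen for the example.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"        # requests flow normally
    OPEN = "OPEN"            # provider is considered failing; requests are blocked
    HALF_OPEN = "HALF_OPEN"  # cool-down elapsed; trial requests are allowed


class CircuitBreaker:
    """Hypothetical per-provider breaker; thresholds are illustrative."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the breaker can be tested without sleeping
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self):
        # Once the cool-down has elapsed, move from OPEN to HALF_OPEN
        # so that trial traffic can probe the provider again.
        if self.state is State.OPEN and self.clock() - self.opened_at >= self.reset_timeout:
            self.state = State.HALF_OPEN
        return self.state is not State.OPEN

    def record_success(self):
        # Any success closes the breaker and clears the failure count.
        self.state = State.CLOSED
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        # A failed trial in HALF_OPEN, or too many failures in CLOSED, opens the breaker.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

In a proxy, one such object would be kept per provider: `allow_request()` is checked before each call, and `record_success()` or `record_failure()` is called afterwards.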
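
The threshold-and-fallback behavior described for response filtering can be sketched as a loop over an ordered model chain. This is a sketch under stated assumptions: `call_with_fallback`, the `ModelCall` and `Scorer` types, and the choice to treat a provider error like a rejected response are hypothetical, and the scoring function stands in for whatever heuristics CostGuard actually applies.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical types: a model call maps a prompt to response text;
# a scorer maps a response to a validity score (higher is better).
ModelCall = Callable[[str], str]
Scorer = Callable[[str], float]


def call_with_fallback(
    prompt: str,
    chain: Sequence[Tuple[str, ModelCall]],
    score: Scorer,
    threshold: float,
) -> Optional[Tuple[str, str, float]]:
    """Return (model_name, response, validity) from the first model in the
    chain whose response meets the threshold, or None if none does."""
    for name, call in chain:
        try:
            response = call(prompt)
        except Exception:
            continue  # assumption: a provider error moves on to the next model
        validity = score(response)
        if validity >= threshold:
            return name, response, validity
    return None
```

Returning the model name alongside the response lets the caller log which link in the chain finally answered.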

Impact

Cuts LLM spend 10–20× compared with a GPT-4o default. Integrates 12 models across 5 providers, with per-call cost tracked to $0.000001 precision.
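
Per-call cost at $0.000001 precision can be computed with decimal arithmetic rather than floats. The sketch below is illustrative only: the `call_cost` function, the `PRICES` table layout, the model name, and the prices themselves are placeholders, not CostGuard's real schema or any provider's actual pricing.

```python
from decimal import Decimal, ROUND_HALF_UP

MICRO_DOLLAR = Decimal("0.000001")  # the smallest unit tracked

# Placeholder prices in USD per 1M tokens; NOT real provider pricing.
PRICES = {
    "example-model": {"input": Decimal("2.50"), "output": Decimal("10.00")},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one call, rounded to the nearest micro-dollar."""
    price = PRICES[model]
    raw = (price["input"] * input_tokens + price["output"] * output_tokens) / Decimal(1_000_000)
    return raw.quantize(MICRO_DOLLAR, rounding=ROUND_HALF_UP)


print(call_cost("example-model", 1234, 567))  # 0.008755
```

Using `Decimal` keeps sums over many calls exact at the micro-dollar level, which binary floats do not guarantee.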

Tech Stack

Python · FastAPI · Streamlit · Docker · Prometheus · SQLite · Grafana · Pytest · AWS (EC2, ECR, S3)

GitHub