AI Spending Cap Simulator: Test 3-Level Budget Controls
An AI agent with API access and no spending limit is a billing event waiting to happen. One misconfigured loop, one runaway prompt chain, and your balance drops by hundreds of dollars before the error surfaces. Most AI platforms offer no protection against this. FairStack’s 3-level spending cap system lets you set hard budget limits at the organization, project, and API key level — and this simulator lets you test those limits before you deploy.
Simulator
[Interactive component: Set org cap, project cap, API key cap. Run scenarios that trigger cap enforcement. See exactly which cap fires and when.]
The simulator models FairStack’s 3-level spending cap system with four pre-built scenarios and a custom mode. Set your caps, run the scenario, and watch how budget enforcement stops overspending at the exact threshold you define.
Why Spending Caps Matter for AI Agents
AI agents are autonomous. That is the point. But autonomy without budget constraints is a design flaw, not a feature.
The agent billing problem has three parts:
1. Agents generate in loops. A content pipeline that generates 10 images per article, processes 50 articles, and retries failures can execute 500+ generation calls in a single run. At $0.05 per image, that is $25. At $0.40 per video clip, the same loop costs $200. Without caps, the agent has no reason to stop.
2. Errors compound silently. A bug that sends malformed prompts to a video model still incurs GPU charges. The model runs, produces garbage output, and the agent retries — generating more garbage and more charges. By the time a human notices, the balance is drained.
3. Shared API keys share risk. If three developers share one API key, one developer’s runaway script affects everyone’s budget. Without per-key caps, there is no isolation between workloads.
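The loop arithmetic in the first point can be checked in a few lines. This is a minimal sketch: the function name `loop_cost_micro` is illustrative, and prices are held as integer microdollars so that repeated multiplication does not pick up floating-point drift. Retries are not modeled, so the figures are a floor.

```python
def loop_cost_micro(items_per_article: int, articles: int, price_micro: int) -> int:
    """Cost of one pipeline run in microdollars (1 dollar = 1,000,000 microdollars)."""
    calls = items_per_article * articles
    return calls * price_micro


# Figures from the example above: 10 generations per article, 50 articles.
image_run = loop_cost_micro(10, 50, 50_000)    # $0.05 per image
video_run = loop_cost_micro(10, 50, 400_000)   # $0.40 per video clip

print(f"images: ${image_run / 1_000_000:.2f}")
print(f"video:  ${video_run / 1_000_000:.2f}")
```

The same 500 calls cost eight times more on video, which is why a per-request limit matters as much as a monthly one.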
No other AI generation platform solves all three problems. Replicate has no spending caps. fal.ai has no spending caps. ElevenLabs has usage limits tied to subscription tiers, not configurable caps. Runway has no programmatic budget enforcement.
FairStack built spending caps as a core infrastructure feature, not an afterthought.
How FairStack’s 3-Level System Works
FairStack enforces spending limits at three nested levels. Each level is checked independently, and a request must pass all of them, so the tightest applicable cap wins.
Organization Cap ($500/month)
└── Project Cap ($100/month)
    └── API Key Cap ($20 total)
Level 1: Organization Cap
The top-level safety net. Applies to all spending across the entire organization — web app, API, MCP server, every project, every key.
# Set organization spending cap via API
curl -X PATCH https://api.fairstack.ai/v1/org/settings \
  -H "Authorization: Bearer fs_admin_key" \
  -H "Content-Type: application/json" \
  -d '{"spending_cap_monthly_micro": 500000000}'
# 500,000,000 microdollars = $500/month
When the org cap is hit, every generation request across the organization returns a 429 SPENDING_CAP_REACHED error. No generation proceeds. No credits are deducted.
Use case: Company-wide maximum. “We will never spend more than $500/month on AI generation, regardless of which team or project is generating.”
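Cap values are sent in microdollars, so a small conversion helper avoids off-by-a-zero mistakes. The sketch below is an assumption about how a client might do this, not part of any FairStack SDK; the function names are illustrative. It uses `Decimal` because binary floats cannot represent amounts like 0.046 exactly.

```python
from decimal import Decimal

MICRO_PER_DOLLAR = 1_000_000


def dollars_to_micro(amount: str) -> int:
    """Convert a dollar amount given as a string (e.g. "0.046") to integer microdollars.

    Anything smaller than one microdollar is truncated.
    """
    return int(Decimal(amount) * MICRO_PER_DOLLAR)


def micro_to_dollars(micro: int) -> Decimal:
    """Convert integer microdollars back to a Decimal dollar amount."""
    return Decimal(micro) / MICRO_PER_DOLLAR


org_cap = dollars_to_micro("500")       # value for spending_cap_monthly_micro
print(org_cap, micro_to_dollars(org_cap))
```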
Level 2: Project Cap
Isolates budget by workload. Each project within an organization can have its own monthly spending limit.
# Create a project with a $100/month cap
curl -X POST https://api.fairstack.ai/v1/projects \
  -H "Authorization: Bearer fs_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Marketing Content Pipeline",
    "cap_monthly_micro": 100000000,
    "cap_per_request_micro": 500000
  }'
# $100/month cap + $0.50 max per single generation
Project caps include two controls:
- Monthly cap: Total spending limit for the project per billing cycle
- Per-request cap: Maximum cost for any single generation. Prevents an expensive model call (such as a $1.20 Sora 2 Pro generation) from eating the budget in one shot.
Use case: Budget isolation between teams. “Marketing can spend $100/month. Engineering R&D can spend $200/month. Neither team’s spending affects the other’s cap.”
Level 3: API Key Cap
The most granular control. Each API key can have its own spending limits, independent of the project and org caps.
# Create an API key with a $20 total (lifetime) cap
curl -X POST https://api.fairstack.ai/v1/api-keys \
  -H "Authorization: Bearer fs_admin_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "content-agent-prod",
    "cap_total_micro": 20000000,
    "cap_per_request_micro": 100000,
    "allowed_modalities": ["image", "voice"]
  }'
# $20 total cap + $0.10 max per request + only image and voice allowed
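API key caps offer three controls. Unlike project caps, the spending limit is a lifetime total rather than a monthly amount, and keys can also be restricted by modality: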
- Total cap: Lifetime spending limit for this key (not monthly — total)
- Per-request cap: Maximum cost per generation
- Modality restrictions: Limit which generation types the key can access (image, voice, video, music)
Use case: Agent-specific limits. “This agent can spend $20 total on image and voice generation. It cannot generate video. No single generation can cost more than $0.10.”
How the Three Levels Interact
Caps are evaluated in order: API key cap first, then project cap, then org cap. The tightest limit at any level blocks the request.
Request: Generate an image ($0.046)
Check 1: API key cap
  ├── Per-request: $0.046 < $0.10 limit ✅
  └── Total spent: $15.20 < $20.00 limit ✅
Check 2: Project cap
  ├── Per-request: $0.046 < $0.50 limit ✅
  └── Monthly spent: $87.30 < $100.00 limit ✅
Check 3: Org cap
  └── Monthly spent: $412.50 < $500.00 limit ✅
Result: Generation proceeds. Credits deducted after completion.
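The evaluation order can be modeled locally to reason about which cap will fire first. This is a sketch of the logic as described here, not FairStack's server code. Two things are assumptions: a spending limit is treated as exceeded when spend plus the new request would go over it, and `spending_cap_exceeded` is a made-up error code for that case (only `per_request_cap_exceeded` appears in this article).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cap:
    """One cap level. Amounts are integer microdollars; None means 'no limit'."""
    name: str
    spent_micro: int
    limit_micro: Optional[int] = None
    per_request_micro: Optional[int] = None


def first_blocking_cap(cost_micro: int, caps: list[Cap]) -> Optional[dict]:
    """Return details of the first cap that blocks the request, or None if all pass."""
    for cap in caps:  # expected order: API key, project, org
        if cap.per_request_micro is not None and cost_micro > cap.per_request_micro:
            return {"error": "per_request_cap_exceeded", "cap_type": cap.name,
                    "cap_limit_micro": cap.per_request_micro,
                    "request_cost_micro": cost_micro}
        # Assumption: block when this request would push spend past the limit.
        if cap.limit_micro is not None and cap.spent_micro + cost_micro > cap.limit_micro:
            return {"error": "spending_cap_exceeded",  # hypothetical error code
                    "cap_type": cap.name,
                    "cap_limit_micro": cap.limit_micro,
                    "request_cost_micro": cost_micro}
    return None


# Spend and limits taken from the trace above.
caps = [
    Cap("api_key", spent_micro=15_200_000, limit_micro=20_000_000, per_request_micro=100_000),
    Cap("project", spent_micro=87_300_000, limit_micro=100_000_000, per_request_micro=500_000),
    Cap("org", spent_micro=412_500_000, limit_micro=500_000_000),
]

print(first_blocking_cap(46_000, caps))   # the $0.046 image
print(first_blocking_cap(345_000, caps))  # a $0.345 video clip
```

With these numbers the image passes every level, while the video is stopped by the API key's per-request limit before the project or org caps are consulted.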
If any check fails:
Request: Generate a video ($0.345)
Check 1: API key cap
  └── Per-request: $0.345 > $0.10 limit ❌ BLOCKED
Result: 429 SPENDING_CAP_REACHED
{
  "error": "per_request_cap_exceeded",
  "cap_type": "api_key",
  "cap_limit_micro": 100000,
  "request_cost_micro": 345000
}
The response tells the agent exactly which cap was hit and why, so it can adjust its behavior programmatically.
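One way an agent might act on that response is sketched below. The field names come from the error body shown above; the action names (`retry_with_cheaper_model`, `stop_and_alert`) and the function itself are illustrative assumptions, not a FairStack convention.

```python
def plan_after_cap_error(body: dict) -> dict:
    """Turn a cap error body into a coarse next step for the agent."""
    if body.get("error") == "per_request_cap_exceeded":
        # Only this request was too expensive; a cheaper model may still fit.
        return {
            "action": "retry_with_cheaper_model",
            "max_cost_micro": body["cap_limit_micro"],
            "over_by_micro": body["request_cost_micro"] - body["cap_limit_micro"],
        }
    # Any other cap error is treated as an exhausted budget: retrying will not help.
    return {"action": "stop_and_alert", "cap_type": body.get("cap_type")}


error_body = {
    "error": "per_request_cap_exceeded",
    "cap_type": "api_key",
    "cap_limit_micro": 100000,
    "request_cost_micro": 345000,
}
print(plan_after_cap_error(error_body))
```

The key design choice is to separate "this request is too expensive" from "the budget is gone". The first can be recovered from automatically; the second should stop the loop and involve a human.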
Pre-Built Scenarios
Try these in the simulator to see cap enforcement in action.
Scenario 1: Content Pipeline
Setup: Marketing team running a blog content pipeline that generates hero images, social cards, and voice narration for each article.
| Cap | Limit |
|---|---|
| Org | $500/month |
| Project: “Blog Content” | $150/month |
| API Key: “content-bot” | $30 total, $0.10/request |
Run: Agent processes 20 articles, generating 3 images ($0.023 each) and 1 voice narration ($0.05) per article. That is $0.119 per article, or $2.38 in total. All caps hold comfortably.
Then: Agent encounters a bug and enters a retry loop on video generation at $0.345/clip. The per-request API key cap ($0.10) blocks the video call immediately. Zero budget wasted on the bug.
Scenario 2: Multi-Team Isolation
Setup: Engineering org with three teams sharing one FairStack account.
| Cap | Limit |
|---|---|
| Org | $1,000/month |
| Project: “Product Photos” | $300/month |
| Project: “Voice Assistant” | $200/month |
| Project: “R&D Experiments” | $100/month |
Run: Product Photos team generates 5,000 images in a batch ($115 total). Voice Assistant runs normally ($45/month). R&D experiments hit their $100 cap mid-month. R&D stops generating; the other two projects continue unaffected.
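The isolation behavior can be approximated with a per-project calculation. This is a rough sketch under stated assumptions: the $0.023 image price is implied by $115 for 5,000 images, and the R&D workload (400 video clips at $0.345) is invented for illustration because the scenario does not say what R&D generates.

```python
def run_project(cap_micro: int, price_micro: int, requested: int) -> tuple[int, int]:
    """Return (completed_requests, spent_micro) for one project under its monthly cap."""
    affordable = cap_micro // price_micro      # whole requests that fit under the cap
    completed = min(requested, affordable)
    return completed, completed * price_micro


# Product Photos: $300 cap, 5,000 images at $0.023 each.
photos = run_project(300_000_000, 23_000, 5_000)

# R&D Experiments: $100 cap, hypothetical 400 clips at $0.345 each.
rnd = run_project(100_000_000, 345_000, 400)

print("Product Photos:", photos)
print("R&D Experiments:", rnd)
```

Each project is evaluated against its own cap only, so R&D running out of budget has no effect on how many Product Photos requests complete.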
Scenario 3: Agent Budget Guardrails
Setup: AI agent with MCP access processing customer requests autonomously.
| Cap | Limit |
|---|---|
| Org | $200/month |
| Project: “Customer Service” | $50/month |
| API Key: “cs-agent” | $5/day (via total cap rotation), $0.05/request, image-only |
Run: Agent handles 100 customer image requests per day at $0.005-0.04 each. The per-request cap keeps the agent from accidentally calling expensive models: Imagen 4 at $0.046 still fits under the $0.05 cap, but Nano Banana Pro at $0.10 would be blocked. The daily rotation cap limits exposure to $5/day regardless of volume.
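Because API key caps are lifetime totals, the "$5/day" limit in this scenario has to come from issuing a fresh key each day. The sketch below only builds the request body for `POST /v1/api-keys`, reusing the field names from the key-creation example earlier. The naming scheme and the function are assumptions, and retiring the previous day's key is left out because no endpoint for it is described here.

```python
from datetime import date


def daily_key_payload(day: date, daily_budget_micro: int = 5_000_000) -> dict:
    """Build the body for a one-day API key: $5 total, $0.05 per request, image only."""
    return {
        "name": f"cs-agent-{day.isoformat()}",   # hypothetical naming scheme
        "cap_total_micro": daily_budget_micro,
        "cap_per_request_micro": 50_000,
        "allowed_modalities": ["image"],
    }


print(daily_key_payload(date.today()))
```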
Scenario 4: Cost Simulation Before Execution
Setup: Developer building a pipeline that estimates costs before running.
# Step 1: Estimate the batch cost
curl -X POST https://api.fairstack.ai/v1/estimate \
  -H "Authorization: Bearer fs_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-schnell",
    "params": {"prompt": "product photo"},
    "batch_size": 500
  }'
# Response
{
"per_request": {
"estimated_cost_micro": 3450,
"estimated_cost_display": "$0.0035"
},
"batch": {
"total_cost_micro": 1725000,
"total_cost_display": "$1.73"
},
"caps": {
"api_key_remaining_micro": 18275000,
"project_remaining_micro": 98275000,
"org_remaining_micro": 498275000,
"would_exceed_cap": false
}
}
The estimate endpoint returns remaining budget at all three cap levels, so the agent knows before executing whether the batch will fit within its limits.
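An agent can also use these fields to size a batch instead of only accepting or rejecting it. The sketch below assumes the `*_remaining_micro` values are the budget still available at each level, which the sample response does not make explicit, and it ignores per-request caps. The function name is illustrative.

```python
def max_batch_within_caps(estimate: dict) -> int:
    """Largest number of requests that fits under the tightest remaining budget."""
    per_request = estimate["per_request"]["estimated_cost_micro"]
    if per_request <= 0:
        raise ValueError("estimated cost must be positive")
    caps = estimate["caps"]
    tightest = min(
        caps["api_key_remaining_micro"],
        caps["project_remaining_micro"],
        caps["org_remaining_micro"],
    )
    return tightest // per_request


# Values copied from the sample response above.
sample = {
    "per_request": {"estimated_cost_micro": 3450},
    "caps": {
        "api_key_remaining_micro": 18275000,
        "project_remaining_micro": 98275000,
        "org_remaining_micro": 498275000,
        "would_exceed_cap": False,
    },
}
print(max_batch_within_caps(sample))
```

Here the API key is the binding constraint, so the batch size is set by the key's remaining budget rather than by the project or org.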
No Other Platform Does This
| Feature | FairStack | Replicate | fal.ai | ElevenLabs | Runway |
|---|---|---|---|---|---|
| Organization-level cap | Yes | No | No | No | No |
| Project-level cap | Yes | No | No | No | No |
| API key-level cap | Yes | No | No | No | No |
| Per-request cap | Yes | No | No | No | No |
| Modality restrictions per key | Yes | No | No | No | No |
| Cost estimation before execution | Yes | No | No | No | No |
| Cap hit details in error response | Yes | N/A | N/A | N/A | N/A |
Replicate, fal.ai, ElevenLabs, and Runway all rely on the same model: set a credit card limit and hope for the best. None offer programmatic, hierarchical budget enforcement. None return remaining budget in cost estimates. None let you restrict which modalities a specific API key can access.
For developers building autonomous AI systems, this is not a convenience feature. It is a production requirement.
API Code Example: Simulate Before You Execute
The complete flow for an agent that checks its budget before generating:
import requests

FAIRSTACK_API = "https://api.fairstack.ai/v1"
API_KEY = "fs_your_api_key"
headers = {"Authorization": f"Bearer {API_KEY}"}


def generate_with_budget_check(model: str, prompt: str, max_cost: float = 0.10):
    """Generate only if estimated cost is within budget."""
    # Step 1: Estimate cost
    estimate = requests.post(f"{FAIRSTACK_API}/estimate", headers=headers, json={
        "model": model,
        "params": {"prompt": prompt}
    }, timeout=30).json()
    # The API reports cost in microdollars; convert to dollars
    cost = estimate["per_request"]["estimated_cost_micro"] / 1_000_000

    # Step 2: Check against local budget
    if cost > max_cost:
        return {"error": f"Estimated cost ${cost:.4f} exceeds local limit ${max_cost:.2f}"}

    # Step 3: Check against platform caps
    if estimate["caps"]["would_exceed_cap"]:
        return {"error": "Would exceed platform spending cap",
                "details": estimate["caps"]}

    # Step 4: Generate
    result = requests.post(f"{FAIRSTACK_API}/image/generate", headers=headers, json={
        "model": model,
        "prompt": prompt
    }, timeout=120).json()
    return result


# Usage
result = generate_with_budget_check("flux-schnell", "a product photo of headphones")
This pattern — estimate, check, generate — is the standard for any agent that needs cost control. FairStack provides the infrastructure; your agent provides the logic.
FAQ
Can I change caps after setting them?
Yes. Caps can be updated at any time via the API or the web dashboard. Changes take effect immediately for the next generation request.
What happens when a cap resets?
Organization and project caps reset at the start of each billing cycle (monthly). API key total caps do not reset — they are lifetime limits. To create rolling daily or weekly caps, rotate API keys programmatically or use the project monthly cap as a proxy.
Do caps apply to the web app too?
Organization and project caps apply to all generation methods: web app, REST API, and MCP server. API key caps apply only to API and MCP requests (the web app uses session auth, not API keys).
Can an agent query its remaining budget?
Yes. The /v1/estimate endpoint returns remaining budget at all three cap levels in every response. Agents can also call /v1/org/usage for a full usage summary.
What is the MCP server?
MCP (Model Context Protocol) is a standard for AI agents to interact with external tools. FairStack’s MCP server lets Claude, GPT, and other AI agents generate images, voice, video, and music with full spending cap enforcement. Read the agent documentation for setup instructions.
Try the simulator with your own cap configuration. Set your org, project, and API key limits, then run scenarios to see exactly when and how caps fire. When you are ready, create an account and configure real caps in under 5 minutes. Or read the agent documentation for the full MCP and API setup guide.