API Platform / Pay-Per-Call
The pure usage-based pattern: there is no recurring fee, customers pay only for what they consume, and the unit of consumption is something countable (API calls, tokens, messages, minutes, GB). The hard parts are not the pricing curve - they are ingest at scale, deduplication, real-time visibility, and reconciling what the customer thinks they used against what you billed them.
Real-world examples. OpenAI (per-token, varies by model), Twilio (per-message, per-minute, per-recipient-country), Stripe API (per-call quotas, per-charge fees), Anthropic API (per-token), Mapbox (per-tile-load), SendGrid (per-email). Common shape: a published rate card with graduated discounts above thresholds, real-time usage dashboards in the customer portal, monthly invoice for the prior month's aggregated consumption.
The shape of the problem
The deceptive simplicity of "$X per call" hides:
- Event-rate scale. A large API platform can receive millions of events per minute. The metering pipeline can't be the bottleneck - and it can't drop events under load, because dropped events become uncollected revenue.
- Idempotency at the edge. Network retries, client-side bugs, and outages all produce duplicate event submissions. Without deduplication, customers get billed more than once for a single call, for duplicate events you can prove they never originated.
- Real-time vs period totals. Customers expect to see a usage number in their dashboard within seconds of making a call - while the invoice number for the period only finalizes at month end. The same underlying data has to feed both views.
- Tier crossings during a period. Graduated discounts mean the per-unit rate changes mid-period as the customer crosses thresholds. The invoice has to walk the tiers and bill the cheaper tail correctly.
- Multi-dimensional metering. "Per call" is rarely just per call - it's per-call × per-region or per-token × per-model. The metering must split the same event stream multiple ways.
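To make the fan-out concrete, here is a minimal Python sketch that sums one in-memory event stream along two metadata dimensions. The Event shape and the split_by helper are hypothetical illustrations, not Kontorion types. By assumption, events missing the requested dimension are grouped under "unknown".

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Event:
    # Hypothetical in-memory shape of one metered event (illustrative field names).
    customer_id: str
    quantity: int
    metadata: dict = field(default_factory=dict)


def split_by(events, dimension):
    """Sum event quantities per value of one metadata dimension."""
    totals = Counter()
    for event in events:
        # Events that lack the dimension land in an "unknown" bucket (sketch assumption).
        totals[event.metadata.get(dimension, "unknown")] += event.quantity
    return dict(totals)


# The same three events, split two different ways.
events = [
    Event("cus_1", 1, {"region": "eu", "model": "gpt-4"}),
    Event("cus_1", 1, {"region": "us", "model": "gpt-4"}),
    Event("cus_1", 1, {"region": "eu", "model": "gpt-3.5"}),
]
print(split_by(events, "region"))  # {'eu': 2, 'us': 1}
print(split_by(events, "model"))   # {'gpt-4': 2, 'gpt-3.5': 1}
```

Nothing is stored twice: each view is a different grouping over the same events, which is what keeps the breakdowns consistent with each other.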
Kontorion blueprint
| Concern | Kontorion primitive |
|---|---|
| High-volume event ingest | POST /v1/transaction/usage-event with idempotency_key |
| Tier-walking pricing | STAIRCASE pricing model on the product |
| Real-time customer visibility | GET /v1/customers/{id}/usage |
| Per-region or per-model breakdown | Aggregator filters on event metadata |
| Unmatched / unknown SKU events | Keyed product with unmatched_price_key_policy |
| Customer-side retry safety | Server-side dedup on idempotency_key per aggregator window |
Build it
1. Define the metered product
2. Configure the aggregator
deduplication_enabled: true ensures that two events with the same idempotency_key within the window count as one.
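A local approximation of these semantics, useful for reasoning about retries or for testing a client, is sketched below in Python. It assumes fixed-length tumbling windows and keys each event by (window index of its timestamp, idempotency_key). The class and its parameters are hypothetical and do not describe how Kontorion implements deduplication_enabled.

```python
from datetime import datetime, timedelta, timezone


class WindowedDeduplicator:
    """Count each idempotency_key at most once per tumbling window (local sketch)."""

    def __init__(self, window: timedelta, origin: datetime):
        self.window = window  # length of each tumbling window (assumption)
        self.origin = origin  # start of window 0 (assumption)
        self.seen = set()     # (window_index, idempotency_key) pairs already counted

    def window_index(self, ts: datetime) -> int:
        # timedelta // timedelta yields an int: which window the timestamp falls in.
        return (ts - self.origin) // self.window

    def accept(self, idempotency_key: str, ts: datetime) -> bool:
        """Return True if the event should count, False if it is a duplicate."""
        slot = (self.window_index(ts), idempotency_key)
        if slot in self.seen:
            return False
        self.seen.add(slot)
        return True


origin = datetime(2024, 1, 1, tzinfo=timezone.utc)
dedup = WindowedDeduplicator(timedelta(days=1), origin)
call_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(dedup.accept("req_123", call_time))  # True: first submission counts
print(dedup.accept("req_123", call_time))  # False: retry with the same key is dropped
print(dedup.accept("req_456", call_time))  # True: different key
```

The seen set grows for as long as the process runs. A production version would evict keys belonging to windows that have already closed.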
3. Publish the graduated rate card
Tier unit_amount is a decimal string in the price's currency. Each tier rate applies only to the units inside its band (graduated semantics). A customer making 250,000 calls in the period pays: 10k × $0.10 + 90k × $0.05 + 150k × $0.03 = $1,000 + $4,500 + $4,500 = $10,000.
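The tier walk can be checked with a short function. This is a minimal Python sketch of graduated semantics. It assumes tiers are expressed as hypothetical (up_to, unit_amount) pairs, with None marking the open-ended last tier. It is not Kontorion's STAIRCASE implementation or its tier schema.

```python
from decimal import Decimal


def graduated_total(quantity, tiers):
    """Walk graduated tiers: each unit_amount applies only to units inside its band.

    tiers: ascending (up_to, unit_amount) pairs; up_to=None marks the open-ended
    last tier. unit_amount is a decimal string, as on the rate card.
    """
    total = Decimal("0")
    lower = 0
    for up_to, unit_amount in tiers:
        if quantity <= lower:
            break  # usage never reached this band
        upper = quantity if up_to is None else min(quantity, up_to)
        total += (upper - lower) * Decimal(unit_amount)
        if up_to is None:
            break
        lower = up_to
    return total


RATE_CARD = [(10_000, "0.10"), (100_000, "0.05"), (None, "0.03")]
print(graduated_total(250_000, RATE_CARD))  # 10000.00
```

Using Decimal mirrors the rate card's decimal strings and avoids float rounding drift across millions of units. A free tier is the same walk with a first tier priced at "0.00".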
4. Ingest events at runtime
Critical fields:
- idempotency_key should be your internal request ID. Submitting the same key twice within the aggregator's deduplication window is a no-op.
- timestamp should be when the API call actually occurred, not when the event was forwarded - this matters when batched event submission lags.
- metadata keys can drive aggregator filters and analytics breakdowns later.
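A sketch of assembling one ingest call with Python's standard library follows. The idempotency_key, timestamp, and metadata fields come from the list above. The base URL, the bearer-token header, and the customer_id and quantity fields are assumptions for illustration, so check the Usage Metering reference for the actual request schema.

```python
import json
import urllib.request
from datetime import datetime, timezone

BASE_URL = "https://api.example.com"  # hypothetical; use your real API host


def build_usage_event_request(api_key, customer_id, request_id, occurred_at, metadata, quantity=1):
    """Build (but do not send) a POST to /v1/transaction/usage-event."""
    body = {
        "customer_id": customer_id,     # hypothetical field name
        "quantity": quantity,           # hypothetical field name
        "idempotency_key": request_id,  # your internal request ID
        # When the API call happened, not when this event is forwarded.
        # occurred_at should be timezone-aware; it is normalized to UTC here.
        "timestamp": occurred_at.astimezone(timezone.utc).isoformat(),
        "metadata": metadata,           # can later drive aggregator filters
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/transaction/usage-event",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
        method="POST",
    )


req = build_usage_event_request(
    api_key="sk_test_placeholder",
    customer_id="cus_1",
    request_id="req_123",
    occurred_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    metadata={"region": "eu", "model": "gpt-4"},
)
print(req.get_method(), req.full_url)  # POST https://api.example.com/v1/transaction/usage-event
```

Passing the returned request to urllib.request.urlopen sends it. Because the key is derived from your request ID rather than generated per attempt, retrying after a timeout reuses the same key and is deduplicated.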
5. Show the customer their real-time usage
The as_of timestamp is within seconds of the latest ingested event.
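For a dashboard that shows how fresh the number is, the lag can be derived from as_of. The Python sketch below assumes as_of is a timezone-qualified ISO 8601 string (possibly ending in Z). Only the endpoint path and the as_of field come from this page. The helper names and the rest of the response shape are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def usage_url(base_url, customer_id):
    """Path for the real-time usage endpoint; the ID is percent-encoded."""
    return f"{base_url}/v1/customers/{quote(customer_id, safe='')}/usage"


def dashboard_lag_seconds(usage_response, now):
    """Seconds between the response's as_of timestamp and `now`.

    Assumes as_of is an ISO 8601 string; a trailing "Z" is rewritten because
    datetime.fromisoformat only accepts it from Python 3.11 onward.
    """
    as_of = datetime.fromisoformat(usage_response["as_of"].replace("Z", "+00:00"))
    return (now - as_of).total_seconds()


print(usage_url("https://api.example.com", "cus_1"))
# https://api.example.com/v1/customers/cus_1/usage
response = {"as_of": "2024-01-01T12:00:00Z"}  # only as_of is taken from this page
print(dashboard_lag_seconds(response, datetime(2024, 1, 1, 12, 0, 3, tzinfo=timezone.utc)))  # 3.0
```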
Variations
- Multi-dimensional pricing (per-call × per-model). Make the product keyed with price_key_label: "model". Add one price per model (gpt-4, gpt-4-turbo, gpt-3.5). Events carry price_key: "gpt-4". The same event stream produces per-model lines on the invoice.
- Free tier with overage. Set the first STAIRCASE tier to unit_amount: "0.00". The customer sees usage but is only billed above the free threshold.
- Per-region pricing. Add a country_code field on the price for geo-targeted rates, or use metadata.region aggregator filters to split a single event stream into per-region products.
- Spending caps. Listen for usage_event ingestion and reconcile against a self-set monthly cap; throttle or notify when the customer crosses 90%.
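The spending-cap check reduces to comparing spend to date against the cap. This Python sketch assumes a policy of notifying at 90% and throttling at 100%. Both thresholds, the CapAction enum, and check_spending_cap are hypothetical choices for illustration. Spend to date would come from your own reconciliation, for example by tier-walking the customer's current usage.

```python
from decimal import Decimal
from enum import Enum


class CapAction(Enum):
    OK = "ok"
    NOTIFY = "notify"      # crossed the warning threshold
    THROTTLE = "throttle"  # reached the cap itself


def check_spending_cap(spend_to_date, monthly_cap, notify_ratio="0.9"):
    """Compare spend to date with a customer-set monthly cap (hypothetical policy)."""
    spend = Decimal(spend_to_date)
    cap = Decimal(monthly_cap)
    if spend >= cap:
        return CapAction.THROTTLE
    if spend >= cap * Decimal(notify_ratio):
        return CapAction.NOTIFY
    return CapAction.OK


for spend in ("80.00", "90.00", "100.00"):
    print(spend, check_spending_cap(spend, "100.00").value)
# 80.00 ok
# 90.00 notify
# 100.00 throttle
```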
What you don't have to build
- High-throughput event ingest pipeline (the /v1/transaction/usage-event endpoint absorbs the volume)
- Idempotency / deduplication on retried submissions (built into the aggregator)
- Real-time aggregation between events and the dashboard query (eventually consistent, within seconds)
- Tier-walking math for graduated pricing (STAIRCASE tiers handle it deterministically)
- Per-region or per-model fan-out (filtered aggregators or keyed products do it without app-layer code)
- Reconciliation between dashboard usage and invoice quantity (same source of truth on both sides)
Next steps
- Usage Metering - aggregators, filters, deduplication
- Pricing Models - STAIRCASE walks vs VOLUME vs PACKAGE
- Keyed Prices - one product, many priced dimensions
- Analytics - per-customer, per-product usage trends