Save up to 25%

Run OpenAI

Smarter. Cheaper. Faster.

25%
Lower costs
60%
Faster than pay-as-you-go
100%
Predictable pricing
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic({
  apiKey: process.env.HICAP_API_KEY,
  baseURL: 'https://api.hicap.ai/v1',
})

const response = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude!' }
  ],
  // Same API. Up to 25% less cost.
})
Getting started

Start saving in 3 steps

Switch to HiCap and cut inference costs fast—keep your existing SDK and ship in minutes.

01

Sign up and create your API key

Grab your HiCap API key from the dashboard (you'll use it as an api-key header).

export HICAP_API_KEY="YOUR-API-KEY"
02

Point your SDK to HiCap

Keep the OpenAI SDK—just change the base URL and add the api-key header.

baseURL: "https://api.hicap.ai/v2/openai",
defaultHeaders: { "api-key": process.env.HICAP_API_KEY },
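Putting the two settings together, the client setup with the official `openai` npm package looks like the sketch below (the placeholder `apiKey` reflects an assumption that HiCap authenticates via the `api-key` header rather than the SDK's own key):

```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: 'unused',                                  // placeholder; auth goes through the api-key header
  baseURL: 'https://api.hicap.ai/v2/openai',         // route requests through HiCap
  defaultHeaders: { 'api-key': process.env.HICAP_API_KEY },
})
```

Every existing `client.chat.completions.create(...)` call then works unchanged.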
03

Choose a model and run requests

Call the standard Chat Completions endpoint and set `model` to the model you want to run.

curl https://api.hicap.ai/v2/openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "api-key: $HICAP_API_KEY" \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'
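The response follows the standard Chat Completions schema, so the reply text lives at `choices[0].message.content`. A minimal sketch of reading it (the JSON body here is a hand-written sample abridged to the fields used, not a captured response):

```typescript
// Just the fields we read from a Chat Completions response.
interface ChatCompletion {
  model: string
  choices: { message: { role: string; content: string } }[]
}

// Sample response body, abridged for illustration.
const body = `{
  "model": "gpt-5",
  "choices": [{ "message": { "role": "assistant", "content": "Hello!" } }]
}`

const completion: ChatCompletion = JSON.parse(body)
console.log(completion.choices[0].message.content) // "Hello!"
```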
How it works

Reserved capacity.
Pay-as-you-go pricing.

We buy reserved GPU capacity in bulk, then let you tap into it on-demand.
You get the speed of provisioned throughput at a fraction of the cost.

Up to 25% lower costs

Access the same models at a fraction of pay-as-you-go pricing through reserved GPU capacity.

60% faster inference

Provisioned throughput means your requests skip the queue. No cold starts, no throttling.

All major models

GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash—switch between providers with one line of code.

Switch models instantly

Swap GPT-4o for Claude or Gemini by changing a single line. Test different models without rewriting code.
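Because every model sits behind the same endpoint, the "one line" is just the `model` string in the request body. A sketch of the request payload only (model names are the ones listed above; the helper is illustrative, not part of any SDK):

```typescript
// The only thing that changes between providers is the model string.
const MODELS = {
  openai: 'gpt-4o',
  anthropic: 'claude-3-5-sonnet-20241022',
  google: 'gemini-2.0-flash',
} as const

// Build a Chat Completions request body for the chosen provider.
function requestBody(provider: keyof typeof MODELS, prompt: string) {
  return {
    model: MODELS[provider], // the one line that switches providers
    messages: [{ role: 'user', content: prompt }],
  }
}

console.log(requestBody('anthropic', 'Hello').model) // "claude-3-5-sonnet-20241022"
```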

Drop-in replacement

Works with existing OpenAI, Anthropic, and Google SDKs. Just change the base URL.

Enterprise-grade reliability

99.9% uptime SLA. Your requests are load-balanced across multiple providers for redundancy.
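Load balancing happens server-side, so your client needs no changes. For teams that want an extra layer of resilience on top, a generic retry-with-backoff wrapper (purely illustrative, not part of HiCap or any SDK) is the standard pattern:

```typescript
// Retry an async call with exponential backoff; illustrative only.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseMs = 100,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn()
    } catch (err) {
      if (i + 1 >= attempts) throw err // out of attempts: surface the error
      // Wait baseMs, 2*baseMs, 4*baseMs, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** i))
    }
  }
}
```

Wrap any request with it, e.g. `withRetry(() => client.chat.completions.create(body))`.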