Smarter. Cheaper. Faster.
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic({
  apiKey: process.env.HICAP_API_KEY,
  baseURL: 'https://api.hicap.ai/v1',
})

const response = await client.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello, Claude!' }
  ],
  // Same API. 25% less cost.
})

Switch to HiCap and cut inference costs fast: keep your existing SDK and ship in minutes.
Grab your HiCap API key from the dashboard; you'll send it as the api-key header on every request.
export HICAP_API_KEY="YOUR-API-KEY"

Keep the OpenAI SDK; just change the base URL and add the api-key header.
import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.HICAP_API_KEY, // the SDK requires a key; HiCap reads the api-key header below
  baseURL: "https://api.hicap.ai/v2/openai",
  defaultHeaders: { "api-key": process.env.HICAP_API_KEY },
})

Call the standard Chat Completions endpoint and set model to whatever you want to run.
curl https://api.hicap.ai/v2/openai/chat/completions \
  -H "Content-Type: application/json" \
  -H "api-key: $HICAP_API_KEY" \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

We buy reserved GPU capacity in bulk, then let you tap into it on demand. You get the speed of provisioned throughput at a fraction of the cost.
Access the same models at a fraction of pay-as-you-go pricing through reserved GPU capacity.
Provisioned throughput means your requests skip the queue. No cold starts, no throttling.
GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash: switch between providers with one line of code and test different models without rewriting anything. See the sketch below.
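Here's a minimal sketch of what that looks like, assuming HiCap routes OpenAI-compatible requests to the right provider based on the model name (the model IDs below are illustrative):

import OpenAI from "openai"

// One client for every provider; HiCap routes each request by model name (assumed behavior).
const client = new OpenAI({
  apiKey: process.env.HICAP_API_KEY,
  baseURL: "https://api.hicap.ai/v2/openai",
  defaultHeaders: { "api-key": process.env.HICAP_API_KEY },
})

for (const model of ["gpt-4o", "claude-3-5-sonnet-20241022", "gemini-2.0-flash"]) {
  const res = await client.chat.completions.create({
    model, // the only line that changes between providers
    messages: [{ role: "user", content: "Hello" }],
  })
  console.log(model, res.choices[0].message.content)
}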
Works with existing OpenAI, Anthropic, and Google SDKs. Just change the base URL.
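The Anthropic and OpenAI setups are shown above; a sketch of the same pattern with the Google GenAI SDK follows, assuming HiCap exposes a Gemini-compatible endpoint (the base path below is illustrative, not documented):

import { GoogleGenAI } from "@google/genai"

// Sketch only: the base path here is a hypothetical example, not a confirmed HiCap route.
const ai = new GoogleGenAI({
  apiKey: process.env.HICAP_API_KEY,
  httpOptions: { baseUrl: "https://api.hicap.ai/v2/google" },
})

const res = await ai.models.generateContent({
  model: "gemini-2.0-flash",
  contents: "Hello",
})
console.log(res.text)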
99.9% uptime SLA. Your requests are load-balanced across multiple providers for redundancy.