The AI Inference platform

Workers AI lets you run AI inference globally with one API call. No GPUs to manage, no capacity planning. Just intelligent machine learning models running where they're needed, on Cloudflare's global network.

Serverless pricing

Pay-per-inference pricing with no idle costs and no guessing about capacity.

Rich model catalog

50+ models running close to users in 200+ cities

Widely compatible

One API call that works with any OpenAI SDK and any task type

Scale up and down

Unlike training, inference is spiky and hard to predict. Average GPU utilization is only 20-40%, and one-third of organizations use less than 15% of their GPU capacity. Workers AI lets customers save by paying only for usage, with no guessing and no commitment to hardware that goes unused.

[Figure: what you pay for on a hyperscaler vs. what you pay for on Cloudflare]
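
To make the utilization figures above concrete, here is a rough sketch of the arithmetic behind paying for reserved hardware that sits partly idle. The hourly rate is a made-up number for illustration only, not a real price, and `effectiveCostPerBusyHour` is a hypothetical helper name. The model is deliberately simple: it assumes a GPU billed around the clock and asks what each hour of useful work costs.

```typescript
// Hypothetical rate for illustration only; not a real price from any provider.
const HOURLY_GPU_RATE_USD = 2.0;

/**
 * Cost of one hour of useful GPU work when the GPU is paid for around
 * the clock but is busy only `utilization` of the time (0 < utilization <= 1).
 */
function effectiveCostPerBusyHour(hourlyRate: number, utilization: number): number {
  if (utilization <= 0 || utilization > 1) {
    throw new RangeError("utilization must be in the range (0, 1]");
  }
  return hourlyRate / utilization;
}

// Utilization levels quoted in the text: 15%, 20%, and 40%.
for (const utilization of [0.15, 0.2, 0.4]) {
  const cost = effectiveCostPerBusyHour(HOURLY_GPU_RATE_USD, utilization);
  const percent = Math.round(utilization * 100);
  console.log(`${percent}% utilized: $${cost.toFixed(2)} per busy hour`);
}
```

Under these assumptions, a GPU that is busy 20% of the time costs five times its list rate per hour of useful work, which is the gap that usage-based pricing is meant to close.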

AI models easily accessible via code, OpenAI SDK or API

Test, prototype, and evaluate the latest LLMs with the speed and reliability of a production environment, accessible in seconds.

Kimi K2.6

Powerful vision and agentic tool calling model

GLM 4.7 Flash

Rapid multilingual agent with expert tool calling

GPT-OSS-120B

Specialized for coding and debugging

Llama 4 Scout

Balanced generalist for everyday tasks
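
As a minimal sketch of the OpenAI-style access mentioned above, the helper below builds a chat completion request by hand. The `/ai/v1/chat/completions` path is an assumption based on Cloudflare's public documentation for its OpenAI-compatible endpoints and should be verified for your account. `buildChatRequest` and the placeholder account ID and token are hypothetical names introduced for illustration.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build (but do not send) an OpenAI-style chat completion request.
// Assumption: the OpenAI-compatible base URL is .../accounts/{id}/ai/v1.
function buildChatRequest(
  accountId: string,
  apiToken: string,
  model: string,
  messages: ChatMessage[],
) {
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1/chat/completions`;
  return {
    url,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      // OpenAI-style body: the model is named in the payload, not in the URL.
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Example with placeholder credentials; nothing is sent over the network here.
const chatRequest = buildChatRequest("my-account-id", "my-api-token", "@cf/moonshotai/kimi-k2.6", [
  { role: "user", content: "Summarize what serverless inference means." },
]);
console.log(chatRequest.url);
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`. An OpenAI SDK would typically be pointed at the same base URL through its base-URL setting instead of building requests manually.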

Run any AI model with one API call

Call any model directly from your code using a single endpoint. Workers AI handles provisioning, scaling, and latency optimization automatically.

const response = await env.AI.run('@cf/moonshotai/kimi-k2.6', {
  messages: [
    { role: 'system', content: 'You are a friendly assistant' },
    { role: 'user', content: 'What is the origin of the phrase Hello, World' },
  ],
});

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/moonshotai/kimi-k2.6 \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const response = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
      prompt: 'a bengal cat vibe coding to music',
      seed: Math.floor(Math.random() * 10),
    });
    // The image comes back as a base64 string; decode it to a binary string
    const binaryString = atob(response.image);
    // Convert each character (one byte) to its numeric value
    const img = Uint8Array.from(binaryString, (m) => m.charCodeAt(0));
    return new Response(img, {
      headers: {
        'Content-Type': 'image/jpeg',
      },
    });
  },
} satisfies ExportedHandler<Env>;

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/black-forest-labs/flux-1-schnell \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -d '{ "prompt": "cyberpunk cat", "seed": 42 }'
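
The curl commands above differ only in the model name at the end of the URL, which is what "a single endpoint" amounts to in practice. The sketch below captures that pattern. The `ApiEnvelope` type is an assumption about the shape of the REST response wrapper (a `success` flag, a `result`, and a list of `errors`), and `buildRunUrl` and `unwrapResult` are hypothetical helper names, not part of any Cloudflare SDK.

```typescript
// Assumed shape of the REST response wrapper; verify against the API reference.
interface ApiEnvelope<T> {
  success: boolean;
  result: T | null;
  errors: { code: number; message: string }[];
}

// The same URL template serves every model; only the model ID changes.
function buildRunUrl(accountId: string, model: string): string {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;
}

// Return the model output, or throw with the reported errors.
function unwrapResult<T>(envelope: ApiEnvelope<T>): T {
  if (!envelope.success || envelope.result === null) {
    const reason = envelope.errors.map((e) => `${e.code}: ${e.message}`).join("; ");
    throw new Error(`Workers AI request failed: ${reason || "unknown error"}`);
  }
  return envelope.result;
}

// Two different task types, one URL pattern (placeholder account ID).
console.log(buildRunUrl("my-account-id", "@cf/moonshotai/kimi-k2.6"));
console.log(buildRunUrl("my-account-id", "@cf/black-forest-labs/flux-1-schnell"));
```

Keeping URL construction and envelope handling in small pure functions makes them easy to check without network access. The actual `fetch` call can then be a thin wrapper around them.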

Practical AI at the Edge

Run real-world AI workloads directly on Cloudflare's global network — from LLMs to image generation and embeddings. No GPU clusters, no orchestration layers — just fast, scalable inference wherever your users are.
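
For the embeddings workload mentioned above, a typical next step is comparing the returned vectors. The sketch below assumes each text has already been turned into a numeric vector by an embedding model; this page does not specify a model ID or response shape, so the vectors here are toy values. `cosineSimilarity`, `rankBySimilarity`, and `EmbeddedDoc` are hypothetical names for illustration.

```typescript
interface EmbeddedDoc {
  text: string;
  vector: number[];
}

// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return a new array of documents, most similar to the query first.
function rankBySimilarity(query: number[], docs: EmbeddedDoc[]): EmbeddedDoc[] {
  return [...docs].sort(
    (x, y) => cosineSimilarity(query, y.vector) - cosineSimilarity(query, x.vector),
  );
}

// Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
const ranked = rankBySimilarity([1, 0, 0], [
  { text: "unrelated", vector: [0, 1, 0] },
  { text: "close match", vector: [0.9, 0.1, 0] },
]);
console.log(ranked.map((doc) => doc.text));
```

In a real application the vectors would come from an embedding model in the catalog, and the ranking would usually be delegated to a vector index once the document set grows.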
