The AI Inference platform
Workers AI lets you run AI inference globally with one API call. No GPUs to manage, no capacity planning. Just intelligent machine learning models running where they're needed, on Cloudflare's global network.
Serverless pricing
Rich model catalog
Widely compatible
Scale up, and down
Unlike training, inference is spiky and hard to predict. GPU utilization averages only 20-40%, and one-third of organizations use less than 15% of their GPU capacity. Workers AI lets customers save by paying only for usage, with no guessing and no committing to hardware that goes unused.
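The savings argument is simple arithmetic. When you pay for provisioned GPU time, each useful GPU-hour effectively costs the list price divided by utilization, so 20-40% utilization means paying 2.5x to 5x the list price for the work actually done. A minimal sketch of that calculation follows. The price and hours are hypothetical placeholders, and usage-based billing is simplified to "pay only for utilized time".

```typescript
// Hypothetical illustration: provisioned GPU hours vs. paying only for use.
// All prices, hours, and utilization figures are placeholder assumptions.

interface CostInput {
  hourlyGpuPrice: number; // assumed list price per provisioned GPU-hour
  hoursProvisioned: number; // hours the GPU is reserved
  utilization: number; // fraction of provisioned time doing useful work, in (0, 1]
}

// Effective price of one *useful* GPU-hour when idle time is also billed.
function effectiveHourlyCost({ hourlyGpuPrice, utilization }: CostInput): number {
  if (utilization <= 0 || utilization > 1) {
    throw new RangeError("utilization must be in (0, 1]");
  }
  return hourlyGpuPrice / utilization;
}

// Money spent on idle capacity over the provisioning window.
function idleSpend({ hourlyGpuPrice, hoursProvisioned, utilization }: CostInput): number {
  return hourlyGpuPrice * hoursProvisioned * (1 - utilization);
}

const example: CostInput = { hourlyGpuPrice: 2, hoursProvisioned: 100, utilization: 0.25 };
console.log(effectiveHourlyCost(example)); // 8
console.log(idleSpend(example)); // 150
```

Under usage-based billing, the idle spend in this model drops to zero, which is the gap the paragraph above describes.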
AI models accessible via code, the OpenAI SDK, or the REST API
Test, prototype, and evaluate the latest LLMs with the speed and reliability of a production environment, accessible in seconds.
Kimi K2.6
Powerful vision and agentic tool calling model
GLM 4.7 Flash
Rapid multilingual agent with expert tool calling
GPT-OSS-120B
Specialized for coding and debugging
Llama 4 Scout
Balanced generalist for everyday tasks
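Because models can be reached through the OpenAI SDK, an existing OpenAI-style client can be pointed at Workers AI. The sketch below uses plain `fetch` instead of the SDK so it has no dependencies. It assumes the chat-completions route is `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1/chat/completions` and that it returns the standard OpenAI `choices[].message.content` shape; confirm both against Cloudflare's current documentation. `buildChatRequest` and `chat` are hypothetical helper names.

```typescript
// Sketch: OpenAI-style chat completion against Workers AI using plain fetch.
// Assumption: the /ai/v1/chat/completions route and response shape below
// match Cloudflare's OpenAI-compatible endpoint.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  accountId: string,
  apiToken: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      // OpenAI-style body: the model ID goes in the payload, not the URL path
      body: JSON.stringify({ model, messages }),
    },
  };
}

async function chat(
  accountId: string,
  apiToken: string,
  model: string,
  messages: ChatMessage[],
): Promise<string> {
  const { url, init } = buildChatRequest(accountId, apiToken, model, messages);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  // Assumed OpenAI chat-completions shape: choices[0].message.content
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

With the official `openai` package, the equivalent would be setting `baseURL` to the `.../ai/v1` prefix and `apiKey` to a Cloudflare API token, again assuming the same route.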
Run any AI model with one API call
Call any model directly from your code using a single endpoint. Workers AI handles provisioning, scaling, and latency optimization automatically.
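Outside a Worker, the same `/ai/run/{model}` route used by the curl examples on this page can be called from any runtime with a global `fetch`, such as Node 18+. This is a minimal sketch, assuming responses arrive in Cloudflare's usual v4 API envelope (`success`, `errors`, `result`). The contents of `result` vary by model, so the caller supplies the type. `runUrl`, `unwrap`, and `runModel` are hypothetical helper names.

```typescript
// Sketch: calling /ai/run/{model} from TypeScript outside a Worker.
// Assumption: responses use Cloudflare's v4 envelope { success, errors, result };
// the shape of `result` depends on the model being called.

interface CfEnvelope<T> {
  success: boolean;
  errors: { code?: number; message: string }[];
  result: T;
}

function runUrl(accountId: string, model: string): string {
  // Model IDs such as "@cf/moonshotai/kimi-k2.6" are appended to the path as-is
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;
}

function unwrap<T>(envelope: CfEnvelope<T>): T {
  if (!envelope.success) {
    const msg = envelope.errors.map((e) => e.message).join("; ");
    throw new Error(`Workers AI request failed: ${msg}`);
  }
  return envelope.result;
}

async function runModel<T>(
  accountId: string,
  apiToken: string,
  model: string,
  input: unknown,
): Promise<T> {
  const res = await fetch(runUrl(accountId, model), {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
    body: JSON.stringify(input),
  });
  return unwrap((await res.json()) as CfEnvelope<T>);
}
```

For a text model you might call `runModel<{ response: string }>(accountId, token, '@cf/moonshotai/kimi-k2.6', { messages })`; the `{ response: string }` result type is an assumption that depends on the model.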
Text generation in a Worker (JavaScript):

    const response = await env.AI.run('@cf/moonshotai/kimi-k2.6', {
      messages: [
        { role: 'system', content: 'You are a friendly assistant' },
        { role: 'user', content: 'What is the origin of the phrase "Hello, World"?' },
      ],
    });

Text generation over the REST API (curl):

    curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/moonshotai/kimi-k2.6 \
      -X POST \
      -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good?" }] }'

Image generation in a Worker (TypeScript):

    export interface Env {
      AI: Ai;
    }

    export default {
      async fetch(request, env): Promise<Response> {
        const response = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
          prompt: 'a bengal cat vibe coding to music',
          seed: Math.floor(Math.random() * 10),
        });
        // The model returns the image as a base64 string; decode it
        const binaryString = atob(response.image);
        // Turn each decoded character (0-255) into one byte
        const img = Uint8Array.from(binaryString, (m) => m.charCodeAt(0));
        return new Response(img, {
          headers: {
            'Content-Type': 'image/jpeg',
          },
        });
      },
    } satisfies ExportedHandler<Env>;

Image generation over the REST API (curl):

    curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/black-forest-labs/flux-1-schnell \
      -X POST \
      -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      -d '{ "prompt": "cyberpunk cat", "seed": 42 }'

Practical AI at the Edge
Run real-world AI workloads directly on Cloudflare's global network — from LLMs to image generation and embeddings. No GPU clusters, no orchestration layers — just fast, scalable inference wherever your users are.
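Embeddings are typically used for semantic search: embed a query and a set of documents, then rank the documents by cosine similarity to the query. A minimal sketch of the ranking step follows, with hand-written vectors standing in for model output. `cosineSimilarity` and `rankBySimilarity` are illustrative names, not part of any Cloudflare API.

```typescript
// Sketch: ranking documents by cosine similarity of embedding vectors.
// The vectors here are hypothetical stand-ins for an embeddings model's output.

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns documents sorted from most to least similar to the query.
function rankBySimilarity(query: number[], docs: { id: string; vector: number[] }[]) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score);
}

const ranked = rankBySimilarity([1, 0], [
  { id: "a", vector: [0, 1] },
  { id: "b", vector: [2, 0] },
]);
console.log(ranked.map((r) => r.id).join(",")); // b,a
```

Inside a Worker, the vectors might come from a call such as `env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [...] })`, which, assuming the documented output shape, returns `{ shape, data }` with one vector per input string in `data`. For large collections, a dedicated vector index would replace the linear scan shown here.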