DeepInfra Update Detected (2026-03-27)

2026-03-27 DeepInfra


What changed

  • Simple Pricing | Machine Learning Infrastructure | Deep Infra

    NVIDIA Nemotron 3 Super - blazing-fast agentic AI, ready to deploy today!

    Simple Pricing, Deep Infrastructure

    We have different pricing models depending on the model used. Some of our language models offer per-token pricing. Most other models are billed for inference execution time. With this pricing model, you only pay for what you use. There are no long-term contracts or upfront costs, and you can easily scale up and down as your business needs change.

    DeepSeek

    DeepSeek's models are a suite of advanced AI systems that prioritize efficiency, scalability, and real-world applicability.

    Model                          Context  $ per 1M input tokens  $ per 1M output tokens
    DeepSeek-V3.2                  160k     $0.26 / $0.13 cached   $0.38
    DeepSeek-OCR                   8k       $0.03                  $0.10
    DeepSeek-V3.1-Terminus         160k     $0.21 / $0.13 cached   $0.79
    DeepSeek-V3.1                  160k     $0.21 / $0.13 cached   $0.79
    DeepSeek-V3-0324               160k     $0.20 / $0.135 cached  $0.77
    DeepSeek-V3                    160k     $0.32                  $0.89
    DeepSeek-R1-0528               160k     $0.50 / $0.35 cached   $2.15
    DeepSeek-R1-0528-Turbo         32k      $1.00                  $3.00
    DeepSeek-R1-Distill-Llama-70B  128k     $0.70                  $0.80

    Qwen

    Qwen series offers a comprehensive suite of dense and mixture-of-experts models.

    Model                                 Context  $ per 1M input tokens  $ per 1M output tokens
    Qwen3-Max-Thinking                    250k     $1.20 / $0.24 cached   $6.00
    Qwen3-Max                             250k     $1.20 / $0.24 cached   $6.00
    Qwen3-Next-80B-A3B-Instruct           256k     $0.09                  $1.10
    Qwen3-Coder-480B-A35B-Instruct-Turbo  256k     $0.22 / $0.022 cached  $1.00
    Qwen3-Coder-480B-A35B-Instruct        256k     $0.40                  $1.60
    Qwen3-235B-A22B-Thinking-2507         256k     $0.23 / $0.20 cached   $2.30
    Qwen3-235B-A22B-Instruct-2507         256k     $0.071                 $0.10
    Qwen3-32B                             40k      $0.08                  $0.28
    Qwen3-30B-A3B                         40k      $0.08                  $0.28
    Qwen3-14B                             40k      $0.12                  $0.24
    Qwen2.5-72B-Instruct                  32k      $0.12                  $0.39

    Llama 4

    The Llama 4 collection consists of natively multimodal AI models that enable text and multimodal experiences.

    Model                      Context  $ per 1M input tokens  $ per 1M output tokens
    Llama-4-Scout-17B-16E      320k     $0.08                  $0.30
    Llama-4-Maverick-17B-128E  1024k    $0.15                  $0.60
    Llama-Guard-4-12B          160k     $0.18                  $0.18

    Llama 3

    Meta Llama 3 is a collection of pretrained and instruction-tuned generative text models in 8B, 70B and 405B sizes.

    Model                              Context  $ per 1M input tokens  $ per 1M output tokens
    Llama-3.3-70B-Instruct-Turbo       128k     $0.10                  $0.32
    Llama-3.2-11B-Vision-Instruct      128k     $0.049                 $0.049
    Meta-Llama-3.1-70B-Instruct        128k     $0.40                  $0.40
    Meta-Llama-3.1-70B-Instruct-Turbo  128k     $0.40                  $0.40
    Meta-Llama-3.1-8B-Instruct         128k     $0.02                  $0.05
    Meta-Llama-3.1-8B-Instruct-Turbo   128k     $0.02                  $0.03
    Meta-Llama-3-8B-Instruct           8k       $0.03                  $0.04

    Gemini

    Developed by Google DeepMind, Gemini is a family of state-of-the-art thinking models with native multimodal capabilities.

    Model             Context  $ per 1M input tokens  $ per 1M output tokens
    gemini-2.5-pro    976k     $1.25                  $10.00
    gemini-2.5-flash  976k     $0.30                  $2.50

    Gemma

    Gemma is a family of lightweight, state-of-the-art open models from Google.

    Model           Context  $ per 1M input tokens  $ per 1M output tokens
    gemma-3-27b-it  128k     $0.08                  $0.16
    gemma-3-12b-it  128k     $0.04                  $0.13
    gemma-3-4b-it   128k     $0.04                  $0.08

    Nemotron

    NVIDIA Nemotron is a family of open models customized for efficiency, accuracy, and specialized workloads.

    Model                              Context  $ per 1M input tokens  $ per 1M output tokens
    NVIDIA-Nemotron-3-Super-120B-A12B  256k     $0.10 / $0.10 cached   $0.50
    Nemotron-3-Nano-30B-A3B            256k     $0.05                  $0.20
    NVIDIA-Nemotron-Nano-12B-v2-VL     128k     $0.20                  $0.60
    Llama-3.1-Nemotron-70B-Instruct    128k     $1.20                  $1.20
    Llama-3.3-Nemotron-Super-49B-v1.5  128k     $0.10                  $0.40
    NVIDIA-Nemotron-Nano-9B-v2         128k     $0.04                  $0.16

    Claude

    Developed by Anthropic, Claude is a family of highly performant, trustworthy AI models built for complex reasoning, advanced coding, and nuanced language understanding.

    Model                     Context  $ per 1M input tokens  $ per 1M output tokens
    claude-4-opus             195k     $16.50                 $82.50
    claude-4-sonnet           195k     $3.30                  $16.50
    claude-3-7-sonnet-latest  195k     $3.30 / $0.33 cached   $16.50

    Phi

    Phi models offer cost-effective, high-performance AI solutions.

    Model  Context  $ per 1M input tokens  $ per 1M output tokens
    phi-4  16k      $0.07                  $0.14

    Mistral

    Developed by Mistral AI, a leading French research lab, Mistral is a family of open-source AI models built for multilingual excellence, advanced reasoning, and cost-effective performance.

    Model                                Context  $ per 1M input tokens  $ per 1M output tokens
    Mistral-Small-3.2-24B-Instruct-2506  125k     $0.075                 $0.20
    Mistral-Small-24B-Instruct-2501      32k      $0.05                  $0.08
    Mistral-Nemo-Instruct-2407           128k     $0.02                  $0.04
    Mixtral-8x7B-Instruct-v0.1           32k      $0.54                  $0.54

    Voxtral

    Voxtral is a family of audio models with state-of-the-art speech-to-text capabilities.

    Model                   $ per minute of audio input
    Voxtral-Small-24B-2507  $0.00300
    Voxtral-Mini-3B-2507    $0.00100

    Mixture of experts

    Mixture-of-experts models split computation across multiple expert subnetworks, providing strong performance.

    Model                       Context  $ per 1M input tokens  $ per 1M output tokens
    Mixtral-8x7B-Instruct-v0.1  32k      $0.54                  $0.54

    Less than 10 billion parameters

    Our fastest and best-value models, though they may be less precise.

    Model                       Context  $ per 1M input tokens  $ per 1M output tokens
    Meta-Llama-3-8B-Instruct    8k       $0.03                  $0.04
    Meta-Llama-3.1-8B-Instruct  128k     $0.02                  $0.05
    gemma-3-4b-it               128k     $0.04                  $0.08

    Between 10 and 70 billion parameters

    Models that are fine-tuned for a balance between speed and precision.

    Model            Context  $ per 1M input tokens  $ per 1M output tokens
    MythoMax-L2-13b  4k       $0.40                  $0.40
    gemma-3-27b-it   128k     $0.08                  $0.16
    gemma-3-12b-it   128k     $0.04                  $0.13

    70 billion parameters and up

    Our most capable models, able to handle complex tasks, but also our most expensive and potentially slower to respond.

    Model                        Context  $ per 1M input tokens  $ per 1M output tokens
    Meta-Llama-3.1-70B-Instruct  128k     $0.40                  $0.40

    Flux

    Developed by Black Forest Labs, Flux is a family of state-of-the-art image generation and editing models that deliver exceptional visual quality with breakthrough prompt accuracy and photorealism.

    Model               $ per image
    FLUX-2-dev          $0.01 x (w / 1024) x (h / 1024) x (iters / 28)
    FLUX.1-Kontext-dev  $0.01 x (w / 1024) x (h / 1024) x (iters / 25)
    FLUX-2-klein-9b     $0.015 x (w / 1024) x (h / 1024)
    FLUX-2-klein-4b     $0.014 x (w / 1024) x (h / 1024)
    FLUX-2-max          $0.07
    FLUX-2-pro          $0.015
    FLUX-1-Redux-dev    $0.012 x (w / 1024) x (h / 1024) x (iters / 25)
    FLUX-1-dev          $0.009 x (w / 1024) x (h / 1024) x (iters / 25)
    FLUX-1-schnell      $0.0005 x (w / 1024) x (h / 1024) x iters
    FLUX-pro            $0.05
    FLUX-1.1-pro        $0.04

    Custom LLMs

    You can deploy your own model on our hardware and pay for uptime. You get dedicated SXM-connected GPUs (for multi-GPU setups), automatic scaling to handle load fluctuations and a very competitive price.

    • Dedicated A100, H100, H200, B200 and B300 GPUs for your custom LLM needs
    • Billed in minute granularity
    • Invoiced weekly

    GPU   Memory  Price
    A100  80GB    $0.89 / GPU-hour
    H100  80GB    $1.79 / GPU-hour
    H200  141GB   $2.19 / GPU-hour
    B200  180GB   $2.79 / GPU-hour
    B300  270GB   $4.20 / GPU-hour

    Dedicated Instances and Clusters

    For dedicated instances, DGX H100, B200 and B300 clusters with 3.2Tbps bandwidth, please contact us at dedicated@deepinfra.com

    Embeddings Pricing

    Model                           Context  $ per 1M input tokens
    bge-base-en-v1.5                512      $0.005
    bge-en-icl                      8k       $0.01
    bge-large-en-v1.5               512      $0.01
    bge-m3                          8k       $0.01
    bge-m3-multi                    8k       $0.01
    gte-base                        512      $0.005
    gte-large                       512      $0.01
    e5-base-v2                      512      $0.005
    e5-large-v2                     512      $0.01
    multilingual-e5-large           512      $0.01
    multilingual-e5-large-instruct  512      $0.01
    all-MiniLM-L12-v2               512      $0.005
    all-MiniLM-L6-v2                512      $0.005
    all-mpnet-base-v2               512      $0.005
    multi-qa-mpnet-base-dot-v1      512      $0.005
    paraphrase-MiniLM-L6-v2         512      $0.005
    text2vec-base-chinese           512      $0.005

    Hardware

    All models run on H100 or A100 GPUs, optimized for inference performance and low latency.

    Auto Scaling

    Our system will automatically scale the model to more hardware based on your needs. We limit each account to 200 concurrent requests. If you want more, drop us a line.

    Billing

    You have to add a card or pre-pay or you won't be able to use our services. An invoice is always generated at the beginning of the month, and also throughout the month if you hit your tier invoicing threshold. You can also set a spending limit to avoid surprises.

    Usage Tiers

    Every user is part of a usage tier. As your usage and your spending go up, we automatically move you to the next usage tier. Every tier has an invoicing threshold. Once it is reached, an invoice is automatically generated.

    Tier    Qualification  Invoicing Threshold
    Tier 1  -              $20
    Tier 2  $100 paid      $100
    Tier 3  $500 paid      $500
    Tier 4  $2,000 paid    $2,000
    Tier 5  $10,000 paid   $10,000
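
The dedicated-GPU rates and usage tiers listed above lend themselves to a quick estimate. The following is a minimal Python sketch, not DeepInfra's billing logic: the function names are hypothetical, the rates and thresholds are copied from the tables in this snapshot, and it assumes that uptime is passed in as whole minutes and that paying exactly a qualification amount is enough to reach that tier (the page states neither).

```python
from bisect import bisect_right

# USD per GPU-hour, copied from the dedicated GPU table in this snapshot.
GPU_HOURLY_USD = {"A100": 0.89, "H100": 1.79, "H200": 2.19, "B200": 2.79, "B300": 4.20}

# Cumulative "paid" amounts (USD) that qualify an account for Tiers 2 to 5.
TIER_QUALIFICATION_USD = [100, 500, 2_000, 10_000]

# Invoicing threshold (USD) for each tier, Tier 1 first.
TIER_INVOICE_THRESHOLD_USD = [20, 100, 500, 2_000, 10_000]


def dedicated_gpu_cost(gpu: str, gpu_count: int, minutes: int) -> float:
    """Estimate uptime cost in USD for gpu_count GPUs running for whole minutes."""
    if gpu_count < 0 or minutes < 0:
        raise ValueError("gpu_count and minutes must be non-negative")
    return GPU_HOURLY_USD[gpu] * gpu_count * minutes / 60


def usage_tier(total_paid_usd: float) -> int:
    """Return the tier number (1 to 5) for a cumulative paid amount.

    Assumption: paying exactly the qualification amount qualifies.
    """
    return 1 + bisect_right(TIER_QUALIFICATION_USD, total_paid_usd)


def invoice_threshold(total_paid_usd: float) -> int:
    """Return the invoicing threshold (USD) of the tier the account is in."""
    return TIER_INVOICE_THRESHOLD_USD[usage_tier(total_paid_usd) - 1]
```

For example, `dedicated_gpu_cost("H100", 8, 90)` estimates eight H100s running for 90 minutes, and `usage_tier(650)` places an account that has paid $650 in Tier 3 under the assumptions above.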

    Latest Models

    • deepseek-ai / DeepSeek-V3.1
    • zai-org / GLM-4.6
    • moonshotai / Kimi-K2-Instruct-0905
    • anthropic / claude-3-7-sonnet-latest
    • deepseek-ai / DeepSeek-V3.2-Exp

    Featured Models

    • Qwen / Qwen3-Max
    • stepfun-ai / Step-3.5-Flash
    • black-forest-labs / FLUX-2-klein-9b
    • Qwen / Qwen3.5-9B
    • ResembleAI / chatterbox-turbo
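
The per-token and per-image prices in this snapshot can be turned into rough cost estimates. The sketch below is illustrative only: both function names are hypothetical, and it assumes that cached tokens are a subset of the input tokens and are billed at the "cached" rate instead of the standard input rate, which the page does not spell out. The image function mirrors the listed pattern `base x (w / 1024) x (h / 1024) x (iters / reference)`.

```python
from typing import Optional

TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted per 1M tokens


def token_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    cached_tokens: int = 0,
    cached_price: Optional[float] = None,
) -> float:
    """Estimate the USD cost of one request from per-1M-token prices.

    Assumption: cached_tokens is the part of input_tokens billed at
    cached_price; without a cached price, all input uses input_price.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if cached_price is None:
        cached_price = input_price
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * input_price
        + cached_tokens * cached_price
        + output_tokens * output_price
    )
    return total / TOKENS_PER_PRICE_UNIT


def flux_image_cost(
    base_price: float,
    width: int,
    height: int,
    iters: int = 1,
    reference_iters: int = 1,
) -> float:
    """Estimate the USD cost of one image for the size- and step-scaled models.

    Leave iters and reference_iters at 1 for models priced by size only.
    """
    return base_price * (width / 1024) * (height / 1024) * (iters / reference_iters)
```

As a usage example, `token_cost(1_000_000, 200_000, 0.26, 0.38, cached_tokens=400_000, cached_price=0.13)` applies the DeepSeek-V3.2 row, and `flux_image_cost(0.01, 512, 512, iters=28, reference_iters=28)` applies the FLUX-2-dev formula to a 512 x 512 image. Flat-priced models such as FLUX-2-max need no calculation.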

Why it matters

Consistency in pricing and model information ensures accurate comparisons and helps users make informed decisions.

Evidence Snippets

Before:

Simple Pricing | Machine Learning Infrastructure | Deep Infra We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic… Accept Reject NVI...

After:

Simple Pricing | Machine Learning Infrastructure | Deep Infra We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic… Accept Reject NVI...

Developed by Grid Logic Technical Intelligence.