Fireworks AI Update Detected (2026-03-27)

2026-03-27 | Fireworks AI


What changed

  • Excited to launch a multi-year partnership bringing Fireworks to Microsoft Azure Foundry!

  Pricing to seamlessly scale from idea to enterprise
  Start building in seconds, self-serve. Contact us for enterprise deployments with faster speeds, lower costs, and higher rate limits.

  • Serverless Inference: Get started in seconds with per-token pricing, zero setup, and no cold starts.
  • Fine Tuning: Customize open models with your own data with minimal setup.
  • On Demand Deployments: Pay per GPU second for faster speeds, higher rate limits, and lower costs at scale.

  Serverless Pricing
  Pay per token, with high rate limits and postpaid billing. Get started with $1 in free credits.

  Text and Vision

    Base model                                                $ / 1M tokens
    Less than 4B parameters                                   $0.10
    4B - 16B parameters                                       $0.20
    More than 16B parameters                                  $0.90
    MoE 0B - 56B parameters (e.g. Mixtral 8x7B)               $0.50
    MoE 56.1B - 176B parameters (e.g. DBRX, Mixtral 8x22B)    $1.20
    DeepSeek V3 family                                        $0.56 input, $1.68 output
    GLM-4.7                                                   $0.60 input, $2.20 output
    GLM-5                                                     $1.00 input, $0.20 cached input, $3.20 output
    Qwen3 VL 30B A3B                                          $0.15 input, $0.60 output
    Kimi K2 Instruct, Kimi K2 Thinking                        $0.60 input, $2.50 output
    Kimi K2.5                                                 $0.60 input, $0.10 cached input, $3.00 output
    Kimi K2.5 Turbo                                           $0.99 input, $0.16 cached input, $4.94 output
    OpenAI gpt-oss-120b                                       $0.15 input, $0.60 output
    OpenAI gpt-oss-20b                                        $0.07 input, $0.30 output
    MiniMax M2 family                                         $0.30 input, $0.03 cached input, $1.20 output

  • Cached input tokens are priced at 50% for all text and vision language models, unless otherwise specified above.
  • Batch inference is priced at 50% of our serverless pricing for both input and output tokens.
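The per-token rates and the two discount notes above can be combined into a rough cost estimate. The following is a minimal sketch, not an official calculator: the dictionary keys are hypothetical labels (not Fireworks model IDs), only a few rows of the table are copied in, and because the page does not say how the cached-input and batch discounts combine, the function refuses to compute that case rather than guess.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TokenPrice:
    input_per_m: float                          # $ per 1M input tokens
    output_per_m: float                         # $ per 1M output tokens
    cached_input_per_m: Optional[float] = None  # None means "50% of input" per the note above

# A few rows copied from the table; the keys are hypothetical labels.
PRICES = {
    "deepseek-v3": TokenPrice(0.56, 1.68),
    "glm-5": TokenPrice(1.00, 3.20, 0.20),
    "kimi-k2.5": TokenPrice(0.60, 3.00, 0.10),
    "gpt-oss-120b": TokenPrice(0.15, 0.60),
}

def serverless_cost(model, input_tokens, output_tokens,
                    cached_input_tokens=0, batch=False):
    """Estimate the dollar cost of one serverless workload."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    if batch and cached_input_tokens:
        # Assumption: the source does not say whether the discounts stack.
        raise ValueError("combining batch and cached discounts is not specified")
    price = PRICES[model]
    cached_rate = price.cached_input_per_m
    if cached_rate is None:
        cached_rate = 0.5 * price.input_per_m
    uncached = input_tokens - cached_input_tokens
    cost = (uncached * price.input_per_m
            + cached_input_tokens * cached_rate
            + output_tokens * price.output_per_m) / 1_000_000
    # Batch inference is listed at 50% for both input and output tokens.
    return cost * 0.5 if batch else cost
```

For example, `serverless_cost("gpt-oss-120b", 1_000_000, 0, cached_input_tokens=500_000)` charges half the input at $0.15 and half at the default cached rate of $0.075 per 1M tokens.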
  Speech to Text (STT)
  Pay per second of audio input

    Model                     $ / audio minute (billed per second)
    Whisper-v3-large          $0.0015
    Whisper-v3-large-turbo    $0.0009

  • Diarization adds a 40% surcharge to pricing.
  • Batch API prices are reduced 40%.

  Image Generation

    Image model name                               $ / step                                         Approx $ / image
    All Non-Flux Models (SDXL, Playground, etc)    $0.00013 per step ($0.0039 per 30 step image)    $0.0002 per step ($0.006 per 30 step image)
    FLUX.1 [dev]                                   $0.0005 per step ($0.014 per 28 step image)      N/A on serverless
    FLUX.1 [schnell]                               $0.00035 per step ($0.0014 per 4 step image)     N/A on serverless
    FLUX.1 Kontext Pro                             $0.04 per image                                  N/A
    FLUX.1 Kontext Max                             $0.08 per image                                  N/A

  • All models besides the Flux Kontext models are charged by the number of inference steps (denoising iterations). The Flux Kontext models are charged a flat rate per generated image.

  Embeddings

    Base model parameter count    $ / 1M input tokens
    up to 150M                    $0.008
    150M - 350M                   $0.016
    Qwen3 8B                      $0.1

  Fine Tuning Pricing
  Serve fine-tuned models for the same price as base models.

  Supervised & Preference Fine Tuning
  Priced per 1M training tokens

    Base Model                                            Supervised Fine Tuning    Direct Preference Optimization
    Models up to 16B parameters                           $0.50                     $1.00
    Models 16.1B - 80B                                    $3.00                     $6.00
    Models 80B - 300B (e.g. Qwen3-235B, gpt-oss-120B)     $6.00                     $12.00
    Models >300B (e.g. DeepSeek V3, Kimi K2)              $10.00                    $20.00

  • SFT and DPO prices are shown in $ per 1M training tokens. Training tokens can be estimated as the number of tokens in the training dataset * the number of epochs. For tuning with intermediate thinking traces, multiply the estimate by the average number of conversation turns / 2.
  • Please note that when fine-tuning with reasoning traces, including the reasoning_content field for assistant turns will increase the total number of tuned tokens because multi-turn conversations are unrolled into user, assistant, and thinking traces. For further details, please refer to example 2 in the documentation about SFT fine tuning.
  • Fine-tuning with images (VLM supervised fine-tuning) is also billed per 1M tokens. See this FAQ on calculating image tokens.

  Reinforcement Fine Tuning
  Reinforcement fine tuning jobs are priced per GPU hour (billed per second), at the same price as Fireworks on-demand deployment. Please see the section below for details on RFT pricing.

  On-Demand Pricing
  Pay per GPU second, with no extra charges for start-up times

    On demand deployments GPU Type    $ / hour (billed per second)
    A100 80 GB GPU                    $2.90
    H100 80 GB GPU                    $6.00
    H200 141 GB GPU                   $6.00
    B200 180 GB GPU                   $9.00

  • For estimates of per-token prices, see this blog. Results vary by use case, but we often observe improvements like ~250% higher throughput and 50% faster speed on Fireworks compared to open source inference engines.

  Fireworks - Pricing
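The fine-tuning and on-demand rates quoted above lend themselves to a quick budget check. This is a rough sketch under stated assumptions: the function and parameter names are hypothetical, tier upper bounds are treated as inclusive (the page lists 80B in two tiers, so that boundary is a guess), and multiplying by a GPU count is an interpretation of "pay per GPU second".

```python
import math

# (upper bound in billions of parameters, SFT $/1M tokens, DPO $/1M tokens)
# Assumption: upper bounds are inclusive; the page is ambiguous at exactly 80B.
TUNING_TIERS = [
    (16, 0.50, 1.00),
    (80, 3.00, 6.00),
    (300, 6.00, 12.00),
    (math.inf, 10.00, 20.00),
]

# $ per GPU hour, billed per second.
GPU_HOURLY = {"A100": 2.90, "H100": 6.00, "H200": 6.00, "B200": 9.00}

def estimate_training_tokens(dataset_tokens, epochs, avg_turns_with_thinking=None):
    """Dataset tokens * epochs, optionally scaled by (average turns / 2)
    when tuning with intermediate thinking traces."""
    tokens = dataset_tokens * epochs
    if avg_turns_with_thinking is not None:
        tokens *= avg_turns_with_thinking / 2
    return tokens

def tuning_cost(model_params_b, training_tokens, method="sft"):
    """Dollar cost of supervised ("sft") or preference ("dpo") fine tuning."""
    if method not in ("sft", "dpo"):
        raise ValueError("method must be 'sft' or 'dpo'")
    for upper, sft_rate, dpo_rate in TUNING_TIERS:
        if model_params_b <= upper:
            rate = sft_rate if method == "sft" else dpo_rate
            return training_tokens / 1_000_000 * rate

def on_demand_cost(gpu, seconds, gpu_count=1):
    """Dollar cost of an on-demand deployment, billed per GPU second.
    The page states RFT jobs use the same GPU rates."""
    return GPU_HOURLY[gpu] * seconds * gpu_count / 3600
```

As an illustration, a 2M-token dataset trained for 3 epochs on an 8B model gives `estimate_training_tokens(2_000_000, 3)` = 6M training tokens, which `tuning_cost(8, 6_000_000)` prices at $3.00 for SFT.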

Why it matters

See raw diff.

Evidence Snippets

Before:

Fireworks - Pricing Excited to launch a multi-year partnership bringing Fireworks to Microsoft Azure Foundry! Learn more Pricing to seamlessly scale from idea to enterprise Start building in seconds, self-serve. Contact us for enterprise de...

After:

Excited to launch a multi-year partnership bringing Fireworks to Microsoft Azure Foundry! Learn more Pricing to seamlessly scale from idea to enterprise Start building in seconds, self-serve. Contact us for enterprise deployments with faste...

Developed by Grid Logic Technical Intelligence.