Google updates Gemini 2.5 Flash models to deliver faster responses and improved performance

Summary

Google has released new preview versions of its lightweight Gemini 2.5 Flash and Flash Lite models. Both are still in the experimental phase but now offer faster response times, handle multimedia more efficiently, and can tackle more complex tasks.

According to the Artificial Analysis Index, the updated models deliver better benchmark performance with higher response speeds, thanks to lower token usage. Pricing remains the same, but reduced token consumption means deploying the models is now less expensive.

The new Gemini 2.5 Flash models process tasks faster and use less compute. | Image: Google

According to Google, the latest Gemini 2.5 Flash Lite is better at following complex instructions and system prompts. It now produces shorter, more accurate answers, which helps lower token costs and reduce latency. Google also claims improvements in multimodal tasks, including audio transcription, image analysis, and translation quality.

Google says the larger Gemini 2.5 Flash model has been updated to use external tools more effectively in complex, multi-step tasks. The company reports an improvement on the SWE-Bench Verified benchmark from 48.9 percent to 54 percent; this benchmark measures how well AI models solve real-world software development problems.


Efficiency has improved for both models: with the "Thinking" feature enabled, they deliver higher quality output while using fewer tokens, which speeds up responses and lowers costs.
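For developers who want to control how much the models "think", a minimal sketch using the google-genai Python SDK is shown below; the thinking_budget field and the usage_metadata attribute names are assumptions based on the public SDK and should be checked against the current documentation.

```python
# Sketch: enable "thinking" and inspect how many thinking tokens were used.
# Assumes the google-genai package and a GEMINI_API_KEY environment variable.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Plan a three-step migration from SQLite to PostgreSQL.",
    config=types.GenerateContentConfig(
        # Cap the number of tokens the model may spend on internal reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)

print(response.text)
print("Thinking tokens used:", response.usage_metadata.thoughts_token_count)
```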

The new Flash models consume fewer thinking tokens. | Image: Google

How to use Gemini 2.5 Flash and Flash Lite

The new models are now available in Google AI Studio and Vertex AI. To try them, use the model ID gemini-2.5-flash-lite-preview-09-2025 for Flash Lite or gemini-2.5-flash-preview-09-2025 for Flash.
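A minimal call with one of the preview model IDs might look like the sketch below, again using the google-genai Python SDK; the client setup is an assumption about your environment.

```python
# Sketch: call one of the new preview models through the Gemini API.
# Assumes the google-genai package and a GEMINI_API_KEY environment variable.
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite-preview-09-2025",  # or "gemini-2.5-flash-preview-09-2025"
    contents="Summarize the following text in two sentences: ...",
)
print(response.text)
```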

Google is also rolling out an alias system for easier access to the latest versions. The aliases gemini-flash-latest and gemini-flash-lite-latest always point to the newest models, so you don’t have to update model names manually each time.

However, Google notes that pricing, features, and rate limits for these aliases may change as new versions are released. If you need stable conditions, stick with the fixed model names: gemini-2.5-flash and gemini-2.5-flash-lite. Google will announce any changes to the “-latest” aliases at least two weeks in advance.
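In practice, the choice comes down to a single string in your request: use an alias if you always want the newest release, or a pinned name if you need stable behavior. A hypothetical configuration sketch:

```python
# Sketch: choose between tracking the newest release and pinning a stable model.
from google import genai

TRACK_LATEST = False  # flip to True to follow the "-latest" alias

model_name = "gemini-flash-lite-latest" if TRACK_LATEST else "gemini-2.5-flash-lite"

client = genai.Client()  # assumes GEMINI_API_KEY is set
response = client.models.generate_content(model=model_name, contents="Hello")
print(response.text)
```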

Model | Input tokens (text/image/video) | Input tokens (audio) | Output tokens (including "thinking tokens")
Gemini 2.5 Flash | $0.30 per 1 million | $1.00 per 1 million | $2.50 per 1 million
Gemini 2.5 Flash Lite | $0.10 per 1 million | $0.30 per 1 million | $0.40 per 1 million

Batch API requests are billed at 50 percent of the standard interactive price. Full pricing details are available in the official Gemini API price list.
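As a rough back-of-the-envelope check, the table above can be turned into a small cost estimator; the numbers come straight from the pricing table, and the 0.5 factor reflects the 50 percent Batch API discount.

```python
# Rough cost estimate in USD, based on the per-1-million-token prices above.
PRICES = {
    "gemini-2.5-flash":      {"input_text": 0.30, "input_audio": 1.00, "output": 2.50},
    "gemini-2.5-flash-lite": {"input_text": 0.10, "input_audio": 0.30, "output": 0.40},
}

def estimate_cost(model, input_tokens, output_tokens, batch=False, audio=False):
    p = PRICES[model]
    rate_in = p["input_audio"] if audio else p["input_text"]
    cost = (input_tokens * rate_in + output_tokens * p["output"]) / 1_000_000
    return cost * 0.5 if batch else cost  # Batch API is billed at half price

# Example: 2M text input tokens and 500k output tokens on Flash Lite via the Batch API
print(f"${estimate_cost('gemini-2.5-flash-lite', 2_000_000, 500_000, batch=True):.2f}")
```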
