Gemini 3.5 Pro 六月開閘:200 萬 token 與 Deep Think,Google 把「最強」留到最後一手

I/O 2026 上 Google 先放出 3.5 Flash,把旗艦級 Pro 押到六月。如今 3.5 Pro 帶著 200 萬 token 上下文與 Deep Think 推理進入 Vertex AI 企業預覽,定價卻是 Flash 的近十倍。這場「先免費、後收費」的節奏,台灣的開發者與企業該怎麼接?

Gemini 3.5 Pro Opens in June: 2 Million Tokens and Deep Think, Google Saves the Best for Last

On May 19th, I was in the Taipei office watching the live broadcast of Google I/O 2026, waiting for the rumored Gemini 3.5 Pro. However, Google only introduced the 3.5 Flash, and the engineering manager on the main stage briefly mentioned the Pro: "We are also working hard on 3.5 Pro, which is already being used internally, and we expect to release it next month." This "next month" has now come true - in June, Gemini 3.5 Pro has arrived with a number that will take many people's breath away: a 2 million token context window.

I'm breaking down this event because Google's release pace itself is news. First, they released the free Flash to over a billion people worldwide, and then they released the Pro, which requires hardware and charges a fee, a month later, in two separate phases. This is not a technical issue, but a business rhythm - and if Taiwanese teams follow their old habits and "wait for the strongest one to come out," they may misjudge the timeline.

Event Background

Let's clarify the timeline. The Gemini 3.5 series officially debuted on May 19th at Google I/O 2026, with the first release being the 3.5 Flash, which went live globally on the same day. Free users can access it through the Gemini App and Google's AI mode, while developers can use the Gemini API, Google AI Studio, and Antigravity. The title of Google's official blog post that day was "frontier intelligence with action" - emphasizing that this generation of models not only answers questions but also "takes action," highlighting agentic capabilities and long-process tasks.

As for the flagship 3.5 Pro, Google stated in the release announcement that it would be released "next month." As of June 19th, the 3.5 Pro is still in the limited preview phase for Vertex AI enterprise customers and has not yet been released to general consumers through the Gemini App, Google AI Studio, or personal subscription plans. In other words, what you're using on your mobile app now is likely still the Flash, not the Pro.

This gap is important. Many articles in the market equate "I/O announcement" with "available to everyone," but in reality, the Pro is currently only open to existing Vertex AI enterprise accounts, and there is no public self-sign-up entrance - you need to contact Google Cloud's customer manager. During my verification process, I saw some sites comparing it to rumors about certain competing products, but I won't adopt those unconfirmed parts - I'll only discuss verifiable facts.

Key Points This Time

  • 2 million token context window. This is twice that of the Flash (1 million) and one of the largest contexts among publicly announced production-level frontier models. Converted to Chinese, it's roughly equivalent to reading hundreds of millions of characters at once, which can handle an entire set of regulations, an entire program library, or hundreds of pages of financial reports.
  • Deep Think reasoning mode. Notably, its implementation method is as a parameter switch (thinkingConfig) on the API. The same model ID (reported as gemini-3.5-pro-preview-06) handles both standard requests and deep reasoning requests, depending on whether you have the switch turned on. This is a convenient design for developers, but it also means that the token cost of deep reasoning needs to be controlled manually.
  • Two-phase release of Flash and Pro. Flash was released for free globally on the same day, while Pro was delayed by a month and first given to enterprises. Google has essentially separated "cheap, fast, and widespread" from "strongest, most expensive, and prioritizing paid enterprises."
  • Flash's hard indicators. Official data shows that 3.5 Flash achieved 76.2% on Terminal-Bench 2.1, 1656 Elo on GDPval-AA, and 83.6% on MCP Atlas, with an output speed claimed to be four times faster than other frontier models, and it also outperformed the previous generation's 3.1 Pro on multiple coding and agency benchmarks.
  • Pricing is nearly ten times that of Flash. Multiple media outlets quote the Pro's pricing as around $15 per million input tokens and $60 per output token, roughly ten times that of the 3.5 Flash. The first to use Pro will be subscribers to Google's $20 monthly Pro plan and $250 Ultra plan.

Market Impact Analysis

For general Taiwanese users: Honestly, you don't need to worry about "Pro not being on the app" in the short term. Daily tasks like searching for information, writing emails, translating, and organizing meeting records can be handled by the 3.5 Flash, which is free and fast - it's the default in the app, and you're using it when you open it. The scenarios that truly require 2 million tokens are those that need to analyze an entire book, contract, or year's worth of conversations at once, which general users won't encounter often. My suggestion is: first get familiar with the free Flash, and then evaluate whether to upgrade to Pro when it becomes available in the $20 plan - no need to rush.

For Taiwanese enterprise applications: This is the real battlefield for this release. 2 million tokens are a significant incentive for industries like law, accounting, and manufacturing, which need to process technical documents - reading an entire set of SOPs, a batch of claims documents, or an entire project's historical emails at once, without needing to split and reassemble them. However, there are two realities to consider. First, Pro is currently tied to Vertex AI enterprise identities, with no public self-sign-up path, so you need to contact Google Cloud's customer manager, and the lead time for introduction needs to be factored into project scheduling. Second, the nearly tenfold pricing means "if it can be solved with Flash, don't force it to Pro." A more practical architecture is layered: using Flash for large, simple requests and routing only those that truly require deep reasoning or ultra-long context to Pro, keeping costs under control.

For Taiwanese developers: The implementation of Deep Think as a parameter switch is good news for engineering practices - no need to maintain two model routes, one ID can handle everything. However, "easy to turn on" also means "easy to forget to turn off," and deep reasoning will noticeably consume more tokens, with a steeper cost curve than expected. I would suggest setting Deep Think to require explicit conditions to trigger, rather than being always on. Additionally, the preview version of Pro's API may still be adjusted before formal GA, so don't hard-bind core production workflows to the preview version. Those who want to practice first can use the already GA Gemini API and Google AI Studio to set up long context processing workflows and then switch to Pro when it opens.

Future Development Trends

I believe three directions will become clearer in the second half of the year.

First, the "context length" arms race will shift from showing off to practicality. 2 million tokens sound impressive, but the real test is "the accuracy of retrieving information in ultra-long text" - being able to input it doesn't mean being able to find it accurately. The competition among companies will be about who can precisely locate information in ultra-long texts, not just about the size of the context window.

Second, "staged releases" will become a common tactic for major manufacturers. First, use free, sufficient small models to occupy user habits, and then reserve the flagship for paid enterprises - Google has demonstrated this clearly. For competitors like ChatGPT, Claude, and Perplexity, the era of simply comparing parameters is passing; the comparison is about "how good the free layer is and whether the paid layer is worth it."

Third, agentic capabilities will be the next dividing line. Google emphasized "intelligence with action" this time, meaning models need to be able to run multi-step workflows, call tools, and complete tasks on their own. Whoever can make AI "get things done" and not just "talk" will grasp the enterprise budget.

TheAI Academy Summary and Commentary

The June opening of Gemini 3.5 Pro is, on the surface, a model going online, but beneath it is Google's sense of rhythm: using the free Flash to scale and the paid Pro to collect from enterprises. 2 million tokens and Deep Think are real capabilities, but "currently only in Vertex AI enterprise preview" is also a real situation - don't be fooled by the "I/O has been announced" title into thinking it's available to everyone.

Commentary: What's most worth Taiwanese readers learning from this time is not "Google has released a stronger model," but "the same company, with a free version and a flagship version, released a month apart." For individuals, use the free Flash to its fullest first; for enterprises, first understand the introduction threshold and layered costs of Vertex AI, and then decide whether to pursue Pro. Teams that can save money and divert traffic will win, not because of the model, but because of how it's used.

Specific suggestions for Taiwanese readers: 1. Use the free Gemini Flash for daily needs now, no need to wait. 2. If enterprises want Pro, first evaluate whether 2 million tokens are really needed or if Flash can solve it, and don't pay ten times the price for unnecessary length. 3. Developers should first practice long context processing workflows on the already GA API and then switch to Pro when it opens. For more tool comparisons and task breakdowns, refer to our /tools and /tasks.

This article involves third-party product pricing and functionality. The actual content is subject to Google's official announcements and does not constitute any investment or purchasing advice.

Data Sources

Compiled based on public information, with official sources as the standard.

Frequently Asked Questions

Gemini 3.5 Pro 現在一般人能用了嗎?

截至 2026 年六月中,3.5 Pro 仍只開放給 Vertex AI 企業客戶限定預覽,尚未進到消費者版的 Gemini App、Google AI Studio 或個人訂閱方案。一般使用者在 App 裡用到的多半還是已全球免費上線的 3.5 Flash。

Gemini 3.5 Pro 的 200 萬 token 上下文有什麼用?

200 萬 token 約等於數百萬字,可一次讀入整套法規、整個程式庫或幾百頁文件,適合法律、會計、製造業等需要處理超長文件的場景。它是 Flash(100 萬 token)的兩倍,也是目前已公布的量產前沿模型中最大的上下文之一。

Deep Think 模式怎麼運作?

Deep Think 並非另一個模型,而是 API 上的參數開關(thinkingConfig)。同一個模型 ID 依你是否開啟而切換標準或深度推理。方便,但深度推理會明顯增加 token 成本,建議設定觸發條件而非預設全開。

3.5 Flash 跟 3.5 Pro 差在哪、我該用哪個?

Flash 已免費全球上線、速度快、適合多數日常與大量請求;Pro 最強、上下文最長,但定價約為 Flash 十倍且目前僅限企業預覽。務實做法是分層:簡單請求走 Flash,真正需要深度推理或超長上下文時才路由到 Pro。

繁體中文版 →