Model Guide10 min readReviewed Apr 20, 2026

Qwen3.6-Plus: Alibaba's Agentic Coding Model (SWE-Bench 78.8%, Terminal-Bench 61.6%)

Qwen3.6-Plus is Alibaba Cloud's latest agentic coding model, released in April 2026 as part of the Qwen3 family. It scores 78.8% on SWE-Bench Verified and a class-leading 61.6% on Terminal-Bench 2.0, making it one of the strongest publicly benchmarked models for real-world coding and terminal-based agent workflows. With 1M-token context and full multimodal support (text, image, and video), it is positioned as a general-purpose frontier model that also excels at software engineering tasks.

Published Apr 19, 2026Updated Apr 20, 2026
  • Qwen3.6-Plus scores 78.8% SWE-Bench Verified and a class-leading 61.6% on Terminal-Bench 2.0.
  • 1M-token context window with full multimodal support (text, image, and video).
  • API available via OpenAI-compatible endpoints in Beijing, Singapore, and US regions.
Quick note: This guide is based on public docs and release pages, but you should still verify current pricing, limits, supported tools, and region-specific billing on the official source before you pay, subscribe, or integrate.

Qwen3 family timeline

The Qwen3 family has expanded rapidly since the initial Qwen3 release on April 29, 2025. Understanding the timeline helps distinguish the dense-flagship models from the MoE (Mixture of Experts) efficiency variants and the coding-specialized releases.

Qwen3.6-Plus official snapshot infographic
A route-aware summary of the official Qwen 3.6 release, Model Studio pricing rows, and the published DashScope endpoint choices. Source: Official Qwen 3.6 release.
Official Alibaba Model Studio pricing page screenshot for Qwen3.6-Plus

Official screenshot

Qwen3.6-Plus already has route-specific public pricing on the Model Studio side

The Alibaba pricing page is the safest public surface for route-aware Qwen3.6-Plus billing because it distinguishes mainland and international rows directly on the official page.

  • Useful when articles need one source-backed image for regional pricing differences.
  • Pairs well with benchmark claims from the Qwen 3.6 release page.

Source: Alibaba Cloud Model Studio pricing.

Qwen3 family release timeline
ReleaseDateKey details
Qwen3 (initial)Apr 29, 2025First Qwen3 release; MoE and dense variants for open-weight models
Qwen3-MaxSep 5, 2025>1T parameter dense LLM; flagship general-purpose model
Qwen3-Coder-PlusSep 2025256K context; 92 programming languages; coding-specialized variant
Qwen3.5 seriesFeb 16, 2026Flash, 27B, 35B-A3B, 122B-A10B Plus; efficiency-focused MoE releases
Qwen3.6-PlusApr 20261M context; multimodal (text+image+video); agentic coding focus

Qwen3.6-Plus coding and agent benchmarks

Qwen3.6-Plus targets agentic coding workflows directly. Its Terminal-Bench 2.0 score of 61.6% is the highest in its class, and its SWE-Bench Verified result of 78.8% places it competitively against models like Claude Opus 4.5 and MiniMax M2.5. The MCPMark score of 48.2% also leads the field for tool-calling and MCP integration.

Qwen3.6-Plus coding and agent benchmarks
BenchmarkScoreNotes
SWE-Bench Verified78.8%Top-tier; competitive with Claude Opus 4.5
SWE-Bench Multilingual73.8%Strong multilingual coding capability
SWE-Bench Pro56.6%Harder professional-level tasks
Terminal-Bench 2.061.6%Best in class for terminal agent workflows
Claw-Eval Avg74.8Strong agentic evaluation score
MCPMark48.2%Best score for MCP tool-calling integration
LiveCodeBench v687.1%Competitive live coding performance
SWE-Bench Verified comparison

Qwen3.6-Plus vs leading coding models on the most widely cited software engineering benchmark.

Claude Opus 4.580.9

Official Anthropic benchmark.

MiniMax M2.580.2

Official MiniMax M2.5 release.

Qwen3.6-Plus78.8

Official Qwen 3.6 release.

GLM-577.8

Shown in Qwen's official comparison table.

Kimi K2.576.8

Official Kimi K2.5 technical blog.

Source: Official Qwen 3.6 release.

Terminal-Bench 2.0 comparison

Terminal-Bench measures multi-step terminal and agent workflow performance. Qwen3.6-Plus leads this benchmark class.

Claude Opus 4.665.4

Official Anthropic benchmark.

Qwen3.6-Plus61.6

Official Qwen 3.6 release.

MiniMax M2.757.0

Official MiniMax M2.7 release.

GLM-556.2

Shown in Qwen's official comparison table.

Source: Official Qwen 3.6 release.

General knowledge and reasoning benchmarks

Beyond coding, Qwen3.6-Plus scores 88.5% on MMLU-Pro and 90.4% on GPQA, confirming strong general knowledge and graduate-level reasoning. Its WMT24++ score of 84.3% also reflects top-tier multilingual translation capability.

Qwen3.6-Plus reasoning and language benchmarks
BenchmarkScore
MMLU-Pro88.5%
GPQA90.4%
WMT24++84.3%

Vision and multimodal benchmarks

Qwen3.6-Plus is fully multimodal, supporting text, image, and video inputs. Its vision benchmarks are competitive with or ahead of models like GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro across document understanding, mathematical visual reasoning, real-world QA, and video comprehension.

  • MMMU 86.0 and MathVision 88.0 reflect strong visual-reasoning and math-from-image performance.
  • OmniDocBench1.5 at 91.2 and CC-OCR at 83.4 show excellent document and OCR understanding.
  • VideoMME (with subtitles) at 87.8 confirms robust video comprehension for multimodal workflows.
Vision benchmarks: Qwen3.6-Plus vs leading multimodal models
BenchmarkQwen3.6-PlusGPT-5.2Claude Opus 4.5Gemini 3 Pro
MMMU86.0
MathVision88.0
RealWorldQA85.4
OmniDocBench1.591.2
CC-OCR83.4
VideoMME (w/ sub)87.8

Published pricing and endpoints

Qwen3.6-Plus is available through Alibaba Cloud's DashScope / Model Studio route, and the public pricing page already splits mainland and international rows rather than pretending there is one universal price. That route-aware pricing story is now part of the model guide, not just a billing appendix.

The API transport story is similarly clear. Alibaba publishes OpenAI-compatible endpoints for Beijing, Singapore, and the US, plus an Anthropic-compatible route on the international side.

  • OpenAI-compatible endpoints: `dashscope.aliyuncs.com`, `dashscope-intl.aliyuncs.com`, and `dashscope-us.aliyuncs.com`.
  • Anthropic-compatible route: `https://dashscope-intl.aliyuncs.com/apps/anthropic`.
  • Qwen3-Coder-Plus is still a separate coding-specific route with its own release history and positioning.
Official public pricing rows for `qwen3.6-plus`
RouteContext bandInputOutputNotes
Mainland China0-256K2 CNY / 1M input tokens12 CNY / 1M output tokensThe public page also shows a 90-day new-account validity note.
Mainland China256K-1M8 CNY / 1M input tokens48 CNY / 1M output tokensUse this row for the 1M-context route.
International (Singapore)0-256K3.7471 CNY / 1M input tokens22.4826 CNY / 1M output tokensPublished in the international pricing section.
International (Singapore)256K-1M14.9884 CNY / 1M input tokens44.965 CNY / 1M output tokensPublished separately from the mainland route.
BuyGLM shows package prices in USD. When a source page is published in CNY, the displayed value uses a fixed 1 USD = 8 CNY conversion and should still be checked against the live vendor page before payment.

Ready to try Qwen3.6-Plus?

Start with the DashScope API endpoint closest to your region, using the OpenAI SDK. Check the official release page for the latest pricing and model availability.

Sources and official links

Frequently asked questions

How does Qwen3.6-Plus differ from Qwen3-Coder-Plus?

Qwen3.6-Plus is a general-purpose multimodal model with 1M context that also excels at coding. Qwen3-Coder-Plus is a separate coding-specialized model released in September 2025 with 256K context and support for 92 programming languages. They serve different use cases: choose Qwen3.6-Plus for multimodal and long-context tasks, and Qwen3-Coder-Plus for focused coding workflows.

Is Qwen3.6-Plus available outside China?

Yes. Alibaba Cloud provides API endpoints in Singapore (dashscope-intl) and the US (dashscope-us), both OpenAI-compatible. International users should use these regional endpoints for lower latency.

What does Terminal-Bench 2.0 actually measure?

Terminal-Bench 2.0 evaluates multi-step terminal and agent workflows. It tests whether a model can complete complex sequences of shell commands, tool calls, and file manipulations autonomously. Qwen3.6-Plus leads this benchmark at 61.6%, ahead of MiniMax M2.7 (57.0%) and GLM-5 (56.2%).