How's GLM's performance?
I'll check the status of your GLM-5 document and retrieve the performance information once it's ready.
Great! The document is ready. Since this is a 40-page document, let me first check its structure to locate the performance sections efficiently.
Now let me extract the key performance sections. Based on the structure, I'll focus on the evaluation sections which contain benchmark results:
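The lookup above follows a structure-first pattern: fetch the document outline, match section headings, then read only the matching sections instead of the full 40 pages. Here's a minimal Python sketch of that pattern; the `client` object and its `get_outline`/`read_section` methods are hypothetical stand-ins, since the actual tool calls aren't shown in this conversation.

```python
# Minimal sketch of the structure-first extraction pattern described above.
# `client`, `get_outline`, and `read_section` are hypothetical stand-ins;
# the real tool interface behind this conversation is not shown here.

def extract_performance_sections(client, doc_id, keywords=("evaluation", "benchmark")):
    """Read only the sections whose headings look performance-related,
    rather than pulling the entire document into context."""
    outline = client.get_outline(doc_id)  # assumed: [{"title": ..., "section_id": ...}, ...]
    matches = [s for s in outline
               if any(k in s["title"].lower() for k in keywords)]
    return {s["title"]: client.read_section(doc_id, s["section_id"])
            for s in matches}
```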
Perfect! I now have the key performance information. Let me provide a comprehensive summary of GLM-5's performance:
GLM-5 Performance Summary
GLM-5 demonstrates state-of-the-art performance among open-source models, significantly improving over its predecessor GLM-4.7 and narrowing the gap to proprietary models like Claude Opus 4.5.
Key Performance Highlights:
Reasoning & General Tasks:
- HLE (with Tools): 50.4% - outperforms Claude Opus 4.5 (43.4%) and Gemini 3 Pro (45.8%)
- LongBench v2: 64.5 - the highest score among all evaluated models except Gemini 3 Pro (68.2), demonstrating excellent long-context reasoning
- HMMT 2025: Outperforms Claude Opus 4.5 and Gemini 3 Pro on both the February (97.9) and November (96.9) editions
Coding Benchmarks:
- SWE-bench Verified: 77.8% - SOTA among open-source models, beating Gemini 3 Pro (76.2%)
- SWE-bench Multilingual: 73.3% - surpasses both Gemini 3 Pro (65.0%) and GPT-5.2 (72.0%)
- Terminal-Bench 2.0: 56.2-61.1% - comparable to Claude Opus 4.5
Agentic Abilities:
- BrowseComp (with Context Management): 75.9% - SOTA across all frontier LLMs
- Vending-Bench 2: $4,432 - demonstrates strong performance on long-horizon business tasks
- GDPval-AA Elo: 1,409 - narrowly outperforms Claude Opus 4.5 (1,400) in economic scenarios (see the quick calculation below)
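For a sense of scale on that Elo gap, here is a quick calculation using the standard logistic Elo convention (a general formula, not something taken from the GLM-5 document): a 9-point lead implies only a slight expected head-to-head edge.

```python
# Expected head-to-head score implied by an Elo gap, using the standard
# logistic Elo convention E = 1 / (1 + 10 ** (-(R_a - R_b) / 400)).
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** (-(rating_a - rating_b) / 400))

print(round(elo_expected_score(1409, 1400), 3))  # 0.513 -> roughly a 51% expected win rate
```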
GLM-5 matches the performance of dual-GPU international clusters while running on a single Chinese node, and reduces deployment costs in long-sequence scenarios by 50%.