# GLM-4.5 Language Model Overview

## Introduction

GLM-4.5 and GLM-4.5-Air are state-of-the-art foundation language models optimized for agent-oriented applications. Both use a Mixture-of-Experts (MoE) architecture:

- GLM-4.5: 355 billion total parameters with 32 billion active per forward pass.
- GLM-4.5-Air: 106 billion total parameters with 12 billion active, optimized for cost efficiency.

Both models are pretrained on 15 trillion tokens and fine-tuned on code, reasoning, and agent-specific tasks. The context length extends to 128K tokens, and reinforcement learning is applied to strengthen reasoning, coding, and agent capabilities.

---

## Key Features and Capabilities

### Input/Output and Context

- Input: Text
- Output: Text
- Context Length: 128,000 tokens
- Maximum Output Tokens: 96,000 tokens

### Functionality Highlights

- Deep Thinking: Supports advanced reasoning and analysis.
- Streaming Output: Real-time response streaming.
- Function Calling: Powerful tool invocation for external tool integration.
- Context Caching: Intelligent caching for long conversations.
- Structured Output: Supports structured formats such as JSON for easier integration.

---

## GLM-4.5 Series Models

### Models in the Series

- GLM-4.5: High-parameter model with powerful reasoning.
- GLM-4.5-Air: Cost-effective and lightweight with strong performance.
- GLM-4.5-X: High performance with strong reasoning and ultra-fast response.
- GLM-4.5-AirX: Lightweight, strong performance, ultra-fast response.
- GLM-4.5-Flash: Free tier with strong performance in reasoning, coding, and agent tasks.

---

## Performance and Efficiency

### Parameter Efficiency

- Compared with competing models such as DeepSeek-R1 and Kimi-K2, GLM-4.5 achieves better benchmark performance with fewer parameters, demonstrating architectural efficiency.
- GLM-4.5-Air excels in reasoning benchmarks, surpassing models such as Gemini 2.5 Flash, Qwen3-235B, and Claude 4 Opus.
- Positioned on the Pareto frontier for performance-to-parameter ratio.

### Cost and Speed

- API pricing: approximately $0.2 per million input tokens and $1.1 per million output tokens.
- Generation speed: over 100 tokens/second on the high-speed variant.
- Supports low-latency, high-concurrency deployments.

---

## Real-World Evaluation

- Integrated into Claude Code and benchmarked against Claude 4 Sonnet, Kimi-K2, and Qwen3-Coder.
- Evaluated on 52 programming and development tasks spanning six domains.
- Shows a clear advantage in tool-invocation reliability and task completion rate.
- Delivers a user experience comparable to leading commercial models.
- The full test problems and agent interaction trajectories are publicly available for validation.

---

## Usage Highlights

### Thinking Mode

Controlled by the `thinking.type` parameter, with two options:

- `enabled` (default): dynamic deep thinking mode.
- `disabled`: skips deep reasoning for faster, simpler answers.

### Task Types

- Simple: fact retrieval or translation with no complex reasoning.
- Moderate: requires some reasoning or stepwise logic.
- Difficult: complex math, coding, or strategic reasoning involving many internal steps.

### Web Development Use Case

- Core capability: intelligent code generation and completion with bug fixing.
- Supports major languages such as Python, JavaScript, and Java.
- Generates structured, scalable code from natural language descriptions.
- Use cases include building a full product prototype in under 5 minutes.

---

## Resources

- API documentation for calling the model.
- Open datasets for benchmarking and agent task validation, available on Hugging Face.

---

## Quick Start Example (cURL)
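Below is a minimal sketch of a chat completion request with deep thinking enabled. It assumes an OpenAI-compatible chat completions endpoint with Bearer-token authentication; the endpoint URL, model ID, and field names other than `thinking.type` (which is described under Usage Highlights) are assumptions, so consult the API documentation listed under Resources for the authoritative values.

```bash
# Minimal chat completion request (sketch).
# ASSUMPTIONS: the endpoint URL, model ID, and Bearer-token header follow the
# OpenAI-compatible pattern; verify against the official API documentation.
curl https://open.bigmodel.cn/api/paas/v4/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GLM_API_KEY" \
  -d '{
    "model": "glm-4.5",
    "messages": [
      {"role": "user", "content": "Summarize the advantages of MoE architectures in two sentences."}
    ],
    "thinking": {"type": "enabled"},
    "stream": false,
    "max_tokens": 1024
  }'
```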
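For simple task types (fact retrieval, translation), deep thinking can be skipped and streaming enabled for lower first-token latency. The variation below is again a sketch under the same assumptions; the `stream` flag follows the common OpenAI-compatible name and is not confirmed by this document.

```bash
# Same request shape with deep thinking disabled and streaming enabled,
# suited to simple fact-retrieval or translation tasks.
# ASSUMPTION: "stream" follows the common OpenAI-compatible flag name.
curl https://open.bigmodel.cn/api/paas/v4/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GLM_API_KEY" \
  -d '{
    "model": "glm-4.5-air",
    "messages": [
      {"role": "user", "content": "Translate \"good morning\" into French."}
    ],
    "thinking": {"type": "disabled"},
    "stream": true
  }'
```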