GLM-5.2-SM120 showcases a 469-billion-parameter model optimized for inference on four RTX PRO 6000 Blackwell GPUs. This open-source project provides a one-command vLLM recipe for serving the model, featuring a 250K-token context window and DeepSeek Sparse Attention with Mixture-of-Experts speculative decoding for efficient inference.