Hands-on Test: Zhipu GLM 5.1 vs Kimi K2.6-code-preview
/ 5 min read /
Stop scrambling for Zhipu — I tested Kimi’s latest K2.6-code-preview and it crushes GLM 5.1
Claude Code keeps tightening its restrictions and I could no longer pay for Claude, so I ordered the Max plan from Zhipu, a top-tier domestic model vendor. After actually using it, the results were still underwhelming.
So I bought Kimi’s Max plan instead. They just released the K2.6-code-preview model. Below are two examples that clearly demonstrate the gap between the two models: in coding, Kimi is well ahead of Zhipu. Not many people know this yet, so I suggest you go subscribe quickly — I suspect it will sell out once word gets around. I also noticed that Kimi has been hiring AI infra engineers recently, probably in preparation for the coming traffic surge.
My test setup:
- On a Mac, using Claude Code as the harness agent, TypeScript as the programming language.
- Full evaluation through two tasks: template project scaffolding and blog project development, using the same prompts and environment.
- Using the Claude model as the judge of code and system-architecture quality, while I handled the actual testing and drew the conclusions.
Alright, I bet you’re curious now. Let’s dive in. (Oh, and if you want to know how to integrate GLM, Kimi with Claude Code, see the end of the article.)
1) fastify template project scaffolding
Prompt:
Now search the web, and based on what you find create a lightweight, high-quality fastify TypeScript starter template for me to use as a backend service base. Also use git to make one commit per step, and publish the project with gh once it’s done.

This prompt tests the model’s ability to gather and organize information from the web, understand requirements, break down tasks, and so on. Let’s see how Kimi and GLM performed.
Kimi:
After starting, the API service works normally.
GLM:
After starting, it doesn’t work — had to fix a bug before it worked.
Finally I had Claude Code analyze the code, and the result: Kimi wins.
The fastify-demo here is Kimi’s version. Claude Code gave clear reasoning why Kimi’s implementation is better.
2) Long task test
Prompt:
Now I want to refactor my blog. Its URL is: https://luckysnail.cn/ , the corresponding GitHub repo is: https://github.com/coderPerseus/blog . I want to rebuild it using Astro, based on https://github.com/chrismwilliams/astro-theme-cactus . Requirements:
1. Improve the UI and page design based on astro-theme-cactus's current clean style — make it look better, with Chinese elements, but keep it simple.
2. Use a suitable light purple as the theme color.
3. Sync the current blog data. The data is currently stored in GitHub issues, and I want to keep using this repo's issues as the data source.
4. It should be AI-friendly: the blog supports AI automatic translation to English, and an AI short summary at the beginning.
5. Support English and Chinese, light and dark themes (with transition animations when switching).

This is a large, complex task requiring many steps. It tests the AI model’s ability in:
- Long task execution
- Front-end aesthetics
- Backend data and AI integration
- Working with existing data and resources — also the most common scenario in daily work
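Requirement 3 — using the repo’s GitHub issues as the data source — boils down to fetching issues from the GitHub REST API and mapping them onto post objects a static-site generator like Astro can consume. A hedged sketch of that mapping follows; the `Post` shape, `slug` scheme, and `issueToPost` name are my illustrations, not the actual project code.

```typescript
// Shape of the fields we care about from the GitHub issues API response.
interface Issue {
  number: number;
  title: string;
  body: string;
  labels: { name: string }[];
}

// Hypothetical post shape for the rebuilt Astro blog.
interface Post {
  slug: string;
  title: string;
  content: string;
  tags: string[];
}

// Public GitHub REST endpoint for the repo's issues (no auth needed for reads).
export const issuesApiUrl =
  "https://api.github.com/repos/coderPerseus/blog/issues?state=open";

// Map one issue onto a post: issue title -> post title, issue body ->
// markdown content, labels -> tags, issue number -> a stable slug.
export function issueToPost(issue: Issue): Post {
  return {
    slug: `post-${issue.number}`,
    title: issue.title,
    content: issue.body,
    tags: issue.labels.map((l) => l.name),
  };
}
```

At build time the blog would `fetch(issuesApiUrl)`, run each item through `issueToPost`, and hand the results to Astro’s content pipeline.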
After about half an hour, both models finished their work. Let’s see the results.
Kimi:
Zhipu GLM:
It’s very obvious — Zhipu’s implementation is simply not good, and it even had errors initially, which for someone doing vibe coding is completely unacceptable.
Out of curiosity, I also compared MiniMax and Codex with the same prompt.
MiniMax:
Codex:
I didn’t have the Claude model do the development, for two reasons:
- I need Claude to be the judge — if Claude were one of the participants, it might be unfair.
- Claude is way too expensive — if I let it run, my 5-hour quota would probably be used up before it finishes.
Now let’s look at the Claude model’s summary of the code and results from these four models.
Here’s my prompt, and below is the conclusion:
The conclusion says Codex is best, followed by Kimi. Although Codex’s implementation had bilingual support, it wasn’t fully functional. Setting the code aside, I think Kimi is the best, because Kimi’s front-end aesthetics are on point; Codex is noticeably weaker on that front.
Real project development experience
After testing, I used Kimi’s K2.6-code-preview model for two days on real projects. I’m even more convinced that it’s truly powerful. Here’s a record of a problem that Codex failed to solve twice, but Kimi fixed in one go:
Problem: fixing the auto-scroll-to-top behavior after a confirmed click inside a scrollable area.
I had tried twice with Codex 5.4 without success, but Kimi fixed it in one try. Codex is the model I consider strongest at debugging, yet it lost to Kimi here. In the end, I had Codex learn from Kimi’s problem-solving approach — Kimi really is a rising star among domestic AI models.
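The shape of this class of fix is simple to sketch. The `Scrollable` interface and function name below are my illustrations, not the project’s actual code — the point is just that after the confirm handler fires, the container’s viewport is explicitly reset.

```typescript
// Minimal interface covering the one DOM capability we need
// (Element.scrollTo accepts a ScrollToOptions-like object).
interface Scrollable {
  scrollTo(options: { top: number; behavior: "smooth" | "auto" }): void;
}

// After the user confirms a click inside a scrollable list, move its
// viewport back to the top so the next interaction starts from a known state.
export function scrollBackToTop(container: Scrollable): void {
  container.scrollTo({ top: 0, behavior: "smooth" });
}
```

In a browser you would call this from the confirm handler, e.g. `scrollBackToTop(listElement)`, where `listElement` is the scrollable container.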
Summary
Through testing and extended real-world use, I believe Kimi’s K2.6-code-preview is a dark horse that hasn’t been discovered yet. If you’re still deciding which big model to subscribe to, I recommend Kimi’s starter plan: 49 yuan per month, and under normal usage you probably won’t exhaust the quota.
MiniMax, Kimi, GLM integration with Claude Code
```shell
# MiniMax
export ANTHROPIC_AUTH_TOKEN=sk-xxx
export ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic
export ANTHROPIC_DEFAULT_OPUS_MODEL=MiniMax-M2.7
export ANTHROPIC_SMALL_FAST_MODEL=MiniMax-M2.7
export ANTHROPIC_DEFAULT_SONNET_MODEL=MiniMax-M2.7
export ANTHROPIC_DEFAULT_HAIKU_MODEL=MiniMax-M2.7
export API_TIMEOUT_MS=3000000
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

# Kimi
export ANTHROPIC_BASE_URL=https://api.kimi.com/coding/
export ANTHROPIC_API_KEY=sk-xxx
export ANTHROPIC_DEFAULT_OPUS_MODEL=K2.6-code-preview
export ANTHROPIC_DEFAULT_SONNET_MODEL=K2.6-code-preview
export ANTHROPIC_DEFAULT_HAIKU_MODEL=K2.6-code-preview
export API_TIMEOUT_MS=3000000
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

# Z.ai GLM
export ANTHROPIC_AUTH_TOKEN=xxx
export ANTHROPIC_BASE_URL=https://open.bigmodel.cn/api/anthropic
export ANTHROPIC_DEFAULT_OPUS_MODEL=glm-5.1
export ANTHROPIC_DEFAULT_SONNET_MODEL=glm-5.1
export ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-5.1
export API_TIMEOUT_MS=3000000
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
```

Since I also have an official Claude subscription provided by my company, I usually just paste the corresponding block into the terminal, use it temporarily in that window, and manage the snippets with a clipboard tool.
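If you switch providers often, wrapping each set of exports in a shell function saves the clipboard round-trip. This is a hypothetical helper I’d put in `~/.zshrc` — the function names are mine, and you would substitute your real keys for the `sk-xxx` placeholders:

```shell
# Hypothetical switcher functions: each one points the current terminal
# window's Claude Code at a different provider. Add more as needed.
use_kimi() {
  export ANTHROPIC_BASE_URL=https://api.kimi.com/coding/
  export ANTHROPIC_API_KEY=sk-xxx   # replace with your real key
  export ANTHROPIC_DEFAULT_SONNET_MODEL=K2.6-code-preview
  export ANTHROPIC_DEFAULT_HAIKU_MODEL=K2.6-code-preview
  echo "Claude Code now pointed at Kimi"
}

use_glm() {
  export ANTHROPIC_BASE_URL=https://open.bigmodel.cn/api/anthropic
  export ANTHROPIC_AUTH_TOKEN=xxx   # replace with your real token
  export ANTHROPIC_DEFAULT_SONNET_MODEL=glm-5.1
  export ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-5.1
  echo "Claude Code now pointed at GLM"
}
```

Because the exports only affect the current shell, each terminal window can still run a different provider — the same per-window workflow as pasting, minus the clipboard tool.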
You might see a warning in the terminal like below, but it’s fine — just use it as normal.
Thanks for reading! Hope this article helps you choose your AI service.
(This article is 100% handcrafted, no AI involved.)