Breaking: Grok-3-Reasoning really is strong!

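The evaluation results are shown in the figure. This is a side-by-side comparison using the prompt from the official Grok-3 launch event, the one that draws a Mars landing animation. I will release an evaluation video later, so stay tuned!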

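(Also, I suspect the default Grok-3 is actually Grok-3-mini … its performance was fairly mediocre. Incidentally, DeepSeek-R1 bombed on this test and could not handle it at all. My guess is that it saw very little training for this kind of scenario. No need to worry, though: as our test set grows, the final average score will better reflect a model's overall capability.)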

By the way, here is the prompt, for anyone who wants to try it themselves: Generate code for an animated 3d plot of a launch from earth landing on mars and then back to earth at the next launch window

Evaluation data: github.com/KCORES/kcores-llm-arena/tree/main/benchmark-mars-mission

戈多被我绑了
1 year ago
Reaching first place so soon after assembling a team suggests the technical barriers in AI are not that high after all.