headinfer achieves 4M-token context inference by offloading the KV cache to system memory

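Since KTransformers was released, other projects have also started offloading to system memory to improve performance. headinfer is one of them: by offloading the KV cache to system memory, it achieves inference with a 4M-token context.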

The project is still at an early stage, so use it with caution: github.com/wdlctc/headinfer
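
For a rough sense of why a 4M-token context pushes the KV cache out of GPU memory, here is a back-of-the-envelope estimate. The model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) is an assumption for illustration, roughly an 8B-class model with grouped-query attention; it is not taken from the post or from the headinfer repo, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to hold the keys and values for num_tokens tokens."""
    # Factor 2: one key vector and one value vector per token, per layer, per KV head.
    return 2 * num_tokens * num_layers * num_kv_heads * head_dim * bytes_per_value


# Assumed, illustrative model shape (not from the post).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM)
total = kv_cache_bytes(4_000_000, LAYERS, KV_HEADS, HEAD_DIM)

print(f"per token: {per_token / 2**10:.0f} KiB")
print(f"4M tokens: {total / 2**30:.1f} GiB")
```

Under these assumptions the cache works out to 128 KiB per token and about 488 GiB for 4,000,000 tokens, far more than any single GPU holds, which is why keeping most of it in system memory is attractive.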
