vLLM's latest V1 architecture supports FlashAttention-3

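vLLM's latest V1 architecture now supports FlashAttention-3, and with the overall performance optimizations, throughput is now 1.7x what it was before. If you have plenty of GPUs, download the latest version and give it a try.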
