Commit Graph

12 Commits

Author SHA1 Message Date
yangjianfengo1
3754a9906d [Feature] block sparse attention (#3668)
* 支持稀疏attn

* fix bug

* code style

* fix moba attn get kv shape

* 修复a100编译

* codestyle

* code style

* code style

* code style

* fix conflict

* 增加单侧

* code style

* 增加eblite 加载时间

* fix bug

* for ci

* for ci

* for ci

* for ci

* 支持mlp block size 128

* 增加小算子单测

* fix 单测 mlp

* 将环境变量加入到config里面

* fix rollout config

* 修复显存

* add test server

* add test server

* fix mlp  最后一层使用full attn
2025-08-29 19:46:30 +08:00
Jiang-Jia-Jun
c694fa2879 Revert "[Feature] block sparse attention (#3209)" (#3647)
This reverts commit 646a0c2fd8.
2025-08-27 17:35:04 +08:00
yangjianfengo1
646a0c2fd8 [Feature] block sparse attention (#3209)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* 支持稀疏attn

* fix bug

* code style

* fix moba attn get kv shape

* 修复a100编译

* codestyle

* code style

* code style

* code style

* fix conflict

* 增加单侧

* code style

* 增加eblite 加载时间

* fix bug

* for ci

* for ci

* for ci

* for ci

* 支持mlp block size 128

* 增加小算子单测

* fix 单测 mlp

* 将环境变量加入到config里面

* fix rollout config
2025-08-26 07:16:04 -07:00
Kane2011
b4fef2cf29 [MetaxGPU] Support FastDeploy on metax gpu (#3241)
* [MetaxGPU] Support FastDeploy on metax gpu

* Update metax_worker.py

1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;

* Update __init__.py

1. remove metax's key work comment

* Update __init__.py

1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import

---------

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-08-13 11:11:54 +08:00
lifulll
2c6a9e887e native top_p_sampling (#2901) 2025-07-22 14:09:59 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
lifulll
1f28bdf994 dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
EnflameGCU
d0f4d6ba3a [GCU] Support gcu platform (#2702)
baseline: e7fa57ebae

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-08 13:00:52 +08:00
liddk1121
1b54a2831e Adapt for iluvatar gpu (#2684) 2025-07-07 16:53:14 +08:00
Yuanle Liu
240bdac2a4 [feat] support fa3 backend for pd disaggregated (#2695)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* delete use_fast_ffn
2025-07-03 22:33:27 +08:00
Jiang-Jia-Jun
05c670e593 [Sync] Update to latest code (#2679)
* [Sync] Update to latest code

* Add new code files

* Add new code files

* update code

* Try to fix build.sh

* Try to fix build.sh

* Update code

* Update requirements.txt

* Update code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00