Commit Graph

95 Commits

Author SHA1 Message Date
Jiang-Jia-Jun
666ab65a51 [Polish Code] Remove useless notes 2025-08-14 14:04:52 +08:00
EnflameGCU
d1a92e3e17 [GCU] Enable gcu CI (#3190)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [GCU] Update to the latest version

* [GCU] Enable CI
2025-08-13 11:48:24 +08:00
Kane2011
b4fef2cf29 [MetaxGPU] Support FastDeploy on metax gpu (#3241)
* [MetaxGPU] Support FastDeploy on metax gpu

* Update metax_worker.py

1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;

* Update __init__.py

1. remove metax's key work comment

* Update __init__.py

1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import

---------

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-08-13 11:11:54 +08:00
RichardWooSJTU
283da92bfa fix ep lm head (#3244)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-12 15:38:28 +08:00
Zero Rains
42af0b4b64 [V1 Loader] Support DeepSeekV3(bf16) (#3294)
* Support new loader for DeepSeekV3(bf16)

* update paddle version

* remove useless attr
2025-08-11 13:39:28 +08:00
gaoziyuan
a799d14df1 [Bugfix] Fix model accuracy in some ops (#3231)
* fix noaux_tc op

* fix

* update

* fix qk norm

* fix linear for prequant loader

* test

* fix

* fix

* rm some print

* fix noaux_tc op

* test

* Fix the confused enable_early_stop when only set early_stop_config (#3214)

* fix the confused early_stop_config when only set early_stop_config

* pre-commit

* write a general method

* Add ci case for min token and max token (#3229)

Co-authored-by: xujing43 <xujing43@baidu.com>

* add some evil cases (#3240)

* add repitation early stop cases

* add repitation early stop cases

* add bad cases

* add bad cases

* add evil cases

* qwen3_moe (#3084)

* [Feature] support seed parameter (#3161)

* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to defualt

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix

* 【Fix Bug】 修复 fa3 支持集中式bug (#3235)

* fix fa3 集中式bug

* 增加qknorm参数

* fix qk norm

* fix

* update

* fix linear for prequant loader

* fix

* fix

* rm some print

* fix

* fix moe init weight&scale

* fix moe init weight&scale

---------

Co-authored-by: bukejiyu <395822456@qq.com>
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: Zero Rains <linjunlu@zerorains.top>
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com>
Co-authored-by: xujing43 <xujing43@baidu.com>
Co-authored-by: Divano <dddivano@outlook.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com>
Co-authored-by: qingqing01 <dangqingqing@baidu.com>
2025-08-08 17:30:37 +08:00
Zero Rains
ce1f353c70 Move create_parameters to __init__ in FuseMOE for CultassBackend and TritonBackend (#3148)
* w4a8 bug

* fix w4a8 bug

* remove code

* modify the triton backend

* fix ep

* fix the bug with tensor_wise_fp8 in triton backend

* fix the RL

* fix bug by merge

* fix the bug in w4a8

* fix the tensor_wise_fp8 bug

* fix RL
2025-08-08 15:55:47 +08:00
freeliuzc
71267840f7 【Fix】fix mtp bug (#3139) 2025-08-08 13:30:12 +08:00
bukejiyu
b76b17fc1b qwen3 0.3B fix (#3255)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-08 11:35:40 +08:00
yzwu
fbdd6b0663 [Iluvatar GPU] Optimze attention and moe performance (#3234) 2025-08-08 10:51:24 +08:00
bukejiyu
37569cca86 [feat]add fast_weights_iterator (#3258)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* add fast_weights_iterator

* update

* update
2025-08-07 22:36:46 +08:00
bukejiyu
9408e667a5 [bugfix]fix blockwisefp8 and all_reduce (#3243)
* fix

* update

* fix linear for prequant loader
2025-08-06 23:54:33 +08:00
yangjianfengo1
3a15e0c53e 【Fix Bug】 修复 fa3 支持集中式bug (#3235)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix fa3 集中式bug

* 增加qknorm参数
2025-08-06 16:24:27 +08:00
lizexu123
afff4d37ea [Feature] support seed parameter (#3161)
* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to defualt

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix
2025-08-06 15:20:47 +08:00
bukejiyu
20839abccf qwen3_moe (#3084) 2025-08-06 14:45:27 +08:00
Yuan Xiaolan
7ce00e597c support qk norm (#3145) 2025-08-05 16:46:14 +08:00
Yuan Xiaolan
af543b7f0f revise get_moe_scores (#3164) 2025-08-05 16:43:07 +08:00
RichardWooSJTU
1e9a8e8cef fix lm head bias (#3185)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:24 +08:00
RichardWooSJTU
f5c64a074c [EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization (#3182)
* Add support for mixed-ep across multi nodes

* code refine

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:11 +08:00
lizhenyun01
fe540f6caa [plugin] Custom model_runner/model support (#3186)
* support custom model&&model_runner

* fix merge

* add test && update doc

* fix codestyle

* fix unittest

* load model in rl
2025-08-04 18:52:39 -07:00
Yuan Xiaolan
1f8289e106 fix expertwise_scale (#3181) 2025-08-04 20:06:15 +08:00
yangjianfengo1
64d7a3194d 集中式支持fa3 (#3112) 2025-08-01 18:03:36 +08:00
Yuan Xiaolan
5f56d289a7 fix is_permuted (#3098)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:58:05 +08:00
RAM
d850660872 [Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)
* reset decoder_block_shape_q buffer

* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch

* update decode_max_tile_size

* fix pre-commit

* update block_multihead_attn_backend

* update flas attn backend

* update MLA Attention

* update XPU Attention

* update gcu,iluvatar model runner

* Update MTP

* fix MTP bug
2025-07-31 00:09:31 +08:00
bukejiyu
db698bda01 qwen loader (#3057) 2025-07-30 19:09:38 +08:00
bukejiyu
8e203666d9 w4a8 offline (#3074)
* w4a8 offline

* update

* update

* update
2025-07-30 16:33:30 +08:00
Yuan Xiaolan
35935da9e5 support W4A8 EPLB (#3075) 2025-07-30 14:34:12 +08:00
Zero Rains
b2f9a42d87 [Feature] Support repetition early stop (#3024)
* support repetition early stop and support user to set the parameter

* remove log

* fix codestyle

* add the early_stop_config to rollout_config

* update config and EarlyStopper class

* fix the bug for triton

* modify the stop method

* update description

* modify the usage for stop_flags

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-29 22:42:54 +08:00
Yuan Xiaolan
3214fb5393 support model loading for w4a8 offline quant (#3064)
支持W4A8 EP 对离线量化权重的load
2025-07-29 21:54:37 +08:00
Longzhi Wang
be0a0f2bb2 fix arguement error in ep when pd (#3060) 2025-07-29 17:17:24 +08:00
YuanRisheng
502ee92a0a Unify server-side and model-side Config (Part3) (#3047)
* merge model config

* fix arch

* fix rl
2025-07-29 17:07:44 +08:00
Longzhi Wang
907d561523 fix ep when paddle version mismatch (#3056) 2025-07-29 15:06:49 +08:00
Yuan Xiaolan
b1d787a272 [fix] w4a8 model loading and hadamard config (#3013) 2025-07-28 18:17:59 +08:00
AIbin
ec52d39e68 【Inference Optimize】Update wint2 weight n-dim reorder (#3042) 2025-07-28 16:31:56 +08:00
Longzhi Wang
247010d298 fix arguement error (#3030) 2025-07-28 11:03:29 +08:00
YuanRisheng
6ccc10ad47 Unify server-side and model-side Config (Part1) (#3018)
* move cache config

* fix mtp
2025-07-28 10:51:52 +08:00
Longzhi Wang
0700c90caa [Feat] support mixed ep (#2969)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Support mixed ep

* fix comment

* fix comment

* update mixep

* fix conflict

* fix typo

* update

* fix typo

* fix code style

* fix conflict
2025-07-25 15:29:30 +08:00
chen
332154f504 [feature] Support FA2 (#3009) 2025-07-25 14:09:00 +08:00
xiaoxiaohehe001
2970b00dfa [Feature] Support_eplb (#2997)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Feature] support_eplb

* [Feature] support_eplb

* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
EnflameGCU
c40df1802e [GCU] Update to develop (#2988) 2025-07-24 19:30:52 +08:00
Zero Rains
0fb37ab7e4 update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12

* polish code
2025-07-24 01:43:31 -07:00
lizhenyun01
29c3292f02 support c4 attn && fix cache 2025-07-24 12:00:52 +08:00
lizexu123
832d25334a [Code Simplification] fix init_distributed_environment() (#2982) 2025-07-24 11:43:28 +08:00
bukejiyu
bfeb664ab8 update (#2978)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-24 00:16:42 +08:00
chen
172e69fe17 FA3 fix bug (#2987) 2025-07-23 19:07:43 +08:00
chen
ad202272ed 【Infer】Improve the performance block_wise_fp8 of triton_moe_backend (#2942) 2025-07-23 13:02:50 +08:00
lizhenyun01
e51f018577 support chunk_prefill in fa3 2025-07-23 12:19:20 +08:00
K11OntheBoat
93bb68aa71 [Feature] Marlin MoE backend supports DeepseekV3 (#2962)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 18:11:15 +08:00
Nyakku Shigure
48e6a0ca26 [SOT] Mark dynamic dims by type annotations (#2771)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [SOT] Mark dynamic dims by type annotations

* fix conflict of forward_meta

* mark more attn backend

* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS

* auto infer implicit 0 dim dynamic dim

* revert manual marked dims

* revert missing update

* auto infer can use unsafe code in warmup stage

* check -> type_match

* fix codestyle

* restore blank line

* empty commit

* add need_warmup nonlocal;

* add doc for resolver

* add missing type hints

* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
lifulll
2c6a9e887e native top_p_sampling (#2901) 2025-07-22 14:09:59 +08:00