Commit Graph

58 Commits

Author SHA1 Message Date
Zero Rains
e37e86b3b8 [V1 Loader]support param create and load for wint2 and xpu backend (#3581)
* support wint2 backend'

* [V1 Loader]support param create and load for wint2 and xpu backend

* update weight shape name

* update

* update

* update baseline.txt

* update model name

* update baseline.txt

* fix codestyle

* remove debug coode
2025-08-28 09:49:36 +08:00
lzy
1265f6c192 deepgemm don't support tp+ep (for ci) (#3638)
* deepgemm don't support tp+ep (for ci)

* deepgemm don't support tp+ep (for ci)
2025-08-27 16:39:19 +08:00
gaoziyuan
82e64b13e1 [NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop (#3598)
* [Feature] update ep

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix queue ports idx

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* Update engine.py

* fix ci

* fix some bug in mixed ep

* add server fix and op fix

* rm some log

* fix code style

* ltd fix

* fix

* fix

* fix some bug

* fix bug

* fix bug

* fix style

* Update config.py

* Update splitwise_connector.py

* Update cache_messager.py

* Update __init__.py

* merge and fix

* Update engine.py

* Update common_engine.py

* Update run_ci_xpu.sh

* Update ernie_processor.py

* Update ernie_processor.py

---------

Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-26 19:59:02 +08:00
bukejiyu
3200a80de3 [v1 loader]support fp8 (#3593)
* support fp8

* update ci
2025-08-26 02:42:46 -07:00
xiaoxiaohehe001
9afa236e39 [NewFeatures] support eplb (#3547)
* [NewFeatures] support eplb

* fix eplb
2025-08-26 16:19:30 +08:00
lzy
d339df2e90 Supports DP+TP+EP hybrid parallel deployment strategy (#3489)
* Support DP+TP+EP hybrid parallel deployment strategy

* Support DP+TP+EP hybrid parallel deployment strategy

* fix conflict

* add moe_tp_ep function split_allgather_out

* del tp_group in moe_cutlass_backend

* for ci

* fix parallel_config for ci

* del log
2025-08-26 00:04:01 -07:00
lizexu123
c43a4bec00 [Features] support hugging face qwen3 dense and qwen2 model (#3574)
* support qwen2 and qwen3 hugging face

* fix moe

* defualt_v1 loader

* hugging_face_format deprecated

* modify hugging_face_foramt to model_format

* model_format auto

* fix environemt

* fix bug

* fix qwen3-0.6 bug

* model_format is str

* fix
2025-08-26 10:54:53 +08:00
Yuan Xiaolan
9205c88da1 support w4afp8 EP inference (#3044)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-25 11:27:45 +08:00
bukejiyu
77514e3e1e [V1 Loader] support weight_only (#3413)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* support wint4/wint8

* delete smoe case

* update ci

* print log
2025-08-23 13:13:41 +08:00
YuanRisheng
85fbf5455a [V1 Loader]Ernie VL support loader v1 (#3494)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* ernie vl support new loader

* add unittest

* fix test
2025-08-22 11:16:57 +08:00
Zero Rains
30b3f2dc07 [BugFix][V1 Loader] fix the bug in creat weight for block_wise_fp8 (#3486)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-20 05:52:54 -07:00
Zero Rains
fef447e350 [V1 Loader] Support MOE parameters create and load for DeepGemm and marlin backend (#3447)
* support deepgemm backend

* support marlin backend

* remove print

* fix process_prequanted_weights
2025-08-19 14:15:53 +08:00
YuanRisheng
09c979f3dd [V1 Loader] Support Ernie text(moe and dense) (#3110)
* new loader support 0.3B

* fix weight

* support parallel load

* support parallel load

* fix slice

* support moe

* delete code

* perfect code

* perfect code
2025-08-14 20:25:28 +08:00
Kane2011
b4fef2cf29 [MetaxGPU] Support FastDeploy on metax gpu (#3241)
* [MetaxGPU] Support FastDeploy on metax gpu

* Update metax_worker.py

1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;

* Update __init__.py

1. remove metax's key work comment

* Update __init__.py

1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import

---------

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-08-13 11:11:54 +08:00
Zero Rains
42af0b4b64 [V1 Loader] Support DeepSeekV3(bf16) (#3294)
* Support new loader for DeepSeekV3(bf16)

* update paddle version

* remove useless attr
2025-08-11 13:39:28 +08:00
gaoziyuan
a799d14df1 [Bugfix] Fix model accuracy in some ops (#3231)
* fix noaux_tc op

* fix

* update

* fix qk norm

* fix linear for prequant loader

* test

* fix

* fix

* rm some print

* fix noaux_tc op

* test

* Fix the confused enable_early_stop when only set early_stop_config (#3214)

* fix the confused early_stop_config when only set early_stop_config

* pre-commit

* write a general method

* Add ci case for min token and max token (#3229)

Co-authored-by: xujing43 <xujing43@baidu.com>

* add some evil cases (#3240)

* add repitation early stop cases

* add repitation early stop cases

* add bad cases

* add bad cases

* add evil cases

* qwen3_moe (#3084)

* [Feature] support seed parameter (#3161)

* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to defualt

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix

* 【Fix Bug】 修复 fa3 支持集中式bug (#3235)

* fix fa3 集中式bug

* 增加qknorm参数

* fix qk norm

* fix

* update

* fix linear for prequant loader

* fix

* fix

* rm some print

* fix

* fix moe init weight&scale

* fix moe init weight&scale

---------

Co-authored-by: bukejiyu <395822456@qq.com>
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: Zero Rains <linjunlu@zerorains.top>
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com>
Co-authored-by: xujing43 <xujing43@baidu.com>
Co-authored-by: Divano <dddivano@outlook.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com>
Co-authored-by: qingqing01 <dangqingqing@baidu.com>
2025-08-08 17:30:37 +08:00
Zero Rains
ce1f353c70 Move create_parameters to __init__ in FuseMOE for CultassBackend and TritonBackend (#3148)
* w4a8 bug

* fix w4a8 bug

* remove code

* modify the triton backend

* fix ep

* fix the bug with tensor_wise_fp8 in triton backend

* fix the RL

* fix bug by merge

* fix the bug in w4a8

* fix the tensor_wise_fp8 bug

* fix RL
2025-08-08 15:55:47 +08:00
bukejiyu
20839abccf qwen3_moe (#3084) 2025-08-06 14:45:27 +08:00
Yuan Xiaolan
af543b7f0f revise get_moe_scores (#3164) 2025-08-05 16:43:07 +08:00
RichardWooSJTU
f5c64a074c [EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization (#3182)
* Add support for mixed-ep across multi nodes

* code refine

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:11 +08:00
Yuan Xiaolan
1f8289e106 fix expertwise_scale (#3181) 2025-08-04 20:06:15 +08:00
Yuan Xiaolan
5f56d289a7 fix is_permuted (#3098)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:58:05 +08:00
Yuan Xiaolan
35935da9e5 support W4A8 EPLB (#3075) 2025-07-30 14:34:12 +08:00
Yuan Xiaolan
3214fb5393 support model loading for w4a8 offline quant (#3064)
支持W4A8 EP 对离线量化权重的load
2025-07-29 21:54:37 +08:00
Longzhi Wang
be0a0f2bb2 fix arguement error in ep when pd (#3060) 2025-07-29 17:17:24 +08:00
YuanRisheng
502ee92a0a Unify server-side and model-side Config (Part3) (#3047)
* merge model config

* fix arch

* fix rl
2025-07-29 17:07:44 +08:00
Longzhi Wang
907d561523 fix ep when paddle version mismatch (#3056) 2025-07-29 15:06:49 +08:00
Yuan Xiaolan
b1d787a272 [fix] w4a8 model loading and hadamard config (#3013) 2025-07-28 18:17:59 +08:00
AIbin
ec52d39e68 【Inference Optimize】Update wint2 weight n-dim reorder (#3042) 2025-07-28 16:31:56 +08:00
Longzhi Wang
247010d298 fix arguement error (#3030) 2025-07-28 11:03:29 +08:00
Longzhi Wang
0700c90caa [Feat] support mixed ep (#2969)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Support mixed ep

* fix comment

* fix comment

* update mixep

* fix conflict

* fix typo

* update

* fix typo

* fix code style

* fix conflict
2025-07-25 15:29:30 +08:00
xiaoxiaohehe001
2970b00dfa [Feature] Support_eplb (#2997)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Feature] support_eplb

* [Feature] support_eplb

* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
Zero Rains
0fb37ab7e4 update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12

* polish code
2025-07-24 01:43:31 -07:00
bukejiyu
bfeb664ab8 update (#2978)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-24 00:16:42 +08:00
chen
ad202272ed 【Infer】Improve the performance block_wise_fp8 of triton_moe_backend (#2942) 2025-07-23 13:02:50 +08:00
K11OntheBoat
93bb68aa71 [Feature] Marlin MoE backend supports DeepseekV3 (#2962)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 18:11:15 +08:00
lifulll
2c6a9e887e native top_p_sampling (#2901) 2025-07-22 14:09:59 +08:00
zhink
0262ef7eb3 custom all reduce support cuda graph (#2938)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag

* rename communication_op to communication
2025-07-21 22:52:03 +08:00
周周周
ff4569f135 remove some code in ep.py (#2947) 2025-07-21 22:44:57 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
Yuanle Liu
dda4a9f848 rl update (#2861) 2025-07-16 00:33:10 -07:00
freeliuzc
2d1184aefe [Fix] fix expert_parallel bug in decoder stage (#2848) 2025-07-16 11:08:18 +08:00
Yuanle Liu
61b3997b85 refactor rl get_name_mappings_to_training (#2847)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* refactor rl get_name_mappings_to_training

* fix tp>1

* change variable name(ffn1->up_gate_proj/ffn2->down_proj)

* change variable name(linear_weight->weight/linear_bias->bias)

* add rl names mapping for vl

* fix ernie 0.3B error

* fix develop code

* fix
2025-07-15 07:31:42 -07:00
AIbin
fd91da7b41 【Inference Optimize】Support wint2 triton kernel about triton_utils_v2 (#2842)
* update supported_models doc
2025-07-15 14:35:40 +08:00
YuanRisheng
4c7b8bc458 Simplify the Config code (#2770)
* simplify the code

* fix vl

* delete config

* fix

* perfect code

* fix ci

* fix xpu

* fix xpu

* fix server

* resolve conflict

* fix mtp

* resolve conflict

* fix xpu

* fix xpu

* fix vl

* fix log

* fix qwen moe

* fix qwen moe

* fix qwen moe
2025-07-14 19:50:05 +08:00
chen
888780ffde [Feature] block_wise_fp8 support triton_moe_backend (#2767) 2025-07-09 19:22:47 +08:00
lifulll
1f28bdf994 dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
yulangz
be21ef5047 [XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B (#2765)
* fix no quant xpu moe

* change dir of xpu moe weight only
2025-07-09 15:57:51 +08:00
RichardWooSJTU
fee544e808 fix ep prefill (#2762) 2025-07-09 14:03:05 +08:00
EnflameGCU
d0f4d6ba3a [GCU] Support gcu platform (#2702)
baseline: e7fa57ebae

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-08 13:00:52 +08:00