Zero Rains
e37e86b3b8
[V1 Loader]support param create and load for wint2 and xpu backend ( #3581 )
...
* support wint2 backend'
* [V1 Loader]support param create and load for wint2 and xpu backend
* update weight shape name
* update
* update
* update baseline.txt
* update model name
* update baseline.txt
* fix codestyle
* remove debug coode
2025-08-28 09:49:36 +08:00
Jiang-Jia-Jun
c694fa2879
Revert "[Feature] block sparse attention ( #3209 )" ( #3647 )
...
This reverts commit 646a0c2fd8
.
2025-08-27 17:35:04 +08:00
lzy
1265f6c192
deepgemm don't support tp+ep (for ci) ( #3638 )
...
* deepgemm don't support tp+ep (for ci)
* deepgemm don't support tp+ep (for ci)
2025-08-27 16:39:19 +08:00
chen
ce9c0917c5
[Precision] Support lm_head layer running in float32 ( #3597 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support lm_head fp32 bf16 fp16
* support lm_head fp32 bf16 fp16
* add doc and check code
* lm_head_fp32 specify lm_head as fp32
* code check
* check doc
2025-08-27 11:34:53 +08:00
xiaoxiaohehe001
ad319a87cc
support fa3 rope3d ( #3622 )
2025-08-27 11:31:29 +08:00
yangjianfengo1
646a0c2fd8
[Feature] block sparse attention ( #3209 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* 支持稀疏attn
* fix bug
* code style
* fix moba attn get kv shape
* 修复a100编译
* codestyle
* code style
* code style
* code style
* fix conflict
* 增加单侧
* code style
* 增加eblite 加载时间
* fix bug
* for ci
* for ci
* for ci
* for ci
* 支持mlp block size 128
* 增加小算子单测
* fix 单测 mlp
* 将环境变量加入到config里面
* fix rollout config
2025-08-26 07:16:04 -07:00
gaoziyuan
82e64b13e1
[NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop ( #3598 )
...
* [Feature] update ep
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix queue ports idx
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* Update engine.py
* fix ci
* fix some bug in mixed ep
* add server fix and op fix
* rm some log
* fix code style
* ltd fix
* fix
* fix
* fix some bug
* fix bug
* fix bug
* fix style
* Update config.py
* Update splitwise_connector.py
* Update cache_messager.py
* Update __init__.py
* merge and fix
* Update engine.py
* Update common_engine.py
* Update run_ci_xpu.sh
* Update ernie_processor.py
* Update ernie_processor.py
---------
Co-authored-by: ltd0924 <ltd0924@sina.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
2025-08-26 19:59:02 +08:00
bukejiyu
3200a80de3
[v1 loader]support fp8 ( #3593 )
...
* support fp8
* update ci
2025-08-26 02:42:46 -07:00
xiaoxiaohehe001
9afa236e39
[NewFeatures] support eplb ( #3547 )
...
* [NewFeatures] support eplb
* fix eplb
2025-08-26 16:19:30 +08:00
Yuanle Liu
56e2d7e668
adaptive rms_norm's dtype ( #3617 )
...
* adaptive rms_norm's dtype
* adaptive rms_norm's dtype
* add approve coverage
---------
Co-authored-by: liuyuanle <liuyuanle@baidu.com >
2025-08-26 15:29:15 +08:00
lzy
d339df2e90
Supports DP+TP+EP hybrid parallel deployment strategy ( #3489 )
...
* Support DP+TP+EP hybrid parallel deployment strategy
* Support DP+TP+EP hybrid parallel deployment strategy
* fix conflict
* add moe_tp_ep function split_allgather_out
* del tp_group in moe_cutlass_backend
* for ci
* fix parallel_config for ci
* del log
2025-08-26 00:04:01 -07:00
AIbin
0a0d2959b9
qkv_a_proj horizontal fusion ( #3591 )
...
Support DSK qkv_a_proj horizontal fusion under V0 Loder
2025-08-26 14:25:57 +08:00
lizexu123
c43a4bec00
[Features] support hugging face qwen3 dense and qwen2 model ( #3574 )
...
* support qwen2 and qwen3 hugging face
* fix moe
* defualt_v1 loader
* hugging_face_format deprecated
* modify hugging_face_foramt to model_format
* model_format auto
* fix environemt
* fix bug
* fix qwen3-0.6 bug
* model_format is str
* fix
2025-08-26 10:54:53 +08:00
Kane2011
2ae7ab28d2
[MetaxGPU] adapt to the latest fastdeploy on metax gpu ( #3492 )
2025-08-25 17:44:20 +08:00
chen
9cab3f47ff
[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing ( #3552 )
...
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* delete some code
* code check
* code check and add doc
* fix tokenizer.decoder(-1), return 'Invalid Token'
* add ci for temp_scaled and top_p logprobs
* check test
* check seq len time shape
* logprob clip inf
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com >
2025-08-25 14:11:49 +08:00
Yuan Xiaolan
9205c88da1
support w4afp8 EP inference ( #3044 )
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-25 11:27:45 +08:00
bukejiyu
bdbac0aa3d
support qwen2 weight only ( #3571 )
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
2025-08-24 11:14:34 +08:00
bukejiyu
77514e3e1e
[V1 Loader] support weight_only ( #3413 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* support wint4/wint8
* delete smoe case
* update ci
* print log
2025-08-23 13:13:41 +08:00
Zero Rains
79f0dbbb55
[V1 Loader] Support qwen2(bf16) ( #3502 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support qwen2(bf16)
* merge bias_loader and weight_loader
2025-08-23 01:08:23 +08:00
YuanRisheng
85fbf5455a
[V1 Loader]Ernie VL support loader v1 ( #3494 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* ernie vl support new loader
* add unittest
* fix test
2025-08-22 11:16:57 +08:00
Zero Rains
30b3f2dc07
[BugFix][V1 Loader] fix the bug in creat weight for block_wise_fp8 ( #3486 )
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-20 05:52:54 -07:00
Ryan
bcdfc1d6b9
Add custom op declaration for all_reduce
( #3473 )
...
* add custom op declaration
* roll back try except
2025-08-20 20:29:58 +08:00
Zero Rains
fef447e350
[V1 Loader] Support MOE parameters create and load for DeepGemm and marlin backend ( #3447 )
...
* support deepgemm backend
* support marlin backend
* remove print
* fix process_prequanted_weights
2025-08-19 14:15:53 +08:00
Zero Rains
8b12c80f90
[FixBug] compute early stopping with real batch size ( #3418 )
...
* [FixBug] compute early stopping with real batch size
* update
* fix test_sampler
2025-08-18 22:09:21 -07:00
AIbin
beec24fd89
【Inference Optimize】DeepSeek-v3 model inference performance optimization ( #3455 )
...
* DSK_OPT_01
* update FA3
2025-08-19 10:42:42 +08:00
chen
5585cf7aa5
fix mtp_rej_topp input ( #3450 )
2025-08-18 16:12:42 +08:00
chen
f0f00a6025
[OPs] Universal optimization and Fix early_stop cuda 700 ( #3375 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* delete nonzero
* delete setup_ops_base.py
* check if
* check gcp infer_seed.cpu()
* fix repetition_early_stopper_kernel cuda 700
2025-08-14 22:40:44 +08:00
YuanRisheng
09c979f3dd
[V1 Loader] Support Ernie text(moe and dense) ( #3110 )
...
* new loader support 0.3B
* fix weight
* support parallel load
* support parallel load
* fix slice
* support moe
* delete code
* perfect code
* perfect code
2025-08-14 20:25:28 +08:00
lzy
1e06b9fa6d
make append_attn supports mask_offset ( #3138 )
...
* make append_attn supports mask_offset
* add unittest
2025-08-14 03:40:55 -07:00
Jiang-Jia-Jun
666ab65a51
[Polish Code] Remove useless notes
2025-08-14 14:04:52 +08:00
EnflameGCU
d1a92e3e17
[GCU] Enable gcu CI ( #3190 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* [GCU] Update to the latest version
* [GCU] Enable CI
2025-08-13 11:48:24 +08:00
Kane2011
b4fef2cf29
[MetaxGPU] Support FastDeploy on metax gpu ( #3241 )
...
* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;
* Update __init__.py
1. remove metax's key work comment
* Update __init__.py
1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-08-13 11:11:54 +08:00
RichardWooSJTU
283da92bfa
fix ep lm head ( #3244 )
...
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-08-12 15:38:28 +08:00
Zero Rains
42af0b4b64
[V1 Loader] Support DeepSeekV3(bf16) ( #3294 )
...
* Support new loader for DeepSeekV3(bf16)
* update paddle version
* remove useless attr
2025-08-11 13:39:28 +08:00
gaoziyuan
a799d14df1
[Bugfix] Fix model accuracy in some ops ( #3231 )
...
* fix noaux_tc op
* fix
* update
* fix qk norm
* fix linear for prequant loader
* test
* fix
* fix
* rm some print
* fix noaux_tc op
* test
* Fix the confused enable_early_stop when only set early_stop_config (#3214 )
* fix the confused early_stop_config when only set early_stop_config
* pre-commit
* write a general method
* Add ci case for min token and max token (#3229 )
Co-authored-by: xujing43 <xujing43@baidu.com >
* add some evil cases (#3240 )
* add repitation early stop cases
* add repitation early stop cases
* add bad cases
* add bad cases
* add evil cases
* qwen3_moe (#3084 )
* [Feature] support seed parameter (#3161 )
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to defualt
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
* 【Fix Bug】 修复 fa3 支持集中式bug (#3235 )
* fix fa3 集中式bug
* 增加qknorm参数
* fix qk norm
* fix
* update
* fix linear for prequant loader
* fix
* fix
* rm some print
* fix
* fix moe init weight&scale
* fix moe init weight&scale
---------
Co-authored-by: bukejiyu <395822456@qq.com >
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
Co-authored-by: Zero Rains <linjunlu@zerorains.top >
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com >
Co-authored-by: xujing43 <xujing43@baidu.com >
Co-authored-by: Divano <dddivano@outlook.com >
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com >
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com >
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com >
Co-authored-by: qingqing01 <dangqingqing@baidu.com >
2025-08-08 17:30:37 +08:00
Zero Rains
ce1f353c70
Move create_parameters to __init__ in FuseMOE for CultassBackend and TritonBackend ( #3148 )
...
* w4a8 bug
* fix w4a8 bug
* remove code
* modify the triton backend
* fix ep
* fix the bug with tensor_wise_fp8 in triton backend
* fix the RL
* fix bug by merge
* fix the bug in w4a8
* fix the tensor_wise_fp8 bug
* fix RL
2025-08-08 15:55:47 +08:00
freeliuzc
71267840f7
【Fix】fix mtp bug ( #3139 )
2025-08-08 13:30:12 +08:00
bukejiyu
b76b17fc1b
qwen3 0.3B fix ( #3255 )
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-08 11:35:40 +08:00
yzwu
fbdd6b0663
[Iluvatar GPU] Optimze attention and moe performance ( #3234 )
2025-08-08 10:51:24 +08:00
bukejiyu
37569cca86
[feat]add fast_weights_iterator ( #3258 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* add fast_weights_iterator
* update
* update
2025-08-07 22:36:46 +08:00
bukejiyu
9408e667a5
[bugfix]fix blockwisefp8 and all_reduce ( #3243 )
...
* fix
* update
* fix linear for prequant loader
2025-08-06 23:54:33 +08:00
yangjianfengo1
3a15e0c53e
【Fix Bug】 修复 fa3 支持集中式bug ( #3235 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix fa3 集中式bug
* 增加qknorm参数
2025-08-06 16:24:27 +08:00
lizexu123
afff4d37ea
[Feature] support seed parameter ( #3161 )
...
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to defualt
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
2025-08-06 15:20:47 +08:00
bukejiyu
20839abccf
qwen3_moe ( #3084 )
2025-08-06 14:45:27 +08:00
Yuan Xiaolan
7ce00e597c
support qk norm ( #3145 )
2025-08-05 16:46:14 +08:00
Yuan Xiaolan
af543b7f0f
revise get_moe_scores ( #3164 )
2025-08-05 16:43:07 +08:00
RichardWooSJTU
1e9a8e8cef
fix lm head bias ( #3185 )
...
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-08-05 15:40:24 +08:00
RichardWooSJTU
f5c64a074c
[EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization ( #3182 )
...
* Add support for mixed-ep across multi nodes
* code refine
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-08-05 15:40:11 +08:00
lizhenyun01
fe540f6caa
[plugin] Custom model_runner/model support ( #3186 )
...
* support custom model&&model_runner
* fix merge
* add test && update doc
* fix codestyle
* fix unittest
* load model in rl
2025-08-04 18:52:39 -07:00
Yuan Xiaolan
1f8289e106
fix expertwise_scale ( #3181 )
2025-08-04 20:06:15 +08:00