Commit Graph

174 Commits

Author SHA1 Message Date
Zero Rains
e37e86b3b8 [V1 Loader]support param create and load for wint2 and xpu backend (#3581)
* support wint2 backend'

* [V1 Loader]support param create and load for wint2 and xpu backend

* update weight shape name

* update

* update

* update baseline.txt

* update model name

* update baseline.txt

* fix codestyle

* remove debug coode
2025-08-28 09:49:36 +08:00
Jiang-Jia-Jun
c694fa2879 Revert "[Feature] block sparse attention (#3209)" (#3647)
This reverts commit 646a0c2fd8.
2025-08-27 17:35:04 +08:00
lzy
1265f6c192 deepgemm don't support tp+ep (for ci) (#3638)
* deepgemm don't support tp+ep (for ci)

* deepgemm don't support tp+ep (for ci)
2025-08-27 16:39:19 +08:00
chen
5ad8721506 check (#3639) 2025-08-27 14:32:13 +08:00
chen
ce9c0917c5 [Precision] Support lm_head layer running in float32 (#3597)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support lm_head fp32 bf16 fp16

* support lm_head fp32 bf16 fp16

* add doc and check code

* lm_head_fp32 specify lm_head as fp32

* code check

* check doc
2025-08-27 11:34:53 +08:00
xiaoxiaohehe001
ad319a87cc support fa3 rope3d (#3622) 2025-08-27 11:31:29 +08:00
yangjianfengo1
646a0c2fd8 [Feature] block sparse attention (#3209)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* 支持稀疏attn

* fix bug

* code style

* fix moba attn get kv shape

* 修复a100编译

* codestyle

* code style

* code style

* code style

* fix conflict

* 增加单侧

* code style

* 增加eblite 加载时间

* fix bug

* for ci

* for ci

* for ci

* for ci

* 支持mlp block size 128

* 增加小算子单测

* fix 单测 mlp

* 将环境变量加入到config里面

* fix rollout config
2025-08-26 07:16:04 -07:00
gaoziyuan
82e64b13e1 [NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop (#3598)
* [Feature] update ep

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix queue ports idx

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* Update engine.py

* fix ci

* fix some bug in mixed ep

* add server fix and op fix

* rm some log

* fix code style

* ltd fix

* fix

* fix

* fix some bug

* fix bug

* fix bug

* fix style

* Update config.py

* Update splitwise_connector.py

* Update cache_messager.py

* Update __init__.py

* merge and fix

* Update engine.py

* Update common_engine.py

* Update run_ci_xpu.sh

* Update ernie_processor.py

* Update ernie_processor.py

---------

Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-26 19:59:02 +08:00
Yuanle Liu
cbce94a00e rename ernie_xxx to ernie4_5_xxx (#3621)
* rename ernie_xxx to ernie4_5_xxx

* ci fix
2025-08-26 19:29:27 +08:00
YuanRisheng
642480f5f6 [CI] Standard unittest (#3606)
* standard unittest

* fix bugs

* fix script
2025-08-26 19:03:11 +08:00
bukejiyu
3200a80de3 [v1 loader]support fp8 (#3593)
* support fp8

* update ci
2025-08-26 02:42:46 -07:00
RAM
00898603c8 [CUDAGraph]Add debug func (#3616)
* add print dot files

* refine code
2025-08-26 16:43:48 +08:00
xiaoxiaohehe001
9afa236e39 [NewFeatures] support eplb (#3547)
* [NewFeatures] support eplb

* fix eplb
2025-08-26 16:19:30 +08:00
Yuanle Liu
56e2d7e668 adaptive rms_norm's dtype (#3617)
* adaptive rms_norm's dtype

* adaptive rms_norm's dtype

* add approve coverage

---------

Co-authored-by: liuyuanle <liuyuanle@baidu.com>
2025-08-26 15:29:15 +08:00
lzy
d339df2e90 Supports DP+TP+EP hybrid parallel deployment strategy (#3489)
* Support DP+TP+EP hybrid parallel deployment strategy

* Support DP+TP+EP hybrid parallel deployment strategy

* fix conflict

* add moe_tp_ep function split_allgather_out

* del tp_group in moe_cutlass_backend

* for ci

* fix parallel_config for ci

* del log
2025-08-26 00:04:01 -07:00
freeliuzc
52eda7fdb3 [Feature][MTP]support new speculative decoding method named hybrid mtp with ngram (#3610) 2025-08-26 14:29:22 +08:00
AIbin
0a0d2959b9 qkv_a_proj horizontal fusion (#3591)
Support DSK qkv_a_proj horizontal fusion under V0 Loder
2025-08-26 14:25:57 +08:00
lizexu123
c43a4bec00 [Features] support hugging face qwen3 dense and qwen2 model (#3574)
* support qwen2 and qwen3 hugging face

* fix moe

* defualt_v1 loader

* hugging_face_format deprecated

* modify hugging_face_foramt to model_format

* model_format auto

* fix environemt

* fix bug

* fix qwen3-0.6 bug

* model_format is str

* fix
2025-08-26 10:54:53 +08:00
RAM
2fa173e327 [Executor] CUDAGraph support RL training (#3265)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* add clear graph opt backend

* cuda graph support rl

* add branch

* 1.fix dynamic_weight_manager bug 2.add clear api for CasualLM

* open test case

* fix typo

* update mkdocs.yaml

* [Docs]Update mkdocs.yml

* update test case

* use unittest in graph test case
2025-08-25 20:59:30 +08:00
Kane2011
2ae7ab28d2 [MetaxGPU] adapt to the latest fastdeploy on metax gpu (#3492) 2025-08-25 17:44:20 +08:00
chen
9cab3f47ff [Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552)
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing

* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs

* delete some code

* code check

* code check and add doc

* fix tokenizer.decoder(-1), return 'Invalid Token'

* add ci for temp_scaled and top_p logprobs

* check test

* check seq len time shape

* logprob clip inf

---------

Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
2025-08-25 14:11:49 +08:00
Yuan Xiaolan
9205c88da1 support w4afp8 EP inference (#3044)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-25 11:27:45 +08:00
bukejiyu
bdbac0aa3d support qwen2 weight only (#3571)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
2025-08-24 11:14:34 +08:00
bukejiyu
77514e3e1e [V1 Loader] support weight_only (#3413)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* support wint4/wint8

* delete smoe case

* update ci

* print log
2025-08-23 13:13:41 +08:00
Zero Rains
79f0dbbb55 [V1 Loader] Support qwen2(bf16) (#3502)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support qwen2(bf16)

* merge bias_loader and weight_loader
2025-08-23 01:08:23 +08:00
YuanRisheng
85fbf5455a [V1 Loader]Ernie VL support loader v1 (#3494)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* ernie vl support new loader

* add unittest

* fix test
2025-08-22 11:16:57 +08:00
YuanRisheng
c389a4013c Unify server-side and model-side Config(Part-5) (#3497)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* move config

* fix xpu

* fix

* fix vl

* fix vl

* fix unitest

* fix args

* add unitest

* fix test
2025-08-21 19:00:21 +08:00
Zero Rains
30b3f2dc07 [BugFix][V1 Loader] fix the bug in creat weight for block_wise_fp8 (#3486)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-20 05:52:54 -07:00
Ryan
bcdfc1d6b9 Add custom op declaration for all_reduce (#3473)
* add custom op declaration

* roll back try except
2025-08-20 20:29:58 +08:00
kevin
67298cf4c0 add error traceback info (#3419)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* add error traceback info

* update error msg

* update code

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-19 19:32:04 +08:00
Zero Rains
fef447e350 [V1 Loader] Support MOE parameters create and load for DeepGemm and marlin backend (#3447)
* support deepgemm backend

* support marlin backend

* remove print

* fix process_prequanted_weights
2025-08-19 14:15:53 +08:00
Zero Rains
8b12c80f90 [FixBug] compute early stopping with real batch size (#3418)
* [FixBug] compute early stopping with real batch size

* update

* fix test_sampler
2025-08-18 22:09:21 -07:00
AIbin
beec24fd89 【Inference Optimize】DeepSeek-v3 model inference performance optimization (#3455)
* DSK_OPT_01

* update FA3
2025-08-19 10:42:42 +08:00
lizexu123
32b39620bc [Code Simplification] remove cum_offsets (#3410)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
2025-08-18 20:21:25 +08:00
Jundong Liu
70ee910cd5 [Excutor] Change cudagraph hashkey from batch size to num_tokens (#3454) 2025-08-18 16:16:48 +08:00
Jundong Liu
ea4a3b479c [Excutor] Increase buffer size to prevent address corruption; add forward metadata debug tool (#3404)
* 修复buffer申请不够大,增加打印forwardmetadata的工具

* fix mistake

* Make CPU tensor in CPUPlace

* Add test about forward_meta_str and Add unitest_requirement

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-08-18 16:14:09 +08:00
chen
5585cf7aa5 fix mtp_rej_topp input (#3450) 2025-08-18 16:12:42 +08:00
chen
e88f5552db fix cpu __ini__.py (#3448)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-17 12:38:54 +08:00
chen
f0f00a6025 [OPs] Universal optimization and Fix early_stop cuda 700 (#3375)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* delete nonzero

* delete setup_ops_base.py

* check if

* check gcp infer_seed.cpu()

* fix repetition_early_stopper_kernel cuda 700
2025-08-14 22:40:44 +08:00
YuanRisheng
09c979f3dd [V1 Loader] Support Ernie text(moe and dense) (#3110)
* new loader support 0.3B

* fix weight

* support parallel load

* support parallel load

* fix slice

* support moe

* delete code

* perfect code

* perfect code
2025-08-14 20:25:28 +08:00
lzy
1e06b9fa6d make append_attn supports mask_offset (#3138)
* make append_attn supports mask_offset

* add unittest
2025-08-14 03:40:55 -07:00
lizexu123
7b596d0877 [BugFix] fix real_bsz in ep (#3366)
* Your commit message here

* fix ep

* delete cuda_graph
2025-08-14 17:31:19 +08:00
Jiang-Jia-Jun
666ab65a51 [Polish Code] Remove useless notes 2025-08-14 14:04:52 +08:00
Zero Rains
be94bdd0b0 [Loader V1] modify layername for DeepSeekV3 (#3336)
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
2025-08-13 15:47:06 +08:00
EnflameGCU
d1a92e3e17 [GCU] Enable gcu CI (#3190)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [GCU] Update to the latest version

* [GCU] Enable CI
2025-08-13 11:48:24 +08:00
yzwu
ce9180241e [Iluvatar GPU] Modify the names of some variables (#3273) 2025-08-13 11:38:02 +08:00
Kane2011
b4fef2cf29 [MetaxGPU] Support FastDeploy on metax gpu (#3241)
* [MetaxGPU] Support FastDeploy on metax gpu

* Update metax_worker.py

1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;

* Update __init__.py

1. remove metax's key work comment

* Update __init__.py

1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import

---------

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-08-13 11:11:54 +08:00
RichardWooSJTU
283da92bfa fix ep lm head (#3244)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-12 15:38:28 +08:00
Jiang-Jia-Jun
c56c99837a Revert "[BugFix] num_seqs (#3291)" (#3316)
This reverts commit e0aeac58e1.
2025-08-11 16:16:51 +08:00
Yuanle Liu
9571c458f0 enhance eos_tokens (#3274)
* enhance eos_tokens

* update

* update
2025-08-11 14:47:52 +08:00