YuBaoku
819b2dbbae
Revert "【New Feature】W4afp8 supports per group quantization ( #4272 )" ( #4854 )
...
This reverts commit 93fcf7e4ec .
2025-11-06 17:48:28 +08:00
zhupengyang
b54eb7ad81
[XPU] ep+tp all2all ( #4836 )
2025-11-06 17:26:14 +08:00
Echo-Nie
c18b177f21
fix the get_act_fn,_load_st_projector ( #4824 )
2025-11-06 16:13:35 +08:00
Echo-Nie
e4f1267186
bug: fix list to List ( #4818 )
2025-11-06 16:13:12 +08:00
Juncai
08ca0f6aea
[Feature] [PD] add simple router and refine splitwise deployment ( #4709 )
...
* add simple router and refine splitwise deployment
* fix
2025-11-06 14:56:02 +08:00
Ayakouji
831266da7a
[Fix] fix ernie4_5_vl model torch format loading ( #4447 )
...
* fix
* add test
* fix test
* fix test
* update
2025-11-06 14:34:21 +08:00
kxz2002
5bdd40da5d
[BugFix] Fix ernie_vl_reasoning_parsers.py 'end_token' to 'think_end_token' ( #4805 )
...
* fix ernie_vl_reasoning_parsers.py 'end_token' to 'think_end_token'
* add unit tests
2025-11-06 11:28:55 +08:00
周周周
69fa741763
remove seq_lens_this_time ( #4821 )
2025-11-06 11:06:28 +08:00
K11OntheBoat
62dfad4a5f
[PD Disaggregation] Support Qwen3-MoE use PD + EP inference. ( #4691 )
...
support Qwen-MoE PD/EP
2025-11-06 10:32:15 +08:00
yangjianfengo1
93fcf7e4ec
【New Feature】W4afp8 supports per group quantization ( #4272 )
...
* w4afp8 supports per group
* code style
* accuracy work complete
* revert append attn utils
* ffn1 dynamic quantization
* ffn2 supports dynamic quantization
* code style
* code style
* modify unit tests
* modify unit tests
* fix bug
* Implement conditional parameter creation for layers
Add parameter creation for up_gate_proj_in_scale when ep_size > 1.
* code style
* fix conflict
* code style
* code style
* fix w4aint8 accuracy
* fix ci
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-11-05 21:00:23 +08:00
李泳桦
fcd2f05dff
[BugFix] fix messages being inplace modified in offline chat api ( #4831 )
2025-11-05 20:46:33 +08:00
ApplEOFDiscord
131d76dd64
[Bug Fix] process transparent image ( #4807 )
...
* process transparent image
* english comments
* process transparency at downloading
* fix
* remove useless codes
2025-11-05 17:15:24 +08:00
yinwei
ea1dd0e735
[XPU]Support V1 loader in weight_only Model ( #4808 )
...
* support v1 loader in wint8
* code style
* update
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1498356.gajl.baidu.com >
2025-11-05 17:09:11 +08:00
chenjian
cc8f5312f5
[Feature] Add timestamp for profiler ( #4726 )
...
* [Feature] Add timestamp for profiler
* fix bug for offline inference
* fix for ci
* fix
* fix ci
2025-11-05 12:04:59 +08:00
周周周
876e4a8935
remove input_ids from ForwardMeta ( #4793 )
2025-11-05 11:55:51 +08:00
kxz2002
9676cc87d6
fix parser register name ( #4795 )
...
Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com >
2025-11-05 11:27:30 +08:00
zhupengyang
2fd254e5b7
support ep+tp at op layer ( #4688 )
2025-11-05 11:15:57 +08:00
周周周
937eb3c6ed
[get_padding_offset.] clean get_padding_offset.cu ( #4777 )
...
[get_padding_offset.] clean get_padding_offset.cu (#4777 )
2025-11-05 10:47:40 +08:00
chen
1c3ca48128
[Feature][Executor] GPU Model Runner Supports prompt_logprobs and max_logprobs ( #4769 )
2025-11-05 10:43:25 +08:00
xiaozude
74722308f2
[Metax] adapt cutlass moe and fix mla attention ( #4602 )
...
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-11-05 10:03:49 +08:00
Haonan Luo
2c281e617c
Update Unit Test for PaddleOCR-VL ( #4802 )
...
* fix paddleocr prefix cache bug
* add test for paddleocr_vl
* disable prefix-caching in ocr
* add test for paddleocr_vl
* Fix top_p for rejection sampling
* add test for ocr processor; fix top_p for rejection sampling
* add test for ocr processor; fix top_p for rejection sampling
* add test for ocr processor; fix top_p for rejection sampling
* add test for ocr processor; fix top_p for rejection sampling
* add test for ocr processor; fix top_p for rejection sampling
---------
Co-authored-by: ming1753 <ideaminghp@163.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
2025-11-04 22:40:15 +08:00
李泳桦
1b61d62ecf
[fix] fix v0 pd, let worker step_shm_value create=False ( #4780 )
2025-11-04 20:37:57 +08:00
lzy
3e9dda39ab
supports pd partn ( #4615 )
...
* supports pd partn
* fix codestype
2025-11-04 16:36:35 +08:00
lzy
af7e0f27f3
supports internode_ll_two_stage ( #4162 )
...
* supports internode_ll_two_stage
* supports internode_ll_two_stage
* supports internode_ll_two_stage
* supports internode_ll_two_stage
* supports D internode_ll_two_stage
* fix codestype
* fix xpu internode_ll_two_stage
* fix xpu internode_ll_two_stage
2025-11-04 16:35:40 +08:00
kxz2002
8a40374bfe
[BugFix] Fix ernie4_5_vl_processor.py and qwen_vl_processor.py can not disable thinking ( #4762 )
...
* fix ernie4_5_vl_processor.py and qwen_vl_processor.py
* add unit test
2025-11-04 16:00:32 +08:00
ddchenhao66
bffa08b74b
[XPU] fix thinking bug where output only contains reasoning_content ( #4761 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-11-04 14:32:35 +08:00
freeliuzc
855a2a609a
fix attn_params ( #4787 )
2025-11-04 13:01:38 +08:00
kevin
5233825562
test scheduler ( #4739 )
2025-11-03 20:12:14 +08:00
Yuan Xiaolan
8690cf8569
fix Cfp8 for RL load ( #4144 )
2025-11-03 17:51:51 +08:00
Neil Zhu
c95d0740ec
[Metax] adapt cutlass moe for ernie-vl ( #4685 )
2025-11-03 17:44:27 +08:00
chenjian
25498efcf3
[Optimize] Support and robust for tpN for PD ( #4595 )
...
* [Optimize] Support and robust for tpN for PD
* fix
* fix
* support dpM tpN for cache messager
* fix
* fix token counter
* fix bug for merge develop
* fix bug
* robust cache messager for v0
2025-11-03 15:38:31 +08:00
luukunn
7b35488779
【DataProcessor】add options thinking_mode ( #4735 )
...
* add thinking_mode
* add thinking_mode
* add thinking_mode
* add thinking_mode
* add thinking_mode
* add thinking_mode
* add unit test
2025-11-03 14:30:07 +08:00
yinwei
377f3bf5f2
[XPU] add v1 support for bf16 ( #4744 )
...
* support v1 loader
* update code style
* update code
2025-11-03 14:13:17 +08:00
chenjian
f83d0cf127
[Feature] Support eplb for fd ( #4599 )
...
* support eplb
* support eplb
---------
Co-authored-by: kevin <chengyf112@gmail.com >
2025-11-03 14:08:15 +08:00
freeliuzc
11398790d3
[Speculative Decoding][MTP]Support attn mask offset ( #4641 )
...
* [MTP]Merge support attn (#4591 )
* support mask_offset in speculate decoding
* fix dummy run output
* add unit test
* fix unit test import
* support attn_mask_offset in mtp mode
* add update_attn_mask op
* fix unit test && fix code-style
2025-11-03 10:08:01 +08:00
freeliuzc
f44f4bafd1
support mtp in splitwise and scheduler_v1 mode ( #4743 )
2025-11-03 10:07:15 +08:00
lizexu123
4ac6de9a3c
[Feature] support pooling model runner ( #4590 )
...
* support qwen3-embedding
* support qwen3-embedding-0.6b
* fix
* fix bug
* fix test_return_token_ids.py and update enable_thinking
* fix mtp dummy_run
* merge develop
* fix np.float32
* delete FD_DISABLE_CHUNKED_PREFILL and FD_USE_GET_SAVE_OUTPUT_V1
* delete and build_stream_transfer_data
* fix test_update_v1:
* fix
* fix
* update dummy_run post_process
* delete test_update_v1
* fix
* fix dummy_run
* fix model_path
* fix model_path
* fix dummy_run
2025-10-31 22:32:05 +08:00
Yuanle Liu
b301bd6c31
[BugFix] fix thinking bug ( #4710 )
...
* fix thinking bug
* fix ut
* update
* fix
2025-10-31 22:00:31 +08:00
ddchenhao66
3cbca75cc8
[XPU] xpu support neox style ROPE ( #4719 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-31 18:14:25 +08:00
Jundong Liu
88a94c821b
[FDConfig] [PD Disaggregation] [Graph Optimization] Close Cudagraph for P node when PD Disaggregation ( #4632 )
...
* Close cudagraph for P node when PD Disaggregation
* fix problem
2025-10-31 16:44:25 +08:00
AIbin
316f784016
fix wint2 config ( #4721 )
2025-10-31 15:44:14 +08:00
kevin
c801d31c9c
add checker ( #4711 )
2025-10-31 15:26:35 +08:00
kevin
096d87d335
fix bug ( #4679 )
2025-10-31 14:59:18 +08:00
李泳桦
0f75b62de2
[BugFix] Fix profile run in pd-disaggregated deployment ( #4584 )
...
* [fix] fix pd+dp+ep bug
* [fix] fix again
* [ci] fix code style
2025-10-31 14:42:00 +08:00
kevin
64e875b460
[Scheduler] update v1 prefill batch ( #4611 )
...
* update v1 prefill batch
* update code
* update code
2025-10-31 14:03:01 +08:00
Sunny-bot1
9b18f0b55d
cache scale load ( #4624 )
2025-10-31 11:58:33 +08:00
GoldPancake
1f3ce65b58
[Feature] support mtp distribution equivalence verification ( #4699 )
2025-10-31 11:45:04 +08:00
Ryan
28de91b50f
[Graph Optimization] SOT+CUDAGraph support ERNIE4.5T VL 28B / 424B ( #4645 )
...
* 45TVL support sot+CUDAGraph
* mv unit test from ce_deploy to e2e
* add test_EB_VL_Lite_sot_serving
* rm useless line
* add openai_client
* fix unitest && reduce computing resources
2025-10-31 11:38:43 +08:00
kxz2002
a2870ed4a9
[Feature] Unify the registration name recognition for tool_parser and reasoning_parser to “-” ( #4668 )
...
* parser register name unify
* change ernie_x1 to ernie-x1
* change ernie4_5_vl to ernie-45-vl
* fix unit test
2025-10-31 10:45:27 +08:00
kxz2002
82bd7e5db4
[BugFix] Fix finish reason in _create_chat_completion_choice ( #4582 )
...
* fix n_param _create_chat_completion_choicel
* fix unit test
* fix final_res
* modify unit tests
2025-10-31 10:42:19 +08:00