Zhang Yulong
33ff0bfe38
Update disaggregated.md ( #3495 )
...
Fix documentation errors
2025-08-20 19:39:18 +08:00
YUNSHEN XIE
e197894977
add e2e cases ( #3476 )
...
* add e2e cases
* fix
2025-08-20 18:50:14 +08:00
Zhang Yulong
9ff2dfb162
Create eb45-8k-fp8-tp1-dp8_ep.yaml ( #3485 )
...
YAML for hybrid-architecture EP parallelism
2025-08-20 14:33:54 +08:00
YuBaoku
33d369586b
[CI] remove useless case ( #3482 )
2025-08-20 14:20:30 +08:00
xiaolei373
5d131485d8
add error log to file ( #3431 )
...
* feat(log):add_request_and_response_log
* feat[log]:add error log to file
2025-08-20 09:52:34 +08:00
YUNSHEN XIE
3a6058e445
Add stable ci ( #3460 )
...
* add stable ci
* fix
* update
* fix
* rename tests dir;fix stable ci bug
* add timeout limit
* update
2025-08-20 08:57:17 +08:00
kevin
67298cf4c0
add error traceback info ( #3419 )
...
* add error traceback info
* update error msg
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-19 19:32:04 +08:00
yangjianfengo1
b047681c5d
【New Feature】Support 2:4 sparsity for Fp8 group Gemm ( #3463 )
...
* support 2:4 sparsity
* code style
* add stmatrix macro-definition check
* code style
2025-08-19 02:54:47 -07:00
ltd0924
d587fb257f
[CI] add test generation demo ( #3270 )
...
* Create test_generation.py
* update
* update
* format
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update setup.py
* Delete test/plugins/test_model_runner_register.py
---------
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
2025-08-19 17:12:40 +08:00
Zero Rains
fef447e350
[V1 Loader] Support MOE parameters create and load for DeepGemm and marlin backend ( #3447 )
...
* support deepgemm backend
* support marlin backend
* remove print
* fix process_prequanted_weights
2025-08-19 14:15:53 +08:00
chen
6735626014
fix request_output sampling_params ( #3154 ) ( #3464 )
2025-08-19 13:52:50 +08:00
ltd0924
bca8905b40
[BugFix] fix control signal release failed ( #3390 )
...
* [BugFix] fix control signal release failed
* [BugFix] fix control signal release failed
* update
* update
* update
2025-08-19 13:51:38 +08:00
Zero Rains
8b12c80f90
[FixBug] compute early stopping with real batch size ( #3418 )
...
* [FixBug] compute early stopping with real batch size
* update
* fix test_sampler
2025-08-18 22:09:21 -07:00
luukunn
3a7a20d191
[Feature] Pass through the chat_template_kwargs to the data processing module ( #3421 )
...
* fix chat_template_args
* fix args
* add offline
* add offline
* fix
* fix
* fix default enable_thinking value
* fix default enable_thinking value
* modify condition
* Revert "modify condition"
This reverts commit 26430bdeb1.
* fix unit test
2025-08-19 10:50:01 +08:00
lizexu123
a053ab889b
[BugFix] fix num_running_requests in cuda_graph ( #3457 )
...
* fix cuda_graph
* add note
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-08-19 10:47:22 +08:00
AIbin
beec24fd89
【Inference Optimize】DeepSeek-v3 model inference performance optimization ( #3455 )
...
* DSK_OPT_01
* update FA3
2025-08-19 10:42:42 +08:00
zhuzixuan
c95b3395e9
【BugFix】Support echo in the completion API ( #3245 )
...
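* wenxin-tools-511: fix v1/completion being unable to echo.
* support echo for multiple prompts
* support streaming echo with multiple prompts
* add unit tests for echo support in the completion API
* pre-commit
* remove redundant test files
* fix the unit-test methods for echo support in the completion API
* add unit test files
* add unit tests
* unittest
* add unit tests
* fix unit tests
* remove unnecessary assert.
* resubmit
* update test methods
* ut
* unit test to verify whether the approach is correct
* unit test to verify whether the approach is correct
* unit test to verify whether the approach is correct 3
* optimize unit-test code, narrowing the unit-test scope in a targeted way.
* optimize unit-test code 2, narrowing the unit-test scope in a targeted way.
* optimize unit-test code 3, narrowing the unit-test scope in a targeted way.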
* support 'echo' in chat/completion.
* update
* update
* update
* update
* update
* update
* add unit tests for tokenid
* update
* fix index error
* fix index error
2025-08-19 10:41:51 +08:00
lizexu123
32b39620bc
[Code Simplification] remove cum_offsets ( #3410 )
2025-08-18 20:21:25 +08:00
YUNSHEN XIE
2cf96ddd68
add publish workflow ( #3063 )
...
* add publish job
* update
* update
2025-08-18 16:42:36 +08:00
luukunn
9c129813f9
[Feature] add custom chat template ( #3251 )
...
* add custom chat_template
* add custom chat_template
* add unittest
* fix
* add docs
* fix comment
* add offline chat
* fix unit test
* fix unit test
* fix
* fix pre commit
* fix unit test
* add unit test
* add unit test
* add unit test
* fix pre_commit
* fix enable_thinking
* fix pre commit
* fix pre commit
* fix unit test
* add requirements
2025-08-18 16:34:08 +08:00
Jundong Liu
70ee910cd5
[Excutor] Change cudagraph hashkey from batch size to num_tokens ( #3454 )
2025-08-18 16:16:48 +08:00
Jundong Liu
ea4a3b479c
[Excutor] Increase buffer size to prevent address corruption; add forward metadata debug tool ( #3404 )
...
* fix insufficient buffer allocation; add a tool for printing forwardmetadata
* fix mistake
* Make CPU tensor in CPUPlace
* Add test about forward_meta_str and Add unitest_requirement
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-08-18 16:14:09 +08:00
chen
5585cf7aa5
fix mtp_rej_topp input ( #3450 )
2025-08-18 16:12:42 +08:00
Divano
246cd7b3a5
Perf ( #3453 )
...
* add repetition early stop cases
* add repetition early stop cases
* add stress tool
2025-08-18 15:37:46 +08:00
gaoziyuan
6fdd83da10
fix some bug ( #3434 )
2025-08-18 14:39:13 +08:00
freeliuzc
a12d0bc549
[Feature][MTP]update multi-draft-token strategy ( #3369 )
...
* update multi-draft-token strategy
* fix format
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-08-18 13:59:56 +08:00
Zhang Yulong
3ee6053e5d
Add ci case ( #3355 )
...
* add ci cases
* debug
debug H20 baseline
* Update run_pre_ce.sh
* Update test_EB_Lite_serving.py
* Update test_EB_VL_Lite_serving.py
* Update test_EB_Lite_serving_mtp.py
* Update test_Qwen3-MoE_serving.py
* Update test_Qwen2-7B-Instruct_serving.py
* Update run_pre_ce.sh
2025-08-18 11:35:56 +08:00
chen
e88f5552db
fix cpu __ini__.py ( #3448 )
2025-08-17 12:38:54 +08:00
RAM
33c0197ebe
[Docs] Update mkdocs.yml ( #3444 )
...
* Update docs of graph opt backend
* update best_practices
* update mkdocs.yaml
* [Docs]Update mkdocs.yml
2025-08-15 21:57:40 +08:00
RAM
154308102e
[Docs]Updata docs of graph opt backend ( #3442 )
...
* Update docs of graph opt backend
* update best_practices
2025-08-15 21:30:32 +08:00
yongqiangma
5703d7aa0f
update installation readme ( #3429 )
2025-08-15 19:09:41 +08:00
yangjianfengo1
615930bc05
Update README ( #3426 )
...
* update README
* code style
* code style
2025-08-15 18:46:28 +08:00
JYChen
6f11171478
fix some docs error ( #3439 )
2025-08-15 18:45:27 +08:00
yinwei
354575b6d1
[Docs]Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95 ( #3428 )
...
* XPU Update 2.1 Release Documentation
* code style check
* Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95
2025-08-15 18:34:37 +08:00
YUNSHEN XIE
cc8ee50f27
add accuracy check ci ( #3389 )
...
* add accuracy ci
* fix
* fix
* update
* rename ci jobs
2025-08-15 15:17:43 +08:00
GoldPancake
4bd6a9fa7d
[Bugs] Fix DeepGEMM pre-compile tools. ( #3351 )
...
Fix some miss cache problems.
Add README.md.
2025-08-15 14:37:49 +08:00
ming1753
d4e3a20300
[Docs] Release 2.1 docs and fix some description ( #3424 )
2025-08-15 14:27:19 +08:00
yinwei
fbb6dcb9e4
[Docs]XPU Update 2.1 Release Documentation ( #3423 )
...
* XPU Update 2.1 Release Documentation
* code style check
2025-08-15 14:07:47 +08:00
JYChen
562e01c979
update docs ( #3420 )
2025-08-15 13:00:08 +08:00
Jiang-Jia-Jun
cca96ab1e4
Update Dockerfile.gpu
2025-08-15 12:29:20 +08:00
Jiang-Jia-Jun
7132fa9ec2
Update dockerfile
2025-08-15 12:28:08 +08:00
Sunny-bot1
6c1f3ff897
topk_gating_softmax support bias ( #3405 )
2025-08-15 11:57:45 +08:00
ltd0924
5a84324798
[Doc] Add multinode deployment documents ( #3417 )
...
* Create multi-node_deployment.md
* Create multi-node_deployment.md
* Update mkdocs.yml
2025-08-15 10:37:04 +08:00
chen
f0f00a6025
[OPs] Universal optimization and Fix early_stop cuda 700 ( #3375 )
...
* delete nonzero
* delete setup_ops_base.py
* check if
* check gcp infer_seed.cpu()
* fix repetition_early_stopper_kernel cuda 700
2025-08-14 22:40:44 +08:00
YuanRisheng
09c979f3dd
[V1 Loader] Support Ernie text(moe and dense) ( #3110 )
...
* new loader support 0.3B
* fix weight
* support parallel load
* support parallel load
* fix slice
* support moe
* delete code
* perfect code
* perfect code
2025-08-14 20:25:28 +08:00
xjkmfa
ab60292f89
【CI】 evil case ( #3359 )
...
* Add ci case for min token and max token
* 【CI case】include total_tokens in the last packet of completion interface stream output
* boundary checks, adversarial tests
* boundary checks, adversarial tests
* boundary checks, adversarial tests
* boundary checks, adversarial tests
---------
Co-authored-by: xujing43 <xujing43@baidu.com >
2025-08-14 20:00:47 +08:00
freeliuzc
cacc52bf21
modify readme ( #3409 )
2025-08-14 19:47:36 +08:00
Sunny-bot1
79d8ae4c38
[UT Fix] Fix bad_words test ( #3385 )
...
* fix bad_words test
* add streaming
* fix
* fix
2025-08-14 03:55:02 -07:00
lzy
1e06b9fa6d
make append_attn supports mask_offset ( #3138 )
...
* make append_attn supports mask_offset
* add unittest
2025-08-14 03:40:55 -07:00
memoryCoderC
6031f9a5f5
[BugFix] fix ErnieProcessor not set raw_prediction ( #3400 )
2025-08-14 18:07:49 +08:00