Commit Graph

3021 Commits

Author SHA1 Message Date
yinwei
354575b6d1 [Docs]Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95 (#3428)
* XPU Update 2.1 Release Documentation

* code style check

* Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95
2025-08-15 18:34:37 +08:00
YUNSHEN XIE
cc8ee50f27 add accuracy check ci (#3389)
* add accuracy ci

* fix

* fix

* update

* rename ci jobs
2025-08-15 15:17:43 +08:00
GoldPancake
4bd6a9fa7d [Bugs] Fix DeepGEMM pre-compile tools. (#3351)
Fix some miss cache problems.
Add README.md.
2025-08-15 14:37:49 +08:00
ming1753
d4e3a20300 [Docs] Release 2.1 docs and fix some description (#3424) 2025-08-15 14:27:19 +08:00
yinwei
fbb6dcb9e4 [Docs]XPU Update 2.1 Release Documentation (#3423)
* XPU Update 2.1 Release Documentation

* code style check
2025-08-15 14:07:47 +08:00
JYChen
562e01c979 update docs (#3420) 2025-08-15 13:00:08 +08:00
Jiang-Jia-Jun
cca96ab1e4 Update Dockerfile.gpu 2025-08-15 12:29:20 +08:00
Jiang-Jia-Jun
7132fa9ec2 Update dockerfile 2025-08-15 12:28:08 +08:00
Sunny-bot1
6c1f3ff897 topk_gating_softmax support bias (#3405) 2025-08-15 11:57:45 +08:00
ltd0924
5a84324798 [Doc] Add multinode deployment documents (#3417)
* Create multi-node_deployment.md

* Create multi-node_deployment.md

* Update mkdocs.yml
2025-08-15 10:37:04 +08:00
chen
f0f00a6025 [OPs] Universal optimization and Fix early_stop cuda 700 (#3375)
* delete nonzero

* delete setup_ops_base.py

* check if

* check gcp infer_seed.cpu()

* fix repetition_early_stopper_kernel cuda 700
2025-08-14 22:40:44 +08:00
YuanRisheng
09c979f3dd [V1 Loader] Support Ernie text(moe and dense) (#3110)
* new loader support 0.3B

* fix weight

* support parallel load

* support parallel load

* fix slice

* support moe

* delete code

* perfect code

* perfect code
2025-08-14 20:25:28 +08:00
xjkmfa
ab60292f89 【CI】 evil case (#3359)
* Add ci case for min token and max token

* 【CI case】include total_tokens in the last packet of completion interface stream output

* Boundary checks, adversarial tests

* Boundary checks, adversarial tests

* Boundary checks, adversarial tests

* Boundary checks, adversarial tests

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2025-08-14 20:00:47 +08:00
freeliuzc
cacc52bf21 modify readme (#3409) 2025-08-14 19:47:36 +08:00
Sunny-bot1
79d8ae4c38 [UT Fix] Fix bad_words test (#3385)
* fix bad_words test

* add streaming

* fix

* fix
2025-08-14 03:55:02 -07:00
lzy
1e06b9fa6d make append_attn supports mask_offset (#3138)
* make append_attn supports mask_offset

* add unittest
2025-08-14 03:40:55 -07:00
memoryCoderC
6031f9a5f5 [BugFix] fix ErnieProcessor not set raw_prediction (#3400) 2025-08-14 18:07:49 +08:00
YUNSHEN XIE
f72db9386c Add requirements for running unit tests (#3350)
* Add requirements for running unit tests

* update
2025-08-14 17:37:18 +08:00
lizexu123
7b596d0877 [BugFix] fix real_bsz in ep (#3366)
* Your commit message here

* fix ep

* delete cuda_graph
2025-08-14 17:31:19 +08:00
gaoziyuan
0ea8712018 fix op tests (#3398) 2025-08-14 16:45:25 +08:00
Sunny-bot1
2e7831185f [Optimize]Add norm_weights feature for topk_gating_softmax (#3372)
2025-08-14 15:05:23 +08:00
Jiang-Jia-Jun
666ab65a51 [Polish Code] Remove useless notes 2025-08-14 14:04:52 +08:00
Jiang-Jia-Jun
dd583fb16a [BugFix] Fix default log level of paddleformers (#3376)
* [BugFix] Fix default log level of paddleformers

* [BugFix] Fix default log level of paddleformers

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-08-14 11:36:24 +08:00
xiaolei373
d4f610e4cd feat(log):add_request_and_response_log (#3373)
2025-08-13 23:27:41 +08:00
ming1753
396dba0d62 [Bug Fix] Fix V1 video bug (#3388)
2025-08-13 23:04:07 +08:00
YUNSHEN XIE
1ace375fc3 Optimize CI execution workflow (#3371)
* Optimize CI execution workflow

* fix
2025-08-13 18:47:31 +08:00
Zero Rains
be94bdd0b0 [Loader V1] modify layername for DeepSeekV3 (#3336)
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
2025-08-13 15:47:06 +08:00
memoryCoderC
f702a675a1 fix TestOpenAIServingCompletion fail (#3368) 2025-08-13 15:45:07 +08:00
EnflameGCU
d1a92e3e17 [GCU] Enable gcu CI (#3190)
* [GCU] Update to the latest version

* [GCU] Enable CI
2025-08-13 11:48:24 +08:00
yzwu
ce9180241e [Iluvatar GPU] Modify the names of some variables (#3273) 2025-08-13 11:38:02 +08:00
Kane2011
b4fef2cf29 [MetaxGPU] Support FastDeploy on metax gpu (#3241)
* [MetaxGPU] Support FastDeploy on metax gpu

* Update metax_worker.py

1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;

* Update __init__.py

1. remove metax's key work comment

* Update __init__.py

1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import

---------

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-08-13 11:11:54 +08:00
Ryan
ed6bff215a fix custom op order rms_norm_eps (#3348) 2025-08-13 10:12:49 +08:00
Sunny-bot1
8224b21525 Refactor moe_topk_select op to use apply_norm_weight as a template parameter (#3345)
* Refactor moe_topk_select op to use apply_norm_weight as a template parameter

* update test
2025-08-13 08:44:16 +08:00
luukunn
eda83ca672 add Tool Parser (#3272)
* add tool-parser

* add tool-parser

* add tool parser

* add tool parser

* fix

* add offline

* add offline

* fix

* parsers:tool&reasoning

* Rename tool parser

* update

* fix reasoning-parser

* add requirements

* fix finish reason

* fix

* fix reasoning-parser

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>
2025-08-13 01:06:55 +08:00
memoryCoderC
2d1a4cacdf Completion add raw_prediction/text_after_process (#3356) 2025-08-12 23:06:45 +08:00
zhink
2c0d853067 add test for CustomAllreduce (#3313)
2025-08-12 20:44:47 +08:00
YUNSHEN XIE
8791ad4e61 Pre ce modified (#3335)
* update

* update

* fix

* fix

* update

* update

* update

* fix

* update
2025-08-12 20:25:03 +08:00
memoryCoderC
c575611a5b [BugFix] v1/completions add finish_reason (#3246)
* [BugFix] v1/completions add finish_reason

* update TestOpenAIServingCompletion for merge

---------

Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
2025-08-12 19:40:26 +08:00
Jiang-Jia-Jun
90bfa0be9c Update envs.py 2025-08-12 16:24:47 +08:00
Jiang-Jia-Jun
5620bd12de Update envs.py 2025-08-12 16:24:33 +08:00
YUNSHEN XIE
7d0d5a543a Use latest PaddlePaddle package (#3347)
* Use latest PaddlePaddle package

* fix
2025-08-12 16:23:41 +08:00
gaoziyuan
ccc7f1beb3 fix mapping (#3320) 2025-08-12 16:15:59 +08:00
RichardWooSJTU
283da92bfa fix ep lm head (#3244)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-12 15:38:28 +08:00
ming1753
f5164215be [Bug Fix] fix vl V1 schedule bug (#3323)
* [Bug Fix] fix vl V1 schedule bug

* fix format
2025-08-12 11:31:39 +08:00
yangjianfengo1
b808c49585 [Doc] Add Chinese/English language switch (#3318)
* Add Chinese/English language switch

* Add Chinese/English language switch

* Update readme
2025-08-12 11:20:45 +08:00
chenjian
b21272d9ff [Bug fix] fix block num setting in scheduler v1 for develop (#3303)
* fix block num setting in scheduler v1

* fix block num setting in scheduler v1

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting
2025-08-12 10:38:51 +08:00
Jiang-Jia-Jun
183e3863e8 Remove useless code (#3337) 2025-08-12 10:32:31 +08:00
Sunny-bot1
19fda4e912 fix docs (#3332)
2025-08-11 21:03:49 +08:00
JYChen
973ddad91e fix unittest (#3328) 2025-08-11 20:58:24 +08:00
Divano
f27e879785 Update _base_test.yml (#3331) 2025-08-11 20:57:20 +08:00