Commit Graph

2927 Commits

Author SHA1 Message Date
Divano
91dc87f1c5 add some evil cases (#3240)
* add repetition early stop cases

* add repetition early stop cases

* add bad cases

* add bad cases

* add evil cases
2025-08-06 14:23:55 +08:00
xjkmfa
256a82b0b3 Add ci case for min token and max token (#3229)
Co-authored-by: xujing43 <xujing43@baidu.com>
2025-08-06 14:10:57 +08:00
Zero Rains
36dc73470d Fix the confused enable_early_stop when only set early_stop_config (#3214)
* fix the confused early_stop_config when only set early_stop_config

* pre-commit

* write a general method
2025-08-06 11:42:27 +08:00
YuanRisheng
a6e8b780f8 fix approve (#3224) 2025-08-06 10:36:01 +08:00
yangjianfengo1
89397516a8 [New Feature] Support W4Afp8 MoE GroupGemm (#3171)
* init

* add multi-threaded compilation

* fix bug

* fix bug

* code style

* add fp16

* replace print with assert

* fix stmatrix

* reduce unit test shape

* reduce unit test shape
2025-08-06 10:34:05 +08:00
sg263
841e831575 [Trace]add trace when fd start (#3174)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* fix annotation

* fix annotation when add opentelemetry

* fix opentelemetry-instrumentation-fastapi

* fix opentelemetry-bootstrap

* fix opentelemetry can not work in uvicorn

* move conf to env

* fd start add trace

* fix pre-commit

* fix pre-commit

* change FD_JOB_ID

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: shige <shige@baidu.com>
2025-08-05 21:18:27 +08:00
YUNSHEN XIE
e0bbd3b6ca fix approve ci (#3212)
2025-08-05 17:21:26 +08:00
Yuan Xiaolan
7ce00e597c support qk norm (#3145) 2025-08-05 16:46:14 +08:00
RAM
4a10e29804 fix mla attention backend (#3176) 2025-08-05 16:43:15 +08:00
Yuan Xiaolan
af543b7f0f revise get_moe_scores (#3164) 2025-08-05 16:43:07 +08:00
Divano
e24929efa3 Ce add bad cases (#3215)
* add repetition early stop cases

* add repetition early stop cases

* add bad cases

* add bad cases
2025-08-05 16:37:28 +08:00
lizexu123
b01cfd6007 [BugFix] support real batch_size (#3109)
* support real bsz

* fix

* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py

* add event_loop_ep

* fix

* Add comments

* fix

* support mtp real_batch_size

* fix

* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer

* fix

* fix VL real_seq_lens_this_time

* fix

* fix mtp

* fix

* fix mtp

* fix xpu

* fix
2025-08-05 16:33:54 +08:00
Jiang-Jia-Jun
55939f7942 Update engine.py 2025-08-05 16:10:36 +08:00
chen
04fc7eb931 fix test_air_top_p_sampling name (#3211) 2025-08-05 15:47:50 +08:00
Divano
9f1936ae28 Ce add repetition early stop cases (#3213)
* add repetition early stop cases

* add repetition early stop cases
2025-08-05 15:47:28 +08:00
RichardWooSJTU
1e9a8e8cef fix lm head bias (#3185)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:24 +08:00
RichardWooSJTU
f5c64a074c [EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization (#3182)
* Add support for mixed-ep across multi nodes

* code refine

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:11 +08:00
ming1753
14ed75f7d3 [Test] scaled_gemm_f8_i4_f16 skip test while sm != 89 (#3210) 2025-08-05 15:25:28 +08:00
yangjianfengo1
40f7f3e0d8 [New Feature] fa3 supports flash mask (#3184)
* support flash mask

* update test_flash_mask

* update test.sh
2025-08-05 12:20:48 +08:00
YUNSHEN XIE
b8f3c73aac fix coverage report (#3198)
* fix coverage report

* fix
2025-08-05 11:24:55 +08:00
Divano
fb7a0689cc add more cases (#3207) 2025-08-05 11:17:36 +08:00
RAM
c593e1a39c [Bug Fix]Fix bug of append attention test case (#3202)
2025-08-05 11:04:45 +08:00
RichardWooSJTU
e39159f3bd Add switch to apply fine-grained per token quant fp8 (#3192)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-04 19:54:03 -07:00
Divano
88596c0c63 Add more base chat cases (#3203)
* add test base class

* fix codestyle

* fix codestyle

* add base chat
2025-08-05 10:24:12 +08:00
lizhenyun01
fe540f6caa [plugin] Custom model_runner/model support (#3186)
* support custom model&&model_runner

* fix merge

* add test && update doc

* fix codestyle

* fix unittest

* load model in rl
2025-08-04 18:52:39 -07:00
Sunny-bot1
72ef5a9c93 [FIX]fix bad_words when sending requests consecutively (#3197)
* fix bad_words

* fix log

* fix log
2025-08-04 05:59:41 -07:00
Yuan Xiaolan
1f8289e106 fix expertwise_scale (#3181) 2025-08-04 20:06:15 +08:00
YuBaoku
3eb9a5df60 [CI] add test_compare_top_logprobs (#3191) 2025-08-04 19:49:24 +08:00
SunLei
68bc1d12c0 [Bugfix] Fix uninitialized decoded_token and add corresponding unit test. (#3195) 2025-08-04 19:23:58 +08:00
Longzhi Wang
01d7586661 [Bug fix] Fix cudagraph when use ep. (#3130)
* fix cudagraph when use ep

* fix typo

* reduce full length to adapt to large bsz such as 128/256
2025-08-04 18:06:18 +08:00
周周周
2bd8a50649 remove useless code (#3166) 2025-08-04 18:03:08 +08:00
gaoziyuan
0443587a57 [Feature] support qwen3 name_mapping (#3179)
* add fd plugins && rm model_classes

* fix reviews

* add docs

* fix

* fix unittest ci

* support qwen3 name_mapping
2025-08-04 01:34:07 -07:00
Zero Rains
17f51f0c92 [unittest] fix the bug in test_sampler (#3157) 2025-08-04 01:23:25 -07:00
YuanRisheng
79bbacc152 Fix approve shell scripts (#3108)
* fix approve

* fix
2025-08-04 15:51:33 +08:00
Divano
3bfb2eca92 Update test_base_chat.py (#3183) 2025-08-04 15:09:53 +08:00
ltd0924
c9e6ce1518 Update cache_messager.py (#3172) 2025-08-04 14:32:34 +08:00
gaoziyuan
4021d66ea5 [Feature] add fd plugins && rm model_classes (#3123)
* add fd plugins && rm model_classes

* fix reviews

* add docs

* fix

* fix unittest ci
2025-08-03 19:53:20 -07:00
bukejiyu
1582814905 fix load_pre_sharded_checkpoint (#3152)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-04 10:44:20 +08:00
Divano
66d3bb89ad Update __init__.py (#3163)
Improve compatibility of the test base class
2025-08-04 09:40:09 +08:00
AIbin
22fe695f1c [Inference Optimize] Support automatic generation of marlin kernel (#3149)
* Support automatic generation of marlin kernel
2025-08-01 22:43:18 +08:00
ApplEOFDiscord
b71cbb466d [Feature] remove dependency on enable_mm and refine multimodal's code (#3014)
* remove dependency on enable_mm

* fix codestyle check error

* fix codestyle check error

* update docs

* resolve conflicts on model config

* fix unit test error

* fix code style check error

---------

Co-authored-by: shige <1021937542@qq.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-01 20:01:18 +08:00
plusNew001
243394044d [XPU] Update XPU dockerfiles (#3144)
* [CI] add xpu ci case

* [CI]Update run_ci_xpu.sh

* [XPU]Update Dockerfile.xpu

* Update Dockerfile.xpu
2025-08-01 19:41:59 +08:00
Zhang Yulong
0eb32bb9c8 add cases (#3155) 2025-08-01 18:38:57 +08:00
yangjianfengo1
64d7a3194d Support fa3 in centralized deployment (#3112) 2025-08-01 18:03:36 +08:00
YUNSHEN XIE
bdb83e007d fix ci (#3141) 2025-08-01 17:42:26 +08:00
Divano
50db0d7ba9 add case (#3150)
* add test base class

* fix codestyle

* fix codestyle

* add base chat
2025-08-01 17:30:58 +08:00
Ryan
94264bbf60 [Code Simplification] Refactor Post-processing in VL Model Forward Method (#2937)
* rm sth useless

* refactor model forward

* mv bool index to kernel
2025-08-01 17:28:07 +08:00
yinwei
3a4db15765 Fix out-of-memory issue during single-XPU deployment (#3133) 2025-08-01 17:12:03 +08:00
JYChen
c34088b0fd fix stop seq unittest (#3126) 2025-08-01 16:50:05 +08:00
ming1753
fc5f43c6bc [Docs] Optimal Deployment (#2768) 2025-08-01 11:56:27 +08:00