Commit Graph

3005 Commits

Author SHA1 Message Date
Divano
566badb83c Update _base_test.yml (#3298) 2025-08-11 09:40:14 +08:00
Divano
eaae4a580d Split cases (#3297)
* add repitation early stop cases

* add repitation early stop cases

* split repetition_early_stop from the base test
2025-08-11 09:38:35 +08:00
chenjian
c011cb8b16 [Bug Fix] Fix scheduler bug in develop (#3292)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Fix scheduler bug in develop

* Fix scheduler bug in develop

* Fix scheduler bug in develop
2025-08-10 13:55:38 +08:00
Jundong Liu
1e4968e810 [Excutor] Fixed the issue of CUDA graph execution failure caused by different branches during decoding (#3223)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* 彻底解决解码切块问题

* update C8 and C4 kernel

* fix problem

* fix with pre-commit

* retain branch for mtp
2025-08-09 07:37:19 +08:00
ltd0924
31d4fcb425 [BugFix] fix too many open files problem (#3256)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Update cache_messager.py

* fix too many open files problem

* fix too many open files problem

* fix too many open files problem

* fix ci bugs

* Update api_server.py

* add parameter

* format

* format

* format

* format

* Update parameters.md

* Update parameters.md

* Update serving_completion.py

* Update serving_chat.py

* Update envs.py

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-08 20:10:11 +08:00
YUNSHEN XIE
22255a65aa add base test ci (#3225) 2025-08-08 19:08:55 +08:00
gaoziyuan
a799d14df1 [Bugfix] Fix model accuracy in some ops (#3231)
* fix noaux_tc op

* fix

* update

* fix qk norm

* fix linear for prequant loader

* test

* fix

* fix

* rm some print

* fix noaux_tc op

* test

* Fix the confused enable_early_stop when only set early_stop_config (#3214)

* fix the confused early_stop_config when only set early_stop_config

* pre-commit

* write a general method

* Add ci case for min token and max token (#3229)

Co-authored-by: xujing43 <xujing43@baidu.com>

* add some evil cases (#3240)

* add repitation early stop cases

* add repitation early stop cases

* add bad cases

* add bad cases

* add evil cases

* qwen3_moe (#3084)

* [Feature] support seed parameter (#3161)

* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to defualt

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix

* 【Fix Bug】 修复 fa3 支持集中式bug (#3235)

* fix fa3 集中式bug

* 增加qknorm参数

* fix qk norm

* fix

* update

* fix linear for prequant loader

* fix

* fix

* rm some print

* fix

* fix moe init weight&scale

* fix moe init weight&scale

---------

Co-authored-by: bukejiyu <395822456@qq.com>
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: Zero Rains <linjunlu@zerorains.top>
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com>
Co-authored-by: xujing43 <xujing43@baidu.com>
Co-authored-by: Divano <dddivano@outlook.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com>
Co-authored-by: qingqing01 <dangqingqing@baidu.com>
2025-08-08 17:30:37 +08:00
Zero Rains
ce1f353c70 Move create_parameters to __init__ in FuseMOE for CultassBackend and TritonBackend (#3148)
* w4a8 bug

* fix w4a8 bug

* remove code

* modify the triton backend

* fix ep

* fix the bug with tensor_wise_fp8 in triton backend

* fix the RL

* fix bug by merge

* fix the bug in w4a8

* fix the tensor_wise_fp8 bug

* fix RL
2025-08-08 15:55:47 +08:00
plusNew001
d0e9a70380 [CI] add CI logprobs case (#3189)
* [ci] add CI case

* [ci] add CI case

* [ci] add CI case

* [ci] add CI case

---------

Co-authored-by: ZhangYulongg <1272816783@qq.com>
2025-08-08 15:47:55 +08:00
freeliuzc
71267840f7 【Fix】fix mtp bug (#3139) 2025-08-08 13:30:12 +08:00
bukejiyu
b76b17fc1b qwen3 0.3B fix (#3255)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-08 11:35:40 +08:00
Yuanle Liu
fac2f64837 delete parallel_state.py (#3250) 2025-08-08 11:03:29 +08:00
yzwu
fbdd6b0663 [Iluvatar GPU] Optimze attention and moe performance (#3234) 2025-08-08 10:51:24 +08:00
bukejiyu
37569cca86 [feat]add fast_weights_iterator (#3258)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* add fast_weights_iterator

* update

* update
2025-08-07 22:36:46 +08:00
chenjian
5f0b30f6d0 support logprob in scheduler v1 (#3249)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-07 20:14:01 +08:00
Yzc216
6037dd5d9c [fix] multi source download (#3259)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit

* Change default download

* change requirements.txt

* modify English Documentation

* documentation

* modify model download path

* add requirements

* error optimization

* 连接失败兜底

* 连接失败兜底

* 连接失败兜底

* unit test

* unit test

* unit test

* test

* test

* 兜底修改

* Trigger CI
2025-08-07 19:30:39 +08:00
JYChen
9423c577fe [stop_seq] fix out-bound value for stop sequence (#3216)
* fix out-bound value for stop sequence

* catch error if there are out-of-bounds value

* check in offline mode

* add ut tests
2025-08-07 15:40:21 +08:00
Divano
5885285e57 Ce add benchmark test (#3262)
* add repitation early stop cases

* add repitation early stop cases

* add bad cases

* add bad cases

* add evil cases

* add benchmark gsm8k
2025-08-07 15:28:30 +08:00
YuBaoku
55ac449c31 [CI] remove useless case (#3261) 2025-08-07 15:09:40 +08:00
RAM
820798aec5 [Executor]Update graph test case and delete test_attention (#3257)
* 1.update graph test case 2.delete test_attention

* code style

* delete print
2025-08-07 14:05:15 +08:00
YuanRisheng
0074b423a9 fix ci bug (#3239)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-07 11:32:39 +08:00
hong19860320
93a1731891 [Doc] Update deps and fix dead links (#3252) 2025-08-07 11:04:31 +08:00
李泳桦
09cc4e2802 [fix] fix completion stream api output_tokens not in usage (#3247) 2025-08-07 10:36:00 +08:00
Yzc216
d9e3f88f9e [Feature] multi source download (#3125)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit

* Change default download

* change requirements.txt

* modify English Documentation

* documentation

* modify model download path

* add requirements

* error optimization

* 连接失败兜底

* 连接失败兜底

* 连接失败兜底

* unit test

* unit test

* unit test

* test

* test
2025-08-07 00:40:27 +08:00
bukejiyu
9408e667a5 [bugfix]fix blockwisefp8 and all_reduce (#3243)
* fix

* update

* fix linear for prequant loader
2025-08-06 23:54:33 +08:00
yangjianfengo1
3a15e0c53e 【Fix Bug】 修复 fa3 支持集中式bug (#3235)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix fa3 集中式bug

* 增加qknorm参数
2025-08-06 16:24:27 +08:00
lizexu123
afff4d37ea [Feature] support seed parameter (#3161)
* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to defualt

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix
2025-08-06 15:20:47 +08:00
bukejiyu
20839abccf qwen3_moe (#3084) 2025-08-06 14:45:27 +08:00
Divano
91dc87f1c5 add some evil cases (#3240)
* add repitation early stop cases

* add repitation early stop cases

* add bad cases

* add bad cases

* add evil cases
2025-08-06 14:23:55 +08:00
xjkmfa
256a82b0b3 Add ci case for min token and max token (#3229)
Co-authored-by: xujing43 <xujing43@baidu.com>
2025-08-06 14:10:57 +08:00
Zero Rains
36dc73470d Fix the confused enable_early_stop when only set early_stop_config (#3214)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix the confused early_stop_config when only set early_stop_config

* pre-commit

* write a general method
2025-08-06 11:42:27 +08:00
YuanRisheng
a6e8b780f8 fix approve (#3224) 2025-08-06 10:36:01 +08:00
yangjianfengo1
89397516a8 [New Feature] Support W4Afp8 MoE GroupGemm (#3171)
* init

* 增加多线程编译

* fix bug

* fix bug

* code style

* 增加fp16

* 将print替换成assert

* 修复stmatrix

* 减小单测shape

* 减小单测shape
2025-08-06 10:34:05 +08:00
sg263
841e831575 [Trace]add trace when fd start (#3174)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* fix annotation

* fix annotation when add opentelemetry

* fix opentelemetry-instrumentation-fastapi

* fix pentelemetry-bootstrap

* fix opentelemetry can not work in uvicorn

* move conf to env

* fd start add trace

* fix pre-commit

* fix pre-commit

* change FD_JOB_ID

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: shige <shige@baidu.com>
2025-08-05 21:18:27 +08:00
YUNSHEN XIE
e0bbd3b6ca fix approve ci (#3212)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-05 17:21:26 +08:00
Yuan Xiaolan
7ce00e597c support qk norm (#3145) 2025-08-05 16:46:14 +08:00
RAM
4a10e29804 fix mla attention backend (#3176) 2025-08-05 16:43:15 +08:00
Yuan Xiaolan
af543b7f0f revise get_moe_scores (#3164) 2025-08-05 16:43:07 +08:00
Divano
e24929efa3 Ce add bad cases (#3215)
* add repitation early stop cases

* add repitation early stop cases

* add bad cases

* add bad cases
2025-08-05 16:37:28 +08:00
lizexu123
b01cfd6007 [BugFix] support real batch_size (#3109)
* support real bsz

* fix

* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py

* add event_loop_ep

* fix

* Add comments

* fix

* support mtp real_batch_size

* fix

* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer

* fix

* fix VL real_seq_lens_this_time

* fix

* fix mtp

* fix

* fix mtp

* fix xpu

* fix
2025-08-05 16:33:54 +08:00
Jiang-Jia-Jun
55939f7942 Update engine.py 2025-08-05 16:10:36 +08:00
chen
04fc7eb931 fix test_air_top_p_sampling name (#3211) 2025-08-05 15:47:50 +08:00
Divano
9f1936ae28 Ce add repitation early stop cases (#3213)
* add repitation early stop cases

* add repitation early stop cases
2025-08-05 15:47:28 +08:00
RichardWooSJTU
1e9a8e8cef fix lm head bias (#3185)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:24 +08:00
RichardWooSJTU
f5c64a074c [EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization (#3182)
* Add support for mixed-ep across multi nodes

* code refine

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:11 +08:00
ming1753
14ed75f7d3 [Test] scaled_gemm_f8_i4_f16 skip test while sm != 89 (#3210) 2025-08-05 15:25:28 +08:00
yangjianfengo1
40f7f3e0d8 [New Feature] fa3 支持flash mask (#3184)
* 支持flash mask

* 修改test_flash_mask

* 修改test.sh
2025-08-05 12:20:48 +08:00
YUNSHEN XIE
b8f3c73aac fix coverage report (#3198)
* fix coverage report

* fix
2025-08-05 11:24:55 +08:00
Divano
fb7a0689cc add more cases (#3207) 2025-08-05 11:17:36 +08:00
RAM
c593e1a39c [Bug Fix]Fix bug of append attention test case (#3202)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-05 11:04:45 +08:00