Commit Graph

3005 Commits

Author SHA1 Message Date
YUNSHEN XIE
fb410b5f4c Add unit test run and coverage report generation (#3011)
* Add unit test run and coverage report generation

* fix

* fix: upload coverage report failure

* fix

* update

* fix

* fix

* update
2025-07-27 22:48:34 +08:00
YUNSHEN XIE
1d29dd80f7 modified dockerfile (#3026)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-25 21:10:23 +08:00
李泳桦
69996a40da [feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3020)
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request

* [fix] pre-commit code check
2025-07-25 20:57:32 +08:00
Longzhi Wang
0700c90caa [Feat] support mixed ep (#2969)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Support mixed ep

* fix comment

* fix comment

* update mixep

* fix conflict

* fix typo

* update

* fix typo

* fix code style

* fix conflict
2025-07-25 15:29:30 +08:00
chen
332154f504 [feature] Support FA2 (#3009) 2025-07-25 14:09:00 +08:00
YuBaoku
4b02b96467 [CI] fix codestyle_check (#3015) 2025-07-25 14:02:34 +08:00
EnflameGCU
8c167e130c [GCU] Update post_process (#3012) 2025-07-25 11:03:03 +08:00
EnflameGCU
7634ffb709 [GCU] Add CI (#3006) 2025-07-25 10:59:29 +08:00
Jiang-Jia-Jun
6ce3a8a497 Update index.md 2025-07-25 10:32:47 +08:00
xiaoxiaohehe001
2970b00dfa [Feature] Support_eplb (#2997)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Feature] support_eplb

* [Feature] support_eplb

* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
littledgg
f37d00e856 [Model] Provide clearer error for missing KV cache quantization scales (#3007) 2025-07-24 20:15:00 +08:00
EnflameGCU
c40df1802e [GCU] Update to develop (#2988) 2025-07-24 19:30:52 +08:00
Yzc216
980126b83a [Feature] multi source download (#3005)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit

* Change default download

* change requirements.txt

* modify English Documentation

* documentation
2025-07-24 17:42:09 +08:00
Zero Rains
0fb37ab7e4 update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12

* polish code
2025-07-24 01:43:31 -07:00
Zhang Yulong
5151bc92c8 Update benchmark tools (#3004)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* update benchmark tools

* update benchmark tools
2025-07-24 15:19:23 +08:00
ltd0924
f935d6f862 [BugFix] fix multinode deployment (#2977) 2025-07-24 15:04:04 +08:00
ltd0924
3792345c3a [LLM] update function name (#2985)
* [LLM] update function name
2025-07-24 15:03:40 +08:00
Yzc216
e14587a954 [Feature] multi-source download (#2986)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit
2025-07-24 14:26:37 +08:00
YUNSHEN XIE
87a2f4191d add ci reuse action (#2968)
* add ci reuse action

* fix code formatting

* update
2025-07-24 14:24:10 +08:00
xiaoxiaohehe001
2c0ff068e2 [Fix] fix mm ep empty run (#2999) 2025-07-24 14:15:55 +08:00
xiegegege
e3a843f2c5 [benchmark] add quantization for benchmark yaml (#2995) 2025-07-24 13:26:34 +08:00
lizhenyun01
6235ef3881 fix chunk_prefill 2025-07-24 12:00:52 +08:00
lizhenyun01
29c3292f02 support c4 attn && fix cache 2025-07-24 12:00:52 +08:00
lizexu123
832d25334a [Code Simplification] fix init_distributed_environment() (#2982) 2025-07-24 11:43:28 +08:00
bukejiyu
bfeb664ab8 update (#2978)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-24 00:16:42 +08:00
chenjian
85a78d695d [Feature] Support block scheduler v1 for FD (#2928)
* Support FD block scheduler v1

* Support FD block scheduler v1

* Support FD block scheduler v1

* Fix according to copilot review

* Fix according to review

* Remove is_dummy

* Fix bug when real_bsz=1

* Fix infer first token cost time

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-23 20:31:31 +08:00
Zero Rains
ca0f71bd39 polish code for prefill restrictions (#2991) 2025-07-23 05:10:14 -07:00
chen
172e69fe17 FA3 fix bug (#2987) 2025-07-23 19:07:43 +08:00
zhink
1272c7ce98 Fix performance degradation bug of custom_all_reduce (#2981) 2025-07-23 17:45:44 +08:00
Zero Rains
850c9d98d4 [BugFix] Add prefill restrictions for chunked_prefill+VL (#2983) 2025-07-23 01:45:57 -07:00
freeliuzc
a39a67334c fix mtp bug in pd-split mode (#2970)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-23 15:31:16 +08:00
YuBaoku
6c4cfd9359 [CI] add codestyle_check action (#2972)
* [CI] add codestyle_check action

* [CI] Integrate codestyle check via pre-commit in GitHub Actions
2025-07-23 15:21:56 +08:00
lizexu123
9b22b8d2c3 delete max-len (#2959) 2025-07-23 15:11:39 +08:00
Jiang-Jia-Jun
5b59a97030 Update README.md 2025-07-23 13:52:14 +08:00
Jiang-Jia-Jun
475dc6d84e Update README.md 2025-07-23 13:47:31 +08:00
chen
ad202272ed 【Infer】Improve the performance block_wise_fp8 of triton_moe_backend (#2942) 2025-07-23 13:02:50 +08:00
lizhenyun01
e51f018577 support chunk_prefill in fa3 2025-07-23 12:19:20 +08:00
Ryan
95b5af24db [SOT] Add sot warmup (NVIDIA GPU Only) (#2929)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* add sot warmup

* fix code style

* change batch_size list

* add param to config

* rm free_list settings && set sot_warmup_sizes

* finish debug with dynamic dims by type annotations

* add profile_run guard

* rm sth useless
2025-07-22 21:36:14 +08:00
Sunny-bot1
7c5e34e72d [FIX]fix rejection sampling when topp=0 using _SAMPLING_EPS (#2967)
* fix rejection sampling when topp=0

* fix
2025-07-22 05:53:37 -07:00
gaoziyuan
dbe6225b33 fix rl config local rank (#2957) 2025-07-22 04:39:54 -07:00
GoldPancake
9b84d51e25 [MTP Fix] Fix code and register cpp operators (#2965) 2025-07-22 19:36:24 +08:00
K11OntheBoat
93bb68aa71 [Feature] Marlin MoE backend supports DeepseekV3 (#2962)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 18:11:15 +08:00
GoldPancake
dc67c10a7e [Feature][MTP]Support multi-step MTP (#2952) 2025-07-22 16:26:29 +08:00
luukunn
920e6b3f60 [Fix]fix empty prompt_token_ids,update the parser's triggering condit… (#2891) 2025-07-22 16:13:05 +08:00
Zero Rains
89a485b69f [Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching but still have some bug with profile

Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397

* add the signal to make sure cache manager launched

* fix judge condition

* reomove useless control

* update control stream

* update

* fix xpu

* change the do_profile flag

* update

* add new threads to init cache_manager

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Nyakku Shigure
48e6a0ca26 [SOT] Mark dynamic dims by type annotations (#2771)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [SOT] Mark dynamic dims by type annotations

* fix conflict of forward_meta

* mark more attn backend

* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS

* auto infer implicit 0 dim dynamic dim

* revert manual marked dims

* revert missing update

* auto infer can use unsafe code in warmup stage

* check -> type_match

* fix codestyle

* restore blank line

* empty commit

* add need_warmup nonlocal;

* add doc for resolver

* add missing type hints

* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
K11OntheBoat
e991777757 [Feature] DeepseekV3 use pd_build_static_op (#2948)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 15:03:41 +08:00
李泳桦
2a8a2c06de [fix] non-streaming api now returns full output ids if return_token_ids is enabled (#2951) 2025-07-22 14:35:56 +08:00
lifulll
2c6a9e887e native top_p_sampling (#2901) 2025-07-22 14:09:59 +08:00
gaoziyuan
0eedbdaee0 fix import error (#2944) 2025-07-22 14:06:01 +08:00