Commit Graph

10 Commits

Author SHA1 Message Date
Zero Rains
f206474cc7 fix the bug when num_key_value_heads < tensor_parallel_size (#3717) 2025-08-30 12:40:00 +08:00
kevin
67298cf4c0 add error traceback info (#3419)
* add error traceback info

* update error msg

* update code

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-19 19:32:04 +08:00
chenjian
b21272d9ff [Bug fix] fix block num setting in scheduler v1 for develop (#3303)
* fix block num setting in scheduler v1

* fix block num setting in scheduler v1

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting
2025-08-12 10:38:51 +08:00
kevin
22cab724e8 [Feature] block scheduler v1 support prefix caching (#3061)
* block scheduler v1 support prefix cache

* update code

* update code

* fix code bug

* add timeout time

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:29:19 +08:00
YuanRisheng
7dfdd157ac [BugFix]Fix ep size (#3092)
* fix ep

* fix num_layer
2025-07-30 21:03:12 +08:00
Zero Rains
0fb37ab7e4 update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12

* polish code
2025-07-24 01:43:31 -07:00
chenjian
85a78d695d [Feature] Support block scheduler v1 for FD (#2928)
* Support FD block scheduler v1

* Support FD block scheduler v1

* Support FD block scheduler v1

* Fix according to copilot review

* Fix according to review

* Remove is_dummy

* Fix bug when real_bsz=1

* Fix infer first token cost time

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-23 20:31:31 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
ltd0924
68b4755587 [LLM] support multi node deploy (#2708)
* [LLM] support multi node deploy

* Update engine.py

* fix bugs

* fix

* [LLM] support multi node deploy

* [LLM] support multi node deploy

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-06 10:33:51 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00