chenjian
98b3647aad
[Fix] fix prefix cache in release21 (#3922)
* fix prefix cache in release21
* fix
* Fix when prompt ids are a numpy array
2025-09-08 11:33:59 +08:00
chenjian
ffec66097c
[optimize] Optimize prefix caching in v1 release/2.1 (#3823)
* [optimize] Optimize prefix caching in v1
* [optimize] Optimize prefix caching in v1
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-04 19:25:02 +08:00
chenjian
a424ab907f
[Bug fix] Fix prefix cache in v1 (#3710)
* [Bug fix] Fix prefix cache in V1
* add comment
2025-09-01 10:14:25 +08:00
Zero Rains
64cf769bee
fix the bug when num_key_value_heads < tensor_parallel_size (#3722)
2025-08-30 12:40:29 +08:00
李泳桦
aad9d3564e
[feat] add metrics for yiyan adapter (#3615)
* [feat] add metrics for yiyan adapter (#3219)
* [feat] add metrics for yiyan adapter
* [fix] fix metrics num_requests_waiting and num_requests_running
* [fix] fix metrics gpu_cache_usage_perc
* [refactor] change where requests_number increases
* [chore] rename xxx_block_num to xxx_gpu_block_num, and update their values accordingly
* [chore] delete useless code
* [fix] fix error
2025-08-28 21:16:58 +08:00
chenjian
25f51b0611
Fix block num in scheduler v1 for release 2.1 (#3315)
* fix bug for scheduler v0
* fix block num setting in scheduler v1 for release 2.1
* fix block num setting in scheduler v1 for release 2.1
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
2025-08-12 00:41:05 +08:00
kevin
22cab724e8
[Feature] block scheduler v1 support prefix caching (#3061)
* block scheduler v1 support prefix cache
* update code
* update code
* fix code bug
* add timeout time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:29:19 +08:00
YuanRisheng
7dfdd157ac
[BugFix] Fix ep size (#3092)
* fix ep
* fix num_layer
2025-07-30 21:03:12 +08:00
Zero Rains
0fb37ab7e4
update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12
* polish code
2025-07-24 01:43:31 -07:00
chenjian
85a78d695d
[Feature] Support block scheduler v1 for FD (#2928)
* Support FD block scheduler v1
* Support FD block scheduler v1
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-23 20:31:31 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule (#2923)
2025-07-19 23:19:27 +08:00
ltd0924
68b4755587
[LLM] support multi node deploy (#2708)
* [LLM] support multi node deploy
* Update engine.py
* fix bugs
* fix
* [LLM] support multi node deploy
* [LLM] support multi node deploy
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-06 10:33:51 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to GitHub repo
2025-06-29 23:29:37 +00:00