李泳桦
d18a637a17
[feat] add metrics for yiyan adapter (#3219)
...
* [feat] add metrics for yiyan adapter
* [fix] fix metrics num_requests_waiting and num_requests_running
* [fix] fix metrics gpu_cache_usage_perc
* [refactor] change where requests_number increases
* [chore] rename xxx_block_num as xxx_gpu_block_num, and update their values accordingly
* [chore] delete useless code
2025-08-21 16:58:10 +08:00
chenjian
aba94169dc
[Feature] Support batched tokens for EP (#3415)
...
* Support batched tokens for EP
* Support batched tokens for EP
* Support batched tokens for EP
* Support batched tokens for EP
* Support batched tokens for EP and fix bug
* Support batched tokens for EP and fix bug
* Support batched tokens for EP and fix bug
* Support batched tokens for EP and fix bug
2025-08-18 11:43:36 +08:00
chenjian
7573802a88
[Feature] Support mtp ep in fd (#3340)
...
* [Optimize] Add metrics for analysing perf
* Fix bug in mtp
2025-08-11 21:49:44 +08:00
chenjian
110f33a530
[Bug fix] Test td cache messager (#3242)
...
* support disable cache task in decode node
* fix bug
* Update engine.py
* Update expert_service.py
* Update splitwise_connector.py
* Optimize log for debug
* Optimize log for debug
* fix bug
---------
Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-06 15:52:45 +08:00
ltd0924
b20ffe3697
[Feature] optimize expert parallel (#3196)
...
* optimize
* Update expert_service.py
* Update worker_process.py
* optimize
2025-08-05 17:34:24 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule (#2923)
2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00