李泳桦 | d18a637a17 | [feat] add metrics for yiyan adapter (#3219) | 2025-08-21 16:58:10 +08:00
  * [feat] add metrics for yiyan adapter
  * [fix] fix metrics num_requests_waiting and num_requests_running
  * [fix] fix metrics gpu_cache_usage_perc
  * [refactor] change where requests_number increases
  * [chore] rename xxx_block_num as xxx_gpu_block_num, and update their values accordingly
  * [chore] delete useless code

chenjian | d2f6c3b998 | [Bug fix] Fix bug for seq_len_encoder is 1 (#3467) | 2025-08-19 15:21:32 +08:00

ltd0924 | b20ffe3697 | [Feature] optimize expert parallel (#3196) | 2025-08-05 17:34:24 +08:00
  * optimize
  * Update expert_service.py
  * Update worker_process.py
  * optimize

Zero Rains | 25698d56d1 | polish code with new pre-commit rule (#2923) | 2025-07-19 23:19:27 +08:00

Jiang-Jia-Jun | 92c2cfa2e7 | Sync v2.0 version of code to github repo | 2025-06-29 23:29:37 +00:00

jiangjiajun | 684703fd72 | [LLM] First commit the llm deployment code | 2025-06-09 19:20:15 +08:00