6 Commits

Author SHA1 Message Date
李泳桦
d18a637a17 [feat] add metrics for yiyan adapter (#3219)
* [feat] add metrics for yiyan adapter
* [fix] fix metrics num_requests_waiting and num_requests_running
* [fix] fix metrics gpu_cache_usage_perc
* [refactor] change where requests_number increases
* [chore] rename xxx_block_num as xxx_gpu_block_num, and update their values accordingly
* [chore] delete useless code
2025-08-21 16:58:10 +08:00
chenjian
d2f6c3b998 [Bug fix] Fix bug for seq_len_encoder is 1 (#3467) 2025-08-19 15:21:32 +08:00
ltd0924
b20ffe3697 [Feature] optimize expert parallel (#3196)
* optimize
* Update expert_service.py
* Update worker_process.py
* optimize
2025-08-05 17:34:24 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00