qwes5s5
2ee91d7a96
[metrics] Add serveral observability metrics ( #3868 ) ( #4011 )
...
* [metrics] Add serveral observability metrics (#3868 )
* Add several observability metrics
* [wenxin-tools-584] 【可观测性】支持查看本节点的并发数、剩余block_size、排队请求数等信息
* adjust some metrics and md files
* trigger ci
* adjust ci file
* trigger ci
* trigger ci
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* version adjust
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-10 10:59:57 +08:00
chenjian
c49c43d51c
[Bug fix] Fix perf in mixed deployment with yiyan adpater ( #3703 )
...
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-01 14:06:09 +08:00
李泳桦
aad9d3564e
[feat] add metrics for yiyan adapter ( #3615 )
...
* [feat] add metrics for yiyan adapter (#3219 )
* [feat] add metrics for yiyan adapter
* [fix] fix metrics num_requests_waiting and num_requests_running
* [fix] fix metrics gpu_cache_usage_perc
* [refactor] change where requests_number increases
* [chore] rename xxx_block_num as xxx_gpu_block_num, and update their values accordingly
* [chore] delete useless code
* [fix] fix error
2025-08-28 21:16:58 +08:00
chenjian
edd13aad66
support logprob in v1 for release/2.1 ( #3446 )
2025-08-17 08:16:00 +08:00
chenjian
25f51b0611
Fix block num in schduelr v1 for release 2.1 ( #3315 )
...
* fix bug for scheduler v0
* fix block num setting in scheduler v1 for release 2.1
* fix block num setting in scheduler v1 for release 2.1
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
2025-08-12 00:41:05 +08:00
chenjian
c000cff744
fix scheduler bug in release2.1 ( #3295 )
2025-08-10 13:55:22 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs ( #2974 )
...
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 20:25:56 +08:00
ltd0924
ecf2fd5b9a
[BugFix] vl encoder tokens dtype problem ( #3069 )
2025-07-30 15:20:53 +08:00
zhuzixuan
ad7bb52a28
修复传入max_tokens=1时的报错 ( #3068 )
...
* 修复传入max_tokens=1时的报错
* 修复传入max_tokens=1时的报错
* 修复传入max_tokens=1时的报错
* 修复传入max_tokens=1时的报错
* 修复传入max_tokens=1时的报错
* 修复传入max_tokens=1时的报错
2025-07-29 23:49:28 +08:00
chenjian
85a78d695d
[Feature] Support block scheduler v1 for FD ( #2928 )
...
* Support FD block scheduler v1
* Support FD block scheduler v1
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-23 20:31:31 +08:00
GoldPancake
dc67c10a7e
[Feature][MTP]Support multi-step MTP ( #2952 )
2025-07-22 16:26:29 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
GoldPancake
42d4001400
[Features] Add speculative metrics ( #2857 )
2025-07-17 11:08:55 +08:00
ltd0924
d245d1ca6c
[LLM] support send batch data and aggregate data ( #2860 )
...
* [LLM] support send batch data and aggregate data
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] update
2025-07-16 23:42:20 +08:00
chen
d33105baeb
[Feature] Online Chat API Support Return logprobs ( #2777 )
...
* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform
2025-07-10 16:33:40 +08:00
EnflameGCU
d0f4d6ba3a
[GCU] Support gcu platform ( #2702 )
...
baseline: e7fa57ebae
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-07-08 13:00:52 +08:00
liddk1121
1b54a2831e
Adapt for iluvatar gpu ( #2684 )
2025-07-07 16:53:14 +08:00
ltd0924
00863c43fd
[Bug] fix logger format ( #2689 )
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-03 19:58:03 +08:00
Jiang-Jia-Jun
05c670e593
[Sync] Update to latest code ( #2679 )
...
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00
jiangjiajun
fb18f3092d
[LLM] Add output module and polish docs
2025-06-09 20:26:53 +08:00