[metrics] Add several observability metrics (#3868) (#4011)

* [metrics] Add several observability metrics (#3868)

* Add several observability metrics

* [wenxin-tools-584] [Observability] Support viewing this node's concurrency, remaining block_size, number of queued requests, and other information

* adjust some metrics and md files

* trigger ci

* adjust ci file

* trigger ci

* trigger ci

---------

Co-authored-by: K11OntheBoat <your_email@example.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* version adjust

---------

Co-authored-by: K11OntheBoat <your_email@example.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
qwes5s5
2025-09-10 10:59:57 +08:00
committed by GitHub
parent 187ccb0f04
commit 2ee91d7a96
12 changed files with 1026 additions and 7 deletions


@@ -20,7 +20,12 @@ After FastDeploy is launched, it supports continuous monitoring of the FastDeplo
| `fastdeploy:gpu_cache_usage_perc` | Gauge | GPU KV-cache usage rate | Percentage |
| `fastdeploy:request_params_max_tokens` | Histogram | Distribution of max_tokens for requests | Count |
| `fastdeploy:request_success_total` | Counter | Number of successfully processed requests | Count |
| `fastdeploy:cache_config_info` | Gauge | Information about the engine's CacheConfig | Count |
| `fastdeploy:available_batch_size` | Gauge | Number of requests that can still be inserted during the Decode phase | Count |
| `fastdeploy:hit_req_rate` | Gauge | Request-level prefix cache hit rate | Percentage |
| `fastdeploy:hit_token_rate` | Gauge | Token-level prefix cache hit rate | Percentage |
| `fastdeploy:cpu_hit_token_rate` | Gauge | Token-level CPU prefix cache hit rate | Percentage |
| `fastdeploy:gpu_hit_token_rate` | Gauge | Token-level GPU prefix cache hit rate | Percentage |
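
The new gauges can be read back from the metrics endpoint programmatically. The sketch below assumes the endpoint at `http://localhost:8000/metrics` serves the standard Prometheus text exposition format (`# HELP`/`# TYPE` comment lines followed by `name{labels} value` samples). The names `NEW_GAUGES`, `parse_samples`, and `fetch_metrics` are hypothetical helpers, not part of FastDeploy. It uses a small hand-rolled parser instead of the `prometheus_client` parser so it has no third-party dependency; label strings are returned raw and not unescaped.

```python
import urllib.request
from typing import Dict, Iterable, List, Tuple

# Gauge names copied from the table above (hypothetical helper constant).
NEW_GAUGES = (
    "fastdeploy:cache_config_info",
    "fastdeploy:available_batch_size",
    "fastdeploy:hit_req_rate",
    "fastdeploy:hit_token_rate",
    "fastdeploy:cpu_hit_token_rate",
    "fastdeploy:gpu_hit_token_rate",
)


def parse_samples(text: str, wanted: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Map each wanted metric name to a list of (raw_label_string, value).

    Assumes Prometheus text format: '#' lines are HELP/TYPE comments and every
    other non-empty line is `name{labels} value [timestamp]`.
    """
    out: Dict[str, List[Tuple[str, float]]] = {name: [] for name in wanted}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Labeled sample: the value follows the last closing brace.
            name = line[: line.index("{")]
            labels = line[line.index("{") + 1 : line.rindex("}")]
            rest = line[line.rindex("}") + 1 :].split()
        else:
            name, *rest = line.split()
            labels = ""
        if name in out and rest:
            out[name].append((labels, float(rest[0])))
    return out


def fetch_metrics(url: str = "http://localhost:8000/metrics") -> str:
    """Download the raw exposition text from a running server."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Against a running server, `parse_samples(fetch_metrics(), NEW_GAUGES)` returns one entry per gauge. An empty list means the metric was not exported. Keeping the raw label string matters for info-style gauges such as `fastdeploy:cache_config_info`, whose details may be carried in labels rather than in the sample value.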

## Accessing Metrics
- Access URL: `http://localhost:8000/metrics`