Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2025-09-26 20:41:53 +08:00
[metrics] update metrics markdown file (#4061)
* adjust md
* trigger ci

Co-authored-by: K11OntheBoat <your_email@example.com>
@@ -26,6 +26,19 @@ After FastDeploy is launched, it supports continuous monitoring of the FastDeploy service status.
| `fastdeploy:hit_token_rate` | Gauge | Token-level prefix cache hit rate | Percentage |
| `fastdeploy:cpu_hit_token_rate` | Gauge | Token-level CPU prefix cache hit rate | Percentage |
| `fastdeploy:gpu_hit_token_rate` | Gauge | Token-level GPU prefix cache hit rate | Percentage |
| `fastdeploy:prefix_cache_token_num` | Counter | Total number of cached tokens | Count |
| `fastdeploy:prefix_gpu_cache_token_num` | Counter | Total number of cached tokens on GPU | Count |
| `fastdeploy:prefix_cpu_cache_token_num` | Counter | Total number of cached tokens on CPU | Count |
| `fastdeploy:batch_size` | Gauge | Real batch size during inference | Count |
| `fastdeploy:max_batch_size` | Gauge | Maximum batch size determined when the service started | Count |
| `fastdeploy:available_gpu_block_num` | Gauge | Number of available GPU blocks in the cache, including prefix-caching blocks that are not yet officially released | Count |
| `fastdeploy:free_gpu_block_num` | Gauge | Number of free blocks in the cache | Count |
| `fastdeploy:max_gpu_block_num` | Gauge | Total number of blocks determined when the service started | Count |
| `available_gpu_resource` | Gauge | Percentage of available blocks, i.e. available_gpu_block_num / max_gpu_block_num | Percentage |
| `fastdeploy:requests_number` | Counter | Total number of requests received | Count |
| `fastdeploy:send_cache_failed_num` | Counter | Total number of cache-send failures | Count |
| `fastdeploy:first_token_latency` | Gauge | Time to generate the first token of the latest request | Seconds |
| `fastdeploy:infer_latency` | Gauge | Latest time to generate one token | Seconds |
## Accessing Metrics

- Access URL: `http://localhost:8000/metrics`
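The endpoint serves plain-text Prometheus exposition data, one `name value` line per metric. As a rough sketch (standard library only, with a hypothetical sample payload — a real scrape returns many more series), the unlabeled gauge and counter lines can be pulled apart like this:

```python
# Minimal parser for unlabeled Prometheus exposition lines, such as
# those served at http://localhost:8000/metrics.
# SAMPLE is a hypothetical excerpt, not actual FastDeploy output.
SAMPLE = """\
# HELP fastdeploy:batch_size Real batch size during inference
# TYPE fastdeploy:batch_size gauge
fastdeploy:batch_size 8.0
# TYPE fastdeploy:requests_number counter
fastdeploy:requests_number 1234.0
"""

def parse_metrics(text: str) -> dict[str, float]:
    """Map metric name -> value, skipping comments and blank lines.
    Labeled series (name{label="..."} value) are out of scope here."""
    values = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        values[name] = float(value)
    return values

print(parse_metrics(SAMPLE)["fastdeploy:batch_size"])  # -> 8.0
```

In practice you would fetch the text with any HTTP client first; for production scraping, Prometheus itself or a full exposition-format parser is the right tool.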
@@ -26,6 +26,19 @@
| `fastdeploy:hit_token_rate` | Gauge | Token-level prefix cache hit rate | Percentage |
| `fastdeploy:cpu_hit_token_rate` | Gauge | Token-level CPU prefix cache hit rate | Percentage |
| `fastdeploy:gpu_hit_token_rate` | Gauge | Token-level GPU prefix cache hit rate | Percentage |
| `fastdeploy:prefix_cache_token_num` | Counter | Total number of prefix-cached tokens | Count |
| `fastdeploy:prefix_gpu_cache_token_num` | Counter | Total number of prefix-cached tokens on GPU | Count |
| `fastdeploy:prefix_cpu_cache_token_num` | Counter | Total number of prefix-cached tokens on CPU | Count |
| `fastdeploy:batch_size` | Gauge | Real batch size during inference | Count |
| `fastdeploy:max_batch_size` | Gauge | Maximum batch size determined at service startup | Count |
| `fastdeploy:available_gpu_block_num` | Gauge | Number of available GPU blocks in the cache (including prefix-cache blocks not yet officially released) | Count |
| `fastdeploy:free_gpu_block_num` | Gauge | Number of free blocks in the cache | Count |
| `fastdeploy:max_gpu_block_num` | Gauge | Total number of blocks determined at service startup | Count |
| `available_gpu_resource` | Gauge | Proportion of available blocks, i.e. available GPU blocks / maximum GPU blocks | Percentage |
| `fastdeploy:requests_number` | Counter | Total number of requests received | Count |
| `fastdeploy:send_cache_failed_num` | Counter | Total number of cache-send failures | Count |
| `fastdeploy:first_token_latency` | Gauge | Time taken to generate the first token of the most recent request | Seconds |
| `fastdeploy:infer_latency` | Gauge | Time taken by the most recent single-token generation | Seconds |

## Accessing Metrics

- Access URL: `http://localhost:8000/metrics`
```diff
@@ -152,10 +152,10 @@ class MetricsManager:
     spec_decode_draft_single_head_acceptance_rate: "list[Gauge]"

     # for YIYAN Adapter
-    prefix_cache_token_num: "Gauge"
-    prefix_gpu_cache_token_num: "Gauge"
-    prefix_cpu_cache_token_num: "Gauge"
-    prefix_ssd_cache_token_num: "Gauge"
+    prefix_cache_token_num: "Counter"
+    prefix_gpu_cache_token_num: "Counter"
+    prefix_cpu_cache_token_num: "Counter"
+    prefix_ssd_cache_token_num: "Counter"
     batch_size: "Gauge"
     max_batch_size: "Gauge"
     available_gpu_block_num: "Gauge"
```
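The hunk above retypes the cumulative prefix-cache token totals from Gauge to Counter, which matches Prometheus semantics: a counter is cumulative and only moves up, while a gauge is a point-in-time reading that can move in either direction. A minimal sketch of the distinction (plain Python, deliberately not the real `prometheus_client` API or FastDeploy's `MetricsManager`):

```python
class Counter:
    """Cumulative and monotonically increasing -- fits totals such as
    prefix_cache_token_num, which only ever grows over the process lifetime."""
    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        self.value += amount


class Gauge:
    """Point-in-time reading -- fits batch_size, which rises and falls."""
    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


prefix_cache_token_num = Counter()
prefix_cache_token_num.inc(128)  # a new prefix was cached
prefix_cache_token_num.inc(64)   # total is now 192, and can never decrease

batch_size = Gauge()
batch_size.set(8)
batch_size.set(3)                # free to move back down
```

Counters also let Prometheus derive rates robustly (e.g. `rate(...[5m])` handles process restarts), which a gauge holding a cumulative total would not.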