qwes5s5
2ee91d7a96
[metrics] Add serveral observability metrics ( #3868 ) ( #4011 )
...
* [metrics] Add serveral observability metrics (#3868 )
* Add several observability metrics
* [wenxin-tools-584] 【可观测性】支持查看本节点的并发数、剩余block_size、排队请求数等信息
* adjust some metrics and md files
* trigger ci
* adjust ci file
* trigger ci
* trigger ci
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* version adjust
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-10 10:59:57 +08:00
chen
1e19833ba5
[CP] CP Lm head fp32 and temp_logprob to release/2.1 ( #3766 )
...
* [Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552 )
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* delete some code
* code check
* code check and add doc
* fix tokenizer.decoder(-1), return 'Invalid Token'
* add ci for temp_scaled and top_p logprobs
* check test
* check seq len time shape
* logprob clip inf
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com >
* [Precision] Support lm_head layer running in float32 (#3597 )
* support lm_head fp32 bf16 fp16
* support lm_head fp32 bf16 fp16
* add doc and check code
* lm_head_fp32 specify lm_head as fp32
* code check
* check doc
* code check
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com >
2025-09-01 19:56:54 +08:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwagrs and update params docs ( #3103 )
...
* add chat_template_kwagrs and update params docs
* add chat_template_kwagrs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:44:06 +08:00
Jiang-Jia-Jun
998968f1e8
[Doc] Update parameters of serving
2025-07-30 22:35:01 +08:00
李泳桦
b242150f94
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client ( #3058 )
...
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [fix] fix ci consistency test error with reasoning parser
* [doc] update docs related to metadata
* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
zhenwenDang
5fc659b900
[Docs] add enable_logprob parameter description ( #2850 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-15 19:47:45 +08:00
Jiang-Jia-Jun
e5b94d4117
Update README.md
2025-07-03 15:28:05 +08:00
Jiang-Jia-Jun
9f4a65d817
Update README.md
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-02 10:04:58 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00