ltd0924
cc5430e4c2
[BugFix] [CP] fix invalid max streaming tokens (#3798)
* Update serving_chat.py
* Update serving_completion.py
2025-09-02 21:03:36 +08:00
chen
1e19833ba5
[CP] CP lm_head fp32 and temp_logprob to release/2.1 (#3766)
* [Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post-processing (#3552)
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* delete some code
* code check
* code check and add doc
* fix tokenizer.decoder(-1), return 'Invalid Token'
* add ci for temp_scaled and top_p logprobs
* check test
* check seq len time shape
* logprob clip inf
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
* [Precision] Support lm_head layer running in float32 (#3597)
* support lm_head fp32 bf16 fp16
* support lm_head fp32 bf16 fp16
* add doc and check code
* lm_head_fp32 specify lm_head as fp32
* code check
* check doc
* code check
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
2025-09-01 19:56:54 +08:00
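The temp_scaled_logprobs and top_p_normalized_logprobs options from #3552 post-process the returned logprobs. Below is a minimal sketch of what such post-processing could look like, assuming the flags mean "compute log-probs from temperature-scaled logits" and "renormalize log-probs over the top-p nucleus". The helper names `log_softmax` and `postprocess_logprobs` are hypothetical and this is not FastDeploy's implementation.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def postprocess_logprobs(logits, temperature=1.0, top_p=1.0,
                         temp_scaled=False, top_p_normalized=False):
    # Hypothetical helper; the flag semantics are assumed, not taken from FastDeploy.
    if temp_scaled and temperature > 0:
        # Compute log-probs from temperature-scaled logits.
        logits = [x / temperature for x in logits]
    logprobs = log_softmax(logits)
    if top_p_normalized and top_p < 1.0:
        # Smallest set of most likely tokens whose cumulative probability reaches top_p.
        order = sorted(range(len(logprobs)), key=lambda i: logprobs[i], reverse=True)
        kept, cum = set(), 0.0
        for i in order:
            kept.add(i)
            cum += math.exp(logprobs[i])
            if cum >= top_p:
                break
        # Renormalize over the kept set; tokens outside it get -inf.
        log_mass = math.log(sum(math.exp(logprobs[i]) for i in kept))
        logprobs = [lp - log_mass if i in kept else float("-inf")
                    for i, lp in enumerate(logprobs)]
    return logprobs
```

With both flags off this reduces to a plain log-softmax, so the default output is unchanged.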
zhuzixuan
80db7fce05
[Bugfix] Fix significant performance degradation of the 0.3B model on branch 2.1 (#3624)
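* Restore async methods.
[BugFix] Support echo in the completion API (#3245)
* wenxin-tools-511: fix v1/completion failing to echo the prompt.
* Support echo for multiple prompts
* Support streaming echo for multiple prompts
* Add unit tests for echo support in the completion API
* pre-commit
* Remove redundant test files
* Fix the unit test methods for echo support in the completion API
* Add unit test files
* Add unit tests
* unittest
* Add unit tests
* Fix unit tests
* Remove unnecessary asserts.
* Resubmit
* Update test methods
* ut
* Unit test to verify the approach is correct
* Unit test to verify the approach is correct
* Unit test to verify the approach is correct 3
* Optimize unit test code, narrowing the test scope in a targeted way.
* Optimize unit test code 2, narrowing the test scope in a targeted way.
* Optimize unit test code 3, narrowing the test scope in a targeted way.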
* support 'echo' in chat/completion.
* update
* update
* update
* update
* update
* update
* Add unit tests for token IDs
* update
* Fix index error
* Fix index error
* [Bugfix] Significant performance degradation of 0.3B model on branch 2.1
2025-08-27 15:29:01 +08:00
SunLei
d8444e22ca
fix: replace list * n initialization with list comprehension to avoid shared references (#3620)
2025-08-26 17:53:09 +08:00
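The #3620 fix addresses a standard Python pitfall: `[x] * n` repeats a reference to the same object, so mutating one element mutates them all. The snippet below illustrates the general behavior only; it is not the code that was changed in that commit.

```python
n = 3

shared = [[]] * n                      # n references to the SAME list object
shared[0].append("x")
print(shared)                          # [['x'], ['x'], ['x']]

independent = [[] for _ in range(n)]   # n distinct list objects
independent[0].append("x")
print(independent)                     # [['x'], [], []]
```

The list comprehension evaluates `[]` once per iteration, which is why each slot gets its own object.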
李泳桦
b1f8f1aa07
[fix] fix output_tokens missing from usage in the completion streaming API (#3588)
2025-08-25 18:31:57 +08:00
zhuzixuan
4e369c7fa7
[BugFix] Support echo in the completion API (#3477)
* update
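[BugFix] Support echo in the completion API (#3245)
* wenxin-tools-511: fix v1/completion failing to echo the prompt.
* Support echo for multiple prompts
* Support streaming echo for multiple prompts
* Add unit tests for echo support in the completion API
* pre-commit
* Remove redundant test files
* Fix the unit test methods for echo support in the completion API
* Add unit test files
* Add unit tests
* unittest
* Add unit tests
* Fix unit tests
* Remove unnecessary asserts.
* Resubmit
* Update test methods
* ut
* Unit test to verify the approach is correct
* Unit test to verify the approach is correct
* Unit test to verify the approach is correct 3
* Optimize unit test code, narrowing the test scope in a targeted way.
* Optimize unit test code 2, narrowing the test scope in a targeted way.
* Optimize unit test code 3, narrowing the test scope in a targeted way.
* support 'echo' in chat/completion.
* update
* update
* update
* update
* update
* update
* Add unit tests for token IDs
* update
* Fix index error
* Fix index error
* Resolve conflicts
* Resolve conflicts
* Resolve conflicts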
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-23 13:08:48 +08:00
ltd0924
4d6fb96cd6
[BugFix] Fix API server bugs (#3530)
* Update serving_chat.py
* Update serving_completion.py
* Update serving_completion.py
2025-08-22 14:01:14 +08:00
luukunn
4a9c04a746
[Feature] add tool parser (#3518)
* [Feature] Pass through the `chat_template_kwargs` to the data processing module (#3421)
* fix chat_template_args
* fix args
* add offline
* add offline
* fix
* fix
* fix default enable_thinking value
* fix default enable_thinking value
* modify condition
* Revert "modify condition"
This reverts commit 26430bdeb1.
* fix unit test
* add Tool Parser (#3272)
* add tool-parser
* add tool-parser
* add tool parser
* add tool parser
* fix
* add offline
* add offline
* fix
* parsers:tool&reasoning
* Rename tool parser
* update
* fix reasoning-parser
* add requirements
* fix finish reason
* fix
* fix reasoning-parser
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>
* [Feature] add tool parser (#3483)
* add tool parser
* add x1 enable_thinking
* restart ci
* fix vl reasoning parser
* modify call style
* modify call style
* add offline enable_thinking
* fix completion
* fix
* fix unit test
* fix unit test
* fix unit test
* fix vl reasoning parser
* fix vl reasoning parser
* fix unit test
---------
Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>
2025-08-22 11:14:35 +08:00
李泳桦
1b399b91c0
[fix] setting disable_chat_template while passing prompt_token_ids led to response error (#3511)
* [fix] setting disable_chat_template while passing prompt_token_ids led to response error
* [fix] code syntax
* [test] add test case for this bug
* [test] add test case for empty message list
* [test] fix test case for empty message list
2025-08-21 17:33:10 +08:00
memoryCoderC
8bf48dfab8
[Feature] add prompt_tokens and completion_tokens (#3505)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-08-21 14:10:06 +08:00
Jiang-Jia-Jun
28918702c2
Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1"
This reverts commit 02596fc537, reversing changes made to 03347626a6.
2025-08-14 17:20:29 +08:00
Jiang-Jia-Jun
02596fc537
Merge branch 'feature/online/vs_think_20250813' into release/2.1
2025-08-14 17:13:36 +08:00
ltd0924
03347626a6
[BugFix] fix control signal release failure (#3374)
* [BugFix]
* [BugFix]
* [BugFix]
* [BugFix]
* fix
* fix
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-14 17:01:25 +08:00
xiaolei373
d1d321bafd
feat(log): add_request_and_response_log (#3392)
2025-08-14 14:50:48 +08:00
luukunn
81092c0fe3
add tool parser
2025-08-13 16:06:22 +08:00
memoryCoderC
37b76158f9
Completion add raw_prediction/text_after_process (#3362)
2025-08-12 23:20:36 +08:00
memoryCoderC
fe2094609f
Release/2.1 (#3361)
* [BugFix] v1/completions add finish_reason
* update TestOpenAIServingCompletion for merge
2025-08-12 23:06:51 +08:00
ltd0924
6706ccb37e
[BugFix] fix too many open files problem (#3275)
2025-08-08 20:11:32 +08:00
sg263
5d3bf308f6
merge develop trace FD_START (#3253)
Co-authored-by: shige <shige@baidu.com>
2025-08-07 11:10:55 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs (#2974)
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:25:56 +08:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwargs and update params docs (#3103)
* add chat_template_kwargs and update params docs
* add chat_template_kwargs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:44:06 +08:00
Jiang-Jia-Jun
0616c208d2
[Feature] Support include_stop_str_in_output in completion api (#3096)
* [Feature] Support include_stop_str_in_output in completion api
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-30 22:18:48 +08:00
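For #3096 (and the chat/completion counterpart #2910), `include_stop_str_in_output` controls whether the matched stop string is kept in the returned text. A string-level sketch, assuming the output is truncated at the earliest matching stop string; `apply_stop` is a hypothetical helper, and the real engine may match on tokens or during streaming instead.

```python
def apply_stop(text, stop_strs, include_stop_str_in_output=False):
    # Hypothetical helper: truncate text at the earliest matching stop string.
    best = None  # (index, stop string) of the earliest match
    for s in stop_strs:
        idx = text.find(s)
        if idx != -1 and (best is None or idx < best[0]):
            best = (idx, s)
    if best is None:
        return text, False  # no stop string matched
    idx, s = best
    # Keep the stop string itself only when the flag is set.
    end = idx + len(s) if include_stop_str_in_output else idx
    return text[:end], True
```

For example, `apply_stop("Hello world. Bye", ["."])` returns `("Hello world", True)`, while passing `include_stop_str_in_output=True` keeps the trailing period.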
李泳桦
b242150f94
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3058)
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [fix] fix ci consistency test error with reasoning parser
* [doc] update docs related to metadata
* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
Sunny-bot1
74aa31d15b
[Feature] support bad_words (#3055)
* support bad_words
* support online infer bad_words
* update
* add CI test
* update
* update
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-30 09:31:29 +08:00
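Commit #3055 adds `bad_words` support. A common way to implement this is to mask the logits of banned token IDs before sampling so they can never be chosen. The sketch below assumes each bad word has already been tokenized to a single token ID and uses a hypothetical `apply_bad_words` helper; it is not FastDeploy's code.

```python
import math


def apply_bad_words(logits, bad_token_ids):
    # Hypothetical helper: assumes each bad word is already a single token ID.
    masked = list(logits)  # copy so the caller's logits are untouched
    for tid in bad_token_ids:
        if 0 <= tid < len(masked):
            masked[tid] = -math.inf  # probability becomes 0 after softmax
    return masked
```

Bad words that span several tokens need an extra check against the tokens generated so far, which this sketch does not cover.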
李泳桦
69996a40da
[feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3020)
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request
* [fix] pre-commit code check
2025-07-25 20:57:32 +08:00
Zero Rains
0fb37ab7e4
update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12
* polish code
2025-07-24 01:43:31 -07:00
ltd0924
f935d6f862
[BugFix] fix multinode deployment (#2977)
2025-07-24 15:04:04 +08:00
Yzc216
e14587a954
[Feature] multi-source download (#2986)
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
2025-07-24 14:26:37 +08:00
李泳桦
2a8a2c06de
[fix] non-streaming api now returns full output ids if return_token_ids is enabled (#2951)
2025-07-22 14:35:56 +08:00
Jiang-Jia-Jun
56102e91e1
[Polish] Return error message of raw_request (#2946)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-22 10:21:32 +08:00
李泳桦
8a619e9db5
[Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body (#2940)
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body
* [fix] return_token_ids not working in curl request
* [test] improve some test cases of return_token_ids and prompt_token_ids
* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
lizexu123
67990e0572
[Feature] support min_p_sampling (#2872)
* Fastdeploy support min_p
* add test_min_p
* fix
* min_p_sampling
* update
* delete vl_gpu_model_runner.py
* fix
* Align usage of min_p with vLLM
* fix
* modified unit test
* fix test_min_sampling
* pre-commit all files
* fix
* fix
* fix
* fix xpu_model_runner.py
2025-07-20 23:17:59 -07:00
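Commit #2872 aligns `min_p` with vLLM. In the usual min-p formulation, a token survives only if its probability is at least `min_p` times the probability of the most likely token, and the survivors are renormalized. A minimal sketch over a plain probability list, assuming 0 <= min_p <= 1; `min_p_filter` is a hypothetical helper, not FastDeploy's kernel.

```python
def min_p_filter(probs, min_p):
    # Hypothetical helper: keep tokens with prob >= min_p * max(prob), then renormalize.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # > 0 for min_p <= 1, since the top token always passes
    return [p / total for p in kept]
```

Because the threshold scales with the top probability, min-p prunes aggressively when the model is confident and keeps more candidates when the distribution is flat.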
Zero Rains
25698d56d1
polish code with new pre-commit rule (#2923)
2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun
fbe3547c95
[Feature] Support include_stop_str_in_output in chat/completion (#2910)
* [Feature] Support include_stop_str_in_output in chat/completion
* Add ci test for include_stop_str_in_output
* Update version of openai
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-18 16:59:18 +08:00
sg263
e679567d59
[Trace] fix opentelemetry not working in uvicorn (#2906)
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
* fix opentelemetry not working in uvicorn
* move conf to env
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 23:16:45 +08:00
Jiang-Jia-Jun
31cab9f87b
Update test_openai.py
2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun
d3dfa1446c
Update test_openai.py
2025-07-17 16:07:07 +08:00
ltd0924
9c25dcca0b
[LLM] Update Multinode Deployment (#2830)
* [LLM] fix multinode bugs
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] fix ci bugs
* Update fastdeploy/engine/args_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [LLM] update random port
* [LLM] update random port
* [LLM] fix ci bugs
* fix ci bugs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
ltd0924
d245d1ca6c
[LLM] support send batch data and aggregate data (#2860)
* [LLM] support send batch data and aggregate data
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] update
2025-07-16 23:42:20 +08:00
sg263
42b80182e0
[Trace] add opentelemetry (#2852)
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-16 15:33:25 +08:00
lddfym
ece88596ed
fix spelling error (#2827)
2025-07-14 13:12:57 +08:00
zhenwenDang
d48c03413f
Feature/logprob bug fix (#2817)
* fix: handle missing logprobs at step 0 and incorrect finish reason with max_completion_tokens
* Prevent response_logprobs.logprob_token_ids[0] from going out of bounds
2025-07-12 16:48:51 +08:00
lddfym
b5e4288704
Global scheduler supports configuring hot updates (#2807)
* Check if the controller port is available
* Global scheduler supports configuring hot updates
* add interface: /controller/scheduler
* add interface: /controller/scheduler
2025-07-11 13:38:07 +08:00
chen
d33105baeb
[Feature] Online Chat API Support Return logprobs (#2777)
* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform
2025-07-10 16:33:40 +08:00
Sunny-bot1
e45050cae3
[Feature] support top_k_top_p sampling (#2753)
* support top_k_top_p sampling
* fix
* add api param
* add api param
* fix
* fix
* fix
* fix
* fix
* fix
* fix
2025-07-09 20:58:58 -07:00
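Commit #2753 adds combined top_k_top_p sampling. A common formulation first keeps the `top_k` most likely tokens, renormalizes, then keeps the smallest prefix of those whose cumulative probability reaches `top_p`, and samples from what remains. The sketch below follows that order under stated assumptions (`top_k <= 0` disables top-k; `top_p=1.0` disables top-p); `top_k_top_p_filter` and `sample` are hypothetical helpers, not FastDeploy's sampling kernel.

```python
import random


def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    # Hypothetical helper: returns {token_id: renormalized_prob} for surviving tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]  # top-k: keep the k most likely tokens
    mass = sum(probs[i] for i in order)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i] / mass  # cumulative mass of the renormalized top-k distribution
        if cum >= top_p:  # top-p: stop once the nucleus is covered
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample(filtered, rng):
    # Draw one token id in proportion to its filtered probability.
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]
```

For example, with probabilities `[0.4, 0.3, 0.2, 0.1]`, `top_k=3` and `top_p=0.7`, only tokens 0 and 1 survive, because their renormalized mass (about 0.78) is the first to reach 0.7.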
lddfym
4e293e50fa
Check if the controller port is available (#2724)
2025-07-07 13:24:55 +08:00
ltd0924
68b4755587
[LLM] support multi node deploy (#2708)
* [LLM] support multi node deploy
* Update engine.py
* fix bugs
* fix
* [LLM] support multi node deploy
* [LLM] support multi node deploy
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-06 10:33:51 +08:00
ltd0924
87e638498c
[RL] update reschedule finish reason (#2709)
2025-07-04 13:47:36 +08:00
Jiang-Jia-Jun
05c670e593
[Sync] Update to latest code (#2679)
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00