luukunn
bbd50c6717
add tool parser
2025-08-14 21:08:49 +08:00
luukunn
132a8ef425
Release/2.1 ( #3414 )
...
* Pre ce modified (#3335 ) (#3360 )
* Pre ce modified (#3335 )
* update
* update
* fix
* fix
* update
* update
* update
* fix
* update
* update
* update
* add ut fix pr(3367)
* [Bug Fix] Fix V1 video bug (#3387 )
* fix stopseq error info (#3342 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
* [BugFix] Fix default log level of paddleformers (#3377 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
* [Polish Code] Remove useless notes
* feat(log):add_request_and_response_log (#3392 )
* Optimize CI execution workflow. (#3371 ) (#3384 )
* fix
* [BugFix] fix control signal release failed (#3374 )
* [BugFix]
* [BugFix]
* [BugFix]
* [BugFix]
* fix
* fix
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1"
This reverts commit 02596fc537, reversing changes made to 03347626a6.
* [XPU] Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3393 )
* fix v1 schedule oom bug
* fix v1 schedule oom bug
* [BugFix] fix ErnieProcessor not set raw_prediction (#3401 )
* [Doc]Release fastdeploy-xpu 2.1.0 (#3407 )
* fix v1 schedule oom bug
* fix v1 schedule oom bug
* update release note
* [Doc]Release fastdeploy-xpu 2.0.3 (#3408 )
* fix v1 schedule oom bug
* fix v1 schedule oom bug
* update release note
* update info
---------
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
Co-authored-by: xiaolei373 <zley373@gmail.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: yinwei <yinwei_hust@163.com >
Co-authored-by: memoryCoderC <1137889088@qq.com >
2025-08-14 20:53:47 +08:00
Jiang-Jia-Jun
e11331927f
[Sync Code] Update vs branch ( #3403 )
...
* Pre ce modified (#3335 ) (#3360 )
* Pre ce modified (#3335 )
* update
* update
* fix
* fix
* update
* update
* update
* fix
* update
* update
* update
* add ut fix pr(3367)
* [Bug Fix] Fix V1 video bug (#3387 )
* fix stopseq error info (#3342 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
* [BugFix] Fix default log level of paddleformers (#3377 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
* [Polish Code] Remove useless notes
* feat(log):add_request_and_response_log (#3392 )
* Optimize CI execution workflow. (#3371 ) (#3384 )
* fix
* [BugFix] fix control signal release failed (#3374 )
* [BugFix]
* [BugFix]
* [BugFix]
* [BugFix]
* fix
* fix
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
---------
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
Co-authored-by: xiaolei373 <zley373@gmail.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
2025-08-14 17:14:45 +08:00
luukunn
81092c0fe3
add tool parser
2025-08-13 16:06:22 +08:00
memoryCoderC
37b76158f9
Completion add raw_prediction/text_after_process ( #3362 )
2025-08-12 23:20:36 +08:00
memoryCoderC
fe2094609f
Release/2.1 ( #3361 )
...
* [BugFix] v1/completions add finish_reason
* update TestOpenAIServingCompletion for merge
2025-08-12 23:06:51 +08:00
ltd0924
6706ccb37e
[BugFix] fix too many open files problem ( #3275 )
2025-08-08 20:11:32 +08:00
JYChen
1b6f482c15
[Cherry-pick] fix stop seq ( #3263 )
...
* fix out-of-bounds value for stop sequence
* catch error if there is an out-of-bounds value
* check in offline mode
2025-08-07 19:11:37 +08:00
sg263
5d3bf308f6
merge develop trace FD_START ( #3253 )
...
Co-authored-by: shige <shige@baidu.com >
2025-08-07 11:10:55 +08:00
SunLei
3dd8492601
[Bugfix] Fix uninitialized decoded_token and add corresponding unit test ( #3201 )
...
* Update test_base_chat.py (#3183 )
* [Bugfix] Fix uninitialized decoded_token and add corresponding unit test.
---------
Co-authored-by: Divano <dddivano@outlook.com >
2025-08-05 10:55:22 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs ( #2974 )
...
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 20:25:56 +08:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwargs and update params docs ( #3103 )
...
* add chat_template_kwargs and update params docs
* add chat_template_kwargs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:44:06 +08:00
Jiang-Jia-Jun
0616c208d2
[Feature] Support include_stop_str_in_output in completion api ( #3096 )
...
* [Feature] Support include_stop_str_in_output in completion api
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-30 22:18:48 +08:00
李泳桦
b242150f94
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client ( #3058 )
...
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [fix] fix ci consistency test error with reasoning parser
* [doc] update docs related to metadata
* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
Sunny-bot1
74aa31d15b
[Feature] support bad_words ( #3055 )
...
* support bad_words
* support online infer bad_words
* update
* add CI test
* update
* update
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-07-30 09:31:29 +08:00
李泳桦
69996a40da
[feat] add disable_chat_template in chat api as a substitute for previous raw_request ( #3020 )
...
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request
* [fix] pre-commit code check
2025-07-25 20:57:32 +08:00
Zero Rains
0fb37ab7e4
update flake8 version to support pre-commit in python3.12 ( #3000 )
...
* update flake8 version to support pre-commit in python3.12
* polish code
2025-07-24 01:43:31 -07:00
ltd0924
f935d6f862
[BugFix] fix multinode deployment ( #2977 )
2025-07-24 15:04:04 +08:00
Yzc216
e14587a954
[Feature] multi-source download ( #2986 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
2025-07-24 14:26:37 +08:00
李泳桦
2a8a2c06de
[fix] non-streaming api now returns full output ids if return_token_ids is enabled ( #2951 )
2025-07-22 14:35:56 +08:00
Jiang-Jia-Jun
56102e91e1
[Polish] Return error message of raw_request ( #2946 )
...
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-22 10:21:32 +08:00
李泳桦
8a619e9db5
[Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body ( #2940 )
...
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body
* [fix] return_token_ids not working in curl request
* [test] improve some test cases of return_token_ids and prompt_token_ids
* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
Yuanle Liu
2f74e93d7e
use dist.all_reduce(min) to sync num_blocks_local ( #2933 )
...
* pre-commit all files check
* reduce min num_blocks_local
* fix nranks=1
* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
lizexu123
67990e0572
[Feature] support min_p_sampling ( #2872 )
...
* Fastdeploy support min_p
* add test_min_p
* fix
* min_p_sampling
* update
* delete vl_gpu_model_runner.py
* fix
* Align usage of min_p with vLLM
* fix
* modified unit test
* fix test_min_sampling
* pre-commit all files
* fix
* fix
* fix
* fix xpu_model_runner.py
2025-07-20 23:17:59 -07:00
ltd0924
cc4cec0a74
Update engine_client.py ( #2931 )
2025-07-21 11:42:16 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun
fbe3547c95
[Feature] Support include_stop_str_in_output in chat/completion ( #2910 )
...
* [Feature] Support include_stop_str_in_output in chat/completion
* Add ci test for include_stop_str_in_output
* Update version of openai
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-18 16:59:18 +08:00
sg263
e679567d59
[Trace]fix opentelemetry can not work in uvicorn ( #2906 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
* fix opentelemetry can not work in uvicorn
* move conf to env
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-17 23:16:45 +08:00
Jiang-Jia-Jun
31cab9f87b
Update test_openai.py
2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun
d3dfa1446c
Update test_openai.py
2025-07-17 16:07:07 +08:00
ltd0924
9c25dcca0b
[LLM] Update Multinode Deployment ( #2830 )
...
* [LLM] fix multinode bugs
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] fix ci bugs
* Update fastdeploy/engine/args_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [LLM] update random port
* [LLM] update random port
* [LLM] fix ci bugs
* fix ci bugs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-07-16 23:42:54 +08:00
ltd0924
d245d1ca6c
[LLM] support send batch data and aggregate data ( #2860 )
...
* [LLM] support send batch data and aggregate data
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] update
2025-07-16 23:42:20 +08:00
sg263
42b80182e0
[Trace] add opentelemetry ( #2852 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-16 15:33:25 +08:00
lddfym
ece88596ed
fix spelling error ( #2827 )
2025-07-14 13:12:57 +08:00
zhenwenDang
d48c03413f
Feature/logprob bug fix ( #2817 )
...
* fix: handle missing logprobs at step 0 and incorrect finish reason with max_completion_tokens
* Prevent response_logprobs.logprob_token_ids[0] from going out of bounds
2025-07-12 16:48:51 +08:00
lddfym
b5e4288704
Global scheduler supports configuring hot updates ( #2807 )
...
* Check if the controller port is available
* Global scheduler supports configuring hot updates
* add interface: /controller/scheduler
* add interface: /controller/scheduler
2025-07-11 13:38:07 +08:00
chen
d33105baeb
[Feature] Online Chat API Support Return logprobs ( #2777 )
...
* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform
2025-07-10 16:33:40 +08:00
Sunny-bot1
e45050cae3
[Feature] support top_k_top_p sampling ( #2753 )
...
* support top_k_top_p sampling
* fix
* add api param
* add api param
* fix
* fix
* fix
* fix
* fix
* fix
* fix
2025-07-09 20:58:58 -07:00
lddfym
4e293e50fa
Check if the controller port is available ( #2724 )
2025-07-07 13:24:55 +08:00
ltd0924
68b4755587
[LLM] support multi node deploy ( #2708 )
...
* [LLM] support multi node deploy
* Update engine.py
* fix bugs
* fix
* [LLM] support multi node deploy
* [LLM] support multi node deploy
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-06 10:33:51 +08:00
ltd0924
87e638498c
[RL] update reschedule finish reason ( #2709 )
2025-07-04 13:47:36 +08:00
Jiang-Jia-Jun
05c670e593
[Sync] Update to latest code ( #2679 )
...
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-03 15:43:53 +08:00
ltd0924
50aa4080c0
[Serving] fix offline inference sampling parameters overwrite ( #2654 )
2025-07-01 10:17:46 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72
[LLM] First commit the llm deployment code
2025-06-09 19:20:15 +08:00