Commit Graph

115 Commits

Author SHA1 Message Date
SunLei
b4b579a7ed Feature:Add support for Pooling Model Embedding and provide an OpenAI-compatible API. (#4344)
* feat: add OpenAIServing

* feat: add ZmqOpenAIServing & OpenAIServingEmbedding

* feat: Refine the basic ServingEngine class and introduce ServingContext

* fix: codestyle

* fix: request

* fix: pooling_params

* feat: _process_chat_template_kwargs

* feat: support batch request

* feat: pooling_params verify & default parameters

---------

Co-authored-by: sunlei1024 <sunlei1024@example.com>
2025-10-15 19:42:59 +08:00
ltd0924
fbdb056de0 [BUGFIX] clear request #4286 (#4402)
Co-authored-by: ltd0924 <luotingdan@baidu.com>
2025-10-15 17:43:28 +08:00
ltd0924
d8841b7b40 [BugFix] fix workers=1 (#4364)
* [Feature] support prefix cache in DP

* fix

* Update common_engine.py

* Update common_engine.py

* Update common_engine.py

* Update common_engine.py

* [BugFix] fix workers more than 1

* fix

* Update api_server.py

* fix

* Update api_server.py

* fix

---------

Co-authored-by: ltd0924 <luotingdan@baidu.com>
2025-10-15 17:06:25 +08:00
qwes5s5
abb62624b8 [fix] Fixed the issue of excessive/redundant spans being returned for streaming requests. (#4375)
* fix stream span

* fix stream span
2025-10-15 11:47:47 +08:00
ltd0924
28d1b6cd97 [BugFix] fix multinode bugs (#4377)
* [BugFix] fix multinode bugs

* Update test_config.py

* Update test_config.py

* Update test_config.py

---------

Co-authored-by: ltd0924 <luotingdan@baidu.com>
2025-10-15 11:43:39 +08:00
YuanRisheng
a2ec2c4152 [FDConfig]Remove max_model_len in FDConfig (#4350)
* modify max_model_len

* fix unittest

* fix unittest

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-10-11 14:04:17 +08:00
qwes5s5
6fd3e72da1 [FastDeploy Cli] Bench Command eval and throughput (#4239)
* bench command

* bench command

* bench command

* bench command

* bench command

---------

Co-authored-by: K11OntheBoat <your_email@example.com>
2025-10-10 16:17:44 +08:00
xiaolei373
1282ebe1b1 add_cli_tokenizer (#4278)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
2025-09-28 20:47:35 +08:00
李泳桦
6265f4385f [feat] support prefix cache clearing when /clear_load_weight is called (#4008)
* [feat] support clearing prefix cache (cherry-picked from release/2.1)

* [fix] fix ipc suffix, use port instead

* [fix] fix prefix caching not enabled

* [fix] fix key/value_cache_scales indent

* [fix] fix ep group all-reduce

* [fix] fix clear/update lock not working when workers > 1

* [chore] add preemption triggered info log

* [fix] fix code style

* [fix] fix max_num_seqs config

* [fix] do not force enable_prefix_caching=False in dynamic loading

* [fix] fix ci

* Revert "[fix] fix ci"

This reverts commit 0bc6d55cc8.

* [fix] initialize available_gpu_block_num with max_gpu_block_num

* [fix] fix config splitwise_role

* [fix] fix clearing caches synchronization and add more logs

* [chore] print cache_ready_signal in log

* [fix] fix scheduler_config.splitwise_role

* [fix] fix cache_messager cache_ready_signal create=True

* [fix] stop cache messager from launching in mixed deployment
2025-09-28 19:42:53 +08:00
xiaolei373
55124f8491 Add cli run batch (#4237)
* feat(log):add_request_and_response_log

* [cli] add run batch cli

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-26 14:27:25 +08:00
zhuzixuan
12a3587cca [Supplements and upgrades]Improvement of X1 parsers (#4172)
* reasoning_parser

* reasoning_parser

* reasoning_parser

* reasoning_parser

* reasoning_parser

* reasoning_parser

* reasoning_parser
2025-09-26 13:37:37 +08:00
memoryCoderC
4ec00df2b0 [Feature] add config api (#4254) 2025-09-26 11:21:02 +08:00
K11OntheBoat
4515ad21e9 Support limit thinking lengths (#4069)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-09-25 19:55:56 +08:00
luukunn
18f4977aec [fix]update apply_chat_template (#4137)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* update apply_chat_template

* fix unittest

* fix unittest

* fix

* fix

* fix unit test

* fix

* fix unit test

* add unit test
2025-09-24 18:56:32 +08:00
memoryCoderC
8b0ce8e3ab [Feature] add cli command serve (#4226) 2025-09-24 14:50:45 +08:00
ltd0924
83720da79f [Feature] support clear data (#3601)
* [Feature] support clear data

* update

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2025-09-23 10:20:02 +08:00
Jiang-Jia-Jun
772f0156f3 Remove useless code (#4195)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-09-22 21:18:19 +08:00
Zhang Yulong
5532e8a323 [FD CLI] Add bench cli (#4160)
* add bench cli

* Update test_main.py
2025-09-22 20:37:30 +08:00
luukunn
ee9d8a840a [fix]Modify follow-up push parameters and Modify the verification method for thinking length (#4086)
* 续推参数  generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式

* 续推参数  generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式

* 续推参数  generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式

* 续推参数  generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式

* add completion_token_ids

* add logger

* fix reasoning_max_tokens ParameterError

* add unittest

* add unittest

* add unittest

* add unittest

* add unittest

* add unit test
2025-09-19 14:26:01 +08:00
xiaolei373
ddf5606263 Bugfix test exception (#4171)
* feat(log):add_request_and_response_log

* modify default error type
2025-09-19 11:48:49 +08:00
xiaolei373
98447beb4d Add param valid log (#4113)
* feat(log):add_request_and_response_log

* [bugfix] add param valid log

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-18 10:39:24 +08:00
chenjian
618ccdbfba [Feature] Support mixed deployment with yiyan adapter in develop (#3976)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Feature] Support mixed deployment with yiyan adapter in release2.2

* fix metrics

* add unit test

* add unit test

* add unit test

* fix ci

* fix for eb5

* fix ci

* fix ci

* fix ci

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-18 01:52:20 +08:00
Jiang-Jia-Jun
a04365a0c7 Update api_server.py 2025-09-15 21:31:33 +08:00
qwes5s5
553adb299e 【FastDeploy CLI】collect-env subcommand (#4044)
* collect-env subcommand

* trigger ci

---------

Co-authored-by: K11OntheBoat <your_email@example.com>
2025-09-15 10:31:23 +08:00
zhouchong
958abebeab Support offline inference with streaming output (#4071)
* Support offline inference with streaming output

* add unit test

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-15 10:27:03 +08:00
xiaolei373
9ac539471d [format] Valid para format error info (#4035)
* feat(log):add_request_and_response_log

* 报错信息与OpenAI对齐
2025-09-12 19:05:17 +08:00
co63oc
8466219ec8 fix typos (#3840)
* fix typos

* ci

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-12 11:04:38 +08:00
memoryCoderC
850465e8ed [Feature] add cli command chat,complete (#4037) 2025-09-11 19:53:14 +08:00
zhuzixuan
a47976e82d [Echo] Support more types of prompt echo (#4022)
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

---------

Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com>
2025-09-11 19:34:44 +08:00
ltd0924
684e93269b [Fix] fix multi api server log dir (#3967)
* [BugFix] fix max streaming tokens invalid

* fix scheduler bug

* fix scheduler bug

* Update multi_api_server.py
2025-09-10 17:15:30 +08:00
zhuzixuan
83bd55100b [Optimize]Error messages about Model api. (#3839)
* add v1/models interface related

* add model parameters

* default model verification

* unit test

* check model err_msg

* unit test

* type annotation

* model parameter in response

* modify document description

* modify document description

* unit test

* verification

* verification update

* model_name

* pre-commit

* update test case

* update test case

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/entrypoints/openai/serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* 优化报错信息。

---------

Co-authored-by: yangzichao01 <yangzichao01@baidu.com>
Co-authored-by: Yzc216 <101054010+Yzc216@users.noreply.github.com>
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-08 15:52:26 +08:00
ApplEOFDiscord
b649494655 [Feature] add HTTP GET retry (#3838)
* add http get retry

* fix coments

---------

Co-authored-by: zhangjunjun04 <zhangjunjun04@baidu.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-08 10:11:14 +08:00
ltd0924
0c45e225d3 mv connection_manager init (#3901)
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-09-05 21:11:48 +08:00
ltd0924
bd7d15f7ea [Feature] support controller port in multi api server (#3898)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* Update serving_chat.py

* Update serving_completion.py

* Update serving_completion.py

* Update multi_api_server.py
2025-09-05 17:16:31 +08:00
SunLei
29628de6a7 Support for async processor added. (#3869)
* Support for async processor added.

* remove yappi code

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-09-04 19:58:53 +08:00
xiaolei373
ed97cf8396 Graceful shut down (#3785)
* feat(log):add_request_and_response_log

* 优雅退出-接口增加退出时长参数
2025-09-04 19:33:50 +08:00
RichardWooSJTU
f36a388ffe fix response processsors (#3826)
* fix response processsors

* fix ci

* fix ut

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-04 16:01:25 +08:00
yyssys
abde903813 Automatically configure workers based on max-num-seqs (#3846)
Automatically configure workers based on max-num-seqs
2025-09-03 21:12:42 +08:00
luukunn
fc598d4c5a add reasoning parser plugin (#3811)
* add reasoning parser plugin

* fix finish reason
2025-09-03 18:31:27 +08:00
ltd0924
2c9b169c0e [BugFix] fix scheduler invalid (#3803)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* [BugFix] fix max streaming tokens invalid

* fix scheduler bug

* fix scheduler bug
2025-09-02 20:28:51 +08:00
kevin
1908465542 [Feature] mm and thinking model support structred output (#2749)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* mm support structured output

* update code

* update code

* update format

* update code

* update code

* add enable_thinking default

* update code

* add structured_outputs test case

* add ci install xgrammar

* add ci timeout time

* update test for structured_outputs

* update code

* add error traceback info

* update error msg

* update structred output code

* update code

* update code

* update config

* update torch version

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-02 16:21:09 +08:00
Jiang-Jia-Jun
0e4df5a6f4 [Feature] Setting number of apiserver workers automatically (#3790)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-09-02 14:17:48 +08:00
ltd0924
bf0cf5167a [BugFix] fix max streaming tokens invalid (#3789) 2025-09-02 13:57:32 +08:00
co63oc
d6369b4d51 fix typos (#3684) 2025-09-01 17:50:17 +08:00
SunLei
b9af95cf1c [Feature] Add AsyncTokenizerClient&ChatResponseProcessor with remote encode&decode support. (#3674)
* [Feature] add AsyncTokenizerClient

* add decode_image

* Add response_processors with remote decode support.

* [Feature] add tokenizer_base_url startup argument

* Revert comment removal and restore original content.

* [Feature] Non-streaming requests now support remote image decoding.

* Fix parameter type issue in decode_image call.

* Keep completion_token_ids when return_token_ids = False.

* add copyright
2025-08-30 17:06:26 +08:00
luukunn
9a7c231f2c [Feature]support chat_template.jinja (#3721)
* add support chat_template.jinja

* add support chat_template.jinja
2025-08-30 17:05:34 +08:00
李泳桦
88297240e7 [feat] completion api supports passing input token ids in either prompt or prompt_token_ids (#3311)
* [feat] completion api supports passing input token ids in either `prompt` or `prompt_token_ids`

* [fix] update comment

* [fix] fix type error

* [test] add a unittest file for serving api test

* [test] try to fix ci error

* [chore] rename test function names

* [test] try to fix ci error

* [test] try to fix ci error

* [test] add tests for qwen
2025-08-29 14:19:42 +08:00
Yuanle Liu
4957908275 add input_processor plugin (#3657)
* add input_processor plugin

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2025-08-28 22:53:57 +08:00
ltd0924
e5015eea05 [BugFix] fix logger (#3666) 2025-08-28 17:08:00 +08:00
ltd0924
3d92fb09f7 [BugFix] fix parameter is 0 (#3592)
* Update engine_client.py

* fix

* Update common_engine.py
2025-08-28 09:52:36 +08:00