kxz2002
9a640b3d6b
[BugFix] unify max_tokens ( #4968 ) ( #5119 )
...
* unify max tokens
* modify and add unit test
* modify and add unit test
* modify and add unit tests
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-11-19 19:05:03 +08:00
kxz2002
0a6981f928
[BugFix] Fix inference_start_time ( #4922 ) ( #4930 )
...
* fix inference_start_time
* fix inference_start_time
2025-11-10 21:07:52 +08:00
luukunn
7df7035055
[DataProcessor] add options thinking_mode ( #4735 ) ( #4759 )
...
* add thinking_mode
* add thinking_mode
* add thinking_mode
* add thinking_mode
* add thinking_mode
* add thinking_mode
* add unit test
2025-11-03 18:14:39 +08:00
kxz2002
24b85b752b
[Cherry-Pick] Unify the registration name recognition for tool_parser and reasoning_parser to “-” ( #4668 ) ( #4737 )
...
* [Feature] add a new reasoning parser (#4571 )
* add new reasoning_parser initial commit
* add parser file content
* add register
* ernie_test_reasoning_parser
* support <tool_call> token and add tool_parser
* add and fix unit tests
* modify reasoning_parser
* modify reasoning parser and tool parser
* modify unit tests
* modify reasoning_parser and tool_parser
* modify unit tests
* fix tool_parser
* modify the logic of reasoning_parser and tool_parser
* add and modify unit tests
* standardize code style
* simplify reasoning_parser and tool_parser
* modify unit test
* [BugFix] Fix finish reason in _create_chat_completion_choice (#4582 )
* fix n_param _create_chat_completion_choice
* fix unit test
* fix final_res
* modify unit tests
* [BugFix] fix offline llm chat "enable_thinking" is always "False" (#4686 )
* fix enable_thinking
* recover ernie4_5_vl_processor
* [Feature] Unify the registration name recognition for tool_parser and reasoning_parser to “-” (#4668 )
* parser register name unify
* change ernie_x1 to ernie-x1
* change ernie4_5_vl to ernie-45-vl
* fix unit test
2025-10-31 23:27:21 +08:00
ApplEOFDiscord
52a6e0be41
[Cherry-Pick] add mm token usage ( #4648 )
...
* [Feature] add mm token usage (#4570 )
* add mm token usage
* fix unit test
* fix unit test
* fix unit test
* fix model path
* fix unit test
* fix unit test
* fix unit test
* remove uncomment
* change var name
* fix code style
* fix code style
* fix code style
* fix code style
* fix unit test
* update doc
* update doc
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-30 09:58:07 +08:00
kxz2002
895ca7694e
[Feature] add a new reasoning parser ( #4571 ) ( #4664 )
...
* add new reasoning_parser initial commit
* add parser file content
* add register
* ernie_test_reasoning_parser
* support <tool_call> token and add tool_parser
* add and fix unit tests
* modify reasoning_parser
* modify reasoning parser and tool parser
* modify unit tests
* modify reasoning_parser and tool_parser
* modify unit tests
* fix tool_parser
* modify the logic of reasoning_parser and tool_parser
* add and modify unit tests
* standardize code style
* simplify reasoning_parser and tool_parser
* modify unit test
2025-10-30 09:49:53 +08:00
xiaolei373
14e7d88ea4
[feature] support reward api ( #4518 )
...
Co-authored-by: SunLei <sunlei5788@gmail.com >
2025-10-29 00:20:28 +08:00
李泳桦
a012e3608b
[Feature] support logits processors ( #4515 )
...
* [feat] provide an interface for logits processors and a builtin LogitBiasLogitsProcessor
* [chore] fix code style
* [fix] add unit test & fix existing bugs
* [feat] add engine/worker arg --logits-processors
* [fix] redefine user args as logits_processors_args and fix some bugs
* [fix] fix test_sampler
* Update fastdeploy/model_executor/logits_processor/builtin.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/model_executor/logits_processor/__init__.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/model_executor/test_logits_processor.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [fix] fix typo
* Update fastdeploy/engine/sampling_params.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [fix] fix bracket
* [chore] redefine logits processor interface: pass the entire share_inputs into LP, do not copy share_inputs and logits
* [doc] add docs
* [fix] fix logit bias processor not applied when decoding is too fast & add docs and tests
* [fix] fix redundant code
* [feat] skip apply() if no bias is specified
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-10-29 00:08:53 +08:00
SunLei
2a9ed72533
feat: add support for API usage with multimodal models ( #4548 )
...
* feat: add support for API usage with multimodal models
* completion_tokens contains num_image_tokens
* remove test_request.py
* fix: paddle.device.is_compiled_with_cuda()
* fix test_unstream_without_logprobs
2025-10-28 20:23:46 +08:00
Daci
6426414a0f
[Feature] EngineWorkerQueue anonymous port ( #4597 )
...
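* EngineWorkerQueue supports anonymous port configuration
* EngineWorkerQueue supports anonymous port configuration
* EngineWorkerQueue supports anonymous port configuration
* EngineWorkerQueue supports anonymous port configuration
* EngineWorkerQueue supports anonymous port configuration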
2025-10-28 10:22:37 +08:00
kevin
8aab4e367f
[Feature] mm support prefix cache ( #4134 )
...
* support mm prefix caching
* update code
* fix mm_hashes
* support encoder cache
* add encoder cache
* update code
* update encoder cache
* fix features bug
* fix worker bug
* support processor cache, need to optimize yet
* refactor multimodal data cache
* update code
* update code
* update v1 scheduler
* update code
* update code
* update codestyle
* support turn off processor cache and encoder cache
* update pre-commit
* fix code
* solve review
* update code
* update code
* update test case
* set processor cache in GiB
* update test case
* support mm prefix caching for qwen model
* fix code style check
* update pre-commit
* fix unit test
* fix unit test
* add ci test case
* fix rescheduled bug
* change text_after_process to prompt_tokens
* fix unit test
* fix chat template
* change model path
* [EP] fix adapter bugs (#4572 )
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* fix v1 hang bug (#4573 )
* fix import image_ops error on some platforms (#4559 )
* [CLI]Update parameters in bench latency cli tool and fix collect-env cli tool (#4558 )
* add collect-env
* del files
* [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578 )
* add new branch for sot
* reorder
* fix batch bug
* [XPU]Moe uses a new operator (#4585 )
* [XPU]Moe uses a new operator
* [XPU]Moe uses a new operator
* update response
* [Feature] Support Paddle-OCR (#4396 )
* init
* update code
* fix code style & disable thinking
* adapt for common_engine.update_mm_requests_chunk_size
* use 3d rope
* use flash_attn_unpadded
* opt siglip
* update to be compatible with the latest codebase
* fix typo
* optim OCR performance
* fix bug
* fix bug
* fix bug
* fix bug
* normalize name
* modify xpu rope
* revert logger
* fix bug
* fix bug
* fix bug
* support default_v1
* optim performance
* fix bug
---------
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
* [DataProcessor] add reasoning_tokens into usage info (#4520 )
* add reasoning_tokens into usage info initial commit
* add unit tests
* modify unit test
* modify and add unit tests
* fix unit test
* move stream usage to processor
* modify processor
* modify test_logprobs
* modify test_logprobs.py
* modify stream reasoning tokens accumulation
* fix unit test
* perf: Optimize task queue communication from engine to worker (#4531 )
* perf: Optimize task queue communication from engine to worker
* perf: get_tasks to numpy
* perf: get_tasks remove to_numpy
* fix: request & replace ENV
* remove test_e2w_perf.py
* fix code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Clean up ports after processing results (#4587 )
* [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593 )
* [Others] api server exits when worker process is dead (#3271 )
* [fix] fix terminal hangs when worker process is dead
* [chore] change sleep time of monitor
* [chore] remove redundant comments
* update docs
---------
Co-authored-by: ApplEOFDiscord <wwy640130@163.com >
Co-authored-by: ApplEOFDiscord <31272106+ApplEOFDiscord@users.noreply.github.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: yinwei <yinwei_hust@163.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: qwes5s5 <45442318+qwes5s5@users.noreply.github.com >
Co-authored-by: Ryan <zihaohuang@aliyun.com >
Co-authored-by: yyssys <atyangshuang@foxmail.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: kxz2002 <115912648+kxz2002@users.noreply.github.com >
Co-authored-by: SunLei <sunlei5788@gmail.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: 李泳桦 <39643373+liyonghua0910@users.noreply.github.com >
2025-10-27 17:39:51 +08:00
李泳桦
cdc40cdc2a
[Others] api server exits when worker process is dead ( #3271 )
...
* [fix] fix terminal hangs when worker process is dead
* [chore] change sleep time of monitor
* [chore] remove redundant comments
2025-10-27 10:23:48 +08:00
kxz2002
327fa4c255
[DataProcessor] add reasoning_tokens into usage info ( #4520 )
...
* add reasoning_tokens into usage info initial commit
* add unit tests
* modify unit test
* modify and add unit tests
* fix unit test
* move stream usage to processor
* modify processor
* modify test_logprobs
* modify test_logprobs.py
* modify stream reasoning tokens accumulation
* fix unit test
2025-10-25 16:57:58 +08:00
ming1753
e4e3cede7f
[Feature] Support Paddle-OCR ( #4396 )
...
* init
* update code
* fix code style & disable thinking
* adapt for common_engine.update_mm_requests_chunk_size
* use 3d rope
* use flash_attn_unpadded
* opt siglip
* update to be compatible with the latest codebase
* fix typo
* optim OCR performance
* fix bug
* fix bug
* fix bug
* fix bug
* normalize name
* modify xpu rope
* revert logger
* fix bug
* fix bug
* fix bug
* support default_v1
* optim performance
* fix bug
---------
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
2025-10-24 23:34:30 +08:00
RichardWooSJTU
5a8c60454e
[BugFix] Fix decode_type which has been deleted in req and optimize token client retry scheme ( #4564 )
2025-10-23 05:08:10 -07:00
luukunn
bbf06b9ff7
[BugFix]Fix finish reason ( #4543 )
...
* fix finish reason
* add unit test
* add unit test
* fix unit test
* fix unit test
2025-10-23 14:04:43 +08:00
SunLei
809c1ac7ec
feat: add post-processing step for pool_output ( #4462 )
...
* feat: add post-processing step for pool_output
* bugfix
* fix: test_serving_embedding
* fix test_request_to_batch_dicts
* fix: code style
2025-10-21 20:24:26 +08:00
SunLei
ee915220bd
[Speculative Decoding] Add draft_logprobs Support for Speculative Decode MTP ( #4467 )
...
* feat: add draft_logprobs for Speculative Decode MTP
* feat: add draft_logprobs for Speculative Decode MTP
* feat: add draft_logprobs for Speculative Decode MTP
* fix: postprocess for speculative decode
* test: test_speculative_decoding_use_logprobs
* fix: test_completion_echo
* fix test_max_streaming_tokens
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-21 14:57:50 +08:00
ltd0924
a498736af5
[APIServer] support define gunicorn timeout ( #4496 )
...
* [BUGFIX] clear request #4286
* [BugFix] support define gunicorn timeout
* Update utils.py
* Update utils.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-20 23:36:07 +08:00
Yuanle Liu
cef3164c3b
Optimizing the performance of think length limit using custom operators ( #4279 )
...
* delete impl
* delete min_length&max_length
* support limit thinking content strategy
* fix
* fix
* fix
* update
* fix set_value_by_flags_and_idx
* fix
* fix
* fix
* fix
* update
* fix
* fix
* fix typo
* fix ci
* fix
* fix
* support mtp
* fix
* fix
* update
* update
2025-10-20 21:09:13 +08:00
kxz2002
b5b993e48e
[Feature] support n parameter ( #4273 )
...
* support n parameter
* pre-commit check
* pre-commit check
* restore format_and_add_data
* update n_param
* bug fix index - str to int
* bug fix del child_task
* bug fix metrics
* add debug info
* add debug info2
* remove debug info
* change connecting symbol to '-'
* bugfix change connecting symbol
* bugfix change connecting symbol2
* unit tests fix
* unit test fix2
* unittest add param n=2
* n param add unit tests and adapt to echo
* pre-commit fix
* resolve review
* adjust stop reason
* add unittest for _create_chat_completion_choice
* modify unittest
* solve conflict
* solve conflict
* resolve conflict
---------
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com >
Co-authored-by: gaoziyuan <m13689897706@163.com >
2025-10-17 20:51:59 +08:00
LiqinruiG
4251ac5e95
[Fix] remove text_after_process & raw_prediction ( #4421 )
...
* remove text_after_process & raw_prediction
* remove text_after_process & raw_prediction
2025-10-16 19:00:18 +08:00
SunLei
b4b579a7ed
Feature:Add support for Pooling Model Embedding and provide an OpenAI-compatible API. ( #4344 )
...
* feat: add OpenAIServing
* feat: add ZmqOpenAIServing & OpenAIServingEmbedding
* feat: Refine the basic ServingEngine class and introduce ServingContext
* fix: codestyle
* fix: request
* fix: pooling_params
* feat: _process_chat_template_kwargs
* feat: support batch request
* feat: pooling_params verify & default parameters
---------
Co-authored-by: sunlei1024 <sunlei1024@example.com >
2025-10-15 19:42:59 +08:00
ltd0924
fbdb056de0
[BUGFIX] clear request #4286 ( #4402 )
...
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-15 17:43:28 +08:00
ltd0924
d8841b7b40
[BugFix] fix workers=1 ( #4364 )
...
* [Feature] support prefix cache in DP
* fix
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* [BugFix] fix workers more than 1
* fix
* Update api_server.py
* fix
* Update api_server.py
* fix
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-15 17:06:25 +08:00
qwes5s5
abb62624b8
[fix] Fixed the issue of excessive/redundant spans being returned for streaming requests. ( #4375 )
...
* fix stream span
* fix stream span
2025-10-15 11:47:47 +08:00
ltd0924
28d1b6cd97
[BugFix] fix multinode bugs ( #4377 )
...
* [BugFix] fix multinode bugs
* Update test_config.py
* Update test_config.py
* Update test_config.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-15 11:43:39 +08:00
李泳桦
6265f4385f
[feat] support prefix cache clearing when /clear_load_weight is called ( #4008 )
...
* [feat] support clearing prefix cache (cherry-picked from release/2.1)
* [fix] fix ipc suffix, use port instead
* [fix] fix prefix caching not enabled
* [fix] fix key/value_cache_scales indent
* [fix] fix ep group all-reduce
* [fix] fix clear/update lock not working when workers > 1
* [chore] add preemption triggered info log
* [fix] fix code style
* [fix] fix max_num_seqs config
* [fix] do not force enable_prefix_caching=False in dynamic loading
* [fix] fix ci
* Revert "[fix] fix ci"
This reverts commit 0bc6d55cc8 .
* [fix] initialize available_gpu_block_num with max_gpu_block_num
* [fix] fix config splitwise_role
* [fix] fix clearing caches synchronization and add more logs
* [chore] print cache_ready_signal in log
* [fix] fix scheduler_config.splitwise_role
* [fix] fix cache_messager cache_ready_signal create=True
* [fix] stop cache messager from launching in mixed deployment
2025-09-28 19:42:53 +08:00
xiaolei373
55124f8491
Add cli run batch ( #4237 )
...
* feat(log):add_request_and_response_log
* [cli] add run batch cli
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-26 14:27:25 +08:00
zhuzixuan
12a3587cca
[Supplements and upgrades]Improvement of X1 parsers ( #4172 )
...
* reasoning_parser
* reasoning_parser
* reasoning_parser
* reasoning_parser
* reasoning_parser
* reasoning_parser
* reasoning_parser
2025-09-26 13:37:37 +08:00
memoryCoderC
4ec00df2b0
[Feature] add config api ( #4254 )
2025-09-26 11:21:02 +08:00
memoryCoderC
8b0ce8e3ab
[Feature] add cli command serve ( #4226 )
2025-09-24 14:50:45 +08:00
ltd0924
83720da79f
[Feature] support clear data ( #3601 )
...
* [Feature] support clear data
* update
* fix
* fix
* fix
* fix
* fix
* fix
* fix
2025-09-23 10:20:02 +08:00
luukunn
ee9d8a840a
[fix]Modify follow-up push parameters and Modify the verification method for thinking length ( #4086 )
...
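* rename the follow-up push parameter generated_token_ids to completion_token_ids; modify the verification method for thinking length
* rename the follow-up push parameter generated_token_ids to completion_token_ids; modify the verification method for thinking length
* rename the follow-up push parameter generated_token_ids to completion_token_ids; modify the verification method for thinking length
* rename the follow-up push parameter generated_token_ids to completion_token_ids; modify the verification method for thinking length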
* add completion_token_ids
* add logger
* fix reasoning_max_tokens ParameterError
* add unittest
* add unittest
* add unittest
* add unittest
* add unittest
* add unit test
2025-09-19 14:26:01 +08:00
xiaolei373
ddf5606263
Bugfix test exception ( #4171 )
...
* feat(log):add_request_and_response_log
* modify default error type
2025-09-19 11:48:49 +08:00
xiaolei373
98447beb4d
Add param valid log ( #4113 )
...
* feat(log):add_request_and_response_log
* [bugfix] add param valid log
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-18 10:39:24 +08:00
Jiang-Jia-Jun
a04365a0c7
Update api_server.py
2025-09-15 21:31:33 +08:00
xiaolei373
9ac539471d
[format] Valid para format error info ( #4035 )
...
* feat(log):add_request_and_response_log
* align error messages with OpenAI
2025-09-12 19:05:17 +08:00
zhuzixuan
a47976e82d
[Echo] Support more types of prompt echo ( #4022 )
...
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
---------
Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com >
2025-09-11 19:34:44 +08:00
ltd0924
684e93269b
[Fix] fix multi api server log dir ( #3967 )
...
* [BugFix] fix max streaming tokens invalid
* fix scheduler bug
* fix scheduler bug
* Update multi_api_server.py
2025-09-10 17:15:30 +08:00
zhuzixuan
83bd55100b
[Optimize]Error messages about Model api. ( #3839 )
...
* add v1/models interface related
* add model parameters
* default model verification
* unit test
* check model err_msg
* unit test
* type annotation
* model parameter in response
* modify document description
* modify document description
* unit test
* verification
* verification update
* model_name
* pre-commit
* update test case
* update test case
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/entrypoints/openai/serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* optimize error messages.
---------
Co-authored-by: yangzichao01 <yangzichao01@baidu.com >
Co-authored-by: Yzc216 <101054010+Yzc216@users.noreply.github.com >
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-09-08 15:52:26 +08:00
ltd0924
0c45e225d3
mv connection_manager init ( #3901 )
...
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-09-05 21:11:48 +08:00
ltd0924
bd7d15f7ea
[Feature] support controller port in multi api server ( #3898 )
...
* Update serving_chat.py
* Update serving_completion.py
* Update serving_completion.py
* Update multi_api_server.py
2025-09-05 17:16:31 +08:00
SunLei
29628de6a7
Support for async processor added. ( #3869 )
...
* Support for async processor added.
* remove yappi code
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-09-04 19:58:53 +08:00
xiaolei373
ed97cf8396
Graceful shut down ( #3785 )
...
* feat(log):add_request_and_response_log
* Graceful shutdown: add a shutdown-duration parameter to the interface
2025-09-04 19:33:50 +08:00
RichardWooSJTU
f36a388ffe
fix response processsors ( #3826 )
...
* fix response processsors
* fix ci
* fix ut
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-04 16:01:25 +08:00
yyssys
abde903813
Automatically configure workers based on max-num-seqs ( #3846 )
...
Automatically configure workers based on max-num-seqs
2025-09-03 21:12:42 +08:00
luukunn
fc598d4c5a
add reasoning parser plugin ( #3811 )
...
* add reasoning parser plugin
* fix finish reason
2025-09-03 18:31:27 +08:00
ltd0924
2c9b169c0e
[BugFix] fix scheduler invalid ( #3803 )
...
* [BugFix] fix max streaming tokens invalid
* fix scheduler bug
* fix scheduler bug
2025-09-02 20:28:51 +08:00
Jiang-Jia-Jun
0e4df5a6f4
[Feature] Setting number of apiserver workers automatically ( #3790 )
...
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-09-02 14:17:48 +08:00