kevin
8aab4e367f
[Feature] mm support prefix cache ( #4134 )
...
* support mm prefix caching
* update code
* fix mm_hashes
* support encoder cache
* add encoder cache
* update code
* update encoder cache
* fix features bug
* fix worker bug
* support processor cache, need to optimize yet
* refactor multimodal data cache
* update code
* update code
* update v1 scheduler
* update code
* update code
* update codestyle
* support turn off processor cache and encoder cache
* update pre-commit
* fix code
* solve review
* update code
* update code
* update test case
* set processor cache in GiB
* update test case
* support mm prefix caching for qwen model
* fix code style check
* update pre-commit
* fix unit test
* fix unit test
* add ci test case
* fix rescheduled bug
* change text_after_process to prompt_tokens
* fix unit test
* fix chat template
* change model path
* [EP] fix adapter bugs (#4572 )
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* fix v1 hang bug (#4573 )
* fix import image_ops error on some platforms (#4559 )
* [CLI]Update parameters in bench latecy cli tool and fix collect-env cli tool (#4558 )
* add collect-env
* del files
* [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578 )
* add new branch for sot
* reorder
* fix batch bug
* [XPU]Moe uses a new operator (#4585 )
* [XPU]Moe uses a new operator
* [XPU]Moe uses a new operator
* update response
* [Feature] Support Paddle-OCR (#4396 )
* init
* update code
* fix code style & disable thinking
* adapt for common_engine.update_mm_requests_chunk_size
* use 3d rope
* use flash_attn_unpadded
* opt siglip
* update to be compatible with the latest codebase
* fix typo
* optim OCR performance
* fix bug
* fix bug
* fix bug
* fix bug
* normlize name
* modify xpu rope
* revert logger
* fix bug
* fix bug
* fix bug
* support default_v1
* optim performance
* fix bug
---------
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
* [DataProcessor] add reasoning_tokens into usage info (#4520 )
* add reasoning_tokens into usage info initial commit
* add unit tests
* modify unit test
* modify and add unit tests
* fix unit test
* move steam usage to processor
* modify processor
* modify test_logprobs
* modify test_logprobs.py
* modify stream reasoning tokens accumulation
* fix unit test
* perf: Optimize task queue communication from engine to worker (#4531 )
* perf: Optimize task queue communication from engine to worker
* perf: get_tasks to numpy
* perf: get_tasks remove to_numpy
* fix: request & replace ENV
* remove test_e2w_perf.py
* fix code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Clean up ports after processing results (#4587 )
* [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593 )
* [Others] api server exits when worker process is dead (#3271 )
* [fix] fix terminal hangs when worker process is dead
* [chore] change sleep time of monitor
* [chore] remove redundant comments
* update docs
---------
Co-authored-by: ApplEOFDiscord <wwy640130@163.com >
Co-authored-by: ApplEOFDiscord <31272106+ApplEOFDiscord@users.noreply.github.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: yinwei <yinwei_hust@163.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: qwes5s5 <45442318+qwes5s5@users.noreply.github.com >
Co-authored-by: Ryan <zihaohuang@aliyun.com >
Co-authored-by: yyssys <atyangshuang@foxmail.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: kxz2002 <115912648+kxz2002@users.noreply.github.com >
Co-authored-by: SunLei <sunlei5788@gmail.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: 李泳桦 <39643373+liyonghua0910@users.noreply.github.com >
2025-10-27 17:39:51 +08:00
ltd0924
fb76cdfb4f
[Fearture] Support mm model close prefix cache ( #4459 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Feature] support prefix cache in DP
* fix
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* [BugFix] fix workers more than 1
* fix
* Update api_server.py
* fix
* Update api_server.py
* fix
* [Fearture] Support mm model close prefix cache
* Update api_server.py
* Update engine_client.py
* Update engine_client.py
* add test
* Update test_chat.py
* fix
* fix
* Update test_chat.py
* Update test_chat.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-21 15:37:59 +08:00
Yuanle Liu
cef3164c3b
Optimizing the performance of think length limit using custom operators ( #4279 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* delete impl
* delete min_length&max_length
* support limit thinking content strategy
* fix
* fix
* fix
* update
* fix set_value_by_flags_and_idx
* fix
* fix
* fix
* fix
* update
* fix
* fix
* fix typo
* fix ci
* fix
* fix
* support mtp
* fix
* fix
* update
* update
2025-10-20 21:09:13 +08:00
zhuzixuan
1e59905e34
Optimization of ‘tools’ in request fields ( #4380 )
...
* Remove multiple 'tools'
* Remove multiple 'tools'
* Remove multiple 'tools'
* Remove multiple 'tools'
2025-10-20 11:04:08 +08:00
kxz2002
b5b993e48e
【feature】support n parameter ( #4273 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* support n parameter
* pre-commit check
* pre-commit check
* restore format_and_add_data
* update n_param
* bug fix index - str to int
* bug fix del child_task
* bug fix metrics
* add debug info
* add debug info2
* remove debug info
* change connecting symbol to '-'
* bugfix change connecting symbol
* bugfix change connecting symbol2
* unit tests fix
* unit test fix2
* unittest add param n=2
* n param add unit tests and adapt to echo
* pre-commit fix
* resolve review
* adjust stop reason
* add unittest for _create_chat_completion_choice
* modify unittest
* solve confict
* solve conflict
* resolve conflict
---------
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com >
Co-authored-by: gaoziyuan <m13689897706@163.com >
2025-10-17 20:51:59 +08:00
SunLei
b4b579a7ed
Feature:Add support for Pooling Model Embedding and provide an OpenAI-compatible API. ( #4344 )
...
* feat: add OpenAIServing
* feat: add ZmqOpenAIServing & OpenAIServingEmbedding
* feat: Refine the basic ServingEngine class and introduce ServingContext
* fix: codestyle
* fix: request
* fix: pooling_params
* feat: _process_chat_template_kwargs
* feat: support batch request
* feat: pooling_params verify & default parameters
---------
Co-authored-by: sunlei1024 <sunlei1024@example.com >
2025-10-15 19:42:59 +08:00
李泳桦
6265f4385f
[feat] support prefix cache clearing when /clear_load_weight is called ( #4008 )
...
* [feat] support clearing prefix cache (cherry-picked from release/2.1)
* [fix] fix ipc suffix, use port instead
* [fix] fix prefix caching not enabled
* [fix] fix key/value_cache_scales indent
* [fix] fix ep group all-reduce
* [fix] fix clear/update lock not working when workers > 1
* [chore] add preemption triggered info log
* [fix] fix code style
* [fix] fix max_num_seqs config
* [fix] do not force enable_prefix_caching=False in dynamic loading
* [fix] fix ci
* Revert "[fix] fix ci"
This reverts commit 0bc6d55cc8 .
* [fix] initialize available_gpu_block_num with max_gpu_block_num
* [fix] fix config splitwise_role
* [fix] fix clearing caches synchronization and add more logs
* [chore] print cache_ready_signal in log
* [fix] fix scheduler_config.splitwise_role
* [fix] fix cache_messager cache_ready_signal create=True
* [fix] stop cache messager from launching in mixed deployment
2025-09-28 19:42:53 +08:00
K11OntheBoat
4515ad21e9
Support limit thinking lengths ( #4069 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-09-25 19:55:56 +08:00
luukunn
18f4977aec
[fix]update apply_chat_template ( #4137 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* update apply_chat_template
* fix unittest
* fix unittest
* fix
* fix
* fix unit test
* fix
* fix unit test
* add unit test
2025-09-24 18:56:32 +08:00
ltd0924
83720da79f
[Feature] support clear data ( #3601 )
...
* [Feature] support clear data
* update
* fix
* fix
* fix
* fix
* fix
* fix
* fix
2025-09-23 10:20:02 +08:00
Jiang-Jia-Jun
772f0156f3
Remove useless code ( #4195 )
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-09-22 21:18:19 +08:00
luukunn
ee9d8a840a
[fix]Modify follow-up push parameters and Modify the verification method for thinking length ( #4086 )
...
* 续推参数 generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式
* 续推参数 generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式
* 续推参数 generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式
* 续推参数 generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式
* add completion_token_ids
* add logger
* fix reasoning_max_tokens ParameterError
* add unittest
* add unittest
* add unittest
* add unittest
* add unittest
* add unit test
2025-09-19 14:26:01 +08:00
chenjian
618ccdbfba
[Feature] Support mixed deployment with yiyan adapter in develop ( #3976 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Feature] Support mixed deployment with yiyan adapter in release2.2
* fix metrics
* add unit test
* add unit test
* add unit test
* fix ci
* fix for eb5
* fix ci
* fix ci
* fix ci
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-18 01:52:20 +08:00
xiaolei373
9ac539471d
[format] Valid para format error info ( #4035 )
...
* feat(log):add_request_and_response_log
* 报错信息与OpenAI对齐
2025-09-12 19:05:17 +08:00
co63oc
8466219ec8
fix typos ( #3840 )
...
* fix typos
* ci
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-12 11:04:38 +08:00
SunLei
29628de6a7
Support for async processor added. ( #3869 )
...
* Support for async processor added.
* remove yappi code
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-09-04 19:58:53 +08:00
ltd0924
3d92fb09f7
[BugFix] fix parameter is 0 ( #3592 )
...
* Update engine_client.py
* fix
* Update common_engine.py
2025-08-28 09:52:36 +08:00
ltd0924
2974016103
[BugFix] fix ce bugs ( #3641 )
...
* [BugFix] fix tp8 client refuse
* fix engine port bug
* Update utils.py
2025-08-27 20:38:15 +08:00
gaoziyuan
82e64b13e1
[NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop ( #3598 )
...
* [Feature] update ep
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix queue ports idx
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* Update engine.py
* fix ci
* fix some bug in mixed ep
* add server fix and op fix
* rm some log
* fix code style
* ltd fix
* fix
* fix
* fix some bug
* fix bug
* fix bug
* fix style
* Update config.py
* Update splitwise_connector.py
* Update cache_messager.py
* Update __init__.py
* merge and fix
* Update engine.py
* Update common_engine.py
* Update run_ci_xpu.sh
* Update ernie_processor.py
* Update ernie_processor.py
---------
Co-authored-by: ltd0924 <ltd0924@sina.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
2025-08-26 19:59:02 +08:00
YuanRisheng
c389a4013c
Unify server-side and model-side Config(Part-5) ( #3497 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* move config
* fix xpu
* fix
* fix vl
* fix vl
* fix unitest
* fix args
* add unitest
* fix test
2025-08-21 19:00:21 +08:00
ltd0924
51f68ae593
[Feature] add dealer manager to reuse the connection ( #3471 )
...
* [BugFix] fix control signal release failed
* [BugFix] fix control signal release failed
* update
* update
* update
* [Feature] add dealer manager to reuse the connection
* fix
* fix
* fix
* fix
* fix
* fix
* Create test_dealer_connection_manager.py
* Delete test/entrypoints/openai directory
* Update test_dealer_connection_manager.py
* Update test_dealer_connection_manager.py
2025-08-21 13:11:13 +08:00
kevin
67298cf4c0
add error traceback info ( #3419 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* add error traceback info
* update error msg
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-19 19:32:04 +08:00
luukunn
eda83ca672
add Tool Parser ( #3272 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* add tool-parser
* add tool-parser
* add tool parser
* add tool parser
* fix
* add offline
* add offline
* fix
* parsers:tool&reasoning
* 修改tool parser名称·
* update
* fix reasoning-parser
* add requirements
* fix finish reason
* fix
* fix reasoning-parser
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: zhuzixuan <zhuzixuan@baidu.com >
2025-08-13 01:06:55 +08:00
ltd0924
31d4fcb425
[BugFix] fix too many open files problem ( #3256 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* Update cache_messager.py
* fix too many open files problem
* fix too many open files problem
* fix too many open files problem
* fix ci bugs
* Update api_server.py
* add parameter
* format
* format
* format
* format
* Update parameters.md
* Update parameters.md
* Update serving_completion.py
* Update serving_chat.py
* Update envs.py
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-08 20:10:11 +08:00
JYChen
9423c577fe
[stop_seq] fix out-bound value for stop sequence ( #3216 )
...
* fix out-bound value for stop sequence
* catch error if there are out-of-bounds value
* check in offline mode
* add ut tests
2025-08-07 15:40:21 +08:00
ApplEOFDiscord
b71cbb466d
[Feature] remove dependency on enable_mm and refine multimodal's code ( #3014 )
...
* remove dependency on enable_mm
* fix codestyle check error
* fix codestyle check error
* update docs
* resolve conflicts on model config
* fix unit test error
* fix code style check error
---------
Co-authored-by: shige <1021937542@qq.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-01 20:01:18 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs ( #2974 )
...
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 20:25:56 +08:00
Zero Rains
0fb37ab7e4
update flake8 version to support pre-commit in python3.12 ( #3000 )
...
* update flake8 version to support pre-commit in python3.12
* polish code
2025-07-24 01:43:31 -07:00
ltd0924
f935d6f862
[BugFix] fix multinode deployment ( #2977 )
2025-07-24 15:04:04 +08:00
Yuanle Liu
2f74e93d7e
use dist.all_reduce(min) to sync num_blocks_local ( #2933 )
...
* pre-commit all files check
* reduce min num_blocks_local
* fix nranks=1
* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
ltd0924
cc4cec0a74
Update engine_client.py ( #2931 )
2025-07-21 11:42:16 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72
[LLM] First commit the llm deployment code
2025-06-09 19:20:15 +08:00