* Support batched tokens for EP
* Support batched tokens for EP and fix bug
* Fix bug for memory allocation
* [LLM] support EP
* Update worker_process.py
* Update expert_service.py
* Update worker_process.py
* format files
* optimize prefix cache
* pre commit format
* Update cache_messager.py
* Support external module
* refactor code to make it clearer
* fix according to review
* fix bug
* merge
---------
Co-authored-by: root <root@tjdm-inf-sci-k8s-hzz2-h12ni8-0202.tjdm.baidu.com>
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* [feat] pass all extra parameters directly in the HTTP payload, or via extra_body when using the OpenAI client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [fix] fix ci consistency test error with reasoning parser
* [doc] update docs related to metadata
* [fix] remove the default value of enable_thinking
* support bad_words
* support bad_words in online inference
* update
* add CI test
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
* support repetition early stop and let users set its parameter
* remove log
* fix codestyle
* add the early_stop_config to rollout_config
* update config and EarlyStopper class
* fix the bug for triton
* modify the stop method
* update description
* modify the usage of stop_flags
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com>