kxz2002
1689f7ef86
[Cherry-Pick] Fix ernie4_5_vl_processor.py and qwen_vl_processor.py can not disable thinking ( #4762 ) ( #4798 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
* [Feature] add a new reasoning parser (#4571 )
* add new reasoning_parser initial commit
* add parser file content
* add register
* ernie_test_reasoning_parser
* support <tool_call> token and add tool_parser
* add and fix unit tests
* modify reasoning_parser
* modify reasoning parser and tool parser
* modify unit tests
* modify reasoning_parser and tool_parser
* modify unit tests
* fix tool_parser
* modify the logic of reasoning_parser and tool_parser
* add and modify unit tests
* standardize code style
* simplify reasoning_parser and tool_parser
* modify unit test
* [BugFix] Fix finish reason in _create_chat_completion_choice (#4582 )
* fix n_param _create_chat_completion_choicel
* fix unit test
* fix final_res
* modify unit tests
* [BugFix] fix offline llm chat "enable_thinking" is always "False" (#4686 )
* fix enable_thinking
* recover ernie4_5_vl_processor
* [BugFix] Fix ernie4_5_vl_processor.py and qwen_vl_processor.py can not disable thinking (#4762 )
* fix ernie4_5_vl_processor.py and qwen_vl_processor.py
* add unit test
2025-11-05 10:59:47 +08:00
kevin
8aab4e367f
[Feature] mm support prefix cache ( #4134 )
...
* support mm prefix caching
* update code
* fix mm_hashes
* support encoder cache
* add encoder cache
* update code
* update encoder cache
* fix features bug
* fix worker bug
* support processor cache, need to optimize yet
* refactor multimodal data cache
* update code
* update code
* update v1 scheduler
* update code
* update code
* update codestyle
* support turn off processor cache and encoder cache
* update pre-commit
* fix code
* solve review
* update code
* update code
* update test case
* set processor cache in GiB
* update test case
* support mm prefix caching for qwen model
* fix code style check
* update pre-commit
* fix unit test
* fix unit test
* add ci test case
* fix rescheduled bug
* change text_after_process to prompt_tokens
* fix unit test
* fix chat template
* change model path
* [EP] fix adapter bugs (#4572 )
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* fix v1 hang bug (#4573 )
* fix import image_ops error on some platforms (#4559 )
* [CLI]Update parameters in bench latecy cli tool and fix collect-env cli tool (#4558 )
* add collect-env
* del files
* [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578 )
* add new branch for sot
* reorder
* fix batch bug
* [XPU]Moe uses a new operator (#4585 )
* [XPU]Moe uses a new operator
* [XPU]Moe uses a new operator
* update response
* [Feature] Support Paddle-OCR (#4396 )
* init
* update code
* fix code style & disable thinking
* adapt for common_engine.update_mm_requests_chunk_size
* use 3d rope
* use flash_attn_unpadded
* opt siglip
* update to be compatible with the latest codebase
* fix typo
* optim OCR performance
* fix bug
* fix bug
* fix bug
* fix bug
* normlize name
* modify xpu rope
* revert logger
* fix bug
* fix bug
* fix bug
* support default_v1
* optim performance
* fix bug
---------
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
* [DataProcessor] add reasoning_tokens into usage info (#4520 )
* add reasoning_tokens into usage info initial commit
* add unit tests
* modify unit test
* modify and add unit tests
* fix unit test
* move steam usage to processor
* modify processor
* modify test_logprobs
* modify test_logprobs.py
* modify stream reasoning tokens accumulation
* fix unit test
* perf: Optimize task queue communication from engine to worker (#4531 )
* perf: Optimize task queue communication from engine to worker
* perf: get_tasks to numpy
* perf: get_tasks remove to_numpy
* fix: request & replace ENV
* remove test_e2w_perf.py
* fix code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Clean up ports after processing results (#4587 )
* [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593 )
* [Others] api server exits when worker process is dead (#3271 )
* [fix] fix terminal hangs when worker process is dead
* [chore] change sleep time of monitor
* [chore] remove redundant comments
* update docs
---------
Co-authored-by: ApplEOFDiscord <wwy640130@163.com >
Co-authored-by: ApplEOFDiscord <31272106+ApplEOFDiscord@users.noreply.github.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: yinwei <yinwei_hust@163.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: qwes5s5 <45442318+qwes5s5@users.noreply.github.com >
Co-authored-by: Ryan <zihaohuang@aliyun.com >
Co-authored-by: yyssys <atyangshuang@foxmail.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: kxz2002 <115912648+kxz2002@users.noreply.github.com >
Co-authored-by: SunLei <sunlei5788@gmail.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: 李泳桦 <39643373+liyonghua0910@users.noreply.github.com >
2025-10-27 17:39:51 +08:00
LiqinruiG
4251ac5e95
【Fix】 remove text_after_process & raw_prediction ( #4421 )
...
* remove text_after_process & raw_prediction
* remove text_after_process & raw_prediction
2025-10-16 19:00:18 +08:00
luukunn
ee9d8a840a
[fix]Modify follow-up push parameters and Modify the verification method for thinking length ( #4086 )
...
* 续推参数 generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式
* 续推参数 generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式
* 续推参数 generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式
* 续推参数 generated_token_ids 修改成 completion_token_ids;修改思考长度校验方式
* add completion_token_ids
* add logger
* fix reasoning_max_tokens ParameterError
* add unittest
* add unittest
* add unittest
* add unittest
* add unittest
* add unit test
2025-09-19 14:26:01 +08:00
lddfym
2056a428bd
[bug fix] Fix the placeholder in qwen prompt and add some unittests ( #4065 )
...
* fix the placeholder in qwen prompt
* fix the placeholder in qwen prompt
* add soem unittests for qwen_vl_processor
2025-09-11 20:00:02 +08:00
Yuanle Liu
cbce94a00e
rename ernie_xxx to ernie4_5_xxx ( #3621 )
...
* rename ernie_xxx to ernie4_5_xxx
* ci fix
2025-08-26 19:29:27 +08:00
lddfym
27666ee586
[Feature] Add Qwen25-VL Processor ( #3501 )
...
* add qwen-2.5-vl processor
* add qwen25-vl processor
* add qwen25-vl processor
* add qwen25-vl processor
* add qwen25-vl processor position_ids
* add qwen25-vl processor
* add qwen25-vl processor
* position_ids
* add test for qwen25-vl
* organize comments
* formatted
* qwen_vl_processor
* add qwen_vl_processor unittest
* update model path
* update model path
* update qwen_vl_processor unittest
* add unittest and bug fix
* add unittest and bug fix
* Update fastdeploy/input/qwen_mm_processor/image_processor.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/input/qwen_vl_processor.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-08-22 16:49:42 +08:00