* [Feature] add a new reasoning parser (#4571)
* add new reasoning_parser initial commit
* add parser file content
* add register
* ernie_test_reasoning_parser
* support <tool_call> token and add tool_parser
* add and fix unit tests
* modify reasoning_parser
* modify reasoning parser and tool parser
* modify unit tests
* modify reasoning_parser and tool_parser
* modify unit tests
* fix tool_parser
* modify the logic of reasoning_parser and tool_parser
* add and modify unit tests
* standardize code style
* simplify reasoning_parser and tool_parser
* modify unit test
* [BugFix] Fix finish reason in _create_chat_completion_choice (#4582)
* fix n param in _create_chat_completion_choice
* fix unit test
* fix final_res
* modify unit tests
* [BugFix] fix offline LLM chat "enable_thinking" always being "False" (#4686)
* fix enable_thinking
* recover ernie4_5_vl_processor
* [BugFix] Fix ernie4_5_vl_processor.py and qwen_vl_processor.py can not disable thinking (#4762)
* fix ernie4_5_vl_processor.py and qwen_vl_processor.py
* add unit test
* [Feature] add mm token usage (#4570)
* add mm token usage
* fix unit test
* fix unit test
* fix unit test
* fix model path
* fix unit test
* fix unit test
* fix unit test
* remove commented-out code
* change var name
* fix code style
* fix code style
* fix code style
* fix code style
* fix unit test
* update doc
* update doc
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* [BugFix] qwen2.5vl enable_thinking=true and image_patch_id bug fix
* [Docs] offline inference: add apply_chat_template add_generation_prompt parameter
* [Model] qwen2.5VL support --use-cudagraph
* [Model] qwen2.5VL support --use-cudagraph buffer and qwenvl test
* [Model] qwen2.5VL support --use-cudagraph buffer and qwenvl test
* [Model] qwen2.5VL support --use-cudagraph buffer and qwenvl test v2
* [Model] qwen2.5VL support --use-cudagraph buffer and qwenvl test v3
* [Model] qwen2.5VL support --use-cudagraph buffer and qwenvl test v4
* [Model] qwen2.5VL support --use-cudagraph buffer and qwenvl test v5
* [Model] qwen2.5VL support --use-cudagraph buffer and qwenvl test v6
* [Model] qwen2.5VL support --use-cudagraph buffer and qwenvl test v7
* qwen25vl v1 loader
* qwen25vl v1 loader v2
* qwen25vl v1 loader v3
* qwen25vl v1 loader fix tp2 weight PySafeSlice
* qwen25vl v1 loader no test
* qwen25vl v1 loader add unit test
* qwen25vl v1 loader add unit test v2
* qwen25vl v1 loader add torch unit test v3
* qwen25vl v1 loader add torch unit test v4
* qwen25vl v1 loader add torch unit test v5
* qwen25vl v1 loader add torch unit test v6