[Model] Qwen2.5VL support --use-cudagraph and unit testing (#4087)

* [BugFix] qwen2.5vl enable_thinking=true and image_patch_id bug fix

* [Docs] offline inference: add add_generation_prompt parameter to apply_chat_template

* [Model] Qwen2.5VL support --use-cudagraph

* [Model] Qwen2.5VL support --use-cudagraph buffer and qwenvl test

* [Model] Qwen2.5VL support --use-cudagraph buffer and qwenvl test

* [Model] Qwen2.5VL support --use-cudagraph buffer and qwenvl test v2

* [Model] Qwen2.5VL support --use-cudagraph buffer and qwenvl test v3

* [Model] Qwen2.5VL support --use-cudagraph buffer and qwenvl test v4

* [Model] Qwen2.5VL support --use-cudagraph buffer and qwenvl test v5

* [Model] Qwen2.5VL support --use-cudagraph buffer and qwenvl test v6

* [Model] Qwen2.5VL support --use-cudagraph buffer and qwenvl test v7
CSWYF3634076
2025-09-24 19:45:01 +08:00
committed by GitHub
parent 18f4977aec
commit 5ff10c8ced
5 changed files with 1048 additions and 39 deletions

@@ -107,7 +107,7 @@ messages = [
}
]
-prompt = tokenizer.apply_chat_template(messages, tokenize=False)
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = [], []
for message in messages:
content = message["content"]
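The docs change above adds `add_generation_prompt=True` so the rendered prompt ends with the assistant role header, cueing the model to answer rather than continue the user turn. A minimal illustrative sketch of what that flag changes in a ChatML-style template (this toy `render_chatml` function is hypothetical, not the actual Qwen tokenizer template):

```python
# Illustrative sketch only: mimics how a ChatML-style chat template
# renders messages, and what add_generation_prompt toggles.
def render_chatml(messages, add_generation_prompt=False):
    parts = []
    for m in messages:
        # Each turn is wrapped in role markers.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Trailing assistant header: without it, generation may not
        # start from the assistant's perspective.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [{"role": "user", "content": "Describe the image."}]
prompt = render_chatml(messages, add_generation_prompt=True)
print(prompt)
```

With `add_generation_prompt=False` the prompt stops after the final `<|im_end|>`, which is why the original docs snippet could produce off-role continuations.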