yangjianfengo1
a83a3eea5f
将FLAGS_max_partition_size修改为环境变量获取 ( #2854 )
2025-07-16 14:14:21 +08:00
RAM
0fad10b35a
[Executor] CUDA Graph support padding batch ( #2844 )
...
* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug
2025-07-15 19:49:01 -07:00
Zero Rains
e7bcbbab52
Merge vl execution path into normal execution path ( #2829 )
...
* merge vl model into gpu_model runner
Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6
* fix chinese
Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a
* fix the parse parameter
Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe
* fix the bug in online_inference
Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559
* polish code
Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c
2025-07-15 22:20:03 +08:00
lddfym
b5e4288704
Global scheduler supports configuring hot updates ( #2807 )
...
* Check if the controller port is available
* Global scheduler supports configuring hot updates
* add interface: /controller/scheduler
* add interface: /controller/scheduler
2025-07-11 13:38:07 +08:00
chen
d33105baeb
[Feature] Online Chat API Support Return logprobs ( #2777 )
...
* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform
2025-07-10 16:33:40 +08:00
0x3878f
1d8af7ab73
Add env variable for dy2st ( #2779 )
2025-07-10 11:06:06 +08:00
zhink
b89180f1cd
[Feature] support custom all-reduce ( #2758 )
...
* [Feature] support custom all-reduce
* add vllm adapted
2025-07-09 16:00:27 +08:00
GoldPancake
f7cad30a38
[Feature] Add speculative decoding simulation benchmark. ( #2751 )
...
* Add speculative decoding simulation benchmark
* Fix the name of the parameter
2025-07-09 12:08:43 +08:00
Ryan
f72c4de539
[SOT] Make custom_op dy&st unified ( #2733 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* make_custom_op dy&st unified
* add instance judgement
2025-07-08 19:21:44 +08:00
ltd0924
68b4755587
[LLM] support multi node deploy ( #2708 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* [LLM] support multi node deploy
* Update engine.py
* fix bugs
* fix
* [LLM] support multi node deploy
* [LLM] support multi node deploy
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-06 10:33:51 +08:00
Jiang-Jia-Jun
05c670e593
[Sync] Update to latest code ( #2679 )
...
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72
[LLM] First commit the llm deployment code
2025-06-09 19:20:15 +08:00