Commit Graph

11 Commits

Author SHA1 Message Date
Zhang Yulong
5151bc92c8 Update benchmark tools (#3004)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* update benchmark tools

* update benchmark tools
2025-07-24 15:19:23 +08:00
xiegegege
e3a843f2c5 [benchmark] add quantization for benchmark yaml (#2995) 2025-07-24 13:26:34 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
RAM
0fad10b35a [Executor] CUDA Graph support padding batch (#2844)
* cuda graph support padding batch

* Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes.

* Do not insert max_num_seqs when the user specifies a capture list

* Support set graph optimization config from YAML file

* update cuda graph ci

* fix ci bug

* fix ci bug
2025-07-15 19:49:01 -07:00
ophilia-lee
33db137d0b 新增vLLM默认请求参数yaml 2025-07-15 19:31:27 +08:00
lijingning
9d6a42b334 适配vLLM无arrival_time;适配vLLM model必传;RequestFuncInput/RequestFuncOutput/SampleRequest新增用例编号no 2025-07-15 19:31:27 +08:00
GoldPancake
f7cad30a38 [Feature] Add speculative decoding simulation benchmark. (#2751)
* Add speculative decoding simulation benchmark

* Fix the name of the parameter
2025-07-09 12:08:43 +08:00
Divano
050d9658a5 Update requirements.txt 2025-07-04 09:53:03 +08:00
Divano
be5cabaf80 add quick benchmark (#2703)
测试脚本不需要过CI
2025-07-04 09:32:36 +08:00
Zhang Yulong
264ddfdf8a Update README.md 2025-06-30 10:28:15 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00