Commit Graph

25 Commits

Author SHA1 Message Date
luukunn
132a8ef425 Release/2.1 (#3414)
* Pre ce modified (#3335) (#3360)

* Pre ce modified (#3335)

* update

* update

* fix

* fix

* update

* update

* update

* fix

* update

* update

* update

* add ut fix pr(3367)

* [Bug Fix] Fix V1 video bug (#3387)

* fix stopseq error info (#3342)

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

* [BugFix] Fix default log level of paddleformers (#3377)

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

* [Polish Code] Remove useless notes

* feat(log):add_request_and_response_log (#3392)

* Optimize CI execution workflow. (#3371) (#3384)

* fix

* [BugFix] fix control signal release failed (#3374)

* [BugFix]

* [BugFix]

* [BugFix]

* [BugFix]

* fix

* fix

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1"

This reverts commit 02596fc537, reversing
changes made to 03347626a6.

* [XPU] Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3393)

* fix v1 schedule oom bug

* fix v1 schedule oom bug

* [BugFix] fix ErnieProcessor not set raw_prediction (#3401)

* [Doc]Release fastdeploy-xpu 2.1.0 (#3407)

* fix v1 schedule oom bug

* fix v1 schedule oom bug

* update release note

* [Doc]Release fastdeploy-xpu 2.0.3  (#3408)

* fix v1 schedule oom bug

* fix v1 schedule oom bug

* update release note

* update info

---------

Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
Co-authored-by: JYChen <zoooo0820@qq.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: xiaolei373 <zley373@gmail.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: yinwei <yinwei_hust@163.com>
Co-authored-by: memoryCoderC <1137889088@qq.com>
2025-08-14 20:53:47 +08:00
luukunn
81092c0fe3 add tool parser 2025-08-13 16:06:22 +08:00
chenjian
25f51b0611 Fix block num in schduelr v1 for release 2.1 (#3315)
* fix bug for scheduler v0

* fix block num setting in scheduler v1 for release 2.1

* fix block num setting in scheduler v1 for release 2.1

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
2025-08-12 00:41:05 +08:00
bukejiyu
db698bda01 qwen loader (#3057) 2025-07-30 19:09:38 +08:00
YuanRisheng
99a70fc722 unify parallel config (#3070) 2025-07-30 11:41:23 +08:00
Zero Rains
b2f9a42d87 [Feature] Support repetition early stop (#3024)
* support repetition early stop and support user to set the parameter

* remove log

* fix codestyle

* add the early_stop_config to rollout_config

* update config and EarlyStopper class

* fix the bug for triton

* modify the stop method

* update description

* modify the usage for stop_flags

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-29 22:42:54 +08:00
YuanRisheng
502ee92a0a Unify server-side and model-side Config (Part3) (#3047)
* merge model config

* fix arch

* fix rl
2025-07-29 17:07:44 +08:00
YuanRisheng
bddf403576 Unify server-side and model-side Config (Part2) (#3035)
* merge speculative and graph opt conifg

* add attr
2025-07-28 15:31:48 +08:00
YuanRisheng
6ccc10ad47 Unify server-side and model-side Config (Part1) (#3018)
* move cache config

* fix mtp
2025-07-28 10:51:52 +08:00
Zero Rains
0fb37ab7e4 update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12

* polish code
2025-07-24 01:43:31 -07:00
ltd0924
f935d6f862 [BugFix] fix multinode deployment (#2977) 2025-07-24 15:04:04 +08:00
Yzc216
e14587a954 [Feature] multi-source download (#2986)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit
2025-07-24 14:26:37 +08:00
chenjian
85a78d695d [Feature] Support block scheduler v1 for FD (#2928)
* Support FD block scheduler v1

* Support FD block scheduler v1

* Support FD block scheduler v1

* Fix according to copilot review

* Fix according to review

* Remove is_dummy

* Fix bug when real_bsz=1

* Fix infer first token cost time

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-23 20:31:31 +08:00
Zero Rains
89a485b69f [Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching but still have some bug with profile

Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397

* add the signal to make sure cache manager launched

* fix judge condition

* reomove useless control

* update control stream

* update

* fix xpu

* change the do_profile flag

* update

* add new threads to init cache_manager

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
zhink
0262ef7eb3 custom all reduce support cuda graph (#2938)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag

* rename communication_op to communication
2025-07-21 22:52:03 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
ltd0924
9c25dcca0b [LLM] Update Multinode Deployment (#2830)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [LLM] fix multinode bugs

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] fix ci bugs

* Update fastdeploy/engine/args_utils.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [LLM] update random port

* [LLM] update random port

* [LLM] fix ci bugs

* fix ci bugs

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
Yuanle Liu
dda4a9f848 rl update (#2861) 2025-07-16 00:33:10 -07:00
RAM
0fad10b35a [Executor] CUDA Graph support padding batch (#2844)
* cuda graph support padding batch

* Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes.

* Do not insert max_num_seqs when the user specifies a capture list

* Support set graph optimization config from YAML file

* update cuda graph ci

* fix ci bug

* fix ci bug
2025-07-15 19:49:01 -07:00
chen
d33105baeb [Feature] Online Chat API Support Return logprobs (#2777)
* online chat support logprobs

* check xpu

* check vl_gpu_model_runner and xpu_model_runner

* get_worker() check platform
2025-07-10 16:33:40 +08:00
zhink
b89180f1cd [Feature] support custom all-reduce (#2758)
* [Feature] support custom all-reduce

* add vllm adapted
2025-07-09 16:00:27 +08:00
ltd0924
68b4755587 [LLM] support multi node deploy (#2708)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [LLM] support multi node deploy

* Update engine.py

* fix bugs

* fix

* [LLM] support multi node deploy

* [LLM] support multi node deploy

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-06 10:33:51 +08:00
Jiang-Jia-Jun
05c670e593 [Sync] Update to latest code (#2679)
* [Sync] Update to latest code

* Add new code files

* Add new code files

* update code

* Try to fix build.sh

* Try to fix build.sh

* Update code

* Update requirements.txt

* Update code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00