Commit Graph

14 Commits

Author SHA1 Message Date
Divano
c1aa66df02 Revert "[Optim] Remove limitation of number of kvcache blocks (#5612)" (#5702)
This reverts commit 9da89a374b.
2025-12-23 15:41:33 +08:00
Jiang-Jia-Jun
9da89a374b [Optim] Remove limitation of number of kvcache blocks (#5612)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Optim] Remove limitation of number of kvcache blocks

* Update fastdeploy/envs.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/worker/iluvatar_worker.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add docs

* Update fastdeploy/worker/worker_process.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix ci case

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-23 11:18:29 +08:00
yzwu
4b661512ca [Iluvatar GPU] Adapt VL model (#4313) 2025-10-17 16:13:38 +08:00
YuanRisheng
0355235fb9 [FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400)
* delete some attr in parallel config

* delete comment

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
2025-10-16 20:00:37 +08:00
co63oc
d6369b4d51 fix typos (#3684) 2025-09-01 17:50:17 +08:00
yzwu
ce9180241e [Iluvatar GPU] Modify the names of some variables (#3273) 2025-08-13 11:38:02 +08:00
yzwu
fbdd6b0663 [Iluvatar GPU] Optimze attention and moe performance (#3234) 2025-08-08 10:51:24 +08:00
lizexu123
afff4d37ea [Feature] support seed parameter (#3161)
* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to defualt

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix
2025-08-06 15:20:47 +08:00
lizexu123
b01cfd6007 [BugFix] support real batch_size (#3109)
* support real bsz

* fix

* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py

* add event_loop_ep

* fix

* Add comments

* fix

* support mtp real_batch_size

* fix

* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer

* fix

* fix VL real_seq_lens_this_time

* fix

* fix mtp

* fix

* fix mtp

* fix xpu

* fix
2025-08-05 16:33:54 +08:00
Ryan
73cfe1fd37 [SOT] Extend SOT warmup support to new hardware (#3032)
* add new hardware

* add_sot_warmup4new_hardware

* fix conflict

* rm Optional
2025-07-29 22:45:20 +08:00
ltd0924
3792345c3a [LLM] update function name (#2985)
* [LLM] update function name
2025-07-24 15:03:40 +08:00
Zero Rains
89a485b69f [Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching but still have some bug with profile

Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397

* add the signal to make sure cache manager launched

* fix judge condition

* reomove useless control

* update control stream

* update

* fix xpu

* change the do_profile flag

* update

* add new threads to init cache_manager

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
liddk1121
1b54a2831e Adapt for iluvatar gpu (#2684) 2025-07-07 16:53:14 +08:00