lzy
d339df2e90
Supports DP+TP+EP hybrid parallel deployment strategy ( #3489 )
...
* Support DP+TP+EP hybrid parallel deployment strategy
* Support DP+TP+EP hybrid parallel deployment strategy
* fix conflict
* add moe_tp_ep function split_allgather_out
* del tp_group in moe_cutlass_backend
* for ci
* fix parallel_config for ci
* del log
2025-08-26 00:04:01 -07:00
Sunny-bot1
c68c3c4b8b
[Feature] bad words support v1 scheduler and specify token ids ( #3608 )
...
* support bad_words_token_ids
* docs
* fix test
* fix
* bad words support kvcache v1 and token ids
* fix
2025-08-25 20:14:51 -07:00
RAM
2fa173e327
[Executor] CUDAGraph support RL training ( #3265 )
...
* add clear graph opt backend
* cuda graph support rl
* add branch
* 1. fix dynamic_weight_manager bug 2. add clear API for CausalLM
* open test case
* fix typo
* update mkdocs.yaml
* [Docs]Update mkdocs.yml
* update test case
* use unittest in graph test case
2025-08-25 20:59:30 +08:00
Kane2011
2ae7ab28d2
[MetaxGPU] adapt to the latest fastdeploy on metax gpu ( #3492 )
2025-08-25 17:44:20 +08:00
chen
9cab3f47ff
[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing ( #3552 )
...
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* delete some code
* code check
* code check and add doc
* fix tokenizer.decoder(-1), return 'Invalid Token'
* add ci for temp_scaled and top_p logprobs
* check test
* check seq len time shape
* logprob clip inf
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com >
2025-08-25 14:11:49 +08:00
zhink
df7c31012b
Modified to support custom all reduce by default ( #3538 )
2025-08-22 16:59:05 +08:00
YuanRisheng
5b66462f0e
Fix fdconfig bugs ( #3528 )
...
* fix config
* fix parallel
* fix ips
* fix rl
* open code
2025-08-22 16:17:15 +08:00
YuanRisheng
c389a4013c
Unify server-side and model-side Config(Part-5) ( #3497 )
...
* move config
* fix xpu
* fix
* fix vl
* fix vl
* fix unittest
* fix args
* add unittest
* fix test
2025-08-21 19:00:21 +08:00
qw86972190
c83381d650
revert pr ( #3481 )
...
Co-authored-by: iosmers <yinwei_hust@163.com >
2025-08-21 14:19:50 +08:00
kevin
67298cf4c0
add error traceback info ( #3419 )
...
* add error traceback info
* update error msg
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-19 19:32:04 +08:00
lizexu123
a053ab889b
[BugFix] fix num_running_requests in cuda_graph ( #3457 )
...
* fix cuda_graph
* add note
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-08-19 10:47:22 +08:00
lizexu123
32b39620bc
[Code Simplification] remove cum_offsets ( #3410 )
2025-08-18 20:21:25 +08:00
Jundong Liu
ea4a3b479c
[Executor] Increase buffer size to prevent address corruption; add forward metadata debug tool ( #3404 )
...
* Fix the buffer allocation being too small; add a tool for printing forward metadata
* fix mistake
* Make CPU tensor in CPUPlace
* Add test about forward_meta_str and Add unitest_requirement
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-08-18 16:14:09 +08:00
chen
f0f00a6025
[OPs] Universal optimization and Fix early_stop cuda 700 ( #3375 )
...
* delete nonzero
* delete setup_ops_base.py
* check if
* check gcp infer_seed.cpu()
* fix repetition_early_stopper_kernel cuda 700
2025-08-14 22:40:44 +08:00
lizexu123
7b596d0877
[BugFix] fix real_bsz in ep ( #3366 )
...
* Your commit message here
* fix ep
* delete cuda_graph
2025-08-14 17:31:19 +08:00
EnflameGCU
d1a92e3e17
[GCU] Enable gcu CI ( #3190 )
...
* [GCU] Update to the latest version
* [GCU] Enable CI
2025-08-13 11:48:24 +08:00
yzwu
ce9180241e
[Iluvatar GPU] Modify the names of some variables ( #3273 )
2025-08-13 11:38:02 +08:00
Kane2011
b4fef2cf29
[MetaxGPU] Support FastDeploy on metax gpu ( #3241 )
...
* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;
* Update __init__.py
1. remove metax's key word comment
* Update __init__.py
1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-08-13 11:11:54 +08:00
ming1753
f5164215be
[Bug Fix] fix vl V1 schedule bug ( #3323 )
...
* [Bug Fix] fix vl V1 schedule bug
* fix format
2025-08-12 11:31:39 +08:00
Zero Rains
b23af29d0b
Launch expert_service before kv_cache initialization in worker_process ( #3045 )
...
* launch expert_service before kv_cache initialization
* add two signals to make sure model loading and expert_service launching have finished
* fix the EP bug
* fix ep
* update launching way
* fix ep
* update
* rollback ep
* pre-commit all files
---------
Co-authored-by: RAM <gstian5555@outlook.com >
Co-authored-by: Divano <dddivano@outlook.com >
2025-08-11 19:38:46 +08:00
Jiang-Jia-Jun
c56c99837a
Revert "[BugFix] num_seqs ( #3291 )" ( #3316 )
...
This reverts commit e0aeac58e1.
2025-08-11 16:16:51 +08:00
Yuanle Liu
9571c458f0
enhance eos_tokens ( #3274 )
...
* enhance eos_tokens
* update
* update
2025-08-11 14:47:52 +08:00
lizexu123
e0aeac58e1
[BugFix] num_seqs ( #3291 )
...
* fix num_seqs
* merge develop
2025-08-11 13:38:55 +08:00
chenjian
c011cb8b16
[Bug Fix] Fix scheduler bug in develop ( #3292 )
...
* Fix scheduler bug in develop
* Fix scheduler bug in develop
* Fix scheduler bug in develop
2025-08-10 13:55:38 +08:00
Yuanle Liu
fac2f64837
delete parallel_state.py ( #3250 )
2025-08-08 11:03:29 +08:00
yzwu
fbdd6b0663
[Iluvatar GPU] Optimize attention and moe performance ( #3234 )
2025-08-08 10:51:24 +08:00
lizexu123
afff4d37ea
[Feature] support seed parameter ( #3161 )
...
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to default
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
2025-08-06 15:20:47 +08:00
lizexu123
b01cfd6007
[BugFix] support real batch_size ( #3109 )
...
* support real bsz
* fix
* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix
2025-08-05 16:33:54 +08:00
lizhenyun01
fe540f6caa
[plugin] Custom model_runner/model support ( #3186 )
...
* support custom model&&model_runner
* fix merge
* add test && update doc
* fix codestyle
* fix unittest
* load model in rl
2025-08-04 18:52:39 -07:00
Sunny-bot1
72ef5a9c93
[FIX]fix bad_words when sending requests consecutively ( #3197 )
...
* fix bad_words
* fix log
* fix log
2025-08-04 05:59:41 -07:00
Longzhi Wang
01d7586661
[Bug fix] Fix cudagraph when using ep. ( #3130 )
...
* fix cudagraph when using ep
* fix typo
* reduce full length to adapt to large bsz such as 128/256
2025-08-04 18:06:18 +08:00
gaoziyuan
4021d66ea5
【Feature】add fd plugins && rm model_classes ( #3123 )
...
* add fd plugins && rm model_classes
* fix reviews
* add docs
* fix
* fix unittest ci
2025-08-03 19:53:20 -07:00
yinwei
3a4db15765
Fix out-of-memory issue during single-XPU deployment ( #3133 )
2025-08-01 17:12:03 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs ( #2974 )
...
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 20:25:56 +08:00
RAM
d850660872
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel ( #2989 )
...
* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug
2025-07-31 00:09:31 +08:00
YuanRisheng
7dfdd157ac
[BugFix]Fix ep size ( #3092 )
...
* fix ep
* fix num_layer
2025-07-30 21:03:12 +08:00
ltd0924
d17886de19
[Feature] support ep in mixed mode ( #3001 )
...
* [LLM] support ep
* Update worker_process.py
* Update expert_service.py
* Update worker_process.py
* format files
2025-07-30 20:43:39 +08:00
bukejiyu
db698bda01
qwen loader ( #3057 )
2025-07-30 19:09:38 +08:00
ming1753
5acde4eb43
[Feature] Multimodal Scheduler V1 ( #3019 )
...
* [Feature] Support multimodal scheduler v1
* remove debug log
* fix bug
* fix format
* modify code
* fix bug
* fix bug
* fix bug
* modify code
2025-07-30 16:05:55 +08:00
Sunny-bot1
74aa31d15b
[Feature] support bad_words ( #3055 )
...
* support bad_words
* support online infer bad_words
* update
* add CI test
* update
* update
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-07-30 09:31:29 +08:00
Ryan
73cfe1fd37
[SOT] Extend SOT warmup support to new hardware ( #3032 )
...
* add new hardware
* add_sot_warmup4new_hardware
* fix conflict
* rm Optional
2025-07-29 22:45:20 +08:00
Zero Rains
b2f9a42d87
[Feature] Support repetition early stop ( #3024 )
...
* support repetition early stop and support user to set the parameter
* remove log
* fix codestyle
* add the early_stop_config to rollout_config
* update config and EarlyStopper class
* fix the bug for triton
* modify the stop method
* update description
* modify the usage for stop_flags
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-07-29 22:42:54 +08:00
YuanRisheng
502ee92a0a
Unify server-side and model-side Config (Part3) ( #3047 )
...
* merge model config
* fix arch
* fix rl
2025-07-29 17:07:44 +08:00
JYChen
dafe02a7b9
[stop sequence] support stop sequence ( #3025 )
...
* stop seqs in multi-ends
* unittest for gpu stop op
* kernel tid==0
2025-07-29 14:17:37 +08:00
YuanRisheng
1a815b7a2a
Fix Speculative Config bug ( #3049 )
...
* fix speculative bug
* fix rl
2025-07-29 10:50:48 +08:00
yinwei
f2a528f9ae
[XPU] Support kvblock centralized management ( #3017 )
2025-07-29 10:40:55 +08:00
YuanRisheng
bddf403576
Unify server-side and model-side Config (Part2) ( #3035 )
...
* merge speculative and graph opt config
* add attr
2025-07-28 15:31:48 +08:00
begin2023
dd877f38b1
[Perf] Remove unnecessary operations in non-cuda_graph ( #3010 )
...
* [Perf] Remove unnecessary operations in non-cuda_graph
* fix code logic
* use suggestion comment
* reduce function call
* reduce function call
* reduce function call
* reduce function call
2025-07-27 20:38:29 -07:00
YuanRisheng
6ccc10ad47
Unify server-side and model-side Config (Part1) ( #3018 )
...
* move cache config
* fix mtp
2025-07-28 10:51:52 +08:00
Longzhi Wang
0700c90caa
[Feat] support mixed ep ( #2969 )
...
* Support mixed ep
* fix comment
* fix comment
* update mixep
* fix conflict
* fix typo
* update
* fix typo
* fix code style
* fix conflict
2025-07-25 15:29:30 +08:00