gaoziyuan
82e64b13e1
[NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop ( #3598 )
...
* [Feature] update ep
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix queue ports idx
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* Update engine.py
* fix ci
* fix some bug in mixed ep
* add server fix and op fix
* rm some log
* fix code style
* ltd fix
* fix
* fix
* fix some bug
* fix bug
* fix bug
* fix style
* Update config.py
* Update splitwise_connector.py
* Update cache_messager.py
* Update __init__.py
* merge and fix
* Update engine.py
* Update common_engine.py
* Update run_ci_xpu.sh
* Update ernie_processor.py
* Update ernie_processor.py
---------
Co-authored-by: ltd0924 <ltd0924@sina.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
2025-08-26 19:59:02 +08:00
Yuanle Liu
cbce94a00e
rename ernie_xxx to ernie4_5_xxx ( #3621 )
...
* rename ernie_xxx to ernie4_5_xxx
* ci fix
2025-08-26 19:29:27 +08:00
bukejiyu
3200a80de3
[v1 loader]support fp8 ( #3593 )
...
* support fp8
* update ci
2025-08-26 02:42:46 -07:00
lzy
d339df2e90
Supports DP+TP+EP hybrid parallel deployment strategy ( #3489 )
...
* Support DP+TP+EP hybrid parallel deployment strategy
* Support DP+TP+EP hybrid parallel deployment strategy
* fix conflict
* add moe_tp_ep function split_allgather_out
* del tp_group in moe_cutlass_backend
* for ci
* fix parallel_config for ci
* del log
2025-08-26 00:04:01 -07:00
zhink
df7c31012b
Modified to support custom all reduce by default ( #3538 )
2025-08-22 16:59:05 +08:00
YuanRisheng
5b66462f0e
Fix fdconfig bugs ( #3528 )
...
* fix config
* fix parallel
* fix ips
* fix rl
* open code
2025-08-22 16:17:15 +08:00
YuanRisheng
c389a4013c
Unify server-side and model-side Config(Part-5) ( #3497 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* move config
* fix xpu
* fix
* fix vl
* fix vl
* fix unitest
* fix args
* add unitest
* fix test
2025-08-21 19:00:21 +08:00
lizexu123
7b596d0877
[BugFix] fix real_bsz in ep ( #3366 )
...
* Your commit message here
* fix ep
* delete cuda_graph
2025-08-14 17:31:19 +08:00
Kane2011
b4fef2cf29
[MetaxGPU] Support FastDeploy on metax gpu ( #3241 )
...
* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;
* Update __init__.py
1. remove metax's key work comment
* Update __init__.py
1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-08-13 11:11:54 +08:00
ming1753
f5164215be
[Bug Fix] fix vl V1 schedule bug ( #3323 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Bug Fix] fix vl V1 schedule bug
* fix format
2025-08-12 11:31:39 +08:00
Zero Rains
b23af29d0b
Launch expert_service before kv_cache initialization in worker_process ( #3045 )
...
* launch expert_service before kv_cache initialization
* add two signal make sure model loading and expert_service lauching finished
* fix the EP bug
* fix ep
* update launching way
* fix ep
* update
* roback ep
* pre-commit all files
---------
Co-authored-by: RAM <gstian5555@outlook.com >
Co-authored-by: Divano <dddivano@outlook.com >
2025-08-11 19:38:46 +08:00
Jiang-Jia-Jun
c56c99837a
Revert "[BugFix] num_seqs ( #3291 )" ( #3316 )
...
This reverts commit e0aeac58e1
.
2025-08-11 16:16:51 +08:00
lizexu123
e0aeac58e1
[BugFix] num_seqs ( #3291 )
...
* fix num_seqs
* merge develop
2025-08-11 13:38:55 +08:00
yzwu
fbdd6b0663
[Iluvatar GPU] Optimze attention and moe performance ( #3234 )
2025-08-08 10:51:24 +08:00
lizexu123
b01cfd6007
[BugFix] support real batch_size ( #3109 )
...
* support real bsz
* fix
* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix
2025-08-05 16:33:54 +08:00
gaoziyuan
4021d66ea5
【Feature】add fd plugins && rm model_classes ( #3123 )
...
* add fd plugins && rm model_classed
* fix reviews
* add docs
* fix
* fix unitest ci
2025-08-03 19:53:20 -07:00
YuanRisheng
7dfdd157ac
[BugFix]Fix ep size ( #3092 )
...
* fix ep
* fix num_layer
2025-07-30 21:03:12 +08:00
ltd0924
d17886de19
[Feature] support ep in mixed mode ( #3001 )
...
* [LLM] support ep
* Update worker_process.py
* Update expert_service.py
* Update worker_process.py
* format files
2025-07-30 20:43:39 +08:00
bukejiyu
db698bda01
qwen loader ( #3057 )
2025-07-30 19:09:38 +08:00
Zero Rains
b2f9a42d87
[Feature] Support repetition early stop ( #3024 )
...
* support repetition early stop and support user to set the parameter
* remove log
* fix codestyle
* add the early_stop_config to rollout_config
* update config and EarlyStopper class
* fix the bug for triton
* modify the stop method
* update description
* modify the usage for stop_flags
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-07-29 22:42:54 +08:00
YuanRisheng
502ee92a0a
Unify server-side and model-side Config (Part3) ( #3047 )
...
* merge model config
* fix arch
* fix rl
2025-07-29 17:07:44 +08:00
YuanRisheng
1a815b7a2a
Fix Speculative Config bug ( #3049 )
...
* fix speculative bug
* fix rl
2025-07-29 10:50:48 +08:00
YuanRisheng
bddf403576
Unify server-side and model-side Config (Part2) ( #3035 )
...
* merge speculative and graph opt conifg
* add attr
2025-07-28 15:31:48 +08:00
YuanRisheng
6ccc10ad47
Unify server-side and model-side Config (Part1) ( #3018 )
...
* move cache config
* fix mtp
2025-07-28 10:51:52 +08:00
ltd0924
f935d6f862
[BugFix] fix multinode deployment ( #2977 )
2025-07-24 15:04:04 +08:00
ltd0924
3792345c3a
[LLM] update function name ( #2985 )
...
* [LLM] update function name
2025-07-24 15:03:40 +08:00
lizexu123
832d25334a
[Code Simplification] fix init_distributed_environment() ( #2982 )
2025-07-24 11:43:28 +08:00
Zero Rains
ca0f71bd39
polish code for prefill restrictions ( #2991 )
2025-07-23 05:10:14 -07:00
Zero Rains
850c9d98d4
[BugFix] Add prefill restrictions for chunked_prefill+VL ( #2983 )
2025-07-23 01:45:57 -07:00
Ryan
95b5af24db
[SOT] Add sot warmup (NVIDIA GPU Only) ( #2929 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* add sot warmup
* fix code style
* change batch_size list
* add param to config
* rm free_list settings && set sot_warmup_sizes
* finish debug with dynamic dims by type annotations
* add profile_run guard
* rm sth useless
2025-07-22 21:36:14 +08:00
GoldPancake
dc67c10a7e
[Feature][MTP]Support multi-step MTP ( #2952 )
2025-07-22 16:26:29 +08:00
Zero Rains
89a485b69f
[Feature] Support using prefix-caching + cudagraph for inference ( #2924 )
...
* fix the bug in cudagraph+prefix-caching but still have some bug with profile
Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397
* add the signal to make sure cache manager launched
* fix judge condition
* reomove useless control
* update control stream
* update
* fix xpu
* change the do_profile flag
* update
* add new threads to init cache_manager
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-07-22 00:59:45 -07:00
Nyakku Shigure
48e6a0ca26
[SOT] Mark dynamic dims by type annotations ( #2771 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* [SOT] Mark dynamic dims by type annotations
* fix conflict of forward_meta
* mark more attn backend
* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS
* auto infer implicit 0 dim dynamic dim
* revert manual marked dims
* revert missing update
* auto infer can use unsafe code in warmup stage
* check -> type_match
* fix codestyle
* restore blank line
* empty commit
* add need_warmup nonlocal;
* add doc for resolver
* add missing type hints
* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
zhink
0262ef7eb3
custom all reduce support cuda graph ( #2938 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication
2025-07-21 22:52:03 +08:00
Yuanle Liu
2f74e93d7e
use dist.all_reduce(min) to sync num_blocks_local ( #2933 )
...
* pre-commit all files check
* reduce min num_blocks_local
* fix nranks=1
* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
ming1753
5328daa333
[Bug Fix] fix ep config bug ( #2920 )
2025-07-18 19:12:56 +08:00
gaoziyuan
6efad14b95
support vl ori_vacab_size ( #2900 )
2025-07-18 16:26:14 +08:00
ltd0924
b630031414
[LLM] fix serval bugs ( #2878 )
2025-07-17 14:21:05 +08:00
Yuanle Liu
dbb9e2506b
Fix rollout_model init ( #2881 )
2025-07-16 22:36:21 -07:00
ltd0924
9c25dcca0b
[LLM] Update Multinode Deployment ( #2830 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* [LLM] fix multinode bugs
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] fix ci bugs
* Update fastdeploy/engine/args_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [LLM] update random port
* [LLM] update random port
* [LLM] fix ci bugs
* fix ci bugs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-07-16 23:42:54 +08:00
Yuanle Liu
63d6e7ce06
fix and refine vl ( #2866 )
...
* refine vl config
* delete attn_sep
* fix vl accuracy
2025-07-16 05:59:28 -07:00
Yuanle Liu
dda4a9f848
rl update ( #2861 )
2025-07-16 00:33:10 -07:00
YuanRisheng
101ad33332
[BugFix] Fix Configs ( #2849 )
...
* fix config
* fix config
2025-07-15 19:50:36 -07:00
RAM
0fad10b35a
[Executor] CUDA Graph support padding batch ( #2844 )
...
* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug
2025-07-15 19:49:01 -07:00
Zero Rains
e7bcbbab52
Merge vl execution path into normal execution path ( #2829 )
...
* merge vl model into gpu_model runner
Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6
* fix chinese
Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a
* fix the parse parameter
Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe
* fix the bug in online_inference
Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559
* polish code
Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c
2025-07-15 22:20:03 +08:00
YuanRisheng
4c7b8bc458
Simplify the Config code ( #2770 )
...
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe
2025-07-14 19:50:05 +08:00
bukejiyu
bad53c6b6e
[vl]remove duplicated load logic ( #2744 )
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-13 07:36:26 +08:00
gaoziyuan
e9e8443ea8
fix num_blocks_local when small size model in TP2 running mode ( #2792 )
2025-07-12 12:50:48 +08:00
lizexu123
8c660a0dfb
[BugFix] fix RMSNorm rms_norm_esp ( #2797 )
...
* fix rms
* add vl
* fix
* add vl
* fix
* fix
2025-07-10 20:02:24 +08:00