Commit Graph

419 Commits

Author SHA1 Message Date
freeliuzc
6715196924 fix attention bug in spec decoding (#5480) 2025-12-10 12:56:13 +08:00
lzy
04b2c43806 [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5316) 2025-12-02 13:03:55 +08:00
Yuanle Liu
b99064432e Update load_weight_utils.py (#5285) 2025-11-28 13:39:59 +08:00
lizhenyun01
fd1313cdb4 [Cherry-Pick][Feature] support flash_mask_attention backend(#5134) (#5256)
* [Feature] support flash_mask_attention backend

* fix unittest

* clean code
2025-11-28 10:13:00 +08:00
kevin
8e4e3ff510 [Feature] support eplb in api_server (#4782)
* support eplb in api_server

* update code

* add eplb test case

* update eplb

* support tp+dp eplb

* update test case

* update code

* update code

* fix bug

* update copilot review

* update test case name
2025-11-24 20:22:29 +08:00
xiaozude
d5bd64336a [Metax] support ENABLE_V1_KVCACHE_SCHEDULER (#5163) 2025-11-24 19:19:49 +08:00
xiaoxiaohehe001
e150a418d4 support moe offline quant (#5142) 2025-11-24 18:59:18 +08:00
xiaoxiaohehe001
95f3c8c641 [Fix] Fix eplb bug and support fp8 load weight (#5178)
* fix eplb part2

* fix eplb part2

* fix eplb part2
2025-11-24 15:31:37 +08:00
xiaoxiaohehe001
6471dade4a [Fix] Fix noaux ep test (#5161)
* support noaux eplb

* noaux_eplb

* noaux_eplb

* noaux_eplb

* noaux_eplb
2025-11-21 16:36:41 +08:00
freeliuzc
2d1dade5e2 [Speculative Decoding][MTP] Support static CacheKV C8 quantization and optimize memory usage (#5155)
* support static cachekv c8 quantization in mtp mode

* optimize memory allocation
2025-11-21 15:10:13 +08:00
bukejiyu
34f59d9800 [RL]Fix missing is_distributed attribute (#5150)
* fix

* update
2025-11-21 14:14:25 +08:00
xiaoxiaohehe001
6ca2651995 [Feature] Support noaux for eplb (#5143)
* support noaux eplb

* noaux_eplb

* noaux_eplb

* noaux_eplb
2025-11-21 14:10:32 +08:00
ddchenhao66
e70e2279ce [PD Disaggregation][XPU] Add XPU support for PD disaggregation (#5113)
* [XPU] xpu support PD disaggregation

* [XPU] fix the issue of cache KV transfer process startup failure on non-zero XPU cards

* [XPU] xpu support PD disaggregation in v1 scheduler

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-11-21 14:09:01 +08:00
Yonghua Li
43097a512a [BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol (#5132)
* [fix] fix v1 scheduler profile run for append attention in prefill node

* [fix] skip send_signal if kv signal not inited for gpu and xpu

* [fix] extend fix to flash_attn & mla_attn

* [fix] fix v1 pd run in ipc transfer protocol

* [ci] add test for v1 pd profile run using ipc transfer protocol

* [style] fix code style check

* [style] fix code style again

* [fix] fix profile run

* [update] remove --num-gpu-blocks-override in example script

* [chore] rename forward_meta is_profiling to is_dummy_or_profile_run
2025-11-20 21:39:22 +08:00
Ryan
0857099191 mv import (#5146) 2025-11-20 19:25:56 +08:00
周周周
385fe6dade [Others] clean code (#5133) 2025-11-20 18:44:08 +08:00
Yuanle Liu
7ac25935c7 [Optimization] default compile rdma, reduce cudagraph buffer size in mm, fix some config bug (#5121)
* default compile rdma, reduce cudagraph buffer size in mm, fix some config logic

* update

* update

* fix bug

* enhance rdma compile

* fix
2025-11-20 17:19:47 +08:00
周周周
6fa34102e8 [Others]get_block_shape_and_split_kv_block clean code (#5123) 2025-11-20 16:40:04 +08:00
Neil Zhu
0edda75a56 [Metax] optimize cutlass moe and flash attention backend (#5128) 2025-11-20 16:12:35 +08:00
freeliuzc
f1e36ff2f7 [Speculative Decoding][MTP]Support stop_seqs and pd-split mode (#5029)
* support multi_stop_seqs in speculative decoding

* support mtp tp with ep split

* fix custom op register

* fix spec stop_seqs params
2025-11-20 15:26:01 +08:00
Sunny-bot1
bde97e09f7 support dynamic activation quant for w4afp8 (#5117) 2025-11-19 21:11:16 +08:00
Sunny-bot1
43f0c7557e [Feature] Add an unquantized option for MoE and Dense quant type (#4813) 2025-11-19 16:24:03 +08:00
bukejiyu
a82f25ea7b [RL]Resolve shape mismatch problems in RL-related modules (#5032)
* RL fix

* update
2025-11-19 11:12:48 +08:00
Daci
eab8384da6 [Feature] ThreadPoolExecutor async fill_token_bitmask (#5083)
* ThreadPoolExecutor async fill_token_bitmask

* ThreadPoolExecutor async fill_token_bitmask logging

* fix test_guided_decoding

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add fill_bitmask_parallel_batch_size ENV

* FD_FILL_BITMASK_BATCH fastdeploy.envs

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-19 10:04:16 +08:00
MingkunZhang
a36c958c66 [Metax] support default_v1 loader based #4988 (#5001)
2025-11-18 09:44:30 +08:00
fmiao2372
74f33efdbf [Intel HPU] fix bugs caused by other commits (#5074)
* [Intel HPU] fix bugs caused by other commits

* update code by copilot
2025-11-17 15:28:55 +08:00
Winters Montagne
02c83d65db [CI][Hackathon 9th Sprint No.13] No.13: add unit tests for the functional module fastdeploy/model_executor/ops/triton_ops/triton_utils.py (#5035)
* Add unit tests for triton_utils.py

* update name

* update

* update

* update
2025-11-17 11:43:31 +08:00
xiaozude
68f638f8b9 [Metax] support default_v1 loader and quant_config is None for triton moe (#5030) 2025-11-17 10:38:00 +08:00
yangjianfengo1
3afb717995 [Fix] fix deepep dispatch (#5036)
* fix dispatch

* fix dispatch

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-11-17 10:34:01 +08:00
yzwu
3b80a799ab [Iluvatar][CI] Fix moe_expert_dispatch cannot support dequant_scale (#5012)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-11-17 10:18:42 +08:00
fmiao2372
e43a5fc055 [Intel HPU] enable level 1 prefix caching and fix some bugs (#4971)
* [Intel HPU] enable prefix caching and dense tp moe ep and fix some bugs

* update code by copilot

* remove dense tp and moe ep code
2025-11-14 19:42:50 +08:00
Daci
5fc12eddfe [Optimization] xgrammar async compile, multi thread, speed up (#4835)
* xgrammar async compile, multi thread, speed up

* fix test_sampler.py & pre-commit err

* add redis version check && fix request.llm_engine_recv_req_timestamp

* xgrammar prefill & decode & v0

* fix test_gpu_prompt_logprobs.py

* add test_guided_decoding.py

* Update fastdeploy/scheduler/splitwise_scheduler.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/model_executor/guided_decoding/xgrammar_backend.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/model_executor/guided_decoding/xgrammar_backend.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix torch xgrammar unittest env

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-14 18:05:26 +08:00
周周周
51b1f13547 [Executor]move batch_id_per_token (#4853) 2025-11-14 15:38:48 +08:00
yangjianfengo1
ae7bee8122 [New Feature] W4afp8 supports per group quantization (#4987)
* w4afp8 supports per group

* code style

* fix transpose

* revert fast hadamard

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-11-13 19:17:27 +08:00
bukejiyu
4a0d881e15 update (#4985)
2025-11-13 15:58:01 +08:00
Yuanle Liu
2272160faf fix mtp tsp (#4990) 2025-11-12 22:05:19 +08:00
ming1753
3148dbca06 [BugFix] fix VL fp8 bug when moe token_num is 0 (#4928)
* [BugFix] fix VL fp8 bug when moe token_num is 0

* fix bug

* format

* fix bug
2025-11-12 21:19:36 +08:00
bukejiyu
f0189292df [CI] fix test_model_cache (#4982)
* ci

* update
2025-11-12 20:26:49 +08:00
xiaozude
c45b3ccb52 [Metax] optimize flash mla (#4915) 2025-11-12 16:43:46 +08:00
MingkunZhang
9d9f5df8d0 [Metax] support default_v1 loader & thinking model (#4956)
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-11-12 16:32:26 +08:00
BossPi
bde6e2f931 [BugFix] Avoid loading training file (#4966)
* bug fix

don't put scheduler.pdparams into model weights

* run pre-commit
2025-11-12 15:49:14 +08:00
bukejiyu
6e2e2fcd29 xpu (#4969) 2025-11-12 15:12:59 +08:00
ltd0924
5bf48de999 [KVCache] support unified cache backend (#4903)
* [Feature] support unified cache backend

* fix

* fix

* fix

* fix

* Update metax_model_runner.py

* fix

* update

* Update test_moba_attention_backend.py

---------

Co-authored-by: ltd0924 <luotingdan@baidu.com>
2025-11-12 14:54:52 +08:00
yzwu
76e60e98f8 [Iluvatar][CI] fix safetensors_rust.SafetensorError: framework paddle is invalid (#4972) 2025-11-12 14:13:40 +08:00
bukejiyu
b09ebb2813 refactor pt loading (#4532)
2025-11-11 21:30:39 +08:00
周周周
da6b4c10e5 [ATTENTION] make buffer alloc as a function (#4945) 2025-11-11 19:17:08 +08:00
SunLei
3098aee05f [Perf] Support tensor transmission between work and engine with zero-copy to improve efficiency (#4839)
* feat(zmq): support tensor transmission with zero-copy for improved efficiency

* perf: zmq.send disable copy

* zmq recv data for debug

* convert logprobs tensor to cpu
2025-11-11 15:43:11 +08:00
yzwu
3707af7a4f [Iluvatar] add vl into ci and support v1 loader (#4774) 2025-11-11 10:50:17 +08:00
Ryan
07a82afcae add tie_word_embeddings for lmhead (#4916) 2025-11-11 10:46:35 +08:00
Yuanle Liu
3dc0ffa46d [TSP] Support qwen3 moe tsp + cudagraph (#4871)
* support qwen3_moe tsp mode

* fix

* fix

* update

* update

* update

* fix

* support external_rmsnorm

* update

* fix
2025-11-10 23:37:51 +08:00