chen
5c63a089f6
[Feature] Support logprobs_mode ( #4567 )
2025-10-27 14:27:48 +08:00
Sunny-bot1
4ffe41a747
WINT4/WINT8 dense gemm default use Machete ( #4451 )
2025-10-23 17:57:59 +08:00
周周周
d7bcedf421
small change in test_fusedmoe.py ( #4538 )
2025-10-22 17:49:18 +08:00
RAM
775edcc09a
[Executor] Default use CUDAGraph ( #3594 )
...
* add start intercept
* Adjustment GraphOptConfig
* pre-commit
* default use cudagraph
* set default value
* default use cuda graph
* pre-commit
* fix test case bug
* disable rl
* fix moba attention
* only support gpu
* Temporarily disable PD Disaggregation
* set max_num_seqs of test case as 1
* set max_num_seqs and temperature
* fix max_num_batched_tokens bug
* close cuda graph
* success run wint2
* profile run with max_num_batched_tokens
* 1.add c++ memchecker 2.success run wint2
* updatee a800 yaml
* update docs
* 1. delete check 2. fix plas attn test case
* default use use_unique_memory_pool
* add try-except for warmup
* ban mtp, mm, rl
* fix test case mock
* fix ci bug
* fix form_model_get_output_topp0 bug
* fix ci bug
* refine deepseek ci
* refine code
* Disable PD
* fix sot yaml
2025-10-21 14:25:45 +08:00
Haonan Luo
1b9f351d21
Support GPT-OSS-BF16 ( #4240 )
...
* [Feature] AppendAtten support sinks & HEAD_DIM=64
* fix bug
* fix bug
* fix bug
* fix bug
* [Feature] support gpt-oss
* fix bug
* add mask
* support-gpt-oss
* support-gpt-oss
* fix long seq
* support wint8
* support wint8
* support wint8
* update test
* change sliding windows init pos
---------
Co-authored-by: ming1753 <ideaminghp@163.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
2025-10-20 14:44:58 +08:00
周周周
817210e47f
[ATTN]delete code and add ffn and moe layer level test ( #4440 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* delete code
* delete code
* delete code
* commit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
* copmmit
2025-10-19 16:23:11 +08:00
Sunny-bot1
a751d977bc
[Optimization] Fuse get_max_len and get_kv_max_len ( #4369 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* opt split_q_block
* fuse max_lens and max kv_len
2025-10-13 20:35:00 +08:00
yangjianfengo1
4325b737e7
【FIX】Change the name of sparse attn from moba to plas ( #4006 ) ( #4076 )
...
* 【FIX】Change the name of sparse attn from moba to plas (#4006 )
* 更新文档
* 【docs】 update readme (#4000 )
* 更新文档
* update readme
* update docs
* 【FIX】Change the name of sparse attn from moba to plas (#3845 )
* 更新文档
* 更新文档
* 更新文档
* 更新文档
* 修改moba为plas
* code style
* update ci
* code style
* update ci
* code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* fix max_num_seqs
* fix test load attn
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-23 10:26:40 +08:00
Yuan Xiaolan
de8638b1e9
fix dynamic Cfp8 computing error ( #4119 )
...
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-16 20:21:49 +08:00
AIbin
a7392a0ff9
【Inference Optimize】DeepSeek-V3-model MLA Optimize ( #3886 )
...
* support MLA chunk_size auto search & cuda_graph
2025-09-11 10:46:09 +08:00
YuanRisheng
b3fac5bde1
[V1 Loader] Ernie kv cache quant support v1 loader ( #3899 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* support c8 for ernie
* add unittest
* support vl
* fix c8
2025-09-09 05:25:08 -07:00
Jiang-Jia-Jun
c60adf4281
Revert "【FIX】Change the name of sparse attn from moba to plas ( #3845 )" ( #4001 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
This reverts commit e31c8f7336 .
2025-09-09 11:08:23 +08:00
yangjianfengo1
e31c8f7336
【FIX】Change the name of sparse attn from moba to plas ( #3845 )
...
* 更新文档
* 更新文档
* 更新文档
* 更新文档
* 修改moba为plas
* code style
* update ci
* code style
* update ci
2025-09-09 10:56:50 +08:00
lzy
f12159b630
del batch id per token ( #3963 )
...
* Update decoder_write_cache_with_rope_kernel.cu
del batch_id_per_token
* Update decoder_write_cache_with_rope_impl.cuh
* Update test_append_attention.py
* Update test_append_attention.py
2025-09-08 21:58:34 +08:00
Jundong Liu
3d0aaa5923
[Excutor] Experiment Feature-Support Prefill in cudagraph ( #3459 )
...
* Support prefill in Cudagraph
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5
* Solve problem about encoder_num_blocks_x_cpu
* Add early-exit mechanism for attention kernel
* fix test case about append-attention
* Update testcode, Add annotations to related tensors
* move get_input_length_list
* solve test_code
* Add annotations about early-exit for attention kernel
* Add annotations about early-exit for attention kernel2
* solve comment
* solve mtp
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-09-08 13:12:24 +08:00
lzy
af49b81ffd
supports dynamic Cfp8 ( #3767 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* supports dynamic Cfp8
* add unittest
2025-09-07 20:41:29 -07:00
Zhang Yulong
349aa6348b
add cache queue port ( #3904 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* add cache queue port
* add cache queue port
* add cache queue port
2025-09-05 21:17:06 +08:00
yangjianfengo1
c870be6d27
fix port ( #3863 )
2025-09-04 10:01:38 +08:00
lzy
2527eb0e4e
fix test_append_attention_with_output.py ( #3831 )
...
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com >
2025-09-03 14:07:50 +08:00
lzy
7a521bbf62
Modify mask_offset‘s format ( #3525 )
...
* modify mask_offset in decode
* modify mask_offset unittest
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-02 03:05:35 -07:00
lizhenyun01
bed09ae8f8
fix mask_offset in append_attn ( #3745 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix mask_offset in append_attn
* fix test
2025-08-31 15:03:16 +08:00
yangjianfengo1
3754a9906d
[Feature] block sparse attention ( #3668 )
...
* 支持稀疏attn
* fix bug
* code style
* fix moba attn get kv shape
* 修复a100编译
* codestyle
* code style
* code style
* code style
* fix conflict
* 增加单侧
* code style
* 增加eblite 加载时间
* fix bug
* for ci
* for ci
* for ci
* for ci
* 支持mlp block size 128
* 增加小算子单测
* fix 单测 mlp
* 将环境变量加入到config里面
* fix rollout config
* 修复显存
* add test server
* add test server
* fix mlp 最后一层使用full attn
2025-08-29 19:46:30 +08:00
Ryan
45f81b34f0
add dtype int32 ( #3692 )
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-29 14:56:35 +08:00
Liumengyuan
e93d4cfcdd
Add with_output version AppendAttention ( #3302 )
...
* get use_output from fd_config
* add clear TODO description
* add mask_offset para to align with develop
* fix bug
* fix use_output logic
* fix sot bug
2025-08-28 17:10:18 +08:00
Jiang-Jia-Jun
c694fa2879
Revert "[Feature] block sparse attention ( #3209 )" ( #3647 )
...
This reverts commit 646a0c2fd8 .
2025-08-27 17:35:04 +08:00
yangjianfengo1
646a0c2fd8
[Feature] block sparse attention ( #3209 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* 支持稀疏attn
* fix bug
* code style
* fix moba attn get kv shape
* 修复a100编译
* codestyle
* code style
* code style
* code style
* fix conflict
* 增加单侧
* code style
* 增加eblite 加载时间
* fix bug
* for ci
* for ci
* for ci
* for ci
* 支持mlp block size 128
* 增加小算子单测
* fix 单测 mlp
* 将环境变量加入到config里面
* fix rollout config
2025-08-26 07:16:04 -07:00
YUNSHEN XIE
3a6058e445
Add stable ci ( #3460 )
...
* add stable ci
* fix
* update
* fix
* rename tests dir;fix stable ci bug
* add timeout limit
* update
2025-08-20 08:57:17 +08:00