Commit Graph

29 Commits

Author SHA1 Message Date
AIbin
a7392a0ff9 【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886)
* support MLA chunk_size auto search & cuda_graph
2025-09-11 10:46:09 +08:00
wanrui
276f73cf83 【Hackathon 9th No.28】add test_cutlass_fp8_fp8_fp8_dual_gemm_fused (#3935)
* add test_cutlass_fp8_fp8_fp8_dual_gemm_fused

* fix the version

* fix code style

---------

Co-authored-by: Tao Luo <luotao02@baidu.com>
2025-09-10 14:57:49 +08:00
Echo-Nie
319a4bf75f 【Hackathon 9th No.36】add test_extract_text_token_output(#3862) 2025-09-08 17:31:58 +08:00
co63oc
f884cd4f62 [UnitTest][MTP]add test_speculate_set_stop_value_multi_seqs.py (#3941) 2025-09-08 17:11:00 +08:00
co63oc
f32327661c [UnitTest][MTP]add test_eagle_get_hidden_states (#3876) 2025-09-08 17:10:01 +08:00
co63oc
976aa88e66 【Hackathon 9th No.69】add test_draft_model_preprocess (#3832)
* add test_draft_model_preprocess

* fix

* ci
2025-09-08 17:08:50 +08:00
co63oc
ed462cf238 [UnitTest][MTP] add test_speculate_get_token_penalty_multi_scores.py (#3742)
* add test_speculate_get_token_penalty_multi_scores

* fix
2025-09-08 17:07:11 +08:00
Echo-Nie
20495f927e [UnitTest][MTP] supplementary unit test for ngram_match (#3732)
* supplement unittest for custom_ops: ngram_match

* add annotation

* use step_idx information to check equality at the specific position instead

* del anno

* del print

---------

Co-authored-by: Tao Luo <luotao02@baidu.com>
2025-09-08 17:06:06 +08:00
ooo oo
0c46318b34 【Hackathon 9th No.22】add unit tests for share_external_data (#3744) 2025-09-08 17:05:48 +08:00
Jundong Liu
3d0aaa5923 [Excutor] Experiment Feature-Support Prefill in cudagraph (#3459)
* Support prefill in Cudagraph

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5

* Solve problem about encoder_num_blocks_x_cpu

* Add early-exit mechanism for attention kernel

* fix test case about append-attention

* Update testcode, Add annotations to related tensors

* move get_input_length_list

* solve test_code

* Add annotations about early-exit for attention kernel

* Add annotations about early-exit for attention kernel2

* solve comment

* solve mtp

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-09-08 13:12:24 +08:00
ooo oo
b23fc654d9 【Hackathon 9th No.32】add unit tests for group_swiglu_with_masked (#3748) 2025-09-05 11:53:47 +08:00
Echo-Nie
fc3bc56e59 【Hackathon 9th No.35】add test_moe_redundant_topk_select (#3867) 2025-09-05 11:29:02 +08:00
freeliuzc
88d44a2c93 support mtp in v1_scheduler mode (#3695)
2025-09-04 17:39:59 +08:00
co63oc
e83251699f 【Hackathon 9th No.63】add test_draft_model_postprocess.py (#3757)
* add test_draft_model_postprocess.py

* fix

* fix
2025-09-04 15:00:48 +08:00
Echo-Nie
ac46ef403a 【Hackathon 9th No.34】add test_get_position_ids_and_mask_encoder_batch (#3739) 2025-09-04 14:54:30 +08:00
ooo oo
460809070c 【Hackathon 9th No.54、57】 add unit tests for per_token_quant and per_token_quant_padding (#3746) 2025-09-04 11:46:38 +08:00
co63oc
7baf1b56e0 【Hackathon 9th No.27】add test_get_padding_offset (#3708)
* add test_get_padding_offset

* fix

* fix

* fix
2025-09-04 11:42:35 +08:00
co63oc
e24b745d48 [UnitTest][MTP]add test_speculate_get_output_padding_offset (#3740) 2025-09-03 22:21:21 +08:00
co63oc
aaa2de1afa [UnitTest][MTP]add test_speculate_get_padding_offset (#3730) 2025-09-03 22:21:02 +08:00
Yuan Xiaolan
fa58a9fa8f qk norm for speculate decode C16 (#3637) 2025-09-03 14:53:56 +08:00
Echo-Nie
0fe1d62232 [MTP] add test_draft_model_set_value_by_flags.py (#3741) 2025-09-02 19:33:33 +08:00
co63oc
d4fc893fe3 fix typos (#3633)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-28 14:42:24 +08:00
Sunny-bot1
479c8b85d3 [Optimize]support machete weight only gemm (#3561)
* support machete weight only gemm

* add generate

* update

* fix

* change file location

* add sm_version limit

* fix

* fix

* fix ci

* fix coverage

* fix xpu
2025-08-28 09:49:58 +08:00
YuanRisheng
642480f5f6 [CI] Standard unittest (#3606)
* standard unittest

* fix bugs

* fix script
2025-08-26 19:03:11 +08:00
freeliuzc
52eda7fdb3 [Feature][MTP]support new speculative decoding method named hybrid mtp with ngram (#3610) 2025-08-26 14:29:22 +08:00
Yuan Xiaolan
9205c88da1 support w4afp8 EP inference (#3044)
2025-08-25 11:27:45 +08:00
freeliuzc
76759108c9 [Feature][SpeculativeDecoding]Support tree-attention (#3514)
* support tree-attention

* fix merge bug

* fix unit-test api

* fix merge bug
2025-08-22 13:36:41 +08:00
yangjianfengo1
e5aa7087db 【bug fix】fix slow w4a8 compilation (#3510)
* fix w4a8 compilation

* code style

* fix tma copy
2025-08-21 18:50:14 +08:00
YUNSHEN XIE
3a6058e445 Add stable ci (#3460)
* add stable ci

* fix

* update

* fix

* rename tests dir;fix stable ci bug

* add timeout limit

* update
2025-08-20 08:57:17 +08:00