Commit Graph

21 Commits

Author SHA1 Message Date
yangjianfengo1
e81046fdad 【New Feature】集中式支持w4afp8 (#3644)
* 支持tp w4afp8

* code style
2025-08-28 10:53:24 +08:00
gaoziyuan
82e64b13e1 [NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop (#3598)
* [Feature] update ep

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix queue ports idx

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* Update engine.py

* fix ci

* fix some bug in mixed ep

* add server fix and op fix

* rm some log

* fix code style

* ltd fix

* fix

* fix

* fix some bug

* fix bug

* fix bug

* fix style

* Update config.py

* Update splitwise_connector.py

* Update cache_messager.py

* Update __init__.py

* merge and fix

* Update engine.py

* Update common_engine.py

* Update run_ci_xpu.sh

* Update ernie_processor.py

* Update ernie_processor.py

---------

Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-26 19:59:02 +08:00
lzy
d339df2e90 Supports DP+TP+EP hybrid parallel deployment strategy (#3489)
* Support DP+TP+EP hybrid parallel deployment strategy

* Support DP+TP+EP hybrid parallel deployment strategy

* fix conflict

* add moe_tp_ep function split_allgather_out

* del tp_group in moe_cutlass_backend

* for ci

* fix parallel_config for ci

* del log
2025-08-26 00:04:01 -07:00
Yuan Xiaolan
9205c88da1 support w4afp8 EP inference (#3044)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-25 11:27:45 +08:00
Sunny-bot1
6c1f3ff897 topk_gating_softmax support bias (#3405) 2025-08-15 11:57:45 +08:00
Sunny-bot1
2e7831185f [Optimize]Add norm_weights feature for topk_gating_softmax (#3372)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-14 15:05:23 +08:00
Sunny-bot1
8224b21525 Refactor moe_topk_select op to use apply_norm_weight as a template parameter (#3345)
* Refactor moe_topk_select op to use apply_norm_weight as a template parameter

* update test
2025-08-13 08:44:16 +08:00
Yuan Xiaolan
7ce00e597c support qk norm (#3145) 2025-08-05 16:46:14 +08:00
AIbin
22fe695f1c 【Inference Optimize】Support automatic generation of marlin kernel (#3149)
* Support automatic generation of marlin kernel
2025-08-01 22:43:18 +08:00
chen
a2f5cc54f8 moe preprocess op support 160 experts and fused_moe triton kernel name add K (#3121) 2025-08-01 10:46:20 +08:00
Yuan Xiaolan
7d87aaace8 optimize w4a8 decoding (#3050) 2025-07-28 22:20:13 +08:00
lizhenyun01
e80ea8a71b remove Synchronize in hadamard 2025-07-28 19:22:46 +08:00
Yiqun Liu
8f426c1690 Optimize the performance of moe_expert_ffn_wint2 (#2990)
* Change wint2 to ColumnMajor.

Change-Id: I6b44d02946a685f8fe24d9f2c7be258b51e16da2

* Unify default_wint2x_mma.

Change-Id: I9e77b0e8e6cecab01fedc0b24b536ee0a1a89ff7

* Change wint2 to ColumnMajorTileInterleave.

Change-Id: I593cbe36f991c0c5044989d65f0014087587c624

* Enable async copy for B.

Change-Id: Ia3ac37ad162a8cf3ccce4f268e81bd06c8ac3c46

* Add wint2x Dequantizer

* Remove TileDequanterB related codes.

Change-Id: Id8e65703b72a8984d367f584ff41b7726017fbb8

* Implement FastInterleavedAndBiasedNumericArrayConverter for wint2.

Change-Id: I438f2b18ab964a04ae1cdb09d9e7d9f7b95eafca

* Implement Wint2ParamsAccessor to load extra quant params from global memory.

Change-Id: Ic3750cd9b767df8893501820880c3342a4b47233

* Implement FastInterleavedAndBiasedNumericArrayConverter for wint2.

Change-Id: I438f2b18ab964a04ae1cdb09d9e7d9f7b95eafca

* Use async copy for local_scale.

Change-Id: Ib882ba41c3d2354bda4d25b40e2408ad3b2f7893

* Check and correct the load and dequantize of weights.

Change-Id: Ie8dca505b39987144964fe6407d465b3b5953790

* Change for performance tuning.

Change-Id: I1da026fb1d1533a9d70350c7ba23c27e896cfc29

* Optimize the global memory access size of local_scale reading.

Change-Id: I4cbe3a2ef5951723d415c2d3252ce912394beaf5

* Specialize mma_tensor_op for wint2 to enable fine-grained pipeline.

Change-Id: Icbb4d48f90a41136f42d6ffff42d68de32f408da

* Minor fix.

Change-Id: I14d4ac9d267ee05442a3b47f00c26bee13d79e6f

* optimizing dequant performance with LOP3

* optimizing dequant performance with LOP3

* Avoid redundant dequantization of local_scale and use bf16 as computing type.

Change-Id: I63239ebc8f8e4a92d6281af59840ba50600b4334

* Add Multiplier and remove some logs.

Change-Id: Ifa199d81e6aeb472d2247c63f85ef30213684bcd

* optimizing dequant performance with LOP3

* Use __byte_perm to implement int8 to float32 conversion for performance improvement.

* Use lop3 to optimize the dequantize of local_scale.

Change-Id: I6189759970cb5b8dcbef769724784b8a7533b63c

* Minor fix and remove some logs.

Change-Id: I6279ba9926d5041093b1c6aea200acf2e4c49d46

* Fix stages for test.

Change-Id: I6f7b7cac612ef2c678e9d49f5ffa60eb53d3ae29

* Fix stages for test and add clock64 to profile.

Change-Id: Iffaf7324beaa910ce9ee56f47ae289de98f1a267

* Use __byte_perm to replace shift-and-or operations for faster integer merging.

* Split the uint2b convert.

Change-Id: I78da672ce8968e21f685285140ba546a161521b4

* Optimize convert of unscale.

Change-Id: I6795da1cdf5e8ab38ddaa9836240921b5312913a

* Minor optimization.

Change-Id: I1800aec34c3f4621abb02658208108f54da44d88

* Optimize mma pipeline and refine codes.

Change-Id: Id3075cf7b88f2813a11ccd1d3b49c62c978f36b8

* Add missing support.

Change-Id: Id65b7bc2c25fbb1a5b232c6bc9fb8c9093f691a8

* Accelerate FP16 dequantization performance

* Support tile shape as Xx64x64.

Change-Id: Ib8fd37e1ba1d06f7d11f2956e7f1367b0a92bcac

* Remove debugging codes and minor optimization.

Change-Id: I6b79bd56a6e8dd823efc169967ecd3cc9a43baf4

* Fix offset bug.

Change-Id: Id7aeb91e99d6f51836f2aff22187b4f79607395e

* Fix typo.

Change-Id: I19dde93fc1c1f7e19605905c90dc46298e203952

* Restore some codes and remove some debugging logs.

Change-Id: I8d44daf82ad1c6f8174134d195e7b3fe9a3afdfb

---------

Co-authored-by: baoqiwen <baoqiwen@baidu.com>
2025-07-28 10:32:43 +08:00
xiaoxiaohehe001
2970b00dfa [Feature] Support_eplb (#2997)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Feature] support_eplb

* [Feature] support_eplb

* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
K11OntheBoat
e991777757 [Feature] DeepseekV3 use pd_build_static_op (#2948)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 15:03:41 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
Yuanle Liu
61b3997b85 refactor rl get_name_mappings_to_training (#2847)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* refactor rl get_name_mappings_to_training

* fix tp>1

* change variable name(ffn1->up_gate_proj/ffn2->down_proj)

* change variable name(linear_weight->weight/linear_bias->bias)

* add rl names mapping for vl

* fix ernie 0.3B error

* fix develop code

* fix
2025-07-15 07:31:42 -07:00
RichardWooSJTU
fee544e808 fix ep prefill (#2762) 2025-07-09 14:03:05 +08:00
Jiang-Jia-Jun
05c670e593 [Sync] Update to latest code (#2679)
* [Sync] Update to latest code

* Add new code files

* Add new code files

* update code

* Try to fix build.sh

* Try to fix build.sh

* Update code

* Update requirements.txt

* Update code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00