yangjianfengo1
|
93fcf7e4ec
|
【New Feature】W4afp8 supports per group quantization (#4272)
* w4afp8 支持per group
* code style
* 精度完成
* revert append attn utils
* ffn1 动态量化
* ffn2 支持动态量化
* code style
* code style
* 修改单测
* 修改单测
* fix bug
* Implement conditional parameter creation for layers
Add parameter creation for up_gate_proj_in_scale when ep_size > 1.
* code style
* fix conflict
* code style
* code style
* 修复w4aint8 精度
* fix ci
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
|
2025-11-05 21:00:23 +08:00 |
|
Zhenghai Zhang
|
1712e1351b
|
【Hackathon 9th No.86】autogen MoeFastHardamardImplWrapper template_instantiation (#4592)
* autogen MoeFastHardamardImplWrapper template_instantiation
* fix codestyle
* fix codestyle
* add impl cu files
|
2025-10-30 10:28:36 +08:00 |
|
yangjianfengo1
|
8e1b35a09b
|
【Fix bug] w4afp8 的nblock固定为256,并且fa3的append attn 增加mask参数 (#3771)
* fix w4afp8
* 增加集中式配置
* codestyle
* fix fa3 append attn
|
2025-09-02 19:17:01 +08:00 |
|
Yuan Xiaolan
|
c71ee0831c
|
add w4afp8 offline script (#3636)
|
2025-08-29 17:56:05 +08:00 |
|
Yuan Xiaolan
|
9205c88da1
|
support w4afp8 EP inference (#3044)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
|
2025-08-25 11:27:45 +08:00 |
|
yangjianfengo1
|
e5aa7087db
|
【bug fix】修复w4a8编译慢 (#3510)
* 修复w4a8编译
* code style
* 修复tma copy
|
2025-08-21 18:50:14 +08:00 |
|
yangjianfengo1
|
b047681c5d
|
【New Feature】支持Fp8 group Gemm 24稀疏 (#3463)
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* 支持24稀疏
* code style
* 增加stmatrix 宏定义判断
* code style
|
2025-08-19 02:54:47 -07:00 |
|
yangjianfengo1
|
89397516a8
|
[New Feature] Support W4Afp8 MoE GroupGemm (#3171)
* init
* 增加多线程编译
* fix bug
* fix bug
* code style
* 增加fp16
* 将print替换成assert
* 修复stmatrix
* 减小单测shape
* 减小单测shape
|
2025-08-06 10:34:05 +08:00 |
|
Zero Rains
|
25698d56d1
|
polish code with new pre-commit rule (#2923)
|
2025-07-19 23:19:27 +08:00 |
|
Jiang-Jia-Jun
|
92c2cfa2e7
|
Sync v2.0 version of code to github repo
|
2025-06-29 23:29:37 +00:00 |
|