[Optimize] Optimize tensorwise fp8 performance (#2729)

* [Optimize] Optimize tensorwise fp8 performance

Author: ming1753
Date: 2025-07-07 20:06:28 +08:00 (committed by GitHub)
Parent: 1b54a2831e
Commit: ef6649a577
6 changed files with 318 additions and 88 deletions

@@ -442,6 +442,7 @@ elif paddle.is_compiled_with_cuda():
"gpu_ops/scaled_gemm_f8_i4_f16_weight_quantize.cu",
"gpu_ops/cutlass_kernels/cutlass_heuristic.cu",
"gpu_ops/cutlass_kernels/cutlass_preprocessors.cu",
+"gpu_ops/fused_hadamard_quant_fp8.cu"
]
sources += find_end_files(fp8_auto_gen_directory, ".cu")