* test: add unit tests for fused_hadamard_quant_fp8
* test: add unit tests for moe_fused_hadamard_quant_fp8
* tests: simulate CUDA kernel's hadamard32_warp using butterfly operations
* apply review
* apply review