mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-04 08:16:42 +08:00

Files

GoldPancake 4bd6a9fa7d [Bugs] Fix DeepGEMM pre-compile tools. (#3351 )

Fix some miss cache problems.
Add README.md.

2025-08-15 14:37:49 +08:00

generate_config.py

[Bugs] Fix DeepGEMM pre-compile tools. (#3351 )

2025-08-15 14:37:49 +08:00

pre_compile.py

[Bugs] Fix DeepGEMM pre-compile tools. (#3351 )

2025-08-15 14:37:49 +08:00

pre_compile.sh

[Bugs] Fix DeepGEMM pre-compile tools. (#3351 )

2025-08-15 14:37:49 +08:00

README.md

[Bugs] Fix DeepGEMM pre-compile tools. (#3351 )

2025-08-15 14:37:49 +08:00

README.md

DeepGEMM Pre-compilation Tool

This tool provides pre-compilation functionality for DeepGEMM kernels to optimize performance.

Usage

1. Using Shell Script (Recommended)

bash pre_compile.sh \
    [MODEL_PATH] \
    [TP_SIZE] \
    [EP_SIZE] \
    [HAS_SHARED_EXPERTS] \
    [OUTPUT_FILE]

The script will:

Generate configurations
Pre-compile all kernels

2. Alternative: Manual Steps

If you need more control, you can run the steps manually:

Generate Configuration

python generate_config.py \
    --model /path/to/model \
    --tensor-parallel-size [TP_SIZE] \
    --expert-parallel-size [EP_SIZE] \
    --has-shared-experts [True/False] \
    --output [CONFIG_FILE]

Arguments:

--model: Path to model directory containing config.json
--tensor-parallel-size: Tensor parallel size (default: 1)
--expert-parallel-size: Expert parallel size (default: 8)
--has-shared-experts: Whether model has shared experts (default: False)
--output: Output config file path (default: ./deep_gemm_pre_compile_config.jsonl)

Pre-compile Kernels

python pre_compile.py \
    --config-file [CONFIG_FILE] \
    --expert-parallel-size [EP_SIZE] \
    --num-threads [NUM_THREADS]

Arguments:

--config-file: Path to config file generated in step 1
--expert-parallel-size: Expert parallel size (must match step 1)
--num-threads: Number of compilation threads (default: CPU cores)

Environment Variables

PRE_COMPILE_LOG_LEVEL: Set log level (DEBUG/INFO/WARNING/ERROR)
DG_CACHE_DIR: Cache directory for compiled kernels (default: ./deep_gemm_cache)

Notes

For best performance, set --num-threads to the number of available CPU cores
The compilation process may take significant time depending on configuration size
Compiled kernels will be cached in DG_CACHE_DIR