FastDeploy/tools/deep_gemm_pre-compile
GoldPancake 4bd6a9fa7d [Bugs] Fix DeepGEMM pre-compile tools. (#3351)
Fix some miss cache problems.
Add README.md.
2025-08-15 14:37:49 +08:00

DeepGEMM Pre-compilation Tool

This tool pre-compiles DeepGEMM kernels ahead of time and stores them in a cache, so that they do not have to be compiled on first use at runtime.

Usage

1. Using the Shell Script

bash pre_compile.sh \
    [MODEL_PATH] \
    [TP_SIZE] \
    [EP_SIZE] \
    [HAS_SHARED_EXPERTS] \
    [OUTPUT_FILE]

The script will:

  1. Generate configurations
  2. Pre-compile all kernels

2. Alternative: Manual Steps

If you need more control, you can run the steps manually:

Generate Configuration

python generate_config.py \
    --model /path/to/model \
    --tensor-parallel-size [TP_SIZE] \
    --expert-parallel-size [EP_SIZE] \
    --has-shared-experts [True/False] \
    --output [CONFIG_FILE]

Arguments:

  • --model: Path to model directory containing config.json
  • --tensor-parallel-size: Tensor parallel size (default: 1)
  • --expert-parallel-size: Expert parallel size (default: 8)
  • --has-shared-experts: Whether model has shared experts (default: False)
  • --output: Output config file path (default: ./deep_gemm_pre_compile_config.jsonl)
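
For readers who want to see how these flags and defaults fit together, the following is a minimal argparse sketch. It is not the source of generate_config.py: the `str2bool` helper, the choice to make `--model` required, and the set of accepted boolean spellings are assumptions made for illustration; only the flag names and default values come from the list above.

```python
import argparse


def str2bool(value):
    """Convert 'true'/'1'/'yes' or 'false'/'0'/'no' (any case) to a bool."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def make_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI")
    # Assumption: --model is mandatory, since no default is documented.
    parser.add_argument("--model", required=True)
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    parser.add_argument("--expert-parallel-size", type=int, default=8)
    parser.add_argument("--has-shared-experts", type=str2bool, default=False)
    parser.add_argument("--output",
                        default="./deep_gemm_pre_compile_config.jsonl")
    return parser


if __name__ == "__main__":
    print(make_parser().parse_args(["--model", "/path/to/model"]))
```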

Pre-compile Kernels

python pre_compile.py \
    --config-file [CONFIG_FILE] \
    --expert-parallel-size [EP_SIZE] \
    --num-threads [NUM_THREADS]

Arguments:

  • --config-file: Path to the config file produced by generate_config.py
  • --expert-parallel-size: Expert parallel size (must match the value passed to generate_config.py)
  • --num-threads: Number of compilation threads (default: number of CPU cores)
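
The role of --num-threads can be illustrated with a small thread-pool sketch. It assumes the config file holds one JSON object per line, as the .jsonl extension of the default output name suggests, and it uses a hypothetical `compile_one` placeholder instead of the real kernel compilation, whose internals this README does not describe.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def load_configs(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def compile_one(cfg):
    """Hypothetical stand-in for compiling one kernel; returns cfg unchanged."""
    return cfg


def compile_all(configs, num_threads=None, compile_fn=compile_one):
    """Apply compile_fn to every config using a pool of worker threads.

    Falls back to the CPU core count when num_threads is not given,
    matching the documented default. Results keep the input order.
    """
    workers = num_threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compile_fn, configs))
```

Each config entry is an independent unit of work, which is why more threads shorten the total run until the number of CPU cores is reached.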

Environment Variables

  • PRE_COMPILE_LOG_LEVEL: Set log level (DEBUG/INFO/WARNING/ERROR)
  • DG_CACHE_DIR: Cache directory for compiled kernels (default: ./deep_gemm_cache)
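
A minimal sketch of how a script might read these two variables is shown below. The `read_settings` helper is hypothetical, and the fallback log level of INFO is an assumption, because no default is documented for PRE_COMPILE_LOG_LEVEL; the cache directory default is the one listed above.

```python
import logging
import os

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def read_settings(env=None):
    """Return (log_level, cache_dir) from the environment.

    Hypothetical helper. The INFO fallback is an assumption; the
    cache directory default comes from this README.
    """
    env = os.environ if env is None else env
    level_name = env.get("PRE_COMPILE_LOG_LEVEL", "INFO").upper()
    if level_name not in VALID_LEVELS:
        raise ValueError(f"unsupported log level: {level_name}")
    cache_dir = env.get("DG_CACHE_DIR", "./deep_gemm_cache")
    return getattr(logging, level_name), cache_dir


if __name__ == "__main__":
    level, cache_dir = read_settings()
    logging.basicConfig(level=level)
    logging.info("kernel cache directory: %s", cache_dir)
```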

Notes

  • For the fastest compilation, set --num-threads to the number of available CPU cores
  • Compilation can take a long time; the duration grows with the size of the generated configuration
  • Compiled kernels are cached in DG_CACHE_DIR