Files
FastDeploy/tools/deep_gemm_pre-compile
GoldPancake 79c2c52756
Some checks are pending
CE Compile Job / ce_job_pre_check (push) Waiting to run
CE Compile Job / print_ce_job_pre_check_outputs (push) Blocked by required conditions
CE Compile Job / FD-Clone-Linux (push) Blocked by required conditions
CE Compile Job / Show Code Archive Output (push) Blocked by required conditions
CE Compile Job / BUILD_SM8090 (push) Blocked by required conditions
CE Compile Job / BUILD_SM8689 (push) Blocked by required conditions
CE Compile Job / CE_UPLOAD (push) Blocked by required conditions
Deploy GitHub Pages / deploy (push) Waiting to run
deepgemm pre-compile tool support mixed parallel (#4282)
2025-09-26 18:43:39 +08:00
..
2025-09-22 14:27:17 +08:00

DeepGEMM Pre-compilation Tool

This tool provides pre-compilation functionality for DeepGEMM kernels to optimize performance.

Usage

bash pre_compile.sh \
    [MODEL_PATH] \
    [TP_SIZE] \
    [EP_SIZE] \
    [HAS_SHARED_EXPERTS] \
    [OUTPUT_FILE]

The script will:

  1. Generate configurations
  2. Pre-compile all kernels

2. Alternative: Manual Steps

If you need more control, you can run the steps manually:

Generate Configuration

python generate_config.py \
    --model /path/to/model \
    --tensor-parallel-size [TP_SIZE] \
    --expert-parallel-size [EP_SIZE] \
    --has-shared-experts [True/False] \
    --output [CONFIG_FILE]

Arguments:

  • --model: Path to model directory containing config.json
  • --tensor-parallel-size: Tensor parallel size (default: 1)
  • --expert-parallel-size: Expert parallel size (default: 8)
  • --has-shared-experts: Whether model has shared experts (default: False)
  • --output: Output config file path (default: ./deep_gemm_pre_compile_config.jsonl)

Pre-compile Kernels

python pre_compile.py \
    --config-file [CONFIG_FILE] \
    --expert-parallel-size [EP_SIZE] \
    --num-threads [NUM_THREADS]

Arguments:

  • --config-file: Path to config file generated in step 1
  • --expert-parallel-size: Expert parallel size (must match step 1)
  • --num-threads: Number of compilation threads (default: CPU cores)

Environment Variables

  • PRE_COMPILE_LOG_LEVEL: Set log level (DEBUG/INFO/WARNING/ERROR)
  • DG_CACHE_DIR: Cache directory for compiled kernels (default: ./deep_gemm_cache)

Notes

  • For best performance, set --num-threads to the number of available CPU cores
  • The compilation process may take significant time depending on configuration size
  • Compiled kernels will be cached in DG_CACHE_DIR