[INTEL_HPU] [CI] enabled fastdeploy PR testing (#4596)

* [INTEL HPU] added hpu ci work flow support

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] added run ci hpu test scripts

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] enabled HPU ernie test case

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] updated Intel Gaudi Readme with Warmup disable cmdline

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* Modify paddlepaddle installation command

Updated paddlepaddle installation command to use a specific index URL.

* Update run_ci_hpu.sh

* Rename json directory to nlohmann_json

Rename extracted json directory to nlohmann_json.

* Update ci_hpu.yml

* Set pip global index URL to Tsinghua mirror

* Update CI workflow to use self-hosted runner and paths

* Update Docker image in CI workflow

* Modify HPU installation URLs in run_ci_hpu.sh

Updated the installation URL for paddle_intel_hpu and added paddlenlp_ops installation.

* Fix paddle_intel_hpu installation URL

Corrected the URL for paddle_intel_hpu wheel installation.

---------

Signed-off-by: Luo, Focus <focus.luo@intel.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-12-24 13:28:13 +08:00 · 2025-11-17 19:24:41 +08:00
parent b23e684b67
commit c2c1942db9
5 changed files with 293 additions and 0 deletions
--- a/docs/get_started/installation/intel_gaudi.md
+++ b/docs/get_started/installation/intel_gaudi.md
@@ -57,7 +57,11 @@ export PADDLE_XCCL_BACKEND=intel_hpu
 export HABANA_PROFILE=0
 export HPU_VISIBLE_DEVICES=0

+#WARMUP Enabled
 HPU_WARMUP_BUCKET=1 HPU_WARMUP_MODEL_LEN=4096 FD_ATTENTION_BACKEND=HPU_ATTN python -m fastdeploy.entrypoints.openai.api_server --model ERNIE-4.5-21B-A3B-Paddle --tensor-parallel-size 1 --max-model-len 32768 --max-num-seqs 128
+
+#WARMUP Disabled
+HPU_WARMUP_BUCKET=0 HPU_WARMUP_MODEL_LEN=4096 FD_ATTENTION_BACKEND=HPU_ATTN python -m fastdeploy.entrypoints.openai.api_server --model ERNIE-4.5-21B-A3B-Paddle --tensor-parallel-size 1 --max-model-len 32768 --max-num-seqs 128 --graph-optimization-config '{"use_cudagraph":false}'
 ```

 ### 2. Launch the request
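
Note on the hunk above: the two launch lines differ in only two places, the value of `HPU_WARMUP_BUCKET` (1 vs 0) and the extra `--graph-optimization-config '{"use_cudagraph":false}'` argument on the warmup-disabled line. The diff does not state why the second flag accompanies the disabled case, so the sketch below simply mirrors the README. `fd_launch_cmd` is a hypothetical helper name, not part of FastDeploy; it only prints the command so the two variants can be compared without Gaudi hardware.

```shell
# Hypothetical helper (not part of FastDeploy): prints the README launch
# command for the requested warmup mode instead of executing it.
# Usage: fd_launch_cmd 1   (warmup enabled, the default)
#        fd_launch_cmd 0   (warmup disabled)
fd_launch_cmd() {
  fd_warmup="${1:-1}"

  # Environment prefix and arguments shared by both README variants.
  fd_cmd="HPU_WARMUP_BUCKET=${fd_warmup} HPU_WARMUP_MODEL_LEN=4096 FD_ATTENTION_BACKEND=HPU_ATTN"
  fd_cmd="${fd_cmd} python -m fastdeploy.entrypoints.openai.api_server"
  fd_cmd="${fd_cmd} --model ERNIE-4.5-21B-A3B-Paddle --tensor-parallel-size 1"
  fd_cmd="${fd_cmd} --max-model-len 32768 --max-num-seqs 128"

  # The README's warmup-disabled line also passes this argument;
  # whether it is strictly required is an assumption left open here.
  if [ "${fd_warmup}" = "0" ]; then
    fd_cmd="${fd_cmd} --graph-optimization-config '{\"use_cudagraph\":false}'"
  fi

  printf '%s\n' "${fd_cmd}"
}

fd_launch_cmd 1
fd_launch_cmd 0
```

On a machine with the environment from the README already exported, the printed line could be run with `eval "$(fd_launch_cmd 0)"`; printing first keeps the example safe to try anywhere.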
--- a/docs/zh/get_started/installation/intel_gaudi.md
+++ b/docs/zh/get_started/installation/intel_gaudi.md
@@ -57,7 +57,11 @@ export PADDLE_XCCL_BACKEND=intel_hpu
 export HABANA_PROFILE=0
 export HPU_VISIBLE_DEVICES=0

+#WARMUP Enabled
 HPU_WARMUP_BUCKET=1 HPU_WARMUP_MODEL_LEN=4096 FD_ATTENTION_BACKEND=HPU_ATTN python -m fastdeploy.entrypoints.openai.api_server --model ERNIE-4.5-21B-A3B-Paddle --tensor-parallel-size 1 --max-model-len 32768 --max-num-seqs 128
+
+#WARMUP Disabled
+HPU_WARMUP_BUCKET=0 HPU_WARMUP_MODEL_LEN=4096 FD_ATTENTION_BACKEND=HPU_ATTN python -m fastdeploy.entrypoints.openai.api_server --model ERNIE-4.5-21B-A3B-Paddle --tensor-parallel-size 1 --max-model-len 32768 --max-num-seqs 128 --graph-optimization-config '{"use_cudagraph":false}'
 ```

 ### 2. Send the request