[XPU] Add XPU CI EP case (#4432)

* Add XPU CI case

* Add xDeepEP download and build steps

Download and build xDeepEP before running tests.

* Fix formatting and add missing sleep command

* Update Docker image version in CI workflow

* Modify run_ci_xpu.sh for log cleanup and error handling

Clean up log files before running tests and output worker log on failure.

* Enhance test_ep.py with process management and assertions

Refactor test function to include process cleanup and assertions.

* Replace test_fastdeploy_llm with test_fd_ep

* Fix conditional statement in run_ci_xpu.sh

* Update test_ep.py for string handling and formatting

Fix string encoding issues and improve readability.

* Rename test_ep.py to run_ep.py

* Change test script from test_ep.py to run_ep.py
plusNew001
2025-10-21 19:19:40 +08:00
committed by GitHub
parent 175391389f
commit 2bd3fb6315
3 changed files with 111 additions and 2 deletions

@@ -176,3 +176,33 @@ if [ ${kv_block_test_exit_code} -ne 0 ]; then
echo "kv block tests failed, please check the PR code"
exit 1
fi
echo "============================Starting EP parallel test!============================"
sleep 5
rm -rf log/*
rm -f core*
xpu-smi
export XPU_VISIBLE_DEVICES="0,1,2,3"
export BKCL_ENABLE_XDR=1
export BKCL_RDMA_NICS=xgbe1,xgbe2,xgbe3,xgbe4
export BKCL_TRACE_TOPO=1
export BKCL_PCIE_RING=1
export XSHMEM_MODE=1
export XSHMEM_QP_NUM_PER_RANK=32
export BKCL_RDMA_VERBS=1
wget -q https://paddle-qa.bj.bcebos.com/xpu_third_party/xDeepEP.tar.gz
tar -xzf xDeepEP.tar.gz
cd xDeepEP
bash build.sh
cd -
python tests/ci_use/XPU_45T/run_ep.py
ep_exit_code=$?
if [ ${ep_exit_code} -ne 0 ]; then
echo "log/workerlog.0"
cat log/workerlog.0
echo "EP parallel tests failed, please check the PR code"
exit 1
fi
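
The EP step above runs `run_ep.py`, reads `$?`, prints `log/workerlog.0`, and exits on failure. The same run-then-dump-log pattern could be factored into a reusable helper. The sketch below is a minimal illustration that assumes bash. `run_and_report` is a hypothetical name and is not part of the FastDeploy CI scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in run_ci_xpu.sh): run one CI step and, if it
# fails, print the log file name, its contents, and a failure message.
# Usage: run_and_report <log_file> <failure_message> <command> [args...]
run_and_report() {
    local log_file="$1"
    local fail_msg="$2"
    shift 2

    # Capture the exit status via "||" so the capture still works if the
    # calling script enables "set -e" (a failing step would otherwise abort).
    local exit_code=0
    "$@" || exit_code=$?

    if [ "${exit_code}" -ne 0 ]; then
        echo "${log_file}"
        # Only dump the log if it exists, so a missing file does not mask the failure.
        if [ -f "${log_file}" ]; then
            cat "${log_file}"
        fi
        echo "${fail_msg}"
    fi
    return "${exit_code}"
}
```

With this helper, the EP check could read `run_and_report log/workerlog.0 "EP parallel tests failed, please check the PR code" python tests/ci_use/XPU_45T/run_ep.py || exit 1`. Returning the step's own exit code, rather than calling `exit` inside the function, leaves the decision to abort with the caller.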