Commit Graph

3840 Commits

Author SHA1 Message Date
Juncai
36822fa49c [PD Disaggregation] remove splitwise deployment on single node and refine the code (#4891)
* remove splitwise deployment on single node and refine the code

* up

* up

* up

* add test

* up
2025-11-14 09:56:53 +08:00
kxz2002
9703108c28 [BugFix] adjust max_tokens and min_tokens when continue to generate tokens (#5010)
* fix max and min tokens initial commit

* fix double subtraction

* add unit tests
2025-11-13 23:52:54 +08:00
carryyu
6c3d1da62f fix conflicts 2025-11-13 20:30:29 +08:00
yangjianfengo1
ae7bee8122 【New Feature】W4afp8 supports per group quantization (#4987)
* w4afp8 supports per group

* code style

* fix transpose

* revert fast hardmard

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-11-13 19:17:27 +08:00
Echo-Nie
a5e949d9d0 [Feature] Enhance build script, add pre_wheel logic (#4729)
* Enhance build script, add pre_wheel logic

Updated copyright year and added precompiled wheel installation logic.

* update the nvidia_gpu.md, add pre_wheel description

* fix zh .md

* update the url, automatically detect CUDA and SM

* Fix GPU architecture string formatting in build.sh

* Change default for FD_USE_PRECOMPILED to 0

* fix build.sh

* add ./dist, pre-wheel path

* simplify the process,just save the whl

* del pre_wheel dir

* fix function name, extract_ops_from_precompiled_wheel

* fix docs

* add default commitID in docs

---------

Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-11-13 19:03:52 +08:00
Sunny-bot1
05da8e34c0 [BugFix][Metax] Fix metax compile issue in get_block_shape_and_split_kv_block (#5000)
* fix metax compile

* fix
2025-11-13 00:55:06 -08:00
zccjjj
88da9d9788 [XPU] [CI] Change CI ep test from offline to online (#4885)
* change CI ep test from offline to online

* add ep all2all ci's changes, from offline to online

* change env var in ep-all2all ci test

* add expected response for ep8tp8 all2all

* Adapt to CI refactoring and support dual-concurrent code execution

* Adapt to CI refactoring and support dual-concurrent, second

* Explicitly specify the #port

* change the startup method of all2all

* Modify the command of all2all

* Update assertion to check multiple keywords

* Update assertion to check multiple keywords

* Update run_w4a8.py

* Update run_w4a8.py

---------

Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-11-13 16:15:45 +08:00
bukejiyu
4a0d881e15 update (#4985)
2025-11-13 15:58:01 +08:00
周周周
6c4ebc5fee [worker_process.py]modify some var name (#4749) 2025-11-13 14:21:27 +08:00
Yonghua Li
6c5ab727c1 [BugFix] fix num_requests_running after clear_data (#4927)
* [BugFix] fix num_requests_running after clear_data

* [fix] fix tasks_list and stop flags not cleared when _free_blocks failed
2025-11-13 13:50:21 +08:00
Sunny-bot1
5b24013d46 skip DtoH capture (#4988) 2025-11-13 10:57:44 +08:00
Jiang-Jia-Jun
8329338d37 Update nvidia_gpu.md 2025-11-13 10:25:22 +08:00
ltd0924
303c986cc7 [FDConfig] add block number verfied (#4983)
* Update config.py

* fix

* update unit test

---------

Co-authored-by: ltd0924 <luotingdan@baidu.com>
2025-11-13 09:48:44 +08:00
YuBaoku
1c0b0b08b7 [CI] set DG_NVCC_OVERRIDE_CPP_STANDARD in test_quantized_linear (#4995)
2025-11-12 23:03:21 +08:00
Yuanle Liu
2272160faf fix mtp tsp (#4990) 2025-11-12 22:05:19 +08:00
ming1753
3148dbca06 [BugFix] fix VL fp8 bug when moe token_num is 0 (#4928)
* [BugFix] fix VL fp8 bug when moe token_num is 0

* fix bug

* format

* fix bug
2025-11-12 21:19:36 +08:00
Jiang-Jia-Jun
c8140326fa Update nvidia_gpu.md 2025-11-12 20:50:09 +08:00
bukejiyu
f0189292df [CI] fix test_model_cache (#4982)
* ci

* update
2025-11-12 20:26:49 +08:00
qwes5s5
a2d06118e1 [Logprobs]Support prompt_logprobs and max_logprobs (#4897)
* add prompt logprobs

* trigger ci

* fix unitest

* Update fastdeploy/config.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/entrypoints/llm.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/engine/sampling_params.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/engine/test_sampling_params.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/engine/test_sampling_params.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix max_logprobs

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-12 19:29:48 +08:00
Lucas
da7863ae85 [XPU] fix text_image_gather_scatter when image_token_num == token_num && text_token_num == 1 (#4882) 2025-11-12 17:13:22 +08:00
JYChen
a1218076dc remove load default_v1 since already been as default (#4980) 2025-11-12 16:49:48 +08:00
xiaozude
c45b3ccb52 [Metax] optimize flash mla (#4915) 2025-11-12 16:43:46 +08:00
MingkunZhang
9d9f5df8d0 [Metax] support default_v1 loader & thinking model (#4956)
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-11-12 16:32:26 +08:00
BossPi
bde6e2f931 [BugFix] Avoid loading training file (#4966)
* bug fix

don't put scheduler.pdparams into model weights

* run pre-commit
2025-11-12 15:49:14 +08:00
plusNew001
c7b589d75b [CI][XPU] Fix EP Case Bug (#4976)
* Update health check endpoint to use port variable

* Update scripts/run_ci_xpu.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update scripts/run_ci_xpu.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update scripts/run_ci_xpu.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update installation method for paddlepaddle-xpu

Revert to installing paddlepaddle-xpu from the official repository.

* Modify XPU_VISIBLE_DEVICES based on GPU_ID

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-12 15:23:28 +08:00
bukejiyu
6e2e2fcd29 xpu (#4969) 2025-11-12 15:12:59 +08:00
ltd0924
5bf48de999 [KVCache] support unified cache backend (#4903)
* [Feature] support unified cache backend

* fix

* fix

* fix

* fix

* Update metax_model_runner.py

* fix

* update

* Update test_moba_attention_backend.py

---------

Co-authored-by: ltd0924 <luotingdan@baidu.com>
2025-11-12 14:54:52 +08:00
yzwu
76e60e98f8 [Iluvatar][CI] fix safetensors_rust.SafetensorError: framework paddle is invalid (#4972) 2025-11-12 14:13:40 +08:00
Sunny-bot1
35bd2afab3 [Benchmark] Add GEMM & MoE kernel bench (#4809) 2025-11-12 11:56:40 +08:00
YuBaoku
8a96944a0a [CI] Update PORT range to avoid conflict with system ports (#4953) 2025-11-12 11:17:49 +08:00
Jiang-Jia-Jun
09cd6c5d3e Modify README 2025-11-12 11:03:23 +08:00
YuBaoku
9c52d9eb8f [CI] remove useless tests in docker_build (#4974)
* [CI] fix

* [CI] fix apt_sources error of focal in docker_build

* [CI] remove useless tests in docker_build
2025-11-12 10:55:09 +08:00
Echo-Nie
ff653503ff [Docs] Add License in Unittest (#4957)
* add copyright

* add CopyRight
2025-11-12 10:44:09 +08:00
Echo-Nie
2aabaecbc2 [CI] Add five unittest (#4958)
* add unittest

* Update test_logger.py
2025-11-12 10:43:33 +08:00
plusNew001
a5103eb198 [CI][XPU] Change Paddle Version to Nightly (#4973)
* Update health check endpoint to use port variable

* Update scripts/run_ci_xpu.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update scripts/run_ci_xpu.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update scripts/run_ci_xpu.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update installation method for paddlepaddle-xpu

Revert to installing paddlepaddle-xpu from the official repository.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-12 10:29:16 +08:00
bukejiyu
b09ebb2813 refactor pt loading (#4532)
2025-11-11 21:30:39 +08:00
YuBaoku
4c911ecb74 [CI] fix apt_sources error of focal in docker_build (#4961)
* [CI] fix

* [CI] fix apt_sources error of focal in docker_build
2025-11-11 20:35:06 +08:00
plusNew001
f20f29fc79 [CI][XPU]Update health check endpoint to use port variable (#4965)
* Update health check endpoint to use port variable

* Update scripts/run_ci_xpu.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update scripts/run_ci_xpu.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update scripts/run_ci_xpu.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-11 20:19:53 +08:00
周周周
da6b4c10e5 [ATTENTION] make buffer alloc as a function (#4945) 2025-11-11 19:17:08 +08:00
yzwu
08b96baa4a [Iluvatar][Doc] Add ERNIE-4.5-VL-28B-A3B-Thinking doc (#4955) 2025-11-11 19:15:19 +08:00
chen
896ef565cc [Others] Add Tests for GPU Model Runner and Logprobs Output (#4913) 2025-11-11 18:37:33 +08:00
kxz2002
a83250ae3f [CI] Update test_api_key.py (#4948)
* fix test_api_key

* fix test_api_key
2025-11-11 16:49:54 +08:00
K11OntheBoat
76be598129 replace paddle.max by numpy to avoid useless error log (#4893)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-11-11 16:28:05 +08:00
SunLei
3098aee05f [Perf] Support tensor transmission between work and engine with zero-copy to improve efficiency (#4839)
* feat(zmq): support tensor transmission with zero-copy for improved efficiency

* perf: zmq.send disable copy

* zmq recv data for debug

* convert logprobs tensor to cpu
2025-11-11 15:43:11 +08:00
plusNew001
8b61f01c68 [CI][XPU]Update run_ci_xpu.sh to lock paddlepaddle-xpu version (#4949)
Temporarily lock paddlepaddle-xpu version due to framework update.
2025-11-11 15:38:05 +08:00
Lucas
5280b9e0b4 [XPU] fix xpu deployment md (#4941)
2025-11-11 14:39:52 +08:00
yinwei
215cda2f80 [XPU][Doc]Update XPU release2.3 note (#4939)
* update doc

* update

* update

* udpate
2025-11-11 11:57:49 +08:00
Jiang-Jia-Jun
3f09ebf3da Update model names in FastDeploy v2.3 release notes 2025-11-11 11:53:26 +08:00
LiqinruiG
75294bcfb1 [Docs] add ERNIE-4.5-VL-28B-A3B-Thinking instruction (#4944)
* [Docs] Improve reasoning_out docs

* [Docs] Improve reasoning_out docs

* [Docs] Improve reasoning_out docs

* [Docs] add ERNIE-4.5-VL-28B-A3B-Thinking  instruction

* [Docs] add ERNIE-4.5-VL-28B-A3B-Thinking  instruction

* [Docs] add ERNIE-4.5-VL-28B-A3B-Thinking  instruction

* [Docs] add ERNIE-4.5-VL-28B-A3B-Thinking  instruction

---------

Co-authored-by: liqinrui <liqinrui@baidu.com>
2025-11-11 11:40:52 +08:00
Jiang-Jia-Jun
c0a4e2b63b Update README.md 2025-11-11 11:38:30 +08:00