copilot-swe-agent[bot]
9e4eb339b8
Address code review feedback
...
- Translate Chinese comments to English for consistency
- Add subprocess import to api_server.py for TimeoutExpired handling
- Improve signal name detection in worker_process.py using signal.Signals
- Add better docstring comments for signal handlers and cleanup functions
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-12-23 12:15:24 +00:00
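A minimal sketch of the two ideas named in this commit: resolving a readable signal name through `signal.Signals`, and catching `subprocess.TimeoutExpired` when a child does not exit in time. The helper names `signal_name` and `terminate_process` and the timeout value are illustrative assumptions, not the code in `worker_process.py` or `api_server.py`.

```python
import signal
import subprocess


def signal_name(signum):
    """Return a readable name such as 'SIGTERM', or a fallback for unknown numbers."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Not a signal number known to this platform.
        return f"signal {signum}"


def terminate_process(proc, timeout=5.0):
    """Ask a child process to exit, then kill it if it does not stop in time."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()
```

`terminate_process` expects a `subprocess.Popen` object; the escalation from `terminate()` to `kill()` is one common pattern and may differ from what the commit does.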
copilot-swe-agent[bot]
e09d6363d3
Add signal handlers for graceful process termination
...
- Added SIGINT/SIGTERM signal handlers in api_server.py (both OpenAI and simple versions)
- Added cleanup_processes() function to properly terminate worker processes
- Enhanced StandaloneApplication with worker exit hooks and cleanup
- Added signal handling in worker_process.py for graceful worker shutdown
- Added shutdown_event to coordinate graceful shutdown across threads
- Improved worker monitor to respect shutdown event
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-12-23 12:10:20 +00:00
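The pattern described in this commit (SIGINT/SIGTERM handlers that set a shared event, and a monitor loop that respects it) can be sketched roughly as follows. Only the name `shutdown_event` comes from the commit message; `handle_shutdown_signal`, `install_signal_handlers`, `monitor_loop` and their parameters are hypothetical and are not taken from the repository.

```python
import signal
import threading

# Shared flag that threads poll to learn a shutdown was requested.
shutdown_event = threading.Event()


def handle_shutdown_signal(signum, frame):
    """Signal handler: only record the request; cleanup runs elsewhere."""
    shutdown_event.set()


def install_signal_handlers():
    """Register the handler for SIGINT and SIGTERM (must run in the main thread)."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handle_shutdown_signal)


def monitor_loop(poll_interval=0.1, max_polls=50):
    """Poll until shutdown is requested or max_polls is reached; return the poll count."""
    polls = 0
    while not shutdown_event.is_set() and polls < max_polls:
        # Event.wait returns early once the event is set.
        shutdown_event.wait(poll_interval)
        polls += 1
    return polls
```

Keeping the handler down to a single `Event.set()` call leaves process termination and other cleanup to ordinary code paths, outside signal context.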
Divano
c1aa66df02
Revert "[Optim] Remove limitation of number of kvcache blocks ( #5612 )" ( #5702 )
...
This reverts commit 9da89a374b .
2025-12-23 15:41:33 +08:00
Jiang-Jia-Jun
9da89a374b
[Optim] Remove limitation of number of kvcache blocks ( #5612 )
...
* [Optim] Remove limitation of number of kvcache blocks
* Update fastdeploy/envs.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/worker/iluvatar_worker.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Add docs
* Update fastdeploy/worker/worker_process.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* fix ci case
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-12-23 11:18:29 +08:00
Yuanle Liu
8beb0158fa
[BugFix] fix rl signal ( #5681 )
2025-12-22 00:35:54 -08:00
Yonghua Li
4f830aa505
[RL] provide options for whether shutdown comm group after weights cleared ( #5663 )
...
* [rl] provide options for whether shutdown comm group after weights cleared
* [fix] fix args hardcode
* [fix] change args type
* [fix] add worker process args
2025-12-19 07:06:48 -08:00
Yuanle Liu
689f54f671
[RL] Update worker_process.py ( #5651 )
2025-12-18 20:07:58 -08:00
fmiao2372
a8fce47195
[Intel HPU] enable kv cache scheduler v1 for hpu ( #5648 )
...
* [Intel HPU] enable kv cache scheduler v1 for hpu
* fix copilot comments
2025-12-19 12:03:39 +08:00
Yuanle Liu
b47674c796
[BugFix] fix rl model_weights_signal to support tp>1 ( #5639 )
2025-12-18 04:43:58 -08:00
yzwu
ac013803f3
[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode ( #5555 )
2025-12-18 02:14:25 -08:00
Yonghua Li
0c8c6369ed
[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports ( #5415 )
...
* [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports
* [fix] fix some bugs
* [fix] fix rdma port for cache manager/messager
* [fix] temporarily cancel port availability check to see if it can pass ci test
* [feat] simplify args for multi api server
* [fix] fix dp
* [fix] fix port for xpu
* [fix] add tests for ports post processing & fix ci
* [test] fix test_multi_api_server
* [fix] fix rdma_comm_ports args for multi_api_server
* [fix] fix test_common_engine
* [fix] fix test_cache_transfer_manager
* [chore] automatically setting FD_ENABLE_MULTI_API_SERVER
* [fix] avoid api server from creating engine_args twice
* [fix] fix test_run_batch
* [fix] fix test_metrics
* [fix] fix splitwise connector init
* [test] add test_rdma_transfer and test_expert_service
* [fix] fix code syntax
* [fix] fix test_rdma_transfer and build wheel with rdma script
2025-12-17 15:50:42 +08:00
Yonghua Li
eeb99d2af5
[BugFix] skip model executing after clearing/updating is done ( #5527 )
...
* [fix] fix ep loop
* [fix] another try
* [fix] again
2025-12-16 17:39:03 +08:00
周周周
722de5ace1
[Others] Clean code ( #5543 )
2025-12-15 10:57:59 +08:00
kevin
954a145d57
[Optimization] support mm prefill batch ( #5313 )
...
* support mm prefill batch
* update code
* update code
* update code
* update code
* fix encoder cache bug
* update code
* update code
* fix bug
* fix paddle ocr bug
* fix xpu bug
* update code
2025-12-11 22:21:14 +08:00
Jiang-Jia-Jun
4b3e41c665
[Optim] Improve task-checking performance in engine-worker-queue ( #5376 )
...
* [Optim] Optimize costtime in checking tasks in engine-worker-queue
* Update fastdeploy/engine/common_engine.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/inter_communicator/engine_worker_queue.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [Docs] Add docstring to set_exist_tasks method (#5382 )
* Initial plan
* Add docstring to set_exist_tasks method
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* [Docs] Add docstring documentation to exist_tasks() method (#5381 )
* Initial plan
* Add comprehensive docstring to exist_tasks() method
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* [Optimization] Conditionally initialize shared memory for single-node deployments only (#5383 )
* Initial plan
* Conditionally initialize exist_tasks_intra_signal for single-node deployments
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Use is_single_node flag for consistent deployment type checking
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Remove redundant None checks in exist_tasks methods
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* format code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com >
2025-12-11 10:33:32 +08:00
Yonghua Li
2ec76352da
[BugFix] fix instability after clearing weight ( #5493 )
...
* [BugFix] fix instability after clearing weight
* [chore] add todo
2025-12-11 10:22:35 +08:00
Yonghua Li
419b416376
[BugFix] [RL] remove shutdown_process_group/restart_process_group for RL ( #5433 )
...
* [fix] remove shutdown_process_group/restart_process_group for RL
* [chore] remove log
* [chore] remove log
* [chore] set log to debug level
2025-12-09 20:32:37 +08:00
RAM
b2908b8e82
[New][RL] Support Rollout Routing Replay ( #5405 )
...
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* add routing replay ci
* support glm topk
* support other top_k
* fix ci bug
* pre-commit
* only support chatcmpl
* Revert "Revert "[RL] Support Rollout Routing Replay (#5321 )" (#5402 )"
This reverts commit c45e064f3d .
* Fix XPU and NPU bug
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-12-05 22:06:26 +08:00
Jiang-Jia-Jun
c45e064f3d
Revert "[RL] Support Rollout Routing Replay ( #5321 )" ( #5402 )
...
This reverts commit 96d2d4877b .
2025-12-05 20:19:39 +08:00
RAM
96d2d4877b
[RL] Support Rollout Routing Replay ( #5321 )
...
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* add routing replay ci
* support glm topk
* support other top_k
* fix ci bug
* pre-commit
* only support chatcmpl
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-12-05 20:01:33 +08:00
Yonghua Li
f4119d51b4
[PD Disaggregation] support DP via v1 router and decouple DP and EP ( #5197 )
...
* [fix] support DP via v1 router and decouple DP and EP
* [fix] fix scripts
* [fix] reset model path
* [fix] dp use get_output_ep, fix router port type, update scripts
* [merge] merge with latest code
* [chore] remove some debug log
* [fix] fix code style check
* [fix] fix test_multi_api_server for log_dir name
* [chore] reduce logs
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-12-04 15:38:43 +08:00
qw86972190
6048ea37bd
[XPU]add enable_logprob ( #5279 )
...
* [XPU]Update document
* [XPU]Update documentation
* [XPU]add enable_logprob
* Fix code style issues
* “doc”
* “docs”
* “doc”
* Fix code style via pre-commit
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1498354.gajl.baidu.com >
2025-12-02 15:32:28 +08:00
Longzhi Wang
add524d80c
[Feature] support chunked moe ( #4575 )
...
* [Feature] support chunked moe
* update
* update
* fix and add test
* update
* fix conflict and modify test
* fix fused_moe
* fix fused_moe
* fix docstring
* fix
* fix typo
* fix test
* fix
* fix
* fix test
* fix test
2025-12-01 15:17:18 +08:00
Daci
f25ee3a26f
[Feature] enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1 ( #5140 )
...
* enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-11-26 10:22:35 +08:00
kevin
8e4e3ff510
[Feature] support eplb in api_server ( #4782 )
...
* support eplb in api_server
* update code
* add eplb test case
* update eplb
* support tp+dp eplb
* update test case
* update code
* update code
* fix bug
* update copilot review
* update test case name
2025-11-24 20:22:29 +08:00
xiaozude
d5bd64336a
[Metax] support ENABLE_V1_KVCACHE_SCHEDULER ( #5163 )
2025-11-24 19:19:49 +08:00
xiaoxiaohehe001
95f3c8c641
[Fix] Fix eplb bug and support fp8 load weight ( #5178 )
...
* fix eplb part2
* fix eplb part2
* fix eplb part2
2025-11-24 15:31:37 +08:00
Daci
eab8384da6
[Feature] ThreadPoolExecutor async fill_token_bitmask ( #5083 )
...
* ThreadPoolExecutor async fill_token_bitmask
* ThreadPoolExecutor async fill_token_bitmask logging
* fix test_guided_decoding
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* add fill_bitmask_parallel_batch_size ENV
* FD_FILL_BITMASK_BATCH fastdeploy.envs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-11-19 10:04:16 +08:00
Juncai
36822fa49c
[PD Disaggregation] remove splitwise deployment on single node and refine the code ( #4891 )
...
* remove splitwise deployment on single node and refine the code
* up
* up
* up
* add test
* up
2025-11-14 09:56:53 +08:00
周周周
6c4ebc5fee
[worker_process.py]modify some var name ( #4749 )
2025-11-13 14:21:27 +08:00
Yuanle Liu
3dc0ffa46d
[TSP] Support qwen3 moe tsp + cudagraph ( #4871 )
...
* support qwen3_moe tsp mode
* fix
* fix
* update
* update
* update
* fix
* support external_rmsnorm
* update
* fix
2025-11-10 23:37:51 +08:00
chenjian
78895e2c7d
[Bug Fix] fix bug for PD EP ( #4823 )
...
* fix bug for PD EP
* fix
* optimize perf for engine worker queue
* fix bug
* fix internode ll two stage
* fix for ci
* fix bug
2025-11-10 15:33:29 +08:00
Juncai
08ca0f6aea
[Feature] [PD] add simple router and refine splitwise deployment ( #4709 )
...
* add simple router and refine splitwise deployment
* fix
2025-11-06 14:56:02 +08:00
K11OntheBoat
62dfad4a5f
[PD Disaggregation] Support Qwen3-MoE use PD + EP inference. ( #4691 )
...
support Qwen-MoE PD/EP
2025-11-06 10:32:15 +08:00
chen
1c3ca48128
[Feature][Executor] GPU Model Runner Supports prompt_logprobs and max_logprobs ( #4769 )
2025-11-05 10:43:25 +08:00
李泳桦
1b61d62ecf
[fix] fix v0 pd, let worker step_shm_value create=False ( #4780 )
2025-11-04 20:37:57 +08:00
lzy
af7e0f27f3
supports internode_ll_two_stage ( #4162 )
...
* supports internode_ll_two_stage
* supports internode_ll_two_stage
* supports internode_ll_two_stage
* supports internode_ll_two_stage
* supports D internode_ll_two_stage
* fix codestyle
* fix xpu internode_ll_two_stage
* fix xpu internode_ll_two_stage
2025-11-04 16:35:40 +08:00
chenjian
25498efcf3
[Optimize] Support and robust for tpN for PD ( #4595 )
...
* [Optimize] Support and robust for tpN for PD
* fix
* fix
* support dpM tpN for cache messager
* fix
* fix token counter
* fix bug for merge develop
* fix bug
* robust cache messager for v0
2025-11-03 15:38:31 +08:00
chenjian
f83d0cf127
[Feature] Support eplb for fd ( #4599 )
...
* support eplb
* support eplb
---------
Co-authored-by: kevin <chengyf112@gmail.com >
2025-11-03 14:08:15 +08:00
李泳桦
0f75b62de2
[BugFix] Fix profile run in pd-disaggregated deployment ( #4584 )
...
* [fix] fix pd+dp+ep bug
* [fix] fix again
* [ci] fix code style
2025-10-31 14:42:00 +08:00
kevin
64e875b460
[Scheduler] update v1 prefill batch ( #4611 )
...
* update v1 prefill batch
* update code
* update code
2025-10-31 14:03:01 +08:00
RichardWooSJTU
0dde936e93
[BugFix] fix total_block_num init error in worker_process ( #4553 )
...
* fix total_block_num init error in worker_process
* fix req and token client
* fix req and token client
* fix xpu ci
* fix xpu ci
2025-10-28 20:42:12 -07:00
李泳桦
a012e3608b
[Feature] support logits processors ( #4515 )
...
* [feat] provide an interface for logits processors and a builtin LogitBiasLogitsProcessor
* [chore] fix code style
* [fix] add unit test & fix existing bugs
* [feat] add engine/worker arg --logits-processors
* [fix] redefine user args as logits_processors_args and fix some bugs
* [fix] fix test_sampler
* Update fastdeploy/model_executor/logits_processor/builtin.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/model_executor/logits_processor/__init__.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/model_executor/test_logits_processor.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [fix] fix typo
* Update fastdeploy/engine/sampling_params.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [fix] fix bracelet
* [chore] redefine logits processor interface: pass the entire share_inputs into LP, do not copy share_inputs and logits
* [doc] add docs
* [fix] fix logit bias processor not applied when decoding is too fast & add docs and tests
* [fix] fix redundant code
* [feat] skip apply() if no bias is specified
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-10-29 00:08:53 +08:00
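As a rough illustration of the idea in this commit (a processor that adds per-token biases to logits in place and skips its work when no bias is specified), here is a toy sketch on plain Python lists. The class name `LogitBiasSketch`, its constructor, and the `apply` signature are hypothetical; FastDeploy's actual logits-processor interface and `LogitBiasLogitsProcessor` are not reproduced here.

```python
from typing import Dict, List, Optional


class LogitBiasSketch:
    """Hypothetical processor: adds per-token biases to each request's logits."""

    def __init__(self, biases: List[Optional[Dict[int, float]]]):
        # biases[i] maps token id -> additive bias for request i (None means no bias).
        self.biases = biases

    def apply(self, logits: List[List[float]]) -> List[List[float]]:
        # Skip all work when no request specifies a bias.
        if not any(self.biases):
            return logits
        for row, bias in zip(logits, self.biases):
            if not bias:
                continue
            for token_id, value in bias.items():
                row[token_id] += value  # modify in place, no copy of the logits
        return logits
```

For example, `LogitBiasSketch([{1: -1.0}, None]).apply(logits)` lowers the score of token 1 for the first request only and leaves the second request untouched.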
ming1753
561b9f38d3
[BugFix] fix paddleocr prefix cache bug ( #4625 )
...
* fix paddleocr prefix cache bug
* disable prefix-caching in ocr
2025-10-28 21:38:12 +08:00
lizhenyun01
4d2f478d53
[BugFix] fix TPDP mix parallel infer ( #4583 )
...
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-10-28 16:58:20 +08:00
freeliuzc
c63361fd1d
[Speculative Decoding][MTP]Support mtp in epdptp mode ( #4614 )
...
* support mtp many features
* support mtp reshard in rl mode
* fix function
* support mtp ep
* support mtp in hybrid-dp-tp mode
* default open scheduler_v1 in mtp
2025-10-28 16:02:47 +08:00
ming1753
7681375a19
[BugFix] PaddleOCR-VL fix FD_DEBUG type and support v1 loader ( #4605 )
...
* [Bug Fix] PaddleOCRVL fix FD_DEBUG type and support HF model
* fix bug
* fix bug
* fix bug
2025-10-28 09:47:47 +08:00
kevin
8aab4e367f
[Feature] mm support prefix cache ( #4134 )
...
* support mm prefix caching
* update code
* fix mm_hashes
* support encoder cache
* add encoder cache
* update code
* update encoder cache
* fix features bug
* fix worker bug
* support processor cache, need to optimize yet
* refactor multimodal data cache
* update code
* update code
* update v1 scheduler
* update code
* update code
* update codestyle
* support turn off processor cache and encoder cache
* update pre-commit
* fix code
* solve review
* update code
* update code
* update test case
* set processor cache in GiB
* update test case
* support mm prefix caching for qwen model
* fix code style check
* update pre-commit
* fix unit test
* fix unit test
* add ci test case
* fix rescheduled bug
* change text_after_process to prompt_tokens
* fix unit test
* fix chat template
* change model path
* [EP] fix adapter bugs (#4572 )
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* fix v1 hang bug (#4573 )
* fix import image_ops error on some platforms (#4559 )
* [CLI]Update parameters in bench latency cli tool and fix collect-env cli tool (#4558 )
* add collect-env
* del files
* [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578 )
* add new branch for sot
* reorder
* fix batch bug
* [XPU]Moe uses a new operator (#4585 )
* [XPU]Moe uses a new operator
* [XPU]Moe uses a new operator
* update response
* [Feature] Support Paddle-OCR (#4396 )
* init
* update code
* fix code style & disable thinking
* adapt for common_engine.update_mm_requests_chunk_size
* use 3d rope
* use flash_attn_unpadded
* opt siglip
* update to be compatible with the latest codebase
* fix typo
* optim OCR performance
* fix bug
* fix bug
* fix bug
* fix bug
* normalize name
* modify xpu rope
* revert logger
* fix bug
* fix bug
* fix bug
* support default_v1
* optim performance
* fix bug
---------
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
* [DataProcessor] add reasoning_tokens into usage info (#4520 )
* add reasoning_tokens into usage info initial commit
* add unit tests
* modify unit test
* modify and add unit tests
* fix unit test
* move stream usage to processor
* modify processor
* modify test_logprobs
* modify test_logprobs.py
* modify stream reasoning tokens accumulation
* fix unit test
* perf: Optimize task queue communication from engine to worker (#4531 )
* perf: Optimize task queue communication from engine to worker
* perf: get_tasks to numpy
* perf: get_tasks remove to_numpy
* fix: request & replace ENV
* remove test_e2w_perf.py
* fix code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Clean up ports after processing results (#4587 )
* [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593 )
* [Others] api server exits when worker process is dead (#3271 )
* [fix] fix terminal hangs when worker process is dead
* [chore] change sleep time of monitor
* [chore] remove redundant comments
* update docs
---------
Co-authored-by: ApplEOFDiscord <wwy640130@163.com >
Co-authored-by: ApplEOFDiscord <31272106+ApplEOFDiscord@users.noreply.github.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: yinwei <yinwei_hust@163.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: qwes5s5 <45442318+qwes5s5@users.noreply.github.com >
Co-authored-by: Ryan <zihaohuang@aliyun.com >
Co-authored-by: yyssys <atyangshuang@foxmail.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: kxz2002 <115912648+kxz2002@users.noreply.github.com >
Co-authored-by: SunLei <sunlei5788@gmail.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: 李泳桦 <39643373+liyonghua0910@users.noreply.github.com >
2025-10-27 17:39:51 +08:00
chen
5c63a089f6
[Feature] Support logprobs_mode ( #4567 )
2025-10-27 14:27:48 +08:00
Yuanle Liu
cef3164c3b
Optimizing the performance of think length limit using custom operators ( #4279 )
...
* delete impl
* delete min_length&max_length
* support limit thinking content strategy
* fix
* fix
* fix
* update
* fix set_value_by_flags_and_idx
* fix
* fix
* fix
* fix
* update
* fix
* fix
* fix typo
* fix ci
* fix
* fix
* support mtp
* fix
* fix
* update
* update
2025-10-20 21:09:13 +08:00