Commit Graph

31 Commits

Author SHA1 Message Date
Jiang-Jia-Jun
4b3e41c665 [Optim] Improve task-checking performance in engine-worker-queue (#5376)
* [Optim] Optimize time cost of checking tasks in engine-worker-queue

* Update fastdeploy/engine/common_engine.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/inter_communicator/engine_worker_queue.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [Docs] Add docstring to set_exist_tasks method (#5382)

* Initial plan

* Add docstring to set_exist_tasks method

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* [Docs] Add docstring documentation to exist_tasks() method (#5381)

* Initial plan

* Add comprehensive docstring to exist_tasks() method

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* [Optimization] Conditionally initialize shared memory for single-node deployments only (#5383)

* Initial plan

* Conditionally initialize exist_tasks_intra_signal for single-node deployments

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Use is_single_node flag for consistent deployment type checking

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Remove redundant None checks in exist_tasks methods

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* format code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
2025-12-11 10:33:32 +08:00
zhouchong
5d9b5e4a5b [Engine] [Feature] Refactor async_llm: cross-process with EngineService, based on zmq communication (#4868)
* Refactor async_llm:cross-process with EngineService

* fix: async_llm output process

* fix: return prompt_token_ids and prompt_tokens in first res

* optimize common_engine start func
2025-12-09 10:53:40 +08:00
SunLei
5fb93d84f5 [Feature] [Benchmark]: add ZMQ-based FMQ implementation and benchmark tools (#5418)
* feat(fmq): add ZMQ-based FMQ implementation and benchmark tools

* move FMQ_CONFIG_JSON to envs

* fix top_p_candidates (#5400)

Co-authored-by: freeliuzc <lzc842650834@gmail.com>

* [RL] Support Rollout Routing Replay (#5321)

* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support other top_k

* fix ci bug

* pre-commit

* only support chatcmpl

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>

* [Bug fix] Fix the multi-input accuracy issue in the pooling model. (#5374)

* fix multi-inputs

* fix threshold

* fix threshold

* fix

* [BugFix]remove _execute_empty_input (#5396)

* Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)

This reverts commit 96d2d4877b.

* [New][RL] Support Rollout Routing Replay (#5405)

* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support other top_k

* fix ci bug

* pre-commit

* only support chatcmpl

* Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)"

This reverts commit c45e064f3d.

* Fix XPU and NPU bug

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>

* bf16 deepseek (#5379)

* fix deepseek (#5410)

* Update tests/inter_communicator/test_fmq_factory.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update benchmarks/benchmark_fmq.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/inter_communicator/fmq.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: GoldPancake <56388518+Deleter-D@users.noreply.github.com>
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
Co-authored-by: RAM <gstian5555@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: 周周周 <39978853+zhoutianzi666@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
2025-12-08 22:04:49 +08:00
Yonghua Li
f4119d51b4 [PD Disaggregation] support DP via v1 router and decouple DP and EP (#5197)
* [fix] support DP via v1 router and decouple DP and EP

* [fix] fix scripts

* [fix] reset model path

* [fix] dp use get_output_ep, fix router port type, update scripts

* [merge] merge with latest code

* [chore] remove some debug log

* [fix] fix code style check

* [fix] fix test_multi_api_server for log_dir name

* [chore] reduce logs

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-04 15:38:43 +08:00
qwes5s5
117980dd4e [LogProbs]Enable prompt logprobs output and modify data transmission method for the online interface. (#5089)
* add prompt logprobs

* Merge prompt_logprobs_tensors and prompt_logprobs

* fix param check

* trigger ci

* fix unit test

* fix logprobs bug
2025-12-02 13:49:51 +08:00
fl0w2o48
e63d715fc3 [BugFix][Metrics] Fix Prometheus Multiprocess Metrics Issues and Add ZMQ Communication Metrics (#5185)
* [Feature] add metrics for ZMQ and fix multiprocess metrics

* fix test_metrics.py

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-11-27 15:05:09 +08:00
kevin
8e4e3ff510 [Feature] support eplb in api_server (#4782)
* support eplb in api_server

* update code

* add eplb test case

* update eplb

* support tp+dp eplb

* update test case

* update code

* update code

* fix bug

* update copilot review

* update test case name
2025-11-24 20:22:29 +08:00
kevin
cceaba1c8d [Feature] remove to_numpy (#5162)
* remove to_numpy

* update code

* update name

* update code

* update code

* update code
2025-11-21 21:54:26 +08:00
chenjian
3ea1b44a58 [Optimization] Improve perf for fd response token with internal adapter (#4992)
* [Optimize] Improve perf for fd response token with internal adapter

* fix

* fix bug

* fix ci

* fix ci

* fix ci

* fix ci
2025-11-21 19:02:03 +08:00
kevin
109d48e456 [Feature] support async download features (#5003)
* support async download features

* add test case

* update code
2025-11-19 22:23:36 +08:00
Juncai
36822fa49c [PD Disaggregation] remove splitwise deployment on single node and refine the code (#4891)
* remove splitwise deployment on single node and refine the code

* up

* up

* up

* add test

* up
2025-11-14 09:56:53 +08:00
SunLei
3098aee05f [Perf] Support tensor transmission between work and engine with zero-copy to improve efficiency (#4839)
* feat(zmq): support tensor transmission with zero-copy for improved efficiency

* perf: zmq.send disable copy

* zmq recv data for debug

* convert logprobs tensor to cpu
2025-11-11 15:43:11 +08:00
chenjian
78895e2c7d [Bug Fix] fix bug for PD EP (#4823)
* fix bug for PD EP

* fix

* optimize perf for engine worker queue

* fix bug

* fix internode ll two stage

* fix for ci

* fix bug
2025-11-10 15:33:29 +08:00
Juncai
08ca0f6aea [Feature] [PD] add simple router and refine splitwise deployment (#4709)
* add simple router and refine splitwise deployment

* fix
2025-11-06 14:56:02 +08:00
chenjian
25498efcf3 [Optimize] Support and robust for tpN for PD (#4595)
* [Optimize] Support and robust for tpN for PD

* fix

* fix

* support dpM tpN for cache messager

* fix

* fix token counter

* fix bug for merge develop

* fix bug

* robust cache messager for v0
2025-11-03 15:38:31 +08:00
李泳桦
0f75b62de2 [BugFix] Fix profile run in pd-disaggregated deployment (#4584)
* [fix] fix pd+dp+ep bug

* [fix] fix again

* [ci] fix code style
2025-10-31 14:42:00 +08:00
Daci
6426414a0f [Feature] EngineWorkerQueue anonymous port (#4597)
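* EngineWorkerQueue: support anonymous port configuration

* EngineWorkerQueue: support anonymous port configuration

* EngineWorkerQueue: support anonymous port configuration

* EngineWorkerQueue: support anonymous port configuration

* EngineWorkerQueue: support anonymous port configuration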
2025-10-28 10:22:37 +08:00
SunLei
dc1a9c7287 perf: Optimize task queue communication from engine to worker (#4531)
* perf: Optimize task queue communication from engine to worker

* perf: get_tasks to numpy

* perf: get_tasks remove to_numpy

* fix: request & replace ENV

* remove test_e2w_perf.py

* fix code style

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-10-25 22:45:38 +08:00
yinwei
20c7b741f4 [XPU] Support W4A8C8-TP4-300B Model (#4068)
* support w4a8

* delete ep block attn

* delete moe_topk_select

* update note

* update

* delete useless info

* update

* add some note

* fix some format

* update scale info

* add ans baseline

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-10-10 15:41:32 +08:00
李泳桦
6265f4385f [feat] support prefix cache clearing when /clear_load_weight is called (#4008)
* [feat] support clearing prefix cache (cherry-picked from release/2.1)

* [fix] fix ipc suffix, use port instead

* [fix] fix prefix caching not enabled

* [fix] fix key/value_cache_scales indent

* [fix] fix ep group all-reduce

* [fix] fix clear/update lock not working when workers > 1

* [chore] add preemption triggered info log

* [fix] fix code style

* [fix] fix max_num_seqs config

* [fix] do not force enable_prefix_caching=False in dynamic loading

* [fix] fix ci

* Revert "[fix] fix ci"

This reverts commit 0bc6d55cc8.

* [fix] initialize available_gpu_block_num with max_gpu_block_num

* [fix] fix config splitwise_role

* [fix] fix clearing caches synchronization and add more logs

* [chore] print cache_ready_signal in log

* [fix] fix scheduler_config.splitwise_role

* [fix] fix cache_messager cache_ready_signal create=True

* [fix] stop cache messager from launching in mixed deployment
2025-09-28 19:42:53 +08:00
zhupengyang
9082f625ba [xpu] use cpu barrier (#4181)
2025-09-23 12:19:03 +08:00
ltd0924
83720da79f [Feature] support clear data (#3601)
* [Feature] support clear data

* update

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2025-09-23 10:20:02 +08:00
chenjian
918ccdb123 [Feature] Support pd ep deployment with yiyan adapter (#4029)
* [Feature] Support mixed deployment with yiyan adapter in release2.2

* fix metrics

* add unit test

* add unit test

* add unit test

* Support pd ep deployment with yiyan adapter

* Support pd ep deployment with yiyan adapter

* refactor cache messager

* support scheduler v1 in PD

* support pd v1 + chunk prefill

* support pd v1 + chunk prefill

* add eplb

* support eplb

* support eplb

* support eplb

* support v1

* fix

* fix

* fix bug

* remove eplb support

* support prefix cache in P

* fix bug

* fix bug

* support one stop in V1

* fix bug

* fix ci

* fix ci

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-22 16:41:38 +08:00
chenjian
618ccdbfba [Feature] Support mixed deployment with yiyan adapter in develop (#3976)
* [Feature] Support mixed deployment with yiyan adapter in release2.2

* fix metrics

* add unit test

* add unit test

* add unit test

* fix ci

* fix for eb5

* fix ci

* fix ci

* fix ci

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-18 01:52:20 +08:00
Yuanle Liu
4957908275 add input_processor plugin (#3657)
* add input_processor plugin

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2025-08-28 22:53:57 +08:00
gaoziyuan
82e64b13e1 [NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop (#3598)
* [Feature] update ep

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix queue ports idx

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* Update engine.py

* fix ci

* fix some bug in mixed ep

* add server fix and op fix

* rm some log

* fix code style

* ltd fix

* fix

* fix

* fix some bug

* fix bug

* fix bug

* fix style

* Update config.py

* Update splitwise_connector.py

* Update cache_messager.py

* Update __init__.py

* merge and fix

* Update engine.py

* Update common_engine.py

* Update run_ci_xpu.sh

* Update ernie_processor.py

* Update ernie_processor.py

---------

Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-26 19:59:02 +08:00
kevin
67298cf4c0 add error traceback info (#3419)
* add error traceback info

* update error msg

* update code

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-19 19:32:04 +08:00
ltd0924
bca8905b40 [BugFix] fix control signal release failed (#3390)
* [BugFix] fix control signal release failed

* [BugFix] fix control signal release failed

* update

* update

* update
2025-08-19 13:51:38 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923)
2025-07-19 23:19:27 +08:00
ltd0924
d245d1ca6c [LLM] support send batch data and aggregate data (#2860)
* [LLM] support send batch data and aggregate data

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] update
2025-07-16 23:42:20 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00