Commit Graph

  • fa43c5f83e Fix grammar in log message copilot-swe-agent[bot] 2025-12-17 09:49:48 +00:00
  • f35fa87e5f Fix connection release logic and add bounds checking to prevent negative counter copilot-swe-agent[bot] 2025-12-17 09:47:39 +00:00
  • 041a361f8a Address code review feedback: move imports, fix race condition, improve exception handling copilot-swe-agent[bot] 2025-12-17 09:45:57 +00:00
  • cd844979e9 Remove unused connection_semaphore and fix manual release calls copilot-swe-agent[bot] 2025-12-17 09:42:24 +00:00
  • 581fed5f8e Use shared memory to enforce global concurrency limit across workers copilot-swe-agent[bot] 2025-12-17 09:39:34 +00:00
  • 10b9f19441 Fix concurrency control logic to not divide by workers copilot-swe-agent[bot] 2025-12-17 09:23:24 +00:00
  • e2152db758 Initial plan copilot-swe-agent[bot] 2025-12-17 09:18:45 +00:00
  • e65000af20 [Cherry-Pick][BugFix] fix speculate_limit_thinking_content_length #5590 (#5591) Yuanle Liu 2025-12-17 17:05:09 +08:00
  • d67b64d5e1 add detoken switch (#5463) (#5572) qwes5s5 2025-12-17 17:04:45 +08:00
  • 19653ee03a [Speculative Decoding]Support different inferseed in speculate decoding (#5569) [branch: feature/experimental_feature_20250908] freeliuzc 2025-12-17 16:54:24 +08:00
  • a7359d1c1d [Cherry-Pick][CI]Support different inferseed in speculate decoding(#5568) (#5597) freeliuzc 2025-12-17 16:53:47 +08:00
  • 404cf0ece4 [Intel HPU] enable tensor_wise_fp8 (#5324) fmiao2372 2025-12-17 16:45:03 +08:00
  • 15f5112ecb [Speculative Decoding]Support different inferseed in speculate decoding (#5568) freeliuzc 2025-12-17 16:14:29 +08:00
  • 80fb530ce2 [XPU][CI] xpu add ci test for pd (#5610) ddchenhao66 2025-12-17 16:07:44 +08:00
  • 0c8c6369ed [Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415) Yonghua Li 2025-12-17 15:50:42 +08:00
  • cdc0004894 Revert "[Feature] add ue8m0 for per_token_quant_fp8 (#5563)" (#5611) Yuanle Liu 2025-12-17 13:59:06 +08:00
  • 21fa2baa51 [CI] disable test_prefix_cache_manager.py in unit_test YuBaoku 2025-12-17 10:48:02 +08:00
  • 8981ce8fa3 [Cherry-Pick][RL] R3 Support RDMA Store (#5454) RAM 2025-12-17 09:50:53 +08:00
  • c19af496cb [Cherry-Pick][RL] R3 Support RDMA Store(#5467) (#5468) RAM 2025-12-17 09:50:40 +08:00
  • e29b005520 [Others] Clean code && remove GPU sync code (#5548) 周周周 2025-12-16 21:09:37 +08:00
  • 867803ae10 [BugFix] fix speculate_limit_thinking_content_length (#5590) Yuanle Liu 2025-12-16 20:31:45 +08:00
  • 7140939c51 [BugFix] fix video bug (#5557) kevin 2025-12-16 20:06:50 +08:00
  • 27ef3610b5 support glm fa3 (#5586) chen 2025-12-16 19:33:27 +08:00
  • 2ad3bff4ff [Optim] Optimize costtime in checking tasks in engine-worker-queue (#5580) Jiang-Jia-Jun 2025-12-16 19:27:31 +08:00
  • 531b96adce [Cherry-Pick][CI] Adapt unit_test due to Paddle update(#5576) (#5589) YuBaoku 2025-12-16 19:27:06 +08:00
  • 55609a51fc [CI] 【Hackathon 9th Sprint No.36】NO.36 Add unit tests for functional modules (#5058) xunyoyo 2025-12-16 19:19:03 +08:00
  • 73e1d6aa90 [Feature] add ue8m0 for per_token_quant_fp8 (#5563) fxyfxy777 2025-12-16 18:40:12 +08:00
  • eeb99d2af5 [BugFix] skip model executing after clearing/updating is done (#5527) Yonghua Li 2025-12-16 17:39:03 +08:00
  • 6fc5eccf83 [RL] R3 Support RDMA Store (#5467) RAM 2025-12-16 16:50:13 +08:00
  • a30b4da260 [Feature] Tracing: Fine-Grained Tracing for Request Latency Part1 (#5458) xiaolei373 2025-12-16 16:36:09 +08:00
  • 53158b7f8d [Cherry-Pick][CI] Adapt unit_test due to incompatibility change(#5578) (#5583) YuBaoku 2025-12-16 15:45:49 +08:00
  • 06e0aa16d0 Update fastdeploy/engine/common_engine.py [branch: feature/optim_engine_worker_exist_tasks_20251216] Jiang-Jia-Jun 2025-12-16 15:25:18 +08:00
  • c9b47f90ce [BugFix] fix cpu prefix cache bug (#5544) kevin 2025-12-16 14:21:42 +08:00
  • 196d6240e5 [Cherry-Pick][BugFix] Cp fix cpu prefix cache bug(#5544) (#5546) kevin 2025-12-16 14:21:35 +08:00
  • f223e42c49 [Optim] Optimize costtime in checking tasks in engine-worker-queue Jiang-Jia-Jun 2025-12-16 14:10:45 +08:00
  • 5d2b16e6f3 [CI] Remove test_metrics.py due to incompatible forced merge (#5578) YuBaoku 2025-12-16 14:04:46 +08:00
  • 021399f7c9 Revert "[Feature] Use paddle.compat.enable_torch_proxy in `fastdeploy/__ini…" (#5579) Jiang-Jia-Jun 2025-12-16 13:55:27 +08:00
  • 50100f98d7 [Feature] Support fusedmoe on Blackwell (#5325) Echo-Nie 2025-12-16 11:58:50 +08:00
  • 63fff8df70 [CI] Adapt vl_model baseline changes due to Paddle update (#5576) YuBaoku 2025-12-16 11:42:31 +08:00
  • 9f74233966 【NewFeature】support load fp8 weight (#5566) gaoziyuan 2025-12-16 11:24:17 +08:00
  • 5db08cc1d5 【NewFeature】support load fp8 weight (#5565) gaoziyuan 2025-12-16 11:23:57 +08:00
  • 8b6395478a Revert "[BugFix] reschedule_preempt_task append waiting & PREEMPTED blocksize…" (#5575) Jiang-Jia-Jun 2025-12-16 11:12:57 +08:00
  • 9058cc712d Update gpu_model_runner.py Jiang-Jia-Jun 2025-12-16 11:12:07 +08:00
  • 77ff0cb242 [Cherry-Pick][Quantization][BugFix] Support w4afp8 dynamic quant(#5282 #5429) (#5535) Sunny-bot1 2025-12-16 11:09:48 +08:00
  • 075bd71272 Remove GPUMemoryChecker initialization Jiang-Jia-Jun 2025-12-16 11:09:27 +08:00
  • ff45ac078e [Feature] Use paddle.compat.enable_torch_proxy in fastdeploy/__init__.py (#5211) Jundong Liu 2025-12-16 11:05:30 +08:00
  • 9e8c46c526 [CI] 【Hackathon 9th Sprint No.34】NO.34 Add unit tests for functional modules (#5057) xunyoyo 2025-12-15 20:29:25 +08:00
  • 99b40247ea [Cherry-Pick][BugFix] fix dynamic c8 in v1 loader(#5562) (#5519) Yuanle Liu 2025-12-15 20:08:07 +08:00
  • b8e4828373 [BugFix] fix dynamic c8 in v1 loader (#5562) Yuanle Liu 2025-12-15 20:07:54 +08:00
  • 4c76171b57 [Optimize][Cherry-pick] Robust stability for PD deployment #5338 (#5395) chenjian 2025-12-15 18:58:09 +08:00
  • 64e4a51991 Fix speculate decoding write cachekv bug (#5517) freeliuzc 2025-12-15 18:27:33 +08:00
  • 532f9ba227 [BugFix][Speculative Decoding](Spend many days to solve)Fix write qknorm cache bug in speculative decoding (#5491) freeliuzc 2025-12-15 18:27:11 +08:00
  • 5265d844e9 [Metax] fix GetStopFlagsMulti kernel crash issue (#5556) MingkunZhang 2025-12-15 17:56:20 +08:00
  • 0fa40f5f0c Fix bug for caching output when preempted (#5510) chenjian 2025-12-15 17:25:55 +08:00
  • 0100ee885f Fix bug for caching output when preempted (#5502) chenjian 2025-12-15 17:25:35 +08:00
  • 5bdef760a2 [Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (#5486) (#5536) chen 2025-12-15 15:53:34 +08:00
  • 9f70f4310e [PD Disaggregation][XPU] update_inputs_v1 operator supports PD (#5550) ddchenhao66 2025-12-15 15:39:38 +08:00
  • 97e340eb14 [CE]add pd router and wint4 tp4 config (#5554) xiegegege 2025-12-15 15:25:14 +08:00
  • 7b0fdf7055 add check health in FD (#5534) chenjian 2025-12-15 15:14:45 +08:00
  • 77f8ba06e7 [Metax] fix release2.4 and support cudagraph (#5547) zhang-chenyi 2025-12-15 14:23:33 +08:00
  • 4bd991aa17 [CI]【Hackathon 9th Sprint No.22】Add unit tests for functional module fastdeploy/input/ernie4_5_vl_processor/ernie4_5_vl_processor.py (#5263) kesmeey 2025-12-15 14:00:53 +08:00
  • 722de5ace1 [Others] Clean code (#5543) 周周周 2025-12-15 10:57:59 +08:00
  • d01cb274d6 [Graph Optimization][CI] Add ERNIE45T 21B sot test (#5538) Ryan 2025-12-13 00:43:15 +08:00
  • bebd722b5d fix encoder cache bug (#5528) kevin 2025-12-12 19:25:03 +08:00
  • 92119773c7 [CI][XPU] add mtp case (#5537) Jiaxin Sui 2025-12-12 19:14:40 +08:00
  • dbedb0797b [BugFix] reschedule_preempt_task append waiting & PREEMPTED blocksize (#5506) Daci 2025-12-12 17:43:29 +08:00
  • a389bb7c5c [Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (#5486) chen 2025-12-12 17:10:17 +08:00
  • 13cc7dacfd [Doc]add text/vl cinn ce config (#5532) tianlef 2025-12-12 16:16:06 +08:00
  • 12c76f8137 [XPU] add speculate_get_logits (#5497) RuohengMa 2025-12-12 15:38:30 +08:00
  • 12e0206d4d [Cherry-Pick] [BugFix] [RL] skip model executing after clearing/updating is done (#5527) (#5523) Yonghua Li 2025-12-12 14:56:09 +08:00
  • 888c4b992d [XPU] refactor of block_attn param 'pos_emb_type' (#5511) Lucas 2025-12-12 14:30:09 +08:00
  • 4eb55332f6 [Models] Add forward_meta to VocabParallelEmbedding of all models (#5524) Ryan 2025-12-12 14:11:31 +08:00
  • 6cc3cb4bcf fix mtp multi batch (#5521) cmcamdy 2025-12-12 14:11:20 +08:00
  • d67388a479 [PD Disaggregation] Distinguish the pipelines for sending kv signal in different prefill (#5514) Juncai 2025-12-12 14:05:36 +08:00
  • f32e331ef5 [Metax] add ci yaml (#5520) MingkunZhang 2025-12-12 13:35:38 +08:00
  • 8d477e3d01 [CI]【Hackathon 9th Sprint No.25】Add unit tests for functional module fastdeploy/input/ernie4_5_vl_processor/image_preprocessor/image_preprocessor_adaptive.py (#5265) kesmeey 2025-12-12 12:45:06 +08:00
  • 909059c60a [Feature] Support for request-level speculative decoding metrics monitoring. (#5518) GoldPancake 2025-12-12 12:22:18 +08:00
  • 3c1f7b85a4 [XPU] support get hidden state for mix (#5513) cmcamdy 2025-12-12 10:31:20 +08:00
  • 954a145d57 [Optimization] support mm prefill batch (#5313) kevin 2025-12-11 22:21:14 +08:00
  • 7116982995 [CI] Reduce timeout of send_request in test_mtp (#5512) YuBaoku 2025-12-11 20:40:00 +08:00
  • 4e5e36ec9c [Cherry-Pick][BugFix] fix hung when n>1 and --enable-logprob (#5492)(#5499) (#5498) chen 2025-12-11 20:03:22 +08:00
  • 747b16e021 [BugFix] Fix MTP no logprobs when enable_logprob (#5499) chen 2025-12-11 19:57:22 +08:00
  • 4066dfb4a6 RL fix (#5503) bukejiyu 2025-12-11 19:25:27 +08:00
  • 71781b56e1 RL fix (#5505) bukejiyu 2025-12-11 19:25:24 +08:00
  • c3aaa7e441 [BugFix] Fixed build script issue on Intel HPU platforms (#5455) FocusLuo 2025-12-11 16:36:37 +08:00
  • e58fed3665 [Graph Optimization][BugFix][CI] Fix 0size bug && add unitest (#5495) Ryan 2025-12-11 16:25:26 +08:00
  • e1347be4d9 [Docs] Fix nvidia_gpu.md, add sm80 in precompiled (#5462) Echo-Nie 2025-12-11 14:41:50 +08:00
  • f133ce501c [CI] disable test_cuda_graph_dynamic_subgraph.py in unit_test YuBaoku 2025-12-11 14:20:53 +08:00
  • b43563977d [CI] disable test_cuda_graph_dynamic_subgraph.py in unit_test YuBaoku 2025-12-11 14:14:30 +08:00
  • 9f4512c932 [CI] disable test_cuda_graph_dynamic_subgraph.py in unit_test YuBaoku 2025-12-11 14:12:49 +08:00
  • ff353b922f [Others] update tbo related code (#5485) 周周周 2025-12-11 12:34:46 +08:00
  • 510b82173a [Benchmark] Update benchmark (#5496) Zhang Yulong 2025-12-11 11:53:12 +08:00
  • 6289cbc434 [BugFix] fix hung when n>1 and --enable-logprob (#5492) chen 2025-12-11 10:46:27 +08:00
  • 4b3e41c665 [Optim] Improve task-checking performance in engine-worker-queue (#5376) Jiang-Jia-Jun 2025-12-11 10:33:32 +08:00
  • 2ec76352da [BugFix] fix instability after clearing weight (#5493) Yonghua Li 2025-12-11 10:22:35 +08:00
  • 7019afbb86 [BugFix] fix instability after clearing weight (#5487) Yonghua Li 2025-12-11 09:58:18 +08:00
  • d79438bb86 add detoken switch (#5463) qwes5s5 2025-12-10 21:44:02 +08:00
  • 3bdd54ef6e Disable unsupported feature in multi-node deployment docs Jiang-Jia-Jun 2025-12-10 20:23:19 +08:00
  • bcde798098 [CI][XPU] ep+prefix cache+chunk prefill (#5490) zccjjj 2025-12-10 19:40:38 +08:00
  • 03819f30c3 [CI][XPU] ep+prefix cache+chunk prefill (#5489) zccjjj 2025-12-10 19:39:49 +08:00