Commit Graph

  • eab8384da6 [Feature] ThreadPoolExecutor async fill_token_bitmask (#5083) Daci 2025-11-19 10:04:16 +08:00
  • 4a7739ec0b Fix dummy run when use PD Disaggregation with EP inference. (#5112) K11OntheBoat 2025-11-18 21:09:30 +08:00
  • 7fdc920a01 [HPU][CI]Update Docker image in CI workflow (#5108) plusNew001 2025-11-18 20:43:19 +08:00
  • 97189079b9 [BugFix] unify max_tokens (#4968) kxz2002 2025-11-18 20:01:33 +08:00
  • 3ce2c8f754 [Feature] support async download features (#4910) kevin 2025-11-18 18:37:59 +08:00
  • 3d7f1a843e [Docs]fix_cli_docs (#5109) xiaolei373 2025-11-18 17:56:12 +08:00
  • 6584ee90e8 [unitest]clean code (#5094) 周周周 2025-11-18 17:21:35 +08:00
  • d11235333e format flash_mask_attn lizhenyun01 2025-11-18 13:33:37 +08:00
  • cd2c4df64a format flash_mask_attn lizhenyun01 2025-11-18 11:45:29 +08:00
  • 3672afb487 [Cherry-Pick] [Metrics] Update time_to_first_token to include tokenization & queue time, and remove redundant metrics (#5076) Yonghua Li 2025-11-18 14:38:59 +08:00
  • 379f7e4cc1 [Bug fix] Fix decoding speed slowly bug in 20250922 (#5101) chenjian 2025-11-18 14:38:38 +08:00
  • d0b3bec585 Revert "[CI] Temporarily lock paddlepaddle-gpu as of 20251112 (#5017)" (#5098) YuBaoku 2025-11-18 14:17:09 +08:00
  • d5d0602859 [Iluvatar][CI] disable compiling cudaLaunch API (#5100) yzwu 2025-11-18 14:15:31 +08:00
  • ef057a86f3 Update fastdeploy/envs.py feature/optimize_engine_worker_comm_20251118 Jiang-Jia-Jun 2025-11-18 13:54:01 +08:00
  • e7f11b0051 [Optimize] Reduce comm overhead of engine-worker by obtaining requests asynchronously Jiang-Jia-Jun 2025-11-18 13:53:35 +08:00
  • 59c6de63a6 [Optimize] Reduce comm overhead of engine-worker by obtaining requests asynchronously Jiang-Jia-Jun 2025-11-18 13:40:54 +08:00
  • 1cba2e05d3 [ForRLRelease] temporary change mtp msg size (#5103) GoldPancake 2025-11-18 11:22:00 +08:00
  • a36c958c66 [Metax] support default_v1 loader based #4988 (#5001) MingkunZhang 2025-11-18 09:44:30 +08:00
  • 5d7516dc8c [CI] Enable check_pr_template in CI rerun (#5093) YuBaoku 2025-11-17 22:34:38 +08:00
  • abc9fd31c7 【Hackathon 9th No.76】supplementary unit test for XGrammarChecker (#4075) Echo-Nie 2025-11-17 22:05:53 +08:00
  • d58c1db8a0 [Feature][OP] Append Attn Support CUDA-PDL (#5072) chen 2025-11-17 20:47:33 +08:00
  • c2c1942db9 [INTEL_HPU] [CI] enabled fastdeploy PR testing (#4596) FocusLuo 2025-11-17 19:24:41 +08:00
  • 9bb4337143 [BugFix] rollback max_tokens and min_tokens when continue to infer (#5053) LiqinruiG 2025-11-17 19:03:09 +08:00
  • bd28f18785 [XPU][CI] Ci release update (#5085) plusNew001 2025-11-17 19:01:13 +08:00
  • b23e684b67 revert group size 3 (#5079) 周周周 2025-11-17 18:54:13 +08:00
  • c55c0d2ca3 fix: Fix block allocation issue when MTP and logprobs are enabled (#5086) SunLei 2025-11-17 17:50:23 +08:00
  • d9f64adb0e fix: Fix block allocation issue when MTP and logprobs are enabled (#5077) SunLei 2025-11-17 17:50:07 +08:00
  • 8a4ddb29df Revert "[BugFix] Revert skip capture (#5023)" (#5080) Sunny-bot1 2025-11-17 16:14:55 +08:00
  • 7f94d77e08 [XPU][CI] fix ci case bug (#5084) plusNew001 2025-11-17 16:01:27 +08:00
  • 74f33efdbf [Intel HPU] fix bugs caused by other commits (#5074) fmiao2372 2025-11-17 15:28:55 +08:00
  • 33f96ff93a [BugFix] rollback max_tokens and min_tokens when continue to infer (#5052) LiqinruiG 2025-11-17 14:31:26 +08:00
  • ff26158f20 Add unit tests for triton_utils_v2 (#5073) Winters Montagne 2025-11-17 11:46:38 +08:00
  • c35e540c18 【Hackathon 9th No.109】[CppExtension] Support build Custom OP in setuptools 80+ (#4977) megemini 2025-11-17 11:46:27 +08:00
  • 02c83d65db [CI]【Hackathon 9th Sprint No.13】NO.13 Add unit tests for functional module fastdeploy/model_executor/ops/triton_ops/triton_utils.py (#5035) Winters Montagne 2025-11-17 11:43:31 +08:00
  • 36216e62f0 [Log] Add trace log and add loggingInstrumentor tool (#4692) qwes5s5 2025-11-17 11:08:57 +08:00
  • 5444af6ff6 [APIServer] metrics use port the same as api_port (#5016) zhouchong 2025-11-17 10:42:45 +08:00
  • 68f638f8b9 [Metax] support default_v1 loader and quant_config is None for triton moe (#5030) xiaozude 2025-11-17 10:38:00 +08:00
  • 3afb717995 【Fix】fix deepep dispatch (#5036) yangjianfengo1 2025-11-17 10:34:01 +08:00
  • 3b80a799ab [Iluvatar][CI] Fix moe_expert_dispatch cannot support dequant_scale (#5012) yzwu 2025-11-17 10:18:42 +08:00
  • 24a7b79eec [BugFix] rollback max_tokens and min_tokens when continue to infer (#5051) LiqinruiG 2025-11-15 16:40:35 +08:00
  • cbcb5c6e84 temporary change mtp logprob msg size (#5026) GoldPancake 2025-11-15 13:39:40 +08:00
  • 936a80962f [BugFix] adjust max_tokens and min_tokens when continue to generate tokens (#5010) (#5015) kxz2002 2025-11-14 22:24:59 +08:00
  • e43a5fc055 [Intel HPU] enable level 1 prefix caching and fix some bugs (#4971) fmiao2372 2025-11-14 19:42:50 +08:00
  • 0e819cd596 [CI][XPU] Optimize CI logs and variable names (#5025) plusNew001 2025-11-14 19:35:35 +08:00
  • d41cf643f8 Update nvidia_gpu.md Jiang-Jia-Jun 2025-11-14 18:26:08 +08:00
  • e92783e903 [BugFix] adjust max_tokens and min_tokens when continue to generate tokens (#5010) (#5013) kxz2002 2025-11-14 18:20:59 +08:00
  • 692d69229b Update nvidia_gpu.md Jiang-Jia-Jun 2025-11-14 18:17:32 +08:00
  • 5fc12eddfe [Optimization] xgrammar async compile, multi thread, speed up (#4835) Daci 2025-11-14 18:05:26 +08:00
  • 59eeb9e049 [Cherry-Pick][CI] Temporarily lock paddlepaddle-gpu as of 20251112(#5017) (#5022) YuBaoku 2025-11-14 17:28:36 +08:00
  • b925533051 add test_process_video.py (#5011) Winters Montagne 2025-11-14 17:23:30 +08:00
  • 544ea9cbc2 check max_logprobs (#5018) chen 2025-11-14 17:18:06 +08:00
  • 249feca65a [BugFix] Revert skip capture (#5023) Sunny-bot1 2025-11-14 15:52:51 +08:00
  • 51b1f13547 [Executor]move batch_id_per_token (#4853) 周周周 2025-11-14 15:38:48 +08:00
  • c0a4393d72 [ATTENTION] unitest (#4962) 周周周 2025-11-14 13:45:53 +08:00
  • 87c7c0d852 fix local scheduler bug (#5020) chenjian 2025-11-14 12:07:02 +08:00
  • 2ecbaa7cd9 Fix dp scheduler bug for unify code 20250922 (#5021) chenjian 2025-11-14 12:06:47 +08:00
  • 91d34c2e35 [CI] Temporarily lock paddlepaddle-gpu as of 20251112 (#5017) YuBaoku 2025-11-14 11:55:25 +08:00
  • ee1ea43e36 [Docs] Fix broken commitID (#5008) Echo-Nie 2025-11-14 10:39:41 +08:00
  • 191a597d9f [CI]【Hackathon 9th Sprint No.56】NO.56 Add unit tests for functional module fastdeploy/multimodal/utils.py (#4954) essos 2025-11-14 10:37:27 +08:00
  • 36822fa49c [PD Disaggregation] remove splitwise deployment on single node and refine the code (#4891) Juncai 2025-11-14 09:56:53 +08:00
  • 9703108c28 [BugFix] adjust max_tokens and min_tokens when continue to generate tokens (#5010) kxz2002 2025-11-13 23:52:54 +08:00
  • 6c3d1da62f fix conflicts carryyu 2025-11-13 18:17:44 +08:00
  • ae7bee8122 【New Feature】W4afp8 supports per group quantization (#4987) yangjianfengo1 2025-11-13 19:17:27 +08:00
  • a5e949d9d0 [Feature] Enhance build script, add pre_wheel logic (#4729) Echo-Nie 2025-11-13 19:03:52 +08:00
  • 436742f120 [Bug fix] Fix dp scheduler bug (#5005) chenjian 2025-11-13 18:50:02 +08:00
  • 4700230db1 fix dp scheduler bug (#5006) chenjian 2025-11-13 18:49:43 +08:00
  • 05da8e34c0 [BugFix][Metax] Fix metax compile issue in get_block_shape_and_split_kv_block (#5000) Sunny-bot1 2025-11-13 16:55:06 +08:00
  • db5d421aa3 [Optimize] Improve perf for fd response token with internal adapter (#4991) chenjian 2025-11-13 16:19:48 +08:00
  • 88da9d9788 [XPU] [CI] Change CI ep test from offline to online (#4885) zccjjj 2025-11-13 16:15:45 +08:00
  • 4a0d881e15 update (#4985) bukejiyu 2025-11-13 15:58:01 +08:00
  • 6c4ebc5fee [worker_process.py]modify some var name (#4749) 周周周 2025-11-13 14:21:27 +08:00
  • 3da9f01e19 [BugFix] fix num_requests_running after clear_data (#4989) Yonghua Li 2025-11-13 13:50:38 +08:00
  • 6c5ab727c1 [BugFix] fix num_requests_running after clear_data (#4927) Yonghua Li 2025-11-13 13:50:21 +08:00
  • 9cec098add [BugFix] fix num_requests_running after clear_data (#4926) Yonghua Li 2025-11-13 13:50:04 +08:00
  • 52e5db9983 [BugFix] fix num_requests_running after clear_data (#4923) Yonghua Li 2025-11-13 13:49:42 +08:00
  • 5b24013d46 skip DtoH capture (#4988) Sunny-bot1 2025-11-13 10:57:44 +08:00
  • 9590072a91 [Cherry-Pick] [BugFix] Avoid loading training file (#4966) (#4979) BossPi 2025-11-13 10:47:58 +08:00
  • 8329338d37 Update nvidia_gpu.md Jiang-Jia-Jun 2025-11-13 10:25:22 +08:00
  • 303c986cc7 [FDConfig] add block number verfied (#4983) ltd0924 2025-11-13 09:48:44 +08:00
  • 1c0b0b08b7 [CI] set DG_NVCC_OVERRIDE_CPP_STANDARD in test_quantized_linear (#4995) YuBaoku 2025-11-12 23:03:21 +08:00
  • 2272160faf fix mtp tsp (#4990) Yuanle Liu 2025-11-12 22:05:19 +08:00
  • 3148dbca06 [BugFix] fix VL fp8 bug when moe token_num is 0 (#4928) ming1753 2025-11-12 21:19:36 +08:00
  • 484c9f3be2 Fix env setting for unify code 20250922 (#4994) chenjian 2025-11-12 20:51:38 +08:00
  • c8140326fa Update nvidia_gpu.md Jiang-Jia-Jun 2025-11-12 20:50:09 +08:00
  • f0189292df [CI] fix test_model_cache (#4982) bukejiyu 2025-11-12 20:26:49 +08:00
  • 8749ca2fb6 support ENCODE_FEATURE_ENDPOINT (#4905) lizhenyun01 2025-11-12 20:01:36 +08:00
  • a2d06118e1 [Logprobs]Support prompt_logprobs and max_logprobs (#4897) qwes5s5 2025-11-12 19:29:48 +08:00
  • 38f6e6c7c6 [BugFix] fix triton fp8 bug (#4967) ming1753 2025-11-12 19:28:09 +08:00
  • da7863ae85 [XPU] fix text_image_gather_scatter when image_token_num == token_num && text_token_num == 1 (#4882) Lucas 2025-11-12 17:13:22 +08:00
  • c599268f57 [BugFix] fix work metrics not returned by metrics api (#4920) Yonghua Li 2025-11-12 17:11:51 +08:00
  • 6d564d5e05 [BugFix] fix work metrics not returned by metrics api (#4921) Yonghua Li 2025-11-12 17:11:40 +08:00
  • a1218076dc remove load default_v1 since already been as default (#4980) JYChen 2025-11-12 16:49:48 +08:00
  • c45b3ccb52 [Metax] optimize flash mla (#4915) xiaozude 2025-11-12 16:43:46 +08:00
  • 9d9f5df8d0 [Metax] support default_v1 loader & thinking model (#4956) MingkunZhang 2025-11-12 16:32:26 +08:00
  • cf61d30fd7 [Optimize] Improve perf for fd response token with internal adapter (#4947) chenjian 2025-11-12 16:02:07 +08:00
  • bde6e2f931 [BugFix] Avoid loading training file (#4966) BossPi 2025-11-12 15:49:14 +08:00
  • c7b589d75b [CI][XPU] Fix EP Case Bug (#4976) plusNew001 2025-11-12 15:23:28 +08:00
  • 6e2e2fcd29 xpu (#4969) bukejiyu 2025-11-12 15:12:59 +08:00
  • 5bf48de999 [KVCache] support unified cache backend (#4903) ltd0924 2025-11-12 14:54:52 +08:00
  • c6929cb41d [CI][XPU]Remove release branch from pull request trigger (#4959) plusNew001 2025-11-12 14:45:14 +08:00