Commit Graph

2780 Commits

Author SHA1 Message Date
lddfym
ece88596ed fix spelling error (#2827) 2025-07-14 13:12:57 +08:00
bukejiyu
bad53c6b6e [vl]remove duplicated load logic (#2744)
2025-07-13 07:36:26 +08:00
xiegegege
16940822a7 add result save for ci (#2824)
LGTM
2025-07-12 23:34:46 +08:00
zhenwenDang
d48c03413f Feature/logprob bug fix (#2817)
* fix: handle missing logprobs at step 0 and incorrect finish reason with max_completion_tokens

* Prevent response_logprobs.logprob_token_ids[0] from going out of bounds
2025-07-12 16:48:51 +08:00
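A minimal sketch of the guard this commit describes, assuming a hypothetical helper around the response object; only the attribute name logprob_token_ids comes from the commit message, everything else is illustrative and not FastDeploy's actual code:

```python
# Illustrative guard only: avoid reading response_logprobs.logprob_token_ids[0]
# when no logprobs have been produced yet, e.g. at decoding step 0.
def first_logprob_token_id(response_logprobs):
    token_ids = getattr(response_logprobs, "logprob_token_ids", None)
    if not token_ids:       # step 0 / missing logprobs: nothing to index
        return None
    return token_ids[0]     # safe: the list is known to be non-empty here
```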
gaoziyuan
e9e8443ea8 fix num_blocks_local when small size model in TP2 running mode (#2792) 2025-07-12 12:50:48 +08:00
gaoziyuan
749b2e9c89 support qwen3moe name_mapping (#2820) 2025-07-12 12:05:54 +08:00
Sunny-bot1
f6ad26fc08 fix topp default value (#2814)
2025-07-11 17:10:21 +08:00
zhink
c08561c13a [Feature] support tensor-parallel-size>num_key_value_heads for qwen3 (#2799) 2025-07-11 15:09:43 +08:00
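A common way to support tensor-parallel sizes larger than num_key_value_heads is to replicate KV heads across ranks instead of splitting them; the sketch below illustrates that mapping under assumed names and is not FastDeploy's implementation:

```python
# Hypothetical rank-to-KV-head mapping when tp_size > num_key_value_heads:
# groups of ranks share (replicate) the same KV head.
def kv_head_for_rank(tp_rank: int, tp_size: int, num_key_value_heads: int) -> int:
    if tp_size <= num_key_value_heads:
        # Normal case: each rank owns a contiguous slice of KV heads.
        return tp_rank * (num_key_value_heads // tp_size)
    ranks_per_kv_head = tp_size // num_key_value_heads
    return tp_rank // ranks_per_kv_head
```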
chen
2c3607407f check (#2811) 2025-07-11 13:54:52 +08:00
lddfym
b5e4288704 Global scheduler supports configuring hot updates (#2807)
* Check if the controller port is available

* Global scheduler supports configuring hot updates

* add interface: /controller/scheduler

* add interface: /controller/scheduler
2025-07-11 13:38:07 +08:00
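A minimal sketch of what this commit describes; the endpoint path /controller/scheduler and the port-availability check are named in the commit, while the FastAPI wiring, handler name, and payload shape are assumptions, not FastDeploy's code:

```python
# Illustrative only: hot-update the scheduler config through an HTTP interface.
import socket
from fastapi import FastAPI

app = FastAPI()

def controller_port_available(port: int) -> bool:
    # "Check if the controller port is available": try to bind it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("0.0.0.0", port))
            return True
        except OSError:
            return False

@app.put("/controller/scheduler")
def update_scheduler(config: dict):
    # Apply the new scheduler settings without restarting the server.
    # (Real validation and scheduler plumbing omitted.)
    return {"updated": config}
```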
yulangz
abbbd0cddc [XPU] Update docker file (#2809) 2025-07-11 13:26:38 +08:00
yinwei
e98937cbba delete useless file (#2772)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-11 11:46:04 +08:00
Sunny-bot1
240d6236bc [Fix]fix top_k_top_p sampling (#2801)
* fix topk-topp

* update

* add base_non_truncated
2025-07-10 22:35:10 +08:00
littledgg
59071268b6 [Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)
* Use PEP 563 in attention.py and fix conflict

* merge commit

* Change what was left out last time
2025-07-10 20:36:51 +08:00
lizexu123
8c660a0dfb [BugFix] fix RMSNorm rms_norm_eps (#2797)
* fix rms

* add vl

* fix

* add vl

* fix

* fix
2025-07-10 20:02:24 +08:00
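For reference, the rms_norm_eps hyper-parameter named in the commit title enters the RMSNorm formula inside the square root; a minimal numpy sketch (not FastDeploy's Paddle implementation):

```python
# Reference formula only: epsilon stabilizes the division for small activations.
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, rms_norm_eps: float = 1e-6) -> np.ndarray:
    variance = np.mean(x.astype(np.float32) ** 2, axis=-1, keepdims=True)
    return (x / np.sqrt(variance + rms_norm_eps)) * weight
```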
LiqinruiG
ce5adec877 [Doc] modify offline-inference docs (#2800)
* modify offline-inference docs

* [bug] remove tool_call_content
2025-07-10 19:41:12 +08:00
Zeyu Chen
36571fd2d9 Update README.md
2025-07-10 17:01:08 +08:00
yulangz
830de5a925 [XPU] Supports TP4 deployment on 4,5,6,7 (#2794)
* Support running on cards 4,5,6,7 specified via XPU_VISIBLE_DEVICES
* Update the multi-card instructions in the XPU documentation
2025-07-10 16:48:08 +08:00
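The environment variable XPU_VISIBLE_DEVICES comes from the commit body; the parsing and rank-to-card mapping below are an assumed sketch, not the actual FastDeploy code:

```python
# Illustrative parsing of XPU_VISIBLE_DEVICES=4,5,6,7 for a TP4 deployment.
import os

def visible_xpu_ids() -> list[int]:
    raw = os.environ.get("XPU_VISIBLE_DEVICES", "")
    return [int(i) for i in raw.split(",") if i.strip()] if raw else []

# With XPU_VISIBLE_DEVICES=4,5,6,7 and tensor-parallel size 4,
# TP rank r would run on physical card visible_xpu_ids()[r].
```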
chen
d33105baeb [Feature] Online Chat API Support Return logprobs (#2777)
* online chat support logprobs

* check xpu

* check vl_gpu_model_runner and xpu_model_runner

* get_worker() check platform
2025-07-10 16:33:40 +08:00
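This feature returns logprobs through the OpenAI-compatible chat endpoint; a hedged client-side usage example follows, where the base_url and model name are placeholders rather than documented values:

```python
# Hypothetical client call; request fields follow the OpenAI chat completions API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="ERNIE-4.5",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,    # ask the server to return per-token log probabilities
    top_logprobs=5,   # and the top-5 alternatives for each position
)
print(resp.choices[0].logprobs)
```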
K11OntheBoat
24f934f1f9 [BugFix] Fix low prediction accuracy of deepseekv3 (#2798) 2025-07-10 16:16:44 +08:00
Sunny-bot1
1e2319cbef Rename top_p_sampling to top_k_top_p_sampling (#2791) 2025-07-10 00:09:25 -07:00
Sunny-bot1
e45050cae3 [Feature] support top_k_top_p sampling (#2753)
* support top_k_top_p sampling

* fix

* add api param

* add api para

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2025-07-09 20:58:58 -07:00
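For orientation, the sampling scheme named in the commit title filters by top-k first and then by top-p (nucleus); the sketch below shows the algorithm only, assuming a plain numpy implementation rather than the fused GPU kernel FastDeploy uses:

```python
# Algorithm sketch: keep the top-k tokens, then the smallest prefix whose
# cumulative probability reaches top_p, renormalize, and sample.
import numpy as np

def top_k_top_p_sample(probs: np.ndarray, top_k: int, top_p: float, rng=np.random) -> int:
    order = np.argsort(probs)[::-1][:top_k]        # top-k candidates, descending
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1       # smallest nucleus covering top_p
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()            # renormalize over kept tokens
    return int(rng.choice(keep, p=p))
```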
Ryan
b0f525955c [SOT] Remove breakgraph in post processing && fix datatype (#2780) 2025-07-10 11:26:00 +08:00
Yuanle Liu
2ea267f624 assert prompt len > 0 (#2773) 2025-07-10 11:14:52 +08:00
0x3878f
1d8af7ab73 Add env variable for dy2st (#2779) 2025-07-10 11:06:06 +08:00
LiqinruiG
54affdc44b [Doc] modify offline_inference docs (#2787)
* modify reasoning_output docs

* modify offline inference docs

* modify offline inference docs

* modify offline_inference docs

* modify offline_inference docs
2025-07-10 01:06:14 +08:00
Jiang-Jia-Jun
a4fdb3970b [BugFix] Fix vocab size error for ernie model (#2785)
* [BugFix] Fix vocab size error for ernie model

* [BugFix] Fix vocab size error for ernie model

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-10 01:05:51 +08:00
Jiang-Jia-Jun
2a86928657 [BugFix Revert] Fix vocab size error for ernie model 2025-07-09 22:14:54 +08:00
Jiang-Jia-Jun
b1c53fa779 [BugFix] Fix vocab size error for ernie model 2025-07-09 22:13:41 +08:00
lizexu123
da20cf681e [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2783) 2025-07-09 22:03:57 +08:00
LiqinruiG
4ccd1696ab [Doc] modify offline inference docs (#2747)
* modify reasoning_output docs

* modify offline inference docs

* modify offline inference docs
2025-07-09 20:53:26 +08:00
chen
888780ffde [Feature] block_wise_fp8 support triton_moe_backend (#2767) 2025-07-09 19:22:47 +08:00
RAM
e3768c5a83 [Executor] Fix bug of logger.debug (#2778) 2025-07-09 04:13:43 -07:00
lifulll
1f28bdf994 dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
RAM
03a74995b8 Clear dead code and add supplementary notes (#2757)
* 1.supplementary notes 2.delete dead code

* fix bug of forward meta

* Global modification of forward meta

* fix vl model_runner bug
2025-07-09 16:17:34 +08:00
zhink
b89180f1cd [Feature] support custom all-reduce (#2758)
* [Feature] support custom all-reduce

* add vllm adapted
2025-07-09 16:00:27 +08:00
yulangz
be21ef5047 [XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B (#2765)
* fix no quant xpu moe

* change dir of xpu moe weight only
2025-07-09 15:57:51 +08:00
celsowm
771e71a24d Feat/blackwell sm100 support (#2670)
* Add initial support for NVIDIA Blackwell (SM100) architecture

This change introduces initial support for the NVIDIA Blackwell GPU
architecture, specifically targeting SM100 (Compute Capability 10.x)
with '100a' architecture-specific features (e.g., for CUTLASS).

Key changes:
- Updated custom_ops/setup_ops.py to generate appropriate gencode
  flags (arch=compute_100a,code=sm_100a) when '100' is specified
  in FD_BUILDING_ARCS. Requires CUDA 12.9+.
- Updated custom_ops/gpu_ops/cutlass_extensions/gemm_configs.h:
    - Added CutlassTileConfigSM100 enum (with placeholder tile shapes).
    - Added BLACKWELL to CandidateConfigTypeParam.
    - Updated CutlassGemmConfig struct with is_sm100 flag,
      tile_config_sm100, and new constructor for SM100.
    - Modified toString() and fromString() for SM100 support.
- Updated custom_ops/gpu_ops/cutlass_kernels/cutlass_heuristic.cu:
    - Added get_candidate_tiles_sm100() (with placeholder tiles).
    - Added placeholder mcast support functions for SM100.
    - Updated get_candidate_configs() to include SM100 paths using
      the BLACKWELL flag and new SM100 config types.
- Updated build.sh with comments to guide users on specifying '100'
  for Blackwell in FD_BUILDING_ARCS.

Further work:
- Optimal CUTLASS tile configurations for SM100 need to be researched
  and updated in cutlass_heuristic.cu.
- Kernel auto-generation scripts in custom_ops/utils/ may need
  SM100-specific versions if Blackwell's hardware features for FP8/TMA
  differ significantly from SM90.
- Compatibility of third-party libraries (CUTLASS v3.8.0, DeepGEMM)
  with Blackwell should be fully verified.

* Feat: Implement detailed Blackwell (SM100) CUTLASS heuristics

This change integrates specific, expert-provided CUTLASS heuristic
configurations for the NVIDIA Blackwell (SM100) GPU architecture,
replacing previous placeholders. This includes:

- Updated `custom_ops/gpu_ops/cutlass_extensions/gemm_configs.h`:
    - Populated `CutlassTileConfigSM100` enum with specific tile shapes
      (e.g., CtaShape64x64x128B, CtaShape128x128x128B) suitable for SM100.
    - Added `FP4_ONLY` to `CandidateConfigTypeParam` for new FP4 paths.

- Updated `custom_ops/gpu_ops/cutlass_kernels/cutlass_heuristic.cu`:
    - Implemented `get_candidate_tiles_sm100` with detailed logic for
      selecting tile configurations based on GROUPED_GEMM and FP4_ONLY flags,
      using the new SM100 tile enums.
    - Implemented `supports_mcast_along_m_sm100` and
      `supports_mcast_along_n_sm100` with specific tile checks for Blackwell.
    - Updated the `sm == 100` (Blackwell) block in `get_candidate_configs`
      to use these new helper functions and accurately populate candidate
      kernel configurations for various cluster shapes.

- `custom_ops/setup_ops.py` remains configured to compile for
  `arch=compute_100a,code=sm_100a` with CUDA 12.9+ for these features.

This aligns the codebase with heuristic configurations similar to those
in upstream TensorRT-LLM / CUTLASS for Blackwell, enabling more
performant kernel selection on this new architecture.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-09 15:29:42 +08:00
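The commit describes emitting arch=compute_100a,code=sm_100a gencode flags when '100' appears in FD_BUILDING_ARCS, with a CUDA 12.9+ requirement; a minimal sketch of that selection logic follows, with the helper name and version-check layout being assumptions rather than the actual setup_ops.py code:

```python
# Illustrative gencode-flag selection for Blackwell (SM100) builds.
def gencode_flags_for(arch: str, cuda_version: tuple[int, int]) -> list[str]:
    if arch == "100":  # NVIDIA Blackwell, '100a' architecture-specific features
        if cuda_version < (12, 9):
            raise RuntimeError("Building for SM100 ('100a') requires CUDA 12.9+")
        return ["-gencode", "arch=compute_100a,code=sm_100a"]
    # Other entries in FD_BUILDING_ARCS fall back to the plain compute/sm pair.
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
```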
yulangz
0350831c2b fix xpu offline demo garbled output (#2763) 2025-07-09 14:51:20 +08:00
RichardWooSJTU
fee544e808 fix ep prefill (#2762) 2025-07-09 14:03:05 +08:00
Ryan
c4718fd693 Enable SOT D2St in Multimodal Model (#2735) 2025-07-09 12:26:18 +08:00
GoldPancake
f7cad30a38 [Feature] Add speculative decoding simulation benchmark. (#2751)
* Add speculative decoding simulation benchmark

* Fix the name of the parameter
2025-07-09 12:08:43 +08:00
gaoziyuan
6b10c19482 [Feature] add fd commit/branch info when starting the server (#2752)
* add_commit_config

* fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-09 11:52:22 +08:00
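One straightforward way to capture the commit and branch information this feature logs at server start is to query git at startup; the sketch below illustrates that approach under assumed names and is not the actual implementation:

```python
# Illustrative: collect git metadata so the server can log it on startup.
import subprocess

def git_info() -> dict:
    def run(*args):
        return subprocess.check_output(["git", *args], text=True).strip()
    return {
        "commit": run("rev-parse", "--short", "HEAD"),
        "branch": run("rev-parse", "--abbrev-ref", "HEAD"),
    }
```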
EnflameGCU
f4f1d8de44 Support for non-CUDA builds (#2750)
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 11:48:40 +08:00
RichardWooSJTU
6610aa29d0 Revert "[Bug fix] fix attention rank init (#2743)" (#2761)
This reverts commit e8bbe7244b.
2025-07-09 10:38:12 +08:00
Ryan
f72c4de539 [SOT] Make custom_op dy&st unified (#2733)
* make_custom_op dy&st unified

* add instance judgement
2025-07-08 19:21:44 +08:00
xiegetest
f6ffbc3cbd add precision check for ci (#2732)
* add precision check for ci

* add precision check for ci

* add precision check for ci

* add precision check for ci

---------

Co-authored-by: xiegegege <xiege01@baidu.com>
2025-07-08 18:43:53 +08:00
RichardWooSJTU
e8bbe7244b [Bug fix] fix attention rank init (#2743)
* fix attention rank init

* fix attention rank init
2025-07-08 17:19:49 +08:00
Longzhi Wang
57b086dc6b [Bug fix] Add the missing pod_ip param to the launch_cache_manager function. (#2742)
* [Bug fix] fix the missing position args in expert_service.py

* update
2025-07-08 14:52:13 +08:00
lizexu123
525be243e7 [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2737)
* fix qwen3.py

* update

* update lm_head tie_word_embeddings

* update tie_word_embeddings

* fix

* fix tie_word_embedding not in config.json

---------

Co-authored-by: lizexu <lizexu@baidu.com>
2025-07-07 23:15:27 -07:00
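The commit body notes that tie_word_embeddings may be absent from config.json; the usual handling is to default the flag and reuse the input embedding matrix as the lm_head weight when tying is enabled. A minimal sketch under assumed names, not FastDeploy's code:

```python
# Illustrative tie_word_embeddings handling with a safe default for missing keys.
def resolve_lm_head_weight(config: dict, embed_tokens_weight, lm_head_weight):
    tie = config.get("tie_word_embeddings", False)  # key may be missing in config.json
    return embed_tokens_weight if tie else lm_head_weight
```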