1d8af7ab73
Add env variable for dy2st (#2779)
2025-07-10 11:06:06 +08:00
Jiang-Jia-Jun
a4fdb3970b
[BugFix] Fix vocab size error for ernie model (#2785)
...
* [BugFix] Fix vocab size error for ernie model
* [BugFix] Fix vocab size error for ernie model
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-10 01:05:51 +08:00
Jiang-Jia-Jun
2a86928657
[BugFix Revert] Fix vocab size error for ernie model
2025-07-09 22:14:54 +08:00
Jiang-Jia-Jun
b1c53fa779
[BugFix] Fix vocab size error for ernie model
2025-07-09 22:13:41 +08:00
lizexu123
da20cf681e
[Bug fix] Fixed the garbled text issues in Qwen3-8B (#2783)
2025-07-09 22:03:57 +08:00
chen
888780ffde
[Feature] block_wise_fp8 support triton_moe_backend (#2767)
2025-07-09 19:22:47 +08:00
RAM
e3768c5a83
[Executor] Fix bug of logger.debug (#2778)
2025-07-09 04:13:43 -07:00
lifulll
1f28bdf994
dcu adapter ernie45t (#2756)
...
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
RAM
03a74995b8
Clear dead code and add supplementary notes (#2757)
...
* 1.supplementary notes 2.delete dead code
* fix bug of forward meta
* Global modification of forward meta
* fix vl model_runner bug
2025-07-09 16:17:34 +08:00
zhink
b89180f1cd
[Feature] support custom all-reduce (#2758)
...
* [Feature] support custom all-reduce
* add vllm adapted
2025-07-09 16:00:27 +08:00
yulangz
be21ef5047
[XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B (#2765)
...
* fix no quant xpu moe
* change dir of xpu moe weight only
2025-07-09 15:57:51 +08:00
yulangz
0350831c2b
fix xpu offline demo garbled output (#2763)
2025-07-09 14:51:20 +08:00
RichardWooSJTU
fee544e808
fix ep prefill (#2762)
2025-07-09 14:03:05 +08:00
Ryan
c4718fd693
Enable SOT D2St in Multimodal Model (#2735)
2025-07-09 12:26:18 +08:00
GoldPancake
f7cad30a38
[Feature] Add speculative decoding simulation benchmark. (#2751)
...
* Add speculative decoding simulation benchmark
* Fix the name of the parameter
2025-07-09 12:08:43 +08:00
gaoziyuan
6b10c19482
【Feature】add fd commit/branch info when starting server (#2752)
...
* add_commit_config
* fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-09 11:52:22 +08:00
RichardWooSJTU
6610aa29d0
Revert "[Bug fix] fix attention rank init (#2743)" (#2761)
...
This reverts commit e8bbe7244b.
2025-07-09 10:38:12 +08:00
Ryan
f72c4de539
[SOT] Make custom_op dy&st unified (#2733)
...
* make_custom_op dy&st unified
* add instance judgement
2025-07-08 19:21:44 +08:00
RichardWooSJTU
e8bbe7244b
[Bug fix] fix attention rank init (#2743)
...
* fix attention rank init
* fix attention rank init
2025-07-08 17:19:49 +08:00
Longzhi Wang
57b086dc6b
[Bug fix] Add the missing pod_ip param to the launch_cache_manager function. (#2742)
...
* [Bug fix] fix the missing position args in expert_service.py
* update
2025-07-08 14:52:13 +08:00
lizexu123
525be243e7
[Bug fix] Fixed the garbled text issues in Qwen3-8B (#2737)
...
* fix qwen3.py
* update
* update lm_head tie_word_embeddings
* update tie_word_embeddings
* fix
* fix tie_word_embedding not in config.json
---------
Co-authored-by: lizexu <lizexu@baidu.com>
2025-07-07 23:15:27 -07:00
EnflameGCU
d0f4d6ba3a
[GCU] Support gcu platform (#2702)
...
baseline: e7fa57ebae
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-08 13:00:52 +08:00
gaoziyuan
26d5d737dd
【Feature】support some qwen2 functions (#2740)
...
* add rl qwen model support
* fix
* fix
2025-07-08 12:03:04 +08:00
Ryan
fefbd65cf8
[SOT] Remove BreakGraph with paddle.maximum (#2731)
...
* rm if with clip
* clip -> maximum
* int64 -> int32
2025-07-08 11:44:25 +08:00
ming1753
1eb8ea7328
[Bug fix] fix compile bug when sm < 89 (#2738)
2025-07-08 11:24:52 +08:00
ming1753
ef6649a577
[Optimize] Optimize tensorwise fp8 performance (#2729)
...
* [Optimize] Optimize tensorwise fp8 performance
2025-07-07 20:06:28 +08:00
liddk1121
1b54a2831e
Adapt for iluvatar gpu (#2684)
2025-07-07 16:53:14 +08:00
lddfym
4e293e50fa
Check if the controller port is available (#2724)
2025-07-07 13:24:55 +08:00
ltd0924
68b4755587
[LLM] support multi node deploy (#2708)
...
* [LLM] support multi node deploy
* Update engine.py
* fix bugs
* fix
* [LLM] support multi node deploy
* [LLM] support multi node deploy
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-06 10:33:51 +08:00
Ting
a6e9161045
fix bug. (#2718)
2025-07-05 08:19:19 +08:00
Ting
90ef28d982
spec token map lazy. (#2715)
2025-07-05 00:14:54 +08:00
lizexu123
9cb08e71e8
add support QWQ enable_thinking (#2706)
...
* add support QWQ enable_thinking
* add stream=True
* fix stream=true
* fix qwen
---------
Co-authored-by: lizexu <lizexu@baidu.com>
2025-07-04 20:55:23 +08:00
GoldPancake
e7fa57ebae
Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue (#2707)
...
* fix mtp eh_proj layer
* fix mtp update_cfg function
* fix docstring
* simplify class name
2025-07-04 14:15:04 +08:00
gaoziyuan
a5ae88ded9
[feature] add fd whl version info (#2698)
2025-07-04 14:12:42 +08:00
ltd0924
87e638498c
[RL] update reschedule finish reason (#2709)
2025-07-04 13:47:36 +08:00
freeliuzc
667547be59
support chunk_prefill in MTP (#2705)
2025-07-04 11:55:48 +08:00
Yuanle Liu
240bdac2a4
[feat] support fa3 backend for pd disaggregated (#2695)
...
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* delete use_fast_ffn
2025-07-03 22:33:27 +08:00
ltd0924
00863c43fd
[Bug] fix logger format (#2689)
2025-07-03 19:58:03 +08:00
Jiang-Jia-Jun
9fd74f75bd
Update dynamic_weight_manager.py
2025-07-03 15:55:22 +08:00
Jiang-Jia-Jun
05c670e593
[Sync] Update to latest code (#2679)
...
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
AIbin
a197dcd729
【Inference Optimize】Support ERNIE-4_5-300B-A47B-2BITS-Paddle model TP2/TP4 Inference (#2666)
...
* Support TP2&TP4 Wint
* Support TP2&TP4 Wint2 Inference
2025-07-01 18:29:11 +08:00
ltd0924
50aa4080c0
[Serving] fix offline inference sampling parameters overwrite (#2654)
2025-07-01 10:17:46 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00
jiangjiajun
fb18f3092d
[LLM] Add output module and polish docs
2025-06-09 20:26:53 +08:00
jiangjiajun
684703fd72
[LLM] First commit the llm deployment code
2025-06-09 19:20:15 +08:00
Zheng-Bicheng
9faf1b5ad9
Merge branch 'PaddlePaddle:develop' into develop
2025-02-12 21:23:36 +08:00
Jiang-Jia-Jun
d4bbdbefea
Merge pull request #2559 from MaaXYZ/fix/build_error
...
fix: build error without ENABLE_PADDLE2ONNX
2025-01-10 13:55:49 +08:00
Jiang-Jia-Jun
618826b39d
Merge pull request #2560 from MaaXYZ/perf/read_file
...
perf: ReadBinaryFromFile supports Chinese path
2025-01-10 13:55:27 +08:00
Jiang-Jia-Jun
17d204b975
Merge pull request #2561 from MaaXYZ/feat/directml
...
feat: select adapter id for DirectML
2025-01-10 13:54:44 +08:00
MistEO
ec3d4c714c
fix: valid_directml_backends
2024-11-21 16:47:16 +08:00