Commit Graph

20 Commits

Author SHA1 Message Date
Jiang-Jia-Jun
a4fdb3970b [BugFix] Fix vocab size error for ernie model (#2785)
* [BugFix] Fix vocab size error for ernie model

* [BugFix] Fix vocab size error for ernie model

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-10 01:05:51 +08:00
Jiang-Jia-Jun
2a86928657 [BugFix Revert] Fix vocab size error for ernie model 2025-07-09 22:14:54 +08:00
Jiang-Jia-Jun
b1c53fa779 [BugFix] Fix vocab size error for ernie model 2025-07-09 22:13:41 +08:00
lizexu123
da20cf681e [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2783) 2025-07-09 22:03:57 +08:00
lifulll
1f28bdf994 dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
RAM
03a74995b8 Clear dead code And supplementary notes (#2757)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* 1.supplementary notes 2.delete dead code

* fix bug of forward meta

* Global modification of forward meta

* fix vl model_runner bug
2025-07-09 16:17:34 +08:00
zhink
b89180f1cd [Feature] support custom all-reduce (#2758)
* [Feature] support custom all-reduce

* add vllm adapted
2025-07-09 16:00:27 +08:00
yulangz
0350831c2b fix xpu offline demo garbled output (#2763) 2025-07-09 14:51:20 +08:00
Ryan
c4718fd693 Enable SOT D2St in Multimodal Model (#2735) 2025-07-09 12:26:18 +08:00
GoldPancake
f7cad30a38 [Feature] Add speculative decoding simulation benchmark. (#2751)
* Add speculative decoding simulation benchmark

* Fix the name of the parameter
2025-07-09 12:08:43 +08:00
lizexu123
525be243e7 [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2737)
* fix qwen3.py

* update

* update lm_head tie_word_embeddings

* update tie_word_embeddings

* fix

* fix tie_word_embedding not in config.json

---------

Co-authored-by: lizexu <lizexu@baidu.com>
2025-07-07 23:15:27 -07:00
EnflameGCU
d0f4d6ba3a [GCU] Support gcu platform (#2702)
baseline: e7fa57ebae

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-08 13:00:52 +08:00
gaoziyuan
26d5d737dd 【Fearture】support qwen2 some func (#2740)
* add rl qwen model support

* fix

* fix
2025-07-08 12:03:04 +08:00
liddk1121
1b54a2831e Adapt for iluvatar gpu (#2684) 2025-07-07 16:53:14 +08:00
ltd0924
68b4755587 [LLM] support multi node deploy (#2708)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [LLM] support multi node deploy

* Update engine.py

* fix bugs

* fix

* [LLM] support multi node deploy

* [LLM] support multi node deploy

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-06 10:33:51 +08:00
freeliuzc
667547be59 support chunk_prefill in MTP (#2705) 2025-07-04 11:55:48 +08:00
Yuanle Liu
240bdac2a4 [feat] support fa3 backend for pd disaggregated (#2695)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* delete use_fast_ffn
2025-07-03 22:33:27 +08:00
Jiang-Jia-Jun
05c670e593 [Sync] Update to latest code (#2679)
* [Sync] Update to latest code

* Add new code files

* Add new code files

* update code

* Try to fix build.sh

* Try to fix build.sh

* Update code

* Update requirements.txt

* Update code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00