Commit Graph

10 Commits

Author SHA1 Message Date
Ryan
fefbd65cf8 [SOT] Remove BreakGraph with paddle.maximum (#2731)
* rm if with clip

* clip -> maximum

* int64 -> int32
2025-07-08 11:44:25 +08:00
ming1753
1eb8ea7328 [Bug fix] fix complie bug when sm < 89 (#2738) 2025-07-08 11:24:52 +08:00
ming1753
ef6649a577 [Optimize] Optimize tensorwise fp8 performance (#2729)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Optimize] Optimize tensorwise fp8 performance
2025-07-07 20:06:28 +08:00
liddk1121
1b54a2831e Adapt for iluvatar gpu (#2684) 2025-07-07 16:53:14 +08:00
GoldPancake
e7fa57ebae Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue (#2707)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix mtp eh_proj layer

* fix mtp update_cfg function

* fix stringdoc

* simplify class name
2025-07-04 14:15:04 +08:00
Yuanle Liu
240bdac2a4 [feat] support fa3 backend for pd disaggregated (#2695)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* delete use_fast_ffn
2025-07-03 22:33:27 +08:00
Jiang-Jia-Jun
05c670e593 [Sync] Update to latest code (#2679)
* [Sync] Update to latest code

* Add new code files

* Add new code files

* update code

* Try to fix build.sh

* Try to fix build.sh

* Update code

* Update requirements.txt

* Update code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
AIbin
a197dcd729 【Inference Optimize】Support ERNIE-4_5-300B-A47B-2BITS-Paddle model TP2/TP4 Inference (#2666)
* Support TP2&TP4 Wint

* Support TP2&TP4 Wint2 Inference
2025-07-01 18:29:11 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00