YuanRisheng
6ccc10ad47
Unify server-side and model-side Config (Part1) ( #3018 )
...
* move cache config
* fix mtp
2025-07-28 10:51:52 +08:00
xiaoxiaohehe001
2970b00dfa
[Feature] Support_eplb ( #2997 )
...
* [Feature] support_eplb
* [Feature] support_eplb
* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
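EPLB (expert-parallel load balancing) re-places MoE experts across ranks so that routed-token load stays roughly even. As a conceptual sketch only (the greedy policy, `balance_experts`, and `expert_load` below are assumptions for illustration, not the repository's implementation), a rebalancing pass could look like this:

```python
import heapq

def balance_experts(expert_load: list[int], num_ranks: int) -> list[list[int]]:
    """Greedily assign experts to ranks so per-rank load stays roughly even.

    expert_load[i] is the observed token count routed to expert i.
    Returns, for each rank, the list of expert ids placed on it.
    """
    # Min-heap of (accumulated_load, rank_id).
    heap = [(0, r) for r in range(num_ranks)]
    heapq.heapify(heap)
    placement = [[] for _ in range(num_ranks)]
    # Place the heaviest experts first, always onto the currently lightest rank.
    for expert_id in sorted(range(len(expert_load)), key=lambda i: -expert_load[i]):
        load, rank = heapq.heappop(heap)
        placement[rank].append(expert_id)
        heapq.heappush(heap, (load + expert_load[expert_id], rank))
    return placement

# Example: 8 experts with skewed load, spread over 2 ranks.
print(balance_experts([900, 40, 30, 700, 25, 20, 650, 10], num_ranks=2))
```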
EnflameGCU
c40df1802e
[GCU] Update to develop ( #2988 )
2025-07-24 19:30:52 +08:00
Zero Rains
0fb37ab7e4
update flake8 version to support pre-commit in python3.12 ( #3000 )
...
* update flake8 version to support pre-commit in python3.12
* polish code
2025-07-24 01:43:31 -07:00
lizhenyun01
29c3292f02
support c4 attn && fix cache
2025-07-24 12:00:52 +08:00
Nyakku Shigure
48e6a0ca26
[SOT] Mark dynamic dims by type annotations ( #2771 )
...
* [SOT] Mark dynamic dims by type annotations
* fix conflict of forward_meta
* mark more attn backend
* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS
* auto infer implicit 0 dim dynamic dim
* revert manual marked dims
* revert missing update
* auto infer can use unsafe code in warmup stage
* check -> type_match
* fix codestyle
* restore blank line
* empty commit
* add need_warmup nonlocal;
* add doc for resolver
* add missing type hints
* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
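The entry above moves dynamic-dimension marking from manual calls to type annotations. As a hedged illustration of the general pattern only (`DynamicDims`, `Tensor`, and `collect_dynamic_dims` are hypothetical names, not FastDeploy's SOT API), annotation metadata can be attached with `typing.Annotated` and read back at runtime:

```python
from typing import Annotated, get_type_hints

class Tensor:
    """Placeholder standing in for a framework tensor type."""

class DynamicDims:
    """Hypothetical marker: the listed dims may vary between forward passes."""
    def __init__(self, *dims: int):
        self.dims = dims

class ForwardMetaSketch:
    # Dim 0 (the token/batch dim) changes step to step, so it is marked dynamic.
    input_ids: Annotated[Tensor, DynamicDims(0)]
    # No marker: every dim of this buffer is specialized to a fixed size.
    rope_emb: Tensor

def collect_dynamic_dims(cls) -> dict[str, tuple[int, ...]]:
    """Read DynamicDims markers back off a class's annotated fields."""
    dynamic = {}
    for name, hint in get_type_hints(cls, include_extras=True).items():
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, DynamicDims):
                dynamic[name] = meta.dims
    return dynamic

print(collect_dynamic_dims(ForwardMetaSketch))  # {'input_ids': (0,)}
```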
lifulll
2c6a9e887e
native top_p_sampling ( #2901 )
2025-07-22 14:09:59 +08:00
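For context on the #2901 entry above: nucleus (top-p) sampling keeps the smallest set of highest-probability tokens whose cumulative mass reaches p, renormalizes, and samples from that set. The NumPy sketch below illustrates the algorithm only; it is not the native op added by the commit, and all names in it are hypothetical.

```python
import numpy as np

def top_p_sample(logits: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample a token id using nucleus (top-p) sampling."""
    rng = rng or np.random.default_rng()
    # Softmax over the vocabulary.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort descending and keep the smallest prefix whose cumulative mass reaches p.
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1
    keep = order[:cutoff]
    # Renormalize and draw from the truncated distribution.
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))

# Example: sample from a toy 5-token vocabulary.
print(top_p_sample(np.array([2.0, 1.0, 0.5, 0.1, -1.0]), p=0.8))
```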
zhink
0262ef7eb3
custom all reduce support cuda graph ( #2938 )
...
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication
2025-07-21 22:52:03 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
Yuanle Liu
61b3997b85
refactor rl get_name_mappings_to_training ( #2847 )
...
* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix
2025-07-15 07:31:42 -07:00
YuanRisheng
4c7b8bc458
Simplify the Config code ( #2770 )
...
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe
2025-07-14 19:50:05 +08:00
littledgg
59071268b6
[Executor] Move forward_meta.py to fastdeploy/model_executor ( #2774 )
...
* Use PEP 563 in attention.py and fix conflict
* merge commit
* Change what was left out last time
2025-07-10 20:36:51 +08:00
lifulll
1f28bdf994
dcu adapter ernie45t ( #2756 )
...
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
yulangz
be21ef5047
[XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B ( #2765 )
...
* fix no quant xpu moe
* change dir of xpu moe weight only
2025-07-09 15:57:51 +08:00
EnflameGCU
d0f4d6ba3a
[GCU] Support gcu platform ( #2702 )
...
baseline: e7fa57ebae
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-08 13:00:52 +08:00
Jiang-Jia-Jun
05c670e593
[Sync] Update to latest code ( #2679 )
...
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72
[LLM] First commit the llm deployment code
2025-06-09 19:20:15 +08:00