Mirror of https://github.com/PaddlePaddle/FastDeploy.git (synced 2025-10-05 08:37:06 +08:00)
[Feature] support pool (#3827)
* support pool
* update pooling
* add pooler_config and check
* update
* support AutoWeightsLoader load weight
* fix
* update
* delete print
* update pre-commit
* fix
* fix xpu
* fix ModelRegistry->model_registry
* fix Copilot review
* fix pooler.py
* delete StepPooler
* fix abstract
* fix default_loader_v1
* fix Pre Commit
* support torch qwen3 dense
* add test and fix torch-qwen
* fix
* fix
* adapter ci:
* fix review
* fix pooling_params.py
* fix
* fix tasks.py 2025
* fix print and logger
* Modify ModelRegistry and delete AutoWeightsLoader
* fix logger
* fix test_embedding
* fix ci bug
* ernie4_5 model_registry
* fix test
* support Qwen3-Embedding-0.6B tp=1 load
* fix extra code
* fix
* delete fix vocab_size
* delete prepare_params_dict
* fix:
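The feature this commit lands is a pooling path for embedding models such as Qwen3-Embedding-0.6B: instead of sampling tokens from logits, the runner hands the hidden states to a pooler that reduces them to one vector per request. As a rough sketch of what a last-token pooler does (an illustrative helper, not the Pooler added in pooler.py; the function name, tensor shapes, and normalization step are assumptions):

import torch

def last_token_pool(hidden_states: torch.Tensor, seq_lens: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: hidden_states is [batch, max_seq_len, hidden_dim] (padded),
    # seq_lens is [batch] holding the true length of each request.
    batch_idx = torch.arange(hidden_states.shape[0], device=hidden_states.device)
    # Take the hidden state of the last real token of each sequence.
    pooled = hidden_states[batch_idx, seq_lens - 1]
    # Embedding models commonly L2-normalize the pooled vector before returning it.
    return torch.nn.functional.normalize(pooled, p=2, dim=-1)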
@@ -1319,8 +1319,12 @@ class GPUModelRunner(ModelRunnerBase):
             self.parallel_config.max_model_len,
         )

-        # 4. Execute spec decode
-        logits = self.model.compute_logits(hidden_states)
+        logits = None
+        if hasattr(self.model, "is_pooling_model") and self.model.is_pooling_model:
+            pass
+        else:
+            # 4. Execute spec decode
+            logits = self.model.compute_logits(hidden_states)

         if not self.speculative_decoding:
             set_value_by_flags_and_idx(
@@ -1625,8 +1629,13 @@ class GPUModelRunner(ModelRunnerBase):
             self.parallel_config.max_model_len,
         )

-        # 4. Compute logits, Sample
-        logits = self.model.compute_logits(hidden_states)
+        logits = None
+        if hasattr(self.model, "is_pooling_model") and self.model.is_pooling_model:
+            pass
+        else:
+            # 4. Execute spec decode
+            logits = self.model.compute_logits(hidden_states)

         if not self.speculative_decoding:
             set_value_by_flags_and_idx(
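Both hunks apply the same gate in GPUModelRunner: a model that reports is_pooling_model skips compute_logits, so logits stays None and the later sampling path is bypassed. A minimal standalone sketch of that control flow (the classes and helper below are invented for illustration; only the is_pooling_model / compute_logits convention comes from the diff):

class PoolingModel:
    # Pooling/embedding models advertise this flag instead of producing logits.
    is_pooling_model = True

    def forward(self, input_ids):
        return [[0.1, 0.2], [0.3, 0.4]]  # stand-in for per-token hidden states


class CausalLMModel:
    def forward(self, input_ids):
        return [[0.1, 0.2], [0.3, 0.4]]

    def compute_logits(self, hidden_states):
        # Stand-in for the lm_head projection to vocabulary logits.
        return [[h * 10 for h in row] for row in hidden_states]


def run_step(model, input_ids):
    hidden_states = model.forward(input_ids)
    logits = None
    # Same gate as the diff: only non-pooling models compute logits and go on to sample.
    if hasattr(model, "is_pooling_model") and model.is_pooling_model:
        pass  # the pooled output is built from hidden_states elsewhere
    else:
        logits = model.compute_logits(hidden_states)
    return hidden_states, logits


print(run_step(PoolingModel(), [1, 2])[1])   # None
print(run_step(CausalLMModel(), [1, 2])[1])  # projected logits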