Commit Graph

3 Commits

Author SHA1 Message Date
hlohaus
b68b9ff6be feat: add audio generation support for multiple providers
- Added new examples for `client.media.generate` with `PollinationsAI`, `EdgeTTS`, and `Gemini` in `docs/media.md`
- Modified `PollinationsAI.py` to default to `default_audio_model` when audio data is present
- Adjusted `PollinationsAI.py` to conditionally construct message list from `prompt` when media is being generated
- Rearranged `PollinationsAI.py` response handling to yield `save_response_media` after checking for non-JSON content types
- Added support in `EdgeTTS.py` to use default values for `language`, `locale`, and `format` from class attributes
- Improved voice selection logic in `EdgeTTS.py` to fall back to the default locale or language when none is provided explicitly
- Updated `EdgeTTS.py` to yield `AudioResponse` with `text` field included
- Modified `Gemini.py` to support `.ogx` audio generation when `model == "gemini-audio"` or `audio` is passed
- Used `format_image_prompt` in `Gemini.py` to create audio prompt and saved audio file using `synthesize`
- Appended `AudioResponse` to `Gemini.py` for audio generation flow
- Added `save()` method to `Image` class in `stubs.py` to support saving `/media/` files locally
- Changed `client/__init__.py` to fall back to `options["text"]` if `alt` is missing in `Images.create`
- Ensured `AudioResponse` in `copy_images.py` includes the `text` (prompt) field
- Added `Annotated` fallback definition in `api/__init__.py` for compatibility with older Python versions
2025-04-19 06:23:46 +02:00
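The `Annotated` fallback mentioned in the last bullet of this commit can be sketched roughly as follows. This is a minimal illustration and not the code from `api/__init__.py`: the stand-in class and the `UserId` alias are hypothetical, and the fallback assumes an interpreter that supports `__class_getitem__` (Python 3.7+).

```python
try:
    # Python 3.9+ ships Annotated in the standard library.
    from typing import Annotated
except ImportError:
    # Hypothetical stand-in for older interpreters: subscripting returns the
    # bare type and silently drops the metadata arguments.
    class Annotated:
        def __class_getitem__(cls, params):
            if isinstance(params, tuple):
                return params[0]
            return params

# Either branch allows the same annotation syntax at module level.
UserId = Annotated[int, "illustrative metadata"]
```

With the stand-in, `UserId` is simply `int`; with the real `typing.Annotated`, the metadata is preserved for tools that read it.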
hlohaus
1296b3f64f refactor: update audio parameter handling in EdgeTTS and stubs
- Remove the unused `language`, `locale`, and `extra_parameters` parameters from the `EdgeTTS` function signature in `g4f/Provider/audio/EdgeTTS.py`.
- Update voice selection logic to check for `"locale"` and `"language"` keys in the `audio` dictionary, defaulting to `cls.default_locale` when neither is provided, and modify the error message accordingly.
- Refactor extraction of extra parameters by building a dict from the `audio` dictionary for keys `"rate"`, `"volume"`, and `"pitch"`.
- In `g4f/api/stubs.py`, remove the try/except block for importing `Annotated` and import `Messages` from `..typing` instead.
- Add an optional `audio: Optional[dict] = None` field to the `ImageGenerationConfig` model.
2025-04-19 03:51:37 +02:00
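The lookup order this commit describes (`"locale"` first, then `"language"`, then the class default) and the extraction of `"rate"`, `"volume"`, and `"pitch"` might look like the sketch below. The function names, the `VOICES` table, and `DEFAULT_LOCALE` are assumptions made for illustration; the real provider keeps its defaults on the class (`cls.default_locale`) and gets its voice list elsewhere.

```python
from typing import Optional

# Assumption: stands in for cls.default_locale on the provider class.
DEFAULT_LOCALE = "en-US"

# Illustrative voice table (locale -> voice names); not the provider's data.
VOICES = {
    "en-US": ["en-US-AriaNeural"],
    "de-DE": ["de-DE-KatjaNeural"],
}

def pick_voice(audio: Optional[dict] = None) -> str:
    """Resolve a voice from the audio options, preferring explicit keys."""
    audio = audio or {}
    if "locale" in audio:
        locale = audio["locale"]
    elif "language" in audio:
        # Take the first locale whose language prefix matches, e.g. "de" -> "de-DE".
        language = audio["language"]
        locale = next(
            (loc for loc in VOICES if loc.split("-")[0] == language), None
        )
    else:
        locale = DEFAULT_LOCALE
    if locale not in VOICES:
        raise ValueError(f"No voice found for audio options: {audio}")
    return VOICES[locale][0]

def extra_parameters(audio: Optional[dict] = None) -> dict:
    """Keep only the tuning keys that were actually supplied."""
    audio = audio or {}
    return {key: audio[key] for key in ("rate", "volume", "pitch") if key in audio}
```

Building the extra parameters from a fixed key list means unrelated keys in `audio` (such as `"locale"`) are never forwarded to the speech backend.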
hlohaus
e83282fc4b feat: add EdgeTTS audio provider and global image→media refactor
- **Docs**
  - `docs/file.md`: update upload instructions to use inline `bucket` content parts instead of `tool_calls/bucket_tool`.
  - `docs/media.md`: add asynchronous audio transcription example, detailed explanation, and notes.

- **New audio provider**
  - Add `g4f/Provider/audio/EdgeTTS.py` implementing Edge Text‑to‑Speech (`EdgeTTS`).
  - Create `g4f/Provider/audio/__init__.py` for provider export.
  - Register provider in `g4f/Provider/__init__.py`.

- **Refactor image → media**
  - Introduce `generated_media/` directory and `get_media_dir()` helper in `g4f/image/copy_images.py`; add `ensure_media_dir()`; keep back‑compat with legacy `generated_images/`.
  - Replace `images_dir` references with `get_media_dir()` across:
    - `g4f/api/__init__.py`
    - `g4f/client/stubs.py`
    - `g4f/gui/server/api.py`
    - `g4f/gui/server/backend_api.py`
    - `g4f/image/copy_images.py`
  - Rename CLI/API config field/flag from `image_provider` to `media_provider` (`g4f/cli.py`, `g4f/api/__init__.py`, `g4f/client/__init__.py`).
  - Extend `g4f/image/__init__.py`
    - add `MEDIA_TYPE_MAP`, `get_extension()`
    - revise `is_allowed_extension()` and `to_input_audio()` to support a wider range of media types.

- **Provider adjustments**
  - `g4f/Provider/ARTA.py`: swap `raise_error()` parameter order.
  - `g4f/Provider/Cloudflare.py`: drop unused `MissingRequirementsError` import; move `get_args_from_nodriver()` inside try; handle `FileNotFoundError`.

- **Core enhancements**
  - `g4f/providers/any_provider.py`: use `default_model` instead of literal `"default"`; broaden model/provider matching; update model list cleanup.
  - `g4f/models.py`: safeguard provider count logic when model name is falsy.
  - `g4f/providers/base_provider.py`: catch `json.JSONDecodeError` when reading auth cache, delete corrupted file.
  - `g4f/providers/response.py`: allow `AudioResponse` to accept extra kwargs.

- **Misc**
  - Remove obsolete `g4f/image.py`.
  - `g4f/Provider/Cloudflare.py`, `g4f/client/types.py`: minor whitespace and import tidy‑ups.
2025-04-19 03:20:57 +02:00
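The auth-cache change in `g4f/providers/base_provider.py` (catch `json.JSONDecodeError`, delete the corrupted file) follows a common pattern that can be sketched as below. The function name `read_auth_cache` and its signature are hypothetical; only the catch-and-delete behaviour comes from the commit message.

```python
import json
from pathlib import Path
from typing import Optional

def read_auth_cache(cache_file: Path) -> Optional[dict]:
    """Return the cached auth data, or None if it is missing or corrupted."""
    if not cache_file.exists():
        return None
    try:
        with cache_file.open("r", encoding="utf-8") as f:
            return json.load(f)
    except json.JSONDecodeError:
        # The file handle is already closed here, so the corrupted cache can
        # be removed; the caller then authenticates again from scratch.
        cache_file.unlink()
        return None
```

Deleting the unreadable file prevents the same parse error from recurring on every subsequent start.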