mirror of https://github.com/AlexxIT/go2rtc.git synced 2025-09-26 20:31:11 +08:00

Alex X 6d37cceb91 Improve readme for wyoming module

2025-04-25 14:52:11 +03:00

Wyoming

This module provides Wyoming Protocol support for building local voice assistants with Home Assistant.

- go2rtc can act as a Wyoming Satellite
- go2rtc can act as a Wyoming External Microphone
- go2rtc can act as a Wyoming External Sound
- any supported audio source with a PCM codec can be used as audio input
- any supported two-way audio source with a PCM codec can be used as audio output
- any desktop/server microphone/speaker can be used as a two-way audio source
  - on any OS via FFmpeg or similar software
  - on Linux via the alsa source
- you can change the behavior using the built-in scripting engine

Typical Voice Pipeline

- Audio stream (MIC)
  - any audio source with PCM codec support (including PCMA/PCMU)
- Voice Activity Detector (VAD)
- Wake Word (WAKE)
  - OpenWakeWord
- Speech-to-Text (STT)
  - Whisper
  - Vosk
- Conversation agent (INTENT)
  - Home Assistant
- Text-to-speech (TTS)
  - Google Translate
  - Piper
- Audio stream (SND)
  - any source with two-way audio (backchannel) and PCM codec support (including PCMA/PCMU)

Thanks to Home Assistant, you can use many different projects for WAKE, STT, INTENT and TTS.

Thanks to go2rtc, you can use many different technologies for MIC and SND.

Configuration

You can optionally specify a WAKE service. go2rtc will then start transmitting audio to Home Assistant only after the wake word is detected. If the WAKE service is not specified or cannot be reached, go2rtc passes all audio to Home Assistant; in that case the WAKE service must be configured in your Voice Assistant pipeline.

You can optionally specify a VAD threshold. go2rtc will then start transmitting audio to the WAKE service only after it detects some audio noise.

Your stream must support audio transmission in a PCM codec (including PCMA/PCMU).

wyoming:
  stream_name_from_streams_section:
    listen: :10700 
    name: "My Satellite"                # optional name
    wake_uri: tcp://192.168.1.23:10400  # optional WAKE service
    vad_threshold: 1                    # optional VAD threshold (from 0.1 to 3.5)
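
Since `name`, `wake_uri` and `vad_threshold` are all marked optional above, a minimal satellite could look like the sketch below. The stream name `my_mic` is hypothetical and is assumed to exist in the streams section. Without `wake_uri`, go2rtc passes all audio to Home Assistant, so the wake word would have to be configured in the Voice Assistant pipeline.

```yaml
wyoming:
  my_mic:           # hypothetical stream name from the streams section
    listen: :10700  # port that Home Assistant connects to
    # name, wake_uri and vad_threshold are omitted (all optional)
```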

In Home Assistant: Settings -> Integrations -> Add -> Wyoming Protocol -> enter the host and port from go2rtc.yaml

Select one or more wake words by repeating the name query parameter:

wake_uri: tcp://192.168.1.23:10400?name=alexa_v0.1&name=hey_jarvis_v0.1&name=hey_mycroft_v0.1&name=hey_rhasspy_v0.1&name=ok_nabu_v0.1
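
For comparison, a sketch of the same setting with a single wake word, placed in the full satellite config. The address and the model name are taken from the examples above; only the combination is illustrative.

```yaml
wyoming:
  stream_name_from_streams_section:
    listen: :10700
    wake_uri: tcp://192.168.1.23:10400?name=ok_nabu_v0.1  # a single wake word
```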

Events

You can add Wyoming event handling using the expr language, for example to play TTS on a Home Assistant media player.

Turn on logging to see which events occur.

This is what the default scripts look like:

wyoming:
  script_example:
    event:
      run-satellite: Detect()
      pause-satellite: Stop()
      voice-stopped: Pause()
      audio-stop: PlayAudio() && WriteEvent("played") && Detect()
      error: Detect()
      internal-run: WriteEvent("run-pipeline", '{"start_stage":"wake","end_stage":"tts"}') && Stream()
      internal-detection: WriteEvent("run-pipeline", '{"start_stage":"asr","end_stage":"tts"}') && Stream()

Supported functions and variables:

- Detect() - start the VAD and WAKE word detection process
- Stream() - start transmitting audio data to the client (Home Assistant)
- Stop() - stop and disconnect the stream without disconnecting the client (Home Assistant)
- Pause() - temporarily pause audio transfer without disconnecting the stream
- PlayAudio() - play the last audio that was sent by the client (Home Assistant)
- WriteEvent(type, data) - send an event to the client (Home Assistant)
- Sleep(duration) - temporarily pause the script (e.g. Sleep('1.5s'))
- PlayFile(path) - play audio from a wav file
- Type - type (name) of the event
- Data - event data in JSON format (e.g. {"text":"how are you"})
- other functions from the expr module are also available (e.g. fetch)

If you write a script for an event, the default action is no longer executed, so you need to repeat the necessary steps yourself.
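
As a minimal sketch of that rule: the default script for the error event is Detect(). The custom handler below waits for one second (an arbitrary value chosen for illustration) and then has to call Detect() itself, because the default action no longer runs once a script is defined for the event.

```yaml
wyoming:
  script_example:
    event:
      # replaces the default "error: Detect()", so Detect() is repeated here
      error: Sleep('1s') && Detect()
```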

In addition to the standard events, there are two internal events:

- internal-run - called after Detect() when the VAD detects audio but the WAKE service is unavailable
- internal-detection - called after Detect() when the wake word is detected

Example 1. You want to play a sound file when a wake word is detected (only wav is supported):

The PlayFile and PlayAudio functions run synchronously: the following steps are executed only after they complete.

wyoming:
  script_example:
    event:
      internal-detection: PlayFile('/media/beep.wav') && WriteEvent("run-pipeline", '{"start_stage":"asr","end_stage":"tts"}') && Stream()

Example 2. You want to play TTS on a Home Assistant media player:

Each event has a Type and Data in JSON format. You can use their values in scripts.

- in the synthesize step, we read the text value and call the HA REST API
- in the audio-stop step, we read the duration of the TTS in seconds, wait that long, and start the pipeline again

wyoming:
  script_example:
    event:
      synthesize: |
        let text = fromJSON(Data).text;
        let token = 'eyJhbGci...';
        fetch('http://localhost:8123/api/services/tts/speak', {
          method: 'POST',
          headers: {'Authorization': 'Bearer '+token,'Content-Type': 'application/json'},
          body: toJSON({
            entity_id: 'tts.google_translate_com',
            media_player_entity_id: 'media_player.google_nest',
            message: text,
            language: 'en',
          }),
        }).ok
      audio-stop: |
        let timestamp = fromJSON(Data).timestamp;
        let delay = string(timestamp)+'s';
        Sleep(delay) && WriteEvent("played") && Detect()

Config examples

Satellite on a Windows server using FFmpeg and FFplay.

streams:
  satellite_win:
    - exec:ffmpeg -hide_banner -f dshow -i "audio=Microphone (High Definition Audio Device)" -c pcm_s16le -ar 16000 -ac 1 -f wav -
    - exec:ffplay -hide_banner -nodisp -probesize 32 -f s16le -ar 22050 -#backchannel=1#audio=s16le/22050

wyoming:
  satellite_win:
    listen: :10700
    name: "Windows Satellite"
    wake_uri: tcp://192.168.1.23:10400
    vad_threshold: 1

Satellite on a Dahua camera with two-way audio support.

streams:
  dahua_camera:
    - rtsp://admin:password@192.168.1.123/cam/realmonitor?channel=1&subtype=1&unicast=true&proto=Onvif

wyoming:
  dahua_camera:
    listen: :10700
    name: "Dahua Satellite"
    wake_uri: tcp://192.168.1.23:10400
    vad_threshold: 1

Satellite on an external Wyoming Microphone and Sound.

streams:
  wyoming_external:
    - wyoming://192.168.1.23:10600                # wyoming-mic-external
    - wyoming://192.168.1.23:10601?backchannel=1  # wyoming-snd-external

wyoming:
  wyoming_external:
    listen: :10700
    name: "Wyoming Satellite"
    wake_uri: tcp://192.168.1.23:10400
    vad_threshold: 1

Wyoming External Microphone and Sound

Advanced users who want to use the Wyoming Satellite project can run go2rtc as a Wyoming External Microphone or a Wyoming External Sound.

go2rtc.yaml

streams:
  wyoming_mic_external:
    - exec:ffmpeg -hide_banner -f dshow -i "audio=Microphone (High Definition Audio Device)" -c pcm_s16le -ar 16000 -ac 1 -f wav -
  wyoming_snd_external:
    - exec:ffplay -hide_banner -nodisp -probesize 32 -f s16le -ar 22050 -#backchannel=1#audio=s16le/22050

wyoming:
  wyoming_mic_external:
    listen: :10600
    mode: mic
  wyoming_snd_external:
    listen: :10601
    mode: snd

docker-compose.yml

version: "3.8"
services:
  satellite:
    build: wyoming-satellite  # https://github.com/rhasspy/wyoming-satellite
    ports:
      - "10700:10700"
    command:
      - "--name"
      - "my satellite"
      - "--mic-uri"
      - "tcp://192.168.1.23:10600"
      - "--snd-uri"
      - "tcp://192.168.1.23:10601"
      - "--debug"

Wyoming External Source