Andrej Tozon's blog

In the Attic

Text-To-Speech with Windows 10 IoT Core & UWP on Raspberry Pi Part 2

In my previous post, I wrote about using a Raspberry Pi running Windows 10 IoT Core to provide Text-To-Speech services for my smart home. As I mentioned in that post, I have speakers on two floors, wired to an amplifier that is connected to the Raspberry Pi. The thing is that each speaker is wired to its own audio channel: the ground floor speaker to the left channel and the 1st floor speaker to the right. It may not be what you'd call true stereo when listening to music, but it makes a big difference for speech running through the house. With such wiring, I can target the floor I want to convey the speech through, either exclusively or with a subtle mix, e.g. 100% volume to the ground floor and only 10% to the 1st floor (this is basically how audio balancing works). This lets me cover these use cases and more:

  • Want to call your children, who are playing in their rooms upstairs, to lunch? Use the 1st floor speaker.
  • Somebody at the door? Send the message to the upstairs and ground floor speakers equally.
  • Late at night, kids sleeping? Maybe use the ground floor speakers at full volume and the upstairs speakers at a minimum?
  • Etc...
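The mixing idea above can be sketched as a tiny helper that turns two per-channel volume levels into a single AudioBalance value. This is just an illustration, not part of my sample project; BalanceFor is a hypothetical name, and which floor maps to which channel depends on your own wiring:

```csharp
public static class AudioMix
{
    // Turns two per-channel volume levels (0..1) into a MediaPlayer.AudioBalance
    // value in the range -1 (left channel only) to +1 (right channel only).
    public static double BalanceFor(double leftVolume, double rightVolume)
    {
        var total = leftVolume + rightVolume;
        if (total == 0) return 0; // nothing to play; center is a safe default
        return (rightVolume - leftVolume) / total;
    }
}
```

For the "100% ground floor, 10% upstairs" example, with the ground floor on the left channel, BalanceFor(1.0, 0.1) yields roughly -0.82, i.e. heavily toward the left channel.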

The scenarios are limitless. And the MediaPlayer class from my previous example offers just what I needed to implement this: audio balance. You can simply set the balance prior to playing your audio, like this:

public async Task SayAsync(string text, double balance)
{
    // AudioBalance ranges from -1 (left channel only) to +1 (right channel only).
    speechPlayer.AudioBalance = balance;
    using (var stream = await speechSynthesizer.SynthesizeTextToStreamAsync(text))
    {
        speechPlayer.Source = MediaSource.CreateFromStream(stream, stream.ContentType);
    }
    speechPlayer.Play();
}
This is the code from my previous blog post, with an additional parameter for setting the AudioBalance property prior to playing the synthesized speech.
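With the extra parameter in place, a caller can pick the channel per message. The text strings below are just examples, and the sign of the balance should match your wiring:

```csharp
// Left channel only (-1): in my wiring, the ground floor speaker.
await SayAsync("Dinner is ready!", -1.0);

// Both channels equally (0): e.g. for doorbell announcements.
await SayAsync("Somebody is at the door.", 0);

// Mostly left, with the right channel strongly attenuated: late-night messages.
await SayAsync("Good night!", -0.8);
```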

Playing speech remotely

Of course, the real fun begins when you're able to control your audio player remotely, e.g. from an application or even a web browser. To achieve this, I put together a simple "web server" that runs as a background service on Windows 10 IoT Core. I used the SimpleWebServer from the IoTBlockly code sample as a base for my own implementation of a "poor man's Web API server", trying to simulate controllers, etc. I won't go into that code as it's very hacky (absolutely not production ready, complete, or even tested), but it appears to work OK for my current needs. The full source code used for this blog post is included in my sample project; I'm only listing the part that controls the speech here:

internal class SayController : Controller
{
    private readonly SpeechService _speechService;
    public SayController(SpeechService speechService)
    {
        _speechService = speechService;
    }
    public async Task<WebServerResponse> Get(string text, string floor)
    {
        if (string.IsNullOrEmpty(floor))
        {
            await _speechService.SayAsync(text, 0);
        }
        else if (floor.ToLower() == "up")
        {
            await _speechService.SayAsync(text, -1);
        }
        else
        {
            await _speechService.SayAsync(text, 1);
        }
        return WebServerResponse.CreateOk("OK");
    }
}
The code is pretty much self-explanatory: the first parameter contains the text that should be spoken and the second is the name of the floor it should be spoken on ("up" sets the audio balance to the left channel and "down" to the right).

With such a server set up on the Raspberry Pi (the sample is set to listen on port 8085), it's easy to make my home speak the synthesized text on a specific floor by simply calling its URL:

http://<IP>:8085/say?text=Hello&floor=up
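For completeness, here's how that call could look from a small .NET console client. This is just a sketch; the IP address is a placeholder for the Raspberry Pi's actual address:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class SayClient
{
    static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // Replace 192.168.1.100 with your Raspberry Pi's IP address.
            var url = "http://192.168.1.100:8085/say" +
                      "?text=" + Uri.EscapeDataString("Hello") +
                      "&floor=up";
            var response = await client.GetAsync(url);
            Console.WriteLine(response.StatusCode); // OK when the server replies
        }
    }
}
```

Uri.EscapeDataString takes care of encoding spaces and punctuation in longer messages, so any sentence can be passed as the text parameter.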

Sample source code for this post is available on GitHub.