Speech API

For the WordTutor application to work, we need to be able to read words (and letters) out loud to our student. To power the speech synthesis, we’re going to integrate Azure Cognitive Services into the application.

Azure Speech API

Setting up access to the Speech API within Azure Cognitive Services is relatively straightforward. Rather than repeating the details here, I’ll just point you to the quickstart. I’m using the free tier (“F0”), allowing for up to 5 hours of speech rendering per month.

When you’re finished creating your service, take note of the region you used (mine was australiaeast) and one of the API keys generated for you.

Secret Management

We don’t want to embed any secrets directly in the application, nor commit them to our git repository.

Fortunately, we can easily move those secrets out of the application, using a couple of NuGet packages:

Microsoft.Extensions.Configuration provides basic infrastructure; and
Microsoft.Extensions.Configuration.UserSecrets allows us to keep our secrets outside the project directory during development.

There are other packages available as well, allowing configuration to be stored in other places.

With those packages installed, we can use the dotnet command to store our secrets.

For the entry-point project of our application we need a one-time initialization:

PS> dotnet user-secrets init

Set UserSecretsId to 'ab66dbe0-608d-4ac8-84a4-cb083baac56b'
for MSBuild project '...\WordTutor.Azure.csproj'.

If you’re doing this for yourself, don’t make the mistake I just showed above! Secrets need to be configured for the entry-point project of the application, so I had to redo this step for the WordTutor.Desktop project.

Once completed, you’ll find a <UserSecretsId> element has been added to the .csproj file of your project, in the first <PropertyGroup>:

<PropertyGroup>
    <OutputType>WinExe</OutputType>
    <TargetFramework>netcoreapp3.0</TargetFramework>
    ...
    <UserSecretsId>ab66dbe0-608d-4ac8-84a4-cb083baac56b</UserSecretsId>
</PropertyGroup>

To store our two secrets, we again use the dotnet command:

PS> dotnet user-secrets set "WordTutor:SpeechApiKey" "yoursupersecretapikeygoeshere"
Successfully saved WordTutor:SpeechApiKey = yoursupersecretapikeygoeshere to the secret store.

PS> dotnet user-secrets set "WordTutor:SpeechApiRegion" "australiaeast"
Successfully saved WordTutor:SpeechApiRegion = australiaeast to the secret store.

These secrets are stored in your user profile on this PC. Navigate to the folder %USERPROFILE%\AppData\Roaming\Microsoft\UserSecrets to find a folder whose name matches the <UserSecretsId> from above; inside is secrets.json, containing your secrets. It’s worth emphasizing that there’s no encryption here; the goal is to keep the secrets out of your git repo, not to hide them from you.
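As a small illustration, the folder described above can be computed in code. This is only a sketch for Windows: the SecretsLocation class and its method are hypothetical names, and it relies on Environment.SpecialFolder.ApplicationData resolving to the roaming AppData folder.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of WordTutor.
public static class SecretsLocation
{
    // Returns the per-project secrets folder for the given UserSecretsId (Windows layout).
    public static string GetSecretsFolder(string userSecretsId)
    {
        if (string.IsNullOrWhiteSpace(userSecretsId))
        {
            throw new ArgumentException("A UserSecretsId is required.", nameof(userSecretsId));
        }

        // On Windows this is %USERPROFILE%\AppData\Roaming.
        var appData = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
        return Path.Combine(appData, "Microsoft", "UserSecrets", userSecretsId);
    }
}
```

Calling SecretsLocation.GetSecretsFolder with the GUID from the .csproj file gives the folder to open when you want to inspect or delete the stored values by hand.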

Speech service

We need to declare a service interface for our application, representing the speech service in a technology-agnostic way:

public interface ISpeechService : IDisposable
{
    Task SayAsync(string content);
}

Using this interface will allow us to wire up a fake service for testing purposes, allowing us to verify correct behaviour without actually calling into the Azure implementation and incurring costs.
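A fake along those lines could be as simple as the sketch below. FakeSpeechService is a hypothetical name, and recording the requested phrases in a list is just one way a test might check what was said; the interface is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Repeated from above so this sketch is self-contained.
public interface ISpeechService : IDisposable
{
    Task SayAsync(string content);
}

// Hypothetical test double: records each phrase instead of calling Azure.
public sealed class FakeSpeechService : ISpeechService
{
    private readonly List<string> _spoken = new List<string>();

    // Everything the application asked the service to say, in order.
    public IReadOnlyList<string> Spoken => _spoken;

    public bool IsDisposed { get; private set; }

    public Task SayAsync(string content)
    {
        if (content is null)
        {
            throw new ArgumentNullException(nameof(content));
        }

        if (IsDisposed)
        {
            throw new ObjectDisposedException(nameof(FakeSpeechService));
        }

        _spoken.Add(content);
        return Task.CompletedTask;
    }

    public void Dispose() => IsDisposed = true;
}
```

A test can then hand the fake to whatever it is exercising and assert on Spoken afterwards, with no network call and no charge against the monthly quota.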

Our primary implementation of ISpeechService will be AzureSpeechService. To keep our dependencies properly isolated, this lives in a new project WordTutor.Azure. Anything else Azure related we choose to add in the future will live here too.

public class AzureSpeechService : ISpeechService
{
    private readonly IConfigurationRoot _configurationRoot;
    private readonly ILogger _logger;
    private readonly SpeechConfig _configuration;

    public AzureSpeechService(
        IConfigurationRoot configuration, 
        ILogger logger)
    {
        // elided
    }

    public async Task SayAsync(string content)
    {
        // elided
    }
}

The IConfigurationRoot parameter for the constructor is how we retrieve the user secrets we stashed away earlier. In full, the constructor looks like this:

public AzureSpeechService(IConfigurationRoot configuration, ILogger logger)
{
    _configurationRoot = configuration 
        ?? throw new ArgumentNullException(nameof(configuration));
    _logger = logger
        ?? throw new ArgumentNullException(nameof(logger));

    var apiKey = _configurationRoot["WordTutor:SpeechApiKey"];
    var apiRegion = _configurationRoot["WordTutor:SpeechApiRegion"];

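    // Avoid writing the key itself to the log; record only whether it was found.
    _logger.Debug($"ApiKey found: {!string.IsNullOrEmpty(apiKey)}");
    _logger.Debug($"ApiRegion: {apiRegion}");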

    _configuration = SpeechConfig.FromSubscription(apiKey, apiRegion);
}
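The body of SayAsync is elided above. As a rough sketch only, assuming the Microsoft.CognitiveServices.Speech package and playback through the default speaker, it might look something like this. The SpeechSketch class, the static method shape and the decision to throw on cancellation are illustrative assumptions rather than the actual WordTutor code.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

// Hypothetical standalone helper, shown instead of the real AzureSpeechService method.
public static class SpeechSketch
{
    public static async Task SayAsync(SpeechConfig config, string content)
    {
        if (config is null) throw new ArgumentNullException(nameof(config));
        if (content is null) throw new ArgumentNullException(nameof(content));

        // With no explicit audio configuration, output goes to the default speaker.
        using var synthesizer = new SpeechSynthesizer(config);
        using var result = await synthesizer.SpeakTextAsync(content);

        if (result.Reason == ResultReason.Canceled)
        {
            // Cancellation details carry the reason (for example a bad key or region).
            var details = SpeechSynthesisCancellationDetails.FromResult(result);
            throw new InvalidOperationException(
                $"Speech synthesis was cancelled: {details.Reason}; {details.ErrorDetails}");
        }
    }
}
```

Creating a new synthesizer per call keeps the sketch short; a real service would more likely create one and reuse it.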

SimpleInjector

To make our ISpeechService available for consumption, we need to register it with our dependency injection container.

// Register Services
container.RegisterSingleton<ISpeechService, AzureSpeechService>();

We also need a singleton registration for IConfigurationRoot, which we first have to create using a ConfigurationBuilder:

// Register Configuration
var builder = new ConfigurationBuilder();
builder.AddUserSecrets<Program>();
container.RegisterInstance<IConfigurationRoot>(builder.Build());

The generic parameter Program provided to the AddUserSecrets() call is used to identify the entry-point assembly. The <UserSecretsId> element from the .csproj file turns into an assembly-level attribute, allowing the running application to find the secrets it needs.
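To see that attribute for yourself, a few lines of reflection will do. This sketch assumes the Microsoft.Extensions.Configuration.UserSecrets package is referenced; UserSecretsProbe is a hypothetical helper name.

```csharp
using System;
using System.Reflection;
using Microsoft.Extensions.Configuration.UserSecrets;

// Hypothetical diagnostic helper, not part of WordTutor.
public static class UserSecretsProbe
{
    // Returns the UserSecretsId baked into the assembly, or null if it has none.
    public static string FindUserSecretsId(Assembly assembly)
    {
        if (assembly is null)
        {
            throw new ArgumentNullException(nameof(assembly));
        }

        var attribute = assembly.GetCustomAttribute<UserSecretsIdAttribute>();
        return attribute?.UserSecretsId;
    }
}
```

Calling UserSecretsProbe.FindUserSecretsId(typeof(Program).Assembly) should return the same GUID that appears in the .csproj file; a null result suggests the secrets were initialized for the wrong project.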

Consumption

Finally, for demo purposes, we can inject ISpeechService into our main window, hook it up to a button and make everything work.

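// Field backing the injected service (implied by the original snippet).
private readonly ISpeechService speechService;

public WordTutorWindow(
    ViewModelToViewValueConverter converter,
    ISpeechService speechService)
{
    Resources.Add("ViewModelToViewValueConverter", converter);

    InitializeComponent();
    this.speechService = speechService
        ?? throw new ArgumentNullException(nameof(speechService));
}

// async void is acceptable for a UI event handler; awaiting the call
// means a failure surfaces here instead of being silently dropped.
private async void Button_Click(object sender, RoutedEventArgs e)
{
    await speechService.SayAsync("Hello World.");
}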

Having (literally!) achieved “Hello World”, we need to make a number of improvements. Most notably, there is currently a noticeable lag before speech begins, and we re-render the same text as audio every time we want to speak. Some caching, and some pre-caching, seems to be in order.
