After our recent introduction of caching, I had an interesting conversation with a friend about the way I’d written the code. He argued persuasively that my approach had some significant issues, and that it was worth taking the time to address them.

The AzureSpeechGenerator class is doing too much. In violation of the single responsibility principle, it has several distinct responsibilities: it interacts with Azure Cognitive Services, caches the results, and plays audio through the speaker.

The caching approach is flawed. Only completed speech audio is cached, so multiple overlapping requests to play the same phrase can each trigger a separate call to ACS to generate identical audio. This is wasteful in both time and cost.

The null return for failed speech generation is non-idiomatic. In normal .NET code, unusual failures are conventionally indicated by throwing an exception, not by returning an error code. We don’t expect the call to ACS to fail, so the use of null as a sentinel return value is more than a little dodgy. Worse, we’re discarding all the details of the actual error.

To address these issues, we’re going to restructure the speech service subsystem. Fortunately, we can do this without changing the existing ISpeechService interface, so the rest of the application will remain untouched.

Separation of Concerns

Let’s separate out our concerns (or responsibilities) by introducing a new interface: IRenderSpeechService. This interface abstracts the ability to convert content into speech:

public interface IRenderSpeechService : IDisposable
{
    Task<Stream> RenderSpeechAsync(string content);
}

With this interface, we can apply the decorator design pattern and encapsulate all of our required caching functionality in a dedicated type, CachingRenderSpeechService.
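If you haven’t used the decorator pattern before, the shape is straightforward: the decorator implements the same interface it wraps, delegating real work to an inner instance. Here’s a hedged sketch of that shape - the constructor and field names are my illustration, not necessarily those of the actual class, whose caching internals appear later in the post:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// From earlier in the post
public interface IRenderSpeechService : IDisposable
{
    Task<Stream> RenderSpeechAsync(string content);
}

// Illustrative skeleton of a caching decorator
public sealed class CachingRenderSpeechService : IRenderSpeechService
{
    // The wrapped renderer that does the real work
    private readonly IRenderSpeechService _renderSpeechService;

    public CachingRenderSpeechService(IRenderSpeechService renderSpeechService)
        => _renderSpeechService = renderSpeechService;

    public Task<Stream> RenderSpeechAsync(string content)
    {
        // Caching logic goes here; on a miss, delegate to the inner renderer
        return _renderSpeechService.RenderSpeechAsync(content);
    }

    public void Dispose() => _renderSpeechService.Dispose();
}
```

Because consumers depend only on IRenderSpeechService, the decorator can be slipped in at composition time without touching any of them.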

Our existing AzureSpeechService can be dramatically simplified, focusing entirely on interacting with Azure. The core method, RenderSpeechAsync, simplifies down to this:

public async Task<Stream> RenderSpeechAsync(string content)
{
    var audioStream = AudioOutputStream.CreatePullStream();
    var audioConfig = AudioConfig.FromStreamOutput(audioStream);

    using var synthesizer
        = new SpeechSynthesizer(_configuration, audioConfig);
    using var result = await synthesizer.SpeakTextAsync(content);

    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
    {
        var stream = new MemoryStream();
        stream.Write(result.AudioData, 0, result.AudioData.Length);
        return stream;
    }

    var details = SpeechSynthesisCancellationDetails.FromResult(result);
    throw new RenderSpeechException(details.ErrorDetails);
}

Note the new exception type RenderSpeechException that is thrown near the end if something goes wrong. In our simplified SpeechService implementation, we catch that exception to log any errors:

public async Task SayAsync(string content)
{
    using var logger = _logger.Action($"Say {content}");
    try
    {
        var speech = await _renderSpeechService.RenderSpeechAsync(content)
            .ConfigureAwait(false);

        _player.Stop();
        speech.Seek(0, SeekOrigin.Begin);
        _player.Stream = speech;
        _player.Play();
    }
    catch(RenderSpeechException ex)
    {
        logger.Failure(ex.Message);
    }
}

Caching Improvements

With a dedicated wrapper class that does just caching, we can focus on doing the job properly.

Our prior implementation only cached the completed results of rendering phrases into speech; we also need to cache in-flight rendering so that we don’t trip up over ourselves by doing the same work over and over. This means we need two different stores in our cache:

// Cache of completed speech rendering
private Dictionary<string, Stream> _completedRenders 
    = new Dictionary<string, Stream>();

// Details of speech rendering that is still under way
private Dictionary<string, TaskCompletionSource<Stream>> _pendingRenders 
    = new Dictionary<string, TaskCompletionSource<Stream>>();

In case you haven’t seen it before, outstanding asynchronous work is represented in .NET by a TaskCompletionSource&lt;T&gt;. Its Task property is awaitable, allowing asynchronous code to use the normal await keyword to obtain the result when it’s available.
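Here’s a tiny standalone illustration, unrelated to the speech code itself, showing a consumer awaiting a result before the producer has supplied it:

```csharp
using System;
using System.Threading.Tasks;

var tcs = new TaskCompletionSource<int>();

// A consumer can start awaiting long before the result exists
var consumer = Task.Run(async () => await tcs.Task);

// When the producer supplies a result, every awaiter is released
tcs.SetResult(42);

Console.WriteLine(await consumer); // prints 42
```

If the work fails instead, SetException propagates the failure to all awaiters in the same way.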

With two distinct dictionaries in use, we need to use locks to ensure our consumers never see partial updates.

The core of the cache is the RenderSpeechAsync method. Here’s how it works:

public async Task<Stream> RenderSpeechAsync(string content)
{
    bool haveSource;
    TaskCompletionSource<Stream>? tcsStream;

    lock(_padlock)
    {
        if (_completedRenders.TryGetValue(content, out var existingStream))
        {
            return existingStream;
        }

If the required stream is already present in our cache of completed results, we can return it directly. If not, we look in our cache of pending results.

        haveSource = _pendingRenders.TryGetValue(content, out tcsStream);
    }

    if (haveSource)
    {
        return await tcsStream!.Task.ConfigureAwait(false);
    }

If we found something in our cache of pending results, we await the provision of a result and then return it. Note that we can’t do the await within the lock - not only does the compiler stop us from doing that, but we also don’t want to keep the lock while we’re waiting.

Since we didn’t find a pending result, we will need to call our nested IRenderSpeechService; to stop others repeating the same work, we first add an entry to our cache of pending results.

    tcsStream = new TaskCompletionSource<Stream>();
    lock(_padlock)
    {
        _pendingRenders[content] = tcsStream;
    }

With our cache updated, we call into our nested speech renderer. Once we have a result, we update our TaskCompletionSource, releasing any tasks that were awaiting the result. We also update our caches and then return the result.

    var stream = await _renderSpeechService.RenderSpeechAsync(content)
        .ConfigureAwait(false);
    tcsStream.SetResult(stream);

    lock(_padlock)
    {
        _completedRenders[content] = stream;
        _pendingRenders.Remove(content);
    }

    return stream;
}
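Putting the pieces together at the composition root might look something like this. This is a hypothetical wiring fragment: the constructor parameter lists (speechConfiguration, player, logger) are illustrative assumptions, and the actual classes may take different dependencies:

```csharp
// Hypothetical wiring - parameter lists are assumptions for illustration
IRenderSpeechService renderer =
    new CachingRenderSpeechService(
        new AzureSpeechService(speechConfiguration));

ISpeechService speechService = new SpeechService(renderer, player, logger);
```

The decorator chain means neither the Azure service nor its consumers need to know that caching is happening at all.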

Conclusion

Code reviews are always valuable - and, when you have the time, acting on the feedback to fix up code issues is worthwhile too.

Prior post in this series:
Caching Speech
Next post in this series:
Caching without Race Conditions
