After our recent introduction of caching, I had an interesting conversation with a friend about the way I’d written the code. He argued persuasively that the approach I’d taken had some significant issues, and that it was worth taking the time to address them.
The AzureSpeechGenerator class is doing too much. In violation of the single responsibility principle, it has several distinct responsibilities: it interacts with Azure Cognitive Services, it caches the results, and it plays audio through the speaker.
The caching approach is flawed. Only completed speech audio is cached. Multiple overlapping requests to play the same audio can result in multiple requests to ACS to generate the same speech audio. This is wasteful in both time and cost.
The null return for failed speech generation is non-idiomatic. In normal .NET code, unusual failures are conventionally signalled by throwing an exception, not by returning a sentinel value. We don’t expect the call to ACS to fail, so using null as a sentinel return value is more than a little dodgy. Worse, it discards all the details of the actual error.
To address these issues, we’re going to restructure the speech service subsystem. Fortunately, we can do this without changing the existing ISpeechService interface, so the rest of the application will remain untouched.
Separation of Concerns
Let’s separate out our concerns (or responsibilities) by introducing a new interface, IRenderSpeechService, which abstracts the ability to convert content into speech:
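Something like the following sketch captures the idea; the exact signature, including the Stream return type, is my assumption based on how the interface is used later:

```csharp
using System.IO;
using System.Threading.Tasks;

// Illustrative sketch: a single responsibility - turn text into audio.
// The precise signature here is an assumption, not the final interface.
public interface IRenderSpeechService
{
    // Render the supplied content into speech audio,
    // returning a stream containing the audio data.
    Task<Stream> RenderSpeechAsync(string content);
}
```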
With this interface, we can apply the decorator design pattern and encapsulate all of our required caching functionality in a dedicated type. AzureSpeechService itself can be dramatically simplified, focusing entirely on interacting with Azure. Its core method, RenderSpeechAsync, simplifies down to this:
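A sketch of what that simplified method might look like, assuming the Microsoft.CognitiveServices.Speech SDK; the `_speechConfig` field and the exact error message are illustrative, not the original code:

```csharp
public async Task<Stream> RenderSpeechAsync(string content)
{
    // A null AudioConfig keeps the rendered audio in memory instead of
    // playing it through the default speaker
    using var synthesizer = new SpeechSynthesizer(_speechConfig, audioConfig: null);
    var result = await synthesizer.SpeakTextAsync(content);

    if (result.Reason != ResultReason.SynthesizingAudioCompleted)
    {
        // Throw instead of returning null, preserving the failure details
        throw new RenderSpeechException(
            $"Failed to render speech audio: {result.Reason}");
    }

    return new MemoryStream(result.AudioData);
}
```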
Note the new exception type RenderSpeechException, thrown near the end if something goes wrong. In our simplified SpeechService implementation, we catch that to log any errors:
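The catch might look something like this sketch; the `SayAsync` method name and the `_logger`, `_renderSpeechService`, and `_speechPlayer` members are assumptions about the surrounding class:

```csharp
public async Task SayAsync(string content)
{
    try
    {
        var speech = await _renderSpeechService.RenderSpeechAsync(content);
        await _speechPlayer.PlayAsync(speech);
    }
    catch (RenderSpeechException ex)
    {
        // A rendering failure isn't fatal to the application -
        // log the full details and carry on silently
        _logger.LogError(ex, "Failed to render speech for '{Content}'", content);
    }
}
```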
With a dedicated wrapper class that does just caching, we can focus on doing the job properly.
Our prior implementation only cached the completed results of rendering phrases into speech; we also need to cache in-flight rendering so that we don’t trip over ourselves by doing the same work over and over. This means we need two different stores in our cache:
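In a sketch (field names illustrative), the two stores and their shared lock might be declared like this. Caching raw bytes rather than streams is my choice here: a stream can typically only be read once, while bytes let every consumer receive a fresh MemoryStream:

```csharp
// Completed renderings, ready to hand back immediately
private readonly Dictionary<string, byte[]> _completed
    = new Dictionary<string, byte[]>();

// In-flight renderings, so overlapping requests for the same phrase
// await the work already underway instead of repeating it
private readonly Dictionary<string, TaskCompletionSource<byte[]>> _inflight
    = new Dictionary<string, TaskCompletionSource<byte[]>>();

// Guards both dictionaries so consumers never see partial updates
private readonly object _padlock = new object();
```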
In case you haven’t seen it before, outstanding asynchronous work is handled in .NET by using a TaskCompletionSource. The Task it exposes is awaitable, allowing asynchronous code to use the normal await keyword to obtain the result when it’s available.
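Here’s a minimal, self-contained illustration: one piece of code awaits the Task while another supplies the result later (the 42 and the delay are arbitrary):

```csharp
using System;
using System.Threading.Tasks;

class TcsDemo
{
    static async Task Main()
    {
        var tcs = new TaskCompletionSource<int>();

        // A consumer awaits the (not yet available) result
        var consumer = Task.Run(async () =>
        {
            int value = await tcs.Task;   // completes once SetResult is called
            Console.WriteLine($"Received {value}");
        });

        await Task.Delay(100);            // simulate work happening elsewhere
        tcs.SetResult(42);                // releases the awaiting consumer

        await consumer;
    }
}
```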
With two distinct dictionaries in use, we need to use locks to ensure our consumers never see partial updates.
The core of the cache is in the RenderSpeechAsync method. Here’s how it works.
If the required stream is already present in our cache of completed results, we can return it directly. If not, we look in our cache of pending results.
If we found something in our cache of pending results, we await the provision of a result and then return it. Note that we can’t do the await within the lock - not only does the compiler stop us from doing that, but we also don’t want to keep the lock while we’re waiting.
Since we didn’t find a pending result, we will need to call our nested RenderSpeechService; to avoid others doing the same work over again, we add an entry to our cache of pending results first.
With our cache updated, we call into our nested speech renderer. Once we have a result, we update our TaskCompletionSource, releasing any tasks that were awaiting the result. We also update our caches and then return the result.
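Pulling those steps together, the whole method might look like this sketch; field names such as `_inflight`, `_completed`, and `_padlock` are illustrative, and buffering the rendered stream into a byte array is my assumption so the cached audio can be replayed:

```csharp
public async Task<Stream> RenderSpeechAsync(string content)
{
    TaskCompletionSource<byte[]> tcs;
    bool rendering = false;

    lock (_padlock)
    {
        // Step 1: completed result already cached? Return it directly.
        if (_completed.TryGetValue(content, out var cached))
        {
            return new MemoryStream(cached);
        }

        // Step 2: rendering already in flight? Keep its
        // TaskCompletionSource so we can await it outside the lock.
        if (!_inflight.TryGetValue(content, out tcs))
        {
            // Step 3: nobody is rendering this yet - record that we are,
            // so overlapping requests await us instead of calling Azure.
            tcs = new TaskCompletionSource<byte[]>();
            _inflight[content] = tcs;
            rendering = true;
        }
    }

    if (!rendering)
    {
        // Await the in-flight work (never inside the lock).
        return new MemoryStream(await tcs.Task);
    }

    try
    {
        // Step 4: call the nested renderer to do the real work.
        using var stream = await _renderSpeechService.RenderSpeechAsync(content);
        using var buffer = new MemoryStream();
        await stream.CopyToAsync(buffer);
        var audio = buffer.ToArray();

        lock (_padlock)
        {
            _completed[content] = audio;
            _inflight.Remove(content);
        }

        tcs.SetResult(audio);   // release any tasks awaiting the result
        return new MemoryStream(audio);
    }
    catch (Exception ex)
    {
        lock (_padlock)
        {
            _inflight.Remove(content);
        }
        tcs.SetException(ex);   // propagate the failure to awaiters too
        throw;
    }
}
```

Completing the TaskCompletionSource outside the lock keeps awaiters from resuming while we still hold it; either order works here, but this ordering avoids any chance of contention.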
Code reviews are always valuable - and, assuming you have the time to do so, fixing up code issues is worthwhile too.