Building an IVR System with Lync
One huge benefit of Microsoft Lync/Skype for Business that’s often not mentioned is that it’s a software solution. Because of this, there are rich integration and extension opportunities available for developers. One of the key aims of this blog is to demonstrate these opportunities, and I want to continue that with this post.
Traditionally an IVR system was something you commissioned your PBX provider to set up and maintain, often with expensive call-out charges. Can I really show you how to do something better in fewer than 300 lines of code?
Yes! I’m going to show you how to put together a working demonstration of what’s possible, with a DTMF-tone chooser menu, hold music, information look-up for customer self-service, and even conferencing options. It’s sample code and would need modification before you could use it for your company, but the changes are clear and simple.
Finally, all the code is available on GitHub, so you can do this too!
Which API to use?
One of the problems people have with Microsoft Lync / Skype for Business development is knowing which of the multiple APIs to use. I have a guide to choosing the right one, which you can refer to later, but several factors make this choice easy.
We require a highly-available system which can accept multiple incoming calls and process them simultaneously. We need it to be able to receive incoming calls, accept them and then act on them. For this, a Trusted Application using the Unified Communications Managed API (UCMA) is the perfect fit. UCMA can run as a Console Application (as we will in this example), or as a Windows Service (which makes it easier to manage). It can handle many simultaneous calls and will process each one in a separate thread. This gives us the flexibility and power we need to produce a scalable IVR system simply.
I’m going to start with a new Console Application, and add references to Microsoft.Rtc.Collaboration from the UCMA 4.0 SDK.
Getting Established
I’m going to use separate class files to split out the different parts of this solution. The first file I’m going to create is LyncServer.cs (view in GitHub).
This will be responsible for setting up the UCMA application, connecting it into Lync and registering it to receive new incoming calls sent to its SIP address. I’m going to expose two public methods: Start() and Stop(). (there is another one, but it’s for conferencing so I’ll cover it later). I’m also going to create two events so this class can notify when it’s ready and when a new call is received.
Start()
Trusted Applications can now (since Lync 2010) be provisioned automatically, which makes everything a lot easier. First, register the application and an endpoint in Lync using New-CsTrustedApplication and New-CsTrustedApplicationEndpoint (instructions and video). After that you can create a new CollaborationPlatform with the same application ID as the one you used to set up the application in PowerShell, register to receive notification when your endpoint is discovered, and then start it:
public async Task Start()
{
    try
    {
        Console.WriteLine("Starting Collaboration Platform");
        ProvisionedApplicationPlatformSettings settings = new ProvisionedApplicationPlatformSettings(_appUserAgent, _appID);
        _collabPlatform = new CollaborationPlatform(settings);
        _collabPlatform.RegisterForApplicationEndpointSettings(OnNewApplicationEndpointDiscovered);
        await _collabPlatform.StartupAsync();
        Console.WriteLine("Platform Started");
    }
    catch (Exception ex)
    {
        Console.WriteLine("Error establishing collaboration platform: {0}", ex.ToString());
    }
}
I’m using Michael Greenlee’s excellent Async Extension Methods to make this code simple. If you’re not using Lync 2013 or higher you won’t be able to do this, you’ll have to turn each Async method into  Begin..End methods.
OnNewApplicationEndpointDiscovered
UCMA will call this method when it finds an endpoint associated with this application. You can have more than one endpoint associated with an application, but in this project I’m assuming you only have one. We’re only interested in AudioVideo calls for our IVR solution, so that’s the only modality we’re going to register for incoming calls on. Once we’ve done that we establish the newly-discovered endpoint. At this point, we’re ready to go, so we raise the event to signify that we’re ready. (Ignore the CreateConference() method for now, I’ll cover it later.)
private async void OnNewApplicationEndpointDiscovered(object sender, ApplicationEndpointSettingsDiscoveredEventArgs e)
{
    Console.WriteLine(string.Format("New Endpoint {0} discovered", e.ApplicationEndpointSettings.OwnerUri));
    _endpoint = new ApplicationEndpoint(_collabPlatform, e.ApplicationEndpointSettings);
    _endpoint.RegisterForIncomingCall<AudioVideoCall>(OnIncomingCall);
    await _endpoint.EstablishAsync();
    Console.WriteLine("Endpoint established");
    await CreateConference();
    LyncServerReady(this, new EventArgs());
}
Stop()
It’s good to tidy up after yourself, so I’m exposing a Stop() method, which terminates the endpoint and then shuts down the CollaborationPlatform.
Incoming!
At this point, our UCMA application is running. Anyone who calls our application’s endpoint will cause the OnIncomingCall() method to be triggered. Because I want to keep different parts of this project separate (to make it easier to follow), I’m not going to do anything with the incoming call in this class except raise an event about it, and pass the call in the event arguments:
//new incoming audio call
private void OnIncomingCall(object sender, CallReceivedEventArgs<AudioVideoCall> e)
{
    Console.WriteLine(string.Format("Incoming call! Caller: {0}", e.Call.RemoteEndpoint.Uri));
    IncomingCall(this, e);
}
We need a bit of a framework to run this code in, so I’m going to build out the Program.cs (view in GitHub) class in the Console Application. All I’m going to do is instantiate a new LyncServer object, register for the events and start it. Then, I’ll add a ReadLine() to break flow, then stop it. I’m sure you’ve all done something like this before. The poor man’s UI. 🙂
static void Main(string[] args)
{
    AppDomain.CurrentDomain.UnhandledException += UnhandledExceptionTrapper; //quick and dirty error handling!
    _server = new LyncServer();
    _server.LyncServerReady += server_LyncServerReady;
    _server.IncomingCall += server_IncomingCall;
    Task t = _server.Start();
    Console.ReadLine();
    var stopping = _server.Stop();
    stopping.Wait();
    Console.WriteLine("Press any key to exit");
    Console.ReadLine();
}
Give Them Options
If you were now to run this application it would start up, establish itself as a UCMA application and any incoming calls would arrive at server_IncomingCall in Program.cs. Excellent, we’ve abstracted away how the setup process happens and given ourselves a method to use to deal with new incoming calls.
We want to automatically accept all incoming calls. That’s not always the case though – you can imagine applications where you’d want to do some logic first before deciding whether to accept or reject calls. The part of the call we’re really interested in for this project isn’t actually the main Call object (containing stuff about participants and IDs) but the AudioVideoFlow object: you can think of this as the actual voice/video part of the call. Every modality has its own flow object – for instance the InstantMessagingCall has an InstantMessagingFlow which contains the actual IMs that flow between sender and receiver(s). Because this flow object only gets fully established once we accept the call, we register for state changes on it before we accept the call. That means that once it’s active we can proceed:
static void server_IncomingCall(object sender, Microsoft.Rtc.Collaboration.CallReceivedEventArgs<Microsoft.Rtc.Collaboration.AudioVideo.AudioVideoCall> e)
{
    e.Call.AudioVideoFlowConfigurationRequested += Call_AudioVideoFlowConfigurationRequested;
    e.Call.AcceptAsync();
}

static void Call_AudioVideoFlowConfigurationRequested(object sender, AudioVideoFlowConfigurationRequestedEventArgs e)
{
    e.Flow.StateChanged += Flow_StateChanged;
}

static void Flow_StateChanged(object sender, Microsoft.Rtc.Collaboration.MediaFlowStateChangedEventArgs e)
{
    if (e.State == Microsoft.Rtc.Collaboration.MediaFlowState.Active)
    {
        //do something with the flow - it's now active!
    }
}
In this example I want to provide an IVR – an Interactive Voice Response system. That means I need to do some Text-To-Speech and process incoming DTMF tones (they’re the tones your keypad makes when you press the buttons). Luckily for me (and you) this support is all built into UCMA. The only extra thing you need to do is add a reference to System.Speech.
I’m going to use a new class file for the IVR menu, called IVRMenu.cs (view in GitHub). It exposes a single public method, the terribly-named StartWithWelcome. I pass through the AudioVideoFlow object, and also a reference to the LyncServer object (for conferencing, later).
Because a new thread is created for each incoming call, a new instance of IVRMenu will be created. This means I can be lazy and create some class-level variables to store the flow and server objects.
The first thing I want to do is read out a welcome to the user. To do this I use a SpeechSynthesisConnector object. It’s really easy to set this up and connect it to the AudioVideoFlow object:
SpeechSynthesisConnector _speechSynthesisConnector = new SpeechSynthesisConnector();
SpeechAudioFormatInfo audioformat = new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, System.Speech.AudioFormat.AudioChannel.Mono);
_speechSynthesisConnector.AttachFlow(_flow);
_speechSynthesizer = new SpeechSynthesizer();
_speechSynthesizer.SetOutputToAudioStream(_speechSynthesisConnector, audioformat);
Now that the synthesizer is connected to the call flow object, getting it to speak to your caller is just as easy:
_speechSynthesizer.Speak("Hello!");
In this instance I’m using the method Speak() which will speak synchronously – that is to say, the code won’t continue to execute until the speaking is finished. Sometimes that’s a good thing (for instance here, where we want to say hello to the user before we do anything else). However, when it comes to reading out the menu there’s an expectation that the user can “jump ahead” of the application and choose the option if they already know what it is. Therefore, there is also a SpeakAsync() method which doesn’t pause code execution.
Before we read out the menu options though we need to make sure we can respond to the choices the user makes. We need to be able to hear the DTMF tones. To do this we create a new ToneController and attach it to the flow. We also register with the controller’s ToneReceived event so that we know when a key has been pressed:
var toneController = new ToneController(); //this is for the DTMF tones
toneController.AttachFlow(_flow);
toneController.ToneReceived += toneController_ToneReceived;
Now we can go ahead and read out the options, using SpeakAsync() because we know the user might press the tone before we’ve finished speaking:
private void SpeakMenuOptions()
{
    _speechSynthesizer.SpeakAsync("Press 1 to hear the time. Press 2 to join a conference, or press 3 to hear some music.");
}
In this example I’m not adding any code to handle users not doing anything, or pressing a button we’re not dealing with, but hopefully you can see how this would be trivial.
toneController_ToneReceived
Alright, so the user presses a button. Code execution falls through to our ToneReceived method, which contains in its arguments a Tone property, which is the tone the user pressed. The first thing we should do is stop the reading of the menu options (if it’s still going on), with a simple call to SpeakAsyncCancelAll(). Then, a simple switch statement allows us to evaluate e.Tone to see what to do next.
In my IVR example I have three options, designed to showcase three different things you can do with UCMA. I’m going to cover the first one (the time) here, and then talk about the other two (hold music and conferencing) in the next sections.
If the user presses 1 on my IVR, I’m simply going to read out the time:
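To make that concrete, here is a rough sketch of what such a handler could look like. It is an illustration rather than the code in the repository: the class name ToneMenuHandler and the two Action delegates are hypothetical stand-ins for the conferencing and music steps, the default branch is my own addition, and I’m assuming the System.Speech synthesizer referenced above. It also leaves out the step of detaching the hold music before the switch.

```csharp
using System;
using System.Speech.Synthesis;
using Microsoft.Rtc.Collaboration.AudioVideo;

// Hypothetical wrapper class; in the post this logic lives inside IVRMenu.
public class ToneMenuHandler
{
    private readonly SpeechSynthesizer _speechSynthesizer;
    private readonly Action _joinConference; // assumed callback for option 2
    private readonly Action _playHoldMusic;  // assumed callback for option 3

    public ToneMenuHandler(SpeechSynthesizer speechSynthesizer, Action joinConference, Action playHoldMusic)
    {
        _speechSynthesizer = speechSynthesizer;
        _joinConference = joinConference;
        _playHoldMusic = playHoldMusic;
    }

    // Subscribe with: toneController.ToneReceived += handler.OnToneReceived;
    public void OnToneReceived(object sender, ToneControllerEventArgs e)
    {
        // Stop reading the menu if it is still being spoken.
        _speechSynthesizer.SpeakAsyncCancelAll();

        switch (e.Tone)
        {
            case 1:
                _speechSynthesizer.Speak(string.Format("It's {0} on {1}",
                    DateTime.Now.ToShortTimeString(), DateTime.Now.ToLongDateString()));
                break;
            case 2:
                _joinConference();
                break;
            case 3:
                _playHoldMusic();
                break;
            default:
                // Not in the original sample: tell the caller the key was not recognised.
                _speechSynthesizer.SpeakAsync("Sorry, that is not a valid option.");
                break;
        }
    }
}
```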
_speechSynthesizer.Speak(string.Format("It's {0} on {1}", DateTime.Now.ToShortTimeString(), DateTime.Now.ToLongDateString()));
This is super-simple, but that’s why I included it. It’s really a placeholder for real business processes. You can read out to the user whatever information you want here. Maybe you have an order system – why not have them type their order number in, then you can look up the status and read it out to them. That’s just one example – the point is you have unlimited opportunities here to integrate this into your existing back-end systems. The world is your IVR!
There is a whole extra level of magic here, which I don’t have time to go into. UCMA can also do speech recognition, so instead of making them paw at the keypad you can instead have them announce their intent – just like a grown-up IVR you’d pay megabucks for! To find out more about speech recognition, have a look at this MSDN article, or contact me if you’d like me/my company to build one for you!
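As a small illustration of the order-number idea, here is a sketch of how you might collect keyed digits until the caller presses #. None of this is in the sample project: DigitCollector is a hypothetical class, and it assumes tones arrive as the standard DTMF event codes (0 to 9 for the digits, 10 for * and 11 for #). The actual order lookup is left to your own back-end.

```csharp
using System.Text;

// Hypothetical helper: accumulates DTMF digits until '#' is pressed.
public class DigitCollector
{
    private const int PoundTone = 11; // assumed DTMF event code for '#'
    private readonly StringBuilder _digits = new StringBuilder();

    // Returns the completed number when '#' ends a non-empty entry, otherwise null.
    public string Feed(int tone)
    {
        if (tone >= 0 && tone <= 9)
        {
            _digits.Append((char)('0' + tone));
            return null;
        }

        if (tone == PoundTone && _digits.Length > 0)
        {
            string number = _digits.ToString();
            _digits.Clear(); // ready for the next entry
            return number;
        }

        // Ignore '*', the A-D tones and a '#' with nothing entered.
        return null;
    }
}
```

You would call Feed(e.Tone) from the ToneReceived handler and, once it returns a non-null string, look the order up and read the status out with Speak().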
Hold Music
No IVR would be complete without some hold music. Admittedly not many have an explicit option to just listen to it without doing anything else (although sometimes I’m not so sure), but that’s what we’re going to do here. In reality you’d probably be doing something else whilst you inflicted this on your user, but at least this shows you how you’d do it.
The developers who put together UCMA made a clever architecture decision. Although it’s possible to play every user their own WMA file (yes, only WMA files are supported, get over it!) from the beginning, some of the time you don’t actually care where in the file the user joins it. Therefore, you can attach one playing file to many different calls simultaneously. This means you can scale your system much more efficiently. Sometimes of course you want every user to start at the beginning (recorded announcements for instance) in which case you’ll need to treat each case separately, but when you’re just playing hold music, having one single stream of music and adding multiple callers to it works really well.
I’ve created a separate class file to showcase how the hold music is done, called ContinuousMusicPlayer.cs (view on GitHub). It’s a singleton class, because we only ever want one instance of it, which all incoming callers can be attached to.
When the singleton object is created the method CreatePlayer() is called. This creates a new WmaFileSource object (which links to the .wma to play), ‘prepares’ the source so it’s ready to go, then creates a new Player object. In order to ensure that it plays continuously, I subscribe to state-change events, and when it stops, I start it again!
private void CreatePlayer()
{
    var source = new WmaFileSource("music.wma");
    source.EndPrepareSource(source.BeginPrepareSource(MediaSourceOpenMode.Buffered, null, null));
    _player = new Player();
    _player.SetSource(source);
    _player.SetMode(PlayerMode.Manual);
    _player.StateChanged += player_StateChanged;
    _player.Start();
}

void player_StateChanged(object sender, PlayerStateChangedEventArgs e)
{
    if (e.State == PlayerState.Stopped)
    {
        _player.Start();
    }
}
I expose the Player object, because this is what you attach to any call you want to hear the music. So, back in the IVRMenu object, you’ll see that I attach the Player to the AudioVideoFlow object. You’ll notice I had to detach the speech synthesizer first – this is because you can only attach one thing at a time (makes sense really). This also explains why I detach from the ContinuousMusicPlayer just before the switch statement: detaching a flow which isn’t attached won’t cause an error…but if the music was attached and the user then pressed 1 to read out the time, the code would crash when it tried to attach a second thing to the flow.
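The detach-then-attach dance is easier to see in code. This is only a sketch of the idea, not the code from IVRMenu.cs: HoldMusicSwitcher and its two methods are hypothetical names, and I’m assuming the shared Player has already been created and started as described above.

```csharp
using Microsoft.Rtc.Collaboration.AudioVideo;

// Hypothetical helper showing the "one thing attached at a time" rule.
public static class HoldMusicSwitcher
{
    // Move the caller from the speech synthesizer onto the shared hold music.
    public static void SwitchToMusic(AudioVideoFlow flow, SpeechSynthesisConnector speech, Player sharedPlayer)
    {
        speech.DetachFlow();           // free the flow first
        sharedPlayer.AttachFlow(flow); // caller joins the music wherever it currently is
    }

    // Move the caller back so prompts can be spoken again.
    public static void SwitchToSpeech(AudioVideoFlow flow, SpeechSynthesisConnector speech, Player sharedPlayer)
    {
        sharedPlayer.DetachFlow(flow); // per the post, harmless if the flow was not attached
        speech.AttachFlow(flow);
    }
}
```

Because the Player is shared between callers, only the individual flow is attached or detached; the Player itself keeps running for everyone else.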
Always Be Conferencing
My third and final option is a little pointless, but it shows you something else you can do with UCMA. What I do is join anyone who chooses this option into a conference, along with anyone else who also chose that option! A dumping ground for customers, if you will. I think it would be more fun if IVRs did this more often, rather than just cutting you off!
Anyway, earlier I told you to ignore two things: a call to CreateConference() in LyncServer.cs when the application was set up, and a method AddCallerToConference() also in LyncServer.cs. Let’s look at those now.
Similar to the concept of the shared hold music above, I’m creating a new conference when the UCMA application starts up, and then adding each person who wants to be into the same conference.
Creating a new conference is easy enough. There are two types: scheduled conferences and ad-hoc conferences. You can think of the difference between them as being the difference between you setting up a new Lync meeting for a pre-determined time in the future (a scheduled conference) vs using the Meet Now button, or just dragging multiple people into a conversation to force a new conference (an ad-hoc conference). For this code, I’m creating an ad-hoc conference. I do this simply by not specifying any conference scheduling information:
private async Task CreateConference()
{
    _confConversation = new Conversation(_endpoint);
    var options = new ConferenceJoinOptions()
    {
        JoinMode = JoinMode.TrustedParticipant,
        AdHocConferenceAccessLevel = ConferenceAccessLevel.Everyone,
        AdHocConferenceAutomaticLeaderAssignment = AutomaticLeaderAssignment.Everyone
    };
    await _confConversation.ConferenceSession.JoinAsync(options);
    Console.WriteLine("New conference created");
}
The AddCallerToConference() method is called from the IVR menu when the user presses the appropriate button on their keypad. I can reference the AudioVideo part of the conference (the AudioVideoMcuSession) and transfer the caller’s AudioVideoCall right into it. The effect for the end-user is that they seamlessly drop into the conference.
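For reference, a transfer along those lines might be sketched like this. Treat it as an assumption-laden illustration rather than the repository code: ConferenceTransfer and AddCallerToConferenceAsync are hypothetical names, and I’m assuming the BeginTransfer/EndTransfer pair on AudioVideoMcuSession with default McuTransferOptions, so check the signatures against your version of the SDK.

```csharp
using System.Threading.Tasks;
using Microsoft.Rtc.Collaboration;
using Microsoft.Rtc.Collaboration.AudioVideo;

// Hypothetical helper; in the post this logic lives in LyncServer.AddCallerToConference().
public static class ConferenceTransfer
{
    // Transfers the caller's AudioVideoCall into the audio/video MCU of the shared conference.
    public static Task AddCallerToConferenceAsync(Conversation confConversation, AudioVideoCall call)
    {
        AudioVideoMcuSession mcu = confConversation.ConferenceSession.AudioVideoMcuSession;

        // Assumed overload: BeginTransfer(call, options, callback, state).
        return Task.Factory.FromAsync(
            (callback, state) => mcu.BeginTransfer(call, new McuTransferOptions(), callback, state),
            mcu.EndTransfer,
            null);
    }
}
```

The call itself can be reached from the flow the IVR menu already holds, which is why only the conference Conversation needs to be kept in LyncServer.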
Show Me The Code
The code is all on GitHub. It’s a Visual Studio project, and references the UCMA 4.0 SDK. All the previously mentioned task extensions are in a folder – “UCMA Task Extensions”. Program.cs is a good place to start if you want to trace through what’s happening. I haven’t included the music.wma file referenced by ContinuousMusicPlayer.cs – so you’ll have to provide your own cheesy hold music.
Summary
IVRs are always considered hard to do, hard to change and expensive. When you use Microsoft Lync as your PBX replacement and voice solution, suddenly everything gets easier. Using the power of a Lync software solution, you can make powerful IVR applications relatively quickly and easily – often with far more power and usefulness than IVRs have ever had.
Hello,
I have an IVR app that if someone hangs up during speech synthesis… the application crashes with:
Unhandled Exception: System.IO.EndOfStreamException: Attempted to read past the end of the stream.
at Microsoft.Speech.Synthesis.SpeechSynthesizer.SpeakPrompt(Prompt prompt, Boolean async)
I am not using SpeakAsync (another issue), but wondering what I can do to capture this and either do nothing, or perform some other action. I added the “UnhandledExceptionTrapper” and it catches it, but the app still crashes. 🙁
Any help is greatly appreciated!
Thanks,
Paul
Hello,
it looks like that the IVRMenu.cs is missing in the Github Repository.
Can you add this please.
Thanks,
Sven
Hi,
I tried your code, and the code is running.
But when I call the sip address of the application, the OnIncomingCall is not triggered…
Do you know what can be the problem?
Thanks
Alejandro
Hello! Congratulations for your post!
I cannot find IVRMenu.cs file on GitHub! Can you help me?
hi
in this github project, ivrmenu.cs is missing please check
thanks
This is awesome thanks Tom! I downloaded the example from Git Hub, however the project is missing IVRMenu.cs. Would you kindly update?
See broken link here – https://github.com/tomorgan/UCMA-IVR-Demo/blob/master/IVRMenu.cs
Added – my apologies
Hi Tom,
Thank you for this sharing. It provided to find you.
Actually we are using UCMA IVR. It was developed last year.
We have a problem about hearing DTMF tones. Our business units can listen to the UCMA IVR for some customer problems. OK, they can listen to the UCMA IVR but should not hear the DTMF tones. Is there a way to hide tones?
Hi,
We are using the UCMA IVR for a year.
I need an advice about DTMF tones.
Our business unit can listen to the UCMA IVR for some customer problems. But they don’t want to hear DTMF tones because some people can understand the numbers from the tones and it creates a security problem.
Is there a way to hide these tones when they listen to the UCMA IVR?
Hi,
Nice work, but how can I use another port for the application? Is there another constructor in which the port is specified?
Thank you.