I recently migrated our organization to Microsoft Lync Server 2010 and ran into a peculiar problem that I thought would be worth sharing. Our internal users could not transfer calls directly from the Lync 2010 client, which was very disruptive. Here is what happened.
After the user selected the desired person to transfer to (step 1), the call went into transfer state (step 2).
The transfer was then attempted for 30 seconds, after which it failed with the error “Cannot complete the transfer” (step 3). Luckily, the user was given the option to resume the call and try another method of transferring it, for instance the new Call Park feature.
However, when the user tried to resume the call too quickly, he was confronted with yet another message: “An error occurred while trying to take the call off hold” (step 4).
Repeated attempts to resume the call eventually succeeded after a varying number of seconds, and the user was back in the original conversation after losing roughly 45 seconds of precious productivity.
Neither the client’s nor the servers’ Windows event logs nor the standard Lync Server logs showed any signs of an issue, so I was faced with a bit of a mystery.
To learn more about why this happened, I first turned on Lync client logging and then used Snooper.exe from the Lync Server 2010 Resource Kit to analyze the output. I could see a SIP REFER request for the Mediation server being sent to the Front End server. This was followed by a 30-second pause and then the client error “OUTGOING_REFER_TRANSACTION::OnTimerExpire MaxRetransmits for request Done terminating transaction”.
Using the Lync Server Logging Tool I logged MediationServer TF_PROTOCOL messages with a log level of Warning and higher. Analyzing the result with Snooper.exe, two consecutive messages stood out:
1) An outgoing SIP message sent from the external interface of the Mediation server to the upstream ITSP (Interoute) requesting a REFER for the call.
2) An outgoing SIP message sent from the internal interface of the Mediation server to the Front End server with a “408 Request Timeout” error, with reason “Gateway did not respond in a timely manner (timeout).”
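To make the pattern easier to spot in a long trace, here is a minimal sketch of how one might pair each REFER with the 408 that follows it. It assumes the SIP start lines have already been extracted into (timestamp, start line) tuples; that input shape, the `find_refer_timeouts` name, the example host and the timestamps are all illustrative assumptions and not the Snooper or Lync log format.

```python
from typing import List, Optional, Tuple

# Hypothetical, simplified input: (seconds since start of trace, SIP start line).
# Real Lync/Snooper logs contain far more detail and need their own parsing.
Message = Tuple[float, str]


def find_refer_timeouts(messages: List[Message]) -> List[Tuple[float, float]]:
    """Pair each REFER request with the next 408 response that follows it."""
    pairs: List[Tuple[float, float]] = []
    pending_refer: Optional[float] = None
    for timestamp, start_line in messages:
        if start_line.startswith("REFER "):
            # A SIP request line starts with the method name.
            pending_refer = timestamp
        elif start_line.startswith("SIP/2.0 408") and pending_refer is not None:
            # A SIP status line starts with the version, then the status code.
            pairs.append((pending_refer, timestamp))
            pending_refer = None
    return pairs


# Illustrative trace only; host name and timestamps are made up.
trace = [
    (0.0, "REFER sip:gateway.example.invalid SIP/2.0"),
    (34.0, "SIP/2.0 408 Request Timeout"),
]

for sent, failed in find_refer_timeouts(trace):
    print(f"REFER at {sent:.0f}s timed out after {failed - sent:.0f}s")
```

A REFER that is never answered by anything other than a 408 is the signature to look for; a trunk that handled the REFER would answer with a different response.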
Instead of handling the call transfer internally, Lync was asking the ITSP to handle the transfer on their session border controller. Apparently Interoute does not support this, but luckily it’s easy to turn off in the Lync Server Control Panel under Voice Routing –> Trunk Configuration.
Fixing the issue was now a matter of disabling the feature, committing the change and allowing the Replica Replicator Agent Service to do its magic! Keep in mind that the “Enable refer support” setting is enabled by default when creating a new trunk configuration.
Now, this still left the little mystery of not being able to resume the call immediately after the transfer failed. As it turns out, the transfer timeout (OnTimerExpire) on the client is exactly 30 seconds, while the gateway timeout on the Mediation server was 34 seconds when I tested it. Although the client was ready to come up for air and resume the call, the Mediation server still had 4 seconds of breath left in its lungs. Once the SIP REFER timed out on the Mediation server, the call was released and the user was able to resume the call again.
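The arithmetic behind that gap can be written down as a small model. This is only a sketch of the timing described above: the two constants are the values I observed, not documented Lync settings, and the function names are made up for illustration.

```python
# Values observed in this particular environment; treat them as measurements,
# not as documented or configurable Lync constants.
CLIENT_REFER_TIMEOUT_S = 30       # client gives up (OnTimerExpire)
MEDIATION_GATEWAY_TIMEOUT_S = 34  # Mediation server gives up on the gateway


def resume_blocked_window(client_timeout_s: int, mediation_timeout_s: int) -> int:
    """Seconds after the client gives up during which resuming still fails."""
    return max(0, mediation_timeout_s - client_timeout_s)


def can_resume(
    seconds_since_transfer_start: float,
    client_timeout_s: int = CLIENT_REFER_TIMEOUT_S,
    mediation_timeout_s: int = MEDIATION_GATEWAY_TIMEOUT_S,
) -> bool:
    """Resuming works only once both sides have abandoned the REFER."""
    return seconds_since_transfer_start >= max(client_timeout_s, mediation_timeout_s)


print(resume_blocked_window(CLIENT_REFER_TIMEOUT_S, MEDIATION_GATEWAY_TIMEOUT_S))
print(can_resume(31))  # client has timed out, Mediation server has not
print(can_resume(35))  # both have timed out
```

In this model, a resume attempt made between 30 and 34 seconds after the transfer started lands in the blocked window, which matches the "take the call off hold" error in step 4.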
(As a side note to any Microsoft devs stumbling upon this post: it would be nice to have a “Cancel transfer” option to resume the call immediately. The other person may just be leaving the desk to get some coffee, or the agent may realize he has chosen the wrong person from the list to transfer to.)