Team-Connect Engineering Guide · Updated 3 May 2026

Troubleshooting Guide

A practical engineer's guide to debugging voice AI and integration failures — how to identify the right subsystem fast, isolate audio from recognition from TTS from webhook from API problems, and use the right diagnostic tool for each layer of the stack.

Symptom-to-subsystem mapping · Audio + ASR + TTS + Webhook + API · Logs, traces, support tickets

01 How to Debug Voice AI Systematically

The fastest way to fix a voice AI problem is to resist the urge to "just try things" and instead follow a deliberate process. Almost every voice AI failure is at one specific layer of the stack, and the diagnostic effort needed once you know the layer is small. The expensive bit is figuring out which layer.

Step 1: Identify the symptom precisely

"It's not working" is not a symptom. Push for specifics until you can fill in this sentence: "When the user [does X], [Y happens] but [Z was expected]." Common categories:

| Symptom category | What it sounds like | Likely layer |
| --- | --- | --- |
| Audio missing or broken | "I can't hear them" / "They can't hear me" / "Audio is choppy/robotic" | Network, RTP transport, audio codecs, WebRTC ICE |
| Recognition wrong | "AI is misunderstanding what I say" / "It cuts me off" | ASR / voice recognition |
| AI voice wrong | "AI sounds wrong/robotic" / "Mispronouncing names" / "Wrong voice" | TTS |
| AI says wrong things | "AI gave wrong information" / "Booked wrong slot" | LLM / prompt / agent logic |
| Slow / sluggish | "There's a long pause before the AI responds" / "Calls feel laggy" | Latency budget across multiple stages |
| Data missing | "Call didn't show in my CRM" / "Webhook never arrived" | Webhook delivery or API integration |
| Authentication broken | "Getting 401 errors" / "Token rejected" | API authentication |

Step 2: Isolate to a single layer

The voice AI stack has roughly 8 layers, each with its own diagnostic surface:

  1. Network & transport — SIP signalling, RTP media, TCP/UDP, NAT, firewalls.
  2. Audio codec — sample rate, encoding format, bitrate.
  3. Voice activity detection — speech vs silence detection.
  4. ASR (voice recognition) — audio → text.
  5. LLM — text → response text.
  6. TTS — response text → audio.
  7. Webhook delivery — events from us to your endpoint.
  8. API integration — your code calling our API.

Once you know which layer the problem is at, you usually know what tool to reach for: SIP traces for transport issues, audio capture and playback for codec issues, transcript inspection for ASR, raw API request/response logs for webhook and API problems, distributed traces for cross-layer latency.

Step 3: Reproduce in development

Before changing anything, get a reliable repro. If the issue happens "sometimes", figure out the conditions that trigger it: specific phone numbers, specific times of day, specific call lengths, specific browsers, specific accents, specific carriers. A reproducible bug is half-fixed; an intermittent one without conditions is essentially unfixable until you find the pattern.

Step 4: Check the obvious before the obscure

Before diving deep, check the things that go wrong frequently:

  • Did anything change recently? Deploys, config updates, library upgrades, infrastructure changes — "it worked yesterday and not today" usually traces back to a change.
  • Are the credentials right? Wrong API key, expired token, wrong environment (test vs production), key revoked.
  • Are you hitting rate limits? Check 429 frequency in your API logs.
  • Is the provider's status page green? Sometimes it really is them.
  • Is your TLS cert valid? Expired certs cause silent failures of every kind.
  • Is DNS resolving correctly? Check from the actual environment, not your laptop.
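
For the TLS check in particular, a rough sketch of an expiry probe might look like the following. It uses only the Python standard library; the function names are illustrative and not part of any Team-Connect tooling. Note that an already-expired certificate makes the handshake itself fail, which is a diagnostic in its own right.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days from `now` until a certificate's notAfter time.

    `not_after` is in the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  1 00:00:00 2030 GMT". A negative result means expired.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    expires = datetime.fromtimestamp(expires_epoch, tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the leaf certificate's notAfter string.

    Raises ssl.SSLCertVerificationError if the chain is invalid or expired.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Run it from the environment that is actually failing, for example `days_until_expiry(fetch_not_after("your-domain.example.com"), datetime.now(timezone.utc))`, so that DNS and egress rules are exercised at the same time.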

Step 5: Use a structured fix-and-verify loop

When you have a hypothesis, change exactly one thing, verify it works, and roll back if it doesn't. Multiple simultaneous changes mean if it works you don't know which fix mattered, and if it doesn't you've made the system harder to reason about. One change, verify, next change.

The single biggest debugging time-waster: jumping to a fix without isolating the layer. "Voice quality is bad, let me adjust the TTS settings" — when the actual problem is RTP packet loss on the carrier link, which no TTS setting can fix. Spend the first 10 minutes on isolation, even when you think you already know the answer. The diagnostic time pays itself back many times over.

02 Audio Quality Issues

Audio problems are the most common voice AI complaint and the most varied in cause. The pattern of the failure usually points directly at the layer:

Symptom: One-way audio

You can hear them but they can't hear you, or vice versa. Almost always a NAT or firewall issue in the RTP media path.

  • Cause: SIP signalling negotiates one set of IPs and ports for media (RTP); the firewall has not opened those ports or NAT is rewriting addresses incorrectly.
  • Fixes: enable SIP ALG on the firewall (or disable it — both are common, depending on the firewall implementation), open the RTP UDP port range (typically 10000-20000), use STUN/TURN to traverse NAT, or use a session border controller (SBC) to handle NAT centrally.
  • Deep dive: see the SIP trunking and security sections of SIP Protocol Basics.

Symptom: Choppy or stuttering audio

Audio cuts in and out in small fragments. Almost always packet loss or jitter on the network path.

  • Cause: RTP packets are arriving late, out of order, or being dropped. The receiver's jitter buffer cannot smooth it out.
  • Fixes: increase the jitter buffer size if your endpoint allows (trades latency for smoothness), check the network path for congestion, switch to a codec with better packet loss concealment (Opus > G.711), prioritise voice traffic with QoS markings.
  • Deep dive: see Audio Codecs Explained for codec choice; the WebRTC troubleshooting section of WebRTC Integration for browser-side debugging.
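
To put numbers on "packet loss or jitter", you can compute both from a packet capture. The sketch below assumes you have already extracted sequence number, RTP timestamp and arrival time per packet (the `RtpPacket` type is hypothetical), and it ignores sequence wrap-around and reordering for brevity. The jitter estimate follows the running interarrival formula from RFC 3550.

```python
from dataclasses import dataclass


@dataclass
class RtpPacket:
    seq: int          # RTP sequence number
    rtp_ts: int       # RTP timestamp, in clock-rate units
    arrival_s: float  # local arrival time, in seconds


def loss_and_jitter(packets: list[RtpPacket], clock_rate: int = 8000) -> tuple[float, float]:
    """Return (loss_fraction, jitter_ms) for packets listed in arrival order."""
    if len(packets) < 2:
        return 0.0, 0.0

    # Loss: compare distinct sequence numbers seen with the span we expected.
    seqs = {p.seq for p in packets}
    expected = max(seqs) - min(seqs) + 1
    loss = 1.0 - len(seqs) / expected

    # Jitter (RFC 3550): J += (|D| - J) / 16, where D is the change in
    # transit time between consecutive packets, in clock-rate units.
    jitter = 0.0
    prev_transit = None
    for p in packets:
        transit = p.arrival_s * clock_rate - p.rtp_ts
        if prev_transit is not None:
            jitter += (abs(transit - prev_transit) - jitter) / 16.0
        prev_transit = transit

    return loss, jitter / clock_rate * 1000.0
```

The default `clock_rate` of 8000 matches G.711; Opus uses a 48000 Hz RTP clock, so pass that instead when analysing Opus streams.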

Symptom: Robotic or distorted audio

Audio is intelligible but sounds artificial, metallic, or compressed. Usually a codec mismatch or transcoding artefact.

  • Cause: the call is being transcoded between incompatible codecs (e.g. Opus ↔ G.711 ↔ Opus), or the bitrate is too low for the content (16 kbps Opus on music sounds bad).
  • Fixes: negotiate the same codec end-to-end where possible, raise bitrate if the codec supports it, ensure the SDP offer is preferring your target codec.
  • Deep dive: the codec comparison and quality tables in Audio Codecs Explained.

Symptom: No audio at all

Call connects but neither side hears anything. Usually media negotiation failure or a totally blocked media path.

  • Cause: SDP negotiation failed (no common codec), RTP packets blocked entirely, or wrong audio device selected at the endpoint.
  • Fixes: capture the SIP INVITE and 200 OK to verify SDP negotiation, check that your firewall is not blocking all UDP, verify microphone/speaker permissions in the browser for WebRTC.
  • Deep dive: the "How a SIP Call Works" walk-through in SIP Protocol Basics.

Symptom: Echo or feedback

You hear yourself a fraction of a second after speaking, or there's an oscillating howl.

  • Cause: microphone is picking up speaker output (acoustic echo), or there is a network loop (less common). For voice AI specifically, the AI's own TTS leaking into the microphone confuses ASR.
  • Fixes: enable acoustic echo cancellation (AEC) on the endpoint — standard in browser WebRTC, configurable on most SIP phones. For voice AI, AEC is essential for barge-in to work correctly.
  • Deep dive: the security and barge-in sections of TTS Configuration.

Symptom: Volume too quiet or too loud

Audio levels are wrong on one or both sides.

  • Cause: AGC (automatic gain control) misfiring, microphone gain wrong, or codec normalisation differences.
  • Fixes: calibrate input levels at the endpoint, check that AGC is enabled in the browser's getUserMedia constraints for WebRTC calls, normalise audio at the server if necessary.

The first audio diagnostic move: capture a real RTP stream during a problem call and play it back. If the fault is audible in the recording, the audio was already damaged when it reached the capture point and the problem is upstream of you (network, codec, or the far end). If the recording sounds fine, the problem is at the playback endpoint: speaker, headset, browser, or audio routing on the device. This single test roughly halves the search space of an audio investigation.

03 Voice Recognition (ASR) Problems

Voice recognition (ASR) failures fall into a small set of repeatable patterns. The pattern points to the cause; the cause points to the fix.

Symptom: ASR returns wrong words

The transcript says "I'd like to book for two" when the user said "I'd like to book for Tuesday". Misrecognition is a function of audio quality, language model match, and domain vocabulary.

  • Cause 1: audio quality. Heavily compressed audio (G.711 narrowband, 8kHz mu-law) loses the high-frequency information ASR uses to distinguish similar sounds. Background noise drops accuracy further.
  • Cause 2: language mismatch. ASR defaulting to general English while your callers use Glaswegian, Welsh-accented English, or domain jargon.
  • Cause 3: out-of-vocabulary words. Product names, place names, brand names that the model has never seen.
  • Fixes: use the highest-quality codec you can (Opus 16kHz beats G.711 by a wide margin), set the right language code (en-GB, not en), configure custom vocabulary boost for your domain words.
  • Deep dive: the accuracy and improvement sections of Voice Recognition Setup.

Symptom: ASR cuts off mid-sentence

The user says "Could I book a slot for next week, maybe Tuesday or Wednesday" but the transcript stops after "next week". This is endpointing too aggressively.

  • Cause: the endpoint detection silence threshold is too short. The user paused to think, ASR decided they were done, and submitted the partial transcript.
  • Fixes: increase the silence threshold (typically from 500ms to 800-1200ms for natural speech with thinking pauses), enable smarter endpointing models that consider linguistic completeness rather than just silence.
  • Trade-off: longer thresholds make recognition feel slower because the AI waits longer before responding. The right value depends on your use case — appointment booking might need 1000ms, drive-thru order taking might need 600ms.
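
The effect of the silence threshold is easy to see in a toy endpointer. This is a minimal sketch of silence-based endpointing only, assuming you already have a per-frame speech/non-speech decision from VAD; real providers use more sophisticated models, and `find_endpoint` is a hypothetical helper, not a provider API.

```python
from typing import Optional


def find_endpoint(frames: list[bool], frame_ms: int, silence_threshold_ms: int) -> Optional[int]:
    """Return the index of the frame where the utterance is declared finished.

    frames[i] is True when VAD classified frame i as speech. Returns None if
    the silence threshold is never reached after speech has started.
    """
    heard_speech = False
    silence_ms = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= silence_threshold_ms:
                return i
    return None


# 500ms of speech, a 600ms thinking pause, 500ms more speech, then silence.
utterance = [True] * 5 + [False] * 6 + [True] * 5 + [False] * 12
```

With 100ms frames, a 500ms threshold fires inside the thinking pause (frame 9) and truncates the utterance, while a 1000ms threshold waits until frame 25, after the user has genuinely finished.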

Symptom: Long delay before transcript appears

User stops speaking and there's a multi-second gap before the AI responds.

  • Cause 1: batch ASR not streaming. Some providers default to batch transcription — the entire utterance is uploaded after speech ends, transcribed, then returned. This can take 1-3 seconds.
  • Cause 2: endpointing too lazy. Endpoint detection is waiting too long for the user to finish.
  • Cause 3: network round-trip to far region. ASR provider is in US-East and your callers are in UK, adding 100ms+ per request.
  • Fixes: use streaming ASR with interim results, tune endpointing balance, pick a provider region near your callers.

Symptom: Empty transcript on quiet speech

User speaks clearly but ASR returns no transcript at all.

  • Cause 1: VAD threshold too high. Voice activity detection is rejecting the audio as non-speech.
  • Cause 2: input gain too low. The audio level is below ASR's signal floor.
  • Cause 3: wrong audio format. Sending mu-law where the API expects PCM, or 8kHz where it expects 16kHz, can cause silent recognition failure with no error.
  • Fixes: lower the VAD threshold (make it more sensitive), raise input gain, verify audio format matches provider documentation exactly.
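
To check whether input gain is the culprit, measure the level of the audio you are actually sending. The sketch below assumes 16-bit signed little-endian PCM; `rms_dbfs` is an illustrative helper, and what counts as "too quiet" depends on your ASR provider, so treat any cut-off you pick as an assumption to validate.

```python
import math
import struct


def rms_dbfs(pcm16: bytes) -> float:
    """RMS level of 16-bit signed little-endian PCM, in dB relative to full scale.

    0 dBFS is the loudest representable level; silence returns -inf.
    """
    samples = [s for (s,) in struct.iter_unpack("<h", pcm16)]
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square / (32768.0 ** 2))
```

Normal speech is typically well above the noise floor; if your captured audio measures only a few dB above the level of a "silent" stretch of the same call, gain is a likelier cause than VAD settings.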

Symptom: ASR mishears specific names every time

"Mathew" becomes "Matthew" every call; "Macclesfield" becomes "Maxwellfield"; product names become English approximations.

  • Cause: the names are out of the model's vocabulary, or the model has a strong prior toward common spellings.
  • Fixes: use the provider's keyword boost / custom vocabulary feature with your problem words at a high boost weight (for example 10-20; the scale varies by provider). For very rare words, a custom acoustic model fine-tuned on your domain audio is the heavyweight option but rarely needed.
  • Deep dive: the keyword boost and custom vocabulary techniques in Voice Recognition Setup.

The single most useful ASR diagnostic: capture the audio that produced a wrong transcript and play it back to a human. If a human cannot understand it either, the problem is upstream (audio quality, codec, network). If a human can understand it perfectly, the problem is in ASR (language, vocabulary, endpointing). This routes you to the right fix in 30 seconds.

04 TTS and AI Voice Problems

TTS issues are usually one of: wrong words being said, wrong way of saying them, or wrong timing of saying them. Each has its own diagnostic path.

Symptom: TTS mispronouncing names or jargon

"Anthropic" becomes "an tropic"; "Macclesfield" becomes "Mac-cles-field" with the wrong stress; "Claude" becomes "cloud".

  • Cause: the TTS model has never seen these words and is making a phonetic guess.
  • Fixes (in order of effort): try alternative spellings (Anthropic → Anthropik usually works), use SSML <phoneme> with IPA notation for the difficult words, use SSML <sub alias="..."> to substitute pronunciation at speak time, or fine-tune a voice on examples that include your domain vocabulary.
  • Deep dive: the SSML section of TTS Configuration.

Symptom: TTS sounds robotic or unnatural

Audio is intelligible but the pacing, emphasis or intonation feels wrong.

  • Cause 1: text without punctuation. TTS uses punctuation as a major signal for prosody. Run-on sentences with no commas come out flat.
  • Cause 2: chunking mid-sentence. If you stream text to TTS in fragments smaller than a sentence, each fragment is synthesised without context and the joins sound abrupt.
  • Cause 3: wrong voice for the content. A "warm conversational" voice sounds robotic on long technical content; a "professional newscaster" voice sounds robotic on casual chat.
  • Fixes: add proper punctuation in your LLM output, only stream to TTS at sentence boundaries, choose a voice that matches the content tone.
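
Streaming to TTS only at sentence boundaries can be sketched as a small buffer between the LLM token stream and the TTS client. This is a simplified illustration: the regex splits on ., ! or ? followed by whitespace, so abbreviations such as "Dr. Smith" will be split incorrectly, and `sentences_from_tokens` is a hypothetical helper rather than any SDK function.

```python
import re
from typing import Iterable, Iterator

# Split on whitespace that directly follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed LLM tokens and yield one complete sentence at a time."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left when the LLM stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can be sent to TTS immediately, so the first sentence starts synthesising while the LLM is still generating the second.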

Symptom: TTS responds slowly

Long delay between AI deciding what to say and audio actually playing.

  • Cause 1: batch TTS instead of streaming. Generating the full audio file before any plays adds 1-3 seconds.
  • Cause 2: not pipelining LLM → TTS. Waiting for the LLM to finish before sending text to TTS doubles the perceived latency.
  • Cause 3: provider region distant. TTS server in US-East for UK callers adds 100ms round-trip per request.
  • Fixes: use streaming TTS with WebSocket, pipe LLM output to TTS at sentence boundaries, choose a region near your callers, pre-render frequently-used phrases.
  • Deep dive: the streaming setup and voice AI pipeline sections of TTS Configuration.

Symptom: TTS audio glitches at chunk boundaries

Pop, click, or repeated syllable at the join between two streamed audio chunks.

  • Cause: sample rate mismatch between provider output and your decoder, or you're concatenating PCM that includes WAV headers (inserting headers mid-stream breaks decoding).
  • Fixes: verify the audio format your provider sends matches what you're decoding (16kHz vs 24kHz vs 48kHz), use the provider's "headerless" PCM mode for streaming.

Symptom: SSML is being read literally

The TTS audio says "less than break time equals five hundred milliseconds slash greater than" instead of inserting a pause.

  • Cause: the provider doesn't support SSML, or supports it on a different endpoint than you're using. OpenAI TTS does not support SSML; ElevenLabs supports a subset.
  • Fixes: use SSML only with supporting providers, or wrap SSML emission in helper functions that emit empty strings for non-supporting providers and let punctuation handle pacing.
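
The helper-function approach might look like this. The `supports_ssml` flag is an assumption: you would set it per provider from your own configuration. The tags themselves (`<break>` and `<sub alias>`) are standard SSML, but check which subset your provider accepts.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(ms: int, supports_ssml: bool) -> str:
    """Emit an SSML pause, or nothing when the provider would read tags aloud."""
    return f'<break time="{ms}ms"/>' if supports_ssml else ""


def pronounce(written: str, spoken: str, supports_ssml: bool) -> str:
    """Substitute a pronunciation via <sub alias>, or fall back to the respelling."""
    if supports_ssml:
        return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"
    return spoken
```

Falling back to the plain respelling for non-SSML providers is a design choice here: it keeps the pronunciation fix working everywhere at the cost of the written form disappearing from the text sent to TTS.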

Symptom: AI cuts itself off mid-sentence

TTS audio stops abruptly before finishing the response.

  • Cause 1: barge-in misfiring. The system detected user speech (real or false-positive) and stopped TTS.
  • Cause 2: WebSocket disconnect. Provider connection dropped mid-stream.
  • Cause 3: audio buffer underrun. The playback ran out of data before the next chunk arrived.
  • Fixes: tune barge-in sensitivity, make sure AEC is enabled and working so the AI's own TTS is not leaking into the microphone and triggering false positives, add reconnection logic to your TTS WebSocket, increase the playback buffer floor.

The first TTS diagnostic move: capture the input text and the output audio for a problem case. If the input text is wrong (LLM hallucinated, missing punctuation, weird formatting), the problem is upstream of TTS. If the input text is fine but the audio is wrong, the problem is the TTS layer itself: voice selection, SSML, or provider.

05 Latency and Timing Issues

"It's slow" is one of the most common voice AI complaints and one of the most fixable. Almost every "slow" voice AI is a fixable pipeline problem rather than a fundamental limit. Approach it the same way you would profile any slow system: measure each stage, find the dominant cost, optimise that, repeat.

The end-to-end latency budget

For voice AI to feel conversational, the gap from "user stops speaking" to "user hears AI start speaking" should be under 1 second. That budget breaks down approximately:

| Stage | Budget | What it does |
| --- | --- | --- |
| Audio capture & transport | < 50ms | User audio leaves device, arrives at your edge |
| VAD endpointing | 500-800ms | System decides user is done speaking |
| ASR final transcript | 50-200ms | Stream finalises after endpointing detected |
| LLM time-to-first-token | 200-500ms | Model starts generating response |
| TTS time-to-first-audio | 100-200ms | First audio chunk leaves TTS provider |
| Network & playback | < 50ms | Audio reaches user device and plays |

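The endpointing stage dominates the budget. The ranges above sum to roughly 0.9-1.8 seconds, so the 1-second target is only reachable when every stage runs near the bottom of its range. The system has to wait out the full silence threshold before it can treat the user as done, so you can be 800ms in before anything else has happened. Most "slow voice AI" wins come from tuning this stage carefully and pipelining everything that follows.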

How to find the slow stage

Add timestamps to your call logs at every stage transition:

  • user_speech_started_at — first VAD-confirmed audio
  • user_speech_ended_at — VAD endpoint detected
  • asr_final_at — final transcript received from ASR
  • llm_started_at — LLM stream initiated
  • llm_first_token_at — first token received
  • tts_started_at — first text sent to TTS
  • tts_first_audio_at — first audio chunk received from TTS
  • audio_playback_at — audio started playing to user

The deltas between consecutive timestamps are your stage budget breakdown. Look at p50, p95, p99 across many calls — users feel p95+, not the average. The single largest delta is your bottleneck.
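
Turning those timestamps into a breakdown takes only a few lines. The sketch below assumes each call log is a dict of the field names above with values in seconds; it starts from user_speech_ended_at because that is where the user-perceived wait begins. The nearest-rank percentile is one simple definition among several.

```python
import math

STAGES = [
    "user_speech_ended_at",
    "asr_final_at",
    "llm_started_at",
    "llm_first_token_at",
    "tts_started_at",
    "tts_first_audio_at",
    "audio_playback_at",
]


def stage_deltas_ms(call: dict[str, float]) -> dict[str, float]:
    """Milliseconds spent between each pair of consecutive stage timestamps."""
    return {
        f"{a} -> {b}": (call[b] - call[a]) * 1000.0
        for a, b in zip(STAGES, STAGES[1:])
    }


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

For a single call, `max(deltas, key=deltas.get)` names the bottleneck stage; across many calls, collect each stage's deltas into a list and compare `percentile(values, 95)` per stage.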

Common latency patterns and their fixes

| What you see | Likely cause | Fix |
| --- | --- | --- |
| Long gap between user_speech_ended and asr_final (>500ms) | Batch ASR, slow finalisation, distant region | Use streaming ASR; pick closer region |
| Long gap between asr_final and llm_started (>200ms) | Synchronous business logic between stages, slow database read | Pre-fetch context; parallelise database calls |
| Long llm_first_token (>1s) | Cold start, large prompt, distant region | Keep the model warm; reduce prompt size; use streaming; pick closer region |
| Long gap between llm_first_token and tts_started (>300ms) | Waiting for full LLM response before sending to TTS | Pipeline LLM → TTS at sentence boundaries |
| Long tts_first_audio (>400ms) | Batch TTS, distant region, opening fresh WebSocket per turn | Streaming TTS; closer region; reuse connections |
| Long gap from tts_first_audio to audio_playback | Initial buffer too large, decoding slow, output device queue full | Reduce initial buffer; check audio output pipeline |

The latency-saving moves that almost always help

  • Stream everything. Streaming ASR with interim results, streaming LLM responses, streaming TTS. Batch anywhere in the pipeline is a 1-3 second penalty.
  • Pipeline LLM → TTS. Send LLM tokens to TTS at sentence boundaries as they emerge; don't wait for the full response.
  • Pre-render common openers. "Hi, how can I help?" is the same 99% of the time. Render once, play from cache, skip the entire TTS round-trip on the first turn.
  • Pick close regions. Every 1000 km of geographic distance adds ~10ms round-trip. Stack three providers in distant regions and you've added 100ms before any compute.
  • Reuse connections. Opening a fresh WebSocket per turn adds 100-200ms of TLS handshake. Keep connections open across turns where the provider supports it.
  • Tune endpointing per use case. 600ms is fine for transactional flows; 1000ms is needed for thoughtful flows. Don't use one global value.

Most "voice AI is too slow" complaints come down to two specific things: (a) endpointing waiting too long for the user to finish, and (b) batch (non-streaming) TTS. Fix those two and most apparent slowness goes away. Whatever remains requires the full profiling exercise above.

06 Webhook Delivery Issues

Webhook problems are usually invisible until you go looking. Events not arriving means data is silently missing from your CRM, your downstream systems, or your analytics. Build the habit of monitoring delivery rather than waiting for someone to notice gaps.

Symptom: Webhook events not arriving at all

You expected a call.ended event, your handler never received it.

  1. Check sender's delivery log. Most providers (including Team-Connect) show every webhook delivery attempt, response code received, and error if any. If the dashboard shows no attempts, the webhook is not subscribed correctly. If attempts show but they're failing, you've narrowed it.
  2. Check endpoint reachability. Run curl -i https://your-domain.example.com/webhooks/team-connect from a machine outside your network. If it doesn't respond, the URL isn't reachable from the internet.
  3. Check TLS validity. Expired certificate or chain issues cause silent failures. openssl s_client -connect your-domain.example.com:443 -servername your-domain.example.com shows the cert state.
  4. Check that you're returning 2xx fast enough. If your handler takes longer than 5-10 seconds, the sender treats it as failed and may retry briefly then give up.

Deep dive: the testing and security sections of Webhook Integration.

Symptom: HMAC signature verification failures

Your handler receives webhooks but rejects them as having invalid signatures.

  • Most common cause: body parser middleware running before your verification code. The middleware parses JSON, your code re-serialises to verify, formatting differs, signature fails. Fix: capture the raw body bytes before any parsing.
  • Second most common cause: wrong secret. Whitespace in the env var, wrong environment, secret rotated and not updated.
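  • Less common: wrong algorithm (SHA-1 vs SHA-256), wrong encoding (hex vs base64), or a case-sensitive header lookup in your code (HTTP header names are case-insensitive on the wire, but some frameworks normalise them to lowercase).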
  • Diagnostic move: log the received body length, the received signature header value, and your computed signature side by side. The difference is usually obvious once visible.

Deep dive: the HMAC verification section of Webhook Integration.
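
As a minimal sketch of raw-body verification, assuming the sender signs the exact request bytes with HMAC-SHA256 and sends a hex digest (an assumption for illustration; check the sender's documentation for the real algorithm, encoding and header name):

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check a hex HMAC-SHA256 signature against the unparsed request body.

    raw_body must be the bytes exactly as received, captured before any JSON
    middleware runs. The secret is stripped to guard against stray whitespace
    in environment variables.
    """
    expected = hmac.new(
        secret.strip().encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    received = signature_header.strip().lower()
    # compare_digest runs in constant time to avoid leaking match position.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The failure mode described above is easy to reproduce: verify the same signature against a re-serialised copy of the JSON (different spacing or key order) and it fails, even though the parsed content is identical.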

Symptom: Duplicate webhook events

Same event arriving multiple times, causing double-processing.

  • Cause: your handler is not returning 2xx fast enough, or downstream processing is failing and you're returning 5xx accidentally. Sender retries.
  • The fix is idempotent handling, not trying to stop the sender from delivering duplicates. Record the ID of every event you have processed and skip any ID you have already seen; treat duplicate delivery as expected behaviour, not a bug. See section 06 of Webhook Integration.
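
A minimal sketch of an idempotent handler, assuming each event carries a unique "id" field (a hypothetical name; use whatever your sender provides). The in-memory set is for illustration only; in production the seen-ID record would live in a durable store shared by all workers.

```python
from typing import Callable


class IdempotentHandler:
    """Process each event ID at most once, however many times it is delivered."""

    def __init__(self, process: Callable[[dict], None]):
        self._process = process
        self._seen: set[str] = set()

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = event["id"]
        if event_id in self._seen:
            return False  # duplicate delivery: acknowledge and do nothing
        self._process(event)
        # Mark as seen only after success, so a failed attempt can be retried.
        self._seen.add(event_id)
        return True
```

Either way the HTTP handler should return 2xx: a duplicate is a successful delivery from the sender's point of view.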

Symptom: Webhook events arriving out of order

call.ended arrives before call.started.

  • Cause: webhook delivery is generally not order-guaranteed across event types. Network paths differ, retries happen at different times, and some events take longer to compute than others.
  • Fix: design your handler to be order-tolerant. Use the event timestamp, not arrival time, for ordering decisions. If event B arrives before event A and B references A, hold B until A arrives or until a reasonable timeout.

Symptom: Sender keeps retrying despite 200

Your handler returns 200 but the sender's log shows the event as failed.

  • Most likely: you're returning 200 too late, after the sender's timeout. Move heavy work to async queue and return 200 within milliseconds.
  • Less likely: network issue between you and the sender means the 200 isn't reaching them.

The webhook delivery monitor every team should have

Build a simple alert that fires when expected events are missing:

  • Track the gap between successive webhook deliveries.
  • Alert when the gap exceeds your usual maximum (e.g. you usually get an event every 30 seconds; alert if 5 minutes pass with no events).
  • Cross-check against the sender's delivery log periodically — if their dashboard shows attempts you didn't receive, your endpoint has a silent reliability issue.

Without this monitor, you'll learn about webhook outages from your customers. With it, you find them in minutes.
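
The core of such a monitor is small. This sketch keeps the clock outside the class so it can be tested deterministically; `DeliveryGapMonitor` and its method names are illustrative, and wiring `is_stalled` to your alerting system is left out.

```python
from typing import Optional


class DeliveryGapMonitor:
    """Flag a stall when no webhook has arrived for longer than max_gap_s."""

    def __init__(self, max_gap_s: float):
        self.max_gap_s = max_gap_s
        self.last_event_at: Optional[float] = None

    def record_event(self, now: float) -> None:
        """Call from the webhook handler on every delivery."""
        self.last_event_at = now

    def is_stalled(self, now: float) -> bool:
        """Call periodically (e.g. once a minute) from a scheduler."""
        if self.last_event_at is None:
            return False  # nothing received yet; no baseline to compare against
        return now - self.last_event_at > self.max_gap_s
```

With the example figures above you would construct it as `DeliveryGapMonitor(max_gap_s=300)` and pass `time.time()` as `now`. Quiet periods such as nights and weekends may need a larger threshold to avoid false alarms.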

The webhook diagnostic that solves 90% of issues: open ngrok pointing at your handler, configure the sender to send to the ngrok URL, trigger a real event, and watch the request inspector at http://localhost:4040. You see the exact bytes the sender sent, your exact response, and the timing of every step. Most webhook bugs become obvious in 30 seconds with this view.

07 API Integration Problems

API integration failures cluster into a small set of recurring patterns. Most don't need deep investigation — just methodical checking.

Symptom: 401 Unauthorized on every request

  • Auth header malformed. Check it's exactly Authorization: Bearer YOUR_KEY, with one space after Bearer, no quotes around the value, and the key's case preserved (keys are case-sensitive).
  • Empty or undefined env var. Print the first 8 characters of the loaded key (with the rest redacted) to confirm it's not empty.
  • Wrong environment's key. Production keys against test endpoints, or vice versa.
  • Key revoked or rotated. Check the dashboard.
  • Deep dive: the authentication section of API Documentation.

Symptom: 403 Forbidden despite valid auth

  • Wrong scope. Read-only key trying to write, or scope-restricted key accessing a resource outside its allowlist.
  • Account-level feature flag missing. Beta endpoints often require explicit opt-in.
  • Diagnostic: read the error envelope. The code or message usually says exactly which permission is missing.

Symptom: 429 Too Many Requests

  • Cause: exceeding rate limit. Check X-RateLimit-Remaining on responses to see how close you are.
  • Most common root cause: retry storm. You hit a 429 and retried immediately without backoff, hitting the limit again, retrying again, and so on.
  • Fix: implement exponential backoff with jitter, respect Retry-After, self-throttle proactively for known bursty work.
  • Deep dive: the rate limiting section of API Documentation.
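
The delay calculation for exponential backoff with jitter can be sketched as a pure function. It assumes the Retry-After value has already been parsed into seconds (the header can also be an HTTP date, which is not handled here); the base, cap and "full jitter" strategy are illustrative defaults, not provider recommendations.

```python
import random
from typing import Callable, Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    If the server sent Retry-After, respect it. Otherwise pick a random delay
    between 0 and min(cap, base * 2**attempt), so that many clients retrying
    at once do not all hit the API at the same instant.
    """
    if retry_after is not None:
        return max(0.0, retry_after)
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

Combine it with a retry cap: loop over a fixed number of attempts, sleep for `retry_delay(attempt, retry_after)` after each 429 or 5xx, and give up once the attempts run out.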

Symptom: Intermittent 5xx errors

  • Transient downstream issues. With exponential backoff retries, most are absorbed automatically.
  • Persistent 5xx on one endpoint: provider-side bug. Capture request_id values and report.
  • Don't retry forever. Cap retry counts (3-5 attempts) so a persistently broken endpoint doesn't pin your worker.

Symptom: Silent timeouts

  • Cause: no timeout configured. Default for many HTTP libraries is "wait forever".
  • Fix: set explicit timeouts (10-30s for batch jobs, 5-10s for user-facing flows) on every API call.
  • Recovery: after a timeout, you don't know if the request was processed. Use idempotency keys on writes so retry is safe.

Symptom: Pagination losing or duplicating records

  • Cause: using offset pagination on a list that is changing during iteration (records being added or deleted shifts subsequent pages).
  • Fix: switch to cursor pagination if the API supports it (most APIs that expose frequently changing data do). Cursor pagination is stable across writes; offset pagination is not.
  • Deep dive: the pagination section of API Documentation.
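
A cursor-following loop might look like the sketch below. The response shape ("data" and "next_cursor" keys) is a hypothetical example, so substitute the field names from the API you are calling. The HTTP call is injected as `fetch_page` so the loop itself stays testable.

```python
from typing import Callable, Iterator, Optional


def iterate_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record by following cursors until the API returns none.

    fetch_page(cursor) performs one request; pass None for the first page.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        # Tolerate a missing or null list on an empty page.
        yield from page.get("data") or []
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Because the cursor encodes a position in the data rather than a row count, records inserted or deleted mid-iteration do not shift later pages.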

Symptom: Schema drift breaking your code

  • Cause: provider added or removed a field, or returns a different shape on edge cases (e.g. successful empty list returns []; failed empty list returns null).
  • Fix: defensive parsing — use .get('field', default) in Python, optional chaining (?.) in TypeScript, never assume nested fields exist. Wrap parsing in try/except and log full payload on failures.
  • Prevention: pin to a specific API version (header-based or URL-based) so the provider can ship schema changes without breaking you. Subscribe to their changelog.
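
In Python, defensive parsing along these lines could be sketched as follows. The payload fields ("call", "id", "duration_seconds", "tags") are invented for illustration and do not describe a real Team-Connect schema.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def parse_call_summary(payload_text: str) -> Optional[dict]:
    """Extract the fields we need, tolerating missing or null values."""
    try:
        payload = json.loads(payload_text)
        call = payload.get("call") or {}  # handles both missing and null
        return {
            "call_id": call.get("id"),
            "duration_s": float(call.get("duration_seconds") or 0),
            "tags": call.get("tags") or [],  # null and [] both become []
        }
    except (ValueError, TypeError, AttributeError):
        # Log the full payload so the unexpected shape can be inspected later.
        logger.exception("Unparseable payload: %r", payload_text)
        return None
```

The `or` fallbacks treat null and absent fields identically, which covers the "[] on success, null on failure" inconsistency without special cases.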

Symptom: Works on my laptop, fails in CI/production

  • API key not configured in CI secrets / production env.
  • Outbound HTTPS blocked by firewalls or egress rules in production network.
  • DNS resolution failing from the production environment (some private network setups block public DNS).
  • TLS trust chain stale in old container CA bundles — fails on freshly-rotated certs.
  • Diagnostic: curl -v https://api.team-connect.co.uk/v1/health from inside the failing environment confirms or rules out network reachability in seconds.

Always include the request_id in support queries. Every reputable API returns a request_id on responses (often in X-Request-Id or in error envelopes). When you contact support, include it. The provider can look up exactly what their server saw and tell you precisely what went wrong — instead of guessing from your description.

08 Logs, Traces and Monitoring

The single biggest predictor of how fast you can diagnose a voice AI issue is whether you have good observability in place before the issue occurs. Teams that instrument well solve issues in minutes; teams that don't spend hours guessing.

What to log on every API call and webhook

Minimum viable structured log line for any external HTTP interaction:

  • Direction (outbound API call vs inbound webhook)
  • Method and URL (or webhook path)
  • HTTP status code
  • Latency in milliseconds
  • Provider's request_id if returned (essential for support)
  • Internal trace ID linking this log to the user request that triggered it
  • Error type and code on failures (parsed from response envelope)
  • Retry attempt number, if retrying

Use structured logging (JSON, key-value) so you can grep, aggregate and chart these fields.
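
A minimal sketch of such a log line as JSON, with one key per field in the list above. The key names are illustrative; what matters is that they stay consistent across services so queries and dashboards can rely on them.

```python
import json
from typing import Optional


def log_line(
    direction: str,
    method: str,
    url: str,
    status: int,
    latency_ms: float,
    request_id: Optional[str] = None,
    trace_id: Optional[str] = None,
    error_code: Optional[str] = None,
    retry_attempt: int = 0,
) -> str:
    """Build one structured log line for an external HTTP interaction."""
    record = {
        "direction": direction,      # "outbound" API call or "inbound" webhook
        "method": method,
        "url": url,
        "status": status,
        "latency_ms": latency_ms,
        "request_id": request_id,    # provider's ID, needed for support tickets
        "trace_id": trace_id,        # links back to the triggering user request
        "error_code": error_code,
        "retry_attempt": retry_attempt,
    }
    return json.dumps(record, sort_keys=True)
```

For example, `log_line("outbound", "GET", "https://api.team-connect.co.uk/v1/health", 200, 42.5)` produces a single JSON object per line, which most log aggregators can index without extra parsing rules.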

Distributed tracing for voice calls

A single voice call spans many services: telephony provider, your media server, ASR, LLM, TTS, your business APIs, the user's client. Distributed tracing (OpenTelemetry standard) ties them together with a single trace ID per call so you can see the entire timeline.

Minimum spans to instrument:

  • call.session — root span, full duration of the call
  • asr.utterance — one per user utterance, with audio bytes received and final transcript
  • llm.completion — one per LLM round-trip with prompt size, response size, time-to-first-token
  • tts.synthesis — one per TTS request with input length, time-to-first-audio, total duration
  • tool.call — one per agent action (booking, lookup, transfer)
  • webhook.delivery — one per webhook fired downstream

With these in place, "this call felt slow" becomes a one-click investigation rather than a multi-hour fishing expedition.

Metrics that matter

Track these as time-series with p50, p95, p99 percentiles:

| Metric | What it tells you |
| --- | --- |
| API request rate per endpoint | Usage patterns, retry storms |
| API error rate per status code | 4xx = your bugs; 5xx = provider issues |
| API latency per endpoint | Geographic distance, provider performance, your internal slowness |
| Webhook delivery success rate | Endpoint reliability, network reachability |
| Webhook handler latency | Whether you risk timing out the sender |
| Webhook queue depth + DLQ size | Health of async processing pipeline |
| Voice call end-to-end latency (user_speech_ended → ai_audio_started) | The metric users feel; alert above 1.5s p95 |
| ASR final-transcript latency | Recognition stage health |
| TTS time-to-first-audio | TTS stage health; alert above 400ms p95 |
| Call quality MOS (if calculated) | Audio path health |

Alerting thresholds

Default alert triggers worth setting:

  • API error rate > 1% sustained for 5 minutes: something is broken.
  • Webhook DLQ growing: events are failing and will be lost unless they are replayed.
  • Gap between webhook deliveries > 10x normal: deliveries are stalling.
  • p95 voice call latency > 1.5s: users are feeling the slowness.
  • p95 TTS time-to-first-audio > 400ms: TTS stage degraded.
  • Auth failure rate above baseline: key compromised or rotated incorrectly.
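The "sustained for 5 minutes" part matters, because a single bad minute should not page anyone. A minimal sketch of that rule, assuming you already aggregate request and error counts into one-minute buckets (the class name is hypothetical):

```python
from collections import deque


class SustainedErrorRateAlert:
    """Fire only when every one of the last N one-minute buckets breaches the threshold."""

    def __init__(self, threshold=0.01, window_minutes=5):
        self.threshold = threshold
        self.buckets = deque(maxlen=window_minutes)

    def record_minute(self, requests, errors):
        rate = errors / requests if requests else 0.0
        self.buckets.append(rate)
        window_full = len(self.buckets) == self.buckets.maxlen
        return window_full and all(r > self.threshold for r in self.buckets)


alert = SustainedErrorRateAlert()
for minute in range(5):
    firing = alert.record_minute(requests=1000, errors=20)  # 2% error rate
print(firing)  # True once five consecutive minutes have breached 1%
```

Most monitoring systems express the same idea declaratively, as a threshold plus a "for" duration.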

Audit your observability quarterly

Set a calendar reminder every three months to ask: "If a major issue happened right now, what would I check first? Do I have the data to check it?" Add the missing instrumentation before you need it. Teams that do this catch issues early; teams that don't learn the hard way.

Many modern SDKs ship with optional OpenTelemetry instrumentation. Where yours does, turn it on: the Node, Python and Go SDKs from many providers expose a single config flag that traces every API call as a child span of your application's existing trace, which gives you observability with almost no extra code.

09 Common Error Messages Decoded

The cryptic error messages you actually encounter, with their real-world meaning and the fix.

Voice and audio errors

Message | What it actually means | Fix
NotAllowedError: Permission denied | Browser blocked microphone access | User must grant permission; you cannot bypass
NotFoundError: Requested device not found | No audio input device available | Check device list; user may have unplugged microphone
OverconstrainedError | Your getUserMedia constraints are too strict | Relax constraints (sample rate, channel count); fall back gracefully
ICE connection failed / ICE failed | WebRTC could not find a network path through NAT/firewall | Configure TURN servers (UDP+TCP), see WebRTC Integration
SDP negotiation failed / incompatible codec | The two endpoints share no common audio codec | Check SDP offer/answer; ensure both sides support a common codec (Opus or G.711)
RTP timeout / media timeout | No RTP packets received for N seconds during a call | Check firewall RTP ports; check NAT traversal; check carrier path
SIP 408 Request Timeout | SIP signalling didn't receive a response in time | Check connectivity to SIP registrar; check carrier provisioning
SIP 486 Busy Here | Called party is busy | Normal; route to voicemail or callback queue
SIP 503 Service Unavailable | SIP server overloaded or in maintenance | Retry with backoff; check provider status page

ASR errors

Message | What it actually means | Fix
Empty transcript returned | ASR detected no speech in the audio | Check VAD threshold; check input gain; verify audio format
Audio format not supported | Sample rate, encoding or channel count doesn't match what API expects | Verify against provider docs (8kHz mu-law vs 16kHz PCM, etc.)
Language not supported | Language code not available on this provider/model | Check provider's supported language list; use the full BCP 47 tag the provider lists (for example en-GB rather than bare en)
WebSocket disconnected during stream | Streaming ASR connection dropped mid-utterance | Add reconnection logic; check network stability; verify auth doesn't expire mid-stream
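For the "verify audio format" fixes, a quick local check is often faster than reading provider logs. The sketch below inspects a WAV file's header with Python's `wave` module; the expected values (16kHz, mono, 16-bit) are assumptions, so substitute whatever your provider documents. The `wave` module reads PCM WAV only and raises `wave.Error` on mu-law files.

```python
import io
import wave


def check_wav_format(wav_bytes, expected_rate=16000, expected_channels=1, expected_sample_width=2):
    """Return a list of mismatches between a PCM WAV header and the expected format."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {expected_rate}")
        if wav.getnchannels() != expected_channels:
            problems.append(f"channel count is {wav.getnchannels()}, expected {expected_channels}")
        if wav.getsampwidth() != expected_sample_width:
            problems.append(f"sample width is {wav.getsampwidth()} bytes, expected {expected_sample_width}")
    return problems


def make_silent_wav(rate, channels=1, sample_width=2, seconds=0.1):
    """Build a short silent WAV in memory, used here only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (int(rate * seconds) * channels * sample_width))
    return buf.getvalue()


print(check_wav_format(make_silent_wav(8000)))
```

Running the same check on a capture taken just before the ASR request shows whether the mismatch is introduced by your pipeline or by the source.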

TTS errors

Message | What it actually means | Fix
Voice not found | The voice ID you specified doesn't exist or isn't available on your plan | Check voice ID spelling; verify plan includes the voice
Text too long / character limit exceeded | Your input exceeds the provider's per-request limit | Split into chunks at sentence boundaries; submit sequentially
SSML parse error | Your SSML markup is malformed | Validate against the provider's SSML schema; escape special characters
Voice cloning quota exceeded | You've created more cloned voices than your plan allows | Delete unused clones or upgrade plan
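For the character-limit fix, splitting at sentence boundaries can be sketched as below. The regular expression is a simple heuristic (it will split after abbreviations such as "Dr."), and the limit is whatever your provider documents; `chunk_text` is an illustrative name.

```python
import re


def chunk_text(text, max_chars):
    """Group whole sentences into chunks no longer than max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Last resort: a single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


text = "Hello there. How are you today? Your appointment is booked."
for chunk in chunk_text(text, max_chars=35):
    print(chunk)
```

Submit the chunks in order and play them back to back; splitting mid-sentence tends to produce audible prosody breaks, which is why the hard split is only a fallback.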

API errors

Code | What it actually means | Fix
401 Unauthorized | Auth header missing, malformed, or key invalid | Check Authorization header; verify key value
403 Forbidden | Authenticated but key lacks permission | Check scope; check feature flag; check resource ownership
404 Not Found | Resource doesn't exist (or isn't visible to your account) | Verify ID; verify ownership
409 Conflict | Request conflicts with current resource state | Read response for details; usually re-fetch and retry with current state
422 Unprocessable Entity | Request well-formed but validation failed | Read error envelope: param or field shows which input was rejected
429 Too Many Requests | Rate limit hit | Respect Retry-After; back off; self-throttle
500/502/503/504 | Server-side issue (bug, downstream failure, overload) | Retry with backoff; check provider status page; report with request_id if persistent
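The table reduces to a small decision function that a client can consult before retrying. This is a sketch of that mapping only; the action labels are made up for illustration and are not part of any provider's API.

```python
def action_for_status(status: int) -> str:
    """Map an HTTP status code to the handling suggested in the table above."""
    if 200 <= status < 300:
        return "ok"
    if status == 409:
        return "refetch_then_retry"   # state changed underneath you
    if status == 429:
        return "retry_after_backoff"  # honour Retry-After if present
    if status in (401, 403, 404, 422):
        return "fix_request"          # retrying the same request will not help
    if 500 <= status < 600:
        return "retry_after_backoff"
    return "investigate"


for code in (200, 401, 409, 429, 503):
    print(code, action_for_status(code))
```

The important property is that 4xx responses other than 409 and 429 are never retried blindly, which avoids the retry storms mentioned in the metrics section.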

Webhook errors

Symptom in sender's log | What it actually means | Fix
Connection refused | Your endpoint is not listening on that port/URL | Check service is running; check URL spelling
SSL handshake failed | TLS cert expired, hostname mismatch, or chain incomplete | Renew cert; check CN/SAN; ensure full chain served
Connection timeout | Network unreachable from sender to your endpoint | Check firewall, DNS, public reachability with curl from outside
Response timeout / 504 | Endpoint accepted connection but didn't return 2xx in time | Move heavy work to async queue; return 200 within milliseconds
Receiver returned 401 | Your handler rejected the signature | Check raw body capture; check secret value; check timing-safe compare
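The three checks in the last row can be illustrated together. The sketch assumes the sender signs the raw request body with HMAC-SHA256 and sends a hex digest; real providers differ (some prepend a timestamp, some use base64), so treat the scheme and the secret value as assumptions and follow your provider's documentation.

```python
import hashlib
import hmac
import json


def sign(secret: bytes, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 of the exact bytes received (assumed signing scheme)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, raw_body: bytes, received_signature: str) -> bool:
    expected = sign(secret, raw_body)
    # compare_digest runs in constant time, unlike ==, to avoid timing leaks.
    return hmac.compare_digest(expected.encode(), received_signature.encode())


secret = b"whsec_example"  # illustrative value only
raw_body = b'{"event": "call.ended",  "call_id": "call_abc123"}'
signature = sign(secret, raw_body)

print(verify_signature(secret, raw_body, signature))      # raw bytes verify

# Parsing and re-serialising changes whitespace, so the signature no longer matches.
reserialised = json.dumps(json.loads(raw_body)).encode()
print(verify_signature(secret, reserialised, signature))
```

This is why "check raw body capture" comes first: frameworks that parse JSON before your handler runs are the most common cause of spurious 401s.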

10 Contacting Support Effectively

When you have done the diagnostic work and concluded the issue is provider-side, contacting support well is what gets it fixed in hours instead of days. The single biggest factor: include enough detail that support can reproduce or trace your issue without back-and-forth.

The information that gets fast resolution

  • Precise expected vs actual behaviour. "Webhooks are broken" is unhelpful; "I expected a call.ended webhook for call ID call_abc123 at 14:32 BST today, but my handler received nothing" gives support somewhere to start.
  • Exact time with timezone. "This morning" is useless; "2026-05-03 09:14:32 BST" lets support pull logs for that exact moment.
  • request_id values from your logs. Every reputable API returns these on every response. Each one is a thread support can pull on. Include 3-5 from the failing window if you have them.
  • Sample request and response payloads with secrets redacted. Not a description of the request — the actual request. With actual headers (redacting auth values), actual body, actual response, actual status code.
  • Environment. Production or test; which API key (by prefix, not full value); which region.
  • Frequency. Every call? Intermittent (1 in 100)? One-off? This routes the investigation differently.
  • Recent changes on your side. Deploys, config updates, library upgrades, infra changes. "Nothing changed and it just stopped working" is rarely true and often points at the cause once examined.
  • What you've already tried. Saves support repeating obvious checks.
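Redacting by hand is error-prone, so it can help to script it. The helper below is a sketch: the set of sensitive header names is an assumption to extend for your provider, and the 8-character prefix follows the ticket template in this section.

```python
SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie", "set-cookie"}


def key_prefix(api_key: str, visible: int = 8) -> str:
    """Show only the first few characters of a key, as the template asks."""
    return api_key[:visible] + "..."


def redact_headers(headers: dict) -> dict:
    """Return a copy of the headers that is safe to paste into a ticket."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


request_headers = {
    "Authorization": "Bearer sk_live_1234567890abcdef",
    "Content-Type": "application/json",
    "X-Request-Id": "req_001",
}
print(redact_headers(request_headers))
print(key_prefix("sk_live_1234567890abcdef"))
```

Keep non-secret identifiers such as request IDs intact; they are exactly what support needs.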

The good support ticket template

Copy-paste this template for any provider issue

Summary: [one-line description of the issue]

Expected: [what should happen]
Actual: [what is happening]

Reproduction:
- Time of last occurrence: [YYYY-MM-DD HH:MM:SS timezone]
- Frequency: [every call / intermittent X% / one-off]
- Steps to reproduce: [if reliably reproducible]

Environment:
- Production or test: [production/test]
- API key prefix: [first 8 characters]
- Region: [eu-west, us-east, etc.]

Evidence:
- request_id values: [id1, id2, id3]
- call_id / event_id values: [id1, id2]
- Sample request: [redacted curl or HTTP block]
- Sample response: [actual response body]

Recent changes:
- [any recent deploys, config updates, library upgrades]

Already tried:
- [things you've ruled out]

What slows support down

  • Vague descriptions without IDs or timestamps.
  • Screenshots of code instead of pasted text (can't be searched or grep'd).
  • Multiple unrelated issues bundled into one ticket.
  • Asking "is there an outage" when the provider's status page exists.
  • Reposting an unchanged ticket every few hours pinging "any update?". The notification noise actively delays the queue.

When to escalate

Most providers have escalation paths for genuine emergencies: production-down, security incidents, financial impact at scale. Use these channels (typically marked "urgent" or via a phone number) only when the situation truly fits — abusing them dilutes the signal for the next genuine incident.

The tickets that get resolved in hours have one thing in common: they read like a debugging log entry, not a conversation. Headings, bullet points, IDs, timestamps, payloads. The tickets that take days have prose paragraphs, no IDs, and "it's broken" energy. The investment to write a good ticket is 5 minutes of your time and saves multiple back-and-forth round trips. Always worth it.

Troubleshooting FAQs

The questions our customers ask most often when something has gone wrong.

How do I systematically debug a voice AI failure?

Identify the symptom precisely first — is it audio (no sound, choppy, distorted), recognition (wrong words, cut off, slow), TTS (wrong voice, mispronunciation, robotic), data (wrong CRM info, missing webhook), or timing (everything works but feels sluggish). Then isolate to a single layer of the stack: network and SIP transport, audio codec, voice recognition (ASR), LLM, TTS, webhook delivery, or API integration. Each layer has its own diagnostic tools — SIP traces for transport issues, audio capture for codec issues, transcript inspection for ASR, request/response logs for webhooks and API. Once you know which layer, the troubleshooting is dramatically narrower and the fix becomes obvious.

Why is the audio one-way during my voice calls?

One-way audio is almost always a NAT or firewall problem in the RTP path. SIP signalling negotiates one set of IPs and ports; RTP media then flows on different ports that the firewall has not opened, or towards a private address advertised from behind NAT. Common fixes: check SIP ALG on the firewall (disabling it is the more frequent fix because many implementations mangle SIP messages, although some setups do need it enabled), open the RTP port range your provider uses (often 10000-20000 UDP, but confirm in their documentation), use STUN/TURN to discover and traverse NAT, or use a session border controller (SBC) that handles the NAT traversal centrally. For WebRTC the same problem manifests as ICE connection failure, and the same fixes apply, with TURN being the most reliable fallback.

Why is voice recognition failing or returning wrong words?

Three common causes. First, audio quality: if the input is narrowband telephony audio (8kHz G.711 mu-law or A-law), heavily compressed, or noisy, ASR accuracy drops sharply. Capture a sample and play it back; if you cannot easily understand it, the ASR cannot either. Second, language and domain mismatch: your provider may default to general English while your callers use industry vocabulary or strong accents. Configure custom vocabulary and the right language code. Third, sample rate or encoding mismatch: sending 8kHz audio to a model expecting 16kHz, or sending mu-law where the API expects PCM, often fails silently, returning empty or nonsensical transcripts rather than an error. Verify your audio format against the provider's requirements.

Why are my webhooks not arriving?

Five things to check in order. First, your endpoint URL must be HTTPS, publicly reachable, and respond within the sender's timeout (commonly 5-10 seconds). Use curl from outside your network to confirm reachability, or a tunnel such as ngrok to expose a local development handler. Second, signature verification: if your handler wrongly rejects valid signatures, the sender records a 401 and, depending on the provider, either gives up immediately or stops after exhausting its retries. Third, your handler must return 2xx for accepted events; with most senders a 4xx signals a permanent failure, while a 5xx triggers retries with backoff. Fourth, check the sender's webhook log dashboard for delivery attempts; most providers show every delivery, the response code received, and any errors. Fifth, the network path from the sender to your endpoint: intermittent connectivity issues, DNS problems, or expired TLS certificates can cause silent failures.
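The first and third checks usually come down to one pattern: acknowledge quickly, process later. A framework-free sketch of that pattern follows; the function names are hypothetical, and the in-process queue stands in for a durable queue, which a production system would normally use so events survive a restart.

```python
import json
import queue

events = queue.Queue()


def handle_webhook(raw_body: bytes, signature_ok: bool) -> int:
    """Return the HTTP status to send back; do no slow work here."""
    if not signature_ok:
        return 401
    try:
        event = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400
    events.put(event)   # hand off to a background worker
    return 200


def drain(process) -> int:
    """Worker side: process everything currently queued, return the count."""
    handled = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return handled
        process(event)
        handled += 1


status = handle_webhook(b'{"event": "call.ended", "call_id": "call_abc123"}', signature_ok=True)
print(status, drain(lambda event: print("processing", event["event"])))
```

Keeping the handler this thin means its latency is dominated by network time, well inside any reasonable sender timeout.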

Why am I getting 429 Too Many Requests on the API?

You are exceeding the rate limit your plan allows. Check the response headers — X-RateLimit-Limit shows your cap, X-RateLimit-Remaining shows quota left, X-RateLimit-Reset shows when the window resets, and Retry-After tells you how long to wait. Three fixes. First, implement exponential backoff with jitter on 429 responses so you do not retry-storm against the limit. Second, self-throttle proactively by checking remaining quota before bursty operations. Third, if the limit is genuinely too low for your workload, upgrade plan or contact support to discuss a higher cap. Most rate limit hits are caused by missing backoff on retries rather than legitimately exceeding the cap.
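A minimal sketch of the first fix, exponential backoff with full jitter that defers to Retry-After when the server sends it. The base and cap values are illustrative, and only the seconds form of Retry-After is handled here (the header may also carry an HTTP date).

```python
import random


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's instruction wins
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly in [0, ceiling) so clients do not retry in lockstep.
    return rng() * ceiling


for attempt in range(4):
    print(attempt, round(backoff_delay(attempt), 2))
```

Pair this with a maximum attempt count so a persistent 429 eventually surfaces as an error instead of retrying forever.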

Why does my voice AI feel slow even though everything works?

End-to-end latency is the sum of every stage: audio transport, voice activity detection, ASR endpointing, LLM time-to-first-token, TTS time-to-first-audio, and audio playback. The user-perceived gap from when they stop speaking to when the AI starts speaking should be under 1 second for the conversation to feel natural. Profile each stage independently with timestamps in your logs. Common culprits: ASR endpointing too slow (waiting for 800ms or more of silence before finalising the utterance), LLM not streaming (waits for the full response before starting TTS), TTS using batch instead of streaming (full generation before any audio plays), or geographic distance to providers (US-East TTS for UK calls adds roughly 100ms per round trip). Most slow voice AI is a fixable pipeline issue, not a fundamental limit.
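Profiling each stage can be as simple as logging a timestamp at each boundary and subtracting. In this sketch `user_speech_ended` and `ai_audio_started` follow the metric named earlier in this guide; the intermediate event names and the sample timestamps are assumptions.

```python
STAGE_ORDER = [
    "user_speech_ended",
    "asr_final",          # hypothetical: final transcript received
    "llm_first_token",    # hypothetical: first LLM token received
    "tts_first_audio",    # hypothetical: first TTS audio chunk received
    "ai_audio_started",
]


def stage_latencies(timestamps):
    """Per-stage and total latency in ms from event timestamps given in seconds."""
    stages = {
        f"{a} -> {b}": round((timestamps[b] - timestamps[a]) * 1000)
        for a, b in zip(STAGE_ORDER, STAGE_ORDER[1:])
    }
    total = round((timestamps[STAGE_ORDER[-1]] - timestamps[STAGE_ORDER[0]]) * 1000)
    return stages, total


turn = {
    "user_speech_ended": 10.000,
    "asr_final": 10.300,
    "llm_first_token": 10.750,
    "tts_first_audio": 10.950,
    "ai_audio_started": 11.000,
}
stages, total = stage_latencies(turn)
for name, ms in stages.items():
    print(f"{name}: {ms} ms")
print(f"total: {total} ms")
```

The largest single stage is the one to optimise first; in this illustrative turn that would be the LLM's time-to-first-token.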

What does ICE connection failed mean in WebRTC?

ICE (Interactive Connectivity Establishment) is how WebRTC peers discover a network path to each other through NAT and firewalls. ICE failed means none of the candidate pairs worked: the direct path, the STUN-discovered public address, and the TURN relay all failed or were never offered. Common causes: TURN server not configured (so there is no fallback when the direct path fails), expired or incorrect TURN credentials, or a corporate firewall blocking UDP entirely (use TURN over TCP or TLS as fallback). A denied microphone/camera permission can look similar because the call never connects, but it fails earlier, at getUserMedia, with NotAllowedError. The single most reliable fix is configuring TURN servers with both UDP and TCP listeners, ideally including TURN over TLS on port 443, which works through almost any firewall that allows outbound HTTPS.

What information should I include when contacting support?

Include enough detail that support can reproduce or trace your issue without back-and-forth. Minimum: a precise description of expected vs actual behaviour, the exact time (with timezone) the issue occurred, request_id or call_id from your logs, the API endpoint or webhook event involved, sample request and response payloads with secrets redacted, your environment (production or test), how often the issue happens (every time, intermittent, one-off), and any recent changes on your side (new deploys, config updates). The single most important field is the request_id — support can look up exactly what their server saw and tell you precisely what went wrong, instead of guessing from your description.

Continue Reading

Once you've narrowed down which layer the issue is at, dive into the deep guide for that subsystem:

  • Audio Codecs
  • SIP Protocol Basics
  • WebRTC Integration
  • Voice Recognition Setup
  • TTS Configuration
  • Webhook Integration
  • API Documentation