Team-Connect Engineering Guide · Updated 3 May 2026

SIP Protocol Basics

A practical engineer's guide to Session Initiation Protocol — how SIP signalling sets up VoIP calls, what every request method and response code means, how SIP trunking works, and where SIP fits next to WebRTC in 2026.

RFC 3261 compliant · Covers UDP, TCP & TLS transports · Includes SIP trunking + WebRTC

01 What is SIP?

SIP (Session Initiation Protocol) is a text-based application-layer signalling protocol used to set up, modify and terminate real-time communication sessions over IP networks. In plain English: SIP is what tells two endpoints — phones, soft clients, voice AI systems, conferencing servers — that they want to talk to each other, on what terms, and when to hang up. It is defined in RFC 3261 (June 2002), with dozens of subsequent RFCs extending it.

SIP is the signalling, not the audio

One of the most common misunderstandings about SIP is that it carries the voice. It does not. SIP only carries the negotiation: who is calling whom, what codecs are supported, where to send the media, when to terminate. Once the call is set up, the actual audio (or video) flows over a separate protocol called RTP (Real-time Transport Protocol), usually on dynamically negotiated UDP ports. Think of SIP as the call-setup protocol and RTP as the call itself.

A simple analogy: SIP is to VoIP what HTTP is to the web

HTTP is text-based, request-and-response, stateless, and every modern web framework speaks it. SIP is text-based, request-and-response, has its own URI scheme (sip: and sips:), and every VoIP system speaks it. The similarity is not accidental — SIP was deliberately modelled on HTTP and SMTP so that it would feel familiar to engineers and be easy to debug by reading the wire.

Where SIP sits in the VoIP stack

  • SIP — signalling: setup, modification, teardown
  • SDP (Session Description Protocol) — carried inside SIP messages, describes media (codecs, IP addresses, ports)
  • RTP / RTCP — the actual audio and video packets, plus control statistics
  • SRTP / DTLS-SRTP — encrypted variants of RTP for secure media
  • STUN / TURN / ICE — NAT traversal helpers (mandatory in WebRTC, optional but common in SIP)

A typical voice call uses several of these protocols at once: SIP carries the signalling; SDP inside SIP describes the codec and media endpoints; RTP carries the actual voice packets. SIP only sees the start and end of the call — the media flow in the middle bypasses SIP entirely.

Why this matters in 2026: Modern voice AI systems (including Team-Connect) usually terminate SIP at the network edge, then bridge into a non-SIP backend, typically WebSocket-based real-time audio, because that is easier to plug into an LLM pipeline. But the carrier and PSTN sides of the call still speak SIP, so understanding SIP basics is essential whenever calls leave or enter a real phone network.

02 How a SIP Call Works (Step by Step)

The classic SIP call flow uses a three-message handshake at setup and a two-message exchange at teardown. Here is exactly what goes on the wire when Alice calls Bob.

The three-way handshake: INVITE, 200 OK, ACK

  1. Alice's phone sends an INVITE to Bob's address. The INVITE carries an SDP body describing the codecs Alice can speak and where she wants to receive media.
  2. Bob's phone (or the proxy in front of it) responds with provisional responses — typically 100 Trying immediately, then 180 Ringing when Bob's phone starts ringing.
  3. When Bob answers, his phone sends 200 OK back to Alice. This response also carries an SDP body describing Bob's media capabilities and endpoint.
  4. Alice's phone sends ACK to confirm receipt of the 200 OK. At this moment, the call is up and RTP starts flowing in both directions.
  5. When either side hangs up, they send BYE. The other side responds with 200 OK. The call is now over.

A real INVITE message on the wire

INVITE sip:bob@biloxi.example.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.example.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:bob@biloxi.example.com>
From: Alice <sip:alice@atlanta.example.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.example.com
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.example.com>
Content-Type: application/sdp
Content-Length: 213

v=0
o=alice 53655765 2353687637 IN IP4 pc33.atlanta.example.com
s=-
c=IN IP4 pc33.atlanta.example.com
t=0 0
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 opus/48000/2

Everything above the blank line is the SIP start line and headers; everything below it is the SDP body describing the media offer. Notice how readable it is: SIP was designed so engineers can debug it by eye. The m=audio 49170 RTP/AVP 0 8 97 line says "I want to receive audio on port 49170, and I support PCMU (static payload type 0), PCMA (static payload type 8) and Opus (dynamic payload type 97)".
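Because SIP messages are plain text, it is easy to hand-type a Content-Length that no longer matches the body, and a wrong value can make the receiver reject the message or misread where the body ends. The Python sketch below rebuilds the example INVITE and computes the length instead, assuming CRLF line endings throughout (which RFC 3261 requires). `build_message` is a hypothetical helper, not a library API.

```python
CRLF = "\r\n"  # SIP lines end with CRLF (RFC 3261)


def build_message(start_line: str, headers: list, body_lines: list) -> str:
    """Assemble a SIP message, computing Content-Length instead of hand-typing it.

    build_message is a hypothetical helper for illustration, not a library API.
    """
    body = "".join(line + CRLF for line in body_lines)
    # Content-Length counts bytes of the body, so measure the encoded form.
    length = len(body.encode("utf-8"))
    lines = [start_line] + [f"{name}: {value}" for name, value in headers]
    lines.append(f"Content-Length: {length}")
    # An empty line separates the header section from the body.
    return CRLF.join(lines) + CRLF + CRLF + body


SDP_OFFER = [
    "v=0",
    "o=alice 53655765 2353687637 IN IP4 pc33.atlanta.example.com",
    "s=-",
    "c=IN IP4 pc33.atlanta.example.com",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 97",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:97 opus/48000/2",
]

invite = build_message(
    "INVITE sip:bob@biloxi.example.com SIP/2.0",
    [
        ("Via", "SIP/2.0/UDP pc33.atlanta.example.com;branch=z9hG4bK776asdhds"),
        ("Max-Forwards", "70"),
        ("To", "Bob <sip:bob@biloxi.example.com>"),
        ("From", "Alice <sip:alice@atlanta.example.com>;tag=1928301774"),
        ("Call-ID", "a84b4c76e66710@pc33.atlanta.example.com"),
        ("CSeq", "314159 INVITE"),
        ("Contact", "<sip:alice@pc33.atlanta.example.com>"),
        ("Content-Type", "application/sdp"),
    ],
    SDP_OFFER,
)
print(invite.split(CRLF + CRLF)[0].splitlines()[-1])  # Content-Length: 213
```

Computing the header from the encoded body means any later edit to the SDP (adding an a=fmtp line, say) keeps the length correct automatically.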

The 200 OK response from Bob

SIP/2.0 200 OK
Via: SIP/2.0/UDP pc33.atlanta.example.com;branch=z9hG4bK776asdhds
To: Bob <sip:bob@biloxi.example.com>;tag=a6c85cf
From: Alice <sip:alice@atlanta.example.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.example.com
CSeq: 314159 INVITE
Contact: <sip:bob@192.0.2.4>
Content-Type: application/sdp
Content-Length: 162

v=0
o=bob 2890844730 2890844730 IN IP4 host.biloxi.example.com
s=-
c=IN IP4 host.biloxi.example.com
t=0 0
m=audio 3456 RTP/AVP 97
a=rtpmap:97 opus/48000/2

Bob's 200 OK accepts the call and picks one codec from Alice's offer (Opus, payload type 97). Alice now sends RTP to host.biloxi.example.com port 3456, and Bob sends RTP to pc33.atlanta.example.com port 49170. The media flows directly between the two endpoints and never passes through the SIP proxies that carried the signalling.

Why three messages and not two?

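The third message (ACK) exists because a successful INVITE is the one exchange whose final response is confirmed end-to-end by the caller rather than hop-by-hop by the transaction layer. Answering can take many seconds, and a forked INVITE can draw 200 OKs from several devices, so the callee itself keeps retransmitting its 200 OK until the matching ACK arrives; the ACK also confirms that the offer/answer media negotiation is complete and locked in. Provisional responses (1xx) are not acknowledged at all unless PRACK is in use, and non-2xx final responses (3xx-6xx) to INVITE are acknowledged automatically by the transaction layer at each hop. If Alice never sends ACK, Bob retransmits the 200 OK starting after 500 ms (T1) and doubling the interval up to 4 s (T2); once 32 s (64 × T1) pass without an ACK, he gives up and ends the session with BYE.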
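The 200 OK retransmission rule can be sketched in a few lines of Python. This assumes the RFC 3261 default timer values (T1 = 500 ms, T2 = 4 s) and models only the timing, not the sending; `ok_retransmit_times` is a hypothetical name.

```python
T1 = 0.5  # seconds: RFC 3261 default round-trip-time estimate
T2 = 4.0  # seconds: maximum interval between retransmissions


def ok_retransmit_times(t1: float = T1, t2: float = T2) -> list:
    """Seconds after the first 200 OK at which it is re-sent if no ACK arrives."""
    times = []
    interval = t1
    elapsed = 0.0
    while True:
        elapsed += interval
        if elapsed >= 64 * t1:  # give up after 64*T1 (32 s with defaults)
            break
        times.append(elapsed)
        interval = min(interval * 2, t2)  # double each time, capped at T2
    return times


print(ok_retransmit_times())
# [0.5, 1.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5, 27.5, 31.5]
```

With default timers that is ten retransmissions before the callee stops at 32 s; RFC 3261 then says the session SHOULD be terminated with BYE.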

Call termination: BYE

A BYE from Alice

BYE sip:bob@192.0.2.4 SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.example.com;branch=z9hG4bKnashds7
Max-Forwards: 70
From: Alice <sip:alice@atlanta.example.com>;tag=1928301774
To: Bob <sip:bob@biloxi.example.com>;tag=a6c85cf
Call-ID: a84b4c76e66710@pc33.atlanta.example.com
CSeq: 314160 BYE
Content-Length: 0

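Bob's phone responds with 200 OK and the call is over. Note that the Call-ID matches the original INVITE, and the From and To tags match the ones exchanged in the INVITE and its 200 OK. Together, the Call-ID and the two tags identify the dialog, which is how every element in the path knows this BYE belongs to that earlier call.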
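RFC 3261 identifies a dialog by its Call-ID plus the local and remote tags. The Python sketch below matches a BYE to its call from an observer's point of view (a packet capture or a B2BUA log), which does not know which side is "local", so it stores the two tags as an unordered pair. That is a simplification, and the parser is deliberately minimal: no compact header forms (such as `i:` for Call-ID), no folded lines, no quoted parameters. The helper names are hypothetical, and the messages are trimmed to the headers that matter here.

```python
from typing import Optional


def header(msg: str, name: str) -> str:
    """Return the first value of a header (simplified parser)."""
    for line in msg.split("\r\n")[1:]:  # skip the start line
        if not line:  # an empty line ends the header section
            break
        key, _, value = line.partition(":")
        if key.strip().lower() == name.lower():
            return value.strip()
    raise KeyError(name)


def tag(value: str) -> Optional[str]:
    """Pull the tag= parameter out of a From or To header value."""
    for param in value.split(";")[1:]:
        key, _, val = param.partition("=")
        if key.strip().lower() == "tag":
            return val.strip()
    return None


def dialog_key(msg: str) -> tuple:
    """Call-ID plus both tags; an unordered pair so either side's requests match."""
    tags = frozenset({tag(header(msg, "From")), tag(header(msg, "To"))})
    return header(msg, "Call-ID"), tags


ok = "\r\n".join([
    "SIP/2.0 200 OK",
    "To: Bob <sip:bob@biloxi.example.com>;tag=a6c85cf",
    "From: Alice <sip:alice@atlanta.example.com>;tag=1928301774",
    "Call-ID: a84b4c76e66710@pc33.atlanta.example.com",
    "", "",
])
bye = "\r\n".join([
    "BYE sip:bob@192.0.2.4 SIP/2.0",
    "From: Alice <sip:alice@atlanta.example.com>;tag=1928301774",
    "To: Bob <sip:bob@biloxi.example.com>;tag=a6c85cf",
    "Call-ID: a84b4c76e66710@pc33.atlanta.example.com",
    "", "",
])
print(dialog_key(ok) == dialog_key(bye))  # True
```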

03 SIP Request Methods

SIP defines a set of request methods, similar to how HTTP has GET, POST, PUT and DELETE. The original SIP specification (RFC 3261) defines six methods; subsequent RFCs added more. Here is the working set you will commonly encounter in modern SIP deployments.

| Method | RFC | What it does | Typical use |
|---|---|---|---|
| INVITE | 3261 | Initiate a session (call) | Starting any voice/video call |
| ACK | 3261 | Acknowledge a final response to INVITE | Completes the INVITE handshake |
| BYE | 3261 | Terminate an established session | Hanging up |
| CANCEL | 3261 | Cancel a pending request (usually INVITE) | Caller hangs up before callee answers |
| REGISTER | 3261 | Bind a SIP URI to one or more contact addresses | Phone tells the registrar where to find it |
| OPTIONS | 3261 | Query the capabilities of a server or user agent | Health checks, capability discovery |
| INFO | 6086 | Send mid-dialog information without changing session state | DTMF signalling (legacy), application data |
| UPDATE | 3311 | Update session parameters without a re-INVITE, most often before the call is answered | Changing codec or hold state during early media |
| REFER | 3515 | Ask the recipient to issue a request (typically INVITE) to a third party | Call transfer (attended and blind) |
| SUBSCRIBE | 6665 | Request notifications about an event package | Presence ("is Bob online?"), message-waiting indication |
| NOTIFY | 6665 | Send the notification matching a SUBSCRIBE | "Bob is now busy", "you have 3 voicemails" |
| PRACK | 3262 | Reliably acknowledge a provisional (1xx) response | Reliable early media (e.g. ringback tones from carriers) |
| MESSAGE | 3428 | Send a one-shot text message (like SMS) | Pager-style instant messaging |
| PUBLISH | 3903 | Publish event state to a server | Pushing presence updates |

For a basic call, you only need INVITE, ACK and BYE, plus REGISTER if endpoints need to advertise their location, and CANCEL if the caller hangs up before the callee answers. The other methods are layered features — you can run a working SIP system without ever sending UPDATE, REFER, MESSAGE or SUBSCRIBE.

The four methods you absolutely must understand

  • INVITE — starts the call. Contains an SDP offer.
  • REGISTER — tells the registrar "if anyone wants to call sip:alice@atlanta.example.com, here is the IP and port where she is reachable right now". Sent at startup and periodically refreshed (typically every 60-3600 seconds).
  • ACK — the third leg of the INVITE handshake. Without it, the callee keeps retransmitting the 200 OK for up to 32 seconds (with default timers) and then ends the call with BYE.
  • BYE — ends the call. Either party can send it. The other party MUST respond with 200 OK.

04 SIP Response Codes

SIP uses three-digit response codes split into six classes, modelled directly on HTTP. The first digit is the class; the remaining two digits are the specific reason. As with HTTP, you should treat the class as the primary signal and the specific code as the detail.

The six response classes

| Class | Name | Meaning |
|---|---|---|
| 1xx | Provisional | Request received, processing continues. Not final. |
| 2xx | Success | The request was successfully accepted. |
| 3xx | Redirection | Further action needed; try a different URI. |
| 4xx | Client Error | The request was bad; a retry can only work if the request is changed. |
| 5xx | Server Error | The server failed to fulfil a valid request. |
| 6xx | Global Failure | Definitive failure; no other server will succeed either. |

The codes you will see most often

| Code | Reason Phrase | What it means in practice |
|---|---|---|
| 100 | Trying | The next hop received your INVITE and is working on it. Stop retransmitting. |
| 180 | Ringing | The callee's phone is ringing. Play local ringback to the caller. |
| 183 | Session Progress | Usually carries SDP for early media (carrier ringback, "the number you have called"). While early media is flowing, don't generate local ringback. |
| 200 | OK | Successful response. For INVITE, the callee answered. Send ACK. |
| 301 | Moved Permanently | The user has moved; use the URI in the Contact header instead. |
| 302 | Moved Temporarily | Like 301 but the move isn't permanent (call forwarding). |
| 401 | Unauthorized | The registrar or callee wants you to authenticate. Resend with an Authorization header. |
| 403 | Forbidden | The server understood you but refuses. Wrong account, no balance, blocked destination. |
| 404 | Not Found | The URI does not exist on this server. |
| 407 | Proxy Authentication Required | Like 401 but from a proxy. Resend with a Proxy-Authorization header. |
| 408 | Request Timeout | No response came back from the next hop in time. Often a NAT or firewall issue. |
| 480 | Temporarily Unavailable | The callee was reached but is unavailable right now (not logged in, or do-not-disturb is on). Voicemail typically lives behind this. |
| 486 | Busy Here | The specific contact is busy on another call. Try other locations or voicemail. |
| 487 | Request Terminated | The INVITE was cancelled (the caller sent CANCEL before it was answered). |
| 488 | Not Acceptable Here | Codec mismatch: offer/answer negotiation failed because the caller offered codecs the callee cannot speak. |
| 491 | Request Pending | Glare condition: both sides sent INVITE/UPDATE simultaneously. Retry after a delay. |
| 500 | Server Internal Error | The server hit an unexpected condition while handling the request. Retry might work. |
| 503 | Service Unavailable | The server is temporarily overloaded or down for maintenance. |
| 504 | Server Time-out | The server tried to reach a downstream server and got nothing back. |
| 600 | Busy Everywhere | Definitively busy at all locations; no point trying alternates. |
| 603 | Decline | The user actively rejected the call. |
| 604 | Does Not Exist Anywhere | This URI is gone for good. |
| 606 | Not Acceptable | Like 488 but globally; no point trying other branches. |

Reading SIP responses for triage: 4xx usually means the problem is on your side as the caller (bad credentials, bad codec, blocked destination), although 401 and 407 are normal authentication challenges. 5xx is the server's fault; a retry, ideally to an alternate server, may succeed. 6xx is a definitive global "no": do not retry, do not try alternate routes. 1xx and 2xx are normal flow. 3xx means follow the redirect.
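The triage rule of thumb fits in a small lookup keyed on the first digit. The Python sketch below encodes exactly that rule, with 401/407 special-cased as authentication challenges; the function name and the wording of the returned actions are illustrative, not taken from any SIP library.

```python
def triage(status: int) -> str:
    """Return a rule-of-thumb next step for a SIP status code (illustrative)."""
    if not 100 <= status <= 699:
        raise ValueError(f"not a SIP status code: {status}")
    if status in (401, 407):
        # Authentication challenges are part of the normal Digest flow, not a failure.
        return "auth challenge: resend the request with credentials"
    actions = {
        1: "provisional: keep waiting for a final response",
        2: "success: for an INVITE, send ACK",
        3: "redirect: retry at the URI in the Contact header",
        4: "client error: change the request (credentials, codecs, destination) before retrying",
        5: "server error: a retry, ideally to an alternate server, may succeed",
        6: "global failure: do not retry and do not try alternate routes",
    }
    return actions[status // 100]


for code in (180, 200, 407, 488, 503, 603):
    print(code, triage(code))
```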

05 SIP Architecture (User Agents, Proxies, Registrars)

SIP describes a network of cooperating components, each with a defined role. Understanding what each one does — and where the boundaries are — is essential when reading SIP traces or designing a deployment.

User Agent (UA)

A user agent is any SIP endpoint that originates or terminates calls. A desk phone is a UA, a softphone app is a UA, a voice AI bot is a UA, a PSTN gateway is a UA on its IP-facing side. Within a single transaction, a UA acts as one of two things:

  • UAC (User Agent Client) — the side that sends the request. The caller's phone is a UAC when it sends INVITE.
  • UAS (User Agent Server) — the side that receives the request and responds. The callee's phone is a UAS for that same INVITE.

Roles flip per request. When the callee later sends BYE, that same phone becomes a UAC and the original caller becomes a UAS. Always think of UAC/UAS as transaction-scoped roles, not endpoint identities.

Registrar

The registrar is a server that accepts REGISTER requests and stores a mapping from a SIP address-of-record (e.g. sip:alice@atlanta.example.com) to one or more current contact URIs (e.g. sip:alice@192.0.2.5:5060). When someone wants to call Alice, the proxy queries this mapping to figure out where she actually is right now. Registrations expire (usually 60-3600 seconds) and need to be refreshed — if Alice's phone goes offline, her registration eventually times out and calls to her start hitting voicemail or 404.
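The registrar's job is essentially a table from address-of-record to contacts with expiry times. The toy Python sketch below models only that, with time passed in explicitly so the expiry behaviour is easy to follow; the `Registrar` class is hypothetical, and a real registrar would also authenticate requests and persist bindings. The `Expires: 0` unregister rule comes from RFC 3261.

```python
from dataclasses import dataclass, field


@dataclass
class Registrar:
    """Toy in-memory location service: address-of-record -> {contact URI: expiry time}."""

    bindings: dict = field(default_factory=dict)

    def register(self, aor: str, contact: str, expires: int, now: float) -> None:
        contacts = self.bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)  # Expires: 0 removes the binding (unregister)
        else:
            contacts[contact] = now + expires  # add or refresh the binding

    def lookup(self, aor: str, now: float) -> list:
        """Return the AOR's contacts whose registration has not yet expired."""
        contacts = self.bindings.get(aor, {})
        for contact, expiry in list(contacts.items()):
            if expiry <= now:
                del contacts[contact]  # registration timed out
        return list(contacts)


reg = Registrar()
reg.register("sip:alice@atlanta.example.com", "sip:alice@192.0.2.5:5060", expires=3600, now=0)
print(reg.lookup("sip:alice@atlanta.example.com", now=60))    # ['sip:alice@192.0.2.5:5060']
print(reg.lookup("sip:alice@atlanta.example.com", now=3601))  # []
```

The empty list at the end is the situation the paragraph describes: the phone stopped refreshing, so the proxy has nowhere to send the call.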

Proxy server

A SIP proxy is a routing element. It receives requests, decides where they should go next, and forwards them. Apart from the CANCELs a stateful proxy sends to tidy up forked branches, proxies do not originate requests of their own and do not terminate dialogs; they just sit in the middle and route. Two flavours:

  • Stateless proxy — forwards each message and immediately forgets about it. Fast and scalable but cannot retransmit.
  • Stateful proxy — remembers the transaction state, can retransmit on packet loss, can fork requests to multiple destinations and pick the best response. The realistic default.

Redirect server

Instead of forwarding a request itself, a redirect server returns a 3xx response telling the client where to go. Useful for offloading routing decisions to the client. Less common than proxies in modern deployments.

Back-to-Back User Agent (B2BUA)

A B2BUA is a server that acts as a UAS on one side and a UAC on the other — it terminates the incoming call and originates a new outgoing call to the destination. To the caller it looks like the callee; to the callee it looks like the caller. B2BUAs are the foundation of most modern SIP applications: PBXs, IVRs, voice AI gateways, billing systems, and conference bridges are all B2BUAs because they need to do something with the call (record it, modify it, charge for it, transcribe it) that a pure proxy cannot.

Session Border Controller (SBC)

An SBC is a specialised B2BUA placed at the network edge. It terminates SIP and RTP from the outside world, applies security and policy rules, then re-originates them into the trusted network. Every serious SIP deployment has SBCs at every public boundary — they handle NAT traversal, topology hiding, encryption termination, codec transcoding, denial-of-service protection, and toll-fraud filtering. Running a SIP server directly on the public internet without an SBC in front of it is asking for trouble.

Real-world deployment: A typical Team-Connect call path looks like this: customer phone → carrier SIP trunk → SBC at the network edge → B2BUA running the voice AI logic → WebSocket bridge → LLM. The SBC handles security and NAT; the B2BUA handles the actual application logic. SIP only exists between the carrier and the B2BUA; everything past that point speaks WebSocket and JSON.

06 SIP and SDP (Session Description Protocol)

SIP carries the signalling. SDP (Session Description Protocol, RFC 8866) carries the description of what media will flow once signalling completes. SDP is a separate protocol (WebRTC uses it too), but in a SIP deployment you will almost always see it embedded in SIP messages as the message body.

The offer/answer model

SIP uses an "offer/answer" pattern with SDP, defined in RFC 3264. The caller's INVITE contains an SDP offer listing the codecs and media endpoints it supports. The callee's 200 OK contains an SDP answer selecting which of those it will use. After this exchange, both sides know exactly where to send their media and what codec to encode it with.

An SDP body decoded

A typical SDP offer

v=0
o=alice 53655765 2353687637 IN IP4 pc33.atlanta.example.com
s=-
c=IN IP4 pc33.atlanta.example.com
t=0 0
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 opus/48000/2
a=fmtp:97 maxplaybackrate=16000;stereo=0;useinbandfec=1
a=sendrecv

Reading line by line:

  • v=0 — SDP version (always 0).
  • o=alice ... — origin: who created the session, session ID, version, network address.
  • s=- — session name (a dash is conventional for "no name").
  • c=IN IP4 pc33... — connection info: where to send media (IPv4 address).
  • t=0 0 — timing: 0 0 means "permanent / no scheduled end".
  • m=audio 49170 RTP/AVP 0 8 97 — media line: audio, port 49170, profile RTP/AVP, payload types 0, 8 and 97 in priority order.
  • a=rtpmap:0 PCMU/8000 — payload type 0 maps to PCMU at 8000 Hz (G.711 µ-law).
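  • a=rtpmap:8 PCMA/8000 — payload type 8 maps to PCMA at 8000 Hz (G.711 A-law).
  • a=rtpmap:97 opus/48000/2 — payload type 97 maps to Opus. Opus is always advertised as 48000/2 (RFC 7587), even for a mono call; whether stereo is actually wanted is signalled by the stereo parameter on the a=fmtp line. Opus has no static payload type, so 97 is a dynamic number chosen by the offerer.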
  • a=fmtp:97 ... — format-specific parameters for Opus (max playback rate, mono/stereo, in-band FEC).
  • a=sendrecv — this stream is bidirectional. Other options: sendonly, recvonly, inactive (used for hold).
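Reading an SDP body line by line as above maps naturally onto a small parser. The Python sketch below extracts the fields discussed (connection address, port, payload types, rtpmap, fmtp, direction); `parse_sdp` is a hypothetical helper, and it assumes a single c= line and a single m= line, so it is not a general SDP parser.

```python
def parse_sdp(sdp: str) -> dict:
    """Extract the fields discussed above from a single-media-section SDP body."""
    result = {"rtpmap": {}, "fmtp": {}, "direction": "sendrecv"}  # sendrecv is the default
    for line in sdp.splitlines():
        if not line or line[1:2] != "=":
            continue  # every SDP line has the form <type>=<value>
        kind, value = line[0], line[2:]
        if kind == "c":
            _nettype, _addrtype, address = value.split()
            result["address"] = address
        elif kind == "m":
            media, port, proto, *fmts = value.split()
            result.update(media=media, port=int(port), proto=proto,
                          payload_types=[int(f) for f in fmts])
        elif kind == "a":
            name, _, rest = value.partition(":")
            if name == "rtpmap":
                pt, encoding = rest.split(" ", 1)
                result["rtpmap"][int(pt)] = encoding
            elif name == "fmtp":
                pt, params = rest.split(" ", 1)
                result["fmtp"][int(pt)] = params
            elif name in ("sendrecv", "sendonly", "recvonly", "inactive"):
                result["direction"] = name
    return result


OFFER_SDP = "\r\n".join([
    "v=0",
    "o=alice 53655765 2353687637 IN IP4 pc33.atlanta.example.com",
    "s=-",
    "c=IN IP4 pc33.atlanta.example.com",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 97",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:97 opus/48000/2",
    "a=fmtp:97 maxplaybackrate=16000;stereo=0;useinbandfec=1",
    "a=sendrecv",
]) + "\r\n"

parsed = parse_sdp(OFFER_SDP)
print(parsed["address"], parsed["port"], parsed["payload_types"])
# pc33.atlanta.example.com 49170 [0, 8, 97]
```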

What the answer does

The callee's SDP answer keeps only the codecs it agrees to use, in the order it prefers. If the offer contained PCMU + PCMA + Opus and the callee only speaks Opus, the answer's m= line will list only payload type 97 and the call will run on Opus. If there is no codec in common, the callee responds with 488 Not Acceptable Here and the call fails before any media flows.
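The answerer's side of that negotiation can be sketched as a filtered intersection. In the Python below, `answer_formats` is a hypothetical helper: it keeps the offerer's payload type numbers, orders them by the answerer's own preference list, and raises when nothing matches (the 488 case). `STATIC_PT` is only a subset of the RFC 3551 static assignments, included because static types may appear without an rtpmap line.

```python
# Subset of the static payload types from RFC 3551; these may appear in an
# m= line without an accompanying a=rtpmap attribute.
STATIC_PT = {0: "PCMU/8000", 8: "PCMA/8000"}


def answer_formats(offer_pts: list, offer_rtpmap: dict, supported: list) -> list:
    """Pick the offered payload types the answerer accepts, in its preference order.

    `supported` lists encoding names (e.g. "opus", "pcmu"), most preferred first.
    Raises ValueError when there is no codec in common.
    """
    def encoding_name(pt: int) -> str:
        mapping = offer_rtpmap.get(pt, STATIC_PT.get(pt, ""))
        return mapping.split("/")[0].lower()

    chosen = []
    for codec in supported:
        for pt in offer_pts:
            if encoding_name(pt) == codec.lower() and pt not in chosen:
                chosen.append(pt)
    if not chosen:
        raise ValueError("no codec in common: respond 488 Not Acceptable Here")
    return chosen


offer_pts = [0, 8, 97]
offer_rtpmap = {0: "PCMU/8000", 8: "PCMA/8000", 97: "opus/48000/2"}
print(answer_formats(offer_pts, offer_rtpmap, ["opus"]))          # [97]
print(answer_formats(offer_pts, offer_rtpmap, ["pcma", "pcmu"]))  # [8, 0]
```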

Hold and resume use SDP

To put a call on hold, the holding side sends a re-INVITE with the same SDP except that a=sendrecv is changed to a=sendonly (or a=inactive) and the session version in the o= line is incremented, as RFC 3264 requires for any modified offer. The other side responds with recvonly (or inactive) and stops sending media. To resume, send another re-INVITE with a=sendrecv restored and the version incremented again. Hold is just an SDP attribute change; no special method is needed.
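Generating the hold offer is a small transformation of the previous SDP. The Python sketch below assumes a single media section and SDP held as a list of lines; `hold_sdp` is a hypothetical helper. It bumps the o= session version and replaces the direction attribute, which is all the paragraph above requires.

```python
DIRECTIONS = {"a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"}


def hold_sdp(previous: list, direction: str = "sendonly") -> list:
    """Build a re-INVITE offer from the previous SDP lines (one media section assumed)."""
    out = []
    for line in previous:
        if line.startswith("o="):
            fields = line[2:].split()  # <user> <sess-id> <sess-version> <net> <addrtype> <addr>
            fields[2] = str(int(fields[2]) + 1)  # session version must go up by one
            out.append("o=" + " ".join(fields))
        elif line in DIRECTIONS:
            continue  # drop the old direction; the new one is appended below
        else:
            out.append(line)
    out.append("a=" + direction)
    return out


OFFER = [
    "v=0",
    "o=alice 53655765 2353687637 IN IP4 pc33.atlanta.example.com",
    "s=-",
    "c=IN IP4 pc33.atlanta.example.com",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 97",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:97 opus/48000/2",
    "a=sendrecv",
]

held = hold_sdp(OFFER)                # put the call on hold
resumed = hold_sdp(held, "sendrecv")  # resume: the version bumps again
print(held[1])      # o=alice 53655765 2353687638 IN IP4 pc33.atlanta.example.com
print(resumed[-1])  # a=sendrecv
```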

07 SIP Trunking for Business

SIP trunking is the service that connects a business phone system (PBX) to a telecoms provider over the internet using SIP, replacing traditional copper phone lines, ISDN30 / PRI circuits, or analogue lines. A SIP trunk is essentially a virtual phone line — or rather, several virtual phone lines bundled together as concurrent call channels.

What you actually buy when you buy a SIP trunk

  • Concurrent call channels — how many simultaneous calls can run on the trunk. A 10-channel trunk supports 10 concurrent calls; the 11th caller gets a busy tone or overflow handling.
  • DDI / DID numbers — the actual phone numbers customers dial. Direct Dial-In (UK) or Direct Inward Dial (US). One trunk usually has many numbers attached.
  • Outbound minutes — either bundled (e.g. unlimited UK landline + mobile) or pay-per-minute, often with separate rates per destination.
  • Number portability and coverage — some trunks support number portability; others don't. Some only support UK numbers; others are global.
  • Resilience options — failover destinations (a different trunk or a mobile number) if the primary fails.
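Choosing a channel count starts with how many calls actually overlap. A minimal Python sketch, assuming you can export (start, end) times per call from your existing call records; the data and the `peak_concurrent_calls` name are hypothetical. It sweeps call start and end events in time order and tracks the running total.

```python
def peak_concurrent_calls(calls: list) -> int:
    """Largest number of calls in progress at once, given (start, end) times.

    A call that ends at the same instant another starts is not counted as overlapping.
    """
    events = []
    for start, end in calls:
        events.append((start, 1))   # a call begins: one more channel in use
        events.append((end, -1))    # a call ends: one channel freed
    # Tuples sort by time first; at equal times, -1 (end) sorts before +1 (start).
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical (start, end) times in minutes from one busy hour of call records.
calls = [(0, 10), (5, 15), (10, 20), (12, 14)]
print(peak_concurrent_calls(calls))  # 3
```

The observed peak is a floor rather than a sizing answer: you would normally add headroom for growth and for busy periods your sample did not capture.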

SIP trunk vs ISDN/PRI

| Feature | SIP Trunk | ISDN30 / PRI |
|---|---|---|
| Transport | Internet (SIP/RTP) | Dedicated copper or fibre circuit |
| Capacity | Software-defined, scale up/down on demand | 30 channels per circuit (E1) or 23 (T1), fixed |
| Setup time | Hours | Weeks to months |
| Cost (UK, typical) | £3-8 per channel/month | £15-40 per channel/month + line rental |
| Failover | Trivial: reroute to another IP | Requires duplicate physical circuits |
| Future-proof | Yes: aligned with PSTN switch-off plans worldwide | No: PSTN being phased out (UK 2027, similar timelines elsewhere) |

Why SIP trunks are eating ISDN

Three forces are converging. First, BT (UK) and equivalent incumbents in most countries have announced PSTN switch-off dates — the UK is targeting 2027 for ISDN and PSTN to be fully turned off. Second, SIP trunking is dramatically cheaper per channel and per minute, especially for international calls. Third, SIP trunks integrate natively with modern cloud PBXs, voice AI receptionists, CRM systems and analytics — whereas ISDN integrations require dedicated gateway hardware. By 2026 the question is no longer "should we move to SIP trunking" but "how soon".

Common SIP trunking gotchas: codec mismatch (the carrier offers G.711 only, your PBX wanted Opus), CLI presentation (your outbound number doesn't show because the carrier doesn't trust your SIP From header), and emergency call routing (999/911 needs accurate location data per channel, not just the trunk's billing address).

08 SIP Security (TLS, SRTP, Authentication)

SIP was designed in an era when securing application-layer protocols was an afterthought. Plain SIP runs over UDP/5060 with no encryption and no integrity protection, and in most deployments authentication is only challenged on REGISTER and INVITE. Any SIP server placed on the public internet without protection will start receiving scanning traffic within minutes; this is not hyperbole.

Layer 1: SIPS and TLS for signalling

Use SIPS (SIP over TLS) on port 5061 instead of plain SIP on 5060. The URI scheme changes from sip: to sips:, the Via and Contact headers reflect TLS as the transport, and all signalling messages are encrypted in transit. TLS protects against passive eavesdropping and tampering, but only between the two ends of each TLS hop: a proxy in the middle decrypts each message and can read the plaintext SIP. A sips: URI requires TLS on every hop to the destination, but even then the protection is hop-by-hop, so you are trusting every proxy in the path.

Layer 2: SRTP for media

SRTP (Secure RTP, RFC 3711) encrypts the actual audio packets. SRTP keys are set up by one of two mechanisms: SDES (RFC 4568), which puts the keys directly in the SDP body as a=crypto lines and is therefore only secure if the SDP itself is protected by SIPS; or DTLS-SRTP, where a DTLS handshake runs in-band on the media port and the SDP carries only a certificate fingerprint (the WebRTC standard). SDES + SIPS is the typical SIP combination; DTLS-SRTP is the typical WebRTC combination. SDES-keyed SRTP without SIPS is theatre, because the keys travel in plaintext SDP, so always pair them.

Layer 3: Digest authentication

SIP uses HTTP Digest authentication, profiled in RFC 3261 on the basis of RFC 2617 and updated by RFC 8760 to add SHA-256 and SHA-512/256 digests. The flow is:

  1. Client sends REGISTER (or INVITE) with no credentials.
  2. Server responds 401 Unauthorized (or 407 from a proxy) with a WWW-Authenticate header containing a fresh nonce.
  3. Client computes a hash over the username, password, realm, nonce, method and request URI (plus the request body, only when qop=auth-int is used) and resends the request with an Authorization header (Proxy-Authorization after a 407).
  4. Server validates the hash and accepts the request.

Digest auth never sends the password in cleartext — only hashes — but it has known weaknesses against offline brute-force attacks if the password is short. Always use long random passwords for SIP accounts (16+ characters, randomly generated, never reused).
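The hash in step 3 can be sketched with Python's standard `hashlib`. This follows the RFC 2617 MD5 scheme that RFC 3261 uses, handling only qop=auth or no qop (auth-int is not implemented); `digest_response` is a hypothetical helper, and the REGISTER credentials and nonce in the usage example are made up. RFC 8760 algorithms would swap in SHA-256 or SHA-512/256 for MD5.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, password, realm, method, uri, nonce,
                    nc=None, cnonce=None, qop=None) -> str:
    """RFC 2617-style MD5 digest 'response' value (qop=auth or no qop only)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret part: never sent
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being requested
    if qop == "auth":
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    if qop is None:
        return md5_hex(f"{ha1}:{nonce}:{ha2}")
    raise ValueError(f"unsupported qop: {qop}")


response = digest_response(
    username="alice",                            # hypothetical account
    password="hypothetical-long-random-secret",
    realm="atlanta.example.com",                 # from the WWW-Authenticate challenge
    method="REGISTER",
    uri="sip:atlanta.example.com",
    nonce="hypothetical-nonce-from-401",         # copied from the challenge
    nc="00000001",
    cnonce="0a4f113b",                           # client-chosen random value
    qop="auth",
)
print(response)  # 32 hex characters
```

Only this hash goes on the wire, which is why a captured challenge/response pair lets an attacker test password guesses offline, and why short passwords are the weak point.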

Layer 4: SBC and policy enforcement

An SBC at the network edge handles the threats Digest auth and TLS cannot:

  • SIP scanning and brute force — rate-limit REGISTER attempts per source IP, blacklist known scanner ranges.
  • Toll fraud — block premium-rate destinations, cap concurrent outbound calls, alert on unusual destination patterns (sudden calls to high-cost countries at 3am).
  • Topology hiding — rewrite Via and Record-Route headers so attackers cannot map your internal network from external SIP traffic.
  • Denial-of-service — absorb floods of REGISTER or INVITE without overloading the registrar or B2BUA behind it.
The most common SIP security failure: a SIP account with a weak password (e.g. extension number used as the password, or "1234"), reachable from the public internet, leading to attackers registering as that extension and placing thousands of calls to premium-rate numbers overnight. Bills of £5,000-50,000+ are not unusual. Mitigations: long random passwords, per-account concurrent-call limits, blocking premium destinations by default, alerting on out-of-hours outbound surges.
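The first SBC mitigation listed above, rate-limiting REGISTER attempts per source IP, can be sketched as a sliding window. In the Python below the class name and thresholds are hypothetical, and time is passed in explicitly; a real SBC would also expire idle entries and combine this with blocklists.

```python
from collections import defaultdict, deque


class RegisterRateLimiter:
    """Sliding-window limit on REGISTER attempts per source IP (illustrative thresholds)."""

    def __init__(self, max_attempts: int = 10, window_s: float = 60.0):
        self.max_attempts = max_attempts
        self.window_s = window_s
        self.attempts = defaultdict(deque)  # source IP -> timestamps of accepted attempts

    def allow(self, source_ip: str, now: float) -> bool:
        recent = self.attempts[source_ip]
        # Forget attempts that have slid out of the window.
        while recent and recent[0] <= now - self.window_s:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False  # over the limit: drop or reject this REGISTER
        recent.append(now)
        return True


limiter = RegisterRateLimiter(max_attempts=3, window_s=60.0)
print([limiter.allow("198.51.100.9", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(limiter.allow("198.51.100.9", 61))                          # True
```

Rejected attempts are not recorded here, so a scanner that keeps hammering is blocked only until old attempts age out; counting rejections too would make the block stickier.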

09 SIP vs WebRTC vs H.323

SIP is one of three real-time signalling stacks you will see in the wild. Choosing between them is mostly a question of where the endpoints live.

| Aspect | SIP | WebRTC | H.323 |
|---|---|---|---|
| Signalling format | Text-based (HTTP-style) | Application-defined: usually JSON over WebSocket, sometimes SIP over WebSocket | Binary (ASN.1 PER encoding) |
| Where it runs | Servers, hardware phones, gateways | Browsers, mobile apps, embedded SDKs | Legacy enterprise videoconferencing |
| NAT traversal | External (SBC, manual config) | Built in (ICE mandatory, using STUN and TURN) | External (manual or H.460) |
| Media transport | RTP / SRTP | SRTP (mandatory), keyed via DTLS-SRTP | RTP / SRTP |
| Codec negotiation | SDP offer/answer | SDP offer/answer (same model) | H.245 capability exchange |
| PSTN interop | Native: the dominant standard for carriers | Requires a SIP/PSTN gateway | Requires an H.323/SIP/PSTN gateway |
| Encryption default | Optional (SIPS, SRTP) | Mandatory | Optional |
| Maturity | 2002, very mature | 2011, very mature | 1996, declining |
| Where you'd choose it | Carriers, PBXs, voice AI back-end, SIP trunks | Browser-based calling, mobile apps, customer-facing video | Legacy MCUs, older boardroom video |

The realistic 2026 picture

SIP has won the carrier and infrastructure layer — every major telecoms provider speaks SIP, every PBX speaks SIP, every PSTN gateway speaks SIP. WebRTC has won the browser and mobile-app layer — if your endpoint is a Chrome tab or an iOS app, WebRTC is the only sane choice. H.323 is in maintenance mode — it is still in production at large enterprises with sunk-cost video infrastructure, but no new deployments choose it.

In modern voice AI architectures (including Team-Connect), you typically run both SIP and WebRTC simultaneously: SIP between your carrier and your edge SBC for PSTN traffic, and WebRTC for any browser-based interface or mobile app. A B2BUA in the middle bridges the two. The user never sees the protocol; they just hear the call work.

What about SIP-over-WebSocket?

RFC 7118 defines a transport binding for SIP over WebSockets. This lets a browser-based softphone speak SIP directly to a SIP server, no gateway needed. It is real and widely supported, but in practice WebRTC's own signalling story (custom JSON over WebSockets) usually wins for new browser apps because SIP-over-WebSocket inherits SIP's NAT pain without gaining WebRTC's NAT benefits. SIP-over-WebSocket lives in two niches: legacy SIP softphones being moved to the browser, and federated business communications platforms.

10 Common SIP Issues and Troubleshooting

SIP problems are usually not subtle — the failure mode is "the call doesn't work" and the root cause is one of about a dozen recurring things. Triage in this order:

Repeated 401 Unauthorized loops

The client receives 401, retries with credentials, receives another 401. Causes: wrong password, wrong realm in the Authorization header, nonce expired before client could respond, or the server's clock is too far out and the nonce timestamp validation rejects the response. Verify credentials from a known-good client first; if those work, capture the failing request and compare the realm and nonce values.

403 Forbidden on outbound calls

The trunk authenticated fine but the carrier refuses the specific destination. Common causes: destination is in a region your trunk plan doesn't cover, the called number is on a blocklist (premium-rate, mobile-while-roaming), or the From header presents an unauthorised CLI.

408 Request Timeout

The SIP element sent a request to the next hop and got nothing back before the transaction timer expired (32 seconds with default timers). Almost always a NAT or firewall issue: SIP went out, but the response cannot get back because the source port the server expects has been remapped or is blocked. Mitigations: keep-alives (OPTIONS pings every 30s), connection-oriented transports (TCP or TLS) whose responses return over the connection the client opened, or an SBC with proper NAT handling.

One-way audio

Call connects, both phones say "hello", but only one direction works. Almost always RTP is the problem — signalling went through fine because SIP traversed the path, but RTP packets are being dropped by NAT or firewall in one direction. Diagnosis: open the RTP capture on each side and check whether packets actually arrive. Fix: configure the firewall to allow the negotiated RTP port range, or put an SBC in the path that pins the media to known ports.

486 Busy Here when the user is not on a call

The user agent rejected the call without ringing the user. Causes: do-not-disturb is on, the device only allows one concurrent call and it's already in one (e.g. an existing registration that didn't expire), or the codec offer was unacceptable and the device is incorrectly returning 486 instead of 488.

488 Not Acceptable Here (codec mismatch)

The offer/answer negotiation failed. The caller offered codecs the callee cannot speak. Diagnosis: read the SDP body of the INVITE and check what payload types are listed. Fix: ensure the caller's offer includes at least one codec the callee supports. Adding G.711 (PCMU + PCMA) as a fallback usually solves these, because almost every SIP endpoint supports G.711.

Registration suddenly stopping

Phone was registered, calls were working, now nothing. Causes: the registration expired and the phone is not refreshing (firmware issue, network issue), or the registrar restarted and the phone hasn't tried to re-register yet. Quick test: from the phone, force a manual re-register. If that works, the refresh interval is probably longer than your NAT or firewall keeps the binding open, so the registrar can no longer reach the phone between refreshes. Reduce the interval to 60-300 seconds.

NAT and "phantom" registrations

The phone registers from behind NAT. The Contact header inside the REGISTER says 192.168.1.50 (the phone's private IP). The registrar dutifully stores that. When someone calls, the proxy tries to send INVITE to 192.168.1.50 and it goes nowhere. Fix: enable rport (RFC 3581) and "force NAT detection" on the registrar so it stores the public IP it actually saw, not the private IP the phone claimed. Modern SIP servers default this on; legacy ones don't.
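The "store what you actually saw" fix can be sketched with Python's standard `ipaddress` module. `fix_contact` is a hypothetical helper, and it is deliberately simplified: it treats anything `ipaddress` flags as private as a NAT-internal address and leaves hostnames alone. Real servers combine this kind of Contact rewriting with rport (RFC 3581), which handles the Via side.

```python
import ipaddress


def fix_contact(contact_host: str, contact_port: int,
                source_ip: str, source_port: int) -> tuple:
    """Return the address a registrar should store for this binding.

    If the Contact claims a private address but the REGISTER arrived from
    somewhere else, trust the observed source address and port instead.
    """
    try:
        claimed = ipaddress.ip_address(contact_host)
    except ValueError:
        return contact_host, contact_port  # a hostname, not a literal IP: leave as-is
    if claimed.is_private and (contact_host, contact_port) != (source_ip, source_port):
        return source_ip, source_port
    return contact_host, contact_port


# The phone claims 192.168.1.50:5060, but the packet arrived from a public NAT mapping.
print(fix_contact("192.168.1.50", 5060, "203.0.113.7", 40123))  # ('203.0.113.7', 40123)
```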

Capture, then think: the single most useful SIP debugging skill is being able to read a packet capture (Wireshark or sngrep). 95% of SIP problems are diagnosable by following one full transaction in a capture. Tools like sngrep render SIP traffic as ladder diagrams which makes the call flow obvious at a glance. Get comfortable with these before adding logging anywhere else.

SIP Protocol FAQs

The questions our voice AI customers ask most often when their calls cross a SIP boundary.

What is SIP in simple terms?

SIP (Session Initiation Protocol) is a text-based signalling protocol used to set up, modify and terminate real-time communication sessions over IP networks — most commonly voice and video calls. It is to VoIP what HTTP is to the web: SIP carries the negotiation and control messages, while the actual audio is carried separately by RTP. SIP is defined in RFC 3261 and runs by default on UDP/TCP port 5060, or TLS port 5061 for encrypted signalling.

What port does SIP use?

SIP uses port 5060 for unencrypted signalling over UDP or TCP, and port 5061 for SIPS (SIP over TLS). The actual media (audio/video) does not use these ports: it flows separately over RTP on dynamically negotiated UDP ports drawn from a vendor-configured range (16384–32767 and 10000–20000 are common defaults). Firewalls need to allow both the SIP signalling port and the RTP media port range.

What is the difference between SIP and VoIP?

VoIP (Voice over IP) is the broad concept of carrying voice calls over IP networks rather than the traditional phone network. SIP is one specific signalling protocol used to make VoIP work — it sets up, manages and tears down the calls. VoIP also uses RTP to carry the actual audio, plus codecs to compress it. So SIP is part of VoIP, not the same thing. Other VoIP signalling protocols exist (like H.323 or proprietary systems) but SIP is the dominant standard.

Is SIP secure?

SIP can be secure if configured correctly. Plain SIP over UDP/5060 is unencrypted and easy to intercept. For security use SIPS (SIP over TLS) on port 5061 for the signalling, plus SRTP (Secure RTP) for the media. Add Digest authentication on registration and per-request, and place the SIP server behind a session border controller (SBC) to mitigate scanning, brute-force and toll-fraud attempts. Default SIP installations are a known target for attackers — never run an unsecured SIP server on the public internet.

What is SIP trunking?

SIP trunking is a service where a business connects its phone system (PBX) to a telecoms provider over the internet using SIP, instead of using traditional copper phone lines or ISDN/PRI circuits. A SIP trunk is essentially a virtual phone line that carries SIP signalling and RTP media to and from the carrier. SIP trunks are sold by concurrent call channels (e.g. 10-channel trunk = 10 simultaneous calls), come with assigned phone numbers (DDIs/DIDs), and are typically much cheaper per minute than legacy PRI lines.

How does SIP differ from WebRTC?

SIP is a signalling protocol that runs between dedicated SIP infrastructure (PBXs, soft switches, phones, gateways). WebRTC is a browser-native real-time communications stack with no defined signalling protocol — applications choose their own. WebRTC uses ICE/STUN/TURN for NAT traversal which makes it more browser-friendly, while SIP traditionally needs an SBC to handle NAT. In practice many modern systems use both: SIP between servers and PSTN gateways, WebRTC for browser-side endpoints, with a gateway translating between them.

What is the most important SIP request method?

INVITE is the most important SIP method — it initiates a session (a call). REGISTER is also fundamental because it tells the registrar where a user is currently reachable. ACK and BYE round out the core set: ACK confirms a final response to INVITE, and BYE terminates the session. Together INVITE/ACK/BYE form the basic call lifecycle. Other methods (CANCEL, OPTIONS, REFER, INFO, UPDATE, MESSAGE, SUBSCRIBE, NOTIFY, PRACK) handle specific scenarios like cancellation, capability discovery, transfers and presence.

Why am I getting 401 Unauthorized in SIP?

401 Unauthorized is the SIP server telling your client that authentication is required. It is not an error in the strict sense — it is the first half of SIP Digest authentication. The server sends 401 with a WWW-Authenticate header containing a nonce; your client should resend the same request with an Authorization header containing the nonce, your username and a hash of credentials. If you keep getting 401 after authenticating, your username, password or realm is wrong — or the nonce has expired and your client isn't handling the retry correctly. 407 Proxy Authentication Required is the same thing but from a proxy rather than the registrar.
