SIP (Session Initiation Protocol) is a text-based application-layer signalling protocol used to set up, modify and terminate real-time communication sessions over IP networks. In plain English: SIP is how two endpoints (phones, soft clients, voice AI systems, conferencing servers) agree that they want to talk to each other, on what terms, and when to hang up. It is defined in RFC 3261 (June 2002), and dozens of later RFCs extend it.
SIP is the signalling, not the audio
One of the most common misunderstandings about SIP is that it carries the voice. It does not. SIP only carries the negotiation: who is calling whom, what codecs are supported, where to send the media, when to terminate. Once the call is set up, the actual audio (or video) flows over a separate protocol called RTP (Real-time Transport Protocol), usually on dynamically negotiated UDP ports. Think of SIP as the call-setup protocol and RTP as the call itself.
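One way to see the split is to look at what an RTP packet actually contains. The sketch below parses the 12-byte fixed RTP header defined in RFC 3550 (section 5.1). There are no SIP fields in it: no caller, no callee, no Call-ID. There is only a payload type, a sequence number, a timestamp and a stream identifier (SSRC), followed by the encoded audio. The function and class names are hypothetical, and the sketch ignores CSRC lists and header extensions that may follow the fixed header.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed part of an RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    # Network byte order: two single bytes of flags, a 16-bit sequence
    # number, a 32-bit timestamp and a 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,           # top 2 bits, always 2 in practice
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,    # e.g. 0 = PCMU in the RFC 3551 static table
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


if __name__ == "__main__":
    # A hand-built packet: version 2, payload type 0, 160 bytes of audio
    # (20 ms of G.711 at 8 kHz).
    pkt = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + b"\xff" * 160
    hdr = parse_rtp_header(pkt)
    print(hdr.version, hdr.payload_type, hdr.sequence, hdr.timestamp)
```

Which payload type maps to which codec, and which port the packet arrives on, were both agreed earlier through SIP and SDP. RTP itself just carries the result.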
A simple analogy: SIP is to VoIP what HTTP is to the web
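HTTP is text-based, request-and-response, stateless, and every modern web framework speaks it. SIP is likewise text-based and request-and-response, has its own URI scheme (sip: and sips:), and nearly every modern VoIP system speaks it. The similarity is not accidental: SIP was deliberately modelled on HTTP and SMTP, borrowing their message syntax and header style, so that it would feel familiar to engineers and be easy to debug by reading the wire. The one big difference is state. Unlike HTTP, SIP tracks each call as a dialog that spans several request/response transactions (INVITE, ACK, BYE).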
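To make the HTTP resemblance concrete, here is a minimal sketch that assembles a SIP INVITE as plain text. It has a request line (method, Request-URI, version), "Name: value" headers, a blank line, then a body. The function name, addresses and ports are illustrative assumptions: the hosts use documentation address ranges, and a real user agent would also handle transport, retransmission, authentication and responses, none of which is shown here.

```python
import uuid

CRLF = "\r\n"


def build_invite(from_uri: str, to_uri: str, contact_uri: str,
                 via_host: str, sdp_body: str) -> str:
    """Assemble a minimal SIP INVITE as text (illustration only, not a full UAC)."""
    # The Via branch parameter must start with the RFC 3261 magic cookie "z9hG4bK".
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    from_tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{via_host}"
    lines = [
        # Request line, analogous to "GET /path HTTP/1.1".
        f"INVITE {to_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {via_host}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"To: <{to_uri}>",                    # no tag yet: the callee adds one
        f"From: <{from_uri}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <{contact_uri}>",
        "Content-Type: application/sdp",
        # Content-Length counts bytes of the body, not characters.
        f"Content-Length: {len(sdp_body.encode('utf-8'))}",
    ]
    # As in HTTP, an empty line separates the header section from the body.
    return CRLF.join(lines) + CRLF + CRLF + sdp_body


if __name__ == "__main__":
    msg = build_invite(
        "sip:alice@example.com",
        "sip:bob@example.org",
        "sip:alice@192.0.2.10:5060",
        "192.0.2.10",
        sdp_body="",
    )
    print(msg)
```

Printed out, the message reads much like an HTTP request, which is exactly why SIP traces are easy to inspect by eye.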
Where SIP sits in the VoIP stack
- SIP: signalling (setup, modification, teardown)
- SDP (Session Description Protocol): carried inside SIP messages; describes the media (codecs, IP addresses, ports)
- RTP / RTCP: RTP carries the actual audio and video packets; RTCP carries reception statistics and control information alongside it
- SRTP: encrypted and authenticated RTP for secure media; its keys are negotiated either in SDP (SDES) or through a DTLS handshake on the media path (DTLS-SRTP, mandatory in WebRTC)
- STUN / TURN / ICE: NAT traversal helpers (mandatory in WebRTC, optional but common in SIP)
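The SDP layer is where "what codecs" and "where to send the media" actually live. The sketch below pulls the audio destination and codec list out of an SDP body using three line types: `c=` (connection address), `m=` (media type, port, transport, payload types) and `a=rtpmap:` (payload type to codec name). It is a simplified illustration under stated assumptions: the function name is hypothetical, it handles a single audio stream, and it ignores port counts such as `49170/2`, IPv6 specifics and every other attribute.

```python
from dataclasses import dataclass, field


@dataclass
class AudioOffer:
    address: str = ""
    port: int = 0
    payload_types: list[int] = field(default_factory=list)
    codecs: dict[int, str] = field(default_factory=dict)


def parse_audio_offer(sdp: str) -> AudioOffer:
    """Extract the audio RTP destination and codecs from an SDP body (simplified)."""
    offer = AudioOffer()
    in_audio = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue
        kind, value = line.split("=", 1)
        if kind == "m":
            # m=<media> <port> <proto> <fmt> ...
            parts = value.split()
            in_audio = parts[0] == "audio"
            if in_audio:
                offer.port = int(parts[1])
                offer.payload_types = [int(pt) for pt in parts[3:]]
        elif kind == "c":
            # c=<nettype> <addrtype> <address>. A c= line inside the audio
            # section overrides the session-level one.
            if in_audio or not offer.address:
                offer.address = value.split()[2]
        elif kind == "a" and in_audio and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>
            pt, encoding = value[len("rtpmap:"):].split(" ", 1)
            offer.codecs[int(pt)] = encoding
    return offer


if __name__ == "__main__":
    sample = "\r\n".join([
        "v=0",
        "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
        "s=-",
        "c=IN IP4 192.0.2.10",
        "t=0 0",
        "m=audio 49170 RTP/AVP 0 8 101",
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:8 PCMA/8000",
        "a=rtpmap:101 telephone-event/8000",
    ]) + "\r\n"
    print(parse_audio_offer(sample))
```

The address and port found here are where the other side should send its RTP packets, which is how SDP ties the signalling to the media.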
A typical voice call uses several of these protocols at once: SIP carries the signalling, SDP inside SIP describes the codecs and media endpoints, and RTP carries the actual voice packets. SIP is involved only when the session is set up, changed (for example a re-INVITE to put the call on hold or switch codec) or torn down; the media flowing in between never passes through SIP at all.
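The layering is visible in a single message. The sketch below splits a raw SIP message into its start line, headers and body. The SIP layer only needs `Content-Type` and `Content-Length` to find the body. It does not interpret the SDP inside; that is handed to a separate SDP parser. This is a simplified illustration: the function name is hypothetical, and it assumes UTF-8 headers, no compact header forms (such as `l` for Content-Length), no header line folding and no multipart bodies.

```python
def parse_sip_message(raw: bytes) -> tuple[str, dict[str, list[str]], bytes]:
    """Split a raw SIP message into start line, headers and body (simplified)."""
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("missing blank line between headers and body")
    lines = head.decode("utf-8").split("\r\n")
    headers: dict[str, list[str]] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, and some (e.g. Via) may repeat.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    length = int(headers.get("content-length", ["0"])[0])
    return lines[0], headers, rest[:length]


if __name__ == "__main__":
    sdp = b"v=0\r\nc=IN IP4 198.51.100.7\r\nm=audio 30000 RTP/AVP 0\r\n"
    raw = (
        b"SIP/2.0 200 OK\r\n"
        + b"CSeq: 1 INVITE\r\n"
        + b"Content-Type: application/sdp\r\n"
        + b"Content-Length: " + str(len(sdp)).encode() + b"\r\n"
        + b"\r\n"
        + sdp
    )
    start, headers, body = parse_sip_message(raw)
    print(start)                     # the SIP status line
    print(headers["content-type"])   # tells the receiver the body is SDP
    print(body.decode())             # the media description, opaque to SIP
```

Once the body has been handed off and the media address extracted, the audio goes directly between the endpoints over RTP. The SIP message above never carries a single audio sample.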