IPsec Explained: Tunnel/Transport Modes and IKE

⏱ approx. 23 min · LOG_DATE: 2026-05-10

IPsec (IP Security)

IPsec (IP Security, RFC 4301 et al.) is a family of protocols that encrypts and authenticates the IP packet itself at L3. Where TLS and SSH are per-application crypto layers added by the app, IPsec sits inside the OS network stack at the IP layer, so every TCP / UDP / ICMP flow gets protected without rewriting any application. That property is why IPsec is the de facto standard for Site-to-Site VPN, and why the iPhone's Always-On VPN is also IPsec.

"IPsec" isn't a single protocol — it's the set of three actors AH / ESP / IKE, plus two ways to use it (Transport mode / Tunnel mode), plus a state-management vocabulary (SA / SPD / SAD / SPI) on top. Too many terms to grasp at once is the first hurdle in learning IPsec, so this piece draws that map first and then drops into the concrete: how the packet actually transforms and how IKE reaches the keys in two round trips. Protocol-comparison and WireGuard contrasts live in the VPN article — this one stays inside IPsec.

1. The map of IPsec — three protocols

IPsec consists of three main protocols with clean separation of duty:

Acronym | Full name | Role | IP proto / port
AH | Authentication Header (RFC 4302) | Authentication + integrity only (no encryption) | IP proto 51
ESP | Encapsulating Security Payload (RFC 4303) | Encryption + authentication + integrity | IP proto 50
IKE | Internet Key Exchange (RFC 7296, IKEv2) | Key exchange + authentication + SA management | UDP 500 (UDP 4500 under NAT-T)

AH is essentially unused today, for two reasons. (1) It can't traverse NAT: its integrity check covers the IP header's source and destination addresses, so any NAT rewrite breaks it. (2) It doesn't encrypt, so its role overlaps with AEAD ESP, which delivers encryption and authentication in one step. Modern IPsec ≈ ESP + IKE; that frame holds for nearly every deployment you will meet.

Each SA (Security Association) — "this direction's encrypted stream" — is identified by an SPI (Security Parameter Index, 32-bit), and that number rides at the head of every ESP packet. The receiver uses it to look up the SAD (SA Database) and recover "which key, where in the sequence space."
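To make the SPI lookup concrete, here is a minimal sketch of what a receiver does with the first 8 bytes of an ESP packet (32-bit SPI, then 32-bit sequence number, both big-endian per RFC 4303). The `sad` dictionary and its field names are hypothetical stand-ins for a real SAD, not any stack's actual data structure.

```python
import struct

def parse_esp_header(packet: bytes) -> tuple[int, int]:
    """Return (spi, sequence_number) from the first 8 bytes of an ESP packet."""
    if len(packet) < 8:
        raise ValueError("an ESP header is at least 8 bytes")
    return struct.unpack("!II", packet[:8])

# Hypothetical SAD keyed by SPI; the field names are illustrative only.
sad = {0xC0FFEE01: {"cipher": "aes256gcm16", "last_seq": 41}}

# ESP header followed by 32 bytes of dummy ciphertext.
packet = struct.pack("!II", 0xC0FFEE01, 42) + bytes(32)

spi, seq = parse_esp_header(packet)
sa = sad.get(spi)  # None would mean "no SA for this SPI": drop the packet
print(hex(spi), seq, sa)
```

The SPI is the only thing the receiver can read before decryption, which is why it has to be enough to find the key.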

The decision "should this flow be IPsec-protected, sent in the clear, or dropped?" lives in the SPD (Security Policy Database). When SPD says "protect," the stack pulls the matching SA from SAD; if there isn't one, it kicks IKE to make one — that loop is the heart of IPsec.

2. Two modes and the contents of the ESP packet

The same ESP can be used in two ways that differ by what is encrypted: Transport mode and Tunnel mode. The use case differs, but so does the resulting packet shape.

Figure: ESP packet transforms in Transport mode and Tunnel mode. Same ESP protocol, different "what to encrypt" → different packet shape.

Original packet (before IPsec):
  [ IP hdr 10.0.0.5→.20 ] [ TCP hdr 443 ] [ App data ]

Transport mode: the original IP header stays; only the payload is wrapped in ESP.
Use: host ⇔ host point-to-point encryption (e.g. inside L2TP/IPsec, Windows AuthIP).
  [ IP hdr (orig), proto=ESP(50) ] [ ESP hdr: SPI / Seq# ] [ TCP hdr + App data (encrypted) ] [ ESP trailer: pad / next ] [ ICV: AEAD tag ]
  Auth covers: ESP hdr through right before the ICV. Encrypt covers: TCP hdr + App data + ESP trailer.

Tunnel mode: wrap the whole original packet inside a new IP header (= VPN).
Use: Site-to-Site VPN, remote-access VPN; the canonical "IPsec as a VPN" shape.
  [ Outer IP hdr (new), VPN GW → VPN GW ] [ ESP hdr: SPI / Seq# ] [ Original IP hdr + TCP hdr + App data (encrypted) ] [ ESP trailer: pad / next ] [ ICV: AEAD tag ]
  The original IP hdr is encrypted too, so externally only "ESP traffic between VPN GWs" is visible.

Under NAT-T: wrap ESP inside UDP (for NAT traversal).
  [ Outer IP hdr, proto=UDP(17) ] [ UDP hdr dst=4500 ] [ ESP packet (= the Tunnel mode packet above, from the ESP hdr onward) ]
  Proto 50 doesn't traverse NAT, so we wrap it in UDP/4500; that is NAT-T (RFC 3948).

The takeaways:

  • Transport mode — the original IP header runs on the wire as-is; everything from the TCP header onward is wrapped in ESP and encrypted. The shape used when hosts speak directly (Windows-domain IPsec, the inner half of L2TP).
  • Tunnel mode — the entire original packet (including its IP header) is wrapped in a new IP header. The new header points between two VPN GWs, so externally the only thing visible is "ESP traffic between VPN GWs." This is the canonical IPsec-as-VPN shape.
  • Encrypted range and authenticated range are different. AEAD (AES-GCM, ChaCha20-Poly1305) handles both in one pass, but internally the rule is "encryption covers payload + ESP trailer; authentication covers ESP header + payload + trailer."
  • An ICV (Integrity Check Value) is appended at the end. It's the AEAD authentication tag — if it doesn't match, the receiver silently drops the packet (tamper detection).
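The two ranges above can be sketched by building the inputs an AEAD cipher would receive. This is a layout illustration only, with no actual encryption: it assumes the AES-GCM-style split where the ESP header (SPI + sequence number) is the authenticated-only data, 4-byte alignment, and RFC 4303's default padding bytes 1, 2, 3, ... The function names are hypothetical.

```python
import struct

def esp_trailer(payload_len: int, next_header: int, align: int = 4) -> bytes:
    """Padding + Pad Length + Next Header, sized so payload + trailer is aligned."""
    pad_len = (-(payload_len + 2)) % align
    padding = bytes(range(1, pad_len + 1))  # RFC 4303 default padding: 1, 2, 3, ...
    return padding + bytes([pad_len, next_header])

def esp_aead_inputs(spi: int, seq: int, payload: bytes, next_header: int):
    """Split an ESP packet into the two ranges an AEAD cipher treats differently."""
    aad = struct.pack("!II", spi, seq)  # ESP header: authenticated, not encrypted
    plaintext = payload + esp_trailer(len(payload), next_header)  # encrypted and authenticated
    return aad, plaintext

# Transport mode: the payload is the TCP segment, so Next Header = 6 (TCP).
aad, plaintext = esp_aead_inputs(0x1234, 7, b"A" * 21, next_header=6)
print(aad.hex(), len(plaintext), plaintext[-3:].hex())
```

In Tunnel mode the payload would be the whole inner IPv4 packet and Next Header would be 4 (IP-in-IP); the split between the two ranges stays the same.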

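ESP's sequence number is not for retransmission or reordering (the inner TCP does that). It's for anti-replay: the receiver keeps a sliding window behind the highest sequence number it has authenticated, and rejects duplicates and anything older than that window (RFC 4303 §3.4.3).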
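A minimal sketch of that anti-replay check, using the bitmap approach commonly used for it. The class name and the 64-packet default are assumptions for illustration; a real stack also verifies the ICV before it moves the window, which this sketch leaves out.

```python
class ReplayWindow:
    """Sliding anti-replay window over ESP sequence numbers (bitmap based)."""

    def __init__(self, size: int = 64):
        self.size = size
        self.top = 0      # highest sequence number accepted so far
        self.bitmap = 0   # bit i set => sequence number (top - i) was already seen

    def check_and_update(self, seq: int) -> bool:
        if seq == 0:
            return False  # ESP sequence numbers start at 1
        if seq > self.top:
            # Right of the window: slide the window forward and mark seq as seen.
            shift = seq - self.top
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= self.size:
            return False  # older than the window: reject
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset
        return True

window = ReplayWindow()
results = [window.check_and_update(s) for s in (1, 3, 2, 2, 100, 36, 37)]
print(results)
```

Out-of-order arrival inside the window (3 before 2) is accepted once; the second 2 is a replay; after the jump to 100, sequence number 36 has fallen off the left edge while 37 is still inside.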

3. SA / SPD / SAD / SPI — the state-management vocabulary

Roughly half of IPsec is "once you've negotiated keys, how do you manage them?" The terms come at you fast, so worth a single pass:

  • SA (Security Association) — the "contract" of a one-way encrypted stream. A bundle of key / cipher / mode / sequence number / lifetime. Bidirectional traffic needs two SAs (one in, one out).
  • SPI (Security Parameter Index) — the 32-bit identifier of an SA. Lives at the head of every ESP packet. Receivers use it to find "which of my SAs is this?"
  • SAD (Security Association Database) — the live registry of currently established SAs. On Linux, ip xfrm state peeks into it.
  • SPD (Security Policy Database) — the rule table for "source X / dest Y / proto Z → protect with IPsec / send in the clear / drop." On Linux, ip xfrm policy.

Walked through one session:

  1. App sends to 10.0.0.5 → 10.0.0.20 on TCP/443
  2. Kernel consults SPD → "this flow is protect"
  3. Is there a matching SA in SAD? → If not, fire IKE
  4. IKE completes the key exchange → registers two SAs in SAD
  5. Outbound SA wraps the packet in ESP → sent
  6. Receiver parses the SPI → looks up SAD → decrypts → re-checks SPD ("is this legitimate for this flow?")
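The outbound half of that walk-through (steps 1 to 5) can be sketched as follows. Everything here is a toy model: `Policy`, `SA`, `send`, and `fake_ike` are hypothetical names, the SAD is keyed by policy selector for simplicity, and only the outbound SA is tracked even though IKE registers one per direction.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class Policy:
    src: str
    dst: str
    proto: str    # "tcp", "udp", ... or "any"
    action: str   # "protect", "bypass" (send in the clear), or "discard"

@dataclass
class SA:
    spi: int
    seq: int = 0

def lookup_policy(spd, src, dst, proto):
    """First matching SPD entry wins; no match means no policy."""
    for p in spd:
        if (ip_address(src) in ip_network(p.src)
                and ip_address(dst) in ip_network(p.dst)
                and p.proto in (proto, "any")):
            return p
    return None

def send(spd, sad, negotiate, src, dst, proto):
    policy = lookup_policy(spd, src, dst, proto)
    if policy is None or policy.action == "discard":
        return "dropped"
    if policy.action == "bypass":
        return "sent in clear"
    key = (policy.src, policy.dst, policy.proto)
    if key not in sad:
        sad[key] = negotiate(policy)  # no SA yet: this is where IKE gets kicked
    sa = sad[key]
    sa.seq += 1
    return f"ESP spi={sa.spi:#x} seq={sa.seq}"

negotiations = []

def fake_ike(policy):
    """Stand-in for an IKE negotiation; returns a fresh outbound SA."""
    negotiations.append(policy)
    return SA(spi=0x1000 + len(negotiations))

spd = [
    Policy("10.0.0.0/24", "10.0.0.0/24", "tcp", "protect"),
    Policy("0.0.0.0/0", "0.0.0.0/0", "any", "bypass"),
]
sad = {}
first = send(spd, sad, fake_ike, "10.0.0.5", "10.0.0.20", "tcp")
second = send(spd, sad, fake_ike, "10.0.0.5", "10.0.0.20", "tcp")
other = send(spd, sad, fake_ike, "10.0.0.5", "8.8.8.8", "udp")
print(first, second, other, sep="\n")
```

Only the first protected packet triggers a negotiation; the second reuses the SA and just advances the sequence number.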
# See current SAs / SPs on Linux (ip-xfrm)
sudo ip xfrm state                    # SAD (key, SPI, cipher alg, seq#)
sudo ip xfrm policy                   # SPD (which flows to protect)

# Check strongSwan state
sudo swanctl --list-sas               # Current IKE_SAs and Child_SAs
sudo swanctl --list-conns             # Configured connection definitions

The fields shown by ip xfrm state (spi, auth/encrypt alg, replay-window, lifetime) are exactly the SAD fields described above. Most "IPsec is up but no traffic flows" diagnoses end at "only one of SPD/SAD is set" or "only one direction matched."

4. The IKEv2 handshake — two round trips to the keys

IKE (Internet Key Exchange) is the key-negotiating half of IPsec. The reason IPsec keeps up with modern crypto requirements is IKEv2 (first specified as RFC 4306 in 2005; current spec RFC 7296, 2014), the rebuilt second generation. It reaches an IKE SA + a Child SA (= ESP keys) in just two round trips (4 messages), wiping out the verbosity of IKEv1 (Main Mode 3 RTT + Quick Mode 1.5 RTT).

Figure: IKEv2, two round trips to an IKE SA + a Child SA (ESP keys). Starts on UDP/500 → switches to UDP/4500 if NAT is detected → continues on that port. Initiator (Client) on one side, Responder (VPN GW) on the other.

Round trip 1: IKE_SA_INIT (cleartext, DH key exchange)
  Initiator → Responder: HDR, SAi1, KEi, Ni
    Proposed ciphers (SAi1) / my DH public (KEi) / nonce (Ni)
  Responder → Initiator: HDR, SAr1, KEr, Nr [, CERTREQ]
    Accepted cipher (SAr1) / their DH public (KEr) / nonce (Nr)
  Both sides now compute SKEYSEED → IKE SA keys exist (PFS achieved)

Round trip 2: IKE_AUTH (encrypted, auth + Child SA creation)
  Initiator → Responder: HDR, SK {IDi, [CERT,] AUTH, SAi2, TSi, TSr}
    My ID / cert / AUTH payload / Child SA proposal / traffic selectors (TSi/TSr)
  Responder → Initiator: HDR, SK {IDr, [CERT,] AUTH, SAr2, TSi, TSr}
    Their ID / cert / AUTH / accepted Child SA / traffic selectors
  IKE SA + Child SA (= a pair of ESP SAs, one key set per direction) are now complete → ESP data transfer can begin

Onward: ESP carries app data (over UDP/4500 if NAT-T is in effect)
  ESP packet (SPI=X, Seq=1, encrypted)
    Inner content is the original IP packet (Tunnel mode) or TCP/UDP payload (Transport mode)

As needed: CREATE_CHILD_SA (rekey or extra tunnel)
  HDR, SK {SA, Ni, [KEi,] TSi, TSr}
    Switch to fresh keys before SA lifetime expires (rekey), without dropping traffic

IKEv1's Main+Quick took 9 messages; IKEv2 compresses the same outcome into 4.
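The "both sides now compute SKEYSEED" step can be sketched with the formulas from RFC 7296 §2.13 and §2.14: SKEYSEED = prf(Ni | Nr, g^ir), then the seven IKE SA keys are cut from prf+(SKEYSEED, Ni | Nr | SPIi | SPIr). The sketch assumes HMAC-SHA-256 as the negotiated PRF and 32-byte keys throughout; the DH shared secret, nonces, and SPIs are dummy bytes, not the output of a real exchange.

```python
import hashlib
import hmac

def prf(key: bytes, data: bytes) -> bytes:
    """Assumed negotiated PRF: HMAC-SHA-256."""
    return hmac.new(key, data, hashlib.sha256).digest()

def prf_plus(key: bytes, seed: bytes, length: int) -> bytes:
    """RFC 7296 prf+: T1 = prf(K, S | 0x01), Tn = prf(K, Tn-1 | S | n)."""
    out, block, counter = b"", b"", 1
    while len(out) < length:
        if counter > 255:
            raise ValueError("prf+ is limited to 255 blocks")
        block = prf(key, block + seed + bytes([counter]))
        out += block
        counter += 1
    return out[:length]

def derive_ike_keys(ni, nr, dh_shared, spi_i, spi_r, key_len=32):
    """Derive the seven IKE SA keys in the order the RFC concatenates them."""
    skeyseed = prf(ni + nr, dh_shared)
    names = ["SK_d", "SK_ai", "SK_ar", "SK_ei", "SK_er", "SK_pi", "SK_pr"]
    stream = prf_plus(skeyseed, ni + nr + spi_i + spi_r, key_len * len(names))
    return {n: stream[i * key_len:(i + 1) * key_len] for i, n in enumerate(names)}

# Dummy inputs standing in for values from the two IKE_SA_INIT messages.
ni, nr = bytes(range(32)), bytes(range(32, 64))
spi_i, spi_r = b"\x11" * 8, b"\x22" * 8
dh_shared = b"\x42" * 256

initiator_keys = derive_ike_keys(ni, nr, dh_shared, spi_i, spi_r)
responder_keys = derive_ike_keys(ni, nr, dh_shared, spi_i, spi_r)
print(sorted(initiator_keys), initiator_keys == responder_keys)
```

Every input except the DH shared secret is visible on the wire, which is why the cleartext first round trip gives an eavesdropper nothing to derive keys from.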

Things to keep in mind:

  • Round trip 1 (IKE_SA_INIT) is in cleartext, but nothing secret crosses the wire: only DH public values, nonces, and the cipher proposals. An eavesdropper learns which cipher suite combination was proposed and accepted, not the keys.
  • Round trip 2 (IKE_AUTH) is encrypted (denoted by {...}). This is where "who you are (ID + cert + AUTH)" and "we'll set up an ESP Child SA with this cipher" are settled.
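  • What's in AUTH: with PSK it's a keyed hash, prf(prf(PSK, "Key Pad for IKEv2"), signed octets), where the signed octets cover the sender's own first message, the peer's nonce, and prf(SK_pi or SK_pr, ID payload) (RFC 7296 §2.15); with certificates it's a signature over those same octets with the private key. Even with PSK auth, the PSK itself never crosses the wire.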
  • CREATE_CHILD_SA reuses the IKE SA to "rotate keys (rekey)" or "add a Child SA for a different traffic selector" without rebuilding the IKE SA from scratch. A win for long-running VPNs.
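As a sketch of the PSK case, RFC 7296 §2.15 defines the initiator's AUTH as prf(prf(PSK, "Key Pad for IKEv2"), RealMessage1 | NonceRData | prf(SK_pi, ID payload body)). The code below follows that formula with HMAC-SHA-256 assumed as the PRF; the message bytes, nonce, SK_pi, and ID payload are dummy placeholders rather than real encoded IKE payloads.

```python
import hashlib
import hmac

KEY_PAD = b"Key Pad for IKEv2"  # fixed 17-byte ASCII string from RFC 7296

def prf(key: bytes, data: bytes) -> bytes:
    """Assumed negotiated PRF: HMAC-SHA-256."""
    return hmac.new(key, data, hashlib.sha256).digest()

def initiator_psk_auth(psk, sk_pi, real_message1, nonce_r, id_payload_body):
    """AUTH payload data the initiator sends in IKE_AUTH when using a PSK."""
    maced_id = prf(sk_pi, id_payload_body)
    signed_octets = real_message1 + nonce_r + maced_id
    return prf(prf(psk, KEY_PAD), signed_octets)

# Dummy stand-ins for values both peers already hold after IKE_SA_INIT.
message1 = b"<raw bytes of the initiator's IKE_SA_INIT message>"
nonce_r = b"\xaa" * 32
sk_pi = b"\x55" * 32
id_body = b"a.example.com"

sent = initiator_psk_auth(b"correct horse", sk_pi, message1, nonce_r, id_body)
expected = initiator_psk_auth(b"correct horse", sk_pi, message1, nonce_r, id_body)
wrong = initiator_psk_auth(b"wrong guess", sk_pi, message1, nonce_r, id_body)

# The responder recomputes the value and compares in constant time.
print(hmac.compare_digest(sent, expected), hmac.compare_digest(sent, wrong))
```

Because SK_pi comes out of the DH exchange, a passive observer of IKEv2 cannot recompute this value from the PSK alone, unlike the IKEv1 Aggressive Mode hash.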

4.1 IKEv1 vs IKEv2 — why the rewrite

IKEv1 (RFC 2409, 1998) used Main Mode 6 messages + Quick Mode 3 messages = 9 messages for the same outcome. It wasn't only verbose; it also had several pitfalls, the first of them lethal:

  • Aggressive Mode — a "lightweight" 3-message replacement for Main Mode. But because the PSK HMAC rides in the first round in cleartext, it allows offline dictionary attack on the PSK.
  • Configuration combinatorial explosion — cipher / DH group / auth method / mode are independent dimensions, and vendor interop becomes hell.
  • NAT traversal was bolted on later, with implementation-dependent behavior.

IKEv2 cleaned all of that up: the handshake collapses to two round trips, NAT-T is part of the base spec, and every message carries a Message ID with strict request/response pairing, so retransmissions are unambiguous. There is no reason to pick IKEv1 for new deployments, but Cisco ASA and older Fortinet gear still run IKEv1 in production, so when you inherit such a setup, plan a migration to IKEv2.

5. NAT-T — why IPsec can't traverse NAT, and the fix

ESP runs on IP protocol number 50. Home routers' NAPT (PAT) uses TCP/UDP port numbers to multiplex sessions. ESP has no port concept, so NAPT can't tell ESP sessions apart — only one ESP gets through, or all of them break.

The fix is NAT-T (NAT Traversal, RFC 3948):

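  1. IKE_SA_INIT carries NAT detection payloads (the NAT_DETECTION_SOURCE_IP / NAT_DETECTION_DESTINATION_IP notifies in IKEv2; "NAT-D" is the IKEv1 name). Each is a hash over the SPIs plus the IP address and port as the sender sees them, so the receiver can detect whether the path rewrote either
  2. If NAT is detected, IKE switches from UDP/500 to UDP/4500
  3. ESP packets get wrapped in UDP/4500 (the NAT-T row at the bottom of the packet figure in section 2)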

Multiple sessions can now coexist behind NAPT, multiplexed by outer-UDP port, ESP inside. Since you only need UDP/500 and UDP/4500 open (no special ESP firewall hole), modern IPsec stacks default to NAT-T enabled. Forcing UDP encapsulation even when no NAT is detected is also a common setting (forceencaps=yes in strongSwan's legacy ipsec.conf, encap = yes in swanctl.conf).
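The detection and demultiplexing steps can be sketched like this. The hash follows RFC 7296 §2.23 (SHA-1 over SPIi | SPIr | IP | port), and the UDP/4500 classification follows RFC 3948 (four zero bytes mark IKE, a single 0xFF byte is a NAT keepalive, anything else is ESP). The function names and the addresses are illustrative assumptions, and only IPv4 is handled.

```python
import hashlib
import socket
import struct

def nat_detection_hash(spi_i: bytes, spi_r: bytes, ip: str, port: int) -> bytes:
    """Content of a NAT_DETECTION_*_IP notify: SHA-1(SPIi | SPIr | IP | port)."""
    data = spi_i + spi_r + socket.inet_aton(ip) + struct.pack("!H", port)
    return hashlib.sha1(data).digest()

def nat_detected(received_hash, spi_i, spi_r, observed_ip, observed_port) -> bool:
    """Compare the sender's view of its address with what actually arrived."""
    return received_hash != nat_detection_hash(spi_i, spi_r, observed_ip, observed_port)

def classify_udp4500(payload: bytes) -> str:
    """Demultiplex a datagram received on UDP/4500."""
    if payload == b"\xff":
        return "nat-keepalive"
    if len(payload) < 4:
        return "invalid"
    if payload[:4] == b"\x00\x00\x00\x00":
        return "ike"  # non-ESP marker, followed by the IKE message
    return "esp"      # first 4 bytes are a nonzero SPI

spi_i, spi_r = b"\x11" * 8, bytes(8)  # the responder SPI is zero in the first message
claimed = nat_detection_hash(spi_i, spi_r, "192.168.1.10", 500)

behind_nat = nat_detected(claimed, spi_i, spi_r, "203.0.113.7", 31337)
direct = nat_detected(claimed, spi_i, spi_r, "192.168.1.10", 500)
print(behind_nat, direct, classify_udp4500(struct.pack("!II", 0xC0FFEE01, 1) + bytes(16)))
```

The nonzero-SPI rule is what lets IKE and ESP share one UDP port without any extra framing on ESP packets.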

6. Authentication — PSK / certificate / EAP

What you compute the AUTH payload from inside IKE_AUTH determines the authentication method. Three options, each with a clean fit and a clean misfit:

Method | How it works | Fits | Pitfall
PSK (Pre-Shared Key) | Both sides hold the same secret string | Lab / small Site-to-Site | IKEv1 Aggressive Mode + PSK allows offline cracking. One leak → every peer compromised
Certificate (X.509 + PKI) | Mutual auth via CA-issued certs | Large-scale Site-to-Site / commercial VPN | CA operations (rotation, CRL, OCSP) required
EAP (over RADIUS) | Remote-user auth with ID/PW + MFA | Remote-access VPN (IKEv2 + EAP-MSCHAPv2 / EAP-TLS) | RADIUS is a single point of failure

The Aggressive Mode + PSK vulnerability is the most classic IPsec operations pitfall. An attacker only needs to capture (or actively trigger) the first IKE round trip, and the PSK hash is offline-crackable. The classic toolchain is ike-scan -A to find vulnerable VPN GWs and psk-crack (or John the Ripper) to crack. Disabling Aggressive Mode (IKEv1 Main Mode only) or moving to IKEv2 closes this hole.

# IKE GW fingerprinting / Aggressive Mode detection (only on networks you're authorized for)
ike-scan -M <target>               # Main Mode probe (the default mode); -M = multiline output; shows accepted SA → identify vendor
ike-scan -A -M --id=test <target>  # Aggressive Mode probe (-A) → a handshake response = PSK hash extractable

7. The real settings — where IPsec actually runs

The "quietly inside the enterprise" image of IPsec underplays it; it actually runs everywhere:

  • AWS Site-to-Site VPN — Customer Gateway (on-prem) ⇔ AWS Virtual Private Gateway / Transit Gateway over IKEv2 + IPsec ESP. The AWS-side cipher menu is enumerated in the official docs
  • Azure VPN Gateway / GCP Cloud VPN — also IKEv2/IPsec as the standard
  • Apple iOS / macOS Always On VPN — the "VPN Type: IKEv2" in Settings is IPsec
  • Windows AlwaysOn VPN — IKEv2/IPsec or SSTP (TLS) are the standard options
  • Cisco / Juniper / Fortinet / Palo Alto / Check Point — for Site-to-Site between enterprise firewalls, IKEv2/IPsec is essentially the common language
  • Linux implementations: strongSwan (modern mainstream), libreswan (default on Red Hat family), with the kernel's XFRM stack underneath

A minimum strongSwan (swanctl.conf) Site-to-Site example:

connections {
    site-a-to-b {
        version = 2                         # Force IKEv2
        local_addrs  = 192.0.2.10
        remote_addrs = 198.51.100.20
        proposals = aes256gcm16-sha384-modp3072
        local {                             # One setting per line; values end at the newline
            auth = pubkey
            certs = my-cert.pem
            id = a.example.com
        }
        remote {
            auth = pubkey
            id = b.example.com
        }
        children {
            net-net {
                local_ts  = 10.1.0.0/16
                remote_ts = 10.2.0.0/16
                esp_proposals = aes256gcm16-modp3072   # DH group here = fresh DH (PFS) when the Child SA is rekeyed
                start_action = trap         # Auto-establish on first matching traffic
            }
        }
    }
}

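That aes256gcm16-sha384-modp3072 blob means "AES-256-GCM (AEAD, 16-byte ICV) / SHA-384 / DH group 15 (3072-bit MODP)." Because an AEAD cipher carries its own integrity check, SHA-384 here ends up serving only as the PRF (strongSwan's explicit spelling is prfsha384). Knowing how to read it lets you configure AWS, Cisco, and Juniper too: the vocabulary is the same across products.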
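Reading such a string is mechanical enough to sketch in code. The parser below is a simplified illustration of the keyword notation, not strongSwan's own parser: it only understands AES-GCM ciphers, SHA-family hash keywords (with or without the prf prefix), and the DH group names listed in the table, and rejects everything else.

```python
import re

# Keyword -> IANA DH group number for the groups mentioned in this article.
DH_GROUPS = {
    "modp768": 1, "modp1024": 2, "modp1536": 5, "modp2048": 14,
    "modp3072": 15, "modp4096": 16, "ecp256": 19, "ecp384": 20, "ecp521": 21,
}

def parse_proposal(proposal: str) -> dict:
    """Split a dash-separated proposal keyword string into its components."""
    parsed = {"encryption": None, "aead": False, "key_bits": None,
              "icv_bytes": None, "hash": None, "dh_group": None}
    for token in proposal.lower().split("-"):
        gcm = re.fullmatch(r"aes(128|192|256)gcm(8|12|16)", token)
        if gcm:
            parsed.update(encryption="AES-GCM", aead=True,
                          key_bits=int(gcm.group(1)), icv_bytes=int(gcm.group(2)))
        elif token in DH_GROUPS:
            parsed["dh_group"] = DH_GROUPS[token]
        elif re.fullmatch(r"(prf)?sha(1|256|384|512)", token):
            parsed["hash"] = "SHA-" + token.removeprefix("prf")[3:]
        else:
            raise ValueError(f"token not covered by this sketch: {token}")
    return parsed

ike = parse_proposal("aes256gcm16-sha384-modp3072")
print(ike)
```

The same function reads an ESP proposal such as aes256gcm16, where the hash and DH fields simply stay empty.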

8. Attack surface and operations — the 2026 floor

More components ≈ more places to misconfigure is IPsec's tax. The frequent failures and the modern floor:

  • IKE fingerprinting: ike-scan can identify the vendor and IKE implementation (response patterns, accepted-SA priorities). Operating a VPN GW directly exposed to the Internet means assuming the world will probe it the moment a CVE drops
  • Aggressive Mode + PSK (above) — the detection tools have existed for ages, and older VPN appliances still ship with it on
  • Weak cipher suites / DH groups: 3DES, MD5, SHA-1, DH group 1/2 (768/1024-bit MODP) are out. Logjam (2015) showed that precomputation against 1024-bit MODP is within reach of a nation-state budget. Group 14 (2048-bit) is the floor; 19 (P-256 ECDH) / 20 (P-384) / 21 (P-521) are recommended
  • Cisco ASA CVE-2016-1287: unauthenticated buffer overflow → RCE in IKEv1/IKEv2. CVSS 10.0; hundreds of thousands of devices vulnerable at the time
  • VPNFilter (2018) — Russia-linked APT MitM framework on routers. Reminded everyone that owning router-side IPsec/SSL termination means owning the LAN
  • PSK operations — PSKs distributed by email or chat should be assumed leaked. A single PSK shared across all peers means one site's leak compromises all of them. Use per-peer PSKs, or move to certificate-based auth
  • Enable Dead Peer Detection (DPD) — without it, when a peer dies, the SA isn't cleaned up and you get ghost SAs blocking reconnection
  • Logs — IKE negotiation failures log "proposed X, rejected at Y." Whether you can read this determines your operational ceiling

The 2026 baseline: IKEv2 + AES-256-GCM (or ChaCha20-Poly1305) + DH group 15+ (or P-384 ECDH) + certificate-based auth + PFS required + DPD enabled + Aggressive Mode fully disabled. Anything older, in any existing tunnel, gets a planned migration.
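The cipher and DH part of that floor lends itself to a quick automated check. The sketch below flags the algorithms this section rules out in a dash-separated proposal string; the function name, the keyword set, and the messages are assumptions for illustration, and it does not look at IKE version, auth method, or DPD.

```python
# Keywords this section rules out, with the reason given above.
WEAK = {
    "des": "obsolete cipher",
    "3des": "obsolete cipher",
    "md5": "broken hash",
    "sha1": "deprecated hash",
    "modp768": "DH group 1, far below the 2048-bit floor",
    "modp1024": "DH group 2, within reach of Logjam-style precomputation",
}
DH_PREFIXES = ("modp", "ecp", "curve", "x25519")

def audit_proposal(proposal: str) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    tokens = [t.removeprefix("prf") for t in proposal.lower().split("-")]
    findings = [f"{t}: {WEAK[t]}" for t in tokens if t in WEAK]
    if not any(t.startswith(DH_PREFIXES) for t in tokens):
        findings.append("no DH group: no fresh key exchange (PFS) for this SA")
    return findings

for candidate in ("aes256gcm16-prfsha384-ecp384", "3des-sha1-modp1024", "aes256gcm16"):
    print(candidate, "->", audit_proposal(candidate) or "ok")
```

Running something like this over the proposals of every existing tunnel is one way to build the migration list the baseline calls for.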


IPsec is "a crypto layer that sits at L3 on top of IP." Once you narrow it to its two modern halves, ESP and IKEv2, it's not as monstrous as it looks. Tunnel-mode ESP + the IKEv2 two-round-trip handshake + UDP/4500 NAT-T is what's actually happening in nearly every real deployment, and SA / SPD / SAD / SPI are just the vocabulary that holds that state.

WireGuard is displacing IPsec on configuration simplicity, but the breadth of Site-to-Site interop and enterprise-gear support keep IPsec firmly in active duty — especially for cloud providers' VPN gateways, where IKEv2/IPsec remains the standard. As a protocol, IPsec's lifespan continues — running alongside WireGuard rather than being replaced.