IPsec Explained — Tunnel/Transport Modes and the IKE Key Exchange

⏱ approx. 23 min LOG_DATE:2026-05-10

IPsec (IP Security, RFC 4301 and friends) is a family of protocols that encrypts and authenticates IP packets themselves at L3. Where TLS and SSH are "per-application overlays the application adds on top", IPsec lives inside the OS's network stack, sitting on top of IP — so any TCP / UDP / ICMP flow gets protected without modifying the application. That's the reason "Site-to-Site VPNs are de facto IPsec" and "iPhone Always On VPN is IPsec underneath" are both true. This article walks through AH / ESP / IKE / Transport / Tunnel / SA · SPD · SAD · SPI / IKEv2 / NAT-T / authentication / attack surface in order.

01

The IPsec map: three protocols

IPsec is not one protocol. It's a trio: AH / ESP / IKE. Each one has a clean role.

Acronym | Full name | Role | Protocol number / port
AH | Authentication Header (RFC 4302) | Authentication + integrity only (no encryption) | IP protocol 51
ESP | Encapsulating Security Payload (RFC 4303) | Encryption + authentication + integrity | IP protocol 50
IKE | Internet Key Exchange (RFC 7296, IKEv2) | Key exchange + authentication + SA management | UDP 500 (UDP 4500 with NAT-T)
▸ AH is essentially unused today

Two reasons: (1) it can't traverse NAT (the source IP is part of what AH authenticates, so when NAT rewrites the IP the integrity check breaks). (2) It doesn't encrypt, and ESP with an AEAD cipher already does encryption + authentication in one shot, so AH adds nothing on top. "Modern IPsec = ESP + IKE" is a safe mental model for nearly every deployment you will meet.
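To make reason (1) concrete, here is a minimal Python sketch. It is not real AH (RFC 4302 defines exactly which header fields are covered and which are zeroed); it only assumes a toy ICV computed as an HMAC over source address, destination address and payload. The key and the addresses are made up for illustration.

```python
import hashlib
import hmac
import ipaddress


def toy_ah_icv(key: bytes, src: str, dst: str, payload: bytes) -> bytes:
    """Toy stand-in for an AH ICV: HMAC over both IP addresses plus the payload."""
    covered = (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + payload
    )
    return hmac.new(key, covered, hashlib.sha256).digest()


key = b"illustrative-key"  # hypothetical shared key
payload = b"hello"

# The sender computes the ICV over its private source address.
icv_sent = toy_ah_icv(key, "10.0.0.5", "198.51.100.20", payload)

# A NAT device rewrites the source address in flight but cannot fix the ICV,
# so the receiver recomputes it over the rewritten address.
icv_recomputed = toy_ah_icv(key, "192.0.2.10", "198.51.100.20", payload)

nat_breaks_ah = not hmac.compare_digest(icv_sent, icv_recomputed)
print("ICV mismatch after NAT:", nat_breaks_ah)
```

ESP's ICV does not cover the outer IP header, which is why ESP can survive address rewriting where AH cannot.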

Each SA (Security Association) — the "this directional encrypted stream" — is identified by a 32-bit SPI (Security Parameter Index), and that number is the first thing in the ESP header. On receipt, the receiver uses the SPI to look up the SAD (SA Database) and recover "which key to decrypt with, and how far in the sequence we are".

The decisions of "protect this traffic with IPsec / let it through in cleartext / drop it" are written in the SPD (Security Policy Database). When SPD says "protect", the stack looks up the matching SA in SAD, and if none exists it triggers IKE to create one — that's the basic loop of an IPsec stack.

02

Two modes, and what the ESP packet looks like

ESP itself comes in two modes, Transport and Tunnel, which differ in what gets encrypted. The use cases differ, and so does the shape of the resulting packet.

Original packet → Transport mode → Tunnel mode → NAT-T
# Original packet (before IPsec)
[ IP hdr 10.0.0.5→.20 ][ TCP hdr 443 ][ App data ]

# Transport mode: original IP hdr stays; only the payload is wrapped in ESP
# Use: host ⇔ host point-to-point (inside L2TP/IPsec, Windows AuthIP)
[ IP hdr (proto=ESP) ][ ESP hdr SPI/Seq ][ TCP hdr + App data (encrypted) ][ ESP trailer ][ ICV ]
  ▲ Authenticated range: ESP hdr to just before ICV / Encrypted range: TCP hdr + App data + trailer

# Tunnel mode: the entire original packet is wrapped in a new IP hdr (= the real VPN use case)
# Use: Site-to-Site VPN, remote access VPN
[ outer IP hdr GW→GW ][ ESP hdr ][ original IP hdr + TCP hdr + App data (encrypted) ][ ESP trailer ][ ICV ]
  ▲ From outside, all you see is "ESP traffic between two VPN gateways"

# With NAT-T: wrap ESP in UDP (RFC 3948, for NAT traversal)
[ outer IP (proto=UDP 17) ][ UDP hdr dst=4500 ][ ESP packet (= the Tunnel-mode whole above) ]

Key points:

  • Transport mode: the original IP header travels in the clear; everything from the TCP header on goes inside ESP and gets encrypted. This is the shape when two hosts talk directly (IPsec in Windows domains, the IPsec layer that protects L2TP in L2TP/IPsec)
  • Tunnel mode — the entire original packet (original IP header included) is wrapped inside a new IP header whose endpoints are VPN GWs. From outside, you only see "ESP between the two VPN GWs". This is the main event — IPsec as a VPN
  • Encryption range and authentication range are different — with AEAD (AES-GCM, ChaCha20-Poly1305) both happen in one step, but conceptually "encryption is the payload, authentication is ESP header + payload + trailer"
  • ICV (Integrity Check Value) sits at the end. With AEAD it's the authentication tag, and if it doesn't match the receiver silently drops the packet (tamper detection)
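The layout described above can be sketched in a few lines of Python. This is only a byte-layout illustration under stated assumptions: no encryption is performed, the IV and ICV are left out, and the SPI value and payload are invented. The function names are hypothetical helpers, not part of any IPsec library.

```python
import struct


def esp_padding(payload_len: int, align: int = 4) -> int:
    """Pad bytes needed so payload + padding + 2 trailer bytes is a multiple of align."""
    return (-(payload_len + 2)) % align


def build_esp_plaintext(payload: bytes, next_header: int) -> bytes:
    """Payload + padding + Pad Length + Next Header: the part that gets encrypted."""
    pad_len = esp_padding(len(payload))
    padding = bytes(range(1, pad_len + 1))  # RFC 4303 default padding bytes: 1, 2, 3, ...
    return payload + padding + struct.pack("!BB", pad_len, next_header)


def parse_esp_header(packet: bytes) -> tuple[int, int]:
    """Return (SPI, sequence number) from the first 8 bytes of an ESP packet."""
    spi, seq = struct.unpack("!II", packet[:8])
    return spi, seq


# ESP header: 32-bit SPI followed by 32-bit sequence number, both big-endian.
header = struct.pack("!II", 0xC0FFEE01, 1)

# Next Header 6 = TCP (transport mode); tunnel mode would carry 4 (IPv4) or 41 (IPv6).
inner = build_esp_plaintext(b"x" * 21, next_header=6)

# A real ESP packet would insert an IV, encrypt 'inner', and append the ICV.
packet = header + inner
print(parse_esp_header(packet), len(inner))
```

With an AEAD cipher, the SPI and sequence number are fed in as additional authenticated data: authenticated but not encrypted, which matches the "different ranges" point in the list above.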
▸ What ESP sequence numbers are really for

ESP sequence numbers aren't for retransmission or reordering (the inner TCP does that). They exist to prevent replay attacks: the receiver only accepts numbers inside its window (RFC 4303 §3.4.3).
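A sliding-window check of this kind can be sketched as follows. This is a simplified Python illustration in the spirit of RFC 4303, assuming 32-bit sequence numbers without ESN and a 64-packet window; the class name is hypothetical, and a real stack only updates the window after the ICV has verified.

```python
class ReplayWindow:
    """Sliding-window anti-replay check (simplified; no extended sequence numbers)."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set means (highest - i) has already been seen

    def check_and_update(self, seq: int) -> bool:
        """Return True and record seq if acceptable, False if replayed or too old."""
        if seq == 0:
            return False  # ESP sequence numbers start at 1
        if seq > self.highest:
            # Window slides right; older bits shift up and may fall off the edge.
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False  # older than the left edge of the window
        if self.bitmap & (1 << offset):
            return False  # already seen: a replay
        self.bitmap |= 1 << offset
        return True


window = ReplayWindow()
results = [window.check_and_update(s) for s in (1, 3, 2, 2, 100, 36, 37)]
print(results)
```

Out-of-order packets inside the window (here 2 after 3) are accepted once; a second copy of 2 and anything that has fallen behind the window (36 after 100) are rejected.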

03

SA / SPD / SAD / SPI: the vocabulary of state

In IPsec, "how to manage keys once you've got them" is roughly half of the protocol.

  • SA (Security Association) — A "contract" for a single-direction encrypted stream. A bundle of key / cipher / mode / sequence number / lifetime. A bidirectional flow needs two SAs (one for sending, one for receiving)
  • SPI (Security Parameter Index) — The 32-bit ID that identifies an SA. Carried at the front of the ESP header. The receiver uses it to look up "which SA of mine is this for"
  • SAD (Security Association Database) — The list of currently active SAs and their internal state. On Linux, look with ip xfrm state
  • SPD (Security Policy Database) — The rule table of "protect this flow with IPsec / let this through cleartext / drop this", keyed on src / dst / proto. On Linux, ip xfrm policy
1. App sends TCP/443
A normal socket call sends 10.0.0.5 → 10.0.0.20.
2. Kernel checks the SPD
The flow matches a "protect" policy.
3. Is there a matching SA in the SAD?
If not, invoke IKE and run a key exchange.
4. IKE exchanges keys with the peer
On completion, register the outbound and inbound SAs in the SAD.
5. Encapsulate and send as ESP
Encrypt with the outbound SA and send as an ESP packet.
6. Receiver looks up SAD by SPI
Pulls the key, decrypts, and re-checks the result against the SPD for legitimacy.
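The outbound half of those steps can be sketched as a toy model in Python. Everything here is an illustration under assumptions: the class and field names are invented, the IKE step is a stub callback that just returns an SA, no packets are actually encrypted, and unmatched traffic is dropped (the RFC 4301 default).

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Callable, Optional


@dataclass
class Policy:  # one SPD entry
    src: str     # source selector (CIDR)
    dst: str     # destination selector (CIDR)
    action: str  # "protect", "bypass" or "discard"


@dataclass
class SA:  # one outbound SAD entry
    spi: int
    src: str
    dst: str
    seq: int = 0


@dataclass
class ToyStack:
    spd: list[Policy]
    ike: Callable[[Policy], SA]  # hypothetical hook standing in for the IKE daemon
    sad: list[SA] = field(default_factory=list)

    def lookup_policy(self, src: str, dst: str) -> Optional[Policy]:
        for p in self.spd:  # ordered table, first match wins
            if ip_address(src) in ip_network(p.src) and ip_address(dst) in ip_network(p.dst):
                return p
        return None

    def outbound(self, src: str, dst: str) -> str:
        policy = self.lookup_policy(src, dst)
        if policy is None or policy.action == "discard":
            return "drop"
        if policy.action == "bypass":
            return "cleartext"
        sa = next((s for s in self.sad if (s.src, s.dst) == (policy.src, policy.dst)), None)
        if sa is None:  # no SA yet: trigger the key exchange, then install the result
            sa = self.ike(policy)
            self.sad.append(sa)
        sa.seq += 1
        return f"esp spi={sa.spi:#x} seq={sa.seq}"


stack = ToyStack(
    spd=[
        Policy("10.1.0.0/16", "10.2.0.0/16", "protect"),
        Policy("0.0.0.0/0", "0.0.0.0/0", "bypass"),
    ],
    ike=lambda p: SA(spi=0x1000, src=p.src, dst=p.dst),
)
first = stack.outbound("10.1.0.5", "10.2.0.20")
second = stack.outbound("10.1.0.5", "10.2.0.20")
other = stack.outbound("10.1.0.5", "8.8.8.8")
print(first, second, other, sep="\n")
```

Only the first protected packet pays for the key exchange; later packets reuse the SA and just advance its sequence number.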
Inspecting SA / SPD on Linux
# ip-xfrm: look at the kernel's IPsec state directly
$ sudo ip xfrm state     # SAD (key, SPI, cipher alg, seq#)
$ sudo ip xfrm policy    # SPD (which flows IPsec protects)

# strongSwan status
$ sudo swanctl --list-sas     # current state of IKE_SA and Child_SA
$ sudo swanctl --list-conns   # configured connection definitions
▸ The "connected but nothing flows" diagnostic path

The spi / auth/encrypt alg / replay-window / lifetime fields in ip xfrm state are literally the SAD fields just described. When IPsec "is up but no traffic flows", the cause is almost always that only one of SPD and SAD is populated, or that only one direction has a matching entry.
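The "only one direction" case is easy to check mechanically once you have the endpoints of each SA. The Python sketch below assumes you have already extracted (src, dst) pairs by hand or with your own parser; parsing the ip xfrm output itself is not shown, and the function name is hypothetical.

```python
def missing_reverse_directions(sas: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Given (src, dst) pairs of installed SAs, list the directions that have no SA."""
    present = set(sas)
    return sorted((dst, src) for (src, dst) in present if (dst, src) not in present)


# Example: the outbound SA exists but the inbound one was never installed.
installed = [("192.0.2.10", "198.51.100.20")]
gaps = missing_reverse_directions(installed)
print(gaps)
```

The same comparison applies to the SPD: a policy for one direction without its counterpart produces the same symptom.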

04

The IKEv2 handshake: two round trips to a key

IKE (Internet Key Exchange) is the part of IPsec that negotiates the keys. IKEv2 (RFC 7296, 2014) gets you both an IKE SA and a Child SA (= the ESP key) in two round trips (four messages) — a dramatic cleanup from IKEv1's verbose Main Mode (3 RTT) + Quick Mode (1.5 RTT).

IKEv2 sequence — IKE_SA_INIT → IKE_AUTH → ESP data transfer
# Round 1: IKE_SA_INIT (in cleartext: DH key exchange)
Initiator → Responder: HDR, SAi1, KEi, Ni
    proposed ciphers (SAi1) / own DH public (KEi) / nonce (Ni)
Responder → Initiator: HDR, SAr1, KEr, Nr [, CERTREQ]
    accepted ciphers (SAr1) / DH public / nonce (Nr)
  ▲ Both sides now compute SKEYSEED → the IKE SA's keys are established (ephemeral DH, so forward secrecy for the IKE SA)

# Round 2: IKE_AUTH (payloads encrypted: authenticate + create Child SA)
# HDR itself stays in cleartext; SK {...} marks the encrypted and integrity-protected payloads
Initiator → Responder: HDR, SK {IDi, [CERT,] AUTH, SAi2, TSi, TSr}
    ID / cert / AUTH data / Child SA proposal / traffic selectors
Responder → Initiator: HDR, SK {IDr, [CERT,] AUTH, SAr2, TSi, TSr}
  ▲ IKE SA + Child SA (= one ESP SA per direction, each with its own keys) are now complete → data transfer can begin

# After: ESP carries the application data (over UDP/4500 when NAT-T is in play)
ESP packet (SPI=X, Seq=1, encrypted)

# As needed: CREATE_CHILD_SA (rekey / additional tunnel), designed to avoid interrupting traffic
HDR, SK {SA, Ni, [KEi,] TSi, TSr}

What to remember:

  • Round 1 (IKE_SA_INIT) is cleartext, but the only thing a sniffer can see is the DH exchange and "this side proposed and that side accepted these ciphers" — no meaningful secret is on the wire
  • Round 2 (IKE_AUTH) is already encrypted (shown with {...}). This is where "who you are" (ID + cert + AUTH) and "spin up the Child SA for ESP with this cipher" are settled
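  • What's in AUTH: for PSK it's prf(prf(PSK, "Key Pad for IKEv2"), signed octets); for certificates it's a private-key signature over the same signed octets. The signed octets cover the sender's own first message, the peer's nonce, and a prf of the sender's ID payload keyed with SK_pi (initiator) or SK_pr (responder), per RFC 7296 §2.15. Even in PSK mode, the PSK itself never travels on the wire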
  • CREATE_CHILD_SA reuses the existing IKE SA to rekey or add another Child SA, without needing to renegotiate the IKE SA itself — great for long-lived VPNs
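The key derivation behind "both sides now compute SKEYSEED" can be sketched with the standard library alone. This follows the formulas in RFC 7296 §2.13 to §2.15 and assumes HMAC-SHA-256 as the negotiated PRF. The nonces, DH secret, SPIs and key sizes below are placeholder values for illustration, and a real implementation would obtain the shared secret from an actual DH exchange.

```python
import hashlib
import hmac

KEY_ORDER = ("SK_d", "SK_ai", "SK_ar", "SK_ei", "SK_er", "SK_pi", "SK_pr")


def prf(key: bytes, data: bytes) -> bytes:
    """PRF_HMAC_SHA2_256, assumed here as the negotiated PRF."""
    return hmac.new(key, data, hashlib.sha256).digest()


def prf_plus(key: bytes, seed: bytes, length: int) -> bytes:
    """prf+ from RFC 7296: T1 = prf(K, S | 0x01), Tn = prf(K, Tn-1 | S | n)."""
    out, block, counter = b"", b"", 1
    while len(out) < length:
        if counter > 255:
            raise ValueError("prf+ output limit exceeded")
        block = prf(key, block + seed + bytes([counter]))
        out += block
        counter += 1
    return out[:length]


def derive_ike_keys(ni, nr, dh_secret, spi_i, spi_r, sizes):
    """SKEYSEED = prf(Ni | Nr, g^ir); the seven SK_* keys are sliced from prf+ in order."""
    skeyseed = prf(ni + nr, dh_secret)
    stream = prf_plus(skeyseed, ni + nr + spi_i + spi_r, sum(sizes[n] for n in KEY_ORDER))
    keys, pos = {}, 0
    for name in KEY_ORDER:
        keys[name] = stream[pos:pos + sizes[name]]
        pos += sizes[name]
    return keys


def psk_auth(psk: bytes, signed_octets: bytes) -> bytes:
    """AUTH for PSK mode: prf(prf(PSK, "Key Pad for IKEv2"), signed octets)."""
    return prf(prf(psk, b"Key Pad for IKEv2"), signed_octets)


# Illustrative sizes: AES-256-GCM needs 36 bytes per direction (32 key + 4 salt)
# and no separate integrity keys; PRF-keyed values are 32 bytes.
SIZES = {"SK_d": 32, "SK_ai": 0, "SK_ar": 0, "SK_ei": 36, "SK_er": 36, "SK_pi": 32, "SK_pr": 32}
keys = derive_ike_keys(b"N" * 32, b"R" * 32, b"placeholder-dh-secret",
                       b"\x01" * 8, b"\x02" * 8, SIZES)
print({name: len(value) for name, value in keys.items()})
```

Because both peers feed in the same nonces, SPIs and DH secret, they arrive at identical keys without any key material crossing the wire.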

IKEv1 vs IKEv2: why a rewrite was needed

IKEv1 (RFC 2409, 1998) accomplished the same job in Main Mode 6 + Quick Mode 3 = 9 messages. It was not just verbose — it had a fatal pitfall.

  • Aggressive Mode: a "lightweight" alternative to Main Mode that did everything in 3 messages. The catch: the responder's reply carries a hash derived from the PSK (HASH_R) unencrypted, allowing an offline dictionary attack on the PSK
  • Combinatorial explosion of settings — cipher / DH group / authentication method / mode all combine independently, making interoperability between vendors a nightmare
  • NAT traversal — bolted on after the fact, behaving differently from one implementation to another

IKEv2 cleaned all of that up: a simple two-round-trip flow + NAT-T baked into the spec + per-message IDs so retransmission and loss can be told apart. There's no good reason to deploy new IKEv1 today.

05

NAT-T: why IPsec can't cross NAT on its own

ESP runs as IP protocol 50. Home-grade routers, on the other hand, use NAPT (PAT) to multiplex by TCP/UDP port. ESP has no port numbers, so NAPT has nothing to multiplex on: typically only one ESP flow behind the NAT works, or none of them do.

This is what NAT-T (NAT Traversal, RFC 3948) exists for.

1. NAT detection
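During IKE_SA_INIT, the two ends exchange NAT_DETECTION_SOURCE_IP / NAT_DETECTION_DESTINATION_IP notifications (the IKEv2 successor to IKEv1's NAT-D payloads). Each carries a hash of the SPIs, IP address and port, so either side can detect whether the path is rewriting them.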
2. Port switch
If NAT is detected, IKE itself moves from UDP/500 to UDP/4500.
3. Wrap ESP in UDP
ESP packets are encapsulated in UDP/4500 from then on.

This lets multiple sessions coexist through NAPT: they are multiplexed by the outer UDP port, with ESP inside. Because all you really need open is UDP/500 and UDP/4500, modern IPsec implementations leave NAT-T on by default (and many shops force UDP encapsulation unconditionally, for example with strongSwan's forceencaps=yes in ipsec.conf or encap = yes in swanctl.conf).
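Both mechanisms are small enough to sketch in Python. The hash follows the IKEv2 definition (SHA-1 over SPIs, IP address and port, RFC 7296 §2.23), and the demultiplexing rule on UDP/4500 follows RFC 3948 (four zero bytes mark IKE, a single 0xFF byte is a keepalive, anything else is ESP). The function names, addresses and SPI values are illustrative assumptions.

```python
import hashlib
import ipaddress
import struct


def nat_detection_hash(spi_i: bytes, spi_r: bytes, ip: str, port: int) -> bytes:
    """SHA-1(SPIi | SPIr | IP | port): the data of a NAT_DETECTION_*_IP notification."""
    packed_ip = ipaddress.ip_address(ip).packed
    return hashlib.sha1(spi_i + spi_r + packed_ip + struct.pack("!H", port)).digest()


def classify_udp4500(datagram: bytes) -> str:
    """Demultiplex traffic arriving on UDP/4500."""
    if datagram == b"\xff":
        return "nat-keepalive"
    if datagram[:4] == b"\x00\x00\x00\x00":
        return "ike"  # 4-byte Non-ESP Marker precedes the IKE header
    return "esp"      # otherwise the first 4 bytes are a non-zero SPI


spi_i, spi_r = b"\x11" * 8, b"\x00" * 8  # SPIr is still zero in the first message

# What the initiator put in its notification (its own view of its address)...
claimed = nat_detection_hash(spi_i, spi_r, "10.0.0.5", 500)
# ...versus what the responder computes from the packet's actual source.
observed = nat_detection_hash(spi_i, spi_r, "203.0.113.7", 31337)

nat_detected = claimed != observed
print("NAT detected:", nat_detected)
print(classify_udp4500(b"\x00\x00\x00\x00" + b"ike-header"))
print(classify_udp4500(struct.pack("!II", 0xC0FFEE01, 1) + b"ciphertext"))
```

A mismatch tells the peers that some device rewrote the address or port, which is the trigger for moving to UDP/4500.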

06

Authentication modes: PSK / certificate / EAP

Which authentication method you're using is decided by what computes the AUTH payload in IKE_AUTH. There are three, with cleanly separated sweet spots.

Method | Mechanism | Where it fits | Pitfall
PSK (Pre-Shared Key) | Both ends share the same secret string | Labs / small Site-to-Site | IKEv1 Aggressive Mode + PSK is offline-attackable. Leakage means every peer is compromised
Certificate (X.509 + PKI) | Mutual authentication using CA-issued certificates | Large Site-to-Site / commercial VPNs | Requires CA lifecycle work (rotation, CRL, OCSP)
EAP (via RADIUS) | Authenticates remote users with ID/password + MFA | Remote-access VPN (IKEv2 + EAP-MSCHAPv2 / EAP-TLS) | The RADIUS server becomes a single point of failure
▸ The classic Aggressive Mode + PSK vulnerability

The most classic IPsec operational pitfall. By sniffing the first round trip of an IKE handshake (or actively triggering one), an attacker can perform an offline dictionary attack on the PSK hash. The classic recipe is to find vulnerable VPN GWs with ike-scan -A, then crack with psk-crack or John the Ripper. Forcing Main Mode (IKEv1) or IKEv2 is enough to defeat this.

Fingerprint IKE GWs with ike-scan (only on networks you're authorised on)
$ ike-scan -M <target>             # Main Mode probe (the default); -M prints the accepted SA in multi-line form → vendor identification
$ ike-scan -A <target> --id=test   # Aggressive Mode probe → a reply that includes the hash means the PSK is open to offline cracking
07

In the real world: where IPsec actually runs

"Quietly humming in the back office of an enterprise" is the stereotype, but IPsec is in fact running everywhere.

  • AWS Site-to-Site VPN — Joins the Customer Gateway (on-prem) and the AWS Virtual Private Gateway / Transit Gateway via IKEv2 + IPsec ESP
  • Azure VPN Gateway / GCP Cloud VPN — Same — IKEv2/IPsec is the standard
  • Apple iOS / macOS Always On VPN — "VPN type: IKEv2" in the Settings app is exactly IPsec
  • Windows AlwaysOn VPN — Standard options are IKEv2/IPsec or SSTP (TLS)
  • Cisco / Juniper / Fortinet / Palo Alto / Check Point — Site-to-Site on enterprise firewalls is effectively IKEv2/IPsec as the only interop axis
  • Linux implementations: strongSwan (today's mainstream), libreswan (default on Red Hat-family distros), with the kernel's XFRM stack underneath
Minimal strongSwan swanctl.conf — Site-to-Site
connections {
    site-a-to-b {
        version = 2                      # force IKEv2
        local_addrs = 192.0.2.10
        remote_addrs = 198.51.100.20
        proposals = aes256gcm16-sha384-modp3072
        local {
            auth = pubkey
            certs = my-cert.pem
            id = a.example.com
        }
        remote {
            auth = pubkey
            id = b.example.com
        }
        children {
            net-net {
                local_ts = 10.1.0.0/16
                remote_ts = 10.2.0.0/16
                esp_proposals = aes256gcm16
                start_action = trap      # auto-establish on first matching traffic
            }
        }
    }
}
▸ How to read a cipher proposal

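aes256gcm16-sha384-modp3072 means "AES-256-GCM (AEAD, 16-byte ICV) / SHA-384 / DH group 15 (3072-bit MODP)". Because GCM authenticates on its own, SHA-384 here ends up as the PRF used for key derivation rather than as a separate integrity algorithm. Once you know this vocabulary you can configure AWS / Cisco / Juniper with the same terminology.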
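As a reading aid, the decomposition can be written as a small Python parser. This is only a sketch: it recognises a deliberately small subset of strongSwan-style keywords (AES with optional GCM/CCM, SHA-2 hashes, a handful of DH groups), the function and table names are invented, and it is not how strongSwan itself parses proposals.

```python
import re

# Keyword → IANA DH group number (subset only).
DH_GROUPS = {
    "modp2048": 14, "modp3072": 15, "modp4096": 16,
    "ecp256": 19, "ecp384": 20, "ecp521": 21,
}
ENC_RE = re.compile(r"^(aes)(128|192|256)(?:(gcm|ccm)(8|12|16))?$")
HASH_RE = re.compile(r"^(prf)?sha(256|384|512)$")


def parse_proposal(proposal: str) -> dict:
    """Split a dash-separated proposal string into cipher, hash and DH group."""
    parsed = {"encryption": None, "hash": None, "dh_group": None}
    for token in proposal.lower().split("-"):
        enc = ENC_RE.match(token)
        if enc:
            cipher, bits, mode, icv = enc.groups()
            parsed["encryption"] = {
                "cipher": cipher,
                "key_bits": int(bits),
                "mode": mode or "cbc",  # a bare "aes256" means AES-CBC
                "icv_bytes": int(icv) if icv else None,
            }
        elif token in DH_GROUPS:
            parsed["dh_group"] = DH_GROUPS[token]
        elif HASH_RE.match(token):
            parsed["hash"] = token
        else:
            raise ValueError(f"unrecognised token: {token}")
    return parsed


result = parse_proposal("aes256gcm16-sha384-modp3072")
print(result)
```

The output for the proposal in the config above is AES, 256-bit key, GCM with a 16-byte ICV, sha384, and DH group 15.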

08

Attack surface and operations: the 2026 baseline

A lot of moving parts means a lot of misconfiguration surface. Common incidents and the modern minimum.

  • IKE fingerprinting: ike-scan identifies the vendor and IKE implementation. Putting a VPN GW directly on the internet means assuming the entire world will scan for it within hours of any CVE
  • Aggressive Mode + PSK — detection tooling has existed for years. Old VPN boxes still have it enabled
  • Weak ciphers / DH groups: 3DES, MD5, SHA-1, DH group 1/2 (768/1024-bit MODP) are out. Logjam (2015) demonstrated that 1024-bit MODP is within nation-state range. Minimum group 14 (2048-bit); preferred is group 19 (P-256 ECDH) / 20 (P-384) / 21 (P-521)
  • Cisco ASA CVE-2016-1287: unauthenticated buffer overflow → RCE in IKEv1 / IKEv2. CVSS 10.0; hundreds of thousands of boxes were vulnerable at the time
  • VPNFilter (2018) — a router-targeting MitM framework attributed to a Russian APT. Owning a router's IPsec/SSL termination means the attacker can see the whole LAN
  • PSK operations — assume any PSK distributed via email or chat has leaked. Sharing the same PSK across every peer means a single leak compromises everyone. Move to a unique PSK per peer, or to certificate-based
  • Enable Dead Peer Detection (DPD) — without it, when a peer goes down the stale SA sticks around and prevents reconnection
  • Logs — IKE negotiation failures get logged with "what was proposed, what was rejected and why". Being able to read that determines how good your operations team is
▸ The 2026 baseline configuration

IKEv2 + AES-256-GCM (or ChaCha20-Poly1305) + DH group 15 or higher (or P-384 ECDH) + certificate-based authentication + mandatory PFS + DPD enabled + Aggressive Mode completely disabled. If you've got existing tunnels that don't meet this, plan a migration.
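A baseline like this lends itself to an automated check. The Python sketch below audits a connection description against the points above. The dict is a hypothetical, flattened stand-in for your real configuration; its key names are borrowed from swanctl.conf for familiarity, and the weak-algorithm list covers only the keywords named in this article.

```python
WEAK = {"3des", "des", "md5", "sha1", "modp768", "modp1024"}
DH_PREFIXES = ("modp", "ecp", "curve")


def audit_connection(conn: dict) -> list[str]:
    """Return a list of findings; an empty list means the baseline is met."""
    findings = []
    if conn.get("version") != 2:
        findings.append("not pinned to IKEv2")
    if conn.get("aggressive"):
        findings.append("Aggressive Mode enabled")
    if conn.get("auth") != "pubkey":
        findings.append("not using certificate authentication")
    if not conn.get("dpd_delay"):
        findings.append("DPD disabled")
    for key in ("proposals", "esp_proposals"):
        tokens = conn.get(key, "").lower().split("-")
        for token in tokens:
            if token in WEAK:
                findings.append(f"{key}: weak algorithm {token}")
        if not any(token.startswith(DH_PREFIXES) for token in tokens):
            findings.append(f"{key}: no DH group, so no PFS")
    return findings


good = {
    "version": 2, "aggressive": False, "auth": "pubkey", "dpd_delay": "30s",
    "proposals": "aes256gcm16-sha384-modp3072",
    "esp_proposals": "aes256gcm16-modp3072",
}
legacy = {
    "version": 1, "aggressive": True, "auth": "psk",
    "proposals": "3des-sha1-modp1024",
    "esp_proposals": "aes256gcm16",
}
good_findings = audit_connection(good)
legacy_findings = audit_connection(legacy)
print(good_findings)
print(legacy_findings)
```

Note that an ESP proposal without a DH group is flagged: without one, Child SA rekeys reuse key material derived from the IKE SA instead of running a fresh DH exchange.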

09

Wrap-up: how IPsec and WireGuard split the field

The idea of IPsec — "an encrypted layer at L3, on top of IP" — isn't as complex as it looks once you narrow to the two modern actors, ESP and IKEv2. Tunnel-mode ESP + IKEv2's two-round-trip handshake + UDP/4500 NAT-T is what's happening in nearly every real-world deployment, and SA / SPD / SAD / SPI is just the vocabulary for the state behind it.

WireGuard is taking ground on "simplicity of configuration", but the depth of Site-to-Site interop and the breadth of enterprise gear support mean IPsec is still very much in active service. The cloud providers' VPN gateways in particular continue to ship IKEv2/IPsec as their standard interface. As a protocol, IPsec will live on alongside WireGuard for the foreseeable future.
