Linux powers the majority of the world's servers, cloud infrastructure, smartphones (Android), embedded devices, and supercomputers; at least, that's the usual framing. Strictly speaking, "Linux" refers only to the kernel, and what we touch day-to-day is a whole stack: the kernel, the GNU userland, and a distribution's glue on top. This article walks through that three-layer structure, the kernel/userland boundary drawn by syscalls, "everything is a file", processes, permissions, the shell, and finally how containers emerge naturally from namespaces + cgroups.
The dual meaning of "the kernel only" vs "the whole OS"
The first thing to unpack is the layered nature of the word "Linux".
| Layer | What's in it | Examples |
|---|---|---|
| Kernel | Process management / memory management / FS / networking / device drivers. The only thing "Linus Torvalds and the community maintain as Linux" | linux-6.x (kernel.org) |
| Userland (GNU + surroundings) | Libraries (glibc / musl), shells (bash / zsh), coreutils (ls, cp, cat), GNU toolchain (gcc / binutils) | GNU project + util-linux + systemd |
| Distribution | Kernel + userland + package manager + the distribution's own config / startup scripts / release policy | Debian, Ubuntu, RHEL, Fedora, Arch, Alpine, Android |
So "I installed Ubuntu" really means "I installed a distribution that combines the Linux kernel + the GNU toolchain + Debian-style packaging + Canonical-specific Snap / Netplan / configuration", all bundled together.
Android uses the Linux kernel, but the userland on top is Bionic libc + Java/ART, not GNU. From kernel.org's perspective it's Linux, but it's often treated as a separate thing from "GNU/Linux". The kernel itself is GPLv2, which is the legal basis on which Android device makers are obligated to publish their kernel changes.
Architecture — kernel and userland are split by the system call interface
The core of Linux's design (and UNIX-family designs in general) is that kernel space and user space are separated by CPU privilege levels enforced in hardware (ring 0 vs ring 3 on x86). User processes never touch hardware directly; they go through a defined entry point, the system call, to ask the kernel to act on their behalf.
```text
[ Application ]  nginx / PostgreSQL / Firefox / Python scripts / vim / Docker client
[ User Land   ]  bash/zsh / ls,cp,cat,grep / glibc/musl / systemd / sshd / cron / Wayland/Xorg
───── system call interface (read / write / open / fork / exec / mmap / socket / ...) ─────
[ Kernel      ]  Process/Sched (CFS/EEVDF, cgroups, namespaces)
                 Memory (MMU, page cache, slab, OOM killer, swap)
                 VFS/FS (ext4/XFS/Btrfs, tmpfs, procfs, FUSE, overlayfs, io_uring)
                 Network (TCP/IP, netfilter/nftables, XDP/eBPF, routing, socket)
               + Device Drivers (NIC, GPU, NVMe, USB, ACPI, audio, loaded dynamically via modprobe)
[ Hardware    ]  CPU / memory / storage / NIC / GPU / peripherals
# ~400 syscalls / e.g. read=0, write=1, open=2, fork=57, execve=59, mmap=9 (x86_64)
```

What to remember:
- The kernel is never part of any app. An application can only reach the kernel through one of about 400 syscalls: `read()`, `write()`, `open()`, `fork()`, `socket()`, and so on. `strace` works at all precisely because this boundary is explicit.
- The GNU userland is a separate project from the Linux kernel. `ls` and `cat` come from GNU coreutils, developed independently of the kernel. Alpine Linux replaces these with musl + BusyBox for compactness, so the same kernel can feel like a completely different OS depending on what's on top.
- systemd is the modern glue between kernel and userland: a userspace PID 1 daemon that unifies process management, cgroups, logging, networking, DNS, and boot ordering.
- Dynamic driver loading: kernel modules (listed by `lsmod`) are loaded on demand via `modprobe`, so new hardware can be plugged in without rebooting.
```shell
# live-trace syscalls (what the app is asking the kernel to do)
$ strace -f -e trace=openat,read,write ls /tmp
# kernel version and build settings
$ uname -a
$ cat /proc/version
# loaded kernel modules
$ lsmod
# system-wide syscall stats (needs perf)
$ sudo perf stat -e 'syscalls:sys_enter_*' -a sleep 5
```

Everything is a file — UNIX's biggest abstraction
"Everything is a file" is the most influential design call in UNIX philosophy. Because regular files, directories, devices, pipes, sockets, and symlinks all use the same read() / write() / open() / close() API, programs can be composed without caring what's on the other end.
grep foo /var/log/syslog, grep foo < /dev/ttyS0 (a serial port), and cat hello | grep foo are all literally the same read() call — that's the heart of UNIX's simplicity and power.
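A minimal sketch of that idea in bash, assuming a Linux system with `mktemp` and `mkfifo`: the same `grep foo`, reading its standard input, is fed from a regular file, an anonymous pipe, and a named FIFO. The scratch directory and file names are made up for the example.

```bash
#!/usr/bin/env bash
# One program, one read path, three kinds of "file" behind stdin.
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# 1. A regular file on disk
printf 'hello foo\nbye\n' > "$dir/regular.txt"
from_file=$(grep foo < "$dir/regular.txt")

# 2. An anonymous pipe created by the shell's |
from_pipe=$(printf 'hello foo\nbye\n' | grep foo)

# 3. A named pipe (FIFO); the writer blocks until a reader opens it
mkfifo "$dir/fifo"
printf 'hello foo\nbye\n' > "$dir/fifo" &
from_fifo=$(grep foo < "$dir/fifo")
wait

echo "file: $from_file"
echo "pipe: $from_pipe"
echo "fifo: $from_fifo"
```

All three lines print `hello foo`; `grep` never learns which kind of object was behind its file descriptor.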
| `ls -l` first char | Type | Examples |
|---|---|---|
| `-` | Regular file | `/etc/hosts`, `/bin/ls` |
| `d` | Directory (name → inode listing) | `/home`, `/etc`, `/tmp` |
| `l` | Symlink (reference to another path) | `/lib` → `/usr/lib` |
| `c` | Character device (byte-level I/O) | `/dev/tty`, `/dev/null` |
| `b` | Block device (block-level I/O) | `/dev/sda`, `/dev/nvme0n1` |
| `p` | FIFO (named pipe) | Created with `mkfifo` |
| `s` | Socket (UNIX domain socket) | `/tmp/.X11-unix/X0` |
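The type letter can also be recovered from the shell's own `test` operators. This is a sketch: `ftype` is a hypothetical helper name, and the `-L` check comes first because `-d` and `-f` follow symlinks.

```bash
#!/usr/bin/env bash
# Print the `ls -l` type letter for a path using the shell's file tests.
ftype() {
  local p=$1 t
  if   [ -L "$p" ]; then t=l   # check symlinks first: -d/-f follow links
  elif [ -d "$p" ]; then t=d
  elif [ -f "$p" ]; then t=-
  elif [ -c "$p" ]; then t=c
  elif [ -b "$p" ]; then t=b
  elif [ -p "$p" ]; then t=p
  elif [ -S "$p" ]; then t=s
  else t='?'
  fi
  printf '%s\n' "$t"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/file"
ln -s file "$dir/link"
mkfifo "$dir/fifo"

for p in "$dir/file" "$dir" "$dir/link" /dev/null "$dir/fifo"; do
  printf '%s  %s\n' "$(ftype "$p")" "$p"
done
```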
Special filesystems — /dev /proc /sys
- /dev: The doorway to hardware and virtual devices. `echo "hello" > /dev/tty1` writes to the first virtual console; `dd if=/dev/zero of=test.bin bs=1M count=100` creates a 100 MB zero-filled file. Both are just ordinary file I/O through the same syscalls.
- /proc: A virtual FS the kernel generates on the fly. `/proc/PID/maps` shows a process's memory mappings; `/proc/PID/fd/` shows its open file descriptors. None of this lives on disk; the kernel generates the text the moment you `cat` it.
- /sys: Exposes the kernel's internal object hierarchy (devices, buses, classes) as a directory tree. Writable attributes act as controls, so as root `echo 1400 > /sys/class/net/eth0/mtu` changes a NIC's MTU. Some attributes are read-only: `/sys/class/net/eth0/address` only reports the MAC, and changing it is done with `ip link set dev eth0 address 02:11:22:33:44:55`.
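They don't exist on disk. They are virtual filesystems the kernel produces at runtime, and nothing in them is persisted: a `/proc/PID` directory disappears when its process exits, and values written under `/proc/sys` or `/sys` are lost on reboot. If you want to keep something, copy it elsewhere (for sysctl tunables, put them in `/etc/sysctl.d/`).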
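Because `/proc` text is produced on read, a script can parse it like any file. A small sketch with a hypothetical `status_field` helper that pulls one field out of `/proc/PID/status`. Note that `/proc/self` always means whichever process opens the path, so the demo passes the shell's `$$` explicitly rather than relying on `self`.

```bash
#!/usr/bin/env bash
# Read one field from /proc/<pid>/status; the kernel renders this text on read.
status_field() {
  local field=$1 pid=${2:-self} key val
  # Lines look like "VmRSS:\t    3456 kB"; split on ':' plus surrounding blanks
  while IFS=$':\t ' read -r key val; do
    if [ "$key" = "$field" ]; then
      printf '%s\n' "$val"
      return 0
    fi
  done < "/proc/$pid/status"
  return 1
}

echo "shell PID:  $(status_field Pid "$$")"
echo "parent PID: $(status_field PPid "$$")"
echo "resident:   $(status_field VmRSS "$$")"
```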
```shell
# check file types
$ ls -l /dev/null /etc/hosts /lib /tmp/.X11-unix/X0
$ stat -c '%F %n' /dev/sda /proc/meminfo /sys/class/net/eth0
# pulling process info from /proc
$ ls /proc/$$/fd          # FDs my shell has open
$ cat /proc/$$/maps       # memory mappings
$ cat /proc/$$/status     # state (VmRSS etc)
# dynamic tuning via sysfs / sysctl
$ sudo sysctl -w net.ipv4.ip_forward=1
$ echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward   # equivalent
```

Processes and signals — fork/exec and PID 1
Linux's process model inherits UNIX's distinctive design — "fork() to clone → exec() to replace yourself with another program".
This is the root of "every process descends from PID 1 in a single family tree". `pstree` lets you trace your way back from vim to bash to sshd to systemd (PID 1). When a parent dies before its child, the child becomes an orphan and is adopted by PID 1 (systemd), or by the nearest ancestor registered as a subreaper: the classic "init reaping" rule, still in force.
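The same `/proc/PID/status` file carries each process's `PPid`, so the family tree can be walked by hand. A sketch assuming Linux and bash; `ancestry` is an illustrative name. The walk stops at a PPid of 0, which is what PID 1 reports, and also what a process inside a PID namespace sees when its parent lives outside that namespace.

```bash
#!/usr/bin/env bash
# Walk from a process up through its parents using the PPid field in /proc.
ancestry() {
  local pid=${1:-$$} key val name ppid
  while [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
    name='' ppid=0
    while IFS=$':\t ' read -r key val; do
      case $key in
        Name) name=$val ;;
        PPid) ppid=$val ;;
      esac
    done < "/proc/$pid/status"
    printf '%s %s\n' "$pid" "$name"
    pid=$ppid          # PID 1 (or a parent outside our PID namespace) gives 0
  done
}

ancestry "$$"
```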
Signals are asynchronous notifications between processes: SIGINT from Ctrl+C, the default SIGTERM from `kill PID`, the uncatchable SIGKILL from `kill -9 PID` (forced termination), SIGSEGV for memory errors. Linux defines 31 standard signals, plus a range of real-time signals (SIGRTMIN to SIGRTMAX).
SIGKILL and SIGSTOP are the only signals a process cannot catch, block, or ignore. That is by design, so the user always has a way to stop something. Because `kill -9` gives the process no chance to clean up, it risks leaving stale lock files, temporary files, or half-written data behind; start with SIGTERM and ask politely.
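The difference is easy to observe from bash. The sketch below, with a hypothetical `run_with_signal` helper, starts a background worker that installs a SIGTERM handler, then sends it either SIGTERM or SIGKILL. A marker file named by `mktemp -u` is used only so the parent waits until the handler is installed before signalling.

```bash
#!/usr/bin/env bash
# Compare SIGTERM (catchable, cleanup runs) with SIGKILL (cannot be caught).
run_with_signal() {
  local sig=$1 ready pid status=0
  ready=$(mktemp -u)
  (
    trap 'echo "cleanup ran"; rm -f "$ready"; exit 0' TERM
    : > "$ready"                    # tell the parent the handler is installed
    while :; do sleep 0.1; done
  ) &
  pid=$!
  until [ -e "$ready" ]; do sleep 0.05; done
  kill -s "$sig" "$pid"
  wait "$pid" || status=$?
  rm -f "$ready"
  echo "status=$status"
}

run_with_signal TERM   # "cleanup ran", then status=0
run_with_signal KILL   # status=137 (128 + 9); the handler never runs
```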
```shell
# view the process tree
$ pstree -sp $$           # ancestors of my shell, up to PID 1
# state and resources
$ ps -ef                  # all processes (UNIX format)
$ ps auxf                 # tree view (BSD format)
$ top                     # real-time (or htop, if installed)
$ ss -tnp                 # TCP connections with their owning processes
$ lsof -p PID             # files opened by one PID
# sending signals
$ kill -SIGTERM PID       # polite request (cleanup possible)
$ kill -9 PID             # force kill (watch for leaks)
$ kill -SIGUSR1 PID       # app-defined notification (nginx reopens its logs, etc)
```

A zombie (Z state) is a child that has exited but whose parent hasn't called `wait()` yet. Its memory and other resources are released, but the PID and process-table entry remain until the parent reaps it. A zombie outbreak is therefore a parent bug, and if zombies exhaust the PID space, no new processes can be created.
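A zombie can be produced on purpose to see the Z state. This sketch relies on a well-known shell trick: a subshell starts a short-lived child and then `exec`s into `sleep`, which never calls `wait()`, so the exited child stays a zombie until `sleep` ends and the orphan is re-parented to PID 1 (or a subreaper) and reaped. The state is read from `/proc` rather than `ps`.

```bash
#!/usr/bin/env bash
# Manufacture a short-lived zombie: the child exits almost at once, but its
# parent exec()s into `sleep` and therefore never calls wait() on it.
pidfile=$(mktemp)
(
  sleep 0.1 &                 # child that will exit after 0.1 s
  echo $! > "$pidfile"        # record the child's PID for the observer
  exec sleep 2                # parent replaced by sleep: nobody reaps the child
) &
parent=$!

sleep 0.5                     # give the child time to exit
child=$(cat "$pidfile")
state=$(awk '$1 == "State:" { print $2 }' "/proc/$child/status")
echo "child $child state: $state"   # Z while its parent is still alive

wait "$parent"                # parent exits; the orphaned zombie gets reaped
rm -f "$pidfile"
```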
The permission model — rwx → setuid → capabilities → namespaces
The UNIX permission model starts simple: an owner UID, a group GID, and 9 rwx bits per file. In `ls -l`'s `-rwxr-xr-x`, the nine characters after the leading type character line up as "owner rwx / group rwx / other rwx". UID 0 (root) is the all-powerful superuser; the modern norm is to operate as a non-root user and reach for `sudo` only when needed.
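The mapping from the octal numbers used by `chmod` to those nine characters is plain bit arithmetic: each octal digit holds r=4, w=2, x=1. A sketch in bash (the `mode_to_rwx` name is made up), cross-checked against what the kernel stores, as reported by GNU `stat -c %A` on a scratch file:

```bash
#!/usr/bin/env bash
# Convert a 3-digit octal mode (e.g. 755) to the 9-character rwx string.
mode_to_rwx() {
  local mode=$1 out='' i d
  for i in 0 1 2; do
    d=${mode:i:1}                        # owner, then group, then other
    if (( d & 4 )); then out+=r; else out+=-; fi
    if (( d & 2 )); then out+=w; else out+=-; fi
    if (( d & 1 )); then out+=x; else out+=-; fi
  done
  printf '%s\n' "$out"
}

# Cross-check against the permission bits the kernel actually stores
f=$(mktemp)
trap 'rm -f "$f"' EXIT
for m in 600 640 755; do
  chmod "$m" "$f"
  printf '%s  computed=%s  stat=%s\n' "$m" "$(mode_to_rwx "$m")" "$(stat -c %A "$f" | cut -c2-)"
done
```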
But "rwx alone isn't fine-grained enough" came up repeatedly, and the model has been extended in stages.
| Feature | What it solved |
|---|---|
| setuid / setgid bits | passwd needs to update /etc/shadow, so it has to run with root privileges when invoked by a regular user → run with the file owner's privileges at exec time. Misuse creates a paradise for local privilege escalation, and looking for them (find / -perm -4000) is a classic post-exploit step |
| POSIX ACL | rwx's three buckets can't give different permissions to multiple users → setfacl lets you set per-file fine-grained ACLs |
| Capabilities | Split root's authority into ~40 capabilities (CAP_NET_BIND_SERVICE = open ports below 1024, CAP_NET_ADMIN = network config, CAP_SYS_ADMIN = essentially anything) → grant a daemon only what it needs |
| MAC (Mandatory Access Control) | rwx is DAC (Discretionary) — "the file owner decides". SELinux / AppArmor enforce a system-wide policy: "this binary can only do these specific operations" |
| Namespaces + cgroups | Show each process a different world (PID / FS / network / UID) and bound its CPU / memory / I/O → the underpinnings of containers (see the container section below) |
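The setuid bit is a fourth octal digit in front of the usual three (`4755`), and `ls -l` / `stat` show it as an `s` in the owner's execute slot. The sketch below sets it on a scratch file and finds it with the same `find -perm -4000` used in audits. Linux ignores setuid on interpreted scripts, so the throwaway `#!/bin/sh` file here grants no privileges; it only carries the bit.

```bash
#!/usr/bin/env bash
# Set the setuid bit on a scratch file and locate it the way an audit would.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '#!/bin/sh\nid -u\n' > "$dir/tool"
chmod 4755 "$dir/tool"                 # 4 = setuid, 755 = rwxr-xr-x

stat -c '%A %n' "$dir/tool"            # owner's x slot shows as "s"
found=$(find "$dir" -perm -4000 -type f)
echo "setuid files under $dir: $found"
```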
```shell
# basic permissions
$ ls -l /etc/shadow              # -rw-r----- root:shadow → unreadable by regular users
$ chmod 600 ~/.ssh/id_ed25519    # SSH private keys want 600 (rw------- owner only)
# find setuid binaries (basic system inventory)
$ find / -perm -4000 -type f 2>/dev/null
# capabilities (split root)
$ sudo setcap cap_net_bind_service=+ep /usr/bin/python3.11
# → python can now bind 80/443 without being root
# SELinux (RHEL family) status
$ sestatus
$ ls -lZ /var/www/html
# sudo config (NOPASSWD should stay scoped)
$ sudo visudo                    # safe edit of /etc/sudoers
```

The shell wires everything together — pipes and standard I/O
The productivity of Linux/UNIX boils down to chaining small single-purpose programs with | to do whatever you want. What makes this work is "everything is a file" + standard streams (stdin / stdout / stderr) + the pipe (|).
ls | grep '\.log$' | wc -l spawns three independent processes at once and wires each stdout into the next stdin to compute "how many .log files in the current directory" in one line. The kernel mediates via a pipe (FIFO buffer) — the processes share no memory.
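That pipeline can be checked against a directory whose contents are known. A sketch assuming bash on Linux; the file names are arbitrary. The last lines use `$BASHPID` (bash-specific) to show that two stages of one pipeline run as different processes.

```bash
#!/usr/bin/env bash
# Count .log files with a three-process pipeline, in a directory we control.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/a.log" "$dir/b.log" "$dir/notes.txt"

count=$(cd "$dir" && ls | grep '\.log$' | wc -l)
echo "log files: $count"

# Each pipeline stage runs in its own process, so $BASHPID differs per stage
pids=$(echo "$BASHPID" | { read -r left; echo "$left $BASHPID"; })
echo "left stage PID / right stage PID: $pids"
```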
| Construct | Meaning |
|---|---|
| `cmd > file` | Redirect stdout to a file, overwriting |
| `cmd >> file` | Append stdout to a file |
| `cmd 2> err.log` | Redirect just stderr to a file |
| `cmd 2>&1` | Merge stderr into stdout |
| `cmd < file` | Take stdin from a file |
| `cmd1 \| cmd2` | Pipe cmd1's stdout into cmd2's stdin |
| `cmd1 ; cmd2` | Run sequentially (regardless of exit code) |
| `cmd1 && cmd2` | Run cmd2 only if cmd1 succeeded (exit 0) |
| `cmd1 \|\| cmd2` | Run cmd2 only if cmd1 failed |
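One detail the table hides: redirections are applied left to right, so `cmd > file 2>&1` and `cmd 2>&1 > file` are not the same. A sketch with a throwaway function `both` that writes one line to each stream:

```bash
#!/usr/bin/env bash
# Redirections are processed left to right, so the position of 2>&1 matters.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

both() { echo "to stdout"; echo "to stderr" >&2; }

# stdout -> a.txt first, then stderr -> wherever stdout now points (a.txt)
both > "$dir/a.txt" 2>&1

# stderr -> the *current* stdout (here the $( ) capture), then stdout -> b.txt
captured=$(both 2>&1 > "$dir/b.txt")

echo "a.txt:"; cat "$dir/a.txt"                 # both lines
echo "b.txt:"; cat "$dir/b.txt"                 # only "to stdout"
echo "left on the original stdout: $captured"   # "to stderr"
```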
The exit code is the single most important return value between programs: 0 = success, non-zero = failure. Shell scripts branch on it, and CI/CD systems decide from it whether a build succeeded. Bash's `set -euo pipefail` (`-e`: exit immediately when a command fails / `-u`: treat undefined variables as errors / `-o pipefail`: report a failure anywhere in a pipeline, not only in its last command) is the standard safety harness for shell scripts.
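Each of those rules can be observed in isolation. A sketch in bash; the `bash -c` calls run throwaway child shells so that `-e` and `-u` don't abort the demo itself, and `SOME_UNSET_VARIABLE` is assumed not to be set in the environment.

```bash
#!/usr/bin/env bash
# Exit codes, pipefail, and the -e / -u switches, each observed separately.
false | true; plain=$?
set -o pipefail
false | true; strict=$?
set +o pipefail
echo "pipeline status without pipefail: $plain"   # 0: only the last stage counts
echo "pipeline status with pipefail:    $strict"  # 1: the failing stage is reported

bash -c 'set -e; false; echo "not reached"'; e_status=$?
bash -c 'set -u; echo "$SOME_UNSET_VARIABLE"' 2>/dev/null; u_status=$?
echo "set -e child exited with $e_status"         # 1, and "not reached" never prints
echo "set -u child exited with $u_status"         # non-zero

true  && echo "&&: ran because the left side exited 0"
false || echo "||: ran because the left side exited non-zero"
```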
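Environment variables (`PATH`, `HOME`, `LANG`, `LD_LIBRARY_PATH`, etc.) propagate from parent to child processes by copying, and only in that direction: a child's changes never reach its parent. A plain `FOO=bar` stays inside the current shell; `export FOO=bar` makes it visible to children spawned afterwards.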
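A short bash sketch of both rules, using a made-up variable name `DEMO_VAR`: a plain assignment stays in the shell, `export` copies it into each child's environment, and a child's change never flows back.

```bash
#!/usr/bin/env bash
# Only exported variables reach a child, and the copy is one-way.
unset DEMO_VAR                          # start clean, even if the caller exported it
DEMO_VAR=hello                          # plain shell variable: not in the environment
before=$(bash -c 'echo "${DEMO_VAR:-<unset>}"')

export DEMO_VAR                         # now copied into every child's environment
after=$(bash -c 'echo "${DEMO_VAR:-<unset>}"')

bash -c 'DEMO_VAR=changed'              # the child edits its own copy only

echo "child before export: $before"     # <unset>
echo "child after export:  $after"      # hello
echo "parent afterwards:   $DEMO_VAR"   # hello
```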
Distributions — which one to pick
"Install Linux" really means "pick a distribution". The kernel is the same, but package manager, release cadence, default daemons, and community culture all differ.
| Distribution | Family | Package manager | Characteristics / typical use |
|---|---|---|---|
| Debian | Original | apt (.deb) | Stability-first, community-driven, strong on long-haul server workloads |
| Ubuntu | Debian-derived | apt | The most popular Linux on desktops, LTS releases, Canonical-supported |
| RHEL | Commercial | dnf (.rpm) | The de facto enterprise production standard, subscription-based, 10-year support |
| Fedora | RHEL upstream | dnf | Bleeding edge; the testbed for what eventually trickles down to RHEL |
| CentOS Stream / Rocky / AlmaLinux | RHEL-compatible | dnf | Successors to the original CentOS; Rocky / Alma are binary-compatible with RHEL |
| Arch Linux | Independent | pacman | Rolling release, latest-everything; the ArchWiki is the best Linux documentation on the internet |
| Alpine Linux | Independent | apk | Tiny (~5 MB) thanks to musl + BusyBox; a popular base for Docker images |
| Android | (proprietary) | n/a | Linux kernel + Bionic libc + Java/ART; the most-deployed Linux in the world |
| WSL2 | Microsoft | Depends on the parent distro | A Linux kernel running on Windows (Hyper-V VM) |
- Production servers → RHEL / Rocky / Ubuntu LTS (10-year support, proven)
- Development / desktop → Ubuntu / Fedora / Arch (personal preference)
- Container base → Alpine (small) or Debian slim / Ubuntu minimal (compatible)
- Reviving old hardware → Lightweight derivatives (Lubuntu, MX Linux, …)
- Learning → Arch (the experience of building from zero with `pacstrap`) or Debian
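Modern distributions describe themselves in `/etc/os-release` (`ID=`, `ID_LIKE=`, ...), which is how scripts usually tell them apart. The sketch below maps those IDs to the package managers in the table above; `pkg_manager` is a hypothetical helper, the ID list is not exhaustive, and older RHEL-family releases use `yum` rather than `dnf`.

```bash
#!/usr/bin/env bash
# Guess the package manager from an os-release file (ID and ID_LIKE fields).
# The mapping mirrors the distribution table and is deliberately incomplete.
pkg_manager() {
  local file=${1:-/etc/os-release} ids id
  # os-release is shell-compatible KEY=value lines; source it in a subshell
  ids=$( unset ID ID_LIKE; . "$file" 2>/dev/null && echo "${ID:-} ${ID_LIKE:-}" )
  for id in $ids; do
    case $id in
      debian|ubuntu)                       echo apt;    return 0 ;;
      rhel|fedora|centos|rocky|almalinux)  echo dnf;    return 0 ;;
      arch)                                echo pacman; return 0 ;;
      alpine)                              echo apk;    return 0 ;;
    esac
  done
  echo unknown
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'NAME="Rocky Linux"\nID="rocky"\nID_LIKE="rhel centos fedora"\n' > "$sample"
echo "sample file: $(pkg_manager "$sample")"   # dnf
echo "this host:   $(pkg_manager)"
```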
The container revolution — namespaces + cgroups, combined
When Docker and Kubernetes took over the late 2010s, what really happened inside Linux? The answer is "we just started combining two features that had been in the kernel all along".
- namespaces (PID / network / mount / UTS / IPC / user): Show each process a different world
  - PID namespace: Inside the container, `ps` only shows the container's own processes
  - network namespace: Independent eth0 / routing table
  - mount namespace: A different `/` filesystem from the host's
  - user namespace: A container's root is non-root from the host's view (rootless containers)
- cgroups (control groups) — Limit and measure CPU / memory / I/O / pid count per process group
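Namespace membership is visible from the shell: each `/proc/PID/ns/*` entry is a symlink whose target, of the form `net:[NUMBER]`, identifies the namespace, and two processes share a namespace exactly when those targets match. A sketch assuming Linux and bash (`ns_ids` and `same_ns` are illustrative names). On a Docker host, running `same_ns net $$ PID` as root against the PID from `docker inspect` should report a mismatch, unless the container uses `--network host`.

```bash
#!/usr/bin/env bash
# Show which namespaces a process belongs to, and compare two processes.
ns_ids() {
  local pid=${1:-self} ns
  for ns in pid net mnt uts ipc user; do
    printf '%-4s %s\n' "$ns" "$(readlink "/proc/$pid/ns/$ns")"
  done
}

# same_ns TYPE PID1 PID2: succeeds only if both links are readable and equal
same_ns() {
  local a b
  a=$(readlink "/proc/$2/ns/$1") && b=$(readlink "/proc/$3/ns/$1") && [ "$a" = "$b" ]
}

ns_ids "$$"
cat "/proc/$$/cgroup"        # the cgroup(s) this shell is accounted under

sleep 5 & child=$!           # a child inherits every namespace of its parent
same_ns net "$$" "$child" && echo "child $child shares the shell's network namespace"
kill "$child"; wait "$child" 2>/dev/null || true
```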
Combine these with overlayfs (a layered FS) and you get "an independent miniature OS, per process, on top of the host OS": a container. Docker initially used LXC to wire them together, then replaced it with its own libcontainer (which became runc), and today the runtime layer is standardised by the OCI runtime specification.
Containers are not VMs: no hardware virtualization (Hyper-V, KVM, VMware) is involved. They share the host's kernel and are isolated only at the process level, with no hardware emulation. The cost is that a kernel CVE can collapse container isolation (CVE-2022-0185, CVE-2024-1086, and others are well-known examples).
```shell
$ docker inspect --format '{{.State.Pid}}' my-container   # the PID on the host
$ ls /proc/<PID>/ns                                        # that PID's namespaces
$ sudo nsenter -t <PID> -n -p ip addr                      # enter the NS, check ip
$ cat /sys/fs/cgroup/system.slice/docker-<ID>.scope/cpu.stat   # cgroups CPU stats
```

Kubernetes layers a cluster-level control plane on top, automatically scheduling, healing, and scaling containers. Two layers: the Linux kernel runs the containers, Kubernetes manages them.
Where modern Linux actually runs
| Where | Share / status |
|---|---|
| Cloud servers | 95%+. The default image on AWS EC2 / Azure VM / GCP CE is Linux. Managed databases, Kubernetes worker nodes, Lambda, and Fargate all run on Linux |
| Smartphones (Android) | 70%+ of the world's smartphones run Android, i.e. the Linux kernel. The most-deployed OS in the world |
| Embedded | Routers, TVs, refrigerators, cars, industrial gear, smartwatches, IoT — countless devices, all running invisibly |
| Supercomputers | 100% of the TOP500 is Linux (since 2017). HPC is, practically speaking, Linux only |
| Desktops | ~3-5%. Windows / macOS dominate, but the developer / scientist / engineer community is significantly Linux |
| WSL2 | Growing fast. We've arrived at an era where Microsoft officially ships a Linux kernel with Windows |

Linux powers servers, mobile, and most embedded devices, yet remains a minority on the desktop. Windows and macOS compete as desktop OS products; Linux occupies a different layer: the OS-as-infrastructure layer.
What started as "a kernel Linus Torvalds wrote as a hobby in 1991" has become the name for the OS ecosystem that supports more than half of the world's computing. Understanding it comes down to grasping the three-layer structure — "kernel = Linux proper", "userland = GNU + distros", "the two are physically split by syscalls" — and seeing how UNIX-inherited design choices — "everything is a file", "fork/exec", "pipe + exit code", "rwx → capabilities → namespaces" — are still alive in modern systems.
Whatever you dig into (cloud, containers, smartphones, serverless), you end up landing on Linux's syscalls, namespaces, and cgroups. Understand Linux well once, and you can approach almost all of modern infrastructure with the same vocabulary.