
Linux Explained: Architecture, Commands, and Distros

⏱ approx. 28 min · LOG_DATE: 2026-05-10

Linux

It is convenient to say that Linux runs most of the world's servers, the cloud, smartphones (Android), embedded devices, and supercomputers, but strictly speaking "Linux" names only the kernel. What you actually touch as a user is a stack: the Linux kernel + a GNU userland + distribution-specific glue. Richard Stallman's insistence on calling it "GNU/Linux" reflects exactly this layering.

In practice, "Linux" is universally understood to mean "the entire ecosystem of Unix-like OSes running the Linux kernel." This article first untangles that three-layer structure, then covers the foundational design (kernel and userland are physically separated by the system-call boundary), the unifying "everything is a file" abstraction, and the process / permission / shell layers that shape the daily user experience. It closes with how the container revolution falls naturally out of namespaces + cgroups. The aim throughout is to "understand Linux from one diagram."

1. What Linux is: both "just the kernel" and "the whole OS"

First, untangle the layered meaning of the word "Linux."

| Layer | Contents | Examples |
|---|---|---|
| Kernel | Process management / memory / filesystem / network / device drivers. The only "Linux" that Linus Torvalds and the community govern. | linux-6.x (kernel.org) |
| Userland (GNU + neighbors) | Libraries (glibc / musl), shells (bash / zsh), coreutils (ls cp cat), GNU toolchain (gcc / binutils) | GNU project + util-linux + systemd |
| Distribution | Kernel + userland + package manager + distro-specific config / init scripts / release policy | Debian, Ubuntu, RHEL, Fedora, Arch, Alpine, Android |

So "I installed Ubuntu" = "I installed a distribution that bundles a Linux kernel + GNU toolchain + Debian packaging + Canonical's own Snap / Netplan / configuration." Android also uses the Linux kernel but pairs it with Bionic libc + Java/ART instead of the GNU userland, so whether to call it "Linux" gets murky (kernel.org counts it as Linux).

The philosophical "Free Software" vs "Open Source" debate sits in the background, but practically both rest on source published under licenses like GPL/MIT/Apache. The Linux kernel itself is GPLv2, which is the legal hook obligating Android device makers to publish their kernel modifications.

2. Architecture: kernel and userland are split by the system-call boundary

The defining design of Linux (and Unix-likes generally) is that kernel space and user space are physically separated by CPU privilege levels. User processes don't touch hardware directly — they always go through system calls, the well-defined door into the kernel.

[Figure: Linux architecture, the layered stack from hardware to applications. Kernel space and user space are physically separated by CPU privilege levels; syscalls are the only door.]

  Application: what users write or install
      nginx / PostgreSQL / Firefox / Python scripts / vim / Docker client …
  User Land: shell + GNU coreutils + libraries + daemons (= what makes the OS feel like an OS)
      bash/zsh / ls, cp, cat, grep / glibc/musl / systemd / sshd / cron / Wayland/Xorg
  ----- system call interface (read / write / open / fork / exec / mmap / socket / ...) -----
  Kernel: the single "Linux" that Linus Torvalds governs
      Process / Sched: CFS / EEVDF, PID, task state, signals, futex, cgroups, namespaces
      Memory: virtual mem (MMU), page cache, slab allocator, OOM killer, swap
      VFS / FS: ext4 / XFS / Btrfs, tmpfs / procfs, FUSE / NFS / overlayfs, page cache, io_uring
      Network: TCP/IP stack, netfilter / nftables, XDP / eBPF, routing / qdisc, socket layer
      + Device Drivers: NIC, GPU, NVMe, USB, ACPI, audio … (loadable as kernel modules)
  Hardware: CPU / memory / storage / NIC / GPU / peripherals
      Linux itself doesn't touch this layer; drivers absorb the hardware-specific differences

  ~400 syscalls. Examples (x86_64): read=0, write=1, open=2, mmap=9, socket=41, fork=57, execve=59, futex=202

The takeaways:

  • The kernel is never "part of the app." Apps reach the kernel only through the ~400 system calls: read(), write(), open(), fork(), socket(), and so on. strace can follow execution precisely because this boundary is sharp.
  • The GNU userland is "a separate project from the Linux kernel." ls and cat are from GNU coreutils, developed independently of the Linux kernel. When Alpine Linux slims the userland down to musl + BusyBox, the same kernel produces an OS that feels entirely different.
  • systemd is "the modern glue between kernel and userland." Process management / cgroups / logging / network configuration / DNS / startup ordering — capabilities that used to be separate are unified. Behind systemctl start nginx is a user-space PID 1 daemon at work.
  • Dynamic driver loading — a kernel module listed in lsmod is loaded on demand by modprobe. The reason Linux can start using a newly attached device without a reboot is this design.
# Trace syscalls live (what an app actually asks the kernel for)
strace -f -e trace=openat,read,write ls /tmp

# Kernel version and build options
uname -a
cat /proc/version

# Loaded kernel modules
lsmod

# System-wide syscall stats (requires perf)
sudo perf stat -e 'syscalls:sys_enter_*' -a sleep 5

3. Everything is a file: Unix's most influential abstraction

"Everything is a file" is the most influential design choice in the Unix philosophy. Because regular files, directories, devices, pipes, sockets, and symlinks all share the same read() / write() / open() / close() API, programs can be composed without caring what's on the other end. grep foo /var/log/syslog, grep foo < /dev/ttyS0 (a serial port), and cat hello | grep foo all reach the same read() call; that's the actual substance of Unix's simplicity and power.

[Figure: "Everything is a file": Linux's 7 file types and the special FSes. All are readable / writable through read() / write() / open() / close(); the first character of `ls -l` tells the type.]

  -  regular file        text / binary            /etc/hosts  /bin/ls
  d  directory           name → inode list        /home  /etc  /tmp
  l  symbolic link       reference to a path      /lib → /usr/lib
  c  character device    byte-stream I/O          /dev/tty  /dev/null
  b  block device        block-oriented I/O       /dev/sda  /dev/nvme0n1
  p  named pipe (FIFO)   created with mkfifo
  s  UNIX socket

  /dev: the door to physical / virtual devices (device files)
      /dev/null (writes discarded, reads return EOF), /dev/zero (reads return zeros forever), /dev/random and /dev/urandom (kernel RNG), /dev/sda and /dev/nvme0n1 (storage), /dev/tty1+ (virtual consoles), /dev/ttyS0 (serial), /dev/loop0 (loopback)
  /proc: virtual FS exposing kernel and process internal state "as files"
      /proc/cpuinfo, /proc/meminfo, /proc/loadavg, /proc/uptime, /proc/version
      /proc/PID/status, /proc/PID/cmdline, /proc/PID/maps, /proc/PID/fd/ (per process)
      /proc/sys/net/ipv4/ip_forward (sysctl): writes change kernel settings
  /sys: sysfs, the device/driver hierarchy exposed as a directory tree
      /sys/class/net/eth0/ (NIC statistics, MTU, address …), /sys/block/sda/queue/ (block-device I/O scheduler etc.), /sys/class/thermal/ (temperature sensors), /sys/class/leds/ (LED control)

  /proc, /sys, /dev are recreated by the kernel on every boot → they don't exist on disk

The most important concepts:

  • /dev is the door to hardware and virtual devices. echo "hello" > /dev/tty1 writes to virtual console 1; dd if=/dev/zero of=test.bin bs=1M count=100 builds a 100 MB zero-filled file — both are driven by the same syscall as a write to a regular file.
  • /proc is a virtual FS the kernel synthesizes on the fly. /proc/PID/maps shows a process's memory map; /proc/PID/fd/ lists its open file descriptors. Nothing is on disk — the kernel composes the text the moment you cat it.
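  • /sys exposes the kernel's internal object hierarchy (devices, buses, classes) as a directory tree. Writable attributes double as controls: as root, echo 1400 > /sys/class/net/eth0/mtu changes a NIC's MTU. Not every attribute is writable; /sys/class/net/eth0/address (the MAC) is read-only there and is changed with ip link set instead.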
  • A symbolic link (starts with l) is a pointer to another path, similar to a Windows shortcut but transparently resolved by the kernel. Used heavily for things like /lib → /usr/lib consolidation, or current → releases/56 swaps in deployments.
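
As a minimal sketch of the shared read() path described in this section, the snippet below feeds the same grep from a regular file, an anonymous pipe, and a named pipe, then reads a fixed number of bytes from /dev/zero. It assumes a Linux host with standard coreutils; the temporary directory and file names are illustrative and do not come from the article.

```shell
# Assumes Linux with coreutils; all paths under $tmp are made up for the demo.
tmp=$(mktemp -d)
printf 'foo\nbar\n' > "$tmp/regular.txt"

# 1. Regular file on disk
from_file=$(grep -c foo "$tmp/regular.txt")

# 2. Anonymous pipe
from_pipe=$(printf 'foo\nbar\n' | grep -c foo)

# 3. Named pipe (FIFO): a background writer, grep as the reader
mkfifo "$tmp/fifo"
printf 'foo\nbar\n' > "$tmp/fifo" &
from_fifo=$(grep -c foo < "$tmp/fifo")
wait

# 4. Character device: /dev/zero hands out as many zero bytes as requested
zero_bytes=$(head -c 16 /dev/zero | wc -c | tr -d ' ')

echo "file=$from_file pipe=$from_pipe fifo=$from_fifo zero_bytes=$zero_bytes"
rm -rf "$tmp"
```

grep never learns which kind of object it is reading from; the kernel picks the implementation behind the file descriptor.
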
# Check file types
ls -l /dev/null /etc/hosts /lib /tmp/.X11-unix/X0
stat -c '%F %n' /dev/sda /proc/meminfo /sys/class/net/eth0

# Pull process info from /proc
ls /proc/$$/fd                       # FDs my shell has open
cat /proc/$$/maps                    # memory map
cat /proc/$$/status                  # state (incl. VmRSS)

# Change runtime settings via sysctl / procfs
sudo sysctl -w net.ipv4.ip_forward=1
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward    # the same thing
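
One way to see that /proc content is generated at read time rather than stored: the inode of a procfs file typically reports a size of zero, yet reading it returns text. The sketch below assumes procfs mounted at /proc and GNU stat (the -c option); the variable names are illustrative.

```shell
# Assumes Linux with procfs at /proc and GNU coreutils (stat -c).
reported=$(stat -c '%s' /proc/version)          # size recorded in the inode
actual=$(wc -c < /proc/version | tr -d ' ')     # bytes the kernel generates on read
echo "inode size: $reported bytes, bytes read: $actual"

# The per-process directory of the current shell can be browsed the same way.
shell_name=$(cat "/proc/$$/comm")
fd_count=$(ls "/proc/$$/fd" | wc -l | tr -d ' ')
echo "shell: $shell_name, open file descriptors: $fd_count"
```
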

4. Processes and signals: fork/exec and PID 1

Linux's process model inherits the distinctive Unix design: "fork() to copy yourself → exec() to replace with another program." When bash runs ls, the shell:

  1. Calls fork() to make a copy of itself (a child process; same memory and FDs as the parent)
  2. The child calls execve("/bin/ls", ...) to replace its own image with ls (PID stays the same)
  3. The parent shell calls waitpid() to wait for the child to exit
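
The shell hides these three steps, but two of their effects can be observed from a script. The sketch below assumes a POSIX sh on PATH. The first half shows that exec keeps the PID; the second half shows the parent collecting a child's exit status with wait. The exit code 3 is arbitrary.

```shell
# exec replaces the program image but keeps the PID:
# the inner shell prints its PID, then execs a fresh sh that prints its own.
pids=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')
before=$(printf '%s\n' "$pids" | sed -n 1p)
after=$(printf '%s\n' "$pids" | sed -n 2p)
echo "before exec: $before, after exec: $after"

# fork + exec + wait, the way the shell runs any external command:
sh -c 'exit 3' &                      # child process running another program
child=$!
status=0
wait "$child" || status=$?            # parent blocks until the child exits
echo "child $child exited with status $status"
```
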

This is why all processes form a single genealogy descending from PID 1 (a process tree). pstree reveals a chain like systemd (PID 1) → sshd → bash → vim, so you can walk back to your ancestors. When a parent dies before its children, the children become orphans and are adopted by PID 1 (systemd); this classic init-reaping rule is still in effect.

Signals are asynchronous notifications sent to processes. SIGINT comes from Ctrl+C, SIGTERM from kill PID (the default), SIGKILL from kill -9 PID (forced kill), and SIGSEGV from a memory error; 31 standard signals are defined. Only SIGKILL and SIGSTOP are uncatchable, so users always retain a way to stop a process.

# Process tree
pstree -p $$                         # from my shell up to the ancestors

# State and resources
ps -ef                               # all processes (UNIX style)
ps auxf                              # tree (BSD style)
top                                  # real time
htop                                 # real time, friendlier UI (if installed)
ss -tnp                              # TCP connections with owning processes
lsof -p PID                          # open files for a PID

# Sending signals
kill -SIGTERM PID                    # polite request (app can clean up)
kill -9 PID                          # force kill (can leak resources)
kill -SIGUSR1 PID                    # app-defined notification (e.g. nginx log rotation)
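
The difference between a catchable signal and a fatal one can be reproduced without any real daemon. This is a sketch assuming a POSIX sh and a sleep binary; the handler text is invented for the demo. A process killed by signal N is reported by the shell as exit status 128 + N.

```shell
# Catchable: the inner shell installs a handler, signals itself, and keeps going.
caught=$(sh -c 'trap "echo caught USR1" USR1; kill -s USR1 "$$"; echo still running')
echo "$caught"

# SIGTERM with the default action (15): wait reports 128 + 15.
sleep 30 &
pid=$!
kill -s TERM "$pid"
rc_term=0; wait "$pid" || rc_term=$?

# SIGKILL (9) can be neither caught nor ignored: wait reports 128 + 9.
sleep 30 &
pid=$!
kill -s KILL "$pid"
rc_kill=0; wait "$pid" || rc_kill=$?

echo "SIGTERM -> $rc_term, SIGKILL -> $rc_kill"
```
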

Zombies (Z state) are children that have exited but whose parent hasn't called wait() yet. Resources are released; only the PID and process-table entry remain. A zombie buildup means the parent has a bug (not waiting on children); leave it long enough and you exhaust the PID space and can't fork any more.

5. The permission model: UID/GID/rwx → setuid → capabilities → namespaces

The Unix permission model started simple: UID (user ID) + GID (group ID) + 9 rwx bits on each file. In the -rwxr-xr-x shown by ls -l, the first character is the file type and the remaining 9 are "owner rwx / group rwx / other rwx."
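
The mapping between the octal form used by chmod and the 9 rwx characters can be checked directly. This sketch assumes GNU stat (the -c option); the temporary files and the modes 754 and 027 are arbitrary choices for illustration.

```shell
# Assumes GNU coreutils (stat -c); file names are throwaway.
f=$(mktemp)
chmod 754 "$f"                              # owner rwx (7), group r-x (5), other r-- (4)
mode=$(stat -c '%A %a' "$f")                # symbolic form, then octal form
echo "$mode"

# umask removes bits from newly created files: 666 minus 027 leaves 640.
new_mode=$( umask 027; : > "$f.new"; stat -c '%a' "$f.new" )
echo "new file mode under umask 027: $new_mode"

rm -f "$f" "$f.new"
```
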

UID 0 = root is the all-powerful super-user, and doing day-to-day work as a non-root user + reaching for sudo only when necessary is the foundational rule of modern security.

But "rwx alone isn't fine-grained enough" came up many times, and the model expanded in stages:

| Feature | What it solved |
|---|---|
| setuid / setgid bits | passwd updating /etc/shadow requires root, but a regular user invokes it → marking the binary setuid runs it with the owner's identity at execution time. A common abuse vector for local privilege escalation, so post-compromise analysts routinely run find / -perm -4000 to enumerate them |
| POSIX ACL | The "owner / group / other" 3-tier model can't grant distinct rights to multiple users → setfacl adds per-file granular ACLs |
| Capabilities | Instead of "give me all of root," divide root into ~40 capabilities (CAP_NET_BIND_SERVICE = bind ports below 1024, CAP_NET_ADMIN = network configuration, CAP_SYS_ADMIN = catch-all) → grant a daemon only the minimum it needs |
| MAC (Mandatory Access Control) | rwx is DAC (Discretionary): owners decide → SELinux / AppArmor enforce a system-wide policy like "this binary may only do these operations" |
| Namespaces + cgroups | "Show each process its own world (PID space / FS / network / UID mapping)" + "limit CPU / memory / I/O" → this is exactly what makes containers (§8) |
# Basic permissions
ls -l /etc/shadow                    # `-rw-r-----` root:shadow → unreadable by regular users
chmod 600 ~/.ssh/id_ed25519          # SSH private key must be 600 (rw------- owner only)

# Find setuid binaries (a baseline survey of the system)
find / -perm -4000 -type f 2>/dev/null

# Capabilities (subdivide root)
sudo setcap cap_net_bind_service=+ep /usr/bin/python3.11
# → python can now bind 80/443 without being root

# SELinux (RHEL family) status
sestatus
ls -lZ /var/www/html

# sudo configuration (use NOPASSWD sparingly)
sudo visudo                          # safe edit of /etc/sudoers
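
The capability set a process actually holds can be read without extra tools. The sketch below assumes Linux procfs; it decodes one bit of the CapEff mask in /proc/self/status. The bit number 10 for CAP_NET_BIND_SERVICE comes from the kernel's capability header, and the variable names are illustrative.

```shell
# Assumes Linux procfs. CapEff is a hex bitmask of the effective capability set.
capeff=$(sed -n 's/^CapEff:[[:space:]]*//p' /proc/self/status)
echo "effective capabilities: 0x$capeff"

# CAP_NET_BIND_SERVICE is capability number 10.
bit=10
if [ $(( (0x$capeff >> $bit) & 1 )) -eq 1 ]; then
    has_net_bind="yes"
else
    has_net_bind="no"
fi
echo "CAP_NET_BIND_SERVICE held: $has_net_bind"
```

Run as a regular user the mask is normally all zeros; run as root it has most or all bits set, which is the "all of root" that capabilities let you subdivide.
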

6. The shell ties everything together: pipes and standard I/O

The substance of Linux/Unix productivity is stitching small single-purpose programs together with | to build anything. What enables that is "everything is a file" + standard I/O (stdin / stdout / stderr) + pipes (|).

┌──────┐  stdout    stdin  ┌──────┐  stdout   stdin  ┌──────┐
│ ls   │ ─────────────────→│ grep │ ────────────────→│ wc   │
└──────┘                   └──────┘                  └──────┘

ls | grep '\.log$' | wc -l starts three independent processes simultaneously and wires each one's stdout into the next one's stdin, computing "the count of .log files in the current directory" in one line. The kernel mediates via a pipe (FIFO buffer); no shared memory is needed.
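
The pipeline from the text can be run against a scratch directory, and a second pipeline shows that the stages really do run concurrently. This is a sketch with made-up file names, assuming standard coreutils.

```shell
# Scratch directory with two .log files and one other file (names are illustrative).
tmp=$(mktemp -d)
touch "$tmp/a.log" "$tmp/b.log" "$tmp/notes.txt"
count=$(ls "$tmp" | grep '\.log$' | wc -l | tr -d ' ')
echo ".log files: $count"

# `yes` never stops on its own. The pipeline still finishes because all stages
# run at once: head exits after 3 lines, and the kernel then stops the writer.
lines=$(yes | head -n 3 | wc -l | tr -d ' ')
echo "lines from an endless producer: $lines"

rm -rf "$tmp"
```
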

Form            Meaning
cmd > file      stdout overwrites the file
cmd >> file     stdout appends to the file
cmd 2> err.log  only stderr goes to the file
cmd 2>&1        merge stderr into stdout
cmd < file      stdin from the file
cmd1 | cmd2     cmd1's stdout into cmd2's stdin
cmd1 ; cmd2     sequential (regardless of cmd1's exit)
cmd1 && cmd2    run cmd2 only if cmd1 succeeded (exit 0)
cmd1 || cmd2    run cmd2 only if cmd1 failed
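
A few of these forms, exercised against a scratch directory. This is a sketch only: emit is a hypothetical helper that writes one line to each stream, and the file names are invented.

```shell
tmp=$(mktemp -d)
emit() { echo "to stdout"; echo "to stderr" >&2; }   # hypothetical helper

emit >  "$tmp/out.log" 2> "$tmp/err.log"    # split the two streams
emit >> "$tmp/out.log" 2> /dev/null         # append stdout, discard stderr
emit >  "$tmp/both.log" 2>&1                # merge stderr into stdout

out_lines=$(wc -l < "$tmp/out.log" | tr -d ' ')
err_text=$(cat "$tmp/err.log")
both_lines=$(wc -l < "$tmp/both.log" | tr -d ' ')
echo "out.log lines=$out_lines, err.log='$err_text', both.log lines=$both_lines"
rm -rf "$tmp"
```

Order matters in the merged form: stdout is pointed at the file first, and 2>&1 then duplicates that destination for stderr.
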

Exit codes are the most important inter-program return value. 0 = success, anything else = some failure. Shell scripts branch on this; CI/CD pipelines decide "build passed or failed" by it.
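
A short sketch of how exit codes surface in a script, assuming a POSIX sh. The code 42 and the command name no-such-command-demo are arbitrary stand-ins.

```shell
true; ok=$?                                                 # 0 = success
rc=0; sh -c 'exit 42' || rc=$?                              # a program chooses its own non-zero code
nf=0; sh -c 'no-such-command-demo' 2>/dev/null || nf=$?     # 127 = command not found

# if branches on the exit status directly; no need to inspect $? by hand.
if printf 'alpha\nbeta\n' | grep -q beta; then
    verdict="match"
else
    verdict="no match"
fi

chain=$(false && echo "and-branch" || echo "or-branch")
echo "ok=$ok rc=$rc nf=$nf verdict=$verdict chain=$chain"
```
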

Environment variables (PATH, HOME, LANG, LD_LIBRARY_PATH, …) propagate by copy from parent to child. export FOO=bar makes FOO visible to subsequently spawned children. bash's set -euo pipefail is the canonical safety setup for scripts: stop immediately on errors (-e), die on undefined variables (-u), and detect failures inside pipelines (-o pipefail).
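
Both behaviors can be observed in a few lines. The sketch assumes sh and bash are installed; DEMO_VAR and DEMO_UNDEFINED_VAR are made-up names chosen to avoid clashing with real variables.

```shell
unset DEMO_VAR
DEMO_VAR=bar                                        # shell variable only, not yet exported
before=$(sh -c 'echo "${DEMO_VAR:-unset}"')
export DEMO_VAR                                     # now copied into each child's environment
after=$(sh -c 'echo "${DEMO_VAR:-unset}"')
sh -c 'DEMO_VAR=changed'                            # the child edits its own copy only
echo "before=$before after=$after parent=$DEMO_VAR"

# Without pipefail a pipeline reports only the last command's status.
plain=0;  bash -c 'false | true' || plain=$?
strict=0; bash -c 'set -o pipefail; false | true' || strict=$?
undef=0;  bash -c 'set -u; echo "$DEMO_UNDEFINED_VAR"' 2>/dev/null || undef=$?
echo "plain=$plain strict=$strict undefined-variable status=$undef"
```
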

7. Distributions: picking one

"Installing Linux" = "picking a distribution." The kernel is the same; package manager / release cycle / default daemons / community character differ.

| Distro | Family | Packages | Character / where it shines |
|---|---|---|---|
| Debian | Original | apt (.deb) | Stability first / community-driven / strong long-term server use |
| Ubuntu | Debian-derived | apt | Most popular on desktop / LTS releases / Canonical commercial support |
| RHEL (Red Hat Enterprise Linux) | Commercial | dnf (.rpm) | Enterprise production standard / subscription / 10-year support |
| Fedora | RHEL upstream | dnf | Cutting edge / the testing ground before features descend into RHEL |
| CentOS Stream / Rocky / AlmaLinux | RHEL-compatible | dnf | Successors to old CentOS; Rocky / Alma are binary-compatible |
| Arch Linux | Independent | pacman | Rolling release / minimalist / DIY culture / ArchWiki is the world's best Linux documentation |
| Alpine Linux | Independent | apk | musl + BusyBox makes it tiny (~5 MB) / the de facto base for Docker images |
| Android | Google | (custom) | Linux kernel + Bionic libc + Java/ART; the most-deployed Linux on Earth |
| WSL2 (on Windows) | Microsoft | (depends on parent distro) | A Linux kernel running on Windows (in a Hyper-V VM); surging adoption as a dev environment |

How to pick:

  • Production servers: RHEL / Rocky / Ubuntu LTS (support windows of up to 10 years, enterprise track record)
  • Workstation / desktop: Ubuntu / Fedora / Arch (a matter of taste)
  • Container base: Alpine (small) or Debian slim / Ubuntu minimal (compatibility-first)
  • Reviving old hardware: lightweight derivatives (Lubuntu, MX Linux, …)
  • Learning: Arch (build it from pacstrap) or Debian (minimal by default)
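
Scripts that must run across several of these distributions usually start by asking which one they are on. The following is a rough sketch, assuming /etc/os-release is present (it is on most current distributions) and falling back to "unknown" otherwise; the list of package managers probed simply mirrors the table above.

```shell
# Distribution ID from /etc/os-release, read in a subshell so its variables don't leak.
if [ -r /etc/os-release ]; then
    distro_id=$( . /etc/os-release; echo "${ID:-unknown}" )
else
    distro_id="unknown"
fi

# First package manager found on PATH.
pkg_manager="unknown"
for candidate in apt-get dnf pacman apk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        pkg_manager=$candidate
        break
    fi
done

kernel=$(uname -r)      # the kernel release is reported the same way on every distro
echo "distro=$distro_id package-manager=$pkg_manager kernel=$kernel"
```
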

8. The container revolution: falling naturally out of namespaces + cgroups

Docker / Kubernetes swept the late 2010s — what actually happened inside Linux? The answer is "two features the Linux kernel had all along finally got combined and used."

  • namespaces (PID / network / mount / UTS / IPC / user) — show each process its own world
    • PID namespace: inside a container, ps only shows the container's own processes
    • network namespace: its own eth0 / routing table
    • mount namespace: its own filesystem as /
    • user namespace: container root looks like a non-root from the host (rootless containers)
  • cgroups (control groups) — limit and measure CPU / memory / I/O / pid count at the process-group level

Combine these with overlayfs (a stacking filesystem) and you get "a self-contained mini-OS running per process on top of the host OS" = a container. Docker first reached for these via lxc, later wrote runc, and the modern world standardizes on OCI runtimes.
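
Namespace and cgroup membership is visible for any process, container or not. The sketch below assumes Linux procfs; each link target has the form type:[number], where the number identifies the namespace instance, so two processes in the same namespace show the same value.

```shell
# Namespaces the current process belongs to.
for ns in pid net mnt uts ipc user; do
    printf '%-5s %s\n' "$ns" "$(readlink "/proc/self/ns/$ns")"
done

# Children inherit their parent's namespaces unless something unshares them.
mine=$(readlink /proc/self/ns/pid)
child=$(sh -c 'readlink /proc/self/ns/pid')
if [ "$mine" = "$child" ]; then
    same="same PID namespace"
else
    same="different PID namespace"
fi
echo "$same"

# cgroup membership is exposed the same way (empty if the file is unavailable).
cgroup=$(cat /proc/self/cgroup 2>/dev/null || true)
echo "cgroup: $cgroup"
```

A container runtime does nothing more exotic than start a process whose links here point at fresh namespace instances and whose cgroup has limits attached.
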

"Containers are lighter than VMs" because they skip hardware virtualization (Hyper-V, KVM, VMware) — the kernel is shared with the host, with only process isolation, no hardware emulation. The cost: kernel vulnerabilities can break container isolation (CVE-2022-0185, CVE-2024-1086 are well-known examples).

# Look at "the container is just Linux processes"
PID=$(docker inspect --format '{{.State.Pid}}' my-container)    # PID on the host
CID=$(docker inspect --format '{{.Id}}' my-container)           # full container ID
sudo ls -l /proc/"$PID"/ns                                      # that PID's namespaces
sudo nsenter -t "$PID" -n -p ip addr                            # enter the network ns and check IP
cat /sys/fs/cgroup/system.slice/docker-"$CID".scope/cpu.stat    # cgroups CPU stats (path assumes cgroup v2 + systemd cgroup driver)

Kubernetes adds another tier on top: "automatic placement, healing, and scaling for clusters of containers." The result is two layers: the Linux kernel runs the containers, and Kubernetes manages them.

9. Where modern Linux actually runs

| Domain | Share / status |
|---|---|
| Cloud servers | 95%+. Default images on AWS EC2 / Azure VM / GCP CE are Linux. Managed databases, Kubernetes workers, Lambda, Fargate: all on Linux |
| Smartphones (Android) | 70%+ of the world is Android = the Linux kernel. The most-deployed OS on Earth |
| Embedded | Routers, TVs, refrigerators, cars, industrial gear, smart watches, IoT: uncountable, mostly invisible |
| Supercomputers | 100% of the TOP500 since 2017. HPC is effectively only Linux |
| Desktop | ~3-5%. Windows / macOS dominate, though the developer / scientist / engineer community is large |
| WSL2 (on Windows) | Surging. An era where Microsoft officially ships a Linux kernel inside Windows |

"Powering servers, mobile, and every embedded thing, but a minority on desktops" is Linux's modern position. Windows and macOS compete as OS products; Linux occupies a different layer as an infrastructure OS.


Linux began as "the kernel Linus Torvalds wrote as a hobby in 1991" and grew into the name for the OS ecosystem powering more than half the world's computing. The keys to understanding it: the three-layer structure of "kernel = Linux proper / userland = GNU + distro / both physically separated by syscalls," and the fact that "everything is a file" / "fork/exec" / "pipe + exit codes" / "rwx → capabilities → namespaces" — design choices inherited from Unix — all remain alive today.

Cloud, containers, smartphones — all rest on the fact that "Linux's design philosophy continues to apply directly to modern infrastructure." Containers are just "using Linux features in a new combination." Kubernetes is "another tier on top of Linux containers." Serverless is "running a function inside a tiny VM (Firecracker etc.) on top of a Linux kernel." Wherever you dig, you land on Linux's syscalls, namespaces, and cgroups. Once you understand Linux properly, you can confront most of modern infrastructure with the same vocabulary.