Linux System Internals
🖥️ Linux System Internals — Understanding the Engine Under the Hood
🎯 What You Will Learn
- What actually happens inside the OS when you type and run a command in a terminal
- What a process is, how it’s represented, and how it differs from a program
- What a system call (syscall) is and why it’s the bridge between user space and kernel space
- What file descriptors are and why almost everything in Linux is treated as a file
- How /proc and /dev expose the kernel’s internal state as navigable file trees
📝 Topic Overview
🔹 What Happens When You Run a Command?
When you type ls -la and press Enter, a multi-step process unfolds across several layers of the operating system.
Step-by-step breakdown:
- The shell (e.g., bash, zsh) reads your input and parses it into a command name and arguments.
- The shell calls fork(), a syscall that creates a child process as a near-identical copy of the shell itself.
- In the child process, the shell calls execve(), another syscall, to replace the child’s memory image with the ls binary found in $PATH (typically /usr/bin/ls).
- The kernel loads the ELF binary into memory, sets up the stack, heap, and text segments, and begins execution.
- ls runs, makes syscalls like getdents64() to read directory entries, and writes output via write() to stdout (file descriptor 1).
- The parent shell calls wait(), blocking until the child exits. On exit, the child’s resources are cleaned up and its exit code is returned to the shell.
Key insight: The shell does not run an external program inside its own process. It forks itself first, then replaces the forked child with the target binary. This is the fork-exec pattern, the cornerstone of Unix process creation. Shell builtins such as cd are the exception: they run inside the shell without a fork.
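The fork-exec-wait sequence can be observed from the shell itself. The following is a minimal sketch, assuming a POSIX sh on Linux; the variable names are illustrative and not part of any tool. It relies on two facts: execve() keeps the PID of the process that calls it, and the parent receives the child's exit code once it has waited for it.

```shell
# PID of the shell running this script (the eventual parent).
parent_pid=$$

# Start a child sh that prints its PID, then uses exec to replace itself
# with another sh, which prints its PID again. exec maps to execve(),
# which swaps the program image but keeps the same process.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
before_exec=$(printf '%s\n' "$pids" | sed -n 1p)
after_exec=$(printf '%s\n' "$pids" | sed -n 2p)

echo "parent shell PID:      $parent_pid"
echo "child PID before exec: $before_exec"
echo "child PID after exec:  $after_exec"

# The shell waits for each foreground child and exposes its exit code as $?.
status=0
sh -c 'exit 7' || status=$?
echo "child exit code:       $status"
```

The two child PIDs should match each other and differ from the parent's, and the exit code printed at the end is the value the child passed to exit.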
🔹 What Is a Process?
A process is an instance of a running program — a program in execution, complete with its own isolated resources.
Every process has:
- A unique PID (Process ID)
- Its own virtual address space (stack, heap, text/code, data segments)
- File descriptors (open files, sockets, pipes)
- A process state: running, sleeping, zombie, stopped, etc.
- A parent process (every process except PID 1 has one)
Program vs Process: A program is a static file on disk (e.g., /usr/bin/python3). A process is that program actively loaded into memory and running. You can have 10 Python processes all spawned from the same binary.
# View all running processes with full details
ps aux
# View process tree (shows parent-child relationships)
pstree -p
# Monitor processes in real time
top
htop # more user-friendly
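Most of the attributes listed above can also be read directly from /proc without ps. This is a small sketch, assuming a Linux system with /proc mounted; it inspects the shell that runs it, and the variable names are illustrative.

```shell
# Read a few attributes of the current shell from /proc/<PID>/status.
status_file="/proc/$$/status"

proc_name=$(awk '/^Name:/ {print $2}' "$status_file")
proc_state=$(awk '/^State:/ {print $2}' "$status_file")   # e.g. S = sleeping, R = running
parent_pid=$(awk '/^PPid:/ {print $2}' "$status_file")

# Each entry in /proc/<PID>/fd is one open file descriptor.
fd_count=$(ls "/proc/$$/fd" | wc -l | tr -d ' ')

echo "PID $$ ($proc_name): state=$proc_state parent=$parent_pid open_fds=$fd_count"
```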
🔹 What Is a Syscall?
A system call (syscall) is a controlled entry point into the kernel — the mechanism user-space programs use to request privileged operations from the OS.
Modern CPUs enforce a hard boundary between user space (where applications run) and kernel space (where the OS kernel runs). User code cannot directly touch hardware, allocate memory from the kernel, or read another process’s memory. To do any of these, it must ask the kernel via a syscall.
Common syscalls you use every day:
| Syscall | What It Does |
|---|---|
| fork() | Create a child process |
| execve() | Replace current process with a new program |
| open() | Open a file, return a file descriptor |
| read() | Read bytes from a file descriptor |
| write() | Write bytes to a file descriptor |
| close() | Close a file descriptor |
| mmap() | Map files or memory into the process address space |
| exit() | Terminate the current process |
| wait() | Wait for a child process to finish |
| socket() | Create a network socket |
How it works mechanically: The program places the syscall number in a CPU register (e.g., rax on x86-64), puts arguments in other registers, then executes a special CPU instruction (syscall on x86-64). The CPU switches to kernel mode, the kernel dispatches to the appropriate handler, executes it, and returns the result.
# Trace all syscalls made by a running command
strace ls -la
# Count syscall frequency
strace -c ls -la
Analogy: Syscalls are like a restaurant menu — user programs can only order from the menu (the defined syscall interface). They cannot walk into the kitchen (kernel space) directly.
🔹 What Is a File Descriptor?
A file descriptor (FD) is a non-negative integer that represents an open I/O resource within a process. In Linux, almost everything — regular files, directories, pipes, sockets, terminals, devices — is accessed through file descriptors.
Standard file descriptors (opened by convention before a program starts):
| FD | Name | Default Target |
|---|---|---|
| 0 | stdin | Keyboard (terminal input) |
| 1 | stdout | Terminal output |
| 2 | stderr | Terminal error output |
When you call open(), the kernel returns the lowest unused FD number in the process, which is usually 3 or higher because 0, 1, and 2 are already taken. Reading and writing then use that integer via read(fd, ...) and write(fd, ...).
# See all open file descriptors for a process (replace PID)
ls -la /proc/<PID>/fd
# Example: see FDs for your shell
ls -la /proc/$$/fd
“Everything is a file” is one of Unix’s foundational philosophies. A network socket, a hardware device, even inter-process communication via pipes: all are represented by file descriptors and share the same read/write/close interface (regular files and devices are opened with open(), while sockets and pipes are created with socket() and pipe()). This uniformity is what makes shell pipelines (cmd1 | cmd2) and I/O redirection (cmd > file) so powerful.
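Redirection syntax is the shell's way of opening, using, and closing file descriptors. Here is a minimal sketch, assuming a POSIX sh on Linux with mktemp available; the temporary file and variable names are illustrative. Note one difference from open(): the shell picks the FD number you write (3 here) rather than letting the kernel choose the lowest free one.

```shell
tmpfile=$(mktemp)

# Open the file for writing on FD 3 in the current shell.
exec 3>"$tmpfile"

# Any command can now write to FD 3 the same way it writes to stdout (FD 1).
echo "hello via fd 3" >&3

# Child processes inherit open FDs, so readlink sees FD 3 under /proc/self/fd.
fd_target=$(readlink /proc/self/fd/3)

# Close FD 3, then read the file back through its path.
exec 3>&-
contents=$(cat "$tmpfile")
rm -f "$tmpfile"

echo "FD 3 pointed at: $fd_target"
echo "file contents:   $contents"
```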
🔹 What Is /proc?
/proc is a virtual filesystem (not on disk) — a window into the live state of the kernel and all running processes, presented as a directory tree.
ls /proc
# Output: numbered dirs (one per PID), plus: cpuinfo, meminfo, uptime, version, etc.
cat /proc/cpuinfo # CPU model, cores, flags
cat /proc/meminfo # RAM usage and breakdown
cat /proc/uptime # System uptime in seconds
cat /proc/version # Kernel version string
cat /proc/$$/status # Status of the current shell process
cat /proc/$$/maps # Virtual memory map of the current process
Each numbered directory (e.g., /proc/1234) represents process 1234, containing: cmdline, environ, fd/, maps, status, stat, and more.
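Because /proc entries behave like ordinary files, standard text tools are enough to query them. The sketch below assumes a Linux system with /proc mounted and uses illustrative variable names. It counts the per-process directories, extracts one field from meminfo, and shows that cmdline stores its arguments separated by NUL bytes.

```shell
# Numbered directories in /proc correspond to live PIDs.
proc_count=$(ls /proc | grep -c '^[0-9][0-9]*$')

# MemTotal is reported in kB in /proc/meminfo.
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

# cmdline separates arguments with NUL bytes; translate them to spaces for display.
# /proc/self resolves to the process doing the read, which is cat here.
self_cmdline=$(cat /proc/self/cmdline | tr '\0' ' ')

echo "processes: $proc_count"
echo "MemTotal:  $mem_total_kb kB"
echo "cmdline:   $self_cmdline"
```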
🔹 What Is /dev?
/dev is a directory containing device files — special files that represent hardware and virtual devices. Interacting with a device is as simple as reading or writing to its file.
| Device File | Description |
|---|---|
| /dev/sda, /dev/nvme0n1 | Block storage (hard drive, SSD) |
| /dev/null | Discards all writes; reads return EOF |
| /dev/zero | Reads return an endless stream of zero bytes |
| /dev/random, /dev/urandom | Cryptographically secure random bytes |
| /dev/tty | The process’s controlling terminal |
| /dev/stdin, /dev/stdout | Symlinks to /proc/self/fd/0 and /proc/self/fd/1 (the process’s FD 0 and 1) |
# Silence error messages by redirecting stderr to /dev/null
rm important_file 2>/dev/null
# Fill a file with zeros (e.g., create a 1MB blank file)
dd if=/dev/zero of=blank.bin bs=1M count=1
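The behavior described in the table can be checked by reading a few bytes from each device. This is a small sketch using only dd, od, wc, and tr; the variable names are illustrative.

```shell
# Reading /dev/null returns EOF immediately, so zero bytes arrive.
null_bytes=$(wc -c < /dev/null | tr -d ' ')

# Read 16 bytes from /dev/zero and render them as hex; every byte is 00.
zero_hex=$(dd if=/dev/zero bs=16 count=1 2>/dev/null | od -An -v -tx1 | tr -d ' \n')

# Read 8 bytes from /dev/urandom; the hex string differs on each run.
random_hex=$(dd if=/dev/urandom bs=8 count=1 2>/dev/null | od -An -v -tx1 | tr -d ' \n')

echo "bytes read from /dev/null: $null_bytes"
echo "/dev/zero sample:          $zero_hex"
echo "/dev/urandom sample:       $random_hex"
```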
🔹 How Does the Kernel Schedule Tasks?
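The Linux scheduler determines which process runs on which CPU core at any given moment. The Completely Fair Scheduler (CFS) was the default scheduler for normal (non-real-time) tasks from kernel 2.6.23 until kernel 6.6, which replaced it with EEVDF (Earliest Eligible Virtual Deadline First). The concepts below describe CFS.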
Core concepts:
- The scheduler maintains a red-black tree of runnable processes, sorted by virtual runtime (vruntime): how long each process has run, weighted by its priority.
- The process with the smallest vruntime always runs next, ensuring every process gets a fair share of CPU time.
- Nice values (-20 to +19) adjust priority. Lower nice = higher priority. Default is 0.
- Processes can be preempted: the kernel can forcibly interrupt a running process to give CPU time to another.
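To make "run time weighted by priority" concrete, here is a rough arithmetic sketch. It assumes the commonly described CFS rule that vruntime grows by runtime × 1024 / weight, and it assumes weights of 1024 for nice 0 and 335 for nice 5 from the kernel's nice-to-weight table. The function name and variables are hypothetical, and the real kernel code uses fixed-point shortcuts that this ignores.

```shell
# Weight assumed for nice 0; other weights are expressed relative to it.
NICE_0_WEIGHT=1024

# vruntime_delta RUNTIME_NS WEIGHT
# Prints how much vruntime a task accrues for RUNTIME_NS of real CPU time.
vruntime_delta() {
  echo $(( $1 * NICE_0_WEIGHT / $2 ))
}

runtime_ns=10000000   # both tasks ran for 10 ms of real CPU time

delta_nice0=$(vruntime_delta "$runtime_ns" 1024)   # weight assumed for nice 0
delta_nice5=$(vruntime_delta "$runtime_ns" 335)    # weight assumed for nice 5

echo "nice 0 task vruntime grew by: $delta_nice0 ns"
echo "nice 5 task vruntime grew by: $delta_nice5 ns"

# The task with the smaller vruntime is the one picked to run next.
if [ "$delta_nice0" -lt "$delta_nice5" ]; then
  next_task="nice 0"
else
  next_task="nice 5"
fi
echo "next to run: $next_task task"
```

Under these assumptions the lower-priority task accumulates vruntime about three times as fast for the same real CPU time, so it moves to the right of the tree sooner and is picked less often.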
# Run a command with lower priority (nicer to other processes)
nice -n 10 ./my_script.sh
# Change priority of a running process
renice +5 -p <PID>
# View scheduler stats per process
cat /proc/<PID>/sched
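The effect of nice can be checked without a long-running script, because nice prints the current niceness when it is given no command. A minimal sketch, assuming the coreutils or BusyBox nice; the variable names are illustrative.

```shell
# With no arguments, nice prints the niceness it is running at.
base_nice=$(nice)

# Start a second nice with the niceness raised by 5 and let it report its value.
raised_nice=$(nice -n 5 nice)

# Niceness is capped at 19, so compute the expected value with that ceiling.
expected_nice=$((base_nice + 5))
if [ "$expected_nice" -gt 19 ]; then
  expected_nice=19
fi

echo "niceness of this shell:       $base_nice"
echo "niceness under 'nice -n 5':   $raised_nice"
```

Raising niceness never needs special privileges; lowering it below the current value normally requires root.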
💡 References & Learning Resources
- “The Linux Programming Interface” by Michael Kerrisk: the definitive reference (Advanced/Deep dive)
- “Linux Kernel Development” by Robert Love: internals explained clearly (Intermediate)
- man 2 syscalls: complete syscall list from the Linux man pages (Beginner-friendly)
- “Operating Systems: Three Easy Pieces” (ostep.org): free online OS textbook (Beginner-friendly)
- Linux kernel source: https://elixir.bootlin.com/linux/latest/source (Advanced/Deep dive)
📊 Quick Recap
- Every external command runs via the fork-exec pattern: the shell forks a child, then the child execs the target binary.
- A process is a running instance of a program with its own PID, memory space, and file descriptors.
- Syscalls are the only safe, controlled gateway from user-space code into the kernel — triggered by a CPU instruction that switches privilege levels.
- File descriptors are integers that abstract all I/O resources; FDs 0, 1, 2 are stdin, stdout, and stderr.
- /proc is a live, in-memory pseudo-filesystem exposing the kernel’s view of every process and system state.
- /dev contains device files; reading/writing them interacts with hardware or virtual devices like /dev/null and /dev/urandom.
- The CFS scheduler uses virtual runtime on a red-black tree to ensure fair CPU time distribution; nice values tune priority.
🏷️ Tags
#Linux #SystemInternals #Kernel #Process #Syscall #FileDescriptor #proc #dev #CFS #Scheduler #UnixPhilosophy #OperatingSystems #CLI #Intermediate #ComputerScience