From fca59bea770346cf1c1f9b0e00cb48a61b44a8f3 Mon Sep 17 00:00:00 2001
From: Harald Welte
Date: Sun, 25 Oct 2015 21:00:20 +0100
Subject: import of old now defunct presentation slides svn repo

---
 2003/linux-kernel-knf2003/abstract                 |  26 ++
 2003/linux-kernel-knf2003/linux-kernel-knf2003.mgp | 300 +++++++++++++++++++++
 2 files changed, 326 insertions(+)
 create mode 100644 2003/linux-kernel-knf2003/abstract
 create mode 100644 2003/linux-kernel-knf2003/linux-kernel-knf2003.mgp

(limited to '2003/linux-kernel-knf2003')

diff --git a/2003/linux-kernel-knf2003/abstract b/2003/linux-kernel-knf2003/abstract
new file mode 100644
index 0000000..ade9d71
--- /dev/null
+++ b/2003/linux-kernel-knf2003/abstract
@@ -0,0 +1,26 @@
+How about the following title:
+"Introduction to the Architecture of the Linux Kernel - A Look Beyond the
+ Syscall Horizon of Userspace Processes"
+
+Part 1: Theoretical foundations
+- kernel/userspace: tasks, boundaries, points of contact
+- Execution context: User, Syscall, Softirq, Hardirq, Kernelthread, Tasklet
+- The scheduler
+- Primitives: Spinlocks, rwlocks, Mutex, Waitqueues
+
+Part 2: Exemplary look at individual subsystems
+- Network stack: from reception of the packet on the network card to its
+  reception in the userspace process
+- Filesystem: from the read syscall to reading the disk and back
+
+- tasks
+	- virt. memory management
+	- process management
+	- filesystem
+	- networking
+	- hardware abstraction
+	- interprocess communication
+
+- interfaces for userspace programs
+	- syscalls
+	-
diff --git a/2003/linux-kernel-knf2003/linux-kernel-knf2003.mgp b/2003/linux-kernel-knf2003/linux-kernel-knf2003.mgp
new file mode 100644
index 0000000..af367f4
--- /dev/null
+++ b/2003/linux-kernel-knf2003/linux-kernel-knf2003.mgp
@@ -0,0 +1,300 @@
+%include "default.mgp"
+%default 1 bgrad
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+%nodefault
+%back "blue"
+
+%center
+%size 7
+
+
+Architecture of the Linux kernel
+%size 5
+or: The world beyond the syscall barrier
+
+
+%center
+%size 4
+by
+
+Harald Welte
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+Prerequisites
+
+Due to the technical nature of this presentation, the audience should be familiar with the following subjects:
+
+	experience in programming on a Linux/*NIX system
+		C language preferred
+	general knowledge about computer hardware
+		interrupts / IO / DMA
+	general knowledge about modern CPU architecture
+		address space / MMU
+		'protected mode' / supervisor mode / ...
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+Kernel / Userspace
+
+	OS kernel provides
+		hardware abstraction (file I/O, network I/O, ...)
+		resource allocation / limiting
+		address separation
+		privilege separation
+		IPC
+
+	the traditional process model in *NIX operating systems
+		processes reside in separate virtual address spaces
+		kernel only executes one process (init) at bootup
+		all other processes descend from init
+		processes are scheduled and preempted by the kernel
+		processes invoke system functions via syscalls.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+System calls
+
+Definition
+
+	a userspace process enters the kernel
+	mechanism is CPU architecture dependent
+		can be a software interrupt (int 0x80)
+		can be a special asm instruction (sysenter)
+		arguments are passed in CPU registers (on i386), not on the stack
+	common examples
+		open/close/read/write
+		exit/fork/execve/kill
+		socketcall (implements socket/bind/connect/listen)
+	about 270 system calls in 2.6.x kernels
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+Invocation of system call
+
+chronological order of events in case of a system call
+
+	userspace process calls library function
+	library function is executed within the process' address space
+	library will eventually issue a system call, loading the arguments into registers
+	library will issue syscall (int 0x80 / sysenter / ...)
+	execution will switch to syscall context in kernel mode
+	kernel will look up the system call table and dispatch to the respective function
+	syscall function in the kernel will handle the syscall
+	all data between kernel/userspace needs to be copied between address spaces
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+Execution contexts
+
+apart from scheduling between different userspace processes, the kernel has other jobs, such as reacting to external events
+
+	hardirq
+		hardware interrupt line was triggered
+	softirq
+		the workhorse behind a hardirq
+	userspace
+		executing within userspace process
+	syscall
+		invoked by a system call from userspace
+	vsyscall
+		virtual system calls, executed in userspace context
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+hardirq context
+
+	interrupt generated by hardware is received and handled
+	can be interrupted by other hardirqs
+	does only a minimal job and returns
+	examples
+		packet has arrived on network board
+		character was received on serial port
+		dma read/write to disk drive has completed
+		timer interrupt went off
+
+	in most cases, a hardirq is followed by a softirq or tasklet.
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+softirq context
+
+	softirqs are run after a hardirq
+	do the real work associated with a hardirq
+	multithreaded (can run simultaneously on multiple CPUs)
+	examples
+		network receive softirq
+		timer softirq
+
+	prior to softirqs, linux had so-called 'bottom halves'
+		softirq introduced in 2.4.x (net rx/tx softirq)
+		bottom halves removed in 2.6.x
+		difference: only one BH can be run at a time
+		BHs have to be converted to tasklets in 2.6.x
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+tasklets
+
+	tasklets are somewhat in between softirqs and bottom halves
+		one particular tasklet cannot run on multiple CPUs simultaneously
+		different tasklets can run on different CPUs simultaneously
+
+	otherwise, same as softirq context
+	tasklets are implemented inside the 'tasklet softirq'
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+syscall / userspace context
+
+	userspace context
+		in userspace, executing a process
+
+	syscall context
+		inside kernel, when userspace process issues syscall()
+
+	vsyscalls (virtual syscalls)
+		first introduced with the x86-64 (AMD Opteron) arch
+		fast read-only access to kernel data structures
+		can do stuff like gettimeofday() without context switch
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+synchronization
+
+Due to reentrancy and SMP, synchronization issues arise:
+
+	simple case: UP system
+		softirq can be interrupted by hardirq
+		thus, shared structures (queues, ...) need to be protected
+	complex case: SMP system
+		softirq can run at the same time on multiple CPUs
+		as softirqs are multithreaded, synchronization between threads has to be implemented
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+synchronization primitives
+
+busy-waiting locks
+
+	spinlocks
+		if lock was not taken, take it and continue
+		if lock was taken, busy-loop until it is free
+	rwlocks
+		special case of spinlocks
+		useful when structure protected by lock is often read but rarely updated/written to
+		allows either
+			multiple readers simultaneously, or
+			only one writer [and no readers]
+	brlocks
+		super-fast read/write locks, with write-side penalty
+		avoid cache ping-pong in multi reader case
+		only in kernel 2.4.x
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+synchronization primitives (cont'd)
+
+sleeper locks
+
+	semaphores
+		if semaphore can be acquired, continue
+		if semaphore cannot be acquired, put current process to sleep
+		once semaphore is available again, wake up process
+
+	WARNING: can only be used for synchronization in userspace/syscall context (the caller must be able to sleep)
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+new locking primitives in 2.6.x
+
+	seqlocks
+		introduced with vsyscalls in 2.5/2.6
+		reader/writer consistent mechanism without starving writers
+		readers never block but may have to retry if a write is in progress
+
+	read copy update
+		new lockless mechanism in kernel 2.5/2.6
+		defers update of data structure until all CPUs have scheduled and thus nobody has any references left
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+example: incoming network packet
+
+hardirq context
+	NIC raises its interrupt line after a packet was received
+	kernel enters (arch/i386/kernel/entry.S:common_interrupt)
+	core interrupt handler (arch/i386/kernel/irq.c:do_IRQ)
+	hardirq handler of network driver (drivers/net/tulip/interrupt.c:tulip_interrupt)
+	net/core/dev.c:netif_rx(): append skb to backlog queue
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+example: incoming network packet
+
+softirq context
+	net/core/dev.c:net_rx_action()
+	net/core/dev.c:process_backlog()
+	net/core/dev.c:netif_receive_skb()
+	net/core/dev.c:deliver_skb()
+	net/ipv4/ip_input.c:ip_rcv()
+		netfilter prerouting hook
+	net/ipv4/ip_input.c:ip_rcv_finish()
+		call routing code
+	net/ipv4/ip_input.c:ip_local_deliver()
+		netfilter localin hook
+	net/ipv4/ip_input.c:ip_local_deliver_finish()
+		call l4 protocol
+	net/ipv4/udp.c:udp_rcv()
+		lookup socket, if any
+	include/net/sock.h:sock_queue_rcv_skb()
+		enqueue into socket receive queue
+	net/core/sock.c:sock_def_readable()
+		wake_up_interruptible() on socket waitqueue
+	return from recv() via socketcall
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+example: reading of a file
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Architecture of the Linux kernel
+Thanks
+	The slides and an accompanying paper of this presentation are available at http://www.gnumonks.org/
+
+	Thanks to
+		the BBS people, Z-Netz, FIDO, ...
+			for heavily increasing my computer usage in 1992
+		KNF
+			for bringing me in touch with the internet as early as 1994
+			for providing a playground for technical people
+			for telling me about the existence of Linux!
+		Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
+			for implementing (one of?) the world's best TCP/IP stacks
+		Paul 'Rusty' Russell
+			for starting the netfilter/iptables project
+			for trusting me to maintain it today
+		Astaro AG
+			for sponsoring parts of my netfilter work
+
-- 
cgit v1.2.3
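
The seqlock behaviour described on the "new locking primitives in 2.6.x" slide (writers are not starved; readers never block but may have to retry if a write is in progress) can be illustrated with a minimal userspace sketch. This is not kernel code and not the kernel's API: Python stands in for C, the class `SeqLock`, its methods and the `simulate_writer` hook are hypothetical names chosen for illustration, and the memory barriers a real implementation needs are omitted.

```python
import threading


class SeqLock:
    """Userspace illustration of the seqlock protocol (hypothetical, not kernel code)."""

    def __init__(self, value):
        self._seq = 0                          # even: data stable, odd: write in progress
        self._value = value
        self._writer_mutex = threading.Lock()  # serializes writers only, never readers

    def write(self, value):
        with self._writer_mutex:
            self._seq += 1                     # odd: concurrent readers will retry
            self._value = value
            self._seq += 1                     # even again: data is stable

    def read(self, simulate_writer=None):
        """Return (value, retries). Never takes a lock, but may loop.

        simulate_writer is an optional callback run once between taking the
        snapshot and re-checking the sequence counter, so that a racing
        writer can be demonstrated without real threads.
        """
        retries = 0
        while True:
            start = self._seq
            if start % 2 == 0:                 # no write in progress when we started
                snapshot = self._value
                if simulate_writer is not None:
                    simulate_writer()          # pretend a writer raced with us
                    simulate_writer = None
                if self._seq == start:         # counter unchanged: snapshot is consistent
                    return snapshot, retries
            retries += 1                       # a writer interfered: try again
```

For example, after `lock = SeqLock((0, 0))`, the call `lock.read(simulate_writer=lambda: lock.write((3, 4)))` discards its first snapshot because the counter changed underneath it, retries once, and returns the newly written value. The writer was never delayed by the reader, which is the property the slide highlights.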