\documentclass{article}
\usepackage{german}
\usepackage{fancyheadings}
\usepackage{a4}

\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\topmargin}{0.0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9.5in}
\setlength{\parindent}{0in}
\setlength{\parskip}{0.05in}


\begin{document}
\title{Linux 2.4.x netfilter/iptables firewalling internals}

\author{Harald Welte\\
        laforge@gnumonks.org\\
        \copyright{}2002 H. Welte}

\date{25. April 2002}

\maketitle

\setcounter{section}{0}
\setcounter{subsection}{0}
\setcounter{subsubsection}{0}

\section{Introduction}
The Linux 2.4.x kernel series has introduced a totally new kernel firewalling
subsystem. It is much more than a plain successor to ipfwadm or ipchains.

The netfilter/iptables project has a very modular design, and its
sub-projects can be split into several parts: netfilter, iptables, connection
tracking, NAT and packet mangling.

While most users will already have learned how to use the basic functions
of netfilter/iptables in order to convert their old ipchains firewalls to
iptables, there is more advanced, but less frequently used, functionality in
netfilter/iptables.

The presentation covers the design principles behind the netfilter/iptables
implementation. This knowledge enables us to understand how the individual
parts of netfilter/iptables fit together, and for which potential applications
this is useful.

\section{Internal netfilter/iptables architecture}

\subsection{Netfilter hooks in protocol stacks}

One of the major motivations behind the redesign of the Linux packet
filtering and NAT system during the 2.3.x kernel series was the amount of
firewalling-specific code spread throughout the core IPv4 stack. Ideally, the
core IPv4 stack (as used by regular hosts and routers) should not contain any
firewalling-specific code, resulting in no unwanted interaction and less
code complexity. This desire led to the invention of {\it netfilter}.

\subsubsection{Architecture of netfilter}

Netfilter is basically a system of callback functions within the network
stack. It provides a non-portable API towards in-kernel networking
extensions.

What we call a {\it netfilter hook} is a well-defined call-out point within a
layer three protocol stack, such as IPv4, IPv6 or DECnet. Any layer three
network stack can define an arbitrary number of hooks, usually placed at
strategic points within the packet flow.

Any other kernel code can subsequently register callback functions for
any of these hooks. As in most systems there will be more than one callback
function registered for a particular hook, a {\it priority} is specified upon
registration of the callback function. This priority defines the order in
which the individual callback functions at a particular hook are called.

The return value of a registered callback function can be one of:
\begin{itemize}
\item
{\bf NF\_ACCEPT}: continue traversal as usual
\item
{\bf NF\_DROP}: drop the packet; do not continue traversal
\item
{\bf NF\_STOLEN}: callback function has taken over the packet; do not continue
\item
{\bf NF\_QUEUE}: enqueue the packet to userspace
\item
{\bf NF\_REPEAT}: call this hook again
\end{itemize}
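As an illustration, the following is a minimal sketch of how a kernel module
could register a callback function with the 2.4.x netfilter core. All names
except the netfilter symbols are made up for this example:

\begin{verbatim}
/* Minimal sketch of netfilter callback registration (2.4.x API);
 * my_hook_fn, my_ops etc. are illustrative names. */
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>

static unsigned int my_hook_fn(unsigned int hooknum,
                               struct sk_buff **skb,
                               const struct net_device *in,
                               const struct net_device *out,
                               int (*okfn)(struct sk_buff *))
{
        /* inspect or modify *skb here */
        return NF_ACCEPT;       /* let the packet continue traversal */
}

static struct nf_hook_ops my_ops = {
        hook:           my_hook_fn,
        pf:             PF_INET,            /* IPv4 */
        hooknum:        NF_IP_PRE_ROUTING,  /* hook [1] in the diagram below */
        priority:       NF_IP_PRI_FIRST,    /* called before lower priorities */
};

static int __init my_init(void)
{
        return nf_register_hook(&my_ops);
}

static void __exit my_fini(void)
{
        nf_unregister_hook(&my_ops);
}

module_init(my_init);
module_exit(my_fini);
\end{verbatim}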
\subsubsection{Netfilter hooks within IPv4}

The IPv4 stack provides five netfilter hooks, which are placed at the
following points within the packet flow:

\begin{verbatim}
   --->[1]--->[ROUTE]--->[3]--->[4]--->
        |                   ^
        |                   |
        |                [ROUTE]
        v                   |
       [2]                 [5]
        |                   ^
        |                   |
        v                   |
         local processes
\end{verbatim}

Packets received on any network interface arrive at the left side of the
diagram. After verification of the IP header checksum, they traverse the
NF\_IP\_PRE\_ROUTING [1] hook.

If they ``survive'' (i.e. NF\_ACCEPT is returned), the packets enter the
routing code. Where we continue from here depends on the destination of the
packet.

Packets with a local destination (i.e. packets whose destination address is
one of the host's own IP addresses) traverse the NF\_IP\_LOCAL\_IN [2]
hook. If all callback functions return NF\_ACCEPT, the packet is finally
passed to the socket code, which eventually passes it to a local process.

Packets with a remote destination (i.e. packets which are forwarded by the
local machine) traverse the NF\_IP\_FORWARD [3] hook. If they ``survive'',
they finally pass the NF\_IP\_POST\_ROUTING [4] hook and are sent out on the
outgoing network interface.

Locally generated packets first traverse the NF\_IP\_LOCAL\_OUT [5] hook, then
enter the routing code, and finally go through the NF\_IP\_POST\_ROUTING [4]
hook before being sent out on the outgoing network interface.

\subsubsection{Netfilter hooks within IPv6}

As the IPv4 and IPv6 protocols are very similar, the netfilter hooks within the
IPv6 stack are placed at exactly the same locations as in the IPv4 stack. The
only difference is the hook names: NF\_IP6\_PRE\_ROUTING, NF\_IP6\_LOCAL\_IN,
NF\_IP6\_FORWARD, NF\_IP6\_POST\_ROUTING, NF\_IP6\_LOCAL\_OUT.

\subsubsection{Netfilter hooks within DECnet}

There are seven DECnet hooks. The first five (NF\_DN\_PRE\_ROUTING,
NF\_DN\_LOCAL\_IN, NF\_DN\_FORWARD, NF\_DN\_LOCAL\_OUT, NF\_DN\_POST\_ROUTING)
are pretty much the same as in IPv4. The last two (NF\_DN\_HELLO,
NF\_DN\_ROUTE) are used in conjunction with DECnet Hello and Routing packets.

\subsubsection{Netfilter hooks within ARP}

Recent kernels\footnote{IIRC, starting with 2.4.19-pre3} have added support
for netfilter hooks within the ARP code.
There are two hooks: NF\_ARP\_IN and NF\_ARP\_OUT, for incoming and outgoing
ARP packets respectively.

\subsubsection{Netfilter hooks within IPX}

There have been experimental patches to add netfilter hooks to the IPX code,
but they never got integrated into the kernel source.

\subsection{Packet selection using IP Tables}

The IP tables core (ip\_tables.o) provides a generic layer for the evaluation
of rulesets.

An IP table consists of an arbitrary number of {\it chains}, which in turn
consist of a linear list of {\it rules}, which again consist of any
number of {\it matches} and one {\it target}.

{\it Chains} can further be divided into two classes: {\it builtin
chains} and {\it user-defined chains}. Builtin chains are always present; they
are created upon table registration, and they are also the entry points for
table traversal. User-defined chains are created at runtime upon user
interaction.

{\it Matches} specify the matching criteria; a rule can contain zero or more
matches.

{\it Targets} specify the action which is to be executed in case {\bf all}
matches match. There can only be a single target per rule.

Matches and targets can either be {\it builtin} or implemented as {\it Linux
kernel modules}.

There are two special targets:
\begin{itemize}
\item
By using a chain name as target, it is possible to jump to the respective chain
in case the matches match.
\item
By using the RETURN target, it is possible to return to the previous (calling)
chain.
\end{itemize}

The IP tables core handles the following functions:
\begin{itemize}
\item
Registering and unregistering tables
\item
Registering and unregistering matches and targets (which can be implemented as
Linux kernel modules)
\item
Kernel / userspace interface for manipulation of IP tables
\item
Traversal of IP tables
\end{itemize}
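As an illustration of this rule anatomy, consider the following made-up
ruleset (chain name and addresses are purely hypothetical). The first rule
combines matching criteria with a user-defined chain as its target; the last
rule uses the special RETURN target:

\begin{verbatim}
# create a user-defined chain and jump to it from FORWARD
iptables -N smtp-rules
iptables -A FORWARD -p tcp --dport 25 -j smtp-rules

# inside the user-defined chain: accept one host,
# return to the calling (FORWARD) chain for everything else
iptables -A smtp-rules -s 10.0.0.1 -j ACCEPT
iptables -A smtp-rules -j RETURN
\end{verbatim}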
\subsubsection{Packet filtering using the ``filter'' table}

Traditional packet filtering (i.e. the successor to ipfwadm/ipchains) takes
place in the ``filter'' table. Packet filtering works like a sieve: a packet
is (in the end) either dropped or accepted - but never modified.

The ``filter'' table is implemented in the {\it iptable\_filter.o} module
and contains three builtin chains:

\begin{itemize}
\item
{\bf INPUT} attaches to NF\_IP\_LOCAL\_IN
\item
{\bf FORWARD} attaches to NF\_IP\_FORWARD
\item
{\bf OUTPUT} attaches to NF\_IP\_LOCAL\_OUT
\end{itemize}

The placement of the chains / hooks is done in such a way that every
conceivable packet traverses exactly one of the builtin chains: packets
destined for the local host traverse only INPUT, forwarded packets only
FORWARD, and locally-originated packets only OUTPUT.

\subsubsection{Packet mangling using the ``mangle'' table}

As stated above, operations which would modify a packet do not belong in the
``filter'' table. The ``mangle'' table is available for all kinds of packet
manipulation - but not manipulation of addresses (which is NAT).

The mangle table attaches to all five netfilter hooks and provides the
respective builtin chains (PREROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING)
\footnote{This has changed during the 2.4.x kernel series; old kernels may
only support three chains (PREROUTING, POSTROUTING, OUTPUT).}.
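As a hypothetical example of packet mangling, the following rule uses the TOS
target to adjust the type-of-service field of incoming telnet traffic (the
port number is chosen arbitrarily for illustration):

\begin{verbatim}
iptables -t mangle -A PREROUTING -p tcp --dport 23 \
         -j TOS --set-tos Minimize-Delay
\end{verbatim}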
\subsection{Connection Tracking Subsystem}

Traditional packet filters can only match on criteria found within the
currently processed packet, like source/destination IP address, port numbers,
TCP flags, etc. As most applications have a notion of connections, or at least
a request/response style protocol, there is a lot of information which cannot
be derived from looking at a single packet.

Thus, modern (stateful) packet filters attempt to track connections (flows)
and their respective protocol states for all traffic through the packet
filter.

Connection tracking within Linux is implemented as a netfilter module called
ip\_conntrack.o.

Before describing the connection tracking subsystem, we need to introduce a
couple of definitions and primitives used throughout the conntrack code.

A connection is represented within the conntrack subsystem by a {\it struct
ip\_conntrack}, also called a {\it connection tracking entry}.

Connection tracking utilizes {\it conntrack tuples}, which are tuples
consisting of (srcip, srcport, dstip, dstport, l4prot). A connection is
uniquely identified by two tuples: the tuple in the original direction
(IP\_CT\_DIR\_ORIGINAL) and the tuple for the reply direction
(IP\_CT\_DIR\_REPLY).
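Conceptually, such a tuple can be pictured as in the following sketch; this is
a simplified illustration, not the actual kernel definition (the real
{\it struct ip\_conntrack\_tuple} is more involved):

\begin{verbatim}
/* Simplified illustration of a conntrack tuple; the real
 * struct ip_conntrack_tuple in the kernel is more involved. */
#include <linux/types.h>

struct example_tuple {
        u_int32_t src_ip;       /* source IP address */
        u_int16_t src_port;     /* source port (TCP/UDP) */
        u_int32_t dst_ip;       /* destination IP address */
        u_int16_t dst_port;     /* destination port (TCP/UDP) */
        u_int8_t  protonum;     /* layer 4 protocol number */
};

/* a connection is uniquely identified by one tuple per direction */
struct example_tuple tuples[2]; /* IP_CT_DIR_ORIGINAL, IP_CT_DIR_REPLY */
\end{verbatim}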
Connection tracking itself does not drop packets\footnote{Well, in some rare
cases in combination with NAT it needs to drop. But don't tell anyone, this is
secret.} or impose any policy. It just associates every packet with a
connection tracking entry, which in turn has a particular state. All other
kernel code can use this state information\footnote{State information is
internally attached to a packet via the {\it nfct} member of its {\it struct
sk\_buff}.}.

\subsubsection{Integration of conntrack with netfilter}

When the ip\_conntrack.o module is registered with netfilter, it attaches to
the NF\_IP\_PRE\_ROUTING, NF\_IP\_POST\_ROUTING, NF\_IP\_LOCAL\_IN and
NF\_IP\_LOCAL\_OUT hooks.

Because forwarded packets are the most common case on firewalls, I will only
describe how connection tracking works for forwarded packets. The two relevant
hooks for forwarded packets are NF\_IP\_PRE\_ROUTING and NF\_IP\_POST\_ROUTING.

Every time a packet arrives at the NF\_IP\_PRE\_ROUTING hook, connection
tracking creates a conntrack tuple from the packet. It then compares this
tuple to the original and reply tuples of all already-seen connections
\footnote{Of course this is not implemented as a linear search over all
existing connections.} to find out if this just-arrived packet belongs to any
existing connection. If there is no match, a new conntrack table entry (struct
ip\_conntrack) is created.

Let's assume the case where we do not yet have any existing connections, but
are starting from scratch.

The first packet comes in; we derive the tuple from the packet headers, look
it up in the conntrack hash table, and don't find any matching entry. As a
result, we create a new struct ip\_conntrack. This struct ip\_conntrack is
filled with all necessary data, like the original and reply tuples of the
connection. How do we know the reply tuple? By inverting the source and
destination parts of the original tuple.\footnote{So why do we need two
tuples, if they can be derived from each other? Wait until we discuss NAT.}
Please note that this new struct ip\_conntrack is {\bf not} yet placed
into the conntrack hash table.

The packet is now passed on to other callback functions which have registered
with a lower priority at NF\_IP\_PRE\_ROUTING. It then continues traversal of
the network stack as usual, including all respective netfilter hooks.

If the packet survives (i.e. is not dropped by the routing code, network stack,
firewall ruleset, ...), it re-appears at NF\_IP\_POST\_ROUTING. In this case,
we can now safely assume that this packet will be sent out on the outgoing
interface, and thus put the connection tracking entry which we created at
NF\_IP\_PRE\_ROUTING into the conntrack hash table. This process is called
{\it confirming the conntrack}.

The connection tracking code itself is not monolithic, but consists of a
couple of separate modules\footnote{They don't actually have to be separate
kernel modules; e.g. the TCP, UDP and ICMP tracking modules are all part of
the Linux kernel module ip\_conntrack.o.}. Besides the conntrack core, there
are two important kinds of modules: protocol helpers and application helpers.

Protocol helpers implement the layer-4-protocol specific parts. They currently
exist for TCP, UDP and ICMP (an experimental helper for GRE exists).

\subsubsection{TCP connection tracking}

As TCP is a connection oriented protocol, it is not very difficult to imagine
how connection tracking for this protocol could work. There are well-defined
state transitions, and conntrack can decide which state transitions are valid
within the TCP specification. In reality it's not all that easy, since we
cannot assume that all packets that pass the packet filter actually arrive at
the receiving end, ...

It is noteworthy that the standard connection tracking code does {\bf not}
do TCP sequence number and window tracking. A well-maintained patch to add
this feature has existed almost as long as connection tracking itself; it will
be integrated with the 2.5.x kernel. The problem with window tracking is
its bad interaction with connection pickup. The TCP conntrack code is able to
pick up already existing connections, e.g. in case your firewall was rebooted.
However, connection pickup conflicts with TCP window tracking: the TCP
window scaling option is only transferred at connection setup time, and we
don't know about it in case of pickup...

\subsubsection{ICMP tracking}

ICMP is not really a connection oriented protocol. So how is it possible to
do connection tracking for ICMP?

The ICMP protocol can be split into two groups of messages:

\begin{itemize}
\item
ICMP error messages (ICMP\_DEST\_UNREACH, ICMP\_SOURCE\_QUENCH,
ICMP\_TIME\_EXCEEDED, ICMP\_PARAMETERPROB, ICMP\_REDIRECT), which sort-of
belong to a different connection; they are associated as {\it RELATED} to
that connection.
\item
ICMP queries (ICMP\_ECHO, ICMP\_TIMESTAMP, ICMP\_INFO\_REQUEST,
ICMP\_ADDRESS), which have a request->reply character. The conntrack code
gives the request a state of {\it NEW}, and the reply {\it ESTABLISHED}. The
reply closes the connection immediately.
\end{itemize}

\subsubsection{UDP connection tracking}

UDP is designed as a connectionless datagram protocol. But most common
protocols using UDP as their layer 4 protocol have bi-directional UDP
communication. Imagine a DNS query, where the client sends a UDP frame to
port 53 of the nameserver, and the nameserver sends back a DNS reply packet
from its UDP port 53 to the client.

Netfilter treats this as a connection. The first packet (the DNS request) is
assigned a state of {\it NEW}, because the packet is expected to create a new
'connection'. The DNS server's reply packet is marked as {\it ESTABLISHED}.

\subsubsection{conntrack application helpers}

More complex application protocols involving multiple connections need special
support by a so-called ``conntrack application helper module''. The stock
kernel comes with modules for FTP and IRC (DCC). Netfilter CVS currently
contains patches for PPTP, H.323, Eggdrop botnet, tftp and talk. We're still
lacking a lot of protocols (e.g. SIP, SMB/CIFS) - but they are unlikely to
appear until somebody really needs them and either develops them on his own
or funds development.

\subsubsection{Integration of connection tracking with iptables}

As stated earlier, conntrack doesn't impose any policy on packets. It just
determines the relation of a packet to already existing connections. To base
packet filtering decisions on this state information, the iptables {\it state}
match can be used; an example ruleset follows the list below. Every packet
falls into one of the following categories:

\begin{itemize}
\item
{\bf NEW}: packet would create a new connection, if it survives
\item
{\bf ESTABLISHED}: packet is part of an already established connection
(either direction)
\item
{\bf RELATED}: packet is in some way related to an already established
connection, e.g. ICMP errors or FTP data sessions
\item
{\bf INVALID}: conntrack is unable to derive conntrack information from this
packet. Please note that all multicast or broadcast packets fall into this
category.
\end{itemize}
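A typical, illustrative use of the state match in a forwarding firewall looks
like this (assuming, hypothetically, that eth1 is the internal interface):
the first rule allows new outbound connections from the inside, the second
allows reply and related traffic back in, and the last drops everything else:

\begin{verbatim}
iptables -A FORWARD -i eth1 -m state --state NEW -j ACCEPT
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -j DROP
\end{verbatim}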
\subsection{NAT Subsystem}

The NAT (Network Address Translation) subsystem is probably the worst
documented subsystem within the whole framework. This has two reasons: first,
NAT is nasty and complicated; second, the Linux 2.4.x NAT implementation is
so easy to use that nobody needs to know the nasty details.

Nonetheless, as I have traditionally concentrated on the conntrack and NAT
subsystems, I will give a short overview.

NAT uses almost all of the previously described subsystems:
\begin{itemize}
\item
IP tables, to specify which packets to NAT in which particular way. NAT
registers a ``nat'' table with PREROUTING, POSTROUTING and OUTPUT chains.
\item
Connection tracking, to associate NAT state with the connection.
\item
Netfilter, to do the actual packet manipulation transparently to the rest of
the kernel. NAT registers with NF\_IP\_PRE\_ROUTING, NF\_IP\_POST\_ROUTING,
NF\_IP\_LOCAL\_IN and NF\_IP\_LOCAL\_OUT.
\end{itemize}

The NAT implementation supports all kinds of NAT: source NAT,
destination NAT, NAT to address/port ranges, 1:1 NAT, ...

This fundamental design principle is still frequently misunderstood:\\
The information about which NAT mappings apply to a certain connection
is only gathered once - with the first packet of every connection.

So let's start to look at the life of a poor to-be-NATed packet.
For ease of understanding, I have chosen to describe the most frequently
used NAT scenario: source NAT of a forwarded packet. Let's assume the
packet has an original source address of 1.1.1.1, an original destination
address of 2.2.2.2, and is going to be SNATed to 9.9.9.9. Let's further
ignore the fact that there are port numbers.

Once upon a time, our poor packet arrives at NF\_IP\_PRE\_ROUTING, where
conntrack has registered with the highest priority. This means that a conntrack
entry with the following two tuples is created:
\begin{verbatim}
IP_CT_DIR_ORIGINAL: 1.1.1.1 -> 2.2.2.2
IP_CT_DIR_REPLY:    2.2.2.2 -> 1.1.1.1
\end{verbatim}
After conntrack, the packet traverses the PREROUTING chain of the ``nat''
IP table. Since only destination NAT happens at PREROUTING, no action
occurs. After its lengthy way through the rest of the network stack,
the packet arrives at the NF\_IP\_POST\_ROUTING hook, where it traverses
the POSTROUTING chain of the ``nat'' table. Here it hits an SNAT rule,
causing the following actions:
\begin{itemize}
\item
Fill in a {\it struct ip\_nat\_manip}, indicating the new source address
and the type of NAT (source NAT at POSTROUTING). This struct is part of the
conntrack entry.
\item
Automatically derive the inverse NAT transformation for the reply packets
(destination NAT at PREROUTING) and fill in another {\it struct
ip\_nat\_manip}.
\item
Alter the REPLY tuple of the conntrack entry to
\begin{verbatim}
IP_CT_DIR_REPLY: 2.2.2.2 -> 9.9.9.9
\end{verbatim}
\item
Apply the SNAT transformation to the packet.
\end{itemize}

Every other packet within this connection, independent of its direction,
will only execute the last step. Since all NAT information is connected
with the conntrack entry, there is no need to do anything but apply
the same transformations to all packets within the same connection.
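In iptables terms, the SNAT rule from the walkthrough above would look
roughly like this (with the same illustrative addresses):

\begin{verbatim}
iptables -t nat -A POSTROUTING -s 1.1.1.1 \
         -j SNAT --to-source 9.9.9.9
\end{verbatim}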
\subsection{IPv6 Firewalling with ip6tables}

Yes, Linux 2.4.x comes with a usable, though incomplete, system to secure
your IPv6 network.

The parts ported to IPv6 are:
\begin{itemize}
\item
IP tables (called IP6 tables)
\item
The ``filter'' table
\item
The ``mangle'' table
\item
The userspace library (libip6tc)
\item
The command line tool (ip6tables)
\end{itemize}

Due to the lack of conntrack and NAT\footnote{for god's sake we don't have NAT
with IPv6}, only traditional, stateless packet filtering is possible. Apart
from the obvious matches/targets, ip6tables can match on:
\begin{itemize}
\item
the {\it EUI64 checker}, which verifies that the MAC address of the sender
corresponds to the EUI64 contained in the 64 least significant bits of the
source IPv6 address
\item
the {\it frag6 match}, which matches on the IPv6 fragmentation header
\item
the {\it route6 match}, which matches on the IPv6 routing header
\item
the {\it ahesp6 match}, which matches on SPIs within AH or ESP over IPv6
packets
\end{itemize}

However, the ip6tables code doesn't seem to be used very widely (yet?).
So please expect some potential remaining issues, since it is not as heavily
tested as iptables.

\subsection{Recent Development}

Please refer to the spoken word at the presentation. Development at the
time this paper was written may be quite different from development at the
time the presentation is held.

\section{Thanks}

I'd like to thank
\begin{itemize}
\item
{\it Linus Torvalds} for starting this interesting UNIX-like kernel
\item
{\it Alan Cox, David Miller, Alexey Kuznetsov, Andi Kleen} for building
(one of?) the world's best TCP/IP stacks
\item
{\it Paul ``Rusty'' Russell} for starting the netfilter/iptables project
\item
{\it The Netfilter Core Team} for continuing the netfilter/iptables effort
\item
{\it Astaro AG} for partially funding my current netfilter/iptables work
\item
{\it Conectiva Inc.} for partially funding parts of my past netfilter/iptables
work and for inviting me to live in Brazil
\item
{\it samba.org and Kommunikationsnetz Franken e.V.} for hosting the netfilter
homepage, CVS, mailing lists, ...
\end{itemize}

\end{document}