author Harald Welte <laforge@gnumonks.org> 2015-10-25 21:00:20 +0100
committer Harald Welte <laforge@gnumonks.org> 2015-10-25 21:00:20 +0100
commit fca59bea770346cf1c1f9b0e00cb48a61b44a8f3 (patch)
tree a2011270df48d3501892ac1a56015c8be57e8a7d /iproute2
import of old now defunct presentation slides svn repo
Diffstat (limited to 'iproute2')
-rw-r--r-- iproute2/abstract 7
-rw-r--r-- iproute2/iproute2+tc-script.sgml 463
-rw-r--r-- iproute2/iproute2+tc-slides.mgp 454
3 files changed, 924 insertions, 0 deletions
diff --git a/iproute2/abstract b/iproute2/abstract
new file mode 100644
index 0000000..f59a4f8
--- /dev/null
+++ b/iproute2/abstract
@@ -0,0 +1,7 @@
+The whole IPv4 stack has undergone some radical changes in recent Linux
+kernel versions. Featuring iproute2, a routing core able to make routing
+decisions based not only on the destination IP address, the kernel provides
+a powerful basis for all kinds of advanced routing concepts. The second
+part of the talk covers TC (traffic control), which is Linux's implementation
+of packet scheduling, capable of class-based queuing and DiffServ.
+
diff --git a/iproute2/iproute2+tc-script.sgml b/iproute2/iproute2+tc-script.sgml
new file mode 100644
index 0000000..7b76fd1
--- /dev/null
+++ b/iproute2/iproute2+tc-script.sgml
@@ -0,0 +1,463 @@
+<!doctype book PUBLIC "-//OASIS//DTD DocBook V3.1//EN"[]>
+
+<book id="iproute2+tc-presentation">
+<bookinfo>
+<title>Advanced Linux Networking with iproute2 and tc</title>
+<authorgroup>
+<author>
+<firstname>Harald</firstname>
+<surname>Welte</surname>
+<affiliation>
+<address>
+<email>laforge@gnumonks.org</email>
+</address>
+</affiliation>
+</author>
+</authorgroup>
+
+<copyright>
+<year>2000</year>
+<holder>Harald Welte</holder>
+</copyright>
+
+<legalnotice>
+<para>
+INSERT GNU FDL HERE
+</para>
+</legalnotice>
+</bookinfo>
+
+<toc></toc>
+
+<chapter id="intro">
+<title>Introduction</title>
+<para>
+As the Linux kernel is developed further and further, the network stack remains one of the areas with the biggest changes and improvements. Starting with kernel 2.2, Alexey Kuznetsov introduced a whole new IPv4 routing subsystem (iproute2) as well as a traffic shaping subsystem (tc). Starting with kernel 2.4.x, we now also have a truly multithreaded network stack and, of course, the more-than-flexible netfilter and iptables subsystems.
+</para>
+<para>
+While most people know that these subsystems exist, very few know how to use them or how many applications they have. One major problem is that almost nobody who hasn't read the source code, or spent weeks and months playing around with these features, is able to understand them. The lack of documentation is mostly to blame for this situation.
+</para>
+<para>
+This document is mainly intended to accompany my talk/presentation at the CCC Congress 2000, but I think it is still worth reading on its own.
+</para>
+</chapter>
+
+<chapter id="overview">
+<title>Overview</title>
+<sect1 id="over-what">
+<title>What can I do using all this stuff?</title>
+<para>
+First I'll give a short overview of the possible applications of iproute2 and tc.
+</para>
+<variablelist>
+<varlistentry>
+<term>Base routing decisions on things other than the destination address</term>
+<listitem>
+<para>
+Traditional IP routing bases the routing decision only on the destination IP address. While this is sufficient for most cases, modern networking scenarios may call for more sophisticated routing. Using iproute2, you may make the routing decision for each packet separately, depending on various properties like the owner of the sending socket, port numbers, type of service, ...
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Share bandwidth according to your needs</term>
+<listitem>
+<para>
+In real-world scenarios you always have limited bandwidth. As soon as this bandwidth is used by more and more users and/or services, you might want to control how much of your uplink's bandwidth is available for which service.
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Prevent certain DoS attacks</term>
+<listitem>
+<para>
+There are certain kinds of DoS attacks which can be prevented through clever iproute2/tc usage. I'm especially referring to various flooding attacks.
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+</sect1>
+
+</chapter>
+
+<chapter id="iproute2">
+<title>Advanced Routing with iproute2</title>
+<sect1 id="iproute2-traditional">
+<title>Traditional IP Routing</title>
+<para>
+Before we dive into the iproute2-specific material, I'll give a short overview of how traditional IP routing works.
+</para>
+<para>
+Every host inside an IP network which is connected to more than one physical network segment is called a <indexterm id="router"><primary>router</primary><secondary>gateway</secondary></indexterm>router or gateway. Each of its interfaces has a particular IP address and netmask configured, so the router knows which hosts can be reached in which physical segment. To keep track of this information, it has a ## routing table. In addition to the information about which networks / hosts can be reached directly, it is possible to manually insert additional entries into this routing table. In most cases we have at least one default route entry, which specifies where to send all packets that have a destination outside of the locally attached network segments. More advanced routers use dynamic routing protocols like <glossterm linkend="gloss-rip">RIP</glossterm>, <glossterm linkend="gloss-ospf">OSPF</glossterm>, ... to automatically adapt the routing table entries to network failures.
+</para>
+<para>
+Regardless of how entries get into this ## routing table, sometimes also referred to as the <indexterm id="rib"><primary>Routing Information Base</primary><secondary>RIB</secondary></indexterm>RIB (routing information base), the decision about where to send the packet on the physical layer is always based on the destination IP address.
+</para>
+<para>
+At first glance this seems quite obvious and correct: you want to get your packet to the destination, so why care about where the packet came from, or any other information? But it isn't that easy anymore. Nowadays people want to have things like pre-allocated or guaranteed bandwidth, or want to route packets depending on which service they belong to (e.g. route web traffic over a different line than mail traffic).
+</para>
+<para>
+This is where iproute2 comes in: It is Linux's answer to this demand.
+</para>
+</sect1>
+
+<sect1 id="iproute2-overview">
+<title>iproute2 overview
+</title>
+<para>
+iproute2 is the 'new' IP network stack, as introduced in Linux 2.2.x by our Linux networking god ## Alexey Kuznetsov. Apart from a lot of other architectural changes, which mostly aim at increased performance, it also provides a routing engine capable of basing routing decisions on almost anything you want (including, of course, the default case: a routing decision based on the destination IP address).
+</para>
+<para>
+To make things more complicated, iproute2 has two meanings:
+<itemizedlist>
+<listitem><para>The IP network stack</para></listitem>
+<listitem><para>The command to configure it</para></listitem>
+</itemizedlist>
+</para>
+</sect1>
+
+<sect1 id="iproute2-rules">
+<title>Policy Routing</title>
+<para>
+So what architecture did Alexey and the other guys invent to provide the advanced routing features while keeping a backwards-compatible default behaviour?
+</para>
+<para>
+Instead of having one routing table for all packets, iproute2 lets us have multiple routing tables. So how do we decide which routing table to use for a particular packet? We decide based on information present in the ## routing policy database.
+</para>
+<para>
+If we want to decide upon a packet's new destination (in other words: make a routing decision for this packet), we first look into the ## routing policy database, which tells us which routing table to use.
+</para>
+<para>
+The routing policy database consists of a list of rules. Each rule consists of three parts:
+</para>
+<variablelist>
+<varlistentry>
+<term>priority</term>
+<listitem>
+<para>
+A priority, which tells us in which order we should traverse the ## routing policy database.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>match</term>
+<listitem>
+<para>
+A match, telling us which packets match this rule. We have the following matches available:
+<itemizedlist>
+<listitem><para>packet source address</para></listitem>
+<listitem><para>packet destination address</para></listitem>
+<listitem><para>TOS value</para></listitem>
+<listitem><para>Incoming interface</para></listitem>
+<listitem><para>fwmark (firewallmark, set by ipchains / iptables)</para></listitem>
+</itemizedlist>
+</para>
+<para>
+The most flexible (and therefore most commonly used) match is the <indexterm id="fwmark"><primary>fwmark</primary></indexterm>fwmark match. Firewalling (to be more precise: packet filtering based on <glossterm linkend="gloss-ipchains">ipchains</glossterm> or <glossterm linkend="gloss-iptables">iptables</glossterm>) already has very sophisticated means for matching packets. You can easily select packets based on their TCP flags, TCP/UDP port numbers, and even on the state of the connection they belong to. Interaction between firewalling rules and policy routing works like this:
+</para>
+<para>
+iptables/ipchains rules assign the packet a fwmark according to the packet filtering rules (you can specify an arbitrary 32-bit number as the fwmark for each rule). When the packet is to be routed and policy routing has to make a decision, it looks for a policy routing rule with the same fwmark as the packet, and performs the appropriate action connected with this rule (usually a lookup in a specific routing table).
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>action</term>
+<listitem>
+<para>
+Which action to perform if a packet matches this rule. Usually the action points us to one of the routing tables, but we can also decide to drop the packet or to return an ICMP error message to the sender.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+<para>
+In order to use this routing policy database, you have to enable the compile-time kernel option "IP: policy routing" (CONFIG_IP_MULTIPLE_TABLES).
+</para>
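+<para>
+The traversal of the routing policy database described above can be sketched in a few lines of Python. This is only an illustrative model, not the kernel's implementation: the <literal>Rule</literal> class, the <literal>lookup</literal> function, the table names and all addresses are made up for this example, and only some of the matches are modelled. Rules are visited in ascending priority order; if the table selected by a matching rule contains no route for the destination, the search continues with the next rule.
+</para>
+<programlisting>
```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Rule:
    """One entry of the modelled routing policy database (hypothetical)."""
    priority: int
    table: str                      # action: which routing table to consult
    src: Optional[str] = None       # match: source prefix, None matches any
    dst: Optional[str] = None       # match: destination prefix
    iif: Optional[str] = None       # match: incoming interface
    fwmark: Optional[int] = None    # match: mark set by the packet filter

    def matches(self, pkt: dict) -> bool:
        if self.src is not None and ip_address(pkt["src"]) not in ip_network(self.src):
            return False
        if self.dst is not None and ip_address(pkt["dst"]) not in ip_network(self.dst):
            return False
        if self.iif is not None and pkt.get("iif") != self.iif:
            return False
        if self.fwmark is not None and pkt.get("fwmark", 0) != self.fwmark:
            return False
        return True


def lookup(rules, tables, pkt):
    """Return (table name, matching prefix) or None if nothing routes pkt."""
    dst = ip_address(pkt["dst"])
    for rule in sorted(rules, key=lambda r: r.priority):
        if not rule.matches(pkt):
            continue
        # Longest-prefix match inside the table this rule points to.
        hits = [ip_network(p) for p in tables.get(rule.table, [])
                if dst in ip_network(p)]
        if hits:
            return rule.table, str(max(hits, key=lambda n: n.prefixlen))
        # No route in this table: fall through to the next rule.
    return None


# Made-up example data: web traffic is assumed to carry fwmark 10.
rules = [Rule(0, "local"), Rule(100, "web", fwmark=10), Rule(32766, "main")]
tables = {
    "local": ["192.168.0.1/32"],
    "web": ["0.0.0.0/0"],
    "main": ["192.168.0.0/24", "0.0.0.0/0"],
}
marked = {"src": "192.168.0.20", "dst": "198.51.100.7", "fwmark": 10}
unmarked = {"src": "192.168.0.20", "dst": "198.51.100.7"}
```
+</programlisting>
+<para>
+With this data, the marked packet is routed via the "web" table and the unmarked one via "main", although both have the same destination address.
+</para>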
+</sect1>
+<sect1 id="iproute2-command">
+<title>The iproute2 command</title>
+<para>To configure the new Linux IP stack, we use the iproute2 command. We can configure things like interface addresses, neighbour/ARP tables, policy routing, routing table entries, tunnels, multicast routing, and a lot of other network-related settings using this tool.
+</para>
+<para>
+iproute2 communicates over a sophisticated kernel-userspace interface called ## netlink sockets, which are also quite commonly used by other recent network-related code like netfilter's userspace queueing and packet logging framework.
+</para>
+<sect2 id="iproute2-command-rule">
+<title>iproute2 rule</title>
+<para>
+The iproute2 rule management (like most other iproute2-manageable information) allows three basic operations:
+</para>
+<variablelist>
+<varlistentry>
+<term>show</term>
+<listitem>
+<para>
+Surprisingly, this command shows us the current policy routing rules. It doesn't take any additional arguments.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>add</term>
+<listitem>
+<para>
+We can add a new entry to the list of policy routing rules. Valid parameters are:
+</para>
+<itemizedlist>
+<listitem>
+<para>type</para>
+<para>type of this rule</para>
+</listitem>
+<listitem>
+<para>from</para>
+<para> source address and mask </para>
+</listitem>
+<listitem>
+<para>to</para>
+<para> destination address and mask </para>
+</listitem>
+<listitem>
+<para>iif</para>
+<para>incoming interface name</para>
+</listitem>
+<listitem>
+<para>tos</para>
+<para>TOS value</para>
+</listitem>
+<listitem>
+<para>fwmark</para>
+<para>firewall mark field, set by ipchains/iptables</para>
+</listitem>
+</itemizedlist>
+
+</listitem>
+</varlistentry>
+<varlistentry>
+<term>delete</term>
+<listitem>
+<para>
+Deletes an existing rule from the list of policy routing rules.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+<para>
+</para>
+</sect2>
+</sect1>
+</chapter>
+
+<chapter id="tc">
+<title>Bandwidth Management</title>
+<para>
+Apart from having more flexible routing decisions, there are other demands on modern routers. Imagine an ISP which wants to pre-allocate specific bandwidths of its uplink to particular customers. Or, even if you don't want to have hard bandwidth limits, you may want to give specific traffic a higher priority than other traffic. The major buzzwords are <glossterm linkend="gloss-qos">QoS</glossterm>, ## packet scheduling, <glossterm linkend="gloss-diffserv">DiffServ</glossterm>.
+</para>
+<sect1 id="tc-basics">
+<title>How to do bandwidth management</title>
+<para>
+The best way to influence which kind of packets get which part of the total available bandwidth is to influence how packets are enqueued at an intermediate router between a high-bandwidth and a low-bandwidth interface. More packets arrive on the high-bandwidth link than we can send out on the other side, the low-bandwidth link. The router has to enqueue the packets which are to be sent on the low-bandwidth interface. Once the queue is full, the router has to drop packets.
+</para>
+<para>
+Although there are several ways to influence this queue, in the end it's nothing more than deciding which packets are enqueued at which position inside the queue.
+</para>
+<para>
+Please note that you can only ever influence the sending path.
+</para>
+</sect1>
+
+<sect1 id="tc-linux">
+<title>TC: Linux Traffic Control</title>
+<para>
+The traffic control code in the Linux kernel consists of the following major conceptual components:
+<itemizedlist>
+<listitem><para>queuing disciplines</para></listitem>
+<listitem><para>classes (within a queuing discipline)</para></listitem>
+<listitem><para>filters</para></listitem>
+<listitem><para>policy</para></listitem>
+</itemizedlist>
+</para>
+<para>
+After the network stack inside the Linux kernel has made its routing decision, it knows on which network device the packet has to be sent out. Each network device has some information about how to enqueue the packets for this particular interface attached to its device structure. This queuing information is what the Linux developers called <indexterm id="qdisc"><primary>queuing discipline</primary></indexterm>queuing discipline.
+</para>
+<para>
+A very simple queuing discipline may just consist of a single queue, where all packets are stored in the order in which they have been enqueued, and which is emptied as fast as the respective network device can send.
+</para>
+<para>
+More elaborate queuing disciplines may use ##filters to distinguish among different ##classes of packets and process each class in a specific way, e.g. by giving one class priority over other classes.
+<mediaobject>
+<imageobject>
+<imagedata fileref="qdisc_basic.gif" format="gif" width="100" scalefit="1">
+</imageobject>
+</mediaobject>
+</para>
+<para>
+Queuing disciplines and classes are intimately tied together: the presence of classes and their semantics are fundamental properties of the queuing discipline. In contrast, filters can be combined arbitrarily with queuing disciplines and classes, as long as the queuing discipline provides classes at all. To further increase flexibility, each class can use another queuing discipline for enqueuing the packets. This queuing discipline can, in turn, again have multiple classes which each have their own queuing discipline attached, etc.
+<inlinemediaobject>
+<imageobject>
+<imagedata fileref="qdisc_sophisticated.png" format="gif">
+</imageobject>
+</inlinemediaobject>
+</para>
+<para>
+All items inside TC are identified by a handle. A handle consists of a major and a minor number, separated by a colon (example: 10:0).
+</para>
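+<para>
+As a small illustration of the handle notation, the following Python sketch splits a handle into its two numbers. The function name <literal>parse_handle</literal> is made up for this example. It assumes the convention of the tc command, which reads both parts as hexadecimal and treats an omitted minor number as 0.
+</para>
+<programlisting>
```python
def parse_handle(text):
    """Split a 'major:minor' handle into a pair of integers.

    Both parts are read as hexadecimal (the tc convention assumed here);
    an omitted minor part, as in '10:', is treated as 0.
    """
    major, sep, minor = text.partition(":")
    if not sep or not major:
        raise ValueError(f"not a handle: {text!r}")
    return int(major, 16), (int(minor, 16) if minor else 0)
```
+</programlisting>
+<para>
+Under this assumption the example handle 10:0 has the major number 16 (hexadecimal 10) and the minor number 0.
+</para>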
+</sect1>
+<sect1>
+<title>Available queuing disciplines</title>
+<para>
+This section lists the currently available queuing disciplines and gives a short description of their functionality.
+</para>
+<sect2>
+<title>Token Bucket Filter (TBF)</title>
+<para>
+The Token Bucket Filter (TBF) is a simple queue that only passes packets arriving at a rate within some administratively set limit, with the possibility of buffering short bursts.
+</para>
+<para>
+The TBF implementation consists of a buffer (bucket), constantly filled by some virtual pieces of information (called tokens) at a specific rate (called token rate). The most important parameter of the bucket is its size, that is the number of tokens it can store.
+</para>
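+<para>
+The bucket mechanism can be modelled with a short Python sketch. It is a simplified illustration and not the kernel's TBF code: the class name <literal>TokenBucket</literal> and its parameters are made up, one token is assumed to pay for exactly one packet, the bucket is assumed to start full, and packets that find no token are dropped rather than delayed.
+</para>
+<programlisting>
```python
class TokenBucket:
    """Simplified token bucket: one token lets one packet pass."""

    def __init__(self, rate, size):
        self.rate = rate      # tokens added per second (token rate)
        self.size = size      # maximum number of tokens the bucket holds
        self.tokens = size    # assumption: the bucket starts full
        self.last = 0.0       # time of the previous update, in seconds

    def allow(self, now):
        """Return True if a packet arriving at time `now` may pass."""
        elapsed = now - self.last
        self.last = now
        # Refill, but never beyond the bucket size.
        self.tokens = min(self.size, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False          # no token left: the packet is over the limit


def run(bucket, arrival_times):
    """Feed packets with the given arrival times through the bucket."""
    return [bucket.allow(t) for t in arrival_times]
```
+</programlisting>
+<para>
+For example, with a token rate of 2 per second and a bucket size of 3, a burst of four packets at time 0 passes three packets and drops the fourth. One second later two new tokens have arrived, so two more packets may pass.
+</para>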
+<para>
+Each arriving token lets one data packet out of the queue and is then deleted from the bucket. Relating this algorithm to the two flows, tokens and data, gives us three possible scenarios:
+<itemizedlist>
+<listitem>
+<para>
+Data arrives into TBF at a rate equal to the rate of incoming tokens. In this case each packet has its matching token and passes the queue without further delay.
+</para>
+</listitem>
+<listitem>
+<para>
+Data arrives into TBF at a rate smaller than the token rate. Only some tokens are deleted from the bucket (one as each packet leaves), so tokens accumulate in the bucket, up to the bucket size. The saved tokens can later be used to send data at a rate higher than the token rate, which absorbs short bursts.
+</para>
+</listitem>
+<listitem>
+<para>
+Data arrives at a rate higher than the token rate. In this case a filter overrun occurs: incoming data can only be sent out without loss until all accumulated tokens are used. After that, overlimit packets are dropped.
+</para>
+</listitem>
+</itemizedlist>
+</para>
+</sect2>
+
+<sect2 id="tc-qdisc-cbq">
+<title>Class Based Queue (CBQ)</title>
+<para>
+This queue discipline classifies the waiting packets into a tree-like hierarchy of classes. The leaves of this tree are in turn scheduled by separate queue disciplines.
+</para>
+<para>
+CBQ is a very commonly used scheduler. It is used as a basis for all the other queue disciplines.
+</para>
+</sect2>
+<sect2 id="tc-qdisc-sfq">
+<title>Stochastic Fairness Queuing (SFQ)</title>
+<para>
+SFQ is not quite deterministic, but works (on average). Its main benefits are that it requires little CPU and memory.
+</para>
+<para>
+SFQ consists of a dynamically allocated number of FIFO queues, one for each conversation. A conversation (or flow) is distinguished by its source/destination IP addresses and port numbers. The discipline runs in round-robin fashion, sending one packet from each FIFO per turn, which is why it is called fair. The main advantage of SFQ is that it allows fair sharing of the link between different applications. It prevents bandwidth takeover by one client / one application.
+</para>
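+<para>
+The round-robin behaviour can be illustrated with the following Python sketch. It follows the simplified description above (one FIFO per flow) and is not the kernel's SFQ code, which hashes flows into a limited number of queues. The class name <literal>FairQueue</literal> and the flow keys are made up for this example.
+</para>
+<programlisting>
```python
from collections import OrderedDict, deque


class FairQueue:
    """One FIFO per flow, served round-robin, one packet per turn."""

    def __init__(self):
        self.flows = OrderedDict()    # flow key -> FIFO of waiting packets

    def enqueue(self, flow, packet):
        self.flows.setdefault(flow, deque()).append(packet)

    def dequeue(self):
        """Send one packet from the flow whose turn it is."""
        if not self.flows:
            return None
        flow, fifo = next(iter(self.flows.items()))
        packet = fifo.popleft()
        del self.flows[flow]
        if fifo:
            self.flows[flow] = fifo   # still busy: go to the back of the line
        return packet
```
+</programlisting>
+<para>
+If flow A has queued three packets before flow B queues its first one, B's packet is still sent second, directly after A's first packet. A busy flow therefore cannot starve the others.
+</para>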
+</sect2>
+<sect2 id="tc-qdisc-pfifo">
+<title>pfifo_fast</title>
+<para>
+The queue is, as the name says, first in, first out. That means that no packet receives any special treatment. At least, not quite. This qdisc has three so-called 'bands'. Within each band, FIFO rules apply. However, if there are packets waiting in band 0, band 1 won't be processed. Same goes for band 1 and band 2.
+</para>
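+<para>
+The strict ordering between the three bands can be sketched in Python as follows. This is an illustration only: the class name <literal>PfifoFast</literal> is made up, and the band is passed in explicitly, whereas the real qdisc derives it from the packet itself.
+</para>
+<programlisting>
```python
from collections import deque


class PfifoFast:
    """Three FIFO bands; a band is only served if all lower ones are empty."""

    BANDS = 3

    def __init__(self):
        self.bands = [deque() for _ in range(self.BANDS)]

    def enqueue(self, packet, band):
        # Assumption of this sketch: the caller chooses the band (0, 1 or 2).
        self.bands[band].append(packet)

    def dequeue(self):
        for fifo in self.bands:       # band 0 first, then 1, then 2
            if fifo:
                return fifo.popleft()
        return None                   # all bands are empty
```
+</programlisting>
+<para>
+Packets within one band leave in arrival order, but a packet in band 2 has to wait until bands 0 and 1 are completely empty.
+</para>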
+</sect2>
+<sect2 id="tc-qdisc-red">
+<title>Random Early Detection (RED)</title>
+<para>
+RED is mainly useful for TCP traffic, because it relies on TCP's congestion control. Once the link is filling up, it starts dropping packets. This indicates to the TCP stack on the sending machine that the link is congested, and the sender slows down. The trick is that it simulates real congestion.
+</para>
+</sect2>
+<sect2 id="tc-qdisc-ingres">
+<title>Ingress policer</title>
+<para>
+The ingress policer implements a hard limit. You configure it to a specific rate, and all packets entering this queue exceeding the configured rate are dropped.
+</para>
+</sect2>
+
+
+</sect1>
+
+
+</chapter>
+
+<appendix id="further-reading">
+<title>Further Reading</title>
+
+</appendix>
+
+<appendix id="acknowledgements">
+<title>Acknowledgements</title>
+<para>
+Although I wrote this document, I wasn't involved in any of the iproute2 / tc development. I am still baffled by the abstract, flexible concept it provides. My thanks go out to the iproute2+tc developers, especially Alexey Kuznetsov (our Linux networking god) and Werner Almesberger. Thanks to Rusty Russell, who inspired me at OLS2000 and LBW2000 to get involved more deeply with netfilter. I want to thank Andi Kleen and Marc Boucher for some really nice discussions at our meetings in Munich. Not to forget Bert Hubert and his team for writing the Linux 2.4 Advanced Routing HOWTO. Additional special thanks to the people who invented DocBook.
+</para>
+
+</appendix>
+
+<glossary>
+<title>Glossary</title>
+<glossentry id="gloss-diffserv">
+<glossterm>Differentiated Services</glossterm>
+<acronym>DiffServ</acronym>
+<glossdef><para>
+DiffServ is one of two actual <glossterm linkend="gloss-qos">QoS</glossterm> implementations (the other one is called Integrated Services) that is based on a value carried by packets in the DS field of the IP header.
+</para></glossdef>
+</glossentry>
+
+<glossentry id="gloss-ipchains">
+<glossterm>ipchains</glossterm>
+<glossdef><para>
+The packet filtering system in Linux 2.2
+</para></glossdef>
+</glossentry>
+
+<glossentry id="gloss-iptables">
+<glossterm>iptables</glossterm>
+<glossdef><para>
+The packet filtering system in Linux 2.4, based on <glossterm linkend="gloss-netfilter">netfilter</glossterm>.
+</para></glossdef>
+</glossentry>
+
+<glossentry id="gloss-netfilter">
+<glossterm>netfilter</glossterm>
+<glossdef><para>
+Common term used for the Linux 2.4 firewalling subsystem. To be more precise, it is the infrastructure underlying packet filtering, NAT and packet mangling.
+</para></glossdef>
+</glossentry>
+
+<glossentry id="gloss-netlink">
+<glossterm>Netlink Socket</glossterm>
+<glossdef><para>
+A special socket between kernel and userspace. Used by iproute2 to alter information in the routing tables, arp cache, policy routing database, ...
+</para></glossdef>
+</glossentry>
+
+<glossentry id="gloss-ospf">
+<glossterm>Open Shortest Path First</glossterm>
+<acronym>OSPF</acronym>
+<glossdef><para>
+A dynamic routing protocol.
+</para></glossdef>
+</glossentry>
+
+<glossentry id="gloss-qos">
+<glossterm>Quality of Service</glossterm>
+<acronym>QoS</acronym>
+<glossdef><para>
+Guaranteeing a certain bandwidth for specific applications
+</para></glossdef>
+</glossentry>
+
+<glossentry id="gloss-rip">
+<glossterm>Routing Information Protocol</glossterm>
+<acronym>RIP</acronym>
+<glossdef><para>
+A dynamic routing protocol.
+</para></glossdef>
+</glossentry>
+
+</glossary>
+
+
+</book>
diff --git a/iproute2/iproute2+tc-slides.mgp b/iproute2/iproute2+tc-slides.mgp
new file mode 100644
index 0000000..9791f79
--- /dev/null
+++ b/iproute2/iproute2+tc-slides.mgp
@@ -0,0 +1,454 @@
+%include "default.mgp"
+%default 1 bgrad
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+%nodefault
+%back "blue"
+
+%center
+%size 7
+
+
+Advanced Linux Networking
+
+
+%center
+%size 4
+by
+
+Harald Welte <laforge@gnumonks.org>
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+Contents
+
+ Introduction
+
+ Advanced Routing with iproute2
+
+ Bandwidth Management using tc
+
+ Advanced netfilter concepts
+
+ References / Further Reading
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+Introduction
+
+Changes in the Linux IP stack
+
+ Alexey Kuznetsov introduced new routing in 2.2
+
+ IPv6 support required generalization
+
+ tc subsystem (traffic control)
+
+ Hooks in the Network stack (netfilter)
+
+ Netlink Sockets
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+Introduction
+
+What can Linux do for me?
+
+ Sophisticated routing (not only destination based)
+
+ Control how the bandwidth is divided
+
+ Prevent DoS attacks (various kinds of flooding)
+
+ Advanced packet filtering (see my other talk)
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART I - Advanced Routing
+
+ Traditional IP routing
+
+ router is connected to more than one network segment
+
+ router knows which hosts are directly attached to these segments
+
+ router knows where to send packets if destination not link-local
+
+ router builds decision for each packet, based on its destination
+
+ Why is this insufficient?
+
+ Real-world network scenarios are getting more complex
+
+ People want to have different routing for different services
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART I - Advanced Routing
+
+Policy routing with iproute2
+
+ Multiple routing tables
+
+ Rules describing which routing table to use
+
+ Configurable using the command line tool 'ip' (from the iproute2 package)
+
+ Each rule consists of
+ priority (Determining order of rules)
+ match (Which packets match this rule)
+ packet source address
+ packet destination address
+ TOS value
+ incoming interface
+ fwmark (set by ipchains / iptables)
+ action (Which routing table)
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART I - Advanced Routing
+
+The 'ip' command
+ used for
+ interface configuration
+ neighbour/arp tables
+ policy routing
+ routing tables
+ tunnels
+ multicast routing
+ communication with kernel through netlink sockets
+
+ Important commands for policy routing
+ ip rule show Show all rules in policy database
+ ip rule add Add new rule to policy database
+ ip rule delete Delete rule from policy database
+
+Examples:
+%font "typewriter"
+%size 3
+> ip rule add from 1.2.3.4/16 to 5.6.7.8/24 dev eth0 table 10
+
+> ip rule show
+
+0: from all lookup local
+32765: from 1.2.3.4/16 to 5.6.7.8/24 iif eth0 lookup 10
+32766: from all lookup main
+32767: from all lookup 253
+%font "standard"
+%size 5
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART I - Advanced Routing
+
+The 'ip' command
+
+ Important commands for routing tables
+ ip route add Add routing table entry
+ ip route del Delete routing table entry
+ ip route list List routing table
+ ip route flush Flush routing cache
+
+ In reality far more sophisticated
+%font "typewriter"
+%size 2
+Usage: ip route { list | flush } SELECTOR
+ ip route get ADDRESS [ from ADDRESS iif STRING ]
+ [ oif STRING ] [ tos TOS ]
+ ip route { add | del | change | append | replace | monitor } ROUTE
+SELECTOR := [ root PREFIX ] [ match PREFIX ] [ exact PREFIX ]
+ [ table TABLE_ID ] [ proto RTPROTO ]
+ [ type TYPE ] [ scope SCOPE ]
+ROUTE := NODE_SPEC [ INFO_SPEC ]
+NODE_SPEC := [ TYPE ] PREFIX [ tos TOS ]
+ [ table TABLE_ID ] [ proto RTPROTO ]
+ [ scope SCOPE ] [ metric METRIC ]
+INFO_SPEC := NH OPTIONS FLAGS [ nexthop NH ]...
+NH := [ via ADDRESS ] [ dev STRING ] [ weight NUMBER ] NHFLAGS
+OPTIONS := FLAGS [ mtu NUMBER ] [ advmss NUMBER ]
+ [ rtt NUMBER ] [ rttvar NUMBER ]
+ [ window NUMBER] [ cwnd NUMBER ] [ ssthresh REALM ]
+ [ realms REALM ]
+TYPE := [ unicast | local | broadcast | multicast | throw |
+ unreachable | prohibit | blackhole | nat ]
+TABLE_ID := [ local | main | default | all | NUMBER ]
+SCOPE := [ host | link | global | NUMBER ]
+FLAGS := [ equalize ]
+NHFLAGS := [ onlink | pervasive ]
+RTPROTO := [ kernel | boot | static | NUMBER ]
+%font "standard"
+%size 5
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART II - Bandwidth Management
+
+ What do I need Bandwidth Management for?
+
+ Decide how, and among whom, the available bandwidth is divided
+
+ Limit available bandwidth for certain users / applications
+
+ Guarantee bandwidth for certain users / applications
+
+ Divide bandwidth more equally between users / applications
+
+ QoS, DiffServ, IntServ
+
+ Linux 2.2 / 2.4 provides elaborate framework
+
+ Called 'packet scheduling' or 'traffic control'
+ Another major achievement of Alexey Kuznetsov
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART II - packet filtering
+
+Basic iptables commands
+
+To build a complete iptables command, we must specify
+ which table to work with
+ which chain in this table to use
+ an operation (insert, add, delete, modify)
+ a match
+ a target
+
+The syntax is
+%font "typewriter"
+%size 3
+iptables -t table -Operation chain -j target match(es)
+%font "standard"
+%size 5
+
+Example:
+%font "typewriter"
+%size 3
+iptables -t filter -A INPUT -j ACCEPT -p tcp --dport smtp
+%font "standard"
+%size 5
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART II - packet filtering
+
+Targets
+
+ Builtin Targets to be used in filter table
+ ACCEPT accept the packet
+ DROP silently drop the packet
+ QUEUE enqueue packet to userspace
+ RETURN return to previous (calling) chain
+ foobar user defined chain
+
+Targets implemented as loadable modules
+ REJECT drop the packet but inform sender
+ MIRROR change source/destination IP and resend
+ LOG log via syslog
+ ULOG log via userspace
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART II - packet filtering
+
+Matches
+
+ Basic matches
+ -p protocol (tcp/udp/icmp/...)
+ -s source address (ip/mask)
+ -d destination address (ip/mask)
+ -i incoming interface
+ -o outgoing interface
+
+ Match extensions
+ --dport destination port
+ --sport source port
+ --mac-source source MAC address
+ --mark nfmark
+ --tos TOS field of IP header
+ --ttl TTL field of IP header
+ --limit rate limiting (n packets per timeframe)
+ --owner owner uid of the socket sending the packet
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART III - NAT
+
+Overview
+
+ Previous Linux Kernels only implemented one special case of NAT: Masquerading
+
+ Netfilter enables Linux to do any kind of NAT.
+
+ All matches from packet filtering are available for the nat tables, too
+
+ We divide NAT into 'source NAT' and 'destination NAT'
+
+ SNAT changes the packet's source while passing NF_IP_POST_ROUTING
+
+ DNAT changes the packet's destination while passing NF_IP_PRE_ROUTING
+
+ MASQUERADE is a special case of SNAT
+
+ REDIRECT is a special case of DNAT
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART III - NAT
+
+Source NAT
+
+ SNAT Example:
+%font "typewriter"
+%size 3
+
+iptables -t nat -A POSTROUTING -j SNAT --to-source 1.2.3.4 -s 10.0.0.0/8
+%font "standard"
+%size 4
+
+Masquerading does almost the same as SNAT, but if the outgoing interface's address changes (e.g. on a dialup link with a dynamic IP), the new address is used.
+
+ MASQUERADE Example:
+%font "typewriter"
+%size 3
+
+iptables -t nat -A POSTROUTING -j MASQUERADE -o ppp0
+%font "standard"
+%size 5
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART III - NAT
+
+Destination NAT
+
+ DNAT example:
+%font "typewriter"
+%size 3
+
+iptables -t nat -A PREROUTING -j DNAT --to-destination 1.2.3.4:8080 -p tcp --dport 80 -i eth1
+%font "standard"
+%size 4
+
+REDIRECT is a special case of DNAT, which alters the destination to the address of the incoming interface.
+
+ REDIRECT example:
+%font "typewriter"
+%size 3
+
+iptables -t nat -A PREROUTING -j REDIRECT --to-port 3128 -i eth1 -p tcp --dport 80
+%font "standard"
+%size 5
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+PART IV - Packet mangling
+
+ Change certain parts of a packet based on rules in IP tables
+
+ Again all the matches available, as described in packet filtering section.
+
+ Currently, the supported packet mangling targets are:
+ TOS manipulate the TOS bits
+ TTL set / increase / decrease TTL field
+ MARK change the nfmark field of the skb
+
+Simple example:
+%font "typewriter"
+%size 3
+
+iptables -t mangle -A PREROUTING -j MARK --set-mark 10 -p tcp --dport 80
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+Advanced Netfilter concepts
+
+ Connection tracking
+
+ Implemented separately from NAT
+
+ Enables stateful filtering
+
+ Implementation
+ hooks into NF_IP_PRE_ROUTING to track packets
+ hooks into NF_IP_POST_ROUTING and NF_IP_LOCAL_IN to drop information about connections which got filtered out
+ protocol modules (currently TCP/UDP/ICMP)
+ application helpers (currently FTP and IRC-DCC)
+
+ Conntrack divides packets in the following four categories
+ NEW - would establish new connection
+ ESTABLISHED - part of already established connection
+ RELATED - is related to established connection
+ INVALID - (multicast, errors...)
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+Advanced Netfilter concepts
+
+%size 4
+ Userspace logging
+ flexible replacement for old syslog-based logging
+ packets to userspace via multicast netlink sockets
+ easy-to-use library (libipulog)
+ plugin-extensible userspace logging daemon already available
+
+ Queuing
+ reliable asynchronous packet handling
+ packets to userspace via unicast netlink socket
+ easy-to-use library (libipq)
+ experimental queue multiplex daemon (ipqmpd)
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+Current Development and Future
+
+Netfilter (although it proved very stable) is still work in progress.
+
+Areas of current development
+ infrastructure for conntrack/nat helpers in userspace
+ full TCP sequence number tracking
+ multicast support for connection tracking
+ more flexible matches (MAXCONN, ...)
+ more conntrack and NAT modules (RPC, SNMP, SMB, ...)
+ better IPv6 support (conntrack, more matches / targets)
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Advanced Linux Networking
+Availability of slides / Links
+
+The slides and the accompanying paper for this presentation are available at
+ http://www.gnumonks.org
+
+The netfilter homepage is mirrored at:
+ http://netfilter.samba.org
+ http://netfilter.kernelnotes.org
+ http://netfilter.filewatcher.org
+
+More documents / netfilter extensions (ulogd, ipqmpd, ...)
+ http://www.gnumonks.org/projects