<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
 <TITLE>The netfilter framework in Linux 2.4</TITLE>


</HEAD>
<BODY>
<H1>The netfilter framework in Linux 2.4</H1>

<H2>Harald Welte <CODE>laforge@gnumonks.org</CODE></H2>$Date: 2004-10-10 15:04:54 +0200 (Sun, 10 Oct 2004) $
<P><HR>
<EM>This is the paper on which my talk about netfilter at Linux-Kongress 2000, CCC Congress 2000 (and probably some more occasions where I give this talk) is based. It describes the netfilter infrastructure, as well as the systems for packet filtering, NAT and packet mangling on top of it.</EM>
<HR>
<H2><A NAME="s1">1. PART I - Netfilter basics / concepts</A></H2>

<H2>1.1 What is netfilter?</H2>

<P>Netfilter is definitely more than any of the firewall subsystems in past Linux kernels. Netfilter provides an abstract, generalized framework, of which the packet filtering subsystem is just one particular incarnation. So don't expect a talk about "how to set up a firewall or a masquerading gateway in 2.4". This would only cover a part of netfilter.
<P>The netfilter framework consists of three parts:
<P>
<P>
<OL>
<LI>Each protocol defines a set of 'hooks' (IPv4 defines 5), which are well-defined points in a packet's traversal of that protocol stack. At each of these points, the protocol stack will call the netfilter framework with the packet and the hook number.
</LI>
<LI>Parts of the kernel can register to listen to the different hooks for each protocol. So when a packet is passed to the netfilter framework, it checks to see if anyone has registered for that protocol and hook; if so, they get a chance to examine (and possibly alter) the packet, discard it, allow it to pass or ask netfilter to queue the packet for userspace.
</LI>
<LI>Packets that have been queued are collected for sending to userspace; these packets are handled asynchronously. A userspace process can examine the packet, alter it, and reinject it at the same hook at which it left the kernel.</LI>
</OL>
<P>
<P>All the packet filtering / NAT / ... stuff is based on this framework. There is no more dirty packet altering code spread all over the network stack. 
<P>
<P>The netfilter framework is currently implemented for IPv4, IPv6 and DECnet.
<P>
<H2>1.2 Why did we need netfilter?</H2>

<P>This chapter could be called 'What is wrong with ipchains?', too. So why did we need this change? (I only give a few examples here)
<P>
<UL>
<LI>No infrastructure for passing packets to userspace, so all code which does some packet fiddling must be done as kernel code. Kernel programming is hard, must be done in C, and is dangerous. 
</LI>
<LI>Transparent proxying is extremely difficult.
We have to look up _every_ packet to see if there's a socket bound to that address. There is no clean interface: 34 #ifdefs in 11 different files of the network stack.
</LI>
<LI>Creating packet filter rules independent of the interface address is impossible.
We must know the local interface address to distinguish locally-generated or locally-terminated packets from through packets. The forward chain only has information on the outgoing interface, so we must try to figure out where the packet came from.
</LI>
<LI>Masquerading and packet filtering are implemented as one part.
This makes the firewalling code way too complex.
</LI>
<LI>The ipchains code is neither modular nor extensible (e.g. for MAC address filtering).</LI>
</UL>
<P>
<H2>1.3 The authors of netfilter</H2>

<P>The concept of the netfilter framework and most of its implementation were done by Rusty Russell. He is co-author of ipchains and is the current Linux kernel IP firewall maintainer. Rusty got paid for one year by Watchguard (a firewall company) to do nothing, so he had enough time to do it :)
<P>
<P>The official netfilter core team consists of Rusty Russell, Marc Boucher, James Morris and Harald Welte. Of course there are various other hackers who have contributed some stuff (for more information see 
<A HREF="http://netfilter.samba.org/scoreboard.html">http://netfilter.samba.org/scoreboard.html</A>).
<P>
<H2>1.4 Netfilter architecture in IPv4</H2>

<P>A Packet Traversing the Netfilter System:
<BLOCKQUOTE><CODE>
<PRE>

   --->[1]--->[ROUTE]--->[3]--->[4]--->
                 |            ^
                 |            |
                 |         [ROUTE]
                 v            |
                [2]          [5]
                 |            ^
                 |            |
                 v            |
</PRE>
</CODE></BLOCKQUOTE>
<P>
<P>
<P>Packets come in from the left. After verification of the IP checksum, the packets hit the NF_IP_PRE_ROUTING [1] hook. 
<P>Next they enter the routing code, which decides if the packets are local or have to be passed to another interface. 
<P>If the packets are considered to be local, they traverse the NF_IP_LOCAL_IN [2] hook and get passed to the process (if any) afterwards.
<P>If the packets are routed to another interface, they pass the NF_IP_FORWARD [3] hook.
<P>The packets then pass a final netfilter hook, NF_IP_POST_ROUTING [4], before they get transmitted on the target interface.
<P>The NF_IP_LOCAL_OUT [5] hook is called for locally generated packets. Here you can see that routing occurs after this hook is called: in fact, the routing code is called first (to figure out the source IP address and some IP options), and called again if the packet is altered. 
<P>Locally generated packets hit NF_IP_POST_ROUTING [4], too.
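<P>The three paths described above can be summed up in a few lines of shell. This is only an illustrative lookup table, not anything netfilter provides: the function name <CODE>hooks_for</CODE> and the path labels <CODE>local-in</CODE>, <CODE>forwarded</CODE> and <CODE>local-out</CODE> are made up for this sketch.
<PRE>
# Hypothetical helper: print the IPv4 hooks a packet passes, in order,
# for each of the three paths described in the text.
hooks_for() {
    case "$1" in
        local-in)  echo "NF_IP_PRE_ROUTING NF_IP_LOCAL_IN" ;;
        forwarded) echo "NF_IP_PRE_ROUTING NF_IP_FORWARD NF_IP_POST_ROUTING" ;;
        local-out) echo "NF_IP_LOCAL_OUT NF_IP_POST_ROUTING" ;;
        *)         echo "unknown packet path: $1"; return 1 ;;
    esac
}

hooks_for forwarded
</PRE>
<P>Note that no path passes more than one of the hooks [2], [3] and [5]; this is what later allows the filter table to see every packet exactly once.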
<P>
<H2>1.5 Netfilter base</H2>

<P>Kernel modules can register a callback function for each one of these hooks. This callback function is called for each packet traversing the hook. The module is free to alter the packet. It has to return one of these constants to netfilter:
<P>
<UL>
<LI><CODE>NF_ACCEPT</CODE>: continue traversal as normal</LI>
<LI><CODE>NF_DROP</CODE>: drop the packet; do not continue traversal</LI>
<LI><CODE>NF_STOLEN</CODE>: I've taken over the packet; do not continue traversal</LI>
<LI><CODE>NF_QUEUE</CODE>: queue the packet (usually for userspace handling)</LI>
<LI><CODE>NF_REPEAT</CODE>: call this hook again</LI>
</UL>
<P>
<P>
<H2>1.6 Packet selection: IP tables</H2>

<P>A packet selection system called IP tables has been built. It is a direct descendant of ipchains, with extensibility.
<P>Kernel modules can create a new table utilizing the IP tables core, and ask for a packet to traverse a given table. 
<P>IP tables are used for packet filtering (the 'filter' table), Network Address Translation (the 'nat' table) and general packet mangling (the 'mangle' table).
<P>The three big parts of Linux 2.4 packet handling are built using netfilter hooks and IP tables. They are separate modules and are independent of each other. They all plug nicely into the infrastructure provided by netfilter.
<P>
<OL>
<LI>Packet filtering 
<P>The 'filter' table should never alter packets, only filter them.
One of the advantages of iptables over ipchains is that it is small and fast, and it hooks into netfilter at the NF_IP_LOCAL_IN, NF_IP_FORWARD and NF_IP_LOCAL_OUT hooks. 
<P>Therefore, for each packet there is one, and only one, place to filter it. This is one big change compared to ipchains, where a forwarded packet used to traverse three chains.
<P>
</LI>
<LI> NAT
<P>The nat table listens at three netfilter hooks: NF_IP_PRE_ROUTING and NF_IP_POST_ROUTING to do destination and source NAT, respectively, for routed packets. For destination altering of locally generated packets, the NF_IP_LOCAL_OUT hook is used.
<P>This table is different from the 'filter' table, in that only the first packet of a new connection will traverse the table. The result of this traversal is then applied to all future packets of the same connection.
<P>The nat table is used for source NAT, destination NAT, masquerading (which is a special case of source NAT) and transparent proxying (which is a special case of destination NAT).
<P>
</LI>
<LI> Packet mangling
<P>The 'mangle' table registers at the NF_IP_PRE_ROUTING and NF_IP_LOCAL_OUT hooks.
<P>Using the mangle table you can modify the packet itself or some of the out-of-band data attached to the packet. Currently, altering the TOS bits and setting the nfmark field inside the skb are implemented on top of the mangle table.
</LI>
</OL>
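<P>To look at the three tables and their built-in chains, each table can be listed in turn. The following is a minimal sketch: the <CODE>IPT</CODE> variable and the <CODE>list_tables</CODE> function are helpers invented for this example. By default the commands are only printed; run with <CODE>IPT=iptables</CODE> as root to execute them.
<PRE>
# Dry run by default: IPT prints each command instead of executing it.
IPT="${IPT:-echo iptables}"

# List the rules of each table; -n skips DNS lookups for addresses.
list_tables() {
    for table in filter nat mangle; do
        $IPT -t "$table" -L -n
    done
}

list_tables
</PRE>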
<P>
<H2>1.7 Connection tracking</H2>

<P>Connection tracking is fundamental to NAT, but has been implemented as a separate module. This allows an extension to the packet filtering code to simply use connection tracking for "stateful firewalling" (the 'state' match).
<P>
<P>
<H2><A NAME="s2">2. PART II - packet filtering using iptables and netfilter</A></H2>

<H2>2.1 Overview</H2>

<P>I expect you are familiar with TCP/IP, routing, firewall concepts and packet filtering in general.
<P>As already explained in Part I, the filter table listens on three hooks, thus providing us three chains for packet filtering.
<P>All packets coming from the network and destined for the local box traverse the INPUT chain.
<P>All packets which are forwarded (routed) by us traverse the FORWARD chain (and only the FORWARD chain). Please note again that this differs from the previous Linux firewall implementations!
<P>Finally, the packets originating from the local box traverse the OUTPUT chain.
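<P>As a small illustration of the three chains, the sketch below sets a default policy for each of them and adds one forwarding rule. The interface names <CODE>eth0</CODE> and <CODE>eth1</CODE> are placeholders, and <CODE>IPT</CODE> and <CODE>chain_policies</CODE> are helpers made up for this example. The commands are only printed unless <CODE>IPT=iptables</CODE> is set. Be careful when applying a DROP policy on a remote machine.
<PRE>
# Dry run by default: IPT prints each command instead of executing it.
IPT="${IPT:-echo iptables}"

chain_policies() {
    # Packets addressed to this box: INPUT only.
    $IPT -t filter -P INPUT DROP
    # Packets routed through this box: FORWARD only, never INPUT or OUTPUT.
    $IPT -t filter -P FORWARD DROP
    # Packets generated on this box: OUTPUT only.
    $IPT -t filter -P OUTPUT ACCEPT
    # Allow traffic routed from eth0 to eth1 (placeholder interfaces).
    $IPT -t filter -A FORWARD -i eth0 -o eth1 -j ACCEPT
}

chain_policies
</PRE>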
<P>
<H2>2.2 Inserting rules into chains</H2>

<P>To insert/delete/modify any rules in Linux 2.4 IP tables we have a neat and powerful commandline tool, called 'iptables'. I don't want to get too deep into all its features and extensibility. Here are some of its major features:
<UL>
<LI>It handles all different kinds of IP tables: currently the filter, nat and mangle tables, but also all future table modules.
</LI>
<LI>It supports plugins for new matches and new targets. Thus, nobody ever needs to patch anything to provide a netfilter extension. You have a kernel module doing the real work and an iptables plugin (dynamic library) to add the necessary configuration options.
</LI>
<LI>It comes in two incarnations: iptables (IPv4) and ip6tables (IPv6). Both of them are based on the same library and mostly the same code.</LI>
</UL>
<P>
<H3>Basic iptables commands</H3>

<P>An iptables command usually consists of 5 parts:
<OL>
<LI>which table we want to work with</LI>
<LI>which chain in this table we want it to use</LI>
<LI>an operation (insert, add, delete, modify)</LI>
<LI>a target for this particular rule</LI>
<LI>a description of which packets we want to match this rule</LI>
</OL>
<P>The basic syntax is
<PRE>
iptables -t table -Operation chain -j target match(es)
</PRE>
<P>To add a rule allowing all traffic from anywhere to our local smtp port:
<PRE>
iptables -t filter -A INPUT -j ACCEPT -p tcp --dport smtp
</PRE>
<P>Of course there are various other commands like flush chain, set the default policy of a chain, add a user-defined chain, ...  
<P>Basic Operations:
<PRE>
-A      append rule
-I      insert rule
-D      delete rule
-R      replace rule
-L      list rules
</PRE>
<P>Basic Targets, common to all chains:
<PRE>
ACCEPT  accept the packet
DROP    drop the packet
QUEUE   queue packet to userspace
RETURN  return to the previous (calling) chain
foobar  user defined chain
</PRE>
<P>
<P>Basic matches, common to all chains:
<PRE>
-p      protocol (tcp/icmp/udp/...)
-s      source address (ip address/masklen)
-d      destination address (ip address/masklen)
-i      incoming interface 
-o      outgoing interface
</PRE>
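<P>The basic operations, targets and matches can be combined as in the following sketch. It is only an example under assumptions: the chain name <CODE>mail-in</CODE>, the network 192.168.1.0/24 and the interface <CODE>eth0</CODE> are invented, as are the <CODE>IPT</CODE> variable and the <CODE>basic_rules</CODE> function. The commands are printed rather than executed unless <CODE>IPT=iptables</CODE> is set.
<PRE>
# Dry run by default: IPT prints each command instead of executing it.
IPT="${IPT:-echo iptables}"

basic_rules() {
    # -N creates a user-defined chain.
    $IPT -t filter -N mail-in
    # Trusted network: RETURN hands the packet back to the calling chain.
    $IPT -t filter -A mail-in -s 192.168.1.0/24 -j RETURN
    # Everybody else is dropped inside the user-defined chain.
    $IPT -t filter -A mail-in -j DROP
    # Generic rule: accept SMTP that was not dropped earlier.
    $IPT -t filter -A INPUT -p tcp --dport smtp -j ACCEPT
    # -I inserts at the head of INPUT, so this rule is checked first:
    # SMTP arriving on eth0 jumps to mail-in.
    $IPT -t filter -I INPUT -i eth0 -p tcp --dport smtp -j mail-in
    # Show the resulting INPUT chain.
    $IPT -t filter -L INPUT
}

basic_rules
</PRE>
<P>A packet from the trusted network returns from <CODE>mail-in</CODE> to INPUT and continues with the next rule there, which accepts it.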
<P>Apart from these basic operations, matches and targets there are various extensions, which I'll describe in the appropriate chapters.
<P>
<H2>2.3 iptables match extensions for filtering</H2>

<P>There are various extensions which are useful for packet filtering. Describing them all in detail would take way too much time, so here is just an impression of their power :)
<P>At first there are some match extensions, which give us more power to describe which packets to match:
<UL>
<LI>TCP match extensions to match source port, destination port, arbitrary combinations of TCP flags, tcp options.</LI>
<LI>UDP match extensions to match source port and destination port</LI>
<LI>ICMP match extension to match ICMP type</LI>
<LI>MAC match extension to match incoming MAC (ethernet) address</LI>
<LI>MARK match extension to match the nfmark </LI>
<LI>OWNER match extension (for locally generated packets only) to match user id, group id, process id, session id</LI>
<LI>LIMIT match extension to match only a certain limit of packets per time frame. Very useful to prevent forwarding of any kind of flooding.</LI>
<LI>STATE match extension to match packets of a certain state (decided by the connection tracking subsystem). Possible states are 
<UL>
<LI>INVALID (not matched against a connection), </LI>
<LI>ESTABLISHED (packet belongs to an already established connection), </LI>
<LI>NEW (packet would establish a new connection) and </LI>
<LI>RELATED (packet is in some way related to an already established connection, for example an ICMP error message or an FTP data connection)</LI>
</UL>
</LI>
<LI>TOS match extension to match the value of the TOS IP header field</LI>
<LI>TTL match extension to match the value of the TTL IP header field</LI>
</UL>
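<P>A typical use of the state and limit matches might look like the sketch below. The choice of ports and the rate of one packet per second are arbitrary, and <CODE>IPT</CODE> and <CODE>stateful_rules</CODE> are helpers invented for this example. The commands are only printed unless <CODE>IPT=iptables</CODE> is set.
<PRE>
# Dry run by default: IPT prints each command instead of executing it.
IPT="${IPT:-echo iptables}"

stateful_rules() {
    # Let through everything that belongs to, or is related to,
    # a connection the connection tracking already knows about.
    $IPT -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # Drop packets that could not be matched against any connection.
    $IPT -A INPUT -m state --state INVALID -j DROP
    # Allow new connections to the local ssh port only.
    $IPT -A INPUT -p tcp --dport ssh -m state --state NEW -j ACCEPT
    # Answer pings, but match at most one request per second.
    $IPT -A INPUT -p icmp --icmp-type echo-request -m limit --limit 1/second -j ACCEPT
}

stateful_rules
</PRE>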
<P>
<P>
<H2>2.4 iptables target extensions for filtering</H2>

<P>
<UL>
<LI><CODE>LOG</CODE>: log matched packets via syslog()</LI>
<LI><CODE>ULOG</CODE>: log matched packets via a userspace logging daemon
(supports interpreter and output plugins for flexible logging)</LI>
<LI><CODE>REJECT</CODE>: not only drop the packet, but also send some kind of error
message to the sender (which message is configurable)</LI>
<LI><CODE>MIRROR</CODE>: retransmit the packet after exchanging source and destination
IP address</LI>
</UL>
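<P>LOG and REJECT are often used together, as in this sketch. The telnet port and the log prefix are arbitrary choices, and <CODE>IPT</CODE> and <CODE>log_and_reject</CODE> are helpers invented for this example. The commands are only printed unless <CODE>IPT=iptables</CODE> is set.
<PRE>
# Dry run by default: IPT prints each command instead of executing it.
IPT="${IPT:-echo iptables}"

log_and_reject() {
    # LOG does not end the traversal; the packet goes on to the next rule.
    $IPT -A INPUT -p tcp --dport telnet -j LOG --log-prefix "telnet attempt: "
    # Refuse the connection with a TCP reset instead of silently dropping it.
    $IPT -A INPUT -p tcp --dport telnet -j REJECT --reject-with tcp-reset
    # Refuse UDP with an ICMP port unreachable message.
    $IPT -A INPUT -p udp -j REJECT --reject-with icmp-port-unreachable
}

log_and_reject
</PRE>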
<P>
<H2><A NAME="s3">3. PART III - NAT using iptables and netfilter</A></H2>

<P>Regarding NAT (Network Address Translation), previous Linux kernels only supported one special case called "Masquerading".
<P>Netfilter now enables Linux to do any kind of NAT. 
<P>NAT is divided into `source NAT' and `destination NAT'. 
<P>Source NAT alters the source address of a packet while passing the NF_IP_POST_ROUTING hook. Masquerading is a special application of SNAT.
<P>Destination NAT alters the destination address of a packet while passing the NF_IP_LOCAL_OUT hook (for locally generated packets) or the NF_IP_PRE_ROUTING hook (for incoming packets). Port forwarding and transparent proxying are forms of DNAT.
<P>
<H2>3.1 iptables target extensions for NAT</H2>

<P>
<P>
<DL>
<P>
<DT><B>SNAT</B><DD><P>Change the source address to something different
<P>Example:     
<PRE>
iptables -t nat -A POSTROUTING -j SNAT --to-source 1.2.3.4
</PRE>
<P>
<DT><B>MASQUERADE</B><DD><P>SNAT for dialup connections with dynamic IP address
<P>Does almost the same as SNAT, but if the link goes down, all connection tracking information is dropped. The connections are lost anyway, because we get a different IP address at reconnect.
<P>Example:     
<PRE>
iptables -t nat -A POSTROUTING -j MASQUERADE -o ppp0
</PRE>
<P>
<DT><B>DNAT</B><DD><P>Change the destination address to something different
<P>This is done at the PREROUTING chain, just as the packet comes in. Therefore, anything else on the Linux box itself (routing, packet filtering) will see the packet going to its real (new) destination.
<P>Example:     
<PRE>
iptables -t nat -A PREROUTING -j DNAT --to-destination 1.2.3.4:8080 -p tcp --dport 80 -i eth1
</PRE>
<P>
<DT><B>REDIRECT</B><DD><P>Redirect packets to local destination
<P>Exactly the same as doing DNAT to the address of the incoming interface
<P>Example:
<PRE>
iptables -t nat -A PREROUTING -j REDIRECT --to-port 3128 -i eth1 -p tcp --dport 80
</PRE>
<P>
</DL>
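<P>The targets above are usually combined with rules in the filter table. The following sketch of a small gateway assumes <CODE>eth0</CODE> is the outside interface and 192.168.1.10 an internal web server; both are placeholders, and <CODE>IPT</CODE> and <CODE>nat_gateway</CODE> are helpers invented for this example. The commands are only printed unless <CODE>IPT=iptables</CODE> is set.
<PRE>
# Dry run by default: IPT prints each command instead of executing it.
IPT="${IPT:-echo iptables}"

nat_gateway() {
    # Port forwarding: redirect incoming web traffic to the internal server.
    $IPT -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination 192.168.1.10:8080
    # DNAT happened before routing and filtering, so the filter rule
    # has to match the new destination address and port.
    $IPT -t filter -A FORWARD -i eth0 -d 192.168.1.10 -p tcp --dport 8080 -j ACCEPT
    # Hide the internal network behind the address of eth0.
    $IPT -t nat -A POSTROUTING -o eth0 -j MASQUERADE
}

nat_gateway
</PRE>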
<P>
<H2><A NAME="s4">4. PART IV - Packet mangling using iptables and netfilter</A></H2>

<P>The `mangle' table enables us to alter the packet itself or some data accompanying the packet. 
<P>
<H2>4.1 iptables target extensions for packet mangling</H2>

<P>
<DL>
<P>
<DT><B>MARK</B><DD><P>set the value of the nfmark field
<P>We can change the value of the nfmark field. The nfmark is just a user defined mark (anything within the range of an unsigned long) of the packet. The mark value is used to do policy routing, tell ipqmpd (the userspace queue multiplex daemon) which process to queue the packet to, etc. 
<P>Example:
<BLOCKQUOTE><CODE>
<PRE>
iptables -t mangle -A PREROUTING -j MARK --set-mark 0x0a -p tcp
</PRE>
</CODE></BLOCKQUOTE>
<P>
<DT><B>TOS</B><DD><P>set the value of the TOS bits inside the IP header
<P>We can change the value of the type of service bits inside the IP header. This is useful if you are using TOS based packet scheduling / routing.
<P>Example: 
<BLOCKQUOTE><CODE>
<PRE>
iptables -t mangle -A PREROUTING -j TOS --set-tos 0x10 -p tcp --dport ssh
</PRE>
</CODE></BLOCKQUOTE>
<P>
<DT><B>TTL</B><DD><P>alter the value of the TTL field inside the IP header
<P>Enables the user to set, increase or decrease the TTL field.
<P>Example:
<BLOCKQUOTE><CODE>
<PRE>
iptables -t mangle -A PREROUTING -j TTL --ttl-dec 2 -i eth0
</PRE>
</CODE></BLOCKQUOTE>
</DL>
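<P>A mark set in the mangle table can later be tested with the MARK match extension from Part II. The sketch below marks ssh traffic and accepts marked packets in the FORWARD chain. The mark value 0x0a and the choice of port are arbitrary, and <CODE>IPT</CODE> and <CODE>mark_and_match</CODE> are helpers invented for this example. The commands are only printed unless <CODE>IPT=iptables</CODE> is set.
<PRE>
# Dry run by default: IPT prints each command instead of executing it.
IPT="${IPT:-echo iptables}"

mark_and_match() {
    # Set nfmark 0x0a on ssh packets as they come in (NF_IP_PRE_ROUTING).
    $IPT -t mangle -A PREROUTING -p tcp --dport ssh -j MARK --set-mark 0x0a
    # Later, in the filter table, match on the mark instead of the port.
    $IPT -t filter -A FORWARD -m mark --mark 0x0a -j ACCEPT
}

mark_and_match
</PRE>
<P>The mark lives in the skb only; it is never part of the packet on the wire.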
<P>
<H2><A NAME="s5">5. Queueing packets to userspace</A></H2>

<P>As I already mentioned, at any time in any netfilter chain, the packet can be queued to userspace. The actual queuing is done by a kernel module (ip_queue.o).
<P>The packets (including metadata like nfmark and MAC address) are sent to a userspace process using netlink sockets. This process can do whatever it wants to do with the packet. 
<P>After the userspace process is done with its work on the packet, it can either reinject the packet into the kernel, or set a verdict (DROP, ...) telling the kernel what to do with the packet.
<P>This is one key technology of netfilter: it enables complicated packet handling by userspace processes, and thus keeps additional complexity out of kernel space.
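<P>Queueing is requested with the QUEUE target, like any other target. A minimal sketch, assuming the queueing code has been built as the module <CODE>ip_queue</CODE>: the <CODE>IPT</CODE> and <CODE>MODPROBE</CODE> variables and the <CODE>queue_icmp</CODE> function are helpers invented for this example, and the commands are only printed unless <CODE>IPT=iptables</CODE> and <CODE>MODPROBE=modprobe</CODE> are set.
<PRE>
# Dry run by default: both helpers print the command instead of executing it.
IPT="${IPT:-echo iptables}"
MODPROBE="${MODPROBE:-echo modprobe}"

queue_icmp() {
    # Load the kernel module that does the actual queueing.
    $MODPROBE ip_queue
    # Hand all locally generated ICMP packets to userspace.
    $IPT -t filter -A OUTPUT -p icmp -j QUEUE
}

queue_icmp
</PRE>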
<P>
<P>Userspace packet handling processes can be easily developed using a netfilter-provided library called 'libipq'. 
<P>
<P>Currently only one userspace process is supported, but the first beta release of a userspace IP queueing multiplex daemon (ipqmpd) is available. ipqmpd provides a compatibility library (libipqmpd) which makes upgrading from the raw ipqueue interface to the new ipqmpd as easy as relinking against another library.
<P>
<H2><A NAME="s6">6. PART V - Credits</A></H2>

<P>Credits to all the netfilter hackers, especially the core team. 
<P>Namely: <B>Paul 'Rusty' Russell</B>, <B>Marc Boucher</B> and <B>James Morris</B>. 
<P>Additional special thanks to Rusty for his `netfilter-hacking-HOWTO', `packet-filtering-HOWTO' and `NAT-HOWTO' which I heavily used as a basis for this presentation.
<P>
</BODY>
</HTML>