%include "default.mgp"
%default 1 bgrad
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%deffont "typewriter" tfont "MONOTYPE.TTF"
%page
%nodefault
%back "blue"

%center
%size 7
Programming netfilter/iptables extensions

%center
%size 4
by Harald Welte
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Contents

	Introduction
	The netfilter/iptables architecture
	Netfilter hooks in protocol stacks
	Packet selection based on IP Tables
	The Connection Tracking Subsystem
	The NAT Subsystem based on netfilter + iptables
	Packet filtering using the 'filter' table
	Packet mangling using the 'mangle' table
	Advanced netfilter concepts
	Current development and Future
	Developing a netfilter module
	Developing a new iptables match
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Introduction

	Why did we need netfilter/iptables?
	Because ipchains...
		has no infrastructure for passing packets to userspace
		makes transparent proxying extremely difficult
		has packet filter rules that depend on interface addresses
		has masquerading implemented as part of the packet filtering code
		is too complex and intermixed with the core IPv4 stack
		is neither modular nor extensible
		only barely supports one special case of NAT (masquerading)
		has only stateless packet filtering
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Introduction

	Who's behind netfilter/iptables
		Paul 'Rusty' Russell
			co-author of ipchains in Linux 2.2
			was paid by Watchguard for about one year of development
		James Morris
			userspace queuing (kernel, library and tools)
			REJECT target
		Marc Boucher
			NAT and packet filtering controlled by one command
			Mangle table
		Harald Welte
			Conntrack+NAT helper infrastructure (newnat)
			Userspace packet logging (ULOG)
			PPTP and IRC conntrack/NAT helpers
		Jozsef Kadlecsik
			TCP window tracking
			H.323 conntrack + NAT helper
			Continued newnat development
		Non-core team contributors
			http://www.netfilter.org/scoreboard/
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Netfilter Hooks

	What is netfilter?
		System of callback functions within the network stack
		Callback function is called for every packet traversing a certain point (hook) within the network stack
		Protocol independent framework
		Hooks in layer 3 stacks (IPv4, IPv6, DECnet, ARP)
		Multiple kernel modules can register with each of the hooks
		Asynchronous packet handling in userspace (ip_queue)
	Traditional packet filtering, NAT, ... is implemented on top of this framework
	Can be used for other code interfacing with the core network stack, such as the DECnet routing daemon.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Netfilter Hooks

	Netfilter architecture in IPv4
%font "typewriter"
--->[1]--->[ROUTE]--->[3]--->[4]--->
              |            ^
              |            |
              |         [ROUTE]
              v            |
             [2]          [5]
              |            ^
              |            |
              v            |
%font "standard"
	1=NF_IP_PRE_ROUTING
	2=NF_IP_LOCAL_IN
	3=NF_IP_FORWARD
	4=NF_IP_POST_ROUTING
	5=NF_IP_LOCAL_OUT
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Netfilter Hooks

	Netfilter Hooks
		Any kernel module may register a callback function at any of the hooks
		The callback has to return one of the following constants
			NF_ACCEPT	continue traversal as normal
			NF_DROP	drop the packet, do not continue
			NF_STOLEN	I've taken over the packet, do not continue
			NF_QUEUE	enqueue packet to userspace
			NF_REPEAT	call this hook again
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
IP tables

	Packet selection using IP tables
		The kernel provides generic IP tables support
		Each kernel module may create its own IP table
		The three major parts of the 2.4 firewalling subsystem are implemented using IP tables
			Packet filtering table 'filter'
			NAT table 'nat'
			Packet mangling table 'mangle'
		Can potentially be used for other purposes, e.g. IPsec SPDB
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
IP Tables

	Managing chains and tables
		An IP table consists of multiple chains
		A chain consists of a list of rules
		Every single rule in a chain consists of
			match[es] (rule executed if all matches true)
			target (what to do if the rule is matched)
%size 4
			matches and targets can either be builtin or implemented as kernel modules
%size 6
		The userspace tool iptables is used to control IP tables
			handles all different kinds of IP tables
			supports a plugin/shlib interface for target/match specific options
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
IP Tables

	Basic iptables commands
		To build a complete iptables command, we must specify
			which table to work with
			which chain in this table to use
			an operation (insert, append, delete, replace)
			one or more matches (optional)
			a target
	The syntax is
%font "typewriter"
%size 3
iptables -t table -Operation chain -j target match(es)
%font "standard"
%size 5
	Example:
%font "typewriter"
%size 3
iptables -t filter -A INPUT -j ACCEPT -p tcp --dport smtp
%font "standard"
%size 5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
IP Tables

	Matches
		Basic matches
			-p	protocol (tcp/udp/icmp/...)
			-s	source address (ip/mask)
			-d	destination address (ip/mask)
			-i	incoming interface
			-o	outgoing interface
		Match extensions (examples)
			tcp/udp	TCP/UDP source/destination port
			icmp	ICMP code/type
			ah/esp	AH/ESP SPI match
			mac	source MAC address
			mark	nfmark
			length	match on length of packet
			limit	rate limiting (n packets per timeframe)
			owner	owner uid of the socket sending the packet
			tos	TOS field of IP header
			ttl	TTL field of IP header
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
IP Tables

	Targets
		Very dependent on the particular table.
		Table specific targets will be discussed later
		Generic targets, always available
			ACCEPT	accept packet within chain
			DROP	silently drop packet
			QUEUE	enqueue packet to userspace
			LOG	log packet via syslog
			ULOG	log packet via ulogd
			RETURN	return to previous (calling) chain
			foobar	jump to user defined chain
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Packet Filtering

	Overview
		Implemented as 'filter' table
		Registers with three netfilter hooks
			NF_IP_LOCAL_IN (packets destined for the local host)
			NF_IP_FORWARD (packets forwarded by local host)
			NF_IP_LOCAL_OUT (packets from the local host)
		Each of the three hooks has one chain attached (INPUT, FORWARD, OUTPUT)
		Every packet passes exactly one of the three chains. Note that this is very different from the old 2.2.x ipchains behaviour.
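
The chain selection described above can be sketched as a small standalone C model. This is only an illustration, not kernel code: the type and function names (pkt_path, filter_chain_for, filter_chain_name) are hypothetical, and the model ignores loopback traffic, where a locally generated packet that is also delivered locally is seen by both OUTPUT and INPUT.

```c
#include <assert.h>

/* Hypothetical model of the 'filter' table; none of these names exist in the kernel. */
enum filter_chain { CHAIN_INPUT, CHAIN_FORWARD, CHAIN_OUTPUT };

struct pkt_path {
	int locally_generated;	/* non-zero: packet was created by a local process */
	int destined_for_local;	/* non-zero: routing decided the packet is for this host */
};

/* Return the single 'filter' chain a (non-loopback) packet traverses. */
static enum filter_chain filter_chain_for(struct pkt_path p)
{
	if (p.locally_generated)
		return CHAIN_OUTPUT;	/* attached to NF_IP_LOCAL_OUT */
	if (p.destined_for_local)
		return CHAIN_INPUT;	/* attached to NF_IP_LOCAL_IN */
	return CHAIN_FORWARD;		/* attached to NF_IP_FORWARD */
}

/* Map the enum to the chain name used on the iptables command line. */
static const char *filter_chain_name(enum filter_chain c)
{
	switch (c) {
	case CHAIN_INPUT:	return "INPUT";
	case CHAIN_FORWARD:	return "FORWARD";
	case CHAIN_OUTPUT:	return "OUTPUT";
	}
	assert(0 && "unknown chain");
	return "";
}
```

A forwarded packet is seen only by FORWARD in this model, so a rule appended to INPUT never applies to it. This is the difference from the 2.2.x ipchains behaviour that the slide points out.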
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Packet Filtering

	Targets available within 'filter' table
		Builtin targets to be used in filter table
			ACCEPT	accept the packet
			DROP	silently drop the packet
			QUEUE	enqueue packet to userspace
			RETURN	return to previous (calling) chain
			foobar	user defined chain
		Targets implemented as loadable modules
			REJECT	drop the packet but inform sender
			MIRROR	swap source/destination IP and resend
			LOG	log via syslog
			ULOG	log via userspace
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Connection Tracking Subsystem

	Connection tracking...
		implemented separately from NAT
		enables stateful filtering
		implementation
			hooks into NF_IP_PRE_ROUTING to track packets
			hooks into NF_IP_POST_ROUTING and NF_IP_LOCAL_IN to see if packet passed filtering rules
			protocol modules (currently TCP/UDP/ICMP)
			application helpers (currently FTP, IRC, H.323, talk, SNMP)
		divides packets into the following four categories
			NEW - would establish new connection
			ESTABLISHED - part of already established connection
			RELATED - is related to established connection
			INVALID - (multicast, errors...)
		does _NOT_ filter packets itself
		can be utilized by iptables using the 'state' match
		is used by NAT Subsystem
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

	Common structures
		struct ip_conntrack_tuple, representing unidirectional flow
			layer 3 src + dst
			layer 4 protocol
			layer 4 src + dst
		connections represented as struct ip_conntrack
			original tuple
			reply tuple
			timeout
			l4 state private data
			app helper
			app helper private data
			expected connections
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

	Flow of events for new packet
		packet enters NF_IP_PRE_ROUTING
			tuple is derived from packet
			lookup conntrack hash table with hash(tuple) -> fails
			new ip_conntrack is allocated
			fill in original and reply == inverted(original) tuple
			initialize timer
			assign app helper if applicable
			see if we've been expected -> fails
			call layer 4 helper 'new' function
		...
		packet enters NF_IP_POST_ROUTING
			do hashtable lookup for packet -> fails
			place struct ip_conntrack in hashtable
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

	Flow of events for packet part of existing connection
		packet enters NF_IP_PRE_ROUTING
			tuple is derived from packet
			lookup conntrack hash table with hash(tuple)
			associate conntrack entry with skb->nfct
			call l4 protocol helper 'packet' function
				do l4 state tracking
				update timeouts as needed [e.g. TCP TIME_WAIT, ...]
		...
		packet enters NF_IP_POST_ROUTING
			do hashtable lookup for packet -> succeeds
			do nothing else
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Network Address Translation

	Overview
		Previous Linux kernels only implemented one special case of NAT: masquerading
		Linux 2.4.x can do any kind of NAT.
		NAT subsystem implemented on top of netfilter, iptables and conntrack
		NAT subsystem registers with all five netfilter hooks
		'nat' table registers chains PREROUTING, POSTROUTING and OUTPUT
		Following targets available within 'nat' table
			SNAT changes the packet's source while passing NF_IP_POST_ROUTING
			DNAT changes the packet's destination while passing NF_IP_PRE_ROUTING
			MASQUERADE is a special case of SNAT
			REDIRECT is a special case of DNAT
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Network Address Translation

	flow of events for NEW packet:
		packet enters NF_IP_PRE_ROUTING after conntrack
			resolve conntrack entry for packet
			if (expectfn of helper)
				call it
			else
				iterate over rules in PREROUTING chain of nat table
			save respective NAT mappings in conntrack
			apply the NAT mappings to the packet
			call NAT helper function, if there is one for this proto
		...
		packet enters NF_IP_POST_ROUTING
			resolve conntrack entry for packet
			iterate over rules in POSTROUTING chain of nat table
			save respective NAT mappings in conntrack
			apply the NAT mappings to the packet
			call NAT helper function, if there is one for this proto
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Network Address Translation

	flow of events for ESTABLISHED packets:
		packet enters NF_IP_PRE_ROUTING after conntrack
			resolve conntrack entry for packet
			apply the NAT mappings (read from conntrack entry) to the packet
			call NAT helper function, if there is one for this proto
		...
		packet enters NF_IP_POST_ROUTING
			resolve conntrack entry for packet
			apply the NAT mappings (read from conntrack entry) to the packet
			call NAT helper function, if there is one for this proto
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Network Address Translation

	Source NAT
		SNAT example:
%font "typewriter"
%size 3
iptables -t nat -A POSTROUTING -j SNAT --to-source 1.2.3.4 -s 10.0.0.0/8
%font "standard"
%size 4
		MASQUERADE example:
%font "typewriter"
%size 3
iptables -t nat -A POSTROUTING -j MASQUERADE -o ppp0
%font "standard"
%size 5
	Destination NAT
		DNAT example:
%font "typewriter"
%size 3
iptables -t nat -A PREROUTING -j DNAT --to-destination 1.2.3.4:8080 -p tcp --dport 80 -i eth1
%font "standard"
%size 4
		REDIRECT example:
%font "typewriter"
%size 3
iptables -t nat -A PREROUTING -j REDIRECT --to-ports 3128 -i eth1 -p tcp --dport 80
%font "standard"
%size 5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Packet Mangling

	Purpose of mangle table
		packet manipulation except address manipulation
	Integration with netfilter
		'mangle' table hooks into all five netfilter hooks
		priority: after conntrack
	Targets specific to the 'mangle' table:
		DSCP - manipulate DSCP field
		IPV4OPTSSTRIP - strip IPv4 options
		MARK - change the nfmark field of the skb
		TCPMSS - set TCP MSS option
		TOS - manipulate the TOS bits
		TTL - set / increase / decrease TTL field
	Simple example:
%font "typewriter"
%size 3
iptables -t mangle -A PREROUTING -j MARK --set-mark 10 -p tcp --dport 80
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Advanced Netfilter concepts

%size 4
	Userspace logging
		flexible replacement for old syslog-based logging
		packets to userspace via multicast netlink sockets
		easy-to-use library (libipulog)
		plugin-extensible userspace logging daemon (ulogd)
		Can even be used to directly log into MySQL
	Queuing
		reliable asynchronous packet handling
		packets to userspace via unicast netlink socket
		easy-to-use library (libipq)
		provides Perl bindings
		experimental queue multiplex daemon (ipqmpd)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The netfilter/iptables architecture
Current Development and Future

	Netfilter (although it has proved very stable) is still a work in progress.
	Areas of current development
		infrastructure for conntrack manipulation from userspace
		failover of stateful firewalls
		making iptables layer3 independent (pkttables)
		new userspace library (libiptables) to hide plugins from apps
		more matches and targets for advanced functions (pool, hashslot)
		more conntrack and NAT modules (RPC, SNMP, SMB, ...)
		better IPv6 support (conntrack, more matches / targets)
		conntrack hash optimizations
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Developing netfilter/iptables extensions
Developing a netfilter module

	Netfilter modules are very low-layer
	Get called for every packet passing the hook in this l3prot
	Examples of netfilter modules are: ip_tables, ip_conntrack, iptable_nat
	API for netfilter:
%font "typewriter"
nf_register_hook(struct nf_hook_ops *reg)
nf_unregister_hook(struct nf_hook_ops *reg)
struct nf_hook_ops:
	struct list_head list;	/* list header {NULL,NULL} */
	nf_hookfn *hook;	/* the callback function */
	int pf;	/* protocol family */
	int hooknum;	/* hook to register with */
	int priority;	/* priority, determines order */
%font "standard"
	Example code: see "nf_workshop.c"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Developing netfilter/iptables extensions
Developing an ip_tables match module

	ip_tables modules are at a high layer
	Get called for every packet traversing a rule with this match
	Examples of iptables modules are: ipt_ttl, ipt_tos, ipt_tcpmss
	API for iptables matches:
%font "typewriter"
ipt_register_match(struct ipt_match *match)
ipt_unregister_match(struct ipt_match *match)
struct ipt_match:
	struct list_head list;	/* list header {NULL,NULL} */
	const char name[];	/* name of the match */
	int (*match);	/* called to check if pkt matches */
	int (*checkentry);	/* called when entry inserted */
	void (*destroy);	/* called when entry deleted */
	struct module *me;	/* set to THIS_MODULE */
%font "standard"
	Example code: see "ipt_workshop.c"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Developing netfilter/iptables extensions
Developing an iptables match module

	Something has to parse the command-line options for ipt_workshop.c
	Solution: libipt_workshop.c as iptables plugin
	API for iptables-command plugins:
%font "typewriter"
register_match(struct iptables_match)
struct iptables_match:
	struct iptables_match *next;	/* next one */
	ipt_chainlabel name;	/* name */
	const char *version;	/* version */
	size_t size;	/* size of match data */
	size_t userspacesize;	/* size for userspace */
	void (*help);	/* print help message */
	void (*init);	/* init the matchinfo */
	int (*parse);	/* parse getopt chars */
	void (*final_check);	/* consistency check */
	void (*print);	/* print (iptables -L) */
	void (*save);	/* iptables-save */
	struct option *extra_opts;	/* getopt-style opts */
%font "standard"
	Example code: see "libipt_workshop.c"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Future of Linux packet filtering
Thanks

	The slides and an accompanying paper for this presentation are available at http://www.gnumonks.org/
	The netfilter homepage: http://www.netfilter.org/
	Thanks to
		the BBS people, Z-Netz, FIDO, ...
			for heavily increasing my computer usage in 1992
		KNF
			for bringing me in touch with the internet as early as 1994
			for providing a playground for technical people
			for telling me about the existence of Linux!
		Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
			for implementing (one of?) the world's best TCP/IP stacks
		Paul 'Rusty' Russell
			for starting the netfilter/iptables project
			for trusting me to maintain it today
		Astaro AG
			for sponsoring parts of my netfilter work
			for sponsoring my travel costs to 5CLT