%include "default.mgp"
%default 1 bgrad
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%back "blue"

%center
%size 7
Netfilter BOF

%center
%size 4
by Harald Welte

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Contents

	Problems with current 2.4/2.6 netfilter/iptables
	Solution to code replication
	Solution for dynamic rulesets
	Solution for API to GUIs and other management programs
	Other current work
		nf_conntrack - l3 independent connection tracking
		ulogd2 - conntrack based flow accounting (ipfix)
		qsearch - efficient in-kernel pattern matching
		ctstat - runtime conntrack statistics
		ipset - replacement for ippool
		benchmarking at gigabit wirespeed

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Problem with 2.4/2.6 netfilter/iptables

	code replication between iptables/ip6tables/arptables/ebtables
		iptables was never meant for other protocols, but people did copy+paste 'ports'
		replication of
			core kernel code
			layer 3 independent matches (mac, interface, ...)
			userspace library (libiptc)
			userspace tool (iptables)
			userspace plugins (libipt_xxx.so)
	doesn't suit the needs of dynamically changing rulesets
		dynamic rulesets are becoming more common (service selection, IDS)
		a whole table is created in userspace and sent as a blob to the kernel
		for every ruleset change the table needs to be copied to userspace and back
		inside the kernel: consistency checks on the whole table, loop detection

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Problem with 2.4/2.6 netfilter/iptables

	too extensible for writing any forward-compatible GUI
		new extensions showing up all the time
		a frontend would need to know about the options and use of a new extension
		thus frontends are always incomplete and out-of-date
		no high-level API other than piping to iptables-restore

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Reducing code replication

	code replication is a real problem: unclean, bugfixes missed
	we need a layer 3 independent layer for
		submitting rules to the kernel
		traversing packet-rulesets supporting match/target modules
		registering matches/targets
			layer 3 specific (like matching ipv4 address)
			layer 3 independent (like matching MAC address)
	solution
		pkt_tables inside kernel
		pkt_tables_ipv4 registers layer 3 handler with pkt_tables
		pkt_tables_ipv6 registers layer 3 handler with pkt_tables
		everybody registering a pkt_table (like iptable_filter) needs to specify the l3 protocol
		libraries in userspace (see later)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Supporting dynamic rulesets

	atomic table-replacement turned out to be a bad idea
	need new interface for sending individual rules to kernel
	policy routing has the same problem and a good solution: rtnetlink
	solution: nfnetlink
		multicast-netlink based packet-oriented socket between kernel and userspace
		has extra benefit that other userspace processes get notified of rule changes [just like routing daemons]
		nfnetlink will be the low layer below all kernel/userspace communication
			pkttnetlink [aka iptnetlink]
			ctnetlink
			ulog
			ip_queue

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Communication with other programs

	whole set of libraries
		libnfnetlink for low-layer communication
		libpkttnetlink for rule modifications
			will handle all plugins [which are currently part of iptables]
			query functions about available matches/targets
			query functions about parameters
			query functions for help messages about specific match/parameter of a match
			generic structure from which rules can be built
			conversion functions to parse generic structure into in-kernel structure
			conversion functions to parse kernel structure into generic structure
			functions to convert generic structure into plain text
		libipq will stay API-compatible to current version
		libipulog will stay API-compatible to current version
		libiptc will go away [compatibility layer extremely difficult]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Optimizing rule load time

	Current situation
		loading 10,000 rules in 1,000 chains takes about 4 minutes on a PIII 733 MHz
		this is caused by two bottlenecks
			inefficient loop detection algorithm on kernel side
			a couple of O(n^2) complexity functions in libiptc
	Solution
		efficient loop detection and mark_source_chains() algorithm (graph coloring)
		current CVS libiptc with only one O(n^2) function: 2 min 37 s
		whole reimplementation of libiptc needed for removing the last O(n^2) function

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
nf_conntrack

	USAGI did a port of ip_conntrack to ip6_conntrack
		same code replication we're fighting with ip[6]tables :(
	netfilter core team had ideas about layer 3 independent conntrack
	Yasuyuki Kozakai implemented nf_conntrack based on those ideas
		Implementation is now clean, available from CVS
		Needs re-sync with all the ip_conntrack changes of the last months
		Needs support for ipv4 NAT and ipv4<->ipv6 transition NAT

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
ulogd2

	Linux doesn't currently offer any sane accounting system
		nacctd - needs all packets via PF_PACKET in userspace
		ulogd - uses efficient netlink socket, but still packet based
	Solution:
		add per-direction packet and byte counters to ip_conntrack
		in combination with ctnetlink delete events
		needs userspace daemon for further processing
		is related to what the IETF ipfix working group does
	Redesign of ulogd to ulogd2:
		no difference between input and output plugins
		stack of plugins like: ctnetlink->ipfix
		other possible stack: ULOG->interpreter->flow_aggregator->mysql
		implementation underway, author highly motivated ;)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
qsearch

	Conntrack helpers (FTP, IRC, ...) often have to do pattern-matching
	Some people like to employ ipt_string matching
	This all became more complex through nonlinear/fragmented skbs
	Solution:
		Implement a single pattern-matching API to be used from all places
		Starting point: Rusty's skb_iter() and libqsearch
		Turns out that libqsearch API needs more work
		Many similarities to cryptoAPI

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
ctstat

	Martin Josefsson wrote ctstat
	similar to rtstat of Robert Olsson
	runtime per-cpu statistics of
		number of conntracks
		how many lookups
		how many found
		how many new
		how many invalid packets
		how many ignored packets
		how many deleted conntracks
		how many inserted conntracks
		how many icmp errors
		how many new expects
		how many deleted expects

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
ipset

	Implemented by Jozsef Kadlecsik
	Efficient way to handle a whole set of addresses in a single rule
	also provides target to add addresses into set
	currently implemented: ipmap, macipmap, portmap and iphash
		ipmap uses bitmask where each bit represents one ip address
		macipmap uses memory range with 8 bytes per IP/mac
		portmap uses memory range where each bit represents one port
		iphash uses fixed size hash (for random addresses)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
benchmarking at gigabit wirespeed

	Harald did lots of benchmarking
		Dual Opteron machines
		e1000 Gigabit adapters with irq-affinity
		2.4.x / 2.6.x kernel, both 32bit and 64bit
	Results to be published soon
	Performance problems mostly ip_tables related, not ip_conntrack

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Thanks

	Thanks to
		the BBS scene, Z-Netz, FIDO, ...
			for heavily increasing my computer usage in 1992
		KNF
			for bringing me in touch with the internet as early as 1994
			for providing a playground for technical people
			for introducing me to the existence of Linux!
		Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
			for implementing (one of?) the world's best TCP/IP stacks
		Paul 'Rusty' Russell
			for starting the netfilter/iptables project
			for trusting me to maintain it today
		Astaro AG
			for sponsoring my netfilter failover work

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Availability of slides / Links

	The slides
		http://www.gnumonks.org/
	The netfilter homepage
		http://www.netfilter.org/
	My Sponsor, Astaro AG
		http://www.astaro.com/
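%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Backup: ipmap bitmap sketch

	The ipset slide describes ipmap as a bitmask in which each bit represents one IP address. The userspace sketch below only illustrates that idea and is not the ipset kernel code: the class name IpMap, its methods and the fixed address range are assumptions made for this example. It shows why a lookup is a single bit test, and why memory use depends on the size of the range rather than on the number of stored addresses.

```python
import ipaddress


class IpMap:
    """Toy bitmap set: one bit per IPv4 address in a fixed range.

    Hypothetical illustration only; names and layout are assumptions,
    not the data structure used by the real ipset module.
    """

    def __init__(self, first, last):
        self.first = int(ipaddress.IPv4Address(first))
        self.last = int(ipaddress.IPv4Address(last))
        if self.last < self.first:
            raise ValueError("last address must not be below first address")
        size = self.last - self.first + 1
        # One bit per address, rounded up to whole bytes.
        self.bits = bytearray((size + 7) // 8)

    def _index(self, addr):
        """Return the bit index for addr, or None if it is out of range."""
        value = int(ipaddress.IPv4Address(addr))
        if not self.first <= value <= self.last:
            return None
        return value - self.first

    def add(self, addr):
        """Set the bit for addr; return False if addr is outside the range."""
        idx = self._index(addr)
        if idx is None:
            return False
        self.bits[idx // 8] |= 1 << (idx % 8)
        return True

    def test(self, addr):
        """Return True if the bit for addr is set."""
        idx = self._index(addr)
        if idx is None:
            return False
        return bool(self.bits[idx // 8] & (1 << (idx % 8)))


if __name__ == "__main__":
    blocked = IpMap("192.168.0.0", "192.168.255.255")
    blocked.add("192.168.3.7")
    print(blocked.test("192.168.3.7"))  # True
    print(blocked.test("192.168.3.8"))  # False
    print(len(blocked.bits))            # 8192 bytes for 65536 addresses
```

	A portmap along the lines of the ipset slide would presumably work the same way with port numbers as the index. For sparse or random addresses a bitmap over the whole range becomes wasteful, which is the case the slide assigns to iphash.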