--author Harald Welte
--title What's been happening in the netfilter world
--date 16 Jul 2005

This is an overview of what has been going on in the netfilter world
recently. The main purpose is to keep the rest of the Linux kernel
networking crowd informed.
--footer This presentation is made with tpp http://synflood.at/tpp.html
--newpage
--footer netconf'05 - netfilter update
--header Overview
  rustynat
  nfnetlink
  ctnetlink
  flow-based accounting
  conntrack tool
  helpers (pptp, h.323, sip)
  pkttables
  ipset
  ct_sync
  transparent proxies
  misc
--newpage
--footer netconf'05 - netfilter update
--header rustynat
Three years ago, the "newnat" design was adopted as architecture and API
for conntrack/nat helpers.
This is what most people are using, and what's in kernel 2.4.x and 2.6.x
(for x < 11).

In 2.6.11, a new scheme (which I call "rustynat") was integrated.

Fundamental changes:
  struct ip_conntrack no longer has sibling_list
  struct ip_conntrack_expect is killed when expected conntrack comes in
  NAT helpers are now called by callback functions from conntrack helpers
  cleanup of NAT manip data structures to reduce size of ip_conntrack
Problems:
  All existing helpers need to be ported (non-trivial port)
  Some fallout related to sequence number updates in NAT helper case
--newpage
--footer netconf'05 - netfilter update
--header nfnetlink
Fundamental idea is to have a generic layer for all netfilter-related
netlink messages.
It basically adds another layer of abstraction/multiplexing on top of netlink.

Is it really needed?
Looking at the real users, they are extremely different:
  ctnetlink
    dump/read/flush/update connection tracking table
    dump/read/flush/update connection tracking expectation table
  ulog-ng
    log arbitrary (even non-ip) packets to userspace
  nf_queue
    queue arbitrary (even non-ip) packets to userspace
  pkttnetlink
    ruleset management
--newpage
--footer netconf'05 - netfilter update
--header ctnetlink
Purpose of ctnetlink is to have a userspace interface to the conntrack table

message types
  IPCTNL_MSG_CT_NEW          - create a new conntrack
  IPCTNL_MSG_CT_DELETE       - delete a conntrack, flush table
  IPCTNL_MSG_CT_GET          - read one or more conntracks
  IPCTNL_MSG_CT_GET_CTRZERO  - read conntrack and zero counters
  IPCTNL_MSG_EXP_NEW         - create a new expect
  IPCTNL_MSG_EXP_DELETE      - delete an expect
  IPCTNL_MSG_EXP_GET         - read one or more expects
  IPCTNL_MSG_CONFIG          - configuration of masks (see later)
--newpage
--footer netconf'05 - netfilter update
--header conntrack event cache
ctnetlink also wants to have events, i.e. inform userspace about updates

ip_conntrack was extended to build an 'event cache', i.e. a list of events
that have happened while one specific packet passes through the stack:
  IPCT_DESTROY
  IPCT_NEW
  IPCT_RELATED
  IPCT_STATUS
  IPCT_PROTOINFO
  IPCT_HELPER
  IPCT_HELPINFO
  IPCT_NATINFO

When packet traversal finishes, a notifier is called with the bitmask of
accumulated events for this packet (skb->nfcache)

Event API is used by ct_sync and ctnetlink
--newpage
--footer netconf'05 - netfilter update
--header ctnetlink
ctnetlink registers with the event API and sends ctnetlink multicast msgs

ctnetlink event messages are either NEW, NEW with F_UPDATE or DELETE

Problem:
  There can be lots of events.
  We can easily see 200,000 NEW conntracks per second

Interim Solution:
  Have the userspace app specify the bitmask of interesting events
  via IPCTNL_MSG_CONFIG.
  There is only one mask, so this breaks down as soon as multiple
  uncooperative apps use it.
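
The event cache and the interim IPCTNL_MSG_CONFIG mask can be modelled in
a few lines of Python. This is a userspace sketch, not kernel code: the
event names come from the event cache slide, but the bit values and the
EventCache and Listener classes with their methods are made up for
illustration.

```python
from enum import IntFlag


class Event(IntFlag):
    """Event names from the slide; the bit positions are assumptions."""
    DESTROY = 1 << 0
    NEW = 1 << 1
    RELATED = 1 << 2
    STATUS = 1 << 3
    PROTOINFO = 1 << 4
    HELPER = 1 << 5
    HELPINFO = 1 << 6
    NATINFO = 1 << 7


class EventCache:
    """Accumulates events for one packet and reports them once at the end."""

    def __init__(self, notifier):
        self.notifier = notifier
        self.pending = Event(0)

    def record(self, event):
        # Repeated events for the same packet collapse into a single bit.
        self.pending |= event

    def finish_packet(self):
        # Called when packet traversal finishes: deliver the bitmask, reset.
        events, self.pending = self.pending, Event(0)
        if events:
            self.notifier(events)


class Listener:
    """Hypothetical stand-in for ctnetlink with one configurable mask."""

    def __init__(self):
        self.mask = Event(0)
        self.sent = []

    def configure(self, mask):
        # Interim solution: a single mask, so the last caller wins.
        self.mask = mask

    def notify(self, events):
        wanted = events & self.mask
        if wanted:
            self.sent.append(wanted)
```

Because there is only one mask, a second app calling configure() silently
replaces the first app's choice, which is the weakness of the interim
solution.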
--newpage
--footer netconf'05 - netfilter update
--header ctnetlink
Proposed Real Solution:
  Have generic netlink event message filters.
  - Every socket can set its local bitmask of events using setsockopt()
  - netlink core maintains ORed event mask that is used by ctnetlink
  - Whenever a socket disappears (or changes its mask), we recalculate
    the global mask

This scheme should really be generic, since other subsystems with
potentially many messages can profit from it.
--newpage
--footer netconf'05 - netfilter update
--header conntrack tool
To test and use ctnetlink, Pablo Neira wrote the "conntrack" tool

Basically "iproute2" for conntrack:
  -L [table] [-z]        List conntrack or expect table
  -G [table] params      Show conntrack or expect
  -D [table] params      Delete conntrack or expect
  -I [table] params      Create conntrack or expect
  -E [table] [options]   Show events (equals "ip route monitor")
--newpage
--footer netconf'05 - netfilter update
--header flow-based accounting
Linux lacks a good accounting solution.
Lots of people use inefficient net-acct/nacctd, ip-acct, ulog-acct, ...
Specialized solutions exist (ipt_ACCOUNT, ...)
but are limited in scope
Most people want to have flow-based instead of packet-based logs
NETFLOW (or now IPFIX) format can be used by standard tools for analysis

Idea: We already have a flow cache in the kernel
Problem: It's read-only per packet
But: ip_conntrack already has per-packet write access
So: We can put counters in same already-written-to ip_conntrack cache line

Userspace interface is ctnetlink (either polling or event-based)
Simplistic implementation can use "conntrack" tool and pipe to perl script
Fully-featured logging daemon (ulogd2) is in the final implementation stage

See my OLS 2005 paper for more details
--newpage
--footer netconf'05 - netfilter update
--header helpers
PPTP
  helper is now finally ported to rustynat
  will be merged soon since I'm tired of syncing it with core changes
H.323
  now has a simplified ASN.1 parser instead of brute-force replace
  needs more testing but could probably be merged soon, too
SIP
  first development version showed up
  extremely complex protocol, helper can only cover common cases
  some features (like host names in SDP) cannot be solved in-kernel
--newpage
--footer netconf'05 - netfilter update
--header pkttables
Sorry, no real progress since last year.
Too much other work :(
We'll have to wait a bit longer until we see the next linux packet filter..
--newpage
--footer netconf'05 - netfilter update
--header nf_conntrack
nf_conntrack is the layer3-independent connection tracking code (ipv4+ipv6)
- Code is still kept in-sync with ip_conntrack changes
- We still don't have IPv4-NAT on top of it
- Should already have been submitted a long time ago
- Problem: you can only have one of ip_conntrack or nf_conntrack loaded
  at a time
- All the existing users ('state' and 'conntrack' iptables match, ..)
  can't deal with it transparently.
- Should get fixed up, but like many ipv6 issues it has low prio :(
--newpage
--footer netconf'05 - netfilter update
--header ipset
http://ipset.netfilter.org/
- Supersedes old ippool code
- Idea is to have certain groups of addresses (called "sets")
- Instead of having 100 iptables rules to match on 100 addresses,
  you have 1 iptables rule and an ipset with 100 addresses
- It's more efficient since it uses compact data types
  (such as a 256-bit bitmask for any N addresses out of a /24)
- Should IMHO get merged soon, too.
--newpage
--footer netconf'05 - netfilter update
--header ct_sync
- Development of 2.6.x port seems to have stabilized now
- We haven't seen any oopses for quite some time
- Still doesn't support working failover for 'helped' connections
- 2.6.x branch allows one node to participate in multiple virtual clusters
- Currently working on real active-active failover
- Current code based on 2.6.10, so no "rustynat" port yet
--newpage
--footer netconf'05 - netfilter update
--header transparent proxying
In 2.2.x we had the kludgy bind-to-foreign-address code
In 2.4.x it was removed because netfilter had to clean up core networking code
Now we have huge bloaty TPROXY patches out-of-tree instead:
  - they do DNAT of incoming connection
  - SNAT on outgoing connection
  - use SO_ORIGINAL_DST on incoming connection to retrieve un-nat'ed addr
While the code is working fine, I think it's just not worth the effort:
  - NATing _twice_ just to route packets to local sockets, plus
  - kludgy socket options and other nasty stuff....
All we need is to
  - route certain packets to local sockets (based on destip/destport)
  - bind local processes to foreign addresses (already works)
  - send packets from sockets bound to foreign addresses

Transparent proxies with ctnetlink-issued expectations are what you want
to enable conntrack helpers in userspace!
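--newpage
--footer netconf'05 - netfilter update
--header ipset: bitmask sketch
Going back to the ipset slide: the "256-bit bitmask for any N addresses
out of a /24" idea can be sketched in Python as below. This only
illustrates the data structure and is not ipset's actual code; the
Slash24Set class and its methods are hypothetical names.

```python
import ipaddress


class Slash24Set:
    """Membership set for one IPv4 /24, stored as a single 256-bit mask."""

    def __init__(self, network):
        self.network = ipaddress.ip_network(network)
        if self.network.version != 4 or self.network.prefixlen != 24:
            raise ValueError("expected an IPv4 /24 network")
        self.bits = 0  # bit i set means host number i is in the set

    def _index(self, address):
        # Offset of the address inside the /24, or None if it is outside.
        addr = ipaddress.ip_address(address)
        if addr not in self.network:
            return None
        return int(addr) - int(self.network.network_address)

    def add(self, address):
        idx = self._index(address)
        if idx is None:
            raise ValueError("address is outside the network")
        self.bits |= 1 << idx

    def __contains__(self, address):
        # One shift and one AND, no matter how many addresses are stored.
        idx = self._index(address)
        return idx is not None and bool((self.bits >> idx) & 1)
```

A lookup costs the same whether the set holds 1 or 256 addresses, which
is the point of replacing 100 rules with 1 rule plus a set.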
--newpage
--footer netconf'05 - netfilter update
--header misc
- new source code directory structure: /net/netfilter/* for core stuff
- ipsec interaction -> Patrick
- conntrack reference issue
  (rmmod ip_conntrack vs. nf_reset() vs. local nat vs. SO_ORIGINAL_DST)

not netfilter-related
- would somebody mind 'alias' devices that had their own MAC address?