%include "default.mgp"
%default 1 bgrad
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%back "blue"

%center
%size 7
How to replicate the fire
HA for netfilter-based firewalls

%center
%size 4
by Harald Welte

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Contents

	Introduction
	Connection Tracking Subsystem
	Packet selection based on IP Tables
	The Connection Tracking Subsystem
	The NAT Subsystem
	Poor man's failover
	Real state replication

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Introduction

What is special about firewall failover?
	Nothing, in case of the stateless packet filter
		Common IP takeover solutions can be used
			VRRP
			Heartbeat
		Distribution of the packet filtering ruleset is no problem
			can be done manually or implemented with a simple userspace process
	Problems arise with stateful packet filters
		Connection state only on active node
		NAT mappings only on active node

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

Connection tracking...
	enables stateful filtering
	implementation
		hooks into netfilter to track packets
		protocol modules (currently TCP/UDP/ICMP)
		application helpers (currently FTP, IRC, H.323, talk, SNMP)
	divides packets into the following four categories
		NEW - would establish new connection
		ESTABLISHED - part of already established connection
		RELATED - is related to established connection
		INVALID - (multicast, errors...)
	does _NOT_ filter packets itself
	can be utilized by iptables using the 'state' match
	is used by NAT Subsystem

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

Common structures
	struct ip_conntrack_tuple, representing unidirectional flow
		layer 3 src + dst
		layer 4 protocol
		layer 4 src + dst
	connections represented as struct ip_conntrack
		original tuple
		reply tuple
		timeout
		l4 state private data
		app helper
		app helper private data
		expected connections

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

Flow of events for new packet
	packet enters NF_IP_PRE_ROUTING
	tuple is derived from packet
	lookup conntrack hash table with hash(tuple) -> fails
	new ip_conntrack is allocated
	fill in original and reply == inverted(original) tuple
	initialize timer
	assign app helper if applicable
	see if we've been expected -> fails
	call layer 4 helper 'new' function
	...
	packet enters NF_IP_POST_ROUTING
	do hashtable lookup for packet -> fails
	place struct ip_conntrack in hashtable

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

Flow of events for packet part of existing connection
	packet enters NF_IP_PRE_ROUTING
	tuple is derived from packet
	lookup conntrack hash table with hash(tuple)
	associate conntrack entry with skb->nfct
	call l4 protocol helper 'packet' function
		do l4 state tracking
		update timeouts as needed [e.g. TCP TIME_WAIT,...]
	...
	packet enters NF_IP_POST_ROUTING
	do hashtable lookup for packet -> succeeds
	do nothing else

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Network Address Translation

Overview
	Previous Linux kernels only implemented one special case of NAT: Masquerading
	Linux 2.4.x can do any kind of NAT.
	NAT subsystem implemented on top of netfilter, iptables and conntrack
	NAT subsystem registers with all five netfilter hooks
	'nat' Table registers chains PREROUTING, POSTROUTING and OUTPUT
	Following targets available within 'nat' Table
		SNAT changes the packet's source while passing NF_IP_POST_ROUTING
		DNAT changes the packet's destination while passing NF_IP_PRE_ROUTING
		MASQUERADE is a special case of SNAT
		REDIRECT is a special case of DNAT
	NAT bindings determined only for NEW packet and saved in ip_conntrack
	Further packets within connection NATed according to NAT bindings

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Poor man's failover

Poor man's failover
	principle
		let every node do its own tracking rather than replicating state
	two possible implementations
		connect every node to shared media (i.e. real ethernet)
			forwarding only turned on on active node
			slave nodes use promiscuous mode to sniff packets
		copy all traffic to slave nodes
			active master needs to copy all traffic to other nodes
			disadvantage: high load, sync traffic == payload traffic
			IMHO stupid way of solving the problem

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Poor man's failover

Poor man's failover
	advantages
		very easy implementation
			only addition of sniffing mode to conntrack needed
			existing means of address takeover can be used
		same load on active master and slave nodes
		no additional load on active master
	disadvantages
		can only be used with real shared media (no switches, ...)
		can not be used with NAT
	remaining problem
		no initial state sync after reboot of slave node!
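%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Sketch: tuples and NAT bindings

The "Common structures" and NAT slides can be illustrated with a minimal C sketch.
It shows reply == inverted(original) and how an SNAT binding changes the expected reply tuple.
This per-connection state exists only on the active node, which is why failover needs it replicated.
All names here (struct tuple, struct conn, tuple_invert, conn_new, conn_apply_snat) are hypothetical
simplifications for illustration, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct ip_conntrack_tuple:
 * one unidirectional flow. */
struct tuple {
    uint32_t src_ip, dst_ip;     /* layer 3 src + dst */
    uint8_t  proto;              /* layer 4 protocol, e.g. 6 = TCP */
    uint16_t src_port, dst_port; /* layer 4 src + dst */
};

/* Simplified stand-in for struct ip_conntrack: only the two tuples. */
struct conn {
    struct tuple original;
    struct tuple reply;
};

/* reply == inverted(original): swap source and destination. */
static struct tuple tuple_invert(struct tuple t)
{
    struct tuple r = t;
    r.src_ip = t.dst_ip;
    r.dst_ip = t.src_ip;
    r.src_port = t.dst_port;
    r.dst_port = t.src_port;
    return r;
}

/* Fill in both tuples for a NEW connection. */
static struct conn conn_new(struct tuple original)
{
    struct conn c;
    c.original = original;
    c.reply = tuple_invert(original);
    return c;
}

/* SNAT binding: replies will be addressed to the NAT address, so the
 * destination of the expected reply tuple changes. The original tuple
 * is left untouched. */
static void conn_apply_snat(struct conn *c, uint32_t nat_ip, uint16_t nat_port)
{
    assert(c != NULL);
    c->reply.dst_ip = nat_ip;
    c->reply.dst_port = nat_port;
}
```

A caller would build the original tuple from the first packet, call conn_new(), and call conn_apply_snat() once when the binding is chosen for the NEW packet.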
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication (ct_sync)

Real state replication (ct_sync)
	characteristics
		replicates state changes from active master to slave(s)
		separate shared ethernet segment for sync
	advantages
		can be used with any network media
		works with NAT
		initial sync after new slave is introduced
	problems
		complex implementation
	current limitations
		no replication of connection relations (ftp/h.323/...)
	current problems
		bugs, bugs, bugs

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication (ct_sync)

Required parts
	state replication protocol
		multicast based
		sequence numbers for detection of packet loss
		NACK-based retransmission
		no security, since private ethernet segment to be used
	event interface on active node
		calling out to callback function at all state changes
	exported interface to manipulate conntrack hash table

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication (ct_sync)

Required parts
	kernel thread for sending conntrack state protocol messages
		registers with event interface
		creates and accumulates state replication packets
		sends them via in-kernel sockets api
	kernel thread for receiving conntrack state replication messages
		receives state replication packets via in-kernel sockets
		uses conntrack hashtable manipulation interface
	kernel thread for initial or full re-sync
		sends full conntrack table at a fixed rate

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication

Flow of events in chronological order:
	on active node, inside the network RX softirq
		connection tracking code is analyzing a forwarded packet
		connection tracking gathers some new state information
		connection tracking updates local connection tracking database
		connection tracking sends event message
%cont
 to event API
		function registered at event API enqueues message to send ring
	on active node, inside the conntrack-sync kernel thread
		conntrack sync daemon aggregates multiple event messages into a state replication protocol message, removing possible redundancy
		conntrack sync daemon dequeues packets from ring
		conntrack sync daemon sends state replication protocol packet via in-kernel sockets

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication

Flow of events in chronological order:
	on slave node(s), inside network RX softirq
		connection tracking code ignores packets coming from the interface attached to the private conntrack sync network
		state replication protocol message is appended to socket receive queue of conntrack-sync kernel thread
	on slave node(s), inside conntrack-sync kernel thread
		conntrack sync daemon receives state replication message
		conntrack sync daemon creates/updates conntrack entry

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication

Necessary changes to conntrack core
	event generation (callback functions) for all state changes
		is needed (and already implemented) for 'ctnetlink' API
	conntrack hashtable manipulation API
		is needed (and already implemented) for 'ctnetlink' API
	conntrack exemptions
		needed to _not_ track conntrack state replication packets
		is needed for other cases as well (raw table / NOTRACK target)
		works by layer two packet drop (l2netfilter hooks)
		disables any incoming or outgoing packets on devices other than the sync device on slave nodes

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Usage

To set up a conntrack cluster you need
	hardware
		two firewalls with identical iptables rulesets
		all ethernet interfaces (internal, dmz, external) connected to both nodes
		separate network segment for conntrack sync device
	software
		configure any working ip
%cont
 address range/subnet to sync device
		assign every node a unique node id (0..255)
		decide which of the nodes is master, which slave

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Usage

To set up a conntrack cluster you need
	configuration on master
		first: modprobe ct_sync syncdev=ethX state=1 id=1 l2drop=1
		second: configure your 'real' devices (internal, external)
	configuration on slave
		first: modprobe ct_sync syncdev=ethX state=0 id=2 l2drop=1
		second: configure your 'real' devices (internal, external)
	after loading ct_sync with l2drop=1, a slave node will be invisible on the 'real' networks
		ssh access is only possible via sync device

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Usage

Cluster manager
	set up a cluster manager with some heartbeat mechanism
	configure it to run the following command on a slave that is to be promoted to master:
		echo "1" > /proc/net/ct_sync

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Thanks

Thanks to
	the BBS scene, Z-Netz, FIDO, ...
		for heavily increasing my computer usage in 1992
	KNF
		for bringing me in touch with the internet as early as 1994
		for providing a playground for technical people
		for introducing me to the existence of Linux!
	Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
		for implementing (one of?) the world's best TCP/IP stacks
	Paul 'Rusty' Russell
		for starting the netfilter/iptables project
		for trusting me to maintain it today
	Astaro AG
		for sponsoring my netfilter failover work

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Availability of slides / Links

	The code
		http://cvs.netfilter.org/netfilter-ha/ct_sync
	The slides
		http://www.gnumonks.org/
	The netfilter homepage
		http://www.netfilter.org/
	Astaro AG
		http://www.astaro.com/
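%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Sketch: packet loss detection on a slave

The "Required parts" slides list sequence numbers for detection of packet loss and NACK-based retransmission.
A minimal C sketch of the receiving side could look as follows.
It assumes a simple go-back-N scheme in which the sender resends everything from the first missing packet.
The slides do not specify the ct_sync wire format, so struct sync_rx, struct sync_verdict and
sync_rx_packet are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receiver-side state kept by a slave node. */
struct sync_rx {
    uint32_t next_seq;   /* sequence number expected next */
};

/* Hypothetical result of inspecting one incoming sync packet. */
struct sync_verdict {
    int      accept;     /* 1: apply packet, 0: do not apply it */
    uint32_t nack_from;  /* first missing sequence number, if nack_count > 0 */
    uint32_t nack_count; /* packets missing before the one just received */
};

static struct sync_verdict sync_rx_packet(struct sync_rx *rx, uint32_t seq)
{
    struct sync_verdict v = { 0, 0, 0 };
    int32_t delta;

    assert(rx != NULL);
    /* Signed distance, so the check survives wrap-around of the 32-bit
     * counter (assumes the usual two's complement conversion). */
    delta = (int32_t)(seq - rx->next_seq);

    if (delta < 0)
        return v;                   /* old or duplicate packet: ignore */

    if (delta > 0) {
        /* Gap: next_seq .. seq-1 were lost. Request retransmission
         * and wait; this packet will be resent as well. */
        v.nack_from = rx->next_seq;
        v.nack_count = (uint32_t)delta;
        return v;
    }

    v.accept = 1;                   /* in order: apply and advance */
    rx->next_seq = seq + 1;
    return v;
}
```

The receiving kernel thread would call sync_rx_packet() per packet, send a NACK whenever nack_count is non-zero, and create or update conntrack entries only when accept is set.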