%include "default.mgp"
%default 1 bgrad
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%back "blue"

%center
%size 7


How to replicate the fire
HA for netfilter-based firewalls


%center
%size 4
by

Harald Welte <laforge@netfilter.org>


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Contents


	Introduction
	Packet selection based on IP Tables
	The Connection Tracking Subsystem
	The NAT Subsystem
	Poor man's failover
	Real state replication

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page 
HA for netfilter/iptables
Introduction

What is special about firewall failover?

	Nothing, in the case of a stateless packet filter
		Common IP takeover solutions can be used
			VRRP
			Heartbeat
	Distribution of packet filtering ruleset no problem
		can be done manually
		or implemented with simple userspace process
	Problems arise with stateful packet filters
		Connection state only on active node
		NAT mappings only on active node


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

Connection tracking...
	enables stateful filtering 
	implementation
		hooks into netfilter to track packets
		protocol modules (currently TCP/UDP/ICMP)
		application helpers (currently FTP, IRC, H.323, talk, SNMP)
	divides packets into the following four categories
		NEW - would establish new connection
		ESTABLISHED - part of already established connection
		RELATED - is related to established connection
		INVALID - (multicast, errors...)
	does _NOT_ filter packets itself
	can be utilized by iptables using the 'state' match 
	is used by NAT Subsystem


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

Common structures
	struct ip_conntrack_tuple, representing unidirectional flow
		layer 3 src + dst 
		layer 4 protocol
		layer 4 src + dst

	connections represented as struct ip_conntrack
		original tuple
		reply tuple
		timeout
		l4 state private data
		app helper
		app helper private data
		expected connections
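
The two structures above can be modelled roughly as follows. This is a minimal Python sketch, not the kernel's C definitions: the class names FlowTuple and Conntrack and all field names are hypothetical, chosen only to mirror the bullet points.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FlowTuple:
    """One direction of a flow (illustrative model of ip_conntrack_tuple)."""
    src_ip: str      # layer 3 source
    dst_ip: str      # layer 3 destination
    proto: str       # layer 4 protocol
    src_port: int    # layer 4 source
    dst_port: int    # layer 4 destination

    def inverted(self) -> "FlowTuple":
        """Return the tuple of the opposite direction."""
        return FlowTuple(self.dst_ip, self.src_ip, self.proto,
                         self.dst_port, self.src_port)


@dataclass
class Conntrack:
    """One tracked connection (illustrative model of ip_conntrack)."""
    original: FlowTuple
    reply: FlowTuple
    timeout: float
    l4_state: dict = field(default_factory=dict)     # l4 state private data
    helper: Optional[str] = None                     # app helper, if any
    helper_data: dict = field(default_factory=dict)  # app helper private data
    expected: list = field(default_factory=list)     # expected connections
```

Because the tuple is frozen it is hashable, so it can serve directly as a key of a hash table.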

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

Flow of events for new packet
	packet enters NF_IP_PRE_ROUTING
		tuple is derived from packet
		lookup conntrack hash table with hash(tuple) -> fails
		new ip_conntrack is allocated
			fill in original and reply == inverted(original) tuple
			initialize timer
			assign app helper if applicable
			see if we've been expected -> fails
			call layer 4 helper 'new' function
	...
	packet enters NF_IP_POST_ROUTING
		do hashtable lookup for packet -> fails
		place struct ip_conntrack in hashtable			


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Connection Tracking Subsystem

Flow of events for packet part of existing connection
	packet enters NF_IP_PRE_ROUTING
		tuple is derived from packet
		lookup conntrack hash table with hash(tuple)
		associate conntrack entry with skb->nfct
		call l4 protocol helper 'packet' function
			do l4 state tracking
			update timeouts as needed [e.g. TCP TIME_WAIT,...]
	...
	packet enters NF_IP_POST_ROUTING
		do hashtable lookup for packet -> succeeds
		do nothing else
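
The two flows (new packet, packet of an existing connection) can be sketched as below. This is a simplified Python model under stated assumptions: tuples are plain (src, dst, proto, sport, dport) Python tuples, the class and method names are hypothetical, and helpers, expectations and timers are left out.

```python
class Conntrack:
    """Simplified stand-in for struct ip_conntrack."""

    def __init__(self, original, timeout=30):
        src, dst, proto, sport, dport = original
        self.original = original
        # reply tuple == inverted original tuple
        self.reply = (dst, src, proto, dport, sport)
        self.timeout = timeout
        self.packets = 0


class Tracker:
    def __init__(self):
        self.table = {}  # models the conntrack hash table

    def pre_routing(self, tup):
        """Look up the tuple; allocate an unconfirmed entry on a miss."""
        ct = self.table.get(tup)
        is_new = ct is None
        if is_new:
            ct = Conntrack(tup)  # allocated, but not in the table yet
        ct.packets += 1  # stands in for the layer 4 'new'/'packet' call
        return ct, is_new

    def post_routing(self, ct):
        """Place the entry in the table only if the lookup fails."""
        if ct.original not in self.table:
            self.table[ct.original] = ct
            self.table[ct.reply] = ct
```

The entry is only inserted at the end of the path, so a packet that is dropped by the ruleset in between never leaves state behind.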


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Network Address Translation

Overview
		Previous Linux Kernels only implemented one special case of NAT: Masquerading
		Linux 2.4.x can do any kind of NAT.
		NAT subsystem implemented on top of netfilter, iptables and conntrack
		NAT subsystem registers with all five netfilter hooks
		'nat' Table registers chains PREROUTING, POSTROUTING and OUTPUT
		Following targets available within 'nat' Table
			SNAT changes the packet's source while passing NF_IP_POST_ROUTING
			DNAT changes the packet's destination while passing NF_IP_PRE_ROUTING
			MASQUERADE is a special case of SNAT
			REDIRECT is a special case of DNAT
		NAT bindings determined only for NEW packet and saved in ip_conntrack
		Further packets within the connection are NATed according to the NAT bindings
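
The last two points can be illustrated with a small Python sketch. It assumes tuples of the form (src, dst, proto, sport, dport); the NatTable class and its rule callback are hypothetical and only show that the rule is consulted once per connection, after which the saved binding is reused in both directions.

```python
def invert(tup):
    """Swap source and destination of a (src, dst, proto, sport, dport) tuple."""
    src, dst, proto, sport, dport = tup
    return (dst, src, proto, dport, sport)


class NatTable:
    def __init__(self, rule):
        self.rule = rule            # consulted for the first packet only
        self.bindings = {}          # tuple as seen on the wire -> translated tuple
        self.rule_evaluations = 0

    def translate(self, tup):
        if tup not in self.bindings:
            self.rule_evaluations += 1
            translated = self.rule(tup)
            self.bindings[tup] = translated
            # replies to the translated tuple are mapped back to the original
            self.bindings[invert(translated)] = invert(tup)
        return self.bindings[tup]


def snat_to(new_src):
    """Build a rule that rewrites the source address, like an SNAT target."""
    return lambda tup: (new_src,) + tup[1:]
```

For example, with NatTable(snat_to("203.0.113.1")) an outgoing packet from 10.0.0.5 leaves with source 203.0.113.1, and the reply addressed to 203.0.113.1 is rewritten back to 10.0.0.5 without evaluating the rule again.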

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Poor man's failover

Poor man's failover
	principle
		let every node do its own tracking rather than replicating state
	two possible implementations
		connect every node to shared media (i.e. real ethernet)
			forwarding only turned on on active node
			slave nodes use promiscuous mode to sniff packets
		copy all traffic to slave nodes
			active master needs to copy all traffic to other nodes
			disadvantage: high load, sync traffic == payload traffic
			IMHO stupid way of solving the problem 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Poor man's failover

Poor man's failover
	advantages
		very easy implementation
			only addition of sniffing mode to conntrack needed
			existing means of address takeover can be used
		same load on active master and slave nodes
		no additional load on active master
	disadvantages
		can only be used with real shared media (no switches, ...)
		can not be used with NAT
	remaining problem
		no initial state sync after reboot of slave node!


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication (ct_sync)

Real state replication (ct_sync)
	characteristics
		replicates state changes from active master to slave(s)
		separate shared ethernet segment for sync
	advantages
		can be used with any network media
		works with NAT
		initial sync after new slave is introduced
	problems
		complex implementation
	current limitations
		no replication of connection relations (ftp/h.323/...)
	current problems	
		bugs, bugs, bugs

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication (ct_sync)

Required parts
	state replication protocol
		multicast based
		sequence numbers for detection of packet loss
		NACK-based retransmission
		no security, since private ethernet segment to be used
	event interface on active node
		calling out to callback function at all state changes
	exported interface to manipulate conntrack hash table
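
How sequence numbers and NACK-based retransmission fit together can be sketched in a few lines of Python. This is not the ct_sync wire format: the Sender and Receiver classes, the (seq, payload) message layout and the history size are assumptions made purely for illustration.

```python
class Sender:
    def __init__(self, history=64):
        self.seq = 0
        self.history = history
        self.sent = {}  # seq -> payload, kept for retransmission

    def send(self, payload):
        self.seq += 1
        self.sent[self.seq] = payload
        self.sent.pop(self.seq - self.history, None)  # forget old messages
        return (self.seq, payload)

    def retransmit(self, seqs):
        """Answer a NACK with whatever is still in the history."""
        return [(s, self.sent[s]) for s in seqs if s in self.sent]


class Receiver:
    def __init__(self):
        self.expected = 1
        self.pending = {}    # out-of-order messages waiting for a gap fill
        self.delivered = []

    def receive(self, msg):
        """Accept one message; return the sequence numbers to NACK."""
        seq, payload = msg
        if seq < self.expected:
            return []  # duplicate, already delivered
        self.pending[seq] = payload
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1
        if not self.pending:
            return []
        return [s for s in range(self.expected, max(self.pending))
                if s not in self.pending]
```

The receiver stays silent as long as nothing is lost, which keeps the sync segment free of acknowledgement traffic in the common case.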

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication (ct_sync)

Required parts
	kernel thread for sending conntrack state protocol messages
		registers with event interface
		creates and accumulates state replication packets
		sends them via in-kernel sockets api
	kernel thread for receiving conntrack state replication messages
		receives state replication packets via in-kernel sockets
		uses conntrack hashtable manipulation interface
	kernel thread for initial or full re-sync
		sends full conntrack table with fixed speed 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication

Flow of events in chronological order:
	on active node, inside the network RX softirq
		connection tracking code is analyzing a forwarded packet
		connection tracking gathers some new state information
		connection tracking updates local connection tracking database
		connection tracking sends event message to event API
		function registered at event API enqueues message to send ring
	on active node, inside the conntrack-sync kernel thread
		conntrack sync daemon dequeues event messages from ring
		conntrack sync daemon aggregates multiple event messages into a state replication protocol message, removing possible redundancy
		conntrack sync daemon sends state replication protocol packet via in-kernel sockets
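
The enqueue and aggregate steps can be sketched as follows. This is a Python illustration only: the SendRing class, the (conn_id, state) event layout and the ring size are hypothetical, and "removing redundancy" is modelled here as keeping just the latest state per connection.

```python
from collections import deque


class SendRing:
    def __init__(self, size=1024):
        # a bounded deque silently drops the oldest event when full;
        # real code would have to handle that overflow explicitly
        self.ring = deque(maxlen=size)

    def enqueue(self, conn_id, state):
        """Called from the event callback, i.e. in packet processing context."""
        self.ring.append((conn_id, state))

    def build_message(self):
        """Called from the sync thread: drain the ring, keep the latest state."""
        latest = {}
        while self.ring:
            conn_id, state = self.ring.popleft()
            latest.pop(conn_id, None)  # re-insert so order follows last update
            latest[conn_id] = state
        return list(latest.items())
```

Three events for two connections thus collapse into a message with two entries.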

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication

Flow of events in chronological order:
	on slave node(s), inside network RX softirq
		connection tracking code ignores packets coming from the interface attached to the private conntrack sync network
		state replication protocol message is appended to socket receive queue of conntrack-sync kernel thread
	on slave node(s), inside conntrack-sync kernel thread
		conntrack sync daemon receives state replication message
		conntrack sync daemon creates/updates conntrack entry
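
The create/update step on the slave can be pictured as below. The sketch assumes a message is a list of (conn_id, state) pairs and that the local table is a plain dict; both are illustrative stand-ins, not the ct_sync data structures.

```python
def apply_message(table, message):
    """Create or update one local entry per (conn_id, state) pair."""
    created = updated = 0
    for conn_id, state in message:
        if conn_id in table:
            updated += 1
        else:
            created += 1
        table[conn_id] = state
    return created, updated
```

After each message the slave's table matches the master's for the connections it mentions, so a takeover can continue from that state.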

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Real state replication

Necessary changes to conntrack core
	event generation (callback functions) for all state changes
		is needed (and already implemented) for 'ctnetlink' API
	conntrack hashtable manipulation API
		is needed (and already implemented) for 'ctnetlink' API
	conntrack exemptions
		needed to _not_ track conntrack state replication packets
		is needed for other cases as well (raw table / NOTRACK target)
		works by 
	layer two packet drop (l2netfilter hooks)
		drops all incoming and outgoing packets on devices other than the sync device on slave nodes


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Usage

To set up a conntrack cluster you need

	hardware
		two firewalls with identical iptables rulesets
		all ethernet interfaces (internal, dmz, external) connected to both nodes
		separate network segment for conntrack sync device
	software
		assign any working IP address range/subnet to the sync device
		assign every node a unique node id (0..255)
		decide which of the nodes is master, which slave


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Usage

To set up a conntrack cluster you need

	configuration on master
		first: modprobe ct_sync syncdev=ethX state=1 id=1 l2drop=1
		second: configure your 'real' devices (internal, external)
	configuration on slave
		first: modprobe ct_sync syncdev=ethX state=0 id=2 l2drop=1
		second: configure your 'real' devices (internal, external)
		
		after loading ct_sync with l2drop=1, a slave node will be invisible on the 'real' networks.  ssh access is only possible via sync device

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Usage

	Cluster manager
		set up a cluster manager with some heartbeat mechanism
		configure it to run the following command on a slave that is to be promoted to master:
		echo "1" > /proc/net/ct_sync
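
What such a cluster manager hook might look like, as a minimal Python sketch: only the /proc/net/ct_sync path and the value "1" come from the slide, while the HeartbeatMonitor class, its timeout handling and the function names are hypothetical.

```python
def promote_to_master(ctl_path="/proc/net/ct_sync"):
    """Equivalent of: echo "1" > /proc/net/ct_sync"""
    with open(ctl_path, "w") as f:
        f.write("1\n")


class HeartbeatMonitor:
    """Promote this node once the master's heartbeat has been missing too long."""

    def __init__(self, timeout, promote=promote_to_master):
        self.timeout = timeout
        self.promote = promote
        self.last_seen = None
        self.promoted = False

    def heartbeat(self, now):
        """Record the time at which a heartbeat from the master arrived."""
        self.last_seen = now

    def check(self, now):
        """Run periodically; returns True once this node has been promoted."""
        overdue = (self.last_seen is not None
                   and now - self.last_seen > self.timeout)
        if overdue and not self.promoted:
            self.promote()
            self.promoted = True
        return self.promoted
```

A real setup would leave heartbeat detection to an existing tool (Heartbeat, VRRP) and only hook the promote step into it.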

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Thanks

	Thanks to
		the BBS scene, Z-Netz, FIDO, ...
			for heavily increasing my computer usage in 1992
		KNF
			for bringing me in touch with the internet as early as 1994
			for providing a playground for technical people
			for introducing me to the existence of Linux!
		Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
			for implementing (one of?) the world's best TCP/IP stacks
		Paul 'Rusty' Russell
			for starting the netfilter/iptables project
			for trusting me to maintain it today
		Astaro AG
			for sponsoring my netfilter failover work

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
HA for netfilter/iptables
Availability of slides / Links

The code
	http://cvs.netfilter.org/netfilter-ha/ct_sync

The slides 
	http://www.gnumonks.org/

The netfilter homepage
	http://www.netfilter.org/

Astaro AG
	http://www.astaro.com/