%include "default.mgp"
%default 1 bgrad
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%back "blue"

%center
%size 7


Netfilter BOF



%center
%size 4
by

Harald Welte <laforge@netfilter.org>

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Contents


	Problems with current 2.4/2.6 netfilter/iptables
		Solution to code replication
		Solution for dynamic rulesets
		Solution for API to GUI's and other management programs

	Other current work
		nf_conntrack - l3 independent connection tracking
		ulogd2 - conntrack based flow accounting (ipfix)
		qsearch - efficient in-kernel pattern matching
		ctstat - runtime conntrack statistics
		ipset - replacement for ippool
		benchmarking at gigabit wirespeed

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Problem with 2.4/2.6 netfilter/iptables

	code replication between iptables/ip6tables/arptables/ebtables
		iptables was never meant for other protocols, but people did copy+paste 'ports'
		replication of
			core kernel code
			layer 3 independent matches (mac, interface, ...)
			userspace library (libiptc)
			userspace tool (iptables)
			userspace plugins (libipt_xxx.so)

	doesn't suit the needs for dynamically changing rulesets
		dynamic rulesets are becoming more common (service selection, IDS)
		a whole table is created in userspace and sent as a blob to the kernel
		for every ruleset change the whole table needs to be copied to userspace and back
		the kernel then runs consistency checks and loop detection on the whole table

%page
Netfilter BOF
Problem with 2.4/2.6 netfilter/iptables

	too extensible for writing any forward-compatible GUI
		new extensions showing up all the time
		a frontend would need to know about the options and use of a new extension
		thus frontends are always incomplete and out-of-date
		no high-level API other than piping to iptables-restore

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Reducing code replication

	code replication is a real problem: unclean, bugfixes missed
	we need a layer 3 independent layer for
		submitting rules to the kernel
		traversing packet-rulesets supporting match/target modules
		registering matches/targets
			layer 3 specific (like matching ipv4 address)
			layer 3 independent (like matching MAC address)

	solution
		pkt_tables inside kernel
			pkt_tables_ipv4 registers layer 3 handler with pkt_tables
			pkt_tables_ipv6 registers layer 3 handler with pkt_tables
			everybody registering a pkt_table (like iptable_filter) needs to specify the l3 protocol
		libraries in userspace (see later)
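
The registration scheme above could be sketched roughly as follows. This is an illustration in Python, not kernel code: the class PktTables and its methods are hypothetical names invented for this sketch, and only the idea (one shared core, layer 3 handlers and tables registering with it, matches either layer 3 specific or independent) comes from the slide.

```python
L3_INDEPENDENT = None  # marker for matches usable with any layer 3 protocol


class PktTables:
    """Hypothetical shared core that replaces the per-protocol copies."""

    def __init__(self):
        self.l3_protocols = set()
        self.matches = {}  # (l3proto or None, match name) -> match function
        self.tables = {}   # (l3proto, table name) -> list of rules

    def register_l3(self, l3proto):
        # What pkt_tables_ipv4 / pkt_tables_ipv6 would do at load time.
        self.l3_protocols.add(l3proto)

    def register_match(self, name, func, l3proto=L3_INDEPENDENT):
        self.matches[(l3proto, name)] = func

    def register_table(self, l3proto, name):
        # Every table has to name the layer 3 protocol it belongs to.
        if l3proto not in self.l3_protocols:
            raise LookupError(f"no layer 3 handler registered for {l3proto}")
        self.tables[(l3proto, name)] = []

    def find_match(self, l3proto, name):
        # Prefer a protocol-specific match, fall back to the shared one.
        for key in ((l3proto, name), (L3_INDEPENDENT, name)):
            if key in self.matches:
                return self.matches[key]
        raise LookupError(f"unknown match {name}")
```

With this layout a MAC address match is registered once and found from both an ipv4 and an ipv6 table, instead of being copied into each tool.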

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Supporting dynamic rulesets

	atomic table-replacement turned out to be a bad idea
	need new interface for sending individual rules to kernel
	policy routing has the same problem and good solution: rtnetlink
	solution: nfnetlink
		multicast-netlink based packet-oriented socket between kernel and userspace
		has extra benefit that other userspace processes get notified of rule changes [just like routing daemons]
		nfnetlink will be low-layer below all kernel/userspace communication
			pkttnetlink [aka iptnetlink]
			ctnetlink
			ulog
			ip_queue

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Communication with other programs

whole set of libraries
	libnfnetlink for low-layer communication
	libpkttnetlink for rule modifications
		will handle all plugins [which are currently part of iptables]
		query functions about available matches/targets
		query functions about parameters
		query functions for help messages about specific match/parameter of a match
		generic structure from which rules can be built
		conversion functions to parse generic structure into in-kernel structure
		conversion functions to parse kernel structure into generic structure
		functions to convert generic structure into plain text
	libipq will stay API-compatible to current version
	libipulog will stay API-compatible to current version
	libiptc will go away [compatibility layer extremely difficult]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page 
Netfilter BOF
Optimizing rule load time

	Current situation
		loading 10,000 rules in 1,000 chains takes about 4 minutes on a PIII 733MHz
		this is caused by two bottlenecks
			loop detection algorithm on kernel side is inefficient
			a couple of O(n^2) complexity functions in libiptc

	Solution
		efficient loop detection and mark_source_chains() algorithm (graph coloring)
		current CVS libiptc with only one O(n^2) function: 2 minutes 37 seconds
		whole reimplementation of libiptc needed for removing the last O(n^2) function
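
One common reading of "graph coloring" for loop detection is a three-color depth-first search over the chain jump graph, which runs in time linear in the number of chains plus jumps. The Python sketch below assumes that reading; it is not the kernel's mark_source_chains() code, and the function name find_loop and the dictionary layout are made up for illustration.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished


def find_loop(jumps):
    """Return True if the chain jump graph contains a loop.

    jumps maps a chain name to the list of chains its rules jump to.
    """
    color = {chain: WHITE for chain in jumps}
    for start in jumps:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        # Explicit stack instead of recursion, so deep chain nesting is safe.
        stack = [(start, iter(jumps[start]))]
        while stack:
            chain, targets = stack[-1]
            for target in targets:
                state = color.get(target, WHITE)
                if state == GRAY:
                    return True  # jump back into a chain we are still inside
                if state == WHITE:
                    color[target] = GRAY
                    stack.append((target, iter(jumps.get(target, ()))))
                    break
            else:
                # All jumps of this chain explored without finding a loop.
                color[chain] = BLACK
                stack.pop()
    return False
```

Each chain is colored at most twice and each jump is followed once, so the cost no longer grows quadratically with the ruleset size.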



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
nf_conntrack

	USAGI did a port of ip_conntrack to ip6_conntrack
		same code replication we're fighting with ip[6]tables :(
	netfilter core team had ideas about layer 3 independent conntrack
	Yasuyuki Kozakai implemented nf_conntrack based on those ideas
	Implementation is now clean, available from CVS
	Needs re-sync with all the ip_conntrack changes of the last months
	Needs support for ipv4 and ipv4<->ipv6 transition NAT

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
ulogd2

	Linux doesn't currently offer any sane accounting system
		nacctd - needs all packets via PF_PACKET in userspace
		ulogd - uses efficient netlink socket, but still packet based
	Solution: add per-direction packet and byte counters to ip_conntrack
		combination with ctnetlink delete events
		needs userspace daemon for further processing
		is related to what the IETF ipfix working group does
	Redesign of ulogd to ulogd2:
		no difference between input and output plugins
		stack of plugins like: ctnetlink->ipfix
		other possible stack: ULOG->interpreter->flow_aggregator->mysql
		implementation underway, author highly motivated ;)
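
The accounting idea above (per-direction counters on each connection, reported once when the connection is deleted) can be sketched in Python as follows. This is a toy model under stated assumptions: FlowTable, Conntrack and Counters are hypothetical names, flows are keyed only by (src, dst), and the on_delete callback stands in for a ctnetlink delete event being handed to a userspace daemon.

```python
from dataclasses import dataclass, field


@dataclass
class Counters:
    packets: int = 0
    bytes: int = 0


@dataclass
class Conntrack:
    orig: Counters = field(default_factory=Counters)   # original direction
    reply: Counters = field(default_factory=Counters)  # reply direction


class FlowTable:
    def __init__(self, on_delete):
        self.flows = {}             # (src, dst) of first packet -> Conntrack
        self.on_delete = on_delete  # stand-in for the delete event consumer

    def account(self, src, dst, length):
        if (src, dst) in self.flows:
            counters = self.flows[(src, dst)].orig
        elif (dst, src) in self.flows:
            counters = self.flows[(dst, src)].reply
        else:
            counters = self.flows.setdefault((src, dst), Conntrack()).orig
        counters.packets += 1
        counters.bytes += length

    def delete(self, src, dst):
        ct = self.flows.pop((src, dst))
        # One flow record per connection instead of one message per packet.
        self.on_delete({
            "src": src, "dst": dst,
            "orig_packets": ct.orig.packets, "orig_bytes": ct.orig.bytes,
            "reply_packets": ct.reply.packets, "reply_bytes": ct.reply.bytes,
        })
```

The consumer sees a single record per flow, which is the property that makes this cheaper than packet-based logging via ULOG.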


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
qsearch

	Conntrack helpers (FTP, IRC, ...) often have to do pattern-matching
	Some people like to employ ipt_string matching
	This all became more complex through nonlinear/fragmented skb's
	Solution:
		Implement a single pattern-matching api to be used from all places
		Starting point: Rusty's skb_iter() and libqsearch
		Turns out that libqsearch API needs more work
		Many similarities to cryptoAPI


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
ctstat

	Martin Josefsson wrote ctstat
		similar to rtstat of Robert Olsson
		runtime per-cpu statistics of
			number of conntracks
			how many lookups
			how many found
			how many new
			how many invalid packets
			how many ignored packets
			how many deleted conntracks
			how many inserted conntracks
			how many icmp errors
			how many new expects
			how many deleted expects

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
ipset

	Implemented by Jozsef Kadlecsik
	Efficient way to handle a whole set of addresses in a single rule
		also provides target to add addresses into set
		currently implemented: ipmap, macipmap, portmap and iphash
		ipmap uses bitmask where each bit represents one ip address
		macipmap uses memory range with 8 bytes per IP/MAC
		portmap uses memory range where each bit represents one port
		iphash uses fixed size hash (for random addresses)
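
To make the ipmap idea concrete, here is a minimal Python sketch of a bitmap over a contiguous address range, with one bit per address. It only illustrates the data structure; the class name IpMap and its methods are assumptions for this example and do not reflect the actual ipset kernel code or command syntax.

```python
import ipaddress


class IpMap:
    """Bitmap set over the inclusive IPv4 range [first, last]."""

    def __init__(self, first, last):
        self.first = int(ipaddress.IPv4Address(first))
        self.size = int(ipaddress.IPv4Address(last)) - self.first + 1
        self.bits = bytearray((self.size + 7) // 8)  # one bit per address

    def _offset(self, addr):
        offset = int(ipaddress.IPv4Address(addr)) - self.first
        return offset if 0 <= offset < self.size else None

    def add(self, addr):
        offset = self._offset(addr)
        if offset is None:
            raise ValueError(f"{addr} is outside the range of this set")
        self.bits[offset // 8] |= 1 << (offset % 8)

    def __contains__(self, addr):
        offset = self._offset(addr)
        if offset is None:
            return False
        return bool(self.bits[offset // 8] & (1 << (offset % 8)))
```

A /24 fits into 32 bytes and a membership test is a single bit lookup, regardless of how many addresses are in the set.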


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
benchmarking at gigabit wirespeed

	Harald did lots of benchmarking
		Dual Opteron machines
		e1000 Gigabit adapters with irq-affinity
		2.4.x / 2.6.x kernel, both 32bit and 64bit
	Results to be published soon
	Performance problems mostly ip_tables related, not ip_conntrack

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Thanks

	Thanks to
		the BBS scene, Z-Netz, FIDO, ...
			for heavily increasing my computer usage in 1992
		KNF
			for bringing me in touch with the internet as early as 1994
			for providing a playground for technical people
			for introducing me to the existence of Linux!
		Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
			for implementing (one of?) the world's best TCP/IP stacks
		Paul 'Rusty' Russell
			for starting the netfilter/iptables project
			for trusting me to maintain it today
		Astaro AG
			for sponsoring my netfilter failover work

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Netfilter BOF
Availability of slides / Links

The slides 
	http://www.gnumonks.org/

The netfilter homepage
	http://www.netfilter.org/

My Sponsor, Astaro AG
	http://www.astaro.com/