%include "default.mgp"
%default 1 bgrad
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%back "blue"



%center
%size 7
A tour of the 
Linux 2.6 network stack


%center
%size 4
by

Harald Welte <laforge@hmw-consulting.de>


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Contents


	Introduction
	Hardirq Context
	Hard Interrupt Handler
	Softirq Context
	Network RX Softirq
	IPv4 Packet Handler
	IPv4 Packet Forwarding
	IPv4 Packet Output
	Driver TX routine


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page 
Linux 2.6 Network Tour
Introduction


Who is speaking to you?
		an independent Free Software developer
		who has earned his living from Free Software since 1997
		who is one of the authors of the Linux kernel firewall system called netfilter/iptables


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Interrupt context

	Also called 'hardirq'
	Triggered by an external interrupt to the CPU
	Not reentrant, because the IRQ is disabled before the handler is called
	Should do only the minimum of work and return as fast as possible

	hardirq handler registered via request_irq()
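As a rough userspace sketch (not kernel code), the non-reentrancy rule can be modelled as a dispatcher that masks the line, runs the handler, and replays an interrupt that arrived in the meantime instead of nesting it. The names `irq_line`, `dispatch()` and `raise_irq()` are hypothetical; in the kernel this logic lives in the generic IRQ code, and the driver's handler is installed with request_irq().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt line; not a kernel structure. */
struct irq_line {
    bool masked;     /* line disabled while its handler runs */
    bool pending;    /* an interrupt arrived while masked */
    int depth;       /* current nesting depth of the handler */
    int max_depth;   /* deepest nesting ever observed */
    int handled;     /* number of handler invocations */
};

static void raise_irq(struct irq_line *l);

/* The "driver" handler: minimal work. 'reraise' simulates the device
 * signalling again while the handler is still running. */
static void handler(struct irq_line *l, bool reraise)
{
    l->depth++;
    if (l->depth > l->max_depth)
        l->max_depth = l->depth;
    l->handled++;
    if (reraise)
        raise_irq(l);   /* line is masked: only latched, no nesting */
    l->depth--;
}

/* Dispatcher: mask, run handler, replay latched interrupts, unmask. */
static void dispatch(struct irq_line *l, bool reraise_once)
{
    l->masked = true;
    handler(l, reraise_once);
    while (l->pending) {
        l->pending = false;
        handler(l, false);
    }
    l->masked = false;
}

static void raise_irq(struct irq_line *l)
{
    if (l->masked) {
        l->pending = true;   /* cannot re-enter the handler */
        return;
    }
    dispatch(l, false);
}
```

The second interrupt is handled, but only after the first invocation has returned, so the handler never runs nested.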

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Receive Interrupt

	NIC receives a packet for its local MAC address
	NIC issues an interrupt
	Interrupt is routed to one CPU
	Kernel enters hardirq context and disables this IRQ on the local CPU
	Driver's interrupt handler
		allocates skb (struct sk_buff)
		calls net/core/dev.c:netif_rx()
			queues skb on the CPU's backlog, raises NET_RX_SOFTIRQ
		returns irqreturn_t (IRQ_HANDLED)
	Kernel leaves hardirq context and re-enables this IRQ
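A simplified model of what netif_rx() does with the skb: append it to a bounded per-CPU backlog and flag the RX softirq, dropping the packet when the backlog is full. `struct backlog`, `model_netif_rx()` and `BACKLOG_MAX` are hypothetical stand-ins; the real limit is netdev_max_backlog, and the real function has extra congestion handling not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BACKLOG_MAX 4   /* hypothetical; stands in for netdev_max_backlog */

/* Hypothetical stand-in for a received frame (struct sk_buff). */
struct pkt {
    size_t len;
};

/* Hypothetical per-CPU backlog: bounded FIFO plus "softirq raised" flag. */
struct backlog {
    struct pkt *q[BACKLOG_MAX];
    size_t head, count;
    bool rx_softirq_pending;
    unsigned long dropped;
};

enum rx_result { RX_QUEUED, RX_DROPPED };

/* Called from the (simulated) interrupt handler: queue and defer. */
static enum rx_result model_netif_rx(struct backlog *b, struct pkt *p)
{
    if (b->count == BACKLOG_MAX) {
        b->dropped++;                  /* full: drop, never block in hardirq */
        return RX_DROPPED;
    }
    b->q[(b->head + b->count) % BACKLOG_MAX] = p;
    b->count++;
    b->rx_softirq_pending = true;      /* the real work happens later */
    return RX_QUEUED;
}

/* Softirq side: take the oldest packet, or NULL if the queue is empty. */
static struct pkt *backlog_dequeue(struct backlog *b)
{
    if (b->count == 0)
        return NULL;
    struct pkt *p = b->q[b->head];
    b->head = (b->head + 1) % BACKLOG_MAX;
    b->count--;
    return p;
}
```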

	2.6.x offers NAPI to switch to polling at high IRQ rates: netif_rx_schedule()
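A toy model of the NAPI idea, assuming a NIC that can mask its RX interrupt: the first interrupt switches the device to polling, the poll routine handles at most a quota of frames per call, and interrupts are re-enabled only once the ring is empty. All names are hypothetical, and the race between re-enabling interrupts and new arrivals, which real drivers must handle, is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical NIC: frames waiting in the RX ring and an RX-IRQ enable bit. */
struct nic {
    int rx_ring_frames;
    bool rx_irq_enabled;
    bool on_poll_list;   /* scheduled for polling (cf. netif_rx_schedule()) */
    int interrupts;
    int delivered;
};

/* Interrupt handler: no per-packet work, just switch to polling mode. */
static void nic_interrupt(struct nic *n)
{
    n->interrupts++;
    n->rx_irq_enabled = false;   /* no further RX interrupts while polling */
    n->on_poll_list = true;
}

/* Simulated arrival of frames from the wire. */
static void nic_frames_arrive(struct nic *n, int frames)
{
    n->rx_ring_frames += frames;
    if (n->rx_irq_enabled && !n->on_poll_list)
        nic_interrupt(n);
}

/* Poll routine (softirq context): handle at most 'quota' frames.
 * Returns true if the ring was drained and interrupts re-enabled. */
static bool nic_poll(struct nic *n, int quota)
{
    int done = 0;
    while (done < quota && n->rx_ring_frames > 0) {
        n->rx_ring_frames--;
        n->delivered++;
        done++;
    }
    if (n->rx_ring_frames == 0) {
        n->on_poll_list = false;
        n->rx_irq_enabled = true;   /* back to interrupt mode */
        return true;
    }
    return false;                   /* more work: stay on the poll list */
}
```

Under load, twenty frames cost a single interrupt instead of twenty.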


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Softirq context

	Softirq is the real workhorse of interrupt processing
	Continues work where hardirq has finished
	Can be interrupted by hardirq context
	Can run in parallel on any number of CPUs (even the same softirq)

	softirq handler registered via kernel/softirq.c:open_softirq()

	softirqs need to be 'raised' by raise_softirq(), usually from hardirq
	softirqs are run
		after hardirq context exits
		from the ksoftirqd kernel thread in case there's too much work
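The raise/run cycle can be sketched with a pending bitmask: raising sets a bit, the run loop clears and handles the bits, and if handlers keep re-raising work after a few rounds, the remainder is left to ksoftirqd. The vector numbering, `MAX_RESTART` and the `model_*` functions are hypothetical; the kernel has its own fixed restart limit and handler signature.

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_NET_TX, MODEL_NET_RX, MODEL_NR_SOFTIRQS }; /* hypothetical numbering */
#define MAX_RESTART 3                                  /* hypothetical limit */

typedef void (*softirq_fn)(void);

static softirq_fn vec[MODEL_NR_SOFTIRQS];
static unsigned pending;          /* one bit per softirq */
static bool ksoftirqd_woken;

static void model_open_softirq(int nr, softirq_fn fn) { vec[nr] = fn; }
static void model_raise_softirq(int nr) { pending |= 1u << nr; }

/* Run all pending softirqs; handlers may raise softirqs again. After
 * MAX_RESTART rounds, leave the rest to a (simulated) ksoftirqd thread.
 * Returns the number of rounds executed. */
static int model_do_softirq(void)
{
    int rounds = 0;
    while (pending && rounds < MAX_RESTART) {
        unsigned todo = pending;
        pending = 0;              /* handlers may set bits again */
        for (int nr = 0; nr < MODEL_NR_SOFTIRQS; nr++)
            if ((todo & (1u << nr)) && vec[nr])
                vec[nr]();
        rounds++;
    }
    if (pending)
        ksoftirqd_woken = true;
    return rounds;
}

/* Example handler that always re-raises itself, like a saturated RX path. */
static int rx_runs;
static void busy_rx(void)
{
    rx_runs++;
    model_raise_softirq(MODEL_NET_RX);
}
```

Bounding the rounds keeps a flood of packets from monopolising the CPU on interrupt exit; the leftover work then competes with normal processes in ksoftirqd.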

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Network RX Softirq


	kernel/softirq.c:do_softirq()
		generic softirq code
	net/core/dev.c:net_rx_action()
		handler registered for NET_RX_SOFTIRQ via open_softirq()
		polls the devices on the local CPU's poll list, using per-device weights
	net/core/dev.c:process_backlog()
		poll function of the per-CPU backlog pseudo-device
		dequeues skbs from the local CPU's backlog queue
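A sketch of the weighting idea, assuming each device on the poll list has a weight and the whole softirq run has a total budget: devices are visited round-robin and each may process at most its weight per turn, so one busy device cannot starve the others. `struct pdev` and `model_net_rx_action()` are hypothetical.

```c
#include <assert.h>

/* Hypothetical pollable device: 'backlog' frames waiting, 'weight' =
 * maximum frames it may process per turn (must be > 0). */
struct pdev {
    int backlog;
    int weight;
    int processed;
};

/* One softirq run: round-robin over devices, each taking at most its
 * weight per turn, until all are idle or the budget is spent.
 * Returns the number of frames processed. */
static int model_net_rx_action(struct pdev *devs, int ndev, int budget)
{
    int total = 0;
    int busy = 1;
    while (budget > 0 && busy) {
        busy = 0;
        for (int i = 0; i < ndev && budget > 0; i++) {
            struct pdev *d = &devs[i];
            if (d->backlog == 0)
                continue;
            assert(d->weight > 0);            /* otherwise no progress */
            int quota = d->weight < budget ? d->weight : budget;
            int n = d->backlog < quota ? d->backlog : quota;
            d->backlog -= n;
            d->processed += n;
            budget -= n;
            total += n;
            if (d->backlog > 0)
                busy = 1;                     /* gets another turn */
        }
    }
    return total;
}
```

With a budget of 100, a device with 100 queued frames (weight 64) and one with 10 (weight 16), the small device is fully served in the first round instead of waiting behind the large one.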

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
netif_receive_skb()


	net/core/dev.c:netif_receive_skb()
		main workhorse of the network RX softirq
		if there are any netpoll users: netpoll_rx()
		if somebody requested an RX timestamp: net_timestamp()
		if interface is part of a bonding group: skb_bond()
		tc ingress filtering: ing_filter()
		packet diverter: handle_diverter()
		bridging handler: net/core/dev.c:handle_bridge()
		deliver to L3 protocol handler: net/core/dev.c:deliver_skb()
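The final dispatch step can be pictured as a table lookup on the EtherType: taps registered for all protocols see every frame, and the L3 handler registered for the frame's type receives it as well. The table layout and handler names below are hypothetical; in 2.6 the kernel keeps `struct packet_type` entries on the ptype_all list and in the ptype_base hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame: only the EtherType matters for dispatch. */
struct frame { uint16_t ethertype; };

typedef int (*proto_rcv)(const struct frame *f);

/* Hypothetical registration entry. */
struct proto_entry {
    uint16_t type;     /* EtherType; 0 means "all" (a tap such as tcpdump) */
    proto_rcv rcv;
};

static int ipv4_hits, arp_hits, tap_hits;
static int ipv4_rcv_model(const struct frame *f) { (void)f; ipv4_hits++; return 0; }
static int arp_rcv_model(const struct frame *f)  { (void)f; arp_hits++;  return 0; }
static int tap_rcv_model(const struct frame *f)  { (void)f; tap_hits++;  return 0; }

static const struct proto_entry table[] = {
    { 0,      tap_rcv_model },    /* sees every frame */
    { 0x0800, ipv4_rcv_model },   /* ETH_P_IP */
    { 0x0806, arp_rcv_model },    /* ETH_P_ARP */
};

/* Deliver a frame to every matching handler; returns how many got it.
 * A frame without a matching L3 handler is seen only by the taps. */
static int model_deliver(const struct frame *f)
{
    int delivered = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].type == 0 || table[i].type == f->ethertype) {
            table[i].rcv(f);
            delivered++;
        }
    }
    return delivered;
}
```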

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
IPv4 packet handler


	net/ipv4/ip_input.c:ip_rcv()
		header checksum check
		header length / total length checks
		NF_IP_PRE_ROUTING netfilter hook
	net/ipv4/ip_input.c:ip_rcv_finish()
		net/ipv4/route.c:ip_route_input()
			route/dst cache lookup
			if lookup fails, ip_route_input_slow()
				FIB lookup
				allocation of new dst_entry / rtable
	include/net/dst.h:dst_input()
		iterate over destination stack
		call input function of each stack item (e.g. ip_local_deliver() or ip_forward())
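The checksum and size checks in ip_rcv() can be approximated in userspace as below: an RFC 1071 one's complement checksum over the header (a valid header, checksum field included, sums to zero) plus version, header-length and total-length checks. This is a simplified sketch with hypothetical function names; the kernel uses the optimised ip_fast_csum() and works on struct iphdr inside an skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over 'len' bytes, big-endian 16-bit words. */
static uint16_t inet_csum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);   /* fold carries back in */
    return (uint16_t)~sum;
}

/* Simplified ip_rcv()-style sanity checks. 'pkt_len' is the number of
 * bytes actually received. Returns 0 if the header looks valid. */
static int ipv4_header_ok(const uint8_t *h, size_t pkt_len)
{
    if (pkt_len < 20)
        return -1;                            /* shorter than minimal header */
    unsigned version = h[0] >> 4;
    unsigned ihl = h[0] & 0x0f;               /* header length, 32-bit words */
    if (version != 4 || ihl < 5)
        return -2;
    if (pkt_len < ihl * 4u)
        return -3;
    if (inet_csum(h, ihl * 4u) != 0)          /* valid header sums to zero */
        return -4;
    unsigned tot_len = (unsigned)h[2] << 8 | h[3];
    if (tot_len > pkt_len || tot_len < ihl * 4u)
        return -5;
    return 0;
}
```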


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
IPv4 packet forwarding


	net/ipv4/ip_forward.c:ip_forward()
		IPsec policy check: xfrm4_policy_check()
		router alert handling (ip_call_ra_chain)
		TTL check (ICMP time exceeded if TTL <= 1) and decrement
		if route is a redirect route, ip_rt_send_redirect()
		call NF_IP_FORWARD netfilter hook
	net/ipv4/ip_forward.c:ip_forward_finish()
		increase statistics for SNMP MIB
	include/net/dst.h:dst_output()
		iterate over output functions of dst stack (normally ip_output())
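Because only the TTL byte changes, a router need not recompute the whole header checksum; it can patch it incrementally, in the spirit of the kernel's ip_decrease_ttl(). The sketch below works on a raw big-endian header in a byte array; `ttl_decrement()` and `csum16()` are hypothetical helpers, and the full checksum is there only to cross-check the fast path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full RFC 1071 checksum, used only to cross-check the fast path. */
static uint16_t csum16(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement TTL (byte 8) and patch the checksum (bytes 10-11).
 * TTL is the high byte of its 16-bit word, so the header sum drops by
 * 0x0100 and the inverted checksum rises by 0x0100, with end-around
 * carry (a result of 0xffff becomes 0x0000). The caller must already
 * have checked TTL > 1 (otherwise: ICMP time exceeded). */
static uint8_t ttl_decrement(uint8_t *h)
{
    uint32_t check = (uint32_t)h[10] << 8 | h[11];
    check += 0x0100;
    check += (check >= 0xffff);   /* end-around carry */
    h[10] = (uint8_t)(check >> 8);
    h[11] = (uint8_t)check;
    return --h[8];
}
```

For the header 4500 0073 0000 4000 4011 b861 ..., one decrement turns TTL 0x40 into 0x3f and the checksum b861 into b961.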

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
IPv4 packet output


	net/ipv4/ip_output.c:ip_output()
		fragment packet via ip_fragment() if larger than path MTU
	net/ipv4/ip_output.c:ip_finish_output()
		call netfilter NF_IP_POST_ROUTING hook
	net/ipv4/ip_output.c:ip_finish_output2()
		attach hardware (link-layer) header
		call header cache output fn (if neighbour in cache) 
			net/core/dev.c:dev_queue_xmit()
		or neighbour output function (if neighbour unknown)
			net/core/neighbour.c:neigh_resolve_output()
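The fragment layout ip_fragment() has to produce follows from the IPv4 rules: every fragment except the last carries a multiple of 8 payload bytes, the offset field counts 8-byte units, and all but the last fragment set MF. A sketch assuming a fixed 20-byte header without options; `struct frag` and `plan_fragments()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one IPv4 fragment. */
struct frag {
    unsigned offset8;   /* fragment offset field, in 8-byte units */
    unsigned len;       /* payload bytes in this fragment */
    int more;           /* MF (more fragments) flag */
};

/* Split 'payload' bytes for a link with 'mtu' bytes, assuming a 20-byte
 * header. Returns the number of fragments written to 'out', or 0 if the
 * MTU is too small, 'max' is exceeded, or the payload is empty. */
static size_t plan_fragments(unsigned payload, unsigned mtu,
                             struct frag *out, size_t max)
{
    const unsigned hdr = 20;
    if (mtu < hdr + 8)
        return 0;
    unsigned chunk = (mtu - hdr) & ~7u;   /* multiple of 8 */
    size_t n = 0;
    for (unsigned off = 0; off < payload; off += chunk) {
        if (n == max)
            return 0;
        unsigned left = payload - off;
        out[n].offset8 = off / 8;
        out[n].len = left > chunk ? chunk : left;
        out[n].more = left > chunk;
        n++;
    }
    return n;
}
```

For a 4000-byte payload over a 1500-byte MTU this yields fragments of 1480, 1480 and 1040 bytes at offsets 0, 185 and 370.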
	
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
dev_queue_xmit()


	skb->dev->qdisc->enqueue()
		enqueue into the device's output queue 
	default qdisc: net/sched/sch_generic.c:pfifo_fast_enqueue()
	net/sched/sch_generic.c:qdisc_restart():
		dev->qdisc->dequeue()
			dequeue skb from queue
		dev->hard_start_xmit()
			transmit skb via driver
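A minimal model of pfifo_fast-style scheduling: three FIFO bands, with dequeue always serving the lowest-numbered non-empty band first. In the kernel the band is derived from skb->priority through a fixed table; here the caller passes it directly, and `BAND_LEN` is a hypothetical stand-in for the per-band limit (in 2.6, the device's tx_queue_len).

```c
#include <assert.h>
#include <stddef.h>

#define BANDS 3
#define BAND_LEN 4   /* hypothetical per-band limit */

/* Hypothetical pfifo_fast-like qdisc storing packet ids. */
struct prio_fifo {
    int q[BANDS][BAND_LEN];
    size_t head[BANDS], count[BANDS];
};

/* Enqueue into FIFO 'band' (0 = highest priority). 0 on success, -1 if full. */
static int pf_enqueue(struct prio_fifo *s, int pkt, unsigned band)
{
    if (band >= BANDS || s->count[band] == BAND_LEN)
        return -1;                                   /* tail drop */
    s->q[band][(s->head[band] + s->count[band]) % BAND_LEN] = pkt;
    s->count[band]++;
    return 0;
}

/* Dequeue from the highest-priority non-empty band; -1 if all empty. */
static int pf_dequeue(struct prio_fifo *s)
{
    for (unsigned b = 0; b < BANDS; b++) {
        if (s->count[b]) {
            int pkt = s->q[b][s->head[b]];
            s->head[b] = (s->head[b] + 1) % BAND_LEN;
            s->count[b]--;
            return pkt;
        }
    }
    return -1;
}
```

Within a band order is strictly FIFO, and a lower band is served only while all higher bands are empty.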

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Driver TX Routine
	
	drivers/net/e1000/e1000_main.c:e1000_xmit_frame()
		tons of workarounds for chip bugs
		set up TX DMA descriptor
		queue TX DMA descriptor to device hardware
		return NETDEV_TX_OK
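The descriptor handling can be sketched as a ring with a producer index (next free slot) and a consumer index (next slot to reclaim once the hardware reports completion), with one slot kept empty to tell full from empty. This is a hypothetical userspace model; real e1000 descriptors, registers and DMA mapping differ, and `tail_reg` only mimics writing the device's tail register.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8   /* hypothetical; real rings are much larger */

enum tx_status { TX_OK, TX_BUSY };   /* cf. NETDEV_TX_OK / NETDEV_TX_BUSY */

/* Hypothetical TX descriptor: DMA address and length of one buffer. */
struct tx_desc {
    uint64_t addr;
    uint16_t len;
    uint8_t done;   /* set by the "hardware" once transmitted */
};

struct tx_ring {
    struct tx_desc d[RING_SIZE];
    unsigned next_to_use;     /* driver fills this slot next */
    unsigned next_to_clean;   /* driver reclaims from this slot */
    unsigned tail_reg;        /* simulated device tail register */
};

/* Free slots; one slot always stays empty so full != empty. */
static unsigned ring_free(const struct tx_ring *r)
{
    return (r->next_to_clean + RING_SIZE - r->next_to_use - 1) % RING_SIZE;
}

/* Driver xmit: fill one descriptor, then "ring the doorbell". */
static enum tx_status ring_xmit(struct tx_ring *r, uint64_t dma, uint16_t len)
{
    if (ring_free(r) == 0)
        return TX_BUSY;                 /* caller should stop the queue */
    struct tx_desc *d = &r->d[r->next_to_use];
    d->addr = dma;
    d->len = len;
    d->done = 0;
    r->next_to_use = (r->next_to_use + 1) % RING_SIZE;
    r->tail_reg = r->next_to_use;       /* hardware may now fetch it */
    return TX_OK;
}

/* TX-completion side: reclaim descriptors the hardware has finished. */
static unsigned ring_clean(struct tx_ring *r)
{
    unsigned n = 0;
    while (r->next_to_clean != r->next_to_use && r->d[r->next_to_clean].done) {
        r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
        n++;
    }
    return n;
}
```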


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Thanks

	Thanks to
		Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
			for implementing (one of?) the world's best TCP/IP stacks
		Paul 'Rusty' Russell
			for starting the netfilter/iptables project
			for trusting me to maintain it today
		Astaro AG
			for sponsoring parts of my netfilter work
		Free Software Foundation
			for the GNU Project 
			for the GNU General Public License
%size 3
	The slides of this presentation are available at http://www.gnumonks.org/

	Further Reading
%size 3
	The netfilter homepage http://www.netfilter.org/
%size 3
	The http://www.gpl-violations.org/ project

