%include "default.mgp"
%default 1 bgrad
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%back "blue"
%center
%size 7
A tour of the
Linux 2.6 network stack
%center
%size 4
by
Harald Welte <laforge@hmw-consulting.de>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Contents
Introduction
Hardirq Context
Hard Interrupt Handler
Softirq Context
Network RX Softirq
IPv4 Packet Handler
IPv4 Packet Forwarding
IPv4 Packet Output
Driver TX routine
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Introduction
Who is speaking to you?
an independent Free Software developer
who has earned his living from Free Software since 1997
who is one of the authors of the Linux kernel firewall system called netfilter/iptables
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Interrupt context
Also called 'hardirq'
Triggered by an external interrupt to the CPU
Is not reentrant, because the irq is disabled before the handler is called
Should do only a minimum of work and return as fast as possible
hardirq handler registered via request_irq()
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Receive Interrupt
NIC receives packet for local MAC address
NIC issues interrupt
Interrupt is routed to one CPU
Kernel enters hardirq context and disables this irq on the local CPU
Driver's interrupt handler
allocates skb (struct sk_buff)
calls net/core/dev.c:netif_rx()
return irqreturn_t
Kernel leaves hardirq context and reenables this irq
2.6.x introduces NAPI for polling at high irq rates: netif_rx_schedule()
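The NAPI idea can be sketched as a user-space toy model. This is not driver code: the class ToyNic, its fields, and the budget parameter are hypothetical names for illustration. It only shows the principle that the first packet raises one interrupt, which masks further rx interrupts and schedules polling, and that the irq is re-enabled once the ring is drained.

```python
from collections import deque


class ToyNic:
    """Toy model of NAPI-style interrupt mitigation (not kernel code)."""

    def __init__(self):
        self.ring = deque()          # stands in for the RX descriptor ring
        self.irq_enabled = True      # rx interrupt unmasked?
        self.poll_scheduled = False  # device on the poll list?
        self.interrupts = 0          # how many hardirqs were taken

    def packet_arrives(self, pkt):
        self.ring.append(pkt)
        if self.irq_enabled:
            # Hardirq: do the minimum, mask the rx irq, schedule polling
            # (the role netif_rx_schedule() plays on the slide).
            self.interrupts += 1
            self.irq_enabled = False
            self.poll_scheduled = True

    def poll(self, budget):
        """Softirq side: take at most `budget` packets from the ring."""
        done = []
        while self.ring and len(done) < budget:
            done.append(self.ring.popleft())
        if not self.ring:
            # Ring drained: leave polling mode and unmask the irq again.
            self.poll_scheduled = False
            self.irq_enabled = True
        return done
```

Under load, five packets cost a single interrupt in this model; the rest are picked up by polling.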
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Softirq context
Softirq is the real workhorse of interrupts
Continues work where hardirq has finished
Can be interrupted by hardirq context
Can run in parallel on any number of CPUs
softirq handler registered via kernel/softirq.c:open_softirq()
softirqs need to be 'raised' by raise_softirq() from hardirq
softirqs are scheduled
after hardirq context exits
from ksoftirqd in case there's too much work
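The register / raise / run cycle can be sketched as a user-space toy model. This is not kernel code: the class ToySoftirqCpu is hypothetical, and the numeric value of NET_RX is an assumption made for the sketch. It shows that raising is only a bit set in a pending mask, and that the handler runs later, once per pass, however often it was raised.

```python
NET_RX = 3  # softirq number used for this sketch (an assumption)


class ToySoftirqCpu:
    """Toy model of one CPU's softirq state (not kernel code)."""

    def __init__(self):
        self.handlers = {}  # softirq number -> handler
        self.pending = 0    # bitmask of raised softirqs

    def open_softirq(self, nr, handler):
        """Register the handler for softirq number nr."""
        self.handlers[nr] = handler

    def raise_softirq(self, nr):
        """Mark softirq nr pending; cheap enough for hardirq context."""
        self.pending |= 1 << nr

    def do_softirq(self):
        """Run each pending handler once, lowest number first."""
        # Snapshot and clear first, so raises during a handler are kept
        # for the next pass instead of being lost.
        todo, self.pending = self.pending, 0
        ran = []
        nr = 0
        while todo:
            if todo & 1:
                self.handlers[nr]()
                ran.append(nr)
            todo >>= 1
            nr += 1
        return ran
```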
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Network RX Softirq
kernel/softirq.c:do_softirq()
generic softirq code
net/core/dev.c:net_rx_action()
function that is registered at open_softirq() time
net/core/dev.c:process_backlog()
dequeue skb from local CPU's backlog queue
uses a weighting scheme between different devices
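The weighting between devices can be sketched as a round-robin over a poll list. This is a simplified user-space model, not the kernel's net_rx_action(): the tuple layout (name, weight, queue) and the budget parameter are assumptions for illustration. Each device may deliver at most its weight per turn, and a device with work left goes back to the tail of the list.

```python
from collections import deque


def net_rx_action(poll_list, budget):
    """Toy RX softirq: poll devices round-robin within a total budget.

    poll_list is a deque of (name, weight, queue) entries, where queue
    is a deque of packets waiting for that device.
    """
    delivered = []
    while poll_list and budget > 0:
        name, weight, queue = poll_list.popleft()
        quota = min(weight, budget)           # per-turn limit for this device
        for _ in range(min(quota, len(queue))):
            delivered.append((name, queue.popleft()))
            budget -= 1
        if queue:
            # Still has packets: requeue at the tail so others get a turn.
            poll_list.append((name, weight, queue))
    return delivered
```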
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
netif_receive_skb()
net/core/dev.c:netif_receive_skb()
main network rx softirq workhorse
check if there are any netpoll users, if yes netpoll_rx()
if somebody requested skb rx timestamp, net_timestamp()
if interface is part of a bonding group, skb_bond()
tc ingress filtering: ing_filter()
packet diverter: handle_diverter()
bridging handler: net/core/dev.c:handle_bridge()
deliver to l3 protocol handler: net/core/dev.c:deliver_skb()
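The last step, delivery to the layer 3 protocol handler, is essentially a lookup keyed on the Ethernet type field. A minimal user-space sketch, assuming untagged Ethernet II frames; the class ToyPacketTypes and its method names are hypothetical and only model the idea:

```python
ETH_P_IP = 0x0800   # EtherType for IPv4
ETH_P_ARP = 0x0806  # EtherType for ARP
ETH_HLEN = 14       # dst MAC (6) + src MAC (6) + EtherType (2)


class ToyPacketTypes:
    """Toy table of layer 3 handlers keyed by EtherType (not kernel code)."""

    def __init__(self):
        self.handlers = {}

    def add_pack(self, ethertype, func):
        """Register func as a receiver for frames of this EtherType."""
        self.handlers.setdefault(ethertype, []).append(func)

    def deliver(self, frame):
        """Hand the payload to every matching handler; return how many."""
        if len(frame) < ETH_HLEN:
            return 0  # runt frame: nothing to dispatch on
        ethertype = int.from_bytes(frame[12:14], "big")
        funcs = self.handlers.get(ethertype, [])
        for func in funcs:
            func(frame[ETH_HLEN:])
        return len(funcs)
```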
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
IPv4 packet handler
net/ipv4/ip_input.c:ip_rcv()
checksum check
size check
NF_IP_PRE_ROUTING netfilter hook
net/ipv4/ip_input.c:ip_rcv_finish()
net/ipv4/route.c:ip_route_input()
route/dst cache lookup
if lookup fails, ip_route_input_slow()
fib lookup
allocation of new dst_entry / rtable
include/net/dst.h:dst_input()
iterate over destination stack
call destination function of the respective stack items
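The checksum and size checks at the top of this page can be sketched in user space. This is not the ip_rcv() code: the function names are hypothetical, and the checks are a simplified reading of what a receiver has to verify. The checksum is the standard Internet checksum (RFC 1071); a header whose checksum field is correct sums to zero.

```python
import struct


def ip_checksum(header):
    """Internet checksum (RFC 1071) over the given bytes."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_ok(packet):
    """Sanity checks in the spirit of the slide: size, then checksum."""
    if len(packet) < 20:
        return False                        # shorter than a minimal header
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    if version != 4 or ihl < 5:
        return False
    hlen = ihl * 4
    if len(packet) < hlen:
        return False
    total_len = struct.unpack("!H", packet[2:4])[0]
    if total_len < hlen or len(packet) < total_len:
        return False                        # truncated packet
    return ip_checksum(packet[:hlen]) == 0  # valid header sums to zero
```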
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
IPv4 packet forwarding
net/ipv4/ip_forward.c:ip_forward()
xfrm4_policy_check()
router alert handling (ip_call_ra_chain)
ttl decrement
if route is redirect route, ip_rt_send_redirect()
call NF_IP_FORWARD netfilter hook
net/ipv4/ip_forward.c:ip_forward_finish()
increase statistics for snmp mib
include/net/dst.h:dst_output()
iterate over output functions of dst stack
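The TTL decrement does not require recomputing the whole header checksum. Because the TTL is the high byte of one 16-bit header word, lowering it by one raises the checksum by 0x0100, with an end-around carry. A user-space sketch of that incremental update follows; the function names are hypothetical, and ip_checksum() is included only to verify the result.

```python
import struct


def ip_checksum(header):
    """Internet checksum (RFC 1071), used here only for verification."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decrease_ttl(header):
    """Decrement the TTL in place and patch the checksum incrementally.

    header is a bytearray holding an IPv4 header: TTL at byte 8,
    checksum at bytes 10..11.
    """
    if header[8] <= 1:
        # A forwarder drops such packets instead of decrementing.
        raise ValueError("TTL expired")
    check = (header[10] << 8) | header[11]
    check += 0x0100                                 # TTL word shrank by 0x0100
    check = (check + (check >= 0xFFFF)) & 0xFFFF    # end-around carry
    header[10], header[11] = check >> 8, check & 0xFF
    header[8] -= 1
    return header[8]
```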
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
IPv4 packet output
net/ipv4/ip_output.c:ip_output()
fragment packet via ip_fragment() if needed
net/ipv4/ip_output.c:ip_finish_output()
call netfilter NF_IP_POST_ROUTING hook
net/ipv4/ip_output.c:ip_finish_output2()
attach hardware header
call header cache output fn (if neighbour in cache)
net/core/dev.c:dev_queue_xmit()
or neighbour output function (if neighbour unknown)
net/core/neighbour.c:neigh_resolve_output()
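The arithmetic behind fragmentation can be sketched as follows. This is not the ip_fragment() code: the function name and the return format are hypothetical. It assumes a fixed header length per fragment and shows the two rules from the IPv4 specification: every fragment except the last carries a multiple of 8 payload bytes, and offsets are counted in 8-byte units.

```python
def plan_fragments(payload_len, mtu, hlen=20):
    """Return (offset_in_8_byte_units, size, more_fragments) per fragment."""
    per_frag = (mtu - hlen) & ~7   # largest multiple of 8 that fits
    if per_frag <= 0:
        raise ValueError("MTU too small for header plus 8 payload bytes")
    frags = []
    offset = 0
    while offset < payload_len:
        size = min(per_frag, payload_len - offset)
        more = offset + size < payload_len   # MF flag: set on all but the last
        frags.append((offset // 8, size, more))
        offset += size
    return frags
```

For a 4000 byte payload and an MTU of 1500, this yields two fragments of 1480 bytes and a final one of 1040 bytes.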
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
dev_queue_xmit()
skb->dev->qdisc->enqueue()
enqueue into devices output queue
default: net/sched/sch_generic.c:pfifo_fast_enqueue()
net/sched/sch_generic.c:qdisc_restart():
dev->qdisc->dequeue()
dequeue skb from queue
dev->hard_start_xmit()
transmit skb via driver
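The enqueue / dequeue split can be sketched with a toy three-band FIFO. This is a user-space model, not the sch_generic.c code: the class ToyPfifoFast is hypothetical, the band is passed in directly instead of being derived from the packet's priority, and the single overall limit is a simplification. Lower band numbers are always served first.

```python
from collections import deque


class ToyPfifoFast:
    """Toy three-band priority FIFO (not kernel code)."""

    BANDS = 3

    def __init__(self, limit=1000):
        self.bands = [deque() for _ in range(self.BANDS)]
        self.limit = limit  # overall queue length limit (simplification)

    def __len__(self):
        return sum(len(band) for band in self.bands)

    def enqueue(self, skb, band):
        """Append to the given band; return False if the queue is full."""
        if len(self) >= self.limit:
            return False    # caller would drop the packet
        self.bands[band].append(skb)
        return True

    def dequeue(self):
        """Take the oldest packet from the lowest non-empty band."""
        for band in self.bands:
            if band:
                return band.popleft()
        return None


def qdisc_restart(qdisc, hard_start_xmit):
    """Dequeue one packet and hand it to the driver's transmit function."""
    skb = qdisc.dequeue()
    if skb is None:
        return False
    hard_start_xmit(skb)
    return True
```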
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Driver TX Routine
drivers/net/e1000/e1000_main.c:e1000_xmit_frame()
tons of workarounds for chip bugs
set up TX DMA descriptor
queue TX DMA descriptor to device hardware
return NETDEV_TX_OK
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Linux 2.6 Network Tour
Thanks
Thanks to
Alan Cox, Alexey Kuznetsov, David Miller, Andi Kleen
for implementing (one of?) the world's best TCP/IP stacks
Paul 'Rusty' Russell
for starting the netfilter/iptables project
for trusting me to maintain it today
Astaro AG
for sponsoring parts of my netfilter work
Free Software Foundation
for the GNU Project
for the GNU General Public License
%size 3
The slides of this presentation are available at http://www.gnumonks.org/
Further Reading
%size 3
The netfilter homepage http://www.netfilter.org/
%size 3
The http://www.gpl-violations.org/ project