Diffstat (limited to '2006/hardware_kerneltuning_netperf-slac')
-rw-r--r--  2006/hardware_kerneltuning_netperf-slac/gliederung.txt           |  84
-rw-r--r--  2006/hardware_kerneltuning_netperf-slac/network_performance.mgp  | 236
-rw-r--r--  2006/hardware_kerneltuning_netperf-slac/network_performance.pdf  | bin 0 -> 21494 bytes
3 files changed, 320 insertions, 0 deletions
diff --git a/2006/hardware_kerneltuning_netperf-slac/gliederung.txt b/2006/hardware_kerneltuning_netperf-slac/gliederung.txt
new file mode 100644
index 0000000..ec51802
--- /dev/null
+++ b/2006/hardware_kerneltuning_netperf-slac/gliederung.txt
@@ -0,0 +1,84 @@
+
+- hardware selection is important
+  - linux runs on about anything from a cellphone to a mainframe
+  - good system performance depends on optimum selection of components
+  - sysadmins and managers have to understand the importance of hardware choice
+  - determine hardware needs before purchasing!
+
+- network usage patterns
+  - TCP server workload (web server, ftp server, samba, nfs-tcp)
+    - high-bandwidth TCP end-host performance
+  - UDP server workload (nfs udp)
+    - don't use it at gigabit speeds, data integrity problems!
+  - Router (Packet filter / IPsec / ...) workload
+    - packet forwarding has fundamentally different requirements
+    - none of the offloading tricks works in this case
+    - important limit: pps, not bandwidth! (see the sketch after this outline)
+
+- today's PC hardware
+  - CPU often is extremely fast
+    2GHz CPU: 0.5 ns clock cycle
+    L1/L2 cache access (four bytes): 2..3 clock cycles
+  - everything that is not in L1 or L2 cache is like a disk access
+    40..180 clock cycles on Opteron (DDR-333)
+    250..460 clock cycles on Xeon (DDR-333)
+  - I/O read
+    easily up to 3600 clock cycles for a register read on a NIC
+    this happens synchronously, no other work can be executed!
+  - disk access
+    don't talk about them ;)
+
+- hardware for high performance networking
+  - CPU
+    - cache
+      - as much cache as possible
+      - shared cache (in multi-core setups) is great
+    - SMP or not
+      - problem: increased code complexity
+      - problem: cache line ping-pong (on real SMP)
+      - depends on workload
+      - depends on number of interfaces!
+      - Pro: IPsec, tc, complex routing
+      - Con: NAT-only box
+  - RAM
+    - as fast as possible
+  - Bus architecture
+    - as few bridges as possible
+      host bridge, PCI-X / PCIe bridge + NIC chipset is enough!
+    - check bus speeds
+    - real interrupts (PCI, PCI-X) have lower latency than message-signalled interrupts (MSI)
+  - NIC selection
+    - NIC hardware
+      avoid additional bridges (four-port cards)
+      PCI-X: 64bit, highest clock rate if possible (133MHz)
+    - NIC driver support
+      - many optional features
+        checksum offload
+        scatter-gather DMA
+        segmentation offload (TSO/GSO)
+        interrupt flood behaviour (NAPI)
+      - is the vendor supportive of the developers?
+        Intel: e100/e1000 docs!
+      - is the vendor merging its patches mainline?
+        SysKonnect vs. Intel
+  - hard disk
+    - the kernel network stack is always 100% resident in RAM
+    - therefore, disk performance is not important for the network stack
+    - however, one hint:
+      - for SMTP servers, use battery-buffered RAM disks (Gigabyte)
+
+- tuning
+  - hardware related
+    - irq affinity
+
+  - firewall specific
+    - organize the ruleset in a tree shape rather than a linear list
+    - conntrack: hashsize / ip_conntrack_max
+    - log: don't use syslog, rather ulogd-1.x or 2.x
+  - local sockets
+    - SO_SNDBUF / SO_RCVBUF should be used by apps
+    - in recent 2.6.x kernels, they can override /proc/sys/net/ipv4/tcp_[rw]mem
+    - on long fat pipes, increase /proc/sys/net/ipv4/tcp_adv_win_scale
+  - core network stack
+    - disable rp_filter, it adds lots of per-packet routing lookups
+
+  - check linux-x.y.z/Documentation/networking/ip-sysctl.txt for more information
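
The router-workload point in the outline above — that the limit is pps, not bandwidth — follows directly from Ethernet framing overhead. A minimal sketch in C; the frame sizes and the 20 bytes of on-wire overhead (preamble, start-of-frame delimiter, inter-frame gap) are standard Ethernet figures, not numbers from the talk:

/* pps.c - why packet forwarding is pps-limited: packets per second
 * at gigabit line rate for several frame sizes.  Every frame carries
 * 20 bytes of fixed on-wire overhead: 7 bytes preamble, 1 byte
 * start-of-frame delimiter, 12 bytes inter-frame gap. */
#include <stdio.h>

int main(void)
{
    const double line_rate = 1e9;                /* 1 Gbit/s */
    const int overhead = 7 + 1 + 12;             /* bytes per frame on the wire */
    const int frame_sizes[] = { 64, 512, 1518 }; /* min..max Ethernet frame */
    unsigned int i;

    for (i = 0; i < sizeof(frame_sizes) / sizeof(frame_sizes[0]); i++) {
        double pps = line_rate / (8.0 * (frame_sizes[i] + overhead));
        printf("%4d byte frames: %8.0f pps\n", frame_sizes[i], pps);
    }
    return 0;
}

This prints about 1,488,000 pps for minimum-size frames versus about 81,000 pps for full-size frames: at the same line rate, a forwarding box has to do roughly 18 times the per-packet work, which is why routers hit a pps limit long before a bandwidth limit.
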
diff --git a/2006/hardware_kerneltuning_netperf-slac/network_performance.mgp b/2006/hardware_kerneltuning_netperf-slac/network_performance.mgp
new file mode 100644
index 0000000..303f527
--- /dev/null
+++ b/2006/hardware_kerneltuning_netperf-slac/network_performance.mgp
@@ -0,0 +1,236 @@
+%include "default.mgp"
+%default 1 bgrad
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+%nodefault
+%back "blue"
+
+%center
+%size 7
+Hardware Selection
+and Kernel Tuning
+for High Performance Networking
+
+Dec 07, 2006
+SLAC, Berlin
+
+%center
+%size 4
+by
+
+Harald Welte <laforge@gnumonks.org>
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+About the Speaker
+
+Who is speaking to you?
+  an independent Free Software developer
+  Linux kernel related consulting + development for 10 years
+  one of the authors of the Linux kernel packet filter
+  busy with enforcing the GPL at gpl-violations.org
+  working on Free Software for smartphones (openezx.org)
+  ...and Free Software for RFID (librfid)
+  ...and Free Software for ePassports (libmrtd)
+  ...and Free Hardware for RFID (openpcd.org, openbeacon.org)
+  ...and the world's first Open GSM Phone (openmoko.com)
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection is important
+
+Hardware selection is important
+  linux runs on about anything from a cellphone to a mainframe
+  good system performance depends on optimum selection of components
+  sysadmins and managers have to understand the importance of hardware choice
+  determine hardware needs before purchasing!
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network usage patterns
+
+Network usage patterns
+
+  TCP server workload (web server, ftp server, samba, nfs-tcp)
+    high-bandwidth TCP end-host performance
+  UDP server workload (nfs udp)
+    don't use it at gigabit speeds, data integrity problems!
+  Router (Packet filter / IPsec / ...) workload
+    packet forwarding has fundamentally different requirements
+    none of the offloading tricks works in this case
+    important limit: pps, not bandwidth!
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Contemporary PC hardware
+
+Contemporary PC hardware
+
+  CPU often is extremely fast
+    2GHz CPU: 0.5 ns clock cycle
+    L1/L2 cache access (four bytes): 2..3 clock cycles
+  everything that is not in L1 or L2 cache is like a disk access
+    40..180 clock cycles on Opteron (DDR-333)
+    250..460 clock cycles on Xeon (DDR-333)
+  I/O read
+    easily up to 3600 clock cycles for a register read on a NIC
+    this happens synchronously, no other work can be executed!
+  disk access
+    don't talk about it. Like getting a coke from the moon.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection
+
+Hardware selection
+  CPU
+    cache
+      as much cache as possible
+      shared cache (in multi-core setups) is great
+    SMP or not
+      problem: increased code complexity
+      problem: cache line ping-pong (on real SMP)
+      depends on workload
+      depends on number of interfaces!
+      Pro: IPsec, tc, complex routing
+      Con: NAT-only box
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection
+
+Hardware selection
+  RAM
+    as fast as possible
+    use chipsets with the highest possible speed
+    amd64 (Opteron, ...)
+      has a per-cpu memory controller
+      doesn't waste system bus bandwidth for RAM access
+    Intel
+      has a traditional 'shared system bus' architecture
+      RAM is system-wide and not per-CPU
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection
+
+Hardware selection
+  Bus architecture
+    as few bridges as possible
+      host bridge, PCI-X / PCIe bridge + NIC chipset is enough!
+    check bus speeds
+    real interrupts (PCI, PCI-X) have lower latency than message-signalled interrupts (MSI)
+    some boards use a PCIe chipset and then an additional PCIe-to-PCI-X bridge :(
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection
+
+Hardware selection
+  NIC selection
+    NIC hardware
+      avoid additional bridges (four-port cards)
+      PCI-X: 64bit, highest clock rate if possible (133MHz)
+    NIC driver support
+      many optional features
+        checksum offload
+        scatter-gather DMA
+        segmentation offload (TSO/GSO)
+        interrupt flood behaviour (NAPI)
+      is the vendor supportive of the developers?
+        Intel: e100/e1000 docs public!
+      is the vendor merging its patches mainline?
+        SysKonnect (bad) vs. Intel (good)
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection
+
+Hardware selection
+  hard disk
+    the kernel network stack is always 100% resident in RAM
+    therefore, disk performance is not important for the network stack
+    however, one hint:
+      for SMTP servers, use battery-buffered RAM disks (Gigabyte)
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network Stack Tuning
+
+Network Stack Tuning
+  hardware related
+    prevent multiple NICs from sharing one irq line
+      can be checked in /proc/interrupts
+      highly dependent on the specific mainboard/chipset
+    configure irq affinity (see the sketch below)
+      in an SMP system, interrupts can be bound to one CPU
+      irq affinity should be set to ensure all packets from one interface are handled on the same CPU (cache locality)
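
The irq affinity advice on the slide above boils down to writing a hexadecimal CPU bitmask to /proc/irq/<n>/smp_affinity. A minimal sketch, assuming the NIC's interrupt is IRQ 24 and should be pinned to CPU 1 — both values are illustrative and must be taken from /proc/interrupts on the actual machine:

/* irq_affinity.c - pin one interrupt line to one CPU so that all
 * packets of an interface are handled on the same CPU (cache locality).
 * /proc/irq/<n>/smp_affinity takes a hex bitmask of allowed CPUs. */
#include <stdio.h>

int main(void)
{
    const int irq = 24;            /* illustrative: NIC IRQ, see /proc/interrupts */
    const unsigned int mask = 0x2; /* bit 1 set: deliver this IRQ to CPU 1 only */
    char path[64];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return 1;
    }
    fprintf(f, "%x\n", mask);
    fclose(f);
    return 0;
}

Run it as root, then watch /proc/interrupts to confirm the counter for that IRQ only increases on the chosen CPU.
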
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network Stack Tuning
+
+Network Stack Tuning
+  32bit or 64bit kernel?
+    most contemporary x86 systems support x86_64
+    biggest advantage: larger address space for kernel memory
+    however, problem: all pointers are now 8 bytes instead of 4
+      thus, larger in-kernel data structures
+      thus, decreased cache efficiency
+      in packet forwarding applications, ca. 10% less performance
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network Stack Tuning
+
+Network Stack Tuning
+  firewall specific
+    organize the ruleset in a tree shape rather than a linear list
+    conntrack: hashsize / ip_conntrack_max
+    log: don't use syslog, rather ulogd-1.x or 2.x
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network Stack Tuning
+
+Network Stack Tuning
+  local sockets
+    SO_SNDBUF / SO_RCVBUF should be used by apps (see the sketch below)
+    in recent 2.6.x kernels, they can override /proc/sys/net/ipv4/tcp_[rw]mem
+    on long fat pipes, increase /proc/sys/net/ipv4/tcp_adv_win_scale
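
A minimal sketch of the SO_SNDBUF / SO_RCVBUF advice from the slide above. The 256 KiB value is illustrative; on a long fat pipe it should be derived from the bandwidth-delay product of the path:

/* sockbuf.c - request larger per-socket buffers instead of relying
 * on the system-wide tcp_rmem/tcp_wmem defaults. */
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int bufsize = 256 * 1024;   /* illustrative: ~bandwidth-delay product */

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    /* set before connect()/listen() so TCP can size its window */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) < 0)
        perror("SO_RCVBUF");
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) < 0)
        perror("SO_SNDBUF");
    return 0;
}

Note that Linux roughly doubles the requested value to account for bookkeeping overhead, and that the receive buffer must be set before the connection is established for TCP window scaling to be negotiated accordingly.
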
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network Stack Tuning
+
+Network Stack Tuning
+  core network stack
+    disable rp_filter, it adds lots of per-packet routing lookups (see the sketch below)
+    check linux-x.y.z/Documentation/networking/ip-sysctl.txt for more information
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Links
+
+Links
+  The Linux Advanced Routing and Traffic Control HOWTO
+    http://www.lartc.org/
+  The netdev mailing list
+    netdev@vger.kernel.org
diff --git a/2006/hardware_kerneltuning_netperf-slac/network_performance.pdf b/2006/hardware_kerneltuning_netperf-slac/network_performance.pdf
new file mode 100644
index 0000000..399cf5f
Binary files /dev/null and b/2006/hardware_kerneltuning_netperf-slac/network_performance.pdf differ
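
To round off the tuning slides: the rp_filter setting from the core-network-stack slide, as a minimal sketch. The /proc/sys paths are the standard 2.6.x ones; setting conf/all disables the check on existing interfaces, and conf/default makes newly appearing interfaces inherit it:

/* rp_filter_off.c - disable reverse-path filtering, which costs an
 * extra per-packet routing lookup on a forwarding box. */
#include <stdio.h>

static int write_sysctl(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");

    if (f == NULL) {
        perror(path);
        return -1;
    }
    fprintf(f, "%s\n", value);
    fclose(f);
    return 0;
}

int main(void)
{
    write_sysctl("/proc/sys/net/ipv4/conf/all/rp_filter", "0");
    write_sysctl("/proc/sys/net/ipv4/conf/default/rp_filter", "0");
    return 0;
}

The same helper works for the other sysctls mentioned above (tcp_adv_win_scale, ip_conntrack_max); Documentation/networking/ip-sysctl.txt in the kernel tree describes them all.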