Diffstat (limited to '2006/hardware_kerneltuning_netperf-slac')
 2006/hardware_kerneltuning_netperf-slac/gliederung.txt          |  84 ++++++
 2006/hardware_kerneltuning_netperf-slac/network_performance.mgp | 236 ++++++++++
 2006/hardware_kerneltuning_netperf-slac/network_performance.pdf | bin 0 -> 21494 bytes
 3 files changed, 320 insertions(+), 0 deletions(-)
diff --git a/2006/hardware_kerneltuning_netperf-slac/gliederung.txt b/2006/hardware_kerneltuning_netperf-slac/gliederung.txt
new file mode 100644
index 0000000..ec51802
--- /dev/null
+++ b/2006/hardware_kerneltuning_netperf-slac/gliederung.txt
@@ -0,0 +1,84 @@
+
+- hardware selection is important
+ - Linux runs on almost anything, from a cellphone to a mainframe
+ - good system performance depends on optimum selection of components
+ - sysadmins and managers have to understand the importance of hardware choice
+ - determine hardware needs before purchasing!
+
+- network usage patterns
+ - TCP server workload (web server, ftp server, samba, nfs-tcp)
+ - high-bandwidth TCP end-host performance
+ - UDP server workload (nfs udp)
+ - don't use it at gigabit speeds, data integrity problems!
+ - Router (Packet filter / IPsec / ... ) workload
+ - packet forwarding has fundamentally different requirements
+ - none of the offloading tricks works in this case
+ - important limit: pps, not bandwidth!
+
+- today's PC hardware
+ - CPU often is extremely fast
+ 2GHz CPU: 0.5 ns clock cycle
+ L1/L2 cache access (four bytes): 2..3 clock cycles
+ - everything that is not in L1 or L2 cache is like a disk access
+ 40..180 clock cycles on Opteron (DDR-333)
+ 250..460 clock cycles on Xeon (DDR-333)
+ - I/O read
+ easily up to 3600 clock cycles for a register read on NIC
+ this happens synchronously, no other work can be executed!
+ - disk access
+ don't talk about them ;)
+- hardware for high performance networking
+ - CPU
+ - cache
+ - as much cache as possible
+ - shared cache (in multi-core setup) is great
+ - SMP or not
+ - problem: increased code complexity
+ - problem: cache line ping-pong (on real SMP)
+ - depends on workload
+ - depends on number of interfaces!
+ - Pro: IPsec, tc, complex routing
+ - Con: NAT-only box
+ - RAM
+ - as fast as possible
+ - Bus architecture
+ - as few bridges as possible
+ - host bridge, PCI-X / PCIe bridge + NIC chipset is enough!
+ - check bus speeds
+ - real interrupts (PCI, PCI-X) have lower latency than message-signalled interrupts (MSI)
+ - NIC selection
+ - NIC hardware
+ avoid additional bridges (four-port cards)
+ PCI-X: 64bit, highest clock rate, if possible (133MHz)
+ - NIC driver support
+ - many optional features
+ checksum offload
+ scatter gather DMA
+ segmentation offload (TSO/GSO)
+ interrupt flood behaviour (NAPI)
+ - is the vendor supportive of the developers
+ - Intel: e100/e1000 docs !
+ - is the vendor merging its patches mainline?
+ - SysKonnect vs. Intel
+ - hard disk
+ - the kernel network stack is always 100% resident in RAM
+ - therefore, disk performance is not important for the network stack
+ - however, one hint:
+ - for SMTP servers, use battery-buffered RAM disks (Gigabyte)
+
+- tuning
+ - hardware related
+ - irq affinity
+
+ - firewall specific
+ - organize ruleset in tree shape rather than linear list
+ - conntrack: hashsize / ip_conntrack_max
+ - log: don't use syslog, rather ulogd-1.x or 2.x
+ - local sockets
+ - SO_SNDBUF / SO_RCVBUF should be used by apps
+ - in recent 2.6.x kernels, they can override /proc/sys/net/ipv4/tcp_[rw]mem
+ - on long fat pipes, increase /proc/sys/net/ipv4/tcp_adv_win_scale
+ - core network stack
+ - disable rp_filter, it adds lots of per-packet routing lookups
+
+ - check linux-x.y.z/Documentation/networking/ip-sysctl.txt for more information
diff --git a/2006/hardware_kerneltuning_netperf-slac/network_performance.mgp b/2006/hardware_kerneltuning_netperf-slac/network_performance.mgp
new file mode 100644
index 0000000..303f527
--- /dev/null
+++ b/2006/hardware_kerneltuning_netperf-slac/network_performance.mgp
@@ -0,0 +1,236 @@
+%include "default.mgp"
+%default 1 bgrad
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+%nodefault
+%back "blue"
+
+%center
+%size 7
+Hardware Selection
+and Kernel Tuning
+for High Performance Networking
+
+Dec 07, 2006
+SLAC, Berlin
+
+%center
+%size 4
+by
+
+Harald Welte <laforge@gnumonks.org>
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+About the Speaker
+
+Who is speaking to you?
+ an independent Free Software developer
+ Linux kernel related consulting + development for 10 years
+ one of the authors of Linux kernel packet filter
+ busy with enforcing the GPL at gpl-violations.org
+ working on Free Software for smartphones (openezx.org)
+ ...and Free Software for RFID (librfid)
+ ...and Free Software for ePassports (libmrtd)
+ ...and Free Hardware for RFID (openpcd.org, openbeacon.org)
+ ...and the world's first Open GSM Phone (openmoko.com)
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection is important
+
+Hardware selection is important
+ Linux runs on almost anything, from a cellphone to a mainframe
+ good system performance depends on optimum selection of components
+ sysadmins and managers have to understand the importance of hardware choice
+ determine hardware needs before purchasing!
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network usage patterns
+
+Network usage patterns
+
+ TCP server workload (web server, ftp server, samba, nfs-tcp)
+ high-bandwidth TCP end-host performance
+ UDP server workload (nfs udp)
+ don't use it at gigabit speeds, data integrity problems!
+ Router (Packet filter / IPsec / ... ) workload
+ packet forwarding has fundamentally different requirements
+ none of the offloading tricks works in this case
+ important limit: pps, not bandwidth!
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Contemporary PC hardware
+
+Contemporary PC hardware
+
+ CPU often is extremely fast
+ 2GHz CPU: 0.5 ns clock cycle
+ L1/L2 cache access (four bytes): 2..3 clock cycles
+ everything that is not in L1 or L2 cache is like a disk access
+ 40..180 clock cycles on Opteron (DDR-333)
+ 250..460 clock cycles on Xeon (DDR-333)
+ I/O read
+ easily up to 3600 clock cycles for a register read on NIC
+ this happens synchronously, no other work can be executed!
+ disk access
+ don't talk about it. Like getting a coke from the moon.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection
+
+Hardware selection
+ CPU
+ cache
+ as much cache as possible
+ shared cache (in multi-core setup) is great
+ SMP or not
+ problem: increased code complexity
+ problem: cache line ping-pong (on real SMP)
+ depends on workload
+ depends on number of interfaces!
+ Pro: IPsec, tc, complex routing
+ Con: NAT-only box
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection
+
+Hardware selection
+ RAM
+ as fast as possible
+ use chipsets with highest possible speed
+ amd64 (Opteron, ...)
+ has a per-CPU memory controller
+ doesn't waste system bus bandwidth on RAM access
+ Intel
+ has a traditional 'shared system bus' architecture
+ RAM is system-wide and not per-CPU
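+
+ a minimal sketch for inspecting the per-node memory layout on such a
+ box (assumes the numactl package is installed; the daemon path is a
+ placeholder, not a real binary):
+
+ # show NUMA nodes, their CPUs and local memory sizes
+ numactl --hardware
+ # pin a network daemon to node 0's CPUs and local RAM (hypothetical daemon)
+ numactl --cpubind=0 --membind=0 /usr/sbin/mydaemon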
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection
+
+Hardware selection
+ Bus architecture
+ as few bridges as possible
+ host bridge, PCI-X / PCIe bridge + NIC chipset is enough!
+ check bus speeds
+ real interrupts (PCI, PCI-X) have lower latency than message-signalled interrupts (MSI)
+ some boards use PCIe chipset and then additional PCIe-to-PCI-X bridge :(
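+
+ a minimal sketch for checking the bridge topology of a board before
+ buying a batch of them (pciutils assumed installed):
+
+ # show the PCI device tree: every level of nesting is one more bridge
+ lspci -t
+ # verbose output includes per-device bus capabilities (width, 66MHz, ...)
+ lspci -vv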
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection
+
+Hardware selection
+ NIC selection
+ NIC hardware
+ avoid additional bridges (four-port cards)
+ PCI-X: 64bit, highest clock rate, if possible (133MHz)
+ NIC driver support
+ many optional features
+ checksum offload
+ scatter gather DMA
+ segmentation offload (TSO/GSO)
+ interrupt flood behaviour (NAPI)
+ is the vendor supportive of the developers
+ Intel: e100/e1000 docs public!
+ is the vendor merging its patches mainline?
+ SysKonnect (bad) vs. Intel (good)
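+
+ a minimal sketch for inspecting and toggling the optional features
+ from userspace (ethtool assumed available, eth0 as example interface):
+
+ # list which offloads the driver supports and has enabled
+ ethtool -k eth0
+ # enable scatter-gather DMA, TCP segmentation offload and checksumming
+ ethtool -K eth0 sg on
+ ethtool -K eth0 tso on
+ ethtool -K eth0 tx on rx on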
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Hardware selection
+
+Hardware selection
+ hard disk
+ the kernel network stack is always 100% resident in RAM
+ therefore, disk performance is not important for the network stack
+ however, one hint:
+ for SMTP servers, use battery-buffered RAM disks (Gigabyte)
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network Stack Tuning
+
+Network Stack Tuning
+ hardware related
+ prevent multiple NICs from sharing one irq line
+ can be checked in /proc/interrupts
+ highly dependent on specific mainboard/chipset
+ configure irq affinity
+ in an SMP system, interrupts can be bound to one CPU
+ irq affinity should be set to assure all packets from one interface are handled on same CPU (cache locality)
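+
+ a minimal sketch of the two steps above (IRQ numbers 16/17 and the
+ CPU masks are examples; check /proc/interrupts on the actual box):
+
+ # see which IRQ line each NIC uses and whether a line is shared
+ cat /proc/interrupts
+ # bind IRQ 16 to CPU0 only (the value is a hexadecimal CPU bitmask)
+ echo 1 > /proc/irq/16/smp_affinity
+ # bind the second NIC's IRQ 17 to CPU1
+ echo 2 > /proc/irq/17/smp_affinity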
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network Stack Tuning
+
+Network Stack Tuning
+ 32bit or 64bit kernel?
+ most contemporary x86 systems support x86_64
+ biggest advantage: larger address space for kernel memory
+ however, problem: all pointers are now 8 bytes instead of 4
+ thus, larger in-kernel data structures
+ thus, decreased cache efficiency
+ in packet forwarding applications, approx. 10% less performance
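+
+ a minimal sketch for checking what a given box currently runs:
+
+ # machine architecture of the running kernel (x86_64 vs. i686)
+ uname -m
+ # pointer width seen by userspace, in bits
+ getconf LONG_BIT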
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network Stack Tuning
+
+Network Stack Tuning
+ firewall specific
+ organize ruleset in tree shape rather than linear list
+ conntrack: hashsize / ip_conntrack_max
+ log: don't use syslog, rather ulogd-1.x or 2.x
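+
+ a minimal sketch of the three points above (interface name, chain
+ name and table sizes are examples, not recommendations):
+
+ # tree shape: classify once per interface instead of testing
+ # every rule linearly against every packet
+ iptables -N from_dmz
+ iptables -A FORWARD -i eth1 -j from_dmz
+ iptables -A from_dmz -p tcp --dport 25 -j ACCEPT
+ # conntrack: hash size is a module parameter, max entries via /proc
+ modprobe ip_conntrack hashsize=32768
+ echo 262144 > /proc/sys/net/ipv4/ip_conntrack_max
+ # logging: the ULOG target hands packets to ulogd instead of syslog
+ iptables -A from_dmz -j ULOG --ulog-nlgroup 1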
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network Stack Tuning
+
+Network Stack Tuning
+ local sockets
+ SO_SNDBUF / SO_RCVBUF should be used by apps
+ in recent 2.6.x kernels, they can override /proc/sys/net/ipv4/tcp_[rw]mem
+ on long fat pipes, increase /proc/sys/net/ipv4/tcp_adv_win_scale
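+
+ a minimal sketch of the tunables above (the values are examples only,
+ sized for a high-bandwidth, high-latency path):
+
+ # limits used when apps do not set SO_SNDBUF/SO_RCVBUF themselves:
+ # min, default and max buffer size in bytes
+ echo "4096 87380 4194304" > /proc/sys/net/ipv4/tcp_rmem
+ echo "4096 65536 4194304" > /proc/sys/net/ipv4/tcp_wmem
+ # larger values advertise a bigger share of the buffer as TCP window
+ echo 4 > /proc/sys/net/ipv4/tcp_adv_win_scale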
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Network Stack Tuning
+
+Network Stack Tuning
+ core network stack
+ disable rp_filter, it adds lots of per-packet routing lookups
+ check linux-x.y.z/Documentation/networking/ip-sysctl.txt for more information
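+
+ a minimal sketch, disabling rp_filter on all present interfaces:
+
+ # rp_filter costs one extra routing lookup per packet
+ for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
+     echo 0 > $f
+ done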
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%page
+Network Performance & Tuning
+Links
+
+Links
+ The Linux Advanced Routing and Traffic Control HOWTO
+ http://www.lartc.org/
+ The netdev mailinglist
+ netdev@vger.kernel.org
+
diff --git a/2006/hardware_kerneltuning_netperf-slac/network_performance.pdf b/2006/hardware_kerneltuning_netperf-slac/network_performance.pdf
new file mode 100644
index 0000000..399cf5f
--- /dev/null
+++ b/2006/hardware_kerneltuning_netperf-slac/network_performance.pdf
Binary files differ