diff --git a/2005/flow-accounting-ols2005/OLS2005/welte/welte.tex b/2005/flow-accounting-ols2005/OLS2005/welte/welte.tex
new file mode 100644
index 0000000..aeb461c
--- /dev/null
+++ b/2005/flow-accounting-ols2005/OLS2005/welte/welte.tex
@@ -0,0 +1,408 @@
+% The file must begin with this \documentclass declaration. You can
+% give one of three different options which control how picky LaTeX
+% is when typesetting:
+%
+% galley - All ``this doesn't fit'' warnings are suppressed, and
+% references are disabled (the key will be printed as a
+% reminder). Use this mode while writing.
+%
+% proof - All ``this doesn't fit'' warnings are active, as are
+% references. Overfull hboxes make ugly black blobs in
+% the margin. Use this mode to tidy up formatting after
+% you're done writing. (Same as article's ``draft'' mode.)
+%
+% final - As proof, but the ugly black blobs are turned off. Use
+% this to render PDFs or PostScript to give to other people,
+% when you're completely done. (As with article, this is the
+% default.)
+%
+% You can also use the leqno, fleqn, or openbib options to article.cls
+% if you wish. None of article's other options will work.
+
+%%%
+%%% PLEASE CHANGE 'galley' to 'final' BEFORE SUBMITTING. THANKS!
+%%% (to submit: "make clean" in the toplevel directory; tar and gzip *only* your directory;
+%%% email the gzipped tarball to papers@linuxsymposium.org.)
+%%%
+\documentclass[final]{ols}
+
+% These two packages allow easy handling of urls and identifiers per the example paper.
+\usepackage{url}
+\usepackage{zrl}
+
+% The following package is not required, but is a handy way to put PDF and EPS graphics
+% into your paper using the \includegraphics command.
+\ifpdf
+\usepackage[pdftex]{graphicx}
+\else
+\usepackage{graphicx}
+\fi
+
+
+% Here in the preamble, you may load additional packages, or
+% define whatever macros you like, with the following exceptions:
+%
+% - Do not mess with the page layout, either by hand or with packages
+% (e.g., typearea, geometry).
+% - Do not change the principal fonts, either by hand or with packages.
+% - Do not use \pagestyle, or load any page header-related packages.
+% - Do not redefine any commands having to do with article titles.
+% - If you are using something that is not part of the standard
+% tetex-2 distribution, please make a note of whether it's on CTAN,
+% or include a copy with your submission.
+%
+
+\begin{document}
+
+% Mandatory: article title specification.
+% Do not put line breaks or other clever formatting in \title or
+% \shortauthor; these are moving arguments.
+
+\title{Flow-based network accounting with Linux}
+\subtitle{ } % Subtitle is optional.
+\date{} % You can put a fixed date in if you wish,
+ % allow LaTeX to use the date of typesetting,
+ % or use \date{} to have no date at all.
+ % Whatever you do, there will not be a date
+ % shown in the proceedings.
+
+\shortauthor{Harald Welte} % Just you and your coauthors' names.
+% for example, \shortauthor{A.N.\ Author and A.\ Nother}
+% or perchance \shortauthor{Smith, Jones, Black, White, Gray, \& Greene}
+
+\author{% Authors, affiliations, and email addresses go here, like this:
+Harald Welte \\
+{\itshape netfilter core team / hmw-consulting.de / Astaro AG} \\
+{\ttfamily\normalsize laforge@netfilter.org}\\
+% \and
+% Bob \\
+% {\itshape Bob's affiliation.}\\
+% {\ttfamily\normalsize bob@example.com}\\
+} % end author section
+
+\maketitle
+
+\begin{abstract}
+% Article abstract goes here.
+\input{welte-abstract.tex}
+\end{abstract}
+
+% Body of your article goes here. You are mostly unrestricted in what
+% LaTeX features you can use; however, the following will not work:
+% \thispagestyle
+% \marginpar
+% table of contents
+% list of figures / tables
+% glossaries
+% indices
+
+\section{Network accounting}
+
+Network accounting generally describes the process of counting and potentially
+summarizing metadata of network traffic. The kind of metadata largely
+depends on the particular application, but usually includes data such as
+the number of packets, the number of bytes, and source and destination IP addresses.
+
+There are many reasons for doing accounting of network traffic, among them:
+
+\begin{itemize}
+\item billing based on transfer volume or bandwidth
+\item monitoring of network utilization, bandwidth distribution and link usage
+\item research, such as the distribution of traffic among protocols or the average packet size
+\end{itemize}
+
+\section{Existing accounting solutions for Linux}
+
+There are a number of existing packages for network accounting with Linux.
+The following subsections give a short overview of the most
+commonly used ones.
+
+
+\subsection{nacctd}
+
+\ident{nacctd}, also known as \ident{net-acct}, is probably the oldest known tool
+for network accounting under Linux (it also works on other Unix-like operating
+systems). The author of this paper has used
+\ident{nacctd} as an accounting tool as early as 1995. It was originally
+developed by Ulrich Callmeier, but apparently abandoned later on. The
+development seems to have continued in multiple branches, one of them being
+the netacct-mysql\footnote{http://netacct-mysql.gabrovo.com} branch,
+currently at version 0.79rc2.
+
+Its principle of operation is to use an \lident{AF_PACKET} socket
+via \ident{libpcap} in order to capture copies of all packets on configurable
+network interfaces. It then parses the TCP/IP headers of each packet.
+Summary information such as port numbers, IP addresses and number of bytes is
+then stored in an internal table, where successive packets of the
+same flow are aggregated. Table entries are eventually evicted and written to a
+human-readable ASCII file. Patches exist for sending information directly into
+SQL databases, or saving data in a machine-readable format.
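+The aggregation step can be sketched in C as a small hash table keyed by the
+connection 5-tuple. This is a minimal illustration, not \ident{nacctd}'s actual
+code: the names \texttt{flow\_account} and \texttt{flow\_flush}, the linear
+probing scheme and the fixed table size are assumptions, and header parsing is
+omitted, so the caller passes already-extracted header fields.
+
```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical 5-tuple identifying a flow (addresses in host byte order). */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct flow_entry {
    struct flow_key key;
    uint64_t packets, bytes;
    int used;                     /* slot occupied? */
};

#define FLOW_TABLE_SIZE 1024      /* illustrative; must be a power of two */

static struct flow_entry flow_table[FLOW_TABLE_SIZE];

static int flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport && a->proto == b->proto;
}

/* Simple multiplicative hash over all tuple fields. */
static unsigned flow_hash(const struct flow_key *k)
{
    uint32_t h = k->saddr * 2654435761u;
    h ^= k->daddr * 2246822519u;
    h ^= ((uint32_t)k->sport << 16 | k->dport) * 3266489917u;
    h ^= k->proto;
    return h & (FLOW_TABLE_SIZE - 1);
}

/* Add one packet of 'len' bytes to its flow, creating the flow if needed.
 * Collisions are resolved by linear probing; returns NULL if the table is full. */
struct flow_entry *flow_account(const struct flow_key *k, uint32_t len)
{
    unsigned start = flow_hash(k);
    for (unsigned n = 0; n < FLOW_TABLE_SIZE; n++) {
        struct flow_entry *e = &flow_table[(start + n) & (FLOW_TABLE_SIZE - 1)];
        if (!e->used) {               /* empty slot: start a new flow */
            e->key = *k;
            e->used = 1;
        } else if (!flow_key_equal(&e->key, k)) {
            continue;                 /* slot belongs to another flow */
        }
        e->packets++;
        e->bytes += len;
        return e;
    }
    return NULL;
}

static void print_ip(FILE *out, uint32_t a)
{
    fprintf(out, "%u.%u.%u.%u", (unsigned)(a >> 24), (unsigned)(a >> 16 & 255),
            (unsigned)(a >> 8 & 255), (unsigned)(a & 255));
}

/* Evict all flows, writing one human-readable ASCII line per flow.
 * Returns the number of flows written; the table is empty afterwards. */
unsigned flow_flush(FILE *out)
{
    unsigned count = 0;
    for (unsigned i = 0; i < FLOW_TABLE_SIZE; i++) {
        const struct flow_entry *e = &flow_table[i];
        if (!e->used)
            continue;
        print_ip(out, e->key.saddr);
        fprintf(out, ":%u -> ", (unsigned)e->key.sport);
        print_ip(out, e->key.daddr);
        fprintf(out, ":%u proto %u packets %llu bytes %llu\n",
                (unsigned)e->key.dport, (unsigned)e->key.proto,
                (unsigned long long)e->packets, (unsigned long long)e->bytes);
        count++;
    }
    memset(flow_table, 0, sizeof(flow_table));
    return count;
}
```
+
+Calling \texttt{flow\_flush} periodically corresponds to the eviction step. Note
+that \texttt{flow\_account} still runs once per captured packet, which is
+exactly the per-packet cost discussed in the next paragraph.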
+
+As a pcap-based solution, it suffers from the performance penalty of copying
+every full packet to userspace. As a packet-based solution, it suffers from
+the penalty of having to interpret every single packet.
+
+\subsection{ipt\_LOG based}
+
+The Linux packet filtering subsystem iptables offers a way to log policy
+violations via the kernel message ring buffer. This mechanism is called
+\ident{ipt_LOG} (or \texttt{LOG target}). Such messages are then further
+processed by \ident{klogd} and \ident{syslogd}, which put them into one or
+multiple system log files.
+
+As \ident{ipt_LOG} was designed for logging policy violations and not for
+accounting, its overhead is significant. Every packet needs to be
+interpreted in-kernel, then printed in ASCII format to the kernel message ring
+buffer, then copied from klogd to syslogd, and again copied into a text file.
+Even worse, most syslog installations are configured to write kernel log
+messages synchronously to disk, bypassing the usual write buffering of the block
+I/O layer and disk subsystem.
+
+To sum up and analyze the data, custom Perl scripts are often used. Those Perl
+scripts have to parse the LOG lines, build up a table of flows, add up the packet
+size fields and finally export the data in the desired format. Due to the
+inefficient storage format, performance is wasted once again at analysis time.
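+The parsing step those scripts perform can be illustrated in C (used here
+instead of Perl to keep all examples in one language). The sketch assumes the
+usual space-separated \texttt{KEY=value} layout of \ident{ipt_LOG} messages,
+with fields such as \texttt{SRC=}, \texttt{DST=}, \texttt{LEN=},
+\texttt{PROTO=}, \texttt{SPT=} and \texttt{DPT=}; the structure and function
+names are hypothetical. Only the first occurrence of each key is used, which
+belongs to the outer IP header.
+
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for the fields an accounting script needs. */
struct log_rec {
    char src[16], dst[16];   /* dotted-quad IPv4 addresses */
    char proto[8];           /* e.g. "TCP", "UDP", "ICMP" */
    unsigned len;            /* IP total length (first LEN= field) */
    unsigned sport, dport;   /* 0 if absent, e.g. for ICMP */
};

/* Return the text after "KEY=" where KEY starts a space-separated token,
 * or NULL if the key does not occur. */
static const char *log_field(const char *line, const char *key)
{
    size_t klen = strlen(key);
    for (const char *p = strstr(line, key); p; p = strstr(p + 1, key))
        if ((p == line || p[-1] == ' ') && p[klen] == '=')
            return p + klen + 1;
    return NULL;
}

/* Parse one ipt_LOG message; returns 0 on success, -1 if a mandatory field is missing. */
int parse_log_line(const char *line, struct log_rec *r)
{
    const char *v;

    memset(r, 0, sizeof(*r));
    if (!(v = log_field(line, "SRC")) || sscanf(v, "%15[0-9.]", r->src) != 1)
        return -1;
    if (!(v = log_field(line, "DST")) || sscanf(v, "%15[0-9.]", r->dst) != 1)
        return -1;
    if (!(v = log_field(line, "LEN")) || sscanf(v, "%u", &r->len) != 1)
        return -1;
    if (!(v = log_field(line, "PROTO")) || sscanf(v, "%7[A-Z0-9]", r->proto) != 1)
        return -1;
    /* Ports are optional: ICMP, for example, has none. */
    if ((v = log_field(line, "SPT")) && sscanf(v, "%u", &r->sport) != 1)
        r->sport = 0;
    if ((v = log_field(line, "DPT")) && sscanf(v, "%u", &r->dport) != 1)
        r->dport = 0;
    return 0;
}
```
+
+Every logged packet has to go through this string scanning at analysis time,
+on top of the formatting cost already paid in the kernel; the resulting records
+would then still have to be summed up per flow.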
+
+\subsection{ipt\_ULOG based (ulogd, ulog-acctd)}
+
+The iptables \texttt{ULOG target} is a more efficient version of
+the \texttt{LOG target} described above. Instead of copying ASCII messages via
+the kernel ring buffer, it can be configured to copy only the header of each
+packet, and to send those copies to userspace in large batches. A special
+userspace process, normally \ident{ulogd}, receives those partial packet copies
+and does further
+
+\ident{ulogd}\footnote{http://gnumonks.org/projects/ulogd} is intended for
+logging of security violations and thus resembles the functionality of LOG. It
+creates one logfile entry per packet. It supports logging in many formats,
+such as SQL databases or PCAP format.
+
+\ident{ulog-acctd}\footnote{http://alioth.debian.org/projects/pkg-ulog-acctd/}
+is a hybrid between \ident{ulogd} and \ident{nacctd}. It replaces the
+\ident{nacctd} libpcap/PF\_PACKET based capture with the more efficient
+ULOG mechanism.
+
+Compared to \ident{ipt_LOG}, \ident{ipt_ULOG} reduces the amount of copied data
+and required kernel/userspace context switches and thus improves performance.
+However, the whole mechanism is still intended for logging of security
+violations; using it for accounting is outside its design scope.
+
+\subsection{iptables based (ipac-ng)}
+
+Every packet filtering rule in the Linux packet filter (\ident{iptables}, or
+even its predecessor \ident{ipchains}) has two counters: number of packets and
+number of bytes matching this particular rule.
+
+By carefully placing rules without a target (so-called \textit{fallthrough}
+rules) in the packet filter ruleset, one can implement an accounting setup,
+e.g., one rule per customer.
+
+A number of tools exist to parse the iptables command output and summarize the
+counters. The most commonly used package is
+\ident{ipac-ng}\footnote{http://sourceforge.net/projects/ipac-ng/}. It
+supports advanced features such as storing accounting data in SQL databases.
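+Such tools have to extract the counters from the command's text output. A
+minimal C sketch of that step follows. It assumes the output of
+\texttt{iptables -L} \textit{chain} \texttt{-v -x -n} (\texttt{-x} prints
+exact counter values instead of abbreviated ones) and fallthrough rules that
+match only on a destination address, so that the last column identifies the
+customer. The function name is hypothetical, and this is not \ident{ipac-ng}'s
+code.
+
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of `iptables -L <chain> -v -x -n` output.
 * Returns 1 for a rule line (filling packets, bytes and the last column),
 * 0 for chain titles, column headers and blank lines. */
int parse_counter_line(const char *line, unsigned long long *pkts,
                       unsigned long long *bytes, char *last, size_t lastlen)
{
    assert(lastlen > 0);
    if (sscanf(line, "%llu %llu", pkts, bytes) != 2)
        return 0;

    /* Walk back from the end to find the last whitespace-separated token,
     * which is the destination column for a plain fallthrough rule. */
    const char *end = line + strlen(line);
    while (end > line && (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\n'))
        end--;
    const char *start = end;
    while (start > line && start[-1] != ' ' && start[-1] != '\t')
        start--;

    size_t n = (size_t)(end - start);
    if (n >= lastlen)
        n = lastlen - 1;          /* truncate rather than overflow */
    memcpy(last, start, n);
    last[n] = '\0';
    return 1;
}
```
+
+Header lines such as the chain title and the column names are recognized by
+the failing numeric conversion and skipped.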
+
+The approach works quite efficiently for small installations (i.e., a small
+number of accounting rules). However, since every packet is matched against the
+rules one after another, the cost grows with every rule added, so the accounting
+granularity has to remain very coarse. One counter for each single port number
+at any given IP address is certainly not feasible.
+
+\subsection{ipt\_ACCOUNT (iptaccount)}
+
+\ident{ipt_ACCOUNT}\footnote{http://www.intra2net.com/opensource/ipt\_account/}
+is a special-purpose iptables target developed by Intra2net AG and available
+from the netfilter project patch-o-matic-ng repository. It requires kernel
+patching and is not included in the mainline kernel.
+
+\ident{ipt_ACCOUNT} keeps byte counters per IP address in a given subnet, up to
+a '/8' network. Those counters can be read via a special \ident{iptaccount}
+commandline tool.
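+A flat array indexed by the host part of the address is one simple way to keep
+such per-address counters, and it makes the cost of large networks visible: a
+/8 already needs $2^{24}$ counters, i.e.\ 128\,MiB with 64-bit counters. The
+following C sketch uses hypothetical names and is not the actual
+\ident{ipt_ACCOUNT} implementation.
+
```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-address byte counters for one IPv4 network, in the spirit
 * of ipt_ACCOUNT but not its actual code. Addresses are in host byte order. */
struct net_acct {
    uint32_t  net;    /* network address, e.g. 192.168.0.0 */
    unsigned  prefix; /* prefix length; at least 8, i.e. at most a /8 */
    uint64_t *bytes;  /* one counter per address in the network */
};

/* Returns 0 on success, -1 for networks larger than /8 or allocation failure. */
int net_acct_init(struct net_acct *a, uint32_t net, unsigned prefix)
{
    if (prefix < 8 || prefix > 32)
        return -1;
    uint64_t size = UINT64_C(1) << (32 - prefix);  /* a /8 needs 2^24 counters */
    a->net = net & (uint32_t)~(size - 1);
    a->prefix = prefix;
    a->bytes = calloc((size_t)size, sizeof(*a->bytes));
    return a->bytes ? 0 : -1;
}

/* Charge 'len' bytes to 'addr'; returns -1 if addr is outside the network. */
int net_acct_add(struct net_acct *a, uint32_t addr, uint32_t len)
{
    uint64_t size = UINT64_C(1) << (32 - a->prefix);
    if (addr < a->net || (uint64_t)(addr - a->net) >= size)
        return -1;
    a->bytes[addr - a->net] += len;   /* O(1): offset inside the network */
    return 0;
}

void net_acct_free(struct net_acct *a)
{
    free(a->bytes);
    a->bytes = NULL;
}
```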
+
+Being limited to local network segments up to '/8' size, and only having per-IP
+granularity are two limitations that disqualify \ident{ipt_ACCOUNT}
+as a generic accounting mechanism. It is highly optimized, but also
+special-purpose.
+
+\subsection{ntop (including PF\_RING)}
+
+\ident{ntop}\footnote{http://www.ntop.org/ntop.html} is a network traffic
+probe to show network usage. It uses \ident{libpcap} to capture
+the packets, and then aggregates flows in userspace. On a fundamental level
+it is therefore similar to what \ident{nacctd} does.
+
+From the ntop project, there's also \ident{nProbe}, a network traffic probe
+that exports flow based information in Cisco NETFLOW v5/v9 format. It also
+contains support for the upcoming IETF IPFIX\footnote{IP Flow Information
+Export http://www.ietf.org/html.charters/ipfix-charter.html} format.
+
+To increase performance of the probe, the author (Luca Deri) has implemented
+\lident{PF_RING}\footnote{http://www.ntop.org/PF\_RING.html}, a new
+zero-copy mmap()ed implementation for packet capture. There is a libpcap
+compatibility layer on top, so any pcap-using application can benefit from
+\lident{PF_RING}.
+
+\lident{PF_RING} is a major performance improvement; for details, please refer
+to its documentation and the paper published by Luca Deri.
+
+However, \ident{ntop} / \ident{nProbe} / \lident{PF_RING} are all packet-based
+accounting solutions. Every packet needs to be analyzed by some userspace
+process, even if there is no copying involved. Thanks to the \lident{PF_RING}
+optimization, it is probably as efficient as this approach can get.
+
+\section{New ip\_conntrack based accounting}
+
+The fundamental idea is to (ab)use the connection tracking subsystem of the
+Linux 2.4.x / 2.6.x kernel for accounting purposes. There are several reasons
+why this is a good fit:
+\begin{itemize}
+\item It already keeps per-connection state information. Extending this information to contain a set of counters is easy.
+\item Many routers/firewalls are already running it, and are therefore already paying its performance penalty for security reasons. Bumping a couple of counters will introduce very little additional penalty.
+\item There was already an (out-of-tree) system to dump connection tracking information to userspace, called ctnetlink.
+\end{itemize}
+
+So given that a particular machine is already running \ident{ip_conntrack},
+adding flow-based accounting to it comes almost for free. I do not advocate the
+use of \ident{ip_conntrack} merely for accounting, since that would again be a
+waste of performance.
+
+\subsection{ip\_conntrack\_acct}
+
+\ident{ip_conntrack_acct} is the name of the in-kernel
+\ident{ip_conntrack} counters. There is a set of four
+counters: number of packets and number of bytes, for both the original and
+the reply direction of a given connection.
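+Conceptually, the accounting code adds one packet/byte counter pair per
+direction to each connection and updates it for every packet. The C sketch
+below models this with simplified stand-in types; the direction constants
+mirror the kernel's original/reply distinction, but all names are hypothetical
+and not the kernel's actual definitions.
+
```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins: one counter pair per direction of a connection. */
enum ct_dir { CT_DIR_ORIGINAL = 0, CT_DIR_REPLY = 1, CT_DIR_MAX = 2 };

struct ct_counter {
    uint64_t packets;
    uint64_t bytes;
};

struct ct_entry {
    /* tuple, timeout, protocol state etc. omitted */
    struct ct_counter counters[CT_DIR_MAX];   /* the four accounting counters */
};

/* Called for every packet that was matched to 'ct' in direction 'dir'. */
static inline void ct_acct_update(struct ct_entry *ct, enum ct_dir dir, uint32_t len)
{
    assert(dir < CT_DIR_MAX);
    ct->counters[dir].packets++;
    ct->counters[dir].bytes += len;
}
```
+
+The per-packet work is two additions on an entry that connection tracking has
+already looked up, which is why the additional overhead is small.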
+
+If you configure a recent kernel (2.6.9 or later), it will prompt you for
+\lident{CONFIG_IP_NF_CT_ACCT}. By enabling this configuration option, the
+per-connection counters will be added, and the accounting code will
+be compiled in.
+
+However, there is still no efficient means of reading out those counters. They
+can be accessed via \textit{cat /proc/net/ip\_conntrack}, but that is not a real
+solution: the kernel iterates over all connections and ASCII-formats the data.
+Also, it is a polling-based mechanism. If the polling interval is too long,
+connections might get evicted from the state table before their final counters
+have been read. If the interval is too short, performance will suffer.
+
+To counter this problem, a combination of conntrack notifiers and ctnetlink is being used.
+
+\subsection{conntrack notifiers}
+
+Conntrack notifiers use the core kernel notifier infrastructure
+(\texttt{struct notifier\_block}) to notify other parts of the
+kernel about connection tracking events. Such events include creation,
+deletion and modification of connection tracking entries.
+
+The \texttt{conntrack notifiers} can help us overcome the polling architecture.
+If we only listened to \textit{conntrack delete} events, we would always get
+the byte and packet counters at the end of a connection.
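+The pattern can be modelled in userspace C: listeners register a callback on a
+chain, every event is delivered to all of them, and an accounting listener
+simply ignores everything except destroy events. This is a simplified model
+with hypothetical names, loosely following the shape of the kernel's
+\texttt{struct notifier\_block} (a callback plus a \texttt{next} pointer); it is
+not the kernel API.
+
```c
#include <stddef.h>

/* Hypothetical event types modelled on conntrack events (not kernel values). */
enum ct_event { CT_EVENT_NEW, CT_EVENT_UPDATE, CT_EVENT_DESTROY };

/* Counters of a connection: index 0 = original, 1 = reply direction. */
struct ct_info {
    unsigned long long packets[2], bytes[2];
};

/* A chain link: callback plus pointer to the next listener. */
struct notifier {
    int (*call)(struct notifier *nb, enum ct_event ev, const struct ct_info *ct);
    struct notifier *next;
};

static struct notifier *chain_head;

void chain_register(struct notifier *nb)
{
    nb->next = chain_head;
    chain_head = nb;
}

/* Deliver one event to every registered listener. */
void chain_notify(enum ct_event ev, const struct ct_info *ct)
{
    for (struct notifier *nb = chain_head; nb; nb = nb->next)
        nb->call(nb, ev, ct);
}

/* Accounting listener: only DESTROY events matter, because in this model they
 * carry the final counters of a connection exactly once. */
static unsigned long long total_bytes;

static int acct_event(struct notifier *nb, enum ct_event ev, const struct ct_info *ct)
{
    (void)nb;
    if (ev != CT_EVENT_DESTROY)
        return 0;
    total_bytes += ct->bytes[0] + ct->bytes[1];
    return 1;
}

static struct notifier acct_notifier = { acct_event, NULL };
```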
+
+However, the events are in-kernel events and therefore not directly suitable
+for an accounting application to be run in userspace.
+
+\subsection{ctnetlink}
+
+\ident{ctnetlink} (short form for conntrack netlink) is a
+mechanism for passing connection tracking state information between kernel and
+userspace, originally developed by Jay Schulist and Harald Welte. As the name
+implies, it uses Linux \lident{AF_NETLINK} sockets as its underlying
+communication facility.
+
+The focus of \ident{ctnetlink} is to selectively read or dump
+entries from the connection tracking table to userspace. It also allows
+userspace processes to delete and create conntrack entries as well as
+\textit{conntrack expectations}.
+
+The initial nature of \ident{ctnetlink} is therefore again
+polling-based: a userspace process sends a request for certain information,
+and the kernel responds with the requested information.
+
+By combining \texttt{conntrack notifiers} with \ident{ctnetlink}, it is possible
+to register a notifier handler that in turn sends
+\ident{ctnetlink} event messages down the \lident{AF_NETLINK} socket.
+
+A userspace process can now listen for such \textit{DELETE} event messages at
+the socket, and put the counters into its accounting storage.
+
+There are still some shortcomings inherent to that \textit{DELETE} event
+scheme. First, we only know the amount of traffic after the connection is over.
+If a connection lasts for a long time (days or weeks, for example), then it is
+impossible to use this form of accounting for any kind of quota-based billing,
+where the user would be informed (or disconnected, traffic-shaped, etc.) upon
+exceeding the quota. Second, the conntrack entry does not contain information
+about when the connection started; only the timestamp of the end of the
+connection is known.
+
+To overcome the first limitation, the accounting process can use a combined
+event and polling scheme. The granularity of accounting can therefore be
+configured by the polling interval, and a compromise between performance and
+accuracy can be made.
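+A minimal C sketch of this combined scheme, with hypothetical names: the daemon
+remembers how many bytes of each connection it has already charged, and every
+poll result or final \textit{DELETE} event only charges the difference. It
+assumes the kernel counters only ever grow during the lifetime of a connection.
+
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state kept by the accounting daemon. */
struct acct_state {
    uint64_t charged_bytes;   /* bytes already attributed to the user */
};

/* Hypothetical per-user quota. */
struct user_quota {
    uint64_t used, limit;
};

/* Take a cumulative byte counter (from a poll or the final DELETE event)
 * and return only the part that has not been charged yet. */
uint64_t acct_charge(struct acct_state *s, uint64_t cumulative_bytes)
{
    assert(cumulative_bytes >= s->charged_bytes);   /* counters only grow */
    uint64_t delta = cumulative_bytes - s->charged_bytes;
    s->charged_bytes = cumulative_bytes;
    return delta;
}

/* Add bytes to a user's usage; returns 1 once the quota is exceeded. */
int quota_add(struct user_quota *q, uint64_t bytes)
{
    q->used += bytes;
    return q->used > q->limit;
}
```
+
+A quota overrun is thus detected no later than the next poll, which is the
+accuracy side of the trade-off described above.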
+
+To overcome the second limitation, the accounting process can also listen for
+\textit{NEW} event messages. By correlating the \textit{NEW} and
+\textit{DELETE} messages of a connection, accounting datasets containing the
+start and end of each connection can be built.
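+The correlation can be sketched in C as a table of pending connections that is
+filled on \textit{NEW} and consumed on \textit{DELETE}. The connection
+identifier, the names and the linear search are simplifying assumptions; a real
+daemon would rather key a hash table by the connection tuple.
+
```c
#include <stdint.h>
#include <time.h>

#define MAX_PENDING 256   /* illustrative capacity */

/* Connections seen in a NEW event whose DELETE has not arrived yet. */
struct pending_conn {
    uint32_t id;          /* hypothetical connection identifier */
    time_t   start;
    int      used;
};

static struct pending_conn pending_tab[MAX_PENDING];

/* Complete accounting dataset, built when the DELETE event arrives. */
struct acct_record {
    uint32_t id;
    time_t   start, end;
    uint64_t bytes;
};

/* NEW event: remember the start time. Returns 0, or -1 if the table is full. */
int on_new(uint32_t id, time_t now)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (!pending_tab[i].used) {
            pending_tab[i] = (struct pending_conn){ id, now, 1 };
            return 0;
        }
    return -1;
}

/* DELETE event: emit a record with start and end time. Returns -1 if no NEW
 * event was seen, e.g. for connections older than the accounting daemon. */
int on_delete(uint32_t id, time_t now, uint64_t bytes, struct acct_record *rec)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (pending_tab[i].used && pending_tab[i].id == id) {
            *rec = (struct acct_record){ id, pending_tab[i].start, now, bytes };
            pending_tab[i].used = 0;
            return 0;
        }
    return -1;
}
```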
+
+\subsection{ulogd2}
+
+As described earlier in this paper, \ident{ulogd} is a userspace
+packet filter logging daemon that is already used for packet-based accounting,
+even if it isn't the best fit.
+
+\ident{ulogd2}, also developed by the author of this paper, extends logging
+beyond per-packet information and also supports
+per-connection or per-flow data.
+
+Instead of supporting only \ident{ipt_ULOG} input and a number of
+interpreter and output plugins, \ident{ulogd2} supports a concept
+called \textit{plugin stacks}. Multiple stacks can exist within one daemon.
+Each such stack consists of plugins. A plugin can be a source, a sink or a
+filter.
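+The plugin-stack idea can be sketched in C with function pointers: the first
+plugin of a stack is a source producing records, and every following filter or
+sink either passes a record on or drops it. This is a simplified illustration
+with a hypothetical record type and hypothetical names, not the actual
+\ident{ulogd2} plugin API.
+
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record passed along a stack. */
struct record { uint32_t saddr; uint64_t bytes; };

struct plugin {
    const char *name;
    /* Source: fill *rec, return 1 if a record was produced, 0 at the end.
     * Filter/sink: return 1 to pass the record on, 0 to drop it. */
    int (*run)(struct plugin *self, struct record *rec);
    void *priv;   /* per-instance state */
};

/* plugins[0] is the source; the rest are filters followed by a sink. */
struct stack { struct plugin **plugins; size_t n; };

/* Pull every record from the source and push it down the stack; returns the
 * number of records that made it through all plugins. */
size_t stack_run(struct stack *s)
{
    struct record rec;
    size_t delivered = 0;
    while (s->plugins[0]->run(s->plugins[0], &rec)) {
        size_t i = 1;
        while (i < s->n && s->plugins[i]->run(s->plugins[i], &rec))
            i++;
        if (i == s->n)
            delivered++;
    }
    return delivered;
}

/* Example source: emits records from a fixed array. */
struct array_src { const struct record *recs; size_t n, pos; };

static int array_source(struct plugin *self, struct record *rec)
{
    struct array_src *a = self->priv;
    if (a->pos >= a->n)
        return 0;
    *rec = a->recs[a->pos++];
    return 1;
}

/* Example filter: drop records below a byte threshold. */
static int min_bytes_filter(struct plugin *self, struct record *rec)
{
    return rec->bytes >= *(const uint64_t *)self->priv;
}

/* Example sink: sum the bytes of everything that arrives. */
static int sum_sink(struct plugin *self, struct record *rec)
{
    *(uint64_t *)self->priv += rec->bytes;
    return 1;
}
```
+
+Because filters and sinks share one interface, an aggregation filter or an
+additional sink can be inserted into a stack without changing the other plugins.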
+
+Sources acquire per-packet or per-connection data from
+\ident{ipt_ULOG} or \ident{ip_conntrack_acct}.
+
+Filters allow the user to filter or aggregate information. Filtering is
+required, since there is no way to filter the ctnetlink event messages within
+the kernel: the event functionality is either enabled or not. Multiple
+connections can be aggregated into a larger, encompassing flow. Packets can be
+aggregated into flows (like \ident{nacctd} does), and flows can be aggregated
+into even larger flows.
+
+Sink plugins store the resulting data to some form of non-volatile storage,
+such as SQL databases, binary or ASCII files. Another sink is a NETFLOW or
+IPFIX sink, exporting information in an industry-standard format for flow-based accounting.
+
+\subsection{Status of implementation}
+
+\ident{ip_conntrack_acct} has been part of the mainline kernel since 2.6.9.
+
+\ident{ctnetlink} and the \texttt{conntrack event notifiers} are considered
+stable and will be submitted for mainline inclusion soon. Both are available
+from the patch-o-matic-ng repository of the netfilter project.
+
+At the time of writing, \ident{ulogd2} development
+was not yet finished. However, the ctnetlink event messages can already be
+dumped by the use of the \ident{conntrack} userspace program, available from the
+netfilter project.
+
+The \ident{conntrack} program can listen to the netlink event socket and dump the
+information in human-readable form (one ASCII line per ctnetlink message) to
+stdout. Custom accounting solutions can read this information from stdin,
+parse and process it according to their needs.
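+A custom accounting tool can be as small as a line parser fed from stdin. The
+C sketch below assumes that each event line starts with a tag such as
+\texttt{[NEW]} or \texttt{[DESTROY]}, followed by \texttt{src=}, \texttt{dst=},
+\texttt{packets=} and \texttt{bytes=} fields for the original and then the
+reply direction; the exact output of the \ident{conntrack} program may differ,
+and all names are hypothetical.
+
```c
#include <stdio.h>
#include <string.h>

enum ev_type { EV_UNKNOWN, EV_NEW, EV_UPDATE, EV_DESTROY };

struct ev_line {
    enum ev_type type;
    char src[16], dst[16];               /* original-direction addresses */
    unsigned long long packets, bytes;   /* summed over both directions */
};

/* Add up the numeric values of every "key<number>" field in the line. */
static unsigned long long sum_field(const char *line, const char *key)
{
    unsigned long long sum = 0, v;
    size_t klen = strlen(key);
    for (const char *p = strstr(line, key); p; p = strstr(p + klen, key))
        if (sscanf(p + klen, "%llu", &v) == 1)
            sum += v;
    return sum;
}

/* Parse one event line; returns 0 on success, -1 for unrecognized lines. */
int parse_event_line(const char *line, struct ev_line *ev)
{
    static const struct { const char *tag; enum ev_type type; } tags[] = {
        { "[NEW]", EV_NEW }, { "[UPDATE]", EV_UPDATE }, { "[DESTROY]", EV_DESTROY },
    };
    const char *p;

    memset(ev, 0, sizeof(*ev));
    while (*line == ' ' || *line == '\t')
        line++;
    for (size_t i = 0; i < sizeof(tags) / sizeof(tags[0]); i++)
        if (strncmp(line, tags[i].tag, strlen(tags[i].tag)) == 0)
            ev->type = tags[i].type;
    if (ev->type == EV_UNKNOWN)
        return -1;

    /* The first src=/dst= pair describes the original direction. */
    if ((p = strstr(line, "src=")))
        sscanf(p + 4, "%15[0-9.]", ev->src);
    if ((p = strstr(line, "dst=")))
        sscanf(p + 4, "%15[0-9.]", ev->dst);
    ev->packets = sum_field(line, "packets=");
    ev->bytes = sum_field(line, "bytes=");
    return 0;
}
```
+
+A driver would read stdin with \texttt{fgets()} in a loop and hand every
+\texttt{EV\_DESTROY} record to its accounting storage.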
+
+\section{Summary}
+
+Despite the large number of available accounting tools, the author is confident that inventing yet another one is worthwhile.
+
+Many existing implementations suffer from performance issues by design. Most
+of them are very special-purpose. nProbe/ntop together with \lident{PF_RING}
+are probably the most universal and efficient solution for any accounting
+problem.
+
+Still, the new mechanism based on \ident{ip_conntrack_acct} and \ident{ctnetlink}
+described in this paper has a clear performance advantage if you want to do
+accounting on your Linux-based stateful packet filter, which is a common
+case. The firewall is supposed to be at the edge of your network, exactly where
+you usually do accounting of ingress and/or egress traffic.
+
+\end{document}
+