Associating Network Flows with User and Application Information

Ralf Ackermann 1 , Utz Roedig 1 , Michael Zink 1 , Carsten Griwodz 1 , Ralf Steinmetz 1,2

1 - Darmstadt University of Technology - Industrial Process and System Communications (KOM)

Merckstr. 25 - 64283 Darmstadt, Germany

2 - German National Research Center for Information Technology - GMD IPSI

Dolivo-Str. 15 - 64293 Darmstadt, Germany

{Ralf.Ackermann, Utz.Roedig, Michael.Zink, Carsten.Griwodz, Ralf.Steinmetz}@KOM.tu-darmstadt.de

The concept of authenticating users, e.g. by means of a login process, is well established, and there is no doubt that it is necessary and helpful in a multiuser environment. Unfortunately, specific information about the user originating or receiving a data stream is no longer available at the network nodes the stream passes through. The same applies to the even more specific question of which application is being used. Routers, gateways and firewalls therefore usually have to base their classification of data on IP header inspection or have to try to extract information from the packet payload.

We present an approach that works transparently and makes it possible to associate user-specific and application-specific information with IP data streams by only slightly modifying components of the operating system environment and infrastructure components. On top of this framework we place usage scenarios for the targeted placement of copyright information in media content and for enhancing the interoperation with the security infrastructure.

I. Introduction and Motivation

In the Internet communication model, network nodes should use only the header information of their own layer to process (route, filter, interpret ...) the data. In practice, this strict layering is weakened at many points, and information that belongs to other layers is used to process the data. One example is the implementation of QoS routing functions in network nodes: routers, which normally should only know about the network layer, need application layer information to fulfil their tasks. One particular piece of information that should be available outside its original (application) layer is the identity of the originator or receiver of a data stream. Usually only the application layer should be aware of users, but many processes within a network also benefit from this information. Example scenarios in which such knowledge is very helpful include authentication at firewall systems, logging, admission control, billing, and the placement of copyright information.

In this paper we will describe an approach to map additional information to network streams and show its implications.

II. Existing Approaches

As described, the access to user and / or application relevant information within network nodes is helpful for a number of scenarios. There is an existing classification of information types and several approaches to obtain it.

A. User Information

Some network nodes (e.g. watermarking gateways, firewalls) have to map knowledge about user identities to the data flows to fulfil their tasks. A network node basically has two possibilities for doing that. First, the node can break the communication at the application layer and force the user to identify himself (explicit and active from the user's point of view). Second, the node can try to extract the information, if it is (still) present, by analyzing the application layer part of the communication flow (implicit and passive from the user's point of view). Both methods have drawbacks.

To avoid these drawbacks, out-of-band signalling can be used. The communication partners signal the user-relevant information in advance, before the communication flow itself is sent. With this method a standardized protocol can be used and the drawbacks mentioned above can be avoided. Other problems occur, though.

The method that we describe in this paper adds the necessary information directly to a network data flow.

B. Application Information

Some network nodes (e.g. QoS routers, firewalls) need information about the application that generates a data stream in order to process the data.

One important piece of information is the type of application that originates a specific flow. It can usually be obtained by analyzing the transport layer header (interpreting the port fields of the TCP/UDP header). In some cases, however, it is necessary to take into account that a logical session between two endpoints consists of several flows (see footnote 1). In this case the first flow normally uses static ports, and the node can extract the application type from the transport layer header. The subsequent flows are negotiated dynamically by data exchange on the first channel, so the intermediate network node has to retrieve this information from the first channel. The node often tries to treat all flows that an application uses as a single entity (for example, a firewall wants to authenticate a whole session and needs to know about the dependencies between the flows).
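The relation between flows and sessions can be illustrated with a small data structure. The following is only a minimal sketch and not part of the described system: the names `Flow` and `SessionTable`, the session identifier and the FTP-like port numbers are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Five-tuple that identifies a single flow (see footnote 1)."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP


class SessionTable:
    """Hypothetical helper that groups the flows of one logical session."""

    def __init__(self):
        self._session_of = {}              # Flow -> session id
        self._flows_of = defaultdict(set)  # session id -> set of Flow

    def open_session(self, session_id, control_flow):
        # The first flow uses static ports and opens the session.
        self._add(session_id, control_flow)

    def add_negotiated_flow(self, control_flow, new_flow):
        # A flow negotiated on the first channel joins the same session.
        session_id = self._session_of[control_flow]
        self._add(session_id, new_flow)
        return session_id

    def session_of(self, flow):
        return self._session_of.get(flow)

    def flows_of(self, session_id):
        return set(self._flows_of.get(session_id, ()))

    def _add(self, session_id, flow):
        self._session_of[flow] = session_id
        self._flows_of[session_id].add(flow)


# Illustrative values only: a control channel on a static port and a
# second channel whose ports were negotiated on the first one.
control = Flow("10.0.0.5", 40000, "192.0.2.10", 21, 6)
data = Flow("192.0.2.10", 20, "10.0.0.5", 40001, 6)

table = SessionTable()
table.open_session("session-1", control)
table.add_negotiated_flow(control, data)
```

With such a table, a node that has classified the first flow can apply the same decision to every flow that was later negotiated on it.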

The method that we describe in this paper also adds this application-related information directly to the network data flow.

III. Basic Approach

Our basic approach, shown in Figure 1, assumes that a marking procedure for network data streams is deployed and used at dedicated network nodes (usually end systems, but also gateways) of an administrative domain that is under our explicit control, since modifications have to be made for at least one communication partner. The network nodes that the traffic passes through (e.g. gateways or firewalls that form a dedicated crossing point for traffic entering or leaving the domain) make use of the information. The approach is also applicable to traffic that flows in the opposite direction (originating from sources outside the domain) if it can be associated with flows that carry mapped information (e.g. answers to retrieval requests or bi-directional TCP flows).
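How traffic in the opposite direction can be associated with an already marked flow may be sketched as a lookup on the reversed five-tuple. This is an illustrative assumption about one possible realization; `MarkTable`, `reverse` and the dictionary used as a mark are hypothetical names and not taken from the described system.

```python
from typing import NamedTuple, Optional


class Flow(NamedTuple):
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: int


def reverse(flow: Flow) -> Flow:
    """Return the five-tuple of the opposite direction of the same flow."""
    return Flow(flow.dst_addr, flow.dst_port,
                flow.src_addr, flow.src_port, flow.protocol)


class MarkTable:
    """Hypothetical table kept by a node at the domain border."""

    def __init__(self):
        self._marks = {}

    def learn(self, flow: Flow, mark: dict) -> None:
        # Called when a marked packet of an outgoing flow is observed.
        self._marks[flow] = mark

    def lookup(self, flow: Flow) -> Optional[dict]:
        # Unmarked reply traffic inherits the mark of the flow it answers.
        mark = self._marks.get(flow)
        if mark is None:
            mark = self._marks.get(reverse(flow))
        return mark


request = Flow("10.0.0.5", 40000, "192.0.2.10", 80, 6)
marks = MarkTable()
marks.learn(request, {"user": "alice", "application": "browser"})
reply_mark = marks.lookup(reverse(request))
```

The sketch only covers replies within the same flow; dynamically negotiated flows would additionally require knowledge about the session they belong to.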

We have to consider two cases: either the user has a strong interest in supplying and passing on this information, or at least does not actively refuse it (e.g. because it allows for better service or fair billing), or we have to enforce the use of the mechanisms and prevent participants from misusing or faking the information.

Figure 1: Basic description of the approach and concerned components

A. Placement of the marking information

There are several places at which the information can be placed for transmission. The approaches and their advantages and disadvantages are described in the following. Including the information in either the layer 2 or the layer 3 header gives intermediate nodes (router, proxy, firewall) fast access to it. In comparison, storing the additional data in the payload is more costly and requires more extensive modifications. Both approaches allow insertion and removal at either the endsystems or intermediate nodes, depending on the desired operating mode.

Information placement as part of the MAC-Header

Placing the information in an additional field of the MAC header is a very general approach. A similar technique is used, for example, for Label Switching [5]. This approach has the advantage that not only IP but also other layer 3 protocols (IPX, ...) can easily be supported. Additionally, it has proven to perform well in terms of packet processing speed in routers and switches, since only the MAC header must be examined to gather the desired information.
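As an illustration of such an additional field, the following sketch packs a tag into a 32-bit entry that follows the layout of an MPLS label stack entry (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL). Using the label value to carry a user or application tag is an assumption made for this example only; the function names are hypothetical.

```python
import struct


def pack_shim(label: int, tc: int = 0, bottom: bool = True, ttl: int = 64) -> bytes:
    """Pack a 20-bit label, 3-bit class, 1-bit flag and 8-bit TTL into 4 bytes."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit into 20 bits")
    if not 0 <= tc < 8:
        raise ValueError("traffic class must fit into 3 bits")
    if not 0 <= ttl < 256:
        raise ValueError("TTL must fit into 8 bits")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)  # network byte order


def unpack_shim(data: bytes):
    """Return (label, tc, bottom, ttl) from the first 4 bytes of data."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF


# Hypothetical tag value standing for one user / application class.
shim = pack_shim(0x12345)
```

A node that inspects only this fixed-size entry does not have to parse the layer 3 header, which is the performance argument made above.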

Information placement as part of the IP header

An alternative place to insert the information is the IP header. Since IPv6 is fully standardized and already used in some testbeds, we give a proposal for both IP versions, IPv4 and IPv6.
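For IPv4, one conceivable encoding is an IP option in the usual type / length / value form. The sketch below is not the proposal of this paper: the option number 30, the field sizes (32-bit user identifier, 16-bit application identifier) and the function names are placeholders assumed for illustration. Only the general option layout (one type octet with copied flag, class and number, one length octet that counts the whole option, padding of the options area to a multiple of four octets) follows the IPv4 specification.

```python
import struct

# Type octet: copied flag (1 bit), option class (2 bits), option number (5 bits).
COPIED, OPT_CLASS, OPT_NUMBER = 1, 0, 30  # placeholder values, assumed here
OPTION_TYPE = (COPIED << 7) | (OPT_CLASS << 5) | OPT_NUMBER
OPTION_LENGTH = 2 + 4 + 2  # type + length + user id + application id


def encode_mark_option(user_id: int, app_id: int) -> bytes:
    """Encode a hypothetical marking option, padded to a 4-octet boundary."""
    body = struct.pack("!IH", user_id, app_id)
    option = struct.pack("!BB", OPTION_TYPE, OPTION_LENGTH) + body
    padding = (-len(option)) % 4  # pad with End of Option List octets (0)
    return option + b"\x00" * padding


def decode_mark_option(option: bytes):
    """Return (user_id, app_id) or raise ValueError for other options."""
    if len(option) < OPTION_LENGTH:
        raise ValueError("option too short")
    opt_type, length = struct.unpack_from("!BB", option, 0)
    if opt_type != OPTION_TYPE or length != OPTION_LENGTH:
        raise ValueError("not a marking option")
    return struct.unpack_from("!IH", option, 2)


encoded = encode_mark_option(1001, 80)
```

Setting the copied flag in this sketch means that the option would be repeated in every fragment, so that an intermediate node can classify fragments individually.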

IV. Implementation Considerations

We distinguish between an endsystem-based and an infrastructure-based (e.g. by means of firewalls / gateways) implementation approach.

Endsystem based - Unix

Our approach inserts the information transparently for the user. A convenient way to do that is to modify the network stack or to pass all traffic through a dedicated (network) tunneling device. The implementation alternatives for different operating systems differ in their granularity as well as in the way they are carried out and the permissions they require.

Depending on the kind of system and the availability of sources, we can decide to modify and replace the kernel. Since different machines usually use different kernels, this approach is not very flexible and involves considerable additional effort. It is therefore considered more convenient to place the functionality in a shared library that either is pre-loaded to replace the system's libsocket whenever an application is started or replaces libsocket in general. Alternatively, a dedicated STREAMS module [8] that can be dynamically pushed into the communication stack is an option on systems that support it, e.g. Solaris.

Endsystem based - Windows

WinSock, the Microsoft Windows networking API, consists of a set of layers called "service providers". It is possible to install new service providers in the form of a Dynamic Link Library (DLL) between any two existing layers in the WinSock stack. The DLL's pathname must be entered in the Windows Registry [6]. All programs using the WinSock API invoke the new service provider automatically. This mechanism allows the creation of a new service provider that is responsible for performing the routines necessary to put user and application information into a layer 2 or layer 3 header.

Infrastructure Based Approach - Marking Gateways

In some cases, several hosts or even all hosts may not be extended (or extendable) as described above. In this case the packet marking process could be handled by a marking gateway. The following figure shows a possible scenario.

Figure 2: Infrastructure Based Approach

The marking gateway can be implemented in two different ways. First the gateway can use active or passive information gathering (as described above) to get user specific information. Then this information has to be added to the flows before they leave the gateway. All the mentioned marking techniques can be used for this purpose.

The second method is to summarize the subnets that include hosts which are not capable of marking. Data that leaves such a subnet is then generally marked with information that represents the subnet. In this case these flows can be identified in the other networks as originating from Subnet 1.
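The subnet-based marking can be sketched as a lookup from the source address of a packet to a mark that represents its subnet. The address ranges, the mark names and the function `subnet_mark` are assumptions made for this example.

```python
import ipaddress

# Hypothetical configuration of a marking gateway: subnets whose hosts
# cannot mark their own traffic, and the mark that represents each subnet.
SUBNET_MARKS = {
    ipaddress.ip_network("10.1.0.0/16"): "subnet-1",
    ipaddress.ip_network("10.2.0.0/16"): "subnet-2",
}


def subnet_mark(src_addr: str):
    """Return the mark of the most specific configured subnet, or None."""
    addr = ipaddress.ip_address(src_addr)
    matches = [net for net in SUBNET_MARKS if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return SUBNET_MARKS[best]


mark = subnet_mark("10.1.4.7")
```

Compared with per-user marking, this variant loses granularity: all hosts of a subnet share one mark, so downstream nodes can only distinguish subnets, not users.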

A. Protection and Security of the mapping information

Since the mapping of user or application information to data streams may be sensitive to spoofing, it can be protected cryptographically if the operating environment demands that (e.g. when working outside an "internally trusted company network"). One mechanism for doing so is a message authentication code based on a (e.g. predefined) shared secret. For now, we refer to the mechanisms used by Secure ONC RPC [9] and Security Enhanced SNMPv2 [10].
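A message authentication code over the marking information can be sketched with a keyed hash. The example below uses HMAC-SHA256 from the Python standard library; the choice of this algorithm, the layout "mark followed by tag" and the function names are assumptions for illustration and not the mechanisms of [9] or [10].

```python
import hashlib
import hmac

TAG_LENGTH = hashlib.sha256().digest_size  # 32 bytes


def protect(mark: bytes, secret: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with the shared secret."""
    tag = hmac.new(secret, mark, hashlib.sha256).digest()
    return mark + tag


def verify(protected: bytes, secret: bytes):
    """Return the mark if the tag is valid, otherwise None."""
    if len(protected) < TAG_LENGTH:
        return None
    mark, tag = protected[:-TAG_LENGTH], protected[-TAG_LENGTH:]
    expected = hmac.new(secret, mark, hashlib.sha256).digest()
    # compare_digest avoids leaking the position of the first mismatch
    return mark if hmac.compare_digest(tag, expected) else None


secret = b"predefined shared secret"  # illustrative value only
protected = protect(b"user=1001;app=80", secret)
```

Such a tag detects forged or modified marks but does not by itself prevent replay of a previously observed mark; binding the tag to packet or flow data would be needed for that.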

Our approach must and will be enhanced with regard to its security, but it is already viable for a number of environments with "cooperative participants".

V. Usage Scenarios

In this section we present a selection of scenarios that show how packet marking can be used for the placement of copyright or originator / retriever information in media content and for enhancing the interaction with the security infrastructure. The description is neither comprehensive nor fully representative and can be extended, e.g. by billing support as well as by support for the enforcement of single login and (user or application) class based policies.

A. Enabling the placement of copyright or originator / retriever information on the fly

In this scenario we assume library servers for pictures, audio/video data or special documents. These servers can e.g. be accessed via HTTP or by means of a streaming protocol. In case of many user requests (e.g. an electronic public library) there might be several servers for scalability reasons.

Watermarks [7] are one way to add copyright or originator / retriever information to the data that is downloaded or uploaded by the users. To be able to track the user in case of a copyright violation, the watermark should include user-specific information. Content that requires copyright protection must then be watermarked before it is sent to the receiver. This could be done by each of the library servers itself.

With our approach it is possible to place the watermarking information at dedicated points that the data traffic passes through, without modifying the original servers. The approach is not targeted at implementing dedicated watermarking mechanisms (which differ significantly, e.g. for packaged vs. streamed content and for different media) but makes use of them and parameterizes them. The parameterization information can be gathered either explicitly (e.g. because a user logs on to the service first) or implicitly by analyzing network traffic dependencies (e.g. TCP requests / replies).

The scenario can be adapted to many use cases, e.g. for tracing who brought certain data (pictures, documents) into an administrative domain. In this case the mechanism has to be deployed at the receiver side instead of the server side.

B. Firewall Interaction

Firewalls [1], [2] are specialized network nodes that perform access control at network borders. They consist of packet filters, "stateful filters", proxies or a combination of these. Based on the analysis of the traffic (passive or active information retrieval), the firewall decides whether packets may pass through.

If the marking approach is used, a firewall could benefit in the following ways from the information included in the flows:

User information:

Application information:

As we have shown, firewall systems would benefit in many respects from the marking approach. It would make it possible to build firewalls that have greater performance than existing firewalls without compromising security.

VI. Conclusion and Future Work

We have described an approach that attaches user or application specific information to network data streams and transmits it with them. Systems benefit considerably from this additionally available information. We consider the approach an "enabling mechanism" that can fulfill its potential especially in interaction with other existing and emerging technologies, which can be parameterized using it. The viability of the mechanisms has been determined by means of prototype implementations of the main components, which will be enhanced further.

VII. References

  1. D. B. Chapman: Building Internet Firewalls, O'Reilly, Cambridge, 1995.
  2. W. R. Cheswick, S. M. Bellovin: Firewalls and Internet Security, Addison Wesley, 1994.
  3. S. Kent, R. Atkinson: IP Authentication Header, Internet Request for Comments Nr. 2402, November 1998.
  4. S. Deering, R. Hinden: Internet Protocol, Version 6 (IPv6) Specification, Internet Request for Comments Nr. 2460, December 1998.
  5. D. Awduche, J. Malcolm, J. Agogbua, M. O'Dell, J. McManus: Requirements for Traffic Engineering Over MPLS, Internet Request for Comments Nr. 2702, September 1999.
  6. Wei Hua, Jim Ohlund, Barry Butterklee: Unraveling the Mysteries of Writing a Winsock 2 Layered Service Provider, Microsoft Systems Journal.
  7. Stefan Katzenbeisser, Fabien A. P. Petitcolas (Editors): Information hiding techniques for steganography and digital watermarking, Artech House Books, 1999.
  8. SunSoft: STREAMS Programmers Guide, November 1995.
  9. A. Chiu: Authentication Mechanisms for ONC RPC, Internet Engineering Task Force, May 1999.
  10. William Stallings: SNMP and SNMPv2: The Infrastructure for Network Management, IEEE Communications Magazine, vol. 36, no. 3, pp. 37-43, Mar 1998.

1. A flow is a single data stream (channel), identified by a tuple of characteristic values (source address, source port, destination address, destination port, protocol number). A session describes the association of multiple flows that together form an application's data stream.