A tutorial on internet protocols and tunneling (VPN) from http://backreference.org/2010/03/26/tuntap-interface-tutorial/
Posted by waldner on 26 March 2010, 1:54 pm

Foreword: please note that the code available here is only for demonstration purposes. If you want to be serious, you'll have to make it more robust and integrate it with other code. Also, the description is by no means a definitive reference on the subject, but rather the result of my experimentation. Please report any bug or error you find in the code or otherwise in this article. Thanks.

Link to the source tarball described in the article: simpletun.

Update 18/07/2010: Thanks to this post, I've learned that recent versions of iproute2 can (finally) create tun/tap devices, although the functionality is (still?) blissfully undocumented. Thus, installing tunctl (UML utilities) or OpenVPN just to be able to create tun devices is no longer needed. The following is with iproute2-2.6.34:

    # ip tuntap help
    Usage: ip tuntap { add | del } [ dev PHYS_DEV ]
              [ mode { tun | tap } ] [ user USER ] [ group GROUP ]
              [ one_queue ] [ pi ] [ vnet_hdr ]

    Where: USER  := { STRING | NUMBER }
           GROUP := { STRING | NUMBER }

Tun/tap interfaces are a feature offered by Linux (and probably by other UNIX-like operating systems) that can do userspace networking, that is, allow userspace programs to see raw network traffic (at the ethernet or IP level) and do whatever they like with it. This document attempts to explain how tun/tap interfaces work under Linux, with some sample code to demonstrate their usage.

How it works

Tun/tap interfaces are software-only interfaces, meaning that they exist only in the kernel and, unlike regular network interfaces, they have no physical hardware component (and so there's no physical "wire" connected to them). You can think of a tun/tap interface as a regular network interface that, when the kernel decides that the moment has come to send data "on the wire", instead sends data to some userspace program that is attached to the interface (using a specific procedure, see below).
When the program attaches to the tun/tap interface, it gets a special file descriptor; reading from it returns the data that the interface is sending out. In a similar fashion, the program can write to this special descriptor, and the data (which must be properly formatted, as we'll see) will appear as input to the tun/tap interface. To the kernel, it looks as if the tun/tap interface is receiving data "from the wire".

The difference between a tap interface and a tun interface is that a tap interface outputs (and must be given) full ethernet frames, while a tun interface outputs (and must be given) raw IP packets (and no ethernet headers are added by the kernel). Whether an interface functions like a tun interface or like a tap interface is specified with a flag when the interface is created.

The interface can be transient, meaning that it's created, used and destroyed by the same program; when the program terminates, even if it doesn't explicitly destroy the interface, the interface ceases to exist. Another option (the one I prefer) is to make the interface persistent; in this case, it is created using a dedicated utility (like tunctl or openvpn --mktun), and then normal programs can attach to it; when they do so, they must connect using the same type (tun or tap) used to originally create the interface, otherwise they will not be able to attach. We'll see how that is done in the code.

Once a tun/tap interface is in place, it can be used just like any other interface, meaning that IP addresses can be assigned, its traffic can be analyzed, firewall rules can be created, routes pointing to it can be established, etc.

With this knowledge, let's try to see how we can use a tun/tap interface and what can be done with it.
Creating the interface The code to create a brand new interface and to (re)attach to a persistent interface is essentially the same; the difference is that the former must be run by root (well, more precisely, by a user with the CAP_NET_ADMIN capability), while the latter can be run by an ordinary user if certain conditions are met. Let's start with the creation of a new interface. First, whatever you do, the device /dev/net/tun must be opened read/write. That device is also called the clone device, because it's used as a starting point for the creation of any tun/tap virtual interface. The operation (as with any open() call) returns a file descriptor. But that's not enough to start using it to communicate with the interface. The next step in creating the interface is issuing a special ioctl() system call, whose arguments are the descriptor obtained in the previous step, the TUNSETIFF constant, and a pointer to a data structure containing the parameters describing the virtual interface (basically, its name and the desired operating mode - tun or tap). As a variation, the name of the virtual interface can be left unspecified, in which case the kernel will pick a name by trying to allocate the "next" device of that kind (for example, if tap2 already exists, the kernel will try to allocate tap3, and so on). All of this must be done by root (or by a user with the CAP_NET_ADMIN capability - I won't repeat that again; assume it applies everywhere I say "must be run by root"). If the ioctl() succeeds, the virtual interface is created and the file descriptor we had is now associated to it, and can be used to communicate. At this point, two things can happen. The program can start using the interface right away (probably configuring it with at least an IP address before), and, when it's done, terminate and destroy the interface. 
The other option is to issue a couple of other special ioctl() calls to make the interface persistent, and terminate leaving it in place for other programs to attach to it. This is what programs like tunctl or openvpn --mktun do, for example. These programs usually can also optionally set the ownership of the virtual interface to a non-root user and/or group, so programs running as non-root but with the appropriate privileges can attach to the interface later. We'll come back to this below.

The basic code used to create a virtual interface is shown in the file Documentation/networking/tuntap.txt in the kernel source tree. Modifying it a bit, we can write a barebone function that creates a virtual interface:

    #include <fcntl.h>         /* open(), O_RDWR */
    #include <string.h>        /* memset(), strncpy(), strcpy() */
    #include <unistd.h>        /* close() */
    #include <sys/ioctl.h>     /* ioctl() */
    #include <sys/socket.h>    /* struct sockaddr, needed by linux/if.h */
    #include <linux/if.h>      /* struct ifreq, IFNAMSIZ */
    #include <linux/if_tun.h>  /* TUNSETIFF, IFF_TUN, IFF_TAP, IFF_NO_PI */

    int tun_alloc(char *dev, int flags) {

      struct ifreq ifr;
      int fd, err;
      char *clonedev = "/dev/net/tun";

      /* Arguments taken by the function:
       *
       * char *dev: the name of an interface (or '\0'). MUST have enough
       *   space to hold the interface name if '\0' is passed
       * int flags: interface flags (eg, IFF_TUN etc.)
       */

      /* open the clone device */
      if( (fd = open(clonedev, O_RDWR)) < 0 ) {
        return fd;
      }

      /* preparation of the struct ifr, of type "struct ifreq" */
      memset(&ifr, 0, sizeof(ifr));

      ifr.ifr_flags = flags;   /* IFF_TUN or IFF_TAP, plus maybe IFF_NO_PI */

      if (*dev) {
        /* if a device name was specified, put it in the structure; otherwise,
         * the kernel will try to allocate the "next" device of the
         * specified type. Copy at most IFNAMSIZ - 1 bytes so that the
         * zeroed last byte keeps the name NUL-terminated */
        strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);
      }

      /* try to create the device */
      if( (err = ioctl(fd, TUNSETIFF, (void *) &ifr)) < 0 ) {
        close(fd);
        return err;
      }

      /* if the operation was successful, write back the name of the
       * interface to the variable "dev", so the caller can know
       * it. Note that the caller MUST reserve space in *dev (see calling
       * code below) */
      strcpy(dev, ifr.ifr_name);

      /* this is the special file descriptor that the caller will use to talk
       * with the virtual interface */
      return fd;
    }

The tun_alloc() function takes two parameters:

char *dev contains the name of an interface (for example, tap0, tun2, etc.). Any name can be used, though it's probably better to choose a name that suggests which kind of interface it is. In practice, names like tunX or tapX are usually used. If *dev is '\0', the kernel will try to create the "first" available interface of the requested type (eg, tap0, but if that already exists, tap1, and so on).

int flags contains the flags that tell the kernel which kind of interface we want (tun or tap). Basically, it can either take the value IFF_TUN to indicate a TUN device (no ethernet headers in the packets), or IFF_TAP to indicate a TAP device (with ethernet headers in packets).

Additionally, another flag IFF_NO_PI can be ORed with the base value. IFF_NO_PI tells the kernel to not provide packet information: packets will then be "pure" IP packets (or ethernet frames), with no added bytes. Otherwise (if IFF_NO_PI is unset), 4 extra bytes are added to the beginning of the packet (2 flag bytes and 2 protocol bytes). IFF_NO_PI need not match between interface creation and reconnection time. Also note that when capturing traffic on the interface with Wireshark, those 4 bytes are never shown.

A program can thus use the following code to create a device:

    char tun_name[IFNAMSIZ];
    char tap_name[IFNAMSIZ];
    char *a_name;

    ...
strcpy(tun_name, "tun1"); tunfd = tun_alloc(tun_name, IFF_TUN); /* tun interface */ strcpy(tap_name, "tap44"); tapfd = tun_alloc(tap_name, IFF_TAP); /* tap interface */ a_name = malloc(IFNAMSIZ); a_name[0]='\0'; tapfd = tun_alloc(a_name, IFF_TAP); /* let the kernel pick a name */ At this point, as said before, the program can either use the interface as is for its purposes, or it can set it persistent (and optionally assign ownership to a specific user/group). If it does the former, there's not much more to be said. But if it does the latter, here's what happens. Two additional ioctl()s are available, which are usually used together. The first syscall can set (or remove) the persistent status on the interface. The second allows assigning ownership of the interface to a regular (non-root) user. Both features are implemented in the programs tunctl (part of UML utilities) and openvpn --mktun (and probably others). Let's examine the tunctl code since it's simpler, keeping in mind that it only creates tap interfaces, as those are what user mode linux uses (code slightly edited and simplified for clarity): ... 
/* "delete" is set if the user wants to delete (ie, make nonpersistent) an existing interface; otherwise, the user is creating a new interface */ if(delete) { /* remove persistent status */ if(ioctl(tap_fd, TUNSETPERSIST, 0) < 0){ perror("disabling TUNSETPERSIST"); exit(1); } printf("Set '%s' nonpersistent\n", ifr.ifr_name); } else { /* emulate behaviour prior to TUNSETGROUP */ if(owner == -1 && group == -1) { owner = geteuid(); } if(owner != -1) { if(ioctl(tap_fd, TUNSETOWNER, owner) < 0){ perror("TUNSETOWNER"); exit(1); } } if(group != -1) { if(ioctl(tap_fd, TUNSETGROUP, group) < 0){ perror("TUNSETGROUP"); exit(1); } } if(ioctl(tap_fd, TUNSETPERSIST, 1) < 0){ perror("enabling TUNSETPERSIST"); exit(1); } if(brief) printf("%s\n", ifr.ifr_name); else { printf("Set '%s' persistent and owned by", ifr.ifr_name); if(owner != -1) printf(" uid %d", owner); if(group != -1) printf(" gid %d", group); printf("\n"); } } ... These additional ioctl()s must still be run by root. But what we have now is a persistent interface owned by a specific user, so processes running as that user can successfully attach to it. As said, it turns out that the code to (re)attach to an existing tun/tap interface is the same as the code used to create it; in other words, tun_alloc() can again be used. When doing so, for it to be successful three things must happen: The interface must exist already and be owned by the same user that is attempting to connect (and probably be persistent) the user must have read/write permissions on /dev/net/tun The flags provided must match those used to create the interface (eg if it was created with IFF_TUN then the same flag must be used when reattaching) This is possible because the kernel allows the TUNSETIFF ioctl() to succeed if the user issuing it specifies the name of an already existing interface and he is the owner of the interface. In this case, no new interface has to be created, so a regular user can successfully perform the operation. 
So this is an attempt to explain what happens when ioctl(TUNSETIFF) is called, and how the kernel differentiates between the request for the allocation of a new interface and the request to connect to an existing interface: If a non-existent or no interface name is specified, that means the user is requesting the allocation of a new interface. The kernel thus creates an interface using the given name (or picking the next available name if an empty name was given). This works only if done by root. If the name of an existing interface is specified, that means the user wants to connect to a previously allocated interface. This can be done by a normal user, provided that: the user has appropriate rights on the clone device AND is the owner of the interface (set at creation time), AND the specified mode (tun or tap) matches the mode set at creation time. You can have a look at the code that implements the above steps in the file drivers/net/tun.c in the kernel source; the important functions are tun_attach(), tun_net_init(), tun_set_iff(), tun_chr_ioctl(); this last function also implements the various ioctl()s available, including TUNSETIFF, TUNSETPERSIST, TUNSETOWNER, TUNSETGROUP and others. In any case, no non-root user is allowed to configure the interface (ie, assign an IP address and bring it up), but this is true of any regular interface too. The usual methods (suid binary wrapper, sudo, etc.) can be used if a non-root user needs to do some operation that requires root privileges. This is a possible usage scenario (one I use all the time): The virtual interfaces are created, made persistent, assigned to an user, and configured by root (for example, by initscripts at boot time, using tunctl or equivalent) The regular users can then attach and detach as many times as they wish from virtual interfaces that they own. 
The virtual interfaces are destroyed by root, for example by scripts run at shutdown time, perhaps using tunctl -d or equivalent Let's try it After this lengthy but necessary introduction, it's time to do some work with it. So, since this is a normal interface, we can use it as we would another regular interface. For our purposes, there is no difference between tun and tap interfaces; it's the program that creates or attaches to it that must know its type and accordingly expect or write data. Let's create a persistent interface and assign it an IP address: # openvpn --mktun --dev tun2 Fri Mar 26 10:29:29 2010 TUN/TAP device tun2 opened Fri Mar 26 10:29:29 2010 Persist state set to: ON # ip link set tun2 up # ip addr add 10.0.0.1/24 dev tun2 Let's fire up a network analyzer and look at the traffic: # tshark -i tun2 Running as user "root" and group "root". This could be dangerous. Capturing on tun2 # On another console # ping 10.0.0.1 PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data. 64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.115 ms 64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.105 ms ... Looking at the output of tshark, we see...nothing. There is no traffic going through the interface. This is correct: since we're pinging the interface's IP address, the operating system correctly decides that no packet needs to be sent "on the wire", and the kernel itself is replying to these pings. If you think about it, it's exactly what would happen if you pinged another interface's IP address (for example eth0): no packets would be sent out. This might sound obvious, but could be a source of confusion at first (it was for me). 
Knowing that the assignment of a /24 IP address to an interface creates a connected route for the whole range through the interface, let's modify our experiment and force the kernel to actually send something out of the tun interface (NOTE: the following works only with kernels < 2.6.36; later kernels behave differently, as explained in the comments): # ping 10.0.0.2 PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data. From 10.0.0.1 icmp_seq=2 Destination Host Unreachable From 10.0.0.1 icmp_seq=3 Destination Host Unreachable ... # on the tshark console ... 0.000000 10.0.0.1 -> 10.0.0.2 ICMP Echo (ping) request 0.999374 10.0.0.1 -> 10.0.0.2 ICMP Echo (ping) request 1.999055 10.0.0.1 -> 10.0.0.2 ICMP Echo (ping) request ... Now we're finally seeing something. The kernel sees that the address does not belong to a local interface, and a route for 10.0.0.0/24 exists through the tun2 interface. So it duly sends the packets out tun2. Note the different behavior here between tun and tap interfaces: with a tun interface, the kernel sends out the IP packet (raw, no other headers are present - try analyzing it with tshark or wireshark), while with a tap interface, being ethernet, the kernel would try to ARP for the target IP address: # pinging 10.0.0.2 now, but through tap2 (tap) # ping 10.0.0.2 PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data. # on the tshark console ... 0.111858 82:03:d4:07:62:b6 -> Broadcast ARP Who has 10.0.0.2? Tell 10.0.0.1 1.111539 82:03:d4:07:62:b6 -> Broadcast ARP Who has 10.0.0.2? Tell 10.0.0.1 ... Furthermore, with a tap interface the traffic will be composed by full ethernet frames (again, you can check with the network analyzer). Note that the MAC address for a tap interface is autogenerated by the kernel at interface creation time, but can be changed using the SIOCSIFHWADDR ioctl() (look again in drivers/net/tun.c, function tun_chr_ioctl()). 
Finally, being an ethernet interface, the MTU is set to 1500: # ip link show dev tap2 7: tap2: mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500 link/ether 82:03:d4:07:62:b6 brd ff:ff:ff:ff:ff:ff Of course, so far no program is attached to the interface, so all these outgoing packets are just lost. So let's do a step ahead and write a simple program that attaches to the interface and reads packets sent out by the kernel. A simple program We're going to write a program that attaches to a tun interface and reads packets that the kernel sends out that interface. Remember that you can run the program as a normal user if the interface is persistent, provided that you have the necessary permissions on the clone device /dev/net/tun, you are the owner of the interface, and select the right mode (tun or tap) for the interface. The program is actually a skeleton, or rather the start of a skeleton, since we'll only demonstrate how to read from the device, and only explain what the program can do once it gets the data. We assume that the tun_alloc() function we defined earlier is available to the program. Here is the code: ... /* tunclient.c */ char tun_name[IFNAMSIZ]; /* Connect to the device */ strcpy(tun_name, "tun77"); tun_fd = tun_alloc(tun_name, IFF_TUN | IFF_NO_PI); /* tun interface */ if(tun_fd < 0){ perror("Allocating interface"); exit(1); } /* Now read data coming from the kernel */ while(1) { /* Note that "buffer" should be at least the MTU size of the interface, eg 1500 bytes */ nread = read(tun_fd,buffer,sizeof(buffer)); if(nread < 0) { perror("Reading from interface"); close(tun_fd); exit(1); } /* Do whatever with the data */ printf("Read %d bytes from device %s\n", nread, tun_name); } ... 
If you configure tun77 as having IP address 10.0.0.1/24 and then run the tunclient program while trying to ping 10.0.0.2 (or any address in 10.0.0.0/24 other than 10.0.0.1, for that matter), you'll read data from the device:

    # openvpn --mktun --dev tun77 --user waldner
    Fri Mar 26 10:48:12 2010 TUN/TAP device tun77 opened
    Fri Mar 26 10:48:12 2010 Persist state set to: ON
    # ip link set tun77 up
    # ip addr add 10.0.0.1/24 dev tun77
    # ping 10.0.0.2
    ...

    # on another console
    $ ./tunclient
    Read 84 bytes from device tun77
    Read 84 bytes from device tun77
    ...

If you do the math, you'll see where these 84 bytes come from: 20 are for the IP header, 8 for the ICMP header, and 56 are the payload of the ICMP echo message, as you can see when you run the ping command:

    $ ping 10.0.0.2
    PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
    ...

Try experimenting with the above program sending various traffic types through the interface (also try using tap), and verify that the size of the data you're reading is correct for the interface type. Each read() returns a full packet (or frame if using tap mode); similarly, if we were to write, we would have to write an entire IP packet (or ethernet frame in tap mode) for each write().

Now what can we do with this data? Well, we could for example emulate the behavior of the target of the traffic we're reading; again, to keep things simple, let's stick with the ping example. We could analyze the received packet, extract the information needed to reply from the IP header, ICMP header and payload, build an IP packet containing an appropriate ICMP echo reply message, and send it back (ie, write it into the descriptor associated with the tun/tap device). This way the originator of the ping will actually receive an answer. Of course you're not limited to ping, so you can implement all kinds of network protocols. In general, this implies parsing the received packet and acting accordingly.
If using tap, to correctly build reply frames you would probably need to implement ARP in your code. All of this is exactly what User Mode Linux does: it attaches a modified Linux kernel running in userspace to a tap interface that exists on the host, and communicates with the host through that. Of course, being a full Linux kernel, it does implement TCP/IP and ethernet. Newer virtualization platforms like libvirt use tap interfaces extensively to communicate with guests that support them like qemu/kvm; the interfaces usually have names like vnet0, vnet1 etc. and last only as long as the guest they connect to is running, so they're not persistent, but you can see them if you run ip link show and/or brctl show while guests are running.

In the same way, you can attach with your own code to the interface and practice network programming and/or ethernet and TCP/IP stack implementation. To get started, you can look at (you guessed it) drivers/net/tun.c, functions tun_get_user() and tun_put_user() to see how the tun driver does that on the kernel side (beware that this barely scratches the surface of the complete network packet management in the kernel, which is very complex).

Tunnels

But there's another thing we can do with tun/tap interfaces. We can create tunnels. We don't need to reimplement TCP/IP; instead, we can write a program to just relay the raw data back and forth to a remote host running the same program, which does the same thing in a mirrored way. Let's suppose that our program above, in addition to attaching to the tun/tap interface, also establishes a network connection to a remote host, where a similar program (connected to a local tun/tap interface as well) is running in server mode. (Actually the two programs are the same; which one is the server and which one is the client is decided with a command line switch.) Once the two programs are running, traffic can flow in either direction, since the main body of the code will be doing the same thing at both sites.
The network connection here is implemented using TCP, but any other means can be used (eg UDP, or even ICMP!). You can download the full program source code here: simpletun.

Here is the main loop of the program, where the actual work of moving data back and forth between the tun/tap interface and the network tunnel is performed. For clarity, debug statements have been removed (you can find the full version in the source tarball).

    ...
    /* net_fd is the network file descriptor (to the peer), tap_fd is the
       descriptor connected to the tun/tap interface */

    /* use select() to handle two descriptors at once */
    maxfd = (tap_fd > net_fd)?tap_fd:net_fd;

    while(1) {
      int ret;
      fd_set rd_set;

      FD_ZERO(&rd_set);
      FD_SET(tap_fd, &rd_set); FD_SET(net_fd, &rd_set);

      ret = select(maxfd + 1, &rd_set, NULL, NULL, NULL);

      if (ret < 0 && errno == EINTR) {
        continue;
      }

      if (ret < 0) {
        perror("select()");
        exit(1);
      }

      if(FD_ISSET(tap_fd, &rd_set)) {
        /* data from tun/tap: just read it and write it to the network */

        nread = cread(tap_fd, buffer, BUFSIZE);

        /* write length + packet */
        plength = htons(nread);
        nwrite = cwrite(net_fd, (char *)&plength, sizeof(plength));
        nwrite = cwrite(net_fd, buffer, nread);
      }

      if(FD_ISSET(net_fd, &rd_set)) {
        /* data from the network: read it, and write it to the tun/tap interface.
         * We need to read the length first, and then the packet */

        /* Read length */
        nread = read_n(net_fd, (char *)&plength, sizeof(plength));

        /* read packet */
        nread = read_n(net_fd, buffer, ntohs(plength));

        /* now buffer[] contains a full packet or frame, write it into the tun/tap interface */
        nwrite = cwrite(tap_fd, buffer, nread);
      }
    }
    ...

(For the details of the cread(), read_n() and cwrite() functions, refer to the source; what they do should be obvious. Yes, the above code is not 100% correct with regard to select(), and makes some naive assumptions like expecting that read_n() and cwrite() do not block. As I said, the code is for demonstration purposes only.)

Here is the main logic of the above code:

The program uses select() to keep both descriptors under control at the same time; if data comes in from either descriptor, it's written out to the other.

Since the program uses TCP, the receiver will see a single stream of data, which makes recognizing packet boundaries difficult. So when a packet or frame is written to the network, its length is prepended (2 bytes) to the actual packet.

When data comes in from the tap_fd descriptor, a single read reads a full packet or frame; thus this can directly be written to the network, with its length prepended. Since that length is a short int (thus longer than one byte) written in "raw" binary format, ntohs()/htons() are used to interoperate between machines with different endianness.

When data comes in from the network, thanks to the aforementioned trick, we can know how long the next packet is going to be by reading the two-byte length that precedes it in the stream. When we've read the packet, we write it to the tun/tap interface descriptor, where it will be received by the kernel as coming "from the wire".

So what can you do with such a program? Well, you can create a tunnel! First, create and configure the necessary tun/tap interfaces on the hosts at both ends of the tunnel, including assigning them an IP address. For this example, I'll assume two tun interfaces: tun11, 192.168.0.1/24 on the local computer, and tun3, 192.168.0.2/24 on the remote computer. simpletun connects the hosts using TCP port 55555 by default (you can change that using the -p command line switch). The remote host will run simpletun in server mode, and the local host will run in client mode.
So here we go (the remote server is at 10.2.3.4): [remote]# openvpn --mktun --dev tun3 --user waldner Fri Mar 26 11:11:41 2010 TUN/TAP device tun3 opened Fri Mar 26 11:11:41 2010 Persist state set to: ON [remote]# ip link set tun3 up [remote]# ip addr add 192.168.0.2/24 dev tun3 [remote]$ ./simpletun -i tun3 -s # server blocks waiting for the client to connect [local]# openvpn --mktun --dev tun11 --user waldner Fri Mar 26 11:17:37 2010 TUN/TAP device tun11 opened Fri Mar 26 11:17:37 2010 Persist state set to: ON [local]# ip link set tun11 up [local]# ip addr add 192.168.0.1/24 dev tun11 [local]$ ./simpletun -i tun11 -c 10.2.3.4 # nothing happens, but the peers are now connected [local]$ ping 192.168.0.2 PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data. 64 bytes from 192.168.0.2: icmp_seq=1 ttl=241 time=42.5 ms 64 bytes from 192.168.0.2: icmp_seq=2 ttl=241 time=41.3 ms 64 bytes from 192.168.0.2: icmp_seq=3 ttl=241 time=41.4 ms 64 bytes from 192.168.0.2: icmp_seq=4 ttl=241 time=41.0 ms --- 192.168.0.2 ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 2999ms rtt min/avg/max/mdev = 41.047/41.599/42.588/0.621 ms # let's try something more exciting now [local]$ ssh waldner@192.168.0.2 waldner@192.168.0.2's password: Linux remote 2.6.22-14-xen #1 SMP Fri Feb 29 16:20:01 GMT 2008 x86_64 Welcome to remote! [remote]$ When a tunnel like the above is set up, all that can be seen from the outside is just a connection (TCP in this case) between the two peer simpletuns. The "real" data (ie, that exchanged by the high level applications - ping or ssh in the above example) is never exposed directly on the wire (although it IS sent in cleartext, see below). If you enable IP forwarding on a host that is running simpletun, and create the necessary routes on the other host, you can reach remote networks through the tunnel. 
Also note that if the virtual interfaces involved are of the tap kind, it is possible to transparently bridge two geographically distant ethernet LANs, so that the devices think that they are all on the same layer 2 network. To do this, it's necessary to bridge, on the gateways (ie, the hosts that run simpletun or other tunneling software that uses tap interfaces), the local LAN interface and the virtual tap interface together. This way, frames received from the LAN are also sent to the tap interface (because of the bridge), where the tunneling application reads them and sends them to the remote peer; there, another bridge will ensure that frames so received are forwarded to the remote LAN. The same thing will happen in the opposite direction. Since we are passing ethernet frames between the two LANs, the two LANs are effectively bridged together. This means that you can have 10 machines in London (for instance) and 50 in Berlin, and you can create a 60-computer ethernet network using addresses from the 192.168.1.0/24 subnet (or any subnet address you want, as long as it can accommodate at least 60 host addresses). However, do NOT use simpletun if you want to set up something like that!

Extensions and improvements

simpletun is very simple and simplistic, and can be extended in a number of ways. First of all, new ways of connecting to the peer can be added. For example, UDP connectivity could be implemented, or, if you're brave, ICMP (perhaps also over IPv6). Second, data is currently passed in cleartext over the network connection. But while the data is in the program's buffer it could be changed somehow before being transmitted; for example, it could be encrypted (and similarly decrypted at the other end). However, for the purpose of this tutorial, the limited version of the program should already give you an idea of how tunnelling using tun/tap works.
While simpletun is a simple demonstration, this is the way many popular programs that use tun/tap interfaces work, like OpenVPN, vtun, or OpenSSH's VPN feature.

Finally, it's worth noting that if the tunnel connection is over TCP, we can have a situation where we're running the so-called "tcp over tcp"; for more information see "Why tcp over tcp is a bad idea". Note that applications like OpenVPN use UDP by default for this very reason, and using TCP is well-known for reducing performance (although in some cases it's the only option).

==========================================

Bridging Qemu Image to the Real Network Using Tap Interface

June 19, 2011

In this tutorial I would like to show how to connect a guest OS, installed on a virtual machine, with the host OS, installed on a real computer. The virtual machine is created and run with the Qemu emulator, installed on the host OS. In this case I have Fedora Linux installed as the host OS, and the guest OS, Linux Microcore, installed on the virtual machine.

When you start your virtual machine (Qemu image) without specifying NIC options, a single e1000 ethernet interface is created in the guest OS. In this case the guest OS is connected to the host OS with NAT between them. It means that the guest OS (Microcore) can talk to the host OS (Fedora) but not vice versa. Traffic from the host to the guest OS is allowed to pass only if it is part of a connection that was previously initiated and established from Microcore to Fedora.

The IP address of the ethernet interface of the guest OS is assigned automatically and it is typically 10.0.2.15/24 with default gw 10.0.2.2. Because of NAT, the IP address of the host OS is on a different subnet.

Our goal is to create a configuration that allows traffic to be initiated from both directions.
In this scenario the IP addresses of the Ethernet interfaces of both the guest and the host OS are on the same subnet. This type of connection is called bridged. To make a bridged connection we need to create a virtual tap interface on the host OS. After that we have to bridge the existing Ethernet interface and the tap interface together with the help of a bridge utility installed on the host OS. Finally, the Qemu image must be started with the tap interface connected to the Ethernet interface of the guest OS.

Host OS – Fedora Linux

1/ Install bridge-utils – to create a bridge that forwards traffic between Ethernet interfaces

    sudo yum install bridge-utils

2/ Install tunctl – to create and manage persistent TUN/TAP interfaces

It is required that the generic TUN/TAP driver is either built into the kernel or available as a module. To check the availability of this module do the following:

    ls -la /dev/net/tun

If you get "No such file or directory", try doing a modprobe tun. It should then appear in the lsmod output.

    sudo yum install tunctl

3/ Create bridge device virbr0

    sudo brctl addbr virbr0

4/ Create tap0 interface – persistent and owned by user brezular

    sudo /usr/sbin/tunctl -t tap0 -u brezular

5/ Add eth0 and tap0 to the bridge and bring the interfaces up

    sudo brctl addif virbr0 eth0
    sudo brctl addif virbr0 tap0
    sudo ifconfig eth0 up
    sudo ifconfig tap0 up
    sudo ifconfig virbr0 up

Note: Check that tap0 and eth0 are properly bridged.

    brctl show

6/ Assign IP address to virbr0 and remove IP address from eth0 interface

Only the bridge interface virbr0 should have an IP address assigned.

    sudo ifconfig virbr0 172.16.1.2/16
    sudo ifconfig eth0 0.0.0.0 promisc

Note: If you run a DHCP server in your network, issue the following command instead to obtain an IP address for the interface virbr0.

    sudo dhclient virbr0

Configure a default route if a connection to the Internet is required.
In the example below, the default gateway IP address is the last usable IP address of the subnet 172.16.0.0/16.

    sudo route add default gw 172.16.255.254

7/ Disable Ethernet filtering – ebtables, bridge-nf, arptables – to avoid traffic filtering

Change the value 1 to 0 in all the files in the directory /proc/sys/net/bridge/.

    cd /proc/sys/net/bridge; ls
    bridge-nf-call-arptables  bridge-nf-call-iptables
    bridge-nf-call-ip6tables  bridge-nf-filter-vlan-tagged
    sudo su
    for f in bridge-nf-*; do echo 0 > $f; done

8/ Run Microcore Qemu image

The Ethernet card of the guest OS is an Intel e1000 with the MAC address 00:aa:00:60:00:01. The NIC is assigned to vlan1 and connected to the tap0 interface.

    sudo /usr/local/bin/qemu -m 1G -boot c -hda /home/brezular/qemu-image.img -net nic,vlan=1,macaddr=00:aa:00:60:00:01,model=e1000 -net tap,vlan=1,ifname=tap0,script=no

Parameters:
-boot c -> Boot from the HDD, i.e. from qemu-image.img
-m 1G -> 1 GB of RAM allocated for the image

Guest OS – Microcore Linux

1/ Configure IP address for the eth0 interface (it is on the same subnet as virbr0)

    sudo su
    ifconfig eth0 172.16.1.1/16 up

Now, provided the telnet daemon is running on the guest OS, we can try to open a telnet session from Fedora to Microcore.

    telnet 172.16.1.1

Keeping network settings after restart

One practical way to keep network settings after a restart is not to keep them at all. Instead, a script is called whenever bridging is required. An example of such a script is here:
http://brezular.wordpress.com/2012/03/12/bridging-guest-os-solaris-2-6-qemu-vm-with-the-host-os-fedora-15/

=================================

Bridging guest OS – Solaris 2.6 Qemu VM with the host OS – Fedora 15
March 12, 2012

This tutorial shows how to connect a guest OS – Solaris 2.6 installed in a Qemu image – to the real world. Depending on how Qemu is configured, a Qemu image can be started in either NAT or bridged mode. When no network parameter is specified when Qemu starts, the default NAT mode is used.
In this mode Qemu acts as a virtual router with its own DHCP server. The DHCP server assigns the IP address 10.0.2.15/24 to the Ethernet interface present in the guest OS (Solaris). Packets destined outside the subnet 10.0.2.0/24 are forwarded to the default gateway at 10.0.2.2. NAT is good enough for connections initiated from the guest OS to the outside world. For instance, if ssh were installed on the guest OS, we could copy files with the scp command. Unfortunately, ssh is not installed on Solaris 2.6 by default. Other protocols such as FTP or NFS may not work in NAT mode. For this reason we are not going to use this method.

The second method is based on bridging an Ethernet interface of the host (Fedora 15) with a virtual tap interface. First, the virtual tap interface must be created in Fedora. Then both interfaces in Fedora, the Ethernet interface and the tap interface, are added to a bridge interface so that they are bridged together. As a last step, Qemu has to be started with a configuration that connects the virtual tap interface to the guest's Ethernet interface. Connected via the bridge, the guest and host OS can have IP addresses from the same subnet. The benefit of this solution is obvious: no incoming connection is blocked by NAT, so a file transfer protocol such as FTP should work.

At first I was thinking about describing the exact steps of bridging a Qemu image with an Ethernet interface. Then I decided not to, as there is a nice tutorial covering the same topic on the GNS3 blog.
http://blog.gns3.net/2009/10/olive-juniper/5/

Instead, I provide an installation script that installs the necessary rpm packages, creates a bridge, and bridges the chosen Ethernet interface with the tap interface. Finally, it restores the network connection. Always run this script before starting the Qemu VM. The script is available here.
http://www.4shared.com/file/UdcViHR_/bridge_interfaces.html

#!/bin/bash
# bash is needed for 'echo -e' and the '==' test operator used below.

echo -e "\nChecking if bridge interface is present"
bridge=`ifconfig -a | grep br | cut -d " " -f1`

if [ "$bridge" == "" ]; then
    echo "No bridge interface was found"
    echo "Checking if package 'bridge-utils' is installed"
    # rpm -q returns non-zero when the package is not installed
    if rpm -q bridge-utils > /dev/null 2>&1; then
        echo "Package 'bridge-utils' is installed, nothing to do"
    else
        echo "No package 'bridge-utils' was found, trying to install it"
        sudo yum install bridge-utils -y
    fi
    sudo /usr/sbin/brctl addbr virbr0
    bridge=virbr0
    echo "Interface $bridge was created"
else
    echo "Interface $bridge is present, nothing to do"
fi
sudo /sbin/ifconfig $bridge up

echo -e "\nChecking if a virtual tap interface is present"
tap=`ifconfig -a | grep tap | cut -d " " -f1`

if [ "$tap" == "" ]; then
    echo "No tap interface was found"
    echo "Checking if 'tunctl' package is installed"
    if rpm -q tunctl > /dev/null 2>&1; then
        echo "Package 'tunctl' is installed, nothing to do"
    else
        echo "No package 'tunctl' was found, trying to install it"
        sudo yum install tunctl -y
    fi
    echo "Checking if module 'tun' is loaded in kernel"
    check=`lsmod | grep tun | cut -d " " -f1`
    if [ "$check" == "" ]; then
        echo "Module 'tun' is not loaded, trying to load it"
        sudo modprobe tun
    else
        echo "Module 'tun' is loaded, nothing to do"
    fi
    sudo /usr/sbin/tunctl -u $(whoami) -t tap0
    tap=tap0
    echo "Interface $tap was created"
else
    echo "Interface $tap is present, nothing to do"
fi
sudo /sbin/ifconfig $tap up

echo "Checking if $tap is present in the bridge $bridge"
check=`/usr/sbin/brctl show $bridge | grep $tap`
if [ "$check" == "" ]; then
    sudo /usr/sbin/brctl addif $bridge $tap
    echo "Interface $tap was added to $bridge"
else
    echo "Interface $tap is present in $bridge, nothing to do"
fi

echo -e "\nChecking if Ethernet interface is present"
interface_list=`ifconfig -a | grep Ethernet | cut -d " " -f1`

if [ "$interface_list" != "" ]; then
    echo -e "\nThe following Ethernet interfaces were found"
    echo -e "\n$interface_list"
    # Keep asking until the user enters one of the listed interfaces
    a=0
    while [ $a == 0 ]; do
        echo -n -e "\nPlease, enter the interface name and press [ENTER]: "
        read interface
        b=`echo $interface_list | grep $interface`
        if [ "$b" == "" ]; then
            echo "Interface $interface cannot be found, try again"
        else
            a=1
        fi
    done
    echo -e "\nChecking if interface $interface is present in $bridge"
    check=`/usr/sbin/brctl show $bridge | grep $interface`
    if [ "$check" == "" ]; then
        sudo /usr/sbin/brctl addif $bridge $interface
        echo "Interface $interface was added to $bridge"
    else
        echo "Interface $interface is present in $bridge, nothing to do"
    fi
else
    echo "No Ethernet interfaces were found, script exits"
    exit
fi

echo -e "\nSwitching off Ethernet filtering"
for file in /proc/sys/net/bridge/bridge*
do
    echo 0 | sudo tee $file > /dev/null
done
echo "Ethernet filtering was switched off"

echo -e "\nTrying to restore your network connectivity"
# Extract the address/prefix that sits between "inet" and "brd" in the output
ip=`ip addr show $interface | grep -w inet | cut -d "t" -f2 | cut -d "b" -f1`
default=`ip route | grep default | cut -d " " -f3`

if [ "$ip" == "" ]; then
    echo "No valid IP address is configured on interface $interface"
    echo "Sorry, your network connection cannot be restored"
    exit
fi

# Move the IP address from the Ethernet interface to the bridge
sudo ifconfig $bridge $ip up
sudo ifconfig $interface 0.0.0.0 up
sudo route add default gw $default

echo -e "\nNote:"
echo "IP address $ip was configured on interface $bridge"
echo "IP address 0.0.0.0 was configured on interface $interface"
echo "Default gateway address $default was restored"

Note: You must make the script executable. Also be aware that the script needs to be started either under the root account or by a user who is configured for sudo.

    $ chmod +x ./bridge_interfaces

Finally, we can start Qemu.
    $ /usr/local/bin/qemu-system-sparc -bios /usr/local/share/qemu/ss5-170.bin -M SS-5 \
        -nographic -boot c -hda ./36G.disk -m 256 -serial telnet:0.0.0.0:3000,server \
        -net nic,vlan=0,macaddr=00:aa:00:60:00:01,model=lance -net tap,vlan=0,ifname=tap0,script=no

Now telnet to Qemu and start Solaris with the boot disk0 -v command. After Solaris has booted, check that the connection is working. You should be able to ping at least the bridge interface.

    $ telnet localhost 3000

Here is a nice tutorial about bridging. It shows the sudo configuration in detail and a Qemu startup script for the Ethernet interface.
http://compsoc.dur.ac.uk/~djw/qemu.html
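The tutorials above only set the bridge up. When the VM is no longer needed, the steps can be reversed; what follows is a minimal sketch of that, not something taken from the original scripts. The unbridge function and the DRY_RUN switch are made up for this example, and the interface names, address and gateway in the usage note are simply the example values used earlier. It uses the same brctl, tunctl, ifconfig and route tools as the setup steps, and by default only prints the commands.

```shell
#!/bin/sh
# Sketch: undo the bridge setup and move the IP address back to the
# Ethernet interface. The function name and DRY_RUN switch are assumptions
# for this example. With DRY_RUN=1 (the default) commands are only printed.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# unbridge BRIDGE ETH_IF TAP_IF IP/PREFIX GATEWAY
unbridge() {
    bridge="$1"
    eth_if="$2"
    tap_if="$3"
    ip_cidr="$4"
    gateway="$5"

    # A bridge must be down before it can be deleted
    run ifconfig "$bridge" down
    run brctl delif "$bridge" "$tap_if"
    run brctl delif "$bridge" "$eth_if"
    run brctl delbr "$bridge"
    # Remove the persistent tap interface
    run tunctl -d "$tap_if"
    # Give the address back to the Ethernet interface and leave promiscuous mode
    run ifconfig "$eth_if" "$ip_cidr" -promisc up
    # The default route disappeared together with the bridge, so add it again
    run route add default gw "$gateway"
}
```

For the Fedora example, "unbridge virbr0 eth0 tap0 172.16.1.2/16 172.16.255.254" prints the seven commands; with DRY_RUN=0 and root privileges they would be executed. Stop the Qemu VM first, since it holds the tap interface open while it runs.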