Xen guests isolation and ebtables concurrency
When you use bridging for Xen networking and your guest machines (domUs in Xen parlance) are fully managed by third parties, some sort of isolation is especially needed. A rogue admin can change the IP and/or MAC address(es) assigned to their domU and potentially cause an IP address conflict.
Xen provides a script called vif-bridge that takes care of adding domU’s
virtual interfaces to dom0’s bridge, bringing them up, and adding iptables rules
that allow datagrams coming in through domU’s virtual interfaces whose source
is one of the assigned IP address(es).
Those iptables rules might not be enough. They don’t enforce usage of the assigned MAC addresses, and they could interfere with a firewall that is already deployed. Another point, in my opinion, is that these address policies belong to the Link Layer (bridge decision) rather than the Network Layer (see PacketFlow), so I prefer to have them enforced with ebtables.
I then picked one of the existing vif-bridge scripts that use ebtables and
adapted it to allow only traffic from assigned IP/MAC pairs and ARP
requests/replies.
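As a rough illustration of what "only assigned IP/MAC pairs and ARP" can look like in ebtables terms, here is a minimal sketch. It is not the adapted script itself: the function name vif_rules is made up for this example, the rules go straight into FORWARD instead of per-interface chains, and the commands are only printed, not executed.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): prints the ebtables
# commands that would restrict one virtual interface to a single assigned
# MAC/IP pair plus ARP. Nothing is executed here.
vif_rules() {
    local vif=$1 mac=$2 ip=$3

    # IPv4 frames coming from the guest: source MAC and source IP must match.
    echo "ebtables -A FORWARD -i $vif -p ip4 -s $mac --ip-src $ip -j ACCEPT"
    # IPv4 frames going to the guest: destination MAC and IP must match.
    echo "ebtables -A FORWARD -o $vif -p ip4 -d $mac --ip-dst $ip -j ACCEPT"
    # ARP sent by the guest must carry the assigned MAC/IP pair.
    echo "ebtables -A FORWARD -i $vif -p arp --arp-mac-src $mac --arp-ip-src $ip -j ACCEPT"
    # ARP towards the guest is accepted only for the assigned IP.
    echo "ebtables -A FORWARD -o $vif -p arp --arp-ip-dst $ip -j ACCEPT"
    # Anything else crossing this interface is dropped.
    echo "ebtables -A FORWARD -i $vif -j DROP"
    echo "ebtables -A FORWARD -o $vif -j DROP"
}

vif_rules veth2450 00:16:3d:1c:26:4a 10.49.216.50
```

Printing the commands first makes it easy to review them before feeding them to a root shell.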
After deploying the adapted vif-bridge, domU creation began to fail
randomly. Some debug code added at the beginning of the script revealed some
bizarre errors:
+ ebtables -F veth2250_IN
ebtables v2.0.9-2:communication.c:388:--BUG--:
Couldn't update kernel counters
++ sigerr
+ ebtables -N veth639a_IN
+ ebtables -P veth639a_IN DROP
Chain 'veth639a_IN' doesn't exist.
++ sigerr
+ ebtables -A veth639_OUT -p arp --arp-ip-dst 10.99.143.100 -j ACCEPT
+ ebtables -A veth639_OUT -p arp --arp-ip-dst 10.99.144.100 -j ACCEPT
The kernel doesn't support a certain ebtables extension, consider recompiling your kernel or insmod the extension.
++ sigerr
As you can see, those ebtables errors are triggered by correct, trivial calls.
To make it worse, chain, interface and rule names varied from one error to
the next. Searching for “Couldn’t update kernel counters” or
“communication.c:388:--BUG--:” didn’t help at all.
While debugging, I learned that Xen runs one instance of vif-bridge for
each defined network interface, and that they all run in parallel. All my domUs
have two virtual network interfaces defined.
At that point I had no clue about the problem’s cause. I decided to
upgrade ebtables to rule out the usual “make sure you’re running the latest
version” support advice (squeeze’s version is 2.0.9.2, upstream is
2.0.10). With the new version I began to see this new error in the logs:
+ ebtables -A FORWARD -o veth2450 -p ip4 -d 00:16:3d:1c:26:4a --ip-dst 10.49.216.50 -j ACCEPT
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
userspace tool doesn't by default support multiple ebtables programs running
concurrently. The ebtables option --concurrent or a tool like flock can be
used to support concurrent scripts that update the ebtables kernel tables.
2. The kernel doesn't support a certain ebtables extension, consider
recompiling your kernel or insmod the extension.
After reading this I immediately understood what was happening. That error
description couldn’t be clearer, and I thank the upstream author for it. I
had never considered a concurrency problem in ebtables, not even after seeing
random illogical errors generated by trivial rules.
--concurrent is only available starting with 2.0.10, so I took the flock way;
the fixed script is here.
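For readers who cannot see the fixed script, a minimal sketch of the flock approach might look like the following. It is not necessarily what that script does: the lock file path and the variable names LOCKFILE and EBTABLES_BIN are assumptions made for this example.

```shell
#!/bin/bash
# Sketch only: serialize every ebtables call made by script instances that
# run in parallel. LOCKFILE and EBTABLES_BIN are illustrative names and
# defaults, not taken from the original script.
LOCKFILE=${LOCKFILE:-/var/lock/ebtables.lock}
EBTABLES_BIN=${EBTABLES_BIN:-/sbin/ebtables}

# Shadow the ebtables command with a function. flock takes an exclusive lock
# on LOCKFILE, runs the real binary with the same arguments, and releases
# the lock when the binary exits.
ebtables() {
    flock -x "$LOCKFILE" "$EBTABLES_BIN" "$@"
}
```

With this at the top of the script, the existing ebtables lines stay untouched, and every instance of the script must use the same lock file for the serialization to work.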
Later I found the problem description in ebtables’ basic examples page:
Updating the ebtables kernel tables is a two-phase process. First, the userspace program sends the new table to the kernel and receives the packet counters for the rules in the old table. In a second phase, the userspace program uses these counter values to determine the initial counter values of the new table, which is already active in the kernel. These values are sent to the kernel which adds these values to the kernel’s counter values. Due to this two-phase process, it is possible to confuse the ebtables userspace tool when more than one instance is run concurrently. Note that even in a one-phase process it would be possible to confuse the tool.
The errors shown above can be very difficult to reproduce unless your domUs
have more than one network interface and your vif-bridge script adds more
than a few ebtables rules.
Summarizing: if you
- are calling ebtables from your Xen scripts,
- have an ebtables prior to 2.0.10 (such as the one in Debian squeeze or Ubuntu precise),
- are facing seemingly random ebtables errors, and
- are not being helped by logs or $SEARCHENGINE,
chances are high that your scripts are running ebtables concurrently. Just
fix them.
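If a script needs a whole group of ebtables calls to run without another instance interleaving its own, one possible fix is to hold a single lock around the group, using the file-descriptor form of flock. This is a sketch under assumptions: with_ebtables_lock and the lock file path are names invented for this example.

```shell
#!/bin/bash
# Sketch only: run a command or shell function while holding an exclusive
# lock, so that a group of ebtables calls is never interleaved with another
# script instance. The helper name and the lock path are hypothetical.
LOCKFILE=${LOCKFILE:-/var/lock/ebtables.lock}

with_ebtables_lock() {
    (
        # Block until the exclusive lock on file descriptor 9 is granted.
        flock -x 9 || exit 1
        # Run the given command or function with its arguments.
        "$@"
    ) 9>"$LOCKFILE"   # fd 9 is opened on the lock file for this subshell only
}
```

A call such as `with_ebtables_lock setup_rules "$vif"`, where setup_rules stands for whatever function issues the rules, releases the lock automatically when the subshell exits, even if one of the commands fails.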