Hey guys, I know it's been a while since I posted, and I apologize for that. I moved into a new apartment, which took some adjustment. I'd like to post more often, since I'm working on quite a few new things.
Anyway, the purpose of this post is to share my notes on bonding. Previously I mentioned that I was using a D-Link switch. Well, I've since upgraded to a Dell PowerConnect 2710. We use a few of these at work, and I was so impressed with how small and quiet the 2710s are that I decided to use one at home. I bought a used one at Westech Recycler's for $10.
According to some research I've performed (I think it's mentioned in the PowerConnect's documentation), the 2710 supports 802.3ad Link Aggregation Groups. So, for bonding purposes, that's mode=4.
This mode is described in the kernel's network bonding documentation (in slackware64-current that's '/usr/src/linux/Documentation/networking/bonding.txt'):
802.3ad or 4
IEEE 802.3ad Dynamic link aggregation. Creates
aggregation groups that share the same speed and
duplex settings. Utilizes all slaves in the active
aggregator according to the 802.3ad specification.
Slave selection for outgoing traffic is done according
to the transmit hash policy, which may be changed from
the default simple XOR policy via the xmit_hash_policy
option, documented below. Note that not all transmit
policies may be 802.3ad compliant, particularly in
regards to the packet mis-ordering requirements of
section 43.2.4 of the 802.3ad standard. Differing
peer implementations will have varying tolerances for
noncompliance.
Prerequisites:
1. Ethtool support in the base drivers for retrieving
the speed and duplex of each slave.
2. A switch that supports IEEE 802.3ad Dynamic link
aggregation.
Most switches will require some type of configuration
to enable 802.3ad mode.
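To check that the module really came up in 802.3ad mode and that each slave reports its speed and duplex (prerequisite 1 above), you can read the bonding driver's status file. The helper below, `bond_report`, is a hypothetical sketch, not part of my package. It assumes the usual layout of `/proc/net/bonding/bond0`: a "Bonding Mode:" line, then one "Slave Interface:" block per slave containing "MII Status:", "Speed:" and "Duplex:" lines. The exact wording may vary between kernel versions.

```shell
#!/bin/sh
# bond_report [FILE]
# Hypothetical helper: summarize a bonding status file (default
# /proc/net/bonding/bond0). Prints the bonding mode, then one line per
# slave with its MII status, speed and duplex.
bond_report() {
    awk -F': ' '
        # Bond-wide mode line, e.g. "IEEE 802.3ad Dynamic link aggregation"
        /^Bonding Mode:/    { print "mode: " $2 }
        # Start of a per-slave block
        /^Slave Interface:/ { slave = $2 }
        # Only record these fields once we are inside a slave block; the
        # bond-wide "MII Status:" line comes before any slave and is skipped.
        slave != "" && /^MII Status:/ { mii = $2 }
        slave != "" && /^Speed:/      { speed = $2 }
        slave != "" && /^Duplex:/     {
            print slave ": mii=" mii " speed=" speed " duplex=" $2
            slave = ""
        }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Running `bond_report` with no argument reads `/proc/net/bonding/bond0`. A slave showing `speed=Unknown` usually means its link is down or its driver isn't reporting speed/duplex, and that is worth ruling out before blaming the switch.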
So, I went ahead and used the ifenslave package I created (I will submit it to SlackBuilds.org as soon as submissions are back open). It basically just includes ifenslave.c (as I mentioned in a previous post, ifenslave is no longer included with the current kernel because the developers dropped it in favor of a sysfs interface) and an rc.bond startup script I created:
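#!/bin/sh
# rc.bond
#
# Diego Pineda
# 03/27/16
#
# Brings up bond0 as an 802.3ad (LACP) bond over eth0 and eth2.

ip='192.168.0.211'        # this host's address on bond0
gw='192.168.0.1'          # default gateway
m1='00:16:3e:aa:aa:ab'    # bond0 MAC: must be UNIQUE on every host running this script
nmg='0.0.0.0'             # netmask for the default route

case "$1" in
'start')
    echo "start bond0"
    #modprobe bonding mode=balance-alb miimon=100
    # mode=4 is 802.3ad; miimon=100 checks slave link state every 100 ms
    modprobe bonding mode=4 miimon=100
    # NIC driver for the slave interfaces
    modprobe tg3
    ifconfig bond0 up
    ifenslave bond0 eth0
    ifenslave bond0 eth2
    # TODO: change this MAC on each host (see m1 above)
    ifconfig bond0 hw ether "$m1"
    ifconfig bond0 "$ip"
    route add default gw "$gw" netmask "$nmg" dev bond0
    ;;
'stop')
    ifconfig bond0 down
    rmmod bonding
    rmmod tg3
    ;;
*)
    echo "Usage: $0 {start|stop}"
    ;;
esac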
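Since ifenslave was dropped in favor of sysfs, the same enslaving steps can also be done by writing to the bonding driver's sysfs files. The sketch below is an untested alternative, not what rc.bond does. The helper names `bond_set_mode` and `bond_add_slaves` and the `BOND_SYSFS` variable are hypothetical, and the sketch assumes bond0 already exists (loading the bonding module creates it by default).

```shell
#!/bin/sh
# Hypothetical sysfs-based replacement for the ifenslave calls in rc.bond.
# BOND_SYSFS is an assumption added so the helpers can be pointed at a
# scratch directory; on a real system it is /sys/class/net/bond0/bonding.
BOND_SYSFS=${BOND_SYSFS:-/sys/class/net/bond0/bonding}

# bond_set_mode MODE MIIMON
# Write the bonding mode (name or number, e.g. 802.3ad or 4) and the MII
# polling interval in ms. The kernel only accepts a mode change while
# bond0 is down.
bond_set_mode() {
    echo "$1" > "$BOND_SYSFS/mode" || return 1
    echo "$2" > "$BOND_SYSFS/miimon" || return 1
}

# bond_add_slaves IFACE...
# Enslave each interface by writing "+IFACE" to the slaves file
# (the kernel also accepts "-IFACE" to release one).
bond_add_slaves() {
    for s in "$@"; do
        echo "+$s" > "$BOND_SYSFS/slaves" || return 1
    done
}
```

On a real system the order would follow the example in bonding.txt: load the bonding module, run `bond_set_mode 802.3ad 100` while bond0 is still down, bring bond0 up with its address via ifconfig, then run `bond_add_slaves eth0 eth2`.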
I've set this up several times before (as I've logged on my blog), but since it had been a few months, I was a little rusty. Everything seemed to work fine, except for serious packet loss whenever I pinged the other bonded hosts. There was no packet loss against the gateway or ordinary web hosts, though. It was really bizarre: it was as if something was wrong with the bonding configuration itself.
I didn't stop to think that the problem could actually lie with my own setup until I started to jog my memory. Then I realized that I had solved this exact problem before, and that the problem lay with my script.
Apparently, 00:16:3e is the XenSource MAC prefix. Xen recommends using it because it will not conflict with any known hardware MAC address. That seemed like reason enough to keep the prefix in my script.
What I had neglected to realize was that, because I reused the same script, both hosts were set to the same hardware MAC address, even though they used different IP addresses (192.168.0.200 and 192.168.0.211, respectively):
00:16:3e:aa:aa:aa
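A duplicate MAC like this should show up in the neighbor (ARP) table of any third machine on the LAN, because two different IPs resolve to the same hardware address. The helper below, `find_shared_macs`, is a hypothetical sketch: it reads `ip neigh show` output (iproute2) on stdin and prints every MAC claimed by more than one IP. It assumes the machine has recently talked to both hosts, for example by pinging each of them first.

```shell
#!/bin/sh
# find_shared_macs
# Hypothetical helper: read `ip neigh show` output on stdin and print each
# MAC address that more than one IP resolves to, followed by those IPs.
find_shared_macs() {
    awk '
        {
            # The MAC follows the "lladdr" keyword; FAILED/INCOMPLETE
            # entries have no lladdr and are skipped.
            for (i = 1; i < NF; i++)
                if ($i == "lladdr") {
                    mac = tolower($(i + 1))
                    count[mac]++
                    ips[mac] = ips[mac] " " $1
                }
        }
        END {
            for (mac in count)
                if (count[mac] > 1)
                    print mac " is used by" ips[mac]
        }
    '
}
```

For example, from a third machine: `ping -c 1 192.168.0.200; ping -c 1 192.168.0.211; ip neigh show | find_shared_macs`.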
I felt this merited a mention because the solution is so simple: I just changed the MAC address of the second host to:
00:16:3e:aa:aa:ab
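To avoid hand-editing the MAC on every host, one option is to derive it from the host's IP address. This is a hypothetical scheme, not what rc.bond does: `mac_from_ip` keeps the 00:16:3e XenSource prefix and the aa:aa octets from the script and uses the last octet of the IPv4 address as the final MAC octet. That only guarantees uniqueness among hosts on the same /24.

```shell
#!/bin/sh
# mac_from_ip IPV4
# Hypothetical helper: build 00:16:3e:aa:aa:XX, where XX is the last octet
# of the IPv4 address in hex. Unique only among hosts sharing a /24.
mac_from_ip() {
    last=${1##*.}
    # Reject anything that is not a decimal octet (0-255).
    case $last in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$last" -le 255 ] || return 1
    printf '00:16:3e:aa:aa:%02x\n' "$last"
}
```

In rc.bond, `m1=$(mac_from_ip "$ip")` (with the function defined above it) would give 192.168.0.211 the address 00:16:3e:aa:aa:d3 and 192.168.0.200 the address 00:16:3e:aa:aa:c8.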
I'm ashamed to admit that it took me a good 4-5 hours to figure this out :)