Sunday, February 15, 2015

Linux Bonding Channel Driver on Slackware, P2

So, I've been trying to get this to work since... shit, October 2014, with no success until now.

First, I tried bonding interfaces with my unmanaged HP ProCurve 2724 16-port gigabit switch. No luck.

I figured I should look into a managed switch, hoping LACP functionality would be present.

So I headed to my favourite place (Westech Recyclers--Resell Electronics) and picked up a D-Link DGS-1248T for a whopping 10 buckaroos. (Apparently a steal; this thing retailed for $300-600 new, holy cow.)

So, I had made a big misunderstanding in my research: I took the "trunking" functionality described in the Linux Bonding Driver documentation to be, well, compatible with 802.3ad!

In my eagerness, I looked over the modes and chose 802.3ad (mode=4) as the one I wanted to use.

What I didn't realize is that 802.3ad--which uses LACP--is an IEEE specification that is completely different from trunking.
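For reference, this is roughly what I was loading at the time--a minimal sketch of the 802.3ad attempt (these are standard bonding module options; the exact values are just what I happened to use):

# Load the bonding driver in 802.3ad (LACP) mode -- the first attempt.
# mode=4 selects 802.3ad; miimon=100 polls link state every 100 ms;
# lacp_rate=slow sends LACPDUs every 30 seconds.
modprobe bonding mode=4 miimon=100 lacp_rate=slow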

In fact, I didn't know what trunking really meant. I knew it was described in the Bonding Driver documentation, and that it was supported by my switch (DGS-1248T). When I referred to the switch's documentation, it wasn't descriptive in the least:

"The Trunk function enables the Switch to cascade two or more devices with larger bandwidths."

Devices? You mean ports? Or devices as in switches? Is this meant to trunk switches together, or ports? I was royally confused.

So here I was, spinning my wheels for weeks, trying to get mode=4 working with the DGS-1248T. I wasn't using VLANs; I simply enabled two trunk groups of ports on the switch: one for my main system, and one for my other storage server.

Whenever I brought up the bond, I would add the default gateway route on it, but the gateway was never reachable--the command would just time out. Actually, it did work twice, but it would not survive a reboot, nor disabling and re-enabling the bonded interface.
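The bring-up itself looked something like this (a sketch only--the interface names and addresses here are placeholders, not my actual config):

# Bring up the bond, enslave the two NICs, then add the default route.
ifconfig bond0 192.168.1.50 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1

# With mode=4 on this switch, this is the step that would hang/time out:
route add default gw 192.168.1.1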

I was stumped. What was going on?

Finally, I read a post on the LQ forums about another user's experience with this switch and bonded interfaces--specifically, his bonds were working! I thought: no way, how?

He wasn't using mode=4. He was using mode=2 (Balance-xor).

I decided to try it. On my system, I went with mode=0 (balance-rr). I was now able to add the default gateway route on the bond, and I could talk to the LAN and the internet!
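The working setup was the same bring-up, just with the module reloaded in balance-rr--again a sketch, using the same placeholder names as above:

# Reload the bonding driver in balance-rr (round-robin) mode.
rmmod bonding 2>/dev/null
modprobe bonding mode=0 miimon=100

ifconfig bond0 192.168.1.50 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1
route add default gw 192.168.1.1    # this now succeeds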

I confirmed that the bond was initialized correctly and functioning on both of my servers (an HP ML350 G5 and a PowerEdge 1850). Both bonded interfaces could send and receive on the LAN and the internet.
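The easiest way I know to verify that is the bonding driver's proc interface (output trimmed--the exact fields vary a bit by kernel version):

cat /proc/net/bonding/bond0
# Bonding Mode: load balancing (round-robin)
# MII Status: up
# Slave Interface: eth0
#   MII Status: up
# Slave Interface: eth1
#   MII Status: up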

I still didn't understand why. It wasn't until I was looking at the Bonding Driver documentation this evening that I realized why:

Requirements for specific modes:

... The Switch must be configured for "etherchannel," or "trunking," on the appropriate ports.

As I later discovered, EtherChannel is Cisco's proprietary implementation of link aggregation, and it predates the 802.3ad spec. And because EtherChannel is Cisco proprietary technology, the "trunking" variants seen in other manufacturers' switches must be each manufacturer's own implementation/EtherChannel variant.

In other words, the only things those non-Cisco variants really lose are support for ISL and VTP, both Cisco proprietary technologies anyway.

The main reason I went through all this trouble was to determine firsthand whether there was any measurable speed increase from using bonding.

I set up a simple test: basically, I transferred a large file over a single gigabit link (via a mounted NFS directory, using a simple copy).

As I've read--and discovered in my own experience--although the theoretical maximum transfer rate of a gigabit interface is 125 MB/s, real-world transfer rates are much slower. The reasons vary considerably, but mechanical drives are generally the main bottleneck (followed by system load or network traffic). The tests I've seen that measure a true gigabit transfer rate use a ramdrive to take the disks out of the equation.

So, for this test I created a 2 GB ramdisk via tmpfs. I then copied a large file (a 1.5 GB mp4) over the locally mounted NFS share (hosted on the second server, which also uses a bonded interface) to the tmpfs-mounted ramdisk.
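For anyone wanting to repeat the test, the setup was roughly this (the server name, export path, and file name are placeholders; the MB/s figure is just file size divided by elapsed time):

# 2 GB ramdisk via tmpfs, so local disk speed isn't the bottleneck
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=2G tmpfs /mnt/ramdisk

# NFS export from the second (also bonded) server
mkdir -p /mnt/nfs
mount -t nfs storage:/export/media /mnt/nfs

# Timed copy of the test file into the ramdisk
time cp /mnt/nfs/bigfile.mp4 /mnt/ramdisk/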

I then reran the same test using the bonded interface with mode=0 (balance-rr).

The results did not disappoint!

Transferred the 1.5 GB mp4 file over locally mounted NFS (hosted on the secondary server) to the tmpfs-mounted filesystem:

Single 1 Gb NIC: 50 MB/s

bond0 (dual 1 Gb NIC slaves): 94 MB/s

The bond transferred nearly twice as fast!
