SCSI subsystem initialized: May 2013

Monday, May 20, 2013

Weird issues with RTL-8185

I'll make this one short.

So I finally got Slack14.64 going on the Turion lappy. I could see that the interface was recognized, but attempting to scan failed:

iwlist wlan0 scan

interface doesn't support scanning

Upon further inspection, I noticed the following in /var/log/messages:

Jan 3 08:22:23 darkstar kernel: [ 689.812525] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Jan 3 08:24:33 darkstar kernel: [ 819.813053] rtl8180 0000:06:09.0: PCI INT A disabled
Jan 3 08:24:44 darkstar kernel: [ 830.320068] rtl8180 0000:06:09.0: PCI INT A -> Link[LNK1] -> GSI 11 (level, high) -> IRQ 11
Jan 3 08:24:44 darkstar kernel: [ 830.445461] ieee80211 phy1: hwaddr 00c0a8d3d0da, RTL8185vD + rtl8225
Jan 3 08:25:00 darkstar kernel: [ 846.314529] ADDRCONF(NETDEV_UP): wlan0: link is not ready

The above shows the rtl8180 kernel module initializing the device on interrupt request (IRQ) 11. However, after the NETDEV_UP message reporting that the wlan0 link is not ready, we should see a NETDEV_CHANGE message showing that the link has become ready. I was puzzled why the RTL-8185 was getting stuck.
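If you'd rather not eyeball the log, here is a minimal sketch that reports the last link state the kernel logged for an interface. It assumes the ADDRCONF message wording shown in the log lines above; link_state is a made-up helper name, not a standard tool.

```shell
#!/bin/sh
# Print the most recent ADDRCONF link state logged for one interface.
# Reads kernel log text on stdin; prints "ready", "not ready" or "unknown".
# Assumption: the messages are worded exactly as in the log excerpt above.
link_state() {
    iface=$1
    state=unknown
    while IFS= read -r line; do
        case $line in
            *"ADDRCONF(NETDEV_CHANGE): $iface: link becomes ready"*) state=ready ;;
            *"ADDRCONF(NETDEV_UP): $iface: link is not ready"*) state="not ready" ;;
        esac
    done
    printf '%s\n' "$state"
}
```

Something like link_state wlan0 < /var/log/messages should then tell you whether the NETDEV_CHANGE ever showed up.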

Upon perusing LQ.org's forums, I decided to give the function keys on the keyboard a try. This didn't work on my lappy server, which is older and utilizes a bcm4306 (the bios indicates a resource conflict, which I'll explain in another blogpost sometime).

The function keys actually initialized wlan0 successfully. Immediately, the wifi led lit on the lappy, and I could see the following in messages:

Jan 3 20:04:55 darkstar kernel: [42842.155104] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Jan 3 20:04:55 darkstar kernel: [42842.155650] cfg80211: Calling CRDA for country: US
Jan 3 20:04:55 darkstar kernel: [42842.162047] cfg80211: Regulatory domain changed to country: US
Jan 3 20:04:55 darkstar kernel: [42842.162056] cfg80211:     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
Jan 3 20:04:55 darkstar kernel: [42842.162064] cfg80211:     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
Jan 3 20:04:55 darkstar kernel: [42842.162071] cfg80211:     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 1700 mBm)
Jan 3 20:04:55 darkstar kernel: [42842.162078] cfg80211:     (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Jan 3 20:04:55 darkstar kernel: [42842.162085] cfg80211:     (5490000 KHz - 5600000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Jan 3 20:04:55 darkstar kernel: [42842.162091] cfg80211:     (5650000 KHz - 5710000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Jan 3 20:04:55 darkstar kernel: [42842.162098] cfg80211:     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)

Note: ignore the timestamp. This actually occurred last night, although it wasn't until this morning that I updated the system's time in the BIOS.

So I could now successfully scan networks via wlan0. However, attempting to authenticate with my network failed. This is a simple WPA2 network (no MD5 challenge or RADIUS authentication), set up in our house. My attempts to use wpa_supplicant failed: every one eventually timed out.

At this point I was really frustrated. I abandoned the lappy until the morning, assuming I would somehow need to find a different version of the module (either newer or older) to test. According to Realtek's support site, the last released Linux driver for the RTL-8185 dates from sometime in 2007 and supports 2.6.x kernels. This would be okay except Slack14 is a modern OS utilizing the long-term stable kernel 3.2.29.

The solution turned out to be much simpler than that. This morning, I was messing around and found the following worked:

ifconfig wlan0 down
<function key sequence for wifi> #disable RTL-8185
rmmod rtl8180
<function key sequence for wifi> #enable RTL-8185
modprobe rtl8180
ifconfig wlan0 up
wpa_supplicant -Dwext -iwlan0 -c/etc/wpa_supplicant.conf

And suddenly I magically authenticated with my access point.

Confusingly enough, after a reboot I was able to authenticate without having to malarky around with removing and modprobing the module.

That was after I modified my BIOS time. I did notice weird behavior as a result of my system clock being so far back in time (things like X would hang and Firefox was being weird), but I can't say for certain if that would also affect this kernel module. When I have the time, I'll devise an experiment to test this hypothesis.

- Slug

Sunday, May 19, 2013

PXE Installation Notes

I recently discovered my sibling's old laptop. Note: this unit is newer than mine, and has a Turion 64 processor. Enough said; I can load slack64 and multi-lib this sucker. (Plus my current slack-lappy-server has a whopping 256 MB of RAM--wow!--and a Celeron. Okay, I'll just stop, I can hear you laughing.)

The only problem is that the newer system has no HD, and no optical drive. What can one do?

Never fear! In 1999 Intel and SystemSoft developed the Preboot Execution Environment, aka PXE. This makes use of IPv4, DHCP, and a TFTP server.

So, we use a variation of a PXE boot environment to install over the slack14-32 usb environment I previously had, and upgrade it to slack14-64.

Although I have shamelessly (and without permission, I might add) borrowed previous authors' material to repost on this blogspace, I will not do so to the awesome contributors of Slackware. I believe Eric Hameleers (aka the amazing Alien Bob) is the author of the current Slackware README_PXE.TXT file. So, I refer everyone to the aforementioned readme, which can be found on your Slackware installation disk, under the /usb-and-pxe-installers directory. (A link is also provided at the end of this post under Sources.)

What I wanted to add is some notes from my experience. Now, I've actually set this up at least, oh, 4-7 times in my life, so I'm already very familiar with the process. If anyone needs any help, feel free to ask me and I'll be happy to comment. Otherwise, join and post your question on LQ.

a. I used the simple setup. Note: this is appropriate for most home/SOHO users. If you have a corporate network or laboratory, or systems that you do NOT want to boot Linux, use the advanced setup. The difference in configuring dhcpd.conf is not that great; it simply allows you to define which systems to boot via MAC address.

a1. The readme does not explicitly state this, however my first time took me about a day to figure this simple thing out, so I'll be nice and share.

The sample configuration defines the DHCP & TFTP server as 192.168.0.1. So, when you start dhcpd (after having defined your /etc/dhcpd.conf of course), make sure to:

# bring up the interface you will use to serve pxe, i.e. Duh.

ifconfig eth0 up
ifconfig eth0 192.168.0.1
dhcpd

And then you should see a response stating dhcp is being served on 192.168.0.0/24.

b. After getting everything set up, I usually just plugged my ethernet cable from the server system straight into the target installation system. Although technically you should be using a crossover cable, in my experience with newer gigabit interfaces, the negotiation is handled automatically by the interface firmware. This was not the case in my current setup. I sat dumbfounded as the target system had a big fat no on screen after booting to pxe-boot mode:

Check Interface
Operating System not found

So I was like, wtf? I was about to consider buying a PCMCIA gigabit NIC, when I remembered I have a 5-port Asus gigabit switch. I figured I could connect the server to the switch, check if the port activates, and do the same with the target system. Note: the target system's ethernet lights do NOT work, and the server's activity lights didn't come on either, so it was a reasonable assumption that one of the NICs was fucked.

However, when I connected each NIC to separate ports on the switch, I finally got activity on the NICs (except the target system, whose lights don't work). I figured, if the switch shows activity, I'll give it a shot. PHY (OSI layer 1) troubleshooting is very straightforward, but a necessary step (sometimes).

Boot the target system into PXE-Boot and BAM, I hit jackpot. When this shit works, you'll know right away: you should see your system grab a DHCP lease almost immediately, and you'll be greeted by the Slackware installation screen.

So if you see this, that means your DHCP and TFTP servers are successfully serving, and if you can boot, your kernels and initrd.gz from the installation disk are intact.

Part2

c. Installation via NFS:

Now, in the past I usually opted for a Samba installation (simply because I had roommates that use windblows and wanted access to my goods), and I've also used HTTP installation (recommended if you have the time), but NFS is pretty damn simple. My error in this configuration involved the following /etc/exports entry:

/mirror/slackware       192.168.0.0/24(ro,sync,insecure,all_squash)

Now, that is the default recommended config. However, I was trying to be fancy. Instead of populating the installation tree under /mirror/slackware as mentioned above (say, in /mirror/slackware/slackware-14.64), I populated it in /iso and created a soft-link to that dir (ln -s /iso /mirror/slackware/slackware-14.64).

The install did not like this. It kept coming up with errors, and essentially told me to fuck off. I'm sure that somewhere in man exports there is an option that makes soft-links work, however instead I just modified exports to point at the actual mirror dir:

/iso       192.168.0.0/24(ro,sync,insecure,all_squash)

And the installation was off!
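I never did track down an exports option for this. One way to sidestep the question is to resolve the soft-link first and export whatever real directory it points to. The sketch below assumes GNU readlink (its -f flag follows links to the final target) and reuses the export options from the entry above; exports_line is a made-up helper name, not part of any NFS package.

```shell
#!/bin/sh
# Print an /etc/exports line for the real directory behind a path.
# exports_line is a hypothetical helper. Assumes GNU readlink, whose
# -f flag resolves every symlink in the path.
exports_line() {
    real=$(readlink -f "$1") || return 1
    printf '%s\t%s(ro,sync,insecure,all_squash)\n' "$real" "$2"
}
```

Running exports_line /mirror/slackware/slackware-14.64 192.168.0.0/24 on my setup would print an entry for /iso, which is the line I ended up using.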

Reasons to recommend PXE/TFTP installation:
a. A 100 Mb ethernet interface is much faster than typical CD/DVD-ROM read speeds. (A typical install from DVD takes a while. If you've done this a couple of times, you'll be surprised how fast a 100 Mb install is. A gigabit install is LIGHTNING fast. Don't even get me started on fiber channel links; when I have the hardware I will revisit this post.)

b. You can host more than just slackware. In fact, at Intel I used this setup to host our RHEL6 installs. Any Linux can be hosted. In fact--I used a slack pxe/server to host Windows7! This requires configuring pxelinux.cfg/default a little differently than what is included in the installation disk. I remember I had to do this when a friend handed me a semi-new Sony Vaio laptop, which for some reason did NOT want to boot any of the win7 burned iso's I had (probably Sony dicking us with some bullshit firmware hacks). When I have the time, I'll make another post exclusively on Slack PXE serving Windows.

c. For sheer practice. Seriously, it's good for the soul (or masochistic if you're not technically inclined, but hey, whatever you call it you'll be happy when it works--trust me).

Sources:

1. Slackware usb-and-pxe-installers/README_PXE.TXT: http://taper.alienbase.nl/mirrors/slackware/slackware64-14.0/usb-and-pxe-installers/README_PXE.TXT

Friday, May 17, 2013

Deciphering a Linux Call Trace, Part 1

Deciphering a Linux Call Trace (aka Crash Dump)

While working at Intel, my primary responsibility was finding bugs in the Linux kernel driver for the then-prototype C600 SAS/SATA RAID chipset. What this basically meant was to set up a huge storage configuration and do my best to break the living shit out of it. This wasn't usually hard, as when I started we could barely support 2-level expander-attached storage configurations, where SAS was the only option (SATA support came later). Nonetheless this was a ton of fun as it meant I got to play with Linux every day I was at work, and better yet I was getting paid for it. (Note, I was a green badge, which means contract employee--though still very cool.)

Our performance was measured by how many bugs we could find in the driver. Although I'm sure the devs hated me every time I walked over to their desks or bombarded them with emails, I got to have a very good relationship with a few of them. This was about the time I realized my aspirations to contribute to the Linux open source community.

So, basically we would setup a configuration, using SAS/SATA expanders, and say fill one with 12 SAS drives, and use PHY > table routing E1 | Subtractive Routing > E2; to a 2nd expander, filled with say another 12 SAS drives (The exact routing configuration used depends on what the expander supports, although eventually the driver could support various configurations). I would create oh lets say 4 raid 5 arrays (6 disks each), and run IO with a variety of tools. (We primarily used JDSU's Medusa labs. JDSU laid a lot of fiber in late 90's - 2000's.)

The gold is when, after running IO for a period of time, the system would eventually crash. Unfortunately I don't have any of my old logs with me (and I'm sure Intel will be pissed if I shared them, considering I've shared enough already), but you will see something like this:

(example of a call trace, from a module written explicitly to crash)

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
PGD 7a719067 PUD 7b2b3067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/virtual/misc/kvm/uevent
CPU 1
Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64
RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
FS:  00007fb79dadf700(0000) GS:ffff880001e80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007a0f1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process insmod (pid: 2248, threadinfo ffff88007ad4a000, task ffff88007a222ea0)
Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
 0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
 ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000
Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00
RIP  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP <ffff88007ad4bf08>
CR2: 0000000000000000

Now, I need to emphasize I am not an expert at deciphering these. Rather, I wanted to provide the information you fellow slackers/hackers/bums need to figure out what the fuck happened.

a. The first line says the kernel tried to dereference a NULL pointer; the address it tried to access, (null), is printed at the end.

 > BUG: unable to handle kernel NULL pointer dereference at (null)

b. IP is the instruction pointer: the address of the instruction that faulted.

 > IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21

c. Oops: the page-fault error code, in hex. Each bit means the following:

 > Oops: 0002 [#1] SMP

 > bit 0 == 0 means no page found, 1 means a protection fault
   bit 1 == 0 means read, 1 means write
   bit 2 == 0 means kernel, 1 means user-mode

   Here 0002 has only bit 1 set: a write, from kernel mode, to a page that was not found.

   [#1]: this value is the number of times the Oops occurred. Multiple Oopses can be triggered as a cascading effect of the first one.
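To save doing the bit-twiddling in your head, here is a small sketch that decodes those three bits. decode_oops is a made-up helper name, and it only looks at the three bits described above; any higher bits are ignored.

```shell
#!/bin/sh
# Decode the low three bits of an Oops error code given as hex digits
# (for example 0002). decode_oops is a hypothetical helper name.
decode_oops() {
    code=$((0x$1))
    if [ $((code & 1)) -eq 0 ]; then page="no page found"; else page="protection fault"; fi
    if [ $((code & 2)) -eq 0 ]; then access="read"; else access="write"; fi
    if [ $((code & 4)) -eq 0 ]; then mode="kernel mode"; else mode="user mode"; fi
    printf '%s, %s, %s\n' "$page" "$access" "$mode"
}
```

For the trace in this post, decode_oops 0002 prints "no page found, write, kernel mode".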

d. CPU 1 > the CPU on which the error occurred. (On the Xeon systems we tested on, I swear we would have PAGES of these call traces. It's ridiculous.)

e. Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64

 > Pid: the ID of the process that was running when the oops hit

 > comm: insmod >> the command being run when shit hit the fan

 > Tainted: P

 as defined in kernel/panic.c :

  P - Proprietary module has been loaded.
  F - Module has been forcibly loaded.
  S - SMP with a CPU not designed for SMP.
  R - User forced a module unload.
  M - System experienced a machine check exception.
  B - System has hit bad_page.
  U - Userspace-defined naughtiness.
  A - ACPI table overridden.
  W - Taint on warning.

 > 2.6.33.3-85.fc13.x86_64: the kernel in use when the oops occurred

 So Tainted: P means we most likely loaded a proprietary module (even though this was just a sample module written exclusively to crash--bear with me folks). Note: if you ever see a kernel tainted with P, this is most likely due to a closed source module, and if you seek help from the community they will most likely point you to the software developer or hardware manufacturer of the module/driver.
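When more than one flag shows up, a lookup helps. This is only a sketch: describe_taint is a made-up helper name, and it knows just the letters from the list above, so anything else is reported as not in the list.

```shell
#!/bin/sh
# Expand a taint string such as "P" or "PW" into one line per flag.
# describe_taint is a hypothetical helper; it only knows the letters
# listed in this post.
describe_taint() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}          # everything after the first character
        flag=${flags%"$rest"}    # the first character itself
        case $flag in
            P) echo "P: proprietary module has been loaded" ;;
            F) echo "F: module has been forcibly loaded" ;;
            S) echo "S: SMP with a CPU not designed for SMP" ;;
            R) echo "R: user forced a module unload" ;;
            M) echo "M: system experienced a machine check exception" ;;
            B) echo "B: system has hit bad_page" ;;
            U) echo "U: userspace-defined naughtiness" ;;
            A) echo "A: ACPI table overridden" ;;
            W) echo "W: taint on warning" ;;
            *) echo "$flag: not in the list above" ;;
        esac
        flags=$rest
    done
}
```

Calling describe_taint P gives the single proprietary-module line that applies to the sample trace.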

f. RIP is the CPU register containing the address of the instruction executed.

 > RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops] 

 > 0010 is the code segment register

 > my_oops_init+0x12/0x21 is <symbol>+<offset>/<length>: the fault happened 0x12 bytes into my_oops_init, a function that is 0x21 bytes long
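The symbol+offset/length notation is easy to pull apart with plain shell parameter expansion. A minimal sketch follows; split_symbol is a made-up helper name, and it assumes the entry has exactly the symbol+0xOFF/0xLEN shape shown above.

```shell
#!/bin/sh
# Split "symbol+0xOFF/0xLEN" and print the offset and length in decimal.
# split_symbol is a hypothetical helper name.
split_symbol() {
    entry=$1
    symbol=${entry%%+*}     # text before the first "+"
    rest=${entry#*+}        # "0xOFF/0xLEN"
    offset=${rest%%/*}      # "0xOFF"
    length=${rest#*/}       # "0xLEN"
    printf '%s offset=%d length=%d\n' "$symbol" "$(($offset))" "$(($length))"
}
```

split_symbol my_oops_init+0x12/0x21 prints "my_oops_init offset=18 length=33", i.e. the fault is a bit past the halfway point of the function.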

g. The following is a dump of the listed CPU registers:

 >
RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
.
.

h. The following is a raw dump of the top of the kernel stack:

 >
Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
 0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
 ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000

i. And now comes the call trace!

 >
Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b

 > these are the functions that were on the call stack when the oops hit, most recent call first

Now, to really get into the meat of this, you will need a debugger. I am not very skilled at using these (yet!), however with time I will make a 2nd blog post outlining how to figure out what the fuck is going on.

Note: some tutorials will want you to use fancy utilities like crash, or kdump. Now, I have configured these and used these before, although I found the best way to truly capture these is to have an active serial port connection to a 2nd computer, that is actively recording the contents of your dump. (I would use uucp--which I can explain in another blog post. The recording is quite simple, just redirect output to a file, and in another terminal watch everything crash and burn.) Sure, it seems like a waste of a 2nd compy but in all honesty sometimes these utilities fail or the damn contents of the dump will become corrupted depending on how severe things became.

Thanks go to Surya Prabhakar, from whom I shamelessly borrowed most of this information (some is from memory, but hey, I needed a refresher and he did a good job of outlining what's going on).

Further reading:

a. Dedoimedo; http://www.dedoimedo.com/computers/crash-analyze.html#mozTocId45838 

b. Linuxforu: Understanding a Kernel Oops!; http://www.linuxforu.com/2011/01/understanding-a-kernel-oops/

Thursday, May 16, 2013

Slackware 14 USB Install

Okay, it's been quite a while since I've shared anything.

Credit for the following post goes to Gareth Lowe, a LQ newbie (although the level of advice given in this post proves Gareth is anything BUT).

Unfortunately I didn't have much computer access for about, oh, a good 5 months. This was due to several reasons, although in retrospect I should've taken the initiative to hit up the local Goodwill stores in search of a Barton AMD system (a good 10 years old now, but still reliable in my opinion).

Anyways, fast forward to April 14 2013 and I discover Slack14 has been released. There was an extra lappy sitting around my folks' house, and not wanting to permanently alter the HD data (and having no storage resources to perform a backup), I thought I'd go ahead and install Slack14 on a good ol' thumb drive. I briefly considered using Slax (which I've used before, and is quite good), although considering Slax is based on 12.0, I wouldn't settle until I had the real enchilada.

If I was in practice, I'm sure I could've figured this out. I wasn't able to immediately, though, and I give many humble thanks to Gareth for sharing this advice.

Now, onto the good stuff!

INSTRUCTIONS:

OK yall, a quick rundown on how I setup a usb HDD with a bootable Slackware 14 install today.

Firstly, I installed the base system from the CD, with my partition table looking a little something like this..

Device      Boot  Start       End          Blocks       Id  System            Mount point
/dev/sda1   *     63          1558304      779121       83  Linux             /boot
/dev/sda2         1558305     9365894      3903795      82  Linux swap        swap
/dev/sda3   *     9365895     204684164    97659135     83  Linux             /
/dev/sda4         204684165   1953519615   874417725+   7   HPFS/NTFS/exFAT   /store

Now during the install I chose the simple LILO install option, into the MBR of /dev/sda.

After that I rebuilt my initrd tree, firstly by deleting the existing one, located at /boot/initrd-tree, and then by running mkinitrd, which gives us a fresh tree and populates it. Now, to get a working initrd image that brings up the USB disk on boot, I first had to modify /boot/initrd-tree/wait-for-root and set the value to a number of seconds that allows the drive time to come up once the modules are loaded. I set mine to 15.

Next I modified the fstab to only reference UUIDs when mounting the disks. If you are using this on different machines, you may find the disk moves locations depending on how many disks are in said machine, i.e. it is /dev/sda if it is the only or first drive (unlikely, given USB), but becomes /dev/sdb or /dev/sdc if it is detected after others. This way the drive is always referenced correctly and you don't get kernel panics. You can get the UUID of your partitions by running ‘blkid’.

My fstab is as follows:

UUID=af7efa55-2f37-415a-b131-130d2accbd5d        swap             swap       defaults         0   0
UUID=ddee4a6a-900d-494e-9573-acb6fd371faf        /                    ext4         defaults         1   1
UUID=dac53074-92d8-4fb1-abc9-0bd0f0631102       /boot             ext2        defaults         1   2
UUID=3E58608D586045AD        /store        ntfs        fmask=111,dmask=000 1   0
#/dev/cdrom      /mnt/cdrom      auto        noauto,owner,ro,comment=x-gvfs-show 0   0
/dev/fd0             /mnt/floppy       auto        noauto,owner         0   0
devpts               /dev/pts            devpts     gid=5,mode=620   0   0
proc                  /proc                 proc        defaults                   0   0
tmpfs                /dev/shm          tmpfs       defaults                   0   0
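Typing UUIDs by hand is a good way to botch one, so here is a rough sketch that turns a line of blkid output into an fstab entry. It assumes blkid prints lines shaped like /dev/sda3: UUID="..." TYPE="ext4"; fstab_from_blkid is a made-up helper name, and the mount point, type, options and the two trailing numbers are simply whatever you pass in.

```shell
#!/bin/sh
# Read blkid output on stdin and print an fstab entry that mounts by UUID.
# Arguments: mount point, fs type, options, dump flag, fsck pass.
# fstab_from_blkid is a hypothetical helper; it assumes blkid lines look
# like:  /dev/sda3: UUID="..." TYPE="ext4"
fstab_from_blkid() {
    uuid=$(sed -n 's/.* UUID="\([^"]*\)".*/\1/p' | head -n 1)
    [ -n "$uuid" ] || return 1     # no UUID found in the input
    printf 'UUID=%s\t%s\t%s\t%s\t%s\t%s\n' "$uuid" "$1" "$2" "$3" "$4" "$5"
}
```

For the root partition above that would be something like blkid /dev/sda3 | fstab_from_blkid / ext4 defaults 1 1, with the output pasted into /etc/fstab after a sanity check.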

Next I built my initrd image using the command ‘mkinitrd -s /boot/initrd-tree -k 3.2.29-smp -m ehci-hcd:uhci-hcd:usb-storage -f ext4 -o /boot/initrd.gz’. You will note the modules we are placing into the image, these will be loaded and allow the drive to be initialised, and the root fs duties to be handed off to it. Change the variables for kernel and filesystem as needed.

Lastly, I configured and reinstalled LILO. Again, like the example above, we want to modify it to only use UUIDs as reference, and to add in the lines for our initrd. Make sure you place the initrd line above the root line when configuring, as the order denotes hierarchy.

My lilo.conf entries are below. Don't forget to change the boot line to the target HDD.

boot = /dev/sdc
image = /boot/vmlinuz
initrd = /boot/initrd.gz
root = "UUID=ddee4a6a-900d-494e-9573-acb6fd371faf"
label = Slack14
read-only

Lastly, run ‘lilo -v’ to commit the whole thing to the MBR.

Sources:

1. Linuxquestions.org: http://www.linuxquestions.org/questions/slackware-installation-40/slackware-14-usb-hdd-install-4175457861/

2. Wordpress.com: http://unsoundadvice.wordpress.com/2013/04/12/slackware-14-usb-hdd-install/

Notes:
a. Initrd:
i. Del initrd tree: /boot/initrd-tree
ii. run mkinitrd
iii. modify /boot/initrd-tree/wait-for-root to 15s

b. Use UUID's in /etc/fstab to identify partitions

c. Create initrd via the following command:
mkinitrd -s /boot/initrd-tree -k 3.2.29-smp -m ehci-hcd:uhci-hcd:usb-storage -f ext4 -o /boot/initrd.gz

It is important to include the ehci-hcd, uhci-hcd, and usb-storage modules, as these modules are the drivers which allow the system to load usb storage devices upon boot, and make it the root filesystem.

d. Modify lilo.conf as in the example above. Make sure to denote the root partition by UUID, and include the initrd.gz file in the configuration to ensure the aforementioned modules get loaded.

I am using this to boot off of my 32 GB Verbatim thumb drive. And it's reasonably fast! (Excluding the 15s wait.)

- Slug