Tuesday, June 11, 2013

Wireless Repeater via DD-WRT

Currently listening to: Deftones, Diamond Eyes album 2010.

For those of you who just want the meat of how to set up DD-WRT, and don't give a damn about my little story, scroll down until you see the screenshot and start reading the paragraph above that. (Oh, and screw you!)

So, the turion lappy (who I've happily named neptune64), is temporarily being held hostage by a group of mexican thugs (no, seriously). (Well not quite hostage but they've requested a ransom--at a modest price, so I can't complain. Ah the perils of the physical world).

This leaves me entirely dependent on my old celeron lappy (which was meant to be purposed as a server; in fact, this was the system I set up to host a pxe server and nfs). My Celeron lappy, which I've happily named centauri, has a Family 15 cpu, Model 2, Stepping 9, 128KB cache, 2790.8 MHz, which I recently upgraded from 256MB of PC2700 ddr to 512MB of PC2100 (sure 2700 is faster, but twice the ram is much faster!). When I have money I'll upgrade this sucker to 4GB. Too bad there aren't more ram slots or I could use Physical Address Extension (PAE).

As I somewhat explained in a previous blogpost, centauri has a problem with the internal wireless adapter (bcm4306). Whereas before the BIOS would boot stating an IRQ resource conflict with the bcm4306 as the culprit, now lspci -v fails to even show the device as present in the system. So, attempting any software hacks will definitely not work at this point--we need a new physical solution.

So, sometime in the past 7 years I came across a Linksys WRT54GL. I rarely used it, as I would rely on my 5-port gigabit airlink switch instead (and later my asus 8-port gigabit switch). Eventually, I used slackware to set up my own router via gigabit interfaces using a bridge (in conjunction with the 8-port-gigabit-asus), so I decided to let the Linksys go to my parents where they could use it in the house. Since I am currently unemployed and living with my parents (bummer!), I had to fix the network a couple of times. I am the IT admin/ lackey/ janitor here, and my pay is no rent plus food (not a bad deal if you ask me, although when I do find work I'd love to upgrade the dsl connection to something faster than 1.5 Mbps).

So, I was told the Linksys no longer worked. My family went and purchased a wireless N capable Dlink. However, they were having a host of other problems. Turns out the actiontec modem/ap was broadcasting one ssid, and the dlink was broadcasting an entirely different network. Little did they know, even though they purchased the dlink, they weren't in fact actually using it. All of the clients would connect to the actiontec, that is, everyone except the little netflix streaming roku, which I have yet to explain.

So I had to consolidate everything to one network, disable the ssid on the actiontec, and bingo everything works on the dlink (while of course setting the dlink on a separate lan, 192.168.1.x).

Note: do not read the next paragraph unless you are absolutely curious as to the process of what I had to endure in order to fix my home network. If you truly don't care, I promise I won't be upset. Also, it may confuse most of you. Those of you who are interested purely for the challenge, feel free to comment on my gimped setup (i.e. seasoned *nix users, I welcome your input).

(That was the shortened instruction set. Most people would disable dhcp entirely on the first router, or place it in bridge mode (I seriously think only 2wire routers have this option). And, since most instructions would have you connect the first ap to the 2nd ap via the lan ports, I was having trouble passing NAT and DNS via the 2nd ap's wifi. Since there is no bridge mode on the actiontec, I opted to simply leave dhcp on in the first ap/modem while disabling the ssid, set the 2nd ap on a separate lan, and connect ap 1 to ap 2 via the wan port. Although a bit convoluted, I no longer have issues with dns. Well, mostly. Most of the windows clients work, except my little sister's laptop and mine, on which I had to hardcode dns in /etc/resolv.conf, oh and her ipod. Oh, and get this, the actiontec will randomly re-enable the ssid, simply because it feels compelled to be the boss. I've seriously had to disable it like 5 times already. It's frustrating.)

If anyone needs help with a similar setup, feel free to comment / Email me.

So anyways, I was told the Linksys no longer works. I called bullshit (especially considering the mess my family of computer geniuses left everything in). I perused the settings to see if there was anything remotely resembling using the AP as a repeater, or setting it up via a wifi wpa2 bridge, but nothing was in the linksys firmware. Now, I had originally intended to use openwrt for this project. However, although I am certainly not opposed to the *nix style environment (I'd actually prefer this), according to the wiki there are a host of packages you need to download in order to get a wpa2 bridge going:
http://wiki.openwrt.org/oldwiki/wirelessbridgewithwpahowto?s[]=wireless&s[]=repeater .

Furthermore, the setup isn't exactly straightforward. That and considering I have a deadline on some projects I'm working on (note: submitting resumes to find jobs--there is a contract I'm trying to settle as we speak), I figured I'd settle for a working solution for now until I have the time to setup the environment I'd prefer (this is a trade-off I did when I first started using linux--my first home distro was fedora. That plus my redhat training made my transition to slack much smoother).

So, my instructions were gleaned from Brian Purdy's post on lifehacker. I will do you folks the favor of simplifying his post. It looks like he had to do a lot of extra work; my setup was actually pretty simple.

First, go to ddwrt's site http://www.dd-wrt.com/site/index . Next, look up your router in the router DB, and browse to the appropriate link. According to Brian, the micro firmware will suit our purposes just fine. (This is acceptable, since my next upgrade will be openwrt). He mentions that you should powercycle the hell out of your router, although I found I had no need to do so. Simply go to your router's homepage and find the appropriate link: mine was Linksys > Administration > firmware link, and begin the upgrade by loading the micro.bin firmware. (Note, if your router doesn't have a webgui option to load firmware, you may have to utilize tftp. Consult the dd-wrt wiki for more info). You should see an "Upgrade is Successful" message appear (sorry guys I didn't take a screenshot, but it is a very simple webpage). Afterwards, your router will reboot, and you'll need to re-authenticate with the following credentials:

username: root
password: admin

(It took me a couple tries to figure it out.. I know I ride the short bus, bear with me.)

Next comes the configuration:


A. Edit Wireless: Wireless Tab (Basic Settings)
 > Switch wireless mode to repeater
 > For wireless network name, input the SSID of the network you will be rebroadcasting (or repeating).
 >> Save settings (do not apply just yet)
 > Below the main section you edited is a Virtual Interfaces section. Add 1 virtual interface
 > Add a NEW name for your repeater (i.e., the original SSID appended with a 2, which is what I did. Or you can use an entirely different SSID).
 >> Save settings (do not apply just yet)
 >> Head to wireless security subtab
 > Ensure you use the same security settings your primary router/wifi access point utilizes in both the primary and virtual interfaces. For WPA2, take care to notice whether you use TKIP, AES, or both.
  >> Save settings (do not apply just yet)

B. Network configuration: Network setup tab (Basic Configuration)
 >  Alter the router's Local IP Address to something different from the primary access point. I.e. if your main router uses 192.168.1.1, you can use 192.168.2.1 (which is what I did).
 >> Save settings (do not apply just yet)
 >> Switch to the Security subtab (Still under Main Network Setup tab)
 > disable SPI firewall
 > Under Block WAN requests, disable the following:
 - Block Anonymous WAN Requests (ping)
 - Filter WAN NAT Redirection
 - Filter IDENT (Port 113)
 > Leave Filter Multicast disabled
(Note: the above settings are to ensure the simplest configuration in case anything goes wrong. If you feel compelled to re-enable them after your configuration is working, feel free to do so and report your results).
 >> Save settings (and for the love of god don't apply yet!)
 > Head over to the administration, and for Pete's sake--change the password to something you can remember (if you haven't already done so).
 >> Once again, save settings. Now you can Apply!

So first things first, since you changed the lan ip your ap is using, you will need to renew your dhcp lease for your interface. Now in my configuration, this ap repeater is providing internet over ethernet to my gimped celeron lappy. For those of you who are using this over wireless, configure your wireless as normal.

Best thing is to simply bring down the interface, and re-initialize it. This way, the routing table will be reset. When I first tried it I noticed route -n still showed 192.168.1.1 as the primary gateway.
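Here is a rough sketch of that bounce as a script, in case it helps. It assumes net-tools (ifconfig, route) and dhcpcd like my slack box uses; eth0 is only a placeholder, and the run/DRY_RUN wrapper is something I made up for illustration so you can preview the commands before running them as root.

```shell
# Sketch only: re-initialize an interface so it grabs a lease from the repeater.
# Assumptions: net-tools and dhcpcd are installed; run as root unless DRY_RUN=1.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinit_iface() {
    iface="${1:-eth0}"
    run dhcpcd -k "$iface"       # ask a running dhcpcd to release its lease and exit
    run ifconfig "$iface" down   # dropping the interface clears its routes
    run ifconfig "$iface" up
    run dhcpcd "$iface"          # request a fresh lease from the repeater
    run route -n                 # eyeball the default gateway afterwards
}
```

Try DRY_RUN=1 first and call reinit_iface eth0; if the printed commands look right for your system, unset DRY_RUN and run it for real.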

After you have established a link over your desired interface, perform a basic network check:
 > ping your accesspoint, i.e. in my case 192.168.2.1. Also a good time to see if you can browse to your repeater ap, and to test your new login credentials.
 > if this is good, now try pinging the primary access point (in my case 192.168.0.1)
 > if this is good, you should also be able to browse to the primary ap's interface (a good check).
 > Now, hold your breath, a real WAN test. Ping the following IP (which I'm told is a DNS server for Verizon): 4.2.2.2
 > If the above works, you are online! Now, for a dns test: ping your favorite website, i.e. slugman01.blogger.com
 > if you receive replies, you are golden. If not, you may need to hardcode the dns listed in your modem/primary ap's page in /etc/resolv.conf
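
The checklist above can be sketched as a tiny script that stops at the first hop that doesn't answer. The addresses in the example are the ones from my setup; PING_CMD is a knob I invented so you can swap in a different ping binary, it is not a standard variable.

```shell
# Sketch only: ping each hop in order and stop at the first failure.
# PING_CMD is a hypothetical override, defaulting to the system ping.
PING_CMD="${PING_CMD:-ping}"

check_hops() {
    # Arguments: targets in order, e.g. repeater, primary ap, WAN IP, hostname.
    for target in "$@"; do
        if $PING_CMD -c 2 "$target" >/dev/null 2>&1; then
            echo "ok   $target"
        else
            echo "FAIL $target"
            return 1    # everything past this hop is suspect, so stop here
        fi
    done
}

# Example: check_hops 192.168.2.1 192.168.0.1 4.2.2.2 slugman01.blogger.com
```

If the last IP answers but the hostname fails, you are looking at a DNS problem rather than a WAN problem.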

At this point you should be able to browse the interwebs. Note: if you had your browser open prior to this point, you may need to restart it if you have problems loading webpages. For some reason, even after the above network test confirmed I was online, firefox hung on loading basic webpages. Restarting it did the trick.

If you experience any problems, feel free to post here and I'll do the best I can to help. Important points to remember are:
 > ensure your physical interface is working properly. If it isn't, you'll fail right off the bat when you try to ping your access points.
 > if you can ping your repeater access point, but not the primary, doublecheck your routing table to ensure it is using the correct primary gateway. A simple ifconfig interface down; ifconfig interface up will clear the routing table. If you are statically assigning your addresses you can set them up via ifconfig and add the gateway via route as normal. Otherwise, if you are using a dhcp lease, make sure to kill the dhcp client process (in my case, dhcpcd) by process id or with killall -9 prior to re-initializing the interfaces, or it may screw up when it tries to grab the new lease.
 > if you can ping & browse your primary access point, but can't ping WAN (4.2.2.2), make sure your primary access point isn't blocking ping requests (or disable its firewall while testing). (Remember, in my case I have 3 access points, the modem/ap, the dlink ap, and my repeater. The dlink provides the firewall.) Or, it may be possible you temporarily lost internebs while setting up: check the status page of your modem/ primary ap to doublecheck.

If you are fortunate enough to have a linux system connected to the primary ap/ or a windows system with putty, or any *nix environment with ssh, try making sure they can ping said IP- 4.2.2.2 . If they can't ping it, but can still browse, it's likely ping requests have been disabled from the primary ap. I recommend re-enabling ping just to make sure you can perform the "ping a domain name" test afterwards. It really helps to narrow down if you are having a WAN or DNS issue.

Good hunting!

- Slug

Monday, May 20, 2013

Weird issues with RTL-8185

I'll make this one short.

So I finally got Slack14.64 going on the Turion lappy. I could see that the interface was recognized, but attempting to scan failed:

iwlist wlan0 scan

interface doesn't support scanning

Upon further inspection, I noticed the following in /var/log/messages:

Jan  3 08:22:23 darkstar kernel: [  689.812525] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Jan  3 08:24:33 darkstar kernel: [  819.813053] rtl8180 0000:06:09.0: PCI INT A disabled
Jan  3 08:24:44 darkstar kernel: [  830.320068] rtl8180 0000:06:09.0: PCI INT A -> Link[LNK1] -> GSI
 11 (level, high) -> IRQ 11
Jan  3 08:24:44 darkstar kernel: [  830.445461] ieee80211 phy1: hwaddr 00c0a8d3d0da, RTL8185vD + rtl
8225
Jan  3 08:25:00 darkstar kernel: [  846.314529] ADDRCONF(NETDEV_UP): wlan0: link is not ready

The above shows that the kernel module rtl8180 is initializing the device via interrupt request 11. However, following the NETDEV_UP wlan0: link is not ready, we should see a NETDEV_CHANGE showing the link is ready. I was puzzled why the rtl-8185 was getting stuck.
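
If you'd rather not squint at the log, here is a small sketch that reports the last link state the kernel logged for an interface. It only matches the two ADDRCONF message formats my kernel printed (the "not ready" one above and the "link becomes ready" one you get once the link is up), so treat the patterns as an assumption for other kernel versions.

```shell
# Sketch only: read syslog lines on stdin, print the last link state seen.
last_link_state() {
    iface="${1:-wlan0}"
    state="unknown"
    while IFS= read -r line; do
        case "$line" in
            *"ADDRCONF(NETDEV_UP): $iface: link is not ready"*)
                state="not ready" ;;
            *"ADDRCONF(NETDEV_CHANGE): $iface: link becomes ready"*)
                state="ready" ;;
        esac
    done
    echo "$state"
}

# Example: last_link_state wlan0 < /var/log/messages
```

A result of "not ready" with no later NETDEV_CHANGE is exactly the stuck state described here.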

Upon perusing LQ.org's forums, I decided to give the function keys on the keyboard a try. This didn't work on my lappy server, which is older and utilizes a bcm4306 (the bios indicates a resource conflict, which I'll explain in another blogpost sometime).

The function keys actually initialized wlan0 successfully. Immediately, the wifi led lit on the lappy, and I could see the following in messages:

Jan  3 20:04:55 darkstar kernel: [42842.155104] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Jan  3 20:04:55 darkstar kernel: [42842.155650] cfg80211: Calling CRDA for country: US
Jan  3 20:04:55 darkstar kernel: [42842.162047] cfg80211: Regulatory domain changed to country: US
Jan  3 20:04:55 darkstar kernel: [42842.162056] cfg80211:     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
Jan  3 20:04:55 darkstar kernel: [42842.162064] cfg80211:     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
Jan  3 20:04:55 darkstar kernel: [42842.162071] cfg80211:     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 1700 mBm)
Jan  3 20:04:55 darkstar kernel: [42842.162078] cfg80211:     (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Jan  3 20:04:55 darkstar kernel: [42842.162085] cfg80211:     (5490000 KHz - 5600000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Jan  3 20:04:55 darkstar kernel: [42842.162091] cfg80211:     (5650000 KHz - 5710000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Jan  3 20:04:55 darkstar kernel: [42842.162098] cfg80211:     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)

Note: ignore the timestamp. This actually occurred last night, although it wasn't until this morning that I updated the system's time in the BIOS.

So I could now successfully scan networks via wlan0. However, attempting to authenticate with my network failed. This is a simple wpa2 network (no md5 challenge or authenticating with RADIUS), set up in our house. My attempts to use wpa_supplicant failed. All attempts eventually timed out.

At this point I was really frustrated. I abandoned the lappy until the morning, assuming I would somehow need to try and find a different version module (either new or old), to test. According to Realtek's support site, the last released linux driver for the RTL-8185 was sometime in 2007 and has 2.6.x support. This would be okay except Slack14 is a modern OS utilizing long term stable kernel 3.2.29.

The solution turned out to be much simpler than that. This morning, I was messing around and found the following worked:

ifconfig wlan0 down
<function key sequence for wifi> #disable RTL-8185
rmmod rtl8180
<function key sequence for wifi> #enable RTL-8185
modprobe rtl8180
ifconfig wlan0 up
wpa_supplicant -Dwext -iwlan0 -c/etc/wpa_supplicant.conf

And suddenly I magically authenticated with my access point.

Confusingly enough, after a reboot I was able to authenticate without having to malarkey around with removing and modprobing the module.

That was after I modified my BIOS time. I did notice weird behavior as a result of my system being so far back in time (things like x would hang and firefox was being weird), but I can't say for certain if that would also affect this kernel module. When I have the time, I'll devise an experiment to test this hypothesis.

- Slug
 

Sunday, May 19, 2013

PXE Installation Notes

I recently discovered my sibling's old laptop. Note: this unit is newer than mine, and has a turion 64 processor. Enough said; I can load slack64 and multi-lib this sucker. (Plus my current slack-lappy-server has a whopping 256mb of ram--wow!--and a celeron, okay I'll just stop, I can hear you laughing.)

The only problem is that the newer system has no HD, and no optical drive. What can one do?

Never fear! In 1999 Intel and Systemsoft developed the Preboot Execution Environment, aka PXE. This makes use of IPv4, DHCP, and a TFTP server.

So, we use a variation of a PXE boot environment to install over the slack14-32 usb environment I previously had, and upgrade it to slack14-64.

Although I've shamelessly (and without permission, I might add) borrowed previous authors' material to repost on this blogspace, I will not do so to the awesome contributors of slackware. I believe Eric Hameleers (aka the amazing Alien Bob) is the author of the current Slackware README_PXE.txt file. So, I refer everyone to the aforementioned readme, which can be found on your slackware installation disk, under the /USB-AND-PXE-INSTALLERS directory. (A link is also provided at the end of this post under Sources.)

What I wanted to add is some notes from my experience. Now, I've actually set this up at least.. oh 4-7 times in my life, so I'm already very familiar with the process. If anyone needs any help--feel free to ask me and I'll be happy to comment. Otherwise, join and post your question on LQ.

a. I used the simple setup. Note: this is appropriate for most home /SOHO users. If you have a corp network/ laboratory, or systems that you do NOT want to boot linux, use the advanced setup. The difference in configuring dhcpd.conf is not that great, it simply allows you to define which systems to boot via MAC address.

a1. The readme does not explicitly state this, however my first time took me about a day to figure this simple thing out, so I'll be nice and share.

The sample configuration defines the DHCP & TFTP server as 192.168.0.1. So, when you start dhcpd (after having defined your /etc/dhcpd.conf of course), make sure to:

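# Bring up the interface you will use to serve pxe (eth0 here). Duh.
ifconfig eth0 up
# Give it the address the sample dhcpd.conf expects for the DHCP & TFTP server.
ifconfig eth0 192.168.0.1
# Start the DHCP daemon (it reads /etc/dhcpd.conf).
dhcpd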

And then you should see a response stating dhcp is being served on 192.168.0.0/24.
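
For reference, here is a sketch of what a bare-bones dhcpd.conf for this looks like, wrapped in a function that writes it to a path you choose. This is NOT the sample from the readme: the address pool is made up, and I'm assuming ISC dhcpd with pxelinux.0 served over TFTP from the same 192.168.0.1 box. Use the readme's file for the real thing.

```shell
# Sketch only: write a minimal PXE-flavoured dhcpd.conf to the given path.
# Assumptions: ISC dhcpd; TFTP server on 192.168.0.1; the range is hypothetical.
write_dhcpd_conf() {
    conf_path="$1"
    if [ -z "$conf_path" ]; then
        echo "usage: write_dhcpd_conf <path>" >&2
        return 1
    fi
    cat > "$conf_path" <<'EOF'
# minimal PXE sketch, not the sample shipped in README_PXE.TXT
ddns-update-style none;
subnet 192.168.0.0 netmask 255.255.255.0 {
    range 192.168.0.50 192.168.0.100;   # hypothetical lease pool
    option routers 192.168.0.1;
    next-server 192.168.0.1;            # where the client fetches the boot file (TFTP)
    filename "pxelinux.0";              # boot file the PXE ROM asks for
}
EOF
}

# Example: write_dhcpd_conf /tmp/dhcpd.conf.test
```

Writing to a scratch path first and diffing against your existing /etc/dhcpd.conf is a safer habit than overwriting it blind.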

b. After getting everything set up, I usually just plugged my ethernet cable from the server system into the target installation system. Although technically you should be using a crossover cable, in my experience with newer gigabit interfaces, the negotiation is handled automatically by the interface firmware. This was not the case in my current setup. I sat dumbfounded as the target system had a big fat no on screen after booting to pxe-boot mode:

Check Interface
Operating System not found

So I was like, wtf? I was about to consider buying a pcmcia gigabit nic, when I remembered I have a 5-port asus gigabit switch. I figured, I can connect the server to the switch, check if the port activates, and do the same with the target system. Note: the target system's ethernet lights do NOT work, and neither did the server's activity lights activate, so it was a reasonable assumption to consider that one of the nics was fucked.

However, when I connected each nic to separate ports on the switch, I finally got activity on the nics (except the target system). I figured, if the switch shows activity, I'll give it a shot. PHY (OSI level 1) troubleshooting is very straightforward, but a necessary step (sometimes).
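
When the link lights are dead like mine, the kernel can still tell you whether it sees a carrier. A small sketch, assuming a kernel with sysfs mounted at /sys; SYSFS_NET is a variable I invented so the path can be overridden, and eth0 is a placeholder.

```shell
# Sketch only: succeed if the kernel reports a carrier (link) on the interface.
# SYSFS_NET is a hypothetical override for the sysfs network directory.
link_detected() {
    iface="$1"
    sysfs="${SYSFS_NET:-/sys/class/net}"
    # The carrier file reads 1 when a link is detected; reading it can fail
    # while the interface is down, so bring it up first.
    carrier="$(cat "$sysfs/$iface/carrier" 2>/dev/null)" || carrier=""
    [ "$carrier" = "1" ]
}

# Example: link_detected eth0 && echo "link up" || echo "no link"
```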

Boot the target system into PXE-Boot and BAM, I hit jackpot. When this shit works, you'll know right away: you should see your system grab a dhcp lease almost immediately, and you'll be greeted by the slackware installation screen:

[screenshot: Slackware installation screen]

So if you see this, that means your DHCP and TFTP server is successfully serving, and if you can boot, your kernels and initrd.gz from the installation disk are intact.

Part2

c. Installation via NFS:

Now, in the past I usually opted for samba installation (simply because I had roommates that use windblows and wanted access to my goods), however I've also used http installation (recommended if you have the time), but nfs is pretty damn simple. My error in this configuration was in /etc/exports:

/mirror/slackware       192.168.0.0/24(ro,sync,insecure,all_squash)


Now, that is the default recommended config. However, I was trying to be fancy. Instead of populating the installation in /mirror/slackware as mentioned above, (say like /mirror/slackware/slackware-14.64), I populated it in /iso and created a soft-link to the dir (ln -s /iso /mirror/slackware/slackware-14.64).

The install did not like this. It kept coming up with errors, and essentially told me to fuck off. I know that somewhere in man exports I can determine the option which will facilitate softlinks to work, however instead I just modified exports to the actual mirror dir:

/iso       192.168.0.0/24(ro,sync,insecure,all_squash)


And the installation was off!
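
As far as I understand it, an NFS client resolves symlinks on its own side, so an absolute link to /iso points at the installer's (nonexistent) /iso rather than the server's. A sketch of the workaround I used, scripted: resolve the link first and export the real directory. It assumes GNU readlink (for -f); the function names are made up for illustration.

```shell
# Sketch only: export the real directory behind a symlink instead of the link.
# Assumes GNU coreutils readlink with -f support.
real_export_dir() {
    dir="$1"
    if [ -L "$dir" ]; then
        readlink -f "$dir"    # follow the link to the directory it points at
    else
        echo "$dir"
    fi
}

# Print an /etc/exports line using the same options as above.
exports_line() {
    printf '%s\t192.168.0.0/24(ro,sync,insecure,all_squash)\n' "$(real_export_dir "$1")"
}

# Example: exports_line /mirror/slackware/slackware-14.64
```

After editing /etc/exports, exportfs -ra re-reads it without restarting the nfs daemons.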

Reasons to recommend PXE/TFTP installation:
a. A 100Mb ethernet interface is much faster than typical CD/DVD-ROM read speeds. (A typical install from DVD takes a while. If you've done this a couple times, you'll be surprised how fast a 100Mb install is. A gigabit install is LIGHTNING fast; don't even get me started on fiber channel links, when I have the hardware I will revisit this post.)

b. You can host more than just slackware. In fact, at Intel I used this setup to host our RHEL6 installs. Any Linux can be hosted. In fact--I used a slack pxe/server to host Windows7! This requires configuring pxelinux.cfg/default a little differently than what is included in the installation disk. I remember I had to do this when a friend handed me a semi-new Sony Vaio laptop, which for some reason did NOT want to boot any of the win7 burned iso's I had (probably Sony dicking us with some bullshit firmware hacks). When I have the time, I'll make another post exclusively on Slack PXE serving Windows.

c. For sheer practice. Seriously it's good for the soul (or masochistic if you're not technically inclined, but hey whatever you call it you'll be happy when it works--trust me).

Sources:

1. Slackware USB-AND-PXE-INSTALLERS/README_PXE.TXT: http://taper.alienbase.nl/mirrors/slackware/slackware64-14.0/usb-and-pxe-installers/README_PXE.TXT

Friday, May 17, 2013

Deciphering a Linux Call Trace, Part 1

Deciphering a Linux Call Trace (aka Crash Dump)

While working at Intel, my primary responsibility was finding bugs in the linux kernel driver for the then-prototype C600 SAS/SATA RAID chipset. What this basically meant was to set up a huge storage configuration and do my best to break the living shit out of it. This wasn't usually hard, as when I started we could barely support 2 level Expander attached storage configurations, where SAS was the only option (SATA support came later). Nonetheless this was a ton of fun as it meant I got to play with linux every day I was at work, and better yet I was getting paid for it. (Note, I was a green badge, which means contract employee--though still very cool.)

Our performance was measured by how many bugs we could find in the driver. Although I'm sure the devs hated me every time I walked over to their desk, or bombarded them with emails, I got to have a very good relationship with a few of them. This was about the time I realized my aspirations to contribute to the Linux open source community.

So, basically we would set up a configuration, using SAS/SATA expanders, and say fill one with 12 SAS drives, and use PHY > table routing E1 | Subtractive Routing > E2; to a 2nd expander, filled with say another 12 SAS drives (The exact routing configuration used depends on what the expander supports, although eventually the driver could support various configurations). I would create oh let's say 4 raid 5 arrays (6 disks each), and run IO with a variety of tools. (We primarily used JDSU's Medusa labs. JDSU laid a lot of fiber in late 90's - 2000's.)

The gold is when, after running IO for a period of time, the system would eventually crash. Unfortunately I don't have any of my old logs with me (and I'm sure Intel will be pissed if I shared them, considering I've shared enough already), but you will see something like this:

(example of a call trace, from a module written explicitly to crash)

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
PGD 7a719067 PUD 7b2b3067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/virtual/misc/kvm/uevent
CPU 1
Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64
RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
FS:  00007fb79dadf700(0000) GS:ffff880001e80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007a0f1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process insmod (pid: 2248, threadinfo ffff88007ad4a000, task ffff88007a222ea0)
Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
 0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
 ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000
Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00
RIP  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP <ffff88007ad4bf08>
CR2: 0000000000000000
 
Now, I need to emphasize I am not an expert at deciphering these. Rather, I wanted to provide the information you fellow slackers/hackers/bums need to figure out what the fuck happened.
 
a. The first line indicates a pointer with a NULL value.
 > BUG: unable to handle kernel NULL pointer dereference at (null)
 
b. IP is the instruction Pointer
 > IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21
 
c. Oops: Designates error code value (hex). Each bit designates the following:
 > Oops: 0002 [#1] SMP  
 > bit 0 == 0 means no page found, 1 means a protection fault 
   bit 1 == 0 means read, 1 means write 
   bit 2 == 0 means kernel, 1 means user-mode 
   [#1] — this value is the number of times the Oops occurred. Multiple Oops can be triggered as a    
      cascading effect of the first one. 
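
To save you the bit-twiddling, here is a small sketch that decodes the three bits described above. It takes the hex code exactly as printed after "Oops:" and knows nothing about any higher bits a newer kernel might set.

```shell
# Sketch only: decode bits 0-2 of the Oops error code (given in hex, e.g. 0002).
decode_oops() {
    code=$(( 0x$1 ))
    if [ $(( code & 1 )) -eq 0 ]; then page="no page found"; else page="protection fault"; fi
    if [ $(( (code >> 1) & 1 )) -eq 0 ]; then access="read"; else access="write"; fi
    if [ $(( (code >> 2) & 1 )) -eq 0 ]; then mode="kernel"; else mode="user"; fi
    echo "$page, $access, $mode mode"
}

# Example: decode_oops 0002   # prints: no page found, write, kernel mode
```

For the sample trace that reads as a write to an unmapped page from kernel mode, which fits a NULL pointer dereference.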
 
d.  CPU 1 > Which CPU the error occurred (On the XEON systems we tested on I swear we would have PAGES of these call traces its ridiculous).

e. Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64
 > PID: the process ID of the action performed
 > comm: insmod >> the command performed when shit hit the fan
 > Tainted: P
 as defined in kernel/panic.c :
  P — Proprietary module has been loaded. 
  F — Module has been forcibly loaded. 
  S — SMP with a CPU not designed for SMP. 
  R — User forced a module unload. 
  M — System experienced a machine check exception. 
  B — System has hit bad_page. 
  U — Userspace-defined naughtiness. 
  A — ACPI table overridden. 
  W — Taint on warning. 
 > 2.6.33.3-85.fc13.x86_64: the kernel utilized when the oops occurred
 
 So Tainted:P means we most likely loaded a proprietary module (even though this  was just a sample module written exclusively to crash--bear with me folks). Note: if you ever see a tainted kernel due to P, this is most likely due to a closed source module, and if you seek help from the community they will most likely point you to the software developer/ hardware mfg of the module/driver.
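
You don't need to wait for a crash to see the taint state: the kernel exposes it as a number in /proc/sys/kernel/tainted. The sketch below turns that number into the letters listed above. The bit order (P is bit 0, then F, S, R, M, B, U, D, A, W) is from the kernel's tainted-kernels documentation as I remember it; D (kernel has oopsed before) isn't in my list above, and newer kernels define more bits that this ignores.

```shell
# Sketch only: map the low ten bits of the taint mask to their flag letters.
decode_taint() {
    mask="$1"
    flags=""
    bit=0
    for letter in P F S R M B U D A W; do
        if [ $(( ($mask >> $bit) & 1 )) -eq 1 ]; then
            flags="$flags$letter"
        fi
        bit=$(( $bit + 1 ))
    done
    echo "${flags:--}"    # print "-" when none of these bits are set
}

# Example: decode_taint "$(cat /proc/sys/kernel/tainted)"
```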
 
f. RIP is the CPU register containing the address of the instruction executed.
 > RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops] 
 > 0010 is the code segment register
 > my_oops_init+0x12/0x21 is <symbol>+<offset>/<length>: the faulting instruction is 0x12 bytes into my_oops_init, which is 0x21 bytes long
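
A quick sketch for pulling that location string apart, in case you want to script it. The function name is made up; the input is the symbol+offset/length text exactly as the oops prints it.

```shell
# Sketch only: split "symbol+0xOFF/0xLEN" and print the numbers in decimal.
parse_location() {
    loc="$1"                  # e.g. my_oops_init+0x12/0x21
    symbol="${loc%%+*}"       # text before the first '+'
    rest="${loc#*+}"          # 0x12/0x21
    offset="${rest%%/*}"      # 0x12
    length="${rest#*/}"       # 0x21
    echo "$symbol offset=$(( $offset )) length=$(( $length ))"
}

# Example: parse_location my_oops_init+0x12/0x21
# prints:  my_oops_init offset=18 length=33
```

With the module built with debug info, feeding the same symbol+offset to gdb (list *(my_oops_init+0x12) against the module's object file) should point at the offending source line.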
 
g. The following is a dump of the listed CPU registers:
 >
RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
.
.
 
h. The following is a raw dump of the stack contents:
 > 
Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
 0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
 ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000
 
i. And now comes the call trace!
>
Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
 
> these are the functions being performed prior to the oops
 
 
Now, to really get into the meat of this, you will need a debugger. I am not very skilled at using these (yet!), however with time I will make a 2nd blog post outlining how to figure out what the fuck is going on. 
 
Note: some tutorials will want you to use fancy utilities like crash, or kdump. Now, I have configured these and used these before, although I found the best way to truly capture these is to have an active serial port connection to a 2nd computer, that is actively recording the contents of your dump. (I would use uucp--which I can explain in another blog post. The recording is quite simple, just redirect output to a file, and in another terminal watch everything crash and burn.) Sure, it seems like a waste of a 2nd compy but in all honesty sometimes these utilities fail or the damn contents of the dump will become corrupted depending on how severe things became.
 
Thanks goes to , whom I shamelessly borrowed most of this information from (some from memory but hey I needed a refresher and he did a good job of outlining whats going on).
 
Further reading:
 
 
b. Linuxforu: Understanding a Kernel Oops!; http://www.linuxforu.com/2011/01/understanding-a-kernel-oops/
 
 

Thursday, May 16, 2013

Slackware 14 USB Install


Okay, it's been quite a while since I've shared anything.

Credit for the following post goes to Gareth Lowe, a LQ newbie (although the level of advice given in this post proves Gareth is anything BUT).

Unfortunately I didn't have much computer access for about oh a good 5 months. This was due to several reasons, although in retrospect I should've taken the initiative to hit up the local goodwill's in search of a barton amd system (a good 10 years old now, but still reliable in my opinion).

Anyways fast foward to April 14 2013 and I discover Slack14 has been released. There was an extra lappy sitting around my folks house, and not wanting to permenantly alter the HD data (and having no storage resources to perform a backup), I thought I'd go ahead and instal Slack14 on a good ol thumb drive. I briefly considered using Slax (which I've used before, and is quite good), although considering Slax is based on 12.0, I wouldn't settle until I had the real enchilada.

If I were in practice, I'm sure I could've figured this out on my own. I wasn't able to right away, though, so I give many humble thanks to Gareth for sharing this advice.

Now, onto the good stuff!

INSTRUCTIONS:


OK yall, a quick rundown on how I setup a usb HDD with a bootable Slackware 14 install today.
Firstly, I installed the base system from the CD, with my partition table looking a little something like this..

Device     Boot  Start      End         Blocks      Id  System           Mount point
/dev/sda1  *     63         1558304     779121      83  Linux            /boot
/dev/sda2        1558305    9365894     3903795     82  Linux swap       swap
/dev/sda3  *     9365895    204684164   97659135    83  Linux            /
/dev/sda4        204684165  1953519615  874417725+  7   HPFS/NTFS/exFAT  /store

Now during the install I chose the simple LILO install option, into the MBR of /dev/sda.

After that I rebuilt my initrd tree: first I deleted the old one, located at /boot/initrd-tree, and then ran mkinitrd, which creates a fresh tree and populates it. To get a working initrd image that brings up the usb disk on boot, I first had to modify /boot/initrd-tree/wait-for-root and set the value, in seconds, to give the drive time to come up once the modules are loaded. I set mine to 15. Next I modified the fstab to reference only UUIDs when mounting the disks. If you use this on different machines, you may find the disk moves locations depending on how many disks are in said machine, i.e. what is /dev/sda when it is the only or first drive (unlikely, given USB) becomes /dev/sdb or /dev/sdc when it is detected after others. With UUIDs the drive is always referenced correctly and you don't get kernel panics. You can get the UUIDs of your partitions by running ‘blkid’.
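
The wait-for-root step can be sketched as a small shell helper. This is only a sketch under assumptions: the function name set_wait_for_root is made up for illustration, and the default path and the 15 second value are simply the ones used in this post.

```shell
#!/bin/sh
# Sketch: write the wait time (in seconds) into an initrd tree.
# set_wait_for_root is a hypothetical helper name, not part of mkinitrd.
set_wait_for_root() {
    tree="${1:-/boot/initrd-tree}"   # path used in this post
    secs="${2:-15}"                  # value used in this post

    # Accept only a plain whole number of seconds.
    case "$secs" in
        ''|*[!0-9]*)
            echo "wait time must be a whole number of seconds" >&2
            return 1
            ;;
    esac

    # The tree only exists after mkinitrd has been run once.
    if [ ! -d "$tree" ]; then
        echo "no initrd tree at $tree (run mkinitrd first)" >&2
        return 1
    fi

    echo "$secs" > "$tree/wait-for-root"
}

# Typical use, as root, after deleting /boot/initrd-tree and running mkinitrd:
#   set_wait_for_root /boot/initrd-tree 15
#   blkid    # prints the UUID of each partition, for fstab and lilo.conf
```

Writing the file by hand works just as well; the helper only guards against a typo in the number or a missing tree.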

My fstab is as follows:

UUID=af7efa55-2f37-415a-b131-130d2accbd5d        swap             swap       defaults         0   0
UUID=ddee4a6a-900d-494e-9573-acb6fd371faf        /                    ext4         defaults         1   1
UUID=dac53074-92d8-4fb1-abc9-0bd0f0631102       /boot             ext2        defaults         1   2
UUID=3E58608D586045AD        /store        ntfs        fmask=111,dmask=000 1   0
#/dev/cdrom      /mnt/cdrom      auto        noauto,owner,ro,comment=x-gvfs-show 0   0
/dev/fd0             /mnt/floppy       auto        noauto,owner         0   0
devpts               /dev/pts            devpts     gid=5,mode=620   0   0
proc                  /proc                 proc        defaults                   0   0
tmpfs                /dev/shm          tmpfs       defaults                   0   0
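
To avoid copying UUIDs by hand, a line of blkid output can be turned into an fstab entry. The following is a rough sketch, assuming blkid prints lines of the form /dev/sda3: UUID="..." TYPE="ext4"; the function name fstab_line is my own invention, and the result should be checked by eye before it goes into /etc/fstab.

```shell
#!/bin/sh
# Sketch: build a UUID-based fstab line from one line of blkid output.
# fstab_line BLKID_LINE MOUNTPOINT [OPTIONS] [DUMP] [PASS]
# (fstab_line is a hypothetical helper, not a standard tool.)
fstab_line() {
    line="$1"
    mnt="$2"
    opts="${3:-defaults}"
    dump="${4:-0}"
    pass="${5:-0}"

    # The leading whitespace in the pattern keeps PARTUUID= and SEC_TYPE=
    # from being mistaken for UUID= and TYPE=.
    uuid=$(printf '%s\n' "$line" | sed -n 's/.*[[:space:]]UUID="\([^"]*\)".*/\1/p')
    fstype=$(printf '%s\n' "$line" | sed -n 's/.*[[:space:]]TYPE="\([^"]*\)".*/\1/p')

    if [ -z "$uuid" ] || [ -z "$fstype" ]; then
        echo "could not parse blkid line: $line" >&2
        return 1
    fi

    printf 'UUID=%s\t%s\t%s\t%s\t%s\t%s\n' \
        "$uuid" "$mnt" "$fstype" "$opts" "$dump" "$pass"
}

# Typical use, as root:
#   fstab_line "$(blkid /dev/sda3)" / defaults 1 1
```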

Next I built my initrd image using the command ‘mkinitrd -s /boot/initrd-tree -k 3.2.29-smp -m ehci-hcd:uhci-hcd:usb-storage -f ext4 -o /boot/initrd.gz’. Note the modules we are placing into the image: these are loaded at boot so the drive can be initialised and the root fs duties handed off to it. Change the variables for kernel and filesystem as needed.
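
Since the kernel version and filesystem are the parts that change from box to box, here is a small sketch that assembles the same command from variables and prints it instead of running it. The function name mkinitrd_cmd is hypothetical, and defaulting to the running kernel via uname -r is my assumption; the flags and module list are exactly the ones from the command above.

```shell
#!/bin/sh
# Sketch: print the mkinitrd command for a given kernel and root filesystem.
# mkinitrd_cmd [KERNEL_VERSION] [FILESYSTEM]   (hypothetical helper name)
mkinitrd_cmd() {
    kver="${1:-$(uname -r)}"   # assumption: default to the running kernel
    fs="${2:-ext4}"

    # Warn (on stderr only) if no modules are installed for that kernel.
    if [ ! -d "/lib/modules/$kver" ]; then
        echo "warning: /lib/modules/$kver not found" >&2
    fi

    echo "mkinitrd -s /boot/initrd-tree -k $kver -m ehci-hcd:uhci-hcd:usb-storage -f $fs -o /boot/initrd.gz"
}

# Typical use: review the printed command, then run it as root.
#   mkinitrd_cmd 3.2.29-smp ext4
```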

Lastly, I configured and reinstalled LILO. Again, like the example above, we want to modify it to use only UUIDs as reference, and to add in the lines for our initrd. Make sure you place the initrd line above the root line when configuring; it denotes hierarchy.

My lilo.conf entries are below; don't forget to change the boot line to the target HDD.

boot = /dev/sdc
image = /boot/vmlinuz
initrd = /boot/initrd.gz
root = "UUID=ddee4a6a-900d-494e-9573-acb6fd371faf"
label = Slack14
read-only

Lastly, run ‘lilo -v’ to commit the whole thing to the MBR.
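
The lilo.conf entries above can also be generated rather than typed, which helps keep the quotes around the UUID as plain ASCII double quotes (curly quotes pasted from a web page are not the same character). This is a sketch only: lilo_stanza is a made-up function name, and the output should be reviewed before it is merged into /etc/lilo.conf.

```shell
#!/bin/sh
# Sketch: print a lilo.conf stanza that boots the root partition by UUID.
# lilo_stanza BOOT_DEVICE ROOT_UUID [LABEL]   (hypothetical helper name)
lilo_stanza() {
    bootdev="$1"
    uuid="$2"
    label="${3:-Slack14}"

    if [ -z "$bootdev" ] || [ -z "$uuid" ]; then
        echo "usage: lilo_stanza BOOT_DEVICE ROOT_UUID [LABEL]" >&2
        return 1
    fi

    # initrd is listed before root, matching the order used in this post.
    cat <<EOF
boot = $bootdev
image = /boot/vmlinuz
initrd = /boot/initrd.gz
root = "UUID=$uuid"
label = $label
read-only
EOF
}

# Typical use:
#   lilo_stanza /dev/sdc ddee4a6a-900d-494e-9573-acb6fd371faf
# After editing /etc/lilo.conf, 'lilo -t -v' does a test run without
# writing anything, and 'lilo -v' commits to the MBR.
```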


Sources:

1. Linuxquestions.org: http://www.linuxquestions.org/questions/slackware-installation-40/slackware-14-usb-hdd-install-4175457861/

2. Wordpress.com: http://unsoundadvice.wordpress.com/2013/04/12/slackware-14-usb-hdd-install/

Notes:
a. Initrd:
 i.   Delete the initrd tree: /boot/initrd-tree
 ii.  Run mkinitrd
 iii. Modify /boot/initrd-tree/wait-for-root to 15s

b. Use UUIDs in /etc/fstab to identify partitions

c. Create initrd via the following command:
mkinitrd -s /boot/initrd-tree -k 3.2.29-smp -m ehci-hcd:uhci-hcd:usb-storage -f ext4 -o /boot/initrd.gz

It is important to include the ehci-hcd, uhci-hcd, and usb-storage modules, as these are the drivers which allow the system to bring up usb storage devices at boot and use one as the root filesystem.

d. Modify lilo.conf as in the example above. Make sure to denote the root partition by UUID, and include the initrd.gz file in the configuration to ensure the aforementioned modules get loaded.

I am using this to boot off of my 32GB Verbatim thumb drive. And it's reasonably fast! (Excluding the 15s wait.)

- Slug