SCSI subsystem initialized: Deciphering a Linux Call Trace, Part 1

Deciphering a Linux Call Trace (aka Crash Dump)

While working at Intel, my primary responsibility was finding bugs in the then prototype C600 SAS/SATA RAID chipset, linux kernel driver. What this basically meant was setup a huge storage configuration and do my best to break the living shit out of it. This wasn't usually hard, as when I started we could barely support 2 level Expander attached storage configurations, where SAS was the only option (SATA support came later). Nonetheless this was a ton of fun as it meant I got to play with linux every day I was at work, and better yet I was getting paid for it. (Note, I was a green badge, which means contract employee--though still very cool.)

Our performance was measured by how many bugs we could find in the driver. Although I'm sure the dev's hated me every time I walked over to their desk, or bombarded them with emails, I got to have a very good relationship with a few of them. This was about the time I realized my aspirations to contribute to the Linux open source community.

So, basically we would setup a configuration, using SAS/SATA expanders, and say fill one with 12 SAS drives, and use PHY > table routing E1 | Subtractive Routing > E2; to a 2nd expander, filled with say another 12 SAS drives (The exact routing configuration used depends on what the expander supports, although eventually the driver could support various configurations). I would create oh lets say 4 raid 5 arrays (6 disks each), and run IO with a variety of tools. (We primarily used JDSU's Medusa labs. JDSU laid a lot of fiber in late 90's - 2000's.)

The gold is when, after running IO for a period of time, the system would eventually crash. Unfortunately I don't have any of my old logs with me (and I'm sure Intel will be pissed if I shared them, considering I've shared enough already), but you will see something like this:

(example of a call trace, from a module written explicitly to crash)

BUG: unable to handle kernel NULL pointer dereference at (null) 

IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops] 

PGD 7a719067 PUD 7b2b3067 PMD 0 

Oops: 0002 [#1] SMP 

last sysfs file: /sys/devices/virtual/misc/kvm/uevent 

CPU 1 

Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64 

RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops] 

RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292 

RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7 

RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246 

RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004 

R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000 

R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010 

FS:  00007fb79dadf700(0000) GS:ffff880001e80000(0000) knlGS:0000000000000000 

CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 

CR2: 0000000000000000 CR3: 000000007a0f1000 CR4: 00000000000006e0 

DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 

DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 

Process insmod (pid: 2248, threadinfo ffff88007ad4a000, task ffff88007a222ea0) 

Stack: 

ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060 

 0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9 

 ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000 

Call Trace: 

[<ffffffff8100205f>] do_one_initcall+0x59/0x154 

[<ffffffff8107aac9>] sys_init_module+0xd1/0x230 

[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b 

Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00 

RIP  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops] 

RSP <ffff88007ad4bf08> 

CR2: 0000000000000000

Now, I need to emphasize I am not an expert at decyphering these. Rather, I wanted to provide the information you fellow slackers/hackers/bums need to figure out what the fuck happened.

a. The first line indicates a pointer with a NULL value.

 > BUG: unable to handle kernel NULL pointer dereference at (null)

b. IP is the instruction Pointer

 > IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21

c. Oops: Designates error code value (hex). Each bit designates the following:

 > Oops: 0002 [#1] SMP  

 > bit 0 == 0 means no page found, 1 means a protection fault 

   bit 1 == 0 means read, 1 means write 

   bit 2 == 0 means kernel, 1 means user-mode 

   [#1] — this value is the number of times the Oops occurred. Multiple Oops can be triggered as a    

      cascading effect of the first one. 

d.  CPU 1 > Which CPU the error occurred (On the XEON systems we tested on I swear we would have PAGES of these call traces its ridiculous).

e. Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64

 > PID: the process ID of the action performed

 > comm: insmod >> the command performed when shit hit the fan

 > Tainted: P

 as defined in kernel/panic.c :

  P — Proprietary module has been loaded. 

  F — Module has been forcibly loaded. 

  S — SMP with a CPU not designed for SMP. 

  R — User forced a module unload. 

  M — System experienced a machine check exception. 

  B — System has hit bad_page. 

  U — Userspace-defined naughtiness. 

  A — ACPI table overridden. 

  W — Taint on warning. 

 > 2.6.33.3-85.fc13.x86_64: the kernel utilized when oops occured

 So Tainted:P means we most likely loaded a proprietary module (even though this  was just a sample module written exclusively to crash--bear with me folks). Note: if you ever see a tainted kernel due to P, this is most likely due to a closed source module, and if you seek help from the community they will most likely point you to the software developer/ hardware mfg of the module/driver.

f. RIP is the CPU register containing the address of the instruction executed.

 > RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops] 

 > 0010 is the code segment register

 > my_oops_init+0x12/0x21 is the <symbol> + the offset length

g. The following is a dump of the listed CPU registers:
 > 
RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292 

RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7 

RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246 

RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004 

R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000 

R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010

.

.

h. The following is a stack trace:

 > 

Stack: 

ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060 

 0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9 

 ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000

i. And now comes the call trace!

> 

Call Trace: 

[<ffffffff8100205f>] do_one_initcall+0x59/0x154 

[<ffffffff8107aac9>] sys_init_module+0xd1/0x230 

[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
 
> these are the functions being performed prior to the oops

Now, to really get into the meat of this, you will need a debuger. I am not very skilled at using these (yet!), however with time I will make a 2nd blog post outlineing how to figure out what the fuck is going on. 

Note: some tutorials will want you to use fancy utilities like crash, or kdump. Now, I have configured these and used these before, although I found the best way to truly capture these is to have an active serial port connection to a 2nd computer, that is actively recording the contents of your dump. (I would use uucp--which I can explain in another blog post. The recording is quite simple, just redirect output to a file, and in another terminal watch everything crash and burn.) Sure, it seems like a waste of a 2nd compy but in all honesty sometimes these utilities fail or the damn contents of the dump will become corrupted depending on how severe things became.

Thanks goes to Surya Prabhakar, whom I shamelessly borrowed most of this information from (some from memory but hey I needed a refresher and he did a good job of outlining whats going on).

Further reading:

a. Dedoimedo; http://www.dedoimedo.com/computers/crash-analyze.html#mozTocId45838 

b. Linuxforu: Understanding a Kernel Oops!; http://www.linuxforu.com/2011/01/understanding-a-kernel-oops/

SCSI subsystem initialized

Friday, May 17, 2013

Deciphering a Linux Call Trace, Part 1

No comments:

Post a Comment