Friday, May 17, 2013

Deciphering a Linux Call Trace, Part 1

Deciphering a Linux Call Trace (aka Crash Dump)

While working at Intel, my primary responsibility was finding bugs in the Linux kernel driver for the then-prototype C600 SAS/SATA RAID chipset. What this basically meant was setting up a huge storage configuration and doing my best to break the living shit out of it. This wasn't usually hard, as when I started we could barely support 2-level expander-attached storage configurations, and SAS was the only option (SATA support came later). Nonetheless this was a ton of fun, as it meant I got to play with Linux every day I was at work, and better yet I was getting paid for it. (Note: I was a green badge, which means contract employee--though still very cool.)

Our performance was measured by how many bugs we could find in the driver. Although I'm sure the devs hated me every time I walked over to their desks or bombarded them with emails, I developed a very good relationship with a few of them. This was about the time I realized my aspiration to contribute to the Linux open source community.

So, basically we would set up a configuration using SAS/SATA expanders: say, fill one expander with 12 SAS drives, and connect it to a second expander (table routing on E1, subtractive routing on E2) filled with another 12 SAS drives. (The exact routing configuration used depends on what the expander supports, although eventually the driver could support various configurations.) I would create, oh, let's say 4 RAID 5 arrays (6 disks each), and run IO with a variety of tools. (We primarily used JDSU's Medusa Labs tools. JDSU laid a lot of fiber in the late 90's - 2000's.)

The gold is when, after running IO for a period of time, the system would eventually crash. Unfortunately I don't have any of my old logs with me (and I'm sure Intel would be pissed if I shared them, considering I've shared enough already), but you will see something like this:

(example of a call trace, from a module written explicitly to crash; a sketch of such a module follows the trace)

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
PGD 7a719067 PUD 7b2b3067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/virtual/misc/kvm/uevent
CPU 1
Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64
RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
FS:  00007fb79dadf700(0000) GS:ffff880001e80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007a0f1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process insmod (pid: 2248, threadinfo ffff88007ad4a000, task ffff88007a222ea0)
Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
 0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
 ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000
Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00
RIP  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP <ffff88007ad4bf08>
CR2: 0000000000000000
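
For context, a module written explicitly to crash like the one that produced this trace might look roughly like the sketch below. Only the function name my_oops_init and the module name oops are taken from the trace; everything else here is an assumption for illustration.

/* oops.c -- minimal sketch of a module that oopses on load.
 * Only my_oops_init and the module name "oops" come from the trace above;
 * the rest is assumed for illustration. */
#include <linux/module.h>
#include <linux/init.h>

static int __init my_oops_init(void)
{
        int *p = NULL;

        *p = 0;  /* write through a NULL pointer -> "Oops: 0002" (write to a missing page) */
        return 0;
}

static void __exit my_oops_exit(void)
{
}

module_init(my_oops_init);
module_exit(my_oops_exit);

MODULE_LICENSE("Proprietary");  /* a non-GPL license string is what gives "Tainted: P" */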
 
Now, I need to emphasize I am not an expert at deciphering these. Rather, I wanted to provide the information you fellow slackers/hackers/bums need to figure out what the fuck happened.
 
a. The first line indicates that the kernel tried to dereference a NULL pointer.
 > BUG: unable to handle kernel NULL pointer dereference at (null)
 
b. IP is the instruction pointer: the address of the instruction that was executing when the oops occurred, resolved to a symbol plus an offset.
 > IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21
 
c. Oops: designates the error code value (in hex). Each bit designates the following (a small decoder sketch follows this list):
 > Oops: 0002 [#1] SMP  
 > bit 0 == 0 means no page found, 1 means a protection fault 
   bit 1 == 0 means read, 1 means write 
   bit 2 == 0 means kernel, 1 means user-mode 
   [#1] — this value is the number of times the Oops has occurred. Multiple Oopses can be triggered as a cascading effect of the first one. 
 So our value of 0002 decodes to: no page found, on a write, in kernel mode.
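
To make the table easier to apply, here is a tiny user-space helper. This is purely hypothetical (it just implements the table above), but feeding it the 0002 from our trace reports exactly what we worked out by hand:

/* decode_oops.c -- hypothetical helper that applies the bit table above */
#include <stdio.h>
#include <stdlib.h>

static void decode_oops_code(unsigned long code)
{
        printf("bit 0: %s\n", (code & 1) ? "protection fault" : "no page found");
        printf("bit 1: %s\n", (code & 2) ? "write" : "read");
        printf("bit 2: %s\n", (code & 4) ? "user-mode" : "kernel");
}

int main(int argc, char **argv)
{
        /* e.g. ./decode_oops 0002  ->  no page found, write, kernel */
        decode_oops_code(argc > 1 ? strtoul(argv[1], NULL, 16) : 0x0002);
        return 0;
}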
 
d.  CPU 1 > Which CPU the error occurred on. (On the Xeon systems we tested on, I swear we would have PAGES of these call traces; it's ridiculous.)

e. Pid: 2248, comm: insmod Tainted: P           2.6.33.3-85.fc13.x86_64
 > Pid: the process ID of the process that triggered the oops
 > comm: insmod >> the command that was running when shit hit the fan
 > Tainted: P
 as defined in kernel/panic.c :
  P — Proprietary module has been loaded. 
  F — Module has been forcibly loaded. 
  S — SMP with a CPU not designed for SMP. 
  R — User forced a module unload. 
  M — System experienced a machine check exception. 
  B — System has hit bad_page. 
  U — Userspace-defined naughtiness. 
  A — ACPI table overridden. 
  W — Taint on warning. 
 > 2.6.33.3-85.fc13.x86_64: the kernel in use when the oops occurred
 
 So Tainted: P means we most likely loaded a proprietary module (even though this was just a sample module written exclusively to crash--bear with me, folks). Note: if you ever see a kernel tainted with P, it is most likely due to a closed-source module, and if you seek help from the community they will most likely point you to the software developer / hardware manufacturer of the module/driver.
 
f. RIP is the CPU register containing the address of the instruction executed.
 > RIP: 0010:[<ffffffffa03e1012>]  [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops] 
 > 0010 is the code segment register
 > my_oops_init+0x12/0x21 is the <symbol> + the offset into it (0x12) out of the function's total length (0x21)
 
g. The following is a dump of the listed CPU registers:
 >
RSP: 0018:ffff88007ad4bf08  EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
(and so on for the segment, control, and debug registers shown in the full dump above)
 
h. The following is a raw dump of the stack contents at the time of the oops:
 > 
Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
 0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
 ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000
 
i. And now comes the call trace!
>
Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
 
> these are the functions that were executing leading up to the oops, listed with the most recent call first
 
 
Now, to really get into the meat of this, you will need a debugger. I am not very skilled at using these (yet!), however with time I will make a 2nd blog post outlining how to figure out what the fuck is going on. 
 
Note: some tutorials will want you to use fancy utilities like crash or kdump. Now, I have configured and used these before, although I found the best way to truly capture these is to have an active serial port connection to a 2nd computer that is actively recording the contents of your dump. (I would use uucp--which I can explain in another blog post. The recording is quite simple: just redirect the output to a file, and in another terminal watch everything crash and burn.) Sure, it seems like a waste of a 2nd compy, but in all honesty sometimes these utilities fail, or the damn contents of the dump will become corrupted depending on how severe things got.
 
Thanks goes to , from whom I shamelessly borrowed most of this information (some from memory, but hey, I needed a refresher and he did a good job of outlining what's going on).
 
Further reading:
 
 
Linuxforu: Understanding a Kernel Oops!; http://www.linuxforu.com/2011/01/understanding-a-kernel-oops/
 
 
