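Segmentation Faults

  A while ago, during an interview at Sangfor (深信服), the interviewer asked how to debug a segmentation fault, and I had no good answer. I had run into segmentation faults now and then, but I only ever went back and patched the code after the program reported one, without trying to understand them in depth.

What is a segmentation fault?

  Wikipedia gives a fairly complete definition of a segmentation fault: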

In computing, a segmentation fault (often shortened to segfault) or access violation is a fault raised by hardware with memory protection, notifying an operating system (OS) about a memory access violation; on x86 computers this is a form of general protection fault. In short, a segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (e.g., attempts to write to a read-only location, or to overwrite part of the operating system). Systems based on processors like the Motorola 68000 tend to refer to these events as Address or Bus errors.

On Unix-like operating systems, a process that accesses invalid memory receives the SIGSEGV signal. On Microsoft Windows, a process that accesses invalid memory receives the STATUS_ACCESS_VIOLATION exception.

  Wikipedia also summarizes some typical causes of segmentation faults:

The following are some typical causes of a segmentation fault:
  1. Dereferencing null pointers – this is special-cased by memory management hardware
  2. Attempting to access a nonexistent memory address (outside process's address space)
  3. Attempting to access memory the program does not have rights to (such as kernel structures in process context)
  4. Attempting to write read-only memory (such as code segment)

These in turn are often caused by programming errors that result in invalid memory access:
  1. Dereferencing or assigning to an uninitialized pointer (wild pointer, which points to a random memory address)
  2. Dereferencing or assigning to a freed pointer (dangling pointer, which points to memory that has been freed/deallocated/deleted)
  3. A buffer overflow
  4. A stack overflow
  5. Attempting to execute a program that does not compile correctly. (Some compilers will output an executable file despite the presence of compile-time errors.)

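How do you debug a segmentation fault?

  This part draws mainly on the blog post "Did your Java/C/C++ program crash? Demystifying the segmentation fault (Segmentation fault) (3)".

The problem code

  The example code is as follows: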

 1 // stack.c
 2 #include <stdio.h>
 3 #include <string.h>
 4 #include <stdlib.h>
 5 
 6 
 7 int main(int argc, char **argv) {
 8     char *p = NULL;
 9     *p = 0x0;   // write through a null pointer
10 }

  Running the program gives the following result:

  [Figure: terminal output of the run]

Finding the problem

Step 1: use strace to inspect the signal

strace -i -x -o segfault.txt ./segfault.o

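  This yields the following information:
  [Figure: strace output]

  From this we know:

1. Signal: SIGSEGV
2. Error code: SEGV_MAPERR
3. Faulting memory address: 0x0
4. The fault occurred at logical address 0x400507.

  From this we can guess:

The program dereferences a null pointer: it tries to write to 0x0, which triggers the segmentation fault.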

  For more on using strace, see the blog post "Linux strace Command".

Step 2: use dmesg to inspect the scene of the fault

dmesg

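  This yields:
  [Figure: dmesg output]

  From this we learn:

1. Error type: segfault, i.e. a segmentation fault.
2. ip at the time of the fault: 0x400507
3. Error number: 6, i.e. 110 in binary

Step 3: collect the known conclusions

  The error number and the ip are the key pieces here. The error number is decoded using the following: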

    /*
     * Page fault error code bits:
     *
     *   bit 0 ==    0: no page found   1: protection fault
     *   bit 1 ==    0: read access     1: write access
     *   bit 2 ==    0: kernel-mode access  1: user-mode access
     *   bit 3 ==               1: use of reserved bit detected
     *   bit 4 ==               1: fault was an instruction fetch
     */
    /*enum x86_pf_error_code {

        PF_PROT     =       1 << 0,
        PF_WRITE    =       1 << 1,
        PF_USER     =       1 << 2,
        PF_RSVD     =       1 << 3,
        PF_INSTR    =       1 << 4,
    };*/

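  Decoding the error number gives:

Error number 6 = binary 110 = (PF_USER | PF_WRITE | 0).
That is: "user mode", "write-type page fault", and "no page corresponds to the given address".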

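  This matches our initial guess.

  To summarize what we know so far:

1. Error type: segfault, i.e. a segmentation fault.

2. ip at the time of the fault: 0x400507

3. Error number: 6, i.e. 110 in binary

4. Error code: SEGV_MAPERR, i.e. the address is not mapped to an object.

5. Cause: a write to 0x0 triggered the segmentation fault, because no page (no mapping) corresponds to 0x0.

Step 4: use the conclusions to locate the faulty code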

gdb ./segfault.o

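  Using ip = 0x400507 from the conclusions, we immediately get:

  [Figure: gdb output]

  Clearly this confirms our conclusion:

We tried to write the value 0x0 to address 0x0, which caused a segmentation fault for writing to an unmapped address.

  We have also found the faulty code: line 9 of stack.c.

Debugging a core dump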

  Besides the approach described above, we can also locate the faulty code by debugging a core dump.

  For details on core dumps, see the blog post "Linux Core Dump".

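References

  Did your Java/C/C++ program crash? Demystifying the segmentation fault (1) [你的java/c/c++程序崩溃了?揭秘段错误(Segmentation fault)(1)]

  Did your Java/C/C++ program crash? Demystifying the segmentation fault (2) [你的java/c/c++程序崩溃了?揭秘段错误(Segmentation fault)(2)]

  Did your Java/C/C++ program crash? Demystifying the segmentation fault (3) [你的java/c/c++程序崩溃了?揭秘段错误(Segmentation fault)(3)]

  A summary of the causes of segmentation faults on Linux and how to debug them [Linux环境下段错误的产生原因及调试方法小结]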
