Linux Kernel Series: The ELF File Format

ELF (Executable and Linkable Format) file types

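  • Relocatable file: contains code and data and can be linked into an executable or a shared object file. Static libraries also fall into this category.
  • Executable file: contains a program that can be run directly. On Linux, ELF executables usually have no file extension.
  • Shared object file: contains code and data and is used in two ways:
    • The linker can combine it with other relocatable files and shared object files to produce a new object file.
    • The dynamic linker combines several shared object files with an executable and runs them as part of the process image.
  • Core dump file: when a process terminates unexpectedly, the system can dump the contents of its address space, along with other information from the moment of termination, into a core dump file.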
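
The four kinds of file listed above correspond to values of the e_type field in the ELF header. A minimal sketch of that mapping follows, assuming a Linux system that provides <elf.h>. The helper name describe_elf_type is made up for this illustration.

```cpp
#include <elf.h>     // ET_REL, ET_EXEC, ET_DYN, ET_CORE (Linux system header)
#include <cstdint>
#include <string>

// Map the e_type field of an ELF header to the file kinds listed above.
// describe_elf_type is a hypothetical helper, not part of any library.
std::string describe_elf_type(std::uint16_t e_type) {
    switch (e_type) {
        case ET_REL:  return "relocatable file";
        case ET_EXEC: return "executable file";
        case ET_DYN:  return "shared object file";
        case ET_CORE: return "core dump file";
        default:      return "unknown";
    }
}
```

Note that position-independent executables are also marked ET_DYN, so a result of "shared object file" does not always mean a library.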

Structure of an object file

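An object file contains not only the compiled code and data but also information needed at link time, such as the symbol table, debugging information, and strings. The object file stores this information, grouped by attribute, in sections. Sections are sometimes loosely called segments as well. In casual usage both terms just mean a region of a certain length and are rarely distinguished, although strictly speaking a segment is the unit that gets mapped into memory at load time and may contain several sections.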

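  • The machine instructions compiled from the source code go into the code section, .text.
  • Initialized global variables and local static variables go into .data.
  • Uninitialized global variables and local static variables go into .bss.
  • An ELF file begins with a file header that describes the properties of the whole file, such as whether it is executable and whether it is statically or dynamically linked.
  • The file header points to a section header table, an array that describes every section in the file, including each section's offset within the file and its attributes.
  • .bss only reserves room for uninitialized global variables and local static variables. It has no content and takes up no space in the file.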
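
To make the placement rules above concrete, here is a small sketch with one variable of each kind. All names are invented for illustration, and the section noted in each comment is where GCC typically puts the object on Linux. The exact placement can vary with compiler and options, so it is worth checking with objdump -h or nm.

```cpp
int initialized_global = 42;     // initialized global: typically stored in .data
int uninitialized_global;        // no initializer: typically reserved in .bss, reads as 0
const char greeting[] = "yes";   // read-only data: typically stored in .rodata

// The machine code of this function goes into .text.
int next_ticket() {
    static int issued;           // uninitialized local static: typically .bss
    static int step = 1;         // initialized local static: typically .data
    issued += step;
    return issued;
}
```

Because .bss only records a size, adding a large uninitialized array grows the process image but barely changes the size of the file on disk.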
#include <iostream>
using namespace std;

void func() {
    cout << "yes" << endl;
}

int main() {
    func();
    return 0;
}

Compile main.cpp into an executable named main, then list its sections with objdump -h main:

main:     file format elf64-x86-64

Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 0000001c 0000000000400238 0000000000400238 00000238 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.ABI-tag 00000020 0000000000400254 0000000000400254 00000254 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .note.gnu.build-id 00000024 0000000000400274 0000000000400274 00000274 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .gnu.hash 00000030 0000000000400298 0000000000400298 00000298 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .dynsym 00000138 00000000004002c8 00000000004002c8 000002c8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .dynstr 00000168 0000000000400400 0000000000400400 00000400 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .gnu.version 0000001a 0000000000400568 0000000000400568 00000568 2**1
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .gnu.version_r 00000040 0000000000400588 0000000000400588 00000588 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 .rela.dyn 00000030 00000000004005c8 00000000004005c8 000005c8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .rela.plt 000000a8 00000000004005f8 00000000004005f8 000005f8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
10 .init 0000001a 00000000004006a0 00000000004006a0 000006a0 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
11 .plt 00000080 00000000004006c0 00000000004006c0 000006c0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .plt.got 00000008 0000000000400740 0000000000400740 00000740 2**3
CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .text 000001f2 0000000000400750 0000000000400750 00000750 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
14 .fini 00000009 0000000000400944 0000000000400944 00000944 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
15 .rodata 00000008 0000000000400950 0000000000400950 00000950 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
16 .eh_frame_hdr 0000004c 0000000000400958 0000000000400958 00000958 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
17 .eh_frame 00000154 00000000004009a8 00000000004009a8 000009a8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
18 .init_array 00000010 0000000000600df8 0000000000600df8 00000df8 2**3
CONTENTS, ALLOC, LOAD, DATA
19 .fini_array 00000008 0000000000600e08 0000000000600e08 00000e08 2**3
CONTENTS, ALLOC, LOAD, DATA
20 .jcr 00000008 0000000000600e10 0000000000600e10 00000e10 2**3
CONTENTS, ALLOC, LOAD, DATA
21 .dynamic 000001e0 0000000000600e18 0000000000600e18 00000e18 2**3
CONTENTS, ALLOC, LOAD, DATA
22 .got 00000008 0000000000600ff8 0000000000600ff8 00000ff8 2**3
CONTENTS, ALLOC, LOAD, DATA
23 .got.plt 00000050 0000000000601000 0000000000601000 00001000 2**3
CONTENTS, ALLOC, LOAD, DATA
24 .data 00000010 0000000000601050 0000000000601050 00001050 2**3
CONTENTS, ALLOC, LOAD, DATA
25 .bss 00000118 0000000000601060 0000000000601060 00001060 2**5
ALLOC
26 .comment 00000035 0000000000000000 0000000000000000 00001060 2**0
CONTENTS, READONLY

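CONTENTS means that the section has actual contents in the file. The most important sections are .text, .data, .rodata and .comment.
The remaining sections are described in a table taken from the book 《程序员的自我修养》.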

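  • Custom sections: GCC provides an extension that lets you specify which section a variable is placed in.
    __attribute__((section("foo"))) int a = 29;
    After adding this declaration, objdump -h main lists a new section named foo: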
main:     file format elf64-x86-64

Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 0000001c 0000000000400238 0000000000400238 00000238 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.ABI-tag 00000020 0000000000400254 0000000000400254 00000254 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .note.gnu.build-id 00000024 0000000000400274 0000000000400274 00000274 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .gnu.hash 0000001c 0000000000400298 0000000000400298 00000298 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .dynsym 00000048 00000000004002b8 00000000004002b8 000002b8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .dynstr 00000038 0000000000400300 0000000000400300 00000300 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .gnu.version 00000006 0000000000400338 0000000000400338 00000338 2**1
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .gnu.version_r 00000020 0000000000400340 0000000000400340 00000340 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 .rela.dyn 00000018 0000000000400360 0000000000400360 00000360 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .rela.plt 00000018 0000000000400378 0000000000400378 00000378 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
10 .init 0000001a 0000000000400390 0000000000400390 00000390 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
11 .plt 00000020 00000000004003b0 00000000004003b0 000003b0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .plt.got 00000008 00000000004003d0 00000000004003d0 000003d0 2**3
CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .text 000001a2 00000000004003e0 00000000004003e0 000003e0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
14 .fini 00000009 0000000000400584 0000000000400584 00000584 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
15 .rodata 00000004 0000000000400590 0000000000400590 00000590 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
16 .eh_frame_hdr 0000003c 0000000000400594 0000000000400594 00000594 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
17 .eh_frame 00000114 00000000004005d0 00000000004005d0 000005d0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
18 .init_array 00000008 0000000000600e10 0000000000600e10 00000e10 2**3
CONTENTS, ALLOC, LOAD, DATA
19 .fini_array 00000008 0000000000600e18 0000000000600e18 00000e18 2**3
CONTENTS, ALLOC, LOAD, DATA
20 .jcr 00000008 0000000000600e20 0000000000600e20 00000e20 2**3
CONTENTS, ALLOC, LOAD, DATA
21 .dynamic 000001d0 0000000000600e28 0000000000600e28 00000e28 2**3
CONTENTS, ALLOC, LOAD, DATA
22 .got 00000008 0000000000600ff8 0000000000600ff8 00000ff8 2**3
CONTENTS, ALLOC, LOAD, DATA
23 .got.plt 00000020 0000000000601000 0000000000601000 00001000 2**3
CONTENTS, ALLOC, LOAD, DATA
24 .data 00000010 0000000000601020 0000000000601020 00001020 2**3
CONTENTS, ALLOC, LOAD, DATA
25 foo 00000004 0000000000601030 0000000000601030 00001030 2**2
CONTENTS, ALLOC, LOAD, DATA
26 .bss 00000004 0000000000601034 0000000000601034 00001034 2**0
ALLOC
27 .comment 00000035 0000000000000000 0000000000000000 00001034 2**0
CONTENTS, READONLY
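
One common use of a custom section is to collect related objects so that the program can walk over them at run time. The sketch below assumes the GNU linker (or a compatible one), which defines the symbols __start_foo and __stop_foo for a section whose name is a valid C identifier. The second variable b and the two helper functions are additions made for this illustration.

```cpp
#include <cstddef>

// Two variables placed in the custom section "foo" (b is an illustrative
// addition). `used` asks the compiler to keep them even if unreferenced.
__attribute__((section("foo"), used)) int a = 29;
__attribute__((section("foo"), used)) int b = 31;

// Assumption: the linker provides these boundary symbols for section "foo".
extern "C" {
extern int __start_foo[];
extern int __stop_foo[];
}

// Number of int-sized entries the linker collected into section "foo".
std::size_t foo_entry_count() {
    return static_cast<std::size_t>(__stop_foo - __start_foo);
}

// Sum of all entries, walking the section from its start to its end.
int foo_sum() {
    int total = 0;
    for (const int* p = __start_foo; p != __stop_foo; ++p) total += *p;
    return total;
}
```

With the two variables above, the section holds two entries, which matches what objdump -h would report as a size of 8 bytes for foo in this sketch.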

Use readelf -h main to view the ELF header:

ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4003e0
Start of program headers: 64 (bytes into file)
Start of section headers: 6664 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 9
Size of section headers: 64 (bytes)
Number of section headers: 32
Section header string table index: 29

The output above shows the information stored in the ELF file header.
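
The Magic line in that output is the 16-byte e_ident array at the very start of the file. As a rough sketch, the first few of those bytes can be decoded as shown below, assuming a Linux system with <elf.h>. ElfIdent and parse_elf_ident are hypothetical names introduced here.

```cpp
#include <elf.h>     // ELFMAG, SELFMAG, EI_*, ELFCLASS64, ELFDATA2LSB
#include <cstddef>
#include <cstring>

// Decoded view of the identification bytes (hypothetical helper type).
struct ElfIdent {
    bool is_64bit;
    bool little_endian;
    unsigned version;
};

// Returns false if the buffer is too short or does not start with "\x7fELF".
bool parse_elf_ident(const unsigned char* bytes, std::size_t len, ElfIdent& out) {
    if (len < EI_NIDENT) return false;
    if (std::memcmp(bytes, ELFMAG, SELFMAG) != 0) return false;
    out.is_64bit      = bytes[EI_CLASS] == ELFCLASS64;    // byte 4: 02 in the output
    out.little_endian = bytes[EI_DATA] == ELFDATA2LSB;    // byte 5: 01 in the output
    out.version       = bytes[EI_VERSION];                // byte 6: 01 in the output
    return true;
}
```

Fed the bytes 7f 45 4c 46 02 01 01 followed by zeros, this reports a 64-bit, little-endian file with version 1, in line with the Class, Data and Version fields printed by readelf.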

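  • magic: the magic number, which identifies the file type. When loading an executable, the operating system checks that the magic number is correct.

    Use readelf -S main to view the contents of the section header table: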

    There are 32 section headers, starting at offset 0x1a08:

    Section Headers:
    [Nr] Name Type Address Offset
    Size EntSize Flags Link Info Align
    [ 0] NULL 0000000000000000 00000000
    0000000000000000 0000000000000000 0 0 0
    [ 1] .interp PROGBITS 0000000000400238 00000238
    000000000000001c 0000000000000000 A 0 0 1
    [ 2] .note.ABI-tag NOTE 0000000000400254 00000254
    0000000000000020 0000000000000000 A 0 0 4
    [ 3] .note.gnu.build-i NOTE 0000000000400274 00000274
    0000000000000024 0000000000000000 A 0 0 4
    [ 4] .gnu.hash GNU_HASH 0000000000400298 00000298
    000000000000001c 0000000000000000 A 5 0 8
    [ 5] .dynsym DYNSYM 00000000004002b8 000002b8
    0000000000000048 0000000000000018 A 6 1 8
    [ 6] .dynstr STRTAB 0000000000400300 00000300
    0000000000000038 0000000000000000 A 0 0 1
    [ 7] .gnu.version VERSYM 0000000000400338 00000338
    0000000000000006 0000000000000002 A 5 0 2
    [ 8] .gnu.version_r VERNEED 0000000000400340 00000340
    0000000000000020 0000000000000000 A 6 1 8
    [ 9] .rela.dyn RELA 0000000000400360 00000360
    0000000000000018 0000000000000018 A 5 0 8
    [10] .rela.plt RELA 0000000000400378 00000378
    0000000000000018 0000000000000018 AI 5 24 8
    [11] .init PROGBITS 0000000000400390 00000390
    000000000000001a 0000000000000000 AX 0 0 4
    [12] .plt PROGBITS 00000000004003b0 000003b0
    0000000000000020 0000000000000010 AX 0 0 16
    [13] .plt.got PROGBITS 00000000004003d0 000003d0
    0000000000000008 0000000000000000 AX 0 0 8
    [14] .text PROGBITS 00000000004003e0 000003e0
    00000000000001a2 0000000000000000 AX 0 0 16
    [15] .fini PROGBITS 0000000000400584 00000584
    0000000000000009 0000000000000000 AX 0 0 4
    [16] .rodata PROGBITS 0000000000400590 00000590
    0000000000000004 0000000000000004 AM 0 0 4
    [17] .eh_frame_hdr PROGBITS 0000000000400594 00000594
    000000000000003c 0000000000000000 A 0 0 4
    [18] .eh_frame PROGBITS 00000000004005d0 000005d0
    0000000000000114 0000000000000000 A 0 0 8
    [19] .init_array INIT_ARRAY 0000000000600e10 00000e10
    0000000000000008 0000000000000000 WA 0 0 8
    [20] .fini_array FINI_ARRAY 0000000000600e18 00000e18
    0000000000000008 0000000000000000 WA 0 0 8
    [21] .jcr PROGBITS 0000000000600e20 00000e20
    0000000000000008 0000000000000000 WA 0 0 8
    [22] .dynamic DYNAMIC 0000000000600e28 00000e28
    00000000000001d0 0000000000000010 WA 6 0 8
    [23] .got PROGBITS 0000000000600ff8 00000ff8
    0000000000000008 0000000000000008 WA 0 0 8
    [24] .got.plt PROGBITS 0000000000601000 00001000
    0000000000000020 0000000000000008 WA 0 0 8
    [25] .data PROGBITS 0000000000601020 00001020
    0000000000000010 0000000000000000 WA 0 0 8
    [26] foo PROGBITS 0000000000601030 00001030
    0000000000000004 0000000000000000 WA 0 0 4
    [27] .bss NOBITS 0000000000601034 00001034
    0000000000000004 0000000000000000 WA 0 0 1
    [28] .comment PROGBITS 0000000000000000 00001034
    0000000000000035 0000000000000001 MS 0 0 1
    [29] .shstrtab STRTAB 0000000000000000 000018f5
    0000000000000110 0000000000000000 0 0 1
    [30] .symtab SYMTAB 0000000000000000 00001070
    0000000000000678 0000000000000018 31 48 8
    [31] .strtab STRTAB 0000000000000000 000016e8
    000000000000020d 0000000000000000 0 0 1
    Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
    O (extra OS processing required) o (OS specific), p (processor specific)

    Of these, the more important ones are .rela.dyn (the relocation table) and .rela.plt (the relocation entries for the PLT).
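
The section header table is a plain array, so the position of any entry follows from three fields of the ELF header. Here is a small sketch of that arithmetic. The function names are made up, while the parameters mirror the header fields e_shoff, e_shentsize and e_shnum.

```cpp
#include <cassert>
#include <cstdint>

// File offset of the section header with the given index.
std::uint64_t section_header_offset(std::uint64_t e_shoff, std::uint16_t e_shentsize,
                                    std::uint16_t e_shnum, std::uint16_t index) {
    assert(index < e_shnum);  // the table has exactly e_shnum entries
    return e_shoff + static_cast<std::uint64_t>(index) * e_shentsize;
}

// Offset of the first byte after the section header table.
std::uint64_t section_table_end(std::uint64_t e_shoff, std::uint16_t e_shentsize,
                                std::uint16_t e_shnum) {
    return e_shoff + static_cast<std::uint64_t>(e_shnum) * e_shentsize;
}
```

With the values from the readelf -h output (start 6664, which is 0x1a08, entry size 64, 32 entries), the header of section 29 (.shstrtab) sits at offset 6664 + 29 × 64 = 8520, and the table ends at offset 8712.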

  • String table: strings are stored together in one place and referenced by their offset into the table.

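  • Symbols: the interface for linking. Combining object files essentially means resolving the references between them to addresses, that is, to the addresses of functions and variables. Use nm to view the symbols: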

    0000000000601030 D a
    0000000000601034 B __bss_start
    0000000000601034 b completed.7594
    0000000000601020 D __data_start
    0000000000601020 W data_start
    0000000000400410 t deregister_tm_clones
    0000000000400490 t __do_global_dtors_aux
    0000000000600e18 t __do_global_dtors_aux_fini_array_entry
    0000000000601028 D __dso_handle
    0000000000600e28 d _DYNAMIC
    0000000000601034 D _edata
    0000000000601038 B _end
    0000000000400584 T _fini
    00000000004004b0 t frame_dummy
    0000000000600e10 t __frame_dummy_init_array_entry
    00000000004006e0 r __FRAME_END__
    0000000000601000 d _GLOBAL_OFFSET_TABLE_
    w __gmon_start__
    0000000000400594 r __GNU_EH_FRAME_HDR
    0000000000400390 T _init
    0000000000600e18 t __init_array_end
    0000000000600e10 t __init_array_start
    0000000000400590 R _IO_stdin_used
    w _ITM_deregisterTMCloneTable
    w _ITM_registerTMCloneTable
    0000000000600e20 d __JCR_END__
    0000000000600e20 d __JCR_LIST__
    w _Jv_RegisterClasses
    0000000000400580 T __libc_csu_fini
    0000000000400510 T __libc_csu_init
    U __libc_start_main@@GLIBC_2.2.5
    00000000004004ea T main
    0000000000400450 t register_tm_clones
    00000000004003e0 T _start
    0000000000601030 D __TMC_END__
    00000000004004d6 T _Z4funcii
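
The string table and the symbol table work together: a symbol entry does not store its name, only an offset into the string table. The following is a simplified sketch of a lookup by name, assuming the 64-bit Elf64_Sym layout from <elf.h>. The two helper functions are hypothetical, and a real tool would also check the symbol type and section index.

```cpp
#include <elf.h>     // Elf64_Sym
#include <cstddef>
#include <cstdint>
#include <cstring>

// Return the NUL-terminated name stored at `offset` in a string table,
// or nullptr if the offset is out of range. The table itself is assumed
// to end with a NUL byte, as ELF string tables do.
const char* strtab_name(const char* strtab, std::size_t size, std::size_t offset) {
    return offset < size ? strtab + offset : nullptr;
}

// Linear search of a symbol table for `name`. Returns the symbol's
// address (st_value), or 0 if no symbol with that name is present.
std::uint64_t symbol_address(const Elf64_Sym* symtab, std::size_t count,
                             const char* strtab, std::size_t strtab_size,
                             const char* name) {
    for (std::size_t i = 0; i < count; ++i) {
        const char* candidate = strtab_name(strtab, strtab_size, symtab[i].st_name);
        if (candidate != nullptr && std::strcmp(candidate, name) == 0)
            return symtab[i].st_value;
    }
    return 0;
}
```

For a string table holding "\0main\0_start", offset 1 names main and offset 6 names _start. A symbol entry with st_name 1 and st_value 0x4004ea then corresponds to the line "00000000004004ea T main" in the nm output.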

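Strong and weak symbols

  • Strong symbols: functions and initialized global variables.
  • Weak symbols: uninitialized global variables (in C, under the traditional common-symbol model). A symbol that would otherwise be strong can also be declared weak with GCC's __attribute__((weak)).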
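
The two typical uses of __attribute__((weak)) can be sketched as follows, assuming GCC or Clang on Linux. All function names are invented for this example, and nothing else in the program is assumed to define them.

```cpp
// A weak definition: it is used only if no strong definition of
// default_log_level exists anywhere else in the link.
extern "C" __attribute__((weak)) int default_log_level() { return 1; }

// A weak declaration without a definition: if no object file defines
// demo_optional_hook, the link still succeeds and its address is null.
extern "C" __attribute__((weak)) int demo_optional_hook();

// Call the hook when some object file provides it, otherwise use the fallback.
int hook_or(int fallback) {
    return demo_optional_hook != nullptr ? demo_optional_hook() : fallback;
}
```

The second pattern is the same mechanism behind entries such as "w __gmon_start__" in the nm output: the reference is optional, and the program checks for it at run time.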

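Why program instructions and data are stored separately

  • After the program is loaded, data and instructions are mapped into two separate virtual memory regions: the data region is readable and writable, while the instruction region is read-only.
  • Separating instructions from data improves locality and therefore the cache hit rate.
  • When several copies of a program run on the system, their instructions are identical, so only one copy of the program's instructions needs to be kept in memory.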
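
The different access rights of the two regions are recorded in the file itself: each loadable entry in the program header table carries a p_flags field with read, write and execute bits. Below is a minimal sketch of decoding that field, assuming <elf.h> on Linux. segment_permissions is a hypothetical helper.

```cpp
#include <elf.h>     // PF_R, PF_W, PF_X
#include <cstdint>
#include <string>

// Render the p_flags field of a program header in "rwx" notation,
// e.g. "r-x" for instructions and "rw-" for writable data.
std::string segment_permissions(std::uint32_t p_flags) {
    std::string s = "---";
    if (p_flags & PF_R) s[0] = 'r';
    if (p_flags & PF_W) s[1] = 'w';
    if (p_flags & PF_X) s[2] = 'x';
    return s;
}
```

Running readelf -l main prints the same information for each program header, which is one way to confirm that the instruction region is mapped without write permission.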