Linux x86 Program Start Up or - How the heck do we get to main? [Translation][Annotated Edition]

2022-04-10 / 2022-05-30

You can also read this article on Zhihu: https://zhuanlan.zhihu.com/p/521205296

Original article

Original title: Linux x86 Program Start Up or - How the heck do we get to main()? by Patrick Horgan
Author: Patrick Horgan
Original link: Linux x86 Program Start Up or - How the heck do we get to main()? by Patrick Horgan

The translation follows.

Who's this for?

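This is for people who want to understand how programs get loaded under Linux. In particular, it talks about dynamically loaded x86 ELF files. The information you learn will let you understand how to debug problems that occur in your program before main (the main function, translator's note) starts up. Everything I tell you is true, but some things will be glossed over, since they don't take us toward our goal (some details aren't spelled out because they don't matter here, translator's note). Further, if you link statically, some of the details will be different. I won't cover that. By the time you're done with this, though, you'll know enough to figure that out for yourself if you need to.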

This is what we'll cover (pretty picture brought to you by dot - filter for drawing directed graphs)

By the time we're done, you'll understand it.

How did we get to main?

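We'll build the simplest C program possible, an empty main. Then we'll look at its disassembly to see how we get to main. We'll see that the first thing that runs is a function named _start that is linked into every program, and it eventually leads to our program's main being run.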

int
main()
{
}

If you like, save this as prog1.c and follow along. The first thing I'll do is build it like this:

gcc -ggdb -o prog1 prog1.c

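Before we try to debug a later version of this (prog2) in gdb, we'll look at its disassembly and learn a bit about how our program starts up. I'm going to show the output of objdump -d prog1, but not in the order that objdump dumps it; I'll show it in the order it gets executed.

(You're perfectly welcome to dump it yourself. Something like objdump -d prog1 >prog1.dump will save a copy for you, and then you can use your favorite editor to look at it.

Note: this quip originally referred to "real men", because when I was a young programmer that was a popular form of humor. Someone (thanks, 芳曼) objected, and after thinking about it, I agreed. Readers here can't know that I'm always trying to get people to see that we need more women in STEM. Part of the general problem is unconscious gender bias in STEM, which makes it unfriendly to women. (After all, I was being unconscious here.) Now back to our regularly scheduled tutorial.)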

But first, how do we get to _start?

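When you run a program, the shell or GUI calls execve(), which executes the Linux system call execve(). If you want more information about execve(), simply type man execve from your shell. It will come from section 2 of the manual, where all the system calls live.

To summarize, it sets up a stack for you and pushes argc, argv, and envp onto it. File descriptors 0, 1, and 2 (stdin, stdout, stderr) are left with whatever the shell set them to. The loader does a lot of work for you setting up relocations and, as we'll see later, calling your initializers. When everything is ready, control is handed to your program at _start().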

_start is, oddly enough, where we start

080482e0 <_start>:
 80482e0:       31 ed                   xor    %ebp,%ebp
 80482e2:       5e                      pop    %esi         ; argc
 80482e3:       89 e1                   mov    %esp,%ecx    ; temporarily save the pointer to the argv array
 80482e5:       83 e4 f0                and    $0xfffffff0,%esp
 80482e8:       50                      push   %eax         ; padding word to keep 16-byte alignment
 80482e9:       54                      push   %esp         ; arg_6: stack_end (the aligned stack pointer)
 80482ea:       52                      push   %edx         ; arg_5: function pointer to void rtld_fini()
 80482eb:       68 00 84 04 08          push   $0x8048400   ; arg_4: function pointer to void fini()
 80482f0:       68 a0 83 04 08          push   $0x80483a0   ; arg_3: function pointer to void init()
 80482f5:       51                      push   %ecx         ; arg_2: ubp_av (pointer to the argv array)
 80482f6:       56                      push   %esi         ; arg_1: argc
 80482f7:       68 94 83 04 08          push   $0x8048394   ; arg_0: address of main
 80482fc:       e8 c3 ff ff ff          call   80482c4 <__libc_start_main@plt>
 8048301:       f4                      hlt

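XORing anything with itself sets it to zero, so xor %ebp,%ebp sets %ebp to zero. The ABI (Application Binary Interface specification) suggests this to mark the outermost frame.

Next we pop the top of the stack into %esi ((%esp) -> %esi). On entry, the stack holds argc, then the argv pointers, then the envp pointers, so the pop puts argc into %esi. We're going to save it and push it right back onto the stack in a moment. Since we popped argc, %esp now points at argv. The mov puts argv into %ecx without moving the stack pointer.

(Translator's note: here "argv" means the argv array itself, the pointers laid out on the stack. main's argv parameter is a char **, a pointer to the first element of that array. That is why the code takes the value of %esp itself rather than loading (%esp), or [esp] in Intel syntax. See the note below for details.)

Then we AND the stack pointer with a mask that clears the bottom four bits. Depending on where the stack pointer was, this moves it down by 0 to 15 bytes. In any case, it leaves the stack pointer aligned on a multiple of 16 bytes. The alignment lets stack variables be nicely aligned for memory and cache efficiency. In particular, it matters for SSE (Streaming SIMD Extensions) instructions, which operate on vectors of single-precision floating-point values at once.

In a particular run, %esp was 0xbffff770 on entry to _start. (Note: this code is for a 32-bit i386 program; the note below was captured on x86_64, where the corresponding address was 0x7fffffffdd50.) After we popped argc off the stack, %esp was 0xbffff774. It moved up to a higher address: putting things on the stack moves down in memory, and taking things off moves up. After the AND, the stack pointer is back at 0xbffff770.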

Translator's note:

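To make the memory layout of the arguments passed at startup easier to understand, I wrote a separate demo program, debugged it, and drew the stack as it looks when the program starts. Details below: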
#include <stdio.h>

int main(int argc, char** argv){
    printf("hello world\n");   /* matches the "hello world" line in the output shown later */
    return 0;
}
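Then compile it (gcc -g -o a.out main.c) and debug it with gdb: gdb a.out

Set the program's arguments with (gdb) set args hello world, so that two arguments are passed when the program runs.

This is equivalent to running it as: ./a.out "hello" "world"

Set a breakpoint at the program's entry point with b _start, start the program with r, and when it stops, read the registers with i r (info registers). The value of rsp is 0x7fffffffdd50, the address of the top of the stack. Examining the memory at that address gives: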
(gdb) x/8xg 0x7fffffffdd50
0x7fffffffdd50:	0x0000000000000003	0x00007fffffffe117
0x7fffffffdd60:	0x00007fffffffe140	0x00007fffffffe146
0x7fffffffdd70:	0x0000000000000000	0x00007fffffffe14d
0x7fffffffdd80:	0x00007fffffffe158	0x00007fffffffe16a
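The figure's stack layout shows that the stack on Intel chips grows toward lower addresses, so moving from the top of the stack into the stack, addresses increase.

(Note: growing means pushing, and each push lowers the address. Conversely, data already on the stack sits at higher and higher addresses. So the memory shown above is each slot in order, starting from the top of the stack. A slot is 8 bytes. This was tested on 64-bit Ubuntu.)

Reading the dump:

- 0x3 is argc: the program name plus hello and world.
- The next three slots are argv[0], argv[1], and argv[2].
- The 0 slot is the NULL that terminates argv.
- The envp pointers start right after that NULL.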

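Note in particular: the figure above contains a comment taken from the glibc source code. It is also a screenshot from an x86_64 environment, so the addresses and sizes differ slightly from the ones described in the text above.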

Now set up for calling __libc_start_main

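Now we start pushing __libc_start_main's arguments onto the stack. The first one, %eax, is garbage. It's pushed because there are 7 things (the arguments) to push, and they need an 8th to keep the stack 16-byte aligned. It's never used.

__libc_start_main is linked in from glibc; in the glibc source tree it lives in csu/libc-start.c. It is declared like this: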

int __libc_start_main(  int (*main) (int, char * *, char * *),
			    int argc, char * * ubp_av,
			    void (*init) (void),
			    void (*fini) (void),
			    void (*rtld_fini) (void),
			    void (* stack_end));

So we expect _start to push those arguments on the stack in reverse order before the call to __libc_start_main.

Translator's note: in the C calling convention, arguments are pushed onto the stack from right to left.

Stack contents just before the call to __libc_start_main

value	__libc_start_main arg	content
%eax	Don't know.	Don't care.
%esp	void (*stack_end)	Our aligned stack pointer.
%edx	void (*rtld_fini)(void)	Destructor of the dynamic linker, passed in %edx by the loader. Registered by __libc_start_main with __cxa_atexit() to call the FINI for dynamic libraries that got loaded before us.
0x8048400	void (*fini)(void)	__libc_csu_fini, the destructor of this program. Registered by __libc_start_main with __cxa_atexit().
0x80483a0	void (*init)(void)	__libc_csu_init, the constructor of this program. Called by __libc_start_main before main.
%ecx	char **ubp_av	argv off of the stack.
%esi	int argc	argc off of the stack.
0x8048394	int (*main)(int, char **, char **)	main of our program, called by __libc_start_main. The return value of main is passed to exit(), which terminates our program.

__libc_csu_fini is linked into our code from glibc, and lives in the source tree in csu/elf-init.c. It's our program's C-level destructor, and I'll look at it later in the white paper.

Hey! Where's the environment variables?

void __libc_init_first(int argc, char *arg0, ...)
{
    char **argv = &arg0, **envp = &argv[argc + 1];
    __environ = envp;
    __libc_init (argc, argv, envp);
}

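Did you notice that we didn't get envp, the pointer to the environment variables, off of the stack? It's not one of the arguments to __libc_start_main, either. But we know that main is declared as int main(int argc, char** argv, char** envp). So, what's up?

Well, __libc_start_main calls __libc_init_first. That function immediately uses secret inside information to find the environment variables, which sit just after the terminating null of the argument vector. It then sets a global variable, __environ, which __libc_start_main uses from then on whenever it needs it, including when it calls main.

After envp is established, __libc_start_main uses the same trick. Just past the terminating null at the end of the envp array there's another vector: the ELF auxiliary vector, which the loader uses to pass some information to the process. An easy way to see what's in there is to set the environment variable LD_SHOW_AUXV=1 before running the program. Here's the result for prog1.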

$ LD_SHOW_AUXV=1 ./prog1
AT_SYSINFO:      0xe62414
AT_SYSINFO_EHDR: 0xe62000
AT_HWCAP:    fpu vme de pse tsc msr pae mce cx8 apic
             mtrr pge mca cmov pat pse36 clflush dts
             acpi mmx fxsr sse sse2 ss ht tm pbe
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x8048034
AT_PHENT:        32
AT_PHNUM:        8
AT_BASE:         0x686000
AT_FLAGS:        0x0
AT_ENTRY:        0x80482e0
AT_UID:          1002
AT_EUID:         1002
AT_GID:          1000
AT_EGID:         1000
AT_SECURE:       0
AT_RANDOM:       0xbff09acb
AT_EXECFN:       ./prog1
AT_PLATFORM:     i686

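Isn't that interesting? All sorts of information:

- AT_ENTRY is the address of _start.
- AT_UID, AT_EUID, AT_GID, and AT_EGID are our userid, effective userid, groupid, and effective groupid.
- AT_PLATFORM tells us we're a 686.
- AT_CLKTCK, the times() frequency, is 100. It is in clock ticks per second, the same value sysconf(_SC_CLK_TCK) reports.
- AT_PHDR is the location of the ELF program header table. It has information about the location of all the program's segments in memory, about relocation entries, and anything else a loader needs to know.
- AT_PHENT is just the size in bytes of one header entry.

We won't chase down this path now, since we don't need that much information about file loading to be an effective program debugger.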

Note: our demo program's output with this variable set is:

LD_SHOW_AUXV=1 ./a.out
AT_SYSINFO_EHDR: 0x7ffda0a1a000
AT_HWCAP: 1f8bfbff
AT_PAGESZ: 4096
AT_CLKTCK: 100
AT_PHDR: 0x400040
AT_PHENT: 56
AT_PHNUM: 9
AT_BASE: 0x7fc832236000
AT_FLAGS: 0x0
AT_ENTRY: 0x400430
AT_UID: 1000
AT_EUID: 1000
AT_GID: 1000
AT_EGID: 1000
AT_SECURE: 0
AT_RANDOM: 0x7ffda0a0a619
AT_HWCAP2: 0x0
AT_EXECFN: ./a.out
AT_PLATFORM: x86_64
hello world

__libc_start_main in general

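That's all I'm going to say about the details of __libc_start_main, but in general it:

Takes care of some security problems with setuid and setgid programs
Starts up threading
Registers the fini (our program) and rtld_fini (run-time loader) arguments with __cxa_atexit, so that the program's and the loader's cleanup routines get run at exit
Calls the init argument
Calls main with the argc and argv arguments and with the global __environ argument, as detailed above
Calls exit with the return value of main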

Calling the init argument

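The init argument to __libc_start_main is set to __libc_csu_init, which is also linked into our code. It's compiled from a C program that lives in the glibc source tree in csu/elf-init.c. The C code is similar to the following, but with a lot more #ifdefs.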

This is our program's constructor

void
__libc_csu_init (int argc, char **argv, char **envp)
{

  _init ();

  const size_t size = __init_array_end - __init_array_start;
  for (size_t i = 0; i < size; i++)
      (*__init_array_start [i]) (argc, argv, envp);
}

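It's pretty important to our program because it's our executable's constructor. "Wait!" you say, "This isn't C++!" Yes, that's true, but the concepts of constructors and destructors don't belong to C++; they even predate C++.

Our executable, like every other executable, has a C-level constructor, __libc_csu_init, and a C-level destructor, __libc_csu_fini. As you'll see, inside the constructor the executable looks for global C-level constructors and calls any it finds. C programs can have these too, and I'll demonstrate that before this article is done. If it makes you more comfortable, you can call them initializers and finalizers.

Here's the assembler generated for __libc_csu_init.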

080483a0 <__libc_csu_init>:
 80483a0:       55                      push   %ebp
 80483a1:       89 e5                   mov    %esp,%ebp
 80483a3:       57                      push   %edi
 80483a4:       56                      push   %esi
 80483a5:       53                      push   %ebx
 80483a6:       e8 5a 00 00 00          call   8048405 <__i686.get_pc_thunk.bx>
 80483ab:       81 c3 49 1c 00 00       add    $0x1c49,%ebx
 80483b1:       83 ec 1c                sub    $0x1c,%esp
 80483b4:       e8 bb fe ff ff          call   8048274 <_init>
 80483b9:       8d bb 20 ff ff ff       lea    -0xe0(%ebx),%edi
 80483bf:       8d 83 20 ff ff ff       lea    -0xe0(%ebx),%eax
 80483c5:       29 c7                   sub    %eax,%edi
 80483c7:       c1 ff 02                sar    $0x2,%edi
 80483ca:       85 ff                   test   %edi,%edi
 80483cc:       74 24                   je     80483f2 <__libc_csu_init+0x52>
 80483ce:       31 f6                   xor    %esi,%esi
 80483d0:       8b 45 10                mov    0x10(%ebp),%eax
 80483d3:       89 44 24 08             mov    %eax,0x8(%esp)
 80483d7:       8b 45 0c                mov    0xc(%ebp),%eax
 80483da:       89 44 24 04             mov    %eax,0x4(%esp)
 80483de:       8b 45 08                mov    0x8(%ebp),%eax
 80483e1:       89 04 24                mov    %eax,(%esp)
 80483e4:       ff 94 b3 20 ff ff ff    call   *-0xe0(%ebx,%esi,4)
 80483eb:       83 c6 01                add    $0x1,%esi
 80483ee:       39 fe                   cmp    %edi,%esi
 80483f0:       72 de                   jb     80483d0 <__libc_csu_init+0x30>
 80483f2:       83 c4 1c                add    $0x1c,%esp
 80483f5:       5b                      pop    %ebx
 80483f6:       5e                      pop    %esi
 80483f7:       5f                      pop    %edi
 80483f8:       5d                      pop    %ebp
 80483f9:       c3                      ret

What the heck is a thunk?

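Not much to say about it, but I thought you'd want to see it. The get_pc_thunk bit is a little interesting. It's used for position-independent code and is part of what makes such code work. For that to work, %ebx needs to hold the address of the _GLOBAL_OFFSET_TABLE_. The code looks like this: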

push %ebx
call __get_pc_thunk_bx
add  $_GLOBAL_OFFSET_TABLE_,%ebx

__get_pc_thunk_bx:
movl (%esp),%ebx        # copy the return address (our own PC) into %ebx
ret

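Look closely at what happens. The call to __get_pc_thunk_bx, like all other calls, pushes the address of the next instruction onto the stack, so that when we return, execution continues with the next sequential instruction. In this case, that address is exactly what we want.

Translator's note: this is the code finding out its own address at run time and saving it.

So in __get_pc_thunk_bx, we copy the return address from the stack into %ebx. When we return, the next instruction adds _GLOBAL_OFFSET_TABLE_ to it. That operand resolves to the difference between the current address and the global offset table used by position-independent code.

The table holds a set of pointers to the data we want to access, and we only need to know offsets into it. The loader fixes up the addresses in the table for us. There's a similar table, the procedure linkage table (PLT), for calling procedures. It can be tedious to program this way in assembler. Instead, you can just write C or C++ and pass -fpic (or -fPIC) to the compiler, and it will do it automatically. Seeing this code in the assembler tells you the source was compiled with -fpic.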

But what is that loop?

We'll get to the loop in __libc_csu_init after we discuss the call to _init. For now, just remember that it calls any C-level initializers for our program.

_init gets the call

Ok, the loader handed control to _start, who called __libc_start_main, who called __libc_csu_init, who now calls _init.

Translator's note: the call chain can be written as: shell => execve => _start => __libc_start_main => __libc_csu_init => _init

08048274 <_init>:
 8048274:       55                      push   %ebp
 8048275:       89 e5                   mov    %esp,%ebp
 8048277:       53                      push   %ebx
 8048278:       83 ec 04                sub    $0x4,%esp
 804827b:       e8 00 00 00 00          call   8048280 <_init+0xc>
 8048280:       5b                      pop    %ebx
 8048281:       81 c3 74 1d 00 00       add    $0x1d74,%ebx        (.got.plt)
 8048287:       8b 93 fc ff ff ff       mov    -0x4(%ebx),%edx
 804828d:       85 d2                   test   %edx,%edx
 804828f:       74 05                   je     8048296 <_init+0x22>
 8048291:       e8 1e 00 00 00          call   80482b4 <__gmon_start__@plt>
 8048296:       e8 d5 00 00 00          call   8048370 <frame_dummy>
 804829b:       e8 70 01 00 00          call   8048410 <__do_global_ctors_aux>
 80482a0:       58                      pop    %eax
 80482a1:       5b                      pop    %ebx
 80482a2:       c9                      leave
 80482a3:       c3                      ret

It starts with the regular C calling convention

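If you want to know more about the C calling convention, take a look at Basic Assembler Debugging with GDB. The short story is this: we save our caller's base pointer (translator's note: %ebp) on the stack and point our base pointer at the top of the stack (note: %ebp -> top of stack, which is just the start of the current frame). Then we save %ebx and make room for a 4-byte local of some sort.

The interesting thing is the first call (call 8048280 <_init+0xc>). Its purpose is much like that of the get_pc_thunk call we saw earlier. If you look closely, the call is to the next sequential address! (Note: the target _init+0xc is exactly the address of the next instruction.) That gets you to the next address as if you'd just continued, but with the side effect that the address is now on the stack. It is popped into %ebx and then used to set up access to the global offset table.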

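Translator's note 1: the clever part of call 8048280 <_init+0xc> is that it uses the call instruction to push the address of the next instruction onto the stack, so the code can locate itself. Here is how I understand it. The distance between _init and the global offset table is known when the program is linked; it is a fixed relative offset. Then the next instruction: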

add $0x1d74,%ebx (.got.plt)

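adds that fixed relative offset (0x1d74) to the current absolute address (obtained at run time by the instructions above). The sum is the run-time absolute address of .got.plt.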

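Translator's note 2: a standard C call stack looks like this:

Explanation: each stack frame starts with a saved EBP. The ebp register holds the start of the current frame, and the first slot of each frame holds the address of the previous frame's start. That chains all the stack frames together.

Translator's note 3: the test instruction performs a logical AND of its two operands and sets the flags according to the result. Neither operand is changed; the result is discarded once the flags are set. So test %edx,%edx sets ZF exactly when %edx is zero. Reference: the Baidu Baike entry for test.

Translator's note 4: JE jumps when ZF (the zero flag) is 1. Reference: JE instruction description.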

Show me your best profile

Then we grab the address of __gmon_start__. If it's zero, we don't call it; instead we jump past it. Otherwise, we call it to set up profiling. It runs a routine to start profiling and calls atexit to schedule another routine, which runs later to write gmon.out at the end of execution.

This guy's no dummy! He's been framed!

In either case, next we call frame_dummy. The intention is to call __register_frame_info, but frame_dummy is called to set up the arguments to it. The purpose of this is to set up for unwinding stack frames for exception handling. It's interesting, but not a part of this discussion, so I'll leave it for another tutorial perhaps. (Don't be too disappointed, in our case, it doesn't get run anyway.)

Finally we're getting constructive!

Finally we call __do_global_ctors_aux. If you have a problem with your program that occurs before main starts, this is probably where you'll need to look. Of course, constructors for global C++ objects are put in here, but it's possible for other things to be in here as well.

Let's set up an example

Let's modify our prog1 and make a prog2. The exciting part is the __attribute__ ((constructor)). It tells gcc that the linker should stick a pointer to this function in the table used by __do_global_ctors_aux. As you can see, our fake constructor gets run. (__FUNCTION__ is filled in by the compiler with the name of the function. It's gcc magic.)

#include <stdio.h>

void __attribute__ ((constructor)) a_constructor() {
    printf("%s\n", __FUNCTION__);
}

int
main()
{
    printf("%s\n",__FUNCTION__);
}

Run it:

$ ./prog2
a_constructor
main
$

prog2's _init, much the same as prog1

In a minute we'll drop into gdb and see it happen. We'll be going into prog2's _init.

As you can see, the addresses are slightly different than in prog1. The extra bit of data seems to have shifted things 28 bytes. So, there are the names of the two functions, "a_constructor" (14 bytes with null terminator) and "main" (5 bytes with null terminator), and the two format strings, "%s\n" (2*4 bytes, with the newline as 1 character plus the null terminator). So 14 + 5 + 4 + 4 = 27? Hmmm, off by one somewhere. It's just a guess anyway; I didn't go and look. Anyway, we're going to break on the call to __do_global_ctors_aux, and then single step and watch what happens.

08048290 <_init>:
 8048290:       55                      push   %ebp
 8048291:       89 e5                   mov    %esp,%ebp
 8048293:       53                      push   %ebx
 8048294:       83 ec 04                sub    $0x4,%esp
 8048297:       e8 00 00 00 00          call   804829c <_init+0xc>
 804829c:       5b                      pop    %ebx
 804829d:       81 c3 58 1d 00 00       add    $0x1d58,%ebx
 80482a3:       8b 93 fc ff ff ff       mov    -0x4(%ebx),%edx
 80482a9:       85 d2                   test   %edx,%edx
 80482ab:       74 05                   je     80482b2 <_init+0x22>
 80482ad:       e8 1e 00 00 00          call   80482d0 <__gmon_start__@plt>
 80482b2:       e8 d9 00 00 00          call   8048390 <frame_dummy>
 80482b7:       e8 94 01 00 00          call   8048450 <__do_global_ctors_aux>
 80482bc:       58                      pop    %eax
 80482bd:       5b                      pop    %ebx
 80482be:       c9                      leave
 80482bf:       c3                      ret

And here's the code that will get called

Just to help, here's the C source code for __do_global_ctors_aux out of the gcc source code where it lives in a file gcc/crtstuff.c.

static void
__do_global_ctors_aux (void)
{
  func_ptr *p;   /* func_ptr is typedef void (*func_ptr) (void); */
  for (p = __CTOR_END__ - 1; *p != (func_ptr) -1; p--)
    (*p) ();
}

As you can see, it initializes p from a global variable __CTOR_END__ and subtracts 1 from it. Remember this is pointer arithmetic, though, and p points at a function pointer, so in this case that -1 backs it up one function pointer, or 4 bytes. We'll see that in the assembler as well. While the value p points at isn't -1 (cast to a pointer), we call the function it points at and then back the pointer up again. Obviously, the table starts with -1, followed by some number (perhaps 0) of function pointers.

Here's the same in assembler

Here's the assembler that corresponds to it from objdump -d. We'll go over it carefully so you understand it completely before we trace through it in the debugger.

08048450 <__do_global_ctors_aux>:
 8048450:       55                      push   %ebp
 8048451:       89 e5                   mov    %esp,%ebp
 8048453:       53                      push   %ebx
 8048454:       83 ec 04                sub    $0x4,%esp
 8048457:       a1 14 9f 04 08          mov    0x8049f14,%eax
 804845c:       83 f8 ff                cmp    $0xffffffff,%eax
 804845f:       74 13                   je     8048474 <__do_global_ctors_aux+0x24>
 8048461:       bb 14 9f 04 08          mov    $0x8049f14,%ebx
 8048466:       66 90                   xchg   %ax,%ax
 8048468:       83 eb 04                sub    $0x4,%ebx
 804846b:       ff d0                   call   *%eax
 804846d:       8b 03                   mov    (%ebx),%eax
 804846f:       83 f8 ff                cmp    $0xffffffff,%eax
 8048472:       75 f4                   jne    8048468 <__do_global_ctors_aux+0x18>
 8048474:       83 c4 04                add    $0x4,%esp
 8048477:       5b                      pop    %ebx
 8048478:       5d                      pop    %ebp
 8048479:       c3                      ret

First the preamble

There's the normal preamble with the addition of saving %ebx as well because we're going to use it in the function, and we also save room for the pointer p. You'll notice that even though we save room on the stack for it, we never store it there. p will instead live in %ebx, and *p will live in %eax.


Now set up before the loop

It looks like an optimization has occurred. Instead of loading __CTOR_END__, subtracting 1 from it, and dereferencing it, we go ahead and load *(__CTOR_END__ - 1) directly: the address __CTOR_END__ - 1 is known at link time to be 0x8049f14. We load the value stored there into %eax. (Remember, $0x8049f14 would mean the value 0x8049f14 itself; without the $, plain 0x8049f14 means the contents of that address.) Immediately, we compare this first value with -1. If it's equal, we're done and jump to address 0x8048474, where we clean up our stack, pop off the things we saved there, and return.

Assuming there's at least one thing in the function table, though, we also move the immediate value $0x8049f14 into %ebx, which holds p, our function pointer. Then we do the xchg %ax,%ax. What the heck is that? Well, grasshopper, that's what they use for a two-byte nop (No OPeration) in 16- or 32-bit x86. It does nothing but take a cycle and some space.

In this case, it's used to make the loop start at 8048468 instead of 8048466 (the top of the loop is the subtract on the next line). That aligns the start of the loop on a 4-byte boundary, which gives a better chance that the whole loop fits in one cache line instead of being split across two. It speeds things up.

And now we hit the top of the loop

Next we subtract 4 from %ebx to be ready for the next time through the loop, call the function we've got the address of in %eax, move the next function pointer into %eax, and compare it to -1. If it's not -1 we jump back up to the subtract and loop again.

And finally the epilogue

Otherwise we fall through into our function epilogue and return to _init. _init immediately falls through into its own epilogue and returns to __libc_csu_init. Bet you forgot all about him. There's still a loop to deal with there, but first--

I promised you we'd go into the debugger with prog2!

So here we go! Remember that gdb always shows you the line or instruction that you are about to execute.

$ !gdb
gdb prog2
Reading symbols from /home/patrick/src/asm/prog2...done.
(gdb) set disassemble-next-line on
(gdb) b *0x80482b7
Breakpoint 1 at 0x80482b7

We ran it in the debugger, turned disassemble-next-line on, so that it will always show us the disassembly for the line of code that is about to be executed, and set a breakpoint at the line in _init where we're about to call __do_global_ctors_aux.

(gdb) r
Starting program: /home/patrick/src/asm/prog2 

Breakpoint 1, 0x080482b7 in _init ()
=> 0x080482b7 <_init+39>:	 e8 94 01 00 00	call   0x8048450 <__do_global_ctors_aux>
(gdb) si
0x08048450 in __do_global_ctors_aux ()
=> 0x08048450 <__do_global_ctors_aux+0>:	 55	push   %ebp

I typed r to run the program and hit the breakpoint. My next command to gdb was si, step instruction, to tell gdb to single step one instruction. We've now entered __do_global_ctors_aux. As we go along you'll see times when it seems that I entered no command to gdb. That's because, if you simply press return, gdb will repeat the last instruction. So if I press enter now, I'll do another si.

(gdb)
0x08048451 in __do_global_ctors_aux ()
=> 0x08048451 <__do_global_ctors_aux+1>:	 89 e5	mov    %esp,%ebp
(gdb) 
0x08048453 in __do_global_ctors_aux ()
=> 0x08048453 <__do_global_ctors_aux+3>:	 53	push   %ebx
(gdb) 
0x08048454 in __do_global_ctors_aux ()
=> 0x08048454 <__do_global_ctors_aux+4>:	 83 ec 04	sub    $0x4,%esp
(gdb) 
0x08048457 in __do_global_ctors_aux ()

Ok, now we've finished the preamble, and the real code is about to start.

(gdb)
=> 0x08048457 <__do_global_ctors_aux+7>:	 a1 14 9f 04 08	mov    0x8049f14,%eax
(gdb) 
0x0804845c in __do_global_ctors_aux ()
=> 0x0804845c <__do_global_ctors_aux+12>:	 83 f8 ff	cmp    $0xffffffff,%eax
(gdb) p/x $eax
$1 = 0x80483b4

I was curious after loading the pointer so I told gdb p/x $eax which means print as hexadecimal the contents of the register %eax. It's not -1, so we can assume that we'll continue through the loop. Now, since my last command was the print, I can't hit enter to get an si, I'll have to type it the next time.

Translator's note 1:

The command (gdb) p/x $eax: p means print (the full command name is print), and /x means print in hexadecimal. In gdb, $eax refers to the register %eax.

(gdb) si
0x0804845f in __do_global_ctors_aux ()
=> 0x0804845f <__do_global_ctors_aux+15>:	 74 13	je     0x8048474 <__do_global_ctors_aux+36>
(gdb) 
0x08048461 in __do_global_ctors_aux ()
=> 0x08048461 <__do_global_ctors_aux+17>:	 bb 14 9f 04 08	mov    $0x8049f14,%ebx
(gdb) 
0x08048466 in __do_global_ctors_aux ()
=> 0x08048466 <__do_global_ctors_aux+22>:	 66 90	xchg   %ax,%ax
(gdb) 
0x08048468 in __do_global_ctors_aux ()
=> 0x08048468 <__do_global_ctors_aux+24>:	 83 eb 04	sub    $0x4,%ebx
(gdb) 
0x0804846b in __do_global_ctors_aux ()
=> 0x0804846b <__do_global_ctors_aux+27>:	 ff d0	call   *%eax
(gdb) 
a_constructor () at prog2.c:3
3	void __attribute__ ((constructor)) a_constructor() {
=> 0x080483b4 <a_constructor+0>:	 55	push   %ebp
   0x080483b5 <a_constructor+1>:	 89 e5	mov    %esp,%ebp
   0x080483b7 <a_constructor+3>:	 83 ec 18	sub    $0x18,%esp

Now this is very interesting. We've single stepped into the call. Now we're in our function, a_constructor. Since gdb has the source code for it, it shows us the C source for the next line. Since I turned on disassemble-next-line, it will also give us the assembler that corresponds to that line. In this case, it's the preamble for the function that corresponds to the declaration of the function, so we get all three lines of the preamble. Isn't that interesting? Now I'm going to switch over to the command n (for next) because our printf is coming up. The first n will skip the preamble, the second the printf, and the third the epilogue. If you've ever wondered why you have to do an extra step at the beginning and end of a function when single stepping with gdb, now you know the answer.

(gdb) n
4	    printf("%s\n", __FUNCTION__);
=> 0x080483ba <a_constructor+6>:	 c7 04 24 a5 84 04 08	movl   $0x80484a5,(%esp)
   0x080483c1 <a_constructor+13>:	 e8 2a ff ff ff	call   0x80482f0 <puts@plt>

We moved the address of the string "a_constructor" onto the stack as an argument for printf, but it calls puts since the compiler was smart enough to see that puts was all we needed.

(gdb) n
a_constructor
5	}
=> 0x080483c6 <a_constructor+18>:	 c9	leave  
   0x080483c7 <a_constructor+19>:	 c3	ret

Since we're tracing the program, it is, of course, running, so we see a_constructor printed above. The closing brace (}) corresponds to the epilogue, so that's what prints out now. Just a note: if you don't know the instruction leave, it does exactly the same as

movl %ebp, %esp
popl %ebp

One more step and we exit the function and return, I'll have to switch back to si.

(gdb) n
0x0804846d in __do_global_ctors_aux ()
=> 0x0804846d <__do_global_ctors_aux+29>:	 8b 03	mov    (%ebx),%eax
(gdb) si
0x0804846f in __do_global_ctors_aux ()
=> 0x0804846f <__do_global_ctors_aux+31>:	 83 f8 ff	cmp    $0xffffffff,%eax
(gdb) 
0x08048472 in __do_global_ctors_aux ()
=> 0x08048472 <__do_global_ctors_aux+34>:	 75 f4	jne    0x8048468 <__do_global_ctors_aux+24>
(gdb) p/x $eax
$2 = 0xffffffff

Got curious and checked again. This time, our function pointer is -1, so we'll exit the loop.

(gdb) si
0x08048474 in __do_global_ctors_aux ()
=> 0x08048474 <__do_global_ctors_aux+36>:	 83 c4 04	add    $0x4,%esp
(gdb) 
0x08048477 in __do_global_ctors_aux ()
=> 0x08048477 <__do_global_ctors_aux+39>:	 5b	pop    %ebx
(gdb) 
0x08048478 in __do_global_ctors_aux ()
=> 0x08048478 <__do_global_ctors_aux+40>:	 5d	pop    %ebp
(gdb) 
0x08048479 in __do_global_ctors_aux ()
=> 0x08048479 <__do_global_ctors_aux+41>:	 c3	ret  
(gdb) 
0x080482bc in _init ()
=> 0x080482bc <_init+44>:	 58	pop    %eax

Notice we're back in _init now.

(gdb) 
0x080482bd in _init ()
=> 0x080482bd <_init+45>:	 5b	pop    %ebx
(gdb) 
0x080482be in _init ()
=> 0x080482be <_init+46>:	 c9	leave  
(gdb) 
0x080482bf in _init ()
=> 0x080482bf <_init+47>:	 c3	ret  
(gdb) 
0x080483f9 in __libc_csu_init ()
=> 0x080483f9 <__libc_csu_init+25>:	 8d bb 1c ff ff ff	lea    -0xe4(%ebx),%edi
(gdb) q
A debugging session is active.

	Inferior 1 [process 17368] will be killed.

Quit anyway? (y or n) y
$

Notice we jumped back up into __libc_csu_init, and that's when I typed q to quit the debugger. That's all the debugging I promised you. Now that we're back in __libc_csu_init, there's another loop to deal with. I'm not going to step through it, but I am about to talk about it.

Back up to __libc_csu_init

Since we've spent a long tedious time dealing with a loop in assembler and the assembler for this one is even more tedious, I'll leave it to you to figure it out if you want. Just to remind you, here it is in C.

void
__libc_csu_init (int argc, char **argv, char **envp)
{

  _init ();

  const size_t size = __init_array_end - __init_array_start;
  for (size_t i = 0; i < size; i++)
      (*__init_array_start [i]) (argc, argv, envp);
}

Here's another function call loop

What is this __init_array? I thought you'd never ask. You can have code run at this stage as well. Since this is just after returning from running _init which ran our constructors, that means anything in this array will run after constructors are done. You can tell the compiler you want a function to run at this phase. The function will receive the same arguments as main.

void init(int argc, char **argv, char **envp) {
 printf("%s\n", __FUNCTION__);
}

__attribute__((section(".init_array"))) typeof(init) *__init = init;

We won't do it yet, because there are more things like that. Let's just return from __libc_csu_init. Do you remember where that will take us?

We'll be all the way back in __libc_start_main

He calls our main now, and then passes the result to exit().

exit() runs some more loops of functions

exit() runs the functions registered with atexit in the reverse of the order they were added: the last one registered runs first, as the output of hooks.c below shows. Then he runs another loop of functions, this time the functions in the fini array. After that he runs another loop of functions, this time destructors. (In reality, he's in a nested loop dealing with an array of lists of functions, but trust me, this is the order they come out in.) Here, I'll show you.

This program, hooks.c ties it all together

#include <stdio.h>
#include <stdlib.h>   /* atexit() */

void preinit(int argc, char **argv, char **envp) {
 printf("%s\n", __FUNCTION__);
}

void init(int argc, char **argv, char **envp) {
 printf("%s\n", __FUNCTION__);
}

void fini() {
 printf("%s\n", __FUNCTION__);
}

__attribute__((section(".init_array"))) typeof(init) *__init = init;
__attribute__((section(".preinit_array"))) typeof(preinit) *__preinit = preinit;
__attribute__((section(".fini_array"))) typeof(fini) *__fini = fini;

void  __attribute__ ((constructor)) constructor() {
 printf("%s\n", __FUNCTION__);
}

void __attribute__ ((destructor)) destructor() {
 printf("%s\n", __FUNCTION__);
}

void my_atexit() {
 printf("%s\n", __FUNCTION__);
}

void my_atexit2() {
 printf("%s\n", __FUNCTION__);
}

int main() {
 atexit(my_atexit);
 atexit(my_atexit2);
}

If you build and run this, (I call it hooks.c), the output is

$ ./hooks
preinit
constructor
init
my_atexit2
my_atexit
fini
destructor
$

The End

I'll give you a last look at how far we've come. This time it should all be familiar territory to you.

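Translator's note:

The gist of this whole article is what happens before main runs; put another way, where main sits in the overall flow of a running program. The sequence is driven by __libc_start_main. Because this is gcc, the file being compiled might also be C++. Even in C, there can be global constructor and destructor functions.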
