Brennan's Guide to Inline Assembly (translation)

Date: 2019-12-20

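Translator's note: A word up front. This is a translation, and my English is quite limited, but inline assembly is essential knowledge for anyone studying operating systems, and I often look this material up myself. I came across this introduction to inline assembly while doing MIT's JOS labs. It covers the commonly used inline assembly syntax (AT&T format) and ends with a list of frequently used inline assembly snippets. It is well worth reading, though experts can skip it. Corrections to translation errors or awkward wording are of course very welcome.

PS: The original article is at http://www.delorie.com/djgpp/doc/brennan/brennan_att_inline_djgpp.html

PS: All translator's notes were added by me.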

Brennan's Guide to Inline Assembly

by Brennan "Bas" Underwood

Document version 1.1.2.2

Ok. This is meant to be an introduction to inline assembly under DJGPP. DJGPP is based on GCC, so it uses the AT&T/UNIX syntax and has a somewhat unique method of inline assembly. I spent many hours figuring some of this stuff out and told Info that I hate it, many times.

Hopefully if you already know Intel syntax, the examples will be helpful to you. I've put variable names, register names and other literals in bold type.

The Syntax

So, DJGPP uses the AT&T assembly syntax. What does that mean to you?

Register naming:

AT&T:  %eax
Intel: eax

Source/Destination Ordering:

In AT&T syntax (which is the UNIX standard, BTW) the source is always on the left, and the destination is always on the right. So let's load ebx with the value in eax:

AT&T:  movl %eax, %ebx
Intel: mov ebx, eax
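
The same ordering shows up inside a C program. Here is a minimal sketch, assuming GCC or Clang on an x86 target and borrowing the extended asm form that this guide explains further down. The name copy_reg is made up for illustration, and the plain C branch only exists so the sketch still compiles elsewhere.

```c
/* Hypothetical helper: copies src into the result with one AT&T-syntax movl. */
static unsigned int copy_reg(unsigned int src)
{
    unsigned int dst;
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T order: the source (%1) is on the left, the destination (%0) on the right. */
    __asm__ ("movl %1, %0" : "=r" (dst) : "r" (src));
#else
    dst = src;  /* plain C fallback for non-x86 targets */
#endif
    return dst;
}
```

Calling copy_reg(42) should hand back 42, since the value simply travels from the source register to the destination register.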

Constant value/immediate value format:

You must prefix all constant/immediate values with "$". Let's load eax with the address of the "C" variable booga, which is static.

AT&T:  movl $_booga, %eax
Intel: mov eax, _booga

Now let's load ebx with 0xd00d:

AT&T:  movl $0xd00d, %ebx
Intel: mov ebx, 0d00dh

Operator size specification:

You must suffix the instruction with one of b, w, or l to specify the width of the destination register as a byte, word or longword. If you omit this, GAS (GNU assembler) will attempt to guess. You don't want GAS to guess, and guess wrong! Don't forget it.

AT&T:  movw %ax, %bx
Intel: mov bx, ax
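
A small sketch of the w suffix in a C function, assuming GCC or Clang on x86. The name copy_word is hypothetical. The %w operand modifier is a GCC x86 feature that prints the 16-bit name of whatever register GCC picked (for example %ax), so the operands agree with the movw suffix.

```c
/* Hypothetical helper: moves one 16-bit word, so the instruction is movw. */
static unsigned short copy_word(unsigned short src)
{
    unsigned short dst;
#if defined(__i386__) || defined(__x86_64__)
    /* %w1 and %w0 expand to 16-bit register names such as %ax or %bx. */
    __asm__ ("movw %w1, %w0" : "=r" (dst) : "r" (src));
#else
    dst = src;  /* plain C fallback for non-x86 targets */
#endif
    return dst;
}
```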

The equivalent forms for Intel are byte ptr, word ptr, and dword ptr, but that is for when you are...

Referencing memory:

DJGPP uses 386-protected mode, so you can forget all that real-mode addressing junk, including the restrictions on which register has what default segment, which registers can be base or index pointers. Now, we just get 6 general purpose registers. (7 if you use ebp, but be sure to restore it yourself or compile with -fomit-frame-pointer.)

Here is the canonical format for 32-bit addressing:

AT&T:  immed32(basepointer,indexpointer,indexscale)
Intel: [basepointer + indexpointer*indexscale + immed32]

You could think of the formula to calculate the address as:

immed32 + basepointer + indexpointer * indexscale

You don't have to use all those fields, but you do have to have at least one of immed32 and basepointer, and you MUST add the size suffix to the operator!
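
The formula can be checked with lea, which evaluates an address expression without touching memory. This is a sketch under assumptions: GCC or Clang on x86, an arbitrary displacement of 8 and scale of 4, and a made-up function name effective_address.

```c
/* Hypothetical helper: computes 8 + base + index*4 with one lea. */
static unsigned int effective_address(unsigned int base, unsigned int index)
{
    unsigned int addr;
#if defined(__i386__) || defined(__x86_64__)
    /* immed32(basepointer,indexpointer,indexscale) with immed32 = 8, indexscale = 4 */
    __asm__ ("leal 8(%1,%2,4), %0" : "=r" (addr) : "r" (base), "r" (index));
#else
    addr = 8u + base + index * 4u;  /* the same formula in plain C */
#endif
    return addr;
}
```

With base = 100 and index = 3 the formula gives 8 + 100 + 3*4 = 120.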

Let's see some simple forms of memory addressing:

Addressing a particular C variable (direct addressing):

AT&T:  _booga
Intel: [_booga]

Note: the underscore ("_") is how you get at static (global) C variables from assembler. This only works with global variables. Otherwise, you can use extended asm to have variables preloaded into registers for you. I address that farther down.

Addressing what a register points to (indirect addressing):

AT&T:  (%eax)
Intel: [eax]

Addressing a variable offset by a value in a register (register indexed addressing):

AT&T: _variable(%eax)
Intel: [eax + _variable]

Addressing a value in an array of integers (scaling up by 4):

AT&T:  _array(,%eax,4)
Intel: [eax*4 + _array]

You can also do offsets with the immediate value:

C code: *(p+1) where p is a char *

AT&T:  1(%eax) where eax has the value of p
Intel: [eax + 1]

You can do some simple math on the immediate value:

AT&T: _struct_pointer+8

I assume you can do that with Intel format as well.

Addressing a particular char in an array of 8-character records:

eax holds the number of the record desired. ebx has the wanted char's offset within the record.

AT&T:  _array(%ebx,%eax,8)
Intel: [ebx + eax*8 + _array]

Whew. Hopefully that covers all the addressing you'll need to do. As a note, you can put esp into the address, but only as the base register.
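
The scaled forms can be tried from C with pointers instead of global symbols. This is a rough sketch, assuming GCC or Clang on x86. load_int and load_record_char are hypothetical names, and because the array arrives as a pointer rather than a symbol, the record example adds the char offset to the base pointer in C instead of using a symbol as the displacement.

```c
#include <stddef.h>

/* Hypothetical helper: reads arr[i] through (base,index,4), i.e. base + index*4. */
static int load_int(const int *arr, size_t i)
{
    int value;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl (%1,%2,4), %0"
             : "=r" (value)
             : "r" (arr), "r" (i)
             : "memory");   /* the asm reads memory that is not listed as an operand */
#else
    value = arr[i];
#endif
    return value;
}

/* Hypothetical helper: reads char `off` of 8-char record `rec`: base + off + rec*8. */
static char load_record_char(const char *records, size_t rec, size_t off)
{
    char c;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movb (%1,%2,8), %0"
             : "=q" (c)                       /* q: a register with a byte form */
             : "r" (records + off), "r" (rec)
             : "memory");
#else
    c = records[rec * 8 + off];
#endif
    return c;
}
```

The index is declared as size_t so that the base and index registers have the same width on both 32-bit and 64-bit x86.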

Basic inline assembly

The format for basic inline assembly is very simple, and much like Borland's method.

asm ("statements");

Pretty simple, no? So

asm ("nop");

will do nothing of course, and

asm ("cli");

will stop interrupts, with

asm ("sti");

of course enabling them. You can use __asm__ instead of asm if the keyword asm conflicts with something in your program.

When it comes to simple stuff like this, basic inline assembly is fine. You can even push your registers onto the stack, use them, and put them back.

asm ("pushl %eax\n\t"
     "movl $0, %eax\n\t"
     "popl %eax");

(The \n's and \t's are there so the .s file that GCC generates and hands to GAS comes out right when you've got multiple statements per asm.)

Basic inline assembly is really meant for issuing instructions for which there is no equivalent in C and which don't touch the registers.
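
A tiny sketch of basic asm in a complete function, assuming GCC or Clang on x86. The name spin_nops is made up. It uses nop because cli and sti are privileged and would fault in an ordinary user program, and it spells the keyword __asm__, which stays available when options such as -ansi or -std=c99 disable plain asm.

```c
/* Hypothetical helper: runs n loop iterations, each issuing two nops. */
static int spin_nops(int n)
{
    int iterations = 0;
    while (iterations < n) {
#if defined(__i386__) || defined(__x86_64__)
        /* Two statements in one asm, separated by \n\t for the assembler. */
        __asm__ __volatile__ ("nop\n\t"
                              "nop");
#endif
        iterations++;
    }
    return iterations;
}
```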

But if you do touch the registers, and don't fix things at the end of your asm statement, like so:

asm ("movl %eax, %ebx");
asm ("xorl %ebx, %edx");
asm ("movl $0, _booga");

then your program will probably blow things to hell. This is because GCC hasn't been told that your asm statement clobbered ebx and edx and booga, which it might have been keeping in a register, and might plan on using later. For that, you need:

Extended inline assembly

The basic format of the inline assembly stays much the same, but now gets Watcom-like extensions to allow input arguments and output arguments.

Here is the basic format:

asm ( "statements" : output_registers : input_registers : clobbered_registers);

Let's just jump straight to a nifty example, which I'll then explain:

asm ("cld\n\t"
     "rep\n\t"
     "stosl"
     : /* no output registers */
     : "c" (count), "a" (fill_value), "D" (dest)
     : "%ecx", "%edi" );

The above stores the value in fill_value count times to the pointer dest.

Let's look at this bit by bit.

asm ("cld\n\t"

We are clearing the direction bit of the flags register. You never know what this is going to be left at, and it costs you all of 1 or 2 cycles.

"rep\n\t"

"stosl"

Notice that GAS requires the rep prefix to occupy a line of its own. Notice also that stos has the l suffix to make it move longwords.

: /* no output registers */

Well, there aren't any in this function.

: "c" (count), "a" (fill_value), "D" (dest)

Here we load ecx with count, eax with fill_value, and edi with dest. Why make GCC do it instead of doing it ourselves? Because GCC, in its register allocating, might be able to arrange for, say, fill_value to already be in eax. If this is in a loop, it might be able to preserve eax through the loop, and save a movl once per loop.

: "%ecx", "%edi" );

And here's where we specify to GCC, "you can no longer count on the values you loaded into ecx or edi to be valid." This doesn't mean they will be reloaded for certain. This is the clobberlist.

Seem funky? Well, it really helps when optimizing, when GCC can know exactly what you're doing with the registers before and after. It folds your assembly code into the code it generates (whose rules for generation look remarkably like the above) and then optimizes. It's even smart enough to know that if you tell it to put (x+1) in a register, then if you don't clobber it, and later C code refers to (x+1), and it was able to keep that register free, it will reuse the computation. Whew.
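
A note for readers on current compilers: GCC's documentation now says that clobbers may not overlap input or output operands, so listing ecx and edi both as inputs and as clobbers is likely to be rejected. One possible adaptation, sketched here under that assumption, declares the registers that rep stosl modifies as read-write operands with "+" and adds "memory" because the asm writes through dest. The function name fill_dwords is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores fill_value into count 32-bit slots starting at dest. */
static void fill_dwords(uint32_t *dest, uint32_t fill_value, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("cld\n\t"
                          "rep\n\t"
                          "stosl"
                          : "+D" (dest), "+c" (count)   /* both are changed by rep stosl */
                          : "a" (fill_value)
                          : "memory", "cc");            /* writes memory, cld touches flags */
#else
    while (count--)
        *dest++ = fill_value;   /* plain C fallback for non-x86 targets */
#endif
}
```

Because dest and count are parameters passed by value, the changes the instruction makes to them stay inside the function.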

Here's the list of register loading codes that you'll be likely to use:

a        eax
b        ebx
c        ecx
d        edx
S        esi
D        edi
I        constant value (0 to 31)
q,r      dynamically allocated register (see below)
g        eax, ebx, ecx, edx or variable in memory
A        eax and edx combined into a 64-bit integer (use long longs)

Note that you can't directly refer to the byte registers (ah, al, etc.) or the word registers (ax, bx, etc.) when you're loading this way. Once you've got it in there, though, you can specify ax or whatever all you like.

The codes have to be in quotes, and the expressions to load in have to be in parentheses.

When you do the clobber list, you specify the registers as above with the %. If you write to a variable, you must include "memory" as one of The Clobbered. This is in case you wrote to a variable that GCC thought it had in a register. This is the same as clobbering all registers. While I've never run into a problem with it, you might also want to add "cc" as a clobber if you change the condition codes (the bits in the flags register the jnz, je, etc. operators look at.)
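
A short sketch of both clobbers, assuming GCC or Clang on x86. The names store_int and add_ints are invented for illustration.

```c
/* Hypothetical helper: writes value through ptr from inside the asm. */
static void store_int(int *ptr, int value)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("movl %1, (%0)"
                          :                          /* no output registers */
                          : "r" (ptr), "r" (value)
                          : "memory");               /* a variable was written behind GCC's back */
#else
    *ptr = value;
#endif
}

/* Hypothetical helper: adds b to a; addl rewrites the condition codes. */
static int add_ints(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %2, %0"
             : "=r" (a)
             : "0" (a), "r" (b)
             : "cc");                                /* flags are no longer valid */
#else
    a += b;
#endif
    return a;
}
```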

Now, that's all fine and good for loading specific registers. But what if you specify, say, ebx, and ecx, and GCC can't arrange for the values to be in those registers without having to stash the previous values? It's possible to let GCC pick the register(s). You do this:

asm ("leal (%1,%1,4), %0"
     : "=r" (x)
     : "0" (x) );

The above example multiplies x by 5 really quickly (1 cycle on the Pentium). Now, we could have specified, say eax. But unless we really need a specific register (like when using rep movsl or rep stosl, which are hardcoded to use ecx, edi, and esi), why not let GCC pick an available one? So when GCC generates the output code for GAS, %0 will be replaced by the register it picked.

Translator's note: lea loads the effective address described by its left-hand operand into the register on the right. The instruction here therefore computes %0 = %1 + %1*4, which multiplies x by 5.
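
Wrapped in a function, the snippet can be compiled and tried directly. This is only a sketch, assuming GCC or Clang on x86, and times5_inplace is a made-up name.

```c
/* Hypothetical helper: multiplies x by 5 in whatever register GCC picks. */
static int times5_inplace(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "0" ties the input to the register chosen for output %0, so %1 names it too. */
    __asm__ ("leal (%1,%1,4), %0"
             : "=r" (x)
             : "0" (x));
#else
    x = x + x * 4;
#endif
    return x;
}
```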

And where did "q" and "r" come from? Well, "q" causes GCC to allocate from eax, ebx, ecx, and edx. "r" lets GCC also consider esi and edi. So make sure, if you use "r" that it would be possible to use esi or edi in that instruction. If not, use "q".
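
Byte-sized instructions are the usual reason to reach for "q", since on the 386 only eax, ebx, ecx and edx have byte forms. A sketch, assuming GCC or Clang on x86: low_byte is a hypothetical name, and %b1 is the GCC x86 operand modifier that prints the byte name of the chosen register (for example %al).

```c
/* Hypothetical helper: extracts the low byte of x with movb. */
static unsigned char low_byte(unsigned int x)
{
    unsigned char out;
#if defined(__i386__) || defined(__x86_64__)
    /* "q" guarantees both operands live in registers that have a byte form. */
    __asm__ ("movb %b1, %0" : "=q" (out) : "q" (x));
#else
    out = (unsigned char)(x & 0xffu);
#endif
    return out;
}
```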

Now, you might wonder, how to determine how the %n tokens get allocated to the arguments. It's a straightforward first-come-first-served, left-to-right thing, mapping to the "q"'s and "r"'s. But if you want to reuse a register allocated with a "q" or "r", you use "0", "1", "2"... etc.

You don't need to put a GCC-allocated register on the clobberlist as GCC knows that you're messing with it.
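
To make the numbering concrete, here is a sketch with five operands, assuming GCC or Clang on x86. In GCC every operand takes the next number in left-to-right order, outputs first, including the ones pinned to a specific register. The function name divide is hypothetical, and the caller is assumed to pass a nonzero divisor.

```c
/* Hypothetical helper: unsigned divide returning the quotient, remainder via *rem. */
static unsigned int divide(unsigned int n, unsigned int d, unsigned int *rem)
{
    unsigned int quot, r;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("divl %4"                        /* divides edx:eax by operand 4 */
             : "=a" (quot), "=d" (r)          /* %0 = eax, %1 = edx */
             : "0" (n), "1" (0u), "r" (d)     /* %2 reuses %0, %3 reuses %1, %4 is GCC's pick */
             : "cc");
#else
    quot = n / d;
    r = n % d;
#endif
    *rem = r;
    return quot;
}
```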

Now for output registers.

asm ("leal (%1,%1,4), %0"
     : "=r" (x_times_5)
     : "r" (x) );

Note the use of = to specify an output register. You just have to do it that way. If you want 1 variable to stay in 1 register for both in and out, you have to respecify the register allocated to it on the way in with the "0" type codes as mentioned above.

asm ("leal (%0,%0,4), %0"
     : "=r" (x)
     : "0" (x) );

Translator's note: here the "0" constraint makes the input share the register used for %0.

This also works, by the way:

asm ("leal (%%ebx,%%ebx,4), %%ebx"
     : "=b" (x)
     : "b" (x) );

2 things here:

Note that we don't have to put ebx on the clobberlist, GCC knows it goes into x. Therefore, since it can know the value of ebx, it isn't considered clobbered.

Notice that in extended asm, you must prefix registers with %% instead of just %. Why, you ask? Because as GCC parses along for %0's and %1's and so on, it would interpret %edx as a %e parameter, see that that's non-existent, and ignore it. Then it would bitch about finding a symbol named dx, which isn't valid because it's not prefixed with % and it's not the one you meant anyway.
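
A sketch that mixes numbered operands with a hard-coded register, assuming GCC or Clang on x86. double_via_eax is an invented name. Here eax is used as scratch space without being an operand, so it does belong on the clobber list.

```c
/* Hypothetical helper: doubles x, using eax as a scratch register. */
static unsigned int double_via_eax(unsigned int x)
{
    unsigned int out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %%eax\n\t"      /* %% because this is extended asm */
             "addl %%eax, %%eax\n\t"
             "movl %%eax, %0"
             : "=r" (out)
             : "r" (x)
             : "eax", "cc");           /* eax is scratch, addl changes the flags */
#else
    out = x + x;
#endif
    return out;
}
```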

Important note: If your assembly statement must execute where you put it, (i.e. must not be moved out of a loop as an optimization), put the keyword volatile after asm and before the ()'s. To be ultra-careful, use __asm__ __volatile__ (...whatever...);

However, I would like to point out that if your assembly's only purpose is to calculate the output registers, with no other side effects, you should leave off the volatile keyword so your statement will be processed into GCC's common subexpression elimination optimization.
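
The two cases side by side, as a sketch assuming GCC or Clang. times3_pure and compiler_barrier are hypothetical names. The empty asm with a "memory" clobber is a widely used idiom for a compiler-only barrier and is not something the original guide discusses.

```c
/* Pure calculation: no volatile, so GCC is free to merge or drop repeated uses. */
static int times3_pure(int x)
{
    int out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("leal (%1,%1,2), %0" : "=r" (out) : "r" (x));
#else
    out = x + x * 2;
#endif
    return out;
}

/* Side effect only: volatile keeps the statement where it was written. */
static void compiler_barrier(void)
{
    __asm__ __volatile__ ("" : : : "memory");
}
```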

Some useful examples

#define disable() __asm__ __volatile__ ("cli");

#define enable() __asm__ __volatile__ ("sti");

Of course, libc has these defined too.

#define times3(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,2),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

#define times5(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,4),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

#define times9(arg1, arg2) \
__asm__ ( \
  "leal (%0,%0,8),%0" \
  : "=r" (arg2) \
  : "0" (arg1) );

These multiply arg1 by 3, 5, or 9 and put the result in arg2. You should be ok to do times5(x,x); as well.

#define rep_movsl(src, dest, numwords) \
__asm__ __volatile__ ( \
  "cld\n\t" \
  "rep\n\t" \
  "movsl" \
  : : "S" (src), "D" (dest), "c" (numwords) \
  : "%ecx", "%esi", "%edi" )

Helpful Hint: If you say memcpy() with a constant length parameter, GCC will inline it to a rep movsl like above. But if you need a variable length version that inlines and you're always moving dwords, there ya go.
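
For current compilers, a function form may be easier to live with than the macro. This is a sketch under assumptions: GCC or Clang on x86, a made-up name copy_dwords, and read-write "+" operands in place of the clobber list because rep movsl changes esi, edi and ecx while GCC's documentation does not allow clobbers to overlap operands.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function form of rep_movsl: copies numwords 32-bit words. */
static void copy_dwords(uint32_t *dest, const uint32_t *src, size_t numwords)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("cld\n\t"
                          "rep\n\t"
                          "movsl"
                          : "+S" (src), "+D" (dest), "+c" (numwords)
                          :                      /* no pure inputs */
                          : "memory", "cc");     /* reads and writes memory */
#else
    while (numwords--)
        *dest++ = *src++;   /* plain C fallback for non-x86 targets */
#endif
}
```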

#define rep_stosl(value, dest, numwords) \
__asm__ __volatile__ ( \
  "cld\n\t" \
  "rep\n\t" \
  "stosl" \
  : : "a" (value), "D" (dest), "c" (numwords) \
  : "%ecx", "%edi" )

Same as above but for memset(), which doesn't get inlined no matter what (for now.)

#define RDTSC(llptr) ({ \
__asm__ __volatile__ ( \
        ".byte 0x0f; .byte 0x31" \
        : "=A" (llptr) \
        : : "eax", "edx"); })

Reads the TimeStampCounter on the Pentium and puts the 64 bit result into llptr.

Translator's note: on multi-core machines the rdtscp instruction may be more reliable, although it costs a few more cycles. Something like this:

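#include <stdint.h>

static __inline__ uint64_t perf_counter(void)
{
  uint32_t lo, hi;
  // Read the time stamp counter. rdtscp waits for earlier instructions to
  // finish before reading, which is much cheaper than serializing with CPUID.
  __asm__ __volatile__ (
      "rdtscp"
      : "=a"(lo), "=d"(hi)
      :
      : "ecx"   // rdtscp also writes the IA32_TSC_AUX value into ecx
      );
  return ((uint64_t)lo) | (((uint64_t)hi) << 32);
}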

The End

"The End"?! Yah, I guess so.

If you're wondering, I personally am a big fan of AT&T/UNIX syntax now. (It might have helped that I cut my teeth on SPARC assembly. Of course, that machine actually had a decent number of general registers.) It might seem weird to you at first, but it's really more logical than Intel format, and has no ambiguities.

If I still haven't answered a question of yours, look in the Info pages for more information, particularly on the input/output registers. You can do some funky stuff like use "A" to allocate two registers at once for 64-bit math or "m" for static memory locations, and a bunch more that aren't really used as much as "q" and "r".

Alternately, mail me, and I'll see what I can do. (If you find any errors in the above, please, e-mail me and tell me about it! It's frustrating enough to learn without buggy docs!) Or heck, mail me to say "boogabooga."

Translator's note: I honestly have no idea what the author means by that last sentence.

It's the least you can do.