The purpose of a zero- or one-length array at the end of a struct

Date: 2019-11-13

Tags: struct, end, length, array


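I first came across this a long time ago while reading Linux code. Later I saw something similar in MFC's memory pool, and I even wrote a small memory pool of my own modeled on MFC's (MFC uses return this + 1). After that I found a similar definition in Li Xianjing's 《系统程序员成长计划》 (A Growth Plan for System Programmers), so I decided to write a summary, only to discover that someone online had already summarized it very well. So I am reposting it here. Thanks to everyone for sharing; it is what keeps me moving forward!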

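Note: in ISO/IEC 9899:1999 the zero-length form is not legal. It is a GNU C extension that gcc accepts. I don't know whether the newest C/C++ standards allow it, because I have not tested them. (C99 does allow the related empty-bracket form, such as char data[], which it calls a flexible array member. Microsoft's VS compilers accept the zero-length form but issue a warning about a nonstandard extension.)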

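The main reason for ending a struct with a zero- or one-length array is to make buffer management easier. If you use a pointer member instead of an array, you have to allocate memory twice: once for the struct and once more for the buffer the pointer refers to. The two blocks are not contiguous, so they have to be allocated and freed separately. With an array, a single allocation covers everything (see the examples below). Freeing works the same way. With the array you free once. With the pointer you must first free the buffer the pointer refers to and then free the struct, and that order cannot be reversed. In essence, the technique allocates one contiguous block of memory, which also reduces memory fragmentation.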

----------------------------------------------------------------------------------------------------

On Linux, /usr/include/linux/if_pppox.h contains the following struct:

struct pppoe_tag {
    __u16 tag_type;
    __u16 tag_len;
    char tag_data[0];
} __attribute ((packed));
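The last member is a variable-length array. This is the best way to define TLV (Type-Length-Value) structures, or any other struct that needs a variable length, and it is very convenient to use. To create one, malloc a block the size of the struct plus the length of the variable data. The variable part can then be accessed as an array. To release it, simply free the whole struct. For example: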
struct pppoe_tag *sample_tag;
__u16 sample_tag_len = 10;
sample_tag = (struct pppoe_tag *)malloc(sizeof(struct pppoe_tag)+sizeof(char)*sample_tag_len);
sample_tag->tag_type = 0xffff;
sample_tag->tag_len = sample_tag_len;
sample_tag->tag_data[0]=....
...
To free it:
free(sample_tag);
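Here is a self-contained sketch of the same pattern. It assumes a C99 compiler, so it uses the standard char tag_data[] spelling instead of GNU's [0]. It also uses uint16_t in place of the kernel's __u16 and leaves out packed. The names tlv_tag, tlv_new and tlv_free are hypothetical and do not come from if_pppox.h.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical TLV record modeled on struct pppoe_tag (not the kernel's definition). */
struct tlv_tag {
    uint16_t tag_type;
    uint16_t tag_len;
    char tag_data[];      /* C99 flexible array member; GNU C also accepts [0] */
};

/* One malloc covers the header and the payload. Returns NULL on failure. */
struct tlv_tag *tlv_new(uint16_t type, const void *data, uint16_t len)
{
    struct tlv_tag *t = malloc(sizeof *t + len);
    if (t == NULL)
        return NULL;
    t->tag_type = type;
    t->tag_len = len;
    if (len > 0)
        memcpy(t->tag_data, data, len);   /* payload is indexed like an array */
    return t;
}

/* One free releases the header and the payload together. */
void tlv_free(struct tlv_tag *t)
{
    free(t);
}
```

Because the header and the payload live in one block, the single free() in tlv_free releases both. This is the convenience described above.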

Could char *tag_data be used instead? In fact, the two are quite different. To illustrate the difference, I wrote the following program:
Example 1: test_size.c
1  #include <stdio.h>
2  #include <stdlib.h>
3  #include <string.h>
10  struct tag1
20  {
30     int a;
40     int b;
50  }__attribute ((packed));
60
70  struct tag2
80  {
90     int a;
100     int b;
110     char *c;
120  }__attribute ((packed));
130
140  struct tag3
150  {
160      int a;
170      int b;
180      char c[0];
190  }__attribute ((packed));
200
210  struct tag4
220  {
230      int a;
240      int b;
250      char c[1];
260  }__attribute ((packed));
270
280  int main()
290  {
300      struct tag2 l_tag2;
310      struct tag3 l_tag3;
320      struct tag4 l_tag4;
330
340      memset(&l_tag2,0,sizeof(struct tag2));
350      memset(&l_tag3,0,sizeof(struct tag3));
360      memset(&l_tag4,0,sizeof(struct tag4));
370
380      printf("size of tag1 = %zu\n",sizeof(struct tag1));
390      printf("size of tag2 = %zu\n",sizeof(struct tag2));
400      printf("size of tag3 = %zu\n",sizeof(struct tag3));
410      printf("size of tag4 = %zu\n",sizeof(struct tag4));
420      printf("l_tag2 = %p,&l_tag2.c = %p,l_tag2.c = %p\n",(void *)&l_tag2,(void *)&l_tag2.c,(void *)l_tag2.c);
430      printf("l_tag3 = %p,l_tag3.c = %p\n",(void *)&l_tag3,(void *)l_tag3.c);
440      printf("l_tag4 = %p,l_tag4.c = %p\n",(void *)&l_tag4,(void *)l_tag4.c);
450      exit(0);
460  }

__attribute ((packed)) forces the compiler not to insert 4-byte alignment padding, which makes the point easier to see.
The program's output is:
size of tag1 = 8
size of tag2 = 12
size of tag3 = 8
size of tag4 = 9
l_tag2 = 0xbffffad0,&l_tag2.c = 0xbffffad8,l_tag2.c = (nil)
l_tag3 = 0xbffffac8,l_tag3.c = 0xbffffad0
l_tag4 = 0xbffffabc,l_tag4.c = 0xbffffac4

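The program and its output show the following:
tag1 contains two 32-bit integers, so it occupies 8 bytes.
tag2 contains two 32-bit integers plus a char * pointer, so it occupies 12 bytes on this 32-bit machine.
tag3 is where the difference between char c[0] and char *c really shows. The c in char c[0] is not a pointer but an offset. It points at the space immediately after a and b, so it takes up no space itself: l_tag3.c is exactly l_tag3 + 8.
tag4 confirms this, because its single char adds exactly 1 byte, giving 9.
So suppose the last member of struct pppoe_tag were declared as char *tag_data. It would cost an extra 4-byte pointer variable, and it would also be less convenient to use: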

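Method one: when creating the struct, first allocate memory for struct pppoe_tag and then allocate memory for tag_data separately. When freeing, you must first free the memory used by tag_data and then the memory used by pppoe_tag.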

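Method two: when creating the struct, allocate a single block the size of struct pppoe_tag plus the tag_data length. Line 420 of Example 1 shows that the pointer member is a separate variable with its own storage, and it holds nothing useful until it is set. So tag_data must still be initialized by hand to point at the memory just after struct pppoe_tag.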
struct pppoe_tag {
    __u16 tag_type;
    __u16 tag_len;
    char *tag_data;
} __attribute ((packed));

struct pppoe_tag *sample_tag;
__u16 sample_tag_len = 10;
Method one:
sample_tag = (struct pppoe_tag *)malloc(sizeof(struct pppoe_tag));
sample_tag->tag_len = sample_tag_len;
sample_tag->tag_data = malloc(sizeof(char)*sample_tag_len);
sample_tag->tag_data[0]=...
To free:
free(sample_tag->tag_data);
free(sample_tag);

Method two:
sample_tag = (struct pppoe_tag *)malloc(sizeof(struct pppoe_tag)+sizeof(char)*sample_tag_len);
sample_tag->tag_len = sample_tag_len;
sample_tag->tag_data = ((char *)sample_tag)+sizeof(struct pppoe_tag);
sample_tag->tag_data[0]=...
To free:
free(sample_tag);
So whichever method you choose, neither is as convenient as defining char tag_data[0].
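Here is a sketch that puts method two next to the array version so the difference is concrete. The struct and function names are hypothetical. As assumptions, uint16_t stands in for __u16 and packed is dropped. The pointer version needs an extra step to aim tag_data at the bytes after the header, and it pays for a pointer member. The array version gets both for free.

```c
#include <stdint.h>
#include <stdlib.h>

/* Method two: a pointer member, but still a single allocation. */
struct ptr_tag {
    uint16_t tag_type;
    uint16_t tag_len;
    char *tag_data;
};

/* Array version: tag_data simply names the bytes after the header. */
struct arr_tag {
    uint16_t tag_type;
    uint16_t tag_len;
    char tag_data[];
};

struct ptr_tag *ptr_tag_new(uint16_t type, uint16_t len)
{
    struct ptr_tag *t = malloc(sizeof *t + len);
    if (t == NULL)
        return NULL;
    t->tag_type = type;
    t->tag_len = len;
    t->tag_data = (char *)(t + 1);   /* must be wired up by hand, else it is uninitialized */
    return t;
}

struct arr_tag *arr_tag_new(uint16_t type, uint16_t len)
{
    struct arr_tag *t = malloc(sizeof *t + len);
    if (t == NULL)
        return NULL;
    t->tag_type = type;
    t->tag_len = len;                /* nothing else to set up */
    return t;
}
```

Both are released with a single free(). The array version avoids the hand-written pointer arithmetic and the per-object pointer.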

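All of this comes down to the difference between arrays and pointers in C. That is also the memory-management issue mentioned above: the array's storage is the contiguous space right after the struct itself, whereas a pointer refers to a contiguous block allocated somewhere else entirely.

Is the a in char a[1] the same as the b in char *b? Page 82 of "Programming Abstractions in C" (Roberts, E. S., China Machine Press, June 2004) says: "arr is defined to be identical to &arr[0]". In other words, the a in char a[1] is effectively a constant equal to &a[0], whereas char *b has a real pointer variable b behind it. That is why a=b is not allowed while b=a is.

Both kinds of variable support subscript access. Is there any essential difference between a[0] and b[0], then? An example makes it clear.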

Example 2:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char a[10];
    char *b;   /* deliberately left uninitialized: this program is meant to be
                  compiled and disassembled only; running it is undefined behavior */

    a[2]=0xfe;
    b[2]=0xfe;
    exit(0);
}

After compiling, objdump shows the assembly:
080483f0 <main>:
80483f0:       55                      push   %ebp
80483f1:       89 e5                   mov    %esp,%ebp
80483f3:       83 ec 18                sub    $0x18,%esp
80483f6:       c6 45 f6 fe             movb   $0xfe,0xfffffff6(%ebp)
80483fa:       8b 45 f0                mov    0xfffffff0(%ebp),%eax
80483fd:       83 c0 02                add    $0x2,%eax
8048400:       c6 00 fe                movb   $0xfe,(%eax)
8048403:       83 c4 f4                add    $0xfffffff4,%esp
8048406:       6a 00                   push   $0x0
8048408:       e8 f3 fe ff ff          call   8048300 <_init+0x68>
804840d:       83 c4 10                add    $0x10,%esp
8048410:       c9                      leave
8048411:       c3                      ret
8048412:       8d b4 26 00 00 00 00    lea    0x0(%esi,1),%esi
8048419:       8d bc 27 00 00 00 00    lea    0x0(%edi,1),%edi

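The two assignments compile differently:
a[2]=0xfe uses direct addressing. 0xfe is written straight to the address &a[0]+2, here 0xfffffff6(%ebp).
b[2]=0xfe uses indirect addressing. The contents of b (an address) are loaded first, 2 is added, and then 0xfe is written to the computed address.
So a[0] and b[0] are fundamentally different.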
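The same difference can be seen from the C side. This is a small sketch; the function names are hypothetical, and it does not reproduce the assembly above. Here b is pointed at a before use, which is legal because the array name converts to &a[0]. Writing through b and reading through a then touch the same byte, while sizeof still tells the array and the pointer apart. It uses 0x7e rather than 0xfe so that the value fits in a plain char, which may be signed.

```c
#include <stddef.h>

/* Writes through a pointer variable and reads back through the array name. */
char write_via_pointer_read_via_array(void)
{
    char a[10] = {0};
    char *b;

    b = a;          /* legal: the array name converts to &a[0] */
    /* a = b; */    /* would not compile: an array cannot be assigned */
    b[2] = 0x7e;    /* indirect: load b, add 2, store through the result */
    return a[2];    /* direct: a's own address plus 2 */
}

/* sizeof sees the whole array for a ... */
size_t array_bytes(void)
{
    char a[10];
    return sizeof a;
}

/* ... but only the pointer variable for b. */
size_t pointer_bytes(void)
{
    char a[10];
    char *b = a;
    return sizeof b;
}
```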

When an array is a function parameter, however, it is no different from a pointer:
int do1(char a[],int len);
int do2(char *a,int len);
The a in these two functions is exactly the same. In both cases it is a real pointer variable.
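To illustrate, here are hypothetical bodies for the two declarations above; the originals only declare them. Both sum the first len bytes. Inside do1 the parameter a can be incremented with a++, which would be impossible for a real array. Both functions also have the same type, so do1 can be stored in a pointer declared as taking char *.

```c
/* Hypothetical body: a is adjusted to char *, so a++ is legal here. */
int do1(char a[], int len)
{
    int sum = 0;
    while (len-- > 0)
        sum += *a++;
    return sum;
}

/* Hypothetical body: identical behavior with an explicit pointer parameter. */
int do2(char *a, int len)
{
    int sum = 0;
    while (len-- > 0)
        sum += *a++;
    return sum;
}
```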

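One more note: the last member of struct pppoe_tag is defined as char tag_data[0]. Some compilers do not support zero-length arrays. In that case the only option is to define it as char tag_data[1]. It is used the same way, except that sizeof now also counts that one element (plus any padding).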

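In the OpenOffice source I came across the following data structure. It is a Unicode string struct that also ends in a length-1 array, probably for compatibility across compilers.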

typedef struct _rtl_uString
{
    sal_Int32       refCount;
    sal_Int32       length;
    sal_Unicode     buffer[1];
} rtl_uString;
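This is a variable-length string. Roughly, the idea is:

rtl_uString * str = malloc(256);
str->length = (256 - 8) / sizeof(sal_Unicode);   /* length counts sal_Unicode characters, not bytes */
/* str->buffer now points to a buffer of 256 - 8 bytes: refCount and length occupy the first 8 */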
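Here is a runnable sketch of the length-1 ("struct hack") variant. The type ustr and its functions are hypothetical stand-ins for rtl_uString, not OpenOffice's API. As assumptions, sal_Int32 is taken as int32_t and sal_Unicode as uint16_t. The allocation size is computed from offsetof(ustr, buffer) rather than sizeof(ustr), so the declared [1] element is not counted twice. Indexing past buffer[0] is strictly outside ISO C, which is why C99 introduced the [] form. Compilers have long tolerated it for a trailing array.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for rtl_uString. */
typedef struct ustr {
    int32_t refCount;
    int32_t length;        /* number of uint16_t units, not bytes */
    uint16_t buffer[1];    /* really holds 'length' units */
} ustr;

/* Allocates room for 'length' units (length assumed >= 0), never less than the struct itself. */
ustr *ustr_new(int32_t length)
{
    size_t bytes = offsetof(ustr, buffer) + (size_t)length * sizeof(uint16_t);
    if (bytes < sizeof(ustr))
        bytes = sizeof(ustr);
    ustr *s = malloc(bytes);
    if (s == NULL)
        return NULL;
    s->refCount = 1;
    s->length = length;
    return s;
}

/* How many units fit in a block of 'bytes' bytes (bytes assumed >= the header). */
size_t ustr_capacity(size_t bytes)
{
    return (bytes - offsetof(ustr, buffer)) / sizeof(uint16_t);
}
```

With the 256-byte block from the snippet above, the 8-byte header leaves 248 bytes, so ustr_capacity(256) is 124.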

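Summary: the reposted article above makes it clear that the advantage of this technique is simply to simplify memory management. Assume ideal memory conditions, where allocated space is laid out in order (in practice fragmentation and similar effects make things differ). Then the trailing array lets us jump straight to the allocated buffer with no gap. This is very common on Linux. On Windows I have only seen something similar in MFC, and I can't recall any other cases. As I remember, MFC takes the pointer to the allocated struct (this) and adds 1, which lands on the actual memory. For details, see the post in the CE category of my blog, "内存池技术的应用和详细说明" (applying memory-pool techniques, explained in detail). It took me quite a while to figure that out at the time.

Many things that look complicated are really just fundamentals, so build a solid foundation. That is what lets a tall building stand, and learning is the same. Don't aim too high too fast. Keep your feet on the ground and move forward step by step, regularly sum up what you have learned, and keep testing theory against practice. Only then can you go further and see more beautiful scenery.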

Finally, thanks again to everyone online who shares so generously!!!

Flexible array members
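[Flexible array members
In C99, the last element of a struct may be an array of unknown size; this is called a flexible array member. It must be preceded by at least one other member. A flexible array member lets a struct contain an array of variable size. The size that sizeof returns for such a struct does not include the flexible array's memory. Structs containing a flexible array member are allocated dynamically with malloc(), and the allocated memory should be larger than the struct's size to accommodate the flexible array's expected size.]
From C语言大全 (C: The Complete Reference), "Flexible array members"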

Now let's look at flexible array members as defined in the C99 standard:

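A clever use of variable-length structs: arrays with zero elements
Sometimes we need a struct that implements a variable-length structure. How can that be done?
Look at this struct definition: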
typedef struct st_type
{
int nCnt;
int item[0];
}type_a;
(Some compilers refuse to compile this; if so, it can be changed to:)
typedef struct st_type
{
int nCnt;
int item[];
}type_a;
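With this we can define a variable-length struct. sizeof(type_a) yields only 4, which is sizeof(nCnt) = sizeof(int) on a platform with a 4-byte int. The zero-element array takes up no space, and we can then perform variable-length operations.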
C version:
type_a *p = (type_a*)malloc(sizeof(type_a)+100*sizeof(int));
C++ version:
type_a *p = (type_a*)new char[sizeof(type_a)+100*sizeof(int)];
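This produces a type_a object with room for 100 ints, and p->item[n] gives simple access to the variable-length elements. The principle is simple. Once more memory than sizeof(type_a) has been allocated, int item[] becomes meaningful: it refers to whatever comes after int nCnt. It needs no memory of its own, and it gives access to the extra memory allocated. It is a very handy technique.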
Freeing is just as simple:
C version:
free(p);
C++ version:
delete [] (char *)p;   // p was allocated as a char array, so delete it as one
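The C version can be written as a short runnable sketch. It uses the C99 int item[] spelling. The helper type_a_new is hypothetical, and filling item[i] with i is only there to make the result checkable. The C++ variant is left out, because flexible array members are not part of standard C++.

```c
#include <stdlib.h>

typedef struct st_type {
    int nCnt;
    int item[];      /* C99 form; some compilers also accept item[0] */
} type_a;

/* Hypothetical helper: allocates a type_a holding n ints, with item[i] = i. */
type_a *type_a_new(int n)
{
    type_a *p = malloc(sizeof(type_a) + (size_t)n * sizeof(int));
    if (p == NULL)
        return NULL;
    p->nCnt = n;
    for (int i = 0; i < n; i++)
        p->item[i] = i;
    return p;
}
```

A caller would write type_a *p = type_a_new(100);, use p->item[0] through p->item[99], and release everything with a single free(p).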
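This is actually called a flexible array member. C89 does not support it; C99 added it to the standard as a special case. However, what C99 supports is an incomplete type, not a zero-length array. A form like int item[0]; is illegal, and the form C99 supports is int item[];. Some compilers support int item[0]; as a nonstandard extension, which already existed before C99 was published. After C99 came out, some compilers treated the two forms as the same thing.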
Here is the relevant text from C99:
6.7.2.1 Structure and union specifiers

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. With two exceptions, the flexible array member is ignored. First, the size of the structure shall be equal to the offset of the last element of an otherwise identical structure that replaces the flexible array member with an array of unspecified length.106) Second, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
For example, in VC++6 both forms compile and work, but they produce the warning "warning C4200: nonstandard extension used : zero-sized array in struct/union". In DEVCPP both forms also work, and no warning is produced.
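The C99 rule quoted above can be turned into simple arithmetic. p->item behaves like the longest int array that starts at offsetof(type_a, item) and still fits in the allocated object. A block therefore holds (bytes - offset) / sizeof(int) elements, with any leftover bytes unused. The helper item_capacity is hypothetical, and it reuses the type_a struct from the article.

```c
#include <stddef.h>

typedef struct st_type {
    int nCnt;
    int item[];
} type_a;

/* Number of item elements usable in an allocation of alloc_bytes bytes.
   If the result is 0, accessing item[0] is undefined, as the standard says. */
size_t item_capacity(size_t alloc_bytes)
{
    size_t off = offsetof(type_a, item);
    return alloc_bytes <= off ? 0 : (alloc_bytes - off) / sizeof(int);
}
```

For example, malloc(sizeof(type_a) + 100 * sizeof(int)) gives room for exactly 100 elements on common platforms, because sizeof(type_a) equals the offset of item there.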