OpenGL SuperBible Notes: Performance Comparison

Date: 2019-11-10


This article uses a complex model containing a large amount of vertex data to compare the performance and memory use of three approaches: immediate mode with glBegin()/glEnd(), display lists, and indexed vertex arrays. The F-16 Thunderbird aircraft model has 3704 individual triangles. After being exported in indexed form by the Deep Exploration tool, it contains 1898 unique vertices, 2716 normals, and 2925 texture coordinates.

<a href="http://static.oschina.net/uploads/img/201312/22132527_DQPt.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://static.oschina.net/uploads/img/201312/22132527_VHTe.png" width="469" height="369"></a>

The code below shows the DrawBody function, which walks the index table and sends the texture coordinate, normal, and vertex position for each corner of each triangle.
<pre class="prettyprint">void DrawBody(void)
{
    int iFace, iPoint;

    glBegin(GL_TRIANGLES);
    for(iFace = 0; iFace &lt; 3704; iFace++)       // for each triangle
        for(iPoint = 0; iPoint &lt; 3; iPoint++)   // for each corner of the triangle
        {
            // Set the texture coordinate
            glTexCoord2fv(textures[face_indicies[iFace][iPoint+6]]);
            // Set the normal
            glNormal3fv(normals[face_indicies[iFace][iPoint+3]]);
            // Set the vertex position
            glVertex3fv(vertices[face_indicies[iFace][iPoint]]);
        }
    glEnd();
}

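</pre>
This approach is reasonable when storage space is the priority, for example when memory is tight on an embedded system or when the model is sent over a network and bandwidth matters. For real-time rendering, however, it performs poorly: every call hands OpenGL a single attribute of a single vertex, so each time the body is drawn it takes 3704*3*3 = 33,336 function calls.

The obvious way to speed this code up is to put it in a display list:
<pre class="prettyprint">glNewList(bodyList, GL_COMPILE);
    DrawBody();
glEndList();
….
glCallList(bodyList);
</pre>
Next we compare the display list approach with indexed vertex arrays.
<h4>Figuring the Cost</h4>
First, work out how much memory the packed (indexed) vertex data requires.
<pre class="prettyprint">// Thunderbird body
extern short face_indicies[3704][9];
extern GLfloat vertices[1898][3];
extern GLfloat normals[2716][3];
extern GLfloat textures[2925][2];
</pre>
Here face_indicies holds, for each triangle, three vertex indices, three normal indices, and three texture coordinate indices:
<pre class="prettyprint">short face_indicies[3704][9] = {
    {6,8,7 ,0,1,2 ,0,1,2 }, {6,9,8 ,0,3,1 ,0,3,1 }, {10,8,11 ,4,1,5 ,4,1,5 }....}
</pre>
With a 2-byte short, face_indicies takes 3704*9*sizeof(short) = 66,672 bytes. In the same way, vertices takes 22,776 bytes, normals 32,592 bytes, and textures 23,400 bytes. The total is 145,440 bytes, or about 142 KB.

With a display list, all of this data has to be copied into the list (the commands and data in a display list are optimized and then placed in a command buffer or on the graphics hardware). We cannot know exactly how much memory the display list uses, but we can estimate the size of the vertex data. Each triangle needs 3 vertices, 3 normals, and 3 texture coordinates, all stored as floats. Assuming sizeof(float) is 4 bytes, there are 3704*3 = 11,112 vertices. Each vertex has three components (x, y, z), which gives 11,112*3 = 33,336 floats. The normals likewise take 33,336 floats, and the texture coordinates take 11,112*2 = 22,224 floats. That adds up to 88,896 floats, or 355,584 bytes. Together with the original data this comes to 501,024 bytes, about 489 KB, just under 0.5 MB. However, all 11,112 vertices still have to pass through OpenGL's transformation pipeline, which involves many matrix operations.
<h4>Creating a Suitable Indexed Vertex Array</h4>
The data as stored above cannot be used directly as OpenGL vertex arrays. OpenGL requires the vertex array, normal array, and texture coordinate array to be the same size so that all of them are traversed in step: element 0 of the vertex array corresponds to element 0 of the normal array. The same holds when an index array is used, since a single index selects the same element from every array.

In the example below, a class processes the existing arrays and builds the index for us. The following code processes the fuselage and the glass canopy and builds their indices:
<pre class="prettyprint">CTriangleMesh thunderBirdBody;
CTriangleMesh thunderBirdGlass;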

// Temporary storage for one triangle
M3DVector3f vVerts[3];
M3DVector3f vNorms[3];
M3DVector2f vTex[3];

// Start assembling the body mesh; pass the maximum number of vertices
thunderBirdBody.BeginMesh(3704*3);

// Loop over every face
for(int iFace = 0; iFace &lt; 3704; iFace++)
{
    for(int iPoint = 0; iPoint &lt; 3; iPoint++)
    {
        memcpy(&amp;vVerts[iPoint][0], &amp;vertices[face_indicies[iFace][iPoint]][0], sizeof(M3DVector3f));
        memcpy(&amp;vNorms[iPoint][0], &amp;normals[face_indicies[iFace][iPoint+3]][0], sizeof(M3DVector3f));
        memcpy(&amp;vTex[iPoint][0], &amp;textures[face_indicies[iFace][iPoint+6]][0], sizeof(M3DVector2f));
    }
    thunderBirdBody.AddTriangle(vVerts, vNorms, vTex);
}

// Finish the mesh, then scale the vertices so the model fits the screen
thunderBirdBody.EndMesh();
thunderBirdBody.Scale(fScale);

// Same procedure for the glass canopy (352 triangles)
thunderBirdGlass.BeginMesh(352*3);

for(int iFace = 0; iFace &lt; 352; iFace++)
{
    for(int iPoint = 0; iPoint &lt; 3; iPoint++)
    {
        memcpy(&amp;vVerts[iPoint][0], &amp;verticesGlass[face_indiciesGlass[iFace][iPoint]][0], sizeof(M3DVector3f));
        memcpy(&amp;vNorms[iPoint][0], &amp;normalsGlass[face_indiciesGlass[iFace][iPoint+3]][0], sizeof(M3DVector3f));
        memcpy(&amp;vTex[iPoint][0], &amp;texturesGlass[face_indiciesGlass[iFace][iPoint+6]][0], sizeof(M3DVector2f));
    }
    thunderBirdGlass.AddTriangle(vVerts, vNorms, vTex);
}

thunderBirdGlass.EndMesh();
thunderBirdGlass.Scale(fScale);

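</pre>
First, we declare two triangle mesh objects: CTriangleMesh thunderBirdBody; CTriangleMesh thunderBirdGlass;

Next, we tell the mesh the maximum number of vertices it may have to hold. In the worst case every corner of every triangle is unique, which gives 3704*3 = 11,112 vertices, but in practice many vertices are shared and have identical values: thunderBirdBody.BeginMesh(3704*3);

We then loop over all the faces, gather the data for each individual triangle, and pass it to AddTriangle, which builds the index array. AddTriangle compares each incoming vertex with the vertices already stored to see whether it is a duplicate. If it is, the index array simply reuses the existing index. The code that does this inside AddTriangle is:
<pre class="prettyprint">for(GLuint iVertex = 0; iVertex &lt; 3; iVertex++)
    {
    GLuint iMatch = 0;
    for(iMatch = 0; iMatch &lt; nNumVerts; iMatch++)
        {
        // If the vertex positions are the same
        if(m3dCloseEnough(pVerts[iMatch][0], verts[iVertex][0], e) &amp;&amp;
           m3dCloseEnough(pVerts[iMatch][1], verts[iVertex][1], e) &amp;&amp;
           m3dCloseEnough(pVerts[iMatch][2], verts[iVertex][2], e) &amp;&amp;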

           // AND the normal is the same...
           m3dCloseEnough(pNorms[iMatch][0], vNorms[iVertex][0], e) &amp;&amp;
           m3dCloseEnough(pNorms[iMatch][1], vNorms[iVertex][1], e) &amp;&amp;
           m3dCloseEnough(pNorms[iMatch][2], vNorms[iVertex][2], e) &amp;&amp;

           // AND the texture coordinate is the same...
           m3dCloseEnough(pTexCoords[iMatch][0], vTexCoords[iVertex][0], e) &amp;&amp;
           m3dCloseEnough(pTexCoords[iMatch][1], vTexCoords[iVertex][1], e))
            {
            // Then add the index only
            pIndexes[nNumIndexes] = iMatch;
            nNumIndexes++;
            break;
            }
        }

    // No match for this vertex, add to end of list
    if(iMatch == nNumVerts)
        {
        memcpy(pVerts[nNumVerts], verts[iVertex], sizeof(M3DVector3f));
        memcpy(pNorms[nNumVerts], vNorms[iVertex], sizeof(M3DVector3f));
        memcpy(pTexCoords[nNumVerts], vTexCoords[iVertex], sizeof(M3DVector2f));
        pIndexes[nNumIndexes] = nNumVerts;
        nNumIndexes++;
        nNumVerts++;
        }
    }

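</pre>
<h4>Comparing the Cost</h4>
Now let us compare the cost of the three ways of rendering the model. According to the statistics reported by the CTriangleMesh class, the Thunderbird body has 3,265 unique vertices (each with its normal and texture coordinate) and 11,112 indices. Each unique vertex holds 3 floats for the position, 3 for the normal, and 2 for the texture coordinate, so there are 3265*8 = 26,120 floats, or 104,480 bytes at 4 bytes each. The index array, stored as shorts, adds 11,112*2 = 22,224 bytes. The total is 126,704 bytes, about 124 KB.

Comparison table:
<table border="2" cellspacing="0" cellpadding="2" width="600">
<tbody>
<tr> <td valign="top" width="200">Rendering mode</td> <td valign="top" width="200">Memory used</td> <td valign="top" width="200">Vertices to transform</td></tr>
<tr> <td valign="top" width="200">Immediate mode</td> <td valign="top" width="200">about 142 KB (145,440 bytes)</td> <td valign="top" width="200">11,112</td></tr>
<tr> <td valign="top" width="200">Display list</td> <td valign="top" width="200">about 489 KB (501,024 bytes)</td> <td valign="top" width="200">11,112</td></tr>
<tr> <td valign="top" width="200">Indexed vertex array</td> <td valign="top" width="200">about 124 KB (126,704 bytes)</td> <td valign="top" width="200">3,265</td></tr>
</tbody></table>
As the table shows, the indexed vertex array not only uses less memory but also has fewer than a third as many vertices to process as the other two modes (3,265 versus 11,112). How much is saved depends on the model: one with many sharp corners or edges shares fewer vertices, while one with smooth surfaces shares more. Using indexed vertex arrays can improve performance considerably.

Source code: <a href="https://github.com/sweetdark/openglex/tree/master/thunderbird">https://github.com/sweetdark/openglex/tree/master/thunderbird</a>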
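
The claim that sharp edges reduce vertex sharing can be tried on a much smaller model than the Thunderbird. The sketch below is not the CTriangleMesh code; it is a minimal stand-in written for illustration, and the names Vertex, IndexedMesh, addCorner, buildCube, and footprintBytes are all hypothetical. It applies the same "reuse the index if position and normal match within a tolerance" idea to a cube built from 12 triangles, once with per-face normals (sharp edges) and once with per-corner normals (smooth shading). Texture coordinates are left out to keep it short, and triangle winding is ignored because only the counts matter here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical vertex record: position + normal (texture coordinates omitted).
struct Vertex {
    std::array<float, 3> pos;
    std::array<float, 3> norm;
};

// Same idea as m3dCloseEnough: two values are equal within a tolerance.
static bool closeEnough(float a, float b, float eps = 1e-5f) {
    return std::fabs(a - b) < eps;
}

static bool sameVertex(const Vertex& a, const Vertex& b) {
    for (int i = 0; i < 3; ++i) {
        if (!closeEnough(a.pos[i], b.pos[i]) || !closeEnough(a.norm[i], b.norm[i]))
            return false;
    }
    return true;
}

struct IndexedMesh {
    std::vector<Vertex> vertices;         // unique vertices only
    std::vector<unsigned short> indices;  // one entry per triangle corner

    // Reuse an existing index if an equal vertex is already stored,
    // otherwise append the vertex and index it.
    void addCorner(const Vertex& v) {
        for (std::size_t i = 0; i < vertices.size(); ++i) {
            if (sameVertex(vertices[i], v)) {
                indices.push_back(static_cast<unsigned short>(i));
                return;
            }
        }
        assert(vertices.size() < 65535);  // index must fit in 16 bits
        indices.push_back(static_cast<unsigned short>(vertices.size()));
        vertices.push_back(v);
    }
};

// Build a cube spanning [-1, 1] on each axis as 12 triangles (36 corners).
// flat == true : every corner carries its face normal (sharp edges).
// flat == false: every corner carries its normalized position (smooth shading).
IndexedMesh buildCube(bool flat) {
    IndexedMesh mesh;
    static const int quad[6] = {0, 1, 2, 0, 2, 3};  // two triangles per face
    static const float su[4] = {-1.0f, 1.0f, 1.0f, -1.0f};
    static const float sv[4] = {-1.0f, -1.0f, 1.0f, 1.0f};
    const float inv = 1.0f / std::sqrt(3.0f);

    for (int axis = 0; axis < 3; ++axis) {
        const int u = (axis + 1) % 3;
        const int v = (axis + 2) % 3;
        for (int sign = -1; sign <= 1; sign += 2) {
            for (int k = 0; k < 6; ++k) {
                Vertex vert{};  // all components start at zero
                vert.pos[axis] = static_cast<float>(sign);
                vert.pos[u] = su[quad[k]];
                vert.pos[v] = sv[quad[k]];
                if (flat) {
                    vert.norm[axis] = static_cast<float>(sign);
                } else {
                    for (int i = 0; i < 3; ++i) vert.norm[i] = vert.pos[i] * inv;
                }
                mesh.addCorner(vert);
            }
        }
    }
    return mesh;
}

// Bytes needed for the unique vertices plus the index array.
std::size_t footprintBytes(const IndexedMesh& m) {
    return m.vertices.size() * sizeof(Vertex) +
           m.indices.size() * sizeof(unsigned short);
}
```

Calling buildCube(true) and buildCube(false) and comparing vertices.size() shows the effect: both meshes have 36 indices, but the flat-shaded cube can only share corners within a face, while the smooth-shaded cube shares each corner across the three faces that meet there. Like the inner loop of AddTriangle shown earlier, addCorner does a linear search over all stored vertices, which is simple but quadratic in the number of corners; that is acceptable for a one-time preprocessing step on a model of this size.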