Everything you never wanted to know about OpenGL Vertex Buffer Objects but were forced to learn anyway
Posted: Fri May 07, 2021 12:37 am
(Edit 2022-05-22: This post is wrong if you care about good performance on AMD's OpenGL. Read my followup to understand why.)
So I'm in the process of removing all the OpenGL stuff MXS uses that is deprecated/removed in OpenGL 3.1 / OpenGL ES and I figured you guys might want to suffer along with me. First up is the immediate mode drawing API.
If you don't know OpenGL, the immediate mode API is a really easy-to-use way to draw various types of polygons. Here's an example of how you'd draw a square:
For a comparison, here's the more complicated, equivalent code using vertex buffer objects (abbreviated VBO henceforth):
Obviously, it'd be a huge pain to do this every time you want to draw something simple like a square, so I want to emulate the old API using the new API. It'd also be nice if it weren't actually slower than the original code.
To start I needed a test program to make sure everything works and also to benchmark with. I chose "glutplane" which is a glut example program that draws paper planes. I modified it to be more useful for benchmarking (output FPS, more planes and start with moving planes). Here's the resulting listing:
This gets 454 FPS on my system. My goal is to at least not be slower than that.
First up is the basic VBO implementation. I added this code at the beginning of the listing after the includes and added a call to init_fake_immediate() in main() after the glut setup code.
The result: 449 FPS. This isn't too bad considering the code is pretty bad. It calls glBufferData() for every glBegin/glEnd pair, and since glBufferData() allocates a new data store for the buffer every time it's called, the driver wastes a lot of time managing lots of tiny allocations.
For the next attempt we want to use glBufferSubData to update a single buffer so it doesn't have to keep allocating new buffers.
First the initialization code is changed to allocate the buffer:
Next the glBufferData() call in mxglEnd() is replaced with the glBufferSubData() call:
Test that out and we get 277 FPS. Oh no, we're getting even slower! The problem now is that while we aren't creating thousands of tiny buffers, our one buffer effectively forces the GPU to render only one polygon at a time, presumably because every update overwrites the same bytes the previous draw call may still be reading.
At a minimum, we can't keep overwriting the first elements of the array. We want to rotate through the entire array so the GPU can work on one part of the array while we update it somewhere else.
So a "g_begin" variable is added that points to the next free spot in the array. When we hit the end of the buffer we move the current primitive to the front and reset "g_begin".
This gets, er, 276 FPS. OK, this is not going well. I don't know why this isn't faster. But we have one more trick up our sleeve: we can "orphan" the buffer when it gets full. You do this by calling glBufferData() again with the same size and usage and a NULL data pointer. This tells OpenGL you no longer need the old contents, so the driver can hand back fresh storage right away and do what it wants with the orphaned storage once the GPU is done drawing from it.
So we change the buffer overflow code to look like this and we're good to go:
It now gets 554 FPS and I've had enough!
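To get a feel for how often the orphaning step above actually fires, here is a GL-free sketch that replays only the g_begin/g_count bookkeeping and counts the overflows. The names (StreamSim, sim_begin, sim_vertex, sim_primitive, sim_orphans_after) are made up for illustration and are not part of the real code. It assumes the benchmark's numbers: a 4-vertex background, 8 vertices per plane strip, and a buffer of 1024*16 vertices.

```c
#include <assert.h>

#define SIM_CHUNK (1024 * 16) /* same vertex capacity as IMM_CHUNK in the post */

/* Hypothetical stand-in for the g_begin / g_count globals, plus a counter. */
typedef struct {
    int begin;   /* first vertex of the current primitive */
    int count;   /* vertices in the current primitive so far */
    int orphans; /* how many times the buffer filled up */
} StreamSim;

static void sim_begin(StreamSim *s)
{
    s->begin += s->count;
    s->count = 0;
}

static void sim_vertex(StreamSim *s)
{
    if (s->count == SIM_CHUNK)
        return; /* primitive bigger than the whole buffer: dropped */
    if (s->begin + s->count == SIM_CHUNK) {
        s->orphans++; /* this is where the buffer would be orphaned */
        s->begin = 0; /* the current primitive restarts at the front */
    }
    s->count++;
    assert(s->begin + s->count <= SIM_CHUNK);
}

static void sim_primitive(StreamSim *s, int vertices)
{
    int i;
    sim_begin(s);
    for (i = 0; i < vertices; i++)
        sim_vertex(s);
}

/* Replays `frames` frames of one background quad plus `planes` plane strips. */
static int sim_orphans_after(int frames, int planes)
{
    StreamSim s = { 0, 0, 0 };
    int f, p;
    for (f = 0; f < frames; f++) {
        sim_primitive(&s, 4); /* background polygon */
        for (p = 0; p < planes; p++)
            sim_primitive(&s, 8); /* one paper plane */
    }
    return s.orphans;
}
```

With 1024 planes a frame is 4 + 1024 * 8 = 8196 vertices, so under these assumptions the buffer fills about once every two frames. That means the orphaning glBufferData() call is rare compared to the per-primitive uploads.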
Code (immediate mode square):
glBegin(GL_QUADS);
glVertex3f(-1.0, 1.0, 1.0);
glVertex3f(-1.0, -1.0, 1.0);
glVertex3f(1.0, -1.0, 1.0);
glVertex3f(1.0, 1.0, 1.0);
glEnd();
Code (the same square using a VBO):
GLuint vbo = 0;
GLfloat data[16] = {
-1.0, 1.0, 1.0, 1.0,
-1.0, -1.0, 1.0, 1.0,
1.0, -1.0, 1.0, 1.0,
1.0, 1.0, 1.0, 1.0,
};
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexPointer(4, GL_FLOAT, sizeof(GLfloat) * 4, 0);
glEnableClientState(GL_VERTEX_ARRAY);
glBufferData(GL_ARRAY_BUFFER, sizeof(data), data, GL_STREAM_DRAW);
glDrawArrays(GL_QUADS, 0, 4);
glDisableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);
Code (modified glutplane benchmark):
/* Copyright (c) Mark J. Kilgard, 1994. */
/* This program is freely distributable without licensing fees
and is provided without guarantee or warrantee expressed or
implied. This program is -not- in the public domain. */
#include <stdlib.h>
#include <stdio.h>
#ifndef WIN32
#include <unistd.h>
#else
#define random rand
#define srandom srand
#endif
#include <math.h>
#include <time.h>
#define GL_GLEXT_PROTOTYPES
#include <GL/glcorearb.h>
#include <GL/glut.h>
/* Some <math.h> files do not define M_PI... */
#ifndef M_PI
#define M_PI 3.14159265
#endif
#ifndef M_PI_2
#define M_PI_2 1.57079632
#endif
GLboolean moving = GL_TRUE;
#define MAX_PLANES (1024*64)
struct {
float speed; /* zero speed means not flying */
GLfloat red, green, blue;
float theta;
float x, y, z, angle;
} planes[MAX_PLANES];
#define v3f glVertex3f /* v3f was the short IRIS GL name for
glVertex3f */
void
draw(void)
{
GLfloat red, green, blue;
int i;
glClear(GL_DEPTH_BUFFER_BIT);
/* paint black to blue smooth shaded polygon for background */
glDisable(GL_DEPTH_TEST);
glShadeModel(GL_SMOOTH);
glBegin(GL_POLYGON);
glColor3f(0.0, 0.0, 0.0);
v3f(-20, 20, -19);
v3f(20, 20, -19);
glColor3f(0.0, 0.0, 1.0);
v3f(20, -20, -19);
v3f(-20, -20, -19);
glEnd();
/* paint planes */
glEnable(GL_DEPTH_TEST);
glShadeModel(GL_FLAT);
for (i = 0; i < MAX_PLANES; i++)
if (planes[i].speed != 0.0) {
glPushMatrix();
glTranslatef(planes[i].x, planes[i].y, planes[i].z);
glRotatef(290.0, 1.0, 0.0, 0.0);
glRotatef(planes[i].angle, 0.0, 0.0, 1.0);
glScalef(1.0 / 3.0, 1.0 / 4.0, 1.0 / 4.0);
glTranslatef(0.0, -4.0, -1.5);
glBegin(GL_TRIANGLE_STRIP);
/* left wing */
v3f(-7.0, 0.0, 2.0);
v3f(-1.0, 0.0, 3.0);
glColor3f(red = planes[i].red, green = planes[i].green,
blue = planes[i].blue);
v3f(-1.0, 7.0, 3.0);
/* left side */
glColor3f(0.6 * red, 0.6 * green, 0.6 * blue);
v3f(0.0, 0.0, 0.0);
v3f(0.0, 8.0, 0.0);
/* right side */
v3f(1.0, 0.0, 3.0);
v3f(1.0, 7.0, 3.0);
/* final tip of right wing */
glColor3f(red, green, blue);
v3f(7.0, 0.0, 2.0);
glEnd();
glPopMatrix();
}
glutSwapBuffers();
}
void
tick_per_plane(int i)
{
float theta = planes[i].theta += planes[i].speed;
planes[i].z = -9 + 4 * cos(theta);
planes[i].x = 4 * sin(2 * theta);
planes[i].y = sin(theta / 3.4) * 3;
planes[i].angle = ((atan(2.0) + M_PI_2) * sin(theta) - M_PI_2) * 180 / M_PI;
if (planes[i].speed < 0.0)
planes[i].angle += 180;
}
void
add_plane(void)
{
int i;
for (i = 0; i < MAX_PLANES; i++)
if (planes[i].speed == 0) {
#define SET_COLOR(r,g,b) \
planes[i].red=r; planes[i].green=g; planes[i].blue=b;
switch (random() % 6) {
case 0:
SET_COLOR(1.0, 0.0, 0.0); /* red */
break;
case 1:
SET_COLOR(1.0, 1.0, 1.0); /* white */
break;
case 2:
SET_COLOR(0.0, 1.0, 0.0); /* green */
break;
case 3:
SET_COLOR(1.0, 0.0, 1.0); /* magenta */
break;
case 4:
SET_COLOR(1.0, 1.0, 0.0); /* yellow */
break;
case 5:
SET_COLOR(0.0, 1.0, 1.0); /* cyan */
break;
}
planes[i].speed = ((float) (random() % 20)) * 0.001 + 0.02;
if (random() & 0x1)
planes[i].speed *= -1;
planes[i].theta = ((float) (random() % 257)) * 0.1111;
tick_per_plane(i);
if (!moving)
glutPostRedisplay();
return;
}
}
void
remove_plane(void)
{
int i;
for (i = MAX_PLANES - 1; i >= 0; i--)
if (planes[i].speed != 0) {
planes[i].speed = 0;
if (!moving)
glutPostRedisplay();
return;
}
}
void
tick(void)
{
int i;
for (i = 0; i < MAX_PLANES; i++)
if (planes[i].speed != 0.0)
tick_per_plane(i);
}
static time_t g_seconds = 0;
static int g_frames = 0;
void
animate(void)
{
time_t t = time(NULL);
tick();
glutPostRedisplay();
if (g_seconds != 0 && g_seconds != t) {
printf("%d frames in %ld second(s)\n", g_frames, (long)t - (long)g_seconds);
g_frames = 0;
}
g_seconds = t;
g_frames++;
}
void
visible(int state)
{
if (state == GLUT_VISIBLE) {
if (moving)
glutIdleFunc(animate);
} else {
if (moving)
glutIdleFunc(NULL);
}
}
/* ARGSUSED1 */
void
keyboard(unsigned char ch, int x, int y)
{
switch (ch) {
case ' ':
if (!moving) {
tick();
glutPostRedisplay();
}
break;
case 27: /* ESC */
exit(0);
break;
}
}
#define ADD_PLANE 1
#define REMOVE_PLANE 2
#define MOTION_ON 3
#define MOTION_OFF 4
#define QUIT 5
void
menu(int item)
{
switch (item) {
case ADD_PLANE:
add_plane();
break;
case REMOVE_PLANE:
remove_plane();
break;
case MOTION_ON:
moving = GL_TRUE;
glutChangeToMenuEntry(3, "Motion off", MOTION_OFF);
glutIdleFunc(animate);
break;
case MOTION_OFF:
moving = GL_FALSE;
glutChangeToMenuEntry(3, "Motion", MOTION_ON);
glutIdleFunc(NULL);
break;
case QUIT:
exit(0);
break;
}
}
int
main(int argc, char *argv[])
{
int i;
glutInit(&argc, argv);
/* use multisampling if available */
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH | GLUT_MULTISAMPLE);
glutCreateWindow("glutplane");
glutDisplayFunc(draw);
glutKeyboardFunc(keyboard);
glutVisibilityFunc(visible);
glutCreateMenu(menu);
glutAddMenuEntry("Add plane", ADD_PLANE);
glutAddMenuEntry("Remove plane", REMOVE_PLANE);
glutAddMenuEntry("Motion off", MOTION_OFF);
glutAddMenuEntry("Quit", QUIT);
glutAttachMenu(GLUT_RIGHT_BUTTON);
/* setup OpenGL state */
glClearDepth(1.0);
glClearColor(0.0, 0.0, 0.0, 0.0);
glMatrixMode(GL_PROJECTION);
glFrustum(-1.0, 1.0, -1.0, 1.0, 1.0, 20);
glMatrixMode(GL_MODELVIEW);
/* add 1024 initial random planes */
srandom(getpid());
for (i = 0; i < 1024; i++) add_plane();
/* start event processing */
glutMainLoop();
return 0; /* ANSI C requires main to return int. */
}
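The FPS output in the listing above is just the bookkeeping in animate(). As a rough sketch, here is the same logic pulled out into a GL-free helper so it can be checked on its own. FpsCounter and fps_tick are hypothetical names, and the timestamp is passed in instead of coming from time(NULL).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the g_seconds / g_frames globals. */
typedef struct {
    long last_second; /* 0 means "no frame seen yet" */
    int frames;       /* frames counted since the last report */
} FpsCounter;

/* Call once per frame with the current time in whole seconds.
   Returns the number of frames counted since the previous report when the
   second has changed, or -1 when there is nothing to report yet. */
static int fps_tick(FpsCounter *c, long now)
{
    int report = -1;
    assert(c != NULL);
    if (c->last_second != 0 && c->last_second != now) {
        report = c->frames;
        c->frames = 0;
    }
    c->last_second = now;
    c->frames++;
    return report;
}
```

Note that the first report covers whatever was left of the starting second, so it will usually read low. The later ones are the numbers to compare.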
Code (basic VBO emulation of immediate mode):
#define IMM_CHUNK (1024*16)
#define VERT_ELT 12
static GLuint g_vbo = 0;
static GLenum g_mode = 0;
static int g_count = 0;
static GLfloat g_data[IMM_CHUNK * VERT_ELT];
static GLfloat g_texcoord[] = { 0.0, 0.0, 0.0, 1.0 };
static GLfloat g_color[] = { 1.0, 1.0, 1.0, 1.0 };
void
init_fake_immediate(void)
{
glGenBuffers(1, &g_vbo);
}
void
mxglBegin(GLenum mode)
{
g_mode = mode;
g_count = 0;
}
void
mxglColor3f(GLfloat r, GLfloat g, GLfloat b)
{
g_color[0] = r;
g_color[1] = g;
g_color[2] = b;
}
void
mxglVertex3f(GLfloat x, GLfloat y, GLfloat z)
{
int i;
if (g_count == IMM_CHUNK)
return;
i = g_count * VERT_ELT;
g_data[i++] = x;
g_data[i++] = y;
g_data[i++] = z;
g_data[i++] = 1.0;
g_data[i++] = g_texcoord[0];
g_data[i++] = g_texcoord[1];
g_data[i++] = g_texcoord[2];
g_data[i++] = g_texcoord[3];
g_data[i++] = g_color[0];
g_data[i++] = g_color[1];
g_data[i++] = g_color[2];
g_data[i++] = g_color[3];
g_count++;
}
void
mxglEnd(void)
{
glBindBuffer(GL_ARRAY_BUFFER, g_vbo);
glVertexPointer(4, GL_FLOAT, sizeof(GLfloat) * VERT_ELT, (char *)(sizeof(GLfloat) * 0));
glTexCoordPointer(4, GL_FLOAT, sizeof(GLfloat) * VERT_ELT, (char *)(sizeof(GLfloat) * 4));
glColorPointer(4, GL_FLOAT, sizeof(GLfloat) * VERT_ELT, (char *)(sizeof(GLfloat) * 8));
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * VERT_ELT * g_count, g_data, GL_STREAM_DRAW);
glDrawArrays(g_mode, 0, g_count);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_COLOR_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);
}
#if 1
#define glBegin mxglBegin
#define glEnd mxglEnd
#define glColor3f mxglColor3f
#define glVertex3f mxglVertex3f
#endif
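The magic numbers in the three gl*Pointer() calls above (stride of 12 floats, offsets of 0, 4 and 8 floats) fall out of the interleaved vertex layout. Here is a small sketch that spells that layout out as a struct. ImmVertex and the imm_* helpers are hypothetical names for illustration; the code above uses a flat GLfloat array instead, and plain float is assumed to match GLfloat.

```c
#include <assert.h>
#include <stddef.h>

/* One interleaved vertex: 12 floats, the same order mxglVertex3f() writes. */
typedef struct {
    float position[4]; /* x, y, z, w (w is always 1.0) */
    float texcoord[4]; /* current texture coordinate */
    float color[4];    /* current color, r, g, b, a */
} ImmVertex;

/* Stride argument for the gl*Pointer() calls. */
static size_t imm_stride(void)
{
    return sizeof(ImmVertex);
}

/* Byte offsets that get cast to a pointer in the gl*Pointer() calls. */
static size_t imm_position_offset(void) { return offsetof(ImmVertex, position); }
static size_t imm_texcoord_offset(void) { return offsetof(ImmVertex, texcoord); }
static size_t imm_color_offset(void)    { return offsetof(ImmVertex, color); }

/* Number of bytes one glBegin/glEnd pair uploads. */
static size_t imm_upload_bytes(int vertex_count)
{
    assert(vertex_count >= 0);
    return sizeof(ImmVertex) * (size_t)vertex_count;
}
```

For one paper plane (8 vertices) that works out to 8 * 12 floats per upload, which gives an idea of how tiny each buffer is.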
Code (allocating the buffer once in init_fake_immediate()):
void
init_fake_immediate(void)
{
glGenBuffers(1, &g_vbo);
glBindBuffer(GL_ARRAY_BUFFER, g_vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * VERT_ELT * IMM_CHUNK, NULL, GL_DYNAMIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
}
Code (replacement upload call in mxglEnd()):
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(GLfloat) * VERT_ELT * g_count, g_data);
Code (rotating through the buffer with g_begin):
#define IMM_CHUNK (1024*16)
#define VERT_ELT 12
static GLuint g_vbo = 0;
static GLenum g_mode = 0;
static int g_begin = 0;
static int g_count = 0;
static GLfloat g_data[IMM_CHUNK * VERT_ELT];
static GLfloat g_texcoord[] = { 0.0, 0.0, 0.0, 1.0 };
static GLfloat g_color[] = { 1.0, 1.0, 1.0, 1.0 };
void
init_fake_immediate(void)
{
glGenBuffers(1, &g_vbo);
glBindBuffer(GL_ARRAY_BUFFER, g_vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * VERT_ELT * IMM_CHUNK, NULL, GL_DYNAMIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
}
void
mxglBegin(GLenum mode)
{
g_mode = mode;
g_begin += g_count;
g_count = 0;
}
void
mxglColor3f(GLfloat r, GLfloat g, GLfloat b)
{
g_color[0] = r;
g_color[1] = g;
g_color[2] = b;
}
static void
move_to_beginning(void)
{
int i;
for (i = 0; i < g_count * VERT_ELT; i++)
g_data[i] = g_data[i + g_begin * VERT_ELT];
g_begin = 0;
}
void
mxglVertex3f(GLfloat x, GLfloat y, GLfloat z)
{
int i;
if (g_count == IMM_CHUNK)
return;
if (g_begin + g_count == IMM_CHUNK)
move_to_beginning();
i = (g_begin + g_count) * VERT_ELT;
g_data[i++] = x;
g_data[i++] = y;
g_data[i++] = z;
g_data[i++] = 1.0;
g_data[i++] = g_texcoord[0];
g_data[i++] = g_texcoord[1];
g_data[i++] = g_texcoord[2];
g_data[i++] = g_texcoord[3];
g_data[i++] = g_color[0];
g_data[i++] = g_color[1];
g_data[i++] = g_color[2];
g_data[i++] = g_color[3];
g_count++;
}
void
mxglEnd(void)
{
glBindBuffer(GL_ARRAY_BUFFER, g_vbo);
glVertexPointer(4, GL_FLOAT, sizeof(GLfloat) * VERT_ELT, (char *)(sizeof(GLfloat) * 0));
glTexCoordPointer(4, GL_FLOAT, sizeof(GLfloat) * VERT_ELT, (char *)(sizeof(GLfloat) * 4));
glColorPointer(4, GL_FLOAT, sizeof(GLfloat) * VERT_ELT, (char *)(sizeof(GLfloat) * 8));
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glBufferSubData(GL_ARRAY_BUFFER, sizeof(GLfloat) * VERT_ELT * g_begin, sizeof(GLfloat) * VERT_ELT * g_count, g_data + VERT_ELT * g_begin);
glDrawArrays(g_mode, g_begin, g_count);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_COLOR_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);
}
#if 1
#define glBegin mxglBegin
#define glEnd mxglEnd
#define glColor3f mxglColor3f
#define glVertex3f mxglVertex3f
#endif
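If the wrap-around logic above is hard to follow with all the GL calls mixed in, here is a stripped-down sketch of just the cursor handling. It is not the real code: Ring, ring_begin, ring_vertex and the byte helpers are hypothetical names, the capacity is tiny on purpose, each vertex is only 3 floats, and memmove() stands in for the copy loop in move_to_beginning().

```c
#include <assert.h>
#include <string.h>

#define RING_CAP 8 /* vertices; tiny on purpose, the post uses 1024*16 */
#define RING_ELT 3 /* floats per vertex; the post uses 12 */

typedef struct {
    int begin; /* first vertex of the current primitive */
    int count; /* vertices in the current primitive so far */
    float data[RING_CAP * RING_ELT];
} Ring;

/* Start a new primitive right after the previous one. */
static void ring_begin(Ring *r)
{
    r->begin += r->count;
    r->count = 0;
}

/* Append one vertex; returns 0 if the primitive outgrew the whole buffer. */
static int ring_vertex(Ring *r, float x, float y, float z)
{
    float *dst;
    if (r->count == RING_CAP)
        return 0;
    if (r->begin + r->count == RING_CAP) {
        /* Out of room: slide the partial primitive to the front. */
        memmove(r->data, r->data + r->begin * RING_ELT,
                sizeof(float) * RING_ELT * (size_t)r->count);
        r->begin = 0;
    }
    dst = r->data + (r->begin + r->count) * RING_ELT;
    dst[0] = x;
    dst[1] = y;
    dst[2] = z;
    r->count++;
    assert(r->begin + r->count <= RING_CAP);
    return 1;
}

/* Offset and size arguments for the glBufferSubData() call at glEnd time. */
static size_t ring_byte_offset(const Ring *r)
{
    return sizeof(float) * RING_ELT * (size_t)r->begin;
}

static size_t ring_byte_size(const Ring *r)
{
    return sizeof(float) * RING_ELT * (size_t)r->count;
}
```

The point to notice is that a primitive is never split across the end of the buffer. When it doesn't fit, the part written so far is moved to the front, so the single glBufferSubData()/glDrawArrays() pair at the end still covers one contiguous range.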
Code (orphaning the buffer on overflow):
static void
orphan_vbo(void)
{
glBindBuffer(GL_ARRAY_BUFFER, g_vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(GLfloat) * VERT_ELT * IMM_CHUNK, NULL, GL_DYNAMIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
}
void
mxglVertex3f(GLfloat x, GLfloat y, GLfloat z)
{
int i;
if (g_count == IMM_CHUNK)
return;
if (g_begin + g_count == IMM_CHUNK) {
orphan_vbo();
move_to_beginning();
}