I didn't think of this when I replied, but I think combining textures into a super-texture is a common trick exactly to avoid texture changes. It is of course especially useful for those super-lo-res textures in a minecraft-like game (assuming you use similar 8x8 or 16x16 textures like minecraft itself).
The only thing you'd have to watch out for is mipmap bleeding, but with 8x8 textures you probably don't use mipmaps (scaled-down versions of the texture to use for polygons in the distance).
I have honestly never tried mipmapping, sounds like a cool feature. But again as you stated for theese low res textures, I also thought it wouldn't matter. The textures I use are a total ripoff from minecraft, and they are 16x16.
Not only for totally surrounded cubes. Remember that each individual cube face can be skipped if there is an adjacent cube for that face, even if the other faces are visible.
As a simple trick, you might store a bit field with 6 bits (using type "byte") for each cube, where each bit tells whether the corresponding face is visible. This bit is pre-computed from whether there is an adjacent cube in the corresponding direction. You'll have to update these bits when placing and removing cubes. When rendering the cube, you just check the bits and skip faces accordingly. This should speed things up (and if it doesn't, you have another indication towards where the performance bottleneck is -- probably loading cube data from RAM).
This is, however, a good starter. Move placing/deleting cubes into a separate method that also updates the bits, so you have the complex work at one place in the code. When testing the bits for each cube becomes too slow, you'll have to move to more complicated data structures, and the updates to this structure required to place or remove cubes become more complicated too, so it's good to have them in one place in the code.
More complex data structures will then allow you to skip multiple cubes in a single test. As an example, you might choose super-cubes of 16^3 or 32^3 (should be at least as large as the VBOs to simplify things). For each super-cube, store the number of filled cubes at startup, and update it when placing or removing cubes. When rendering, if the number is 0, skip the whole super-cube. If the number is equal to the size of the super-cube, at least you'll only have to render the outer cubes (and check face bits as before). You'll be able to skip all the empty air in a few steps.
My vbo's currently store 16x16x16 cubes. roughly 4096 in total, each cube has 6 faces, each face has 4 vertices (obviously :p). They also have 6*4 texture coords, and 6 normal coords.
So the amount of data would be:
Vertices
3*4*6 = 72
Tex coords
2*4*6 = 48
Normals
3*6 = 18
Summing up 72+48+18 = 138
each float is four bytes. making 138 * 4 = 552 bytes from each cube. 2260992bytes for each vbo. Sounds like alot ;S
Do you have any tips for reducing this massive amount of data?
And to respond to your idea about the facing bits, I love it. That way I can easily determine whether the face should be drawn or not.
Edit:
Added a method to remove the faces that you don't see, but that is only for each chunk.

In this image you can clearly see the boundaries of each chunk. Due to the fact that the blocks that is facing another chunk has it's outer sides rendered, and as you can see when the two faces clashes it makes a border :p
Well, it's an easy task to deal with, to remove thoose faces too. But now I am only rendering necessary faces, and I can render alot of blocks. Though I have to test whether TONS AND TONS of blocks will lagg.
They shouldn't lagg since they are not processed nor are they rendered.
Another Edit:
I am still rendering blocks that are behind me, which is not necessary, and I am not totally sure on howto estimate the field of vision. If I could interprete the field of vision I could easily remove blocks that is not visible.