I mostly understood.
What I don't really like about your idea is that, as far as I understand it, your method limits our options for effects a bit. What we want to add (later in development) are 3D-like effects. I am thinking of small camera pans when you change direction, and probably splatter effects (no humans) xD. The camera pans are almost impossible if we go 2D.
Can you estimate how much work it is to code a simple 3D display method with animations? Or do you know where to get such a method that we could slightly alter and then use?
Making a 3D game happen means implementing a camera, texture mapping onto polygon soups, lighting and additional effects like environment mapping or shadows, a reasonable physics system and, in your case, a keyframe animation system. That last one means implementing Bezier curves and possibly animation blending, so that the transition into a new animation happens smoothly. Then there is coding the importer... and all the stuff I forgot, of course.
It's not something that codes itself, but it's not hard at all. Many of the topics are "solved", so you just have to do research, meaning reading papers, howtos etc.
A problem is that such a codebase can get rather large and complicated, so you would have to design ahead.
And there are already engines out there that can do it all for you, you just have to read the manuals and code accordingly; see above.
As for material on animation, I recommend browsing gamedev.net, checking out Wikipedia, figuring out how Blender does it and writing prototypes.
There are various pitfalls for beginners, like rotations, optimizations, synchronization of events, designing metastructures... I can't spell everything out for you. Engine development is a rather large topic.
What I can tell you in any case is that 3D animation isn't just a matter of a chunk of code. To integrate animation into an engine, you have to design everything around it.
After all, it consumes memory, uses computing cycles, gives you events to handle, should react to events, is asynchronous in one regard and synchronous in another... Blablabla. It's an integrated system, so it's not trivial in any case.
The animation system I told you about above does things in the following way:
In a frame:
    For each animated mesh:
        Is the mesh's current animation frame a keyframe?
            Yes: Copy rotation, location, scale from the keyframe to the mesh
            No: Interpolate rotation, location, scale from the previous and next keyframe (via Bezier curves, linearly or whatever floats your boat) and apply the result to the mesh
So it's rather simple in theory, as you can see. Once you're familiar with the details, you will see it's rather simple in practice as well, at least when you've got the pitfalls covered. But it's limited and obviously flawed: for example, you can see the seams between meshes when they animate, the data overhead is rather large, and it needs a lot of interpolation, so it is expensive.
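To make the per-frame loop above a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration, not how any particular engine does it: the `Keyframe` class, the `sample` function, the quaternion layout `(w, x, y, z)`, and the choice of plain linear interpolation (normalized lerp for rotations) instead of Bezier curves.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Quat = tuple[float, float, float, float]  # assumed layout: (w, x, y, z), unit length


@dataclass(frozen=True)
class Keyframe:  # hypothetical container, one per keyed animation frame
    frame: int
    location: Vec3
    rotation: Quat
    scale: Vec3


def lerp(a, b, t):
    """Component-wise linear interpolation between two tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def nlerp(q0, q1, t):
    """Normalized lerp between unit quaternions (a cheap stand-in for slerp)."""
    if sum(a * b for a, b in zip(q0, q1)) < 0.0:
        q1 = tuple(-c for c in q1)  # take the shorter arc
    q = lerp(q0, q1, t)
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def sample(keys, frame):
    """Return (location, rotation, scale) at `frame`.

    `keys` must be sorted by frame number. Frames outside the keyed
    range are clamped to the first or last keyframe.
    """
    i = bisect_right([k.frame for k in keys], frame)
    if i == 0:
        k = keys[0]
        return k.location, k.rotation, k.scale
    if i == len(keys) or keys[i - 1].frame == frame:
        # Exactly on a keyframe (or past the last one): copy its values.
        k = keys[i - 1]
        return k.location, k.rotation, k.scale
    # Between two keyframes: interpolate from previous and next.
    prev, nxt = keys[i - 1], keys[i]
    t = (frame - prev.frame) / (nxt.frame - prev.frame)
    return (
        lerp(prev.location, nxt.location, t),
        nlerp(prev.rotation, nxt.rotation, t),
        lerp(prev.scale, nxt.scale, t),
    )


# Usage: one track per animated mesh, sampled once per rendered frame.
tracks = {
    "door": [
        Keyframe(0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
        Keyframe(10, (10.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), (2.0, 2.0, 2.0)),
    ],
}
for name, keys in tracks.items():
    location, rotation, scale = sample(keys, 5)
    print(name, location, rotation, scale)
```

Swapping `lerp` for a Bezier evaluation would only change the interpolation step. The lookup of the previous and next keyframe stays the same, which is where most of the per-frame cost mentioned above comes from.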