There are two flavours of Xbox Kinect - the v1 (top) is the original device, first released for the Xbox 360. The improved v2 version (bottom) was released both for the Xbox One and, as a development kit from Microsoft, for PC. Over time, Microsoft stopped producing the PC version and instead created an adapter to make the Xbox version compatible with any PC that has a USB 3 port.
Using either of these sensors, it's pretty easy to do motion capture with a half-decent PC and any one of a number of mo-cap applications. You can even do mo-cap directly in Unity with something like Cinema Mo Cap, which supports both types of Kinect device.
Although motion capture with a Kinect is (relatively) cheap - about £200 for the hardware and software, compared with £15,000 for a professional set-up - some of the captured motions leave a little bit to be desired, and almost always require some degree of "cleaning up" to remove rogue movements and the odd limb-twitch.
And this is where the cheap vs expensive approach really shows. Raw motion capture data is great for capturing key poses, but the problem with a lot of cheap mo-cap solutions is that they don't provide any editing tools - they simply expect you to use the motion capture data as provided.
This has two main drawbacks:
The first is that every now and again the mo-cap software misinterprets the position of a limb or extremity (usually a foot or a finger), which creates strange glitches in the animation when played back.
The second is that a massive amount of extra data is captured that isn't really necessary. Most 3D packages provide "tweening" for animating a 3D character between two poses, but BVH mo-cap files include every in-between frame as the subject moves from one pose to another.
If we had a way of capturing just the keyframes with the "important" poses in them (just before/after a major movement) we'd be able to eliminate both of these problems in one - we'd only need to keep the actual poses that add to our character movement, and in doing so, we could easily exclude any poses that contained glitches.
The problem with this approach is cost.
Mo-cap editing software is expensive.
And we're pretty cheap.
So we set about creating a simple mo-cap editor.
Now, recreating a 3D animation from mo-cap data would be cool, but there's loads of software around that does a far better job than we could do in a weekend. So we're not going to do that. Instead, we're going to use the fact that the BVH (mo-cap) data file format is pretty basic, which allows us to parse an animation file and extract only the data we're interested in.
For example, a BVH file consists of a simple skeleton description, followed by a number of rotations for each "bone" or joint in the skeleton.
As we're looking at the Brekel motion capture software, we had a look at the sample data they provide from their Microsoft Kinect capture solution. The header of their BVH file looks something like this:
And after all this junk, we start to see the actual animation data:
The motion data follows the structure of the defined skeleton exactly. So in our example, the root of the tree is the hips bone/joint. This is defined as having 6 "channels" - the X, Y, Z position of the hips in real space, and the Z, X, Y rotation applied to the hips. Every bone in the skeleton after this is positioned relative to its parent, and so ultimately relative to the "root" bone.
This means that the first six values on each line of the motion data define the position and rotation of the hips joint. The next entry in the skeleton is the left-hip joint. This is defined as having three rotational values. So the next three values in each line of the motion data (the seventh, eighth and ninth values) define the rotation of the left-hip joint. The next joint in the skeleton is the left-knee, defined with three channels, so the tenth, eleventh and twelfth values in the motion data are the Z, X, Y rotation values for the left-knee joint.
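As a rough sketch of that mapping, the snippet below walks a skeleton description and works out which values on a motion line belong to which joint. The skeleton here is a made-up, cut-down example (it is not the Brekel header), and the function name channel_layout is our own invention for illustration. It assumes the usual BVH layout of one keyword per line.

```python
# Hypothetical, cut-down skeleton for illustration only (not the Brekel header).
SAMPLE_HIERARCHY = """\
HIERARCHY
ROOT Hips
{
    OFFSET 0.0 0.0 0.0
    CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
    JOINT LeftHip
    {
        OFFSET 3.5 0.0 0.0
        CHANNELS 3 Zrotation Xrotation Yrotation
        JOINT LeftKnee
        {
            OFFSET 0.0 -17.0 0.0
            CHANNELS 3 Zrotation Xrotation Yrotation
            End Site
            {
                OFFSET 0.0 -16.0 0.0
            }
        }
    }
}
"""


def channel_layout(hierarchy_text):
    """Return [(joint_name, start_index, channel_names)] in file order.

    start_index is the zero-based position of the joint's first value
    on each motion line.
    """
    layout = []
    current_joint = None
    next_index = 0
    for raw_line in hierarchy_text.splitlines():
        tokens = raw_line.split()
        if not tokens:
            continue
        if tokens[0] in ("ROOT", "JOINT"):
            current_joint = tokens[1]
        elif tokens[0] == "CHANNELS":
            count = int(tokens[1])
            layout.append((current_joint, next_index, tokens[2:2 + count]))
            next_index += count
    return layout


layout = channel_layout(SAMPLE_HIERARCHY)
for joint, start, names in layout:
    end = start + len(names)
    # Show both one-based positions and zero-based list indices.
    print(f"{joint}: values {start + 1}-{end} (list indices {start}-{end - 1})")
```

Counting from one, the hips take values 1-6, the left hip 7-9 and the left knee 10-12; as zero-based list indices those are 0-5, 6-8 and 9-11.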
And so the motion data continues, each set of three values relating to the next bone in the skeleton. To confirm that this is true, we can take any line of motion data and apply it to the skeleton description in the BVH header. Here is the first line of motion data, applied to the skeleton:
With this information, it should be pretty trivial to create an interface (in VB6 of course!) that can read and re-create the BVH skeleton tree visually:
We've added in some tick boxes in the tree structure, to allow us to select which bones/joints we want to include (or remove, if necessary). By walking through the tree structure, we're able to re-create the BVH structure, amended with bones removed if required.
By ticking and unticking boxes in our editor, we're able to remove entire limbs and tree branches from the skeleton. In this example, we've removed the left forearm, hand and fingers, as well as the right ankle and foot.
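The tick-box idea can be sketched in a few lines. This is not our VB6 code, just a minimal Python illustration: the Joint class, the prune function and the joint names are all hypothetical. Unticking a joint drops it and everything below it in the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Joint:
    name: str
    channels: int
    children: list = field(default_factory=list)
    ticked: bool = True  # mirrors the tick box in the tree view


def prune(joint):
    """Return a copy of the tree without unticked joints or their descendants."""
    if not joint.ticked:
        return None
    kept = [c for c in (prune(child) for child in joint.children) if c is not None]
    return Joint(joint.name, joint.channels, kept, True)


def joint_names(joint):
    """List joint names in the same depth-first order a BVH file uses."""
    result = [joint.name]
    for child in joint.children:
        result.extend(joint_names(child))
    return result


# Made-up skeleton: two legs hanging off the hips.
skeleton = Joint("Hips", 6, [
    Joint("LeftHip", 3, [Joint("LeftKnee", 3, [Joint("LeftAnkle", 3)])]),
    Joint("RightHip", 3, [Joint("RightKnee", 3, [
        Joint("RightAnkle", 3, [Joint("RightFoot", 3)]),
    ])]),
])

# Untick the right ankle: the right foot below it goes too.
skeleton.children[1].children[0].children[0].ticked = False

pruned = prune(skeleton)
print(joint_names(pruned))
```

One thing to watch when writing the pruned tree back out: a joint that has lost all its children becomes a leaf, and many BVH readers expect each leaf to carry an End Site block, so it is probably worth adding one.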
Of course, removing a bone (or bones) from the skeleton means we also need to split each line of the motion data, and re-create it so it applies only to the bones selected in the interface. This is also a relatively trivial exercise, but one that needs careful attention, since removing the wrong set of three values shifts every value after it onto the wrong joint and can badly affect the rest of the animation.
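A minimal sketch of that column-dropping step, assuming we already know each joint's channel count in file order. The function names and the sample joints are hypothetical. Values are kept as text so the original number formatting survives, and a length check guards against chopping the wrong columns.

```python
def columns_to_keep(layout, kept_joints):
    """layout: [(joint_name, channel_count)] in file order.

    Return the zero-based indices of the motion values belonging to kept joints.
    """
    columns = []
    start = 0
    for name, count in layout:
        if name in kept_joints:
            columns.extend(range(start, start + count))
        start += count
    return columns


def filter_motion_line(line, columns, expected_total):
    """Rebuild one motion line using only the selected columns."""
    values = line.split()
    if len(values) != expected_total:
        # A mismatch means the skeleton and the motion data are out of step.
        raise ValueError(f"expected {expected_total} values, found {len(values)}")
    return " ".join(values[i] for i in columns)


# Made-up layout: hips (6 channels) plus three 3-channel joints = 15 values.
layout = [("Hips", 6), ("LeftHip", 3), ("LeftKnee", 3), ("RightHip", 3)]
total = sum(count for _, count in layout)

columns = columns_to_keep(layout, {"Hips", "RightHip"})
line = " ".join(str(n) for n in range(total))  # "0 1 2 ... 14"
filtered = filter_motion_line(line, columns, total)
print(filtered)
```

With the left hip and left knee unticked, the six values in the middle of the line are dropped and the right-hip values close up behind the hips.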
Lastly, we added a list of "keyframes" we want to capture to our editor. So instead of recreating the entire BVH file as captured, we can take only the frames we're interested in from the BVH data. This is done by simply reading through the original BVH motion data: if the frame number (the line number, counted from where the motion data starts) matches any one of the keyframes required, the motion data is parsed and re-applied to our modified skeleton, before being written to a second BVH file.
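The frame-picking part could look something like the sketch below. It assumes the common BVH layout where MOTION is followed by a "Frames:" line, a "Frame Time:" line and then one line per frame, and it assumes keyframes are numbered from 1 (some viewers count from 0, so check yours). The function name and sample data are made up, and the skeleton is left untouched here to keep the example short.

```python
def extract_keyframes(bvh_text, keyframes):
    """Return BVH text containing only the requested frames (numbered from 1)."""
    lines = bvh_text.splitlines()
    motion_at = next(i for i, line in enumerate(lines) if line.strip() == "MOTION")
    if not lines[motion_at + 1].strip().startswith("Frames:"):
        raise ValueError("expected a 'Frames:' line directly after MOTION")

    header = lines[:motion_at + 1]          # skeleton plus the MOTION keyword
    frame_time_line = lines[motion_at + 2]  # kept exactly as captured
    frames = [line for line in lines[motion_at + 3:] if line.strip()]

    wanted = sorted(set(keyframes))
    kept = [frames[n - 1] for n in wanted if 1 <= n <= len(frames)]

    # The frame count has to match the number of lines we actually write.
    output = header + [f"Frames: {len(kept)}", frame_time_line] + kept
    return "\n".join(output) + "\n"


# Tiny made-up file: one joint, five frames.
sample = (
    "HIERARCHY\nROOT Hips\n{\n  OFFSET 0 0 0\n"
    "  CHANNELS 3 Zrotation Xrotation Yrotation\n"
    "  End Site\n  {\n    OFFSET 0 1 0\n  }\n}\n"
    "MOTION\nFrames: 5\nFrame Time: 0.0333333\n"
    "0 0 0\n1 1 1\n2 2 2\n3 3 3\n4 4 4\n"
)

result = extract_keyframes(sample, [1, 4, 5])
print(result)
```

Note that the "Frames:" count is rewritten to match the number of frames kept, otherwise a viewer may complain or read past the end of the data.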
By using this approach, we've been able to create motion capture files with only the bones/joints we're interested in (for example, removing the lower limbs from an animation where only the upper torso is required) and with only the key poses we need to recreate the animation in our Unity game. This results in much smaller files, with only a tiny amount of data in them (compared to the original animation files).
Of course, a BVH file containing only the key poses we're interested in won't play back at the correct speed (since we've dropped about 98% of the captured data). But once the key poses are imported into Unity, we can stretch the animation out, using the original capture to work out how many frames to leave between each key pose. And we still need to load the original data into some kind of BVH playback application (such as BVHViewer) in order to see the animation being played out on a 3D character. But using a BVH-viewing app allows us to select a frame with a pose we want to keep and then simply enter this frame number into our VB editor app - it's a bit clunky, but it works. And, as much as anything, it doesn't cost a penny!
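For the stretching step, the spacing between key poses can be worked out from the original frame numbers and the "Frame Time" value in the original file. A small sketch, with a hypothetical function name and made-up frame numbers:

```python
def key_pose_timing(keyframes, frame_time):
    """Return [(frame, frames_since_previous_key, seconds_from_first_key)]."""
    ordered = sorted(set(keyframes))
    timing = []
    previous = ordered[0]
    for frame in ordered:
        gap = frame - previous
        seconds = (frame - ordered[0]) * frame_time
        timing.append((frame, gap, seconds))
        previous = frame
    return timing


# Made-up example: three key poses from a capture running at 30 frames a second.
timing = key_pose_timing([1, 31, 91], 1 / 30)
for frame, gap, seconds in timing:
    print(f"frame {frame}: {gap} frames after the previous key, {seconds:.2f}s in")
```

The gap column is the number of frames to leave between each pair of key poses when stretching the imported animation back out.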