On Monday, February 7th, we’ll talk about motion capture as a lead-in to working on Synthesis-by-Example methods. As I’ve mentioned on the Mocap Readings Page, I don’t have good readings for this. So instead, I’m going to ask you to read some less good ones.
For class, you need to read 2 things: a “philosophy” paper by me to get an idea of what the issues are, and a survey paper (by someone else) to get an idea of what the solutions are. You do not need to get all the details (yet) – just to have the big picture.
- More Motion Capture in Games: Can We Make Example-Based Approaches Scale? by Michael Gleicher, Motion in Games, 2008. http://graphics.cs.wisc.edu/Papers/2008/Gle08/ (pdf here)
- VAN WELBERGEN H., VAN BASTEN B. J. H., EGGES A., RUTTKAY Z. M., OVERMARS M. H.: Real time animation of virtual humans: A trade-off between naturalness and control. Computer Graphics Forum 29, 8 (2010), 2530–2554. http://people.cs.uu.nl/basten/publication/CGF2010.pdf (local version)
On the Mocap Readings Page, you can see others in each category. I recommend (but do not require) that you read the draft book chapter Animation from Observation to get a sense of some of the challenges in capture.
Before class on Monday, please read 2 (or more) things (they are light reading and should be quick). In a comment to this post, please discuss the following:
- What do you see as the core challenges that we’ll need to learn methods for in order to use motion data for animation? How different is it for different applications (e.g. film, games, medical analysis)?
- What are you most curious about?
- If you read one of my old things: if I were to re-write it today, what might be different? What is still a problem after all these years, and what is not? (I think I actually did a good job at predicting what was fundamental and what wasn’t – given that in 2000, motion graphs etc. hadn’t been invented.)
{ 13 comments }
Almost all the challenges I perceive are data-centric, and most of them are probably extremely similar…
1. High specificity of data is required for most methods of working with motion capture, but you can’t sample everything.
2. The less specific your data is to your purpose the more manual work you have to do.
3. Getting data is expensive (in terms of equipment, setup, effort) and prone to error (obscured markers etc), and transference onto targets is not a trivial problem.
4. Editing motion may have physically unnatural results.
5. These problems will not be solved until we can not just capture, but describe and abstract upon the essence of a motion instead of working with specific data of a particular movement.
Having laid out these problems, I think the most interesting direction would be to explore ways of abstracting the essence of motion. Even after all these years, we still note in class that getting data is hard. Some corporations or entities may have great capture setups in addition to large repositories of data, but we don’t have access to those. I think methods of synthesizing motion would be valuable.
The core challenges we would need to handle include:
The ability to create new motion based on existing examples. This is important because we don’t have the resources to get motion capture data for every movement we might need. Also, as pointed out in the papers, we would run into a scalability problem, and as we introduce more parameters to fine-tune the motion, dealing with a high number of dimensions will also become a problem.
Some methods such as splicing will help extract and combine parts from the existing examples. Interactive/real time animations are especially difficult and would require a method to find matching motions quickly, and to generate motion graphs.
I am curious about interactive/real-time animation using SBE methods.
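The motion-graph idea mentioned above can be sketched briefly. This is a minimal illustration rather than any particular published algorithm: poses are treated as flattened joint-angle vectors, and a transition edge is added wherever two frames are close enough to cross-fade between.

```python
import numpy as np

def pose_distance(a, b):
    """Distance between two poses, taken as flattened joint-angle vectors."""
    return np.linalg.norm(a - b)

def find_transitions(clip_a, clip_b, threshold):
    """Find (i, j) frame pairs where clip_a frame i could jump to clip_b frame j.

    clip_a, clip_b: arrays of shape (frames, dofs).
    A real motion graph would also compare joint velocities and use a
    point-cloud pose metric; raw joint-angle distance is the simplest stand-in.
    """
    edges = []
    for i, pa in enumerate(clip_a):
        for j, pb in enumerate(clip_b):
            if pose_distance(pa, pb) < threshold:
                edges.append((i, j))
    return edges
```

The resulting edges, together with each clip’s own frame-to-frame links, form a graph that can be walked at runtime to string clips together.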
As I watch the Super Bowl and read about SBE I have to think about all of the sports games that are out there. Whenever I see the newest football game (or basketball for that matter) I always notice that the motion just doesn’t look right. Now I don’t know whether this is SBE, but I would say that it looks like they must be doing something similar to get the motion to at least look fluid. I guess this is probably the scalability problem. They just don’t have enough processing power to dedicate to animation to handle the number of examples needed to make it look right.
The number of methods for performing animation from motion capture and simulation methods is pretty daunting. I’m interested to find out where in the film industry each of these methods is used.
One thing that I remember some of the really smart Quants at Stark Investments would say is that you never want to use data mining to make investment decisions. I know motion capture is a perfectly legitimate method for creating animation, but it feels like a lot of control is given up and that there are probably a lot of things that motion capture won’t allow you to do. Just my feeling, I’m interested to see if I learn differently over the course of this project.
1. For games, the biggest problem with motion capture seems to be the number of motions required to interact with the world. Since a game is essentially a non-deterministic simulation of reality, you cannot know beforehand all possible movements a character will need to make. Since motion capture relies on specific actions being recorded, the method falls short when a character needs to do something it has no example for. In this case, devising a method that can extend existing examples to new situations would help immensely.
For things like film, the actions are known beforehand, but generally the quality of the animation needs to be far superior to what is generally considered acceptable in a game. In this case the primary problem is recording examples that accurately depict what the director would like the motion to be – a matter of control. Perfecting it is a time-consuming process reliant on the actors’ ability to carry out the director’s instructions. Automating this process without sacrificing the realism/naturalness of the motion would be desirable.
2. I’m curious how well motion capture data has been used to drive a model-based approach to animation – would it generalize well to new actions while retaining the style directed in the capture of the examples?
The main challenges seem to be finding a close match for the motion you want, and altering the original data to fit your parameters. Scaling the data set doesn’t actually help either of these. You can get a closer match if you have more motion data, but you still have to be able to describe your desired motion well enough to find the right point. Regardless of the size of your sample set, a good blending algorithm is hard to do.
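To make the blending point concrete, here is a minimal cross-fade between two clips – a sketch under simple assumptions, not any specific published blend. It assumes the clips are already time-aligned arrays of joint angles and fades with a smoothstep weight; real systems must also align root position and orientation first.

```python
import numpy as np

def blend(clip_a, clip_b, n_blend):
    """Concatenate clip_a and clip_b, cross-fading over n_blend frames.

    clip_a, clip_b: arrays of shape (frames, dofs) of joint angles.
    Uses a smoothstep weight so the blend eases in and out.
    """
    t = np.linspace(0.0, 1.0, n_blend)
    w = 3 * t**2 - 2 * t**3              # smoothstep: 0 -> 1
    overlap = (1 - w)[:, None] * clip_a[-n_blend:] + w[:, None] * clip_b[:n_blend]
    return np.concatenate([clip_a[:-n_blend], overlap, clip_b[n_blend:]])
```

Even this toy version shows why blending is hard: a per-channel numeric fade knows nothing about foot contacts or balance, which is exactly where naive blends look wrong.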
One thing I noticed is that procedural techniques offer much better control, but are sometimes avoided because they look unnatural. It seems that an alternative to using motion data would be to create more realistic procedural algorithms. In other words, we currently have two extremes: procedural offers high control but low realism, and motion capture offers high realism but low control. Most of the techniques are starting at the motion capture side and trying to improve the control issue, but few are starting at the procedural side and working on realism. I also get the feeling that problems on the motion capture end are usually overconstrained (several motion primitives conflict with each other and with physicality constraints) while problems on the procedural side are underconstrained (IK being the best example). Finding some sort of “realism” optimization for a procedural animation seems simpler than figuring out which parts of a motion primitive to ignore.
What surprises me is how little progress has been made in the realm of real time human motion. It seems to be very challenging to put all of the pieces in place: implementing a wide variety of motions, blending those motion primitives together, allowing improvisation, and doing all of those things in real time and in a way that looks natural. Other than motion in cut scenes (which doesn’t count, both because it falls so far towards the control side of the Van Welbergen et al. criteria, and because motion in cut scenes has been pretty good for a while now), I haven’t really been wowed by anything in video games for a while now.
The motion system in Spore is what seems most promising to me: it tackles the issues by doing everything procedurally (well, not really, but that’s the claim). Monsters with 12 legs will move reasonably well, ditto with 8 legged creatures or creatures with caterpillar legs. Of course it is very easy to play around with this system and break it, and all of the motions generated are exaggerated and unnatural (somewhat intentionally so). But I think the fact that it’s possible to set up an a posteriori real time motion scheme is promising.
The Gleicher paper I think makes a convincing argument that advances will need to be revolutionary, not evolutionary. We’re pretty close to surmounting the uncanny valley when it comes to human-looking characters, but we’re still a long ways off from human-moving characters, except in limited contexts. We’ve gotten a little bit better at making “cartoony” human motion: n.b. the hand-tweaked mo-cap canned motions in games like Resident Evil 5: in its intended context the motion doesn’t look “realistic,” but it is compelling and fits the setting (“realistic” in a Walt Disney way, maybe). But much like Spore, it’s easy to break the system by choosing odd angles and other parameters to begin the canned motions. By erring on the side of realism and control, they prevented spontaneity, the flipside to Spore’s design decisions.
I wonder how decisions about where to fall in this vast continuum of motion choices get made. We have an idea of how game designers skimp on things like graphics or storytelling or voice acting, but how do designers skimp on motion? How can people do it well when the skill set needed to implement a motion system can drastically change based on what compromises are made? How much compartmentalization and movement occurs between the keyframe+mo-cap animators working on cutscenes and animated films, and the programmers in the trenches making motion for gameplay?
1) I think that capturing the richness of human motion will be the most difficult core challenge to learn methods for. Human motion is so subjective and complex that it can sometimes be tough to describe qualitatively; trying to do this quantitatively will be even more challenging. Is there a way to generalize the feeling or style of a movement such that the data can be used to generate new motion primitives that convey the same feeling or style? We need to learn methods to capture and understand these feelings/styles in order to automate genuine, unique animations (as opposed to bland, general movements). Quantitative understanding is most needed in games, for the sake of generating/blending fluid movements that reflect a style/feeling on the fly; without it, motions that do not fit the style could be chosen, resulting in unnatural movements. Film may not need as deep an understanding, since it doesn’t have the real-time constraint (and can be adjusted offline).
2) At this point, I’m most interested in what I mentioned above, real-time generation/blending of motions that reflect a certain style. If we are to have believable, realistic characters in games (my primary interest), then we need to have their motions communicate a dynamic state of mind.
3) In the intro to your Mocap book, you say that video processing technology is a long way off from being able to determine the movement of someone in clothing. The authors of the paper “Video-based Reconstruction of Animatable Human Characters” present methods that make this appear feasible. Although, if I remember correctly, it took six hours to process one video sequence… Thus this is still a problem, but perhaps its solution is not as far off as once thought.
One of the main challenges seems to be combining motion data with interactions with other objects. Motion data seems to be harder to use when animating something that is hitting or otherwise touching something else in the scene. Another challenge is making smooth transitions between different sets of motion data so that the result looks fluid. Additionally, the motion data needs to be simplified enough that it can be applied to more than one model and isn’t too complicated to use.
I am curious to learn more about the different ways that motion data is combined, and how several different sources are sometimes used to make a smooth transition. Another interesting issue seems to be the splicing together of different upper- and lower-body motions at the same time.
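The upper/lower splicing idea can be illustrated with a toy example. The joint layout here is hypothetical (DOFs 0–3 lower body, 4–7 upper body), and real splicing must also keep the two halves correlated – arm swing phase-locked to the gait, for instance – whereas this sketch just copies channels.

```python
import numpy as np

# Hypothetical joint layout: DOFs 0-3 are lower body, 4-7 are upper body.
LOWER = slice(0, 4)
UPPER = slice(4, 8)

def splice(lower_clip, upper_clip):
    """Build a new motion taking lower-body DOFs from one clip and
    upper-body DOFs from another.

    lower_clip, upper_clip: arrays of shape (frames, 8) of joint angles.
    The output is truncated to the shorter clip.
    """
    n = min(len(lower_clip), len(upper_clip))
    out = np.empty((n, 8))
    out[:, LOWER] = lower_clip[:n, LOWER]
    out[:, UPPER] = upper_clip[:n, UPPER]
    return out
```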
After doing the reading, I believe that the core challenges have to do with how we can adapt the data for the intended use. This sounds vague, so let me expand:
In film, a big problem is how to transfer the motion of the actors to the characters that will ultimately be displayed. If the characters do not have the same proportions as the actor, how do we deal with that? Even if the proportions match, there is no way we can hope to capture all of the subtle details the human body produces, so will missing these be a problem?
In games the motions can’t be planned ahead of time, and therefore they need to be generated on the fly. If we restrict what the character can do, perhaps we can construct (both manually and automatically) sets of motions that will be enough for the purposes of the game. However, this tends to make things look repetitive, or even unnatural. As stated in one of the readings, it is not practical to just include huge libraries of examples either, so how do we deal with that?
In medical analysis, the fidelity of the data is extremely important. However, every type of sensing carries with it a certain degree of uncertainty. How can we make sure that this uncertainty will not lead to wrong medical conclusions?
With all of these problems, however, I think that what most interests me is something in between the game and film examples. How can we use synthesis by example to make it possible for a novice (aka me) to create professional-grade animations? Is that goal even possible?
The main challenge for motion capture data is that the number of data sets available to us is limited, yet we try to simulate an infinite number of moves from combinations of these few sets. However, the more data we have, the more computationally complicated it becomes to ‘decide’ which motion fits best with the current set (i.e. for operations such as blending and concatenation). This is especially true for games: the virtual character is human-controlled, and given the combination of keys available on the controller and the environment the character is in, many moves are possible. Motion (the visual) is not the only aspect of a game either – this unpredictability of motion during real-time gameplay makes it difficult to produce proper lip-sync for the characters (the Realistic Crowd Sim paper I read makes a study of this).

Motion capture is also highly specific. If someone needed a motion capture of an elephant today, I doubt existing techniques would allow it. What we see more often is a human version of the animal (in movies like ‘Happy Feet’ and ‘Ratatouille’).

Coming to films, it is easier to ‘handle’ motion capture data, as the real-time playback factor no longer applies. Moves can be choreographed and tailored to the script. But think of the end scene in ‘Titanic’, where many non-real characters jump off the ship into the ocean – even though we could spend months or years carefully planning how each one of them moves and then use motion capture, it makes more sense to capture just 5–6 of them and mix and match to create hundreds of CG passengers.
I was wondering how motion capture data is actually used in an animated film. What about the face of a character? Is that motion captured as well or animated – how to decide that?
1. The central challenge in using motion data to drive animation involves figuring out a way to use an existing database of motions to derive the precise motion we desire, either through blending, concatenation, layering, or some other transformation. The fundamental difficulty is that no single existing motion primitive can give us exactly what we want, even if that primitive was obtained during a motion capture session with the specific motion goal in mind. We probably do not need all of the details present in the richness of human motion to animate a simple character; we only want the “essence” of the motion. Different applications differ in their necessity for control vs naturalness, both of which are exhaustively explained in the second paper. For example, film usually places an emphasis on naturalness over control (much to the displeasure of the actual animators involved, I’m sure), hence they make much more use of motion capture techniques than procedural or physical simulations. Games are just the opposite, in which a great degree of control is required in order to make compelling interactivity possible.
2. I am very curious about the specifics involved in parameterization, defined in the second paper as the method of converting intuitive control parameters into the animation parameters defined by whichever animation technique is currently being used. It seems to me to be a very hard problem, yet very important, to be able to present the actual animator with controls that are both intuitive and as orthogonal as possible.
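As a toy version of that parameterization step, one can map an intuitive control value (say, a desired walking speed) to blend weights over example clips with inverse-distance weighting. This is the simplest scattered-data interpolation, offered as a sketch – not what any particular system in the survey actually uses.

```python
import numpy as np

def blend_weights(example_params, query, eps=1e-8):
    """Map an intuitive control parameter to blend weights over examples.

    example_params: the parameter value of each example clip, e.g. the
    walking speed measured in each capture (1-D array of scalars).
    query: the requested parameter value.
    Returns weights summing to 1; examples closer to the query in
    parameter space get more weight.
    """
    d = np.abs(np.asarray(example_params, dtype=float) - query)
    w = 1.0 / (d + eps)   # eps avoids division by zero at an exact match
    return w / w.sum()
```

The weights would then drive a multi-way blend of the example clips; the hard part the paper describes is making such parameters intuitive and as orthogonal as possible.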
3. I read the draft chapter of Animation from Observation, and it seemed to me to be a very good introduction to the large fundamental issues at hand, all of which seem to still be quite relevant. One interesting note you made was on how there is quite a tension between traditional animators and motion capture technicians and users, stemming from unrealistic expectations of what motion capture can achieve and animators having a difficult time working with the data produced by motion capture. I’m curious as to what extent this tension still exists today?
After reading these papers, the primary challenge that seems to arise is finding a balance between the natural fluidity of motion capture data and the precise control provided by algorithmic methods. While motion capture data may provide a greater sense of realism in fields like film and medicine, where the motion can all be orchestrated ahead of time, scalability issues become huge in games, where motions must be generated on demand.
I am very curious about how specifically to handle the scalability issues presented by the data. My initial sense from reading these papers is that the scalability issue is hard and we don’t currently have a ‘killer’ way of tackling the issue. Not only that, but finding the key trick for rapidly generating natural motion from both mocap databases and algorithmic procedures could provide insight into doing rapid calculation and approximation (i.e. ‘faking’ what is natural) for other computational issues as well.
I felt that the challenges discussed in the 2000 paper generally echo the Eurographics paper, with the exception of the solutions for providing more control over motion capture data via motion graphs and similar techniques. This is surprising out of context, given how much work has gone into understanding how to effectively animate the human form. However, like most of the elements cited in the Catmull paper as essential to the development of animation as a field, it remains an open question: human motion is very difficult to model, and we are so used to seeing real human motion in everyday life that it is perceptually very hard to fake.
I apologize for the lateness of this post. I saw the big posting of papers last Friday, but somehow I missed the actual assignment for this Sunday. I must have glanced over it in my RSS, but I have no idea why.
——————————-
The use of motion capture data for computer animation will depend on solving problems in two key areas: producing a convincing single motion and producing a convincing transition between motions. In terms of motion capture, these two tasks seem to be somewhat separate.

The first task is primarily concerned with the raw motion capture data and cleaning it for use. Artifacts in the data need to be eliminated and the motion smoothed so it looks fluid and natural. It also deals with the concept of synthesis, where many similar motions can be used to produce a single blend that meets some set of requirements – be they emotional content or mechanics of limb placement.
The second task is important, as a single motion is rarely sufficient for any purpose. A character’s actions consist of many motions, some sequential and others concurrent. While motion capture sessions could be planned to demonstrate all the motions the character needs, rarely can every action be planned in such detail ahead of time. Techniques are required to merge multiple motion capture clips together and to splice them sequentially, all while maintaining realism. This requires maintaining constraints and physical laws such that any transition, while not performed explicitly by the actor, appears as though it could have been.
In many ways, these two tasks overlap. It is difficult to get good blending if the starting motions are poorly conditioned for the scene. Thus motions need to be well constructed, with prior knowledge of the needed transitions, for the entire action to come off successfully. However, depending on the final purpose, the ‘weight’ of these two tasks may differ. For interactive character control, as in a game, very good sequences and transitions are important, because the player will often switch between motions while playing. If the transitions are poor, they will be obvious to a viewer who sees them done repeatedly. On the other hand, other areas may be less concerned with transition quality and more focused on the quality of the individual motions. Medical analysis and physical therapy probably care more about the action of rotating a single damaged joint than about blending that motion with another.
Within this area, I am fairly curious about the work Perlin did with noise-based animation. On one hand, the demos look strangely appealing – probably due to the natural-seeming randomness of the actions. No motion looks quite like any other, which is characteristic of a biological system moving. On the other hand, I wonder if the technique could be improved by combining it with motion capture data. As with many other applications of noise, perhaps it would reach its peak if used as surface decoration rather than the meat of the animation. This may be a viable alternative to using large databases of motion to construct motion blends. A limited set of simple motions could be enhanced with noise to create the effect an animator is looking for.
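The “noise as surface decoration” idea could be sketched like this. The noise function here is cheap value noise rather than true Perlin noise, and the amplitude and period values are arbitrary placeholders:

```python
import numpy as np

def value_noise(n, period, amplitude, rng):
    """Cheap smooth 1-D noise: random values at coarse knots,
    smoothstep-interpolated between them (a stand-in for Perlin noise)."""
    knots = rng.uniform(-amplitude, amplitude, n // period + 2)
    t = np.arange(n) / period
    i = t.astype(int)
    f = t - i
    s = 3 * f**2 - 2 * f**3
    return (1 - s) * knots[i] + s * knots[i + 1]

def decorate(clip, amplitude=0.02, period=15, seed=0):
    """Layer smooth low-amplitude noise on top of a base motion so no two
    playbacks look identical. clip: (frames, dofs) joint angles."""
    rng = np.random.default_rng(seed)
    out = np.array(clip, dtype=float)
    for j in range(out.shape[1]):
        out[:, j] += value_noise(len(out), period, amplitude, rng)
    return out
```

Keeping the amplitude small relative to the base motion is what makes this decoration rather than the meat of the animation: the captured clip still carries the structure, and the noise only breaks up exact repetition.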
———————————————
Also, I read this paper last semester when I implemented a Cyclic Coordinate Descent method of IK, and I don’t think it’s in the IK list. I don’t know if you have seen it, but I thought it was a useful overview of techniques. It’s a bit old, but I don’t think that matters too much.
Welman, C. “Inverse kinematics and geometric constraints for articulated figure manipulation.” Simon Fraser University, 1993.
http://lib-ir.lib.sfu.ca/bitstream/1892/7119/1/b15233406.pdf
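For reference, cyclic coordinate descent itself is short enough to sketch for a planar chain. This is a generic textbook version of the technique the Welman report surveys, not his exact formulation: each sweep rotates one joint at a time so the end effector swings toward the target.

```python
import numpy as np

def ccd_ik(lengths, angles, target, iters=100, tol=1e-4):
    """Cyclic Coordinate Descent IK for a planar joint chain.

    lengths: link lengths; angles: initial relative joint angles (radians);
    target: (x, y) goal for the end effector. Returns updated angles.
    """
    angles = np.array(angles, dtype=float)
    target = np.array(target, dtype=float)

    def joint_positions(a):
        pts = [np.zeros(2)]
        theta = 0.0
        for length, ang in zip(lengths, a):
            theta += ang
            pts.append(pts[-1] + length * np.array([np.cos(theta), np.sin(theta)]))
        return pts

    for _ in range(iters):
        if np.linalg.norm(joint_positions(angles)[-1] - target) < tol:
            break
        # Sweep from the end effector back to the base; rotate each joint
        # so the end effector lines up with the direction to the target.
        for i in reversed(range(len(angles))):
            pts = joint_positions(angles)
            to_end = pts[-1] - pts[i]
            to_tgt = target - pts[i]
            angles[i] += (np.arctan2(to_tgt[1], to_tgt[0])
                          - np.arctan2(to_end[1], to_end[0]))
    return angles
```

CCD’s appeal for interactive use is that each step is a closed-form rotation of one joint, so it needs no Jacobian and converges quickly for short chains.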