Use this page to comment on papers from Group 2 from Reading 9 and 10.
Reading 9-2 – Comments on Group 2
Previous post: Reading 9-1 – Comments on Group One
Next post: Readings 9-3 – Comments on Group 3 Papers
CS777 Computer Animation Spring 2011
Archive of 2011 Computer Animation Course Web
{ 21 comments }
– Near-optimal Character Animation with Continuous Control
This paper develops a kinematic controller that blends pre-captured motion clips so that a character follows user-driven motion input in an environment with (or without) obstacles. The authors use a ‘low dimensional reinforcement learning’ method to handle high-dimensional, complicated human motion. The first stage is a motion model that stitches short clips together while respecting constraints (one per frame, so footskate on the constrained foot is avoided). The second stage is a controller, which acts like a feedback mechanism and decides which sequence of clips best satisfies some user-defined goals.
The state space of the controller is defined using a number of parameters, e.g. the position/orientation/speed of the character and obstacles, and the user-desired gait, orientation, etc. Each of these parameters is associated with a cost, and the goals are defined in terms of these costs. For example, deviations from the user-defined path and proximity to obstacles are penalized with high costs. An interesting point about the state space variables is that all of them can be controlled independently, which allows motions such as walking backwards/sideways with a user-defined torso orientation.
To solve this problem of achieving the goal, the authors have taken an approach based on linear programming. They define a value function that measures long term state costs and try to approximate this value using a set of basis functions (the concept is still fuzzy to me). Thus at run time the system chooses the minimum value transition satisfying the user goal and then updates the state space parameters accordingly for the next transition.
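Since the basis-function idea is the fuzzy part, here is a minimal sketch of how the run-time step could look. It is not the authors' implementation: the basis functions, the toy one-dimensional state (a heading error), and the names `basis`, `approx_value`, and `choose_clip` are all assumptions made for illustration, and the weights are simply given here rather than fitted by linear programming as in the paper.

```python
import numpy as np

def basis(state):
    """Hypothetical basis functions: constant, linear and quadratic terms."""
    s = np.asarray(state, dtype=float)
    return np.concatenate(([1.0], s, s ** 2))

def approx_value(state, weights):
    """Approximate the long-term cost as a weighted sum of basis functions."""
    return float(basis(state) @ weights)

def choose_clip(state, clips, weights, cost, step, discount=0.9):
    """Pick the clip minimising immediate cost plus discounted future value."""
    def score(clip):
        next_state = step(state, clip)
        return cost(state, clip) + discount * approx_value(next_state, weights)
    return min(clips, key=score)

# Toy problem (assumed): the state is a heading error in radians and
# each "clip" turns the character by a fixed angle.
clips = [-0.5, 0.0, 0.5]
step = lambda s, c: np.array([s[0] + c])
cost = lambda s, c: abs(s[0])
weights = np.array([0.0, 0.0, 1.0])  # value(s) is approximated by s^2

best = choose_clip(np.array([0.4]), clips, weights, cost, step)
```

In this toy setup the controller picks the clip that turns the character back toward zero heading error, because that transition leads to the state with the lowest approximated long-term cost.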
Interactive Generation of Human Animation with Deformable Motion Models
The key idea here is that the authors apply statistical analysis techniques to a set of precaptured human motion data and construct a low-dimensional deformable motion model of the form x = M(alpha, gamma). Alpha and gamma are the deformable parameters, referring to geometric and timing variations respectively. They formulate the constraint-based motion synthesis problem in a maximum a posteriori (MAP) framework, estimating the most likely deformable parameters from the user’s input. The system will pick “natural-looking” motions that best match user constraints.
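For readers unfamiliar with MAP, the general form of the estimate can be written out as below. The symbol c for the user's constraints is my own notation, not necessarily the paper's; the rest is just Bayes' rule applied to the deformable parameters.

```latex
\begin{aligned}
(\hat{\alpha}, \hat{\gamma})
  &= \arg\max_{\alpha,\gamma} \; p(\alpha, \gamma \mid c) \\
  &= \arg\max_{\alpha,\gamma} \; \frac{p(c \mid \alpha, \gamma)\, p(\alpha, \gamma)}{p(c)} \\
  &= \arg\min_{\alpha,\gamma} \; \bigl[ -\ln p(c \mid \alpha, \gamma) - \ln p(\alpha, \gamma) \bigr]
\end{aligned}
```

The likelihood term rewards parameters that match the user's input, and the prior term (learned from the motion data) rewards parameters that look like the captured examples, which is where the "natural-looking" part comes from.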
There are two ways to interact with the system. First is by direct manipulation of points on the character, timing, and high-level control knobs. Second is a sketching interface, where the user can select a point on the character and then draw a desired time trajectory in 2D screen space. This system can also be used to filter noisy motion data and remove foot sliding artifacts.
I skimmed over most of the math in this paper, since I don’t have much experience with the machine learning concepts it employs, such as Gaussian mixture models.
Interactive Generation of Human Animation with Deformable Motion Models
Quoting from the paper:
Our key idea is to apply statistical analysis techniques to a large set of annotated motion examples and construct a deformable motion model of the form x = M(α; γ) for particular human actions x, where the deformable parameters α and γ control the motion’s geometric and timing variations, respectively.
I get that they are forming a statistical model that decouples motion and time warps, using something called a maximum a posteriori (MAP) framework. I have no idea what this is…
It also seems to be important to their success that the model is annotated with extra information (I think this is where the word deformable in their title comes from, but I am not sure).
Until I understand the whole statistical modeling framework better, I can’t think of any more meaningful questions.
I am reading the Lee et al. paper.
I’m still working my way through the math (which is non-trivial since flow stuff has never been high on my list of math expertise), but the key idea seems to be translating the motion graph/move tree metaphor from a walk on a graph (or a definition of a group operator in pose space, or whatever discrete metaphor you’d like) into a traversal through a vector field.
It addresses a few of the issues with motion graphs (and motion graph-like systems) that I could imagine coming up, such as having to “throw away” a lot of computing once you add new constraints or move to a different space or any number of things that you could imagine would occur all the time in an interactive environment. The authors also mention the difficulty of combining motion graphs with physical simulations and other forces that aren’t constraints or examples per se.
The flow metaphor used in the paper I think is a good one; rather than simply choosing from a selection of possible next motions, or making hasty simulations by blending motions, we can treat nearby examples (of both pose and limb velocity) as forces attracting our figure through the space of plausible motions, trail markers instead of railroad tracks.
I’ll need to check out the website to see some results, but if they can get the realtime performance they claim even when confronting perturbations I think they’ve sold me.
Interactive Generation of Human Animation with Deformable Motion Models
The general idea of this paper is to construct a statistical model of motion from pre-recorded motion capture databases, which a user then controls via direct manipulation or a sketch interface. The interfaces generate control parameters for the model, and a statistical MAP method then produces natural motion from them.
Honestly I had a far more difficult time reading this paper than the first paper. I believe this is mainly due to my lack of knowledge in statistics. If I had not learned the basic idea of MAP in a class last semester, I am fairly sure most of the concepts in this paper would have been lost on me. As it is, I have a very rough idea of how MAP works in this instance and how they are using PCA to extract the key features of the motion data, which can then be described by a set of parameters. However, I am fairly sure I cannot answer any detailed questions about the math involved.
I think a large part of my difficulty is that I have trouble visualizing how the equations presented in the paper will eventually produce the motion examples seen in the video and the paper’s figures. However, despite my lack of mathematical understanding, it is clear that the method here depends on comprehensive motion databases. Without examples, and large variations, in each motion ‘category’, the authors note that their model will be unable to generate novel behavior. They use the example of being unable to spontaneously generate a combination head-scratching / walking motion, unless that combination is within the database.
Near-optimal Character Animation with Continuous Control
What: Produce motion controllers from a motion database
Why: Many algorithms exist for different types of motion control; this method tries to create a more general solution.
How: The first part of the algorithm is a motion model that allows transitions between any two clips. Longer motion clips are created by overlapping and blending motions. Motion controllers are generated using system states constructed to suit the task the controller is meant to carry out. This framework allows a wide variety of tasks. States and transitions are assigned costs based on how well they achieve the controller’s goal. Policies for transitioning between states are created by using basis functions to estimate the long-term costs. The number of basis functions needed is reduced by taking advantage of the fact that certain variables don’t change between motion clips.
I can’t say I’ve really heard of linear programming in any detail before, so that section of the paper seems rather vague to me.
Responsive Characters from Motion Fragments
This paper describes an on-line character animation controller that assembles a motion stream from short motion fragments, choosing each fragment based on current player input and the previous fragment. By adding a simple model of player behavior we are able to improve an existing reinforcement learning method for pre-calculating good fragment choices.
The proposed character animation controller works by generating a stream of motion fragments. As each fragment completes, the next fragment to play is selected based on the current player input and the previous fragment. A control model describes how player input is expected to change, using a table of conditional probabilities. The ControlQuality and MotionQuality functions provide a means to compute the immediate benefit of decisions generated by the control model.
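To make the table-based controller concrete, here is a rough sketch using plain value iteration. It is an illustration under assumptions, not the authors' exact procedure: the array `quality` stands in for a combination of ControlQuality and MotionQuality, `control_model` is the table of conditional probabilities, and the function name `plan_policy`, the discount factor, and the two-fragment toy example are all made up.

```python
import numpy as np

def plan_policy(quality, control_model, discount=0.9, iters=200):
    """Precompute a fragment-selection table by value iteration.

    quality[f, c, g]:    immediate benefit of playing fragment g after
                         fragment f while the control input is c.
    control_model[c, d]: probability that the control changes from c to d.
    Returns policy[f, c], the index of the next fragment to play.
    """
    n_frag, n_ctrl, _ = quality.shape
    value = np.zeros((n_frag, n_ctrl))
    for _ in range(iters):
        # expected[g, c] = sum over d of P(d | c) * value[g, d]
        expected = value @ control_model.T
        # q[f, c, g] = immediate benefit + discounted expected future value
        q = quality + discount * expected.T[None, :, :]
        value = q.max(axis=2)
    return q.argmax(axis=2)

# Toy example (assumed): fragment 0 walks left, fragment 1 walks right;
# control 0 asks for left, control 1 asks for right.
quality = np.zeros((2, 2, 2))
for f in range(2):
    for c in range(2):
        for g in range(2):
            control_q = 1.0 if g == c else 0.0   # did we obey the player?
            motion_q = -0.5 if g != f else 0.0   # penalty for an abrupt switch
            quality[f, c, g] = control_q + motion_q

control_model = np.array([[0.9, 0.1],
                          [0.1, 0.9]])  # player input tends to persist
policy = plan_policy(quality, control_model)
```

At run time the controller only needs a table lookup, `policy[previous_fragment, current_control]`, which is what makes this kind of approach cheap enough for interactive use.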
The results show that the model performs better than the steady controller. The provided video is not too helpful because it does not provide a comparison to other controllers and it is hard to see the responsiveness of the proposed controller. I should say, I am not totally convinced how much better this new controller is.
I read “Motion Fields for Interactive Character Animation” (3).
This paper describes an alternative to graph-based motion editing techniques. Instead of dividing motion into sequences, it considers each frame individually. The pose and velocity of each frame of motion is stored in a “motion field”, which is a high-dimensional generalization of a vector field. The character’s motion “flows” through this field, allowing the character to respond to user input more quickly than with graph-based approaches.
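A heavily simplified sketch of what one step of "flowing" through such a field might look like is given below. Everything here is an assumption for illustration: poses are treated as plain vectors (real joint rotations need proper rotation blending), the similarity measure and inverse-distance weights are my own choices, and the function name `motion_field_step` is hypothetical. It also ignores user control and any correction that keeps the character close to the data.

```python
import numpy as np

def motion_field_step(pose, vel, db_poses, db_vels, k=2, vel_weight=1.0, eps=1e-9):
    """Advance one frame by blending the velocities of the k most
    similar (pose, velocity) states in the database."""
    dist = np.sqrt(((db_poses - pose) ** 2).sum(axis=1)
                   + vel_weight * ((db_vels - vel) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / (dist[nearest] ** 2 + eps)  # closer states count more
    weights /= weights.sum()
    new_vel = weights @ db_vels[nearest]        # blended velocity
    return pose + new_vel, new_vel

# Toy database (assumed): a one-dimensional "pose" moving at constant speed.
db_poses = np.array([[0.0], [1.0], [2.0]])
db_vels = np.array([[1.0], [1.0], [1.0]])
new_pose, new_vel = motion_field_step(np.array([0.5]), np.array([1.0]),
                                      db_poses, db_vels)
```

The point of the sketch is only that the next frame comes from the local neighbourhood of the current state, so there is no need to wait for a transition point as in a graph.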
I really liked this paper. We have talked a lot during lectures about the idea of a “pose space”. This concept of a motion field is very similar to pose space, and is an intuitive and easy-to-understand approach to motion editing. I was also impressed by how flexible the technique is. It can be adapted to various computational and memory constraints and still perform well, and it can be used on its own or in conjunction with graph-based approaches, or incorporate physical dynamics.
I don’t really have any specific questions about this paper, just a few observations/comments. First, it seems like the last several papers I’ve read for this course have been trying to solve the same problem– real-time character control through editing a set of motion capture data. All the papers seem to list video games (but not much else) as a potential application. What else are these techniques good for? Second, this is the first paper I’ve read that has made any mention of incorporating physical dynamics to create more realistic effects. Why isn’t it more common?
Dear god, this paper is brutal. I thought I was somewhat mathematically literate, but I’m second guessing myself now. Let me try and summarize what the authors are presenting here. By using universal kriging, a modified form of spatial prediction which compensates for the lack of intrinsic stationarity in motion data, Mukai & Kuriyama are able to parametrize motions using dynamically computed kernel functions to get optimal blending between poses. Radial basis functions accomplish a similar goal, but they apply a uniform kernel across the pose space, which can produce results with high frequencies (jerkiness/imprecision ensues). In order to determine this predictive model, a pre-processing step is required, which computes the trend estimate and variogram estimate of the data. The trend estimate is actually a hyperplane that is formed by minimizing the least squares, for a given component of a motion, of the trend component (will need to be elaborated) and the synthesized pose (aka residual of this element). The variogram is computed using the residuals of the per-pose distance metric (created by Kovar, point cloud), between sampled and initial poses. These results parametrize the input motions into a space of time-aligned motions. Then during runtime, this information is used to compute the blending kernel function as well as predict residuals, which effectively predict the appropriate motion.
In order to truly understand this method, I’m going to need help with the following:
-What the hell is kriging?
-What are the variogram function and trend space components of kriging?
-What is intrinsic stationarity?
-I’m still not clear on the definition of a pseudo-inverse matrix. Is that when a sub-matrix of the original is invertible?
Wrong section…
Hello Aaron,
Out of curiosity about the first word, “Brutal”, and Prof. Mike’s description in class, I went through the paper. I found it extremely simple; it requires very little mathematical skill to formulate the problem. Things are very similar to the least-squares formulation and expectation minimization techniques that we learn in our undergraduate classes. To understand this paper, just don’t get caught up in the fancier words and stories in the paper (more than 30% of the paper is just a story).
This paper is EXTRA cool.
csv
Responsive Characters from Motion Fragments:
This technique is used to combine short motion clips interactively. It uses transitions similar to a motion graph but it requires there to be a transition to a new motion clip after a short period. A motion graph might have a section where there are no transitions, but for this version there has to always be nearby transitions so the changes can be interactive. The next transition is based on the previous “fragment” of motion along with the current control signals such as are used for playing video games. It can also use “traces” from previous example control motions to provide a guess at what the user’s choices will be. This can allow it to have a better chance at being in a good position for future transitions.
I wondered if they had some automated methods to generate the “control signals” that were specified for each short motion fragment that went along with a certain button. It seems like this could be difficult to determine from a general set of motion data.
For group two, I read McCann and Pollard’s “Responsive Characters from Motion Fragments”.
McCann and Pollard present a method that offers the capabilities of motion graphs as well as the responsiveness needed for real-time applications. The method uses machine learning to build a predictive model for control input/signals which correspond to short motion fragments. This probability model is determined from training data that is collected from the input streams of normal gameplay. In addition to the probability model, the controller uses a quality heuristic, which selects a motion fragment that provides the best motion quality (jerky or smooth blend) and control quality (incorrect or correct action) evaluated over n time steps in the future. A continuous stream of motion fragments (0.1 seconds in length in the authors’ scenario) is played, and fragments are selected fast enough to make seemingly instantaneous response possible.
This paper directly addresses my concerns with Kovar’s motion graphs; their responsiveness seemed too limited to make them feasible for use in games. I think the underlying ideas of motion graphs are definitely the next step in improving the interactivity of video game characters, so it was really exciting to find that this technology is in the works.
I am confused about the definition of control bins. The authors say that they are regions of space in the high-dimensional space of control signals, which are closest to hand selected points. What are these points and what are these regions? Some examples would really help…
The only concept I was confused about on Monday was the definition of control bins, but looking back now, they are pretty simple. Control bins serve to make the continuous control space discrete, so any input that is nearby in continuous space will jump to them. This enables the construction of the control policy table in the reinforcement learning process; it is extremely simple and requires very little mathematical skills to formulate. Oh and it’s extra cool.
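As a concrete picture of what a control bin does, here is a small sketch. The bin centers (a neutral stick plus four directions) and the function name `nearest_bin` are assumptions for illustration; the paper's actual hand-selected points may look quite different.

```python
import numpy as np

def nearest_bin(signal, bin_centers):
    """Map a continuous control signal to the index of the closest
    hand-selected point (the 'bin' it falls into)."""
    diffs = np.asarray(bin_centers, dtype=float) - np.asarray(signal, dtype=float)
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))

# Assumed bin centers for a 2-D joystick: neutral, right, left, up, down.
centers = np.array([[0.0, 0.0],
                    [1.0, 0.0],
                    [-1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, -1.0]])

idx = nearest_bin([0.8, 0.3], centers)         # mostly "right"
idle_idx = nearest_bin([0.1, -0.05], centers)  # close to neutral
```

Each bin index can then serve as one axis of the policy table, so the table stays finite even though the raw input is continuous.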
I attempted to read the Interactive Generation of Human Animation with Deformable Models paper.
This was very dense mathematically and I didn’t understand most of it.
What I did gather was that they attempted to perform statistical analysis on a motion database to reduce motion down to the form x = M(a,g), where a specifies space and g specifies time. By warping these two values, different motions can be synthesised. I don’t know what MAP is, nor do I understand the statistical methods in question.
At first I didn’t think this was all that interesting of a paper, but the pictures at the end, with the sketch interface looked really cool and if animation can be made that intuitive, this paper probably deserves a closer read later.
I read the motion fields paper. It is a beautifully written paper and presents a novel representation of motion data that isn’t “rigid” in its notion of state.
One of the disadvantages of motion graphs is that you can “move” only along the transitions (edges), so in interactive editing there tends to be a delay if the character isn’t at a node in the graph, which happens if the editor makes a sudden change.
Motion fields brings in the notion of continuity by representing state using three vectors:
– the pose, comprising the root position, orientation and the joint orientations
– the velocity, which is the difference of two poses (and hence successive frames)
– the task, comprising task parameters for a user-specified task
They define a similarity metric (very similar to Kovar et al., with the addition of the velocity vector).
The beauty of the paper lies in formulating the problem of choosing the best (or was it called near-optimal) action as a Markov model that uses rewards based on the task vector and then reducing the exponential search space using a value function that can be computed recursively and can also be compressed.
Values at states not in the db are calculated via interpolation.
So, motion fields is another data-dependent technique that provides a different representation of motion data allowing real-time user control to make the motion “flow” through the character poses.
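The interpolation of values at states that are not in the database could be sketched roughly as follows. The inverse-distance weighting over the k nearest stored states is my assumption, as are the function name `interpolate_value` and the toy one-dimensional states; the paper may use a different scheme.

```python
import numpy as np

def interpolate_value(query, db_states, db_values, k=3, eps=1e-9):
    """Estimate the value at an unseen state from the k nearest
    database states, weighted by inverse squared distance."""
    db_states = np.asarray(db_states, dtype=float)
    db_values = np.asarray(db_values, dtype=float)
    dist = np.linalg.norm(db_states - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / (dist[nearest] ** 2 + eps)
    weights /= weights.sum()
    return float(weights @ db_values[nearest])

# Toy data (assumed): three stored states with known values.
states = [[0.0], [1.0], [2.0]]
values = [0.0, 10.0, 20.0]

midway = interpolate_value([0.5], states, values, k=2)  # halfway between two states
on_data = interpolate_value([1.0], states, values)      # lands on a stored state
```

A query that coincides with a stored state gets (almost exactly) that state's value, and a query between states gets a blend, which is the behaviour needed for a continuous notion of state.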
The authors mention that “no preprocessing of data or determining where to connect clips of captured data” is needed.
How on earth is this true?
i) The footskate solution in motion fields (i.e., storing whether each foot is in contact per motion state).
This required some pre-processing, didn’t it? The user had to annotate frames with foot contact in the mocap data. Isn’t automatic inference useful here?
I don’t think Kovar et al needed the user to annotate the mocap data for footskate clean up.
Also, there isn’t the notion of a post-processing state in this paper. It seems like everything is done on the go.
ii) How are value functions calculated “w/o any preprocessing”? The ability to compress them comes from calculating them earlier, doesn’t it?
other points worth discussing:
i) non-parametric vs parametric models in motion and their consequences
ii) ideas to reduce the space of searching for similarity (whatever be the metric)
iii) what are other ways (than markov modelling) to achieve near-optimal decisions?
iv) blending in motion fields v/s motion graphs; no time-warping in motion fields?
I was interpreting the footskate cleanup &c. as extra goodies tacked on to the continuous representation of motion space. So their claim that they don’t do any preprocessing is only true in that they don’t need it to construct a path through the vector field, and that you can get a lot of constraints satisfied by weighting the Markov model rather than an explicit annotation and cleanup step.
I’m assuming that once you’ve made the motion then you can do all the post processing you want, and in fact it might be easier since you don’t have to resolve the search problem at every step.
Paper: Responsive Characters from Motion Fragments
Brief information :
***************
Category : Engineering (Describes a technique to improve existing methods).
Citations Since 2006 : Unknown
Breakthrough Idea: Still searching (a mild way to say: probably nothing).
Unfamiliar terms: Tabular-Policy based Controller.
Kolmogorov Minimum Description:
Reactive characters are important in animation to augment reality. This paper describes an improvement to a reinforcement learning method for precalculating good fragment choices.
Reproducibility: Probably not too hard.
First of all, the Video clip provided by the authors is totally unimpressive.
It is really hard for me to evaluate this paper. I know what the authors want and what they are doing, but how superior their approach is compared to previous work such as motion graphs, parametrized motions, etc., and why existing methods cannot be modified to incorporate reactive automation, is beyond my experience with these systems. To me, all the “Related Work” described in the paper is more elegant and, fortunately, none of it uses AI, whose efforts over the last thirty years, as we all know, can best be described in two words: “Successfully failed”.
I read Near-optimal Character Animation with Continuous Control. This paper applies a two-step approach to synthesizing example-based motion controllers focused on particular user constraints: namely the target motion sequence type and the particular motion parameter the user wishes to preserve. This is done by first constructing a linear blend of the clips and then selecting the blended sequence of clips that best fits the user’s demand, based on a specified set of multiple parameters of interest (the value function).
I did not really understand a lot of the underlying optimization techniques, but am hopeful later passes over the mathematics after a bit may help to clarify them.
This technique focuses heavily on using the low-resolution properties of the animation in parameter space in order to synthesize ‘realistic’ human locomotion. This is interesting as the human cognitive system is very good at interpreting the low-resolution details of a motion, even in the periphery. While the authors present a crowd-based study of their methods, it would be interesting to extend this further into environment synthesis, and I would also like to understand the discrepancies between the perception of these motions in primary and secondary characters.
Motion Fragment paper:
The main idea of this paper seems pretty simple: they start out with a library of short, length-restricted motion fragments. They model player behavior and use it to build a controller that looks like a lookup table. One axis represents incoming signals, the other axis represents possible motions, and the intersection of a row and a column is the output motion fragment. The main contribution is that this is an online system suitable for interactive games and optimized for immediate responsiveness over pure quality of motion. The concept that “all segments [must be able to] transform into other segments” is almost exactly like the concept of a hub in Snap Together Motion, but the difference is that they weigh the trade-off between a possible bad transition or two and better responsiveness, which is more important for gaming. I do not understand their arguments for why pruning the state space prior to planning may yield worse results. The other thing is that since 2007, I’ve noticed that responsive transitions between motions in games I’ve played have greatly improved; it looks like if you send a signal mid-motion now, the rest of the motion is accelerated to completion and the next is begun. So this paper may not be altogether relevant, but that’s just based on personal observation and not real knowledge of how this is handled nowadays.
Looking at Min, Chen, and Chai again didn’t really give me much more in the way of understanding, although I have a slightly better view of what they’ve accomplished.
I do have a few questions.
1. Can we nail down a definition for the word “Registration”? I’ve got a feeling for what it means from Kovar’s thesis, but it was used again in this paper and I want to make sure I know what it means in this context.
2. We’ve talked about how computer animation can be broken up into categories, one of which is example based (i.e. throw a bunch of data at the problem). It seems that the papers we are looking at are almost exclusively in this category. It seems that in each paper we read, there is a completely different method being tried and that makes it difficult to know what the standard practices are. Is there any way to categorise these papers into general methods? It would be nice to have some intuition going into a paper as to how their method is going to work at a higher level.