For Wednesday, February 16, we will look at alternatives to the typical skeletal representation for motion data.
There are two systems that I’d like you to read about. Choose either #1 or #2 and #3. (1 and 2 are about the same system).
- Multon, F., Kulpa, R. & Bideau, B. MKM: a global framework for animating humans in virtual reality applications. Presence: Teleoperators and Virtual Environments, 2008, Vol. 17(1), pp. 17-28. (This is a systems paper, giving an overview of everything together.)
- Kulpa, R., Multon, F. & Arnaldi, B. Morphology-independent representation of motions for interactive human-like animation. Computer Graphics Forum (Eurographics 2005 special issue), 2005, Vol. 24(3), pp. 343-352. (This is the paper with the representation details and retargeting method.)
- Edmond S.L. Ho, Taku Komura, and Chiew-Lan Tai. Spatial Relationship Preserving Character Motion Adaptation. SIGGRAPH 2010 (ACM Trans. Graph. 29(4)). (See the project page for the PDF and video; although, if you're on campus, it might be best to get the ACM Digital Library version.)
Before class, leave a comment discussing the two different representations. In particular, what are the pros/cons of each of these alternative representations of motion data, and why might you choose them? (preferably things other than what the authors use the representations for).
Read #1 / #3.
The two papers are different in that MKM presents a fundamentally different representation that uses a modified "skeleton" consisting of a spline for the spine, normalized segments for select bones, and a half-plane in conjunction with a single variable-length bone to represent arms and legs. The intention is to divorce morphology from the representation, thereby simplifying the retargeting process; a complete framework for executing synchronization, blending, and adaptation using this new structure is presented. The upside of divorcing morphology from the representation is that motion retargeting is executed with techniques more efficient than inverse kinematics; the system can be used to animate hundreds of characters in real time. Both kinetics and kinematics are factored into the animation process, resulting in realistic motion. In contrast, the Ho paper does not describe a new representation of a skeleton, but instead superimposes a volumetric mesh on the existing representation in order to devise new ways of retargeting, warping morphology, and editing motion that avoid penetrations. This approach greatly eases the burden on the animator: by simply point/click/dragging body parts, the warping is handled by the mesh. However, it seems like motions must be broken down into small segments, because there is no way to break constraints in the middle of the scene. This may result in implausible situations (in Figure 6, the two left images convincingly portray a character ducking a kick, while the right image looks like the guy is stretching up to be kicked). I would say that if retargeting is your primary concern, choose MKM, because conversion to a different morphology requires only simple analytical methods (division, multiplication, finding intersections of circles in a half-plane), whereas the mesh may be more complicated. If animating a cluttered scene with many spatial interactions, use a regular skeleton with Ho's method.
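The "intersections of circles in a half plane" mentioned above can be made concrete: given the shoulder and wrist positions plus the two segment lengths, the elbow lies where two circles meet, and the half-plane picks which of the two intersections to use. A minimal 2D Python sketch (the papers work with a 3D half-plane; the function name and the reachability clamp are my own):

```python
import numpy as np

def elbow_position(shoulder, wrist, upper_len, lower_len):
    """Place the middle joint of a two-segment limb at the intersection of
    two circles (radii = the segment lengths), picking the intersection
    that lies in one chosen half-plane. Assumes shoulder != wrist; the
    distance is clamped so the circles always intersect."""
    d_vec = wrist - shoulder
    dist = np.linalg.norm(d_vec)
    d = min(dist, upper_len + lower_len - 1e-9)         # clamp to reachable range
    a = (upper_len**2 - lower_len**2 + d**2) / (2 * d)  # offset along shoulder-wrist axis
    h = np.sqrt(max(upper_len**2 - a**2, 0.0))          # offset off the axis
    u = d_vec / dist                       # unit vector shoulder -> wrist
    n = np.array([-u[1], u[0]])            # in-plane normal: selects the half-plane
    return shoulder + a * u + h * n
```

With both segments of length sqrt(2) and the wrist at (2, 0), the elbow lands at (1, 1), and the upper-segment length is preserved exactly, which is the point: no iterative IK is needed.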
I read the Multon et al. paper as my MKM choice. Both papers implement an idea that I think is useful: it's not the specific ratios and angles of the skeleton (the morphology, in the terminology of the Multon et al. paper) that are important in an animation context, but rather the general stance and semantic relationship between limbs in a pose.
The half-plane representation of articulated limbs used in the Multon et al. paper seems compact and easier to work with than the full skeleton view or the relative representation in the Ho et al. paper. I would imagine that crowd shots and long-distance/low-fidelity shots would be a great fit for this sort of representation: I don't care how the wrist of soldier #314 in my video game is positioned, but I do want to make sure that he's in roughly the same pose as the others. I could imagine having different levels of granularity and detail in this representation for changes in constraints or fidelity requirements. That being said, the presented results all seem very stiff and lacking in fluidity and degrees of freedom: I don't know if that's a consequence of the figure models used or the focus on getting gross leg position correct at the expense of hands and feet, but the results just do not look natural.
The Ho et al. representation is described as suited for both motion retargeting and iterative methods. The first seems borne out by their results (although they do mention that they have trouble when the differences in scale become too great), but the second does not. Ideally an IK/retargeting algorithm that is iterative offers (among other things) the benefits I attributed to the Multon et al. paper: being able to create "good enough" solutions for different levels of detail or computational time. But Fig. 3 dissuades me from thinking that being iterative buys us all that much besides the ability to sidestep a big linear solve: the solution seems to change drastically as the number of iterations n increases (before the final stability once n is large). In addition, a number of the other benefits presented in the paper (non-exponential IK, soft constraints, &c.) do not seem inherent to the model presented, but rather are existing algorithms re-implemented in the input+mesh context. Of course, I like their results a lot, at least the ones presented.
I know it’s not kosher for writers in academia to admit failures, especially in papers about algorithms and infrastructure, but I think I’d have a better idea of the usefulness of these proposals if I had more images of edge cases and failure cases. Neither of the papers explicitly controls for realistic human forces in their motion planning, so it would be interesting to see what sorts of constraints produce motions that violate these assumptions.
“Choose either #1 or #2 and #3”. That threw me for a fun-filled, operator-precedence-refreshing loop. 🙂
I read the Multon et al. MKM paper and, of course, the Ho et al. paper.
The MKM paper presents a modified method for storing postures of articulated figures. Each posture is stored as a spline for the spine, constant-sized appendages, and variable-length limbs (but no angles or limb joints). The spine provides the basis for the figure, while half-planes coupled with the limb segments provide a means for dynamically determining the joint locations of the arms, which effectively allows the reconstruction of the figure. Each figure is "normalized," which means it needs to be scaled to the desired size before it is blended with another motion. Once this is done, blending first makes sure that both motions have compatible stances, morphing the lower-priority motions in time if they don't. The final motion is calculated by imposing geometric constraints on the rest of the limbs. I didn't completely understand the constraints section; I grasped that they're attempting to constrain the motion such that each motion has a plausible center of mass.
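The normalization idea described above can be sketched in a few lines: store a chain posture as unit segment directions only, then rebuild the chain with whatever bone lengths the target figure has. This is a rough Python illustration of the principle, not the paper's actual data format (the function names are mine, and the real representation also handles the spine spline and half-planes):

```python
import numpy as np

def normalize_pose(joints):
    """Encode a chain posture as unit segment directions (no lengths),
    making the stored motion independent of the figure's bone lengths."""
    segs = np.diff(np.asarray(joints, dtype=float), axis=0)
    return segs / np.linalg.norm(segs, axis=1, keepdims=True)

def retarget(directions, root, lengths):
    """Rebuild the chain for a figure with its own bone lengths by
    walking the stored directions out from the root."""
    pts = [np.asarray(root, dtype=float)]
    for d, length in zip(directions, lengths):
        pts.append(pts[-1] + length * d)
    return np.array(pts)
```

Scaling to a new character is then just multiplication, which is why the paper can avoid classical per-character IK for the basic retargeting step.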
The paper claims that the benefit of this method comes in a performance boost that allows for interactive motion synthesis. The only reason that the paper seems to give is that by throwing away the standard “joint/angle” model, the complexity in Inverse Kinematic solutions goes away.
One con might be that this technique is probably only going to work with human figures (or figures whose limbs have only two segments), as recovery of the elbow/knee joints is simple for two-segment limbs but gets trickier for limbs with more joints.
Uses for this technique are laid out broadly in the paper with focus on interactive VR. Apart from the real time applications, it might be useful in animation prototyping in which a basic animation is needed before a more sophisticated motion synthesis technique is used.
The Spatial Relationship Preserving Character Motion Adaptation paper was really interesting. Here, they basically represent the figure as a mesh built from the joint positions and strategically sampled bounding-volume surface points. The paper calls this an Interaction Mesh, and from what I got, it looks like the mesh attempts to capture limbs that are close to each other so that when one figure is altered, the limbs on the other figure will be morphed in such a way as to preserve the spatial relationship as much as possible. I didn't understand all of the math, but it sounds like the Delaunay tetrahedralization method will give you the mesh, and Laplacian coordinates will allow you to modify the mesh while preserving local details. It would be nice if we could talk about Laplacian coordinates in class.
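The two building blocks named above both have off-the-shelf analogues. Below is a rough Python sketch, using SciPy's Delaunay tetrahedralization, of constructing a mesh over 3D points and computing uniform-weight Laplacian coordinates (each point minus the mean of its mesh neighbours). The paper weights neighbours and mixes joints with sampled bounding-volume points; this simplified, equal-weight version is my own:

```python
import numpy as np
from scipy.spatial import Delaunay

def interaction_mesh_laplacian(points):
    """Tetrahedralize a 3D point set and compute uniform-weight Laplacian
    coordinates. The Laplacian coordinate of a vertex encodes its position
    relative to its mesh neighbours, i.e. the local spatial relationship."""
    points = np.asarray(points, dtype=float)
    tet = Delaunay(points)                  # Delaunay tetrahedralization in 3D
    nbrs = [set() for _ in range(len(points))]
    for simplex in tet.simplices:           # collect edge adjacency per vertex
        for i in simplex:
            for j in simplex:
                if i != j:
                    nbrs[i].add(j)
    lap = np.array([points[i] - points[sorted(nbrs[i])].mean(axis=0)
                    for i in range(len(points))])
    return tet, lap
```

Preserving these per-vertex Laplacian coordinates while a figure is rescaled is what keeps close limbs close without the animator restating the constraints.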
The way that I kind of envision it is that each character has springs strategically placed on them such that they can't get too close to the other character, and when they do, the springs push them away.
The big win with this method is that, even when one figure's size is changed, the motion still looks pretty good with little to no intersection.
The con seems to be that there is a limit to how much a figure can be changed. If the figure gets too big, constraints might break down and the result may look unnatural. On the flip side, from the video on their site, it also looked like if the models got too small, there might be figure intersection issues.
One use that pops out at me is pathfinding in games in which terrain is dynamically changing. For example, if a spaceship is flying through a canyon, its path might be modeled as a straight line in the interaction mesh, with the canyon around it modeled as the second figure. If the canyon were to collapse, the flight path of the ship might be updated using this method.
I read #1 and #3
#1 uses a representation that tries to be as independent as possible of the actual character morphology, while at the same time making it easier to treat constraints, such as foot plants. For this, they use a normalized-bone and half-plane representation of the motion. Since the bones are not stored in an absolute sense, this allows them to deal with different morphologies, and the half-plane representation makes it easier to deal with different-sized bones.
#3 uses a Delaunay tetrahedralization to represent the spatial constraints and relationships in 3D. By warping the points in the mesh, one can satisfy certain constraints. This reminds me a lot of warp-grid-based image retargeting methods (and even a bit of my own stuff on planar map operations…). This interaction mesh makes it easier to retarget a motion to characters of different proportions.
Of these two techniques, I can see more general uses for #3. It should be possible to apply these directly to marker data (in fact, the paper essentially uses virtual markers). It should also be possible to retarget motions to characters of different morphologies by defining a mapping from one initial interaction mesh to another, and then applying the mapping in each frame.
It is clear that both representations can solve the problem of fitting motion data to figures that do not match the original form from which the motion data was collected. However, each has its own focus and intent.
The method from the first two papers, which can be described as a "loose" skeleton, maintains a traditional skeleton structure only for the end points (i.e., the hands and feet) and the solid pieces of the body (i.e., hips and shoulders). The rest of the skeleton is represented by splines (the spine) or a single, variable-length connection that represents a jointed extension (the arms and the legs).
When the skeleton needs to be used, the system computes the positions of the arm and leg joints based on mass and the dimensions of the body it is being fitted to. This allows the skeleton to be used on a variety of forms without the heavy computation needed to dynamically adapt a traditional skeleton whose joints are all rigidly defined. And by using a limited set of dynamics constraints, the motions are designed to maintain realistic balance, which the authors of the first paper found important for athletics simulations.
One benefit of this method may be ease of motion capture. If the arm and leg joints are not explicitly needed, markers may only be required on the extremities and core of the body. On the other hand, this may lose the fine details of elbow and knee movement. And like traditional methods of motion management, this method does not keep track of spatial relationships as the third paper's method does.
The third paper discusses a motion system where, instead of placing a skeleton inside a model, the skeleton forms an "exoskeleton" around not just the model but also the objects the model is interacting with. Through gradual deformations, the exoskeleton can ensure that the implicit constraints defined by a motion's original interactions are preserved as an animator alters the dimensions of a model. But probably the most important aspect of the system is that it can determine these spatial relationships automatically from the original motion, without an artist specifying all the constraints.
While the results are quite impressive for the fight scenes and close-environment demos, the major drawbacks of this method are a lack of realism constraints and time. While the system preserves spatial relationships, the animator is free to alter the motion in any way, quite possibly producing a motion that is unnatural but still similar to the original. Also, the approach takes considerable time, as the exoskeleton must be reevaluated over all the frames if a change is made. The authors predict significant improvements in this area if the computations can be moved onto a GPU, possibly as a shader.
Assuming that the computations can be moved to the GPU and performed faster on a parallel architecture, this method could allow more dynamic adjustments to characters in a video game. Imagine not just a character growing or shrinking uniformly, but having just a body part or two altering dimensions and still being able to perform the motion correctly. This might be realized as a character whose limbs are actually pseudopods and who needs those limbs to change size on demand. It would also allow a player to more dramatically customize his or her avatar in-game, a practice that has become more and more common in role-playing games. Current customizations are limited mainly to cosmetic details, with the body shape remaining constant.
1: The first method has the advantage of only requiring a small database of motions to work, and with little or no preprocessing. Also, by using the center of mass and a kinetic solver, realistic balance preserving motions are produced.
3: The second method is targeted at close proximity interactions between a character and the environment (including the character itself). The advantage of this method is that it can handle such close interactions with characters that have been rescaled from the original motion capture. This suggests the ability to have alien or non-human characters interact reasonably realistically.
Interestingly both methods claim to handle scaled characters better than traditional methods, and to do so at real-time/interactive speeds. Where the first paper aims to extend motions from a small database, the second method tries to preserve existing close interaction. The first method could be desirable in situations where budget to produce motion capture data is limited, or only a set amount of data is available. The second method seems less interested in extending motion and more in adjusting them to fit new characters.
I read the 2nd and 3rd papers.
#2 essentially presents a simplified skeleton, which they call "normalized," as an alternative to the standard skeletal representations of human motion. Constraints are stored with the motion, and can also be specified interactively. First the skeleton is converted to the normalized skeleton; then it is adapted to the specific character's size, environment, and constraints; and finally it is converted back into a standard skeleton. This system seems most applicable to scenes with lots of characters interacting with each other, since the computations have all been simplified. One thing that wasn't clear to me is how they make sure motions preserve a certain style or naturalness. The authors seem more concerned with simply getting the joints to their constraints in the quickest and most computationally efficient manner possible.
In the third paper the authors introduce the concept of an interaction mesh for preserving spatial relationships in a scene. For example, in a wrestling motion, they show how they can make one guy fat and the other one skinny and the motion still looks natural and correct. This seems to be a great idea to me, but I think they might be overselling a bit how often we want to be constantly preserving spatial relationships in our scenes above all other things. For example, they show how moving one character's hand forward will have the side effect of moving the other character's head proportionally backward. This system obviously wouldn't work well if we wanted to change the motion to have the first character's hand strike the second character's head. Another drawback that I found interesting in light of our class conversations was the authors' comment on how they tend to oversmooth the motions with their acceleration energy term.
Not sure if this is linked elsewhere, but here are some videos from Richard Kulpa’s website showing off some concepts presented in the MKM papers: http://www.irisa.fr/bunraku/Richard.Kulpa/
Paper: Morphology-independent representation of motions for interactive human-like animation.
Unfamiliar terms: Cyclic Coordinate Descent algorithm.
Basic ideas: Adapting motions to new space-time constraints in real time is perhaps not practical with a large database of motion data. The authors aim to control the motion with a small motion capture database.
1. Apply IK to a subset of the kinematic chain. Use an intermediate, normalized skeleton that has fewer degrees of freedom. The human body is subdivided into kinematic sub-chains (six groups).
2. Prioritize the constraints. Constraints with lower priorities are verified only after those with higher priorities.
3. Use a specific constraint solver that takes advantage of heuristics in an iterative search process.
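For reference, the Cyclic Coordinate Descent algorithm listed under unfamiliar terms is a simple iterative IK scheme: sweep the joints from the end effector back to the root, rotating each one so the end effector points at the target. A minimal planar Python sketch (the function name and fixed iteration count are my own; the papers operate on their simplified 3D sub-chains):

```python
import numpy as np

def ccd_ik(angles, lengths, target, iters=50):
    """Cyclic Coordinate Descent on a planar chain rooted at the origin.
    `angles` are relative joint angles, `lengths` the segment lengths.
    Returns the solved angles and the final end-effector position."""
    angles = np.array(angles, dtype=float)
    target = np.asarray(target, dtype=float)

    def forward(a):
        """Positions of every joint plus the end effector."""
        pts, theta = [np.zeros(2)], 0.0
        for ai, li in zip(a, lengths):
            theta += ai
            pts.append(pts[-1] + li * np.array([np.cos(theta), np.sin(theta)]))
        return pts

    for _ in range(iters):
        for j in reversed(range(len(angles))):   # sweep end effector -> root
            pts = forward(angles)
            to_end = pts[-1] - pts[j]
            to_tgt = target - pts[j]
            # Rotate joint j so the end effector swings toward the target.
            angles[j] += (np.arctan2(to_tgt[1], to_tgt[0])
                          - np.arctan2(to_end[1], to_end[0]))
    return angles, forward(angles)[-1]
```

Each step is a cheap closed-form rotation, which is why CCD-style heuristics suit real-time constraint solving better than a full Jacobian solve.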
Comments: At first reading, the whole idea seems promising, but the authors have clearly mentioned the limitations of their approach. I am not quite sure how much computational saving one can expect on modern hardware, judging from Figure 10, where only the elbows and knees are simplified.
But this has an advantage where real-time interaction is the main goal, because of the lower computational requirements.
**********************************************************************
Paper: Spatial Relationship Preserving Character Motion Adaptation.
Unfamiliar terms:
(1) Laplacian deformation techniques.
(2) Laplacian coordinates.
(3) Gauss linking integral.
(4) Topological coordinates (a very strange term).
Comments:
Volumetric meshes have been used for a variety of applications, but I am not certain about the phrase "a new representation" in this paper. Are they storing only the "Interaction Mesh" in their file format, or is the mesh generated only for collision detection from the classical joint information? So is it a primary or secondary representation?
I am also not quite sure how rigid the spatial relationships between two actors are. Are they allowed to act independently? If the relationship is soft, then how is it automated?
But I can see some big advantages of this representation over others:
(1) It could be a very powerful tool to synchronize motions among multiple actors (for example, in choreography).
(2) Anticipation: one actor can prepare well in advance for future actions based on proximity information gained from the mesh.
But the great flexibility provided by this representation could also be a bane, as it could easily be abused to produce unrealistic motion.
I read “Spatial Relationship Preserving Character Motion Adaptation” and “Morphology-independent representation of motions for interactive human-like animation.”
The “Morphology-independent” paper provides a way to take motion data and combine it into a new form of the motion that includes constraints built into the representation. The feet can be forced to stay on the ground or the wrists kept above a certain level, for instance. For motions between two models, it also allows adding a positional constraint for one model that is itself tied to the position of a different model.
The “Spatial Relationship” paper is directed more toward close interactions between models and other objects. It attempts to reduce the intersection of two models, with the goal of keeping them close to touching but not passing through each other. It can also handle two different parts of the same model interacting.
In general, the Morphology paper seems like it would be more useful for helping with basic constraints such as positioning of an arm or leg. The Spatial Relationship paper seems more useful for complex situations where the approximate motions are already defined and fine-tuning of the exact positions is needed for interactions. So when subtle changes in interactions are needed, the Spatial Relationship ideas might be more useful. When larger changes in a specific portion of the model are needed, the ideas from the Morphology paper might be better.
Multon’s global framework for animating humans:
Representation:
Body is divided into kinematic sub-chains using variable-length limbs, normalized segments, spline spines, and half-planes; some limbs are not encoded.
Pros:
-capable of retargeting motion to any arbitrary skeleton without using classical IK
-solves kinematics for modifying gestures to adapt to continually changing constraints in the virtual world
-doesn’t require a large database of motion data
-responsive to user specified start/stop of motion
Cons:
-physical dynamics are not taken into account
-cannot control jumps realistically
-the constraint solvers seem incapable of compensating for postures that don’t maintain balance; this may be limiting (probably in a good way, but it is still limiting)
-highly dynamic motions are not well represented
Use:
I could see this representation being used for a lower-end simulation of environments with mostly passive characters. The main issue with this representation appears to be its ability to handle highly dynamic motions. So using this representation for avatars (with unique morphology) in a social VE such as “Second Life” would be ideal; a new level of environmental responsiveness could be added to the avatars while computation could be kept low enough to enable interaction of many characters.
Spatial Relationship Preserving Character Motion Adaptation:
Representation:
Uses a volumetric interaction mesh defined by the joints/vertices of characters/objects with which the characters are interacting; this is computed for every time frame. This mesh is retargeted to different characters/objects by maintaining local details of the computed mesh.
Pros:
-preserves spatial relationships between characters or environments for close interactions
-can retarget semantics of motion to characters with different morphologies
Cons:
-cannot accurately represent narrow body parts
-small morph steps are needed to deter artifacts
-penetrations can occur with environments that are too constrained/small
-unstable when original motion contains movements where body parts pass through each other
-O(m * n) complexity; m = vertices in mesh, n = number of frames
Use:
First of all, I find the underlying idea of this method to be extremely promising in terms of video games. Even today, close interactions between characters in video games are severely lacking; penetration of geometry in these situations seems to be accepted as the norm (sometimes ‘hidden’ with particle effects or improbable collisions). Unfortunately, this technique doesn’t seem suitable for real time yet, due to its complexity and lack of scalability (between multiple objects/characters). At this point, I think this method will best be used as a tool for offline animation of close interactions, mapping a single motion to different characters. For instance, if a developer wanted to implement an improved, yet general, melee system (same attacks/motions) for a game with multiple character types or races (I’m thinking World of Warcraft or The Elder Scrolls), this technique would greatly expedite the process of defining these ‘static’ motions for the multiple types.
I read papers 2 and 3.
Paper two presents a characterization that captures a more generalized essence of the overall motion. Motions are represented as normalized hierarchical collections of bones and half-planes to facilitate motion adaptation. The method does make real-time adaptation of a character's motion simple, and the hierarchical frameworks defined therein help to induce natural motion. However, the paper acknowledges limitations: the lack of control over the center of mass results in difficulties in creating realistic motions, explicit attention is not really given to collision detection, and it appears a lot of user effort must go into defining constraints. At first glance, these methods seem like they would be very useful for adapting a particular piece of motion data to a scenario with a well-defined set of constraints.
Paper three provides a different spin on retargeting. The paper focuses on the preservation of relative spacing between key figures in an animation. A volume is defined between the joint positions of two actors, and the deformation of this volume is minimized as the different motions are retargeted. This method supports simple computation of the relative motion of characters by using the joint positions instead of angles. However, as mentioned in the work, extreme changes in scale over a tightly bound volume would break this method. Also, overly complicated volumes could bog down the computation in real-time environments. This method would be incredibly useful in cases where the real-time relative motion between characters is important. It may provide an efficient real-time method for computing motion in an environment with relatively few constraints where character interaction is key to the scene, especially in video games and in retargeting to non-traditional character shapes.
I read papers 1 and 3
1 – The core idea in the paper is to use a representation that is independent of the morphology of the character and more adaptable than joint angles for motion retargeting. The skeleton is simply represented using a bunch of normalized segments, half-planes, and a spline, and it can be scaled easily for another character, thus eliminating the need for a large database. The priority associated with the ‘heaviness’ of the groups is pretty interesting and, according to the paper, produces realistic results. However, this method fails to take dynamics into account. This simplistic method is therefore suitable for secondary characters which do not draw much attention from the viewer.
3 – This paper introduces the idea of the interaction mesh, the use of which allows more realistic interaction of a character with its environment along with automatic constraint preservation. This allows motion to be retargeted to scaled versions of the same characters without the animator having to manually alter the constraints. It produces impressive results under difficult constraint cases while maintaining balance and natural movement. However, computing the mesh is computationally very expensive, and one can expect it to be even more complicated for a larger, more detailed character. Therefore this is applicable to scenes with fewer background characters. While watching the videos, what caught my eye more than once was that it produced unrealistic outputs. In the judo example, in spite of the opponent being scaled up and the other character scaled down, this lean, thin guy is able to tackle the heavy guy very easily. Isn’t the mass associated with the mesh also scaled up?
I read Morphology-independent representation of motions for interactive human-like animation and Spatial Relationship Preserving Character Motion Adaptation; both papers present new ways of representing human motion.
The Multon/Kulpa paper introduces an approach that is suited for real-time animation of interactive environments due to its reduced computational cost. The key ideas are to normalize the skeleton so that the separate upper and lower limbs (upper arm/thigh) are each a single body segment, to represent the spine as a spline, and to consider the segments as a kinematic chain. Another key idea is storing the constraints together with the motion. An iterative process is used to model the motion, but it is claimed to solve quickly for the simplified skeleton. The fast computation time is a definite advantage of the new representation, but I think that by simplifying the structure too much they lose some naturalness in the motion. So this method would be better suited to modeling crowds/background animation than the main character.
The Ho/Komura paper introduces a way of building an interaction mesh for a motion by assuming that the characters are rigged with skeletons and each body segment is bounded by a volume. Postures of the characters are represented by the positions of the joints, rather than the joint angles. A spacetime optimization problem is solved to adapt the motion at each morph step: at every morph step, the body sizes and the positional constraints are updated, and the motions of the characters are adapted by minimizing the sum of the deformation, acceleration, and constraint energies. The method in the paper performs better than using kinematic constraints and collision avoidance for motions that involve many close interactions. The method seems to fail when the constraints are drastically different from those in the original motion, such as when one of the interacting characters is scaled too small or too large. The interaction mesh could be used for modeling many kinds of close interactions, such as hand-to-hand combat, dancing, navigating through obstacles, and working with tools or utensils.
I read 1 & 3.
MKM (1) proposes a framework based on a morphology-free representation of a character, wherein the skeleton is represented using half-planes, a spline, and normalized segments, which allows operations such as motion synchronization, blending, retargeting, and adaptation. This method is useful when you don’t have a large motion database or when the motions can be varied (user driven), and it allows motion synthesis at interactive rates (30 Hz). As the authors mention, crowd simulation is a natural application, apart from VR and e-learning. It doesn’t do a good job with penetration and can lead to unrealistic motions as a result of neglecting mass.
3 proposes the idea of an interaction mesh that enables more realistic editing and retargeting of motion that has a number of close interactions. The interaction mesh is essentially a data structure that captures the spatial relationships between objects/body parts; the method minimizes the deformation of this mesh locally (taking into account the previous frames) by using a spacetime optimization subject to positional, bone-length, and collision constraints. As a result, the animator doesn’t need to modify the constraints while editing/retargeting, since they are “captured” via the interaction mesh. The computation is real-time when there are very few “interacting” characters, and it becomes a natural favorite for games like wrestling and fighting!
I read papers 2 and 3.
The paper “Morphology-independent representation of motions for interactive human-like animation” describes a method of skeletal representation that is dimensionless, and so “morphology independent.” Additionally, the authors developed a technique for approximately solving constantly updating constraints in real time. “Spatial relationship preserving character motion adaptation” describes an interaction mesh that is designed to maintain relationships between the joint locations of characters who are interacting closely with (but maybe not quite touching) one another. By minimizing the Laplacian deformation of this mesh subject to bone-length, position, and collision constraints, the authors were able to maintain the “high-level” semantics of motion sequences after changing the sizes of characters.
Both techniques are “morphology independent” and well suited to a rapidly changing environment or interactive application with multiple characters. Both have their limitations. The mesh technique seems to be more robust overall (the other technique fails when a character is upside down), but it is limited, to some extent, to close quarters or close interactions. If the environment is too sparse, then there presumably won’t be enough vertices to create an interaction mesh. Both techniques seem to struggle with properly positioning the center of mass of the characters. Honestly, the authors of the “morphology independent” paper don’t do a very good job of selling their technique as something that could have any commercial application. They mention more than once the danger of an “inhomogeneous repartition of the deformation along the kinematic chain.” (I don’t know what that means, but it doesn’t sound good.) The other technique seemed to work very well for a specific set of motions. I liked that the authors used spacetime optimization so that characters could anticipate motions a few frames in the future. This technique could probably be applied to gaming, or maybe there could be some ergonomics application that involved studying different people’s movement in restrictive environments.