Pose (computer vision)

In computer vision and robotics, a typical task is to identify specific objects in an image and to determine each object's position and orientation relative to some coordinate system. This information can then be used, for example, to allow a robot to manipulate an object or to avoid colliding with it. The combination of position and orientation is referred to as the pose of an object, although the term is sometimes used to describe the orientation alone. Exterior orientation and translation are also used as synonyms of pose.

The image data from which the pose of an object is determined can be either a single image, a stereo image pair, or an image sequence where, typically, the camera is moving with a known velocity. The objects which are considered can be rather general, including living beings or body parts such as a head or hands. The methods which are used for determining the pose of an object, however, are usually specific to a class of objects and cannot generally be expected to work well for other types of objects.

The pose can be described by means of a rotation and translation transformation which brings the object from a reference pose to the observed pose. This rotation transformation can be represented in different ways, e.g., as a rotation matrix or a quaternion.
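
As an illustration of the two representations mentioned above, the following minimal sketch converts a unit quaternion to a rotation matrix and applies a pose to a set of points. The function names `quaternion_to_rotation_matrix` and `apply_pose`, the (w, x, y, z) component order, and the sample values are assumptions made for this example; they are not taken from any particular library.

```python
import numpy as np

def quaternion_to_rotation_matrix(q):
    """Convert a quaternion given as (w, x, y, z) to a 3x3 rotation matrix."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalise so the result is a proper rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def apply_pose(R, t, points):
    """Bring Nx3 points from the reference pose to the observed pose: p' = R p + t."""
    return np.asarray(points, dtype=float) @ R.T + np.asarray(t, dtype=float)

# Assumed example pose: a 90 degree rotation about the z axis plus a translation.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
R = quaternion_to_rotation_matrix(q)
t = np.array([1.0, 2.0, 3.0])

# The point (1, 0, 0) is rotated onto the y axis and then shifted by t.
moved = apply_pose(R, t, [[1.0, 0.0, 0.0]])
```

Both representations describe the same rotation; the matrix form is convenient for transforming points, while the quaternion needs only four numbers and is easy to renormalise.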

Pose estimation

The specific task of determining the pose of an object in an image (or in a stereo image pair or an image sequence) is referred to as pose estimation. The pose estimation problem can be solved in different ways depending on the image sensor configuration and the choice of methodology. Three classes of methodologies can be distinguished:

  • Analytic or geometric methods: These assume that the image sensor (camera) is calibrated, so that the mapping from 3D points in the scene to 2D points in the image is known. If the geometry of the object is also known, the projected image of the object on the camera image is a well-known function of the object's pose. Once a set of control points on the object, typically corners or other feature points, has been identified, the pose transformation can be solved from a set of equations which relate the 3D coordinates of the points to their 2D image coordinates. Algorithms that determine the pose of one point cloud with respect to another, when the correspondences between points are not already known, are called point set registration algorithms.
  • Genetic algorithm methods: If the pose of an object does not have to be computed in real time, a genetic algorithm may be used. This approach is robust, especially when the images are not perfectly calibrated. In this case, the pose serves as the genetic representation, and the error between the projection of the object's control points and the image serves as the fitness function.
  • Learning-based methods: These methods use an artificial learning-based system which learns the mapping from 2D image features to the pose transformation. In short, a sufficiently large set of images of the object, in different poses, must be presented to the system during a learning phase. Once the learning phase is completed, the system should be able to estimate the object's pose given an image of the object.
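
The analytic approach can be illustrated with a small sketch. It assumes a calibrated pinhole camera with intrinsic matrix `K`, at least six non-coplanar control points with known 3D-2D correspondences, and noise-free measurements. The direct linear transform (DLT) used here is only one of several ways to solve the resulting equations, and the function names and numeric values are chosen for this example only.

```python
import numpy as np

def project(K, R, t, pts3d):
    """Pinhole projection of Nx3 object points to Nx2 pixel coordinates."""
    cam = pts3d @ R.T + t            # object frame -> camera frame
    uvw = cam @ K.T                  # apply the camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective division

def estimate_pose_dlt(K, pts3d, pts2d):
    """Estimate (R, t) from >= 6 non-coplanar 3D-2D correspondences via DLT."""
    n = len(pts3d)
    # Remove the intrinsics so that the unknown matrix is [R | t] up to scale.
    xn = np.column_stack([pts2d, np.ones(n)]) @ np.linalg.inv(K).T
    X = np.column_stack([pts3d, np.ones(n)])
    A = np.zeros((2 * n, 12))
    for i in range(n):
        x, y = xn[i, 0], xn[i, 1]
        A[2 * i, 0:4] = X[i]
        A[2 * i, 8:12] = -x * X[i]
        A[2 * i + 1, 4:8] = X[i]
        A[2 * i + 1, 8:12] = -y * X[i]
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1].reshape(3, 4)
    # Project the left 3x3 block onto the nearest orthogonal matrix and fix scale.
    U, S, Vr = np.linalg.svd(M[:, :3])
    R = U @ Vr
    t = M[:, 3] / S.mean()
    if np.linalg.det(R) < 0:         # resolve the overall sign ambiguity
        R, t = -R, -t
    return R, t

# Synthetic test data (assumed values): intrinsics, control points, and a true pose.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
pts3d = np.random.default_rng(0).uniform(-1.0, 1.0, size=(10, 3))
a, b = 0.3, -0.2
Ry = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
Rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
R_true, t_true = Ry @ Rx, np.array([0.1, -0.2, 5.0])

pts2d = project(K, R_true, t_true, pts3d)
R_est, t_est = estimate_pose_dlt(K, pts3d, pts2d)
```

With exact correspondences the estimated pose should match the true pose up to numerical precision. With noisy image points, a linear solution of this kind is usually treated as an initial guess that is then refined by minimising the reprojection error.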

References

  • Linda G. Shapiro and George C. Stockman (2001). Computer Vision. Prentice Hall. ISBN 0-13-030796-3.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.