For drawing graphics in 3D, a camera is ubiquitous, and absolutely necessary. It provides a means for transforming a model’s vertices so that they appear in a specific orientation on the screen. A camera is also responsible for culling graphics, so that the CPU and GPU don’t have to draw graphics where they don’t need to, such as a graphic that is behind the camera, and is not normally visible. However, when it comes to 2D graphics, a camera is less essential, but for certain games and applications, having one can provide some really cool graphics techniques.
To start, let’s talk about how the 3D camera works, so that when we convert to 2D, we know what is going on. A 3D camera can have many configurations in different graphics engines, but the principles are the same: store a matrix that defines how to orient objects from the camera’s perspective. Many cameras use a position/target/up system, in which the camera stores a vector for position, a vector that points in the direction the camera is looking, and a vector that points to the direction considered “upwards” to the camera. These can be manipulated to turn and position the camera. Another system is to store the position as a vector and the orientation as a quaternion. Depending on what you are using the camera for, one of these systems can be better than the other. Other variables that a camera may have is a field of view, which is used when projecting a 3D scene onto a 2D surface and can be used to control zoom, a minimum and maximum clipping distance, which determine the range in which graphics are drawn, and an aspect ratio, which determines the width and height of the final projection.
All of these variables are used to calculate what is called a view/projection matrix, which is used to transform objects into camera/draw space. A view matrix is responsible for transforming 3D spacial coordinates, and works like your typical transform matrix, that is, by creating a series of transformations to convert the coordinates of objects as if the camera were at the origin of space instead of the usual arbitrary world origin. The projection matrix is responsible for mapping a 3D scene into 2 dimensions, effectively flattening the scene. The projection matrix is a bit useless when considering a 2D camera, since we don’t need to change the dimensionality of the scene.
For a 2D sprite camera, we can eliminate certain 3D complexities. We can store a 2D position, which defines the location of the center of the camera’s viewport in world coordinates. We also need a zoom variable, to change the size of the camera’s viewport, effectively scaling the scene by a certain amount. We can also store an orientation, if we want to. Since a 2D orientation only has one degree of freedom (that is, it can only change in one set of directions), we can use a single variable for this. This variable will indicate the size of the angle between the camera’s local x axis and the world x axis. We also don’t need much projection data. However, we do need one thing: the size of the screen we are drawing to. Why we need this is seen in the next few sections.
Just like my previous post on relativity when dealing with collisions, the concept of relativity shows up a lot when dealing with graphics and cameras. This will be useful when considering how to transform 2D graphics so that they display appropriately. Consider the graphical world that exists in your game as a set of three different worlds.
One world is the ultimate world, in which all of your points are defined from an arbitrary point. Any positional data you have is defined in world coordinates. Depending on the graphics engine you use, a common point to define the world origin is either the top-left corner or the bottom-left corner, which correlate directly to the top or bottom left corner of a computer’s monitor. The x axis will increase as it moves right, and the y axis will increase downwards or upwards respectively. In actuality, it doesn’t matter where you define your world origin, but for this article we will consider the origin in the top-left corner.
The second world is the one that the camera sees. We often call this camera space or a camera viewport. For our camera, we define the origin to be the center of the viewport. The y axis progresses downward from this center, and the x axis progresses to the left. Note that the axes aren’t always parallel to the world axes, since our camera has the ability to rotate.
The third world is that of the actual screen, or the computer’s monitor. This is often called screen space or simply the viewport. This world is the only one in which you don’t define your origin. Depending on your graphics engine, the origin can be either the top-left corner or the bottom-left corner. DirectX by default sets the origin at the top-left, while OpenGL sets it at the bottom-left. For this article, we will consider the DirectX case. For screen space, one coordinate is translated to the number of pixels over and down from the top-left of the screen. For example, the point (100, 150) in screen space directly translates to 100 pixels from the left and 150 pixels down on the screen.
So if we have three worlds, how to we get objects in one to be drawn in another? The key is transformations. Like in 3D, we will transform objects by rotating, scaling, and translating their coordinates before drawing them, so that they appear in the right place, as seen by the camera. We will start with the simplest case and build a general transformation from there.
The simplest transform is no transform. This means that the camera has a zoom level of one, an orientation of 0, and a position at the origin. However, there is a problem with this. Since we defined our camera’s position as the center of its space, while the screen space defines its origin at the top-left corner, objects positioned at the origin, which are supposed to be drawn in the center of the screen, are actually drawn at the top-left corner. So how do we fix this? Well, that is where our first transform comes in. So assume you want to draw an object in the center of the screen, and you set its position to be (0, 0). We need to change this location in a way so that it draws to the center of the screen. Well, since we know that the screen space origin is at the top-left hand corner, and if we know the screen’s width and height, we can easily find its center: C = (width / 2, height / 2). So when we have an object at (0, 0) and we want its position to be in the center of the screen, we simply add C to (0, 0) to get the center of screen space.
So, in order to transform everything into screen coordinates, we can define the transform to be:
Pc = Pw + C
where Pc is the position in screen coordinates, Pw is the position in world coordinates, and C is the center of the screen viewport, (width / 2, height / 2)
So good so far.
Now for the next tranform: translation. Translation is used to alter the position of an object. We are going to approach this from another special case of the camera. Consider the same camera as in the previous example, except that its position is no longer at the origin. Lets say that the camera’s location is at Sc = (x, y). This means that any object that is at location (x, y) should get drawn to the screen. So, what we want to do is change an object’s position from world coordinates to camera space coordinates, and then change the camera space coordinates to viewport coordinates. We know how to change camera coordinates to viewport coordinates if the camera is at the origin, as stated above, so what we are really trying to do is convert the object’s coordinates in such a way that the position of the camera becomes the origin. So, if an object is at position (x, y), then it is at position (0, 0) in camera coordinates (since (0, 0) is the center in camera coordinates. So, to convert the world position to the camera position, we can simply subtract the object’s world position from the camera’s world position. Then, we can convert the camera coordinates to viewport coordinates using the same equation as stated above.
So, the equation for translating an object’s location to screen coordinates is:
Pc = (Pw – Sc) + C
where Sc is the position of the camera in world coordinates.
Well, thats not too hard.
It gets a little more challenging when dealing with zoom. So, what is zoom, and how do we define it. Basically, a zoom level of one means that objects appear their actual size on the screen; that is, they are scaled by one, which doesn’t scale them at all. If we have a zoom level of two, objects appear twice their size on the screen. So, is there a way to define this in another way? It turns out there is. We can show that the zoom level is simply a ratio between the screen size and the camera size. This is simply saying that at a zoom level of two, the camera’s viewport is half the size of the screen’s viewport, which means that with width of the camera’s viewport is half of the width of the screen, and the hight of the camera’s viewport is half of the height of the screen. So, zoom becomes:
z = Sw / Cw = Sh / Ch
where z is the zoom, Sw and Sh are the width and height of the screen, and Cw and Ch are the width and height of the camera’s viewport.
So, what does it mean to have zoom? Well, we know that if the zoom is 2, objects appear twice their size on the screen, so we know we need to scale every object that we draw by the zoom. But zoom is also going to have an effect on distance, which defines our translation. Distances in camera space are twice as big as they are in world space, so we will have to factor that in when we apply our translation. So, our translation gets scaled by the zoom level in order to convert the object’s world position into camera coordinates:
Pc = z(Pw – Sc) + C
Well, that didn’t seem so hard.
Finally, we need to account for camera orientation. If the camera has an orientation of r, that means that the angle between it’s local x axis and the world x axis is r radians. You can think of this as imagining the camera viewport as a rectangle, and that rectangle has an orientation of r radians about its center. So, in order to convert an object’s world position into camera coordinates, we must account for this rotation; that is, we must rotate the object’s position around the camera’s center. However, to do this, we must rotate by the negative of r, because we want to effectively “rotate” the camera so that it has an orientation of 0 radians, and when we rotate the camera, everything else in the world moves with it.
In order to rotate a point around another point, we must rely on matrices. We can think of vectors as row matrices: so, the vector (x, y) becomes the matrix [x y]. This allows us to perform matrix multiplication on this vector. The matrix we want to multiply by is called a rotation matrix (The origin of this rotation matrix is beyond the scope of this article). A rotation matrix R for any rotation a, in radians, is defined as:
R = [cos(a) sin(a)]
[-sin(a) cos(a)]
So, we simply multiply the vector (x, y) by this matrix to give the new, rotated point:
[x' y'] = [x y] * R
This equates to:
x’ = x cos(a) – y sin(a)
y’ = x sin(a) + y cos(a)
So, what do we need this for? Well, in order to convert to camera coordinates from world coordinates, we need to rotate the world coordinates by the negative of the angle of the camera to get the camera coordinates. So, we can define the camera coordinates as Pw’, which can be calculated by:
Pw’.x = z(Pw.x – Sc.x) cos(-r) – z(Pw.y – Sc.y) sin(-r)
Pw’.y = z(Pw.x – Sc.x) sin(-r) + z(Pw.y – Sc.y) cos(-r)
So, now we know the camera coordinates, which have been translated by subtracting Sc from Pw, scaled by the zoom level z, and rotated by the negative of the camera rotation, r. The only thing left to do is convert the camera coordinates into screen coordinates, which was the first thing we did:
Pc = Pw’ + C
This can be expanded to:
Pc.x = z(Pw.x – Sc.x) cos(-r) – z(Pw.y – Sc.y) sin(-r) + C.x
Pc.y = z(Pw.x – Sc.x) sin(-r) + z(Pw.y – Sc.y) cos(-r) + C.y
So, once you find Pc, then you just draw your object to the screen at location Pc, with scale z and rotation -r.
In one of my previous posts, I talked about a few revelations that I had when developing a physics engine. Many of these ideas dealt with collision detection using an object’s velocity rather than an object’s position. This means that instead of waiting for two objects to intersect geometrically to register collisions, you can interpolate collisions using velocities. However, this bit of information is not particularly practical. The reasons for using intersection algorithms instead of interpolating velocities are numerous if the wrong algorithms for interpolation are used. It is definitely a lot easier to simply determine if a polygon is intersecting another polygon as opposed to trying to interpolate two entire polygons over a discrete time interval just to determine when they first collided. In order to effectively develop a physics engine based on these concepts, we have to study some history.
There once was a very smart man, who studied physics in much the same light that we will for this article. His name was Einstein and he was fundamental in redefining modern physics. Einstein’s strong suit was the topic of relativity. He often said that everything is relative, and that is in fact the case. Both space and time can be relative, and so can their derivatives of velocity and acceleration. This means that we can calculate velocities of objects with relation to other objects, and even calculate shapes of objects with relation to other objects. Using this, I will effectively convert a collision between two moving polygons into a collision between a stationary polygon and a moving particle.
The first step of this seemingly obscure and impossible attempt at converting a collision is by calculating relative velocities. First off, we must define what it means to be relative to something in space. For any object, a spacial coordinate will be defined, which defines the object’s position in space. This coordinate, or position, is actually simply a displacement from some chosen origin. This coordinate will be in what’s called “world space” or “world coordinates” if the chosen origin is the same for any other object we compare to. However, since the origin is chosen, we can make it anything we want. This is very beneficial for defining vertices of polygons, since we can define the location of each vertex in “local coordinates”, meaning that the position of the center of that polygon is the origin for our vertices. For example, a square with side length of 10 would have vertices in local coordinates of (-5, -5), (-5, 5), (5, -5), and (5, 5), which, if you draw these vertices on a Cartesian coordinate system, creates a square of side length 10 with a center exactly at the origin, or (0,0). Now say we know this square’s center has a position of (100, 100). In order to convert the vertices of the square from local coordinates to world coordinates, each vertex must be put under a linear translation. This translation is equal to the translation from the center of the square to the origin of the world that it is in, which happens to be (100, 100). So to convert the vertices of the square to world coordinates, we simply add the world position of the square’s center to the local coordinates of each vertex of the square, resulting in vertices of (95, 95), (105, 95), (95, 105), and (105, 105). If you were to draw each of these vertices on the same coordinate system as the previous set of vertices, you would see the same square of side length 10, but now it would have a center exactly at (100, 100). So, now we can convert any polygon into any coordinate space by just knowing the translation from that polygon’s center to the point that defines the origin of the coordinate space. We can take any point in the world as the origin of this space, and this is most useful when we take the center of another polygon as the origin. For example, if I have two circles, one at (50, 100) and another at (100, 100), I can set the origin of a new coordinate space as the center of the first square, at (50, 100). Then, I can subtract (50, 100) from each circle, and get two circles, one at (0, 0, and the other at (50, 0). This at least makes the math a little easier.
Since velocity is just displacement over time, we can do the same thing with velocity as we did with position, since our time change is usually the same for any object in the world. This allows us to convert two moving objects into one stationary object and one moving object by subtracting the velocity of one object from both. Mathematically, this looks like:
Va/b = Va – Vb
Vb/b = Vb – Vb = 0
where Va is object a’s velocity, Vb is object b’s velocity, Va/b is object a’s velocity with respect to object b, and Vb/b is object b’s velocity with respect to itself, which is always zero.
So now I can convert a collision of two moving objects into one moving object and one stationary object by simply altering the velocity of each object. This certainly makes interpolating a collision time and point easier, but it still seems nearly impossible, especially given the desire to support any type of polygon.
Fortunately, I can further simplify the problem into the collision between a stationary polygon and a moving point. By taking relative shapes, I can convert shape a into a point and shape b into a new polygon. However, as shown above, in order to do the same kind of relativity as shown above with velocity, I would need to subtract polygon b from both polygon a and polygon b. But isn’t it impossible to subtract two shapes? The answer is no.
We will have to dive back into history to do polygon arithmetic. We will now be talking about another really smart guy by the name of Hermann Minkouski, who came up with a way to do polygon arithmetic in what’s called the Minkouski sum. Minkouski said that in order to add to shapes together, say a square and a triangle, you would have to add every point in the square to every point in the triangle. Since the Minkouski sum is just a bunch of vector sums, we can multiply one by a negative one and get subtraction. This means that we can subtract two shapes by taking every point in the first and subtracting by every point in the second. This solves our problem quite nicely, since we can now use this to convert our collision into a nice, easy, point-shape collision. Mathematically, this looks like:
Sa/b = Sa – Sb
Sb/b = Sb – Sb = a point in space
where Sa is shape a, Sb is shape b, Sa/b is the shape of a with respect to b (called a Configuration Space Obstacle, or CSO), and Sb/b is the shape of b with respect to b, which is the geometric version of zero, which is simply a point in space.
Now that we can take relative velocities, meaning that I can convert two moving objects into one moving object and one stationary object, and we can take relative shapes, meaning that I can convert two polygons into one polygon and one point, I can combine the two to convert two moving objects into one stationary object and one moving point. This certainly makes collision much, much easier since the algorithms to interpolate the collision point and time of two moving objects boils down to a simple ray collision between a point and a polygon. Another thing to note is that by taking the Minkouski difference of two shapes, the shape that gets turned into a point is automatically turned into the origin as well, meaning its relative position is naturally (0, 0), which even further simplifies the algorithm.
Thanks to two very smart guys, Einstein and Minkouski, we can now convert every collision of two moving objects into a collision with a point located at (0, 0) and a polygon, which becomes a ray collision, where the ray starts at (0, 0) and travels in the direction of the relative velocity of the object. Once we find the point where this ray intersects the CSO, (this is the hard part), we can calculate the time of impact, using the d=vt equation (d = distance from collision point on the CSO to the origin, and v = the relative velocity). As an example, say I have two moving objects, object a (Oa), and object b (Ob). To find the relative velocity, I take:
Vrel = Va – Vb
which makes object b stationary and object a have a velocity relative to object b. To find the CSO, I would take:
CSO = Sb – Sa
which turns object a into a point. Now I find the intersection point. between the CSO and the ray with direction of Vrel and start point of (0, 0). Once I have this intersection point, I can take the distance from this point to the origin, and divide that by the magnitude of Vrel to get the exact time of impact of the collision. Another thing to note is that the intersection point on the CSO contains the points on object a and object b that made it (remember that the CSO is simply every point on object a subtracted from object b), which allows us to extract the exact point of impact on each object as well. Now compare this algorithm to the often extensive algorithms to find the exact same data by simply looking at a geometric intersection of two objects. Using this method, we avoid issues like tunneling, inter-penetration, and impulse bias, as well as providing easier ways to solve problems like the Linear Complementary Problem, as well as resting contacts (Vrel would simply be zero in the case of a resting contact).
However, there are still some problems that we still need to address. For example, it is very inefficient to actually calculate the CSO entirely, since the equation to do so becomes O(n * m) for subtracting a polygon with n vertices from one with m vertices, and it alos makes dealing with circles and other curved shapes nearly impossible. However, there are many simple algorithms that solve the problem, and create nearly O(1) time collision detection algorithms for each pair of colliding objects. I will talk about such algorithms in the next post. However, for those who cannot wait, you can look up the GJK algorithm was well as Gino van den Bergen’s paper on Ray Casting against General Convex Objects with Application to Continuous Collision Detection. Another great resource is from Squirrel Eiserloh’s GDC 2008 pwerpoint, which deals with this very issue of relativity and CCD, found here: GDC 08 – Physics Reframing
Collision detection is pretty much the bane of all physics engines. While most other aspects of a physics engine, such as forces, resolution, and geometries are fairly simple and can usually be expressed by a series of clever computer math, collision detection is not so easy. There are may aspects of it, such as broad-phase, narrow-phase, and contacts. Collision detection is also where most of the heavy work of a physics engine occurs, and if it is done wrong, it can often take just as long, if not longer, as a graphics engine to go through one cycle.
As lead programmer of a very small start-up game company, it is in fact my job to create a fairly lite but powerful 2D physics engine. Now, unlike graphics, physics is basically exactly the same for 3D games as it is for 2D games, except that the z coordinate in 2D games is always 0. This means that both types of physics suffer the same problems. As I am still fairly new to physics engine programming, I had been pulling out my hair at the horrors that my physics engine was creating, from glitches to poor design. After a few weeks of research, however, I finally found a really clever solution.
Continuous Collision Detection, CCD, is a fairly new way to think of collision. Since the large majority of game developers forgo designing proprietary physics engines in lieu of Havok or PhysX, many developers are totally unfamiliar with modern physics engine technology. And perhaps the biggest problem with this is that many developers automatically approach physics from a discrete view. Since computers are in fact discrete, this may make sense. All game developers understand things like an update loop and a frame rate. However, this is very bad for physics. In the real world, time is continuous, and physics happens every instant. In games, we can’t do that, so we approximate by using discrete methods. These methods involve Euler or Varlet integration, always assuming constant acceleration (this is perfectly fine in most cases). At every frame, objects are moved and then checked for overlap. If overlap is found, then they are adjusted. This is normally fine for slow moving objects, and most physical interactions will be easily solved, and it often looks perfectly fine. However, when dealing with very small objects that travel very fast (such as a bullet), tunneling can occur. This means that in one frame, the bullet will be in front of a wall, but not overlapping with that wall. In the next frame, which can occur anywhere from 16 milliseconds (60 fps) to 32 milliseconds (30 fps), the bullet will have “traveled” far enough that it is now on the other side of the wall. It still does not overlap. Hmm, this is a big problem.
The solution to this problem requires a series of simple, subtle ideas that will drastically change the way you think about a physics engine.
Idea 1: Objects can only travel so far in a frame, due to their velocity and acceleration. This means that instead of testing for positional overlap, you can create a bubble around the object that represents the maximum distance it could possibly travel in one frame. This is great for culling. Another great idea to apply to broad-phase algorithms is that an object can never collide with objects behind it. So instead of running broad phase collision tests on all objects relative to all other objects, you can eliminate about half of them by comparing their positions to the current object vs. the current object’s velocity. For narrow phase collision, you can create a more precise method to test for collision by creating a swept shape, which is just a shape that has been stretched along its velocity.
Idea 2: Everything in physics only occurs because of relativity.
Einstein was a very smart man, and he came up with the concepts of relativity. With these concepts, we can create really smart algorithms for our physics engine that are guaranteed to be accurate. Consider this: If you know that two objects are near each other in one frame, and that they could collide, but you don’t know if they do, you might try and solve an equation involving two variables, such as the movement of body a vs. body b. However, this is very difficult, if not impossible. However, Einstein says that motion is relative. So by that idea, you could convert one object into a stationary object, and then calculate the motion of the other object relative to the stationary object. Mathematically, this is done by choosing one object and subtracting its velocity vector from both objects’ velocity vectors, i.e. bodyA.Velocity -= bodyB.Velocity and bodyB.Velocity – bodyB.Velocity. Doing this results in a stationary bodyB, and a new relative velocity for body A. This can greatly simplify the amount of work that is needed to be done on the collision tests because you are simply testing to see if one body intersects a stationary object.
Idea 3: The Minkouski difference can make a difference.
Hermann Minkouski, another very smart man, came up with an arithmetic algorithm for geometric shapes. It may seem a little weird, but he actually created a way to add a square to a triangle. This is done by adding all of the vector points on one to the vector points on another. The resulting shape can be analyzed and several interesting properties about how the two shapes relate can be determined. In the same manner, the Minkouski difference (which is just the Minkouski sum with a negative shape) results in a cool shape with which we can analyze and determine properties. The most helpful is the fact that if two shapes are intersecting, the shape resulting from a Minkouski difference will always contain the origin (assuming the shapes were subtracted using world coordinates). We can manipulate this property a little bit, and come up with a really powerful algorithm that can accurately find the time of collision between two objects. Consider this: Taking the Minkouski difference basically converts one shape into a point, while the other shape gets the extra points from the first shape. Since linear kinematics and dynamics are generally not related to the shape of the body, we can consider the point as a particle, and the shape resulting from the Minkouski difference as, well, a shape. Using Idea 2, we can make the shape static, so that it does not move, and the partical dynamic, in which is velocity is adjusted to be relative to the shape. Then, you can use basic ray-polygon intersection functions to determine the point of intersection, and do a little math (distance = velocity * time) to determine the exact time of collision. From there, it is easy to take the original shapes and determine their points of contact, and then resolve them. Note: By default, the Minkouski difference has complexity of O(n^2), but three very smart people created the GJK algorithm, which is much faster than brute forcing the difference.
By combining this three ideas, a quick and effective collision detection engine can be created, one that could easily compete with Havok or PhysX.
I will have an actual tutorial on some of the algorithms up shortly, so keep an eye out for those.