Look, Ma, No Matrices!

https://github.com/enkimute/LookMaNoMatrices
A forward renderer without matrices.

Steven De Keninck
Computer Vision Group • University of Amsterdam

Putting PGA ($\mathbb R_{3,0,1}$) to the test!

Since the 2019 SIGGRAPH course [1], Geometric Algebra, and Euclidean PGA (plane-based or projective geometric algebra) in particular, has been gaining traction within the computer graphics and machine learning communities [2, 3, 4]. Despite its broad applicability, including for higher dimensional geometry and physics, its adoption in traditional 3D graphics has been limited, often merely re-branding a dual quaternion as a PGA motor. The ’Look, Ma, No Matrices!’ project aims to broaden PGA’s application by introducing a modern, glTF-compliant, forward-rendering 3D engine that fully integrates the PGA algebra.

In this write up, we will go over this project, highlighting the solutions and techniques that are required when moving to a PGA implementation. It was at times tempting to start from the existing techniques and attempt an 'algebra-level' translation. This however often leads to unsatisfactory solutions and in order for PGA to truly reach its potential a more fundamental revisit is often needed. Algebra without geometry, indeed, is blind.

Introduction.

Matrices are everywhere in computer graphics. In fact, there was a time when 4x4 matrices were both baked into the GPU and a mandatory part of all graphics API's. This project would then simply not have been possible. Today however, pushed in no small part by the advancements in AI, GPUs are highly programmable scalar processors, no longer tied into the long gone fixed function pipeline. Yet, 4x4 matrices are still omnipresent. And why should they not be? They can represent all linear transformations, and the typical forward graphics pipeline indeed involves both rigid and projective transformations. Seems like a good fit.

Quaternions are also everywhere in computer graphics. Turns out, matrices have less than ideal interpolation properties, and modern formats like Khronos' glTF use quaternions for all their rotation needs. Fantastic for animations, and generally considered worth the cost of the unavoidable conversions to and from matrices.

Out in the real world, however, the vast majority of matrices in your typical 3D engine setup are going to be orthogonal matrices, encoding just rotations and translations. And this is where the motor manifold of PGA comes in. At a lower computational and memory cost, PGA motors encode the full set of Euclidean motions, additionally offering conversion free inclusion of quaternions and dual quaternions.

This of course raises the question - can we replace all matrices in a typical forward renderer by their PGA equivalents ? Is a true matrix free setup possible or even desirable ? Only one way to find out ...

But before we start, a short disclaimer. This project aims to replace matrices without compromise. That, of course, is not how one should approach any engineering problem. As a reference for this project we are using the Khronos glTF viewer. This is a sensible choice, as no doubt others will use this as reference, but it does not attempt to be an optimal implementation. It uses 4x4 matrices in full, and the built-in glsl operators for them for most operations, when experienced graphics programmers know there are still wins there. The point here is not to make the most optimal implementation, especially in light of some of the findings here, the most optimal implementation is most likely a hybrid solution, subject to a future writeup!

FPGA : Fast PGA!

A full introduction to PGA is outside of the scope of this article, and we will assume the reader is familiar with at least the material in [1, 5, 6]. Instead, we will focus on the choices made for this particular implementation, and specifically work out in detail the basic operators that are needed, both for CPU and GPU.

Basics and Basis

The PGA algebra is generated by four basis vectors $\mathbf e_0$ to $\mathbf e_3$. The $\mathbf e_1, \mathbf e_2, \mathbf e_3$ vectors map to the $x=0, y=0, z=0$ planes respectively while the special degenerate $\mathbf e_0$ vector represents the plane at infinity. These four generators are then combined to form six bivectors, four trivectors and a single quadvector that together with the scalar represent all of the PGA elements. Our specific choice for basis and memory layout was selected to minimize conversions when handling typical graphics data. All of the elements of PGA are intricately connected, an overview of our choices and how they map to transformations and geometric elements is in the following table (where the second row denotes the square of each element) :

e23

e31

e12

e01

e02

e03

e0123

e032

e013

e021

e123

-1

plane-reflection

Motor / Dual Quaternion / Lie Group

point-reflection

Quaternion

plane
$a\mathbf e_1 + b\mathbf e_2 + c\mathbf e_3 + d\mathbf e_0 = 0$

Line through orig.

∞ line

point
$(x\mathbf e_1 + y\mathbf e_2 + z\mathbf e_3 + w\mathbf e_0)^*$

Line / Lie Algebra

vector

bivector

PSS

trivector

These choices translate to the following simple shader structures, where we opted to stay within the built-in types to retain addition, subtraction and scalar multiplication. (glsl does not support operator overloading for custom types).

#define motor     mat2x4  // [ [s, e23, e31, e12], [e01, e02, e03, e0123] ] 
#define line      mat2x3  // [ [e23, e31, e12], [e01, e02, e03] ]
#define point     vec3    // [ e032, e013, e021 ] implied 1 e123
#define direction vec3    // [ e032, e013, e021 ] implied 0 e123

Get your Geometric Products!

With our data structures defined, we can focus our attention on implementing the (subset) of PGA products we will need. Special attention is given to the composition and sandwich operators - the numerical efficiency of matrix-vector multiplication is well known, and as we will discover, some creativity is needed to get PGA up to par.

Composition of transformations.

The 8-float motors of PGA, isomorphic to the dual quaternions, naturally come with an efficient composition operator in the form of the geometric product. Recall that the product of two 4x4 matrices requires 64 multiplications and 48 additions. For two general PGA motors, their composition clocks in at just 48 multiplications and 40 additions. Working out the product at coefficient level produces the following implementation on the CPU:

// 48 mul, 40 add
export const gp_mm = (a,b,res=new baseType(8)) => {
  const a0=a[0],a1=a[1],a2=a[2],a3=a[3],a4=a[4],a5=a[5],a6=a[6],a7=a[7],
        b0=b[0],b1=b[1],b2=b[2],b3=b[3],b4=b[4],b5=b[5],b6=b[6],b7=b[7];
  res[0] = a0*b0-a1*b1-a2*b2-a3*b3;
  res[1] = a0*b1+a1*b0+a3*b2-a2*b3;
  res[2] = a0*b2+a1*b3+a2*b0-a3*b1;
  res[3] = a0*b3+a2*b1+a3*b0-a1*b2;
  res[4] = a0*b4+a3*b5+a4*b0+a6*b2-a1*b7-a2*b6-a5*b3-a7*b1;
  res[5] = a0*b5+a1*b6+a4*b3+a5*b0-a2*b7-a3*b4-a6*b1-a7*b2;
  res[6] = a0*b6+a2*b4+a5*b1+a6*b0-a1*b5-a3*b7-a4*b2-a7*b3;
  res[7] = a0*b7+a1*b4+a2*b5+a3*b6+a4*b1+a5*b2+a6*b3+a7*b0;
  return res;
}

This block of code, with some reshuffling and pattern matching can be written in glsl using dot and cross products as:

// 48 mul, 40 add    
motor gp_mm( motor a, motor b ) {
  return motor(
         a[0].x*b[0].x   - dot(a[0].yzw, b[0].yzw), 
         a[0].x*b[0].yzw + b[0].x*a[0].yzw + cross(b[0].yzw, a[0].yzw),
         a[0].x*b[1].xyz + b[0].x*a[1].xyz + cross(b[0].yzw, a[1].xyz) + cross(b[1].xyz, a[0].yzw) - b[1].w*a[0].yzw - a[1].w*b[0].yzw, 
         a[0].x*b[1].w + b[0].x*a[1].w + dot(a[0].yzw, b[1].xyz) + dot(a[1].xyz, b[0].yzw));
}

While already reasonably efficient, the above code block handles general motors and there are many scenarios where we deal with e.g. pure translations or rotations around the origin. In those scenarios many of the motor coefficients will be zero, and reworking the above code block to incorporate that is an easy task. For example, for the composition of two rotations around the origin we find we need only 16 multiplications and 12 additions:

// 16 mul, 12 add
motor gp_rr( motor a, motor b ) {
  return motor( a[0].x*b[0] + vec4( -dot(a[0].yzw, b[0].yzw), b[0].x*a[0].yzw + cross(b[0].yzw,a[0].yzw) ), vec4(0.) ); 
}

Our implementation provides these optimized versions for any combination of translation (_t), rotation around the origin (_r) and general motor (_m).

Operation	Multiplications	Additions
gp_mm	48	40
gp_rr	16	12
gp_tt	0	3
gp_rt / gp_tr	12	8
gp_rm / gp_mr	32	24
gp_tm / gp_mt	12	12

Transforming points

For the transformation of a point $p$ with a motor $M$, done in PGA with the sandwich product, the situation is more involved. $$ p' = M p \widetilde M $$ Working out these two geometric products naively results in a whopping 33 multiplications and 29 additions, or more than twice the 16 multiplications and 12 additions required for the matrix-vector equivalent. The reason for this is that this naive expansion does not take into account the fact that PGA motors satisfy $M\widetilde M = 1$. It is however not to difficult to incorporate this into our sandwich product. To do so, we suggest starting instead from the expression $$ p' = M p \widetilde M + p \cdot (1 - M\widetilde M)$$ Where the second term evaluates to zero for a normalized motor M. Evaluating this new expression at coefficient level allows us to reduce the operations needed to 21 multiplications and 18 additions (which for the isomorphic dual quaternions is the best known solution) :

// 21 mul, 18 add
point sw_mp( motor a, point b ) {
  direction t = cross(b, a[0].yzw)  - a[1].xyz;
  return  (a[0].x * t + cross(t, a[0].yzw) - a[0].yzw * a[1].w) * 2. + b;
}

Transforming directions

For directions, aka points at infinity, with an implied $\mathbf e_{123}$ coefficient of $0$, we can do a bit better still. Applying the same normalisation trick we find a solution that requires only 18 multiplications and 12 additions.

// 18 mul, 12 add
direction sw_md( motor a, direction b ) {
  direction t = cross(b, a[0].yzw);
  return  (a[0].x * t + cross(t, a[0].yzw)) * 2. + b;
}

Anticipating our needs when dealing with tangent spaces, we also work out the sandwich product on the basis directions. (as opposed to the general direction above). In that scenario, the computational cost can be reduced even further. In fact, if we produce an output normalized to $0.5$ instead of $1$ we can reduce the computational cost for the transformation of e.g. the x axis to an amazing 6 multiplications and 4 additions - about the cost of a default cross product:

// 6 mul, 4 add
direction sw_mx( motor a ) {
  return direction(
    0.5 - a[0].w*a[0].w - a[0].z*a[0].z, 
    a[0].z*a[0].y - a[0].x*a[0].w, 
    a[0].w*a[0].y + a[0].x*a[0].z
  );
}

This is an important observation, and as you will see will allow us to challenge the common belief that matrices are unconditionally the fastest choice ...

Normalization

The squared (pseudo)norm of a PGA motor $M$ is given by $$ \lvert M \rvert^2 = M \widetilde M = a + b\mathbf e_{0123}$$ For a normalized motor, $\lvert M \rvert = 1$, but in general the result of this expression is $a + b\mathbf e_{0123}$, a Study Number (a multivector whose non-scalar part squares to a scalar). As a consequence, the normalized motor $\overline M$, $$ \overline M = \cfrac {M} {\lvert M \rvert} $$ is a bit more involved to calculate. In 3D PGA it involves inverting a Study Number that here is isomorphic to a dual number. We've worked out the details for a number of algebras in [7], from which we only need the inverse square root formula in 3D PGA : $$\cfrac {1} {\lvert M \rvert} = \cfrac {1} {\sqrt{M \widetilde M}} = \cfrac{1}{\sqrt{a + b\mathbf e_{0123}}} = \cfrac{1}{\sqrt{a}} - \cfrac{b}{2{\sqrt{a}}^3}\mathbf e_{0123} $$ This leads to the following efficient implementation for 3DPGA:

// 21 mul, 5 add
motor normalize_m( motor a ) {
  float s = 1. / length( a[0] );
  float d = (a[1].w * a[0].x - dot( a[1].xyz, a[0].yzw ))*s*s;
  return motor(a[0]*s, a[1]*s + vec4(a[0].yzw*(s*d),-a[0].x*s*d));
}

Note that this procedure should be compared not to vector normalization, but instead to Gram-Schmidt orthogonalization, as the resulting motor is guaranteed to be an orthonormal transformation. As before, when we are dealing with a pure translation or rotation, far more efficient versions of the normalization procedure are available.

Square Roots

The square root plays an important role in PGA, as it is the key to constructing transformations between elements. Given any pair of points/lines/planes $a,b$, there exist a rigid transformation that moves $a$ onto $b$. Such a rigid transformation always has a motor form, and this motor is always given by the same simple expression : $$ M = \sqrt{ \cfrac {b} {a} } $$ Combine this with the fact that for any normalized non-null blade $a$ its inverse is $\pm a$, and we can rewrite this as $$ \pm M^2 = ba $$ Or, in other words, the geometric product $ba$ of any two points, two lines or two planes produces a motor that represents double the transformation from $a$ to $b$. The square root comes in to halve this result and indeed find the motor that moves $a$ exactly onto $b$. Here too, geometric algebra provides a single elegant method that universally applies : $$ \sqrt M = \overline{1 + M} $$ Here the overline denotes the Study-normalization procedure from our previous block. Hence the computational cost of a square root is exactly that of the normalization procedure with one extra addition.

// 21 mul, 6 add
motor sqrt_m( motor R ) {
  return normalize_m( motor( R[0].x + 1., R[0].yzw, R[1] ) ); 
}

Exponential map

To complete our PGA toolbox, we adapt, also from [7], efficient implementations of the logarithmic and exponential maps. Recall that the logarithm of a PGA motor is a (sum of) scaled lines, and similarly, any scaled line can be exponentiated to construct a rotation around it. While the exponential map for general 4x4 matrices is numerically very expensive, for our PGA motor manifold efficient closed form solutions are possible.

// 14 muls 5 add 1 div 1 acos 1 sqrt
line log_m( motor M ) { 
  if (M[0].x == 1.) return line( vec3(0.), vec3(M[1].xyz) );
  float a = 1./(1. - M[0].x*M[0].x), b = acos(M[0].x) * sqrt(a), c = a*M[1].w*(1. - M[0].x*b);
  return line( b*M[0].yzw, b*M[1].xyz + c*M[0].wzy);
}

// 17 muls 8 add 2 div 1 sqrt 1 cos 1 sin
motor exp_b( line B ) {
  float l = dot(B[0],B[0]);
  if (l==0.) return motor( vec4(1., 0., 0., 0.), vec4(B[1], 0.) );
  float a = sqrt(l), m = dot(B[0].xyz, B[1]), c = cos(a), s = sin(a)/a, t = m/l*(c-s);
  return motor( c, s*B[0], s*B[1] + t*B[0].zyx, m*s );
}

Inverses

If there's one place where Geometric Algebra sets itself clearly apart from our standard vector and matrix algebra approach, it is the existence of inverses for (multi) vectors. Not only do these inverses exist, but for the normalized objects we are working with in our context, they are very efficient to calculate.

Element $x$	Inverse $x^{-1}$
Plane	$x^{-1} = x$
Line	$x^{-1} = -x$
Point	$x^{-1} = -x$
Motor	$x^{-1} = \widetilde x$

Where $\widetilde x$, the reversion operation, changes the sign of the bivector and trivector coefficients only. There is one more inverse that is occasionally needed, which is the inverse of a general bivector $B$. Recall that a bivector $B$ only represents a single line iff $B \wedge B = 0$, the so called Plücker condition. If a bivector $B$ does not satisfy that requirement, it is no blade, i.e. not the result of meeting two planes or joining two points. For such an element the inverse is slightly more complicated.

To find this inverse, we start by multiplying with the reverse bivector from the right. $$ \cfrac {1} {B} = \cfrac {\widetilde B}{B \widetilde B}$$ As before, the squared norm of $\lvert B \rvert^2 = B \widetilde B = a + b \mathbf e_{0123}$, is a Study number, isomorphic to the dual numbers. This allows us to use the definition of a dual number inverse. $$ \cfrac {1} {(a + b \mathbf e_{0123})} = \cfrac {1} {a} - \cfrac {b} {a^2} \mathbf e_{0123}$$ Multiplying this last expression with $\widetilde B$ produces the inverse we are looking for.

Motor Factorization

Just as the process of factorizing matrices can be very insightful, so is the factorization of PGA motors. Two particular factorizations will be useful to us, and we will add them as the last tools to our box. The first of those is called the Euclidean Factorization, and it decomposes a motor into a rotation around the origin followed by a translation. $$ M = T_e R_e $$ This factorization is particularly easy to calculate, as the Euclidean rotor $R_e$ is simply the Euclidean part of our motor - the first four floats - isomorphic to a regular quaternion. If it is needed, the translation $T_e$ can be computed as $T_e = M \widetilde R_e$

The second factorization of interest is the so called Invariant factorization. It decomposes a motor $M$ into a commuting translation and rotation, which is always possible and generally known in 3D as the Mozzi-Chasles theorem. You may have heard of it as every rigid body transformation can be decomposed into a rotation around a line preceded or followed by a translation along the same line. $$ M = TR = RT$$ In 3D PGA, the invariant factorization is also easy to calculate, with the commuting translation given by $$ T = 1 + \cfrac {\langle M \rangle_4} {\langle M \rangle_2}$$ Where the angle brackets denote the grade extraction, and the general bivector inverse from above comes in handy. The matching rotation can now be constructed as $R = M\widetilde T = \widetilde TM$.

We will in particular use the Euclidean factorization when composing the transformation of the tangent frame with that of the object to world motor, as such a frame is invariant to translations and the composition of rotations around the origin is more efficient.

Escaping the matrix

The prevalence of matrices in computer graphics means that interacting with existing content inevitably will confront you with matrices. The Khronos glTF project from which we have started uses matrices throughout, for transformations, binding poses for skinning etc. Our commitment to a matrix-free environment implies we will have to convert these matrices to their PGA equivalents at load time.

Converting matrices to motors.

To convert a 4x4 orthogonal matrix to a motor, we happily employ the isomorphism to quaternions and upgrade an industry standard solution to handle the entire PGA manifold.

export const fromMatrix3 = M => {
  // Shorthand.
  var [m00,m01,m02,m10,m11,m12,m20,m21,m22] = M;
  
  // Quick scale check 
  const scale = [hypot(m00,m01,m02),hypot(m10,m11,m12),hypot(m20,m21,m22)];
  if (abs(scale[0]-1)>0.0001 || abs(scale[1]-1)>0.0001 || abs(scale[2]-1)>0.0001) {
    const i = scale.map(s=>1/s);
    m00 *= i[0]; m01 *= i[0]; m02 *= i[0];
    m10 *= i[1]; m11 *= i[1]; m12 *= i[1];
    m20 *= i[2]; m21 *= i[2]; m22 *= i[2];
    if (abs(scale[0]/scale[1]-1)>0.0001 || abs(scale[1]/scale[2]-1)>0.0001) console.warn("non uniformly scaled matrix !", scale);
  }  
  
  // Return a pure rotation (in motor format)
  return normalize(   m00 + m11 + m22 > 0 ? [m00 + m11 + m22 + 1.0, m21 - m12, m02 - m20, m10 - m01, 0,0,0,0]:
                   m00 > m11 && m00 > m22 ? [m21 - m12, 1.0 + m00 - m11 - m22, m01 + m10, m02 + m20, 0,0,0,0]:
                                m11 > m22 ? [m02 - m20, m01 + m10, 1.0 + m11 - m00 - m22, m12 + m21, 0,0,0,0]:
                                            [m10 - m01, m02 + m20, m12 + m21, 1.0 + m22 - m00 - m11, 0,0,0,0]);
}

export const fromMatrix = M => {
  // Shorthand.
  var [m00,m01,m02,m03,m10,m11,m12,m13,m20,m21,m22,m23,m30,m31,m32,m33] = M;
  
  // Return rotor as translation * rotation
  return gp_mm( [1,0,0,0,-0.5*m30,-0.5*m31,-0.5*m32,0], fromMatrix3([m00,m01,m02,m10,m11,m12,m20,m21,m22]) );
}

These conversions are run on all of the matrices in our imports, at load time.

Handling uniform scaling.

PGA motors, in contrast with 4x4 matrices, do not incorporate scaling, as it clearly is not a rigid body transformation. Scaling, and specifically uniform scaling is however commonly used in scene graphs to scale resources coming from potentially different sources and authored in different absolute sizes. While less than $0.5$% of the almost 400 random glTF files tested had any animation on the scale, quite a large number of them has some fixed uniform scale applied. The advantage of uniform scaling is that it is invariant to both rotations and translations, and as a result it requires only one floating point number per node, where each element's total scale is simply the product of its own scale and that of its parent. Our implementations tracks scaling in this manner, applying the total scale to the vertices either at load time or as first step in the vertex shader, and applying the parent scale to the translations, again at load time and when updating animations. The impact of incorporating uniform scaling like this is absolutely minimal, enabling us to cover almost all existing content without abandoning the PGA motor efficiency.

Handling non-uniform scaling.

For non uniform scaling the situation is trickier - in the scenario where non-uniform scaling is used, ultimately a fall-back to 4x4 matrices is unavoidable. A non-uniform scale is not invariant to rotations, and tracking these scales as we did in the uniform case is tedious. However, again from our sample of glTF files, we could only find non-uniform scale applied on leaf nodes. (and given the problems caused by non-uniform scaling, this is not unexpected). For such a scenario, animation keys are not impacted and we simply apply the non-uniform scale separately before the rest of the transformations.

Forward Rendering.

Armed with our fully stocked PGA toolbox, we can now tackle the actual rendering task. Guided by the reference implementation provided by Khronos, let us revisit those places where matrices are the de facto solution.

The general idea of a forward renderer, is to transform all mesh geometry, and determining for each triangle which pixels it covers. This is to be contrasted with a ray tracing approach where one starts from a ray through a pixel and determines which triangles it hits. In a typical forward rendering setup the transformation of the mesh from its specification in object space to its position on the screen is usually handled by a set of matrices called the model, view and projection matrices.

Model - View - Projection.

Our conversion at load time of all matrices and transformations into PGA motors, is already a substantial optimization on the amount of computation needed to update the scene graph hierarchy. For complex setups many composition operators are required, and the gain of switching to motors is obvious.

However, while the CPU is concerned with producing updated transformations, the GPU has the task of applying these transformations to the vertices, normals and tangents that make up our mesh, and as we've seen, the computational complexity involved appears to make our motors a disadvantage.

As we will soon discover, the situation is more subtle, and at this point we push through, replacing the model and view matrices with motors, and using the above defined sandwich products to transform the incoming vertex attributes.

vec3 worldPosition = sw_mp( toWorld, attrib_position );
vec3 worldNormal   = sw_md( toWorld, attrib_normal );      
vec4 worldTangent  = vec4(sw_md( toWorld, attrib_tangent.xyz ), attrib_tangent.w);

For the projection matrix, the situation is different. The typical 4x4 projection matrix has only 5 non-zero entries, and even without PGA it is much more performant to simply write out the resulting expression. The same simply holds here and we use a standard projection function for this.

vec4 project( const float n, const float f, const float minfov, float aspect, vec3 inpos ){
  float cthf = cos(minfov/2.0) / sin(minfov/2.0);              // cotangent of half the minimal fov.
  float fa = 2.*f*n/(n-f), fb = (n+f)/(n-f);                   // all of these can be precomputed constants.
  vec2 fit = cthf * vec2(-1.0/aspect, 1.0);                    // fit vertical.
  return vec4( inpos.xy * fit, fa - fb*inpos.z, inpos.z );
}

With our basic transformations all setup, let us turn our attention to one of todays most common shading techniques, tangent space normal mapping.

Tangent Space Normalmapping.

Vertex Shader

For a tangent space normal-mapped mesh, the vertex shader needs to transform the position, the normal, and the tangent vector. So it seems unavoidable that our choice for PGA means we are incurring the higher transformation cost threefold. However, the normal, tangent and bitangent vector together form an orthonormal frame. In PGA, any orthonormal frame is related to the canonical basis frame through a $k$-reflection. When $k$ is even, these are just the rotation-only motors we encountered before (isomorphic to the quaternions), and when $k$ is odd, this k-reflection instead represents a similar rotation followed by one extra reflection.

This implies that we can remove both the normal and the tangent vectors from our vertex description, replacing them instead by a tangentRotor, which represents the rotation from the basis frame to the desired tangent frame. Such a tangentRotor $R$ in fact double-covers all possible tangent frames in the sense that both $R$ and $-R$ produce the same transformation. We can use this double covering to disambiguate even and odd k-reflections, simply by making sure the sign of the scalar coefficient of $R$ matches the classical handedness flag. Note that in doing so, we piggy-back on the IEE754 floating point specification, that is we depend on the signed representation of zero. In the vertexshader we can then unambiguously extract the original sign using

float handedness = sign(1/tangentRotor.x)

Combining all this, we conclude that we can reduce our vertex descriptor for the most common tangent space normalmapping setup from 12 floats (3 position, 3 normal, 4 tangent, 2 uv) down to 9 (3 position, 4 tangentRotor, 2 uv). That is a substantial save, which is implemented at load-time, converting loaded normal and tangent vectors with:

// Normalize, Orthogonalize
normal  = normalize( normal );
tangent = normalize( sub(tangent, mul(normal, dot(normal,tangent) ) ) );
// Calculate the bitangent.
let bitangent = normalize(cross(normal, tangent));
// Now setup the matrix explicitly.
let mat = [...tangent, ...bitangent, ...normal];
// Convert to motor and store.
let motor = fromMatrix3( mat );
// Use the double cover to encode the handedness.
// in GA language, this means we are using half of the double cover to distinguish even and odd versors.
if (Math.sign(motor[0])!=tangents[i*4+3]) motor = motor.map(x=>-x);

But there is more good news. Recall that using 4x4 matrices, the transformation of position, normal and tangent includes 3 matrix vector products, totaling 48 multiplications and 36 additions. In the PGA version, we can however transform the entire tangent frame in one go, for a cost of just 16 multiplications and 12 additions. After which we can in fact extract the world-space normal and tangent directly with just 9 multiplications and 8 additions using :

// 9 muls, 8 adds
void extractNormalTangent( motor a, out direction normal, out direction tangent ) {
  float yw = a[0].y * a[0].w;
  float xz = a[0].x * a[0].z;
  float zz = a[0].z * a[0].z;

  normal  = direction( yw - xz, a[0].z*a[0].w + a[0].y*a[0].x, 0.5 - zz - a[0].y*a[0].y );
  tangent = direction( 0.5 - zz - a[0].w*a[0].w, a[0].z*a[0].y - a[0].x*a[0].w, yw + xz );
}

Add to that the 21 multiplications and 18 additions needed to transform the vertex position, and another multiplication to extract the handedness, and we come to the remarkable conclusion that in order to transform a vertex with full tangent frame to world space, our PGA approach needs only (16 + 9 + 21 + 1 = 47) multiplications and (12 + 8 + 18 = 38) additions. That is nearly identical to the 48 multiplications and 36 additions that would be required if we use 4x4 matrices and normal and tangent vectors instead.

PGA motors can be just as fast as 4x4 matrices to transform your mesh vertices !!!

method	floats/vertex	floats/transform	multiplications	additions
Matrix + normal + tangent	12	32	48	36
Motor + tangentRotor	9 -25%	8 -75%	47	38

The resulting code block in the vertexshader now becomes

// Now transform our vertex using the motor from object to world-space.
worldPosition = sw_mp(toWorld, attrib_position);

// Concatenate the world motor and the tangent frame.
motor tangentRotor = gp_rr( toWorld, motor(attrib_tangentRotor,vec4(0.)) );

// Next, extract world normal and tangent from the tangentFrame rotor.
extractNormalTangent(tangentRotor, worldNormal, worldTangent.xyz);
worldTangent.w = sign(1.0 / attrib_tangentRotor.x); // trick to disambiguate negative zero!

At this point, no changes to the fragment shader are required, making this a drop-in replacement that can be used in any existing engine.

Fragment Shader

If we want to be able to load existing content, this is the point where we have to resort back to the TBN matrix. The reason for this is clear. When baking the high detail mesh onto the low detail mesh, the baking tool has interpolated vertex normal and tangent vectors over the face of each triangle. From these (no longer normalized or orthogonal) vectors, at each fragment, an orthogonal TBN matrix is constructed, and used to transform the high detail world space normal to the tangent space normal that is stored in the texture.

This process of interpolating basis vectors introduces an error that is typical for matrices, and unfortunately this error is thus literally baked into the textures. This is why we opted to indeed extract the normal and tangent vectors explicitly from the tangentRotor.

However, for scenarios where one controls the baking tool, we can do better still. In these cases we could just pass the tangentRotor unmodified to the fragmentShader, where it can be normalized and used to transform the sampled normal, without ever constructing a TBN matrix. In this scenario, we would save even more, removing the need to extract normal and tangent in the vertex shader, requiring one less varying parameter, and removing the need for expensive orthogonalization in the fragment shader.

Motor Skinning.

With PGA motors isomorphic to the dual quaternions, skinning is an obvious candidate for our PGA approach. After converting inverse bind matrices to their motor equivalent, the skinning code for our motors follows the well known pattern from dual quaternion skinning :

// Grab the 4 bone motors.
motor b1 = motors[int(attrib_joints.x)];
motor b2 = motors[int(attrib_joints.y)];
motor b3 = motors[int(attrib_joints.z)];
motor b4 = motors[int(attrib_joints.w)];

// Blend them together, always use short path.
motor r = attrib_weights.x * b1;
if (dot(r[0],b2[0])<=0.0) b2 = -b2;
r += attrib_weights.y * b2;
if (dot(r[0],b3[0])<=0.0) b3 = -b3;
r += attrib_weights.z * b3;
if (dot(r[0],b4[0])<=0.0) b4 = -b4;
r += attrib_weights.w * b4;
    
// Now renormalize and combine with object to world
toWorld = gp(toWorld, normalize_m(r));

Note how just as for dual quaternions, we make sure that any transformation that is blended in follows the shortest arc, and renormalize the resulting transformation.

Animation Blending

For animation blending, the same technique is used, directly blending and renormalizing PGA motors on the CPU.

Conclusion

This project started off with the goal to demonstrate it is indeed possible to implement a forward renderer using PGA exclusively. What we found is that, not only is this possible, the common understanding that this would come at the cost of more expensive transformations turns out to be much more subtle. The resulting improvements are both unexpected and quite spectacular, especially on the memory footprint where an extra 33% vertices in the same storage is quite a significant improvement. This technique can readily be deployed into other existing 3D engines, at virtually no cost on the vertex shader and without modifications to the rest of the pipeline.

References

Geometric Algebra and Computer Graphics. Charles Gunn & Steven De Keninck. https://dl.acm.org/doi/10.1145/3305366.3328099
n-Dimensional Rigid Body Mechanics. Marc Ten Bosch. SIGGRAPH2020. https://marctenbosch.com/ndphysics/NDrigidbody.pdf
Geometric Clifford Algebra Networks. David Ruhe & co. https://doi.org/10.48550/arXiv.2302.06594
Geometric Algebra Transformers. Johann Brehmer & co. https://arxiv.org/pdf/2305.18415.pdf
Plane-based Geometric Algebra for Computer Science. Leo Dorst & Steven De Keninck. https://bivector.net/PGA4CS.html
May the Forque be with you. Leo Dorst & Steven De Keninck. https://bivector.net/PGADYN.html
Normalization, square roots, and the exponential and logarithmic maps in geometric algebras of less than 6D. Steven De Keninck & Martin Roelfs. http://dx.doi.org/10.1002/mma.8639