Functors, Naturality, Monoids, and Chain Rule

The problem this chapter solves is:

After seeing individual ML arrows, you need names for common structures that appear across many systems: mapping inside containers, converting containers consistently, combining traces, and composing local derivatives.

The previous chapters were mostly about one pipeline. This chapter zooms out from that pipeline and asks which shapes keep appearing even when the concrete data changes. Once you can see those repeated shapes, the category-theory vocabulary stops feeling like a separate subject. It becomes a set of names for ordinary engineering moves.

This chapter covers four patterns:

Functor
NaturalTransformation
Monoid
Chain rule

They are not separate from the ML pipeline.

They explain patterns you already saw:

  • mapping over many examples
  • converting one wrapper shape to another
  • combining pipeline traces
  • composing gradients through layers

Reader orientation: This chapter is more abstract than the previous ones. Read each section in this order: first the Rust mechanism, then the ML use, then the category-theory name. The names are not decoration; they are compression for patterns that appear repeatedly in real model code.

What You Already Know

If you have mapped over a Vec, handled an Option, appended logs, or applied the chain rule in calculus, you already know informal versions of this chapter. The new work is to name those repeated shapes and connect them to the same typed pipeline discipline used earlier.

Source Snapshots

src/structure.rs covers functors, natural transformations, and monoids.

Source snapshot: src/structure.rs
/// A minimal functor interface for this tutorial.
pub trait Functor<A, B> {
    type WrappedA;
    type WrappedB;

    fn fmap<F>(wrapped: Self::WrappedA, f: F) -> Self::WrappedB
    where
        F: Fn(A) -> B;
}

pub struct VecFunctor;

impl<A, B> Functor<A, B> for VecFunctor {
    type WrappedA = Vec<A>;
    type WrappedB = Vec<B>;

    fn fmap<F>(wrapped: Vec<A>, f: F) -> Vec<B>
    where
        F: Fn(A) -> B,
    {
        wrapped.into_iter().map(f).collect()
    }
}

pub struct OptionFunctor;

impl<A, B> Functor<A, B> for OptionFunctor {
    type WrappedA = Option<A>;
    type WrappedB = Option<B>;

    fn fmap<F>(wrapped: Option<A>, f: F) -> Option<B>
    where
        F: Fn(A) -> B,
    {
        wrapped.map(f)
    }
}

/// A structure-preserving conversion between wrappers.
pub trait NaturalTransformation<A> {
    type From;
    type To;

    fn transform(from: Self::From) -> Self::To;
}

/// Natural transformation from `Vec<A>` to `Option<A>` by taking the first item.
pub struct VecToFirstOption;

impl<A> NaturalTransformation<A> for VecToFirstOption {
    type From = Vec<A>;
    type To = Option<A>;

    fn transform(from: Vec<A>) -> Option<A> {
        from.into_iter().next()
    }
}

pub fn naturality_square_holds_for_first_option() -> bool {
    let xs = vec![1, 2, 3];
    let f = |x| x * 10;

    let path_top_then_right = VecToFirstOption::transform(VecFunctor::fmap(xs.clone(), f));
    let path_left_then_bottom = OptionFunctor::fmap(VecToFirstOption::transform(xs), f);

    path_top_then_right == path_left_then_bottom
}

pub trait Monoid: Sized {
    fn empty() -> Self;
    fn combine(&self, other: &Self) -> Self;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TraceStep(&'static str);

impl TraceStep {
    pub fn new(name: &'static str) -> Self {
        Self(name)
    }

    pub fn name(&self) -> &'static str {
        self.0
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PipelineTrace(Vec<TraceStep>);

impl PipelineTrace {
    pub fn from_steps(steps: impl IntoIterator<Item = TraceStep>) -> Self {
        Self(steps.into_iter().collect())
    }

    pub fn names(&self) -> Vec<&'static str> {
        self.0.iter().map(TraceStep::name).collect()
    }
}

impl Monoid for PipelineTrace {
    fn empty() -> Self {
        PipelineTrace(vec![])
    }

    fn combine(&self, other: &Self) -> Self {
        let mut combined = self.0.clone();
        combined.extend_from_slice(&other.0);
        PipelineTrace(combined)
    }
}

pub fn monoid_laws_hold_for_pipeline_trace() -> bool {
    let a = PipelineTrace::from_steps(vec![TraceStep::new("embedding")]);
    let b = PipelineTrace::from_steps(vec![TraceStep::new("linear")]);
    let c = PipelineTrace::from_steps(vec![TraceStep::new("softmax")]);
    let identity = PipelineTrace::empty();

    let left_identity = identity.combine(&a) == a;
    let right_identity = a.combine(&identity) == a;
    let associativity = a.combine(&b).combine(&c) == a.combine(&b.combine(&c));

    left_identity && right_identity && associativity
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn naturality_square_commutes() {
        assert!(naturality_square_holds_for_first_option());
    }

    #[test]
    fn pipeline_trace_obeys_monoid_laws() {
        assert!(monoid_laws_hold_for_pipeline_trace());
    }
}

src/calculus.rs keeps the chain-rule example deliberately small.

Source snapshot: src/calculus.rs
use crate::error::{CtError, CtResult};

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Scalar(f32);

impl Scalar {
    pub fn new(value: f32) -> CtResult<Self> {
        if !value.is_finite() {
            return Err(CtError::InvalidLoss(value));
        }

        Ok(Self(value))
    }

    pub fn value(&self) -> f32 {
        self.0
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LocalGradient(f32);

impl LocalGradient {
    pub fn new(value: f32) -> CtResult<Self> {
        if !value.is_finite() {
            return Err(CtError::InvalidLoss(value));
        }

        Ok(Self(value))
    }

    pub fn value(&self) -> f32 {
        self.0
    }
}

#[derive(Debug, Clone, Copy)]
pub struct MulOp;

impl MulOp {
    pub fn forward(&self, x: Scalar, y: Scalar) -> CtResult<Scalar> {
        Scalar::new(x.value() * y.value())
    }

    /// Given upstream gradient dL/dz, return `(dL/dx, dL/dy)` for `z = x * y`.
    pub fn backward(
        &self,
        x: Scalar,
        y: Scalar,
        upstream: LocalGradient,
    ) -> CtResult<(LocalGradient, LocalGradient)> {
        let dz_dx = y.value();
        let dz_dy = x.value();

        Ok((
            LocalGradient::new(upstream.value() * dz_dx)?,
            LocalGradient::new(upstream.value() * dz_dy)?,
        ))
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn multiply_backward_returns_local_chain_rule_gradients() -> CtResult<()> {
        let mul = MulOp;
        let x = Scalar::new(2.0)?;
        let y = Scalar::new(3.0)?;
        let upstream = LocalGradient::new(1.0)?;
        let (dl_dx, dl_dy) = mul.backward(x, y, upstream)?;

        assert_eq!(dl_dx.value(), 3.0);
        assert_eq!(dl_dy.value(), 2.0);
        Ok(())
    }
}

The Whole Structure File

src/structure.rs defines:

Functor<A, B>
VecFunctor
OptionFunctor
NaturalTransformation<A>
VecToFirstOption
Monoid
TraceStep
PipelineTrace

Each block gives a Rust handle to one abstract pattern.

Worked Example: Mapping Over A Vector

Before reading the traits, start with the plain Rust operation that motivates them. Mapping over a vector means taking each item out, applying a function, and collecting the new values into another vector:

let values = vec![1, 2, 3];
let doubled: Vec<i32> = values.into_iter().map(|x| x * 2).collect();

assert_eq!(doubled, vec![2, 4, 6]);

There is no category theory hidden in that snippet. It is just ordinary Rust. The category-theory word functor appears when we notice the reusable shape: the code changes the contents while preserving the surrounding container.

Self-Check

Before reading the Functor trait, explain what stayed the same and what changed in the Vec mapping example.

Functor<A, B>

The problem this block solves is:

The code needs a name for “apply a function inside a wrapper while keeping the wrapper shape.”

First principle: a trait is a contract. It says, “any type that implements this trait must provide these associated types and this method.” Here the contract is small on purpose. It does not try to model every possible functor in mathematics; it gives this tutorial one precise place to talk about fmap.

The block:

/// A minimal functor interface for this tutorial.
pub trait Functor<A, B> {
    type WrappedA;
    type WrappedB;

    fn fmap<F>(wrapped: Self::WrappedA, f: F) -> Self::WrappedB
    where
        F: Fn(A) -> B;
}

Rust Syntax

Functor<A, B> is a trait. A trait is Rust’s way to name behavior that many types can implement.

Here, the behavior is:

map a function through some wrapper shape

A and B are generic type parameters:

A = input item type
B = output item type

Generic means the trait is not tied to one concrete type like i32 or String. The same trait can describe Vec<i32> -> Vec<String>, Option<TokenId> -> Option<Embedding>, or any other pair of item types.

It has associated types:

type WrappedA;
type WrappedB;

Associated types are type names chosen by each implementation of the trait. They let the trait say:

every implementer must tell us what wrapped input and wrapped output mean

For VecFunctor, those associated types become Vec<A> and Vec<B>.

For OptionFunctor, they become Option<A> and Option<B>.

The method:

fn fmap<F>(wrapped: Self::WrappedA, f: F) -> Self::WrappedB
where
    F: Fn(A) -> B;

means:

Given a wrapped A and a function A -> B, produce a wrapped B.

The where clause is a readable place to put a bound. The bound:

F: Fn(A) -> B

means F must be callable like a function that consumes an A and returns a B.

Here is the real crate API in the smallest useful form:

use category_theory_transformer_rs::{Functor, VecFunctor};

let lengths = VecFunctor::fmap(vec!["cat", "rust"], |word| word.len());

assert_eq!(lengths, vec![3, 4]);

What to notice: The call names the structure once: VecFunctor::fmap. The closure only describes the item-level operation: &str -> usize. The vector shape is handled by the functor implementation.

ML Concept

In ML, you often apply the same transformation across a structure:

map preprocessing over a batch
map token conversion over a sequence
map loss computation over examples

The wrapper might be:

Vec
Option
Result
batch tensor

The idea is:

transform the contents, preserve the surrounding structure

Category-Theory Concept

A functor maps:

objects -> objects
morphisms -> morphisms

while preserving identity and composition.

This tutorial’s trait is deliberately small. It focuses on the practical fmap operation.
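
Those two laws can be spot-checked with the crate's own VecFunctor. This is a minimal sketch, not a proof; it only confirms the laws for one concrete vector:

use category_theory_transformer_rs::{Functor, VecFunctor};

// Identity law: mapping the identity function changes nothing.
let xs = vec![1, 2, 3];
assert_eq!(VecFunctor::fmap(xs.clone(), |x| x), xs);

// Composition law: fmap of (g after f) equals fmap f, then fmap g.
let f = |x: i32| x + 1;
let g = |x: i32| x * 2;
let composed = VecFunctor::fmap(xs.clone(), |x| g(f(x)));
let sequenced = VecFunctor::fmap(VecFunctor::fmap(xs, f), g);
assert_eq!(composed, sequenced);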

VecFunctor

The problem this block solves is:

Demonstrate fmap for lists.

The block:

pub struct VecFunctor;

impl<A, B> Functor<A, B> for VecFunctor {
    type WrappedA = Vec<A>;
    type WrappedB = Vec<B>;

    fn fmap<F>(wrapped: Vec<A>, f: F) -> Vec<B>
    where
        F: Fn(A) -> B,
    {
        wrapped.into_iter().map(f).collect()
    }
}

Rust Syntax

VecFunctor is a unit struct. It stores no state.

The impl block is a trait implementation:

impl<A, B> Functor<A, B> for VecFunctor

Read it as:

for every A and B, VecFunctor knows how to behave as Functor<A, B>

The implementation chooses the associated types:

WrappedA = Vec<A>
WrappedB = Vec<B>

The method consumes the vector:

wrapped.into_iter()

maps the function over every item:

.map(f)

and collects the result:

.collect()

The runnable companion example uses the same real crate API:

use category_theory_transformer_rs::{Functor, VecFunctor};

let token_ids = vec![1, 2, 3];
let shifted = VecFunctor::fmap(token_ids, |id| id + 100);

assert_eq!(shifted, vec![101, 102, 103]);

ML Concept

If you have a batch of token IDs:

[TokenId(1), TokenId(2), TokenId(3)]

and a function:

TokenId -> Vector

mapping over the batch gives:

[Vector, Vector, Vector]

That is the same shape as VecFunctor.

Category-Theory Concept

Vec is list-like structure.

Mapping preserves the list shape:

List A -> List B

The output keeps the same length and order as the input.

OptionFunctor

The problem this block solves is:

Demonstrate the same functor idea for optional values.

The block:

pub struct OptionFunctor;

impl<A, B> Functor<A, B> for OptionFunctor {
    type WrappedA = Option<A>;
    type WrappedB = Option<B>;

    fn fmap<F>(wrapped: Option<A>, f: F) -> Option<B>
    where
        F: Fn(A) -> B,
    {
        wrapped.map(f)
    }
}

Rust Syntax

The wrapper types are:

Option<A>
Option<B>

The implementation delegates to Rust’s built-in:

wrapped.map(f)

If the value is Some(a), it becomes Some(f(a)).

If the value is None, it stays None.

The important beginner point is that Option makes absence explicit in the type. You cannot accidentally treat a missing value as a real value without handling the None case.

use category_theory_transformer_rs::{Functor, OptionFunctor};

let present = OptionFunctor::fmap(Some(7), |value| value * 2);
let missing = OptionFunctor::fmap(None::<i32>, |value| value * 2);

assert_eq!(present, Some(14));
assert_eq!(missing, None);

ML Concept

Optional values appear when data may be absent:

maybe first token
maybe cached embedding
maybe resolved department

Mapping over Option lets you transform present values without inventing a fake value for missing ones.

Category-Theory Concept

Option is a context representing possible absence.

fmap lifts:

A -> B

to:

Option<A> -> Option<B>
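
As a sketch of that lifting, here is a small helper built on the crate's fmap. The lift function itself is illustrative, not part of the crate:

use category_theory_transformer_rs::{Functor, OptionFunctor};

// Hypothetical helper: lift a plain i32 function into the Option context.
fn lift(f: impl Fn(i32) -> i32 + Copy) -> impl Fn(Option<i32>) -> Option<i32> {
    move |wrapped| OptionFunctor::fmap(wrapped, f)
}

let double_inside = lift(|x| x * 2);

assert_eq!(double_inside(Some(4)), Some(8));
assert_eq!(double_inside(None), None);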

Conceptual Extension: Distribution<T>::map

The problem this block solves is:

A probabilistic value may contain many possible outcomes. Sometimes you want to transform every possible outcome while keeping its probability attached.

The current crate’s Distribution in src/domain.rs is a concrete validated probability vector:

pub struct Distribution(Vec<f32>);

The block below is a conceptual generic version that explains the functor idea for probabilistic outcomes:

pub struct Probability(f32);

pub struct Distribution<T> {
    outcomes: Vec<(T, Probability)>,
}

impl<T> Distribution<T> {
    pub fn map<U>(
        self,
        f: impl Fn(T) -> U,
    ) -> Distribution<U> {
        let outcomes = self
            .outcomes
            .into_iter()
            .map(|(value, probability)| {
                (f(value), probability)
            })
            .collect();

        Distribution { outcomes }
    }
}

The core idea is:

map changes the values inside the distribution,
but keeps the probabilities attached to them.

Rust Syntax

Start with the generic struct:

pub struct Probability(f32);

pub struct Distribution<T> {
    outcomes: Vec<(T, Probability)>,
}

This means:

Distribution<T> = many possible T values, each paired with a probability

If T is TokenId, then the type is:

Distribution<TokenId>

If T is String, then the type is:

Distribution<String>

The method introduces a second generic type:

pub fn map<U>(...)

T is the old outcome type.

U is the new outcome type.

So the method has this shape:

Distribution<T> -> Distribution<U>

The first parameter is:

self

That means the method consumes the old distribution.

After calling:

let text_dist = token_dist.map(decode);

the old token_dist has been moved and cannot be used again.

That is why the implementation can call:

self.outcomes.into_iter()

into_iter() consumes the vector and yields owned pairs:

(T, Probability)

The function parameter is:

f: impl Fn(T) -> U

This means:

give this method a function or closure that takes T and returns U

For example, a decoder might have this shape:

TokenId -> String

Then:

Distribution<TokenId> -> Distribution<String>

The inner mapping line is:

.map(|(value, probability)| {
    (f(value), probability)
})

For every pair:

(value, probability)

the code applies f to the value and leaves the probability unchanged.

So:

(TokenId(2), Probability(0.70))

can become:

("Rust", Probability(0.70))

Finally:

.collect()

collects the transformed pairs back into a vector, and:

Distribution { outcomes }

wraps them in the new distribution.

ML Concept

Imagine a model returns possible next tokens:

TokenId(2) -> 0.70
TokenId(4) -> 0.20
TokenId(3) -> 0.10

Those token IDs are useful to the model, but a learner or UI might need text:

TokenId(2) -> "Rust"
TokenId(4) -> "Python"
TokenId(3) -> "."

map changes the representation:

Distribution<TokenId> -> Distribution<String>

The values change:

TokenId(2) becomes "Rust"
TokenId(4) becomes "Python"
TokenId(3) becomes "."

The probabilities do not change:

0.70 stays 0.70
0.20 stays 0.20
0.10 stays 0.10

So map is for changing the meaning or representation of each possible outcome, not for changing the probability mass.

Category Theory Concept

This is the functor pattern in a probabilistic context.

Given a normal deterministic function:

f : T -> U

map lifts it into the distribution context:

Distribution<T> -> Distribution<U>

In functional-programming notation:

fmap : (T -> U) -> Distribution<T> -> Distribution<U>

The outer structure is preserved:

same number of possible outcomes
same probabilities
same probabilistic context

Only the inner values are transformed.

That is the same pattern as:

Option<T> -> Option<U>
Vec<T> -> Vec<U>
Distribution<T> -> Distribution<U>

Different context, same functor idea.

Concrete Example

Here is a complete conceptual example:

#[derive(Debug, Clone, Copy)]
pub struct TokenId(pub usize);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Probability(pub f32);

#[derive(Debug, Clone)]
pub struct Distribution<T> {
    outcomes: Vec<(T, Probability)>,
}

impl<T> Distribution<T> {
    pub fn new(outcomes: Vec<(T, Probability)>) -> Self {
        Self { outcomes }
    }

    pub fn map<U, F>(self, mut f: F) -> Distribution<U>
    where
        F: FnMut(T) -> U,
    {
        let outcomes = self
            .outcomes
            .into_iter()
            .map(|(value, probability)| {
                (f(value), probability)
            })
            .collect();

        Distribution { outcomes }
    }
}

let vocab = ["I", "love", "Rust", "."];

let token_dist = Distribution::new(vec![
    (TokenId(2), Probability(0.70)),
    (TokenId(3), Probability(0.30)),
]);

let text_dist = token_dist.map(|token| {
    vocab[token.0].to_string()
});

assert_eq!(
    text_dist.outcomes,
    vec![
        ("Rust".to_string(), Probability(0.70)),
        (".".to_string(), Probability(0.30)),
    ],
);

Conceptually, the result is:

"Rust" -> 0.70
"."    -> 0.30

Why Fn(T) -> U

The signature:

f: impl Fn(T) -> U

accepts functions and closures.

It is more flexible than:

fn(T) -> U

because closures can capture values from the surrounding scope:

let vocab = ["I", "love", "Rust", "."];

let text_dist = token_dist.map(|token| {
    vocab[token.0].to_string()
});

The closure uses vocab from outside the closure body.

Why A Library Might Use FnMut

The pedagogical signature:

f: impl Fn(T) -> U

is easy to read.

A more flexible library signature is often:

pub fn map<U, F>(self, mut f: F) -> Distribution<U>
where
    F: FnMut(T) -> U,

FnMut allows the closure to mutate captured state.

For example:

let mut counter = 0;

let numbered = token_dist.map(|token| {
    counter += 1;
    (counter, token)
});

The key ownership rule is unchanged:

the method consumes the old distribution and moves each T into f

map Versus flat_map

Use map when each possible value becomes one transformed value:

T -> U

So:

Distribution<T> -> Distribution<U>

Example:

TokenId -> String

Use flat_map when each possible value produces another distribution:

T -> Distribution<U>

Without flattening, the result would be:

Distribution<Distribution<U>>

The simple distinction is:

map:
  one possible value becomes one transformed value

flat_map:
  one possible value becomes many possible future values

In language modeling:

map decodes possible tokens into text
flat_map chains uncertainty across another prediction step
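
Here is a conceptual sketch of flat_map for the illustrative Distribution<T> above. Like the rest of this extension, it is an assumption for teaching, not part of the crate: each outcome expands into its own distribution, and probabilities multiply along each branch:

impl<T> Distribution<T> {
    /// Conceptual flat_map: each outcome expands into a new distribution,
    /// and the probability of a final outcome is the product along its branch.
    pub fn flat_map<U, F>(self, mut f: F) -> Distribution<U>
    where
        F: FnMut(T) -> Distribution<U>,
    {
        let outcomes = self
            .outcomes
            .into_iter()
            .flat_map(|(value, Probability(p))| {
                f(value)
                    .outcomes
                    .into_iter()
                    .map(move |(next, Probability(q))| (next, Probability(p * q)))
            })
            .collect();

        Distribution { outcomes }
    }
}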

Algebra Version

If:

D<T> = [(t1, p1), (t2, p2), ..., (tn, pn)]

and:

f : T -> U

then:

map(f, D<T>) = [(f(t1), p1), (f(t2), p2), ..., (f(tn), pn)]

The probabilities are untouched.

Only the values move through f.

Core Mental Model

In Rust terms:

consume self, move each T into f, preserve each Probability, collect into
Distribution<U>

In ML terms:

decode or transform every possible outcome without changing model confidence

In category-theory terms:

lift a deterministic morphism T -> U into the probabilistic context
Distribution<T> -> Distribution<U>

NaturalTransformation<A>

The problem this block solves is:

Sometimes you need to convert one wrapper shape into another without caring about the specific item type.

The beginner trap is to think a natural transformation is just any conversion. It is more disciplined than that. The conversion must be compatible with mapping. If you transform the wrapper first and then map the item, you should get the same result as mapping first and then transforming the wrapper.

The block:

/// A structure-preserving conversion between wrappers.
pub trait NaturalTransformation<A> {
    type From;
    type To;

    fn transform(from: Self::From) -> Self::To;
}

Rust Syntax

The associated types are:

From
To

The method:

fn transform(from: Self::From) -> Self::To;

converts the wrapper.

The type parameter A represents the item type inside the wrapper.

This trait has the same shape as Functor: the abstract contract is public, but each implementation chooses the concrete wrapper types through associated types.

That is why the implementation can be generic over A without knowing whether A is a token, a sentence, a loss value, or a trace step.

ML Concept

Data pipelines often convert shapes:

list of candidates -> optional selected candidate
batch -> first example
many diagnostics -> maybe first failure

The conversion should not depend on whether the item is a token, vector, or loss.

Category-Theory Concept

A natural transformation converts one functor into another in a way that is uniform over the inner type.

The important word is uniform.

It should not inspect special details of A.

VecToFirstOption

The problem this block solves is:

Convert a list into an optional first item in a type-uniform way.

The block:

/// Natural transformation from `Vec<A>` to `Option<A>` by taking the first item.
pub struct VecToFirstOption;

impl<A> NaturalTransformation<A> for VecToFirstOption {
    type From = Vec<A>;
    type To = Option<A>;

    fn transform(from: Vec<A>) -> Option<A> {
        from.into_iter().next()
    }
}

Rust Syntax

The implementation works for every A.

That is the role of:

impl<A> NaturalTransformation<A> for VecToFirstOption

The implementation does not ask for any bound on A. There is no A: Clone, no A: Debug, and no A: PartialEq. It does not need those capabilities because it never looks inside the item. It only changes the outer shape.

It consumes a vector:

from.into_iter()

then takes the first item:

.next()

If the vector is empty, the result is None.

If it has at least one item, the result is Some(first_item).

use category_theory_transformer_rs::{NaturalTransformation, VecToFirstOption};

let first = VecToFirstOption::transform(vec!["embed", "linear", "softmax"]);
let empty = VecToFirstOption::transform(Vec::<&str>::new());

assert_eq!(first, Some("embed"));
assert_eq!(empty, None);

ML Concept

This is like selecting the first candidate from a batch while preserving the possibility that the batch was empty.

It does not care what the candidate type is.

Category-Theory Concept

This is the example transformation:

Vec<A> -> Option<A>

It is natural because it is uniform over A.

Naturality Check

The problem this block solves is:

Show that mapping first, then converting gives the same result as converting first, then mapping.

The function:

pub fn naturality_square_holds_for_first_option() -> bool {
    let xs = vec![1, 2, 3];
    let f = |x| x * 10;

    let path_top_then_right = VecToFirstOption::transform(VecFunctor::fmap(xs.clone(), f));
    let path_left_then_bottom = OptionFunctor::fmap(VecToFirstOption::transform(xs), f);

    path_top_then_right == path_left_then_bottom
}

Rust Syntax

There are two paths.

Path one:

Vec<i32> -> Vec<i32> -> Option<i32>

Path two:

Vec<i32> -> Option<i32> -> Option<i32>

Both should produce the same value.

The crate exposes this as a small law check:

use category_theory_transformer_rs::naturality_square_holds_for_first_option;

assert!(naturality_square_holds_for_first_option());

ML Concept

This is a consistency check for pipeline shape conversions.

It says:

transform values, then select

matches:

select, then transform the selected value

for this operation.

Category-Theory Concept

This is the naturality square:

Vec<A> ----fmap f----> Vec<B>
  |                     |
  | transform           | transform
  v                     v
Option<A> --fmap f--> Option<B>

The square commutes when both paths agree.
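
To see why uniformity matters, consider a counterexample. The selection below inspects the values instead of only the shape (it is hypothetical, not in the crate), and the square no longer commutes:

// Hypothetical conversion that looks inside the items:
// take the first positive value rather than the first value.
fn first_positive(values: Vec<i32>) -> Option<i32> {
    values.into_iter().find(|value| *value > 0)
}

let xs = vec![-1, 2];
let f = |x: i32| -x;

// Map first, then select: [-1, 2] -> [1, -2] -> Some(1).
let top_then_right = first_positive(xs.clone().into_iter().map(f).collect());

// Select first, then map: [-1, 2] -> Some(2) -> Some(-2).
let left_then_bottom = first_positive(xs).map(f);

assert_ne!(top_then_right, left_then_bottom);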

Monoid

The problem this block solves is:

Some values can be combined repeatedly, and there should be an empty value that changes nothing.

The first-principles version is string concatenation. The empty string changes nothing, and grouping does not change the final text:

let empty = String::new();
let a = String::from("embed");
let b = String::from(" -> linear");
let c = String::from(" -> softmax");

assert_eq!(format!("{empty}{a}"), "embed");

// Grouping does not matter: (a + b) + c equals a + (b + c).
let grouped_left = format!("{}{}", format!("{a}{b}"), c);
let grouped_right = format!("{a}{}", format!("{b}{c}"));
assert_eq!(grouped_left, grouped_right);

PipelineTrace uses the same idea, but with named pipeline steps instead of raw text.

The trait:

pub trait Monoid: Sized {
    fn empty() -> Self;
    fn combine(&self, other: &Self) -> Self;
}

Rust Syntax

Sized means values of this type have a known size at compile time. In this trait, it keeps the return type Self straightforward:

fn empty() -> Self;

Self means “the type implementing this trait.”

The trait requires:

empty() -> Self
combine(&self, other: &Self) -> Self

So a monoid can produce an identity value and combine two values into one.

The combine method borrows both traces:

fn combine(&self, other: &Self) -> Self;

&self and &Self are references. They let the method read the two existing values without taking ownership of them. The method returns a new combined value.

ML Concept

Common monoid-like values in ML systems include logs, traces, metrics, batches, and accumulated gradients.

You often need to combine many small values into one larger value.
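
As a sketch, any of those can implement the same trait. Here is a hypothetical example-count metric; the type is illustrative, and only the Monoid trait comes from the crate:

use category_theory_transformer_rs::Monoid;

// Hypothetical metric type: a running count of processed examples.
// Combining counts is addition; the identity is zero.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ExampleCount(u64);

impl Monoid for ExampleCount {
    fn empty() -> Self {
        ExampleCount(0)
    }

    fn combine(&self, other: &Self) -> Self {
        ExampleCount(self.0 + other.0)
    }
}

let batch_a = ExampleCount(32);
let batch_b = ExampleCount(16);

assert_eq!(batch_a.combine(&batch_b), ExampleCount(48));
assert_eq!(ExampleCount::empty().combine(&batch_a), batch_a);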

Category-Theory Concept

A monoid has:

identity element
associative binary operation

The laws are:

empty combine a = a
a combine empty = a
(a combine b) combine c = a combine (b combine c)

TraceStep and PipelineTrace

The problem this block solves is:

Pipeline execution steps should be combinable into a larger trace.

The key types:

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TraceStep(&'static str);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PipelineTrace(Vec<TraceStep>);

Rust Syntax

TraceStep wraps one static string.

PipelineTrace wraps a vector of steps.

PipelineTrace::from_steps collects any iterable of steps.

names() returns the raw names for display:

self.0.iter().map(TraceStep::name).collect()

These wrappers matter because two values may both be strings but mean different things. A TraceStep is not a model name, a token, or a user-facing sentence. The type keeps that meaning attached to the value.
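
The smallest use of the real API shows the wrapper and its accessor:

use category_theory_transformer_rs::TraceStep;

let step = TraceStep::new("embedding");

assert_eq!(step.name(), "embedding");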

ML Concept

A trace can record:

embedding
linear
softmax
cross_entropy

This is useful for understanding which stages ran.

Category-Theory Concept

A pipeline trace is a sequence-like monoid.

The empty trace is the identity.

Combining traces is concatenation.

PipelineTrace as Monoid

The problem this block solves is:

Make traces obey the monoid interface.

The implementation:

impl Monoid for PipelineTrace {
    fn empty() -> Self {
        PipelineTrace(vec![])
    }

    fn combine(&self, other: &Self) -> Self {
        let mut combined = self.0.clone();
        combined.extend_from_slice(&other.0);
        PipelineTrace(combined)
    }
}

Rust Syntax

empty returns an empty vector.

combine clones the first trace, appends the second trace, and wraps the result again as PipelineTrace.

The clone is intentional here: combine borrows both inputs, so it cannot move steps out of either trace. Cloning the small TraceStep values lets the method produce a fresh trace while leaving the inputs usable.

use category_theory_transformer_rs::{Monoid, PipelineTrace, TraceStep};

let encoder = PipelineTrace::from_steps([TraceStep::new("embedding")]);
let head = PipelineTrace::from_steps([TraceStep::new("softmax")]);
let trace = encoder.combine(&head);

assert_eq!(trace.names(), vec!["embedding", "softmax"]);

ML Concept

This is how many execution logs work:

trace_a + trace_b = longer trace

Category-Theory Concept

This is the list monoid specialized to trace steps.

Monoid Law Check

The problem this block solves is:

Verify the identity and associativity laws for the trace type.

The function:

pub fn monoid_laws_hold_for_pipeline_trace() -> bool {
    let a = PipelineTrace::from_steps(vec![TraceStep::new("embedding")]);
    let b = PipelineTrace::from_steps(vec![TraceStep::new("linear")]);
    let c = PipelineTrace::from_steps(vec![TraceStep::new("softmax")]);
    let identity = PipelineTrace::empty();

    let left_identity = identity.combine(&a) == a;
    let right_identity = a.combine(&identity) == a;
    let associativity = a.combine(&b).combine(&c) == a.combine(&b.combine(&c));

    left_identity && right_identity && associativity
}

Rust Syntax

The function constructs three traces and checks three booleans.

It returns true only if all monoid laws hold for those examples.

use category_theory_transformer_rs::monoid_laws_hold_for_pipeline_trace;

assert!(monoid_laws_hold_for_pipeline_trace());

ML Concept

Grouping trace combination should not change the final trace.

This matters when systems combine logs from nested pipelines.

Category-Theory Concept

This directly checks the monoid laws:

identity
associativity

The Calculus File

The problem src/calculus.rs solves is:

Backpropagation is easier to understand if you first see one local derivative rule in isolation.

What to notice: Backpropagation does not require every operation to know the whole model. Each operation only needs to know how its own output changes when its own inputs change. Composition does the rest.

The file defines:

Scalar
LocalGradient
MulOp

Scalar

The block:

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Scalar(f32);

Rust Syntax

Scalar wraps one f32.

Scalar::new rejects non-finite values.

value() returns the raw float.

This repeats a pattern from the earlier domain-object chapter:

private field
validating constructor
accessor

The raw f32 is private, so callers cannot construct Scalar(f32::NAN) directly. They must use Scalar::new, which returns a Result.

use category_theory_transformer_rs::Scalar;

let scalar = Scalar::new(2.5)?;

assert_eq!(scalar.value(), 2.5);

Ok::<(), category_theory_transformer_rs::CtError>(())

ML Concept

This is a single numeric value in a computation graph.

Examples:

activation
loss component
weight

Category-Theory Concept

It is the simple numeric object used in the local derivative example.

LocalGradient

The block:

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LocalGradient(f32);

Rust Syntax

This is another f32 wrapper.

It has the same finite-value validation as Scalar.

The semantic difference is important:

Scalar = forward value
LocalGradient = derivative signal

This is the same newtype move used throughout the crate. Both wrappers store an f32, but the types prevent accidental mixing at function boundaries.
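
The same constructor-and-accessor pattern applies here, including rejection of non-finite input:

use category_theory_transformer_rs::LocalGradient;

let gradient = LocalGradient::new(0.5)?;

assert_eq!(gradient.value(), 0.5);

// Non-finite values are rejected at the boundary.
assert!(LocalGradient::new(f32::NAN).is_err());

Ok::<(), category_theory_transformer_rs::CtError>(())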

ML Concept

A gradient tells how a loss changes when an intermediate value changes.

For example:

dL/dz

Category-Theory Concept

This is information flowing backward through a composed computation.

MulOp

The problem this block solves is:

Show a forward operation and its local backward rule.

The important methods:

pub fn forward(&self, x: Scalar, y: Scalar) -> CtResult<Scalar> {
    Scalar::new(x.value() * y.value())
}

pub fn backward(
    &self,
    x: Scalar,
    y: Scalar,
    upstream: LocalGradient,
) -> CtResult<(LocalGradient, LocalGradient)> {
    let dz_dx = y.value();
    let dz_dy = x.value();

    Ok((
        LocalGradient::new(upstream.value() * dz_dx)?,
        LocalGradient::new(upstream.value() * dz_dy)?,
    ))
}

Rust Syntax

forward multiplies two scalars and validates the result.

backward takes:

x
y
upstream gradient dL/dz

and returns:

(dL/dx, dL/dy)

The return type is a Rust tuple.

The ? operator appears twice in backward:

LocalGradient::new(upstream.value() * dz_dx)?

It means:

if construction succeeded, keep the value
if construction failed, return the error from backward immediately

That keeps invalid numeric states at the boundary where they are created.

use category_theory_transformer_rs::{LocalGradient, MulOp, Scalar};

let mul = MulOp;
let x = Scalar::new(2.0)?;
let y = Scalar::new(3.0)?;
let z = mul.forward(x, y)?;
let (dl_dx, dl_dy) = mul.backward(x, y, LocalGradient::new(1.0)?)?;

assert_eq!(z.value(), 6.0);
assert_eq!(dl_dx.value(), 3.0);
assert_eq!(dl_dy.value(), 2.0);

Ok::<(), category_theory_transformer_rs::CtError>(())

ML Concept

For:

z = x * y

the local derivatives are:

dz/dx = y
dz/dy = x

By the chain rule:

dL/dx = dL/dz * dz/dx = dL/dz * y
dL/dy = dL/dz * dz/dy = dL/dz * x

Category-Theory Concept

The chain rule is composition of local derivative maps.

A big neural network is many small maps composed forward, then many local gradient rules composed backward.
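
A minimal sketch of that composition, using only the crate API shown above. Two multiplications compose forward as z = (x * y) * w, and the backward pass applies each local rule in reverse order:

use category_theory_transformer_rs::{LocalGradient, MulOp, Scalar};

// Forward: u = x * y, then z = u * w.
let mul = MulOp;
let x = Scalar::new(2.0)?;
let y = Scalar::new(3.0)?;
let w = Scalar::new(4.0)?;

let u = mul.forward(x, y)?;  // u = 6.0
let _z = mul.forward(u, w)?; // z = 24.0

// Backward: start with dL/dz = 1 and compose the local rules in reverse.
let (dl_du, dl_dw) = mul.backward(u, w, LocalGradient::new(1.0)?)?;
let (dl_dx, dl_dy) = mul.backward(x, y, dl_du)?;

assert_eq!(dl_dw.value(), 6.0);  // dL/dw = u = x * y
assert_eq!(dl_dx.value(), 12.0); // dL/dx = w * y
assert_eq!(dl_dy.value(), 8.0);  // dL/dy = w * x

Ok::<(), category_theory_transformer_rs::CtError>(())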

Run The Example

Source snapshot: examples/04_structure_and_calculus.rs
use category_theory_transformer_rs::{
    CtResult, Functor, LocalGradient, Monoid, MulOp, OptionFunctor, PipelineTrace, Scalar,
    TraceStep, VecFunctor, monoid_laws_hold_for_pipeline_trace,
    naturality_square_holds_for_first_option,
};

fn main() -> CtResult<()> {
    let squared = VecFunctor::fmap(vec![1, 2, 3], |x| x * x);
    let shifted = OptionFunctor::fmap(Some(7), |x| x + 1);

    println!("Vec fmap square: {squared:?}");
    println!("Option fmap +1: {shifted:?}");
    println!(
        "naturality square holds: {}",
        naturality_square_holds_for_first_option()
    );

    let trace = PipelineTrace::from_steps(vec![TraceStep::new("embedding")])
        .combine(&PipelineTrace::from_steps(vec![TraceStep::new("linear")]))
        .combine(&PipelineTrace::from_steps(vec![TraceStep::new("softmax")]));

    println!("trace: {:?}", trace.names());
    println!(
        "monoid laws hold: {}",
        monoid_laws_hold_for_pipeline_trace()
    );

    let mul = MulOp;
    let x = Scalar::new(2.0)?;
    let y = Scalar::new(3.0)?;
    let upstream = LocalGradient::new(1.0)?;
    let (dl_dx, dl_dy) = mul.backward(x, y, upstream)?;

    println!("dL/dx: {}", dl_dx.value());
    println!("dL/dy: {}", dl_dy.value());

    Ok(())
}

Run:

cargo run --example 04_structure_and_calculus

You should see mapping over Vec, mapping over Option, a naturality check, a combined trace, a monoid law check, and local gradients for multiplication.

Core Mental Model

In Rust terms:

traits name reusable operation shapes
unit structs demonstrate stateless operations
tests check laws

In ML terms:

map over structures, combine traces, compose local gradients

In category-theory terms:

functors preserve structure
natural transformations commute with mapping
monoids combine associatively with identity
chain rule composes derivative information

Checkpoint

Why is “local rule plus composition” the core idea behind backpropagation?

A strong answer:

Each operation only needs its local derivative; the chain rule composes those local derivatives into the gradient for the whole computation.

Where This Leaves Us

This chapter named the repeated structures that sit underneath the small ML pipeline. A functor explains mapping inside a wrapper. A natural transformation explains changing wrappers consistently. A monoid explains safe accumulation. A local gradient explains why a large training computation can be assembled from small derivative rules.

The next chapter uses the same engineering habit on a wider set of ideas from applied category theory. Instead of adding a larger ML model, it shows how the same typed-Rust style can model orders, resources, database instances, design relations, signal flow, circuits, and local-to-global behavior.

Further Reading

These pages reinforce the structure vocabulary used here:

  • Glossary: functor, natural transformation, monoid, chain rule
  • References: applied category theory, deep learning math, and attention

Retrieval Practice

Recall

What does a functor do to values inside a wrapper?

Explain

Why is a pipeline trace a good example of a monoid?

Apply

Write a small example of a value in your own codebase that has an empty value and a combine operation. State the identity law it should satisfy.