Course Map

The problem this chapter solves is:

Before reading individual Rust files, you need one map of how the whole machine-learning pipeline, Rust type system, and category-theory vocabulary fit together.

The repository is small, but it contains several layers:

domain objects
  -> typed morphisms
  -> concrete ML morphisms
  -> training endomorphism
  -> reusable structure patterns
  -> applied category-theory sketches

This chapter is the index of those layers.

Reader orientation: The map is not a list of things to memorize. It is a promise about how the book will move: first name the values, then name the arrows, then compose the arrows into a tiny learning system.

What You Already Know

If you already read programs from top to bottom, you know how to follow a flow. If you know Rust function signatures, you know that each step has an input type and an output type. If you know ML pipelines, you know that raw data becomes features, predictions, loss, and updates. This chapter puts those familiar habits on one map.

The Whole Pipeline

The central pipeline is:

TokenSequence -> TrainingSet
TokenId       -> Vector
Vector        -> Logits
Logits        -> Distribution
Distribution x TokenId -> Loss
Parameters    -> Parameters

Read this as three stories at once.

In ML terms:

tokenized text
  -> prediction examples
  -> embeddings
  -> vocabulary scores
  -> probabilities
  -> error measurement
  -> updated weights

In Rust terms:

validated input types
  -> trait implementations
  -> explicit error handling
  -> private fields
  -> read-only accessors
  -> tests

In category-theory terms:

objects
  -> morphisms
  -> products
  -> composition
  -> endomorphisms
  -> laws

The course is about learning to see the same pipeline through all three views.

Worked Example: A Tiny Typed Movement

Here is the smallest Rust idea behind that map. A function has an input type and an output type:

fn token_to_vector_id(token_id: usize) -> usize {
    token_id + 100
}

fn main() {
    assert_eq!(token_to_vector_id(7), 107);
}

The real code does not leave those values as raw usize forever. It gives each pipeline stage a domain type, then uses morphisms to make the connections explicit.
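
Here is a hedged sketch of that shift, using simplified stand-ins rather than the repository's real definitions (the actual TokenId and Vector in src/domain.rs are built through validated constructors):

// Simplified stand-ins for illustration only; the real TokenId and Vector
// live in src/domain.rs and are constructed through validated APIs.
#[derive(Debug, Clone, Copy)]
struct TokenId(usize);

#[derive(Debug, Clone)]
struct Vector(Vec<f32>);

// Stand-in embedding lookup; the real arrow is Embedding in src/ml.rs.
fn embed(token: TokenId) -> Vector {
    Vector(vec![token.0 as f32; 4])
}

fn main() {
    let v = embed(TokenId(7));
    assert_eq!(v.0.len(), 4);
}

Once the signature reads TokenId -> Vector, the compiler rejects a call that passes a vocabulary size or a loss value where a token index belongs.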

Self-Check

Before moving into the file map, explain why TokenId -> Vector is easier to reason about than usize -> Vec<f32>.

Code Map

Each Rust file owns one part of the idea.

src/domain.rs

This file defines the nouns.

The main examples are TokenId, TokenSequence, Vector, Logits, Distribution, Loss, TrainingSet, and Parameters.

The problem this file solves is:

Raw numbers are too ambiguous for a training pipeline.

For example, these are all machine numbers:

token index
vocabulary size
model dimension
loss value
learning rate

But they are not the same concept.

src/domain.rs gives each concept a separate type.
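
As a sketch of that pattern (the exact constructors, error type, and float widths in src/domain.rs may differ), a validated newtype can look like this:

// Hypothetical sketch of the newtype-plus-validation pattern; the real
// LearningRate in src/domain.rs reports failure through the crate's own
// error type rather than a String.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LearningRate(f64);

impl LearningRate {
    pub fn new(value: f64) -> Result<Self, String> {
        if value.is_finite() && value > 0.0 {
            Ok(Self(value))
        } else {
            Err(format!("learning rate must be positive and finite, got {value}"))
        }
    }

    pub fn value(&self) -> f64 {
        self.0
    }
}

The demo constructs values the same way: LearningRate::new(1.0)? either yields a usable value or stops the pipeline with an error.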

src/category.rs

This file defines the arrows.

The central trait is:

pub trait Morphism<Input, Output> {
    fn name(&self) -> &'static str;
    fn apply(&self, input: Input) -> CtResult<Output>;
}

This says:

A morphism is something that knows how to transform an Input into an Output, possibly failing with CtError.

The rest of the file defines identity, composition, endomorphism, and repeated application.
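
Here is a hedged sketch of how a composing morphism can satisfy that trait. It assumes the Morphism trait and CtResult shown above are in scope; the real Compose in src/category.rs (used in the demo as Compose::<_, _, Vector>::new(embedding, linear)) may be structured differently:

use std::marker::PhantomData;

// Hypothetical composition of f : A -> Mid and g : Mid -> C. Carrying the
// intermediate type lets the compiler check that the arrows line up, which
// matches the demo's turbofish usage Compose::<_, _, Vector>.
pub struct Compose<F, G, Mid> {
    first: F,
    second: G,
    _mid: PhantomData<Mid>,
}

impl<F, G, Mid> Compose<F, G, Mid> {
    pub fn new(first: F, second: G) -> Self {
        Self { first, second, _mid: PhantomData }
    }
}

impl<A, Mid, C, F, G> Morphism<A, C> for Compose<F, G, Mid>
where
    F: Morphism<A, Mid>,
    G: Morphism<Mid, C>,
{
    fn name(&self) -> &'static str {
        "compose"
    }

    fn apply(&self, input: A) -> CtResult<C> {
        let intermediate = self.first.apply(input)?;
        self.second.apply(intermediate)
    }
}

If the output type of the first arrow does not match the input type of the second, the program does not compile. That is the practical payoff of naming arrows as types.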

src/ml.rs

This file defines concrete ML arrows.

The main transformations are:

DatasetWindowing : TokenSequence -> TrainingSet
Embedding        : TokenId -> Vector
LinearToLogits   : Vector -> Logits
Softmax          : Logits -> Distribution
CrossEntropy     : Distribution x TokenId -> Loss

This file is where the abstract Morphism trait becomes a tiny learning system.
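
As one hedged example of what such an arrow can look like, here is a sketch of Softmax as a morphism. The accessor as_slice() on Logits, the constructor Distribution::new(...), and the f32 element type are assumptions of this sketch; the real code in src/ml.rs and src/domain.rs may differ:

// Hypothetical Softmax morphism: Logits -> Distribution.
pub struct Softmax;

impl Morphism<Logits, Distribution> for Softmax {
    fn name(&self) -> &'static str {
        "softmax"
    }

    fn apply(&self, input: Logits) -> CtResult<Distribution> {
        // Assumed accessor: the real Logits API may expose its scores differently.
        let scores = input.as_slice();
        // Subtract the max score for numerical stability before exponentiating.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let total: f32 = exps.iter().sum();
        let probabilities: Vec<f32> = exps.iter().map(|e| e / total).collect();
        // Assumed infallible constructor; the real Distribution may validate its input.
        Ok(Distribution::new(probabilities))
    }
}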

src/training.rs

This file defines:

TrainStep : Parameters -> Parameters

That shape is important.

Because the output type is the same as the input type, training can be repeated:

Parameters0 -> Parameters1 -> Parameters2 -> ... -> ParametersN

That is why training is taught as an endomorphism.
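
Here is a minimal sketch of that repetition, assuming the Morphism trait from src/category.rs is in scope. The repository's own helper, apply_endomorphism_n_times, also takes a StepCount, and its exact signature may differ:

// Hypothetical helper: apply an endomorphism step : T -> T a fixed number
// of times, threading the typed value through each application.
fn apply_n_times<T, M>(step: &M, start: T, n: usize) -> CtResult<T>
where
    M: Morphism<T, T>,
{
    let mut current = start;
    for _ in 0..n {
        current = step.apply(current)?;
    }
    Ok(current)
}

Because the input and output types match, the only thing that changes across iterations is the parameter values, never the shape of the pipeline.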

src/structure.rs

This file teaches reusable structure:

  • functor: map inside a wrapper
  • natural transformation: convert wrapper shape consistently
  • monoid: combine values with an empty value

These are not decorative extras. They name patterns that appear in ordinary ML systems: batches, optional values, traces, logs, and composed workflows.
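
As a sketch of the monoid case (the trait and PipelineTrace here are simplified; the real definitions in src/structure.rs may differ in naming and fields):

// Hypothetical monoid: an empty value plus an associative combine.
// The repository checks the corresponding laws in its tests.
pub trait Monoid {
    fn empty() -> Self;
    fn combine(&self, other: &Self) -> Self;
}

#[derive(Debug, Clone, PartialEq)]
pub struct PipelineTrace {
    steps: Vec<String>,
}

impl Monoid for PipelineTrace {
    fn empty() -> Self {
        Self { steps: Vec::new() }
    }

    fn combine(&self, other: &Self) -> Self {
        let mut steps = self.steps.clone();
        steps.extend(other.steps.iter().cloned());
        Self { steps }
    }
}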

src/calculus.rs

This file shows the smallest useful backpropagation idea:

z = x * y
dL/dx = dL/dz * y
dL/dy = dL/dz * x

The code does not implement a full automatic differentiation engine. It gives you the local rule that larger systems compose.
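
Stripped of the wrapper types, the local rule looks like this (a hedged sketch with bare f64 values; the repository's MulOp works with validated Scalar and LocalGradient types and can fail):

// Minimal sketch of the local rule for z = x * y.
fn mul_forward(x: f64, y: f64) -> f64 {
    x * y
}

fn mul_backward(x: f64, y: f64, dl_dz: f64) -> (f64, f64) {
    // dL/dx = dL/dz * y and dL/dy = dL/dz * x.
    (dl_dz * y, dl_dz * x)
}

fn main() {
    let (x, y) = (2.0, 3.0);
    let z = mul_forward(x, y);
    let (dl_dx, dl_dy) = mul_backward(x, y, 1.0);
    assert_eq!(z, 6.0);
    assert_eq!((dl_dx, dl_dy), (3.0, 2.0));
}

These are the same numbers the demo prints in section 11.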

src/sketches.rs

This file connects the course to seven applied category-theory themes: orders, resources, databases, co-design, signal flow, circuits, and behavior logic.

Each theme is represented as typed Rust values plus law-checking tests.

Guided Walkthrough Snapshot

The terminal demo is the spine of the course.

The problem this block solves is:

A learner should be able to run one command and see every major concept used once in a concrete order.

Source snapshot: src/demo.rs
use crate::calculus::{LocalGradient, MulOp, Scalar};
use crate::category::{Compose, StepCount, apply_endomorphism_n_times};
use crate::domain::{
    LearningRate, Logits, ModelDimension, Parameters, Product, TokenId, TokenSequence, Vector,
    VocabSize,
};
use crate::error::CtResult;
use crate::ml::{
    CrossEntropy, DatasetWindowing, Embedding, LinearToLogits, Softmax, average_loss,
    composed_prediction_matches_direct_prediction,
};
use crate::structure::{
    Functor, Monoid, OptionFunctor, PipelineTrace, TraceStep, VecFunctor,
    monoid_laws_hold_for_pipeline_trace, naturality_square_holds_for_first_option,
};
use crate::training::TrainStep;
use crate::{Identity, Morphism};

/// Run the full terminal walkthrough used by `cargo run --bin category_ml`.
pub fn run_demo() -> CtResult<()> {
    println!("Category theory concepts implemented in Rust 2024");
    println!("=================================================\n");

    let vocab = ["<pad>", "I", "love", "Rust", "."];
    let raw_text = TokenSequence::from_indices([1, 2, 3, 4, 1, 2, 3, 4])?;

    println!("1. Object examples");
    println!("   TokenId(1) means {:?}\n", vocab[1]);

    println!("2. Dataset morphism: TokenSequence -> TrainingSet");
    let dataset = DatasetWindowing.apply(raw_text)?;
    for example in dataset.examples() {
        println!(
            "   {:?} -> {:?}",
            vocab[example.first().index()],
            vocab[example.second().index()]
        );
    }
    println!();

    println!("3. Identity morphism: id_Vector : Vector -> Vector");
    let v = Vector::new(vec![1.0, 2.0, 3.0]);
    let same_v = Identity::<Vector>::new().apply(v.clone())?;
    println!("   input  = {:?}", v);
    println!("   output = {:?}\n", same_v);

    println!("4. Composition: Softmax after Linear after Embedding");
    let params = Parameters::init(VocabSize::new(vocab.len())?, ModelDimension::new(4)?);
    let embedding = Embedding::from_parameters(&params);
    let linear = LinearToLogits::from_parameters(&params);
    let token_to_logits = Compose::<_, _, Vector>::new(embedding, linear);
    let token_to_distribution = Compose::<_, _, Logits>::new(token_to_logits, Softmax);
    let distribution = token_to_distribution.apply(TokenId::new(1))?;
    println!("   P(next token | 'I') = {:?}\n", distribution.as_slice());

    println!("5. Product object: Prediction x Target -> Loss");
    let loss = CrossEntropy.apply(Product::new(distribution, TokenId::new(2)))?;
    println!("   loss for target 'love' = {:.6}\n", loss.value());

    println!("6. Endomorphism: TrainStep : Parameters -> Parameters");
    let before = average_loss(&params, &dataset)?;
    let train_step = TrainStep::new(dataset.clone(), LearningRate::new(1.0)?);
    let trained_params =
        apply_endomorphism_n_times(&train_step, params.clone(), StepCount::new(80))?;
    let after = average_loss(&trained_params, &dataset)?;
    println!("   average loss before training = {:.6}", before.value());
    println!("   average loss after  training = {:.6}\n", after.value());

    println!("7. Functor: fmap over Vec and Option");
    let xs = vec![1, 2, 3];
    let ys = VecFunctor::fmap(xs, |x| x * x);
    let maybe = OptionFunctor::fmap(Some(7), |x| x + 1);
    println!("   VecFunctor fmap square: {:?}", ys);
    println!("   OptionFunctor fmap +1: {:?}\n", maybe);

    println!("8. Natural transformation: Vec<A> -> Option<A>");
    println!(
        "   naturality square holds: {}\n",
        naturality_square_holds_for_first_option()
    );

    println!("9. Monoid: pipeline traces compose associatively with identity");
    let trace = PipelineTrace::from_steps(vec![TraceStep::new("embedding")])
        .combine(&PipelineTrace::from_steps(vec![TraceStep::new("linear")]))
        .combine(&PipelineTrace::from_steps(vec![TraceStep::new("softmax")]));
    println!("   trace = {:?}", trace.names());
    println!(
        "   monoid laws hold for this trace type: {}\n",
        monoid_laws_hold_for_pipeline_trace()
    );

    println!("10. Commutative diagram check");
    println!(
        "   composed prediction == direct prediction: {}\n",
        composed_prediction_matches_direct_prediction(&params)?
    );

    println!("11. Chain rule / local derivative morphism");
    let mul = MulOp;
    let x = Scalar::new(2.0)?;
    let y = Scalar::new(3.0)?;
    let z = mul.forward(x, y)?;
    let upstream = LocalGradient::new(1.0)?;
    let (dl_dx, dl_dy) = mul.backward(x, y, upstream)?;
    println!("   z = x * y = {}", z.value());
    println!(
        "   if dL/dz = {}, then dL/dx = {}, dL/dy = {}\n",
        upstream.value(),
        dl_dx.value(),
        dl_dy.value()
    );

    println!("Compressed categorical training view:");
    println!("   Dataset x Parameters -> Prediction -> Loss -> Gradients -> Updated Parameters");
    println!("   TrainStep is repeated as Parameters0 -> Parameters1 -> ... -> ParametersN");

    Ok(())
}

How To Read The Demo

The demo is not random output. It is a staged proof that the pieces connect.

Section 1 introduces an object:

TokenId(1)

Section 2 applies a data-preparation morphism:

TokenSequence -> TrainingSet

Section 3 applies identity:

Vector -> Vector

Section 4 composes prediction:

TokenId -> Vector -> Logits -> Distribution

Section 5 uses a product object:

Distribution x TokenId -> Loss

Section 6 repeats an endomorphism:

Parameters -> Parameters

Sections 7 through 11 add the structural patterns:

Functor
NaturalTransformation
Monoid
Commutative diagram check
Chain rule

So the demo is a miniature course outline in executable form.

Binary Entrypoint

The binary entrypoint is deliberately tiny:

Source snapshot: src/bin/category_ml.rs
fn main() -> category_theory_transformer_rs::CtResult<()> {
    category_theory_transformer_rs::run_demo()
}

Line by line:

fn main() -> category_theory_transformer_rs::CtResult<()>

This is the process entrypoint. When you run the binary, Rust starts here. Returning CtResult<()> lets main hand any walkthrough error back to the runtime instead of panicking.

category_theory_transformer_rs::run_demo()

This calls the library function that owns the walkthrough; its result becomes the result of main. In the library code, fallible work uses CtResult. The binary keeps the entrypoint short because the course focus is the library, not command-line error reporting.

First Run

Run:

cargo run --bin category_ml

You should see a tiny language-model pipeline and the loss decreasing after training.

The important part is not the exact floating-point numbers.

The important part is the shape:

before training: higher loss
after training:  lower loss

That means repeated TrainStep applications moved the parameters in a useful direction on the tiny dataset.

Core Mental Model

Every chapter after this one zooms into one row of the map.

Remember:

object = typed thing
morphism = typed transformation
composition = legal connection of transformations
endomorphism = transformation from a type back to itself
law = property the code checks so composition remains trustworthy

Checkpoint

Explain this line in your own words:

TokenId -> Vector -> Logits -> Distribution

A strong answer should mention token lookup, the embedding vector, vocabulary scores, the probability distribution, and the fact that the whole path is a composition of typed morphisms.

Where This Leaves Us

This chapter gave the whole shape before the details. You now know the names of the source files, the major pipeline objects, and the difference between objects, morphisms, composition, endomorphisms, and laws.

The next chapter slows down and studies the objects themselves. Before a pipeline can compose arrows safely, it needs values whose meanings are clear enough for arrows to start and end at them.


Retrieval Practice

Recall

Name the three readings used throughout the course.

Explain

Why does the course start with a whole-pipeline map before reading individual source files?

Apply

Write a one-line diagram for a pipeline you already know, then label the input object, arrow, and output object.