Repository Source Snapshots
This appendix collects the learner-facing source files used throughout the
course. The Rust snapshots are complete files, so they are marked `ignore` when
rendered as standalone snippets. The real source files are still validated by
the repository checks.
This appendix solves one problem: the explanatory chapters should never drift
away from the real code. Use it as the exact-code lookup layer.
When you inspect any block here, use the same reading pattern as the chapters:
- Rust syntax: what does the code literally declare or execute?
- ML or software concept: why does this block exist in the pipeline or system
  model?
- Category theory concept: what object, morphism, product, composition,
  endomorphism, or law does it represent?
The appendix is intentionally less narrative than the chapters. It keeps the full source available in one place so you can verify every explanation against the code itself.
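For example, the `Softmax` block from `src/ml.rs` reads as follows through the
three lenses (an illustrative gloss added here, not part of the source files):

```rust,ignore
// Rust syntax: a `Morphism<Logits, Distribution>` impl for the unit struct
// `Softmax`; `apply` consumes `Logits` and returns `CtResult<Distribution>`.
// ML concept: the normalization step that turns raw scores into a
// probability distribution over the vocabulary.
// Category theory concept: one arrow `Logits -> Distribution` in the
// composed prediction pipeline.
impl Morphism<Logits, Distribution> for Softmax { /* ... */ }
```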
Rust Library Surface
src/lib.rs
//! A small, modular Rust tutorial for category-theory ideas in tiny ML.
//!
//! The crate is intentionally split into small learning modules:
//! - [`domain`] defines the nouns: tokens, vectors, probabilities, losses, and parameters.
//! - [`category`] defines the arrows: morphisms, identity, composition, and endomorphisms.
//! - [`ml`] implements concrete ML morphisms.
//! - [`training`] turns one optimizer step into an endomorphism on parameters.
//! - [`structure`] covers functors, natural transformations, and monoids.
//! - [`calculus`] shows the chain rule as a local backward pass.
//! - [`sketches`] gives Rust models for the seven applied-category-theory sketches.
//! - [`demo`] connects the pieces into the terminal walkthrough.
pub mod calculus;
pub mod category;
pub mod demo;
pub mod domain;
pub mod error;
pub mod ml;
pub mod sketches;
pub mod structure;
pub mod training;
pub use calculus::{LocalGradient, MulOp, Scalar};
pub use category::{
Compose, Endomorphism, Identity, Morphism, StepCount, apply_endomorphism_n_times,
};
pub use demo::run_demo;
pub use domain::{
Distribution, LearningRate, Logits, Loss, ModelDimension, Parameters, Product, TokenId,
TokenSequence, TrainingExample, TrainingSet, Vector, VocabSize,
};
pub use error::{CtError, CtResult};
pub use ml::{
CrossEntropy, DatasetWindowing, DirectPredict, Embedding, LinearToLogits, Softmax,
average_loss, composed_prediction_matches_direct_prediction,
};
pub use sketches::{
CircuitComponent, CompanyInstance, DepartmentId, DesignRequirement, EmployeeId, EmployeeRecord,
FeasibilityRelation, FeatureCount, ImplementationOffer, InformationLevel, LatencyMs,
LayerBudget, LocalSafetyCheck, MatrixCols, MatrixRows, OpenCircuit, PortName, ResistanceOhms,
ResourceAmount, ResourceBundle, SafetyCover, SignalCoefficient, SignalMatrix, Throughput,
TimeInterval, TimeTick, TruthValue, abstract_to_layer_budget, concretize_layer_budget,
feature_layer_galois_law_holds, information_order_obeys_preorder_laws,
resource_tensor_is_monotone,
};
pub use structure::{
Functor, Monoid, NaturalTransformation, OptionFunctor, PipelineTrace, TraceStep, VecFunctor,
VecToFirstOption, monoid_laws_hold_for_pipeline_trace,
naturality_square_holds_for_first_option,
};
pub use training::TrainStep;
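Everything learner-facing is re-exported from the crate root, so downstream
code imports from a single path. A minimal usage sketch (not part of the
repository, using only items re-exported above):

```rust,ignore
use category_theory_transformer_rs::{CtResult, Logits, Morphism, Softmax};

fn main() -> CtResult<()> {
    // One arrow from the re-exported surface: Logits -> Distribution.
    let distribution = Softmax.apply(Logits::new(vec![0.0, 1.0, 2.0]))?;
    println!("{:?}", distribution.as_slice());
    Ok(())
}
```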
src/error.rs
use std::fmt::{self, Display};
/// Shared error type for the tutorial crate.
#[derive(Debug, Clone, PartialEq)]
pub enum CtError {
EmptyInput(&'static str),
OutOfRange {
kind: &'static str,
index: usize,
limit: usize,
},
ShapeMismatch {
op: &'static str,
expected: String,
got: String,
},
InvalidProbability(&'static str),
InvalidLoss(f32),
InvalidLearningRate(f32),
InvalidQuantity {
kind: &'static str,
value: i64,
},
InvalidInterval {
start: usize,
end: usize,
},
}
impl Display for CtError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
CtError::EmptyInput(context) => write!(f, "empty input in {context}"),
CtError::OutOfRange { kind, index, limit } => {
write!(f, "{kind} index {index} out of range; limit is {limit}")
}
CtError::ShapeMismatch { op, expected, got } => {
write!(f, "shape mismatch in {op}: expected {expected}, got {got}")
}
CtError::InvalidProbability(context) => {
write!(f, "invalid probability distribution in {context}")
}
CtError::InvalidLoss(value) => write!(f, "invalid loss value {value}"),
CtError::InvalidLearningRate(value) => write!(f, "invalid learning rate {value}"),
CtError::InvalidQuantity { kind, value } => {
write!(f, "invalid {kind} quantity {value}")
}
CtError::InvalidInterval { start, end } => {
write!(f, "invalid interval: start {start} is after end {end}")
}
}
}
}
impl std::error::Error for CtError {}
pub type CtResult<T> = Result<T, CtError>;
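A quick sketch of how these variants render through `Display` (illustrative
only, not part of the repository):

```rust,ignore
use category_theory_transformer_rs::CtError;

fn main() {
    let err = CtError::OutOfRange { kind: "token", index: 9, limit: 5 };
    // Prints: token index 9 out of range; limit is 5
    println!("{err}");
}
```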
src/domain.rs
use crate::error::{CtError, CtResult};
/// A vocabulary index. It is intentionally not a raw `usize` in public APIs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TokenId(usize);
impl TokenId {
pub fn new(index: usize) -> Self {
Self(index)
}
pub fn index(&self) -> usize {
self.0
}
}
impl From<usize> for TokenId {
fn from(value: usize) -> Self {
Self::new(value)
}
}
/// A sequence of tokens before it has been converted into training pairs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TokenSequence(Vec<TokenId>);
impl TokenSequence {
pub fn new(tokens: impl IntoIterator<Item = TokenId>) -> CtResult<Self> {
let tokens = tokens.into_iter().collect::<Vec<_>>();
if tokens.is_empty() {
return Err(CtError::EmptyInput("token sequence"));
}
Ok(Self(tokens))
}
pub fn from_indices(indices: impl IntoIterator<Item = usize>) -> CtResult<Self> {
Self::new(indices.into_iter().map(TokenId::new))
}
pub fn as_slice(&self) -> &[TokenId] {
&self.0
}
}
/// A dense feature vector.
#[derive(Debug, Clone, PartialEq)]
pub struct Vector(Vec<f32>);
impl Vector {
pub fn new(values: Vec<f32>) -> Self {
Self(values)
}
pub fn as_slice(&self) -> &[f32] {
&self.0
}
}
/// Unnormalized model scores.
#[derive(Debug, Clone, PartialEq)]
pub struct Logits(Vec<f32>);
impl Logits {
pub fn new(values: Vec<f32>) -> Self {
Self(values)
}
pub fn as_slice(&self) -> &[f32] {
&self.0
}
}
/// A validated probability distribution.
#[derive(Debug, Clone, PartialEq)]
pub struct Distribution(Vec<f32>);
impl Distribution {
pub fn new(probabilities: Vec<f32>) -> CtResult<Self> {
if probabilities.is_empty() {
return Err(CtError::EmptyInput("distribution"));
}
let sum: f32 = probabilities.iter().sum();
let all_valid = probabilities
.iter()
.all(|probability| probability.is_finite() && *probability >= 0.0);
if !all_valid || !approx_eq(sum, 1.0, 1e-4) {
return Err(CtError::InvalidProbability("distribution constructor"));
}
Ok(Self(probabilities))
}
pub fn as_slice(&self) -> &[f32] {
&self.0
}
}
/// A non-negative scalar objective value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Loss(f32);
impl Loss {
pub fn new(value: f32) -> CtResult<Self> {
if !value.is_finite() || value < 0.0 {
return Err(CtError::InvalidLoss(value));
}
Ok(Self(value))
}
pub fn value(&self) -> f32 {
self.0
}
}
/// Number of vocabulary entries.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VocabSize(usize);
impl VocabSize {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::EmptyInput("vocabulary"));
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
/// Width of each embedding vector.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelDimension(usize);
impl ModelDimension {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::EmptyInput("model dimension"));
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
/// Positive optimizer step size.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LearningRate(f32);
impl LearningRate {
pub fn new(value: f32) -> CtResult<Self> {
if !value.is_finite() || value <= 0.0 {
return Err(CtError::InvalidLearningRate(value));
}
Ok(Self(value))
}
pub fn value(&self) -> f32 {
self.0
}
}
/// Categorical product object: `A x B`.
#[derive(Debug, Clone, PartialEq)]
pub struct Product<A, B> {
first: A,
second: B,
}
impl<A, B> Product<A, B> {
pub fn new(first: A, second: B) -> Self {
Self { first, second }
}
pub fn first(&self) -> &A {
&self.first
}
pub fn second(&self) -> &B {
&self.second
}
pub fn into_parts(self) -> (A, B) {
(self.first, self.second)
}
}
pub type TrainingExample = Product<TokenId, TokenId>;
/// Non-empty next-token training pairs.
#[derive(Debug, Clone, PartialEq)]
pub struct TrainingSet(Vec<TrainingExample>);
impl TrainingSet {
pub fn new(examples: impl IntoIterator<Item = TrainingExample>) -> CtResult<Self> {
let examples = examples.into_iter().collect::<Vec<_>>();
if examples.is_empty() {
return Err(CtError::EmptyInput("training set"));
}
Ok(Self(examples))
}
pub fn examples(&self) -> &[TrainingExample] {
&self.0
}
pub fn len(&self) -> usize {
self.0.len()
}
pub fn is_empty(&self) -> bool {
self.0.is_empty()
}
}
/// Tiny model parameters for an embedding plus language-model head.
#[derive(Debug, Clone, PartialEq)]
pub struct Parameters {
pub(crate) embedding: Vec<Vec<f32>>,
pub(crate) lm_head: Vec<Vec<f32>>,
pub(crate) bias: Vec<f32>,
}
impl Parameters {
pub fn init(vocab_size: VocabSize, d_model: ModelDimension) -> Self {
let vocab_size = vocab_size.value();
let d_model = d_model.value();
Self {
embedding: init_matrix(vocab_size, d_model, 0.2, 1),
lm_head: init_matrix(d_model, vocab_size, 0.2, 2),
bias: vec![0.0; vocab_size],
}
}
pub fn vocab_size(&self) -> usize {
self.bias.len()
}
pub fn d_model(&self) -> usize {
self.embedding.first().map_or(0, Vec::len)
}
pub fn embedding_table(&self) -> &[Vec<f32>] {
&self.embedding
}
pub fn lm_head(&self) -> &[Vec<f32>] {
&self.lm_head
}
pub fn bias(&self) -> &[f32] {
&self.bias
}
}
pub(crate) fn init_matrix(rows: usize, cols: usize, scale: f32, seed: usize) -> Vec<Vec<f32>> {
let mut out = vec![vec![0.0; cols]; rows];
for (row_index, row) in out.iter_mut().enumerate() {
for (col_index, value) in row.iter_mut().enumerate() {
let raw = ((row_index * cols + col_index) * 37 + seed * 101) % 1000;
let unit = raw as f32 / 1000.0;
*value = (unit - 0.5) * scale;
}
}
out
}
pub(crate) fn approx_eq(a: f32, b: f32, eps: f32) -> bool {
(a - b).abs() <= eps
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn distribution_rejects_non_normalized_values() {
let result = Distribution::new(vec![0.4, 0.4]);
assert!(matches!(result, Err(CtError::InvalidProbability(_))));
}
#[test]
fn token_sequence_rejects_empty_input() {
let result = TokenSequence::new(vec![]);
assert!(matches!(result, Err(CtError::EmptyInput("token sequence"))));
}
}
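A short sketch of the validating constructors in action (illustrative, not
part of the repository):

```rust,ignore
use category_theory_transformer_rs::{CtResult, Distribution, TokenSequence};

fn main() -> CtResult<()> {
    // Valid: non-empty, non-negative, and sums to 1 within the 1e-4 tolerance.
    let ok = Distribution::new(vec![0.25, 0.75])?;
    assert_eq!(ok.as_slice().len(), 2);
    // Rejected: the values sum to 0.8.
    assert!(Distribution::new(vec![0.4, 0.4]).is_err());
    // Rejected: a token sequence must be non-empty.
    assert!(TokenSequence::from_indices(Vec::<usize>::new()).is_err());
    Ok(())
}
```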
src/category.rs
use std::marker::PhantomData;
use crate::error::CtResult;
/// A typed category-theory arrow: `Input -> Output`.
pub trait Morphism<Input, Output> {
fn name(&self) -> &'static str;
fn apply(&self, input: Input) -> CtResult<Output>;
}
/// Identity morphism: `id_A : A -> A`.
#[derive(Debug, Clone, Copy)]
pub struct Identity<T> {
_marker: PhantomData<T>,
}
impl<T> Identity<T> {
pub fn new() -> Self {
Self {
_marker: PhantomData,
}
}
}
impl<T> Default for Identity<T> {
fn default() -> Self {
Self::new()
}
}
impl<T> Morphism<T, T> for Identity<T> {
fn name(&self) -> &'static str {
"identity"
}
fn apply(&self, input: T) -> CtResult<T> {
Ok(input)
}
}
/// Composition of two morphisms: if `f : A -> B` and `g : B -> C`, this is
/// `g after f : A -> C`.
#[derive(Debug, Clone)]
pub struct Compose<F, G, Middle> {
first: F,
second: G,
_middle: PhantomData<Middle>,
}
impl<F, G, Middle> Compose<F, G, Middle> {
pub fn new(first: F, second: G) -> Self {
Self {
first,
second,
_middle: PhantomData,
}
}
}
impl<Input, Middle, Output, F, G> Morphism<Input, Output> for Compose<F, G, Middle>
where
F: Morphism<Input, Middle>,
G: Morphism<Middle, Output>,
{
fn name(&self) -> &'static str {
"composition"
}
fn apply(&self, input: Input) -> CtResult<Output> {
let middle = self.first.apply(input)?;
self.second.apply(middle)
}
}
/// Endomorphism: a morphism from a type back to itself.
pub trait Endomorphism<T>: Morphism<T, T> {}
impl<T, M> Endomorphism<T> for M where M: Morphism<T, T> {}
/// How many times to repeat an endomorphism.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StepCount(usize);
impl StepCount {
pub fn new(value: usize) -> Self {
Self(value)
}
pub fn value(&self) -> usize {
self.0
}
}
/// Repeatedly apply an endomorphism: `A0 -> A1 -> ... -> An`.
pub fn apply_endomorphism_n_times<T, E>(endo: &E, mut value: T, count: StepCount) -> CtResult<T>
where
E: Endomorphism<T>,
{
for _ in 0..count.value() {
value = endo.apply(value)?;
}
Ok(value)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn identity_returns_the_same_value() -> CtResult<()> {
let value = String::from("same");
assert_eq!(Identity::<String>::new().apply(value.clone())?, value);
Ok(())
}
}
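To see the pieces working together, here is an illustrative sketch with a
hypothetical `Double` arrow on `i32` (the type `Double` exists only in this
example):

```rust,ignore
use category_theory_transformer_rs::{
    CtResult, Morphism, StepCount, apply_endomorphism_n_times,
};

/// Hypothetical arrow used only for this sketch: `i32 -> i32`.
struct Double;

impl Morphism<i32, i32> for Double {
    fn name(&self) -> &'static str {
        "double"
    }
    fn apply(&self, input: i32) -> CtResult<i32> {
        Ok(input * 2)
    }
}

fn main() -> CtResult<()> {
    // Any `Morphism<T, T>` is an endomorphism, so it can be iterated:
    // 1 -> 2 -> 4 -> 8.
    let result = apply_endomorphism_n_times(&Double, 1, StepCount::new(3))?;
    assert_eq!(result, 8);
    Ok(())
}
```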
src/ml.rs
use crate::category::{Compose, Morphism};
use crate::domain::{
Distribution, Logits, Loss, Parameters, Product, TokenId, TokenSequence, TrainingSet, Vector,
approx_eq,
};
use crate::error::{CtError, CtResult};
/// Turns adjacent tokens into next-token training examples.
#[derive(Debug, Clone)]
pub struct DatasetWindowing;
impl Morphism<TokenSequence, TrainingSet> for DatasetWindowing {
fn name(&self) -> &'static str {
"dataset_windowing"
}
fn apply(&self, tokens: TokenSequence) -> CtResult<TrainingSet> {
if tokens.as_slice().len() < 2 {
return Err(CtError::EmptyInput(
"dataset windowing requires at least 2 tokens",
));
}
TrainingSet::new(
tokens
.as_slice()
.windows(2)
.map(|pair| Product::new(pair[0], pair[1])),
)
}
}
/// Morphism from token id to embedding vector.
#[derive(Debug, Clone)]
pub struct Embedding {
table: Vec<Vec<f32>>,
}
impl Embedding {
pub fn from_parameters(params: &Parameters) -> Self {
Self {
table: params.embedding_table().to_vec(),
}
}
}
impl Morphism<TokenId, Vector> for Embedding {
fn name(&self) -> &'static str {
"embedding"
}
fn apply(&self, token: TokenId) -> CtResult<Vector> {
let Some(row) = self.table.get(token.index()) else {
return Err(CtError::OutOfRange {
kind: "token",
index: token.index(),
limit: self.table.len(),
});
};
Ok(Vector::new(row.clone()))
}
}
/// Linear projection from hidden vector to vocabulary logits.
#[derive(Debug, Clone)]
pub struct LinearToLogits {
weight: Vec<Vec<f32>>,
bias: Vec<f32>,
}
impl LinearToLogits {
pub fn from_parameters(params: &Parameters) -> Self {
Self {
weight: params.lm_head().to_vec(),
bias: params.bias().to_vec(),
}
}
pub(crate) fn from_parts(weight: Vec<Vec<f32>>, bias: Vec<f32>) -> Self {
Self { weight, bias }
}
}
impl Morphism<Vector, Logits> for LinearToLogits {
fn name(&self) -> &'static str {
"linear_to_logits"
}
fn apply(&self, input: Vector) -> CtResult<Logits> {
let d_model = input.as_slice().len();
let vocab_size = self.bias.len();
if self.weight.len() != d_model {
return Err(CtError::ShapeMismatch {
op: "linear layer",
expected: format!("weight rows == input dim {d_model}"),
got: format!("weight rows {}", self.weight.len()),
});
}
let mut out = self.bias.clone();
for (feature, input_value) in input.as_slice().iter().enumerate() {
if self.weight[feature].len() != vocab_size {
return Err(CtError::ShapeMismatch {
op: "linear layer",
expected: format!("weight cols == vocab size {vocab_size}"),
got: format!("weight cols {}", self.weight[feature].len()),
});
}
for (vocab_id, output_value) in out.iter_mut().enumerate() {
*output_value += input_value * self.weight[feature][vocab_id];
}
}
Ok(Logits::new(out))
}
}
/// Converts logits to a probability distribution.
#[derive(Debug, Clone)]
pub struct Softmax;
impl Morphism<Logits, Distribution> for Softmax {
fn name(&self) -> &'static str {
"softmax"
}
fn apply(&self, logits: Logits) -> CtResult<Distribution> {
if logits.as_slice().is_empty() {
return Err(CtError::EmptyInput("softmax"));
}
let max_value = logits
.as_slice()
.iter()
.copied()
.fold(f32::NEG_INFINITY, f32::max);
let mut exps = Vec::with_capacity(logits.as_slice().len());
let mut sum = 0.0;
for value in logits.as_slice() {
let exp = (*value - max_value).exp();
exps.push(exp);
sum += exp;
}
if sum <= 0.0 || !sum.is_finite() {
return Err(CtError::InvalidProbability("softmax"));
}
Distribution::new(exps.into_iter().map(|value| value / sum).collect())
}
}
/// Negative log likelihood for `(distribution, target_token)`.
#[derive(Debug, Clone)]
pub struct CrossEntropy;
impl Morphism<Product<Distribution, TokenId>, Loss> for CrossEntropy {
fn name(&self) -> &'static str {
"cross_entropy"
}
fn apply(&self, input: Product<Distribution, TokenId>) -> CtResult<Loss> {
let (distribution, target) = input.into_parts();
let Some(probability) = distribution.as_slice().get(target.index()).copied() else {
return Err(CtError::OutOfRange {
kind: "target",
index: target.index(),
limit: distribution.as_slice().len(),
});
};
Loss::new(-probability.max(1e-9).ln())
}
}
/// Direct path used for a commutative-diagram check.
#[derive(Debug, Clone)]
pub struct DirectPredict {
params: Parameters,
}
impl DirectPredict {
pub fn new(params: Parameters) -> Self {
Self { params }
}
}
impl Morphism<TokenId, Distribution> for DirectPredict {
fn name(&self) -> &'static str {
"direct_predict"
}
fn apply(&self, token: TokenId) -> CtResult<Distribution> {
let embedding = Embedding::from_parameters(&self.params).apply(token)?;
let logits = LinearToLogits::from_parameters(&self.params).apply(embedding)?;
Softmax.apply(logits)
}
}
/// Average cross-entropy over a training set.
pub fn average_loss(params: &Parameters, dataset: &TrainingSet) -> CtResult<Loss> {
let embedding = Embedding::from_parameters(params);
let linear = LinearToLogits::from_parameters(params);
let predict = Compose::<_, _, Vector>::new(embedding, linear);
let predict = Compose::<_, _, Logits>::new(predict, Softmax);
let loss_fn = CrossEntropy;
let mut total = 0.0;
for example in dataset.examples() {
let distribution = predict.apply(*example.first())?;
let loss = loss_fn.apply(Product::new(distribution, *example.second()))?;
total += loss.value();
}
Loss::new(total / dataset.len() as f32)
}
/// Verifies that the composed path and direct path produce the same result.
pub fn composed_prediction_matches_direct_prediction(params: &Parameters) -> CtResult<bool> {
let token = TokenId::new(1);
let composed = Compose::<_, _, Vector>::new(
Embedding::from_parameters(params),
LinearToLogits::from_parameters(params),
);
let composed = Compose::<_, _, Logits>::new(composed, Softmax);
let direct = DirectPredict::new(params.clone());
let left_path = composed.apply(token)?;
let right_path = direct.apply(token)?;
Ok(left_path
.as_slice()
.iter()
.zip(right_path.as_slice().iter())
.all(|(a, b)| approx_eq(*a, *b, 1e-6)))
}
#[cfg(test)]
mod tests {
use super::*;
use crate::domain::{ModelDimension, VocabSize};
#[test]
fn dataset_windowing_builds_adjacent_pairs() -> CtResult<()> {
let tokens = TokenSequence::from_indices([1, 2, 3])?;
let dataset = DatasetWindowing.apply(tokens)?;
assert_eq!(dataset.len(), 2);
assert_eq!(dataset.examples()[0].first().index(), 1);
assert_eq!(dataset.examples()[0].second().index(), 2);
Ok(())
}
#[test]
fn composed_and_direct_prediction_match() -> CtResult<()> {
let params = Parameters::init(VocabSize::new(5)?, ModelDimension::new(4)?);
assert!(composed_prediction_matches_direct_prediction(&params)?);
Ok(())
}
}
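As a worked number: `CrossEntropy` returns `-ln(p_target)` (clamped below at
`1e-9`), so a target probability of 0.25 yields `ln 4 ≈ 1.386294`. A sketch
(illustrative only):

```rust,ignore
use category_theory_transformer_rs::{
    CrossEntropy, CtResult, Distribution, Morphism, Product, TokenId,
};

fn main() -> CtResult<()> {
    let prediction = Distribution::new(vec![0.5, 0.25, 0.25])?;
    // Target token 1 has probability 0.25, so the loss is -ln(0.25) = ln 4.
    let loss = CrossEntropy.apply(Product::new(prediction, TokenId::new(1)))?;
    assert!((loss.value() - 1.386_294).abs() < 1e-4);
    Ok(())
}
```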
src/training.rs
use crate::category::Morphism;
use crate::domain::{LearningRate, Parameters, TrainingSet, Vector};
use crate::error::{CtError, CtResult};
use crate::ml::{LinearToLogits, Softmax};
/// One full-batch optimizer update.
///
/// Categorically, this is an endomorphism:
///
/// `Parameters -> Parameters`
#[derive(Debug, Clone)]
pub struct TrainStep {
dataset: TrainingSet,
learning_rate: LearningRate,
}
impl TrainStep {
pub fn new(dataset: TrainingSet, learning_rate: LearningRate) -> Self {
Self {
dataset,
learning_rate,
}
}
}
impl Morphism<Parameters, Parameters> for TrainStep {
fn name(&self) -> &'static str {
"train_step_endomorphism"
}
fn apply(&self, params: Parameters) -> CtResult<Parameters> {
let vocab_size = params.vocab_size();
let d_model = params.d_model();
if vocab_size == 0 || d_model == 0 {
return Err(CtError::EmptyInput("parameters"));
}
let mut grad_embedding = vec![vec![0.0; d_model]; params.embedding.len()];
let mut grad_lm_head = vec![vec![0.0; vocab_size]; d_model];
let mut grad_bias = vec![0.0; vocab_size];
for example in self.dataset.examples() {
let input_id = example.first().index();
let target_id = example.second().index();
if input_id >= params.embedding.len() {
return Err(CtError::OutOfRange {
kind: "input token",
index: input_id,
limit: params.embedding.len(),
});
}
if target_id >= vocab_size {
return Err(CtError::OutOfRange {
kind: "target token",
index: target_id,
limit: vocab_size,
});
}
let x = &params.embedding[input_id];
let logits = LinearToLogits::from_parts(params.lm_head.clone(), params.bias.clone())
.apply(Vector::new(x.clone()))?;
let probs = Softmax.apply(logits)?;
let mut dlogits = probs.as_slice().to_vec();
dlogits[target_id] -= 1.0;
for (vocab_id, dlogit) in dlogits.iter().copied().enumerate() {
grad_bias[vocab_id] += dlogit;
for (feature, x_feature) in x.iter().copied().enumerate() {
grad_lm_head[feature][vocab_id] += x_feature * dlogit;
}
}
for (feature, grad_feature) in grad_embedding[input_id].iter_mut().enumerate() {
let dx = params.lm_head[feature]
.iter()
.zip(dlogits.iter())
.map(|(weight, dlogit)| weight * dlogit)
.sum::<f32>();
*grad_feature += dx;
}
}
let batch_scale = 1.0 / self.dataset.len() as f32;
let learning_rate = self.learning_rate.value();
let mut updated = params.clone();
for (row, grad_row) in updated.embedding.iter_mut().zip(grad_embedding.iter()) {
for (value, grad) in row.iter_mut().zip(grad_row.iter()) {
*value -= learning_rate * grad * batch_scale;
}
}
for (row, grad_row) in updated.lm_head.iter_mut().zip(grad_lm_head.iter()) {
for (value, grad) in row.iter_mut().zip(grad_row.iter()) {
*value -= learning_rate * grad * batch_scale;
}
}
for (bias, grad) in updated.bias.iter_mut().zip(grad_bias.iter()) {
*bias -= learning_rate * grad * batch_scale;
}
Ok(updated)
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::category::{StepCount, apply_endomorphism_n_times};
use crate::domain::{ModelDimension, TokenSequence, VocabSize};
use crate::ml::{DatasetWindowing, average_loss};
#[test]
fn repeated_training_step_reduces_loss() -> CtResult<()> {
let tokens = TokenSequence::from_indices([1, 2, 3, 4, 1, 2, 3, 4])?;
let dataset = DatasetWindowing.apply(tokens)?;
let params = Parameters::init(VocabSize::new(5)?, ModelDimension::new(4)?);
let before = average_loss(&params, &dataset)?;
let train_step = TrainStep::new(dataset.clone(), LearningRate::new(1.0)?);
let trained = apply_endomorphism_n_times(&train_step, params, StepCount::new(80))?;
let after = average_loss(&trained, &dataset)?;
assert!(after.value() < before.value());
Ok(())
}
}
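The line `dlogits[target_id] -= 1.0` above is the fused
softmax-plus-cross-entropy gradient. For logits `z`, probabilities
`p = softmax(z)`, and target index `t`, the loss `L = -ln(p_t)` satisfies the
standard identity

$$
\frac{\partial L}{\partial z_i} = p_i - \mathbb{1}[i = t],
$$

which is exactly `probs` copied into `dlogits` with 1 subtracted at the target
index. The remaining loops push that gradient through the linear layer (into
`grad_lm_head` and `grad_bias`) and back to the input embedding row
(`grad_embedding`).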
src/structure.rs
/// A minimal functor interface for this tutorial.
pub trait Functor<A, B> {
type WrappedA;
type WrappedB;
fn fmap<F>(wrapped: Self::WrappedA, f: F) -> Self::WrappedB
where
F: Fn(A) -> B;
}
pub struct VecFunctor;
impl<A, B> Functor<A, B> for VecFunctor {
type WrappedA = Vec<A>;
type WrappedB = Vec<B>;
fn fmap<F>(wrapped: Vec<A>, f: F) -> Vec<B>
where
F: Fn(A) -> B,
{
wrapped.into_iter().map(f).collect()
}
}
pub struct OptionFunctor;
impl<A, B> Functor<A, B> for OptionFunctor {
type WrappedA = Option<A>;
type WrappedB = Option<B>;
fn fmap<F>(wrapped: Option<A>, f: F) -> Option<B>
where
F: Fn(A) -> B,
{
wrapped.map(f)
}
}
/// A structure-preserving conversion between wrappers.
pub trait NaturalTransformation<A> {
type From;
type To;
fn transform(from: Self::From) -> Self::To;
}
/// Natural transformation from `Vec<A>` to `Option<A>` by taking the first item.
pub struct VecToFirstOption;
impl<A> NaturalTransformation<A> for VecToFirstOption {
type From = Vec<A>;
type To = Option<A>;
fn transform(from: Vec<A>) -> Option<A> {
from.into_iter().next()
}
}
pub fn naturality_square_holds_for_first_option() -> bool {
let xs = vec![1, 2, 3];
let f = |x| x * 10;
let path_top_then_right = VecToFirstOption::transform(VecFunctor::fmap(xs.clone(), f));
let path_left_then_bottom = OptionFunctor::fmap(VecToFirstOption::transform(xs), f);
path_top_then_right == path_left_then_bottom
}
pub trait Monoid: Sized {
fn empty() -> Self;
fn combine(&self, other: &Self) -> Self;
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TraceStep(&'static str);
impl TraceStep {
pub fn new(name: &'static str) -> Self {
Self(name)
}
pub fn name(&self) -> &'static str {
self.0
}
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PipelineTrace(Vec<TraceStep>);
impl PipelineTrace {
pub fn from_steps(steps: impl IntoIterator<Item = TraceStep>) -> Self {
Self(steps.into_iter().collect())
}
pub fn names(&self) -> Vec<&'static str> {
self.0.iter().map(TraceStep::name).collect()
}
}
impl Monoid for PipelineTrace {
fn empty() -> Self {
PipelineTrace(vec![])
}
fn combine(&self, other: &Self) -> Self {
let mut combined = self.0.clone();
combined.extend_from_slice(&other.0);
PipelineTrace(combined)
}
}
pub fn monoid_laws_hold_for_pipeline_trace() -> bool {
let a = PipelineTrace::from_steps(vec![TraceStep::new("embedding")]);
let b = PipelineTrace::from_steps(vec![TraceStep::new("linear")]);
let c = PipelineTrace::from_steps(vec![TraceStep::new("softmax")]);
let identity = PipelineTrace::empty();
let left_identity = identity.combine(&a) == a;
let right_identity = a.combine(&identity) == a;
let associativity = a.combine(&b).combine(&c) == a.combine(&b.combine(&c));
left_identity && right_identity && associativity
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn naturality_square_commutes() {
assert!(naturality_square_holds_for_first_option());
}
#[test]
fn pipeline_trace_obeys_monoid_laws() {
assert!(monoid_laws_hold_for_pipeline_trace());
}
}
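The naturality check above can be read as one concrete commuting square. The
same computation, inlined as an illustrative sketch:

```rust,ignore
use category_theory_transformer_rs::{
    Functor, NaturalTransformation, OptionFunctor, VecFunctor, VecToFirstOption,
};

fn main() {
    let xs = vec![1, 2, 3];
    let f = |x: i32| x * 10;
    // Top-right path: map inside `Vec`, then take the first element.
    let top = VecToFirstOption::transform(VecFunctor::fmap(xs.clone(), f));
    // Left-bottom path: take the first element, then map inside `Option`.
    let bottom = OptionFunctor::fmap(VecToFirstOption::transform(xs), f);
    // Both paths land on Some(10), so the square commutes here.
    assert_eq!(top, bottom);
}
```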
src/calculus.rs
use crate::error::{CtError, CtResult};
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Scalar(f32);
impl Scalar {
pub fn new(value: f32) -> CtResult<Self> {
if !value.is_finite() {
return Err(CtError::InvalidLoss(value));
}
Ok(Self(value))
}
pub fn value(&self) -> f32 {
self.0
}
}
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LocalGradient(f32);
impl LocalGradient {
pub fn new(value: f32) -> CtResult<Self> {
if !value.is_finite() {
return Err(CtError::InvalidLoss(value));
}
Ok(Self(value))
}
pub fn value(&self) -> f32 {
self.0
}
}
#[derive(Debug, Clone, Copy)]
pub struct MulOp;
impl MulOp {
pub fn forward(&self, x: Scalar, y: Scalar) -> CtResult<Scalar> {
Scalar::new(x.value() * y.value())
}
/// Given upstream gradient dL/dz, return `(dL/dx, dL/dy)` for `z = x * y`.
pub fn backward(
&self,
x: Scalar,
y: Scalar,
upstream: LocalGradient,
) -> CtResult<(LocalGradient, LocalGradient)> {
let dz_dx = y.value();
let dz_dy = x.value();
Ok((
LocalGradient::new(upstream.value() * dz_dx)?,
LocalGradient::new(upstream.value() * dz_dy)?,
))
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn multiply_backward_returns_local_chain_rule_gradients() -> CtResult<()> {
let mul = MulOp;
let x = Scalar::new(2.0)?;
let y = Scalar::new(3.0)?;
let upstream = LocalGradient::new(1.0)?;
let (dl_dx, dl_dy) = mul.backward(x, y, upstream)?;
assert_eq!(dl_dx.value(), 3.0);
assert_eq!(dl_dy.value(), 2.0);
Ok(())
}
}
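The test values follow directly from the chain rule for `z = x * y`:

$$
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\, y,
\qquad
\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\, x.
$$

With `x = 2`, `y = 3`, and upstream `dL/dz = 1`, this gives `dL/dx = 3` and
`dL/dy = 2`, matching `multiply_backward_returns_local_chain_rule_gradients`.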
src/sketches.rs
//! Rust models for the seven applied-category-theory sketches.
//!
//! This module is a study companion for *Seven Sketches in Compositionality*.
//! It does not try to encode all of category theory. Instead, each section
//! gives one small Rust model for the main structure of a sketch: orders,
//! resources, databases, co-design, signal flow, circuits, and logic of
//! behavior.
use crate::error::{CtError, CtResult};
/// A finite order used for the first sketch: observations can be refined into
/// features, scores, and decisions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum InformationLevel {
Observation,
Feature,
Score,
Decision,
}
impl InformationLevel {
/// Returns true when information at this level can flow into `target`.
pub fn can_flow_to(self, target: Self) -> bool {
self <= target
}
/// Least upper bound in this small total order.
pub fn join(self, other: Self) -> Self {
self.max(other)
}
}
const INFORMATION_LEVELS: [InformationLevel; 4] = [
InformationLevel::Observation,
InformationLevel::Feature,
InformationLevel::Score,
InformationLevel::Decision,
];
/// Checks reflexivity and transitivity for the finite information order.
pub fn information_order_obeys_preorder_laws() -> bool {
for level in INFORMATION_LEVELS {
if !level.can_flow_to(level) {
return false;
}
}
for first in INFORMATION_LEVELS {
for second in INFORMATION_LEVELS {
for third in INFORMATION_LEVELS {
let premise = first.can_flow_to(second) && second.can_flow_to(third);
if premise && !first.can_flow_to(third) {
return false;
}
}
}
}
true
}
/// Number of concrete features in a tiny model.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct FeatureCount(usize);
impl FeatureCount {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::EmptyInput("feature count"));
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
/// Number of abstract layers used to summarize feature capacity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct LayerBudget(usize);
impl LayerBudget {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::EmptyInput("layer budget"));
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
const FEATURES_PER_LAYER: usize = 4;
/// Abstracts concrete features to the minimum layer budget that can hold them.
pub fn abstract_to_layer_budget(features: FeatureCount) -> CtResult<LayerBudget> {
LayerBudget::new(features.value().div_ceil(FEATURES_PER_LAYER))
}
/// Concretizes an abstract layer budget back to feature capacity.
pub fn concretize_layer_budget(layers: LayerBudget) -> FeatureCount {
FeatureCount(layers.value() * FEATURES_PER_LAYER)
}
/// Checks the Galois-connection law for the feature/layer abstraction.
pub fn feature_layer_galois_law_holds(
features: FeatureCount,
layers: LayerBudget,
) -> CtResult<bool> {
let abstracted_fits = abstract_to_layer_budget(features)?.value() <= layers.value();
let concrete_fits = features.value() <= concretize_layer_budget(layers).value();
Ok(abstracted_fits == concrete_fits)
}
/// A non-negative amount in a resource bundle.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ResourceAmount(usize);
impl ResourceAmount {
pub fn new(value: usize) -> Self {
Self(value)
}
pub fn value(&self) -> usize {
self.0
}
}
/// Compute and memory resources composed by component-wise addition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ResourceBundle {
compute: ResourceAmount,
memory: ResourceAmount,
}
impl ResourceBundle {
pub fn new(compute: ResourceAmount, memory: ResourceAmount) -> Self {
Self { compute, memory }
}
pub fn compute(&self) -> ResourceAmount {
self.compute
}
pub fn memory(&self) -> ResourceAmount {
self.memory
}
/// Monoidal product for independent resources.
pub fn tensor(&self, other: &Self) -> Self {
Self {
compute: ResourceAmount::new(self.compute.value() + other.compute.value()),
memory: ResourceAmount::new(self.memory.value() + other.memory.value()),
}
}
/// Resource preorder: supply can cover demand when every component is large
/// enough.
pub fn can_supply(&self, demand: &Self) -> bool {
self.compute >= demand.compute && self.memory >= demand.memory
}
}
/// Demonstrates monotonicity of the monoidal resource operation.
pub fn resource_tensor_is_monotone() -> bool {
let small_supply = ResourceBundle::new(ResourceAmount::new(2), ResourceAmount::new(8));
let large_supply = ResourceBundle::new(ResourceAmount::new(4), ResourceAmount::new(16));
let fixed = ResourceBundle::new(ResourceAmount::new(1), ResourceAmount::new(2));
large_supply.can_supply(&small_supply)
&& large_supply
.tensor(&fixed)
.can_supply(&small_supply.tensor(&fixed))
}
/// Identifier for a department row.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DepartmentId(usize);
impl DepartmentId {
pub fn new(value: usize) -> Self {
Self(value)
}
pub fn value(&self) -> usize {
self.0
}
}
/// Identifier for an employee row.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EmployeeId(usize);
impl EmployeeId {
pub fn new(value: usize) -> Self {
Self(value)
}
pub fn value(&self) -> usize {
self.0
}
}
/// A row in the employee table. The department field is the schema arrow
/// `Employee -> Department`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EmployeeRecord {
id: EmployeeId,
department: DepartmentId,
}
impl EmployeeRecord {
pub fn new(id: EmployeeId, department: DepartmentId) -> Self {
Self { id, department }
}
pub fn id(&self) -> EmployeeId {
self.id
}
pub fn department(&self) -> DepartmentId {
self.department
}
}
/// A tiny database instance: schema objects become sets of ids, and schema
/// arrows become functions between those sets.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CompanyInstance {
departments: Vec<DepartmentId>,
employees: Vec<EmployeeRecord>,
}
impl CompanyInstance {
pub fn new(
departments: impl IntoIterator<Item = DepartmentId>,
employees: impl IntoIterator<Item = EmployeeRecord>,
) -> CtResult<Self> {
let departments = departments.into_iter().collect::<Vec<_>>();
let employees = employees.into_iter().collect::<Vec<_>>();
for employee in &employees {
if !departments.contains(&employee.department()) {
return Err(CtError::ShapeMismatch {
op: "database instance",
expected: String::from("employee departments exist in Department"),
got: format!("missing department {}", employee.department().value()),
});
}
}
Ok(Self {
departments,
employees,
})
}
pub fn departments(&self) -> &[DepartmentId] {
&self.departments
}
pub fn employees(&self) -> &[EmployeeRecord] {
&self.employees
}
pub fn department_of(&self, employee_id: EmployeeId) -> Option<DepartmentId> {
self.employees
.iter()
.find(|employee| employee.id() == employee_id)
.map(EmployeeRecord::department)
}
pub fn foreign_keys_resolve(&self) -> bool {
self.employees
.iter()
.all(|employee| self.departments.contains(&employee.department()))
}
}
/// Requests per second needed by a design.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Throughput(usize);
impl Throughput {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::EmptyInput("throughput"));
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
/// Millisecond latency boundary.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct LatencyMs(usize);
impl LatencyMs {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::EmptyInput("latency"));
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
/// Functional need in a co-design problem.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DesignRequirement {
minimum_throughput: Throughput,
maximum_latency: LatencyMs,
}
impl DesignRequirement {
pub fn new(minimum_throughput: Throughput, maximum_latency: LatencyMs) -> Self {
Self {
minimum_throughput,
maximum_latency,
}
}
}
/// Implementation offered by a candidate component.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ImplementationOffer {
throughput: Throughput,
latency: LatencyMs,
}
impl ImplementationOffer {
pub fn new(throughput: Throughput, latency: LatencyMs) -> Self {
Self {
throughput,
latency,
}
}
}
/// A Bool-valued feasibility relation between requirements and offers.
#[derive(Debug, Clone, Copy)]
pub struct FeasibilityRelation;
impl FeasibilityRelation {
pub fn relates(requirement: DesignRequirement, offer: ImplementationOffer) -> bool {
offer.throughput >= requirement.minimum_throughput
&& offer.latency <= requirement.maximum_latency
}
}
/// A scalar coefficient in a signal-flow matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SignalCoefficient(i32);
impl SignalCoefficient {
pub fn new(value: i32) -> Self {
Self(value)
}
pub fn zero() -> Self {
Self(0)
}
pub fn value(&self) -> i32 {
self.0
}
fn add(self, other: Self) -> Self {
Self(self.value() + other.value())
}
fn multiply(self, other: Self) -> Self {
Self(self.value() * other.value())
}
}
/// Number of rows in a signal-flow matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MatrixRows(usize);
impl MatrixRows {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::EmptyInput("matrix rows"));
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
/// Number of columns in a signal-flow matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MatrixCols(usize);
impl MatrixCols {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::EmptyInput("matrix columns"));
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
/// Matrix semantics for a signal-flow graph.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SignalMatrix {
rows: MatrixRows,
cols: MatrixCols,
coefficients: Vec<Vec<SignalCoefficient>>,
}
impl SignalMatrix {
pub fn new(
rows: MatrixRows,
cols: MatrixCols,
coefficients: Vec<Vec<SignalCoefficient>>,
) -> CtResult<Self> {
if coefficients.len() != rows.value() {
return Err(CtError::ShapeMismatch {
op: "signal matrix",
expected: format!("{} rows", rows.value()),
got: format!("{} rows", coefficients.len()),
});
}
for row in &coefficients {
if row.len() != cols.value() {
return Err(CtError::ShapeMismatch {
op: "signal matrix",
expected: format!("{} columns", cols.value()),
got: format!("{} columns", row.len()),
});
}
}
Ok(Self {
rows,
cols,
coefficients,
})
}
pub fn rows(&self) -> MatrixRows {
self.rows
}
pub fn cols(&self) -> MatrixCols {
self.cols
}
pub fn coefficients(&self) -> &[Vec<SignalCoefficient>] {
&self.coefficients
}
/// Matrix composition. If `previous` is `A -> B` and `self` is `B -> C`,
/// the result is `A -> C`.
pub fn compose_after(&self, previous: &Self) -> CtResult<Self> {
if previous.rows.value() != self.cols.value() {
return Err(CtError::ShapeMismatch {
op: "signal matrix composition",
expected: format!("middle dimension {}", self.cols.value()),
got: format!("middle dimension {}", previous.rows.value()),
});
}
let mut coefficients =
vec![vec![SignalCoefficient::zero(); previous.cols.value()]; self.rows.value()];
for (output_row, output_coefficients) in coefficients.iter_mut().enumerate() {
for (input_col, coefficient) in output_coefficients.iter_mut().enumerate() {
let mut total = SignalCoefficient::zero();
for middle in 0..self.cols.value() {
total = total.add(
self.coefficients[output_row][middle]
.multiply(previous.coefficients[middle][input_col]),
);
}
*coefficient = total;
}
}
Self::new(self.rows, previous.cols, coefficients)
}
}
/// A named boundary port in an open circuit.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PortName(&'static str);
impl PortName {
pub fn new(value: &'static str) -> CtResult<Self> {
if value.trim().is_empty() {
return Err(CtError::EmptyInput("port name"));
}
Ok(Self(value))
}
pub fn as_str(&self) -> &'static str {
self.0
}
}
/// Positive resistance value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ResistanceOhms(usize);
impl ResistanceOhms {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::InvalidQuantity {
kind: "resistance",
value: value as i64,
});
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
/// A simple resistor component between two ports.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CircuitComponent {
from: PortName,
to: PortName,
resistance: ResistanceOhms,
}
impl CircuitComponent {
pub fn resistor(from: PortName, to: PortName, resistance: ResistanceOhms) -> Self {
Self {
from,
to,
resistance,
}
}
pub fn from(&self) -> PortName {
self.from
}
pub fn to(&self) -> PortName {
self.to
}
pub fn resistance(&self) -> ResistanceOhms {
self.resistance
}
}
/// An open circuit with input ports, output ports, and internal components.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OpenCircuit {
inputs: Vec<PortName>,
outputs: Vec<PortName>,
components: Vec<CircuitComponent>,
}
impl OpenCircuit {
pub fn new(
inputs: impl IntoIterator<Item = PortName>,
outputs: impl IntoIterator<Item = PortName>,
components: impl IntoIterator<Item = CircuitComponent>,
) -> CtResult<Self> {
let inputs = inputs.into_iter().collect::<Vec<_>>();
let outputs = outputs.into_iter().collect::<Vec<_>>();
let components = components.into_iter().collect::<Vec<_>>();
if inputs.is_empty() {
return Err(CtError::EmptyInput("circuit inputs"));
}
if outputs.is_empty() {
return Err(CtError::EmptyInput("circuit outputs"));
}
Ok(Self {
inputs,
outputs,
components,
})
}
pub fn input_count(&self) -> usize {
self.inputs.len()
}
pub fn output_count(&self) -> usize {
self.outputs.len()
}
pub fn component_count(&self) -> usize {
self.components.len()
}
/// Serial composition wires this circuit's outputs into the next circuit's
/// inputs.
pub fn then(&self, next: &Self) -> CtResult<Self> {
if self.output_count() != next.input_count() {
return Err(CtError::ShapeMismatch {
op: "open circuit serial composition",
expected: format!("{} next inputs", self.output_count()),
got: format!("{} next inputs", next.input_count()),
});
}
let mut components = self.components.clone();
components.extend_from_slice(&next.components);
Self::new(self.inputs.clone(), next.outputs.clone(), components)
}
/// Parallel composition keeps the two open interfaces side by side.
pub fn parallel(&self, other: &Self) -> CtResult<Self> {
let mut inputs = self.inputs.clone();
inputs.extend_from_slice(&other.inputs);
let mut outputs = self.outputs.clone();
outputs.extend_from_slice(&other.outputs);
let mut components = self.components.clone();
components.extend_from_slice(&other.components);
Self::new(inputs, outputs, components)
}
}
/// Truth values used for behavior classification.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TruthValue {
False,
True,
}
impl TruthValue {
pub fn and(self, other: Self) -> Self {
match (self, other) {
(TruthValue::True, TruthValue::True) => TruthValue::True,
_ => TruthValue::False,
}
}
pub fn implies(self, other: Self) -> Self {
match (self, other) {
(TruthValue::True, TruthValue::False) => TruthValue::False,
_ => TruthValue::True,
}
}
}
/// Discrete time point in a behavior trace.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct TimeTick(usize);
impl TimeTick {
pub fn new(value: usize) -> Self {
Self(value)
}
pub fn value(&self) -> usize {
self.0
}
}
/// Closed interval of time ticks.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TimeInterval {
start: TimeTick,
end: TimeTick,
}
impl TimeInterval {
pub fn new(start: TimeTick, end: TimeTick) -> CtResult<Self> {
if start > end {
return Err(CtError::InvalidInterval {
start: start.value(),
end: end.value(),
});
}
Ok(Self { start, end })
}
pub fn start(&self) -> TimeTick {
self.start
}
pub fn end(&self) -> TimeTick {
self.end
}
}
/// Local safety result on one interval.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LocalSafetyCheck {
interval: TimeInterval,
truth: TruthValue,
}
impl LocalSafetyCheck {
pub fn new(interval: TimeInterval, truth: TruthValue) -> Self {
Self { interval, truth }
}
pub fn interval(&self) -> TimeInterval {
self.interval
}
pub fn truth(&self) -> TruthValue {
self.truth
}
}
/// A cover of local behavior checks. The global result is true only when all
/// local checks are true.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SafetyCover(Vec<LocalSafetyCheck>);
impl SafetyCover {
pub fn new(checks: impl IntoIterator<Item = LocalSafetyCheck>) -> CtResult<Self> {
let checks = checks.into_iter().collect::<Vec<_>>();
if checks.is_empty() {
return Err(CtError::EmptyInput("safety cover"));
}
Ok(Self(checks))
}
pub fn global_truth(&self) -> TruthValue {
self.0
.iter()
.fold(TruthValue::True, |truth, check| truth.and(check.truth()))
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn information_order_has_preorder_laws() {
assert!(information_order_obeys_preorder_laws());
assert_eq!(
InformationLevel::Feature.join(InformationLevel::Decision),
InformationLevel::Decision
);
}
#[test]
fn feature_layer_abstraction_obeys_galois_law() -> CtResult<()> {
let features = FeatureCount::new(9)?;
let layers = LayerBudget::new(3)?;
assert!(feature_layer_galois_law_holds(features, layers)?);
assert_eq!(abstract_to_layer_budget(features)?.value(), 3);
assert_eq!(concretize_layer_budget(layers).value(), 12);
Ok(())
}
#[test]
fn resource_tensor_is_order_preserving() {
assert!(resource_tensor_is_monotone());
}
#[test]
fn database_instance_resolves_schema_arrow() -> CtResult<()> {
let research = DepartmentId::new(1);
let platform = DepartmentId::new(2);
let ada = EmployeeId::new(7);
let instance =
CompanyInstance::new([research, platform], [EmployeeRecord::new(ada, research)])?;
assert!(instance.foreign_keys_resolve());
assert_eq!(instance.department_of(ada), Some(research));
Ok(())
}
#[test]
fn feasibility_relation_matches_requirement_to_offer() -> CtResult<()> {
let requirement = DesignRequirement::new(Throughput::new(100)?, LatencyMs::new(80)?);
let offer = ImplementationOffer::new(Throughput::new(120)?, LatencyMs::new(50)?);
assert!(FeasibilityRelation::relates(requirement, offer));
Ok(())
}
#[test]
fn signal_matrices_compose_like_flow_graph_semantics() -> CtResult<()> {
let duplicate = SignalMatrix::new(
MatrixRows::new(2)?,
MatrixCols::new(1)?,
vec![
vec![SignalCoefficient::new(1)],
vec![SignalCoefficient::new(1)],
],
)?;
let add_weighted = SignalMatrix::new(
MatrixRows::new(1)?,
MatrixCols::new(2)?,
vec![vec![SignalCoefficient::new(2), SignalCoefficient::new(3)]],
)?;
let composed = add_weighted.compose_after(&duplicate)?;
assert_eq!(composed.coefficients(), &[vec![SignalCoefficient::new(5)]]);
Ok(())
}
#[test]
fn open_circuits_compose_in_series_and_parallel() -> CtResult<()> {
let input = PortName::new("input")?;
let middle = PortName::new("middle")?;
let output = PortName::new("output")?;
let first = OpenCircuit::new(
[input],
[middle],
[CircuitComponent::resistor(
input,
middle,
ResistanceOhms::new(10)?,
)],
)?;
let second = OpenCircuit::new(
[middle],
[output],
[CircuitComponent::resistor(
middle,
output,
ResistanceOhms::new(20)?,
)],
)?;
let serial = first.then(&second)?;
let parallel = first.parallel(&second)?;
assert_eq!(serial.input_count(), 1);
assert_eq!(serial.output_count(), 1);
assert_eq!(serial.component_count(), 2);
assert_eq!(parallel.input_count(), 2);
assert_eq!(parallel.output_count(), 2);
Ok(())
}
#[test]
fn local_behavior_truth_glues_to_global_truth() -> CtResult<()> {
let first = LocalSafetyCheck::new(
TimeInterval::new(TimeTick::new(0), TimeTick::new(5))?,
TruthValue::True,
);
let second = LocalSafetyCheck::new(
TimeInterval::new(TimeTick::new(5), TimeTick::new(10))?,
TruthValue::True,
);
let cover = SafetyCover::new([first, second])?;
assert_eq!(cover.global_truth(), TruthValue::True);
assert_eq!(
TruthValue::True.implies(TruthValue::False),
TruthValue::False
);
Ok(())
}
}
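`abstract_to_layer_budget` and `concretize_layer_budget` form the Galois
connection checked by `feature_layer_galois_law_holds`. With
`FEATURES_PER_LAYER = 4`, the law is the standard adjunction between ceiling
division and multiplication:

$$
\left\lceil a / 4 \right\rceil \le b \iff a \le 4b.
$$

For the test values `a = 9` features and `b = 3` layers, both sides hold:
`ceil(9 / 4) = 3 <= 3` and `9 <= 12`.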
src/demo.rs
use crate::calculus::{LocalGradient, MulOp, Scalar};
use crate::category::{Compose, StepCount, apply_endomorphism_n_times};
use crate::domain::{
LearningRate, Logits, ModelDimension, Parameters, Product, TokenId, TokenSequence, Vector,
VocabSize,
};
use crate::error::CtResult;
use crate::ml::{
CrossEntropy, DatasetWindowing, Embedding, LinearToLogits, Softmax, average_loss,
composed_prediction_matches_direct_prediction,
};
use crate::structure::{
Functor, Monoid, OptionFunctor, PipelineTrace, TraceStep, VecFunctor,
monoid_laws_hold_for_pipeline_trace, naturality_square_holds_for_first_option,
};
use crate::training::TrainStep;
use crate::{Identity, Morphism};
/// Run the full terminal walkthrough used by `cargo run --bin category_ml`.
pub fn run_demo() -> CtResult<()> {
println!("Category theory concepts implemented in Rust 2024");
println!("=================================================\n");
let vocab = ["<pad>", "I", "love", "Rust", "."];
let raw_text = TokenSequence::from_indices([1, 2, 3, 4, 1, 2, 3, 4])?;
println!("1. Object examples");
println!(" TokenId(1) means {:?}\n", vocab[1]);
println!("2. Dataset morphism: TokenSequence -> TrainingSet");
let dataset = DatasetWindowing.apply(raw_text)?;
for example in dataset.examples() {
println!(
" {:?} -> {:?}",
vocab[example.first().index()],
vocab[example.second().index()]
);
}
println!();
println!("3. Identity morphism: id_Vector : Vector -> Vector");
let v = Vector::new(vec![1.0, 2.0, 3.0]);
let same_v = Identity::<Vector>::new().apply(v.clone())?;
println!(" input = {:?}", v);
println!(" output = {:?}\n", same_v);
println!("4. Composition: Softmax after Linear after Embedding");
let params = Parameters::init(VocabSize::new(vocab.len())?, ModelDimension::new(4)?);
let embedding = Embedding::from_parameters(&params);
let linear = LinearToLogits::from_parameters(&params);
let token_to_logits = Compose::<_, _, Vector>::new(embedding, linear);
let token_to_distribution = Compose::<_, _, Logits>::new(token_to_logits, Softmax);
let distribution = token_to_distribution.apply(TokenId::new(1))?;
println!(" P(next token | 'I') = {:?}\n", distribution.as_slice());
println!("5. Product object: Prediction x Target -> Loss");
let loss = CrossEntropy.apply(Product::new(distribution, TokenId::new(2)))?;
println!(" loss for target 'love' = {:.6}\n", loss.value());
println!("6. Endomorphism: TrainStep : Parameters -> Parameters");
let before = average_loss(&params, &dataset)?;
let train_step = TrainStep::new(dataset.clone(), LearningRate::new(1.0)?);
let trained_params =
apply_endomorphism_n_times(&train_step, params.clone(), StepCount::new(80))?;
let after = average_loss(&trained_params, &dataset)?;
println!(" average loss before training = {:.6}", before.value());
println!(" average loss after training = {:.6}\n", after.value());
println!("7. Functor: fmap over Vec and Option");
let xs = vec![1, 2, 3];
let ys = VecFunctor::fmap(xs, |x| x * x);
let maybe = OptionFunctor::fmap(Some(7), |x| x + 1);
println!(" VecFunctor fmap square: {:?}", ys);
println!(" OptionFunctor fmap +1: {:?}\n", maybe);
println!("8. Natural transformation: Vec<A> -> Option<A>");
println!(
" naturality square holds: {}\n",
naturality_square_holds_for_first_option()
);
println!("9. Monoid: pipeline traces compose associatively with identity");
let trace = PipelineTrace::from_steps(vec![TraceStep::new("embedding")])
.combine(&PipelineTrace::from_steps(vec![TraceStep::new("linear")]))
.combine(&PipelineTrace::from_steps(vec![TraceStep::new("softmax")]));
println!(" trace = {:?}", trace.names());
println!(
" monoid laws hold for this trace type: {}\n",
monoid_laws_hold_for_pipeline_trace()
);
println!("10. Commutative diagram check");
println!(
" composed prediction == direct prediction: {}\n",
composed_prediction_matches_direct_prediction(&params)?
);
println!("11. Chain rule / local derivative morphism");
let mul = MulOp;
let x = Scalar::new(2.0)?;
let y = Scalar::new(3.0)?;
let z = mul.forward(x, y)?;
let upstream = LocalGradient::new(1.0)?;
let (dl_dx, dl_dy) = mul.backward(x, y, upstream)?;
println!(" z = x * y = {}", z.value());
println!(
" if dL/dz = {}, then dL/dx = {}, dL/dy = {}\n",
upstream.value(),
dl_dx.value(),
dl_dy.value()
);
println!("Compressed categorical training view:");
println!(" Dataset x Parameters -> Prediction -> Loss -> Gradients -> Updated Parameters");
println!(" TrainStep is repeated as Parameters0 -> Parameters1 -> ... -> ParametersN");
Ok(())
}
src/bin/category_ml.rs
fn main() -> category_theory_transformer_rs::CtResult<()> {
category_theory_transformer_rs::run_demo()
}
Runnable Examples
examples/01_domain_objects.rs
use category_theory_transformer_rs::{CtResult, DatasetWindowing, Morphism, TokenSequence};
fn main() -> CtResult<()> {
let tokens = TokenSequence::from_indices([1, 2, 3, 4])?;
let dataset = DatasetWindowing.apply(tokens)?;
println!("training pairs:");
for example in dataset.examples() {
println!(
"{} -> {}",
example.first().index(),
example.second().index()
);
}
Ok(())
}
examples/02_morphism_composition.rs
use category_theory_transformer_rs::{
Compose, CtResult, Embedding, LinearToLogits, Logits, ModelDimension, Morphism, Parameters,
Softmax, TokenId, Vector, VocabSize,
};
fn main() -> CtResult<()> {
let params = Parameters::init(VocabSize::new(5)?, ModelDimension::new(4)?);
let token_to_logits = Compose::<_, _, Vector>::new(
Embedding::from_parameters(&params),
LinearToLogits::from_parameters(&params),
);
let token_to_distribution = Compose::<_, _, Logits>::new(token_to_logits, Softmax);
let distribution = token_to_distribution.apply(TokenId::new(1))?;
println!("next-token probabilities: {:?}", distribution.as_slice());
Ok(())
}
examples/03_training_endomorphism.rs
use category_theory_transformer_rs::{
CtResult, DatasetWindowing, LearningRate, ModelDimension, Morphism, Parameters, StepCount,
TokenSequence, TrainStep, VocabSize, apply_endomorphism_n_times, average_loss,
};
fn main() -> CtResult<()> {
let tokens = TokenSequence::from_indices([1, 2, 3, 4, 1, 2, 3, 4])?;
let dataset = DatasetWindowing.apply(tokens)?;
let params = Parameters::init(VocabSize::new(5)?, ModelDimension::new(4)?);
let before = average_loss(&params, &dataset)?;
let train_step = TrainStep::new(dataset.clone(), LearningRate::new(1.0)?);
let trained = apply_endomorphism_n_times(&train_step, params, StepCount::new(80))?;
let after = average_loss(&trained, &dataset)?;
println!("loss before: {:.6}", before.value());
println!("loss after: {:.6}", after.value());
Ok(())
}
examples/04_structure_and_calculus.rs
use category_theory_transformer_rs::{
CtResult, Functor, LocalGradient, Monoid, MulOp, OptionFunctor, PipelineTrace, Scalar,
TraceStep, VecFunctor, monoid_laws_hold_for_pipeline_trace,
naturality_square_holds_for_first_option,
};
fn main() -> CtResult<()> {
let squared = VecFunctor::fmap(vec![1, 2, 3], |x| x * x);
let shifted = OptionFunctor::fmap(Some(7), |x| x + 1);
println!("Vec fmap square: {squared:?}");
println!("Option fmap +1: {shifted:?}");
println!(
"naturality square holds: {}",
naturality_square_holds_for_first_option()
);
let trace = PipelineTrace::from_steps(vec![TraceStep::new("embedding")])
.combine(&PipelineTrace::from_steps(vec![TraceStep::new("linear")]))
.combine(&PipelineTrace::from_steps(vec![TraceStep::new("softmax")]));
println!("trace: {:?}", trace.names());
println!(
"monoid laws hold: {}",
monoid_laws_hold_for_pipeline_trace()
);
let mul = MulOp;
let x = Scalar::new(2.0)?;
let y = Scalar::new(3.0)?;
let upstream = LocalGradient::new(1.0)?;
let (dl_dx, dl_dy) = mul.backward(x, y, upstream)?;
println!("dL/dx: {}", dl_dx.value());
println!("dL/dy: {}", dl_dy.value());
Ok(())
}
examples/05_seven_sketches.rs
use category_theory_transformer_rs::{
CircuitComponent, CompanyInstance, CtResult, DepartmentId, DesignRequirement, EmployeeId,
EmployeeRecord, FeasibilityRelation, FeatureCount, ImplementationOffer, InformationLevel,
LatencyMs, LayerBudget, LocalSafetyCheck, MatrixCols, MatrixRows, OpenCircuit, PortName,
ResistanceOhms, ResourceAmount, ResourceBundle, SafetyCover, SignalCoefficient, SignalMatrix,
Throughput, TimeInterval, TimeTick, TruthValue, abstract_to_layer_budget,
concretize_layer_budget, feature_layer_galois_law_holds, information_order_obeys_preorder_laws,
resource_tensor_is_monotone,
};
fn main() -> CtResult<()> {
println!(
"orders obey preorder laws: {}",
information_order_obeys_preorder_laws()
);
println!(
"join(feature, decision): {:?}",
InformationLevel::Feature.join(InformationLevel::Decision)
);
let features = FeatureCount::new(9)?;
let layers = LayerBudget::new(3)?;
println!(
"feature/layer Galois law: {}",
feature_layer_galois_law_holds(features, layers)?
);
println!(
"abstract 9 features to {} layers; concretize 3 layers to {} features",
abstract_to_layer_budget(features)?.value(),
concretize_layer_budget(layers).value()
);
let encoder = ResourceBundle::new(ResourceAmount::new(2), ResourceAmount::new(8));
let decoder = ResourceBundle::new(ResourceAmount::new(3), ResourceAmount::new(10));
println!("combined resource bundle: {:?}", encoder.tensor(&decoder));
println!(
"resource tensor monotone: {}",
resource_tensor_is_monotone()
);
let research = DepartmentId::new(1);
let platform = DepartmentId::new(2);
let ada = EmployeeId::new(7);
let instance =
CompanyInstance::new([research, platform], [EmployeeRecord::new(ada, research)])?;
println!(
"employee {:?} belongs to department {:?}",
ada,
instance.department_of(ada)
);
let requirement = DesignRequirement::new(Throughput::new(100)?, LatencyMs::new(80)?);
let offer = ImplementationOffer::new(Throughput::new(120)?, LatencyMs::new(50)?);
println!(
"co-design offer feasible: {}",
FeasibilityRelation::relates(requirement, offer)
);
let duplicate = SignalMatrix::new(
MatrixRows::new(2)?,
MatrixCols::new(1)?,
vec![
vec![SignalCoefficient::new(1)],
vec![SignalCoefficient::new(1)],
],
)?;
let add_weighted = SignalMatrix::new(
MatrixRows::new(1)?,
MatrixCols::new(2)?,
vec![vec![SignalCoefficient::new(2), SignalCoefficient::new(3)]],
)?;
let composed = add_weighted.compose_after(&duplicate)?;
println!(
"signal-flow matrix semantics: {:?}",
composed.coefficients()
);
let input = PortName::new("input")?;
let middle = PortName::new("middle")?;
let output = PortName::new("output")?;
let first_circuit = OpenCircuit::new(
[input],
[middle],
[CircuitComponent::resistor(
input,
middle,
ResistanceOhms::new(10)?,
)],
)?;
let second_circuit = OpenCircuit::new(
[middle],
[output],
[CircuitComponent::resistor(
middle,
output,
ResistanceOhms::new(20)?,
)],
)?;
println!(
"serial circuit component count: {}",
first_circuit.then(&second_circuit)?.component_count()
);
let safety = SafetyCover::new([
LocalSafetyCheck::new(
TimeInterval::new(TimeTick::new(0), TimeTick::new(5))?,
TruthValue::True,
),
LocalSafetyCheck::new(
TimeInterval::new(TimeTick::new(5), TimeTick::new(10))?,
TruthValue::True,
),
])?;
println!("global behavior truth: {:?}", safety.global_truth());
Ok(())
}
Project Configuration
Cargo.toml
[package]
name = "category-theory-transformer-rs"
version = "0.1.0"
edition = "2024"
[dependencies]
Companion Lesson Notes
These are the shorter Markdown notes kept under `lessons/`.
lessons/README.md
# Category Theory for Tiny ML: Compact Lesson Path
This folder is the compact reading path through the codebase.
For the complete self-contained course, use the chapters in `book/src/`. They
include source snapshots, runnable examples, exercises, a glossary, references,
and the full source appendix.
These compact lessons are kept for quick review. Each lesson is intentionally
short and points to a real Rust file that `cargo` checks.
## The Learning Loop
For each lesson:
1. Read the mental model.
2. Open the named Rust module.
3. Run the named example.
4. Answer the checkpoint before moving on.
## Lessons
1. [Map of the Course](00-map.md)
2. [Domain Objects](01-domain-objects.md)
3. [Morphism and Composition](02-morphisms-composition.md)
4. [The Tiny ML Pipeline](03-ml-pipeline.md)
5. [Training as an Endomorphism](04-training-endomorphism.md)
6. [Functors, Naturality, Monoids, and Chain Rule](05-structure-and-calculus.md)
## Validation
Run the full check:
```bash
bash scripts/check.sh
```
Run one lesson example:
```bash
cargo run --example 03_training_endomorphism
```
lessons/00-map.md
# 00 - Map of the Course
## Goal
Learn one idea:
> Category theory is a language for typed transformations.
In this repo, the transformations are tiny ML operations:
- token to vector
- vector to logits
- logits to probabilities
- prediction plus target to loss
- parameters to better parameters
## The Code Map
- `src/domain.rs`: nouns, also called objects
- `src/category.rs`: arrows, identity, composition, endomorphisms
- `src/ml.rs`: ML arrows
- `src/training.rs`: one training step
- `src/structure.rs`: functor, natural transformation, monoid
- `src/calculus.rs`: local derivative example
- `src/demo.rs`: one guided terminal walkthrough
## First Run
```bash
cargo run --bin category_ml
```
You should see a tiny language-model pipeline and the loss decreasing after
training.
## Checkpoint
Before moving on, say this in your own words:
> A morphism is a typed function. Composition lets small typed functions become
> one larger typed pipeline.
lessons/01-domain-objects.md
# 01 - Domain Objects
## Mental Model
Objects are the nouns of the system.
In normal Rust code, it is tempting to pass `usize`, `Vec<f32>`, and `f32`
everywhere. That is easy at first, then confusing later. This repo wraps those
values in small domain types.
## Read This File
Open `src/domain.rs`.
Focus only on these types first:
- `TokenId`
- `TokenSequence`
- `Vector`
- `Logits`
- `Distribution`
- `Loss`
- `TrainingSet`
- `Parameters`
## Run the Example
```bash
cargo run --example 01_domain_objects
```
Expected shape:
```text
training pairs:
1 -> 2
2 -> 3
3 -> 4
```
## Why This Matters
`TokenId` and `Loss` are not interchangeable, even though both could be
represented by plain numbers. The Rust type system keeps those ideas separate,
so mixing them up is a compile-time error instead of a silent runtime bug.
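To see the mechanism in isolation, here is a minimal sketch of the newtype
pattern. The two structs are illustrative stand-ins, deliberately simpler than
the validated types in `src/domain.rs`, and `report` is an invented helper.

```rust
// Minimal newtype sketch. The real domain types in `src/domain.rs`
// add validation and richer APIs; these definitions are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TokenId(usize);

#[derive(Debug, Clone, Copy)]
struct Loss(f32);

// Only a `Loss` can be passed here, never a bare number or a `TokenId`.
fn report(loss: Loss) {
    println!("loss: {:.4}", loss.0);
}

fn main() {
    let token = TokenId(3);
    let loss = Loss(0.25);
    println!("token: {token:?}");
    report(loss);
    // report(token); // compile error: expected `Loss`, found `TokenId`
}
```

Uncommenting the last line turns an argument mix-up into a compile error, which
is the whole point of the wrapper.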
## Checkpoint
What kind of bug becomes harder to write when `TokenId` is not just a raw
`usize`?
lessons/02-morphisms-composition.md
# 02 - Morphism and Composition
## Mental Model
A morphism is a typed arrow:
```text
Input -> Output
```
In Rust, this repo models that with `Morphism<Input, Output>`.
## Read This File
Open `src/category.rs`.
Read in this order:
1. `Morphism`
2. `Identity`
3. `Compose`
4. `Endomorphism`
## Run the Example
```bash
cargo run --example 02_morphism_composition
```
## What to Notice
The pipeline is built from small arrows:
```text
TokenId -> Vector -> Logits -> Distribution
```
`Compose` is the glue. If the middle types do not match, Rust rejects the
program before it runs.
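Here is a self-contained sketch of that glue. The `Arrow` trait and the
`compose` helper are invented names for this sketch; the crate's real
`Morphism` and `Compose` in `src/category.rs` may differ in detail (for
example, fallible application).

```rust
// Self-contained sketch of typed arrows and their glue. `Arrow` and
// `compose` are invented names; see `src/category.rs` for the real ones.
trait Arrow<A, B> {
    fn apply(&self, input: A) -> B;
}

// Token-like input to a vector, standing in for `Embedding`.
struct ToVector;
impl Arrow<usize, Vec<f32>> for ToVector {
    fn apply(&self, token: usize) -> Vec<f32> {
        vec![token as f32; 2]
    }
}

// Vector to "logits", standing in for `LinearToLogits`.
struct ToLogits;
impl Arrow<Vec<f32>, Vec<f32>> for ToLogits {
    fn apply(&self, v: Vec<f32>) -> Vec<f32> {
        v.into_iter().map(|x| x * 2.0).collect()
    }
}

// Composition only type-checks when the middle type `B` lines up.
fn compose<A, B, C>(f: impl Arrow<A, B>, g: impl Arrow<B, C>) -> impl Fn(A) -> C {
    move |a| g.apply(f.apply(a))
}

fn main() {
    let pipeline = compose(ToVector, ToLogits);
    println!("{:?}", pipeline(3)); // [6.0, 6.0]
    // compose(ToLogits, ToVector) would be rejected: the middle types disagree.
}
```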
## Checkpoint
Why is `TokenId -> Vector -> Logits` easier to debug than one giant
`predict(...)` function?
lessons/03-ml-pipeline.md
# 03 - The Tiny ML Pipeline
## Mental Model
The ML pipeline is a sequence of typed transformations:
```text
TokenSequence -> TrainingSet
TokenId -> Vector
Vector -> Logits
Logits -> Distribution
Distribution x TokenId -> Loss
```
## Read This File
Open `src/ml.rs`.
Read only these structs first:
- `DatasetWindowing`
- `Embedding`
- `LinearToLogits`
- `Softmax`
- `CrossEntropy`
## Run the Full Demo
```bash
cargo run --bin category_ml
```
Look at sections 2 through 5 in the output.
## The Key Detail
`Softmax` validates that its output is a real probability distribution.
`CrossEntropy` validates that the target token is inside the distribution.
Errors surface at the boundary where the bad data can first be recognized, not
several stages later.
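As a standalone illustration of boundary validation, here is a sketch of a
softmax that refuses to return anything that is not a probability
distribution. This is not the crate's `Softmax`; the signature and error type
are simplified assumptions.

```rust
// Sketch of validation at the boundary; not the crate's `Softmax`.
fn softmax(logits: &[f64]) -> Result<Vec<f64>, String> {
    if logits.is_empty() {
        return Err("softmax: empty input".to_string());
    }
    // Subtract the max before exponentiating, for numerical stability.
    let max = logits.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    let probs: Vec<f64> = exps.iter().map(|e| e / sum).collect();
    // Validate the output before anyone downstream can see it.
    let total: f64 = probs.iter().sum();
    if (total - 1.0).abs() > 1e-9 {
        return Err(format!("softmax: output sums to {total}, not 1.0"));
    }
    Ok(probs)
}

fn main() {
    println!("{:?}", softmax(&[1.0, 2.0, 3.0]));
    println!("{:?}", softmax(&[]));
}
```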
## Checkpoint
Where should an out-of-range target token be caught: inside `CrossEntropy`, or
later after loss calculation?
lessons/04-training-endomorphism.md
# 04 - Training as an Endomorphism
## Mental Model
An endomorphism maps a thing back to the same kind of thing:
```text
A -> A
```
Training does exactly that:
```text
Parameters -> Parameters
```
The model changes, but it is still a model.
## Read This File
Open `src/training.rs`.
Read in this order:
1. `TrainStep`
2. `impl Morphism<Parameters, Parameters> for TrainStep`
3. The unit test at the bottom
## Run the Example
```bash
cargo run --example 03_training_endomorphism
```
Expected pattern:
```text
loss before: ...
loss after: ...
```
The second number should be smaller.
## Why This Is Category-Theoretic
`TrainStep` can be repeated because its input type and output type are both
`Parameters`.
That makes this loop type-correct:
```text
Parameters0 -> Parameters1 -> Parameters2 -> ... -> ParametersN
```
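Here is that loop as a self-contained sketch. `apply_n_times` is an invented
stand-in for the crate's `apply_endomorphism_n_times`, using a plain closure
on `f64` instead of a `Morphism` on `Parameters`.

```rust
// Sketch: repeating an `A -> A` step is always type-correct.
// `apply_n_times` is an invented stand-in for the crate's
// `apply_endomorphism_n_times`.
fn apply_n_times<A>(step: impl Fn(A) -> A, start: A, n: usize) -> A {
    (0..n).fold(start, |state, _| step(state))
}

fn main() {
    // Toy "training step": move the parameter halfway toward 1.0.
    let step = |p: f64| p + 0.5 * (1.0 - p);
    println!("after 1 step:   {:.6}", apply_n_times(step, 0.0, 1));
    println!("after 10 steps: {:.6}", apply_n_times(step, 0.0, 10));
}
```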
## Checkpoint
Why would training be harder to compose if it returned raw vectors instead of
`Parameters`?
lessons/05-structure-and-calculus.md
# 05 - Functors, Naturality, Monoids, and Chain Rule
## Mental Model
This lesson gives names to patterns you already use:
- `Functor`: map inside a wrapper without changing the wrapper shape
- `NaturalTransformation`: convert one wrapper shape to another in a consistent way
- `Monoid`: an empty value plus an associative combine operation
- Chain rule: local gradients compose into larger gradients
## Read These Files
Open:
- `src/structure.rs`
- `src/calculus.rs`
## Run the Example
```bash
cargo run --example 04_structure_and_calculus
```
## What to Notice
The examples are tiny on purpose.
`VecFunctor` and `OptionFunctor` are not trying to replace real Rust APIs. They
show the shape of the idea:
```text
keep the container, transform the inside
```
`PipelineTrace` is a monoid because:
- there is an empty trace
- traces can be combined
- grouping does not change the final trace
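You can check all three claims with plain `Vec<String>` values, no crate types
required. The helper names below are invented for this sketch; `PipelineTrace`
in `src/structure.rs` wraps the same idea in a dedicated type.

```rust
// The monoid laws checked on plain `Vec<String>` traces.
// `empty` and `combine` are invented names for this sketch.
fn empty() -> Vec<String> {
    Vec::new()
}

fn combine(a: &[String], b: &[String]) -> Vec<String> {
    a.iter().chain(b.iter()).cloned().collect()
}

fn main() {
    let e = vec!["embedding".to_string()];
    let l = vec!["linear".to_string()];
    let s = vec!["softmax".to_string()];
    // Identity laws: the empty trace changes nothing.
    assert_eq!(combine(&empty(), &e), e);
    assert_eq!(combine(&e, &empty()), e);
    // Associativity: grouping does not change the final trace.
    let left = combine(&combine(&e, &l), &s);
    let right = combine(&e, &combine(&l, &s));
    assert_eq!(left, right);
    println!("final trace: {left:?}");
}
```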
`MulOp` shows the smallest useful backward pass:
```text
z = x * y
dL/dx = dL/dz * y
dL/dy = dL/dz * x
```
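Stripped of the crate's `Scalar` and `LocalGradient` wrappers, the rule is two
multiplications. This sketch uses bare `f64` values and an invented function
name.

```rust
// The multiplication node's backward rule on bare `f64` values;
// the crate's `MulOp` wraps the same arithmetic in validated types.
fn mul_backward(x: f64, y: f64, upstream: f64) -> (f64, f64) {
    // z = x * y, so dz/dx = y and dz/dy = x.
    // The chain rule scales each local derivative by dL/dz.
    (upstream * y, upstream * x)
}

fn main() {
    let (dl_dx, dl_dy) = mul_backward(2.0, 3.0, 1.0);
    println!("dL/dx = {dl_dx}, dL/dy = {dl_dy}"); // 3, 2
}
```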
## Checkpoint
Why is "local rule plus composition" the core idea behind backpropagation?