Domain Objects
The problem this chapter solves is:
A machine-learning pipeline should not pass raw numbers around and hope everyone remembers what each number means.
Before this code talks about arrows, composition, loss, or training, it defines the objects those arrows will connect.
In this course, a domain object means:
raw representation
+ a meaningful name
+ optional validation
+ controlled access
For example:
usize
could mean:
- a token index
- a vocabulary size
- a model dimension
- a matrix row count
- a training step count
Those are different concepts.
So the code creates different types.
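A minimal standalone sketch, separate from the book's source, shows what that buys. Once each role has its own type, the compiler rejects a call that swaps them:
struct TokenId(usize);
struct VocabSize(usize);
// The signature states which role each argument plays.
fn check_in_vocab(token: TokenId, vocab: VocabSize) -> bool {
    token.0 < vocab.0
}
fn main() {
    let token = TokenId(3);
    let vocab = VocabSize(5);
    assert!(check_in_vocab(token, vocab));
    // check_in_vocab(vocab, token); // error: expected `TokenId`, found `VocabSize`
}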
Reader orientation: In this chapter, focus on why a type exists before focusing on its syntax. A tuple struct, private field, constructor, or accessor is not decoration. It is a small boundary that tells the rest of the pipeline which states it may trust.
What You Already Know
If you have used a Rust struct, you already know that a value can carry a name instead of floating around as raw data. If you have used an ML pipeline, you already know that a token index, a vector, a probability distribution, and a loss value play different roles. This chapter turns that familiar separation into explicit domain types.
Worked Example: Naming One Number
The smallest version of the pattern looks like this:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TokenId(usize);
impl TokenId {
fn new(index: usize) -> Self {
Self(index)
}
fn index(self) -> usize {
self.0
}
}
assert_eq!(TokenId::new(3).index(), 3);
}
The real source file repeats that pattern with stronger validation where the value has an invariant, such as “a distribution must contain probabilities that sum to one.”
Self-Check
Before reading the full source snapshot, explain why TokenId(3) communicates
more than the raw number 3.
Source Snapshot
This is the domain layer used by the whole tutorial.
Source snapshot: src/domain.rs
use crate::error::{CtError, CtResult};
/// A vocabulary index. It is intentionally not a raw `usize` in public APIs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TokenId(usize);
impl TokenId {
pub fn new(index: usize) -> Self {
Self(index)
}
pub fn index(&self) -> usize {
self.0
}
}
impl From<usize> for TokenId {
fn from(value: usize) -> Self {
Self::new(value)
}
}
/// A sequence of tokens before it has been converted into training pairs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TokenSequence(Vec<TokenId>);
impl TokenSequence {
pub fn new(tokens: impl IntoIterator<Item = TokenId>) -> CtResult<Self> {
let tokens = tokens.into_iter().collect::<Vec<_>>();
if tokens.is_empty() {
return Err(CtError::EmptyInput("token sequence"));
}
Ok(Self(tokens))
}
pub fn from_indices(indices: impl IntoIterator<Item = usize>) -> CtResult<Self> {
Self::new(indices.into_iter().map(TokenId::new))
}
pub fn as_slice(&self) -> &[TokenId] {
&self.0
}
}
/// A dense feature vector.
#[derive(Debug, Clone, PartialEq)]
pub struct Vector(Vec<f32>);
impl Vector {
pub fn new(values: Vec<f32>) -> Self {
Self(values)
}
pub fn as_slice(&self) -> &[f32] {
&self.0
}
}
/// Unnormalized model scores.
#[derive(Debug, Clone, PartialEq)]
pub struct Logits(Vec<f32>);
impl Logits {
pub fn new(values: Vec<f32>) -> Self {
Self(values)
}
pub fn as_slice(&self) -> &[f32] {
&self.0
}
}
/// A validated probability distribution.
#[derive(Debug, Clone, PartialEq)]
pub struct Distribution(Vec<f32>);
impl Distribution {
pub fn new(probabilities: Vec<f32>) -> CtResult<Self> {
if probabilities.is_empty() {
return Err(CtError::EmptyInput("distribution"));
}
let sum: f32 = probabilities.iter().sum();
let all_valid = probabilities
.iter()
.all(|probability| probability.is_finite() && *probability >= 0.0);
if !all_valid || !approx_eq(sum, 1.0, 1e-4) {
return Err(CtError::InvalidProbability("distribution constructor"));
}
Ok(Self(probabilities))
}
pub fn as_slice(&self) -> &[f32] {
&self.0
}
}
/// A non-negative scalar objective value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Loss(f32);
impl Loss {
pub fn new(value: f32) -> CtResult<Self> {
if !value.is_finite() || value < 0.0 {
return Err(CtError::InvalidLoss(value));
}
Ok(Self(value))
}
pub fn value(&self) -> f32 {
self.0
}
}
/// Number of vocabulary entries.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VocabSize(usize);
impl VocabSize {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::EmptyInput("vocabulary"));
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
/// Width of each embedding vector.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelDimension(usize);
impl ModelDimension {
pub fn new(value: usize) -> CtResult<Self> {
if value == 0 {
return Err(CtError::EmptyInput("model dimension"));
}
Ok(Self(value))
}
pub fn value(&self) -> usize {
self.0
}
}
/// Positive optimizer step size.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LearningRate(f32);
impl LearningRate {
pub fn new(value: f32) -> CtResult<Self> {
if !value.is_finite() || value <= 0.0 {
return Err(CtError::InvalidLearningRate(value));
}
Ok(Self(value))
}
pub fn value(&self) -> f32 {
self.0
}
}
/// Categorical product object: `A x B`.
#[derive(Debug, Clone, PartialEq)]
pub struct Product<A, B> {
first: A,
second: B,
}
impl<A, B> Product<A, B> {
pub fn new(first: A, second: B) -> Self {
Self { first, second }
}
pub fn first(&self) -> &A {
&self.first
}
pub fn second(&self) -> &B {
&self.second
}
pub fn into_parts(self) -> (A, B) {
(self.first, self.second)
}
}
pub type TrainingExample = Product<TokenId, TokenId>;
/// Non-empty next-token training pairs.
#[derive(Debug, Clone, PartialEq)]
pub struct TrainingSet(Vec<TrainingExample>);
impl TrainingSet {
pub fn new(examples: impl IntoIterator<Item = TrainingExample>) -> CtResult<Self> {
let examples = examples.into_iter().collect::<Vec<_>>();
if examples.is_empty() {
return Err(CtError::EmptyInput("training set"));
}
Ok(Self(examples))
}
pub fn examples(&self) -> &[TrainingExample] {
&self.0
}
pub fn len(&self) -> usize {
self.0.len()
}
pub fn is_empty(&self) -> bool {
self.0.is_empty()
}
}
/// Tiny model parameters for an embedding plus language-model head.
#[derive(Debug, Clone, PartialEq)]
pub struct Parameters {
pub(crate) embedding: Vec<Vec<f32>>,
pub(crate) lm_head: Vec<Vec<f32>>,
pub(crate) bias: Vec<f32>,
}
impl Parameters {
pub fn init(vocab_size: VocabSize, d_model: ModelDimension) -> Self {
let vocab_size = vocab_size.value();
let d_model = d_model.value();
Self {
embedding: init_matrix(vocab_size, d_model, 0.2, 1),
lm_head: init_matrix(d_model, vocab_size, 0.2, 2),
bias: vec![0.0; vocab_size],
}
}
pub fn vocab_size(&self) -> usize {
self.bias.len()
}
pub fn d_model(&self) -> usize {
self.embedding.first().map_or(0, Vec::len)
}
pub fn embedding_table(&self) -> &[Vec<f32>] {
&self.embedding
}
pub fn lm_head(&self) -> &[Vec<f32>] {
&self.lm_head
}
pub fn bias(&self) -> &[f32] {
&self.bias
}
}
pub(crate) fn init_matrix(rows: usize, cols: usize, scale: f32, seed: usize) -> Vec<Vec<f32>> {
let mut out = vec![vec![0.0; cols]; rows];
for (row_index, row) in out.iter_mut().enumerate() {
for (col_index, value) in row.iter_mut().enumerate() {
let raw = ((row_index * cols + col_index) * 37 + seed * 101) % 1000;
let unit = raw as f32 / 1000.0;
*value = (unit - 0.5) * scale;
}
}
out
}
pub(crate) fn approx_eq(a: f32, b: f32, eps: f32) -> bool {
(a - b).abs() <= eps
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn distribution_rejects_non_normalized_values() {
let result = Distribution::new(vec![0.4, 0.4]);
assert!(matches!(result, Err(CtError::InvalidProbability(_))));
}
#[test]
fn token_sequence_rejects_empty_input() {
let result = TokenSequence::new(vec![]);
assert!(matches!(result, Err(CtError::EmptyInput("token sequence"))));
}
}
The Whole File
src/domain.rs defines the nouns in the tiny ML system:
TokenId
TokenSequence
Vector
Logits
Distribution
Loss
VocabSize
ModelDimension
LearningRate
Product
TrainingExample
TrainingSet
Parameters
The ML pipeline needs all of them:
TokenSequence -> TrainingSet
TokenId -> Vector
Vector -> Logits
Logits -> Distribution
Distribution x TokenId -> Loss
Parameters -> Parameters
The category-theory reading is:
These are the objects that morphisms start from and end at.
The Rust reading is:
These are wrapper types that prevent raw representation from leaking through the whole program.
Each major block below is meant to be read through three lenses:
Rust syntax:
what does the code literally declare?
ML concept:
why does the training pipeline need this value?
Category theory concept:
what object, product, list, distribution, or morphism endpoint does it model?
TokenId
The problem this block solves is:
A token index should not be confused with any other
usize.
The block:
/// A vocabulary index. It is intentionally not a raw `usize` in public APIs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TokenId(usize);
impl TokenId {
pub fn new(index: usize) -> Self {
Self(index)
}
pub fn index(&self) -> usize {
self.0
}
}
impl From<usize> for TokenId {
fn from(value: usize) -> Self {
Self::new(value)
}
}
Rust Syntax
TokenId is a tuple struct with one private field:
pub struct TokenId(usize);
The struct is public, but the field is private.
That means other modules can name TokenId, pass it around, and call its
methods, but they cannot directly reach inside and mutate the raw usize.
Why new Cannot Fail
pub fn new(index: usize) -> Self
Every usize is a valid token index at this layer.
The code does not know yet whether the token is inside a particular vocabulary. That check happens later when a morphism tries to look up an embedding row.
So TokenId::new is infallible.
Why index Exists
pub fn index(&self) -> usize {
self.0
}
This accessor gives read-only access to the raw index when low-level code needs it.
The type still prevents accidental mixing at the API boundary.
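A small standalone sketch of that boundary, with the type in its own module so field privacy actually applies:
mod domain {
    #[derive(Debug, Clone, Copy)]
    pub struct TokenId(usize);
    impl TokenId {
        pub fn new(index: usize) -> Self {
            Self(index)
        }
        pub fn index(self) -> usize {
            self.0
        }
    }
}
fn main() {
    let token = domain::TokenId::new(3);
    // Reading through the accessor works from outside the module.
    assert_eq!(token.index(), 3);
    // Reaching inside does not:
    // token.0; // error[E0616]: field `0` of struct `TokenId` is private
}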
ML Concept
In ML terms, TokenId is a vocabulary position.
If the vocabulary is:
0 = <pad>
1 = I
2 = love
3 = Rust
4 = .
then:
TokenId::new(3)
means the token Rust.
Category Theory Concept
TokenId is one object in the category of this program’s typed values.
Arrows such as Embedding start from this object:
TokenId -> Vector
TokenSequence
The problem this block solves is:
A language model does not train directly on raw text. First, text becomes a sequence of token IDs. Then that sequence becomes input-target training pairs.
This block represents the middle stage:
raw text
-> tokens
-> token sequence
-> training examples
-> model training
The block:
/// A sequence of tokens before it has been converted into training pairs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TokenSequence(Vec<TokenId>);
impl TokenSequence {
pub fn new(tokens: impl IntoIterator<Item = TokenId>) -> CtResult<Self> {
let tokens = tokens.into_iter().collect::<Vec<_>>();
if tokens.is_empty() {
return Err(CtError::EmptyInput("token sequence"));
}
Ok(Self(tokens))
}
pub fn from_indices(indices: impl IntoIterator<Item = usize>) -> CtResult<Self> {
Self::new(indices.into_iter().map(TokenId::new))
}
pub fn as_slice(&self) -> &[TokenId] {
&self.0
}
}
Rust Syntax: Documentation Comment
/// A sequence of tokens before it has been converted into training pairs.
This tells you the pipeline stage.
TokenSequence is not raw text.
It is also not yet training data.
It is the ordered token stream before adjacent pairs are created.
Example:
[TokenId(1), TokenId(2), TokenId(3)]
can later become:
TokenId(1) -> TokenId(2)
TokenId(2) -> TokenId(3)
Rust Syntax: Derived Traits
#[derive(Debug, Clone, PartialEq, Eq)]
Debug allows test and debugging output.
Clone allows an explicit copy of the sequence.
PartialEq allows equality checks.
Eq says equality is total and well-behaved.
Order matters. These are not equal:
[TokenId(1), TokenId(2)]
[TokenId(2), TokenId(1)]
Rust Syntax: Private Vector
pub struct TokenSequence(Vec<TokenId>);
This wraps:
Vec<TokenId>
but does not expose the vector directly.
That is important because the type’s invariant is:
TokenSequence is non-empty.
If the field were public, a caller could construct:
TokenSequence(vec![])
and bypass validation.
The private field forces construction through TokenSequence::new or
TokenSequence::from_indices.
Rust Syntax: Constructor
pub fn new(tokens: impl IntoIterator<Item = TokenId>) -> CtResult<Self>
This accepts any input that can produce TokenId values:
- a vector
- an array
- a mapped iterator
The return type is:
CtResult<TokenSequence>
So construction can succeed or fail.
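A sketch of those call shapes, assuming the crate root re-exports TokenSequence and CtResult as in the runnable example at the end of this chapter:
use category_theory_transformer_rs::{CtResult, TokenSequence};
fn main() -> CtResult<()> {
    // An array, a Vec, and a mapped iterator all work as input.
    let from_array = TokenSequence::from_indices([1, 2, 3])?;
    let from_vec = TokenSequence::from_indices(vec![4, 5])?;
    let from_iter = TokenSequence::from_indices((0..4).map(|i| i * 2))?;
    assert_eq!(from_array.as_slice().len(), 3);
    assert_eq!(from_vec.as_slice().len(), 2);
    assert_eq!(from_iter.as_slice().len(), 4);
    // The empty case fails instead of producing an invalid value.
    assert!(TokenSequence::from_indices(std::iter::empty()).is_err());
    Ok(())
}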
Rust Syntax: Collection
let tokens = tokens.into_iter().collect::<Vec<_>>();
This turns the flexible input into the concrete representation stored inside the struct.
The _ means Rust infers the element type as TokenId.
Rust Syntax: Empty Check
if tokens.is_empty() {
return Err(CtError::EmptyInput("token sequence"));
}
This is the invariant boundary.
An empty token stream cannot carry useful sequence information.
The error happens immediately, before invalid data enters the rest of the pipeline.
Rust Syntax: Successful Construction
Ok(Self(tokens))
Inside the impl, Self means TokenSequence.
So this is equivalent to:
Ok(TokenSequence(tokens))
The vector has already been validated, so the object is safe for later code to trust.
Rust Syntax: Convenience Constructor
pub fn from_indices(indices: impl IntoIterator<Item = usize>) -> CtResult<Self> {
Self::new(indices.into_iter().map(TokenId::new))
}
This accepts raw indices and converts each one into TokenId.
The important design choice is delegation:
from_indices -> new
Validation is not duplicated.
All construction still passes through the same non-empty check.
Rust Syntax: Read-Only Access
pub fn as_slice(&self) -> &[TokenId] {
&self.0
}
This returns a borrowed slice.
Callers can inspect the sequence, but they cannot clear it, push to it, or replace the internal vector.
That preserves the invariant after construction.
ML Concept
TokenSequence is tokenized text before next-token examples are created.
A sequence of length n can produce n - 1 adjacent prediction pairs.
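That count is easy to confirm with slice windows; a sketch, assuming the same crate-root re-exports as the runnable example:
use category_theory_transformer_rs::{CtResult, TokenSequence};
fn main() -> CtResult<()> {
    let tokens = TokenSequence::from_indices([1, 2, 3, 4, 5])?;
    let n = tokens.as_slice().len();
    // Each adjacent window of width 2 is one (input, target) pair.
    let pair_count = tokens.as_slice().windows(2).count();
    assert_eq!(pair_count, n - 1); // 5 tokens -> 4 pairs
    Ok(())
}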
Category Theory Concept
TokenSequence behaves like:
List+ TokenId
where List+ means a non-empty finite list.
Its constructor is not:
List TokenId -> TokenSequence
because the empty list is invalid.
It is:
List TokenId -> Result TokenSequence CtError
Rust turns the partial construction into a total function by using Result.
Vector and Logits
The problem these blocks solve is:
A dense hidden vector and raw vocabulary scores are both
Vec<f32>, but they do not mean the same thing.
The blocks:
#[derive(Debug, Clone, PartialEq)]
pub struct Vector(Vec<f32>);
impl Vector {
pub fn new(values: Vec<f32>) -> Self {
Self(values)
}
pub fn as_slice(&self) -> &[f32] {
&self.0
}
}
#[derive(Debug, Clone, PartialEq)]
pub struct Logits(Vec<f32>);
impl Logits {
pub fn new(values: Vec<f32>) -> Self {
Self(values)
}
pub fn as_slice(&self) -> &[f32] {
&self.0
}
}
Rust Syntax
Vector means hidden features.
Logits means unnormalized scores.
Both wrap Vec<f32>.
The distinction matters because only this arrow should produce logits:
Vector -> Logits
and only this arrow should normalize logits:
Logits -> Distribution
If both were plain Vec<f32>, the compiler could not help keep those stages
separate.
These types derive PartialEq, but not Eq, because they contain f32.
Floating-point values do not have total equality because NaN is not equal to
itself.
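One line demonstrates why:
fn main() {
    let nan = f32::NAN;
    // NaN is unequal to everything, including itself, so `==` on f32
    // is not a total equivalence relation and `Eq` cannot be derived.
    assert!(nan != nan);
}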
ML Concept
A Vector is the dense representation used after embedding lookup.
Example:
TokenId(3) -> [0.12, -0.44, 0.88, 0.03]
Logits are raw vocabulary scores.
Example:
[3.0, 1.0, -2.0]
They can be negative or larger than one, and they do not need to sum to one.
The pipeline is:
TokenId -> Vector -> Logits -> Distribution
Category Theory Concept
If the model dimension is d, a vector lives in a vector-space-like object:
R^d
If the vocabulary size is V, logits live in:
R^V
The output projection is an arrow:
R^d -> R^V
and softmax maps:
R^V -> probability distributions over TokenId
Distribution
The problem this block solves is:
Probabilities are not just floats. A probability distribution must be non-empty, finite, non-negative, and sum to one.
The core block:
#[derive(Debug, Clone, PartialEq)]
pub struct Distribution(Vec<f32>);
impl Distribution {
pub fn new(probabilities: Vec<f32>) -> CtResult<Self> {
if probabilities.is_empty() {
return Err(CtError::EmptyInput("distribution"));
}
let sum: f32 = probabilities.iter().sum();
let all_valid = probabilities
.iter()
.all(|probability| probability.is_finite() && *probability >= 0.0);
if !all_valid || !approx_eq(sum, 1.0, 1e-4) {
return Err(CtError::InvalidProbability("distribution constructor"));
}
Ok(Self(probabilities))
}
}
Rust Syntax: Why Construction Can Fail
This is invalid:
[]
This is invalid:
[0.4, 0.4]
because it sums to 0.8, not 1.0.
This is invalid:
[1.2, -0.2]
because probabilities cannot be negative.
So Distribution::new returns CtResult<Self>.
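A sketch of that boundary in action, assuming Distribution and CtResult are re-exported at the crate root like TokenSequence is:
use category_theory_transformer_rs::{CtResult, Distribution};
fn main() -> CtResult<()> {
    // Valid: non-empty, non-negative, sums to 1.0.
    let ok = Distribution::new(vec![0.5, 0.25, 0.25])?;
    assert_eq!(ok.as_slice().len(), 3);
    // Each rule violation is rejected at construction.
    assert!(Distribution::new(vec![]).is_err());           // empty
    assert!(Distribution::new(vec![0.4, 0.4]).is_err());   // sums to 0.8
    assert!(Distribution::new(vec![1.2, -0.2]).is_err());  // negative mass
    Ok(())
}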
Rust Syntax: The Sum Check
let sum: f32 = probabilities.iter().sum();
This computes the total probability mass.
The code uses approximate equality:
approx_eq(sum, 1.0, 1e-4)
because floating-point arithmetic is not exact.
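A quick standalone check of that inexactness:
fn main() {
    // Ten copies of 0.1f32 typically sum to something near, but not
    // bitwise equal to, 1.0 (for example 1.0000001).
    let sum: f32 = std::iter::repeat(0.1f32).take(10).sum();
    println!("sum = {sum:?}");
    // The tolerance-based check still accepts it.
    assert!((sum - 1.0).abs() <= 1e-4);
}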
ML Concept
This is the output of softmax:
Logits -> Distribution
The rest of the model can treat a Distribution as real probabilities because
the constructor checked the rule.
Category Theory Concept
Distribution is an object with a stronger invariant than Vec<f32>.
The softmax morphism lands in this object only if it can produce valid probability mass.
Loss
The problem this block solves is:
A loss value must be a real, non-negative scalar.
The block:
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Loss(f32);
impl Loss {
pub fn new(value: f32) -> CtResult<Self> {
if !value.is_finite() || value < 0.0 {
return Err(CtError::InvalidLoss(value));
}
Ok(Self(value))
}
pub fn value(&self) -> f32 {
self.0
}
}
Rust Syntax
Loss::new rejects:
- infinity
- not-a-number values
- negative values
Cross entropy should not produce a negative loss. If it does, something has gone wrong before or during loss calculation.
Loss derives Copy because it wraps one small scalar.
Calling value() returns the raw f32 for printing, comparison, or averaging.
ML Concept
Loss is the training signal.
For next-token prediction:
loss = -log(probability assigned to the correct token)
Lower loss means the model assigned more probability to the correct answer.
Training tries to reduce this value.
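A standalone sketch of the formula itself, separate from the crate's cross-entropy morphism:
fn main() {
    // Probability 0.25 on the correct token costs about 1.3863 nats.
    let loss = -(0.25f32).ln();
    println!("loss = {loss:.4}");
    // Perfect confidence costs nothing.
    assert_eq!(-(1.0f32).ln(), 0.0);
    // More probability on the right answer means lower loss.
    assert!(-(0.5f32).ln() < -(0.25f32).ln());
}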
Category Theory Concept
Loss is the codomain of an evaluation morphism:
Distribution x TokenId -> Loss
It maps prediction plus truth into a non-negative scalar objective.
Shape and Training Hyperparameter Types
The problem these blocks solve is:
Dimensions and learning rates need boundary checks before they are used to allocate matrices or update parameters.
The types are:
VocabSize
ModelDimension
LearningRate
Rust Syntax
VocabSize::new(0) fails because a vocabulary with zero entries is unusable.
ModelDimension::new(0) fails because an embedding vector with zero width
cannot carry features.
LearningRate::new(value) fails when the value is not finite or is not
positive.
These checks keep bad configuration from becoming strange matrix behavior later.
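A sketch of those boundary checks, again assuming these types are re-exported at the crate root like TokenSequence is:
use category_theory_transformer_rs::{CtResult, LearningRate, ModelDimension, VocabSize};
fn main() -> CtResult<()> {
    // Valid configuration passes through unchanged.
    let vocab = VocabSize::new(5)?;
    let d_model = ModelDimension::new(4)?;
    let lr = LearningRate::new(0.05)?;
    println!("{} x {} @ {}", vocab.value(), d_model.value(), lr.value());
    // Bad configuration fails here, not deep inside a matrix routine.
    assert!(VocabSize::new(0).is_err());
    assert!(LearningRate::new(-0.1).is_err());
    assert!(LearningRate::new(f32::NAN).is_err());
    Ok(())
}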
ML Concept
VocabSize controls:
embedding rows
logit length
distribution length
bias length
ModelDimension controls embedding width:
R^d
LearningRate controls optimizer step size:
parameter = parameter - learning_rate * gradient
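A one-step standalone sketch of that update rule:
fn main() {
    let learning_rate = 0.5f32;
    let gradient = 0.5f32;
    let mut parameter = 1.0f32;
    // Move against the gradient, scaled by the learning rate.
    parameter -= learning_rate * gradient;
    assert_eq!(parameter, 0.75); // 1.0 - 0.5 * 0.5
}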
Category Theory Concept
VocabSize describes the cardinality of the finite token object.
ModelDimension chooses the intermediate vector-space-like object.
LearningRate chooses one update morphism from a family of training
endomorphisms.
Product<A, B>
The problem this block solves is:
Some ML operations need two inputs that belong together.
The block:
#[derive(Debug, Clone, PartialEq)]
pub struct Product<A, B> {
first: A,
second: B,
}
This is a generic pair.
It is used in two important places:
pub type TrainingExample = Product<TokenId, TokenId>;
and:
Product<Distribution, TokenId> -> Loss
Rust Syntax: Why Not A Tuple Everywhere?
Rust tuples like (A, B) would work mechanically.
Product<A, B> makes the category-theory idea visible:
A x B
It also gives named methods:
first()
second()
into_parts()
Those methods make call sites easier to read during the course.
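A usage sketch, assuming Product and TokenId are re-exported at the crate root:
use category_theory_transformer_rs::{Product, TokenId};
fn main() {
    let example = Product::new(TokenId::new(1), TokenId::new(2));
    // Projection-style reads, analogous to pi_1 and pi_2.
    assert_eq!(example.first().index(), 1);
    assert_eq!(example.second().index(), 2);
    // Consuming the pair when both halves are needed by value.
    let (input, target) = example.into_parts();
    println!("{} -> {}", input.index(), target.index());
}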
ML Concept
Product<TokenId, TokenId> is one supervised next-token example:
input token x target token
Product<Distribution, TokenId> is the input to cross entropy:
prediction x target
Category Theory Concept
Product<A, B> is the course’s named version of:
A x B
The accessors are projection-like operations:
first ~ pi_1
second ~ pi_2
TrainingSet
The problem this block solves is:
Training should not run on an empty collection of examples.
The block:
#[derive(Debug, Clone, PartialEq)]
pub struct TrainingSet(Vec<TrainingExample>);
impl TrainingSet {
pub fn new(examples: impl IntoIterator<Item = TrainingExample>) -> CtResult<Self> {
let examples = examples.into_iter().collect::<Vec<_>>();
if examples.is_empty() {
return Err(CtError::EmptyInput("training set"));
}
Ok(Self(examples))
}
}
This mirrors TokenSequence.
The internal vector is private.
Construction validates non-emptiness.
Callers get read-only access through:
pub fn examples(&self) -> &[TrainingExample]
Rust Syntax: Why is_empty Exists If Empty Is Impossible
TrainingSet includes:
pub fn is_empty(&self) -> bool {
self.0.is_empty()
}
For values constructed through TrainingSet::new, this should always be
false.
The method exists because collection-like types conventionally expose both
len and is_empty, and tests or generic code may use it.
The invariant is still protected by private storage and the constructor.
ML Concept
A TrainingSet is a non-empty list of next-token examples.
For:
[10, 25, 31, 7]
the training set is:
(10, 25)
(25, 31)
(31, 7)
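A sketch that builds exactly that set by hand (the crate's DatasetWindowing morphism does this work in the pipeline), assuming the same crate-root re-exports:
use category_theory_transformer_rs::{CtResult, Product, TokenSequence, TrainingSet};
fn main() -> CtResult<()> {
    let tokens = TokenSequence::from_indices([10, 25, 31, 7])?;
    // Pair each token with its successor.
    let pairs = tokens
        .as_slice()
        .windows(2)
        .map(|window| Product::new(window[0], window[1]));
    let training_set = TrainingSet::new(pairs)?;
    assert_eq!(training_set.len(), 3);
    Ok(())
}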
Category Theory Concept
The shape is:
non-empty list of (TokenId x TokenId)
or:
List+ (TokenId x TokenId)
Parameters
The problem this block solves is:
Training needs one object that owns all trainable model state.
The block:
#[derive(Debug, Clone, PartialEq)]
pub struct Parameters {
pub(crate) embedding: Vec<Vec<f32>>,
pub(crate) lm_head: Vec<Vec<f32>>,
pub(crate) bias: Vec<f32>,
}
The model has three pieces:
embedding table
lm head matrix
bias vector
The fields are pub(crate), not fully public.
That means code inside this crate can update parameters during training, but external callers use accessors.
Rust Syntax: Initialization
pub fn init(vocab_size: VocabSize, d_model: ModelDimension) -> Self
This takes validated domain values, not raw usize.
That means matrix allocation starts from:
non-empty vocabulary
positive model dimension
The initialized shapes are:
embedding: vocab_size x d_model
lm_head: d_model x vocab_size
bias: vocab_size
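A sketch that confirms those shapes through the accessors, assuming Parameters, VocabSize, ModelDimension, and CtResult are re-exported at the crate root:
use category_theory_transformer_rs::{CtResult, ModelDimension, Parameters, VocabSize};
fn main() -> CtResult<()> {
    let params = Parameters::init(VocabSize::new(5)?, ModelDimension::new(4)?);
    assert_eq!(params.embedding_table().len(), 5);    // vocab_size rows
    assert_eq!(params.embedding_table()[0].len(), 4); // d_model columns
    assert_eq!(params.lm_head().len(), 4);            // d_model rows
    assert_eq!(params.lm_head()[0].len(), 5);         // vocab_size columns
    assert_eq!(params.bias().len(), 5);               // one bias per token
    Ok(())
}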
ML Concept
Parameters is the trainable state.
Prediction reads it.
Training maps it back to a new Parameters value:
Parameters -> Parameters
Category Theory Concept
Parameters is the object of the training endomorphism.
The important point is not that the numbers change.
The important point is that the type remains the same.
Utility Functions
The file ends with:
pub(crate) fn init_matrix(...)
pub(crate) fn approx_eq(...)
init_matrix fills each matrix with a deterministic, seed-based pattern, so the teaching model initializes reproducibly without a random-number generator.
approx_eq is a small floating-point helper used by probability checks and
composition tests.
Both are crate-internal implementation details, not learner-facing domain objects.
Runnable Example
The domain example shows token IDs becoming training pairs:
Source snapshot: examples/01_domain_objects.rs
use category_theory_transformer_rs::{CtResult, DatasetWindowing, Morphism, TokenSequence};
fn main() -> CtResult<()> {
let tokens = TokenSequence::from_indices([1, 2, 3, 4])?;
let dataset = DatasetWindowing.apply(tokens)?;
println!("training pairs:");
for example in dataset.examples() {
println!(
"{} -> {}",
example.first().index(),
example.second().index()
);
}
Ok(())
}
Run:
cargo run --example 01_domain_objects
Expected output:
training pairs:
1 -> 2
2 -> 3
3 -> 4
Why This Matters
The main design rule is:
Use raw primitives only at the edge where they are created or displayed.
After that, use domain types.
This prevents mistakes like:
passing a model dimension where a token ID was expected
passing logits where probabilities were expected
training on an empty dataset
using a negative learning rate
Core Mental Model
src/domain.rs turns raw storage into trustworthy objects.
In Rust terms:
private fields + smart constructors + accessors
In ML terms:
tokens, vectors, probabilities, loss, and model weights
In category-theory terms:
objects that morphisms can safely connect
Checkpoint
Pick one type from this file and explain:
- What raw representation it wraps.
- What invalid state it prevents.
- Which morphism consumes or produces it.
Example:
Distribution wraps Vec<f32>, rejects invalid probability mass, and is produced
by Softmax before CrossEntropy consumes it.
Where This Leaves Us
This chapter gave names to the values in the system. A token ID is not a model dimension, logits are not probabilities, and a training set is not just any vector of pairs. Each type marks a stage where raw storage becomes a meaningful object.
The next chapter adds arrows between those objects. Once the arrows exist, the book can talk about identity, composition, and repeated transformations without falling back to loose wiring conventions.
Further Reading
These pages extend the domain-object vocabulary used in this chapter:
- Glossary: object, product object, invariant, smart constructor
- References: Rust error handling, Rust API design, and Rust documentation
Retrieval Practice
Recall
What is a domain object in this book?
Explain
Why does Distribution::new validate probability mass at construction time
instead of leaving that check to CrossEntropy?
Apply
Design a one-field newtype for a future SequenceLength. State one invariant
its constructor should protect.