Exercises
The problem this chapter solves is:
Reading detailed explanations is not enough. You need to practice explaining the code through three lenses: the Rust syntax, the ML concept, and the category-theory concept.
The exercises are deliberately small. A strong answer is not a long essay; it is a precise explanation that connects a line of Rust to the value it protects, the ML step it supports, and the categorical shape it names. When an exercise asks you to edit code, make the smallest change, run the command, and then explain what changed.
For every exercise, use this answer shape:
Rust syntax:
...
ML concept:
...
Category theory concept:
...
The point is not to write long answers.
The point is to connect the same block of code across all three meanings.
The exercise method is:
read one small idea
run the matching command
break one boundary on purpose
explain the failure
restore the working version
This matters because the Rust compiler and the test suite are part of the lesson. The official Rust testing material treats tests as executable checks for expected behavior. This book uses the same habit for learning: a failed test, rejected constructor, or compiler error is not only a problem to remove. It is evidence about which boundary the code protects.
Source-Backed Practice Contract
This chapter uses sources to keep practice cumulative, testable, and transfer-oriented. Each source supports one local exercise rule and one kind of repository evidence.
| Source | What the source supports | Local rule in this chapter | Repository evidence |
|---|---|---|---|
| How People Learn II | Learners need practice that connects prior knowledge to new transfer situations. | Move from one small Rust boundary to a new but related boundary. | TokenId to Distribution, then TrainStep, then attention shapes |
| Test-Enhanced Learning | Retrieval practice can improve retention rather than only measure it. | Ask Recall, Explain, and Apply questions before the answer key. | ## Checkpoint Quiz, ## Retrieval Practice, exercises/ANSWER_KEY.md |
| Structuring the Transition From Example Study to Problem Solving | Learners benefit from moving from worked examples toward independent problem solving. | Use a worked example, then a partially completed example, then an open transfer exercise. | ## Worked Example, ## Partially Completed Example, ## Transfer Exercise |
| Rust Book: Writing Automated Tests | Tests check expected behavior that the type system alone cannot prove. | Treat tests, constructor errors, and compiler errors as learning evidence. | cargo test --all-targets --all-features, domain::tests, category::tests, ml::tests |
| Rust By Example: Tests | Small test commands and targeted test names make feedback inspectable. | Prefer one named command and one visible signal per exercise attempt. | cargo test structure::tests --lib, cargo test cross_entropy_is_lower_for_more_confident_target_probability --lib |
| CS231n Optimization and PyTorch gradcheck | Numerical gradient checks are useful local debugging signals with limits. | Use finite differences to compare one local update path, then state what the check does not prove. | Advanced Exercise 5, TransformerBlockTrainStep finite-difference tests |
The transfer pattern is:
worked example -> partial example -> independent attempt -> evidence signal
For this chapter, evidence means one of:
command output
constructor error
compiler error
named test result
answer-key mismatch
These signals are not yet evidence that every exercise works for every reader. Direct exercise-attempt reports are still needed before the exercise ladder can be called fully validated.
Before starting, make sure the basic Rust feedback loop works:
cargo test --all-targets --all-features
That command is part of the learning method. It proves that the examples in the book are not only explanatory text; they are tied to code that the compiler can check.
After attempting an exercise, compare your reasoning with the public answer key
in exercises/ANSWER_KEY.md. Use it to check the shape of the explanation, not
to memorize wording.
Exercise Ladder
Use the exercises in this order:
| Stage | File or chapter | What you practice |
|---|---|---|
| Beginner | exercises/beginner/README.md | Change inputs, observe output, name one invariant |
| Core | this chapter | Explain each concept through Rust, ML, and category theory |
| Intermediate | exercises/intermediate/README.md | Add one morphism and explain one composition failure |
| Advanced | exercises/advanced/README.md | Extend a chapter, diagram, law, or sketch test |
Do not skip the small exercises. How People Learn II emphasizes that learners
need to retrieve and use knowledge in new situations. In this course, transfer
means taking the same explanation method from TokenId to Distribution, then
from Distribution to training, and then from training to the seven applied
sketches.
Core Chapter Practice Map
Use this map when you finish a chapter and want the matching practice task.
| Chapter | Practice target | Best exercise |
|---|---|---|
| Welcome | Explain the three-lens reading contract | Beginner Exercise 3 |
| Course Map | Connect terminal output to pipeline stages | Exercise 2 and Exercise 8 |
| Domain Objects | Explain wrappers, invariants, and typed objects | Exercise 1 and Exercise 7 |
| Morphism and Composition | Explain legal and illegal composition | Exercise 4 and Exercise 17 |
| The Tiny ML Pipeline | Trace adjacent pairs, prediction, and loss | Exercise 3, Exercise 9, Exercise 13, and Exercise 17 |
| Training as an Endomorphism | Explain repeated Parameters -> Parameters updates | Exercise 5 and Exercise 17 |
| Functors, Naturality, Monoids, and Chain Rule | Explain mapping, laws, traces, and local gradients | Exercise 6, Exercise 14, and Exercise 17 |
| Seven Sketches Through Rust | Identify the law or boundary a structure protects | Exercise 10 |
| Challenges | Turn one compiler-fix or paper-to-code task into evidence | Challenge completion report |
| Transformer Roadmap | Trace attention shapes, classify category shapes, and explain finite-difference checks for structured training state | Exercise 12, Exercise 16, Exercise 17, and Advanced Exercise 5 |
The map is not a separate syllabus. It is a repair tool. If a chapter feels clear while reading but vague one hour later, use the matching exercise to make the idea active again.
Chapter Mastery Gates
Use these gates before moving from a chapter into later material. A gate is not a grade. It is a quick test of whether the idea is active enough to reuse.
| Chapter | Run evidence | Explain evidence | Transfer evidence |
|---|---|---|---|
| Welcome | cargo run --example 01_token_sequence | state the three-lens reading contract without looking back | explain one output line through Rust, ML, and category theory |
| Course Map | cargo run --bin category_ml | name the file or module behind three printed sections | choose the next chapter and matching exercise from the output |
| Domain Objects | cargo run --example 01_domain_objects | explain one constructor invariant and the bad state it rejects | replace one raw value in an explanation with its domain type |
| Morphism and Composition | cargo run --example 02_morphism_composition | name every middle object in TokenId -> Vector -> Logits -> Distribution | explain one illegal skipped stage and the missing object |
| The Tiny ML Pipeline | cargo test ml::tests --lib | separate logits, probabilities, target token, and loss | compute which prediction should have lower cross-entropy |
| Training as an Endomorphism | cargo run --example 03_training_endomorphism | explain why one update has shape Parameters -> Parameters | predict what breaks if an update returns only a loose changed field |
| Functors, Naturality, Monoids, and Chain Rule | cargo run --example 04_structure_and_calculus | explain one law by tracing both sides of the example | classify a new trace, option, vector, or derivative example |
| Seven Sketches Through Rust | cargo run --example 05_seven_sketches | identify the relation, order, schema, circuit, or cover being protected | model one analogous boundary in a small software system |
| Transformer Roadmap | cargo run --example 06_attention_scores and cargo run --example 07_transformer_training_state | classify attention boundaries by input count and output object | reject one illegal shortcut such as HiddenSequence x MultiHeadOutput -> HiddenSequence |
If a gate fails, do not reread the whole chapter first. Start with the matching exercise, inspect the failure signal, and compare your reasoning with the answer-key rubric. The smallest useful repair is usually one missing object, one missing command, or one missing distinction.
Checkpoint Quiz
Use this after the mastery gates. Answer from memory first, then check the answer key. The goal is not vocabulary recall alone. The goal is to notice whether you can connect a Rust boundary, an ML role, and a category-theory shape without the chapter open.
Questions
Write one or two sentences for each question.
1. A value has type TokenId. What mistake becomes harder than if the same value crossed the boundary as usize?
2. The path TokenId -> Vector -> Logits -> Distribution fails if the middle Logits stage is skipped. What Rust evidence and ML evidence explain the failure?
3. A model gives the target token probability 0.9 in one case and 0.1 in another. Which case should have lower cross-entropy, and why?
4. A training update changes weights but returns only the changed readout matrix. Which composition shape has been broken?
5. VecFunctor::fmap maps every element and OptionFunctor::fmap maps only when a value is present. What does that preserve?
6. A naturality square has two paths from Vec<A> to Option<B>. What should be true if the square commutes?
7. AttentionScores x AttentionMask -> AttentionScores returns the score object. Why is this still not a unary endomorphism?
8. Why must the attention mask act before row-wise softmax?
9. HiddenSequence x MultiHeadOutput -> HiddenSequence looks tempting after concatenating heads. Which missing boundary makes it illegal?
10. A finite-difference test agrees with the inferred gradient for one parameter. What has it checked, and what has it not checked?
Coverage Map
| Question | Chapter or section | Main objective |
|---|---|---|
| 1 | Domain Objects | explain why a wrapper protects a domain role |
| 2 | Morphism and Composition | identify a missing middle object |
| 3 | Tiny ML Pipeline | connect target probability to loss |
| 4 | Training as an Endomorphism | preserve state-update composition |
| 5 | Structure and Laws | explain structure-preserving mapping |
| 6 | Structure and Laws | trace both paths through a naturality square |
| 7 | Transformer Roadmap | count inputs before naming an endomorphism |
| 8 | Transformer Roadmap | separate masked scores from weights |
| 9 | Transformer Roadmap | identify a missing projection boundary |
| 10 | Exercises and Transformer Roadmap | state the scope of a local gradient check |
Score the quiz by evidence, not points. A strong answer names the object or boundary, explains the ML or software role, and rejects one invalid shortcut. If an answer only repeats a term, return to the matching exercise.
Failure Signals
A good exercise often fails before it works. Use the failure signal as part of the answer.
| Signal | Usually means | What to explain |
|---|---|---|
| Compiler type error | two stages do not connect | the missing middle object |
| Constructor returns Err(...) | a value violates an invariant | the bad state rejected at the boundary |
| Test assertion fails | the behavior no longer matches the law | which example stopped preserving the intended structure |
| Command output changes | the data path changed | which typed value moved differently through the pipeline |
When an exercise asks you to break something, do it in a small local edit and then restore the working version. The final repository should still pass the validation commands.
Exercise Evidence Map
Use this table before checking the answer key. It tells you what kind of evidence should exist when an exercise is complete.
| Exercise | Progress evidence | Failure or output to inspect |
|---|---|---|
| Exercise 1 | written three-lens explanation | raw representation, invariant, and pipeline stage are all named |
| Exercise 2 | cargo run --bin category_ml | terminal output includes the new adjacent transition |
| Exercise 3 | handwritten adjacent pairs | three overlapping TokenId pairs are present |
| Exercise 4 | temporary broken composition | compiler reports a missing trait bound or middle object |
| Exercise 5 | cargo run --example 03_training_endomorphism | loss output changes as StepCount changes |
| Exercise 6 | rewritten output distribution | probabilities stay attached to transformed outcomes |
| Exercise 7 | constructor boundary explanation | Err(...) is connected to the invalid value |
| Exercise 8 | five-sentence file summary | one command is named as the proof that the file still works |
| Exercise 9 | source-role comparison | one external resource is connected to one local source file, one owned boundary, and one unsupported claim |
| Exercise 10 | cargo run --example 05_seven_sketches or a negative test | one law still holds, or one invalid structure is rejected |
| Exercise 11 | block explanation | a beginner-facing Rust explanation and a shape name are both present |
| Exercise 12 | cargo run --example 06_attention_scores | first output line and category shape for each attention boundary are recorded |
| Exercise 13 | cargo test cross_entropy_is_lower_for_more_confident_target_probability --lib | lower loss is assigned to the higher target probability |
| Exercise 14 | cargo test structure::tests --lib | naturality paths and monoid laws are both named |
| Exercise 15 | mixed boundary diagnosis | each failure is classified as an invariant, composition, endomorphism, shape, or local-to-global boundary |
| Exercise 16 | cargo run --example 07_transformer_training_state | three different updates preserve TransformerTrainingState -> TransformerTrainingState |
| Exercise 17 | diagram reconstruction sheet | objects, arrows, paths, Rust handles, and safe non-claims are all labeled |
This is not extra bureaucracy. Rustlings-style practice works because the learner gets a concrete feedback signal. This course uses the same idea: command output, a constructor error, a compiler error, or a named test should tell you whether the concept is becoming executable.
Worked Example: Mixed Boundary Diagnosis
Before solving Exercise 15, study one complete diagnosis. The case is:
CrossEntropy receives Logits instead of Product<Distribution, TokenId>.
A weak answer says:
The types are wrong.
That is true, but it is not precise enough. A useful diagnosis names the Rust boundary, the ML mistake, and the category-theory shape.
Boundary type:
composition boundary plus product-input boundary
Rust syntax:
CrossEntropy implements Morphism<Product<Distribution, TokenId>, Loss>. The
input must therefore be a product containing a validated Distribution and the
target TokenId. Logits alone have the wrong type.
ML concept:
Logits are unnormalized vocabulary scores. Cross-entropy needs the probability
assigned to the correct target token. The missing work is Softmax followed by
pairing the resulting Distribution with the target TokenId.
Category theory concept:
The legal route is Logits -> Distribution and then
Distribution x TokenId -> Loss. Skipping the product object hides the supervised
part of the loss calculation.
Smallest useful fix:
Run Softmax first, then call CrossEntropy on
Product::new(distribution, target_token).
Use this as the standard for Exercise 15. Do not stop at “wrong type.” Explain which object was missing, which morphism should have produced it, and which shortcut the boundary rejected.
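The legal route in this diagnosis can be sketched in a few lines. This is a minimal illustration, not the crate's code: LocalTokenId, LocalLogits, LocalDistribution, LocalProduct, softmax, and cross_entropy are hypothetical stand-ins, and the real CrossEntropy is a Morphism implementation rather than a free function.

```rust
// Hypothetical local stand-ins; the real crate's types and signatures may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LocalTokenId(usize);

#[derive(Debug, Clone)]
struct LocalLogits(Vec<f32>);

#[derive(Debug, Clone)]
struct LocalDistribution(Vec<f32>);

// A product object: two typed values carried together.
struct LocalProduct<A, B>(A, B);

// Logits -> Distribution: exponentiate and normalize.
fn softmax(logits: &LocalLogits) -> LocalDistribution {
    // Subtracting the maximum keeps the exponentials numerically stable.
    let max = logits.0.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.0.iter().map(|x| (x - max).exp()).collect();
    let total: f32 = exps.iter().sum();
    LocalDistribution(exps.iter().map(|e| e / total).collect())
}

// Distribution x TokenId -> Loss: negative log of the target probability.
// Returns None when the target index is outside the distribution.
fn cross_entropy(input: &LocalProduct<LocalDistribution, LocalTokenId>) -> Option<f32> {
    let LocalProduct(distribution, target) = input;
    distribution.0.get(target.0).map(|p| -p.ln())
}

fn main() {
    let logits = LocalLogits(vec![2.0, 0.5, -1.0]);
    let distribution = softmax(&logits);
    let loss = cross_entropy(&LocalProduct(distribution, LocalTokenId(0)));
    println!("loss = {:?}", loss);
    // Passing `&logits` to cross_entropy would not compile:
    // LocalLogits is not a LocalProduct<LocalDistribution, LocalTokenId>.
}
```

The signature of cross_entropy is what carries the diagnosis: the only way to reach it is to produce a distribution first and then pair it with a target token.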
Exercise Attempt Record
When an exercise feels unclear, record the attempt in this shape before opening an issue or comparing with the answer key:
Exercise:
Chapter:
Command run:
First failure signal:
Line or concept that caused confusion:
What I expected:
What happened instead:
Answer-key mismatch:
Suggested rewrite:
This report is useful because it ties reader feedback to a concrete exercise, command, failure signal, and chapter location. It also keeps feedback public and impersonal: do not include private data, local secrets, or personal background details that are not needed to improve the exercise.
Use the answer key after the attempt record. If the answer key explains the concept but not the failure you saw, that is evidence that the exercise needs a better hint, pass condition, or worked example.
Open an exercise clarity report only after you have one concrete attempt record. The report link fills in the route, not the evidence; the evidence signal should come from what you personally read, ran, or attempted.
Worked Example
First study a complete answer. The exercise is:
Explain why TokenId is not a raw usize.
A strong answer:
Rust syntax:
TokenId is a tuple struct around usize. The field is private, so callers use
TokenId::new and index() instead of reaching into the raw value directly.
ML concept:
The number represents a vocabulary position, not an arbitrary count or shape.
Category theory concept:
TokenId is one object in the small category of typed pipeline values. Morphisms
such as Embedding can start from it.
Notice the order: name the syntax, connect it to the ML role, then name only the categorical shape the code supports.
Worked Example: Gradient Checking
This worked example supports Advanced Exercise 5 in
exercises/advanced/README.md.
The exercise asks why a finite-difference test compares:
inferred gradient from one training update
central finite difference of average loss
The reason is that these are two independent ways to ask the same local question:
If I nudge this parameter, how does the loss move?
CS231n presents this as the difference between numerical gradients and analytic gradients: the numerical version is slower and approximate, but useful for checking whether the analytic implementation is correct. Dive into Deep Learning explains the matching training shape from the other direction: backpropagation walks the computation in reverse order, stores intermediate values, and computes gradients for parameters. This project makes that idea small enough to inspect in Rust.
PyTorch’s gradcheck documentation gives the same engineering warning in
framework form: the check compares small finite differences against analytical
gradients and accepts agreement only within tolerance. It also calls out
practical caveats such as precision, non-differentiable points, and overlapping
memory. Translate that into this Rust lab as:
finite-difference match = useful local debugging signal
finite-difference match != proof of every gradient path
The code-level test has two paths.
The first path performs one training step:
before parameter
-> TransformerBlockTrainStep
-> after parameter
From that update, the test infers the gradient:
inferred_gradient = (before_value - after_value) / learning_rate
That matches gradient descent:
parameter <- parameter - learning_rate * gradient
The second path does not trust the training step. It clones the same state, changes one parameter in two directions, and measures the loss:
loss_plus = loss(parameter + epsilon)
loss_minus = loss(parameter - epsilon)
Then it estimates the local slope:
finite_difference = (loss_plus - loss_minus) / (2 * epsilon)
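The two paths can be compared on a deliberately tiny case. The sketch below assumes a hypothetical one-parameter quadratic loss and a hand-written train_step; it is not the TransformerBlockTrainStep test, only the same arithmetic at the smallest scale.

```rust
// Hypothetical one-parameter setup; the real training state is much larger.
fn loss(parameter: f64) -> f64 {
    // Quadratic bowl with its minimum at 3.0.
    (parameter - 3.0).powi(2)
}

// Analytic gradient used by the hypothetical training step.
fn analytic_gradient(parameter: f64) -> f64 {
    2.0 * (parameter - 3.0)
}

// One gradient-descent update: parameter <- parameter - learning_rate * gradient.
fn train_step(parameter: f64, learning_rate: f64) -> f64 {
    parameter - learning_rate * analytic_gradient(parameter)
}

// Path 1: infer the gradient from the before and after values of one update.
fn inferred_gradient(before: f64, learning_rate: f64) -> f64 {
    let after = train_step(before, learning_rate);
    (before - after) / learning_rate
}

// Path 2: central finite difference of the loss, without using train_step.
fn finite_difference(parameter: f64, epsilon: f64) -> f64 {
    (loss(parameter + epsilon) - loss(parameter - epsilon)) / (2.0 * epsilon)
}

fn main() {
    let before = 1.0;
    let inferred = inferred_gradient(before, 0.1);
    let numeric = finite_difference(before, 1e-4);
    println!("inferred = {inferred}, finite difference = {numeric}");
    // Agreement within tolerance is a local signal for this one parameter only.
    assert!((inferred - numeric).abs() < 1e-6);
}
```

If the sign inside train_step were flipped, the inferred gradient would change sign while the finite difference would not, and the comparison would fail. That is the kind of disagreement the check is designed to surface.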
A strong answer for one parameter family looks like this:
Rust syntax:
The test selects one feed-forward bias entry, clones the training state twice,
adds epsilon to the entry in one clone, subtracts epsilon in the other clone,
and calls transformer_block_average_loss on both states.
ML concept:
The bias is a trainable parameter. The central finite difference estimates how
the average loss changes around the current bias value. The one-step update
infers the gradient that backpropagation used. If both slopes match, the
implemented update has the right local sign and scale for that parameter.
Category theory concept:
The training step is an endomorphism on TransformerTrainingState. The check
asks whether this state update agrees locally with the loss morphism that it is
supposed to reduce.
What failure would this test catch?
It would catch a missing bias gradient, a reversed update sign, a dropped path
through the feed-forward block, or a mismatch between averaged loss and summed
gradients.
The important habit is not the formula by itself. The habit is triangulation:
implementation path
numerical measurement
conceptual explanation
When all three agree, the code becomes easier to trust and easier to teach. If the two numbers disagree, do not immediately change the test tolerance. Ask which boundary failed first:
wrong sign?
missing path?
wrong averaging scale?
non-smooth point?
parameter aliasing or shared storage?
That is why the exercise asks for a specific parameter family. A focused finite-difference check is a microscope, not a certificate for the whole training system.
Partially Completed Example
Complete the missing lines for Distribution:
Rust syntax:
Distribution wraps ________ and construction can return ________.
ML concept:
It represents probabilities over possible next tokens, so the values must be
non-negative and sum to ________.
Category theory concept:
It is an object produced by ________ and consumed with a target token by
________.
Expected completion:
Vec<f32>
CtResult<Self>
one
Softmax
CrossEntropy
Your Turn
Now solve the same kind of exercise without the filled answer. Pick Loss,
TrainingSet, or LearningRate and explain it through the same three lenses.
Transfer Exercise
Design a wrapper type for the attention roadmap or a future Transformer chapter,
such as SequenceLength, HeadCount, or AttentionScores. State the raw
representation, the invariant, and one function that should consume or produce
it.
Expected failure to consider:
What should the constructor reject?
If the answer is “nothing,” the type may be only a semantic wrapper. If the answer is “zero heads,” “empty sequence,” or “probability outside the allowed range,” the type needs a validating constructor.
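One possible shape for a validating wrapper is sketched below. LocalHeadCount and head_width are hypothetical names chosen for this illustration; the roadmap's real types, error type, and invariants may differ.

```rust
// Hypothetical wrapper; the name follows the exercise prompt, not a confirmed crate type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LocalHeadCount(usize);

impl LocalHeadCount {
    // Reject the bad state at the boundary: zero heads is never meaningful.
    fn new(count: usize) -> Result<Self, String> {
        if count == 0 {
            return Err("head count must be at least 1".to_string());
        }
        Ok(Self(count))
    }

    fn get(self) -> usize {
        self.0
    }
}

// One consumer: the per-head width, defined only when the heads divide the model width.
// Division by zero cannot happen here because the constructor already rejected zero.
fn head_width(model_width: usize, heads: LocalHeadCount) -> Option<usize> {
    if model_width % heads.get() == 0 {
        Some(model_width / heads.get())
    } else {
        None
    }
}

fn main() {
    println!("{:?}", LocalHeadCount::new(0));
    let heads = LocalHeadCount::new(4).expect("4 is a valid head count");
    println!("{:?}", head_width(8, heads));
}
```

The consumer shows why the invariant matters: head_width can divide without a zero check because downstream code is allowed to trust what the constructor validated.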
Exercise 1: Explain One Domain Type
Use Domain Objects.
Pick one type:
- Vector
- Logits
- Distribution
- Loss
- TrainingSet
- Parameters
Write:
The problem this solves:
Rust syntax:
ML concept:
Category theory concept:
Pass condition:
- You name the raw representation.
- You name the invariant or semantic distinction.
- You name the pipeline stage where the type appears.
- You distinguish a semantic wrapper from a validated object when that distinction matters.
Primitive-to-domain audit option:
Use the chapter’s Primitive-To-Domain Responsibility Ledger. Fill this card:
raw value:
domain object:
constructor or boundary:
invariant owned here:
downstream code allowed to trust:
unsafe shortcut rejected:
source-backed limit:
validation command:
Pass condition:
- Your audit names the constructor or boundary that owns the conversion.
- It distinguishes semantic role labeling from invariant validation.
- It names what downstream code is allowed to trust after construction.
- It rejects one raw-primitive shortcut without overclaiming what the type proves.
First-principles hint:
#![allow(unused)]
fn main() {
    // A tuple struct gives the raw usize a named domain role.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    struct LocalTokenId(usize);

    impl LocalTokenId {
        fn new(index: usize) -> Self {
            Self(index)
        }

        // Accessor that returns the wrapped usize.
        fn index(self) -> usize {
            self.0
        }
    }

    assert_eq!(LocalTokenId::new(7).index(), 7);
}
That snippet is intentionally smaller than the real crate. It shows the raw
idea: a named wrapper can make one usize mean “token id” instead of “any
number.”
Exercise 2: Add A Token
Use the src/demo.rs snapshot in Course Map.
Add one new vocabulary item and extend the token sequence.
Run:
cargo run --bin category_ml
Pass condition:
- the demo still runs
- the dataset windowing output includes your new transition
- you can explain why a longer TokenSequence creates more training examples
Debugging hint:
If the output does not include the new transition, check whether you changed both the vocabulary and the token sequence. A vocabulary entry alone does not create a training pair. The pair appears only when two token ids are adjacent in the sequence.
Exercise 3: Trace DatasetWindowing
Use The Tiny ML Pipeline.
For this input:
[TokenId(4), TokenId(8), TokenId(15), TokenId(16)]
write the training examples produced by windows(2).
Then explain:
Rust syntax:
what does `.windows(2)` do?
ML concept:
why does next-token training need adjacent pairs?
Category theory concept:
why is each example a product object?
Check yourself before reading onward:
(TokenId(4), TokenId(8))
(TokenId(8), TokenId(15))
(TokenId(15), TokenId(16))
The syntax creates overlapping adjacent windows. The ML idea is next-token supervision: each input token is paired with the token that follows it. The category-theory shape is a product object because each training example carries two typed values together.
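To check the trace by running it, the standard library's slice method windows can be applied to a small local type. LocalTokenId and adjacent_pairs are hypothetical names for this sketch; the crate's DatasetWindowing may wrap the same idea differently.

```rust
// Hypothetical local wrapper standing in for the crate's TokenId.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LocalTokenId(usize);

// Build (input, target) pairs from overlapping adjacent windows of length 2.
fn adjacent_pairs(tokens: &[LocalTokenId]) -> Vec<(LocalTokenId, LocalTokenId)> {
    tokens
        .windows(2)
        .map(|window| (window[0], window[1]))
        .collect()
}

fn main() {
    let tokens = [4, 8, 15, 16].map(LocalTokenId);
    for (input, target) in adjacent_pairs(&tokens) {
        println!("{:?} -> {:?}", input, target);
    }
}
```

A slice with fewer than two elements yields no windows, so a single token produces no training pair.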
Exercise 4: Break A Composition
Use the examples/02_morphism_composition.rs snapshot in
Morphism and Composition.
Try to compose Embedding directly with Softmax.
Expected failure shape:
the trait bound ... is not satisfied
Then restore the working version.
Explain:
Rust syntax:
which type did the compiler reject?
Composition diagnostic:
first source:
first target:
second source:
second target:
which middle object should connect the stages?
ML concept:
which prediction stage was skipped?
Category theory concept:
which middle object failed to match?
Source-target-middle repair audit option:
Use the chapter’s Source-Target-Middle Repair Ledger. Fill this card:
composition attempt:
first arrow:
second arrow:
claimed middle object:
actual first target:
actual second source:
repair:
unsafe shortcut rejected:
validation command or output:
Pass condition:
- You name Embedding : TokenId -> Vector and Softmax : Logits -> Distribution.
- You identify Vector versus Logits as the failed middle-object match.
- You restore LinearToLogits : Vector -> Logits instead of weakening Softmax.
- You explain why the skipped ML stage is vocabulary scoring.
- Your repair audit names the attempted composition, the actual first target, the actual second source, the missing repair arrow, the unsafe shortcut, and one validation command or output line.
Debugging hint:
Do not fix this by changing the type signatures. Restore the missing stage instead. The intended path is:
TokenId -> Vector -> Logits -> Distribution
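The middle-object rule can be reproduced without the crate. The sketch below uses a hypothetical generic compose function over plain functions instead of the crate's Morphism trait, so the compiler message for the broken version will be a type mismatch rather than the trait-bound error quoted above. All Local names and the toy arithmetic are assumptions.

```rust
// Hypothetical stand-ins for the pipeline objects.
struct LocalTokenId(usize);
struct LocalVector(Vec<f32>);
struct LocalLogits(Vec<f32>);
struct LocalDistribution(Vec<f32>);

// compose(f, g) type-checks only when f's target equals g's source: the middle object B.
fn compose<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |a| g(f(a))
}

// TokenId -> Vector (toy embedding).
fn embedding(token: LocalTokenId) -> LocalVector {
    LocalVector(vec![token.0 as f32, 1.0])
}

// Vector -> Logits (toy vocabulary scoring with two entries).
fn linear_to_logits(vector: LocalVector) -> LocalLogits {
    let sum: f32 = vector.0.iter().sum();
    LocalLogits(vec![sum, -sum])
}

// Logits -> Distribution.
fn softmax(logits: LocalLogits) -> LocalDistribution {
    let max = logits.0.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.0.iter().map(|x| (x - max).exp()).collect();
    let total: f32 = exps.iter().sum();
    LocalDistribution(exps.into_iter().map(|e| e / total).collect())
}

fn main() {
    let predict = compose(compose(embedding, linear_to_logits), softmax);
    let distribution = predict(LocalTokenId(2));
    println!("{:?}", distribution.0);
    // compose(embedding, softmax) is rejected by the compiler:
    // embedding ends at LocalVector, but softmax starts at LocalLogits.
}
```

Uncommenting the broken composition and reading the error is a quick way to fill in the first target and second source lines of the diagnostic card.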
Exercise 5: Change The Training Repetition Count
Use the examples/03_training_endomorphism.rs snapshot in
Training as an Endomorphism.
Change:
StepCount::new(80)
to:
StepCount::new(1)
StepCount::new(10)
StepCount::new(200)
Run:
cargo run --example 03_training_endomorphism
Explain the result:
Rust syntax:
where is the count used?
Training diagnostic:
what object is updated?
what object measures quality?
what repeats?
what controls update size?
ML concept:
what happens when training repeats more times?
Category theory concept:
why can the update be repeated?
Framework-to-Rust audit option:
Use the Framework-To-Rust Responsibility Ledger in
Training as an Endomorphism. Pick one framework
cue:
optimizer.zero_grad()
loss.backward()
optimizer.step()
optimizer state_dict
Fill this card:
framework cue:
responsibility:
local Rust handle:
returned object:
category boundary:
safe non-claim:
Expected observation:
One step should preserve the shape of the parameters but may not reduce loss much. More steps usually make the tiny example improve until the hand-written training rule reaches its limit. The important category-theory point is not “more is always better”; it is that the same update has the shape:
Parameters -> Parameters
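A small sketch can make the repetition visible. It assumes a hypothetical one-weight model fitted to two data points; LocalParameters, train_step, and repeat_steps are illustrative names, not the crate's TrainStep or StepCount.

```rust
// Hypothetical one-weight model; the real Parameters and TrainStep are richer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LocalParameters {
    weight: f64,
}

// Parameters x TrainingSet -> Loss: a measurement, not an updated state.
fn loss(parameters: LocalParameters, data: &[(f64, f64)]) -> f64 {
    let total: f64 = data
        .iter()
        .map(|(x, y)| (parameters.weight * x - y).powi(2))
        .sum();
    total / data.len() as f64
}

// Parameters -> Parameters: one gradient-descent update with the data held fixed.
fn train_step(parameters: LocalParameters, data: &[(f64, f64)], learning_rate: f64) -> LocalParameters {
    let gradient = data
        .iter()
        .map(|(x, y)| 2.0 * (parameters.weight * x - y) * x)
        .sum::<f64>()
        / data.len() as f64;
    LocalParameters {
        weight: parameters.weight - learning_rate * gradient,
    }
}

// Repetition is possible because the output type equals the input type.
fn repeat_steps(start: LocalParameters, data: &[(f64, f64)], learning_rate: f64, steps: usize) -> LocalParameters {
    (0..steps).fold(start, |current, _| train_step(current, data, learning_rate))
}

fn main() {
    let data = [(1.0, 2.0), (2.0, 4.0)];
    let start = LocalParameters { weight: 0.0 };
    for steps in [1, 10, 200] {
        let trained = repeat_steps(start, &data, 0.05, steps);
        println!("steps = {steps}, loss = {}", loss(trained, &data));
    }
}
```

In this toy case the loss keeps shrinking because the learning rate is small enough for the quadratic; a larger learning rate could make repeated steps diverge, which is one reason not to claim that more steps are always better.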
Pass condition:
- You distinguish TrainStep : Parameters -> Parameters from Parameters x TrainingSet -> Loss.
- You explain that loss is a measurement, not the updated model state.
- You identify StepCount as repetition of the same update shape.
- Your framework-to-Rust audit distinguishes preparation, gradient computation, parameter update, and optimizer-state scope.
- You name the returned object and avoid calling the tiny step a framework optimizer or autograd engine.
- You avoid claiming that more steps always means better behavior.
Exercise 6: Explain Distribution<T>::map
Use Functors, Naturality, Monoids, and Chain Rule.
Explain the conceptual Distribution<T>::map example.
Use this input distribution:
TokenId(2) -> 0.70
TokenId(3) -> 0.30
and this function:
TokenId -> String
where:
TokenId(2) -> "Rust"
TokenId(3) -> "."
Write the output distribution.
Then explain:
Rust syntax:
why does `self` plus `into_iter()` move the old outcomes?
ML concept:
why do the probabilities stay the same?
Category theory concept:
what does it mean to lift `T -> U` into `Distribution<T> -> Distribution<U>`?
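As a reference point for the Rust-syntax question, here is one way such a map could be written. LocalDistribution is a hypothetical generic type for this sketch (the chapter's Distribution<T>::map is described as conceptual), and the example data is deliberately different from the exercise input.

```rust
// Hypothetical generic distribution: each outcome is stored next to its probability.
#[derive(Debug, Clone, PartialEq)]
struct LocalDistribution<T> {
    outcomes: Vec<(T, f32)>,
}

impl<T> LocalDistribution<T> {
    // Taking `self` by value lets into_iter() move each outcome out of the old vector,
    // so `f` receives an owned T and the old distribution cannot be reused afterwards.
    fn map<U>(self, f: impl Fn(T) -> U) -> LocalDistribution<U> {
        LocalDistribution {
            outcomes: self
                .outcomes
                .into_iter()
                // Only the outcome is transformed; the probability is carried along unchanged.
                .map(|(outcome, probability)| (f(outcome), probability))
                .collect(),
        }
    }
}

fn main() {
    let words = LocalDistribution {
        outcomes: vec![("ab", 0.25), ("xyz", 0.75)],
    };
    let lengths = words.map(|word| word.len());
    println!("{:?}", lengths);
}
```

This sketch does not merge outcomes that map to the same value; a fuller treatment would have to decide whether their probabilities should be added together.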
Exercise 7: Explain One Validation Boundary
Pick one constructor:
- Distribution::new
- Loss::new
- LearningRate::new
- TrainingSet::new
- SignalMatrix::new
- OpenCircuit::new
Write:
The problem this solves:
Rust syntax:
which condition returns `Err(...)`?
ML or software concept:
what bad runtime behavior does this prevent?
Category theory concept:
what intended object or relationship is being protected?
Exercise 8: Trace A Full Source File
Use Repository Source Snapshots.
Pick one complete source file and write a five-sentence summary:
- What problem does the file solve?
- What are the main Rust types or traits?
- What ML or software concept does it model?
- What category-theory concept does it teach?
- Which command proves the file still works?
Exercise 9: Connect One External Reference
Use References.
Pick one external resource and connect it to one source file in this course. First classify the source using the source-role table in the references chapter.
Answer:
External resource:
Source role:
Owned boundary:
Source file:
Rust syntax connection:
ML or software concept connection:
Category theory concept connection:
What this source can support:
What this source cannot support:
One difference between the full treatment and this tiny implementation:
Pass condition:
- You classify the source as official documentation, academic paper, open textbook or university material, implementation bridge, or learner-friction signal.
- You name the boundary the source owns.
- You connect it to one concrete source file, type, function, test, or example.
- You state one claim the source does not license this book to make.
Exercise 10: Test One Sketch Law
Use Seven Sketches Through Rust.
Pick one law from src/sketches.rs:
- preorder laws
- feature/layer Galois law
- resource monotonicity
- foreign-key resolution
- co-design feasibility relation
- signal-flow matrix composition
- local-to-global safety truth
Change one input in examples/05_seven_sketches.rs, then run:
cargo run --example 05_seven_sketches
Pass condition:
- you can explain which law still holds
- you can explain which constructor or method prevents invalid structure
- your explanation uses Rust syntax, ML or software concept, and category theory concept
Negative test option:
Instead of changing the runnable example, inspect one of the negative tests in
src/sketches.rs:
- missing database reference
- mismatched signal-matrix middle dimension
- open-circuit serial boundary mismatch
Explain what invalid structure the test rejects. This is often the fastest way to understand what a law or constructor is protecting.
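The mismatched middle-dimension case can be imitated with a small matrix type. LocalMatrix and its then method are hypothetical; SignalMatrix in src/sketches.rs may use a different representation, error type, and composition order.

```rust
// Hypothetical matrix type with an explicit shape.
#[derive(Debug, Clone, PartialEq)]
struct LocalMatrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>, // row-major, length rows * cols
}

impl LocalMatrix {
    // The constructor rejects data whose length disagrees with the declared shape.
    fn new(rows: usize, cols: usize, data: Vec<f64>) -> Result<Self, String> {
        if data.len() != rows * cols {
            return Err(format!("expected {} entries, got {}", rows * cols, data.len()));
        }
        Ok(Self { rows, cols, data })
    }

    // Matrix product self * next. The middle dimension must match,
    // just as the middle object must match when composing two morphisms.
    fn then(&self, next: &LocalMatrix) -> Result<LocalMatrix, String> {
        if self.cols != next.rows {
            return Err(format!(
                "middle dimension mismatch: {} vs {}",
                self.cols, next.rows
            ));
        }
        let mut data = vec![0.0; self.rows * next.cols];
        for i in 0..self.rows {
            for j in 0..next.cols {
                for k in 0..self.cols {
                    data[i * next.cols + j] +=
                        self.data[i * self.cols + k] * next.data[k * next.cols + j];
                }
            }
        }
        LocalMatrix::new(self.rows, next.cols, data)
    }
}

fn main() {
    let a = LocalMatrix::new(1, 2, vec![1.0, 2.0]).unwrap();
    let b = LocalMatrix::new(2, 1, vec![3.0, 4.0]).unwrap();
    println!("{:?}", a.then(&b)); // shapes 1x2 and 2x1 connect
    println!("{:?}", a.then(&a)); // shapes 1x2 and 1x2 do not connect
}
```

A negative test for this type would assert that the second call returns Err, which is the same habit the repository's negative tests are described as following.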
PDF-to-Rust contract option:
Use the chapter’s PDF-To-Rust Reading Contract. Pick one source idea from
the Seven Sketches chapter and fill this row:
source idea from the PDF:
Rust handle:
protected law, relation, or boundary:
larger source claim not implemented by this code:
local evidence command or test:
If the source idea still feels too large, fill the chapter’s transfer triage card before writing the final answer:
source idea:
local Rust handle:
protected law, relation, or boundary:
invalid shortcut rejected:
tiny ML transfer:
larger claim not implemented:
local evidence command or test:
Pass condition:
- your Rust handle is one concrete type, method, constructor, example output line, or test from src/sketches.rs
- your protected claim is smaller than the full source text
- your evidence can be checked with cargo run --example 05_seven_sketches or cargo test sketches::tests --lib
- your transfer card names one invalid shortcut and one non-claim
Page-to-Rust decision-ladder option:
Use the chapter’s Page-To-Rust Decision Ladder. Pick one paragraph shape from
the source text:
definition or named object
relation, order, or feasibility statement
composition rule
theorem, law, or proof step
worked example or application story
richer machinery beyond the local handle
Then fill:
source paragraph shape:
first Rust move:
invalid state or shortcut to reject:
local evidence command or test:
safe non-claim:
Pass condition:
- your first Rust move is concrete: newtype, enum, struct, constructor, method, fixture, output line, or named test
- your evidence names a command or test that exists in this repository
- your safe non-claim prevents turning one local handle into a claim about the whole source text
- you do not start by inventing a broad trait or framework when a small typed boundary would expose the issue
Bridge-back-to-tiny-ML option:
Use the chapter’s Bridge Back To Tiny ML table. Pick one row and fill:
sketch:
tiny ML pressure:
Rust handle:
bad shortcut rejected:
safe non-claim:
evidence command or test:
one-sentence transfer:
The one-sentence transfer must use this shape:
This sketch helps me reject this ML shortcut: ...
Pass condition:
- your row matches one actual bridge row in the Seven Sketches chapter
- your bad shortcut is something a tiny ML system could plausibly get wrong
- your safe non-claim prevents overclaiming the larger category-theory source
- your evidence points to cargo run --example 05_seven_sketches or cargo test sketches::tests --lib
Co-design option:
Use DesignRequirement, ImplementationOffer, and FeasibilityRelation.
Write the relation as:
DesignRequirement x ImplementationOffer -> bool
Then translate it to:
ArchitectureConstraint x CandidateImplementation -> Bool
Pass condition:
- you explain why this is a relation between requirements and offers rather than a function that assigns exactly one offer to each requirement
- you give one passing offer and one failing offer
- you say why one passing implementation does not prove the whole constraint space
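The relation shape can be sketched in a few lines. The struct fields below (min_accuracy, max_latency_ms, accuracy, latency_ms) and the is_feasible function are hypothetical; the repository's DesignRequirement, ImplementationOffer, and FeasibilityRelation may carry different data. The sketch only shows that one requirement can relate to several offers and reject others.

```rust
// Hypothetical fields for illustration; the repository's types may differ.
struct DesignRequirement {
    min_accuracy: f64,
    max_latency_ms: u32,
}

struct ImplementationOffer {
    accuracy: f64,
    latency_ms: u32,
}

// DesignRequirement x ImplementationOffer -> bool.
// The pair is the input; the answer only says whether this pair is related.
fn is_feasible(requirement: &DesignRequirement, offer: &ImplementationOffer) -> bool {
    offer.accuracy >= requirement.min_accuracy && offer.latency_ms <= requirement.max_latency_ms
}

fn main() {
    let requirement = DesignRequirement { min_accuracy: 0.90, max_latency_ms: 50 };
    let offers = [
        ImplementationOffer { accuracy: 0.95, latency_ms: 40 },
        ImplementationOffer { accuracy: 0.92, latency_ms: 20 },
        ImplementationOffer { accuracy: 0.99, latency_ms: 80 },
    ];
    for offer in &offers {
        println!(
            "accuracy {} at {} ms -> feasible: {}",
            offer.accuracy,
            offer.latency_ms,
            is_feasible(&requirement, offer)
        );
    }
}
```

Two offers pass and one fails for the same requirement, so no single offer is "the" output of the requirement. Checking three pairs also says nothing about the pairs you did not check.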
Exercise 11: Write A New Block Explanation
Choose any block from the source snapshots that the chapter did not explain in enough detail for you.
Write a block explanation using this structure:
The problem this block solves:
The whole block:
Rust syntax:
ML or software concept:
Category theory concept:
Core mental model:
Pass condition:
- A beginner can understand the Rust syntax.
- An ML learner can understand why the block exists.
- A category-theory learner can name the shape.
Exercise 12: Trace Attention Shape Flow
Use Transformer Roadmap, src/attention.rs, and
examples/06_attention_scores.rs.
Run:
cargo run --example 06_attention_scores
First copy the four-line Q/K/V diagnostic printed before the attention weights:
Q/K/V source diagnostic:
query rows own score rows; key/value rows own score columns
self-attention shares the hidden source before projection; projected roles stay distinct
mask polarity here: true = allowed, false = blocked
Then write one sentence for each line:
query rows:
key/value rows:
self-attention source:
mask polarity:
Write down the first time the output mentions each shape:
AttentionScores:
AttentionWeights:
AttentionOutput:
MultiHeadOutput:
ProjectedAttentionOutput:
HiddenSequence after residual:
HiddenSequence after normalization:
HiddenSequence after feed-forward:
Then explain:
Rust syntax:
which named type or boundary protects each shape?
ML concept:
what changes between scores, weights, value mixing, projection, residual,
normalization, and feed-forward?
Category theory concept:
where does the path use a product input, and where does it return to the same
HiddenSequence object?
Then classify these boundaries:
QuerySequence x KeySequence -> AttentionScores:
AttentionScores x AttentionMask -> AttentionScores:
AttentionScores -> AttentionWeights:
AttentionWeights x ValueSequence -> AttentionOutput:
LayerNormalization : HiddenSequence -> HiddenSequence:
TransformerTrainingState -> TransformerTrainingState:
HiddenSequence x MultiHeadOutput -> HiddenSequence:
Then repeat the quick roadmap classification drill without looking at the answer table. For each boundary, count the inputs first, then name the safest category shape:
HiddenSequence -> QuerySequence:
AttentionScores x AttentionMask -> AttentionScores:
LayerNormalization : HiddenSequence -> HiddenSequence:
HiddenSequence x ProjectedAttentionOutput -> HiddenSequence:
TransformerTrainingState -> TransformerTrainingState:
Then trace three boundaries through the roadmap decision flow:
AttentionScores x AttentionMask -> AttentionScores:
MaskedMultiHeadTransformerBlock[M] : HiddenSequence -> HiddenSequence:
HiddenSequence x MultiHeadOutput -> HiddenSequence:
For each one, answer:
does it type-check?
how many inputs are visible?
was one context fixed first?
safe local name:
Finally, explain this trap in one sentence:
A product input that returns its left-hand object is not automatically an
endomorphism.
Then use the same-output classification rule from the roadmap. These three
lines all end with HiddenSequence; explain why they do not have the same
category shape:
LayerNormalization : HiddenSequence -> HiddenSequence:
HiddenSequence x ProjectedAttentionOutput -> HiddenSequence:
HiddenSequence x MultiHeadOutput -> HiddenSequence:
Then answer the terminal-output audit. For each printed line, write what the line proves and what category overclaim it does not prove:
projected attention shape: 2 positions x model dimension 2
residual shape: 2 positions x model dimension 2
masked multi-head block shape: 2 positions x model dimension 2
training state step: 0 -> 1
Use this rule:
printed shape line -> target evidence
typed transformation line -> source and target evidence
category name -> only after both are known
Then answer the source-ownership diagnostic:
Self-attention:
which sequence owns the query side?
which sequence owns the key side?
which sequence owns the value side?
Cross-attention:
which sequence owns the query side?
which sequence owns the key side?
which sequence owns the value side?
Shape check:
which length counts score rows?
which length counts score columns?
Then fill the shape ledger:
target length:
source length:
attention mask:
attention output:
For each row, write:
framework cue -> Rust roadmap meaning -> category-shape consequence
Then answer the mask-role ledger:
What does an attention-mask cell select?
Why is the mask not a shorter token sequence?
Why does the mask not directly produce AttentionWeights?
Which block-level boundary keeps the mask visible instead of hidden?
In a fixed-mask view, what context was selected first?
What does true mean in this repository's AttentionMask?
Why can a framework mask with the same shape still need boolean inversion?
Write the three-step rule:
mask cells ...
softmax ...
weights ...
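If the order of masking and softmax is hard to picture, a one-row sketch may help. The functions masked_softmax_row and invert_polarity are hypothetical names, not the repository's API; the sketch assumes the same polarity as this repository's AttentionMask, where true means allowed.

```rust
// Hypothetical helper: softmax over one score row, where `allowed[j] == true`
// means source position j may receive weight.
// Returns None if the lengths differ or no allowed cell has a usable score.
fn masked_softmax_row(scores: &[f64], allowed: &[bool]) -> Option<Vec<f64>> {
    if scores.len() != allowed.len() {
        return None;
    }
    // Find the largest allowed score (used for numerical stability).
    let mut max = f64::NEG_INFINITY;
    for (score, keep) in scores.iter().zip(allowed.iter()) {
        if *keep && *score > max {
            max = *score;
        }
    }
    if max == f64::NEG_INFINITY {
        return None;
    }
    // Blocked cells contribute zero before normalization, so they never
    // receive probability mass.
    let mut exps = Vec::with_capacity(scores.len());
    for (score, keep) in scores.iter().zip(allowed.iter()) {
        exps.push(if *keep { (*score - max).exp() } else { 0.0 });
    }
    let total: f64 = exps.iter().sum();
    Some(exps.into_iter().map(|e| e / total).collect())
}

// Converts a mask where true = blocked into this sketch's true = allowed.
fn invert_polarity(blocked: &[bool]) -> Vec<bool> {
    blocked.iter().map(|b| !b).collect()
}

fn main() {
    let scores = [1.0, 2.0, 3.0];
    let allowed = [true, true, false];
    println!("{:?}", masked_softmax_row(&scores, &allowed));

    // A framework-style mask with the same shape but opposite meaning.
    let blocked = [false, false, true];
    println!("{:?}", masked_softmax_row(&scores, &invert_polarity(&blocked)));
}
```

The mask has the same length as the score row and is consumed before normalization. That is why it selects score cells rather than shortening the token sequence, and why a same-shape mask from another framework can still need inversion.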
Then answer the linear-scope diagnostic:
Which listed boundaries are the linear Q/K/V projections?
Which boundary turns scores into nonlinear normalized weights?
Which product-input boundaries must not be collapsed into one unary map?
Which state endomorphism belongs to training rather than forward attention?
Then answer the source-scope diagnostic:
Which source supports decomposing attention into recurring components?
Which source supports comparing the linear Q/K/V part with advanced category theory?
What does neither source license you to claim about the whole Rust roadmap block?
What is the local Rust contract for every component in this book?
Then answer the architecture-constraint diagnostic:
What is one architecture constraint in the roadmap?
Which Rust type, constructor, example, or test is implementation evidence for it?
Why is that implementation evidence not the same as proving the whole future
Transformer architecture satisfies every intended constraint?
Then answer the stackability diagnostic:
Which listed boundaries can stack directly as HiddenSequence -> HiddenSequence?
Why is MaskedMultiHeadTransformerBlock not an endomorphism while the mask is
still an open input?
What are the two precise ways to repeat a masked block?
When is a fixed-mask view allowed to be named HiddenSequence -> HiddenSequence?
When is LayerNormalization allowed to be named HiddenSequence -> HiddenSequence?
When is PositionalEncoding allowed to be named HiddenSequence -> HiddenSequence?
When is MultiHeadTransformerBlock allowed to be named HiddenSequence -> HiddenSequence?
If the layer's scale and shift are being learned, which larger boundary owns
that change?
Then answer the context-fixing drill:
Open masked block:
what is the whole input object?
what is the safe category shape?
can it stack unaided as HiddenSequence -> HiddenSequence?
Fixed-mask view:
what was selected first?
what is the induced boundary?
what promise must remain true while stacking?
Changing mask per call:
what must the caller supply or carry?
why is this not the same as a fixed-mask view?
Residual addition:
which two inputs remain visible?
why is this not a unary endomorphism?
if you name the whole product as the source object, why is
(HiddenSequence x ProjectedAttentionOutput) -> HiddenSequence
still not an endomorphism?
Rust closure bridge:
what value would a closure capture to create a fixed-mask view?
which argument would remain when the closure is called?
why does the closure analogy still not change the open block boundary?
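The closure bridge can be made concrete with a toy block. Everything here is a hypothetical stand-in: the HiddenSequence and AttentionMask tuple structs, the averaging masked_block, and the fix_mask helper are not the repository's types or functions. The sketch assumes true means allowed.

```rust
// Hypothetical stand-ins for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct HiddenSequence(Vec<Vec<f64>>);

#[derive(Debug, Clone)]
struct AttentionMask(Vec<Vec<bool>>); // true = allowed

// Open boundary: HiddenSequence x AttentionMask -> HiddenSequence.
// Toy behavior: each output row is the mean of the rows its mask row allows.
fn masked_block(hidden: &HiddenSequence, mask: &AttentionMask) -> HiddenSequence {
    let rows = hidden
        .0
        .iter()
        .enumerate()
        .map(|(i, own_row)| {
            let mut sum = vec![0.0; own_row.len()];
            let mut count = 0.0;
            for (j, other) in hidden.0.iter().enumerate() {
                // A missing mask cell is treated as blocked.
                let allowed = mask.0.get(i).and_then(|r| r.get(j)).copied().unwrap_or(false);
                if allowed {
                    for (s, x) in sum.iter_mut().zip(other) {
                        *s += *x;
                    }
                    count += 1.0;
                }
            }
            if count == 0.0 {
                own_row.clone()
            } else {
                sum.into_iter().map(|s| s / count).collect()
            }
        })
        .collect();
    HiddenSequence(rows)
}

// Fixed-mask view: the closure captures the mask, so only the hidden
// sequence remains as an argument.
fn fix_mask(mask: AttentionMask) -> impl Fn(&HiddenSequence) -> HiddenSequence {
    move |hidden: &HiddenSequence| masked_block(hidden, &mask)
}

fn main() {
    let causal = AttentionMask(vec![vec![true, false], vec![true, true]]);
    let hidden = HiddenSequence(vec![vec![2.0, 0.0], vec![0.0, 2.0]]);
    let fixed = fix_mask(causal);
    let once = fixed(&hidden);
    let twice = fixed(&once); // stacks because the mask was selected first
    println!("{:?}\n{:?}", once, twice);
}
```

Notice that masked_block still takes two arguments after fix_mask exists. The closure is a new, induced boundary; it does not change the signature of the open block.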
Then answer the add-norm order drill:
Which order does the current Rust block implement around the attention sublayer?
Which order does it implement around the feed-forward sublayer?
Which two local boundaries show the order?
Why can post-norm and pre-norm blocks both have shape
HiddenSequence -> HiddenSequence
while still being different morphisms?
If a future pre-norm variant is added, what must be named separately?
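A tiny numeric sketch shows how two functions can share a signature and still disagree. The center function below is a deliberately crude stand-in for normalization (it only subtracts the row mean), and sublayer just doubles each value; neither reflects the repository's LayerNormalization or attention code, and the sketch makes no claim about which order the repository implements.

```rust
// Crude stand-in for normalization: subtract the row mean.
fn center(row: &[f64]) -> Vec<f64> {
    let mean = row.iter().sum::<f64>() / row.len() as f64;
    row.iter().map(|x| x - mean).collect()
}

// Toy sublayer: doubles every value.
fn sublayer(row: &[f64]) -> Vec<f64> {
    row.iter().map(|x| 2.0 * x).collect()
}

// Residual addition of two rows of equal length.
fn add(a: &[f64], b: &[f64]) -> Vec<f64> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

// Post-norm order: normalize after the residual addition.
fn post_norm(row: &[f64]) -> Vec<f64> {
    center(&add(row, &sublayer(row)))
}

// Pre-norm order: normalize before the sublayer, then add the residual.
fn pre_norm(row: &[f64]) -> Vec<f64> {
    add(row, &sublayer(&center(row)))
}

fn main() {
    let row = [1.0, 3.0];
    println!("post-norm: {:?}", post_norm(&row));
    println!("pre-norm:  {:?}", pre_norm(&row));
}
```

Both functions have the same source and target, yet they return different rows for the same input. Source and target identify the shape of an arrow, not the arrow itself.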
Before naming each boundary, write the answer to the first diagnostic question:
How many inputs does this boundary require?
Then write the source-target audit card for at least three boundaries:
boundary:
whole source object:
target object:
context status:
safe conclusion:
Use at least one product-input boundary and one fixed-context boundary.
Pass condition:
- You name at least four concrete Rust types from src/attention.rs.
- You distinguish raw attention scores from normalized attention weights.
- You explain why residual addition must return to HiddenSequence.
- You connect one terminal output line to one typed boundary.
- You explain why self-attention shares a source before projection without collapsing query, key, and value into one role.
- You map target length, source length, attention mask, and attention output from framework notation to the Rust roadmap shape ledger.
- You explain that mask cells select legal score cells before softmax, not token rows after probability mass has been assigned.
- You state that this repository’s AttentionMask uses true for an allowed source position, while some framework masks use true for a blocked or padding position.
- You explain the four-line Q/K/V diagnostic before using later attention weights or shape lines as evidence.
- You keep claims about linear Q/K/V projections separate from softmax, masking, residual addition, normalization, and training state.
- You classify the quick roadmap drill by counting inputs before naming endomorphisms.
- You use the roadmap decision flow before the same-output and source-target audit cards.
- You explain that anatomy-of-attention research supports decomposition, while parametric-endofunctor research supports a narrower linear self-attention comparison.
- You separate architecture constraints from implementation boundaries.
- You do not claim the tiny Rust roadmap implements either full formalism.
- You distinguish an open masked-block product input from a fixed-mask induced endomorphism.
- You name what context was fixed before using the fixed-mask HiddenSequence -> HiddenSequence view.
- You state that a shape-preserving layer is an endomorphism only for a fixed module instance, while parameter changes belong to TransformerTrainingState -> TransformerTrainingState.
- You state that positional encodings and Transformer blocks follow the same fixed-value rule: the table or block value must already be selected before the forward call is named HiddenSequence -> HiddenSequence.
- You state that residual-normalization order is part of the morphism, so post-norm and pre-norm blocks can share source and target while remaining different implementations.
- You classify at least one product-input morphism, one endomorphism, and one illegal boundary.
- You do not call a product-input boundary an endomorphism only because its output matches the left input object.
- You write the whole source object and target object before deciding whether a row is an endomorphism.
- You explain that naming the whole product as the source object gives a unary morphism out of the product, not an endomorphism unless the same product object is also returned.
- You explain why two lines that return HiddenSequence can still have different category shapes.
Exercise 13: Compute Cross-Entropy From Target Probability
Use The Tiny ML Pipeline and src/ml.rs.
The CrossEntropy morphism uses:
loss = -ln(probability assigned to the target token)
For target token TokenId(0), compare these two distributions:
confident = [0.90, 0.10]
surprised = [0.10, 0.90]
Compute:
confident loss:
surprised loss:
which one is lower:
Then run:
cargo test cross_entropy_is_lower_for_more_confident_target_probability --lib
Explain:
Rust syntax:
which code reads the target probability, and which constructor validates the
loss?
ML concept:
why does the same target token produce different losses under the two
distributions?
Category theory concept:
why is CrossEntropy a morphism from Distribution x TokenId to Loss?
Target-probability responsibility audit option:
Use the chapter’s Target-Probability Responsibility Ledger. Fill this card:
pipeline cue:
Rust handle:
ML responsibility:
category boundary:
unsafe shortcut rejected:
source-backed limit:
validation command:
Pass condition:
- You compute approximate losses for 0.90 and 0.10.
- You explain why the target index is 0 in both cases.
- You connect the test name to the learning claim.
- Your target-probability audit identifies target.index(), rejects the largest-probability shortcut, separates Logits -> Distribution from Distribution x TokenId -> Loss, and states that normalized probability is not calibrated confidence or full framework equivalence.
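After you have computed both losses by hand, you can check the arithmetic with a standalone sketch. The cross_entropy function here is hypothetical and works on plain slices; the repository's CrossEntropy takes its own Distribution and TokenId types and may validate differently.

```rust
// Hypothetical helper: loss = -ln(probability assigned to the target index).
// Returns None if the index is out of range or the probability is not in (0, 1].
fn cross_entropy(distribution: &[f64], target_index: usize) -> Option<f64> {
    let p = *distribution.get(target_index)?;
    if p > 0.0 && p <= 1.0 {
        Some(-p.ln())
    } else {
        None
    }
}

fn main() {
    let confident = [0.90, 0.10];
    let surprised = [0.10, 0.90];
    let target_index = 0; // the same target in both cases

    println!("confident loss: {:?}", cross_entropy(&confident, target_index));
    println!("surprised loss: {:?}", cross_entropy(&surprised, target_index));
}
```

The function reads the probability at the target index, not the largest probability in the row. In the surprised distribution the largest probability is 0.90, but the loss is computed from 0.10.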
Exercise 14: Trace Naturality And Monoid Laws
Use Functors, Naturality, Monoids, and Chain Rule
and src/structure.rs.
Run:
cargo test structure::tests --lib
For the naturality square, write the two paths:
top then right:
left then bottom:
why they should match:
For the monoid law check, write the three laws:
left identity:
right identity:
associativity:
Output-to-law audit option:
Use the Output-To-Law Audit section in
Functors, Naturality, Monoids, and Chain Rule.
Pick one output line from:
cargo run --example 04_structure_and_calculus
Fill:
output line:
Rust handle:
law or boundary:
source support:
safe non-claim:
validation command:
Then explain:
Rust syntax:
which functions or methods implement each path or law?
ML or software concept:
why do consistent wrapper conversion and trace grouping matter in a pipeline?
Category theory concept:
what does commutativity mean for the square, and what does associativity mean
for the trace monoid?
Pass condition:
- You name naturality_square_holds_for_first_option.
- You name monoid_laws_hold_for_pipeline_trace.
- You explain why both naturality paths return the same Option value.
- You explain why changing parentheses in trace combination should not change the final trace.
- Your output-to-law audit connects one printed line to one Rust handle, one law-shaped claim, one source-backed limit, and one validation command.
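Both checks can be sketched without the repository. The first function, the Trace struct, and the two checking functions below are hypothetical stand-ins for whatever src/structure.rs defines. The sketch names the two naturality paths by what they do, because which one is "top then right" depends on how the chapter draws the square.

```rust
// Hypothetical stand-ins; the repository's code in src/structure.rs may differ.
fn first<T: Clone>(items: &[T]) -> Option<T> {
    items.first().cloned()
}

// Naturality of `first` for a function f: mapping f over the slice and then
// taking the first element must equal taking the first element and then
// mapping f over the Option.
fn naturality_holds(items: &[i32], f: impl Fn(i32) -> i32) -> bool {
    let mapped: Vec<i32> = items.iter().map(|x| f(*x)).collect();
    let map_then_first = first(&mapped);
    let first_then_map = first(items).map(|x| f(x));
    map_then_first == first_then_map
}

#[derive(Debug, Clone, PartialEq)]
struct Trace(Vec<String>);

impl Trace {
    // Identity element: the trace with no steps.
    fn empty() -> Self {
        Trace(Vec::new())
    }

    // Binary operation: append the other trace's steps after this one's.
    fn combine(&self, other: &Trace) -> Trace {
        let mut steps = self.0.clone();
        steps.extend(other.0.iter().cloned());
        Trace(steps)
    }
}

fn monoid_laws_hold(a: &Trace, b: &Trace, c: &Trace) -> bool {
    let left_identity = Trace::empty().combine(a) == *a;
    let right_identity = a.combine(&Trace::empty()) == *a;
    let associativity = a.combine(b).combine(c) == a.combine(&b.combine(c));
    left_identity && right_identity && associativity
}

fn main() {
    println!("naturality, non-empty: {}", naturality_holds(&[1, 2, 3], |x| x * 10));
    println!("naturality, empty: {}", naturality_holds(&[], |x| x * 10));
    let t = |s: &str| Trace(vec![s.to_string()]);
    println!("monoid laws: {}", monoid_laws_hold(&t("tokenize"), &t("embed"), &t("softmax")));
}
```

A check like this tests the laws on chosen inputs. It is evidence for those inputs, not a proof for every possible slice, function, or trace.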
Exercise 15: Mixed Boundary Diagnosis
Use this exercise after finishing the core chapters. The goal is interleaved transfer: diagnose which kind of boundary is being protected without being told which chapter the failure came from.
For each case, classify the boundary:
invariant boundary
composition boundary
endomorphism boundary
shape boundary
local-to-global boundary
Then answer with the usual three lenses.
Cases
1. A raw usize is used where the code expects TokenId.
2. Embedding is followed directly by Softmax.
3. CrossEntropy receives Logits instead of Product<Distribution, TokenId>.
4. A training step returns Loss instead of Parameters.
5. SignalMatrix::compose_after sees mismatched middle dimensions.
6. SafetyCover reports a global claim even though one interval is false.
7. A residual connection tries to add rows with different model dimensions.
For each case, write:
Boundary type:
Rust syntax:
ML or software concept:
Category theory concept:
Smallest useful fix:
Pass condition:
- You classify all seven cases.
- You name at least five concrete Rust types or functions.
- You explain the smallest useful fix without weakening the type boundary.
- You identify which cases are about invalid values, which are about invalid composition, and which are about invalid global claims.
Debugging hint:
Do not answer every case with “the compiler rejects it.” Some failures are
constructor errors, some are returned CtError::ShapeMismatch, some are
conceptual category-shape failures, and some are law-check failures. The skill
is choosing the right explanation for the right boundary.
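The difference between a compile-time rejection and a constructor error is easier to see side by side. This sketch uses hypothetical versions of TokenId and Distribution with an invented BoundaryError enum; the repository's constructors and its CtError type may check different conditions.

```rust
// Hypothetical newtype: a raw usize cannot be passed where TokenId is expected.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TokenId(usize);

#[derive(Debug, PartialEq)]
enum BoundaryError {
    Empty,
    NotNormalized,
}

#[derive(Debug, PartialEq)]
struct Distribution(Vec<f64>);

impl Distribution {
    // Constructor error: the value has the right Rust type (Vec<f64>) but
    // breaks the invariant, so it is rejected at runtime.
    fn new(probabilities: Vec<f64>) -> Result<Self, BoundaryError> {
        if probabilities.is_empty() {
            return Err(BoundaryError::Empty);
        }
        let all_valid = probabilities.iter().all(|p| p.is_finite() && *p >= 0.0);
        let sum: f64 = probabilities.iter().sum();
        if !all_valid || (sum - 1.0).abs() > 1e-9 {
            return Err(BoundaryError::NotNormalized);
        }
        Ok(Self(probabilities))
    }

    fn probability_of(&self, target: TokenId) -> Option<f64> {
        self.0.get(target.0).copied()
    }
}

fn main() {
    let distribution = Distribution::new(vec![0.9, 0.1]).unwrap();
    println!("{:?}", distribution.probability_of(TokenId(0)));

    // Compile-time rejection: the next line would not compile, because
    // `0` is a usize and not a TokenId.
    // distribution.probability_of(0);

    // Runtime rejection: the type is fine, the invariant is not.
    println!("{:?}", Distribution::new(vec![0.5, 0.2]));
}
```

In this sketch, case 1 in the list above is the commented-out line: the compiler refuses it. An unnormalized vector is a different kind of failure, because only the constructor can see it.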
Exercise 16: Trace Transformer Training State
Use Transformer Roadmap, src/attention.rs, and
examples/07_transformer_training_state.rs.
Run:
cargo run --example 07_transformer_training_state
Write down the output lines for:
initial state:
forward shape:
readout update:
feed-forward update:
composed block update:
Then classify each update:
TransformerReadoutTrainStep:
TransformerFeedForwardTrainStep:
TransformerBlockTrainStep:
For each update, answer with the three lenses:
Rust syntax:
which named type performs the update, and what state does it return?
ML concept:
which parameters or sublayer does this update train?
Category theory concept:
why is the outside shape an endomorphism?
Finally, explain why this shortcut would be weaker:
readout update returns readout weights
feed-forward update returns feed-forward weights
block update returns a bag of changed matrices
Pass condition:
- You name TransformerTrainingState, TinyTransformerParameters, and all three training-step types.
- You explain that the state owns parameters, learning rate, and step count.
- You distinguish readout-only, local feed-forward, and composed block updates.
- You connect each printed step increment to TransformerTrainingState -> TransformerTrainingState.
- You explain why returning loose weights would make the next update rebuild context by hand.
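The reason for returning the whole state can be sketched with a much smaller state. TrainingState and FixedGradientStep below are hypothetical; the repository's TransformerTrainingState owns real Transformer parameters, and its training steps compute their own updates. Holding one constant gradient inside the step is an unrealistic simplification that keeps the boundary unary.

```rust
// Hypothetical stand-in; much smaller than the repository's training state.
#[derive(Debug, Clone, PartialEq)]
struct TrainingState {
    weights: Vec<f64>,
    learning_rate: f64,
    step: u64,
}

// The gradient is fixed context chosen before the step is applied,
// so applying the step is TrainingState -> TrainingState.
struct FixedGradientStep {
    gradient: Vec<f64>,
}

impl FixedGradientStep {
    fn apply(&self, state: TrainingState) -> TrainingState {
        let weights = state
            .weights
            .iter()
            .enumerate()
            // A missing gradient entry is treated as zero in this sketch.
            .map(|(i, w)| w - state.learning_rate * self.gradient.get(i).copied().unwrap_or(0.0))
            .collect();
        TrainingState {
            weights,
            learning_rate: state.learning_rate,
            step: state.step + 1,
        }
    }
}

fn main() {
    let step = FixedGradientStep { gradient: vec![2.0, -2.0] };
    let start = TrainingState { weights: vec![1.0, 2.0], learning_rate: 0.5, step: 0 };

    // Because the output type equals the input type, steps chain directly.
    let once = step.apply(start);
    let twice = step.apply(once.clone());
    println!("{:?}\n{:?}", once, twice);
}
```

If apply returned only the new weights, the second call would need the caller to reattach the learning rate and step count by hand before it could run.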
Exercise 17: Reconstruct A Diagram By Hand
Use this exercise whenever a chapter diagram feels dense. The goal is not to make a prettier copy. The goal is to prove that you can recover the objects, arrows, paths, and safe claim without relying on the book’s layout.
Choose one diagram from:
Course Map:
Text -> TokenSequence -> TrainingSet -> Loss, with Parameters -> Parameters
Domain Objects:
raw representation -> domain object -> trusted downstream boundary
Morphism and Composition:
TokenId -> Vector -> Logits -> Distribution
Tiny ML Pipeline:
Distribution x TokenId -> Loss
Training as an Endomorphism:
Parameters -> Parameters
Structure and Laws:
Vec<A> -> Option<B> naturality square
Transformer Roadmap:
AttentionWeights x ValueSequence -> AttentionOutput
Then fill this reconstruction sheet:
chapter:
diagram chosen:
objects:
arrows:
two paths or state transition:
Rust handle:
command or test:
what would break if one arrow was skipped:
safe non-claim:
For the structures chapter, use:
cargo run --example 04_structure_and_calculus
cargo test structure::tests --lib
For the roadmap attention path, use:
cargo run --example 06_attention_scores
Pass condition:
- You redraw the diagram without copying the original layout.
- You label every object and arrow.
- You say whether the diagram is a pipeline, a constructor boundary, a product input, a law square, or a state update.
- You name at least one Rust type, function, example, or test connected to the diagram.
- You explain one thing the diagram does not prove.
- You explain what would break if a key arrow, product input, or state object was removed.
Retrieval Practice
Close the source file before answering these prompts.
Recall
Name three kinds of feedback this course uses:
compiler error
constructor error
test failure
Explain
Explain why a failed composition is useful evidence, not only an obstacle.
Apply
Pick one exercise you solved and rewrite it for a different type or module. Keep the same answer shape:
Rust syntax:
ML or software concept:
Category theory concept:
Where This Leaves Us
If you can complete these exercises, you can read the project without treating category theory, Rust, and ML as three disconnected subjects. You can start from a line of code, name the syntax, identify the software or ML role, and then describe the categorical shape only as far as the code justifies it.