researchengineer.ing

March 12, 2026

ALL IN ML: The Complete Operating System

A complete operating framework for building toward frontier AI research — grounded in OBI (Outcomes, Behaviors, Insights), the Theory ⇌ Reality loop, and the diffusion model of learning. Covers daily behaviors, expertise compounding, research DNA, and the three convergence bets.

Theory ⇌ Reality · Research & Engineering · Frontier AI

Prabakaran Chandran · Pracha Labs · 2026


"A researcher who can't build is speculating. An engineer who can't derive is guessing. We do both. On everything. That's the entire idea."


Part I: OBI — Outcomes, Behaviors, Insights

What Is OBI?

OBI comes from Mu Sigma's muOBI framework — a "decision design" artifact that backs into desired outcomes by identifying the behaviors needed to reach them and the insights needed to trigger those behaviors. The fundamental idea: you don't achieve outcomes by planning outcomes. You achieve outcomes by designing behaviors. Because outcomes are lagging indicators you can't directly control. Behaviors are leading indicators you can control every single day. And insights are what make the behaviors intelligent rather than mechanical.

Most people set outcome goals — "publish at NeurIPS," "get a frontier lab job," "build a research reputation" — and then improvise the path. The muOBI discipline inverts this. You start with the outcome, but you immediately ask: what daily behaviors, executed consistently, would make this outcome inevitable? And then: what insights do I need to make those behaviors effective?

This is the structure of ALL IN ML.

The OBI for Research & Engineering at the Frontier

OUTCOMES: The long-term strategic results. You don't control these directly. They are the consequence of sustained behavior.

  • Produce workshop and main-conference quality research at the intersection of causal inference, representation learning, world models, and continual learning
  • Build the dual researcher-engineer identity: can derive the theory AND build the system, at the level expected by DeepMind, Anthropic, Meta FAIR, OpenAI
  • Develop a compounding knowledge system where every problem solved makes the next problem faster
  • Establish Pracha Labs as a visible, credible research platform with a coherent intellectual identity

These outcomes are what the world evaluates. But you cannot wake up and "do" any of them. They precipitate from behaviors.

BEHAVIORS: The daily, repeatable actions you have complete control over. These are the habits that, executed consistently, make the outcomes inevitable.

  • The Theory ⇌ Reality loop. Every day, on every problem: derive the math, then implement it; implement it, then trace the surprises back to the math. This is the core behavior. Everything else is a variant of this.
  • Derive before you code. Never implement anything you haven't derived the gradient for by hand. This is a behavior, not a suggestion. It is something you choose to do every morning.
  • Break what you build. After every implementation, deliberately find the failure mode. Push to the edge case. Violate the assumption. This is a behavior — a habit of stress-testing that runs automatically.
  • Connect across territories. At the end of every work session, ask: "where else does this mathematical object appear?" Write the connection down. This is a behavior — a daily habit of cross-pollination.
  • Teach what you learn. Blog daily. Explain the day's insight in your own words. The act of explaining is a behavior that forces denoising. It is not optional and it is not extra — it is the single most powerful behavior for converting System 2 understanding to System 1.
  • Predict before you observe. Before running any experiment, write three predictions: convergence rate, failure mode, bottleneck. After running, compare. This is a behavior — a habit of calibrating your internal model.
  • One tangible artifact per day. A derivation, a working function, a figure, a paragraph, a territory card. No day ends with only "I read and thought." This is a behavior — the discipline of materializing understanding into something concrete.

These behaviors are entirely within your control. Rain or shine, good day or bad day, you can choose to derive before you code, to break what you build, to connect across territories. The outcomes follow.

INSIGHTS: The knowledge that enables the behaviors to be effective. Without insights, the behaviors are mechanical. With insights, they are intelligent.

  • The score function / operative DNA. The invariant chain — Model → Loss → Gradient → Structure → Algorithm → Complexity → Hardware — is the insight that makes the "derive before you code" behavior productive rather than aimless. Without this chain, you derive random things. With it, you derive the right things in the right order.
  • The three complexity signatures. Knowing that every method has time complexity, space complexity, and sample complexity — and that they interact — is the insight that makes the "break what you build" behavior diagnostic rather than destructive. You don't just find failures; you classify them.
  • The diffusion/denoising model of learning. Understanding that knowledge moves from noise (first encounter) to signal (System 1) through iterative refinement is the insight that makes the daily loop sustainable. You don't expect to understand everything on the first pass. You trust the process because you understand its mechanics.
  • The cross-territory mathematical objects. Fisher information, score functions, Bellman equations, sufficient statistics, contraction mappings — knowing that these objects recur across territories is the insight that makes the "connect" behavior explosive rather than incremental. Each connection restructures the landscape.
  • The frontier lab skillset map. Knowing precisely what DeepMind/Anthropic/OpenAI value — theory + engineering fused, not separate — is the insight that keeps the behaviors aligned with the outcome. You don't drift into pure theory or pure hacking because you know the target.

The OBI Dynamic: Harmonizing Realization and Transformation

Mu Sigma's key insight about traversing the OBI grid: you don't go purely vertical (chase outcomes first, then transform) or purely horizontal (gather all insights first, then act). You zigzag — one step toward realization (execute, produce, ship), then one step toward transformation (innovate, connect, expand). This creates a Goldilocks state between order and chaos.

In ALL IN ML, this zigzag is the Theory ⇌ Reality loop itself:

  • Realization step: Derive the gradient. Implement it. Get a result. Ship a territory card. (Order. Measurable progress.)
  • Transformation step: Connect to another territory. See the same object from a new angle. Revise your understanding. (Chaos. Restructured landscape.)
  • Realization step: Use the new understanding to solve a harder problem faster. (Order restored, at a higher level.)
  • Transformation step: The harder problem reveals a deeper connection... (Chaos again, productive chaos.)

This zigzag is not a compromise between realization and transformation. It is the optimal traversal — the path that simultaneously produces measurable outcomes and compounds understanding.


Part II: Theory ⇌ Reality — The Core Process

What It Is

Theory ⇌ Reality is a bidirectional loop:

Theory → Reality → Theory → Reality → ...

Theory → Reality: You have a mathematical understanding (a model, a loss, a theorem). You implement it, test it, scale it, break it. The implementation either confirms the theory or reveals where the theory is incomplete.

Reality → Theory: You observe something in your implementation (a convergence failure, a surprising result, a scaling bottleneck). You trace it back to the math. The observation either validates your understanding or forces you to revise it.

Each complete cycle is one denoising step. The first cycle, everything is noisy. The tenth cycle, the method is in your hands — you can derive it, build it, break it, extend it, teach it, and publish it.

What It Is Not

It is not "learn theory, then implement." That's a pipeline, not a loop. Pipelines don't self-correct.

It is not "just build things and figure out the theory later." That produces systems you don't understand and can't debug in a principled way.

It is not "read a paper and move on." Reading without deriving and implementing is forward diffusion — it adds noise, not signal.

The Diffusion Analogy — Precisely

In diffusion models, the generative process works like this:

  1. Start with pure noise: x_T ~ N(0, I)
  2. At each step t, apply a learned denoising function: x_{t-1} = denoise(x_t, t)
  3. The denoising function is trained to estimate the score: ∇ₓ log p(x_t)
  4. After T steps, x_0 is a clean sample from the data distribution
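The loop above can be made concrete in a toy setting where the score is known in closed form: 1-D Gaussian data under variance-exploding noise, denoised with annealed Langevin steps. This is a sketch with an analytic score rather than a learned network, and the noise schedule and step sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, s = 2.0, 1.0                      # data distribution: N(mu, s^2)

def score(x, sigma):
    # Analytic score of the noised marginal p_t = N(mu, s^2 + sigma^2);
    # in a real diffusion model this function is a trained network.
    return -(x - mu) / (s**2 + sigma**2)

sigmas = np.geomspace(10.0, 0.1, 10)        # noise schedule, high -> low
x = rng.normal(0.0, sigmas[0], size=2000)   # start from (almost) pure noise

for sigma in sigmas:                  # one "denoising step" per noise level
    step = 0.3 * sigma**2
    for _ in range(50):               # Langevin updates at this level
        x = x + step * score(x, sigma) \
              + np.sqrt(2 * step) * rng.normal(size=x.shape)

# x now approximates samples from the data distribution N(mu, s^2)
```

Each pass through the outer loop removes one band of noise; by the last level the samples sit close to the clean distribution, which is exactly the convergence the analogy relies on.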

In the Theory ⇌ Reality process:

  1. x_T = first encounter with a new idea. Everything is noisy. You see notation, recognize fragments, but the structure is hidden. You can't distinguish what matters from what's convention.

  2. Each denoising step = one Theory ⇌ Reality cycle. You derive something (noise removed at the mathematical level). You implement it (noise removed at the computational level). You break it (noise removed at the boundary level). You connect it (noise removed at the structural level).

  3. The score function = the operative DNA. The set of internalized questions — what is the model? what is the loss? what is the gradient structure? what is the complexity? — that guide each step toward structure rather than random wandering.

  4. x_0 = System 1 understanding. The idea is no longer something you reason about. It's something you perceive. You see the gradient structure instantly. You predict the failure mode before running the code. You recognize the method in a new paper as a variant of something you already know. The denoising process has converged.

The key property: each step makes future steps faster. This is not just motivational — it's structural. Once you've internalized that "the gradient of any GLM loss is Xᵀ(ŷ − y)," you derive new GLM gradients in seconds. Once you've internalized that "PSD Hessian means Cholesky, not LU," you make the right implementation choice without thinking. Once you've internalized that "the Rademacher bound for norm-constrained linear classifiers is BC/√n," you know immediately why SVMs generalize in high dimensions.
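That GLM claim is easy to verify in code: the logistic-regression gradient has the Xᵀ(ŷ − y) form, and it can be checked against central finite differences (a minimal sketch on synthetic data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w):
    # Mean negative log-likelihood of logistic regression
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(w):
    # The GLM gradient structure: X^T (y_hat - y), scaled by 1/n
    return X.T @ (sigmoid(X @ w) - y) / n

w = rng.normal(size=d)
eps = 1e-6
# Central finite differences, one coordinate at a time
num = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                for e in np.eye(d)])
assert np.allclose(grad(w), num, atol=1e-5)
```

The same two-line `grad` works for any GLM once the inverse link replaces `sigmoid`, which is exactly why the internalized structure pays off.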

System 1 knowledge is not memorization. It's compressed understanding that executes automatically. And it's built exclusively through the denoising process — through repeated Theory ⇌ Reality cycles.


Part III: Research & Engineering DNA

What Frontier AI Labs Actually Need

Look at what DeepMind, Anthropic, OpenAI, Meta FAIR, Google Brain actually produce. Not just papers. Not just systems. Papers that come with systems, and systems that come with theoretical understanding.

  • AlphaFold: structural biology theory + massive engineering at scale
  • Constitutional AI: alignment theory + RLHF implementation + evaluation infrastructure
  • Diffusion models: score matching theory + architecture engineering + efficient sampling
  • Scaling laws: empirical science + massive compute infrastructure + statistical methodology
  • Mechanistic interpretability: causal inference on neural networks + tooling + visualization

Every breakthrough at these labs sits at the intersection of deep theoretical understanding and serious engineering capability. Not one or the other. Both.

The Skillset — Precisely

Mathematical Foundations (the theory muscle):

You can derive the gradient and Hessian of any standard loss function on a blank page. You can state the generalization bound for the method you're using and explain what controls each term. You can prove convergence of the optimization algorithm and predict the rate. You can identify the assumptions of a method and construct examples where each assumption is violated. You can read a new paper and within an hour map it to the model → loss → gradient → algorithm chain.

These aren't academic exercises. When your training run diverges at 3am, the person who can diagnose "the effective learning rate exceeds 2/L where L is the local smoothness, and the Hessian spectrum shifted because the batch composition changed" fixes the problem in minutes. The person who can't spends hours randomly adjusting hyperparameters.

Engineering Foundations (the systems muscle):

You can implement any algorithm from its mathematical specification without reference code. You can identify the BLAS operations in any gradient computation and know which ones dominate. You can predict the memory footprint of a model and its optimizer state, and know when each will OOM. You can profile code and trace bottlenecks to specific linear algebra operations. You can choose between dense/sparse/low-rank/diagonal representations based on the mathematical structure.

These aren't systems engineering exercises. When you're designing a new architecture, the person who knows "this attention mechanism is O(n²d) and the memory is O(n²), but if I make the attention local with window size w, it becomes O(nwd) and O(nw)" designs architectures that scale. The person who doesn't discovers the problem at 100B tokens when it's too expensive to restart.
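The arithmetic behind that comparison is simple enough to script. Leading-order counts only: constants, heads, and the MLP blocks are ignored, and the n = 8192, w = 256 figures are illustrative choices, not a recommendation:

```python
def attention_costs(n, d, window=None):
    """Leading-order time (FLOP-proportional) and memory counts
    for one attention layer; constant factors omitted."""
    if window is None:                                    # full self-attention
        return {"time": n * n * d, "memory": n * n}
    return {"time": n * window * d, "memory": n * window}  # local window

full = attention_costs(n=8192, d=512)
local = attention_costs(n=8192, d=512, window=256)

# Local attention cuts both time and memory by a factor of n / window = 32
```

Running the numbers before training is the cheap version of discovering them at 100B tokens.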

Research Taste (the judgment muscle):

You can identify which problems are important and which are incremental. You can read a paper and within minutes know whether the contribution is real (new theory, new capability, new understanding) or superficial (new benchmark number with no insight). You can look at a research area and identify the bottleneck — the one result that, if achieved, would unlock a cascade of progress. You can design experiments that cleanly test theoretical predictions, not just report numbers.

This comes from the Theory ⇌ Reality loop directly. When you've derived, implemented, broken, and connected enough methods, you develop a feel for what's deep and what's shallow. You can smell a paper that got lucky on the benchmark but doesn't have a real mechanism. You can recognize when a theoretical result is tight vs. when there's room for improvement.

Communication (the teaching muscle):

You can explain a complex method at three levels: the one-sentence intuition, the one-paragraph summary, and the full derivation. You can write a paper that a reviewer can follow from motivation through method through experiments without getting lost. You can give a talk where the audience learns something they'll remember.

This is the "teach" denoising operation. Every time you explain something, you find the minimal clean path through the ideas. The gaps in your understanding become visible. The explanation itself is a denoising step — perhaps the most powerful one.

The Mindset — Precisely

Intellectual honesty. If you can't derive it, you don't understand it. If you can't implement it, you can't verify it. If you can't break it, you don't know its limits. No hand-waving. No "I have the intuition." Either you can write it on the board or you can't.

Compounding over sprinting. Every piece of work builds on the last. Every implementation becomes a building block. Every derivation becomes a reference. Every connection makes the landscape smaller. The goal is not to solve today's problem — it's to solve today's problem in a way that makes tomorrow's problem faster.

Taste over volume. One deeply understood method is worth more than ten superficially implemented ones. One paper that reveals a real mechanism is worth more than ten that report benchmark numbers. Focus on what matters, go deep, and let the depth create breadth.


Part IV: Attention, Anticipation, Agency

Adapted from Nir Eyal's framework, applied to research and engineering.

Attention — Be Present in the Process

Attention is the ability to be fully present in the current denoising step. Not multitasking. Not skimming. Not half-deriving while checking email. Full, undivided engagement with the problem at hand.

In Theory mode: Attention means working through the derivation line by line, not skipping steps. It means stopping when something doesn't make sense and resolving it, not marking it "I'll come back to this." It means noticing when a term has a specific structure (PSD, diagonal, low-rank) and asking what that structure implies.

In Reality mode: Attention means reading the error message carefully, not just re-running with different parameters. It means profiling before optimizing, not guessing at the bottleneck. It means writing tests that verify the mathematical properties (is the Hessian actually PSD? Does the loss actually decrease monotonically?) not just "does it run without crashing."
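A property test of the kind described above, for the logistic-regression Hessian (PSD because it has the Xᵀ D X form with nonnegative D), might look like this sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 4
X = rng.normal(size=(n, d))
w = rng.normal(size=d)

p = 1 / (1 + np.exp(-X @ w))   # predicted probabilities
D = p * (1 - p)                # per-sample curvature weights, all >= 0
H = X.T @ (X * D[:, None])     # Hessian of the logistic negative log-likelihood

# Property test: every eigenvalue is nonnegative (up to float tolerance),
# i.e. the Hessian really is PSD, not just "the code ran without crashing"
assert np.linalg.eigvalsh(H).min() >= -1e-10
```

A test like this fails loudly the moment a sign error or a wrong weighting sneaks into the Hessian code.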

Why it matters for compounding: A denoising step only works if you actually compute the score — if you actually attend to where the noise is. A distracted derivation is forward diffusion. A distracted implementation is forward diffusion. The noise doesn't decrease unless you're present.

Practical implementation: Time-boxed deep work blocks. 3–5 hours of uninterrupted focus. No notifications. One problem. One arc. The quality of a single focused hour exceeds the quality of four fragmented hours.

Anticipation — Predict Before You Observe

Anticipation is the ability to use theory to predict what will happen before running the code, and to use past experience to predict where a new method will succeed or fail.

Before running an experiment: "The convergence rate should be O(1/√T) because the loss is convex but not strongly convex. The gradient variance will be high because I'm using minibatch size 32 on a problem with high class imbalance. The model will overfit after ~100 epochs because the Rademacher complexity exceeds O(1/√n) for this architecture."

Before reading a paper: "Given the problem setup, I expect they'll use contrastive loss because they have paired data. The weakness will be the quadratic cost in batch size for the negative sampling. The main result will probably show gains on in-distribution but struggle on distribution shift because contrastive learning doesn't enforce causal structure."

Before designing a system: "The memory bottleneck will be the attention matrix at sequence length 8192. The compute bottleneck will be the MLP blocks, not the attention. The scaling behavior will follow the Chinchilla law, so I need this many tokens for this model size."
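The last prediction is back-of-the-envelope arithmetic. A common reading of the Chinchilla result is roughly 20 training tokens per parameter for compute-optimal training; the factor is an approximation of the fitted law, not the law itself:

```python
def compute_optimal_tokens(n_params, tokens_per_param=20):
    """Rule-of-thumb token budget; the 20x factor is an approximation
    of the Chinchilla compute-optimal ratio, not the fitted law."""
    return tokens_per_param * n_params

budget = compute_optimal_tokens(7_000_000_000)  # a 7B model: ~140B tokens
```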

Why it matters for compounding: Anticipation is the test of System 1 convergence. If your predictions are accurate, your internal model has converged — the denoising process has reached x_0 for that territory. If your predictions are wrong, the mismatch is the learning signal — it tells you exactly where your understanding has remaining noise.

Practical implementation: Before every experiment, write down three predictions. After every experiment, compare. Track your prediction accuracy over time. This is the quantitative measure of denoising convergence.
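One hypothetical shape for that tracking habit; the class, method names, and fields here are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class PredictionLog:
    """Journal: log predictions before each experiment, score them after."""
    records: list = field(default_factory=list)

    def predict(self, experiment, **predictions):
        self.records.append({"experiment": experiment,
                             "predictions": predictions, "outcomes": None})

    def observe(self, experiment, **outcomes):
        for r in self.records:
            if r["experiment"] == experiment:
                r["outcomes"] = outcomes

    def accuracy(self):
        # Fraction of individual predictions that matched the outcome
        scored = [r for r in self.records if r["outcomes"] is not None]
        hits = sum(r["predictions"][k] == r["outcomes"][k]
                   for r in scored for k in r["predictions"])
        total = sum(len(r["predictions"]) for r in scored)
        return hits / total if total else float("nan")

log = PredictionLog()
log.predict("run-1", convergence="sublinear", failure_mode="overfit",
            bottleneck="io")
log.observe("run-1", convergence="sublinear", failure_mode="diverge",
            bottleneck="io")
# accuracy() is 2/3 here: two of the three predictions matched
```

The rising (or stubbornly flat) accuracy curve is the quantitative readout of denoising convergence the text calls for.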

Agency — Every Choice Is Deliberate

Agency is the refusal to do things without understanding why. Every line of code traces to a mathematical reason. Every architectural choice answers a theoretical question. Every hyperparameter has a principled justification (or you acknowledge it's arbitrary and note what you'd need to justify it).

In Theory mode: Agency means not just following a derivation but asking "why this loss and not another?" "Why this prior and not another?" "What would change if the assumption were different?" Every step is a choice, and you understand the alternatives.

In Reality mode: Agency means not just using Adam because everyone does. It means knowing that Adam approximates the diagonal of the inverse Fisher information, that the β₁ and β₂ parameters control the bias-variance tradeoff of the moment estimates, and that the ε parameter prevents division by zero but also implicitly bounds the effective step size. If you're using it, you know why. If you don't know why, you find out.

Why it matters for compounding: Agency turns every problem into a learning opportunity. The person without agency uses a library call and moves on. The person with agency traces the library call to the math, understands why it works, and now has a reusable piece of understanding. One hour of agentic work compounds. One hour of mechanical work doesn't.

Practical implementation: The "why" journal. For every significant implementation decision, write one sentence explaining the mathematical reason. "Using Cholesky instead of LU because the Hessian is PSD (proved in derivation step 3), saving d³/3 FLOPs." Over time, this journal becomes a map of principled decisions that you can reference and extend.
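That journal entry, checked in code with SciPy's factorization routines (the ridge term here just guarantees a well-conditioned matrix for the demo):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, lu_factor, lu_solve

rng = np.random.default_rng(0)
d = 100
A = rng.normal(size=(d, d))
H = A @ A.T + d * np.eye(d)   # symmetric PSD by construction (Gram + ridge)
b = rng.normal(size=d)

# Cholesky factorization: ~d^3/3 FLOPs, valid only because H is PSD
x_chol = cho_solve(cho_factor(H), b)

# LU factorization: ~2d^3/3 FLOPs, works for any invertible matrix
x_lu = lu_solve(lu_factor(H), b)

assert np.allclose(x_chol, x_lu)  # same solution, half the factorization cost
```

The same answer comes out of both; the mathematical structure (PSD) is what licenses the cheaper path.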


Part V: The Focus Areas

The Three Convergence Bets

The ML landscape is vast. You cannot go deep everywhere simultaneously. The strategy is to identify convergence zones — areas where multiple territories intersect, where a single research program produces insights across multiple fields, where depth in one area automatically creates depth in adjacent areas.

Three convergence bets, chosen because they each span 6–8 territories and sit at the frontier:

Bet 1: Causal Representation Learning × Continual Learning

Territories: SLT, Probabilistic ML, Causal Inference, CRL, Deep Generative Models, Continual Learning, Statistical Inference

The thesis: representations aligned with causal mechanisms make continual learning natural because the "what to preserve vs. what to update" decision becomes structurally obvious.

Why this is a convergence zone: CRL identification theory (score matching, identifiability) meets continual learning (Fisher information, EWC) meets SLT (generalization under distribution shift). A result here advances three fields simultaneously.

Bet 2: World Models × Multi-Agent Coordination

Territories: RL, Model-Based RL, World Models, MPC, EBMs/JEPAs, Multi-Agent, State Space Models

The thesis: zero-shot coordination requires simulating diverse partners. A world model that captures the causal structure of partner behavior enables this simulation. PRPO + learned world model = coordination through imagination.

Why this is a convergence zone: world model learning (JEPA, RSSM) meets multi-agent RL (PRPO) meets control theory (MPC). A result here advances model-based RL, multi-agent coordination, and world model evaluation simultaneously.

Bet 3: Causal Cognition in Foundation Models

Territories: Causal Inference, Multi-Modal DL, Language Models, Agents, Reasoning

The thesis: Pearl's Ladder of Causation provides a principled diagnostic framework for what VLMs and LLMs can and cannot do. Association (L1), intervention (L2), and counterfactual (L3) reasoning require fundamentally different capabilities, and current models have predictable failure patterns.

Why this is a convergence zone: causal inference theory meets VLM evaluation meets interpretability meets reasoning. A result here advances our understanding of what foundation models actually learn.

Why Three, Not One or Twenty

One bet is too fragile — if it doesn't work out, you have nothing. Twenty bets is too shallow — you can't go deep enough in any of them to produce real results.

Three bets give you:

  • Diversification: if one bet stalls, the others continue
  • Cross-pollination: insights from one bet feed the others (the Fisher information that appears in Bet 1's EWC also governs Bet 2's world model uncertainty)
  • Coverage: three bets spanning 6–8 territories each cover virtually the entire 20-territory landscape
  • Depth: each bet is narrow enough to produce a workshop paper in one sprint

Part VI: Expertise Building — Repetition × Focus

The Expertise Equation

Expertise = Repetition × Focus × Feedback

Repetition without focus is mechanical. You can implement logistic regression 100 times and still not understand why the gradient has the structure it does. You're running the loop but not attending to the score function. The denoising steps are random rather than directed.

Focus without repetition is fragile. You can derive the gradient once with full concentration, have a beautiful "aha" moment, and forget it within a week. One denoising step isn't enough. The knowledge hasn't converged to System 1. It's still in System 2, still effortful, still volatile.

Repetition × Focus without feedback is unverifiable. You might be repeating the same mistake. You might be focused on the wrong thing. Feedback — from the code (does it converge at the predicted rate?), from the math (does the derivation produce a known result?), from others (does your explanation make sense to a collaborator?), from reality (does the system actually work?) — closes the loop.

The Theory ⇌ Reality arc provides all three:

  • Repetition: every territory uses the same process (model → loss → gradient → structure → algorithm → complexity → hardware), so you run the core loop 20 times on 20 different problems
  • Focus: each territory demands full attention to derive, implement, break, and connect
  • Feedback: the implementation verifies the theory, the theory diagnoses the implementation, and the gap between prediction and observation is the learning signal

The Compounding Effect

The first territory takes the longest. You're building the process itself — learning to derive gradients by hand, learning to identify BLAS operations, learning to read generalization bounds. Everything is slow because both the content and the process are new.

The fifth territory takes half as long. The process is becoming automatic. You know how to set up the derivation. You know what to look for in the Hessian. You know the common numerical pitfalls. The content is new but the process is familiar.

The tenth territory takes a quarter as long. You're recognizing structures across territories. "This is just Fisher information again." "This loss is a special case of the ELBO." "This gradient has the same (ŷ − y)·x structure as logistic regression." The territory cards are getting shorter because the connections are doing the explanatory work.

The twentieth territory takes an hour. You read the paper, identify the model, derive the gradient, recognize the structure, predict the complexity, and sketch the implementation — because you've done this nineteen times before and the current method is a variant of something you already know deeply.

This is the compounding. Not compound interest on money. Compound interest on understanding. Each territory makes the next territory faster. Each connection makes future connections more visible. Each denoising cycle makes future cycles more efficient.

The Acceleration Mechanism

Why does compounding produce acceleration, not just linear improvement? Because of the connection structure.

With 5 territories mastered, there are at most 10 pairwise connections. With 10 territories, there are 45. With 20 territories, there are 190. The number of potential connections grows quadratically, but the cost of discovering each connection decreases because your System 1 pattern-matching improves.
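The counts above are just n choose 2:

```python
def pairwise_connections(n_territories):
    """Distinct pairs among n territories: n * (n - 1) / 2."""
    return n_territories * (n_territories - 1) // 2

counts = [pairwise_connections(n) for n in (5, 10, 20)]  # -> [10, 45, 190]
```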

This means the information density of your knowledge increases super-linearly. Each new territory is not just one more thing you know — it's N new connections to existing territories, each of which deepens your understanding of both the new and the old.

This is why depth creates breadth. Going deep in causal inference doesn't just teach you causal inference — it teaches you what Fisher information means in continual learning, what identification means in representation learning, what intervention means in interpretability. The depth radiates outward through connections.


Part VII: From Understanding to Publication

How a Research Paper Happens Through This Process

A paper is not something you "write." It is something that precipitates out of the Theory ⇌ Reality loop when understanding reaches a critical density.

Phase 1: Encounter (x_T)

You're reading papers in two territories and you notice a gap. Maybe it's "CRL identification theory assumes you know which environment each sample comes from, but continual learning doesn't have environment labels." Or "world models are evaluated on prediction accuracy, but coordination requires them to capture partner types, which is a different metric." The gap is your research question.

At this point, the question is noisy. You can state it but you can't formalize it. You can feel that something is there but you can't prove it.

Phase 2: Formalize (x_{3T/4})

You run Theory ⇌ Reality cycles on the gap. You derive. What does it mean formally for a representation to be "aligned with causal mechanisms"? What is the mathematical relationship between Fisher block-diagonalization and mechanism alignment? You implement. Build a synthetic SCM, train an identifiable representation learner, compute the Fisher matrix, check if it block-diagonalizes.

The gap sharpens into a claim. "Under CRL identifiability conditions, the Fisher information matrix of the learned representation block-diagonalizes with blocks corresponding to independent mechanisms."

Phase 3: Prove and Build (x_{T/2})

You prove the claim (or find the conditions under which it holds). You build the experiment that tests it. Theory and implementation proceed in lockstep — the proof guides the experiment design, the experimental results stress-test the proof's assumptions.

This is the bioreactor running at full speed. Derive, implement, break, fix, derive again. Each cycle sharpens the claim, tightens the bound, improves the experiment.

Phase 4: Connect and Extend (x_{T/4})

You connect the result to adjacent work. "This extends EWC by showing that with the right representations, the Fisher regularization is not approximate but exact." "This connects CRL identification to the stability-plasticity tradeoff through information geometry." The connections make the paper more than a technical result — they make it a contribution to understanding.

Phase 5: Crystallize (x_0)

The paper writes itself. Not literally — writing is hard work — but the structure is clear because the understanding is clear. The model, the loss, the gradient, the theorem, the experiment, the result, the connection — you've already built all of these through the loop. The paper is the documentation of a converged denoising process.

Why This Produces Better Papers

Papers written this way have specific properties that reviewers and readers value:

Reproducibility. The implementation exists because it was built as part of the understanding process, not added after the theory was done. The code and the math co-evolved.

Robustness. The method was broken deliberately during the process. The failure modes are known, documented, and discussed. The paper doesn't have the "we only tested on cases where it works" problem.

Depth. The connections to other work are real, not cosmetic. The related work section actually relates to the work, because the connections were discovered during the process, not retrofitted for the submission.

Clarity. The explanation has been refined through the teaching denoising step. The minimal clean path through the ideas has been found. The reader can follow from motivation through method through results without getting lost.


Part VIII: The Complete Picture

One Diagram

┌─────────────────────────────────────────────────────────┐
│                        OUTCOMES                         │
│      (What the world sees. What you don't control       │
│      directly. What precipitates from behaviors.)       │
│                                                         │
│    Papers · Systems · Benchmarks · Tools · Reputation   │
│    Frontier AI Lab readiness · Compounding expertise    │
└───────────────────────┬─────────────────────────────────┘
                        │ precipitates from
┌───────────────────────┴─────────────────────────────────┐
│                        BEHAVIORS                        │
│       (What you control. What you do every day.)        │
│                                                         │
│    The Theory ⇌ Reality Loop:                           │
│    ┌──────────┐    ┌──────────┐    ┌──────────┐         │
│    │  THEORY  │◄──►│ REALITY  │◄──►│  THEORY  │ ...     │
│    └──────────┘    └──────────┘    └──────────┘         │
│                                                         │
│    Daily behaviors:                                     │
│    Derive before you code · Break what you build ·      │
│    Connect across territories · Teach what you learn ·  │
│    Predict before you observe · One artifact per day    │
│                                                         │
│    Sustained by: Attention · Anticipation · Agency      │
│                                                         │
│    Powered by SIX DENOISING OPERATIONS:                 │
│    Derive · Implement · Break · Connect · Teach · Scale │
└───────────────────────┬─────────────────────────────────┘
                        │ made effective by
┌───────────────────────┴─────────────────────────────────┐
│                        INSIGHTS                         │
│    (The knowledge that makes behaviors intelligent.)    │
│                                                         │
│    The Score Function / Operative DNA:                  │
│    Model→Loss→Gradient→Structure→Algorithm→             │
│    Complexity→Hardware                                  │
│                                                         │
│    Three complexity signatures (time, space, sample)    │
│    Diffusion model of learning (noise → signal)         │
│    Cross-territory mathematical objects                 │
│    (Fisher info · Score functions · Bellman equations · │
│     Sufficient statistics · Contraction mappings)       │
│                                                         │
│    20 Territory Cards · 3 Convergence Bets:             │
│    CRL×CL  ·  WorldModels×MARL  ·  CausalCognition      │
└─────────────────────────────────────────────────────────┘

One Paragraph

ALL IN ML is built on the OBI framework — Outcomes, Behaviors, Insights. The outcomes are research papers, working systems, and frontier-lab readiness. The behaviors are the daily, repeatable Theory ⇌ Reality loop — derive before you code, break what you build, connect across territories, teach what you learn, predict before you observe, produce one artifact per day — sustained by attention, anticipation, and agency. The insights are the operative DNA (the model-to-hardware chain), the three complexity signatures, the diffusion model of learning, and the cross-territory mathematical objects that make the behaviors intelligent rather than mechanical. The OBI dynamic zigzags between realization (execute, ship, measure) and transformation (connect, restructure, expand), creating a compounding process where each cycle makes future cycles faster. Applied across 20 territories of ML with focus on three convergence bets, this produces a researcher-engineer who can derive, build, break, scale, and ship anything — and who gets faster with every problem because depth compounds.

One Sentence

OBI: the Outcomes (papers, systems, frontier readiness) precipitate from Behaviors (the daily Theory ⇌ Reality loop, executed with attention, anticipation, and agency) which are made intelligent by Insights (the operative DNA, complexity signatures, and cross-territory connections) — and the whole system compounds because every cycle of behavior generates new insights that make the next cycle faster.


This is not a course you finish. It is a discipline you practice. The arc never terminates. It only tightens.


Attention. Anticipation. Agency.

Theory ⇌ Reality.

Learn. Experiment. Interact. Repeat.

— Pracha Labs, 2026