researchengineer.ing

Standard Operating Procedure

MS in Research Engineering — 300 Days

Prabakaran Chandran · February 26 – December 23, 2026


1. What This Is

A 10-month self-directed program to develop into an independent researcher and research engineer working at the frontier of AI/ML.

Independent researcher means: I can identify open problems worth working on, formulate original questions about them, reason about them mathematically, and produce work that advances understanding — without waiting for someone to hand me the problem.

Research engineer means: I can build whatever the research demands — implementations, experiments, systems, tools — with enough engineering quality that the science is credible and reproducible.

These are not separate identities. The research generates questions that require engineering to answer. The engineering produces evidence that reshapes the research. They develop together or not at all.


2. Why This Framing

I already have breadth. Six years of industry ML engineering, an ongoing MS at Columbia, and six months of exploration across roughly 20 territories of research engineering. What I lack is depth in the specific compound skill that frontier AI labs need: the ability to move fluidly between reading a paper, understanding its mathematical core, implementing it rigorously, questioning its assumptions, and extending it with original ideas.

No existing degree program teaches this. The people who have this skill built it through sustained, deliberate practice — usually over years. This program attempts to compress that process into 10 months by being intentional about what gets practiced, how it gets practiced, and how progress is measured.

The framing as an "MS" is not decorative. It imposes structure (phases, milestones, a graduation standard), creates a commitment that survives bad weeks, and produces a coherent narrative rather than a scattered collection of projects.


3. The Core Loop

Everything in this program trains one compound process:

```
Grasp → Question → Formulate → Build → Validate → Extend → Ship
```

Grasp — Understand what exists. Read the paper, work through the math, map the related work.

Question — Identify what's missing, what's assumed, what might be wrong. This is where research begins. Without a question, the rest is just implementation practice.

Formulate — Make the question precise. A hypothesis, a mathematical framing, an experimental design. The transition from "this is interesting" to "this is testable."

Build — Implement whatever is needed to test the formulation. The code is an instrument, not an end.

Validate — Run the experiments honestly. Check the math. Compare against baselines. Be precise about what the results say and what they don't.

Extend — Go beyond what was already known. Change an assumption, combine two ideas, apply a method to a new domain. Even small extensions train the ability to think originally.

Ship — Make the work exist publicly. A build on researchengineer.ing, a blog post, a paper draft, an open-source release. Work that stays private doesn't get feedback, doesn't get challenged, and tends to stay unfinished.

Different builds emphasize different stages. A reproduction is heavy on Grasp→Build→Validate. A novel algorithm is heavy on Question→Formulate→Extend. But the full loop should always be present, at least in abbreviated form.


4. The Four Pillars

| Pillar | Allocation | Purpose |
| --- | --- | --- |
| 🧠 AI/ML Frontier Research | 40% | The driving force. Paper reading, mathematical reasoning, original thinking, research taste, conference-quality work. |
| 🔬 Research Engineering | 30% | The craft of turning ideas into credible evidence. Reproduction, experiment design, performance, validation. |
| ⚡ Software Engineering | 20% | The foundation. Algorithms, system design, clean code, open source. Maintains engineering fluency. |
| 📊 Data Science | 10% | The rigor. Causal inference, statistical methodology, experimental design. Ensures results can be trusted. |

The allocation reflects a deliberate choice: research drives everything. Engineering serves research. The question "what should I code?" should never come before "what do I want to understand?"

The integration rule: every build should exercise at least two pillars. This prevents the natural tendency to retreat into whichever pillar feels most comfortable.


5. Weekly Structure

| Slot | Duration | Purpose |
| --- | --- | --- |
| Morning daily | 1h | Algorithm practice. Maintains computational fluency. |
| Deep blocks (2×/week) | 4–5h each | Primary research work — reading, formulating, implementing, experimenting. Requires uninterrupted time. |
| Medium blocks (2×/week) | 3–4h each | Secondary pillar work — system design, DS methods, writing, tooling. |
| Long session (1×/week) | 5–6h | Weekly build sprint. Where things get finished. |
| Reflection (1×/week) | 2–3h | Write-up, logging, planning. Converts doing into understanding. |
| Total | ~28h/week | |

This is designed to be sustainable alongside a full-time job, Columbia coursework, and TA responsibilities. Sustainability matters because the value comes from compounding over 40 weeks, and compounding breaks if the system breaks.

Weekly minimums

These are the things that must happen every week regardless of circumstances:

  1. Read — 3 papers at Comprehend depth with structured notes.
  2. Think — At least 1 hour of undistracted thinking. No screen. Notebook and pen. Just the problem and your reasoning about it. This is where original ideas actually form, and it's the easiest thing to skip.
  3. Build — Meaningful progress on the current build.
  4. Write — At least one write-up, even if brief. Writing forces precision in a way that thinking alone does not.

6. Paper Reading

Purpose

Reading trains research taste: the ability to recognize which problems matter, which methods are sound, which results are meaningful, and where the genuine open questions lie. It is not about accumulating citations. It is about developing judgment.

Three Depths

Scan (15 min): Title, abstract, figures, conclusion. Decide whether it warrants deeper reading.

Comprehend (1–2h): Full read with structured notes:

```
Paper:       [title]
Problem:     [what are they solving and why it matters]
Method:      [the approach]
Key Insight: [the one idea that makes this work]
Math:        [core equations — can I re-derive them from memory?]
Limitations: [what's assumed, what breaks, what's left unsaid]
My Question: [what would I investigate next, starting from this paper?]
Connection:  [how does this relate to what I'm working on?]
```

The last two fields matter most. Without them, reading is passive consumption. With them, every paper becomes a prompt for original thinking.
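The Comprehend template can also be kept as structured data, which makes the "passive consumption" failure mode mechanically checkable. A minimal sketch — the `PaperNote` class and its field names are hypothetical conveniences mirroring the template, not an existing tool:

```python
from dataclasses import dataclass


@dataclass
class PaperNote:
    # Fields mirror the Comprehend template, in reading order.
    title: str
    problem: str
    method: str
    key_insight: str
    math: str
    limitations: str
    my_question: str   # what would I investigate next?
    connection: str    # how does this relate to what I'm working on?

    def is_active(self) -> bool:
        # A note counts as active reading only if the two
        # originality fields are actually filled in.
        return bool(self.my_question.strip()) and bool(self.connection.strip())


note = PaperNote(
    title="Attention Is All You Need",
    problem="Sequence transduction without recurrence",
    method="Self-attention encoder-decoder",
    key_insight="Attention alone suffices for sequence modeling",
    math="softmax(QK^T / sqrt(d_k)) V",
    limitations="O(n^2) cost in sequence length",
    my_question="",  # left blank -> flagged as passive reading
    connection="",
)
print(note.is_active())  # → False (both originality fields are blank)
```

The point of the check is exactly the rule above: a note with an empty "My Question" or "Connection" field is consumption, not thinking.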

Reproduce (days): Full implementation as a build. Reserved for papers where deep mechanical understanding is needed — roughly 1 per month.

Cadence

  • Scan: 5–10 per week (frontier awareness)
  • Comprehend: 3 per week (taste development)
  • Reproduce: ~1 per month (implementation depth)
  • Cumulative: 120+ at Comprehend depth by December

7. The Build Protocol

What Counts

A build is any shipped artifact that exercises the core loop. Not limited to paper reproductions:

  • Reproduction with ablation study
  • Novel algorithm or method with experimental evaluation
  • Mathematical derivation or theoretical analysis
  • Tool, library, or benchmark that enables research
  • Experiment suite that answers a specific question
  • Survey, position paper, or critical analysis of a research area
  • Extension or critique of existing work, supported by evidence

The requirement: it must be real (executed, not just planned), public (on researchengineer.ing), and documented (write-up covering what, why, how, and what was learned — including what went wrong).

Lifecycle

Pick (Day 0). Three filters: Does it serve a research question I care about? Is it slightly beyond my current ability? Does it touch 2+ pillars? Spend at most 30 minutes deciding.

Build (Days 1–7). Start with the question, not the code. Read what's needed. Attempt implementation — naive and ugly first. When stuck (this will happen), return to the theory. The moment of being stuck and then understanding why is where the real learning occurs. Validate against expected results. Add at least one original element — an ablation, an extension, a connection to another idea.

Ship (Days 7–10). A build is complete when it has: working code, concrete results, a write-up, a record of what failed, and at least one note about what to investigate next.

Cadence

  • 1 build per 10 days minimum (30 sub-milestones in 300 days)
  • 3+ builds per month target
  • 38+ total by December

8. Thinking Time

This is the most important protocol and the one most likely to be skipped in favor of something that feels more productive.

Research is thinking. Code is how you test the thinking. If all the hours go to implementation, the result is an implementer, not a researcher. Dedicated thinking time — unstructured, screen-free, with only a notebook — is where original ideas actually form.

Practice

At least 1 hour per week. Some prompts for when the blank page feels unproductive:

  • What's the most interesting open problem I encountered this week?
  • What assumption does everyone in my area seem to accept? What if it's wrong?
  • If I had to submit a paper in 2 weeks, what would it be about?
  • What would it take to connect two ideas that seem unrelated?
  • What did I learn this week that changed how I think about something?

Research Question Register

A running list of questions, maintained in a notebook or simple document. Every paper read should either address a question on the list or add a new one. The list grows faster than it shrinks — that's expected. Questions that keep resurfacing across multiple papers and builds are probably the ones worth pursuing seriously.
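The register needs nothing more than an append-and-tally mechanism; resurfacing frequency is the signal. A minimal sketch (the `touch` helper and the sample questions are illustrative, not prescribed tooling):

```python
from collections import Counter

# Minimal register: each time a paper or build touches a question,
# tally it; questions that keep resurfacing rise to the top.
register = Counter()


def touch(question: str) -> None:
    register[question.strip()] += 1


touch("When does method X fail off-distribution?")
touch("Can idea A be combined with idea B?")
touch("When does method X fail off-distribution?")

# The most-resurfaced questions are the candidates worth pursuing.
for question, count in register.most_common(2):
    print(count, question)
```

A notebook works just as well; the only requirement is that every touch gets recorded, so the resurfacing pattern is visible rather than remembered.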

The "So What?" Check

Before shipping a build, ask: what does this contribute beyond demonstrating that I can follow instructions? "I reproduced the numbers" is training. "I showed that the method fails under [condition X]" or "I found that combining [A] with [B] produces [unexpected result C]" is contribution. Over 10 months, the ratio should shift from mostly-training toward mostly-contribution.


9. Monthly Gate

Metrics

End of each month, assess:

| Metric | Minimum | Target |
| --- | --- | --- |
| Builds shipped | 3 | 4 |
| Papers read (Comprehend depth) | 10 | 12 |
| Original research questions generated | 3 | 5 |
| Algorithm problems solved | 40 | 60 |
| Write-ups published | 3 | 4 |
| Dedicated thinking hours | 4 | 6 |
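The gate reduces to a three-way comparison per metric. A minimal sketch — the metric keys and the `assess` function are hypothetical naming, but the minimum/target numbers mirror the table above:

```python
# Monthly gate: red below minimum, green at or above target, else yellow.
GATE = {
    "builds_shipped":       (3, 4),
    "papers_comprehended":  (10, 12),
    "questions_generated":  (3, 5),
    "algorithm_problems":   (40, 60),
    "writeups_published":   (3, 4),
    "thinking_hours":       (4, 6),
}


def assess(actuals: dict) -> dict:
    status = {}
    for metric, (minimum, target) in GATE.items():
        value = actuals.get(metric, 0)
        if value < minimum:
            status[metric] = "red"
        elif value >= target:
            status[metric] = "green"
        else:
            status[metric] = "yellow"
    return status


month = {"builds_shipped": 4, "papers_comprehended": 9,
         "questions_generated": 3, "algorithm_problems": 55,
         "writeups_published": 3, "thinking_hours": 6}
print(assess(month))  # per-metric status for an example month
```

Any red entry feeds directly into the "What to Do When Things Slip" responses below; two consecutive months with 3+ reds triggers the recovery protocol in §11.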

Assessment Questions

  1. Volume — Did I hit the minimums?
  2. Quality — Are the builds getting more ambitious? More rigorous? More original?
  3. Originality — Am I asking better questions than last month? Am I starting to answer any of them?
  4. Coherence — Are the builds converging toward a recognizable research direction, or still scattered?
  5. Sustainability — Am I still genuinely engaged, or going through the motions?

What to Do When Things Slip

Low volume: The builds are probably scoped too large. Reduce scope per build, not frequency. A smaller build that ships is worth more than an ambitious build that stalls indefinitely.

Flat quality: Comfort zone. Choose something harder — a paper you're not sure you can reproduce, a question you're not sure you can answer.

Low originality: Passive reading. Reactivate the "My Question" and "Connection" fields in paper notes. Force yourself to generate at least one question per paper, even if it feels forced at first. The quality of the questions improves with practice.

Scattered direction: Too many threads open. Choose 1 primary research question for the next month. Every build connects to it.

Low energy: See §11.


10. Phases

Phase 1: Foundation & Velocity (Months 1–3)

Establish the system. Build execution speed. Ship the first builds. Learn the rhythm of the core loop.

The work is mostly reproduction and absorption. That's appropriate — you're calibrating the machine and developing the basic fluency that later phases depend on. The measure of readiness for Phase 2: you can reproduce a paper end-to-end without panic, the weekly rhythm is automatic, and you have a clear primary research question.

Phase 2: Depth & Integration (Months 4–7)

Go deep on what matters. Start generating original ideas. Submit to conferences. The ratio of reproduction to extension shifts.

The measure of readiness for Phase 3: you've submitted at least one paper, you have strong defensible opinions about your research area, and your builds consistently include original elements.

Phase 3: Mastery & Demonstration (Months 8–10)

Demonstrate what you've built. Every artifact is portfolio-grade. The research identity is legible to an outsider looking at your work.

The graduation standard: you can encounter any frontier AI/ML paper and know what to do with it — reproduce it, critique it, extend it, or set it aside — and you have the body of work that makes this credible.


11. Recovery

The program is 300 days. That's long enough that recovery isn't optional — it's a structural requirement. A system that breaks in Month 5 produces less total output than a system that runs at 80% for all 10 months.

Responses

Missed a day: Not meaningful. Continue the next day.

Missed a week: Something needs attention. Identify the cause — overload, burnout, external disruption, loss of direction. Address it directly. Reduce scope for the following week, but maintain the rhythm.

Red on 3+ gate metrics for 2 consecutive sub-milestones: Take 3 days fully off. No research, no code, no papers. Rest. Then return with reduced scope for 2 weeks before restoring full intensity.

Going through the motions without engagement: Usually means the work has become too routine. Reconnect with a question you actually care about, or pick a build that feels genuinely uncertain. Engagement follows challenge more reliably than it follows comfort.

Constraints

  • Sleep ≥ 7 hours per night, averaged weekly
  • Physical activity ≥ 3 times per week
  • 1 full day off per month

These are not aspirational. They are load-bearing requirements that keep the system operational for 40 weeks.


12. Compounding

The reason this works over 10 months is that the same effort produces increasingly valuable output as the underlying skills develop.

In Month 1, a paper takes hours to read and days to implement. By Month 5, reading is pattern recognition and implementation is informed by dozens of prior builds. By Month 10, encountering a new paper means immediately seeing its relationship to 120 others, its implementation path, its limitations, and its extension possibilities.

The mechanism:

```
Reading × Implementing      → Understanding
Understanding × Questioning → Research Taste
Research Taste × Building   → Original Contribution
```

Each layer depends on the layers below it. Reading without implementing produces shallow familiarity. Implementing without questioning produces mechanical skill. Questioning without building produces untested intuition. The compound effect requires all three to be active simultaneously, week after week.

This is also why consistency matters more than intensity. The compounding operates across sleep cycles, across weeks, across phases. It cannot be compressed into sprints. It requires showing up, doing the work, resting, and returning — repeatedly, over a long enough period for the layers to develop.


February 26, 2026. Day 1.