cat ./learning

Learning.

What I'm currently studying. Some of this becomes writing; most of it stays in the notebook. Updated roughly quarterly.

> Last updated May 18, 2026

Paper

Constitutional AI: Harmlessness from AI Feedback

Anthropic's foundational paper on RLAIF. Re-reading with fresh eyes for the agentic-AI threat-modelling work.

Paper

Sleeper Agents: Training Deceptive LLMs

Backdoors that survive safety training. Implications for ML supply chain security and MAESTRO's training-time risk layer.

Framework

MAESTRO (CSA)

Reading deeply to extract the practitioner subset that maps onto real engagements, separating framework from cargo cult.

Book

Threat Modeling, by Adam Shostack

Re-reading the chapters on system-level reasoning, applying the lens to agentic AI deployments.

Tool

PyRIT internals

Going deeper than user-mode: reading the harness internals to extend it for domain-specific test corpora (healthcare, finance).

Course

Probabilistic Machine Learning, by Kevin Murphy (Vol. 2)

Refreshing the deep-learning math that credible ML security review depends on.

Side project

Five-Zone worksheet

Translating the method from the STRIDE-breaks post into something engineering teams can run without me. Aim: publish it as open source by Q3.
