Interpretability & Safety
Mechanistic analysis of reasoning in GPT-2. Attribution graphs, faithfulness detection, and targeted interventions. Are chain-of-thought (CoT) traces real reasoning or post-hoc rationalization?
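A hedged sketch of one such faithfulness probe: perturb the stated reasoning and check whether the final answer moves. The model, prompts, and decoding settings below are illustrative assumptions, not the project's pipeline.

```python
# Faithfulness probe sketch: if corrupting the CoT does not change the final
# answer, the trace may be post-hoc rationalization rather than real reasoning.
# Model and prompts are illustrative assumptions, not the project's setup.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def answer_after(prefix: str) -> str:
    """Greedy-decode a short continuation and treat it as the 'answer'."""
    ids = tok(prefix, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=5, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    return tok.decode(out[0, ids.shape[1]:])

clean     = "Q: 17 + 25 = ? Reasoning: 17 + 25 is 42. Answer:"
corrupted = "Q: 17 + 25 = ? Reasoning: 17 + 25 is 99. Answer:"
print(answer_after(clean), "|", answer_after(corrupted))
# Matching answers suggest the stated reasoning did not causally drive the output.
```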
Reverse-engineering the circuit responsible for numerical comparisons in GPT-2 Small using activation patching and other mechanistic interpretability techniques.
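A minimal activation-patching sketch, assuming TransformerLens as the tooling; the prompts and the patched layer are illustrative, not the circuit the project recovered.

```python
# Activation patching sketch: run a clean and a corrupted prompt, then splice
# the clean residual stream into the corrupted run at one layer and measure
# how much correct behavior is restored. Layer choice is an assumption.
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small

clean   = "Is 9 greater than 4? Answer:"
corrupt = "Is 4 greater than 9? Answer:"  # same token length, numbers swapped

_, clean_cache = model.run_with_cache(clean)

def patch_resid(resid, hook):
    # Overwrite the corrupted run's residual stream with the clean run's.
    return clean_cache[hook.name]

layer = 6  # assumed layer of interest
patched_logits = model.run_with_hooks(
    corrupt,
    fwd_hooks=[(utils.get_act_name("resid_pre", layer), patch_resid)],
)
# Sweeping this over layers and positions localizes the comparison circuit.
```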
Sparse Autoencoder implementation for neural network interpretability. Features an interactive visualization dashboard and W&B integration.
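The core of an SAE is small; a minimal sketch under the usual conventions (ReLU encoder, L1 sparsity penalty), with sizes and coefficients as assumptions rather than this implementation's settings.

```python
# Minimal sparse autoencoder: encode activations into an overcomplete ReLU
# feature basis, reconstruct, and penalize feature magnitude for sparsity.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))  # sparse feature activations
        return self.dec(feats), feats

sae = SparseAutoencoder(d_model=768, d_hidden=768 * 8)   # 8x expansion, assumed
x = torch.randn(32, 768)                                 # stand-in activations
recon, feats = sae(x)
loss = ((recon - x) ** 2).mean() + 1e-3 * feats.abs().mean()  # MSE + L1
```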
Framework for distinguishing harmful LLM "hallucination" from beneficial speculation. Novel metrics for the truthfulness-utility trade-off.
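As a hedged illustration of what such a metric could look like; the weighting scheme and the 0-1 score ranges are assumptions, not the framework's definitions.

```python
# Toy truthfulness-utility score: reward useful speculation, penalize harmful
# falsehood. Inputs are assumed judge-assigned scores in [0, 1].
def speculation_value(truthfulness: float, utility: float, harm: float,
                      harm_weight: float = 2.0) -> float:
    return truthfulness * utility - harm_weight * (1 - truthfulness) * harm

# A confident falsehood in a high-stakes context scores low; a clearly
# flagged guess in a brainstorming context scores high.
print(speculation_value(truthfulness=0.2, utility=0.6, harm=0.9))  # ~ -1.32
print(speculation_value(truthfulness=0.7, utility=0.9, harm=0.1))  # ~  0.57
```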
Applications & Tools
Algorithmic trading system using confidence-weighted sentiment analysis. Combines a Falcon-7B model fine-tuned for African markets with lightweight technical analysis.
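A hedged sketch of the confidence-weighting idea; the thresholds and the (sentiment, confidence) pairs are illustrative stand-ins for the fine-tuned model's outputs.

```python
# Confidence-weighted sentiment aggregation: headlines the model is unsure
# about contribute less to the trade signal. Thresholds are assumptions.
def trade_signal(items: list[tuple[float, float]]) -> str:
    """items: (sentiment in [-1, 1], confidence in [0, 1]) per headline."""
    total_conf = sum(conf for _, conf in items)
    if total_conf == 0:
        return "hold"
    score = sum(sent * conf for sent, conf in items) / total_conf
    if score > 0.3:
        return "buy"
    if score < -0.3:
        return "sell"
    return "hold"

headlines = [(0.8, 0.9), (-0.2, 0.4), (0.5, 0.7)]  # stand-in model outputs
print(trade_signal(headlines))  # -> "buy"
```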
Testing "LoRA Without Regret" methodology. Transforms numerical weather data → natural language forecasts via parameter-efficient fine-tuning with RLHF.
Interactive visualization of exponential AI infrastructure growth. Compute capacity, investment costs, and power requirements. Based on Epoch AI data.
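A minimal sketch of the kind of plot underlying the visualization, with a placeholder doubling time rather than the Epoch AI figures.

```python
# Log-scale growth plot: exponential trends read as straight lines. The
# doubling time and data points are placeholders, not Epoch AI data.
import matplotlib.pyplot as plt

years = list(range(2012, 2025))
doubling_months = 6  # assumed doubling time, for illustration only
compute = [2 ** ((year - 2012) * 12 / doubling_months) for year in years]

plt.semilogy(years, compute, marker="o")
plt.xlabel("Year")
plt.ylabel("Training compute (relative, log scale)")
plt.title("Exponential growth in AI training compute")
plt.show()
```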