Interpretability & Safety

Mechanistic analysis of reasoning in GPT-2. Attribution graphs, faithfulness detection, and targeted interventions. Are CoT traces real reasoning or post-hoc rationalization?

Reverse-engineering the circuit responsible for numerical comparisons in GPT-2 Small using activation patching and mechanistic interpretability techniques.
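The core move in activation patching is swapping a cached activation from a clean run into a corrupted run and measuring how much of the clean behavior is recovered. A minimal numpy sketch on a toy two-layer model (all weights and sizes are illustrative, not GPT-2's):

```python
import numpy as np

# Toy "model": two layers. We patch the layer-1 activation from a clean
# run into a corrupted run to measure that layer's causal contribution.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 4))
W2 = rng.normal(size=(4, 1))

def forward(x, patch_h1=None):
    """Run the toy model; optionally overwrite the hidden activation."""
    h1 = np.tanh(x @ W1)
    if patch_h1 is not None:
        h1 = patch_h1                 # activation patching: swap in cached value
    return (h1 @ W2).item()

x_clean = np.ones(4)
x_corrupt = -np.ones(4)

h1_clean = np.tanh(x_clean @ W1)          # cache the clean activation
baseline = forward(x_corrupt)             # corrupted baseline
patched = forward(x_corrupt, h1_clean)    # corrupted run + clean activation

# How far `patched` moves from `baseline` toward the clean output
# indicates how much of the behavior routes through this activation.
effect = patched - baseline
```

In a real experiment the same swap is done per-layer and per-position on GPT-2's residual stream, producing a map of where the numerical-comparison computation lives.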

Exploring SAEs

Sparse Autoencoder implementation for neural network interpretability. Features interactive visualization dashboard and W&B integration.
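The SAE idea in one forward pass: project activations into an overcomplete feature basis, force sparsity with ReLU plus an L1 penalty, and reconstruct. A numpy sketch with illustrative sizes (the real implementation trains these weights; names here are placeholders):

```python
import numpy as np

# Minimal sparse autoencoder forward pass (sizes illustrative).
rng = np.random.default_rng(0)
d_model, d_sae = 8, 32                    # overcomplete dictionary: d_sae > d_model
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x, l1_coeff=1e-3):
    f = np.maximum(x @ W_enc + b_enc, 0.0)    # ReLU keeps features sparse
    x_hat = f @ W_dec + b_dec                 # reconstruct the activation
    recon = np.mean((x - x_hat) ** 2)         # reconstruction loss
    sparsity = l1_coeff * np.abs(f).sum()     # L1 sparsity penalty
    return x_hat, f, recon + sparsity

x = rng.normal(size=d_model)
x_hat, feats, loss = sae_forward(x)
```

Training minimizes the combined loss so each feature ideally fires for one interpretable direction in activation space.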

Framework for evaluating when LLM "hallucination" is harmful vs. beneficial speculation. Novel metrics for the truthfulness-utility trade-off.

Applications & Tools

FinGPT Trader

Algorithmic trading system using confidence-weighted sentiment analysis. Fine-tuned Falcon-7B for African markets with lightweight technical analysis.
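Confidence-weighted sizing can be sketched as: scale exposure by the model's confidence in its sentiment call, and abstain entirely below a threshold. A hypothetical helper (function name, threshold, and scaling are illustrative, not the system's actual logic):

```python
def position_size(sentiment: float, confidence: float,
                  max_exposure: float = 1.0,
                  threshold: float = 0.5) -> float:
    """Size a position from a sentiment score in [-1, 1] and a model
    confidence in [0, 1]; abstain when confidence is below threshold."""
    if confidence < threshold:
        return 0.0                      # no trade on low-confidence signals
    return max_exposure * sentiment * confidence

long_size = position_size(0.8, 0.9)     # confident bullish call, scaled long
no_trade = position_size(-0.6, 0.3)     # low confidence -> abstain
```

The abstention rule means weak or ambiguous sentiment never moves capital, which matters in thinly traded markets.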

Testing "LoRA Without Regret" methodology. Transforms numerical weather data → natural language forecasts via parameter-efficient fine-tuning with RLHF.
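The parameter-efficient piece works by freezing the pretrained weight W and learning only a low-rank update B·A. A numpy sketch of one LoRA-adapted layer (dimensions, rank, and alpha are illustrative):

```python
import numpy as np

# LoRA sketch: frozen weight W plus a trainable low-rank update B @ A.
rng = np.random.default_rng(0)
d, r = 16, 4                              # hidden size, adapter rank
W = rng.normal(size=(d, d))               # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d))   # trainable down-projection
B = np.zeros((d, r))                      # zero-init: training starts at W
alpha = 8.0                               # LoRA scaling hyperparameter

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B train.
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.normal(size=d)
y = lora_forward(x)
```

Because B starts at zero, the adapted model is exactly the base model at initialization, and only 2·d·r parameters per layer are updated during fine-tuning instead of d².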

Interactive visualization of exponential AI infrastructure growth. Compute capacity, investment costs, power requirements. Based on Epoch AI data.