Exploring SAEs
Implementation and analysis of sparse autoencoders (SAEs) for neural network interpretability research. Features an interactive visualization dashboard and Weights & Biases (W&B) integration.

Value-Aligned Confabulation (VAC)
Research framework for evaluating value-aligned confabulation in LLMs: distinguishing beneficial speculation from harmful hallucination. Implements novel metrics for the truthfulness-utility trade-off.