Research Portfolio
Machine Learning Researcher
Research Interests
I am driven by a fascination with aligning AI systems with human values. My background in data analysis and modeling provides a strong foundation for exploring the frontiers of safe and reliable AI.
AI Alignment
Techniques for aligning AI systems with human values and ensuring beneficial outcomes.
Reasoning
Study of reasoning capabilities in language models and methods for enhancement.
Red-teaming
Adversarial testing of AI systems to identify and mitigate vulnerabilities.
Mechanistic Interpretability
Opening the black box of neural networks to understand internal mechanisms.
Hallucination Control
Methods for reducing false or unsupported outputs from language models.
Publications
Enhancing HIV Testing Indicator Reporting
This paper presents novel approaches to improving the accuracy and efficiency of HIV testing indicator reporting through data science techniques.
Victor, A.J., et al. (2024). Enhancing HIV Testing Indicator Reporting. DSAI Journal.
The Future Remains Unsupervised
An exploration of the untapped potential of unsupervised learning in the era of large language models and foundation models.
Victor, A.J. (2023). The Future Remains Unsupervised. Deep Learning Indaba.
Effective Web Scraping for Data Scientists
A comprehensive guide to ethical and efficient web scraping methods tailored to data science applications.
Victor, A.J. (2023). Effective Web Scraping for Data Scientists. DSAI Journal.
Current Projects
Sparse Autoencoder Interpretability
Using sparse autoencoders to improve interpretability of neural networks, with a focus on understanding internal representations of language models.
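As a rough illustration of the idea behind this project (a minimal NumPy sketch under assumed settings, not the project's actual code): a sparse autoencoder learns an overcomplete set of features from activation vectors, with an L1 penalty encouraging each input to activate only a few features, which makes the learned features easier to interpret.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, n = 8, 32, 512           # input dim, overcomplete feature dim, samples
X = rng.normal(size=(n, d_in))        # stand-in for model activations

W_enc = rng.normal(scale=0.1, size=(d_in, d_hid))
b_enc = np.zeros(d_hid)
W_dec = rng.normal(scale=0.1, size=(d_hid, d_in))

def forward(X):
    f = np.maximum(X @ W_enc + b_enc, 0.0)   # ReLU feature activations
    X_hat = f @ W_dec                        # linear reconstruction
    return f, X_hat

lr, l1 = 0.1, 1e-3
losses = []
for step in range(200):
    f, X_hat = forward(X)
    err = X_hat - X
    # Loss = mean squared reconstruction error + L1 sparsity on features.
    losses.append(np.mean(err ** 2) + l1 * np.mean(np.abs(f)))
    # Manual full-batch gradients for both loss terms.
    g_Xhat = 2 * err / X.size
    g_Wdec = f.T @ g_Xhat
    g_f = g_Xhat @ W_dec.T + l1 * np.sign(f) / f.size
    g_pre = g_f * (f > 0)                    # ReLU gradient
    W_dec -= lr * g_Wdec
    W_enc -= lr * (X.T @ g_pre)
    b_enc -= lr * g_pre.sum(axis=0)
```

After training, each hidden unit's decoder row can be read as a candidate feature direction in activation space; the choices of dimensions, learning rate, and penalty weight here are illustrative only.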
Hallucination Metrics for LLMs
Developing robust evaluation metrics for detecting and quantifying hallucinations in large language models.
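To give a flavor of what such a metric can look like, here is a deliberately simple baseline (an assumed illustration, not the project's metric): the fraction of content words in a model answer that never appear in the source document, where a higher rate suggests unsupported content.

```python
# Toy "unsupported content" baseline: share of non-stopword tokens in the
# answer that are absent from the source text. Real hallucination metrics
# use far stronger signals (entailment, claim extraction); this is a sketch.

STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "of", "in", "on", "to", "and"}

def unsupported_rate(answer: str, source: str) -> float:
    src = set(source.lower().split())
    words = [w for w in answer.lower().split() if w not in STOPWORDS]
    if not words:
        return 0.0
    unsupported = [w for w in words if w not in src]
    return len(unsupported) / len(words)
```

For example, an answer fully grounded in the source scores 0.0, while an answer introducing facts absent from the source scores closer to 1.0.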
Research Blog
Check out my research blog for detailed articles, analyses, and tutorials on AI safety, alignment, and more.
View Blog