Research Portfolio

Machine Learning Researcher

Research Interests

I am driven by the challenge of aligning AI systems with human values. My background in data analysis and statistical modeling provides a strong foundation for work on safe and reliable AI.

AI Alignment

Techniques for aligning AI systems with human values and ensuring beneficial outcomes.

Reasoning

Study of reasoning capabilities in language models and methods for enhancement.

Red-teaming

Adversarial testing of AI systems to identify and mitigate vulnerabilities.

Mechanistic Interpretability

Opening the black box of neural networks to understand internal mechanisms.

"Hallucination" Control

Methods for reducing false or unsupported outputs from language models.

Publications

2024 Journal Article

Enhancing HIV Testing Indicator Reporting

Victor, A.J., et al.

This paper presents novel approaches to improving the accuracy and efficiency of HIV testing indicator reporting through data science techniques.

2023 Conference Paper

The Future Remains Unsupervised

Victor, A.J.

An exploration of the untapped potential of unsupervised learning in the era of large language models and foundation models.

2023 Journal Article

Effective Web Scraping for Data Scientists

Victor, A.J.

A comprehensive guide to ethical and efficient web scraping methods tailored specifically for data science applications.

Current Projects

Sparse Autoencoder Interpretability

2023-Present (ongoing)

Using sparse autoencoders to improve interpretability of neural networks, with a focus on understanding internal representations of language models.
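The core idea can be illustrated with a minimal sketch: train an overcomplete autoencoder on activation vectors with an L1 penalty on the hidden features, so each activation is reconstructed from a small number of active features. This is a toy NumPy version with synthetic data and hand-written gradients; the project itself would operate on real language-model activations with a proper optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for model activations: 256 samples, 64 dimensions.
X = rng.normal(size=(256, 64)).astype(np.float32)

d_in, d_hidden = 64, 256          # overcomplete feature dictionary
W_enc = rng.normal(scale=0.1, size=(d_in, d_hidden)).astype(np.float32)
b_enc = np.zeros(d_hidden, dtype=np.float32)
W_dec = W_enc.T.copy()
b_dec = np.zeros(d_in, dtype=np.float32)

l1_coeff, lr = 1e-3, 1e-2
losses = []

for step in range(200):
    # Encode with ReLU to get nonnegative, sparsity-friendly features.
    pre = X @ W_enc + b_enc
    f = np.maximum(pre, 0.0)
    # Decode back into activation space.
    X_hat = f @ W_dec + b_dec
    err = X_hat - X
    # Loss = reconstruction MSE + L1 sparsity penalty on the features.
    losses.append(float((err ** 2).mean() + l1_coeff * np.abs(f).mean()))
    # Manual gradients of the loss above.
    n = err.size
    dX_hat = 2.0 * err / n
    dW_dec = f.T @ dX_hat
    db_dec = dX_hat.sum(axis=0)
    df = dX_hat @ W_dec.T + l1_coeff * np.sign(f) / f.size
    dpre = df * (pre > 0)
    dW_enc = X.T @ dpre
    db_enc = dpre.sum(axis=0)
    for p, g in ((W_enc, dW_enc), (b_enc, db_enc), (W_dec, dW_dec), (b_dec, db_dec)):
        p -= lr * g

sparsity = (f > 0).mean()  # fraction of features active after training
```

After training, individual hidden features can be inspected by looking at which inputs most strongly activate them; the L1 coefficient trades reconstruction fidelity against sparsity.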

Hallucination Metrics for LLMs

2024-Present (ongoing)

Developing robust evaluation metrics for measuring and quantifying hallucinations in large language models.
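As a simple illustration of the kind of metric involved, one crude baseline is token-level support: the fraction of content tokens in a generation that also appear in the source text it is supposed to be grounded in. This sketch is purely illustrative (the function and its tokenization are my own assumptions, not the project's actual metric); serious evaluation would work at the claim level, e.g. with entailment models.

```python
import re

def token_support(source: str, generation: str) -> float:
    """Fraction of the generation's unique content tokens that occur in the source.

    Low support suggests the generation introduces material absent from the
    reference text -- a rough proxy for hallucination, not a real detector.
    """
    def tokenize(s: str) -> set:
        return set(re.findall(r"[a-z0-9]+", s.lower()))

    src, gen = tokenize(source), tokenize(generation)
    if not gen:
        return 1.0  # an empty generation asserts nothing unsupported
    return len(gen & src) / len(gen)
```

For example, scoring "The tower is in Paris." against the source "The Eiffel Tower is in Paris." gives 1.0, while "The tower is in London." gives 0.8, since "london" has no support in the source.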

Research Blog

Check out my research blog for detailed articles, analyses, and tutorials on AI safety, alignment, and more.
