Research Interests

My research sits at the intersection of AI safety, mechanistic interpretability, and understanding the failure modes of large language models. I focus on confabulations ("hallucinations")—when models produce confident-but-wrong outputs—and methods to anchor outputs to verifiable context. I'm particularly drawn to mechanistic interpretability: tracing how specific neurons, attention heads, and pathways give rise to model capabilities (and failures).

I see language models as more than statistical compression engines—but not as reasoning systems. They exhibit what Ethan Mollick calls "jagged intelligence": superhuman at some tasks, brittle at others that humans find trivial. Karpathy and Demis Hassabis have observed the same pattern—capabilities cluster unpredictably, and fluent outputs mask underlying brittleness. This jagged profile is why interpretability matters: we need to understand where and why systems break.

I'm skeptical that scaling language modeling alone leads to robust reasoning. As Yann LeCun argues, the pretraining paradigm is a dead end for true understanding—these systems master "System 1" thinking but lack the deliberative, causal reasoning of "System 2." This is why I track world models: latent simulators that support planning, counterfactual reasoning, and transfer across domains. Projects like Meta's V-JEPA point toward AI that simulates rather than predicts. I've written more on this in my State of AI 2025 essay—I believe world models are the more interesting frontier.

Research Ethos

Builder, anti-doomer, anti-gnostic

First Principles

I reject total explanations of reality—catastrophe may be possible, but inevitability must be proven, not assumed. Knowledge increases obligation: to see a failure mode is to inherit responsibility for it. I act under uncertainty because waiting for assurance is a form of paralysis.

Against False Gnosis

I explicitly oppose the posture that equates pessimism with intelligence, treats builders as naïve, and frames resignation as moral clarity. True understanding is slow, partial, embodied, and costly. Any framework that claims total knowledge of catastrophic outcomes should bear an extraordinary burden of proof.

Stewardship of AI

AI systems are artifacts shaped by human choices, not destinies. Alignment is stewardship under uncertainty—clarifying internal representations, constraining harmful dynamics, and designing feedback loops that favour corrigibility. Progress is allowed to be partial. I optimize to leave the field marginally safer than I found it.

Research Diary

I maintain a personal research diary documenting my journey through AI research—technical history from n-grams to Transformers, curated paper notes, and critical perspectives on the AI landscape.

Explore Research Diary

Publications

2025 Conference Paper

Democratizing Quantitative Trading in African Markets: A FinGPT-Based Approach

Victor Jotham Ashioya

We present FinGPT Trader, a novel confidence-weighted sentiment analysis system designed to democratize quantitative trading in African markets. By leveraging a fine-tuned Falcon-7B Large Language Model integrated with lightweight technical analysis, our approach offers a resource-efficient solution tailored for environments with constrained resources.

2024 Journal Article

Enhancing HIV Testing Indicator Reporting

Victor, A.J., et al.

This paper presents novel approaches to improving the accuracy and efficiency of HIV testing indicator reporting through data science techniques.

2024 Technical Article

What is few shot learning?

Victor, A.J.

A comprehensive exploration of few-shot learning techniques and their application in In-Context Learning scenarios.

2023 Conference Paper

The Future Remains Unsupervised

Victor, A.J.

An exploration of the untapped potential of unsupervised learning in the era of large language models and foundation models.

2023 Journal Article

Effective Web Scraping for Data Scientists

Victor, A.J.

A comprehensive guide to ethical and efficient web scraping methods tailored specifically for data science applications.

Research Projects

Evaluation Awareness Detection in LLMs

Ongoing 2025-Present

Mechanistic interpretability experiments detecting "Evaluation Awareness" in reasoning models—identifying if LLMs internally represent being monitored. Found 92.3% probe accuracy at layer 16 with 70.3% transfer to subtle cues. Explores whether awareness directions affect policy decisions.

Weather Forecasting with LoRA Fine-tuning

Ongoing 2025-Present

A comprehensive research implementation of weather forecasting using LoRA (Low-Rank Adaptation) fine-tuning on Large Language Models, following Schulman et al. (2025) "LoRA Without Regret" methodology. Transforms numerical weather data into natural language forecasts through parameter-efficient fine-tuning with RLHF optimization for accuracy and style consistency.

Chain-of-Thought Faithfulness Analysis

Ongoing 2025-Present

Comprehensive mechanistic analysis of chain-of-thought faithfulness in GPT-2. Implements attribution graphs, faithfulness detection, and targeted interventions for understanding reasoning circuits in language models.

Hallucination Metrics for LLMs

Ongoing 2024-Present

Developing robust evaluation metrics for measuring and quantifying hallucinations in large language models through Value-Aligned Confabulation (VAC) research.

Research Blog

Check out my research blog for detailed articles, analyses, and tutorials on AI safety, alignment, and more.

View Blog