Research Interests
My research sits at the intersection of AI safety, mechanistic interpretability, and understanding the failure modes of large language models. I focus on confabulations ("hallucinations")—when models produce confident-but-wrong outputs—and methods to anchor outputs to verifiable context. I'm particularly drawn to mechanistic interpretability: tracing how specific neurons, attention heads, and pathways give rise to model capabilities (and failures).
I see language models as more than statistical compression engines—but not as reasoning systems. They exhibit what Ethan Mollick calls "jagged intelligence": superhuman at some tasks, brittle at others that humans find trivial. Karpathy and Demis Hassabis have observed the same pattern—capabilities cluster unpredictably, and fluent outputs mask underlying brittleness. This jagged profile is why interpretability matters: we need to understand where and why systems break.
I'm skeptical that scaling language modeling alone leads to robust reasoning. As Yann LeCun argues, the pretraining paradigm is a dead end for true understanding—these systems master "System 1" thinking but lack the deliberative, causal reasoning of "System 2." This is why I track world models: latent simulators that support planning, counterfactual reasoning, and transfer across domains. Projects like Meta's V-JEPA point toward AI that simulates rather than predicts. I've written more on this in my State of AI 2025 essay—I believe world models are the more interesting frontier.
Research Ethos
Builder, anti-doomer, anti-gnostic
First Principles
I reject total explanations of reality—catastrophe may be possible, but inevitability must be proven, not assumed. Knowledge increases obligation: to see a failure mode is to inherit responsibility for it. I act under uncertainty because waiting for assurance is a form of paralysis.
Against False Gnosis
I explicitly oppose the posture that equates pessimism with intelligence, treats builders as naïve, and frames resignation as moral clarity. True understanding is slow, partial, embodied, and costly. Any framework that claims total knowledge of catastrophic outcomes should bear an extraordinary burden of proof.
Stewardship of AI
AI systems are artifacts shaped by human choices, not destinies. Alignment is stewardship under uncertainty—clarifying internal representations, constraining harmful dynamics, and designing feedback loops that favour corrigibility. Progress is allowed to be partial. I optimize to leave the field marginally safer than I found it.
Research Diary
I maintain a personal research diary documenting my journey through AI research—technical history from n-grams to Transformers, curated paper notes, and critical perspectives on the AI landscape.
Explore Research DiaryPublications
Democratizing Quantitative Trading in African Markets: A FinGPT-Based Approach
We present FinGPT Trader, a novel confidence-weighted sentiment analysis system designed to democratize quantitative trading in African markets. By leveraging a fine-tuned Falcon-7B Large Language Model integrated with lightweight technical analysis, our approach offers a resource-efficient solution tailored for environments with constrained resources.
Victor Jotham Ashioya (2025). Democratizing Quantitative Trading in African Markets: A FinGPT-Based Approach. AI in Business and Finance (AIBF) 2025. OpenReview.
Enhancing HIV Testing Indicator Reporting
This paper presents novel approaches to improving the accuracy and efficiency of HIV testing indicator reporting through data science techniques.
Victor, A.J., et al. (2024). Enhancing HIV Testing Indicator Reporting. DSAI Journal.
What is few shot learning?
A comprehensive exploration of few-shot learning techniques and their application in In-Context Learning scenarios.
The Future Remains Unsupervised
An exploration of the untapped potential of unsupervised learning in the era of large language models and foundation models.
Victor, A.J. (2023). The Future Remains Unsupervised. Deep Learning Indaba.
Effective Web Scraping for Data Scientists
A comprehensive guide to ethical and efficient web scraping methods tailored specifically for data science applications.
Victor, A.J. (2023). Effective Web Scraping for Data Scientists. DSAI Journal.
Research Projects
Evaluation Awareness Detection in LLMs
Mechanistic interpretability experiments detecting "Evaluation Awareness" in reasoning models—identifying if LLMs internally represent being monitored. Found 92.3% probe accuracy at layer 16 with 70.3% transfer to subtle cues. Explores whether awareness directions affect policy decisions.
Weather Forecasting with LoRA Fine-tuning
A comprehensive research implementation of weather forecasting using LoRA (Low-Rank Adaptation) fine-tuning on Large Language Models, following Schulman et al. (2025) "LoRA Without Regret" methodology. Transforms numerical weather data into natural language forecasts through parameter-efficient fine-tuning with RLHF optimization for accuracy and style consistency.
Chain-of-Thought Faithfulness Analysis
Comprehensive mechanistic analysis of chain-of-thought faithfulness in GPT-2. Implements attribution graphs, faithfulness detection, and targeted interventions for understanding reasoning circuits in language models.
Hallucination Metrics for LLMs
Developing robust evaluation metrics for measuring and quantifying hallucinations in large language models through Value-Aligned Confabulation (VAC) research.
Research Blog
Check out my research blog for detailed articles, analyses, and tutorials on AI safety, alignment, and more.
View Blog