You've probably heard that artificial intelligence is transforming our world, from the software behind digital assistants like Alexa and Siri to facial recognition and self-driving cars. Today, with Generative AI (GenAI), it seems to be everywhere. At the heart of these cutting-edge AI systems are machine learning algorithms. But what exactly is Machine Learning (ML)? How does it work its magic? And can we trust these algorithms if even the experts can't fully explain them?
“Machine learning is like a black box. What happens inside is often a mystery.”
In my book, written back in 2020, I explored the limitations of AI as it is widely developed today using machine learning. I drew on examples of chatbots and AI systems that work with language (natural language processing, or NLP): entity recognition in a sentence, machine translation, and more. One limitation is data, a point I covered in another article. The second is explainability. And what I found is that "black box" is an apt metaphor: we can see the data that goes in and the results that come out, but the calculations in between are often a mystery.
LLMs: an indecipherable black box?
Consider the latest release by OpenAI, GPT-4o, a Large Language Model (LLM) that integrates text, voice, and vision into a single model, allowing it to process and respond to a combination of input types. Under the hood, GPT-4o is a massive neural network with hundreds of billions of parameters, or maybe a trillion; we don't know. During training, it ingests hundreds of terabytes of online text, images, and audio files. It learns patterns that allow it to predict the most likely next word, pixel, or snippet of audio in a sequence. Feed it a prompt, and it can riff on that input to generate essays, stories, poems, computer code, and even images, or answer you by voice.
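GPT-4o itself is closed, but the core mechanism, predicting the next token given everything so far, can be illustrated with a small open model. Here is a minimal sketch using the open GPT-2 model via the Hugging Face transformers library; the prompt and the choice of model are just stand-ins for illustration:

```python
# Minimal sketch of next-token prediction with a small open model (GPT-2).
# GPT-4o is closed, so this only illustrates the general mechanism, not its scale.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Machine learning is like a black box because"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, sequence_length, vocab_size)

# The model assigns a probability to every token in its vocabulary;
# here we print the five it considers most likely to come next.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)
for p, i in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(i.item())!r}: {p.item():.3f}")
```

Sampling from that distribution over and over, one token at a time, is all it takes to produce the essays, code, and conversations mentioned above.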
But despite GPT-4o's impressive outputs, its inner logic remains largely opaque. We know it works by identifying statistical patterns in language. But why does it associate certain words with each other? How does it “decide” what to write? What representation of language is it building internally? No one fully understands; I'm pretty sure not even the researchers at OpenAI who created it do!
Sudden decisions
This interpretability challenge isn't unique to language models. Many computer vision algorithms, for example, can accurately detect objects, but their reasoning is similarly inscrutable. An image classifier might correctly distinguish birds from planes, but did it arrive at that judgment by looking at wings, beaks, or the clouds in the background? It's often impossible to audit an algorithm's decision-making process. And there's more: the decision can be dramatically altered by only a small perturbation of the pixels, as demonstrated by Battista Biggio et al.'s groundbreaking work in adversarial machine learning. This gives rise to serious security issues: malicious actors could potentially fool vision systems by subtly manipulating images in ways imperceptible to humans.
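To make the pixel-perturbation point concrete, here is a minimal sketch in the style of the Fast Gradient Sign Method (FGSM), a simple later attack in the adversarial-examples line of work, not Biggio et al.'s original setup. The pretrained classifier and the epsilon value are illustrative assumptions:

```python
# Sketch of an adversarial perturbation: a tiny, targeted change to the pixels
# that pushes a pretrained classifier away from the correct label (FGSM-style).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def fgsm_perturb(image, true_label, epsilon=0.01):
    """Return a perturbed copy of `image` (a normalized 1x3xHxW tensor)."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), torch.tensor([true_label]))
    loss.backward()
    # Nudge every pixel a tiny step in the direction that increases the loss:
    # often enough to flip the prediction while the image looks unchanged to us.
    return (image + epsilon * image.grad.sign()).detach()
```

The change is bounded to epsilon per pixel, which is exactly why a human sees no difference while the model's decision can flip.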
To be clear, this opacity doesn't negate machine learning's tremendous utility. Language models, computer vision, and other technologies are driving incredible breakthroughs in fields ranging from healthcare to education to scientific research. We can celebrate these advances while still surfacing the limitations and exploring the ramifications.
After all, the stakes are high when we deploy AI systems in the real world. Imagine an AI that determines your credit score, screens your job application, or chooses your medical treatment. Even if it usually generates the “right” answer, don't we need insight into how it arrived at those judgments? Black box algorithms concentrate immense power behind an inscrutable veil.
That's why a growing cohort of researchers advocates for “explainable AI.” They're developing techniques to trace algorithmic decisions back to input features, quantify uncertainty, and translate machine learning models into rules humans can interpret. Other experts counter that full explainability isn't always feasible or necessary and that the opacity of biological brains doesn't stop us from finding them useful and intelligent.
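As a concrete taste of “tracing decisions back to input features,” here is a small sketch using permutation importance from scikit-learn; the dataset and model are arbitrary stand-ins, not tied to any particular explainable-AI project:

```python
# Sketch of one explainability technique: permutation feature importance.
# Shuffle each input feature and measure how much the model's accuracy drops;
# the bigger the drop, the more the model relied on that feature.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

for name, score in sorted(zip(X.columns, result.importances_mean),
                          key=lambda pair: -pair[1])[:5]:
    print(f"{name}: {score:.3f}")
```

Techniques like this don't open the black box, but they at least tell us which inputs the model leaned on.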
But we may figure this out.
While data is an inherent limitation of Machine Learning, interpretability depends on context, and maybe one day we'll figure it out. Recently, one of the leading LLM makers took a leap toward that goal.
Anthropic recently published fascinating work on interpreting its LLM, Claude 3 Sonnet. The researchers probe the model's inner workings in an original way, using a technique called “dictionary learning” to map its internal states. By isolating patterns of neuron activations that recur across different contexts, they were able to extract millions of features, creating a conceptual map that reflects the similarity between related concepts.
Remarkably, these features are abstract, representing the same concept across contexts and languages and even generalizing to image inputs. Moreover, artificially manipulating these features can change the model's behavior, offering a promising approach for ensuring the safety (or at least some robustness) of LLMs. However, mapping all existing features would require more computation than training the model itself, and the significance of activations and their circuits remains unknown, leaving uncertainty about the effectiveness of this technique in making AI models safer.
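For intuition, here is a toy sketch of what dictionary learning does in general: it decomposes dense vectors (below, random numbers standing in for recorded neuron activations) into sparse combinations of learned “features.” Anthropic's actual work trains sparse autoencoders on Claude's activations at enormous scale, so this is only a conceptual illustration, not their method:

```python
# Toy dictionary learning: express each dense "activation" vector as a sparse
# combination of learned dictionary atoms (stand-ins for interpretable "features").
# NOTE: random data for illustration only; not Anthropic's actual pipeline.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
activations = rng.normal(size=(500, 64))   # pretend these are recorded neuron activations

dico = DictionaryLearning(n_components=32, alpha=1.0, random_state=0)
codes = dico.fit_transform(activations)    # sparse codes: which features fire, and how strongly

print("dictionary shape:", dico.components_.shape)   # (32 features, 64 "neurons")
print("avg. active features per vector:", (np.abs(codes) > 1e-8).sum(axis=1).mean())
```

The payoff is that each sparse code is far easier to inspect than the raw activations: only a handful of features fire for any given input, and each one can be studied on its own.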
***
As organizations evaluate which applications of Machine Learning and LLMs are genuinely useful, they'll have to grapple with important questions. Will AI systems need to be fully interpretable to be reliably beneficial? How do we balance AI's power and peril?
Wrestling with these questions requires nuanced thinking. We shouldn't romanticize or demonize machine learning's “black magic.” Rather, we must strive to understand it—both its technical workings and its societal repercussions. This kind of AI literacy isn't just for computer scientists and tech giants but also for policymakers, business leaders, and everyday citizens.
So, while GPT-4o's billions of parameters might never be fully mappable, we can still illuminate machine learning as a technological paradigm. I believe interpretability research will lead to amazingly interesting discoveries, while data remains an inherent limitation of machine learning, as previously argued.
In future posts, I’ll cover the third limitation of ML that I believe can be overcome: the security of ML models.
Stay tuned. The future is ours to encode.