Are Transformers Turing-complete? A Good Disguise Is All You Need.

I will avoid an introduction and get straight to it. I’ll assume that you know what Turing-complete means, that you are familiar with transformers, and that you care about the question: Are transformers Turing-complete? (I may later write about what that question means and why it is important.)

Several papers claim that transformers are Turing-complete and therefore capable of computing any computable function. I believe all of these claims are misleading. Some rest on conceptual errors, and their models are simply not Turing-complete as claimed. Others modify essential properties of transformers to achieve universal computation, presenting models that resemble transformers only superficially. As of October 2024, the deep learning model widely known as a ‘transformer’ has, for all its success in commercial large language models, not been shown to be Turing-complete (barring add-ons and modifications to its essential properties).
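
Before going further, here is a counting sketch of my own (an illustration of why the question is delicate, not necessarily the argument developed in the post; the vocabulary size and context length below are made-up parameters): a transformer with a fixed vocabulary and a bounded context window realizes a function on a finite domain, which a lookup table could compute, so any claim of Turing-completeness must rest on something beyond the bare architecture.

```python
# Hedged sketch (vocab_size and max_len are arbitrary stand-ins):
# with a fixed vocabulary V and a context window bounded at length n,
# a transformer maps a finite set of input sequences to outputs.
vocab_size = 4   # |V|
max_len = 3      # n, the bounded context length

# Number of distinct input sequences of length 0..n:
num_inputs = sum(vocab_size**k for k in range(max_len + 1))
print(num_inputs)  # 85: finite, so a lookup table suffices,
                   # while Turing-completeness needs unbounded memory.
```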

Continue reading “Are Transformers Turing-complete? A Good Disguise Is All You Need.”

The Researcher’s Guide for Being Mind Blown by a Neural Network

Every so often a new neural network makes headlines for solving a computational problem. It is sometimes hard for me to judge how impressive these achievements are without diving into the details of the models, but my criteria are always the same, and anyone familiar with a given model should find it easy to evaluate it against them. For this purpose I have made a flowchart for how impressed I would be by a neural network. If you know of a new neural net that reaches “wow”, please let me know about it, and if it reaches “mind-blown” you have permission to wake me up in the middle of the night, since I know of no such examples.

Continue reading “The Researcher’s Guide for Being Mind Blown by a Neural Network”

The Truth About the [Not So] Universal Approximation Theorem

Computation is usually conceptualized in terms of input/output functions. For instance, to compute the square root of a number is to somehow evaluate the function that takes a number as input and outputs its square root. It is commonly stated that feed-forward neural networks are “universal approximators”, meaning that, in theory, any function can be approximated by a feed-forward neural network. Here are some examples of this idea being articulated, followed by a small sketch of the idea:

“One of the most striking facts about [feed-forward] neural networks is that they can compute any function at all.”

– Neural Networks and Deep Learning, Ch. 4, Michael Nielsen

“In summary, a feed-forward network with a single layer is sufficient to represent any function [to an arbitrary degree of accuracy],…”

– Deep Learning (Section 6.4.1 Universal Approximation Properties and Depth), Ian Goodfellow

“…but can we solve anything? Can we stave off another neural winter coming from there being certain functions that we cannot approximate? Actually, yes. The problem of not being able to approximate some function is not going to come back.”

– Course on Deep Learning, Universal Approximation Theorem, Konrad Kording
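
As a minimal sketch of what the universal approximation idea looks like in practice (my own illustration, not code from any of the quoted sources; the target function, knot placement, and network width are arbitrary choices): a single hidden layer of ReLU units computes a piecewise-linear function, and with enough units that function can track sqrt(x) on [0, 1] as closely as we like.

```python
import numpy as np

# Minimal sketch (assumed setup, not from the quoted sources):
# approximate sqrt(x) on [0, 1] with a one-hidden-layer ReLU network.
knots = np.linspace(0.0, 1.0, 17)   # breakpoints; more knots = better fit
targets = np.sqrt(knots)            # function values to interpolate

slopes = np.diff(targets) / np.diff(knots)
# Each hidden unit contributes the *change* in slope at its knot,
# so the network reproduces the piecewise-linear interpolant exactly.
coeffs = np.concatenate(([slopes[0]], np.diff(slopes)))

def net(x):
    # Hidden layer: relu(x - k_i); output: targets[0] + weighted sum.
    hidden = np.maximum(x[:, None] - knots[:-1][None, :], 0.0)
    return targets[0] + hidden @ coeffs

x = np.linspace(0.0, 1.0, 7)
print(np.column_stack([x, net(x), np.sqrt(x)]))  # approximation vs. truth
```

Note that this demonstrates approximating one continuous function on a compact interval, which is well short of the “any function at all” of the popular phrasings.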

Continue reading “The Truth About the [Not So] Universal Approximation Theorem”