
Introduction
Recently, the scientific community has been invigorated by the publication of the article “KAN: Kolmogorov–Arnold Networks”, which proposes an innovative neural network architecture. KANs represent a fundamentally different approach to artificial neural network design, based on the Kolmogorov–Arnold representation theorem. Although this theorem has been known for over 60 years, it has only now found practical application in machine learning. In this article, we take a closer look at this concept.
Kolmogorov–Arnold Representation Theorem
The theorem by Vladimir Arnold and Andrey Kolmogorov provides the mathematical foundation for the unique structure of KAN networks. It states that any multivariate continuous function on a bounded domain can be represented as a finite composition of continuous functions of a single variable combined with the binary operation of addition.
The mathematical formula of the Kolmogorov–Arnold representation theorem. (source: Wikipedia)
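In its commonly quoted form, the theorem states that any continuous function f of n variables on the unit cube can be written as

$$ f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q \left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right) $$

where each inner function φ_{q,p} : [0,1] → ℝ and each outer function Φ_q : ℝ → ℝ is a continuous function of a single variable; addition is the only operation that combines more than one variable.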
Although the theorem was initially dismissed as impractical for machine learning, the authors of the KAN network decided to leverage it. Their approach generalizes the original two-layer representation to networks of arbitrary width and depth. They also highlight that functions encountered in science and daily life are often smooth and have sparse compositional structures, which can make smooth Kolmogorov–Arnold representations attainable.
This approach enables the creation of neural networks built on splines: mathematical functions that form smooth curves by connecting a series of control points. Splines provide flexibility in adjusting a curve’s shape while ensuring continuity and smoothness between adjacent segments. Consequently, KAN networks offer a novel way to model relationships within data.
Example of a B-Spline (source: opensourc.es)
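For readers less familiar with splines, the short Python sketch below evaluates a cubic B-spline with SciPy; the knot vector and coefficients are arbitrary values chosen only to illustrate how the coefficients (the control points) shape a smooth curve.

```python
import numpy as np
from scipy.interpolate import BSpline
import matplotlib.pyplot as plt

# A cubic (degree-3) B-spline: the clamped knot vector and the coefficients
# below are arbitrary values chosen purely for illustration.
degree = 3
knots = np.array([0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4], dtype=float)
coeffs = np.array([0.0, 1.5, -0.5, 2.0, 1.0, 0.0, 0.5])
spline = BSpline(knots, coeffs, degree)

x = np.linspace(0, 4, 200)
plt.plot(x, spline(x))          # smooth curve shaped by the coefficients
plt.title("Cubic B-spline")
plt.show()
```

Changing any single coefficient only reshapes the curve locally, which is one reason splines are attractive as learnable building blocks.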
KAN vs MLP
The classical MLP architecture consists of neurons grouped into layers. Each neuron multiplies its inputs by weights, sums them, and passes the result through a fixed activation function (e.g., ReLU or sigmoid). In contrast, a neuron in a KAN network simply sums its incoming signals; the activation functions live on the connections between neurons and are themselves learnable functions (B-splines). Rather than allocating a fixed weight to each edge, KAN assigns it a parametrically modelled function.
Comparison of MLP and KAN architectures (source: arXiv:2404.19756)
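To make this difference concrete, here is a minimal sketch of a KAN-style layer in PyTorch. Every edge carries its own learnable univariate function; for simplicity the sketch parameterizes each edge function with a fixed Gaussian basis instead of the B-splines used in the paper, and the class name KANLayer and all sizes are illustrative rather than the authors’ implementation.

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Simplified KAN-style layer: each edge (input i -> output j) carries its
    own learnable univariate function, modelled as a linear combination of
    fixed Gaussian basis functions (a stand-in for the paper's B-spline basis)."""

    def __init__(self, in_features: int, out_features: int, num_basis: int = 8):
        super().__init__()
        # Fixed basis-function centres spread over the assumed input range [-1, 1].
        self.register_buffer("centres", torch.linspace(-1.0, 1.0, num_basis))
        self.width = 2.0 / num_basis
        # One coefficient vector per edge: shape (out_features, in_features, num_basis).
        self.coeffs = nn.Parameter(0.1 * torch.randn(out_features, in_features, num_basis))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features)
        # Evaluate every basis function at every input value: (batch, in, num_basis).
        basis = torch.exp(-((x.unsqueeze(-1) - self.centres) / self.width) ** 2)
        # phi_{j,i}(x_i) = sum_k coeffs[j, i, k] * basis_k(x_i)  ->  (batch, out, in)
        edge_values = torch.einsum("bik,oik->boi", basis, self.coeffs)
        # A KAN "neuron" simply sums its incoming edge functions.
        return edge_values.sum(dim=-1)
```

Stacking several such layers gives a deeper KAN; the paper additionally uses a proper B-spline basis with a refinable grid and a residual base activation, details omitted in this sketch.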
In traditional neural networks, training optimizes the weight and bias values to improve performance, while the activation functions themselves remain fixed across the network.
In KAN, each weight is represented by a spline function, which is optimized during training. Instead of fixed weight values, the model focuses on learning different parameters of specific spline functions for each connection.
Throughout training, the coefficients of these splines are adjusted to minimize prediction error, typically using gradient-based optimization. Each iteration refines the spline parameters a little further, allowing the network to discover the curves that best fit the data.
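Re-using the KANLayer sketch from above, a hypothetical training loop looks just like training an MLP: gradient descent updates the basis (spline) coefficients of every edge instead of scalar weights. The target function below is invented purely for illustration.

```python
# Fit y = sin(pi * x1) + x2^2 with a single KANLayer (illustrative target only).
torch.manual_seed(0)
model = KANLayer(in_features=2, out_features=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    x = torch.rand(256, 2) * 2 - 1                      # inputs in [-1, 1]
    y = torch.sin(torch.pi * x[:, :1]) + x[:, 1:] ** 2
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()                                      # gradients w.r.t. the spline coefficients
    optimizer.step()                                     # refine the per-edge curves
```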
Enhanced Scalability
In the realm of machine learning, efficiently and accurately approximating complex functions is paramount, especially with the rise of high-dimensional data. Current mainstream models such as Multi-Layer Perceptrons (MLPs) often struggle with such data due to what is known as the curse of dimensionality.
KANs showcase superior scalability compared to MLPs, particularly in scenarios involving high-dimensional data. Their strength lies in breaking down intricate high-dimensional functions into compositions of simpler one-dimensional functions. By focusing on optimizing these one-dimensional functions rather than grappling with the entire multivariate space, KANs reduce complexity and the number of parameters required for precise modelling. Moreover, because KANs operate on simple one-dimensional functions, they can be crafted as straightforward, interpretable models.
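As a toy illustration of such compositional structure (a function of this form is used as an example in the KAN paper), consider

$$ f(x_1, x_2, x_3, x_4) = \exp\big(\sin(x_1^2 + x_2^2) + \sin(x_3^2 + x_4^2)\big). $$

Despite being four-dimensional, it is built entirely from one-dimensional pieces (squaring, sine, and the exponential) combined by addition, which is exactly the kind of structure a KAN can exploit instead of having to model a full four-dimensional surface.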
Improved Accuracy
KANs can not only learn features, as MLPs do, but also optimize those learned features to high accuracy, as splines do. Despite employing fewer parameters, KANs consistently outperform traditional MLPs in diverse tasks, achieving higher accuracy and lower loss. This superior performance can be attributed to their adaptive modelling of relationships in the data, leading to more accurate predictions and better generalization to novel examples.
Interpretable Models
The KAN authors also propose several techniques for simplifying a trained network so that it can be read and interpreted: sparsification (a penalty that drives unnecessary edge functions towards zero), pruning of inactive neurons and edges, and symbolification, i.e. replacing a learned spline with a recognized symbolic function such as sin, exp, or x². In favourable cases the result is a compact formula rather than a black box; a minimal sketch of the sparsification idea follows below.
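As a rough sketch of the sparsification step (a simplification; the paper regularizes the average magnitude of the edge activations plus an entropy term, not the raw coefficients), an L1 penalty can be added to the loss in the training loop above so that unneeded edge functions shrink towards zero and can later be pruned.

```python
# L1 sparsity penalty on the per-edge coefficients (a simplification of the
# paper's regularizer, which penalizes average edge-activation magnitudes).
l1_penalty = model.coeffs.abs().mean()
loss = torch.nn.functional.mse_loss(model(x), y) + 1e-3 * l1_penalty
```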
Training time
An important caveat is training time: KANs are typically about 10x slower to train than MLPs with the same number of parameters. If training speed is the priority, MLPs remain the better choice; in other cases, KANs should be comparable to or better than MLPs, making them worth a try.
Conclusion
In summary, KAN networks represent a promising departure from the traditional MLP architecture, offering increased scalability, improved accuracy and the potential for interpretable models. Using parametrically modelled spline functions, KAN networks deftly navigate through multidimensional data while maintaining simplicity and interpretability.
A noticeable drawback is the training time; however, the authors honestly admit that they did not make significant efforts to optimize KANs’ efficiency, suggesting that the slow training is more of an engineering problem to be improved in the future rather than a fundamental limitation.
The authors also note that the idea behind KAN is not new, but its elements have so far been tested on problems too simple to demonstrate their advantages. Now, the challenge is to raise the bar and see if KANs can be used to build algorithms for image processing, object recognition, generative algorithms, and language models.
KAN networks present an innovative approach to confront the limitations of classical neural networks. The adaptive properties of spline-based edge functions make KANs very suitable for fitting and modelling data. With their advanced flexibility, high accuracy, and interpretability, KANs have the potential to play a significant role in the future of AI.
I encourage you to refer to the source and read the article in more detail.
Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, and Max Tegmark, “KAN: Kolmogorov-Arnold Networks,” 2024, arXiv:2404.19756.