Simple diagrams of convoluted neural networks

A good diagram is worth a thousand equations. Let's create more of these!

Neural networks are complicated, multidimensional, nonlinear array operations. How can we present a deep learning model architecture in a way that shows key features, while avoiding being too complex or repetitive? How can we present them in a way that is clear, didactic and insightful? (Bonus points if it is beautiful as well!) Right now, there is no standard for plots, neither for research nor for didactic projects. Let me take you through an overview of tools and techniques for visualizing whole networks and particular blocks!

The baseline

AlexNet was a breakthrough architecture, setting convolutional networks (CNNs) as the leading machine learning algorithm for large image classification. The paper introducing AlexNet presents an excellent diagram — but there is something missing…

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks (2012), the original crop

It does not require an eagle eye to spot it: the top part is accidentally cropped. And so it runs through all subsequent slide decks, references, etc. In my opinion, it is a symptom that, in deep learning research, visualization is a mere afterthought (with a few notable exceptions, including the Distill journal).

One may argue that developing new algorithms and tuning hyperparameters are Real Science/Engineering™, while visual presentation is the domain of art and has no value. I couldn't disagree more!

Sure, for computers running a program it does not matter if your code is without indentation and has obscurely named variables. But for people, it does. Academic papers are not a means of discovery; they are a means of communication.

Take another complex idea: quantum field theory. If you want to show the electron-positron annihilation process, creating a muon-antimuon pair, here's the Feynman diagram (of the first-order term):

Mark Thomson, Particle Physics, Handout 4: Electron-Positron Annihilation

Beautiful, isn't it? But it is not an artistic impression. It is a graphical representation of the scattering amplitude, with each line being a propagator and each vertex a point interaction. It directly translates to:
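For reference, a common textbook form of the lowest-order QED amplitude for e⁻(p₁) e⁺(p₂) → μ⁻(p₃) μ⁺(p₄) is sketched here. The momentum labels and the sign convention of the vertex factor are assumptions (they differ between textbooks), so read it as an illustration of how diagram elements map to factors rather than as a quote from the handout.

```latex
-i\mathcal{M} =
  \left[\bar{v}(p_2)\,(-ie\gamma^{\mu})\,u(p_1)\right]
  \frac{-i g_{\mu\nu}}{q^{2}}
  \left[\bar{u}(p_3)\,(-ie\gamma^{\nu})\,v(p_4)\right],
\qquad q = p_1 + p_2
```

Each bracket corresponds to one vertex together with its external spinors, and the middle factor is the photon propagator.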

I may be biased towards "making things simpler", as I did with complex tensor operations in JavaScript, and visualized their results before it was cool (for Quantum Game with Photons). However, there is more to the Feynman diagram analogy than using visual representations for formulae. In both quantum mechanics and deep learning, we do a lot of linear algebra with tensor structures. In fact, one may even use the Einstein summation convention in PyTorch.
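As a small illustration of the Einstein summation convention, here is a sketch using NumPy's einsum. NumPy is used only to keep the example light; torch.einsum accepts the same subscript strings. The tensors and their shapes are made up for the example.

```python
import numpy as np

# Hypothetical toy tensors: a batch of 2 matrices (3x4) and one weight matrix (4x5).
x = np.arange(24, dtype=float).reshape(2, 3, 4)
w = np.ones((4, 5))

# "bij,jk->bik": the repeated index j is summed over; b, i and k are kept.
y = np.einsum("bij,jk->bik", x, w)

# Trace of a square matrix: the repeated index i is summed and nothing is kept.
m = np.array([[1.0, 2.0], [3.0, 4.0]])
trace = np.einsum("ii->", m)

print(y.shape, trace)
```

The subscript string plays the role of the index labels on a tensor diagram: shared letters are contracted, and the letters after the arrow are the free indices of the result.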

Explaining neural network layers

Before we jump into network architectures, let's focus on their building blocks: layers. For example, a Long Short-Term Memory (LSTM) unit can be described with the following equations:
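One widely used way to write a single LSTM step is given here. The notation is an assumption: $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, $\sigma$ is the logistic sigmoid, and $\odot$ is the elementwise product. Other sources split each weight matrix into separate input and recurrent parts.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```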

Sure, it's reasonably easy to parse these equations, at least if you are already familiar with matrix multiplication conventions. But it is a very different thing to parse something, and to understand it. When I saw the LSTM equations for the first time I could parse them, yet I had no idea what was going on.

By "understanding" I don't mean some spiritual sense of enlightenment, which may be as pleasing and intoxicating as it is misleading. Instead, I mean building a mental model we are able to work with (to explain, simplify, modify, predict what-if scenarios, etc). Often a graphical form may be cleaner than a verbal one:

Chris Olah, Understanding LSTM Networks (2015)

Understanding LSTM Networks is a wonderful blog post about LSTM cells that explains depicted operations in a step-by-step fashion. It gave me a big "Eureka!" moment, turning a seemingly random set of multiplications into a reasonable approach to writing (and reading!) data.

And here is an even more explicit diagram of LSTM below:

Eli Benderski, Minimal character-based LSTM implementation (2018)
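To connect the "writing and reading data" picture with code, here is a minimal NumPy sketch of one LSTM step. It is not taken from either of the cited posts; the stacked weight layout (forget, input, candidate, output) and all names are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W: (4 * hidden, hidden + inputs), b: (4 * hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: what to erase from memory
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: how much to write
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate values to write
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate: how much to read out
    c_t = f * c_prev + i * g               # update the memory cell
    h_t = o * np.tanh(c_t)                 # read from the memory cell
    return h_t, c_t

# Run a few steps on random data (sizes are arbitrary).
rng = np.random.default_rng(0)
inputs, hidden = 3, 2
W = rng.normal(size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):
    h, c = lstm_step(x_t, h, c, W, b)
```

Even in this compact form, the gates are only recognizable because of the comments, which is roughly the job a good diagram does at a glance.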

In my opinion:

A good diagram is worth a thousand equations.

It works for almost any other block. We can visualize concepts such as dropout (i.e. switching off neurons, and rendering their connections irrelevant):

Srivastava, Hinton et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting (2014)

While I am not a big fan of drawing information flows upside-down, this figure is very clear.
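For comparison with the picture, dropout on a single layer's activations can be sketched in a few lines of plain Python. This uses the "inverted" variant (rescaling the kept units during training), which is an assumption of this example; the cited paper instead rescales weights at test time.

```python
import random

def dropout(activations, p_drop, rng):
    """Zero each unit with probability p_drop and rescale the kept ones."""
    keep = 1.0 - p_drop
    mask = [1 if rng.random() < keep else 0 for _ in activations]
    dropped = [a * m / keep for a, m in zip(activations, mask)]
    return dropped, mask

rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
dropped, mask = dropout(activations, p_drop=0.5, rng=rng)
print(mask)
print(dropped)
```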

Graphical representations are useful for explaining compound blocks, composed of smaller ones (e.g. a few subsequent convolutions). Take a look at this Inception module diagram:

Szegedy, Vanhoucke, Ioffe, Shlens, Wojna, Rethinking the Inception Architecture for Computer Vision (2015)

Each visualization is different, not only in terms of its style but also in what it puts an emphasis on and what it abstracts away. What's important? The number of layers, connections between them, convolution kernel size or activation function? Well, it depends. Abstraction means "the process of considering something independently of its associations or attributes". The challenge is to decide what is important for a given communication, and what should be hidden.

For example, in this Batch Normalization diagram, the emphasis is on the backward pass:

Frederik Kratzert, Understanding the backward pass through Batch Normalization Layer (2016)

Data viz vs data art

You may get the impression that I argue for making deep learning papers more visually appealing. Well, it wouldn't hurt to make charts nicer. When I work with data exploration, I often pick nicer color schemes just to make a more pleasant experience. My main point is to turn visualizations into a more effective means of communication.

So, does nicer mean better? Not necessarily. The Line between Data Vis and Data Art by Lisa Charlotte Rost, which I found very insightful, explains the distinction.

Lisa Charlotte Rost, Meaning + Beauty in Data Vis and Data Art (2015)

For example, look at this stunning picture below:

Matt Fyles (Graphcore), Inside an AI 'brain' — What does machine learning look like? (2016)

Beautiful, isn't it? To me, it looks alive, like a cell with its organelles. But hey, can we deduce anything from it? Would you even guess it's the same AlexNet?

In another case, an animated multi-layer perceptron is focused on its artful, rather than explanatory, value:

To make it clear: data art has value on its own, as long as we don't confuse artistic value with educational value. If you like going this route, I encourage you to use 3D animations of impulses such as these sparks or that colorful brain, for an actual ConvNet.

Sometimes the trade-off is less clear. This one: is it data viz or data art?

GoogLeNet from Christian Szegedy et al., Going Deeper with Convolutions (2014)

I guess you said: "data vis, obviously". In this case, we are in disagreement. While there is a nice color scheme, and the repetition of similar structures is visually pleasing, it is hard to implement this network solely based on this drawing. Sure, you get the gist of the architecture (i.e. the number of layers and the structure of blocks), but it's not enough to reimplement the network (at least, not without a magnifying glass).

To make it clear: there is room for data art in publications. For example, in a network for detecting skin conditions, we see the diagram of Inception v3 feature-extracting layers. Here it is clear that the authors just use it, and represent it graphically, rather than explain its inner workings:

Andre Esteva et al., Dermatologist-level classification of skin cancer with deep neural networks (2017)

And how would you classify this diagram, for exploring visual patterns that activate selected channels?

Chris Olah et al., Feature Visualization: Appendix (2017), distill.pub

I would classify this diagram as a good example of data viz. A trippy visualization does not make it a piece of data art. In this case, the focus is on network architecture abstraction and presenting relevant data (input images activating a given channel).

Some diagrams abstract a lot of information, giving just a very general idea of what is going on. See the Neural Network Zoo and its prequel:

Fjodor van Veen, Neural Network Zoo (2016), a fragment

Explanatory architecture diagrams

We saw a few examples of layer diagrams, and pieces of data art related to neural network architectures.

Let's go to (data) visualizations of neural network architectures. Here is the architecture of VGG16, a standard network for image classification.

https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/

We see, step-by-step, tensor sizes and operations (marked as colors). It's not abstract: box sizes are related to tensor shapes. Bear in mind that the thickness related to the number of channels is not to scale (well, we have 3 to 4096).
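The tensor shapes drawn in such a diagram can be traced with a few lines of plain Python. This is a sketch for the convolutional part of VGG16 only, assuming the standard 224x224 RGB input, 3x3 convolutions that preserve spatial size, and 2x2 max-pooling; the function name is made up for the example.

```python
# Integers are 3x3 "same" convolutions with that many output channels,
# "M" is a 2x2 max-pool with stride 2 (the VGG16 convolutional stack).
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]

def trace_shapes(config, height=224, width=224, channels=3):
    """Return (operation, (height, width, channels)) after every layer."""
    shapes = [("input", (height, width, channels))]
    for item in config:
        if item == "M":
            height, width = height // 2, width // 2
            shapes.append(("maxpool", (height, width, channels)))
        else:
            channels = item
            shapes.append(("conv3x3", (height, width, channels)))
    return shapes

shapes = trace_shapes(VGG16)
for name, shape in shapes:
    print(f"{name:8s} {shape}")
```

The last shape is 7x7x512, which is flattened to 25088 values before the fully connected layers of size 4096, 4096 and 1000.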

A similar approach is to show values for each channel, as in this DeepFace work:

Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf, DeepFace: Closing the Gap to Human-Level Performance in Face Verification (2014)

Such diagrams are not restricted to computer vision. Let's see one for turning text into… colors:

Chengwei Zhang, How to train a Keras model to generate colors (2018)

Such diagrams might be useful if the goal is to show the network architecture and, at the same time, give some hints on its inner workings. They seem to be especially useful for tutorials, e.g. the seminal The Unreasonable Effectiveness of Recurrent Neural Networks.

Abstract architecture diagrams

Yet, for larger models, explanatory diagrams may be unnecessarily complex or too specific to show all possible layers within a single diagram style. So, the way to go is to use abstract diagrams. Typically, nodes denote operations, while arrows represent the tensor flow. For instance, let's look at this VGG-19 vs ResNet-34 comparison:

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition (2015), cropped

We can see that there is some redundancy, as some units get reused or repeated. Since diagrams can be long (there is a reason why I cropped the one above!), it is beneficial to spot the patterns and consolidate them. Such a hierarchy makes it simpler both to understand concepts and to present them visually (unless we simply want to create data-artsy diagrams of GoogLeNet).
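Spotting and consolidating repeated units can be sketched as a run-length collapse over a flat list of layers. The layer labels here are illustrative shorthand for the first stages of a ResNet-34-like stack, not names from any library, and real tools would need to detect repeated multi-layer blocks as well.

```python
from itertools import groupby

def consolidate(layers):
    """Collapse runs of identical consecutive layers into (count, layer) pairs."""
    return [(len(list(run)), layer) for layer, run in groupby(layers)]

# Illustrative labels only: kernel size and number of output channels.
layers = ["conv7x7-64", "maxpool"] + ["conv3x3-64"] * 6 + ["conv3x3-128"] * 8

summary = consolidate(layers)
for count, layer in summary:
    print(layer if count == 1 else f"{count} x {layer}")
```

Sixteen boxes shrink to four lines, which is the same trick a hierarchical diagram uses visually.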

For instance, let's look at this one, of Inception-ResNet-v1:

Inception-ResNet-v1 as depicted in Szegedy et al., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning (2016), combined two figures

I adore its composition: we see what's going on, and which blocks are being repeated.

Another diagram that made a concept super clear to me was one for image segmentation, U-Net:

Olaf Ronneberger, Philipp Fischer, Thomas Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation (2015)

Take note that this time nodes denote tensors, whereas arrows represent operations. I find this diagram very clear: we see tensor shapes, convolutions, and pooling operations. Since the original U-Net architecture is not too complex, we can do without looking at its hierarchical structure.
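The spatial sizes printed along the U-Net figure follow from simple arithmetic, sketched here in plain Python. It assumes the original setup: a 572x572 input, two unpadded 3x3 convolutions per level (each removing 2 pixels), 2x2 max-pooling on the way down and 2x2 up-convolutions on the way up. The function and label names are made up for the example.

```python
def unet_sizes(size=572, depth=4):
    """Trace the spatial size through an unpadded U-Net of the given depth."""
    trace = [("input", size)]
    for level in range(depth):
        size -= 4                      # two unpadded 3x3 convolutions: -2 each
        trace.append((f"down{level} convs", size))
        size //= 2                     # 2x2 max-pool
        trace.append((f"down{level} pool", size))
    size -= 4                          # two convolutions at the bottom
    trace.append(("bottleneck", size))
    for level in reversed(range(depth)):
        size *= 2                      # 2x2 up-convolution doubles the size
        size -= 4                      # two unpadded 3x3 convolutions
        trace.append((f"up{level} convs", size))
    return trace

trace = unet_sizes()
for name, size in trace:
    print(f"{name:12s} {size} x {size}")
```

Under these assumptions the output is 388x388, smaller than the input, which is exactly the kind of detail a shape-annotated diagram makes hard to miss.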

The task of creating clear diagrams gets slightly more complicated when we want to use more complex building blocks. If we want to reproduce the network, we need to know its details:

Number of channels
Convolutions per MaxPool
Number of MaxPools
Batch normalization or dropout
Activation functions (ReLU? before or after Batch Norm?)
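The checklist above can be written down as an explicit specification. This is a minimal sketch with a hypothetical ConvBlockSpec dataclass (not part of any framework) whose fields mirror the list, and which expands a block into the ordered primitive layers a detailed diagram would have to show.

```python
from dataclasses import dataclass

@dataclass
class ConvBlockSpec:
    channels: int                         # number of channels
    convs_per_pool: int = 2               # convolutions per MaxPool
    batch_norm: bool = True               # batch normalization on or off
    norm_before_activation: bool = True   # BatchNorm before or after the activation
    dropout: float = 0.0                  # dropout rate, 0 means none
    activation: str = "relu"              # activation function

    def layers(self):
        """Expand the block into an ordered list of primitive layer names."""
        post_conv = [self.activation]
        if self.batch_norm:
            if self.norm_before_activation:
                post_conv = ["batchnorm"] + post_conv
            else:
                post_conv = post_conv + ["batchnorm"]
        out = []
        for _ in range(self.convs_per_pool):
            out.append(f"conv3x3-{self.channels}")  # kernel size fixed for brevity
            out.extend(post_conv)
        if self.dropout > 0:
            out.append(f"dropout-{self.dropout}")
        out.append("maxpool2x2")
        return out

# The number of MaxPools is simply the number of blocks in the encoder.
encoder = [ConvBlockSpec(channels=c) for c in (64, 128, 256)]
for block in encoder:
    print(block.layers())
```

A diagram that lets the reader recover every field of such a specification is explicit enough to reimplement from.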

As a great example of condensing this level of detail into a diagram, see the diagram below:

Arkadiusz Nowaczyński, Deep learning for satellite imagery via image segmentation (2017)

While the color choice could have been better, I adore its explicit form. There is a clear indication of the number of channels. Each complex layer is explicitly decomposed into its building blocks, maintaining all details (note the 3-level hierarchy).

Another interesting approach to the neural network module hierarchy:

AdaptNet architecture from Abhinav Valada et al., DeepScene: Semantic Segmentation using Deep Upconvolutional Neural Networks (2016)

Automatic tools for neural network architecture visualization

You can draw your network manually. Use Inkscape (as Chris Olah did), TikZ (if you are a fan of LaTeX) or any other tool. The other option is to generate diagrams automatically.

I hope that you are aware that you already interact with one visual representation: code (yes, text is a visual representation!). For some projects, the code might suffice, especially if you work with a concise framework (such as Keras or PyTorch). For more convoluted (pun totally intended) architectures, diagrams add a lot of explanatory value.

TensorBoard: Graph

TensorBoard is arguably the most popular network visualization tool. A TensorFlow network graph looks like this:

Does it provide a readable summary for a neural network?

In my opinion, it does not.

While this diagram shows the structure of computations, some things are long-winded (e.g. adding bias as a separate operation). Additionally, the most important parts are masked: the core parameters of operations (e.g. convolution kernel size), and tensor sizes. Though, before going into criticising, I really encourage reading the accompanying paper:

K. Wongsuphasawat, D. Smilkov et al., Visualizing dataflow graphs of deep learning models in TensorFlow, 2018

This article provides insight into the many challenges of creating network diagrams bottom-up. Since we are allowed to use all TensorFlow operations, including auxiliary ones (such as initialization or logging), it is challenging to make a general, readable graph. If we don't assume much about what is important to the reader (e.g. that convolution kernel size may vary, but all operations are expected to have a bias), it is hard to make a general tool for turning any TensorFlow computation graph into a useful (think: publication-ready) diagram.

Keras

Keras is a high-level deep learning framework and therefore has huge potential for beautiful visualizations. (Side note: if you want to use an interactive training graph for Jupyter Notebook, I wrote one: livelossplot.) However, in my opinion, its default visualizing option (using GraphViz) is not stellar:

I think it hides important details, while providing redundant information (duplicated tensor sizes). Aesthetically, I don't love it as much as Mike Bostock does.

I tried to write another one (pip install keras_sequential_ascii), for trainings:

Piotr Migdał, Sequential model in Keras -> ASCII (2017)

This structure works for small-sized sequential network architectures. I've found it useful for trainings and courses, such as Starting deep learning hands-on: image classification on CIFAR-10. But not for anything more advanced (though I was advised to use a branching viz like the one from git log). And, apparently, I am not the only one who tried ASCII art for neural network viz:

Brian Low, Keras models as ASCII diagrams (2016)
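The idea behind such ASCII summaries fits in a few lines. This is a simplified sketch, not the code of keras_sequential_ascii or of the other project: it takes a hand-written list of (operation, output shape) pairs instead of inspecting a real Keras model, and the layout is an assumption of this example.

```python
def ascii_diagram(input_shape, layers):
    """layers: ordered list of (operation_name, output_shape) pairs."""
    width = max(len(name) for name, _ in layers + [("Input", None)])
    lines = [f"{'Input':>{width}}  #####  {input_shape}"]
    for name, shape in layers:
        lines.append(f"{name:>{width}}   \\|/")        # the operation
        lines.append(f"{'':>{width}}  #####  {shape}")  # the tensor it produces
    return "\n".join(lines)

# A toy CIFAR-10-sized model; shapes were worked out by hand.
layers = [
    ("Conv2D", (30, 30, 32)),        # 3x3 convolution without padding
    ("MaxPooling2D", (15, 15, 32)),  # 2x2 pooling
    ("Flatten", (7200,)),            # 15 * 15 * 32
    ("Dense", (10,)),
]
diagram = ascii_diagram((32, 32, 3), layers)
print(diagram)
```

Tensors and operations alternate line by line, which works as long as the model is a plain sequence and breaks down as soon as there are branches.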

Though, I would say that the most aesthetically pleasing is one found in Keras.js (an ambitious project bringing neural networks to the browser, with GPU support):

Leon Chen, SqueezeNet v1.1 from Keras.js Demo (2018)

This project is no longer in active development, in favor of TensorFlow.js. Yet, as it is open-source and modular (using the Vue.js framework), it may work as a starting ground for creating a standalone viz. Ideally, one working in a Jupyter Notebook or a separate browser window, much like displaCy for sentence decomposition.

Moniel

Instead of turning a functional neural network into a graph, we can define an abstract structure. In Moniel by Milan Lajtoš the best part is that we can define a hierarchical structure:

I like this hierarchical-structure approach. Moniel was an ambitious idea to create a specific language (rather than, say, to use YAML). Sadly, the project lies abandoned.
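To show what a hierarchical definition buys us, here is a sketch in plain Python. It is not Moniel's syntax: the nested tuples, block names and the render function are all made up for this example. A block is (name, children), a leaf is a string, and blocks below a chosen depth are collapsed into one line.

```python
# Hypothetical nested description of a small network.
network = ("net", [
    ("stem", ["conv3x3-32", "relu", "maxpool"]),
    ("body", [
        ("res_block", ["conv3x3-32", "relu", "conv3x3-32", "add"]),
        ("res_block", ["conv3x3-32", "relu", "conv3x3-32", "add"]),
    ]),
    ("head", ["global_avgpool", "dense-10", "softmax"]),
])

def count_leaves(node):
    """Number of primitive operations inside a node."""
    if isinstance(node, str):
        return 1
    return sum(count_leaves(child) for child in node[1])

def render(node, max_depth, depth=0):
    """Return indented lines, collapsing blocks at depth >= max_depth."""
    indent = "  " * depth
    if isinstance(node, str):
        return [indent + node]
    name, children = node
    if depth >= max_depth:
        return [f"{indent}[{name}] ({count_leaves(node)} ops)"]
    lines = [f"{indent}[{name}]"]
    for child in children:
        lines.extend(render(child, max_depth, depth + 1))
    return lines

print("\n".join(render(network, max_depth=1)))
print("\n".join(render(network, max_depth=3)))
```

The same definition yields a four-line overview or a full listing, depending only on the depth at which we stop expanding.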

Netscope

I got inspired by Netscope CNN Analyzer by dgschwend (based on a project by ethereon). It is a project with many forks, so by now a different one may be more up-to-date:

It is based on Caffe's .prototxt format. I love its color theme, the display of channel sizes and the mouseover tooltip for exact parameters. The main problem, though, is the lack of a hierarchical structure. Networks get (too) large very soon.

Netron

Another ambitious project: Netron by Lutz Roeder:

It is a web app, with standalone versions. Ambitiously, it reads various formats:

Netron supports ONNX ( .onnx , .pb ), Keras ( .h5 , .keras ), CoreML ( .mlmodel ) and TensorFlow Lite ( .tflite ). Netron has experimental support for Caffe ( .caffemodel ), Caffe2 ( predict_net.pb ), MXNet ( .model , -symbol.json ), TensorFlow.js ( model.json , .pb ) and TensorFlow ( .pb , .meta ).

It sounds awesome! Though, it is a bit more verbose than Netscope (with activation functions) and, most fundamentally, it lacks the hierarchical structure. But for a general visualization, it may be the best starting point.

EDIT: Other tools

A few other tools that may be useful or inspiring:

NN-SVG: LeNet- and AlexNet-style diagrams
Visualizing CNN architectures side by side with MXNet
TensorSpace.js: in-browser 3D visualizations of channels (stunning but hardly useful)
HiddenLayer: diagrams with ONNX & Graphviz in Jupyter Notebook for TensorFlow, Keras and PyTorch
PlotNeuralNet: LaTeX code for drawing convolutional neural networks

And a few threads:

What tools are good for drawing neural network architecture diagrams? (Quora)
How do you visualize neural network architectures? (Data Science Stack Exchange)

Conclusion and call for action

We saw quite a few examples of neural network visualization, shedding light on the following trade-offs:

data viz vs data art (useful vs beautiful)
explicit vs implicit (should I show ReLU all the time? But what about tensor dimensions?)
shallow vs hierarchical
static (works well in publications) vs interactive (provides more information)
specific vs general (does it work for a reasonably broad family of neural networks?)
data flow direction (top to bottom, bottom to top, or left to right; hint: please don't draw bottom-to-top)

Each of those topics is probably worth a Master's thesis, and all combined, a PhD (especially with a meticulous study of how people do visualize and what the abstractions are).

I think there is a big opportunity in creating a standard neural network visualization tool, as common for neural network architectures as matplotlib is for charts. It remains a challenge at the intersection of deep learning and data visualization. The tool should be useful and general enough to become a standard for:

tutorials in neural networks
academic publications
showing network architecture to collaborators

If we want to make it interactive, JavaScript is a must. Be it D3.js, Vue.js, React or any other tech. That way, it is not only easy to make it interactive, but also system-agnostic. Take Bokeh as an example: it is useful within a Jupyter Notebook, but also as a standalone website.

Would you like to start a brand new package? Or contribute to an existing one?

If you find any neural network particularly inspiring, or confusing, share it in the comments! :)

Afterwords

This article is based on my talk "Simple diagrams of convoluted neural networks" (abstract, slides) from PyData Berlin 2018 (BTW: I invite you to PyData Warsaw, 19–20 Nov 2018). Typically I write on my blog p.migdal.pl. Now I give Medium a try, as it is easier to include images than with Jekyll.

I am grateful to Ilja Sperling for fruitful conversations after the talk and to Rafał Jakubanis and Sarah Martin, CSC for numerous remarks on the draft.

If you want to show appreciation for this article and give me more time to write more, consider becoming my GitHub sponsor! ❤️

Source: https://medium.com/inbrowserai/simple-diagrams-of-convoluted-neural-networks-39c097d2925b