Backpropagation is the key algorithm that makes training deep models computationally tractable. For modern neural networks, it can make training with gradient descent as much as ten million times faster, relative to a naive implementation. That’s the difference between a model taking a week to train and taking 200,000 years.

Beyond its use in deep learning, backpropagation is a powerful computational tool in many other areas, ranging from weather forecasting to analyzing numerical stability – it just goes by different names. In fact, the algorithm has been reinvented at least dozens of times in different fields (see Griewank (2010)). The general, application independent, name is “reverse-mode differentiation.” Fundamentally, it’s a technique for calculating derivatives quickly, and it’s an essential trick to have in your bag, not only in deep learning, but in a wide variety of numerical computing situations.

Computational Graphs

Computational graphs are a nice way to think about mathematical expressions. For example, consider the expression \(e=(a+b)*(b+1)\).


There are three operations: two additions and one multiplication. To help us talk about this, let’s introduce two intermediary variables, \(c\) and \(d\), so that every function’s output has a variable: \(c=a+b\), \(d=b+1\), and \(e=c*d\). To create a computational graph, we make each of these operations, along with the input variables, into nodes. When one node’s value is the input to another node, an arrow goes from one to another.

If we label each edge with the derivative of its head node with respect to its tail node, the chain rule says that the derivative of one node with respect to another is the sum, over all paths between them, of the product of the edge derivatives along each path. The trouble is that the number of paths can blow up: in a graph with three paths from \(X\) to \(Y\) and a further three paths from \(Y\) to \(Z\), computing \(\frac{\partial Z}{\partial X}\) this way means summing over \(3 \times 3 = 9\) paths. Forward-mode and reverse-mode differentiation avoid this by factoring the paths. Forward-mode differentiation gives us the derivative of our output with respect to a single input, but reverse-mode differentiation gives us all of them: a single reverse sweep through the graph yields both \(\frac{\partial e}{\partial a}\) and \(\frac{\partial e}{\partial b}\), the derivatives of \(e\) with respect to both inputs. For this graph, that’s only a factor of two speed up, but imagine a function with a million inputs and one output. Forward-mode differentiation would require us to go through the graph a million times to get the derivatives; reverse-mode differentiation can get them all in one fell swoop. A speed up of a factor of a million is pretty nice!
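To make this concrete, here is a minimal Python sketch of such a graph and of a reverse pass over it. The Node class, the add/mul helpers, and the example values \(a=2\), \(b=1\) are my own illustration, not code from the original post.

```python
# Minimal sketch of a computational graph for e = (a + b) * (b + 1),
# with intermediaries c = a + b and d = b + 1. Illustrative only.

class Node:
    def __init__(self, value, parents=()):
        self.value = value        # the number this node computes
        self.parents = parents    # list of (parent_node, local_derivative)
        self.grad = 0.0           # will hold d(output)/d(this node)

def add(x, y):
    # local derivatives: d(x+y)/dx = 1, d(x+y)/dy = 1
    return Node(x.value + y.value, [(x, 1.0), (y, 1.0)])

def mul(x, y):
    # local derivatives: d(x*y)/dx = y, d(x*y)/dy = x
    return Node(x.value * y.value, [(x, y.value), (y, x.value)])

def backward(output):
    # Reverse-mode differentiation: visit each node once, in reverse
    # topological order, pushing its grad to its parents via the chain rule.
    order, seen = [], set()
    def topo(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                topo(parent)
            order.append(node)
    topo(output)

    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

a, b = Node(2.0), Node(1.0)     # example inputs a = 2, b = 1
c = add(a, b)                   # c = a + b = 3
d = add(b, Node(1.0))           # d = b + 1 = 2
e = mul(c, d)                   # e = c * d = 6

backward(e)                     # one sweep gives every derivative
print(e.value, a.grad, b.grad)  # 6.0  de/da = 2.0  de/db = 5.0
```

A single call to backward fills in the derivative of \(e\) with respect to every node it reaches, which is exactly the behaviour described above for reverse-mode differentiation.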

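For contrast, here is a forward-mode sketch using dual numbers (again my own illustration; the Dual class and the function name e are assumptions, not code from the post). Each sweep carries the derivative with respect to one chosen input, so recovering both \(\frac{\partial e}{\partial a}\) and \(\frac{\partial e}{\partial b}\) takes two passes, and a function with a million inputs would take a million.

```python
# Forward-mode differentiation with dual numbers: each number carries its
# derivative with respect to ONE chosen input alongside its value.

from dataclasses import dataclass

@dataclass
class Dual:
    value: float   # the number itself
    deriv: float   # its derivative with respect to the chosen input

    def __add__(self, other):
        # (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # product rule: (u * v)' = u' v + u v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def e(a, b):
    return (a + b) * (b + Dual(1.0, 0.0))   # the constant 1 has derivative 0

# Pass 1: seed a with derivative 1 to get de/da.
print(e(Dual(2.0, 1.0), Dual(1.0, 0.0)).deriv)   # 2.0
# Pass 2: seed b with derivative 1 to get de/db.
print(e(Dual(2.0, 0.0), Dual(1.0, 1.0)).deriv)   # 5.0
```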

When training neural networks, we think of the cost (a value describing how badly a neural network performs) as a function of the parameters (numbers describing how the network behaves). We want to calculate the derivatives of the cost with respect to all the parameters, for use in gradient descent. Now, there are often millions, or even tens of millions, of parameters in a neural network. So, reverse-mode differentiation, called backpropagation in the context of neural networks, gives us a massive speed up!

(Are there any cases where forward-mode differentiation makes more sense? Yes, there are! Where the reverse-mode gives the derivatives of one output with respect to all inputs, the forward-mode gives us the derivatives of all outputs with respect to one input. If one has a function with lots of outputs, forward-mode differentiation can be much, much, much faster.)
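To connect this back to training, here is a rough sketch under assumptions of my own (a tiny linear model with squared-error cost and random data, not anything from the post). The chain-rule gradient, which is what reverse-mode differentiation computes, gives all 1,000 parameter derivatives at once, while a naive finite-difference estimate needs an extra evaluation of the cost for every single parameter.

```python
# Why reverse-mode matters for gradient descent: one gradient computation
# versus one extra forward pass per parameter for finite differences.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))   # 100 examples, 1000 parameters
y = rng.normal(size=100)
w = np.zeros(1000)

def cost(w):
    # squared-error cost of a linear model: how badly the parameters perform
    return 0.5 * np.mean((X @ w - y) ** 2)

# Reverse-mode / backprop result, written out by hand with the chain rule:
# d(cost)/dw = X^T (X w - y) / n  -- all 1000 derivatives in one go.
grad = X.T @ (X @ w - y) / len(y)

# A finite-difference check for ONE parameter costs one more evaluation of
# cost(); estimating the whole gradient this way would take 1000 of them.
eps, i = 1e-6, 0
w_eps = w.copy()
w_eps[i] += eps
print(grad[i], (cost(w_eps) - cost(w)) / eps)   # the two numbers agree closely
```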

Isn’t This Trivial?

When I first understood what backpropagation was, my reaction was: “Oh, that’s just the chain rule! How did it take us so long to figure out?” I’m not the only one who’s had that reaction. It’s true that if you ask “is there a smart way to calculate derivatives in feedforward neural networks?” the answer isn’t that difficult.

But I think it was much more difficult than it might seem. You see, at the time backpropagation was invented, people weren’t very focused on the feedforward neural networks that we study. It also wasn’t obvious that derivatives were the right way to train them. Those things are only obvious once you realize you can quickly calculate derivatives; there was a circular dependency. Worse, it would be very easy to write off any piece of that circular dependency as impossible on casual thought. Training neural networks with derivatives? Surely you’d just get stuck in local minima. And obviously it would be expensive to compute all those derivatives.