Going through this book, I am familiar with the following:
For each training instance the backpropagation algorithm first makes a prediction (forward pass), measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally slightly tweaks the connection weights to reduce the error.
However, I am not sure how this differs from the reverse-mode autodiff implemented by TensorFlow.
As far as I know, reverse-mode autodiff first goes through the graph in the forward direction and then, in a second pass, computes all partial derivatives of the outputs with respect to the inputs. This is very similar to the backpropagation algorithm.
How does backpropagation differ from reverse-mode autodiff?
Automatic differentiation differs from the method taught in standard calculus classes in how gradients are computed, and in some features, such as its native ability to take the gradient of a data structure and not just a well-defined mathematical function. I'm not expert enough to go into further detail, but this is a great reference that explains it in much more depth:
https://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/
Here's another guide I just found that looks quite nice:
https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation
I believe backprop may formally refer to the by-hand calculus algorithm for computing gradients; at least that's how it was originally derived and how it's taught in classes on the subject. But in practice, backprop is used quite interchangeably with the automatic differentiation approach described in the above guides. So splitting those two terms is probably as much an exercise in linguistics as in mathematics.
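To make the "by hand" versus "automatic" comparison concrete, here is a minimal sketch in plain Python. It is not how TensorFlow implements anything; the `Var` class and the `tanh`, `backward` and `manual_backprop` functions are hypothetical helpers written only for this illustration, and the one-neuron model with a squared error is an assumption chosen to keep it short. It computes the same gradient twice: once with a tiny tape-style reverse-mode AD, and once with the chain rule worked out by hand.

```python
import math


class Var:
    """A scalar value that records how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def tanh(x):
    t = math.tanh(x.value)
    return Var(t, ((x, 1.0 - t * t),))


def backward(output):
    """Reverse pass: walk the recorded graph from the output back to the inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)  # parents are appended before their children

    visit(output)
    output.grad = 1.0  # derivative of the output with respect to itself
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local  # chain rule, accumulated


def manual_backprop(w, b, x, y):
    """Gradient of (tanh(w*x + b) - y)**2, derived by hand with the chain rule."""
    a = math.tanh(w * x + b)   # forward pass
    dloss_da = 2.0 * (a - y)   # error at the output
    da_dz = 1.0 - a * a        # derivative of tanh
    dz = dloss_da * da_dz
    return dz * x, dz          # dloss/dw, dloss/db


# Automatic version: build the graph in the forward pass, then run the reverse pass.
w, b, x, y = Var(0.5), Var(-0.3), Var(2.0), Var(1.0)
diff = tanh(w * x + b) - y
loss = diff * diff
backward(loss)

# By-hand version on the same numbers.
manual_dw, manual_db = manual_backprop(0.5, -0.3, 2.0, 1.0)
print(w.grad, manual_dw)
print(b.grad, manual_db)
```

Both routes apply the same chain rule in the same order; the difference is only whether a person or the recorded graph decides which local derivatives to multiply.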
I also noted this nice article on the backpropagation algorithm to compare against the above guides on automatic differentiation.
https://brilliant.org/wiki/backpropagation/
The most important distinction between backpropagation and reverse-mode AD is that reverse-mode AD computes the vector-Jacobian product of a vector-valued function from R^n -> R^m, while backpropagation computes the gradient of a scalar-valued function from R^n -> R. Backpropagation is therefore a special case of reverse-mode AD.
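As a rough illustration of that distinction, here is a small sketch in plain Python. The functions `f`, `vjp_f`, `loss` and `grad_loss` are made-up examples, not part of any library, and the vector-Jacobian product is written out by hand for this one function rather than produced by an AD system. It shows a vector-Jacobian product for a function from R^2 -> R^3, and then the gradient of a scalar loss built on top of it, which is the same reverse pass started from the derivative of the loss.

```python
def f(x):
    """A vector-valued function from R^2 to R^3 (chosen only for illustration)."""
    x0, x1 = x
    return [x0 * x1, x0 + x1, x0 * x0]


def vjp_f(x, v):
    """Return v^T J_f(x) without forming the 3x2 Jacobian explicitly."""
    x0, x1 = x
    v0, v1, v2 = v
    # Each output i contributes v_i * d(output_i)/d(input_j) to input j.
    return [v0 * x1 + v1 + v2 * 2.0 * x0,
            v0 * x0 + v1]


def loss(x):
    """A scalar-valued function from R^2 to R: the sum of squares of f(x)."""
    return sum(y * y for y in f(x))


def grad_loss(x):
    """Gradient of the scalar loss, obtained as a VJP with v = d(loss)/d(f)."""
    v = [2.0 * y for y in f(x)]
    return vjp_f(x, v)


def finite_difference(fn, x, eps=1e-6):
    """Central differences, used here only as an independent check."""
    grads = []
    for j in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[j] += eps
        lo[j] -= eps
        grads.append((fn(hi) - fn(lo)) / (2.0 * eps))
    return grads


x = [1.5, -2.0]
print(vjp_f(x, [1.0, 0.0, 0.0]))   # general case: any vector v of length 3
print(grad_loss(x))                 # scalar case: what backprop computes
print(finite_difference(loss, x))   # should agree closely with the line above
```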
When we train neural networks, we always have a scalar-valued loss function, so we are always using backpropagation. Since backprop is a special case of reverse-mode AD, we are also using reverse-mode AD when we train a neural network.
Whether backpropagation takes the more general definition (reverse-mode AD applied to a scalar loss function) or the more specific one (reverse-mode AD applied to a scalar loss function for training neural networks) is a matter of personal taste. It's a word that has slightly different meanings in different contexts, but it is most commonly used in the machine learning community to talk about computing gradients of neural network parameters using a scalar loss function.
For completeness: reverse-mode AD can also be used to compute the full Jacobian, not just a vector-Jacobian product. In general that takes one reverse pass per output, so a single reverse pass is enough only in special cases, such as a function with a single output. Also, the vector-Jacobian product for a scalar function where the vector is [1.0] is the same as the gradient.
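Here is a small sketch of both remarks in plain Python. The names `vjp_f`, `vjp_s` and `jacobian_from_vjp` are hypothetical and the vector-Jacobian products are hand-written for two toy functions; a real AD library would generate them. Feeding one-hot vectors into the reverse pass recovers the Jacobian one row at a time, and for a scalar function the single seed [1.0] returns the gradient.

```python
def vjp_f(x, v):
    """v^T J for f(x) = [x0 * x1, x0 - x1], written out by hand."""
    x0, x1 = x
    return [v[0] * x1 + v[1], v[0] * x0 - v[1]]


def vjp_s(x, v):
    """v^T J for the scalar function s(x) = x0**2 * x1, whose Jacobian is 1x2."""
    x0, x1 = x
    return [v[0] * 2.0 * x0 * x1, v[0] * x0 * x0]


def jacobian_from_vjp(vjp, x, m):
    """Build the m x n Jacobian row by row, using one reverse pass per output."""
    rows = []
    for i in range(m):
        one_hot = [1.0 if j == i else 0.0 for j in range(m)]
        rows.append(vjp(x, one_hot))
    return rows


x = [3.0, 2.0]
J = jacobian_from_vjp(vjp_f, x, m=2)   # two reverse passes for two outputs
gradient = vjp_s(x, [1.0])             # one reverse pass, seeded with [1.0]
print(J)         # [[2.0, 3.0], [1.0, -1.0]]
print(gradient)  # [12.0, 9.0]
```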
Thanks to David Parks for the valid contribution and useful links. However, I have found an answer to this question by the author of the book himself, which may be more concise: