How to convert deep learning gradient descent equations to Python code

Posted 2020-08-02 17:00

I've been following an online tutorial on deep learning. It has a practical exercise on gradient descent and cost calculations, and I have been struggling to reproduce the given answers after converting the equations to Python code. I hope you can kindly help me get the correct answer.

Please see the following link for the equations used in the calculations: Click here to see the equations used for the calculations
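For reference, these are the standard logistic regression forward and backward propagation equations (reconstructed here from the answers below, since the linked image is not available):

$$A = \sigma(w^T X + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{(i)}\right) \right]$$

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T, \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(a^{(i)} - y^{(i)}\right)$$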

Following is the function given to calculate the gradient descent, cost, etc. The values need to be computed without using for loops, using matrix operations instead.

import numpy as np

def propagate(w, b, X, Y):
    """
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size
         (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """

    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A =                                      # compute activation
    cost =                                   # compute cost
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = 
    db = 
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost

Following are the data given to test the above function

w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

Following is the expected output of the above

Expected Output:

dw   = [[ 0.99993216] [ 1.99980262]]
db   = 0.499935230625
cost = 6.000064773192205

For the above propagate function I have used the replacements below, but the output is not what is expected. Please kindly help me understand how to get the expected output.

A = sigmoid(X)
cost = -1*((np.sum(np.dot(Y,np.log(A))+np.dot((1-Y),(np.log(1-A))),axis=0))/m)
dw = (np.dot(X,((A-Y).T)))/m
db = np.sum((A-Y),axis=0)/m

Following is the sigmoid function used to calculate the Activation:

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-z))
    ### END CODE HERE ###

    return s

I hope someone can help me understand how to solve this, as I can't continue with the rest of the tutorial without understanding it. Many thanks.

3 Answers
Animai°情兽
Answered 2020-08-02 17:44

You can calculate A, cost, dw, and db as follows:

A = sigmoid(np.dot(w.T,X) + b)     
cost = -1 / m * np.sum(Y*np.log(A)+(1-Y)*np.log(1-A)) 

dw = 1/m * np.dot(X,(A-Y).T)
db = 1/m * np.sum(A-Y)

where sigmoid is:

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))    
    return s
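
Putting those four lines back into the propagate skeleton from the question gives a complete, runnable version (a minimal sketch, assuming the same test data as in the question) that reproduces the expected output:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                                  # activations, shape (1, m)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # negative log-likelihood
    dw = 1 / m * np.dot(X, (A - Y).T)                                # gradient w.r.t. w, same shape as w
    db = 1 / m * np.sum(A - Y)                                       # gradient w.r.t. b, a scalar
    return {"dw": dw, "db": db}, np.squeeze(cost)

w, b, X, Y = np.array([[1], [2]]), 2, np.array([[1, 2], [3, 4]]), np.array([[1, 0]])
grads, cost = propagate(w, b, X, Y)
print("dw =", grads["dw"])    # [[0.99993216] [1.99980262]]
print("db =", grads["db"])    # 0.49993523...
print("cost =", cost)         # 6.00006477...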
对你真心纯属浪费
Answered 2020-08-02 17:55

After going through the code and the notes a few times, I was finally able to figure out the error.

First, Z needs to be calculated and then passed to the sigmoid function, instead of X.

The formula is Z = wᵀX + b, so in Python this is calculated as below:

Z=np.dot(w.T,X)+b

Then calculate A by passing Z to the sigmoid function:

A = sigmoid(Z)

Then dw can be calculated as below:

dw=np.dot(X,(A-Y).T)/m

The other variables, cost and the derivative of b, are calculated as follows:

cost = -1*((np.sum((Y*np.log(A))+((1-Y)*(np.log(1-A))),axis=1))/m) 
db = np.sum((A-Y),axis=1)/m
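
Note that summing with axis=1 keeps one axis around, so cost and db come out as 1-element arrays rather than plain scalars; the np.squeeze(cost) call in the skeleton collapses cost back to a scalar, and the db assertion still passes because the array's dtype is float. A minimal sketch of that difference (the values here are made up for illustration):

import numpy as np

diff = np.array([[-0.00012, 0.99999]])   # hypothetical (A - Y) row vector

db_axis = np.sum(diff, axis=1) / 2       # array([0.499935]) -- 1-element array
db_scalar = np.sum(diff) / 2             # 0.499935        -- plain scalar

cost_axis = np.array([6.000064])         # what the axis=1 form of the cost returns
print(np.squeeze(cost_axis).shape)       # () -- squeeze collapses it to a 0-d scalar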
小情绪 Triste *
Answered 2020-08-02 17:59
def sigmoid(x):
    # You have it right
    return 1 / (1 + np.exp(-x))

def derivSigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

error = targetSample - output

# Make sure to keep the sigmoided value around. For instance, an output that
# has already been sigmoided can be used to get the sigmoid derivative faster
# (output = sigmoid(x)):
dOutput = output * (1 - output)

Looks like you're already working on the backprop. Just thought I'd help simplify some of the forward prop for you.
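
A minimal sketch of that shortcut, with made-up sample inputs, showing that reusing the already-sigmoided output gives the same derivative without recomputing the sigmoid:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])          # hypothetical pre-activation values
output = sigmoid(z)

slow = sigmoid(z) * (1 - sigmoid(z))    # recomputes the sigmoid twice
fast = output * (1 - output)            # reuses the stored activation

print(np.allclose(slow, fast))          # True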
