I'm trying to write some very simple LMS batch gradient descent, but I believe I'm doing something wrong with the gradient. The ratio between the gradient's order of magnitude and the initial value is very different for the different elements of theta, so either theta[2] doesn't move (e.g. if alpha = 1e-8) or theta[1] shoots off (e.g. if alpha = .01).
import numpy as np

y = np.array([[400], [330], [369], [232], [540]])
x = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]])
# Prepend a column of ones for the intercept term.
x = np.concatenate((np.ones((5, 1), dtype=int), x), axis=1)
theta = np.array([[0.], [.1], [50.]])
alpha = .01

for i in range(1, 1000):
    h = np.dot(x, theta)
    # Batch gradient: sum (h - y) * x over the samples, one entry per parameter.
    gradient = np.sum((h - y) * x, axis=0, keepdims=True).transpose()
    theta -= alpha * gradient
    print(((h - y) ** 2).sum(), theta.squeeze().tolist())
The algorithm as written is completely correct, but without feature scaling, convergence will be extremely slow as one feature will govern the gradient calculation.
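To see why, compare the scale of the columns of x. A quick check, just as an illustration, reusing the x built in your snippet:

import numpy as np

x = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]])
x = np.concatenate((np.ones((5, 1), dtype=int), x), axis=1)

# Column sums: the intercept column sums to 5, the second column to 10520,
# the third to 15, so the middle feature dwarfs the others in (h - y) * x.
print(np.abs(x).sum(axis=0))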
You can perform the scaling in various ways; for now, let's just scale the features by their L^1 norms, because it's simple: divide each column of x by its sum, so that every column of x sums to 1. If you want to retain your good guess at the correct parameters, those have to be scaled accordingly. With this we can proceed as you did in your original post, where again you will have to play around with the learning rate until you find a sweet spot.
Let's see what we get now that we've found something that seems to converge. Again, the fitted parameters have to be scaled back to relate to the original, unscaled features.
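For instance, something along these lines, where alpha = 1. and the iteration count are only illustrative guesses, so expect to tune them:

import numpy as np

y = np.array([[400.], [330.], [369.], [232.], [540.]])
x = np.array([[2104., 3.], [1600., 3.], [2400., 3.], [1416., 2.], [3000., 4.]])
x = np.concatenate((np.ones((5, 1)), x), axis=1)
theta = np.array([[0.], [.1], [50.]])

# Scale columns to unit L^1 norm, and scale the initial guess to match (as above).
col_norms = np.abs(x).sum(axis=0, keepdims=True)
scaled_x = x / col_norms
scaled_theta = theta * col_norms.transpose()

alpha = 1.   # just a starting guess; adjust as needed
for i in range(10000):
    h = np.dot(scaled_x, scaled_theta)
    gradient = np.sum((h - y) * scaled_x, axis=0, keepdims=True).transpose()
    scaled_theta -= alpha * gradient

# Map the fitted parameters back to the original, unscaled features.
theta = scaled_theta / col_norms.transpose()
print(((np.dot(x, theta) - y) ** 2).sum(), theta.squeeze().tolist())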
At this point, let's cheat and check our results.
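One way to do that is to solve the least-squares problem directly with np.linalg.lstsq and compare it with what the descent converges to (treat this as a sanity check, not part of the algorithm):

import numpy as np

y = np.array([[400.], [330.], [369.], [232.], [540.]])
x = np.array([[2104., 3.], [1600., 3.], [2400., 3.], [1416., 2.], [3000., 4.]])
x = np.concatenate((np.ones((5, 1)), x), axis=1)

# Closed-form least-squares fit; gradient descent should approach this solution.
theta_exact, residuals, rank, singular_values = np.linalg.lstsq(x, y, rcond=None)
print(theta_exact.squeeze().tolist())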
A general introductory reference on feature scaling is this Stanford lecture.