Logistic regression in SageMaker

Posted 2019-04-13 04:47

Question:

I am using AWS SageMaker for logistic regression. To validate the model on test data, I use the following code:

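import json
import boto3
import numpy as np

# np2csv, test_X and linear_endpoint are defined elsewhere (not shown here).
runtime = boto3.client('runtime.sagemaker')

# Serialize the test features to CSV and send them to the endpoint.
payload = np2csv(test_X)
response = runtime.invoke_endpoint(EndpointName=linear_endpoint,
                                   ContentType='text/csv',
                                   Body=payload)
result = json.loads(response['Body'].read().decode())
test_pred = np.array([r['score'] for r in result['predictions']])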

The result contains the predicted values and the probability scores. I want to know how to get a prediction based on two specific features only. For example, the model has 30 features and was trained on all of them. For my prediction, I want to know the outcome when feature1='x' and feature2='y'. But when I filter the data down to those columns and pass it through the same code, I get the following error:

Customer Error: The feature dimension of the input: 4 does not match the feature dimension of the model: 30. Please fix the input and try again.

What is the AWS SageMaker equivalent of, say, glm.predict('feature1','feature2') in R?

Answer 1:

When you train a regression model on data, you're learning a mapping from the input features to the response variable. You then use that mapping to make predictions by feeding new input features to the model.

If you trained a model on 30 features, it's not possible to use that same model to predict with only 2 of the features. You would have to supply values for the other 28 features.
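One way to ask "what does the model predict when feature1 = x and feature2 = y" is to build a full 30-column row: set the two features of interest and fill the remaining 28 with a baseline such as the training-set mean. The following is a minimal sketch, assuming numeric features. The column positions, the example values, and the helper names build_query_row and to_csv are hypothetical, and mean-filling is only one possible choice of baseline.

```python
import numpy as np

def build_query_row(train_X, overrides):
    """Return one full-width row: column means, with chosen columns overridden.

    train_X   -- 2-D array of training features, shape (n_samples, n_features)
    overrides -- dict mapping column index -> value to set (hypothetical)
    """
    row = train_X.mean(axis=0).astype(float)
    for col, value in overrides.items():
        row[col] = value
    return row.reshape(1, -1)  # one row with all n_features columns

def to_csv(arr):
    """Serialize a 2-D array to CSV text, one line per row."""
    return "\n".join(",".join(repr(float(v)) for v in r) for r in arr)

# Stand-in for the real training matrix with 30 features.
rng = np.random.default_rng(0)
train_X = rng.normal(size=(100, 30))

# Assumption: feature1 is column 0 and feature2 is column 1.
query = build_query_row(train_X, {0: 1.5, 1: -0.2})
payload = to_csv(query)
print(query.shape)  # (1, 30)
```

The resulting payload has the 30 columns the model expects, so it can be sent as the Body of invoke_endpoint with ContentType='text/csv', as in the question.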

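If you just want to know how those two features affect the predictions, then you can look at the weights (a.k.a. 'parameters' or 'coefficients') of your trained model. If the weight for feature 1 is x, then the model's linear output increases by x when feature 1 increases by 1.0 and the other features are held fixed. For logistic regression that linear output is the log-odds, so the change in the predicted probability depends on where the input sits on the sigmoid curve.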
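To make the effect of a single weight concrete, here is a small sketch with made-up numbers (the weight and the baseline linear score are hypothetical, not taken from any trained model). It shows that raising a feature by 1.0 multiplies the odds by exp(weight), while the change in probability depends on the starting point.

```python
import math

def sigmoid(z):
    """Map a linear score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical numbers, for illustration only.
weight = 0.8       # weight of the feature being varied
base_logit = -1.0  # linear score from the bias and all other features, held fixed

p_before = sigmoid(base_logit)          # feature at its current value
p_after = sigmoid(base_logit + weight)  # same feature increased by 1.0

odds_before = p_before / (1.0 - p_before)
odds_after = p_after / (1.0 - p_after)

print(odds_after / odds_before)  # odds ratio, equal to exp(weight)
print(p_after - p_before)        # probability change, depends on base_logit
```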

To view the weights of a model trained with the Linear Learner algorithm in Amazon SageMaker, you can download the model.tar.gz artifact and open it locally. The model artifact can be downloaded from the S3 location you specified in the output_path argument to the sagemaker.estimator.Estimator constructor.

import tarfile
import zipfile

import boto3
import mxnet as mx

bucket = "<your_bucket>"
key = "<your_model_prefix>"  # S3 key of the model.tar.gz object
boto3.resource('s3').Bucket(bucket).download_file(key, 'model.tar.gz')

# Extract the SageMaker model archive.
with tarfile.open('model.tar.gz', 'r:gz') as tar:
    tar.extractall()

# The linear learner model is itself a zip file containing an MXNet model and other metadata.
# Unzip it first.
with zipfile.ZipFile('model_algo-1') as zf:
    zf.extractall()

# Load the MXNet module from the "mx-mod" checkpoint, epoch 0.
mod = mx.module.Module.load("mx-mod", 0)

# Model weights (read from a private attribute of the module)
weights = mod._arg_params['fc0_weight'].asnumpy().flatten()

# Model bias
bias = mod._arg_params['fc0_bias'].asnumpy().flatten()

# Weight for the first feature
print(weights[0])
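With the weights and bias in hand, the endpoint's scores can in principle be reproduced locally. The sketch below assumes a binary classifier with logistic loss whose weights apply directly to the input features in their training column order. That is an assumption about the trained model, not something the artifact guarantees, so compare against the endpoint before relying on it. The arrays here are random stand-ins, and predict_scores is a hypothetical helper.

```python
import numpy as np

def predict_scores(X, weights, bias):
    """Logistic-regression scores: sigmoid(X @ weights + bias)."""
    logits = X @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))

# Random stand-ins for the arrays extracted from the model artifact.
rng = np.random.default_rng(0)
weights = rng.normal(size=30)
bias = np.array([0.1])

# Stand-in for test data with the same 30 columns used in training.
test_X = rng.normal(size=(5, 30))

scores = predict_scores(test_X, weights, bias)
labels = (scores > 0.5).astype(int)  # 0.5 is an assumed decision threshold
print(scores.shape)  # (5,)
```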