Python - Subtract a number of samples from a given

Posted 2019-09-20 17:39

Question:

I have a dict of length 5 called "mat_contents". The data is stored under "traindata" and the corresponding labels under "trainlabels". I want to extract a given number of samples that have a given label value, for instance 60 samples (out of 80) from "traindata" whose entry in "trainlabels" equals 1. I have seen some examples here, but they differ from what I need.

Take this as an example input:

traindata   trainlabels
a           1
b           2
c           2
d           1
e           1
f           2

If I want to extract two random samples of traindata with a trainlabels value of 2, the result could be:

   b  
   f  

Answer 1:

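import numpy as np
# Assumes mat_contents maps each sample to its label, e.g. {"a": 1, "b": 2}
labels = [k for k, v in mat_contents.items() if v == 1]
result = np.random.choice(labels, 2, replace=False)

The list comprehension collects the keys of your dictionary whose value (the label) equals 1, and np.random.choice then picks a random subset of 2 of those keys without replacement. Note that this only works if the dictionary maps samples directly to labels; if "traindata" and "trainlabels" are two separate arrays stored inside the dict, you have to filter by index instead.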
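For the layout described in the question, where "traindata" and "trainlabels" are separate entries of mat_contents, filtering by index could look like the sketch below. It assumes both entries are array-likes aligned by position; the helper name sample_by_label and the stand-in dictionary are made up for illustration and are not part of the original answer.

```python
import numpy as np

# Hypothetical stand-in for mat_contents, built from the example in the question.
mat_contents = {
    "traindata": np.array(["a", "b", "c", "d", "e", "f"]),
    "trainlabels": np.array([1, 2, 2, 1, 1, 2]),
}

def sample_by_label(data, labels, label, n, seed=None):
    """Return n entries of data, drawn without replacement, whose label equals `label`."""
    labels = np.ravel(labels)                      # flatten an (N, 1) label column to (N,)
    candidates = np.flatnonzero(labels == label)   # positions of the matching samples
    rng = np.random.default_rng(seed)
    # Raises ValueError if fewer than n samples carry this label.
    chosen = rng.choice(candidates, size=n, replace=False)
    return np.asarray(data)[chosen]

picked = sample_by_label(
    mat_contents["traindata"], mat_contents["trainlabels"], label=2, n=2, seed=0
)
print(picked)  # two distinct values out of b, c, f
```

For the case in the question you would call it with label=1 and n=60. The seed argument is optional and only makes the draw repeatable.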



Answer 2:

Could you use a pandas DataFrame for this? See: Pandas Dataframe Sampling. This is an example that I have used in the past:

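    import pandas as pd

    keeping = 0.8  # fraction of each class to keep
    source = "/path/to/some/file"

    # pd.DataFrame() cannot load a file path. read_csv is an assumption here:
    # it expects a CSV file with a "trainlabels" column, so use the reader
    # that matches your actual file format.
    df = pd.read_csv(source)

    # Randomly keep 80% of the rows of each label
    ones = df[df.trainlabels == 1].sample(frac=keeping)
    twos = df[df.trainlabels == 2].sample(frac=keeping)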
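To draw a fixed number of rows instead of a fraction, sample also accepts n. The sketch below applies this to the example table from the question; building the DataFrame directly from lists and the column names are assumptions made for illustration, and groupby(...).sample requires pandas 1.1 or newer.

```python
import pandas as pd

# Example table from the question; column names are assumed to match the dict keys.
df = pd.DataFrame({
    "traindata": ["a", "b", "c", "d", "e", "f"],
    "trainlabels": [1, 2, 2, 1, 1, 2],
})

# Draw exactly two rows with label 2 (random_state only makes the draw repeatable).
twos = df[df["trainlabels"] == 2].sample(n=2, random_state=0)
print(twos["traindata"].tolist())

# Alternatively, draw the same number of rows from every label in one call.
per_label = df.groupby("trainlabels").sample(n=2, random_state=0)
print(per_label)
```

sample draws without replacement by default, so the selected rows are always distinct.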