Replace value with the average of it's column

2019-07-21 06:15发布

I have an excel sheet with over 1000 columns and 11000 rows - all with numeric data. Within the data, there are missing values represented with '*'.

I would like to replace all of the '*' values with the average of the column that it is in.

Doing this manually would take a long time, so is there a formula that would achieve this?

Thanks so much in advanced for any help.

2条回答
小情绪 Triste *
2楼-- · 2019-07-21 06:40

As you have mentioned machine learning I thought I would introduce you to how you could do this with Azure Machine Learning Studio (AML) using a free account.

By using AML you gain access to a number of methods for replacing missing values which are extremely quick. AML has a Clean Missing Data module which exposes methods of replacement such as Multivariate Imputation using Chained Equation, Mean, Median and several others. The great thing here is you can visualize the dataset columns by right clicking on the dataset and see which columns have skew. You can then select on a column by column basis which replacement method to use. If you have heavily skewed columns you might use median instead for instance. This also offers great opportunities for data normalization (scale and reduce). You also gain access to using Python and R with your dataset.

I don't know if there is a method for directly treating "*" as missing values, I am trying to find that out, but if you do a little processing in advance of load then all is fine. The step before loading requires:

  1. Export the sheet as a CSV and save it.
  2. Use Ctrl+ F to bring up the find and replace dialog and enter "~*" for Find and leave Replace blank

Then login into AML and click the + New at the bottom of the screen

Select New > DATASET > FROM LOCAL FILE and select your file

When selecting type ensure to select CSV with no header if you data has no header row or with header if it does:

Your dataset will start uploading as shown by progress bar at bottom of screen and then appear in the SAVED DATASETS collection.

Click the + New button again and select EXPERIMENT > BLANK EXPERIMENT

Drag and drop your saved dataset onto the canvas on the right:

In the Search experiment items box on the right, type: Clean Missing Data then drag the module that appears onto the canvas

Join the 2 boxes by clicking the dot at the bottom of the top box and dragging to the other box

Select the bottom box and then input the following parameters on the right (here is where you can choose which method to apply for missing values e.g. replace missing with mean, or perhaps median if your column data is skewed.

Right click the bottom module and select Run selected

Right click again and select Cleaned dataset > Save as Dataset

The progress bar at the bottom will inform you when complete

Type in the Search experiment items box again: convert to csv and drag that onto the canvas and connect the left hand side bottom of the second module to the top of the newly added third:

Select the bottom module and right click > Run selected

Wait for the progress bar to complete.

Right-click the bottom module and hit Download. Done.

查看更多
老娘就宠你
3楼-- · 2019-07-21 06:42

I can give you a three sheet solution Sam?:

Sheet 2: Cell A1=

=AVERAGE(Sheet1!A:A)

Paste that along the top row for each of 1000 columns in sheet 2.

Sheet 3: Cell A1=

=IF(Sheet1!A1="*",Sheet2!A$1,Sheet1!A1)

Copy that and then paste it into the entire worksheet 3 (i.e., that top left corner symbol that allows you to do that). It's gonna take a while to update but will deliver what you want!

查看更多
登录 后发表回答