OpenRefine - Fill between cells but not at the end

I have a list of stock prices for several stocks. Some of the values are missing due to weekends, holidays and probably other reasons.

The gaps are not consistent. Some are two days and some are more than that.

I want to fill the gaps with the last known value but not at the end of the list.

In Excel I tried testing a few cells below the current one and doing the fill only if one of them is not empty. The problem is that, because the gaps are inconsistent, changing the function to cover every case is tedious.

Is there a way to test for the end of a list?

UPDATE - added a screenshot.

See this screenshot. I want to fill where the blue dots are. The red dots are at the end of the list and I don't want to fill those cells.

I am looking for a way to detect the end of the list and stop the filling when the end is detected.

Tags: openrefine

3 Answers

看我几分像从前

Reply #2 · 2019-08-11 10:49

I am writing this off the top of my head, but I think your best chance may be using the fill down option in record mode:

First, move your column to the first position and switch to record mode.
Then use the following GREL: row.record.cells["data"].value[-1], where data is the name of your column.

The [-1] will take the last value and fill the blank. For the cells marked with red dots, there is no value, so they should remain empty. Let us know how it goes.

Root（大扎）

Reply #3 · 2019-08-11 10:56

I think this is pretty difficult in OpenRefine and probably a different tool would work better. The main issue is that OpenRefine does not offer the ability to easily work across rows so 'summing a column' (or part of a column) is tricky - this is mentioned in https://github.com/OpenRefine/OpenRefine/issues/200

However, you can do this by forcing OpenRefine into Record mode with the whole project containing a single record. Once you've done this you can access all values in a column using syntax like:

row.record.cells["Column name"].value

This gives an array of all the non-blank values in the column. Since this ignores blank values, in order to have a true view of the values in the column you have to fill in blank cells with a value.

So I think you could probably achieve what you want as follows:

For each column you are going to work with do a cell transform to put a dummy value in empty cells - e.g. if(isBlank(value),"null",value)
Create a new column at the start of your project and put a single value in the very first cell in that column
Switch to Record mode

At this point you should have a single 'Record' in your project.

You can now access all cells in a column using syntax like row.record.cells["Column 1"].value. You can combine this with 'forRange' to iterate through the contents of this array, using the row.index as the marker for the current row.

I used the following formula to add a new column to the project:

with(row.record.cells["Column 1"].value,w,if(forRange(row.index,w.length(),1,i,w[i].toNumber()).sum()>0,"a","b"))
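To make the formula easier to follow, here is a rough Python sketch of the same idea, outside OpenRefine. The function name `flag_rows` and the sample values are made up for illustration, and the sketch simply treats the "null" placeholder as 0 rather than reproducing how GREL's toNumber() handles it. For each row it sums the numeric values from that row to the end of the column: a positive sum means a real value still exists at or below the row ("a"), otherwise the row belongs to the trailing run of blanks ("b").

```python
def flag_rows(values, placeholder="null"):
    """Return "a" for rows that still have a real value at or below them, else "b"."""
    flags = []
    for index in range(len(values)):
        # Mirrors forRange(row.index, w.length(), 1, ...): this row and every row after it.
        remaining = values[index:]
        # Assumption: placeholder cells count as 0 and real cells are positive prices.
        total = sum(0.0 if v == placeholder else float(v) for v in remaining)
        flags.append("a" if total > 0 else "b")
    return flags


# Hypothetical column after the placeholder step.
column = ["10.5", "null", "null", "11.0", "null", "null"]
print(flag_rows(column))
```

With this sample the first four rows are flagged "a" and the last two "b", so only the gap between the two prices would be filled.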

Then...

Change back to 'Row' mode
Remove the 'null' placeholder from the original column

Create a facet on the 'fill filter' column

In my case I filter to 'a'
Use the 'fill down' option
Remove the filter
And remove the 'record' column
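For comparison, and if going outside OpenRefine is acceptable, the net effect of these steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `fill_interior_gaps` is hypothetical, and blanks are represented as None. It locates the last non-blank cell, treats everything after it as the end of the list, and fills down only before that point.

```python
def fill_interior_gaps(values):
    """Fill blanks with the last known value, leaving trailing blanks untouched."""
    # Index of the last non-blank cell; everything after it is the "end of the list".
    last_known = max((i for i, v in enumerate(values) if v is not None), default=-1)
    filled = list(values)
    previous = None
    for i in range(last_known + 1):
        if filled[i] is None:
            # Stays None for leading blanks that come before any known value.
            filled[i] = previous
        else:
            previous = filled[i]
    return filled


# Hypothetical price column with gaps in the middle and at the end.
prices = [10.5, None, None, 11.0, None, 12.25, None, None]
print(fill_interior_gaps(prices))
# [10.5, 10.5, 10.5, 11.0, 11.0, 12.25, None, None]
```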

Rather a long-winded way of doing it, to say the least, but so far I've not been able to find anything better without going outside OpenRefine. I'm guessing you could probably compress steps 5-11 into a single step or a smaller number of steps.

If you want to access the array of cell values using Jython as suggested by iMitwe you need to use:

row["record"]["cells"]["Column 1"]["value"]

instead of

row.record.cells["Column 1"].value

(step 5)


在下西门庆

Reply #4 · 2019-08-11 10:57

Unless I am missing something, I would simply sort the Date column in reverse (date ascending) and then use Fill Down on each column individually. For the last column, first add a Date facet on the Date column to select the exact date range you want to work with, fill down within that range, and then remove the facet.
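The date-range idea can be illustrated with a small Python sketch. It is not OpenRefine code: the function name `fill_down_until` and the sample rows are hypothetical, rows are assumed to be sorted by date ascending, and the cutoff date stands in for the range selected with the Date facet.

```python
from datetime import date


def fill_down_until(rows, cutoff):
    """Fill blank prices downward, but only in rows dated on or before `cutoff`."""
    filled = []
    previous = None
    for day, price in rows:
        if price is None and day <= cutoff:
            # Inside the selected range: reuse the last known price.
            price = previous
        elif price is not None:
            previous = price
        filled.append((day, price))
    return filled


# Hypothetical rows: (date, price), with None for a missing price.
rows = [
    (date(2019, 8, 1), 20.0),
    (date(2019, 8, 2), None),
    (date(2019, 8, 5), 21.5),
    (date(2019, 8, 6), None),
    (date(2019, 8, 7), None),
]
# The cutoff plays the role of the date facet: the last date that has a real value.
cutoff = max(day for day, price in rows if price is not None)
result = fill_down_until(rows, cutoff)
print(result)
```

Here the blank on 2019-08-02 is filled with 20.0, while the two rows after 2019-08-05 stay empty because they fall outside the selected range.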
