
dc.js and crossfilter reduce average counts per da

Posted 2019-03-31 09:25

Question:

I am struggling to get my crossfilter groups set up right. Maybe someone can drop a hint!

My data structure looks roughly like this:

{datetime: "2014-01-01 20:00:00", id:1}
{datetime: "2014-01-01 22:21:08", id:2}
{datetime: "2014-01-02 12:00:23", id:3} etc...

The dimension uses datetime to return the day of the week:

var weekdayDimension = ndx.dimension(function(d) {
    return new Date(d.datetime).getDay();
});

Now I have problems with the grouping. I want the average count of events per weekday. So far I have this (which is of course not correct):

var weekdayAvgGroup = weekdayDimension.group(function (d) {
    return d;
});

I think I do not understand what that grouping is doing exactly...

My goal is to have some chart like:

Monday => Average 40.3 Events
Tuesday => Average 35.4 Events

I created a JSFiddle; please take a look.

Can anybody drop a hint please?

UPDATE:

After some more thought, I could create a dimension on the date. Then all I would need is the number of days selected, in order to calculate:

(total number of events selected / number of selected days)

So I would need to count the number of groups on the date dimension, but I haven't found a solution for that either.

Thank you

Answer 1:

The annotated stock example shows how to do averages: http://dc-js.github.io/dc.js/docs/stock.html

Basically you will use a custom reduce function, maintain a count and a sum, and divide the sum by the count (if the count is greater than zero) to get an average.
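As a rough illustration of that count/sum approach, here is a minimal sketch of the three reducers. The field name `value` and the sample records are hypothetical, and the last few lines only simulate by hand what crossfilter does for a single bin:

```javascript
// Sketch of count/sum/average reducers. The `value` field is hypothetical.
function reduceAdd(p, v) {
    p.count += 1;
    p.sum += v.value;
    p.avg = p.count > 0 ? p.sum / p.count : 0;
    return p;
}
function reduceRemove(p, v) {
    p.count -= 1;
    p.sum -= v.value;
    // Guard against dividing by zero once the bin is empty.
    p.avg = p.count > 0 ? p.sum / p.count : 0;
    return p;
}
function reduceInitial() {
    return {count: 0, sum: 0, avg: 0};
}

// Simulate one bin by hand: add three records, then remove one.
var records = [{value: 10}, {value: 20}, {value: 60}];
var bin = records.reduce(reduceAdd, reduceInitial());
var afterAdd = bin.avg;      // (10 + 20 + 60) / 3
bin = reduceRemove(bin, records[2]);
var afterRemove = bin.avg;   // (10 + 20) / 2
```

With crossfilter, the three functions would be passed as dimension.group().reduce(reduceAdd, reduceRemove, reduceInitial), and the chart would read the avg field through its valueAccessor.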

Reductio also makes this pretty easy: https://github.com/esjewett/reductio

EDIT: Looking back on this, I notice you mean the average of the aggregated counts, across the unique dates for each day of the week.

I know it's too late, but since we get a fair number of these "second-level aggregation" questions, I thought I'd answer this one, in case it helps someone else.

So, our results should bin the data on the day of the week, so we'll set up our dimension and group accordingly:

// dimension on day of week (each record here is an array with a Date at index 0)
var dim1 = ndx.dimension(function(d) {
    return d[0].getDay();
});
// group on day of week
var grp1 = dim1.group().reduce(
    ... // what goes here?
);

But how do we do the second-level aggregation? Crossfilter already gives us all the entries for each day of the week, efficiently. What we still need to do is count the entries per unique date, then average those counts.

We can use d3.map for this. We'll first use d3.time.day to remove the time-of-day info (this is the D3 v3 name; from v4 on it is d3.timeDay), then use .getTime() to get an integer we can index on. Within each weekday bin, the d3.map then keeps one count per unique date, i.e. one entry for each individual Monday, each individual Tuesday, and so on:

var grp1 = dim1.group().reduce(
    function(p, v) { // add
        var day = d3.time.day(v[0]).getTime();
        p.map.set(day, p.map.has(day) ? p.map.get(day) + 1 : 1);
        p.avg = average_map(p.map);
        return p;
    },
    function(p, v) { // remove
        var day = d3.time.day(v[0]).getTime();
        p.map.set(day, p.map.has(day) ? p.map.get(day) - 1 : 0);
        p.avg = average_map(p.map);
        return p;
    },
    function() { // init
        return {map: d3.map(), avg: 0};
    }
);

Finally, we'll compute the average of all bins in the d3.map with this function:

function average_map(m) {
    var sum = 0;
    m.forEach(function(k, v) {
        sum += v;
    });
    return m.size() ? sum / m.size() : 0;
}

Walking the d3.map every time a record is added or removed may be inefficient, so the call to average_map could instead be moved into the valueAccessor we'll use in the chart. I'll leave that as an exercise.
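One way that exercise might look is sketched below. This is not the code from the fiddles: a plain Map and Date arithmetic stand in for d3.map and d3.time.day so the snippet runs without D3, and the names dayKey, addDay, removeDay, initDays and averageAccessor are made up for illustration. The reducers only maintain the per-date counts, and the average is computed when the value is read:

```javascript
// Sketch only: Map and Date stand in for d3.map and d3.time.day.
// Records are arrays with a Date at index 0, as in the reducers above.
function dayKey(date) {
    // Local midnight of the record's date, as an integer key.
    return new Date(date.getFullYear(), date.getMonth(), date.getDate()).getTime();
}
function addDay(p, v) {
    var day = dayKey(v[0]);
    p.set(day, p.has(day) ? p.get(day) + 1 : 1);
    return p;
}
function removeDay(p, v) {
    var day = dayKey(v[0]);
    // As above, an emptied date stays in the map with a count of 0.
    p.set(day, p.has(day) ? p.get(day) - 1 : 0);
    return p;
}
function initDays() {
    return new Map();
}
// Hypothetical accessor: averages only when the chart asks for the value.
function averageAccessor(kv) {
    var sum = 0;
    kv.value.forEach(function(count) { sum += count; });
    return kv.value.size ? sum / kv.value.size : 0;
}

// Simulate one weekday bin: three events on one Wednesday, one on the next.
var rows = [
    [new Date(2014, 0, 1, 20, 0, 0)],
    [new Date(2014, 0, 1, 22, 21, 8)],
    [new Date(2014, 0, 1, 23, 0, 0)],
    [new Date(2014, 0, 8, 12, 0, 23)]
];
var dayBin = rows.reduce(addDay, initDays());
var wednesdayAvg = averageAccessor({key: 3, value: dayBin}); // (3 + 1) / 2
```

In dc.js the accessor would be attached with chart.valueAccessor(averageAccessor). The group's value is then the map itself rather than an object with an avg field.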

Here is a fiddle demonstrating the technique: http://jsfiddle.net/gordonwoodhull/0woyhg3n/11/

And applied to the original fiddle: http://jsfiddle.net/gordonwoodhull/pkh03azq/6/