
Strategies to reduce DOM elements of large data sets

Posted 2019-04-11 15:33

Question:

I have a large dataset that I want to display using dc.js. The number of entries far exceeds the available drawing space in pixels on the screen, so it does not make sense to render 20k points on a 500px-wide chart, and doing so also slows down the browser.

I read the Performance tweaks section of the wiki and thought of some other things:

  • Aggregating groups using crossfilter (e.g. chunk the dataset into 500 groups if I have a 500px-wide svg; see the sketch after this list)
  • Simplifying my data using the Douglas–Peucker or Visvalingam algorithm
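
For the first idea, a minimal sketch of what I mean (the names `data`, `ndx`, and the field `x` are placeholders, not from my actual code): quantize each value into one bucket per pixel and group on the bucket's left edge.

    // assumption: numeric x values; quantize into ~500 buckets (one per pixel)
    var ndx = crossfilter(data);
    var xDim = ndx.dimension(function(d) { return d.x; });
    var xExtent = d3.extent(data, function(d) { return d.x; }),
        bucketWidth = (xExtent[1] - xExtent[0]) / 500;
    // group key = left edge of the bucket the value falls into
    var xGroup = xDim.group(function(x) {
        return xExtent[0] + Math.floor((x - xExtent[0]) / bucketWidth) * bucketWidth;
    });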

dc.js offers a neat rangeChart that can be used for range selection, which I want to use.
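
For reference, the focus/range wiring I have in mind looks roughly like this (a sketch only; the chart ids and the `dimension`, `group`, and `fullDomain` variables are placeholders):

    // assumption: a time-based line chart zoomed by brushing a bar chart below it
    var focusChart = dc.lineChart('#focus');
    var rangeChart = dc.barChart('#range');
    focusChart
        .dimension(dimension)
        .group(group)
        .x(d3.scaleTime().domain(fullDomain))
        .brushOn(false)
        .rangeChart(rangeChart); // brushing the range chart zooms the focus chart
    rangeChart
        .dimension(dimension)
        .group(group)
        .x(d3.scaleTime().domain(fullDomain));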

The more I zoom in on the rangeChart, the more detail I want to show. But I don't know how to get the zoom level of the chart and aggregate a group 'on the fly'. Perhaps someone has a thought about this.

I created a codepen as an example.

Answer 1:

This comes up a lot so I've added a focus dynamic interval example.

It's a refinement of the same techniques in the switching time intervals example, except here we determine which d3 time interval to use based on the extent of the brush in the range chart.

Unfortunately I don't have time to tune it right now, so let's iterate on this. IMO it's almost but not quite fast enough - it could sample even fewer points, but I used the built-in time intervals. When you see a jaggy line in the dc line chart, it's usually because you are displaying too many points - there should be dozens, not hundreds, and never thousands.

The idea is to spawn different groups for different time intervals. Here we'll define a few intervals and the threshold, in milliseconds, at which we should use that interval:

    var groups_by_min_interval = [
        {
            // use minute resolution when the brush extent is over an hour
            name: 'minutes',
            threshold: 60*60*1000,
            interval: d3.timeMinute
        }, {
            // use second resolution when the extent is over a minute
            name: 'seconds',
            threshold: 60*1000,
            interval: d3.timeSecond
        }, {
            // fall back to raw milliseconds for anything smaller
            name: 'milliseconds',
            threshold: 0,
            interval: d3.timeMillisecond
        }
    ];

Again, there should be more here - since we will generate the groups dynamically and cache them, it's okay to have a bunch. (It will probably hog memory at some point, but gigabytes are OK in JS these days.)

When we need a group, we'll generate it by using the d3 interval as the crossfilter key function (calling an interval produces the floor of a date), and then reduce total and count:

    function make_group(interval) {
        // key each record by the floor of its timestamp at this interval,
        // reducing to a running count and total so we can average later
        return dimension.group(interval).reduce(
            function(p, v) { // add
                p.count++;
                p.total += v.value;
                return p;
            },
            function(p, v) { // remove
                p.count--;
                p.total -= v.value; // subtract, don't add, when removing
                return p;
            },
            function() { // initialize
                return {count: 0, total: 0};
            }
        );
    }

Accordingly we will tell the charts to compute the average in their valueAccessors:

    // guard against empty groups, which would otherwise yield NaN
    chart.valueAccessor(kv => kv.value.count ? kv.value.total / kv.value.count : 0)

Here's the fun part: when we need a group, we'll scan this list until we find the first spec whose threshold is less than the current extent in milliseconds:

    function choose_group(extent) {
        var d = extent[1].getTime() - extent[0].getTime();
        var found = groups_by_min_interval.find(mg => mg.threshold < d);
        console.log('interval ' + d + ' is more than ' + found.threshold + ' ms; choosing ' + found.name +
                    ' for ' + found.interval.range(extent[0], extent[1]).length + ' points');
        if(!found.group) // lazily create and cache the group for this interval
            found.group = make_group(found.interval);
        return found.group;
    }

Hook this up to the filtered event of the range chart:

    rangeChart.on('filtered.dynamic-interval', function(_, filter) {
        // fall back to the full time domain when the brush is cleared
        chart.group(choose_group(filter || fullDomain));
    });

I've run out of time for now. Please ask any questions, and we'll refine this further. We will need custom time intervals (like a 10th of a second), and I can't find that example right now, but there is a good way to do it.
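
One possibility for those custom intervals (my assumption, not part of the example above): d3's interval.every() produces a filtered interval that is still callable, so it can serve directly as a crossfilter group key. E.g. tenths of a second, with a 10-second threshold picked arbitrarily:

    // a 100ms interval; spliced in between 'seconds' and 'milliseconds'
    var tenthSecond = d3.timeMillisecond.every(100);
    groups_by_min_interval.splice(2, 0, {
        name: 'tenths of seconds',
        threshold: 10*1000, // use tenths when the extent exceeds 10 seconds
        interval: tenthSecond
    });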

Note: I have one-upped you and increased the number of points by an order of magnitude, to half a million. This may be too much for older computers, but on a 2017 computer it proves that data quantity is not the problem; DOM elements are.