I'm using dc.js, crossfilter.js and d3.js to generate a bar chart.
The barchart represents data for credit card transactions. It plots number of transactions (y-axis) over transaction dollar amount (x-axis).
It looks like this:
The data array basically looks like:
[
  ...
  {
    txn_id: 1,
    txn_amount: 20
  },
  ...
]
The data is highly variable depending on different merchants etc and I can't make any assumptions about distributions.
As you can see, this graph isn't all that useful because of the data itself. In this case there is 1 transaction for -$7500 and 2 at around $7500.
In between there are other amounts, but most transactions cluster around $0 - $100, where you can see the spike.
Unfortunately there is enough variance that you can't even see the bars for the less frequent transaction amounts.
This answer seems close, but not quite there.
What I'd really like to do is break the x-axis ticks into 10 reasonably-sized chunks that group the transaction amounts sensibly to make the graph more useful.
For example, let's say in this case the average transaction amount is $20, and the extreme min and max values are -$7500 and $7500.
So in this particular example I might like to have the x-axis chunked up like so:
Bin 1: -$1000 >= transaction amount
Bin 2: -$100 >= transaction amount > -$1000
Bin 3: -$50 >= transaction amount > -$100
Bin 4: $0 >= transaction amount > -$50
Bin 5: $15 >= transaction amount > $0
Bin 6: $25 >= transaction amount > $15
Bin 7: $40 >= transaction amount > $25
Bin 8: $100 >= transaction amount > $40
Bin 9: $1000 >= transaction amount > $100
Bin 10: transaction amount > $1000
(the chunk/bin size gets smaller and smaller the closer to the average we get).
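To make the binning above concrete, here is a rough sketch in plain JavaScript. `BIN_EDGES` and `binIndex` are hypothetical names, and the edge values are simply the hand-picked ones from this example, not anything derived from the data.

```javascript
// Upper edges of bins 1..9 from the example above (hand-picked, hypothetical values).
// Bin 10 is everything above the last edge.
var BIN_EDGES = [-1000, -100, -50, 0, 15, 25, 40, 100, 1000];

// Return the zero-based bin index for a transaction amount.
// Each bin is closed on the right, e.g. "Bin 2" is -1000 < amount <= -100.
function binIndex(amount, edges) {
  var i = 0;
  while (i < edges.length && amount > edges[i]) {
    i++;
  }
  return i;
}

console.log(binIndex(20, BIN_EDGES));    // 5, i.e. "Bin 6" ($15 < amount <= $25)
console.log(binIndex(-7500, BIN_EDGES)); // 0, i.e. "Bin 1"
console.log(binIndex(7500, BIN_EDGES));  // 9, i.e. "Bin 10"
```

What I'm missing is a way to derive those edges from an arbitrary data set instead of hard-coding them.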
Admittedly it's been ages since I've done any serious study of statistics, so I'm quite rusty. But it does seem that the way I break my data up into bins/chunks will have a lot to do with the standard deviation of my data.
I guess I have a good feel for what I want; I'm just a bit lost on how to use d3.js (d3.mean(), d3.quantile()?) and dc.js to get a histogram similar to what I've described.
So what's the correct way, or what libraries should I be using to:
- Create 10 'reasonably' sized bins according to an arbitrarily given data set
- Group the data into those bins (actually, this part should be pretty straightforward)
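One way I can imagine approaching the first bullet is to cut the sorted amounts into equal-count groups, so the bins get narrower wherever the data is dense. The sketch below is plain JavaScript with no library calls; `quantileEdges` and `sampleAmounts` are hypothetical names, the sample values are made up for illustration, and the function assumes a non-empty array of numbers.

```javascript
// Compute k-1 cut points that split the values into k roughly equal-count bins.
// Uses linear interpolation between neighbouring sorted values.
// Assumes `values` is a non-empty array of numbers.
function quantileEdges(values, k) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var n = sorted.length;
  var edges = [];
  for (var i = 1; i < k; i++) {
    var pos = (n - 1) * i / k;          // fractional index of the i-th cut point
    var lo = Math.floor(pos);
    var hi = Math.min(lo + 1, n - 1);
    edges.push(sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]));
  }
  return edges;
}

// Made-up sample of transaction amounts, for illustration only.
var sampleAmounts = [-7500, -40, 5, 12, 18, 20, 22, 25, 35, 90, 7500];
console.log(quantileEdges(sampleAmounts, 10));
// [-40, 5, 12, 18, 20, 22, 25, 35, 90]
```

With this kind of split the two extreme amounts end up alone in the outermost bins, and the edges crowd together around the typical amount, which is roughly the shape I described.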
In terms of the physical spacing of the histogram's x-axis, I don't think it's necessary or desired for the ticks to be unevenly spaced (thus perhaps it is no longer a histogram).
I'd prefer the ticks stay evenly spaced despite the fact that chunk sizes are not equal. I will just be sure to label the ticks appropriately.
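To illustrate the evenly spaced ticks with uneven bin widths, here is a small sketch in plain JavaScript. `binLabels`, `countByBin`, `exampleEdges` and `exampleTxns` are hypothetical names, and the three-bin edges and the transactions are made up to keep the example short.

```javascript
// Build one label per bin from the upper edges,
// e.g. edges [0, 100] give "<= 0", "0 to 100", "> 100".
function binLabels(edges) {
  var labels = ["<= " + edges[0]];
  for (var i = 1; i < edges.length; i++) {
    labels.push(edges[i - 1] + " to " + edges[i]);
  }
  labels.push("> " + edges[edges.length - 1]);
  return labels;
}

// Count transactions per bin. Bins are closed on the right,
// so an amount equal to an edge belongs to the bin below that edge.
function countByBin(transactions, edges) {
  var counts = [];
  for (var b = 0; b <= edges.length; b++) {
    counts.push(0);
  }
  transactions.forEach(function (t) {
    var i = 0;
    while (i < edges.length && +t.txn_amount > edges[i]) {
      i++;
    }
    counts[i]++;
  });
  return counts;
}

// Made-up edges and transactions, for illustration only.
var exampleEdges = [0, 100];
var exampleTxns = [
  { txn_id: 1, txn_amount: 20 },
  { txn_id: 2, txn_amount: -7500 },
  { txn_id: 3, txn_amount: 7500 },
  { txn_id: 4, txn_amount: 100 }
];

console.log(binLabels(exampleEdges));               // ["<= 0", "0 to 100", "> 100"]
console.log(countByBin(exampleTxns, exampleEdges)); // [1, 2, 1]
```

The idea would be to position each bar by its bin index (0, 1, 2, ...) so the bars are evenly spaced, and to use the matching label as the tick text.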
Any pointers in the right direction would be much appreciated.
Update:
So it seems d3.js is several steps ahead of me as usual and has already got my back. I believe I can use d3.scale.quantile() to break the x-axis up into 10 quantiles (deciles). Indeed, I've set up my quantile scale and it seems to be doing the right thing: when I input numbers directly into the quantile scale function (via the JS console), it outputs the correct bucket (out of the 10).
But unfortunately my graph is still messed up. Here is my code:
var datum = crossfilter(data),
    amount = datum.dimension(function(d) { return +d.txn_amount; }),
    amounts = amount.group();

amountsChart = dc.barChart("#dc-amounts-chart");

amountsChart
    .width(defaultWidth)
    .height(defaultHeight)
    .margins({top: 20, right: 20, bottom: 20, left: 50})
    .dimension(amount)
    .group(amounts)
    .centerBar(true)
    .gap(5)
    .elasticY(true)
    .x(d3.scale.quantile().domain(amounts.all().map(function(d) {
            // d.key is the transaction dollar amount,
            // d.value is the number of transactions at that amount
            return d.key;
        }))
        .range([0,1,2,3,4,5,6,7,8,9]));

amountsChart.yAxis().ticks(5);

dc.renderAll();
and the resulting chart:
I think I'm getting close, but still not sure where I'm taking a wrong turn.