Supergroup brings extreme convenience and understandability to the manipulation of JavaScript data collections, especially in the context of D3.js visualization programming.
As if in submission to the great programmer's commandment, Don't Repeat Yourself, every time I find myself writing a piece of code that solves basically the same problem I've solved a dozen times before, a little piece of my soul dies.
Utilities for grouping record collections into maps or nests abound: d3.nest, d3.map, Underscore.groupBy, Underscore.Nest, to name a few. But after these tools relieve us of a certain amount of repetitive stress, we're often left with a tangle of hairy details that fill us with a dreadful sense of déjà vu. Supergroup may seem like the kind of tacky wonder gadget you'd find on a late-night Ronco ad, but, for the low, low price of free, it makes data-centric JavaScript programming fun again. And when you find yourself in a D3.js callback routine holding a datum object that might have come from anywhere (for instance, a tooltip callback used on disparate object types), everything you want to know about your object and its associated metadata and records is right there at your fingertips.
Just to be clear about the problem, you start with tabular data from a CSV file, a SQL query, or some AJAX call:
Some very fake hospital data in a CSV file…
...turned into a canonical array of objects (using d3.csv, for instance)
Without Supergroup, you’d group the records on the values of one or more fields with a standard grouping function, giving you data like:
d3.nest().key(function(d) { return d.Physician; }).key(function(d) { return d.Unit; }).map(data)
d3.nest().key(function(d) { return d.Physician; }).key(function(d) { return d.Unit; }).entries(data)
To my mind, these are awkward data structures (not to mention the awkwardness of the calling functions). The map version looks ok in the console, but D3 wants data in arrays, not as objects. The entries version gives us arrays of key/value pairs, but on upper levels values is another array of key/value pairs while on the bottom level values is an array of records. In both entries and map, you can't tell from a node at any level what dimension was being grouped at that level.
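To make the shape problem concrete, here is a minimal plain-JavaScript sketch that builds an entries-style nest by hand. It is a stand-in for illustration, not d3.nest itself; the helper name nestEntries, the sample records, and the Charges field are all made up. Only the Physician and Unit field names come from the text above.

```javascript
// Hypothetical sample records; "Charges" is an invented field.
var data = [
  { Physician: "Adams", Unit: "preop", Charges: 100 },
  { Physician: "Adams", Unit: "recovery", Charges: 250 },
  { Physician: "Baker", Unit: "preop", Charges: 75 }
];

// Hand-rolled key/values nesting (an illustration, not the d3.nest code).
function nestEntries(records, fields) {
  if (fields.length === 0) { return records; } // bottom level: raw records
  var groups = [];
  var index = Object.create(null);
  records.forEach(function (rec) {
    var key = String(rec[fields[0]]);
    if (!(key in index)) {
      index[key] = { key: key, values: [] };
      groups.push(index[key]);
    }
    index[key].values.push(rec);
  });
  // Replace each group's records with the next level of grouping.
  groups.forEach(function (group) {
    group.values = nestEntries(group.values, fields.slice(1));
  });
  return groups;
}

var entries = nestEntries(data, ["Physician", "Unit"]);
var adams = entries[0];
console.log(Object.keys(adams));           // just "key" and "values"
console.log(Object.keys(adams.values[0])); // again just "key" and "values"
console.log(adams.values[0].values[0]);    // at this level, a raw record
```

Nothing on a node says whether its key is a Physician or a Unit, and the meaning of values changes with depth, so code that walks the structure has to keep track of the level itself.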
Supergroup gives you almost everything you’d want for every item in your nest (or in your single array if you have a one-level grouping):
var foo = bar;
Works as an Underscore (or Lo-Dash) mixin:
_.supergroup(data, fieldname) returns an array whose elements are the distinct values of <fieldname> in the original data records. These elements, or Values, can be String or Number objects (Dates to be implemented eventually). Each Value holds a .records property which is an array containing the subset of original records matching that Value.
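The idea of a value that carries its own records can be sketched in a few lines of plain JavaScript. This is only a toy with the same shape, not Supergroup's implementation; the function name groupOnField and the sample records are hypothetical.

```javascript
// Hypothetical sample records for illustration only.
var data = [
  { Physician: "Adams", Unit: "preop" },
  { Physician: "Adams", Unit: "recovery" },
  { Physician: "Baker", Unit: "preop" }
];

// Toy one-level grouping: each distinct value becomes a String or Number
// object, which (unlike a primitive) can hold extra properties.
function groupOnField(records, field) {
  var values = [];
  var seen = Object.create(null);
  records.forEach(function (rec) {
    var raw = rec[field];
    var key = typeof raw + ":" + raw; // keeps 1 and "1" distinct
    if (!(key in seen)) {
      var value = typeof raw === "number" ? new Number(raw) : new String(raw);
      value.records = [];
      seen[key] = value;
      values.push(value);
    }
    seen[key].records.push(rec);
  });
  return values;
}

var byPhysician = groupOnField(data, "Physician");
console.log(String(byPhysician[0]));        // "Adams"
console.log(byPhysician[0].records.length); // 2
console.log(byPhysician[0] == "Adams");     // true
console.log(byPhysician[0] === "Adams");    // false: it is a String object
```

The last two lines show the one catch with wrapper objects in JavaScript: they compare equal to a primitive under == but not under ===, so convert with String() or Number() where a plain value is needed.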
In the example below we do a multi-level grouping by Physician and Unit. So sg = _.supergroup(data,['Physician','Unit']) returns a list of physicians (the top-level grouping). The first item in this list, sg[0], is “Adams”, a String object. sg[0].records is an array containing the records where Physician=“Adams”. sg[0].children is a list of the Units (our second-level grouping) in the records where Physician=“Adams”. sg[0].children[0].records would be the subset of records where Physician=“Adams” and Unit=“preop”.
Supergroup on physician and unit
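The records/children relationship described above can be approximated with a short recursive function. This is a sketch under stated assumptions, not the library's code: groupLevels and the sample records are invented, and for brevity every key is turned into a String object.

```javascript
// Hypothetical sample records for illustration only.
var data = [
  { Physician: "Adams", Unit: "preop" },
  { Physician: "Adams", Unit: "recovery" },
  { Physician: "Baker", Unit: "preop" }
];

// Toy multi-level grouping: group on the first field, then recurse on each
// group's own records with the remaining fields.
function groupLevels(records, fields) {
  var values = [];
  var seen = Object.create(null);
  records.forEach(function (rec) {
    var key = String(rec[fields[0]]);
    if (!(key in seen)) {
      seen[key] = new String(key);
      seen[key].records = [];
      values.push(seen[key]);
    }
    seen[key].records.push(rec);
  });
  if (fields.length > 1) {
    values.forEach(function (value) {
      value.children = groupLevels(value.records, fields.slice(1));
    });
  }
  return values;
}

var sg = groupLevels(data, ["Physician", "Unit"]);
console.log(String(sg[0]));                    // "Adams"
console.log(sg[0].records.length);             // 2
console.log(sg[0].children.map(String));       // "preop" and "recovery"
console.log(sg[0].children[0].records.length); // 1
```

Because each child is built from its parent's records, a node's records at any depth are always the subset matching every value on the path down to it.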
It does a bunch more I still need to document.
var gradeBook = [
{lastName: "Gold", firstName: "Sigfried", class: "Remedial Programming", grade: "C", num: 2},
{lastName: "Gold", firstName: "Sigfried", class: "Literary Posturing", grade: "B", num: 3},
{lastName: "Gold", firstName: "Sigfried", class: "Documenting with Pretty Colors", grade: "B", num: 3},
{lastName: "Sassoon", firstName: "Sigfried", class: "Remedial Programming", grade: "A", num: 3},
{lastName: "Androy", firstName: "Sigfried", class: "Remedial Programming", grade: "B", num: 3}
];
var byLastName = _.supergroup(gradeBook, "lastName"); // an Array of Strings: ["Gold","Sassoon","Androy"]
byLastName[0].records; // Array of Sigfried Gold's original 3 records
byLastName.rawValues(); // Array of native strings (easier to look at or use in contexts where you need a plain string)
var byName = _.supergroup(gradeBook, function(d) { return d.firstName + ' ' + d.lastName; });
// an Array of Strings: ["Sigfried Gold","Sigfried Sassoon","Sigfried Androy"]
byName.lookup("Sigfried Gold").records.pluck("num").mean(); // 2.6666666666666665
The above example shows how Supergroup can chain Underscore methods (and mixins), functionality it gets from underscore-unchained.
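For readers unfamiliar with pluck and mean, the chained expression in the example computes roughly the following. The pluck and mean helpers below are hand-written stand-ins, not the Underscore or underscore-unchained implementations; the records are Sigfried Gold's three rows from the gradeBook above.

```javascript
// Sigfried Gold's three records, copied from the gradeBook example.
var goldRecords = [
  {lastName: "Gold", firstName: "Sigfried", class: "Remedial Programming", grade: "C", num: 2},
  {lastName: "Gold", firstName: "Sigfried", class: "Literary Posturing", grade: "B", num: 3},
  {lastName: "Gold", firstName: "Sigfried", class: "Documenting with Pretty Colors", grade: "B", num: 3}
];

// Stand-in for pluck: extract one field from every record.
function pluck(records, field) {
  return records.map(function (rec) { return rec[field]; });
}

// Stand-in for mean: arithmetic average of an array of numbers.
function mean(numbers) {
  var total = numbers.reduce(function (sum, n) { return sum + n; }, 0);
  return total / numbers.length;
}

var nums = pluck(goldRecords, "num");
console.log(nums);       // [2, 3, 3]
console.log(mean(nums)); // 2.6666666666666665
```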
var byClassGrade = _.supergroup(gradeBook, ["class", "grade"]); // Array of top-level groups: ["Remedial Programming", "Literary Posturing", "Documenting with Pretty Colors"]
byClassGrade[0].children; // Children of a single group: ["C", "A", "B"]
byClassGrade[0].records; // Array of original records for a single group
byClassGrade.lookup("Remedial Programming"); // lookup a top-level group by name
byClassGrade.lookup(["Remedial Programming","B"]); // lookup a second-level group by name path
byClassGrade.lookup(["Remedial Programming","B"]).namePath(' -> '); // "Remedial Programming -> B"
byClassGrade.lookup(["Remedial Programming","B"]).dimPath(); // "class/grade"
Supergroup can flatten a tree into an array of nodes much like D3’s hierarchy layout, but in a way that’s easier to use IMHO.
byClassGrade.flattenTree(); // ["Remedial Programming", "C", "A", "B", "Literary Posturing", "B", "Documenting with Pretty Colors", "B"]
byClassGrade.flattenTree().invoke('namePath'); // ["Remedial Programming", "Remedial Programming/C", "Remedial Programming/A", "Remedial Programming/B", "Literary Posturing", "Literary Posturing/B", "Documenting with Pretty Colors", "Documenting with Pretty Colors/B"]
// only want leaf nodes?
byClassGrade.leafNodes().invoke('namePath'); // ["Remedial Programming/C", "Remedial Programming/A", "Remedial Programming/B", "Literary Posturing/B", "Documenting with Pretty Colors/B"]
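The ordering in these results is a depth-first walk with each parent listed before its children. A minimal sketch of that traversal over plain objects follows. It is not Supergroup's flattenTree or leafNodes code; the flatten helper and the name/path/isLeaf fields are invented for illustration, and the tree simply mirrors the group names shown above.

```javascript
// Plain-object stand-in for the byClassGrade tree (names from the example).
var tree = [
  { name: "Remedial Programming", children: [{ name: "C" }, { name: "A" }, { name: "B" }] },
  { name: "Literary Posturing", children: [{ name: "B" }] },
  { name: "Documenting with Pretty Colors", children: [{ name: "B" }] }
];

// Depth-first walk, parent before children, recording each node's path.
function flatten(nodes, parentPath) {
  var out = [];
  nodes.forEach(function (node) {
    var path = (parentPath || []).concat(node.name);
    out.push({ name: node.name, path: path, isLeaf: !node.children });
    if (node.children) {
      out = out.concat(flatten(node.children, path));
    }
  });
  return out;
}

var flat = flatten(tree);
var names = flat.map(function (n) { return n.name; });
var leafPaths = flat
  .filter(function (n) { return n.isLeaf; })
  .map(function (n) { return n.path.join("/"); });

console.log(names);     // same order as the flattenTree() result above
console.log(leafPaths); // same paths as the leafNodes() result above
```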