I had two questions today about the DC3 Data.
First: the subtree totals don’t necessarily add up. This is intentional: the subtree count only counts each product once, even if it appears in several categories.
In other words, if you have categories “parent”, “child1” and “child2”, then parent.subtreeProductCount <= child1.subtreeProductCount + child2.subtreeProductCount, because a product that is in both child1 and child2 is counted only once in the parent. As far as I know, the code computes this correctly.
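To make the de-duplication concrete, here is a minimal sketch in Python. It is not the actual DC3 code: the function name subtree_products, the dictionaries and the product IDs are all made up for illustration, and it assumes the parent has no products of its own. It computes each subtree count as the size of a set union, so a product shared by two children is counted once.

```python
def subtree_products(category, children, products):
    """Return the set of distinct product IDs in `category` and all its descendants."""
    # Start with the products assigned directly to this category (possibly none).
    result = set(products.get(category, ()))
    # Union in each child's subtree; a set keeps each shared product only once.
    for child in children.get(category, ()):
        result |= subtree_products(child, children, products)
    return result


# Hypothetical toy data: product "p2" is in both child1 and child2.
children = {"parent": ["child1", "child2"]}
products = {
    "child1": {"p1", "p2"},
    "child2": {"p2", "p3"},
}

counts = {
    name: len(subtree_products(name, children, products))
    for name in ("parent", "child1", "child2")
}
print(counts)  # {'parent': 3, 'child1': 2, 'child2': 2}
```

Here the parent count is 3 while the two child counts sum to 4, because “p2” is in both children.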
Second: for the “small” data sets, the root nodes may have incorrect counts. There is a bug in the code that only affects the root (the first line in the file), and only for the subtree data sets. If you really need the numbers for those nodes, you can look them up in the “all-nodes” file.