
It has been a necessary evil to create data "buckets," typically to accumulate totals or other derived data that can be thought of as synthetic data. Synthetic data looks like data, but it's really the result of applying rules to "real" data. On your pay stub, your net pay amount is itself a data "bucket," because it is the result of various deductions from your nominal pay. Indeed, it is the result of multi-level bucketing, because withholding might depend on an intermediate calculation, on prior-period total payments, or on other rules. The problem is that what appears to be a simple number, net pay, cannot be understood without tracing back through the layers of rules, real data (hours worked, insurance enrollments), and synthetic data (intermediate calculations).
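To make the layering concrete, here is a minimal Python sketch, under stated assumptions, of buckets that remember how they were filled: every derived value keeps the rule and the inputs that produced it, so a number like net pay can be traced back to the real data underneath. The `Value` type, the `derive` and `trace` helpers, the pay figures, and the flat 15% withholding rate are all hypothetical, invented for illustration. They are not a real payroll calculation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Value:
    """A number plus a record of where it came from (hypothetical type)."""
    name: str
    amount: float
    rule: str = "input"                # "input" marks real data; otherwise the rule applied
    inputs: tuple = field(default=())  # the Values this one was derived from


def derive(name, rule, fn, *inputs):
    """Apply a rule to input Values and keep those inputs as lineage."""
    return Value(name, fn(*(v.amount for v in inputs)), rule, inputs)


def trace(value, depth=0):
    """Return the lineage of a value as indented lines, outermost first."""
    lines = [f"{'  ' * depth}{value.name} = {value.amount:.2f} [{value.rule}]"]
    for parent in value.inputs:
        lines.extend(trace(parent, depth + 1))
    return lines


# Real data (hypothetical figures).
hours = Value("hours_worked", 80)
rate = Value("hourly_rate", 25.0)
insurance = Value("insurance_premium", 120.0)

# Synthetic data: each bucket is filled from the layer below it.
gross = derive("gross_pay", "hours * rate", lambda h, r: h * r, hours, rate)
taxable = derive("taxable_pay", "gross - insurance", lambda g, i: g - i, gross, insurance)
withholding = derive("withholding", "15% of taxable (hypothetical flat rate)",
                     lambda t: t * 0.15, taxable)
net = derive("net_pay", "taxable - withholding", lambda t, w: t - w, taxable, withholding)

if __name__ == "__main__":
    print("\n".join(trace(net)))
```

Printing `trace(net)` walks from the net pay figure down through withholding and taxable pay to hours worked, hourly rate, and the insurance premium. Note that `taxable_pay` shows up twice in the trace, once directly and once underneath withholding, which is exactly the multi-level bucketing that makes the single net figure hard to read on its own.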
Of course, the net pay amount then magically becomes "real" data itself, because, right or wrong, it is in fact what you get paid, and if you have direct deposit, it becomes a very real transaction at your bank. Nevertheless, whether that data is trustworthy depends on a stack of often leaky, sometimes mysterious buckets.
The question is how we can peel away as much of the hidden complexity as possible, so that untrustworthy data does not propagate downstream.