The theme of this blog is an examination of forces that would disrupt existing data warehouse implementations.  I categorize these as either long tail or black swan events.

Data lakes (and data oceans) are long tail impacts.  They have evolved as the cost of data storage and, most critically, the cost of data analytics across multiple data sources have plunged.  When I first started out over three decades ago, tracking individual interactions was much too expensive.  In the early ’90s, for a retailer to store and analyze daily sales by product and store was cutting edge:  individual transactions were much too costly to store, let alone customer interactions.  Today that extra detail is worth chasing on a cost/benefit basis.

For the online firms that pioneered cheap(er) analytics, the data was inherently cheaper to collect.  “For example, Google’s search algorithm is designed to learn from users: if slightly more people click on the ninth link of a given set of search results than one higher up, then the algorithm learns from this, and moves the lower item upwards.”  To compete, the offline firms have been forced to join the fight.  “But the expense of data collection and processing had forced offline businesses to stick to measuring just the big stuff, and thus only to value big wins rather than examining the fine grains of their activities for small but useful improvements. Companies might easily have missed out on changes that, spread across a large customer base, would reliably produce a gain in output of just 1% or so. Now they can make such changes, and even if the aggregate improvement is only of this modest order, it can over time help them to advance steadily on their rivals.”
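To make the first quoted idea concrete, here is a toy sketch in Python.  It is not Google's actual algorithm, only a minimal illustration of "learn from clicks and move the lower item up."  The function name `rerank_by_clicks`, the link labels, and the click counts are all hypothetical, made up for the example.

```python
def rerank_by_clicks(results, click_counts):
    """Return results ordered by click count, highest first.

    sorted() is stable, so results with equal click counts keep
    their original relative order.
    """
    return sorted(results, key=lambda r: click_counts.get(r, 0), reverse=True)


# Hypothetical search results, in their current display order.
results = [f"link{i}" for i in range(1, 11)]

# Hypothetical click counts: the ninth link draws slightly more
# clicks (12) than the eighth link (10) displayed above it.
counts = [100, 80, 60, 50, 40, 30, 20, 10, 12, 5]
click_counts = dict(zip(results, counts))

reranked = rerank_by_clicks(results, click_counts)
print(reranked)
```

With these made-up numbers, the ninth link trades places with the eighth and everything else stays put.  The point is less the sorting than the data behind it: an online firm gets the click counts for free, while an offline firm historically had no cheap equivalent.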

One thing we can confidently predict is that computer storage and processing will become steadily cheaper, so both the volume of data and the range of domains that can be stored and analyzed with a commercial payoff will steadily grow.

The interesting question now becomes, “What is the new data that can be captured?”  That is, what is the next black swan?  We have already identified as data targets all customer interactions, machine interactions (sensor readings), network activities, movement (e.g. traffic), etc.  Is it time to start capturing gestures and facial expressions for use by an AI?  Micro-weather activity for real-time stocking and assortment?  Got any others?