One of the first areas I explored when considering the long tail’s impact on the EDW was the growth in data storage requirements. Supporting the long tail affects the overall EDW storage requirements in two ways: width (expansion of the data model) and depth (more history). Let us look at one subject area common to all enterprises: Party (customer).

Long tail parties will now include low-activity and dormant customers. And considering this NY Times article on a new service called Abandonment Tracker Pro, data capture requirements may also include customers who haven’t yet purchased anything. Offers to these long tail customers will be necessary because they comprise a larger portion of overall enterprise sales than in the past, and because targeted offers become more exact and more varied as the enterprise’s usable data grows. The additional costs will be the capture and storage of the long tail customers’ data and the additional CPU and disk I/O needed to run the deeper analytics.

Then we can ask, “Just how much data do we want to store for a party?” Certainly we’d like the total history of the enterprise’s relationship with a customer, even if it spans decades. Some examples: a specialty retailer may see only a belt or a DVD as the initial purchase from a potentially long-term, profitable customer; financial institutions would like to have a lifetime relationship with a client and then continue seamlessly with the client’s descendants. These enterprises can vary offers and services based on changes in generational purchasing patterns.

History requirements are not the only thing that expands; the data model also grows wider. For example, Saks captures detailed data records of their customers’ shopping experience. Going forward, what will be the cost of record creation and retention as more retailers capture this kind of data? How much time does it take for a sales associate to enter a text comment for inclusion in the EDW? Additional data needed to document the party relationship can include service requests, complaints, inquiries, e-mail, web interactions, audio (call center), and GPS. To quote Bill Inmon, the enterprise has a “... financial interest in analyzing all types of customer interactions with the enterprise – not just transactions”.
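To make the combined effect of depth and width concrete, here is a rough back-of-the-envelope sketch. Every number in it is a made-up assumption for illustration, not a figure from any real enterprise, and the function name `estimated_storage_bytes` is hypothetical. It ignores compression, indexes, and replication, all of which would change the totals.

```python
def estimated_storage_bytes(parties, rows_per_party_per_year,
                            bytes_per_row, years_retained):
    """Rough raw-storage estimate for one subject area.

    Ignores compression, indexes, and replication.
    """
    return parties * rows_per_party_per_year * bytes_per_row * years_retained


# Hypothetical baseline: active customers only, transactions only,
# three years of history.
baseline = estimated_storage_bytes(
    parties=1_000_000,
    rows_per_party_per_year=20,
    bytes_per_row=200,
    years_retained=3,
)

# Hypothetical long tail case: more parties (dormant customers and
# prospects), more and wider rows (interactions as well as transactions),
# and a longer retention period.
long_tail = estimated_storage_bytes(
    parties=3_000_000,
    rows_per_party_per_year=50,
    bytes_per_row=400,
    years_retained=10,
)

growth_factor = long_tail / baseline
print(f"baseline:  {baseline / 1e9:.0f} GB")
print(f"long tail: {long_tail / 1e9:.0f} GB")
print(f"growth:    {growth_factor:.0f}x")
```

The point of the sketch is that the factors multiply: modest increases in the number of parties, the width of each record, and the years retained compound into a much larger total.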

And what about data outside of the enterprise’s direct relationship with a customer? How about their social networking data (Facebook, MySpace, LinkedIn, etc.)? Should the enterprise capture and mine this seemingly near-infinite data to generate offers that relate to the customer’s interests?

Other subject areas that come to mind as likely to see this kind of growth include supply chain, product and product manufacturing, and regulatory tracking and reporting requirements.

We need to anticipate and plan for the EDW’s support requirements to grow exponentially over what might be considered to be “best practice” today.