If data modelling were easier, I doubt there would be as many books on the subject, and we wouldn’t have multiple methodologies to consider.
I’m not going to explore the different methodologies here – that’s almost a religious argument these days, and I am more than happy to let you adopt whichever method you like. Instead, I want to challenge you to think about what is driving your design, and what makes you consider whether it suits your business or not.
Time and time again I see companies that use software to help them run their business. Sometimes this is an off-the-shelf system or a cloud-based solution; sometimes it’s a bespoke system built by software developers. I’m definitely in favour of using software, and wonder how people operate without it these days.
...but how much is your business driven by the software? I see a lot of businesses being led by their software, rather than having the software adapt to the business. For the most part, I’m fine with either. There is a lot to be gained by using systems developed by similar businesses, and taking advantage of lessons learned by others. Letting that software help guide internal processes can be very useful.
But I don’t think that applies to data models – you should at least consider how much it does.
I don’t like to write about specific customer situations, so I’m not going to describe a particular anecdote in great detail here. But I want to say that I frequently see environments where the models used within data warehouses don’t describe the business that’s going on – they describe the software that’s used.
Many data professionals look at a data warehouse as a platform for reporting, built according to the available data sources. I disagree with this.
The models within a data warehouse should describe the business. If it doesn’t, it’s a data model gone wrong.
What is the central thing that your business does? What is the main interest point? What do you need to look after? For me, this forms the core of the warehouse.
The core of the warehouse is not necessarily the main fact table, but could be one of the main dimensions. If you’re a store, do you care about sales, or do you care about customers? The difference is subtle, but might drive some important design elements.
Two clothing stores might use the same back-end systems for their point-of-sales systems, and might have the same loyalty system set up to persuade people to keep coming back. But one store might have a focus of getting customers back, driving brand loyalty which leads to dedicated fans and word-of-mouth sales. The other store might be more about piquing interest from people walking past the door, and trying to get them to come in and pick up a bargain. Of course, there will be an element of both in both stores, but the culture amongst the staff will be slightly different, as the first tries to identify the customer, tries to make sure that the customer feels cared for, and tries to form a relationship with the customer. It’s less important that the customer buys something, so long as they are going to return. The second sees the customer as a way to get a sale, while the first sees the sale (or even the lack of a sale!) as a way to get a customer. I’m sure you can think of stores in each category.
It would be very easy to create the same data warehouse for both stores, using a standard retail environment. But are the needs of the stores adequately met?
There is no doubt that both stores need sales to stay afloat – the retail business requires it. But if your business culture has slightly different concerns to the industry standard, then the data model should cater for that. Perhaps you need a way of scoring customer loyalty, and some path analysis to see what helps a customer reach a particular level of engagement. Perhaps you need to start collecting extra data. Maybe the stores could consider awarding points for simply visiting the store, even if no sales are actually made. Is the person who works from a cafe and buys just one cup of coffee all morning good for business, or bad for business? Can your data model help explore this, or are you designing a system which only handles the data in your transactional system?
I like to come back to the description of a data warehouse being the single source of truth for an organisation. Many people consider this an issue for data quality – that once data is in the warehouse, it’s trusted and can be used for business analytics. But it should go beyond that. The data warehouse should have transformed the data as kept by the various software packages into data which describes the business, becoming the source of truth about the business. The reports and dashboards across this data should help identify the culture of the organisation, by highlighting the its values and ideals.
The starting point for a data warehouse design should not be “What are the facts we need to measure?” but rather “What are we about as a business?” – often similar, but occasionally not. Ask what success looks like and what questions will address that.
Don’t ignore the bottom line, but also don’t ignore what's really important to the business.
This post was prompted by the seventy-second monthly T-SQL Tuesday, hosted this month by Mickey Stuewe (@sqlmickey).