Data Science to Marketing Implementation: Correlation vs. Causation

I fudging love data. It can be a beast, and ensuring accuracy is a tough process, but once you can certify that the information is correct, digging out nuggets of wisdom is better than getting a hot pizza delivered to your table.

Most marketers understand the benefit of big data, but what happens between data generation and analysis can be a mystery. Getting good data in is always the critical first step when you deploy a system you plan to analyze later, and to do that, you need to understand a bit about data structure and the types of data you can collect.

Data Structure, Types of Data, and More

Every system within your marketing technology stack has a backend database that functions like a big ol’ spreadsheet or table. Information is written into that table row by row, based on how that system functions and how it’s configured. Sometimes there is one structure for a system, but sometimes there can be several (looking at you, Salesforce) depending on how it’s built and used.

So for a CRM, a single contact might be a single row, with various fields of information about them spreading out across any number of columns.

spreadsheet example 1

If the tool is more action-based, though, like web tracking, that person may be referenced over multiple rows, and every page view could be a separate row with a unique time stamp on it.

spreadsheet example 2
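
To make the two shapes concrete, here is a minimal sketch in Python. Every table, field name, and value is made up for illustration and does not come from any particular CRM or tracking tool.

```python
from collections import Counter

# Hypothetical CRM table: one row per contact, fields spread across columns.
crm_contacts = [
    {"contact_id": 1, "name": "Ada", "email": "ada@example.com", "company": "Acme"},
    {"contact_id": 2, "name": "Ben", "email": "ben@example.com", "company": "Globex"},
]

# Hypothetical web-tracking table: one row per page view, so the same
# person appears on many rows, each with its own timestamp.
page_views = [
    {"visitor_email": "ada@example.com", "page": "/pricing", "viewed_at": "2024-10-01T09:00:00"},
    {"visitor_email": "ada@example.com", "page": "/product-a", "viewed_at": "2024-10-01T09:02:00"},
    {"visitor_email": "ben@example.com", "page": "/blog", "viewed_at": "2024-10-02T14:30:00"},
    {"visitor_email": "ada@example.com", "page": "/pricing", "viewed_at": "2024-10-03T11:15:00"},
]


def rows_per_person(rows, key):
    """Count how many rows in a table reference each person."""
    return Counter(row[key] for row in rows)


contact_rows = rows_per_person(crm_contacts, "email")
view_rows = rows_per_person(page_views, "visitor_email")

print(contact_rows)  # each contact appears once
print(view_rows)     # the same visitor can appear many times
```

The same person is one row in the first table and several rows in the second, which is why you can't just stack the two side by side.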

Each platform in your stack will have a different data structure with different naming conventions and unique identifiers for the housed information. This makes connecting data across systems difficult, so knowing your data and what each field means is a must. That knowledge is what lets us map and blend data from multiple sources for aggregate analysis and closed-loop reporting.
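
As a rough sketch of what that mapping can look like, the Python below blends two sources that name the same identifier differently. The field names, the `FIELD_MAP` lookup, and the choice of email as the shared key are all assumptions for illustration; real systems may need a different identifier entirely.

```python
# Hypothetical field map: each system names the shared identifier differently.
FIELD_MAP = {
    "crm": "Email",
    "web": "visitor_email",
}

crm_rows = [
    {"Email": "Ada@Example.com", "Lifecycle Stage": "Lead"},
    {"Email": "ben@example.com", "Lifecycle Stage": "Customer"},
]
web_rows = [
    {"visitor_email": "ada@example.com", "page": "/pricing"},
    {"visitor_email": "ada@example.com", "page": "/product-a"},
    {"visitor_email": "cy@example.com", "page": "/blog"},
]


def normalize(value):
    """Trim and lowercase so the same email matches across systems."""
    return value.strip().lower()


def blend(crm_rows, web_rows):
    """Attach each contact's page views to their CRM record."""
    blended = {}
    for row in crm_rows:
        key = normalize(row[FIELD_MAP["crm"]])
        blended[key] = {"stage": row["Lifecycle Stage"], "pages": []}

    unmatched = []
    for row in web_rows:
        key = normalize(row[FIELD_MAP["web"]])
        if key in blended:
            blended[key]["pages"].append(row["page"])
        else:
            unmatched.append(row)  # web activity with no CRM record yet
    return blended, unmatched


blended, unmatched = blend(crm_rows, web_rows)
print(blended)
print(unmatched)
```

Keeping the unmatched rows instead of dropping them is a deliberate choice here: activity you can't tie to a contact yet is still worth knowing about.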

Correlation vs. Causation with Marketing Data

Not all data is created equal, and to tell the different kinds apart, we need to understand the core definitions of correlation and causation.

Causation is the reason that something occurs. For example, Starbucks releasing the pumpkin spice latte in October causes a surge in sales and causes lines at Starbucks to be significantly longer. (I have no clue if these are true, but I’ll be damned if those lattes aren’t delish.)

Correlation is a relationship between data variables that move together, whether or not one causes the other. Working off the Starbucks example, in October there is a surge in sales, and lines get longer (as does wait time). However, longer lines at Starbucks don’t cause an increase in sales; they’re merely a related data point.

Understanding the correlations within your marketing activity data helps you predict what is likely to happen next. Understanding the causation behind that data lets you make changes that improve performance.
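
Here is a small Python sketch of that idea. The numbers are entirely made up and deliberately constructed to make the point: the seasonal drink drives both line length and sales, so the two look tightly linked overall, but once you hold the season fixed the link disappears.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up daily observations (not real Starbucks data).
seasonal_drink = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = seasonal drink on the menu
line_length = [4, 5, 4, 5, 8, 10, 8, 10]   # people in line
daily_sales = [200, 200, 210, 210, 280, 280, 300, 300]

# Across all days, lines and sales move together.
r_overall = pearson(line_length, daily_sales)


def within(flag):
    """Correlation of lines vs. sales on days with the given seasonal flag."""
    days = [i for i, s in enumerate(seasonal_drink) if s == flag]
    return pearson([line_length[i] for i in days], [daily_sales[i] for i in days])


r_off_season = within(0)
r_in_season = within(1)

print(r_overall, r_off_season, r_in_season)
```

A high overall correlation is still useful for prediction, but on its own it says nothing about which lever to pull.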

Now when it comes to data itself (specifically related to marketing and sales activity), there are lots of different sources (web data, ad data, and sales data to name a few) but I like to break all that up into two main categories: known data and inferred data.

Known data is either directly tracked and entered into a system, or submitted directly by the individual being tracked. This could be a page visit, the opening of an email, the click of an ad, or the submission of a form. This is data that has a 1:1 connection to a person who is engaging with your brand or digital properties and is treated as 100% true.

Inferred data isn’t as clear, but it’s just as important. Inferred data is additional information you derive from correlations in your known data. It’s the lines you can draw between data points. For example, if a known and tracked lead continues to visit the same group of product pages over and over again, we can have our CRM set a new inferred data point identifying that this person is likely interested in this product. They haven’t explicitly told us this, but based on their actions it’s a reasonable assumption.
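
A rule like that product-page example could be sketched as follows. The page-to-product mapping, the three-visit threshold, and the field names are all assumptions made for illustration, not how any specific CRM does it.

```python
from collections import Counter

# Hypothetical mapping from page URL to product group.
PAGE_TO_PRODUCT = {
    "/product-a": "Product A",
    "/product-a/specs": "Product A",
    "/product-a/pricing": "Product A",
    "/product-b": "Product B",
}

# Assumed threshold: visits to a product group before we infer interest.
MIN_VISITS = 3


def infer_interests(page_views, min_visits=MIN_VISITS):
    """Return product groups a lead has visited at least min_visits times."""
    visits = Counter(
        PAGE_TO_PRODUCT[page] for page in page_views if page in PAGE_TO_PRODUCT
    )
    return sorted(product for product, count in visits.items() if count >= min_visits)


lead_page_views = [
    "/product-a",
    "/blog",
    "/product-a/specs",
    "/product-b",
    "/product-a/pricing",
]

lead_record = {
    "email": "ada@example.com",                              # known: submitted by the lead
    "page_views": lead_page_views,                           # known: directly tracked
    "inferred_interests": infer_interests(lead_page_views),  # inferred from behavior
}

print(lead_record["inferred_interests"])
```

Storing the inferred field separately from the known fields keeps it obvious which values the lead gave you and which ones you guessed.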

Together, the collection of known data and the application of inferred data allow you to get a better understanding of user activity, purchase likelihood, and so much more. Data that is then collected per prospect can be utilized to automate marketing efforts, improve conversion rates, and bring in the cheddar.

This is where inferred data is critical, because if we operated solely on known data, marketing messaging would be entirely reactive. Inferred data lets us make informed assumptions from a prospect’s activity about what they are likely to be interested in.

Making Sense of the Data

To run effective marketing campaigns and provide useful analysis, you need to understand the differences between known and inferred data and between correlation and causation. The setup for data collection depends on it, which means the accuracy of the final output of your marketing information is at stake. If you don’t understand how information is being collected, then you are probably generating bad data, and thus cannot provide accurate analysis.

Discerning which activity is causation as opposed to merely correlation requires testing and analysis, which plenty of software tools out there provide for any number of use cases. And just like that, your martech stack got bigger.
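
One common form of that testing is a randomized A/B test, where you change a single thing for a random half of your audience and compare results. The sketch below is an assumption about what such a check might look like, using a standard two-proportion z-test and made-up conversion counts; it is not a description of any specific tool.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF written with the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Made-up results: variant A is the current email, variant B changes the subject line.
z, p = two_proportion_z_test(conv_a=120, n_a=2000, conv_b=165, n_b=2000)
print(z, p)
```

Because visitors are assigned to a variant at random, a small p-value here is evidence that the change itself moved conversions, which is something a correlation in historical data can't give you.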

The development of a complete marketing technology stack is a beautiful explosion of data, which should absolutely be leveraged to your business’s benefit. Do yourself a favor and think through the data details right from the start.

Grady Neff Team Photo at Element Three

When asked to sum up himself with just a single sentence, Grady responded with the following, "Commander of the resistance, unrelenting leader in the defense of organic life, chocolate lover."