How to know if your A/B Split Test really worked

One of the unique features of the WhatCounts A/B testing suite is the ability to test 100% of your list. On the surface, this seems counterintuitive. After all, the point of A/B testing is to find out which combination of content, sender, and subject works best with a sample of your list, then send the winning combination to the rest of the list.

However, there are cases where a small sample of your audience isn’t enough to draw an effective conclusion from your data. For example, one of our Business Development Managers recently asked whether animated calls to action are more or less effective than static ones. Initially, we ran a standard 10/10/80 split: 10% of the list received a version of the content with a static image call to action, and another 10% received a version with an animated call to action.

Setting up an A/B Split Test.

Let’s take a look at those results:

Sample A: 4,892 people. Response to the non-animated image: 24 clicks.
Sample B: 4,892 people. Response to the animated image: 16 clicks.

Now, you might be tempted to declare Sample A the winner of the test. That would be a critical error. If you know anything about statistics, you’re familiar with Pearson’s chi-squared test. Applying this test to the A/B split above reveals an important fact: the difference between Samples A and B is statistically insignificant. That is, random variation is just as likely to be responsible for the difference in results as the change we were actually testing.
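
If you’d like to sanity-check this yourself, here’s a minimal sketch in Python using scipy’s chi2_contingency function (the Google Doc mentioned at the end of this post does the equivalent math in a spreadsheet). The counts are the ones from the split above; the choice of scipy is my own illustration, not part of the WhatCounts platform.

from scipy.stats import chi2_contingency

# 2x2 contingency table: rows are the two samples,
# columns are [clicked, did not click]
sample_size = 4892
clicks_a = 24   # non-animated call to action
clicks_b = 16   # animated call to action

table = [
    [clicks_a, sample_size - clicks_a],
    [clicks_b, sample_size - clicks_b],
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.3f}, p = {p_value:.3f}")
# The p-value comes out well above 0.05, so the 24-vs-16 gap is
# indistinguishable from random noise.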

When we saw these statistically insignificant results in our initial A/B split test, we knew that we had to retest with a much larger sample size. This is one of the reasons why the WhatCounts platform allows you to choose a winner of an A/B test manually – there was no actual winner!

Manually choosing lets you decide whether there’s a winner or no winner at all.

Don’t just let your email marketing software automatically “pick a winner” when there may not be a clear winner!

To retest, we used the testing suite to split the entire list evenly in half, so we were testing 100% of our audience rather than just 20% of it.

Divide the test equally in half.

How did the much larger sample size turn out?

Sample A: 24,463 people. Response to the non-animated image: 75 clicks.
Sample B: 24,463 people. Response to the animated image: 87 clicks.

Ah ha! Animated images work, right?

Nope. Apply the chi-squared test again and there is still no statistical significance. Animated and non-animated images perform largely the same; there is no statistically meaningful difference that declares one approach better than the other. We can now reasonably conclude that for our list and our audience, animated and non-animated calls to action perform equally well.
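
Running the same chi-squared sketch on the larger split (numbers from the 50/50 test above) tells the same story:

from scipy.stats import chi2_contingency

# 50/50 split: 24,463 people in each half
table = [
    [75, 24463 - 75],   # non-animated call to action
    [87, 24463 - 87],   # animated call to action
]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.3f}, p = {p_value:.3f}")
# Again the p-value is well above 0.05: no significant difference.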

Here’s where data turns into action: by running two tests and validating that there is no statistically significant difference between animated and non-animated calls to action, we can now make the business and marketing decision that the time it takes to produce an animated graphic is likely not worth it. We’re better off focusing our efforts on improving our email marketing in other ways.

To do your own chi-squared test, use this handy Google Doc, based on the work of Rags Srinivasan.

Christopher S. Penn
Director of Inbound Marketing, WhatCounts


