How to Avoid A/B Testing Abuse

January 6th, 2009

A/B Testing is a great method to test well-researched hypotheses to improve website performance.  Unfortunately, this method is often abused because practitioners rely on the throw-it-up-and-see-if-it-sticks philosophy rather than performing the test as part of an experiment.

What is A/B Testing?

Simply put, A/B Testing is testing a “page” (page A) against another page (page B). This advertising/usability method is used to evaluate the performance of messaging (page text), layouts, images, colors and other elements like forms.

How do I perform an A/B test?

I like to think of A/B Testing like a 7th grade science experiment.  There are steps to be followed, and each step should be conducted completely and accurately to avoid abusing the method.

1. Ask a Question

Ask yourself what you are trying to accomplish with the test. It’s important to understand what you want answered in order to keep the test focused. A simple question could be:

“How can I change the register button on the current landing page to get more users to click on it?”

The more specific the question, the more focused your test will be.

2. Do some research

Research what has worked in the past or for others. Are there any documented, tried-and-true ways to achieve your goal in step 1?

Research leads you in the right direction to make a hypothesis. For our button question, let’s look at some research: a definition of Fitts’ Law:

Fitts’ Law is a model to account for the time it takes to point at something, based on the size and distance of the target object. Fitts’ Law and variations of it are used to model the time it takes to use a mouse and other input devices to click on objects on a screen.

Broadly, Fitts’ Law can be applied by designers to suggest moving target buttons closer and making them larger for extremely commonly used buttons. In detail, applying the formula can be extremely useful for exact design of time-critical applications.
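To make “the formula” concrete, here is a minimal sketch of one common form of Fitts’ Law (the Shannon formulation, MT = a + b * log2(D/W + 1)). The function name, the pixel values, and the constants a and b are assumptions for illustration only; real constants are fitted from measurements for a given input device.

```python
import math


def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted time in seconds to point at a target (Shannon formulation).

    distance: distance from the pointer to the target centre
    width:    size of the target along the direction of movement
    a, b:     device-specific constants; the defaults are placeholder
              values chosen for illustration, not measured ones
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty


# Hypothetical buttons: a small one far away versus a large one close by.
small_far = fitts_movement_time(distance=800, width=40)
large_near = fitts_movement_time(distance=200, width=120)
print(f"small, far button:  {small_far:.3f} s")
print(f"large, near button: {large_near:.3f} s")
```

Whatever values a and b take (as long as b is positive), the larger and closer target gets the shorter predicted time, which is the intuition the definition describes.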

3. Construct a hypothesis

From the above research, I interpret Fitts’ Law to say that if a button is bigger and closer, it’s more effective. (This is a very generalized interpretation, but a fascinating topic. If you’re interested in learning more about Fitts’ Law, I suggest that you read the classic AskTog post: “A Quiz Designed to Give You Fitts”.)

Given my research, I make the following hypothesis:

By increasing the size of my primary call to action button AND by putting that button on the right side of the page, more users will click on it.

4. Test the hypothesis with experimentation

A controlled experiment compares the results obtained from an experimental sample against a control sample. So for my control sample I’m going to use the current page (A), which includes a small button in the top left-hand corner. 50% of my site’s visitors will be welcomed with page A.

For the experimental sample, I’m going to increase the size of the button to large and place it on the right-hand side of the page to make up page B. Everything else on the page will remain the same. The other 50% of my site’s visitors will be welcomed with page B.
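One way to get a stable 50/50 split is to hash a visitor identifier into a bucket, so a returning visitor always sees the same page. This is only a sketch of that idea: the function name, the experiment label, and the assumption that each visitor has a stable ID (for example, from a cookie) are mine, and most A/B testing tools handle this step for you.

```python
import hashlib


def assign_variant(visitor_id, experiment="register-button"):
    """Deterministically assign a visitor to page "A" or page "B".

    visitor_id and experiment are hypothetical names: any stable
    identifier and any label for the test would do.
    """
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return "A" if bucket < 50 else "B"


# The same visitor always lands on the same page.
print(assign_variant("visitor-123"))
print(assign_variant("visitor-123"))
```

Including the experiment label in the hash means a visitor who landed on page A in one test is not automatically put in group A for the next one.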

5. Analyze data and draw a conclusion

Once I’ve completed my experiment, I’ll analyze the data collected through my web analytics solution and notice (for example) that 40% of the visitors to page B registered, compared to 2% of those entering the site through page A. This leads me to the following conclusion:

Moving the button to the right and increasing its size significantly increases the propensity of visitors to click on the register button.
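Before calling a difference “significant”, it helps to check that it is unlikely to be due to chance. Below is a minimal sketch of a two-proportion z-test using only the standard library. The visitor counts are hypothetical (the example above only gives percentages), and the function name is my own; an analytics or testing tool may already report this for you.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a, conv_b: number of visitors who registered on each page
    n_a, n_b:       number of visitors who saw each page
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each page needs at least one visitor")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical counts: 1,000 visitors per page, 2% vs. 40% registering.
z, p = two_proportion_z_test(conv_a=20, n_a=1000, conv_b=400, n_b=1000)
print(f"large sample: z = {z:.2f}, p = {p:.3g}")

# A small hypothetical sample: a visible gap that may still be noise.
z_small, p_small = two_proportion_z_test(conv_a=2, n_a=50, conv_b=5, n_b=50)
print(f"small sample: z = {z_small:.2f}, p = {p_small:.3g}")
```

A common convention is to treat a p-value below 0.05 as significant. The second call shows why sample size matters: a gap in conversion rates measured on only a few dozen visitors can fail that threshold.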

6. Communicate the results

One of the most difficult tasks a product manager or user experience manager has is to socialize enhancements. The data collected from web analytics, along with your experiment notes, is evidence to help socialize, drive and prioritize changes.

A/B Testing Abuse

You know the saying, “Opinions are like a-holes. Everyone has them.” This is painfully evident when designing applications, marketing pages, etc. Everyone from the CEO to the guy who empties your wastebasket has an improvement idea. Some are good, some are bad, all are valid. However, in organizations where everyone has a say in the design of a product (or landing page), A/B testing is often used simply to see which idea performs the best.

Here’s a classic example. A very well-known site that gets hundreds of thousands of page views per hour has been rumored to have tested 72 versions of a page element to see which one performed best. 72 versions before the element was officially changed on the site. 72! There are a few reasons I can think of that could have led to this:

  • Clear goals weren’t established
  • Proper research wasn’t done on the front end to vet ideas that were generated
  • No one wanted to tell the guy-that-empties-the-wastebasket that his idea wasn’t going to be implemented or no one wanted to tell the CEO that his/her idea stank
  • The design team didn’t have the trust of the organization to implement only the changes they knew, from experience, to be the best

etc., etc.

Simple Tips to Avoid A/B Testing Abuse

Here are a few simple tips to avoid the abuse of A/B Testing:

  1. Perform your A/B tests like a scientific experiment ensuring that each step is conducted completely and accurately.
  2. Don’t be afraid to “table” ideas that do not address your question or goal.
  3. Cite reasons for your hypothesis with documented research.
  4. Test specific page elements with a control sample and experimental sample.
  5. Implement web analytics and A/B testing tools to easily perform tests and analyze data.
  6. Establish trust in your organization by communicating your findings with empirical evidence.

  1. January 9th, 2009 at 00:08 | #1

    You could not be more correct in that A/B Tests should follow a sound scientific methodology and that they can be misused/abused (the old throw up another version and see how it works approach) when people don’t know how to conduct them. When you’re doing a serious A/B Test, you’re not evaluating which page is better; rather, you’re measuring whether a discrete change in a page makes a difference. Put another way, you’re isolating variables. You cannot seriously claim that one page is better than another until you can identify each and every difference, measured one at a time. Not until you’ve isolated a variable (as in an A/B Test) or multiple variables (as in multivariate tests) can you begin to make serious claims about the performance of one page over another.

  2. Robert Eaglestone
    January 10th, 2009 at 11:38 | #2

    You’re soooo right about A/B Test Abuse. Like so many other things, the design team — including the manager — need to gently but firmly winnow suggestions down, based on their expertise alone, to two or three alternatives. As you mention, the failure could be with the team, or with the organization. Frankly if the organization doesn’t trust the team then the organization is failing in their primary responsibility to the team, and quite possibly to other teams as well.

  3. January 20th, 2009 at 05:48 | #3

    Nice tip. I agree there should be a scientific approach to resolve many retries of design elements.

  4. February 9th, 2009 at 13:31 | #4

    One thing you might want to add after #5 is how statistically significant the results were. For example, if the test was done on x users, what percentage of your total visits does x represent? In essence, are the results from the test significant enough to extend to a larger population?

    There is a z-test with which this can be determined; however, GWO gives this data automatically!
