A/B Testing in ASO. What Is It and How to Conduct It in Apple’s App Store or Google Play?

A/B testing in ASO is the process of comparing two or more variations of visual or textual elements to determine which option store visitors find most appealing. You can run A/B tests on screenshots and icons and, in Google Play, on textual metadata as well. The RadASO team will take you by the hand and explain what A/B testing in ASO is, the key differences between A/B tests in the App Store and Google Play, and how to do it correctly.

  1. A/B Testing in ASO – How it Works
  2. What to Consider when Preparing a Hypothesis for A/B Testing
  3. A/B Test Differences Table in App Store and Google Play
  4. How to Publish an A/B Test in the App Store
  5. How to Publish an A/B Test in Google Play
  6. How to Prepare the Application for Testing
  7. A/B Test Results in Google Play
  8. A/B Test Results in the App Store

A/B Testing in ASO – How it Works

Split the total number of users into two groups: A and B. Group A continues with the usual experience and sees the current screenshots. Group B receives a new experience and sees the new test screenshots. Continue testing until you can identify the group with the higher install conversion rate.
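The split described above can be sketched in a few lines. This is only an illustration: the stores perform the split themselves, and the hashing scheme and the names assign_group and conversion_rate are assumptions made for this example, not part of either store's API.

```python
import hashlib


def assign_group(visitor_id: str, share_b: float = 0.5) -> str:
    """Deterministically map a visitor to group "A" or "B".

    The visitor ID is hashed to a number in [0, 1); visitors whose
    number falls below share_b see the test variant (group B),
    everyone else keeps the current variant (group A).
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "B" if bucket < share_b else "A"


def conversion_rate(installs: int, visitors: int) -> float:
    """Installs divided by store page visitors (0.0 when there are no visitors)."""
    return installs / visitors if visitors else 0.0
```

Because the assignment depends only on the visitor ID, the same visitor always lands in the same group, which keeps the two conversion rates comparable.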

AB test_beginning

During the test launch, there is an opportunity to select various parameters:

  • The percentage of users to whom it will be displayed.
  • Countries in which the test will be conducted.
  • Conditions under which the test will be considered successful.

However, it is impossible to choose which users will see the test or to control the audience demographics.

The main objective of A/B tests in ASO is to find a variant that improves the conversion rate. Sometimes, minor changes, such as a different color for the CTA button, lead to significant differences in how users interact with the application. When creating a hypothesis, specify what you will change and why.

What to Consider when Preparing a Hypothesis for A/B Testing

  1. Choose an element that will be changed (tested) and, in your opinion, will have a significant impact on users. For example, the background in one of the screenshots. The hypothesis may be that changing it will increase conversion.
  2. Define the specifics of the change. Clearly indicate what you want to change in this element and add approximate references. For example, "replace the dark background with a light one" or "replace the background image of people with a solid-color background."
  3. Evaluate how the change affects users. The test will not show results if only a small percentage of users notice the change.
  4. The change should be noticeable on the first three vertical screenshots (if there's a video, on the first two). On horizontal screenshots or videos, the changes should be obvious right from the start.

Before_After

Let's look at examples:

Example 1. The changes appear only on the sixth screenshot, so they are not immediately apparent. Most users look only at the first few screenshots and don't scroll to the end. Such a test is not useful, since its results do not allow you to draw a meaningful conclusion.

Graphic Bad Sample

Example 2. Changes are immediately noticeable on the very first, most conversion-driven screenshot. Only one crucial change is being tested, not several simultaneously. The results of this A/B test will reveal which variant users find more appealing and are more likely to download.

Graphic Good Sample

A/B Test Differences Table in App Store and Google Play

What can be tested?

  • Google Play: short description, long description, icon, feature graphics, screenshots, videos
  • App Store: screenshots, videos, icon (has to be uploaded to the build*)

Number of simultaneously running tests

  • Google Play: 5 tests. Each test is valid within a single country. You can also choose a default country test (details below); it will then run in all countries where there are no localized graphical or textual materials.
  • App Store: 1 test. The test can be immediately extended to all countries where the application is available or limited to specific countries as needed.

The number of test variants that can be tested with the current version in the store

  • Google Play and App Store: compared with a maximum of 3 new variants

Can a test be launched while another item is under review?

  • Google Play: Yes
  • App Store: No

Mandatory formats for screenshots uploaded to the store

  • Google Play: 6.5
  • App Store: 6.5, 5.5, and 12.9 (if there is an iPad version)

*Build – a specific version of the application that is ready to be downloaded and installed on users' devices. It contains all the files and data needed to install and use the application. Updating the icon is only possible when a new version of the application is released in the store.

More about optimizing graphic elements in the App Store and Google Play can be found in the article 'Graphics in Mobile App Promotion in the App Store and Google Play (ASO) – How to Optimize Graphic Elements.'

How to Publish an A/B Test in the App Store

1. Navigate to the Product Page Optimization tab in App Store Connect. AppStore_1st step

2. After naming the test, specify the type of test you are launching (A/B, A/B/B, or A/B/C test, etc.), the countries for displaying this test (by default, all 39 countries are selected), and an approximate test duration. AppStore_2nd step

3. Upload your graphic materials. AppStore_3d step

For a more detailed description, read the official App Store documentation.

How to Publish an A/B Test in Google Play

1. On the Store listing experiments tab in the Google Play Console, select the countries where you wish to conduct the test. Unlike the App Store, you can only choose one country for one test or opt for a test in the default country (i.e., for all countries without localized graphic or text materials, depending on what you are testing). So, determine whether the test will be conducted in the default or a specific country.

More information can be found in the official documentation.

Google Play_1st step

2. Configure the settings that affect the accuracy of the test and the number of downloads it requires:

  • The target metric: all users who downloaded the application, or only those who downloaded it and did not delete it within the first day.
  • The test variant you will launch (A/B, A/B/C, A/B/C/D – more information on the main differences below).
  • The percentage of visitors who will see the experimental variants instead of the currently active one.
  • The minimum difference between the new variants and the currently active variant that is required to determine the winner.
  • The confidence level for the test results.
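To see how these settings interact, here is a rough sketch of a textbook sample-size estimate for comparing two conversion rates. It is not Google's own calculation: the function name visitors_per_variant, the 80% statistical power, and the 90% default confidence level are assumptions made for this example.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline_cr: float, min_relative_effect: float,
                         confidence: float = 0.90, power: float = 0.80) -> int:
    """Approximate visitors needed per variant (two-sided two-proportion test).

    baseline_cr          current conversion rate, e.g. 0.30 for 30%
    min_relative_effect  smallest relative change worth detecting, e.g. 0.10 for 10%
    confidence, power    assumed statistical settings, not store defaults
    """
    p1 = baseline_cr
    p2 = baseline_cr * (1 + min_relative_effect)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)
```

The smaller the minimum difference and the higher the confidence level, the more visitors each variant needs, which is why strict settings on a low-traffic application lead to long tests.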

Google Play_2nd step

3. Determine what to test. Unlike in the App Store, you can test not only graphic elements but also text (full and short descriptions).

For A/B tests, you can only upload screenshots in one size. Google will automatically adapt them to other formats.

Google Play_3d step

How to Prepare the Application for Testing

1. Run A/B/B tests, not A/B tests.

A – the current variant of the screenshots (or other tested materials) that is live in the store.

B1 – the new variant of the screenshots that needs to be tested.

B2 – an exact duplicate of the B1 screenshots.

Screen A_B_B test

A/B/B tests provide an additional check on the reliability of the results. Ideally, B1 and B2 should show fairly similar performance metrics (more about this in the 'A/B Test Results' sections below).
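One way to check whether B1 and B2 really behave alike is a standard two-proportion z-test on their raw numbers. The sketch below assumes you have install and visitor counts for both duplicates; the function name duplicates_differ and the 5% significance threshold are assumptions for illustration, not something either store reports.

```python
from math import sqrt
from statistics import NormalDist


def duplicates_differ(installs_b1: int, visitors_b1: int,
                      installs_b2: int, visitors_b2: int,
                      alpha: float = 0.05) -> bool:
    """Return True if B1 and B2 convert at significantly different rates.

    Since B1 and B2 are identical, a True result suggests the test
    is still too noisy to trust.
    """
    p1 = installs_b1 / visitors_b1
    p2 = installs_b2 / visitors_b2
    pooled = (installs_b1 + installs_b2) / (visitors_b1 + visitors_b2)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_b1 + 1 / visitors_b2))
    if se == 0:
        return False  # no variation at all, nothing to compare
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha
```

With hypothetical counts of 300 and 305 installs per 1,000 visitors, the duplicates are statistically indistinguishable; with 300 versus 400, they are not, and the test should keep running.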

ABB same results

2. The test should last for at least two weeks (depending on how much traffic the application is getting).

As shown in the example below, sometimes this is not enough. The total amount of traffic was low, so two weeks turned out to be insufficient. The results remained ambiguous for about a month, and only after one and a half months did options B1 and B2 show significant improvements. In total, the test lasted for more than 70 days.
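A back-of-the-envelope estimate helps to set expectations before launch. The sketch below assumes traffic is split evenly across all variants (including the current one) and that you already have a target number of visitors per variant; the function name and the example numbers are hypothetical.

```python
from math import ceil


def estimated_test_days(visitors_needed_per_variant: int, n_variants: int,
                        daily_visitors: int, min_days: int = 14) -> int:
    """Days until every variant has collected enough visitors.

    Assumes an even traffic split and never returns less than
    min_days (two weeks, as recommended above).
    """
    if daily_visitors <= 0 or n_variants <= 0:
        raise ValueError("daily_visitors and n_variants must be positive")
    daily_per_variant = daily_visitors / n_variants
    days = ceil(visitors_needed_per_variant / daily_per_variant)
    return max(days, min_days)
```

For instance, an A/B/B test (3 variants) that needs 3,000 visitors per variant on a page with 120 visitors a day would take about 75 days, which is in line with the long test shown here.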

78 Days

3. Graphical changes should be significant.

  • Select a single hypothesis that has the greatest potential to impact the end-user in a significant manner.
  • Focus on the key changes in the first three screenshots during the hypothesis test (if there are also videos, focus on the first two). If the application has horizontal screenshots, focus on the first one.

In the case of rebranding (changing colors, fonts, characters, etc.), the screenshots should undergo a drastic transformation. This is also recommended if the previous screenshots are deemed to be unsatisfactory.

4. Cross-Marketing Activity

Consider global marketing activities. Users associate the brand with specific characters. Therefore, in all promotion channels and during tests in the store, use screenshots with the same characters.

 

Marketing

5. Consider the strength of the brand.

A popular application (e.g., Netflix) receives the majority of its views and downloads through brand-specific search queries, so graphics have little influence on user choices. The results of such a test may not be indicative, regardless of the amount of traffic and the scale of the changes.

6. Cultural Localization

Pay attention to the cultural nuances of each region. Localize the language in the screenshots, add colors, elements, and individuals representative of the country. This will spark the interest of the local population.

Different Culture

A/B Test Results in Google Play

Glossary:

  • Audience – % of users who see the experiment.
  • Installers (current) – the number of actual downloads during the experiment.
  • Installers (scaled) – the number of downloads during the experiment divided by the audience share.
  • Performance – the likely change in conversion rates when applying the tested variant (the metric is available when there is enough data).
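The relationship between the two Installers columns can be written out as a tiny helper. The function name and the example numbers are hypothetical; the formula is the one given in the glossary above.

```python
def scaled_installers(current_installers: int, audience_share: float) -> float:
    """Installers (scaled) = Installers (current) / audience share.

    audience_share is a fraction, e.g. 0.25 when 25% of visitors
    see this variant.
    """
    if not 0 < audience_share <= 1:
        raise ValueError("audience_share must be in (0, 1]")
    return current_installers / audience_share
```

For example, 250 actual installs from a variant shown to 25% of the audience scale to 1,000 installs, which makes variants with different audience shares comparable.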

Example 1:

Test Results_good

Most likely, test variants B1 and B2 will win. However, as long as the result in the Performance column is not entirely in the 'red' or 'green' zone, such results should not be considered 100% reliable.

Let's calculate the expected conversion change:

  1. for Treatment B1: (-11.5 + 24.9) / 2 = 6.7
  2. for Treatment B2: (-9.5 + 15.1) / 2 = 2.8
  3. average of the two values: (6.7 + 2.8) / 2 = 4.75

Conversion is expected to increase by about 4.75% in relative terms. If the current conversion is 30%, the projected conversion will be: 30 + (30 * 4.75 / 100) = 31.43%*

*Important! Do not add the average Performance percentage to the current conversion; instead, change the current conversion by that percentage.
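The same calculation can be reproduced in code. This is a minimal sketch using the numbers from Example 1; the function names are made up for this illustration, and taking the midpoint of the Performance range is the article's rule of thumb rather than an official store metric.

```python
def interval_midpoint(low: float, high: float) -> float:
    """Midpoint of a Performance range, in percent."""
    return (low + high) / 2


def projected_conversion(current_cr_percent: float,
                         relative_change_percent: float) -> float:
    """Apply a relative change to the current conversion (both in percent)."""
    return current_cr_percent * (1 + relative_change_percent / 100)


b1 = interval_midpoint(-11.5, 24.9)   # Treatment B1
b2 = interval_midpoint(-9.5, 15.1)    # Treatment B2
average_change = (b1 + b2) / 2        # about 4.75
projected = projected_conversion(30.0, average_change)  # about 31.43
```

Note that projected_conversion multiplies the current conversion by the relative change instead of adding the percentages, which is exactly the point of the warning above.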

Example 2:

Test Results_bad

Both variants displayed significantly negative outcomes. Conclusion: the test was unsuccessful.

Example 3:

Test Results_strange

The same test variant produces different outcomes: in V1, it results in a favorable outcome, while in V2, the opposite occurs. In such a case, calculations using the formula won't yield reliable results to base your decisions on. V1 and V2 should yield more or less similar results.
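The three examples can be summarized as a simple rule of thumb. The sketch below is an assumption-based illustration: the labels, the function names, and the idea of comparing the signs of the midpoints are not taken from Google Play, and the conflicting ranges used in the usage note are hypothetical.

```python
def classify_interval(low: float, high: float) -> str:
    """Label a Performance range (in percent).

    "green"        the whole range is above zero
    "red"          the whole range is below zero
    "inconclusive" the range crosses zero
    """
    if low > 0:
        return "green"
    if high < 0:
        return "red"
    return "inconclusive"


def duplicates_conflict(range_1: tuple[float, float],
                        range_2: tuple[float, float]) -> bool:
    """True if two identical variants point in opposite directions."""
    mid_1 = (range_1[0] + range_1[1]) / 2
    mid_2 = (range_2[0] + range_2[1]) / 2
    return mid_1 * mid_2 < 0
```

For the ranges from Example 1, duplicates_conflict((-11.5, 24.9), (-9.5, 15.1)) is False, so averaging them is reasonable. For hypothetical ranges such as (2.0, 10.0) and (-12.0, -1.0), it is True, and the formula should not be used.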

A/B Test Results in the App Store

Glossary:

  1. Conversion rate – the conversion of test variants (Apple, in contrast to Google Play, displays this immediately).
  2. Improvement – the relative difference between the variant being tested and the variant that’s currently active in the store. If you click on it, you'll see the percentage range over the entire testing period.
  3. Confidence – the confidence level in the results of each individual variant. It should be at least 50% to reach a conclusive decision about the test.
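The Improvement value is a relative difference, so it can be sanity-checked by hand. A minimal sketch, assuming conversion rates are given as fractions; the function name and the example numbers are hypothetical.

```python
def relative_improvement(variant_cr: float, baseline_cr: float) -> float:
    """Relative difference, in percent, between a test variant and the
    variant that is currently active in the store."""
    if baseline_cr <= 0:
        raise ValueError("baseline_cr must be positive")
    return (variant_cr - baseline_cr) / baseline_cr * 100
```

For example, a variant converting at 33% against a baseline of 30% is a relative improvement of about 10%, not 3%.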

The Confidence and conversion rate improvement indicators in the chart below demonstrate that this test is a winner.

App Store_test results

After adopting the winning test variant, measure the conversion once again.

A or B

A/B tests are an ongoing process because user preferences constantly change. Today, they might be drawn to a blue background, but later, red might receive more attention.

It's also important to evaluate the results accurately. A winning variant doesn't always guarantee an improvement in conversion, a losing one doesn't always mean a decline, and drawing conclusions too hastily can lead to unexpected outcomes.
