By Tobin Harris
Managing Director, Pocketworks
December 21, 2024
Updated December 22, 2024
If you're responsible for developing an app or digital product, you'll know how important it is to test your app for bugs each time you release it. You'll also know how expensive and time-consuming this process can be.
With quality control consuming up to 40% of revenue according to IBM (though our data shows it's typically 15-20% for mobile apps), I started wondering: could AI agents make testing both more efficient and more enjoyable for our testing team?
While I haven't found a fully working solution yet, I've been experimenting with some seemingly viable approaches that could transform how we test mobile apps.
Gartner predicts that Agentic AI will be a very hot topic in 2025. But what makes it different from what we're using now?
If you've used ChatGPT, you're familiar with its request-and-response pattern. Agentic AI works differently - it can operate autonomously toward a goal without constant human guidance. Think of it as an independent tester that can:
This is quite different from traditional automation because the agent isn't following a rigid script. For example, if it encounters an error, it might try rebooting the app or find alternative paths to complete the same task.
We currently use Sofy for our UI automation testing, and I'm hoping they're looking at this stuff.
Major UK banks are already pushing the boundaries of testing automation. Take Lloyds Banking Group - they've adopted an 'Automation first' philosophy, with dedicated Software Development Engineers in Test (SDETs) using frameworks like WDIO, Selenium, Cucumber, and Appium. But even these sophisticated tools require significant human oversight and maintenance. AI agents could take this automation to the next level, reducing the manual workload on their testing teams while improving coverage.
Traditional testing requires writing detailed test cases with every single step spelled out. Even with automation tools like Selenium, someone needs to code all these steps. With a goal-driven agent, you simply tell it what the user should achieve, and it figures out how to get there. As you can imagine, this could massively reduce the work involved in test creation and maintenance.
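To make the difference concrete, here's a hedged sketch. The first block is what a traditional scripted check might look like with the Appium Python client - the element IDs, device capabilities and BrowserStack details are placeholders for illustration, not our real test code. The second is the entire "test case" you'd hand to a goal-driven agent.

```python
# Traditional scripted test: every step and every element is spelled out by hand.
# (Capabilities, element IDs and the BrowserStack app reference are illustrative only;
# credentials are omitted.)
from appium import webdriver
from appium.options.ios import XCUITestOptions
from appium.webdriver.common.appiumby import AppiumBy

options = XCUITestOptions()
options.platform_name = "iOS"
options.device_name = "iPhone 15"
options.app = "bs://<app-id>"  # app uploaded to BrowserStack

driver = webdriver.Remote("https://hub.browserstack.com/wd/hub", options=options)
try:
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "Skip paywall").click()
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "Add breakfast").click()
    driver.find_element(AppiumBy.ACCESSIBILITY_ID, "Search foods").send_keys("tea with milk")
    # ...dozens more steps, each of which breaks when the UI changes...
finally:
    driver.quit()

# Goal-driven agent: you describe the outcome and let it work out the steps.
GOAL = (
    "Log a breakfast of a cup of tea with milk and a small yoghurt with honey, "
    "then report the total carbs and calories."
)
```

Every locator in the scripted version is a maintenance liability; the goal-driven version only changes when the user goal itself changes.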
Here's a quick overview of each experiment. Excuse the naff diagram!
My first and most promising experiment used Claude and its new "computer use" feature. Setting it up was surprisingly straightforward:
Here's the actual prompt I used:
Act as a mobile tester. Use the open firefox window to use the iPhone device in the BrowserStack window (take a fresh screenshot to see that) You want to use the free version of the app so dismiss the paywall. In the Carbs & Cals app, log your breakfast of a cup of tea with milk, and a small portion of yoghurt with honey. Then, find out how many carbs and calories your breakfast was. Wait 5 seconds between screenshots so you don't exceed the 40,000 token Claude rate limit.
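For the curious, this is roughly the loop the computer-use beta drives: Claude asks for screenshots and clicks as tool calls, your code performs them, and you send back a fresh screenshot each time. The sketch below is a simplified Python approximation based on Anthropic's documented beta rather than my exact setup - the model name, screen size, pyautogui usage and five-second pacing are assumptions.

```python
# Rough sketch of a Claude "computer use" loop (Anthropic computer-use beta).
# Model name, screen size and pacing are assumptions, not the exact setup used.
import base64
import io
import time

import anthropic
import pyautogui  # assumption: screenshots and clicks on the host machine

client = anthropic.Anthropic()

PROMPT = "Act as a mobile tester. ..."  # the prompt quoted above

def screenshot_b64() -> str:
    """Capture the screen and return it as a base64-encoded PNG."""
    buf = io.BytesIO()
    pyautogui.screenshot().save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()

messages = [{"role": "user", "content": PROMPT}]

while True:
    response = client.beta.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=[{
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": 1280,
            "display_height_px": 800,
        }],
        messages=messages,
        betas=["computer-use-2024-10-22"],
    )
    messages.append({"role": "assistant", "content": response.content})

    tool_results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        action = block.input.get("action")
        if action == "left_click":
            x, y = block.input["coordinate"]
            pyautogui.click(x, y)
        # Whatever the action, reply with a fresh screenshot so Claude can
        # see the result of what it just did.
        time.sleep(5)  # crude pacing to stay under the token rate limit
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": [{
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png",
                           "data": screenshot_b64()},
            }],
        })

    if not tool_results:
        break  # Claude stopped asking for actions: goal reached, or it gave up
    messages.append({"role": "user", "content": tool_results})
```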
Here's what happened:
As the video shows, the results were pretty cool.
Claude navigated through the onboarding flow, skipped the paywall (after I added that to the prompt), and started adding breakfast items. While it ultimately got stuck trying to add items to the diary, watching it figure things out was impressive. The main challenges were:
My second experiment involved MacOS Automator combined with ChatGPT. The basic workflow was:
While I got it working for basic tasks like finding buttons and filling search boxes, it struggled with maintaining context between actions. Still, with some more development time, this approach could be viable.
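For a flavour of the approach, here's a rough Python approximation of that loop, using the OpenAI API in place of the ChatGPT UI: screenshot the screen, ask a vision model where a control is, then click it. The model choice and the third-party cliclick helper are assumptions to keep the sketch self-contained; it's not the exact Automator workflow I built.

```python
# Rough approximation of the Automator + ChatGPT experiment: screenshot,
# ask a vision model where a UI element is, then click those coordinates.
# The model, prompt wording and the `cliclick` tool are assumptions.
import base64
import subprocess

from openai import OpenAI

client = OpenAI()

def ask_for_coordinates(png_path: str, target: str) -> tuple[int, int]:
    """Ask a vision model for the pixel coordinates of a UI element."""
    with open(png_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Give the x,y pixel coordinates of {target} as 'x,y' only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    x, y = resp.choices[0].message.content.strip().split(",")
    return int(x), int(y)

# Capture the screen with macOS's built-in tool, find the search box, click it.
subprocess.run(["screencapture", "-x", "/tmp/screen.png"], check=True)
x, y = ask_for_coordinates("/tmp/screen.png", "the search box in the BrowserStack window")
subprocess.run(["cliclick", f"c:{x},{y}"], check=True)  # cliclick: brew install cliclick
```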
My final experiment used DoBrowser, a Chrome extension for automating browser tasks. While it seemed promising for browser-based testing, sadly it couldn't interact with the device screen in BrowserStack.
However, its speed was impressive - if only it could be adapted for mobile testing. At least the experiment was cheap - I only stumped up £25 to give the Chrome extension a whirl!
The financial impact of AI testing is compelling. Recent studies reported by Forbes show companies implementing AI-driven testing are seeing ROI improvements of 200-300% over traditional methods in just the first year. This matches what we're seeing in our experiments:
These improvements come from reducing the need for specialized automation engineers (currently commanding £50K-£150K salaries in the UK) and dramatically cutting down the time spent maintaining test cases.
Several questions still need addressing:
AI queries, especially those involving images, can be energy-intensive. Uber's approach of training smaller, more efficient models and using clean energy for their data centers seems promising. This also has the benefit of reducing latency since the model can be hosted closer to where it's needed.
If you're interested in sustainability, check out my list of apps that are driving sustainability.
While these experiments show promise, we're still waiting for commercial tools to make this technology widely accessible. In the meantime, as part of a proper mobile strategy, organisations should:
In case you're wondering, Pocketworks is a software consultancy that specialises in mobile apps.
We bring you expertise in user research, mobile technology and app growth tactics to help you develop apps that create positive impact for your customers, shareholders and society.
To get a flavour of us, check out our free guides and app development services. Or, see some more background info on us.