Measuring User Experience
User experience (UX) is more than just a buzzword; it’s the pulse of how people interact with your product. But how do we know if a user is having a good or bad experience? That’s where measuring UX comes into play. Measuring user experience helps us understand what’s working, what’s frustrating users, and where we need to improve. It’s about taking the guesswork out of design and making decisions driven by real data and user feedback. If you want your product to thrive, you need to know how it’s being experienced in the real world, and that’s why measuring UX is essential.
What is user experience?
“User experience” may be defined in many ways, but to keep things concise, the Nielsen Norman Group (NN/g, perhaps the foremost user experience consulting firm) defines it like this: “User experience encompasses all aspects of the end-user’s interaction with the company, its services, and its products.”
It is important to draw a clear line between User Interface and User Experience.
An often-used analogy is that of the car: the styling, dashboard, and steering wheel are the User Interface, but driving it is the User Experience. The User Interface is an important contributor to the experience, but it is not the experience itself.
Why do we measure it?
The Plan-Do-Check-Act cycle introduced by W. Edwards Deming is used in business to control and continuously improve processes and products. Within this cycle, our ability to measure represents the Check activity.
When applied to User Experience, we can state a simpler primary goal: identify and highlight pain points experienced by the users of a process or product.
Secondary goals might be to identify, quantify, and communicate the user experience to stakeholders, or to get clarity about your positioning and competitive advantages.
What do we measure?
Our measurements fall readily into two main groups: task-level satisfaction and test-level satisfaction.
In both cases, as with many other satisfaction measurements, users are surveyed with short questionnaires.
Task-level Satisfaction
How do we measure it?
When measuring task-level satisfaction, users should be given a questionnaire immediately after they attempt a task (whether or not they achieve the goal). An expedient way of doing this is to embed the questionnaires into the measured system itself (https://surveyjs.io/ is an MIT-licensed option). This keeps the questionnaire close in time to the task and reduces the burden on users of completing it. A familiar example is Google’s call to action at the end of a Google Meet call.
Many techniques are in common usage to measure task-level user satisfaction, a sample of which might include:
- ASQ: After-Scenario Questionnaire (3 questions);
- NASA-TLX: NASA’s task load index (6 questions, one per workload subscale);
- SMEQ: Subjective Mental Effort Questionnaire (1 question);
- UME: Usability Magnitude Estimation (1 question);
- SEQ: Single Ease Question (1 question).
ASQ is a common choice and a simple approach, which research has validated as providing reliable, valid and useful results for usability studies.
NASA-TLX is widely cited in academia and also has a good reputation for reliable results; however, it is somewhat more complex.
SMEQ is easy to implement and use. It is also supported by research, but it only provides single-dimensional feedback.
UME is similarly mono-dimensional and is easy to implement and use.
SEQ is also mono-dimensional and is particularly easy to implement and use, and it closely aligns with other metrics. It is also recommended by MeasuringU (a research agency specializing in measuring users’ attitudes and experiences) in its article “If You Could Only Ask One Question, Use This One”.
What do we do with the data?
The gathered data is used alongside quantitative measures of the task experience (for example, the problems encountered or the number of steps needed to complete a task). These measures are compared across a database of tasks to identify outliers and to highlight tasks which are particularly difficult, unsatisfying or frustrating, or particularly easy, satisfying or pleasant to complete.
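As a minimal sketch of the outlier step, the snippet below flags tasks whose mean score sits far from the mean across all tasks. The task names, the SEQ responses (assumed here to be on a 1 to 7 scale, higher meaning easier), the `flag_outlier_tasks` helper and the 1.5 standard deviation threshold are all illustrative assumptions, not part of any standard.

```python
from statistics import mean, stdev

# Hypothetical SEQ responses per task (assumed scale: 1 = very difficult, 7 = very easy).
seq_responses = {
    "create account": [6, 7, 6, 5, 7],
    "search catalogue": [6, 5, 6, 6, 7],
    "apply discount code": [2, 3, 2, 1, 3],
    "check out": [5, 6, 6, 5, 6],
    "update address": [6, 6, 5, 7, 6],
}


def flag_outlier_tasks(responses, threshold=1.5):
    """Return (task, mean score, z-score) for tasks far from the mean of all task means.

    The threshold is an arbitrary illustrative choice; tune it to your own data.
    """
    task_means = {task: mean(scores) for task, scores in responses.items()}
    means = list(task_means.values())
    overall = mean(means)
    spread = stdev(means)  # sample standard deviation of the task means
    flagged = []
    for task, task_mean in task_means.items():
        z = (task_mean - overall) / spread if spread else 0.0
        if abs(z) >= threshold:
            flagged.append((task, round(task_mean, 2), round(z, 2)))
    # Most negative z-score (hardest task) first.
    return sorted(flagged, key=lambda item: item[2])


flagged = flag_outlier_tasks(seq_responses)
for task, task_mean, z in flagged:
    print(f"{task}: mean {task_mean}, z-score {z}")
```

With this made-up data only the discount-code task is flagged, and its negative z-score marks it as harder than the rest rather than easier.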
Additional insight can be gleaned by assigning predicted difficulty, satisfaction or pleasure of tasks and comparing these with the actual results. Tolerances might be used in these comparisons to highlight tasks which are beyond the accepted tolerance of overall satisfaction.
Comparing actual levels with predicted levels, with or without tolerances, gives an ordered list of the least satisfactory tasks within a process or product. When this is coupled with an assessment of the impact of each shortfall, a reliable plan of action to improve the process or product can be defined.
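The comparison of predicted and actual levels could be sketched as follows. The `TaskResult` type, the `rank_shortfalls` helper, the example scores and the 0.5 point tolerance are hypothetical; the only idea taken from the text is "predicted minus actual, filtered by a tolerance, worst first".

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    name: str
    predicted: float  # score the team expected (same scale as the survey)
    actual: float     # mean score reported by users


def rank_shortfalls(results, tolerance=0.5):
    """List tasks whose actual score trails the prediction by more than the tolerance."""
    shortfalls = [
        (r.name, round(r.predicted - r.actual, 2))
        for r in results
        if r.predicted - r.actual > tolerance
    ]
    # Largest shortfall first, so the least satisfactory task heads the list.
    return sorted(shortfalls, key=lambda item: item[1], reverse=True)


# Hypothetical predictions and survey means for four tasks.
results = [
    TaskResult("create account", predicted=6.0, actual=6.2),
    TaskResult("apply discount code", predicted=5.5, actual=2.2),
    TaskResult("check out", predicted=6.0, actual=5.6),
    TaskResult("update address", predicted=6.5, actual=5.0),
]

ranked = rank_shortfalls(results)
for name, gap in ranked:
    print(f"{name}: {gap} points below prediction")
```

Tasks that meet or beat their prediction, or that miss it by less than the tolerance, drop out of the list. The ranked remainder is the input to the impact assessment.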
Test-level Satisfaction
How do we measure it?
Whereas task-level satisfaction is measured directly after each completed task, test-level satisfaction is measured at the end of a session or intermittently across sessions. It often takes the form of a formalized questionnaire and measures the user’s overall impression of their experience whilst engaging with the process or product.
Once more there is a range of techniques in common usage, two of which are:
- System Usability Scale (SUS, 10 questions);
- Standardized User Experience Percentile Rank Questionnaire (SUPR-Q, 8 questions).
The System Usability Scale (SUS) provides a “quick and dirty” but reliable tool for measuring usability. It consists of a 10-item questionnaire with five response options, ranging from Strongly agree to Strongly disagree. It has been around since 1986 and is used to measure hardware, software, mobile devices and websites.
The benefits are that it has been validated through extensive previous use and can work well with small sample sizes. The drawbacks are that interpreting the results can be complex, and that there is a temptation to read the scores, which run from 0 to 100, as percentages, which they are not.
https://www.usability.gov/how-to-and-tools/methods/system-usability-scale.html
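To make the scoring concrete, here is a minimal sketch of the conventional SUS calculation: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5. The `sus_score` function name and the coding of responses as 1 (Strongly disagree) to 5 (Strongly agree) are assumptions made for this example.

```python
def sus_score(responses):
    """Compute a SUS score from ten responses coded 1 (Strongly disagree) to 5 (Strongly agree)."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: higher agreement is better.
            total += response - 1
        else:
            # Even items are negatively worded: higher agreement is worse.
            total += 5 - response
    # Each item contributes 0 to 4, so the total is 0 to 40; scale it to 0 to 100.
    return total * 2.5


# One hypothetical respondent.
example = [4, 2, 5, 1, 4, 2, 5, 2, 4, 1]
print(sus_score(example))
```

The result lands on a 0 to 100 scale, but it is a score rather than a percentage, so it is best interpreted against other SUS scores rather than read as "percent usable".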
SUPR-Q is an 8-item instrument that has gone through multiple rounds of psychometric validation and is used by hundreds of organizations around the world. It is derived from research and refined across studies, is considered reliable, and has been validated by extensive use. It measures usability, appearance, trust/credibility, and loyalty.
The benefits are that it uses 50 as a midpoint and has extensive industry data, updated quarterly, that can be used for comparison purposes. It also covers NPS (https://measuringu.com/nps-go/) and can predict SUS scores. A drawback is that it does not act as a diagnostic: like most standardized questionnaires (including SUS), it provides a broad measure of the experience but is not specific enough to tell you what to fix.
What do we do with the data?
As with the task-level data, the data is analysed to highlight outliers and potential issues. Naturally, the purpose of identifying them is to fix potential issues and to learn from or reinforce aspects that are going well. However, it is also possible to compare results against industry peers and competitors to get an overall idea of how your users perceive your offering in comparison to others (https://measuringu.com/product/cswreport/).
The NPS aspect of SUPR-Q can also be illuminating, allowing you to see the overall ratios of Promoters (9–10), Passives (7–8) and Detractors (0–6). The score is obtained by subtracting the percentage of Detractors from the percentage of Promoters. Your NPS is not used as a benchmark against your competitors; instead, it is used to track your progress on your journey of improvement.
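The NPS arithmetic described above can be sketched in a few lines. The `net_promoter_score` function name and the sample ratings are made up for illustration; the Promoter and Detractor bands are the ones given in the text.

```python
def net_promoter_score(ratings):
    """Return NPS: percentage of Promoters (9-10) minus percentage of Detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")

    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives (7-8) count towards the total but towards neither group.
    return 100 * (promoters - detractors) / len(ratings)


# Ten hypothetical likelihood-to-recommend ratings.
ratings = [10, 9, 9, 8, 7, 7, 6, 4, 10, 3]
print(net_promoter_score(ratings))
```

Here four Promoters (40%) and three Detractors (30%) give an NPS of 10. The result can range from -100 (all Detractors) to +100 (all Promoters).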
Conclusion
There is no single “silver bullet” means of measuring your users’ satisfaction with the experience that you are offering them. Instead, there is a collection of techniques, at both task and test level, that can be used appropriately (don’t use all of them) and holistically (don’t use just task level or just test level) to arrive at a reasonable overview of how your users feel overall.
This article does not consider other important metrics such as completion rates, errors and task time, which should also be measured to give a fuller overview of the value that your offering delivers.
Notes
ISO/IEC 9126 Quality In Use Metrics (deprecated; https://en.wikipedia.org/wiki/ISO/IEC_9126#Quality-in-use_metrics) has been replaced by ISO/IEC 25010:2011 Systems and software Quality Requirements and Evaluation (SQuaRE), https://www.iso.org/standard/35733.html
Author’s note:
I am currently deeply interested in using AI to generate both visual and text-based content. I am actively collaborating with AI on multiple platforms to explore my thoughts on what creativity is and is not.
My current approach is to collaborate with AI by using the output as a foundation upon which to build and modify.
Other than in the examples, the images for this article were created using ChatGPT. The approach was to enter only the section title as a prompt; the exact text used is shown under each image. Further instruction was given to make the initial image more abstract.