Usability Testing Case Study: Objective Evaluation vs Subjective Evaluation
Back in May 2016, my friends and I ran a usability test on a special case for a class assignment. In short, our lecturer liked the idea and concept and turned it into a usability research project.
The case is based on my own experience: I regularly had to fly back to my hometown, Bontang, in East Borneo, Indonesia. I tried different platforms and services to book my flight tickets because I wanted to form my own impressions and then decide which one was best for me.
Obviously, I didn’t want the research to rest on my own case alone. So we searched for articles about the most popular flight booking websites and came across a DailySocial.id article ranking the top 10 most popular flight booking websites in Indonesia, which you can find here. Why did we choose that article? Because it had the latest survey on the ranking of the most popular flight booking websites in Indonesia at that time. And why am I writing this story only now? Because I have only just gotten to know Medium :)
We took the top two sites in the ranking for the research: Traveloka, which the ranking named the most popular flight booking website, and Tiket.com, which came second. We wanted to validate that ranking through usability testing, to see whether Traveloka and Tiket.com deserve their positions or not.
Our team proposed comparing two methods for this research: eye tracking for objective evaluation and the System Usability Scale (SUS) for subjective evaluation. Why did we do that? When you evaluate or assess something, it helps to look at it from every side, both objective and subjective. That can give you perspectives you had not thought of before.
This was a within-group study with 10 participants, meaning that each participant received the same treatment and tested both Traveloka’s website and Tiket.com’s website. A within-group design carries a high risk of biased results, because whichever site a participant uses first can influence how they experience the second, so we varied the order of the websites (sometimes Traveloka first, sometimes Tiket.com first) to reduce that bias. We recruited participants at random from among our fellow students, with no specific flight experience required, because we figured that everybody (students included) will need a flight someday.
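To make the re-ordering idea concrete, here is a minimal sketch of one way to balance the site order across 10 participants. The exact scheme we used is not recorded here, so the alternating assignment, the participant labels (`P1` to `P10`), and the function name `counterbalanced_orders` are all assumptions made for illustration.

```python
import random

SITES = ("Traveloka", "Tiket.com")


def counterbalanced_orders(participants, seed=None):
    """Give each participant a site order so that both orders are used equally often."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # randomize who ends up with which order
    orders = {}
    for index, participant in enumerate(shuffled):
        # Even positions start with the first site, odd positions with the second.
        orders[participant] = SITES if index % 2 == 0 else SITES[::-1]
    return orders


# Hypothetical participant labels, not real participant data.
participants = [f"P{n}" for n in range(1, 11)]
orders = counterbalanced_orders(participants, seed=42)

for participant in participants:
    print(participant, "->", " then ".join(orders[participant]))
```

With an even number of participants, half of them start on each site, so any advantage of going first or second is spread evenly over both websites.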
We used very simple tools: a laptop webcam with the xLabs extension for the Google Chrome browser (from EyeDecide). The eye tracking and the SUS assessment were done in one session. Participants were asked to complete three tasks on both the Traveloka and the Tiket.com websites. The tasks are based on the sections that are used most often:
- Task 1: find tickets booking form
- Task 2: find banner for tickets/travel promotions
- Task 3: find customer service section
First off, the tool was calibrated to each user’s eyes to make sure the camera could “catch” the pupil. After the eye-tracking test, users scored how they felt about using the websites with SUS. Okay, allow me to explain what eye tracking and SUS are in the sections below…
In the simplest terms, eye tracking is a measurement of eye behavior: what and where people are looking, how their eyes move, and how their visual activity behaves when looking at an object. Eye tracking can be conducted with tools such as special glasses (I don’t know what they’re called, haha), an eye gaze device, or a webcam. For eye tracking with an eye gaze device or a webcam, participants are not allowed to wear glasses during the test, because glasses prevent the sensor from tracking the pupil. Contact lenses are allowed, though.
The results of eye tracking are a heat-map of visual activity and a fixation map of eye movement over a given section or object. The heat-map shows how long the participant stared at each part of a visual display, and the fixation map shows the eye movement from one section to another. The longer the gaze rests on a section, the hotter that section appears, shown as a red area on the heat-map. The fixation map traces the eye scan path: where the eye movement began and where it ended.
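The two outputs described above can be sketched in a few lines. This is only an illustration of the idea, not how EyeDecide or xLabs computes its maps: the grid cell size, the `(x, y, seconds)` sample format, and the sample values are all assumptions.

```python
from collections import defaultdict


def dwell_time_grid(gaze_samples, cell_size=100):
    """Sum gaze duration in seconds per grid cell; longer totals mean 'hotter' cells."""
    grid = defaultdict(float)
    for x, y, duration in gaze_samples:
        cell = (int(x // cell_size), int(y // cell_size))
        grid[cell] += duration
    return dict(grid)


def scan_path(gaze_samples, cell_size=100):
    """Return the sequence of grid cells visited, in order, without immediate repeats."""
    path = []
    for x, y, _duration in gaze_samples:
        cell = (int(x // cell_size), int(y // cell_size))
        if not path or path[-1] != cell:
            path.append(cell)
    return path


# Made-up gaze samples: (x pixel, y pixel, seconds spent there).
samples = [(120, 80, 0.4), (150, 90, 0.6), (640, 300, 0.2), (130, 60, 0.5)]

grid = dwell_time_grid(samples)
hottest_cell = max(grid, key=grid.get)  # the cell that would be drawn in red
path = scan_path(samples)

print("hottest cell:", hottest_cell, "total seconds:", grid[hottest_cell])
print("scan path:", path)
```

A real heat-map smooths these totals and overlays them on a screenshot, but the underlying quantity is the same: accumulated gaze time per region.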
SUS (System Usability Scale)
SUS is a usability testing method developed by John Brooke in the 1980s as a “quick and dirty” subjective measure of system usability. SUS uses 10 statements to score a system’s usability, each accompanied by a 5-point scale from “Strongly Disagree” to “Strongly Agree”. The 10 standard statements are:
- I think that I would like to use this system frequently
- I found the system unnecessarily complex
- I thought the system was easy to use
- I think that I would need the support of a technical person to be able to use this system
- I found the various functions in this system were well integrated
- I thought there was too much inconsistency in this system
- I would imagine that most people would learn to use this system very quickly
- I found the system very cumbersome to use
- I felt very confident using the system
- I needed to learn a lot of things before I could get going with this system
To turn the answers into a SUS score, there is a standard scoring procedure. You can read the details here.
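For reference, the standard SUS scoring rule works like this: the odd-numbered statements (1, 3, 5, 7, 9) are positively worded and contribute the answer minus 1, the even-numbered statements (2, 4, 6, 8, 10) are negatively worded and contribute 5 minus the answer, and the sum is multiplied by 2.5 to give a score from 0 to 100. A minimal sketch follows; the function name `sus_score` and the example answers are made up for illustration.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten answers on a 1-5 scale.

    responses[0] is the answer to statement 1, responses[9] to statement 10.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for number, answer in enumerate(responses, start=1):
        if not 1 <= answer <= 5:
            raise ValueError("each response must be between 1 and 5")
        if number % 2 == 1:
            total += answer - 1   # positively worded statement
        else:
            total += 5 - answer   # negatively worded statement
    return total * 2.5


# Made-up answers from one hypothetical participant.
example_answers = [4, 2, 4, 1, 4, 2, 5, 2, 4, 2]
example_score = sus_score(example_answers)
print(example_score)
```

The score for a website is then the mean of the individual participants' scores.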
Objective Evaluation: Eye Tracking
Using EyeDecide gave us two different evaluation measures: task success and task time. Task success is the share of participants who finished a task successfully. Success is determined by an Area of Interest (AOI), the area of the page that participants need to find for the task; the task counts as successful when the participant finds it.
As you can see in the graphs above, for task 1, Traveloka and Tiket.com reached the same success rate of 90%. For task 2, Tiket.com had a higher success rate than Traveloka. And for task 3, Traveloka had a higher success rate than Tiket.com.
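As a rough sketch of how a success rate like the 90% above could be derived, the snippet below treats an Area of Interest as a rectangle and counts a task as successful when at least one gaze point lands inside it. How EyeDecide actually decides success is not documented here, so that rule, the `AreaOfInterest` class, and all coordinates are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaOfInterest:
    """A rectangular region of the page, in pixels (hypothetical representation)."""
    left: float
    top: float
    width: float
    height: float

    def contains(self, x, y):
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


def task_succeeded(gaze_points, aoi):
    """Assumed rule: success means at least one gaze point fell inside the AOI."""
    return any(aoi.contains(x, y) for x, y in gaze_points)


def success_rate(outcomes):
    """Percentage of participants whose outcome is True."""
    return 100.0 * sum(outcomes) / len(outcomes)


# Made-up AOI for a booking form and made-up gaze points for two participants.
booking_form = AreaOfInterest(left=100, top=200, width=400, height=250)
found_it = task_succeeded([(50, 50), (320, 310)], booking_form)
missed_it = task_succeeded([(50, 50), (900, 700)], booking_form)

# Nine successes out of ten participants gives the kind of figure reported for task 1.
rate = success_rate([True] * 9 + [False])
print(found_it, missed_it, rate)
```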
Task time is the time a participant needs to complete the tasks. Traveloka had the shortest task time (3.46 s), meaning that participants could complete the tasks in about 3.5 s. On Tiket.com, participants took almost 5 times longer to complete the tasks.
We conducted hypothesis testing to determine whether the difference in task time between Traveloka and Tiket.com was significant, using a t-Test: Two-Sample Assuming Unequal Variances, with these hypotheses:
- H0: There is no significant difference between the average task times of Traveloka and Tiket.com
- H1: There is a significant difference between the average task times of Traveloka and Tiket.com
After the t-Test calculation, we found that the t-Stat was greater than the t-Table value, which means the null hypothesis is rejected and the alternative hypothesis is accepted. In other words, the task times of Traveloka and Tiket.com differ significantly.
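For readers who want to reproduce this kind of test outside a spreadsheet, here is a minimal sketch of the unequal-variances (Welch) t-test using only the Python standard library. The task times below are made up and are not our measurements; the function name `welch_t_test` is also just an illustration.

```python
import math
import statistics


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) without assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error contributed by each sample (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Made-up task times in seconds, NOT the study's data.
times_site_a = [1.0, 2.0, 3.0, 4.0, 5.0]
times_site_b = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, df = welch_t_test(times_site_a, times_site_b)

# Two-tailed critical value at alpha = 0.05 for df = 5, read from a t-table
# (the fractional df is rounded down, which is the conservative choice).
T_CRITICAL = 2.571
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.3f}, df = {df:.2f}, reject H0: {reject_h0}")
```

The decision rule is the same one described above: reject H0 when the absolute value of the t statistic exceeds the table value. With these made-up numbers it does not, so H0 would be kept.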
Subjective Evaluation: SUS
The SUS scores of Traveloka and Tiket.com are shown in the graphs below. A SUS score can be thought of as a percentage of the maximum possible score. This score is subjective, because it is based on the user’s feelings and experience while using both websites during the experiment.
The mean SUS score was 60.68% for Tiket.com and 74.75% for Traveloka, so Traveloka had the higher SUS score. To find out whether the difference between the two SUS scores was significant, we conducted hypothesis testing using a t-Test: Two-Sample Assuming Equal Variances, with these hypotheses:
- H0: There is no significant difference between the SUS scores of Traveloka and Tiket.com
- H1: There is a significant difference between the SUS scores of Traveloka and Tiket.com
After the t-Test calculation, we found that the t-Stat was greater than the t-Table value, which means the null hypothesis is rejected and the alternative hypothesis is accepted. In other words, the SUS scores of Traveloka and Tiket.com differ significantly.
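The equal-variances version of the test differs only in that it pools the two sample variances. Below is a minimal standard-library sketch; the SUS scores in it are made up for illustration and are not our participants' scores, and `pooled_t_test` is a hypothetical name.

```python
import math
import statistics


def pooled_t_test(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    t_stat = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t_stat, df


# Made-up SUS scores, NOT the study's data.
sus_site_a = [70.0, 75.0, 80.0, 72.5, 77.5]
sus_site_b = [55.0, 60.0, 65.0, 62.5, 57.5]

sus_t_stat, sus_df = pooled_t_test(sus_site_a, sus_site_b)

# Two-tailed critical value at alpha = 0.05 for df = 8, read from a t-table.
SUS_T_CRITICAL = 2.306
sus_reject_h0 = abs(sus_t_stat) > SUS_T_CRITICAL
print(f"t = {sus_t_stat:.3f}, df = {sus_df}, reject H0: {sus_reject_h0}")
```

Here the t statistic is well above the table value, so H0 would be rejected for these made-up scores, which mirrors the decision described above.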
The objective evaluation shows that Traveloka is superior to Tiket.com, and so does the subjective evaluation. In essence, objective evaluation is intended to accompany the subjective evaluation score. Objective evaluation can be seen as evidence, obtained through physiological measurements, that backs up what users feel about a system or service.
Overall, it can be said that Traveloka’s website had better usability and user experience than Tiket.com’s. So both of them deserve their places in the ranking of the top 10 most popular flight booking websites in Indonesia. On the other hand, Tiket.com had the best score for placing its promotion banner where users can easily find it and learn about the latest promo.
Usually, usability testing is based only on subjective evaluation. Running both objective and subjective evaluation is not always necessary; there are times when objective evaluation needs to be done, but not always.
Objective evaluation usually takes the form of eye tracking or mouse tracking, and is conducted on desktop digital products (web, desktop apps, etc.). Currently, there are only a few usability tests that use eye tracking for mobile apps on smartphones. An eye-tracking test on a desktop computer is not the same as one on a smartphone: the eye position when using a smartphone matters, which may make testing a little more difficult.
Despite all that, usability testing is an important activity in the product development process, whether you put full research into it or just do simple testing. Always think about time and cost when you conduct a test or evaluation. Complex and comprehensive testing does not necessarily suit your product development process.