How to audit digital UX with evidence, not opinion
Turning "this feels off" into cited, prioritised findings, using Nielsen's heuristics, WCAG 2.2, and the Laws of UX, with a scoring rubric and the numbers that prove why it matters.
Most feedback on a screen sounds the same: "it feels cluttered," "the buttons look off," "the flow is confusing." All possibly true, none of it useful, because it is taste, and taste is arguable. The way out is to stop reviewing interfaces by opinion and start auditing them. An audit names what is wrong, points to an established authority for why, rates how badly it hurts, and turns the result into a number you can track. That is how we review every interface we build at Astralab.
This is not pedantry. Bad UX is measurable money. Across roughly 50 studies, the average online cart is abandoned about 70% of the time (Baymard Institute), and Baymard's checkout research estimates that better checkout design alone can lift conversion by around 35% on an average large e-commerce site. "This feels off" stops being a matter of preference when fixing it could lift conversions by about a third.
Why "this feels off" is not a finding
One rule separates an expert review from generic critique: every finding cites a specific authority, or it gets dropped. "This feels cluttered" becomes "the primary navigation holds 14 items, past the 7±2 working-memory limit (Miller's Law)." "The button is hard to tap" becomes "the target is 20px, under the 24px floor (WCAG 2.2, SC 2.5.8), and sits far from the thumb (Fitts's Law)." Same observation, now actionable, prioritisable, and no longer arguable. If you cannot name the authority, it is not a finding. It is an opinion, and it belongs in a different list.
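The rule is simple enough to enforce mechanically. Here is a minimal sketch in Python, assuming a hypothetical `Observation` record with an optional `authority` field. The names are illustrative, not part of any tool described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Observation:
    """One thing a reviewer noticed (hypothetical record for illustration)."""
    note: str
    authority: Optional[str] = None  # e.g. "Miller's Law" or "WCAG 2.2 SC 2.5.8"


def triage(observations: List[Observation]) -> Tuple[List[Observation], List[Observation]]:
    """Split observations into cited findings and uncited opinions."""
    findings, opinions = [], []
    for obs in observations:
        # A blank or missing authority means the note stays an opinion.
        if obs.authority and obs.authority.strip():
            findings.append(obs)
        else:
            opinions.append(obs)
    return findings, opinions


notes = [
    Observation("Primary navigation holds 14 items", "Miller's Law"),
    Observation("This feels cluttered"),
]
findings, opinions = triage(notes)
print(len(findings), "finding(s),", len(opinions), "opinion(s)")
```

Nothing is deleted in this sketch. Uncited notes go to the second list, where they can wait until someone finds the authority or lets them go.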
Four sources every UX finding maps to
Almost every real usability problem traces back to one of four bodies of work. Naming which one keeps a review honest.
| Authority | What it governs | Cite as |
|---|---|---|
| Nielsen's 10 heuristics | Interaction usability: status, error prevention, consistency, recovery | Nielsen H1 to H10 |
| WCAG 2.2 AA | Accessibility: contrast, keyboard, focus, labels, targets | WCAG SC |
| Laws of UX | Cognition and behaviour: memory, decision time, target size | by name |
| Deceptive patterns | Ethics: manipulative or coercive design | by named pattern |
There is a fifth, quieter source worth knowing: Don Norman's vocabulary of affordances, signifiers, mapping, and feedback, which names why a control does or does not invite the right action in the first place.
The numbers worth knowing
A handful of thresholds settle most arguments on the spot. They are not preferences. They are published standards, and they are the fastest way to turn a hunch into a verdict.
| What | Threshold | Cite |
|---|---|---|
| Text contrast | ≥ 4.5:1 (≥ 3:1 for large text) | WCAG SC 1.4.3 |
| UI and graphic contrast | ≥ 3:1 | WCAG SC 1.4.11 |
| Touch target size | ≥ 24×24 CSS px (44×44 pt for comfort) | WCAG SC 2.5.8; Apple HIG |
| UI response before flow breaks | ≤ 400ms | Doherty Threshold |
| Items in working memory | 7±2 | Miller's Law |
| Headings | one h1, no skipped levels | WCAG SC 1.3.1, 2.4.6 |
| Keyboard focus | visible indicator on every focusable element | WCAG SC 2.4.7 |
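The contrast thresholds are the easiest to check in code. The sketch below follows the relative-luminance and contrast-ratio formulas defined by WCAG. The function names are our own, and it assumes opaque six-digit hex colours with no alpha channel.

```python
def _linear(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel_8bit / 255
    # Piecewise sRGB transfer function used in the WCAG luminance definition.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour given as '#rrggbb' (assumed opaque)."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colours, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_text_contrast(foreground: str, background: str, large_text: bool = False) -> bool:
    """Check SC 1.4.3: at least 4.5:1, or 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


print(round(contrast_ratio("#767676", "#ffffff"), 2))
print(passes_text_contrast("#777777", "#ffffff"))
```

The two greys in the usage lines differ by a single step per channel and land on opposite sides of the 4.5:1 line, which is why this check is done with a formula rather than by eye.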
Accessibility is the floor, not a feature
Accessibility findings outrank everything else in our reviews. A contrast, keyboard, or labelling failure is an automatic blocker, because it locks real people out, and in many markets it is also a legal obligation. You do not need the full standard to catch most of it. Ten success criteria, grouped into the eight checks below, cover roughly 80% of real-world accessibility failures:
- Contrast: text (SC 1.4.3) and non-text (SC 1.4.11)
- Info and relationships: semantic structure and associated labels (SC 1.3.1)
- Keyboard operability (SC 2.1.1) and visible focus (SC 2.4.7)
- Target size (SC 2.5.8)
- Error identification in text, not colour alone (SC 3.3.1)
- Labels or instructions on every input (SC 3.3.2)
- Name, role, value for custom controls (SC 4.1.2)
- Status messages exposed to assistive technology (SC 4.1.3)
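Several of these checks can be partly automated. As one hedged example, the heading rule from the thresholds table (one h1, no skipped levels) reduces to a few lines once the heading levels are pulled from the page in document order. The function name and message wording below are assumptions for illustration.

```python
from typing import List


def heading_problems(levels: List[int]) -> List[str]:
    """Report heading-structure problems for levels listed in document order.

    `levels` holds the numeric level of each heading, e.g. [1, 2, 3, 2].
    """
    problems = []

    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")

    previous = None
    for position, level in enumerate(levels, start=1):
        # Going deeper by more than one level at a time skips a level.
        if previous is not None and level > previous + 1:
            problems.append(
                f"heading {position} jumps from h{previous} to h{level}"
            )
        previous = level

    return problems


print(heading_problems([1, 2, 3, 2, 3]))
print(heading_problems([1, 3]))
```

Moving back up the outline, for instance from h3 to h2, is fine. Only jumps downward that skip a level are flagged.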
Turn the review into a score
A finding list is useful. A number is trackable. We weight each finding by severity and subtract from 100.
| Severity | Meaning | Score impact |
|---|---|---|
| Blocker | Fails WCAG A/AA, locks users out, or loses data | −12 |
| Critical | Will cause measurable task failure for many users | −8 |
| Warning | Visible friction, not failure | −4 |
| Tip | Polish, a small improvement | −1 |
Score = 100 − (blockers × 12) − (criticals × 8) − (warnings × 4) − (tips × 1). It is deliberately blunt. The value is not precision. It is that the same screen, reviewed twice, moves in a direction everyone can see, and that a stakeholder can compare two designs without relitigating taste.
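As a sketch, the formula fits in a few lines of Python. The weights come straight from the table above. The function name and the list-of-severities input shape are assumptions made for the example.

```python
from typing import Iterable

# Points subtracted per finding, taken from the severity table.
WEIGHTS = {"blocker": 12, "critical": 8, "warning": 4, "tip": 1}


def audit_score(severities: Iterable[str]) -> int:
    """Return 100 minus the weighted penalty for each finding's severity."""
    penalty = sum(WEIGHTS[severity] for severity in severities)
    # The formula as stated has no floor, so a very bad screen can go negative.
    return 100 - penalty


review = ["blocker", "critical", "critical", "warning", "warning", "warning",
          "tip", "tip", "tip", "tip"]
print(audit_score(review))  # 100 - 12 - 16 - 12 - 4 = 56
```

Whether to clamp the result at zero is a reporting choice the formula leaves open, so this sketch keeps the raw number.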
Where usability becomes ethics
The last pass is for deceptive patterns: confirmshaming ("No thanks, I hate saving money"), the roach motel that is easy to enter and hard to leave, forced continuity from a free trial into a silent charge, and costs that surface only at the final step. These are not edge cases. Since 2024 the EU's Digital Services Act (Article 25) prohibits them on regulated platforms, and they are catalogued in Harry Brignull's deceptive-patterns taxonomy. A pattern that lifts a metric this quarter and erodes trust permanently is a bug, and we treat it as one.
None of this is about reciting frameworks. It is about producing a review an engineer can act on without a meeting and a stakeholder cannot argue with: each finding named, sourced, rated, and scored. We turned it into a repeatable system because we run it on everything we ship, and because "make it better" was never a brief anyone could build from.
- Nielsen Norman Group, 10 Usability Heuristics for User Interface Design.
- W3C, Web Content Accessibility Guidelines (WCAG) 2.2.
- Laws of UX, Jon Yablonski (Fitts's Law, Miller's Law, Doherty Threshold).
- Baymard Institute, cart-abandonment and checkout-usability research (the ~70% and ~35% figures).
- deceptive.design (Harry Brignull), and the EU Digital Services Act, Article 25.