WCAG 2.2 Accessibility Testing With axe-core and Playwright

WCAG-style accessibility testing is two jobs glued together. There is the automated half, where a tool scans the DOM against rule sets. There is the manual half, where a human checks whether the experience actually works for someone using a screen reader, keyboard, or low-vision tooling.

Most teams confuse the two. They run an axe-core scan, see zero violations, and ship. That misses real failures. This post covers what WCAG 2.2 actually changed, what axe-core checks, what it cannot check, and how to wire it into Playwright. Then how QA Lab AI's /free-audit packages all of it for teams that do not want to assemble the toolchain themselves.

What WCAG 2.2 changed from 2.1

WCAG 2.2 became a W3C Recommendation in October 2023. It is backwards-compatible: passing 2.2 means you also pass 2.1, which means you also pass 2.0. It adds nine new success criteria (and removes 4.1.1 Parsing).

The new criteria you will trip on most often:

  • 2.4.11 Focus Not Obscured (Minimum), AA. When an element receives keyboard focus, it must not be entirely hidden by sticky headers, cookie banners, or other overlays. This breaks more sites than any other 2.2 criterion.
  • 2.4.12 Focus Not Obscured (Enhanced), AAA. Stronger version: not even partially obscured.
  • 2.5.7 Dragging Movements, AA. Anything you can do with dragging must also be doable with a single pointer click or tap.
  • 2.5.8 Target Size (Minimum), AA. Interactive targets are at least 24 by 24 CSS pixels, with documented exceptions.
  • 3.2.6 Consistent Help, A. Help mechanisms (chat, contact link, FAQ) appear in the same relative order across pages.
  • 3.3.7 Redundant Entry, A. Do not ask users to re-enter information they have already provided in the same session, unless re-entry is essential.
  • 3.3.8 Accessible Authentication (Minimum), AA, and 3.3.9 Accessible Authentication (Enhanced), AAA. No cognitive function tests (puzzles, transcribing characters) unless an alternative is offered.

Of those, axe-core can automate 2.5.8 (Target Size) almost entirely and 2.4.11 (Focus Not Obscured) only partially. The rest are mostly manual or require flow-aware testing.
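
Both of those criteria reduce to bounding-box geometry, which is why they are automatable at all. Here is a minimal sketch, assuming rectangles in the shape Playwright's locator.boundingBox() returns; the helper names are our own illustration, not an axe-core API:

```typescript
// Illustrative geometry behind the two automatable WCAG 2.2 criteria.
// Rect values could come from Playwright's locator.boundingBox();
// neither helper is part of axe-core.
interface Rect { x: number; y: number; width: number; height: number; }

// 2.5.8 Target Size (Minimum): at least 24 by 24 CSS pixels, barring
// the documented exceptions (inline links, spacing, equivalent controls).
function meetsMinimumTargetSize(r: Rect, minimum = 24): boolean {
  return r.width >= minimum && r.height >= minimum;
}

// 2.4.11 Focus Not Obscured (Minimum) fails only when the focused
// element is *entirely* covered, i.e. the overlay contains its box.
function isEntirelyObscured(focus: Rect, overlay: Rect): boolean {
  return (
    overlay.x <= focus.x &&
    overlay.y <= focus.y &&
    overlay.x + overlay.width >= focus.x + focus.width &&
    overlay.y + overlay.height >= focus.y + focus.height
  );
}
```

The gap in 2.4.11 is visible here: the comparison is trivial once you have the rectangles, but knowing which overlay can cover which focus state requires driving the page through its real flows.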

What axe-core actually checks

axe-core is a JavaScript engine that runs against a rendered DOM. It evaluates more than 90 rules mapped to WCAG levels A, AA, and AAA, plus best-practice rules.

Things it catches reliably:

  • Missing or empty alt attributes on images
  • Form inputs without associated labels
  • Insufficient color contrast (text and non-text)
  • Missing document language (<html lang>)
  • Improper ARIA usage (invalid roles, required attributes missing, conflicting states)
  • Heading order skips
  • Landmarks missing or duplicated
  • Links and buttons with no accessible name
  • Touch target size (WCAG 2.2)
  • Some focus-visible and focus-trap problems

Each finding is tagged minor, moderate, serious, or critical. In CI, most teams fail the build on serious and critical and triage the rest.

What axe-core cannot check

This is the part that gets skipped. Axe-core sees the DOM, not the experience. It cannot tell you:

  • Whether alt text is accurate ("photo" passes; the image is a chart)
  • Whether the keyboard tab order is logical, only that elements are focusable
  • Whether a screen reader announcement makes sense
  • Whether color alone conveys meaning (it can flag low contrast, not the meaning of red vs green)
  • Whether motion can be paused or disabled per user preference, beyond prefers-reduced-motion hooks
  • Whether form errors are helpful, only that they are programmatically associated
  • Whether your help mechanism is consistent across pages (WCAG 2.2, 3.2.6)
  • Whether dragging has a single-pointer alternative (WCAG 2.2, 2.5.7)

Deque, which builds axe-core, has been clear about this for years: automation reliably catches around 30 to 50 percent of WCAG issues. The rest needs assistive-tech testing and code review.

Treat axe-core as the floor, not the ceiling.
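
Some of those gaps can still be scripted, just not by axe-core. Tab-order logic, for instance, can be asserted against an expected reading order you maintain by hand. A sketch of the comparison, assuming the Playwright side presses Tab repeatedly and records the focused element ids; the helper and its subsequence rule are our own convention:

```typescript
// Sketch: check that an observed keyboard focus order visits a set of
// expected stops in reading order. Extra focusable elements between the
// expected stops are tolerated (a subsequence match, not strict equality).
function tabOrderFollows(expected: string[], observed: string[]): boolean {
  let next = 0;
  for (const id of observed) {
    if (next < expected.length && id === expected[next]) next++;
  }
  return next === expected.length;
}
```

This does not judge whether the order makes sense to a human, but it does catch regressions once a human has decided what the order should be.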

Integrating axe-core with Playwright

QA Lab AI uses @axe-core/playwright under the hood. You can use the same package directly in your test suite. Install it alongside Playwright.

npm install --save-dev @playwright/test @axe-core/playwright

A minimal accessibility spec:

import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test.describe('Accessibility — marketing home', () => {
  test('has no serious or critical WCAG 2.2 AA violations', async ({ page }) => {
    await page.goto('/');

    const results = await new AxeBuilder({ page })
      .withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa', 'wcag22aa'])
      .disableRules(['region']) // example: scoped exclusion with a tracked ticket
      .analyze();

    const blocking = results.violations.filter(v =>
      ['serious', 'critical'].includes(v.impact ?? '')
    );

    expect(
      blocking,
      JSON.stringify(blocking, null, 2),
    ).toEqual([]);
  });

  test('contact form is accessible after interaction', async ({ page }) => {
    await page.goto('/contact');
    await page.getByLabel('Email').fill('not-an-email');
    await page.getByRole('button', { name: 'Send' }).click();

    // Re-scan after the error state renders
    const results = await new AxeBuilder({ page }).analyze();
    expect(results.violations).toEqual([]);
  });
});

A few things worth highlighting.

Scan after interaction, not just on load

The first scan tests the initial DOM. Real bugs hide in error states, modals, autocomplete dropdowns, and post-submission flows. Drive the page with Playwright, then run analyze().
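
One way to make post-interaction scans easier to triage is to diff the two result sets, so each failure is attributed to the interaction that introduced it. A sketch keyed on the violation id field axe-core returns; the helper is our own, not part of @axe-core/playwright:

```typescript
// Sketch: report only the violations that appeared after an interaction,
// by diffing a baseline scan against a post-interaction scan on rule id.
interface ScanViolation { id: string; }

function newViolations<T extends ScanViolation>(before: T[], after: T[]): T[] {
  const baseline = new Set(before.map(v => v.id));
  return after.filter(v => !baseline.has(v.id));
}
```

Run analyze() once on load, drive the flow, run it again, and assert that newViolations(first.violations, second.violations) is empty.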

Use `withTags` deliberately

wcag22aa includes the new 2.2 success criteria. Without it, you are testing 2.1 and missing target-size and focus-obscured rules. Pin to the level you commit to (typically AA), not AAA.

Scope rule disables

disableRules is sometimes necessary, especially during a remediation rollout. Comment every disable with a ticket ID and an expiration date. Otherwise the suppress list becomes the new normal.
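
One way to keep that honest is to make expiry machine-enforced rather than a comment convention. A sketch, assuming an ISO date string per entry; the shape and function are our own, and the output would feed disableRules:

```typescript
// Sketch: a suppress list where every disabled axe rule carries a ticket
// and an expiry date, so temporary exclusions cannot quietly become
// permanent. Throwing on expiry fails the build until someone re-triages.
interface Suppression {
  rule: string;    // axe rule id, e.g. 'region'
  ticket: string;  // tracking ticket id
  expires: string; // ISO date, e.g. '2025-06-30'
}

function activeSuppressions(list: Suppression[], today = new Date()): string[] {
  const expired = list.filter(s => new Date(s.expires) <= today);
  if (expired.length > 0) {
    const detail = expired.map(s => `${s.rule} (${s.ticket})`).join(', ');
    throw new Error(`Expired accessibility suppressions: ${detail}`);
  }
  return list.map(s => s.rule);
}
```

In the builder chain you would then call .disableRules(activeSuppressions(SUPPRESSIONS)) instead of hard-coding rule ids.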

Fail on impact, not count

Total violation count is noisy. Failing on serious and critical keeps the signal high. Track moderate and minor as a debt metric.
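
The blocking-versus-debt split can live in one small helper shared across specs. A sketch, mirroring the impact field axe-core puts on each violation; the helper name and shape are our own:

```typescript
// Sketch: partition axe violations into build-blocking findings and
// tracked debt, keyed on the impact levels axe-core assigns.
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';
interface Violation { id: string; impact?: Impact; }

const BLOCKING: ReadonlyArray<Impact> = ['serious', 'critical'];

function splitByImpact(violations: Violation[]) {
  const blocking: Violation[] = [];
  const debt: Violation[] = [];
  for (const v of violations) {
    (v.impact && BLOCKING.includes(v.impact) ? blocking : debt).push(v);
  }
  return { blocking, debt };
}
```

Fail the test on blocking, and report debt.length as a metric over time; violations with no impact set land in debt so nothing is silently dropped.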

How QA Lab AI packages this for non-experts

The Playwright snippet above assumes you have a Playwright project, a CI runner, and the muscle memory to interpret axe output. Many teams — especially smaller product teams without a dedicated accessibility engineer — do not.

QA Lab AI's /free-audit runs axe-core, Lighthouse, OWASP checks, SEO audits, broken-link scans, and cross-browser and mobile checks against any URL. No signup, no card. The output groups violations by impact, links each one back to the WCAG criterion, and explains which findings still need manual review (because the tool will not pretend automation caught everything).

When you are ready to turn findings into runnable tests, the same engine generates BDD scenarios from the audit. Those .feature files cover accessibility alongside functional and security cases — see /test-cases for the full list of test types. Pricing for unlimited generation and team accounts is on /pricing.

Try it

Run an audit on a real URL. /free-audit accepts any public page and returns a WCAG 2.2 report with axe-core findings, Lighthouse scores, and SEO and security checks in one pass.

If you want to wire axe-core into your own Playwright suite first, the snippet above is a working starting point. When you are ready to scale generation across functional, accessibility, and security cases, the Test Repository keeps everything synced with Jira, Zephyr, or Azure DevOps on Enterprise.