a11yGPT: Experimental AI-based Accessibility Testing

Introducing a11yGPT

This is the first in a series of posts on a research project we’re calling “a11yGPT”. I want to see if a Large Language Model (LLM), specifically one built on Generative Pre-trained Transformers (GPTs), can be used to check whether websites meet the Web Content Accessibility Guidelines (WCAG). If the tool proves successful, I will incorporate it into Equalify, our open-source accessibility platform.

This post outlines key information about the a11yGPT project and invites expert feedback as we carry out the work.

Process

I am testing how well a GPT-powered LLM identifies items addressed in Version 2.1 of the WCAG. Every guideline and success criterion in WCAG 2.1 will be tested against two pages: one page will include code that meets the guideline or success criterion, and the other page will include code that does not. All documents we test are available in the equalify-a11ygpt-tests repo. Results will be published on Issue #220 of the Equalify GitHub repo and in the #a11y-ai Slack group.

I’ll run a11yGPT against each page, iterating on prompts as I go. I hope to run a11yGPT about 1,000 times per sample page per prompt. That will give us 22,000 data points to assess the efficacy of automated GPT-based WCAG scanning.
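To make the repeated-run idea concrete, here is a minimal sketch of how verdicts could be tallied across a passing and a failing sample page. It is not the a11yGPT implementation: the `SAMPLES` snippets, `tally_runs`, `accuracy`, and the `check_page` callable are all hypothetical names invented for illustration, and `check_page` stands in for whatever function sends a page and a prompt to the LLM and returns its verdict.

```python
from collections import Counter
from typing import Callable

# Hypothetical sample pair for one success criterion (1.1.1 Non-text Content).
# These snippets are illustrative and are not taken from the tests repo.
SAMPLES = {
    "pass": '<img src="logo.png" alt="Equalify logo">',
    "fail": '<img src="logo.png">',
}


def tally_runs(check_page: Callable[[str], bool], runs: int) -> Counter:
    """Run check_page on each sample `runs` times and count verdicts.

    check_page is assumed to return True when it judges that the page
    meets the criterion. Keys of the result are (sample label, outcome).
    """
    results: Counter = Counter()
    for label, html in SAMPLES.items():
        expected = label == "pass"
        for _ in range(runs):
            verdict = check_page(html)
            outcome = "correct" if verdict == expected else "wrong"
            results[(label, outcome)] += 1
    return results


def accuracy(results: Counter) -> float:
    """Share of all runs whose verdict matched the expected one."""
    total = sum(results.values())
    correct = sum(n for (_, outcome), n in results.items() if outcome == "correct")
    return correct / total if total else 0.0


if __name__ == "__main__":
    # A trivial stand-in checker; a real run would call the LLM here.
    tallies = tally_runs(lambda html: "alt=" in html, runs=1000)
    print(tallies, accuracy(tallies))
```

Because an LLM can return a different answer to the same prompt on different runs, counting outcomes per sample page separates false alarms on the passing page from misses on the failing page.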

Expectations

WCAG 2.1 guidelines and success criteria that require specific items in website source code, like alt text, should be easy for ChatGPT to identify. Other items that are more qualitative, like guidelines around text descriptions, may be more difficult for ChatGPT to assess accurately. That said, LLMs can interpret context-based information, so we expect the LLM to perform qualitative assessments with greater accuracy than existing automated scanning solutions.
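As a point of comparison, the source-level kind of check is simple enough to write without an LLM at all. The sketch below uses Python's standard `html.parser` to list images that have no `alt` attribute; `MissingAltFinder` and `images_missing_alt` are hypothetical names for illustration and are not part of Equalify or a11yGPT.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> that has no alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An empty alt="" is allowed (decorative images), so only a
        # completely absent attribute is reported.
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src"))


def images_missing_alt(html: str) -> list:
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```

A check like this can tell whether alt text exists, but not whether it describes the image well. That second question is the qualitative judgment where we hope an LLM can add something.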

Value Proposition

With 96.8% of the million most popular websites failing accessibility compliance testing, accessibility professionals have a lot of work to do. Reliable LLM-based tools can help accessibility experts test and remediate web pages at scale.

Tools like a11yGPT will never replace accessibility experts. Instead, Decubing is committed to enhancing the work of the talented individuals who make web content accessible.

Release Schedule

Good accessibility tools require lots of input before they can be publicly released.

We plan to share our findings from this project in a series of posts. We’re spreading these posts out to give accessibility experts a chance to share their thoughts on our results.

I aim to bring Equalify a GPT-powered tool that helps experts responsibly check their websites against WCAG. This tool will be one of many accessibility tools that Equalify is releasing this year.

Join our Journey

Accessibility experts are invited to join us on the #a11y-ai Slack group or join Decubing’s newsletter for key updates.

I am also looking for sponsors and partners to help with this project.

If you are interested in helping or have any questions at all, please contact Decubing.

Together, we can equalify the internet.

