Five accessibility bugs GitHub Copilot offers to create

GitHub describe their Copilot service as an "AI pair programmer". Examples on their website show it writing whole functions from a comment or a name.

Using it feels strange. It feels like typical IDE autocomplete, except it can suggest blocks of code many lines long. When I started playing with it, it would suggest what I wanted about 75% of the time. The other times were tricky. It would suggest something that looked about right, but with a tiny difference from what I wanted. I lost a bit of time debugging these, so within a couple of weeks I switched it off by default.

There's an exception, though: refactoring. When moving code around, similar actions need repeating several times. Here, Copilot detects the pattern and gets it right first time.

I don't write code full-time, though, so my experience is quite narrow.

Copilot runs on OpenAI Codex, a model descended from GPT-3, which is a general-purpose large language model. According to GitHub, it was trained on "source code from publicly available sources".

There are practical and ethical issues about using code this way, but I'm not going to try to explore these today.

Instead, I'm going to explore a risk that isn't discussed as much. Given that public code contains a lot of accessibility bugs, can Copilot suggest them to you?

What I did

To test this theory, I start with an empty folder. I'm working in Visual Studio Code with the GitHub Copilot extension. I'll create the files as I'm exploring.

Feel free to copy and paste the code to try it yourself, but note that you might not get identical results. The model takes in contextual information (it's not clear exactly what, or how much). Even if you managed to get the context the same, there's also "entropy", or random chance due to uncertainty.

The purpose of this is to document what can happen, not every possible result. I'm featuring examples where Copilot suggests code which may be a problem. This does not happen every time.

The Problems

Let's look at the issues I found. For each one, I'll list the code I typed in one block, then the full suggestion in another.

I'll link to the "Understanding" pages of the Web Content Accessibility Guidelines (WCAG).

Alt Attributes

Let's start with some fundamental accessibility features - image descriptions. They come under criterion 1.1.1 of WCAG: Non-text Content.

I start with a minimal HTML page. It includes a title and a level-one heading, and I start adding an image:

<!DOCTYPE html>
<html>
    <head>
        <title>Matthew's Blog</title>
    </head>
    <body>
        <h1>Welcome to my blog!</h1>
        <img src="header.jpeg"

The suggestion is to add an alt attribute. Sounds good, right?

<!DOCTYPE html>
<html>
    <head>
        <title>Matthew's Blog</title>
    </head>
    <body>
        <h1>Welcome to my blog!</h1>
        <img src="header.jpeg" alt="header image">

No. With only the filename to go on (the file was actually a picture of my dog), it suggests a description that:

contains part of the filename, even if that doesn't relate to the content of the image. "Header" is part of a page, not part of an image.
contains the word "image", which is against best practice for image descriptions. The screen reader would say "image, header image" here.

This will please tools like Lighthouse, which check for the presence of an alt attribute, but it's not good. With no description of the image contents, we've excluded anyone using a screen reader.
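As a rough sketch of what a more useful result might look like, here are two alternatives. The description text is invented for illustration, since only the person who can see the photo can describe it. The second form assumes the image is purely decorative, which may not be true of a header photo.

```html
<!-- Hypothetical description: write what the picture actually shows. -->
<img src="header.jpeg" alt="A brown dog asleep on a sofa">

<!-- If the image were purely decorative, an empty alt tells screen readers to skip it. -->
<img src="header.jpeg" alt="">
```

Neither version mentions the filename or the word "image", because the screen reader already announces the element as an image.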

Identifiable Links

Next, let's add some styles. I created a style.css file. The file first sets the font family, then I'm ready to do something to the links...

body{
    font-family: sans-serif;
}

a

It suggests changing the color and text-decoration.

body{
    font-family: sans-serif;
}

a {
    color: #000;
    text-decoration: none;
}

It suggests styling them to look exactly like the rest of the text. If we follow this, there's no way for anyone looking at the page to tell what's a link and what's not.

Put together, these two lines in the suggested block are a bad experience for everyone. We've camouflaged links in the page styles. Yet, this does not discriminate against disabled users in particular.

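Imagine that we decide that making links black is a bad idea, so we change the color. Imagine we choose a dark blue instead. If we accept the other suggestion, removing the underline, we may have a new accessibility bug. Once the underline is gone, color is the only thing that identifies a link. That can fail success criterion 1.4.1: Use of Color if the link color is too similar to the surrounding text color. The WCAG techniques ask for a contrast ratio of at least 3:1 between the two when color is the only cue.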
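One way to sidestep the problem is to keep the underline, so that links never rely on color alone. The following is a minimal sketch rather than a recommended palette: the dark blue value and the sample sentence are assumptions, and the rule is shown in a style element only to keep the example self-contained. The same rule would work in style.css.

```html
<style>
    body {
        font-family: sans-serif;
    }

    /* Assumed color for illustration. The underline, not the color,
       is what marks this text as a link. */
    a {
        color: #00008b;
        text-decoration: underline;
    }
</style>
<p>Read my <a href="posts.html">posts</a> for more.</p>
```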

Focus States

Back in our CSS, we've now removed the link formatting. Let's see what happens if we try to adjust the focus style:

body{
    font-family: sans-serif;
}

*:

I hadn't even typed focus yet, and it already had a suggestion:

body{
    font-family: sans-serif;
}

*:focus{
    outline: none;
}

It suggested removing the outline altogether.

These outlines act a little like a cursor for people who navigate with the keyboard. If we take them away, using a keyboard may be impossible, and we've created a new barrier.

This is pretty bad. There's a WCAG success criterion that covers this case: 2.4.7: Focus Visible. In this case, GitHub Copilot has introduced something that is a specific example of a failure.
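If the aim is to change how the focus indicator looks, a safer pattern is to replace the outline instead of removing it. This sketch assumes a browser that supports the :focus-visible pseudo-class (plain :focus is the older fallback). The color, width and offset are arbitrary values chosen for illustration.

```html
<style>
    /* Restyle the focus indicator rather than hiding it.
       Color, width and offset are assumed values. */
    :focus-visible {
        outline: 3px solid #00008b;
        outline-offset: 2px;
    }
</style>
<a href="posts.html">Posts</a>
```

Tabbing to the link should draw a thick blue ring around it, so keyboard users can still see where they are.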

Spans that should be buttons

HTML has buttons which work exactly how you might expect buttons to. They work whether you use a mouse, keyboard or assistive technology. It's possible to make things like spans or divs behave like buttons, but it's not recommended.

To do this, they need:

a role that tells assistive technology "I'm actually a button, describe me as such"
an accessible name, especially if they don't contain text
to be focusable so a keyboard user can select them
to listen both for clicks and keyboard activation

These requirements come from several WCAG success criteria. Notable are 4.1.2: Name, Role, Value and 2.1.1: Keyboard.
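To show how much work the list above implies, here is a minimal sketch of a span retrofitted to behave like a button. The ids, the "Close" label and the activate function are all hypothetical names made up for this example. A real page would do something more useful than updating a status message.

```html
<span id="fake-button" role="button" tabindex="0" aria-label="Close">&times;</span>
<p id="status" role="status"></p>
<script>
    // Hypothetical handler: a real page would close a dialog or similar here.
    function activate() {
        document.getElementById("status").textContent = "Activated";
    }

    const fakeButton = document.getElementById("fake-button");
    fakeButton.addEventListener("click", activate);
    fakeButton.addEventListener("keydown", (event) => {
        // Native buttons respond to Enter and Space, so handle both keys.
        if (event.key === "Enter" || event.key === " ") {
            event.preventDefault(); // stop Space from scrolling the page
            activate();
        }
    });
</script>
```

The role, tabindex and aria-label attributes cover the first three requirements, and the two listeners cover the fourth. A native button element provides all of this without any of the extra code.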

So let's see what happened when I tried to add a div with the class modal to the HTML:

<html>
    <head>
        <title>Matthew's Blog</title>
        <link rel="stylesheet" href="style.css">
    </head>
    <body>
        <h1>Welcome to my blog!</h1>
        <a href="posts.html">Posts</a>
        <div class="modal">

Remember, there's no CSS or JavaScript targeting this class.

It suggests adding a span with the class close, which contains a times symbol.

<html>
    <head>
        <title>Matthew's Blog</title>
        <link rel="stylesheet" href="style.css">
    </head>
    <body>
        <h1>Welcome to my blog!</h1>
        <a href="posts.html">Posts</a>
        <div class="modal">
            <div class="modal-content">
                <span class="close">&times;</span>
                <p>Some text in the Modal..</p>
            </div>
        </div>
    </body>
</html>

A screen reader says "times". No mention of it being a button, or closing anything.

It's not identified as a button, it has no accessible name and the keyboard focus can't go to it. All these should happen in the HTML. Even better would be to use an actual <button> and get all these benefits.

I'm not criticizing the lack of events when it's clicked or pressed, though. These would be better not included in the HTML anyway.
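For comparison, here is a sketch of the same modal markup with a real button in place of the span. The "Close" label is an assumed accessible name, and the click behavior is assumed to be attached separately in JavaScript.

```html
<div class="modal">
    <div class="modal-content">
        <!-- A real button is focusable, announces its role and responds to the keyboard. -->
        <button type="button" class="close" aria-label="Close">&times;</button>
        <p>Some text in the Modal..</p>
    </div>
</div>
```

With this markup a screen reader should announce something like "Close, button" rather than "times".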

Contrast

For the final example, let's head back to the CSS. I've written about the contrast formula before, which several success criteria refer to. Depending on which level you're aiming for, and how big the item is, it may need to be 3:1, 4.5:1 or 7:1.

Let's start writing some CSS. We tell it that our button will always be light green.

body{
    font-family: sans-serif;
}

button{
    background-color: #99CC00;
    color

As soon as I start typing color, it suggests it should be white.

body{
    font-family: sans-serif;
}

button{
    background-color: #99CC00;
    color: white;
}

White has a luminance contrast ratio of 1.91:1 with this green. This is well below any of the guidelines.

This text is white on #99CC00, contrast ratio: 1.91:1

Black would have had 10.99:1, but it only suggested white.

This text is black on #99CC00, contrast ratio: 10.99:1
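For anyone who wants to check those two figures, the arithmetic can be sketched as follows. This uses the WCAG relative luminance and contrast ratio definitions, with intermediate values rounded to four decimal places. Very dark channel values use a separate linear formula, which doesn't come into play here because the only dark channel is exactly 0.

```latex
\begin{align*}
c_{\text{lin}} &= \left(\frac{c + 0.055}{1.055}\right)^{2.4}
  \quad \text{(channel } c \text{ scaled to 0--1; a channel of 0 stays 0)} \\
L &= 0.2126\,R_{\text{lin}} + 0.7152\,G_{\text{lin}} + 0.0722\,B_{\text{lin}} \\
\text{contrast} &= \frac{L_{\text{lighter}} + 0.05}{L_{\text{darker}} + 0.05} \\[1ex]
\texttt{\#99CC00}: \quad R &= 153/255 = 0.6, \quad G = 204/255 = 0.8, \quad B = 0 \\
R_{\text{lin}} &\approx 0.3185, \quad G_{\text{lin}} \approx 0.6038, \quad B_{\text{lin}} = 0 \\
L_{\text{green}} &\approx 0.2126(0.3185) + 0.7152(0.6038) \approx 0.4996 \\
\text{white } (L = 1): \quad & \frac{1 + 0.05}{0.4996 + 0.05} \approx 1.91 \\
\text{black } (L = 0): \quad & \frac{0.4996 + 0.05}{0 + 0.05} \approx 10.99
\end{align*}
```

The green sits almost exactly halfway up the luminance scale, yet white text fails against it while black text passes comfortably, because the 0.05 offset in the ratio matters far more at the dark end of the scale.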

What should you do?

I like GitHub Copilot, at least for the narrow range of purposes I mentioned earlier. At the time I'm writing this up, there have been announcements about the next version of Copilot. Perhaps the next version will fix some of these issues.

The danger here is that developers accept code suggestions, assuming that they're good. The 'wisdom of the crowd' could suggest that code based on millions of lines of code won't contain bugs. As demonstrated, this is not true.

Filtering the output, as it currently does to remove "offensive output", is possible. That probably wouldn't work better than current automatic accessibility testing and linting. So it would be an improvement, but not an absolute fix.

In my opinion, the responsibility always rests with the developer using the tool.

You shouldn't accept code suggestions from GitHub Copilot if you don't understand them. If you're expecting a certain type of suggestion, and you get one with extra attributes, you need to look them up. Don't use the code until you understand what every part of it does.

This could have a positive side. Maybe. It's possible that Copilot suggests accessibility considerations that people would otherwise have missed. Making people consider how to incorporate accessibility into their work normalizes it.

Wrapping up

In summary:

GitHub Copilot has some impressive technology, and works very well at some tasks
The ethics of whether you should use Copilot at all is a different issue
Copilot suggestions can include serious accessibility bugs
Don't accept AI code suggestions until you understand them