Facebook updates its automatic alternative text system with expanded object recognition

In an update from its artificial intelligence team today, Facebook revealed that it has improved the automatic alternative text (AAT) technology it first introduced in 2016. The system automatically adds alternative text to images shared on the platform for users who are blind or visually impaired (BVI).

Alternative text attached to uploaded images lets BVI users learn about the content through a screen reader, which reads out the text describing what the image shows. Many people do not provide alternative text when uploading images, which is where AAT and its object recognition capabilities come in.

In today's update, Facebook introduced the next generation of its automatic alternative text, which it says uses ‘multiple technological advances’ to increase the number of concepts AAT can detect in photos tenfold and to provide more detailed image descriptions.

With this update, Facebook says that more images will benefit from AAT going forward and that the text descriptions will provide users with more insight into the content, noting the presence of landmarks, animals and even the activities that may be taking place.

According to Facebook, the next generation of AAT is also the first of its kind to include the approximate size and location of subjects in a photo, such as noting that a person is standing off to the side next to a tree that towers over them.
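Facebook has not published how AAT turns detections into phrases like "off to the side" or "towers over." As a rough illustration only, the Python sketch below assumes a detector that returns a label and a pixel bounding box for each subject (the `Box` type, the `describe` and `compare_height` helpers, and every threshold are hypothetical). It maps the box centre to a left/center/right position, the share of the frame it covers to a size word, and compares box heights to decide whether one subject towers over another.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical detector output: a label plus a pixel bounding box."""
    label: str
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    w: float  # width in pixels
    h: float  # height in pixels


def describe(box: Box, img_w: float, img_h: float) -> str:
    """Return a coarse 'size label position' phrase (thresholds are arbitrary)."""
    # Horizontal position: where the box centre falls when the frame is split into thirds.
    cx = (box.x + box.w / 2) / img_w
    if cx < 1 / 3:
        where = "on the left"
    elif cx > 2 / 3:
        where = "on the right"
    else:
        where = "in the center"
    # Relative size: fraction of the image area the box covers.
    area = (box.w * box.h) / (img_w * img_h)
    if area > 0.4:
        size = "large"
    elif area > 0.1:
        size = "medium"
    else:
        size = "small"
    return f"{size} {box.label} {where}"


def compare_height(a: Box, b: Box) -> str:
    """Say 'b towers over a' when b's box is much taller (1.5x is an assumed cutoff)."""
    if b.h > 1.5 * a.h:
        return f"{b.label} towers over {a.label}"
    return f"{a.label} next to {b.label}"


# Example: a person near the left edge of a 1000x800 photo, beside a tall tree.
person = Box("person", x=20, y=300, w=100, h=250)
tree = Box("tree", x=150, y=50, w=200, h=700)
print(describe(person, 1000, 800))   # small person on the left
print(compare_height(person, tree))  # tree towers over person
```

Fixed cutoffs like these are the simplest possible choice; a production system would more likely learn such relations or tune them against human-written descriptions.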

The first version of AAT was trained on images whose objects had been labeled by humans. Facebook notes that because this kind of labeling is so time-intensive, its original AAT model could only ‘reliably’ identify 100 objects.

The next-generation model moves away from that learning method and instead utilizes ‘weakly supervised data,’ namely billions of photos shared publicly on Instagram with hashtags. Using language translation for hashtags and a bit of fine-tuning, Facebook explains that its new models:

…are both more accurate and culturally and demographically inclusive — for instance, they can identify weddings around the world based (in part) on traditional apparel instead of labeling only photos featuring white wedding dresses.

The AAT system can now reliably recognize more than 1,200 concepts in images. Facebook notes that it included only concepts the model identifies with a high level of precision; concepts identified less precisely were left out of this version of AAT.
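Facebook did not disclose its precision cutoff or how it measured precision. The Python sketch below is a minimal, hypothetical illustration of that kind of filtering. It assumes per-concept counts of true and false positives from a labeled validation set, computes precision as TP / (TP + FP), and keeps only the concepts that meet a chosen threshold. The `select_concepts` helper, the counts and the thresholds are all invented for illustration.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = true positives / all positive predictions (0.0 if none)."""
    return tp / (tp + fp) if tp + fp else 0.0


def select_concepts(counts: dict[str, tuple[int, int]], threshold: float) -> list[str]:
    """Keep concepts whose validation precision is at least `threshold`.

    `counts` maps a concept name to (true_positives, false_positives),
    a hypothetical format for per-concept validation results.
    """
    return sorted(
        concept
        for concept, (tp, fp) in counts.items()
        if precision(tp, fp) >= threshold
    )


# Invented validation counts for three concepts.
counts = {"dog": (95, 5), "wedding": (88, 12), "kayak": (40, 20)}
print(select_concepts(counts, 0.9))   # ['dog']
print(select_concepts(counts, 0.85))  # ['dog', 'wedding']
```

Raising the threshold trades coverage for reliability: fewer concepts survive, but each one that does is less likely to produce a wrong description.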

The next-generation AAT system is available in 45 languages on Facebook and Instagram. Users can choose to get more detailed descriptions for certain images that interest them, such as ones shared by family and friends.