Making Augmented Reality Accessible: A Case Study of Lens in Maps

Transcript

Oda: I wonder if any of you could guess what this number, 1 out of 4, may represent? You might be surprised to hear that this number represents the chance that today's 20-year-old will become disabled before they retire. This could be caused by an accident, a disease, their lifestyle, or anything else disastrous that happens during their lifetime. It's been estimated that about 1.3 billion people worldwide have a significant disability today, which is roughly 16% of the world population.

As humans live longer, the chance of having a disability increases. Since the mid-1800s, human longevity has increased a lot, and life expectancy is still rising by an average of six hours a day. The point is that we all hope to live healthily and without any disabilities, but that's not always the case. Anyone can acquire some type of disability during their lifetime. There are many types of disabilities, and I know these are not legible on the slide; that's on purpose. For my topic, I'll be focusing specifically on visual impairment-related disabilities, such as blindness and low vision.

My name is Ohan Oda. I work on Google Maps as a software engineer. I'll be talking about how we made our augmented reality feature, called Lens in Maps, accessible to visually impaired users, and how our learnings could apply to your situation. I wonder how many of you here have used the feature called Lens in Maps in Google Maps? Just a hint: it's not a lens. It's not Street View. It's not Immersive View. It's not the AR Walking Navigation we provide that overlays a big arrow on the street. It's a different feature. It seems most of you don't know it.

What is Lens in Maps?

First, let me introduce what Lens in Maps is. It's a camera-based experience in Google Maps that helps on-the-go users understand their surroundings and make decisions confidently by showing information from a first-person perspective. Here's a GIF that shows how Lens in Maps works in Google Maps. The user enters the experience by tapping the camera icon in the search bar at the top, then holds their phone up and can see places around them. They can also search for specific types of places, such as restaurants. Here's a video showing how this feature works with a screen reader, which is an assistive technology often used by visually impaired users.

Allen: First, we released new screen reader capabilities that pair with Lens in Maps. Lens in Maps uses AI and augmented reality to help people discover new places and orient themselves in an unfamiliar neighborhood. If your screen reader is enabled, you can tap the camera icon in the search bar, lift your phone, and you'll receive auditory feedback about the places around you (Restaurant Canet, fine dining, 190 feet), like ATMs, restaurants, or transit stations. That includes helpful information like the name and type of the place you're seeing, and how far away it is.

Oda: Here you saw an illustration of how this feature works with a screen reader.

Motivation

AR is a visual-centric experience, so why did we try to make our AR experience accessible to visually impaired users? Of course, there are apps like Be My Eyes that are targeted specifically at visually impaired users. Our feature, Lens in Maps, was not designed for such a use case. Indeed, not many AR applications that exist today are usable by visually impaired users. Lens in Maps is useful while traveling, when the place or the language is not familiar to the user. Our feature can show the places and streets around the user in a language the user is familiar with.

However, this feature is not used very often in everyday situations, because people already know the places and understand the language they see on the street. There's also friction in this feature. Like in any other AR app you have probably used before, you have to take out your phone, hold it up, and face the direction where the AR elements can be overlaid. This can sometimes be awkward, especially in public areas where people are standing in front of you. They might think you're actually taking a video of them. In addition to this general AR friction, our feature also requires a certain level of location and heading accuracy so that we can correctly overlay the information on the real world.

This process is very important so that we don't mistakenly, for example, overlay the name of the restaurant in front of you with the name of the restaurant next to it. This localization process only takes a few seconds, but people are sometimes too impatient to wait even a few seconds, and they exit the experience before we can show them any useful information. These restrictions make our Lens in Maps feature used less often than we would like. We have spent a lot of time designing and developing this feature, so we would love for more users to use it, and for it to be loved by those users.

Ideation

While thinking about ideas for how we could achieve that, I found that another AR feature we provide in Google Maps, called AR Walking Navigation, has a very good number of daily active users (DAU) and a very good user retention rate as well. This feature is designed to navigate users from point A to point B with instructions overlaid on the real world: big arrows and big red destination pins, as you can see on the slides. Why is that? This feature has the exact same friction as Lens in Maps, where people have to hold their phone up and wait for a few seconds before they can start the experience.

After digging through our team's past documents and presentations, I found that our past UX studies had shown that AR Walking Navigation can really help certain users: those who have difficulty reading and understanding maps. Basically, the directions displayed on the 2D map didn't make much sense to those users, and showing those directions overlaid directly on the real world really helped them understand which direction to take and where exactly the destination is. That made me think about what kind of user would benefit from Lens in Maps so much that it becomes a must-have feature for them, users for whom the benefit of using it would outweigh the friction of starting the experience.

Research

After thinking it over and over, an idea struck me: maybe Lens in Maps could help visually impaired users, because our feature can basically show the places and streets in front of them. Not show, but tell, in this case. I thought it was a good idea, but I had to do some research to make sure this feature could really help those users. Luckily, Google provides many learning opportunities throughout the year, and they had a few sessions about ADI, which stands for Accessibility and Disability Inclusion. After attending those sessions, I learned that last-mile problems can be very challenging for visually impaired users. The navigation apps you have today may tell you exactly how to get to the destination, but once you are at the destination, it's really up to the user to figure out where exactly that destination is.

The app may say the destination is on your left or right side, but often you realize that the destination can actually be many feet away from you, and in any direction on that side. Also, blind and low-vision users tend to visit places they have been to before and are familiar with, because it's a lot harder for them to explore new places: it's hard to know what places are there in the first place, and it's hard to get more information about those new places without a lot of pre-planning. Once I learned that Lens in Maps could really help those users, I started to build a prototype and demoed it to my colleagues and to other internal users who have visual impairments.

Challenge

However, as I built my prototype, I realized that there are many challenges, because we are basically trying to do the reverse of the famous saying, a picture is worth a thousand words. It's actually even worse here, because we are trying to describe a live video, which may require a million words. Also, I myself am not an accessibility expert. Indeed, I used to be more on the side of avoiding any type of accessibility-related feature, because it's really hard to make them work right. I know there are many great tools that can help you debug and build accessibility features, but a lot of us engineers are probably not that familiar with those tools, so it takes a lot longer to make these features work right compared to non-accessibility features.

For first-party apps at Google, there is an accessibility guideline called GAR, which stands for Google Accessibility Rating. These guidelines were not very applicable to a lot of the AR cases we encountered during development. For example, one of the guidelines recommends that we describe what's being displayed on the screen. Unlike 2D UIs, where the user has more control over which element to focus on and have described, the objects in an AR scene can move around a lot. An object can even appear and disappear based on how the camera moves, which makes it really hard for the user to decide what to focus on.

Also, we are detecting places in the world that have a lot of information to present, like the name of the place, its rating, how many reviews it has, what type of place it is, what time it opens, and so on. If the user wants to hear all this information, they have to hold their phone in a very specific position until all of it has been described to them. There are many other cases I won't go through, but the general guidelines that existed before were mostly designed for non-AR cases, and they basically didn't apply much to what we were doing.

Once I had the prototype ready, it was hard for me to tell whether it worked or not, because I myself am not a target user. Even if I think it works well, it may not work well for the actual target users. None of my colleagues near me were target users either, so it wasn't very easy for me to test. I basically had to go out and find somebody else from our team who has a visual impairment to test it. Last but not least, I'm sure my company doesn't want to hear this, but the reality is that it's really hard to get leadership buy-in for this type of project, because the leadership themselves are often not target users, and it's really hard for them to see the real value of this type of feature. These days companies are also under-resourced, so this type of project tends to get lower priority than others. We indeed had several proposals in the past to make our AR features accessible to visually impaired users, but they always got deprioritized in favor of other, more important projects, and they never got implemented.

Coping with Challenges

How did I cope with all these challenges? As I said, I'm not an expert in the accessibility field, so the first thing I did was reach out to teams who work on technology for visually impaired users, such as the team working on Lookout, which is an Android app that can describe what's in an image. I explained to those teams how Lens in Maps could revolutionize the way visually impaired users interact with maps, and demoed my prototype to them. Because they are the specialists in the field, they gave me a lot of good feedback, and I iterated on my prototype based on that feedback. Then I had my prototype ready to test.

As I said before, I cannot test it myself, so I tried to find volunteers internally to first check whether it worked OK. Luckily, there are several visually impaired users within Google who are very passionate about pushing the boundaries of assistive technology and willing to be early adopters. It's usually hard to find such users within any company, because there are very few of them and they are usually overwhelmed with requests to test whatever accessibility features are being developed in that company. I got a lot of good feedback from those users, and I was able to incorporate it into my prototype and improve it further.

Once the prototype was polished to a satisfying level through internal testing, I also wanted to test with external users to get a wider range of opinions. I had great support from our internal UXR group, which specializes in accessibility testing. They organized everything, from recruiting to running the tests, with external blind and low-vision users. The study went really well, and the response was very positive. From those responses, I was more confident that this feature was getting ready to go public. The study went well, but in that external testing I didn't get to interact directly with the users. I also wanted to demo my prototype and get direct feedback from external target users, and I was looking for a place where I could do that. Luckily, I found this great conference called XR Access, which is directed by Dylan.

At the conference, I proactively approached two target users and asked if they could try out my prototype. That went well, and I again got a lot of good feedback from real users, which I was able to incorporate. Last but not least, developing this feature took several months, so I needed to make sure my project didn't suddenly come to an end because leadership said priorities had changed and we should work on something else. What I did was demo my prototype at various internal accessibility events to get the project more attention and get people excited. I don't know if that effort really made the difference, but at least I was able to release the feature to the public on both Android and iOS.

What Worked Well?

What worked well for us? It worked well that we used technology that blind and low-vision users are already familiar with. We decided to use screen reader technology to describe the places and streets around the user: on iOS, this is VoiceOver, and on Android, this is TalkBack. We also considered using text-to-speech libraries, but with those it wouldn't be easy to respect settings like volume and speech rate, which blind and low-vision users tend to adjust to suit their needs.

Also, if we required additional configuration, users would have to take extra steps just for Lens in Maps, so it made a lot of sense for us to use the screen reader technology. There can be multiple places and streets visible from where the user stands; as you see here, there are many things. We can only describe them one at a time, because our brain does not process multiple channels of audio very well. You may hear the sounds, but it's hard to understand all of them at once. We detect not only places and streets but also situations: the user might be near an intersection, so we need to tell them to be careful, or maybe they're facing a direction with nothing to see, but if they turn left or right, they could see more. In those cases, we also want to notify the user. We iterated multiple times and carefully prioritized what to announce in which situation.
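As a rough illustration of that screen reader choice, here is a minimal Kotlin sketch, not the actual Lens in Maps code, of routing spoken output through the platform screen reader on Android so the user's own TalkBack speech rate and volume settings apply; the ScreenReaderAnnouncer class and its method names are hypothetical.

```kotlin
import android.content.Context
import android.view.View
import android.view.accessibility.AccessibilityManager

// Hypothetical helper: speak via the screen reader (TalkBack) instead of
// configuring a separate text-to-speech engine, so the user's existing
// speech-rate and volume preferences are respected.
class ScreenReaderAnnouncer(private val view: View) {

    private val accessibilityManager =
        view.context.getSystemService(Context.ACCESSIBILITY_SERVICE) as AccessibilityManager

    // True when a screen reader such as TalkBack is driving the UI.
    fun isScreenReaderActive(): Boolean =
        accessibilityManager.isEnabled && accessibilityManager.isTouchExplorationEnabled

    // Speak the text through the screen reader; no-op when none is running.
    fun announce(text: CharSequence) {
        if (isScreenReaderActive()) {
            view.announceForAccessibility(text)
        }
    }
}
```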

When we describe places and streets, Lens in Maps already had something called a hover state, which detects what's around the center of the image and highlights those places or streets, as you can see on this slide. We basically made the feature announce what's being hovered in our experience. Initially, we described everything that appears on the screen for the hovered item, because that's what we show in our experience, like here, where the label has all the information about the hovered place, and that's also what the accessibility guidelines recommend.

This prevented the user from quickly browsing through different places, because they had to wait a long time to get all that information, especially in a busy area like downtown. We got great feedback from the Lookout team that we might be over-describing, and that it was probably better to shorten the description, even if it doesn't exactly match what's shown on the screen. We decided to describe only what's most important to blind and low-vision users at that moment: the name of the place, the type of the place, and the distance to it. For example, as you see on this slide, instead of announcing "T.J.Maxx, 4.3 stars, department store, open, closes at 11 p.m.", which is what you would usually hear with a screen reader on a 2D UI, we only announced "T.J.Maxx, department store, 275 feet".
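As a sketch of that shortening, here is what the two description styles could look like; the Place data class and its field names are my own assumptions, since the talk doesn't show the real data model.

```kotlin
// Hypothetical place model, not the real Lens in Maps data type.
data class Place(
    val name: String,        // "T.J.Maxx"
    val category: String,    // "Department store"
    val rating: Float?,      // 4.3
    val hours: String?,      // "Open, closes at 11 p.m."
    val distanceFeet: Int,   // 275
)

// Roughly what a 2D UI with a screen reader would read out for the same place.
fun fullDescription(p: Place): String =
    listOfNotNull(
        p.name,
        p.rating?.let { "$it stars" },
        p.category,
        p.hours,
        "${p.distanceFeet} feet",
    ).joinToString(", ")

// The shortened announcement used while browsing: name, type, distance only.
fun succinctDescription(p: Place): String =
    "${p.name}, ${p.category}, ${p.distanceFeet} feet"
```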

If we only provide this succinct description, the user won't know whether it's really a place they want to visit, so we provide an easy way for the user to get detailed information when they want it, like the panel on the right side of this slide. We added a double-tap interaction on the screen to bring up this information. This interaction may not be obvious to the user, so we added a hint to the succinct description so they know they can get more information by double-tapping. Using the earlier example, we would announce "T.J.Maxx, department store, 275 feet, double-tap for details".
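One possible way to wire that up on Android, reusing the hypothetical helpers sketched above and a placeholder onShowDetails callback, is to append the hint to the announcement and expose the detail action through TalkBack's standard double-tap (click) gesture; this is a sketch, not the production code.

```kotlin
import android.view.View
import androidx.core.view.ViewCompat
import androidx.core.view.accessibility.AccessibilityNodeInfoCompat.AccessibilityActionCompat

// Hypothetical wiring: announce the short description plus a double-tap hint,
// and register a click action on the AR surface so double-tapping with
// TalkBack opens the detail panel.
fun announceHoveredPlace(
    announcer: ScreenReaderAnnouncer,
    arSurface: View,
    place: Place,
    onShowDetails: (Place) -> Unit,
) {
    announcer.announce("${succinctDescription(place)}, double-tap for details")

    ViewCompat.replaceAccessibilityAction(
        arSurface,
        AccessibilityActionCompat.ACTION_CLICK,
        "details", // label TalkBack can read for the double-tap action
    ) { _, _ ->
        onShowDetails(place)
        true
    }
}
```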

We only made changes to the existing Lens in Maps behavior that were absolutely needed, such as disabling the action that goes into the 2D basemap view, which didn't help visually impaired users much, because they can't get any information out of the 2D map and it's hard to know the distance to anything. We also hide places that are too far away to walk to within five minutes. We made small adjustments here and there, but we tried to minimize those changes. This is important, because otherwise it would be really hard to keep the screen reader and non-screen reader experiences in sync. Whenever you modify or add a feature in one experience, you have to make sure it doesn't break the other, and the more the two experiences differ, the higher the chance of breaking one of them.
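For the distance cut-off specifically, a minimal sketch could look like the following; the 1.4 m/s walking speed and the function names are my assumptions, not figures from the talk.

```kotlin
// Hypothetical filter: hide places farther than roughly a five-minute walk.
private const val WALKING_SPEED_METERS_PER_SECOND = 1.4 // assumed average pace
private const val MAX_WALK_MINUTES = 5

fun isWithinShortWalk(distanceMeters: Double): Boolean =
    distanceMeters <= WALKING_SPEED_METERS_PER_SECOND * MAX_WALK_MINUTES * 60

// Only keep places a blind or low-vision user could reach quickly on foot.
fun placesToShow(places: List<Place>, distanceMetersOf: (Place) -> Double): List<Place> =
    places.filter { isWithinShortWalk(distanceMetersOf(it)) }
```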

If the experiences really diverge a lot, then at that point there's no point in having a single application support both, and it's better to just create a separate one. Besides auditory feedback, haptic feedback can also help blind and low-vision users, and it won't interfere with audio cues when it's used right. We use a gentle vibration to indicate that something is hovered. Before we can describe the place to the user, we have to fetch additional information from our server, which means that when the user hovers over something on the screen, they have to wait a few seconds before we're ready to announce anything.

For this wait time, if we announced "loading" every time, it would be annoying, because we detect a lot of places. Instead, we changed it to haptic feedback, so that over time the user learns that whenever they feel this small haptic tick, they need to wait a little bit before they can hear the information.
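A hedged sketch of that flow, reusing the hypothetical helpers above, with fetchPlaceDetails standing in for the real server call:

```kotlin
import android.view.HapticFeedbackConstants
import android.view.View

// Hypothetical flow: a brief haptic tick on hover replaces a repeated
// "loading" announcement; the spoken description comes once the server
// response arrives.
suspend fun onPlaceHovered(
    arSurface: View,
    announcer: ScreenReaderAnnouncer,
    placeId: String,
    fetchPlaceDetails: suspend (String) -> Place,
) {
    // Subtle tick: "something is hovered, details are on the way".
    arSurface.performHapticFeedback(HapticFeedbackConstants.CONTEXT_CLICK)

    val place = fetchPlaceDetails(placeId) // network round trip takes a moment
    announcer.announce("${succinctDescription(place)}, double-tap for details")
}
```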

How to Apply Learnings

How can you apply our learnings to your situation? I won't say that every AR app should work for users with visual impairments, because, again, AR is a visual-centric experience, and in most cases it works best for sighted users. However, it would be really great for you to at least think about whether your AR application could be useful or entertaining to blind and low-vision users if you made it accessible. As an example, the IKEA app has a very useful AR feature that allows the user to overlay furniture in their room. The 3D furniture blends really well with the actual environment: the sofa on the left is the virtual one, and the chair on the right is the real one.

As you can see here, it uses the lighting conditions of the room and surroundings, and it looks almost like it's really there. People use this feature today to see whether the furniture fits well in their space before they decide to buy. However, when I tried this feature on Android with TalkBack turned on, it didn't describe what was happening in the AR scene. Of course, it covered all the 2D UIs, what they say and what they do, but there was no description of anything happening in the AR scene. I also couldn't interact with the 3D model using the general interaction model provided by TalkBack. I would imagine that if this feature were made accessible, it would really help visually impaired users explore new furniture before they actually buy it. Once you have determined that your AR app can be useful or entertaining for blind and low-vision users, making it accessible doesn't mean you have to change a lot.

Like I said before, it's important to keep the behaviors in sync between the screen reader and non-screen reader experiences, so that it doesn't become a burden to maintain or improve in the future. Also, there's no need to explain everything that's going on. A picture is worth a thousand words, but the user doesn't have time to listen to a thousand words. Try to be succinct and extract only the most important information the user needs to know at that moment. However, also make sure you provide a way to get additional information if the user requests it, so that they can explore further.

As part of the "make it succinct" principle, it's a good idea to combine auditory feedback with haptic feedback, since they can be sensed simultaneously. Use haptic feedback, like a gentle vibration, when the meaning of the vibration is easy to figure out after a few tries. You can also change the strength of the vibration to give it a different meaning, but make sure you don't overuse haptic feedback for too many different meanings, because differences in vibration strength are very subtle to sense.
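On Android, one possible (hypothetical) way to give a vibration two strengths is to vary only the amplitude of a short one-shot effect; the values below are illustrative, would need tuning on real devices, and are not from the talk.

```kotlin
import android.content.Context
import android.os.VibrationEffect
import android.os.Vibrator

// Hypothetical example (Android 8.0+): two haptic meanings distinguished by
// amplitude alone. Because strength differences are subtle, keep the haptic
// vocabulary small.
fun hapticCue(context: Context, strong: Boolean) {
    val vibrator = context.getSystemService(Context.VIBRATOR_SERVICE) as Vibrator
    val amplitude = if (strong) 255 else 80 // valid range is 1..255
    vibrator.vibrate(VibrationEffect.createOneShot(40L, amplitude))
}
```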

Real User Experience (Lens in Maps)

Now I’d like to show a short video from Ross Minor, who is an accessibility consultant and content creator. He shared how Lens in Maps helped him.

Minor: For the accessibility features that I really liked, I really love the addition of Lens in Maps. It’s honestly just a gamechanger for blind people, I feel, when it comes to mobility. I talked about it in my video. Just GPSs and everything, they’re only so accurate and so just being able to move my phone around and pretty much simulate looking, has already helped me so much. This is a feature I literally use all the time when going out and about. Some use cases that I really have benefited from is when I’m Ubering.

A lot of times I’ll get to the destination, and places can be wedged between two buildings, or buried, or whatever, and it’s difficult to find. In the past, my Uber drivers would always be like, “Is it right here, this is where you’re looking for?” I was like, “I can’t tell you that. I don’t know”. Now I’m able to actually move my phone around and say, yes, it’s over there, and saying it’s over there and pointing is like a luxury I’ve never had before. There have very much been cases where my Uber is about to drop me off at the wrong place and I’m like, no, I see it over there, it’s over that way. It’s a feature I use all the time. I’m just really happy to have it, and it works so well.

Oda: It's really great and rewarding to hear this type of feedback from a user, that it's a gamechanger for them.

Prepare Your Future-Self

Now we're back to stats again. Roughly 43 million people are living with blindness and 295 million with moderate to severe visual impairment worldwide. You might be thinking that you are advancing technology for people with disabilities. That's great, but remember, you're not only helping others, you might also be helping your future self. Let's prepare for our future selves.

Lens in Maps Precision vs. Microsoft Soundscape

Dylan: Obviously, this is fantastic work. I'm really glad that it's out there and improving people's lives. I'm very curious to compare this feature to something like Microsoft Soundscape, which I think used GPS mostly to figure out that there's stuff around you in this direction and that direction, and to help people explore and get a sense of a space. It feels like the major advantage this has over that is the ability to be much more precise, to use those visual markers and understand that you are specifically looking at this. What are some of the specific things that that level of precision enables that an app like Soundscape may not be able to do?

Oda: As Ross shared in the video, for example, he was riding with his Uber driver. Soundscape uses GPS, the compass, and that kind of information to tell you which places are around you. It may even tell you that your destination is 100 meters away. The thing is, it doesn't have the ability to tell you in which direction, and that can sometimes be very difficult. One thing I learned from our internal ADI sessions is that users know they're near a destination, but the question is, where exactly is it? In one video, they shared a story of reaching the destination and then having to wander around for 10 minutes to find where exactly it was. That made me think that if we can provide this precision based on your phone, which is basically the direction you're facing, then you know it's in that direction. This level of precision really helps in those last-mile cases.

Questions and Answers

Participant 1: Earlier you described that there was friction in holding up the camera. I was wondering if that was consistent around the world or if there are certain countries where Lens in Maps was less used because of that or any other reason.

Oda: I think that's probably not the main reason the feature is used less. It's more that people don't understand that they're supposed to use it outside. Also, there are certain places in the world where we don't have a lot of information, because the technology heavily depends on Street View coverage. The way we detect exactly where you stand and which direction you're facing is by comparing your camera image with Street View data, a technology called VPS. Of course, there is some social awkwardness, especially if people are in front of you while you're holding your phone up; they may think you're taking a video.

Actually, we were intimidated ourselves when we were testing this feature outside. Not just the accessibility feature, but testing Lens in Maps in general: even though we were actually facing the restaurant, people passing by sometimes thought we were taking their video. There's definitely a certain level of friction there. The thing is, it's really hard to know from the metrics gathered in production whether people stopped using it because of social awkwardness or something else. This is really just our guess, but we can see it from our own experience. From the data we can gather, we can tell things like whether people are using this feature inside rather than outside, but not why they stopped.

Participant 1: You also mentioned that it was good to focus on one thing at a time. If there was too much on screen, how did you decide what to focus on and how to limit what to focus on?

Oda: We assign a priority to each type of announcement, and whatever we think is most important at the moment is what we describe first. Anything that poses a danger to the user has the highest priority, like being near an intersection that we don't want them to cross carelessly. They are very careful, but we still want to add extra caution. Next, the place you hover over is considered more important than hints that there's something else on your left or right side. I think for any app, you can think about what the most important thing is, even when there are multiple things going on. For our specific use case, that was the ranking of what we thought was important, and we only describe the one with the highest priority.
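As a tiny sketch of that ranking, one could encode it like this; the enum names and single-item selection are my own illustration, not the production logic.

```kotlin
// Hypothetical encoding of the ordering described above: safety cues first,
// then the hovered place, then "there is more to your left/right" hints.
enum class AnnouncementPriority { SAFETY_WARNING, HOVERED_PLACE, DIRECTIONAL_HINT }

data class Announcement(val priority: AnnouncementPriority, val text: String)

// Speak only the single most important pending announcement at any moment.
fun nextToAnnounce(pending: List<Announcement>): Announcement? =
    pending.minByOrNull { it.priority.ordinal }
```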

 
