Published Papers


Abstract: Object detection plays a crucial role in the development of Electronic Travel Aids (ETAs) capable of guiding a person with visual impairments towards a target object in an unknown indoor environment. In such a scenario, the object detector runs on a mobile device (e.g. a smartphone) and needs to be fast, accurate and, most importantly, lightweight. Nowadays, Deep Neural Networks (DNNs) have become the state-of-the-art solution for object detection tasks, with many works improving speed and accuracy by proposing new architectures or extending existing ones. A common strategy is to use deeper networks to obtain higher performance, but this leads to a higher computational cost that makes such networks impractical to deploy on mobile devices with limited computational power. In this work we compare different object detectors to find a suitable candidate for ETAs, focusing on lightweight models capable of working in real time on mobile devices with good accuracy. In particular, we select two models: SSD Lite with MobileNet V2 and Tiny-DSOD. Both models have been tested on the popular OpenImage dataset and on a new dataset, named the L-CAS Office dataset, collected to further test the models’ performance and robustness in a real scenario inspired by the actual perception challenges of a user with visual impairments.
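
As an illustration of the kind of on-device inference this abstract refers to, the following is a minimal sketch of running an SSD-style detector with TensorFlow Lite on a smartphone-class device; the model file name and the output tensor ordering are assumptions for illustration only, not the exact setup used in the paper.

```python
# Minimal sketch: running a lightweight SSD-style detector with TensorFlow Lite.
# The model file name and the exact output tensor ordering are assumptions.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="ssdlite_mobilenet_v2.tflite")  # hypothetical file
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Prepare a dummy input frame with the size expected by the model.
_, height, width, _ = input_details[0]["shape"]
frame = np.zeros((1, height, width, 3), dtype=input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()

# Typical SSD post-processed outputs: boxes, classes, scores, count
# (the ordering may differ between exported models).
boxes = interpreter.get_tensor(output_details[0]["index"])
scores = interpreter.get_tensor(output_details[2]["index"])
print("Detections above 0.5:", int((scores > 0.5).sum()))
```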

Abstract: Sound perception is a fundamental skill for many people with severe sight impairments. The research presented in this paper is part of an ongoing project with the aim to create a mobile guidance aid to help people with vision impairments find objects within an unknown indoor environment. This system requires an effective non-visual interface and uses bone-conduction headphones to transmit audio instructions to the user. It has been implemented and tested with spatialised audio cues, which convey the direction of a predefined target in 3D space. We present an in-depth evaluation of the audio interface with several experiments that involve a large number of participants, both blindfolded and with actual visual impairments, and analyse the pros and cons of our design choices. In addition to producing results comparable to the state-of-the-art, we found that Fitts’s Law (a predictive model for human movement) provides a suitable metric that can be used to improve and refine the quality of the audio interface in future mobile navigation aids.
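
For reference, Fitts’s Law in its commonly used Shannon formulation predicts the movement time MT needed to reach a target of width W at distance D, with empirically fitted constants a and b:

```latex
% Fitts's Law (Shannon formulation): predicted movement time MT for reaching
% a target of width W at distance D; a and b are empirically fitted constants.
\begin{equation}
  MT = a + b \log_2\!\left(\frac{D}{W} + 1\right)
\end{equation}
```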

Abstract: The ActiVis project's aim is to build a mobile guidance aid to help people with limited vision find objects in an unknown environment. This system uses bone-conduction headphones to transmit audio signals to the user and requires an effective non-visual interface. To this end, we propose a new audio-based interface that uses a spatialised signal to convey a target's position on the horizontal plane. The vertical position on the median plane is given by adjusting the tone's pitch to overcome the audio localisation limitations of bone-conduction headphones. This interface is validated through a set of experiments with blindfolded and visually impaired participants.
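
As an illustration of the pitch-based encoding of elevation mentioned above, the sketch below maps an elevation angle to a tone frequency; the frequency range and the logarithmic mapping are assumptions for illustration, not the parameters of the actual interface.

```python
# Illustrative sketch: map a target's elevation angle to a tone pitch.
# The frequency range and the log-scale mapping are assumptions only.
import math

F_MIN, F_MAX = 300.0, 1200.0        # assumed pitch range in Hz
ELEV_MIN, ELEV_MAX = -45.0, 45.0    # assumed elevation range in degrees

def elevation_to_pitch(elevation_deg: float) -> float:
    """Return a tone frequency (Hz) encoding the target's elevation."""
    e = max(ELEV_MIN, min(ELEV_MAX, elevation_deg))
    t = (e - ELEV_MIN) / (ELEV_MAX - ELEV_MIN)   # normalise to [0, 1]
    # Interpolate on a logarithmic scale so equal angle steps sound like
    # roughly equal pitch steps to the listener.
    return F_MIN * math.exp(t * math.log(F_MAX / F_MIN))

print(elevation_to_pitch(0.0))  # mid-range pitch for a target straight ahead
```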

Abstract: The ActiVis project aims to deliver a mobile system that is able to guide a person with visual impairments towards a target object or area in an unknown indoor environment. For this, it uses new developments in object detection, mobile computing, action generation and human-computer interfacing to interpret the user's surroundings and present effective guidance directions. Our approach to direction generation uses a Markov Decision Process to track the system's state and output the optimal location to investigate, and has previously been shown to work. This work adds an object detector and adapts the guidance process accordingly to provide a complete object search and guidance pipeline. The improved ActiVis system was evaluated in a set of experiments and outperforms the unguided baseline.

Abstract: Modern smartphones can provide a multitude of services to assist people with visual impairments, and their cameras in particular can be useful for assisting with tasks such as reading signs or searching for objects in unknown environments. Previous research has looked at ways to solve these problems by processing the camera’s video feed, but very little work has been done in actively guiding the user towards specific points of interest, maximising the effectiveness of the underlying visual algorithms. In this paper, we propose a control algorithm based on a Markov Decision Process that uses a smartphone’s camera to generate real-time instructions to guide a user towards a target object. The solution is part of a more general active vision application for people with visual impairments. An initial implementation of the system on a smartphone was experimentally evaluated with participants with healthy eyesight to determine the performance of the control algorithm. The results show the effectiveness of our solution and its potential application to help people with visual impairments find objects in unknown environments.
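
To make the idea concrete, the following is a toy sketch of MDP-style action selection for camera guidance; the discretised state space, rewards, transition model and discount factor are illustrative assumptions, not the formulation used in the paper.

```python
# Toy sketch of MDP-style action selection for camera guidance: the state is a
# discrete pan direction, actions rotate the camera left/right or stay, and the
# reward favours the direction most likely to contain the target. All numbers
# below are illustrative assumptions.
import numpy as np

N_DIRECTIONS = 8                           # discretised pan angles
ACTIONS = {"left": -1, "stay": 0, "right": +1}
GAMMA = 0.9

# Assumed belief over which direction contains the target (sums to 1).
belief = np.array([0.02, 0.05, 0.1, 0.4, 0.25, 0.1, 0.05, 0.03])
reward = belief                            # reward for looking in each direction

# Value iteration over the deterministic "rotate" dynamics.
V = np.zeros(N_DIRECTIONS)
for _ in range(100):
    V_new = np.empty_like(V)
    for s in range(N_DIRECTIONS):
        V_new[s] = max(reward[(s + d) % N_DIRECTIONS] + GAMMA * V[(s + d) % N_DIRECTIONS]
                       for d in ACTIONS.values())
    V = V_new

def next_instruction(current_dir: int) -> str:
    """Greedy action w.r.t. the computed values, e.g. spoken as 'turn left'."""
    return max(ACTIONS, key=lambda a: reward[(current_dir + ACTIONS[a]) % N_DIRECTIONS]
               + GAMMA * V[(current_dir + ACTIONS[a]) % N_DIRECTIONS])

print(next_instruction(0))  # 'right', since the belief peaks to the right of direction 0
```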

Abstract: Recent advances in mobile technology have the potential to radically change the quality of tools available for people with sensory impairments, in particular the blind and partially sighted. Nowadays almost every smartphone and tablet is equipped with high-resolution cameras, typically used for photos, videos, games and virtual reality applications. Very little has been proposed, however, to exploit these sensors for user localisation and navigation. To this end, the “Active Vision with Human-in-the-Loop for the Visually Impaired” (ActiVis) project aims to develop a novel electronic travel aid to tackle the “last 10 yards problem” and enable blind users to independently navigate in unknown environments, ultimately enhancing or replacing existing solutions such as guide dogs and white canes. This paper describes some of the project’s key challenges, in particular with respect to the design of a user interface (UI) that translates visual information from the camera into guidance instructions for the blind person, taking into account the limitations introduced by visual impairments. In this paper we also propose a multimodal UI that caters to the needs of people with vision impairments and exploits human-machine progressive co-adaptation to enhance the user’s experience and improve navigation performance.

Relevant Reading


Abstract: In this paper we discuss the concept of co-adaptation between a human operator and a machine interface, and we summarize its application with emphasis on two different domains, teleoperation and assistive technology. The analysis of the literature reveals that only in a few cases has the possibility of a temporal evolution of the co-adaptation parameters been considered. In particular, the role of time-related indexes that capture changes in the motor and cognitive abilities of the human operator has been overlooked. We argue that, for a more effective long-term co-adaptation process, the interface should be able to predict and adjust its parameters according to the evolution of human skills and performance. We thus propose a novel approach termed progressive co-adaptation, whereby human performance is continuously monitored and the system makes inferences about changes in the users’ cognitive and motor skills. We illustrate the features of progressive co-adaptation in two possible applications, robotic telemanipulation and active vision for people with vision impairments.
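
A minimal sketch of the progressive co-adaptation idea, assuming a single smoothed performance index driving a single adjustable interface parameter; the smoothing factor, thresholds and choice of parameter are illustrative assumptions, not the scheme proposed in the paper.

```python
# Illustrative sketch: monitor a user performance index over time and adjust an
# interface parameter (here, the rate of audio feedback) when a sustained change
# is detected. All constants are assumptions made for illustration only.
class ProgressiveCoAdapter:
    def __init__(self, feedback_rate_hz: float = 1.0, alpha: float = 0.2):
        self.feedback_rate_hz = feedback_rate_hz
        self.alpha = alpha          # exponential smoothing factor
        self.smoothed = None        # smoothed performance index

    def update(self, performance: float) -> float:
        """Feed a new per-trial performance index (higher = better)."""
        if self.smoothed is None:
            self.smoothed = performance
        prev = self.smoothed
        self.smoothed = self.alpha * performance + (1 - self.alpha) * prev
        # If the user's skill is clearly improving, reduce feedback density;
        # if it degrades, give more frequent guidance.
        if self.smoothed > prev * 1.05:
            self.feedback_rate_hz = max(0.25, self.feedback_rate_hz * 0.9)
        elif self.smoothed < prev * 0.95:
            self.feedback_rate_hz = min(4.0, self.feedback_rate_hz * 1.1)
        return self.feedback_rate_hz
```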

Abstract: The diffuse availability of mobile devices, such as smartphones and tablets, has the potential to bring substantial benefits to people with sensory impairments. The solution proposed in this paper is part of an ongoing effort to create an accurate obstacle and hazard detector for people with vision impairment, which is embedded in a hand-held device. In particular, it presents a proof of concept for a multimodal interface to control the orientation of a smartphone’s camera, while being held by a person, using a combination of vocal messages, 3D sounds and vibrations. The solution, which is to be evaluated experimentally by users, will enable further research in the area of active vision with human-in-the-loop, with potential application to mobile assistive devices for the indoor navigation of people with vision impairment.
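
As a purely illustrative sketch of how such a multimodal interface might combine the three feedback channels, the snippet below selects a cue based on the camera's angular error; the thresholds and the role assigned to each modality are assumptions, not the design evaluated in the paper.

```python
# Hypothetical multimodal cue selector for steering the camera's orientation:
# large errors trigger a vocal instruction, moderate errors a spatialised
# ("3D") sound, and near-alignment a confirmation vibration. Thresholds and
# modality roles are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Cue:
    modality: str   # "voice", "3d_sound", or "vibration"
    payload: str

def select_cue(yaw_error_deg: float, pitch_error_deg: float) -> Cue:
    err = max(abs(yaw_error_deg), abs(pitch_error_deg))
    if err > 45:
        direction = "left" if yaw_error_deg < 0 else "right"
        return Cue("voice", f"turn {direction}")
    if err > 10:
        return Cue("3d_sound", f"beacon at yaw={yaw_error_deg:.0f}, pitch={pitch_error_deg:.0f}")
    return Cue("vibration", "short pulse: camera on target")

print(select_cue(-60, 5))   # -> voice cue telling the user to turn left
```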