ACCESSPACE is an Electronic Orientation and Travel Aid that will provide Visually Impaired People (VIP) with a dynamic and specialized representation of their environment’s spatial topography, through egocentric vibrotactile feedback. With little training, this spatial substitution information will allow them to intuitively form mental maps of their large-scale surroundings, accurate enough to navigate autonomously towards their destination.
This special kind of Human-Machine Interface (HMI) will communicate through a belt fitted with vibrators. The information needed will be acquired through a set of sensors, placed on a special pair of glasses worn by the VIP. These glasses will communicate with the user’s smartphone through a specific app, which will process the gathered information and control the belt’s vibrators accordingly.
With ACCESSPACE, VIP should be able to navigate indoors or outdoors in true autonomy: the device helps them intuitively perceive where they are and where they want to go, choose how to get there, and avoid the obstacles on their way. By not having to blindly follow directions, and by progressively learning the topography of their environment, VIP should feel less lost in novel environments, which improves their feeling of safety and autonomy and increases their mobility, and thus their integration into society.
Project’s description under construction
I. 1) Sensory Substitution:
The Sensory Substitution (SS) framework was initiated by Paul Bach-y-Rita in the sixties 1. The basic idea behind SS is to use an unusual sensory channel to convey information about the environment that is no longer accessible: for example, conveying the distance to surrounding objects through tactile feedback, when it is usually conveyed by vision (or sometimes audition).
The objective of these devices is to augment the user’s Umwelt, a word used to describe the specific perceptual environment of an individual or species. If the information provided is not innately accessible to healthy individuals, the approach is referred to as Sensory Augmentation.
Most existing systems focus on substituting vision with either touch or audition 2. Some well-known examples are:
|Substitution||Device||Principle|
|Vision to Audition||The vOICe 3||Transcription of the camera’s feed into a soundscape of 60x60 sound sources, emitted one column at a time (all the sources of a column at once) in a 1-second left-to-right sweep. Each source is linked to an area of the camera’s field of view. A given source’s intensity is linked to the corresponding area’s luminosity, its position on the horizontal axis to the stereo disparity of the sound, and its elevation to the pitch of the sound.|
|Vision to Touch||TVSS 1||Transcription of the camera’s feed into a 2D 20x20 matrix of pins placed against the user’s back. Each pin is linked to the corresponding area of the camera’s input (e.g. the top-right area with the top-right pin). The intensity of a pin’s vibration is linked to the average luminosity of its area.|
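As a rough illustration of the TVSS-style mapping described in the table, the sketch below averages the luminosity of a grayscale frame over a 20x20 grid, one cell per pin. It is not the original TVSS implementation: the function name `frame_to_pin_intensities`, the list-of-lists frame format and the scaling of intensities to [0, 1] are assumptions made for the example.

```python
def frame_to_pin_intensities(frame, rows=20, cols=20):
    """Average the luminosity of a grayscale frame over a rows x cols grid.

    `frame` is a list of equal-length rows of luminosity values in [0, 255]
    (assumed format). Returns a rows x cols list of intensities in [0.0, 1.0].
    """
    height, width = len(frame), len(frame[0])
    if height < rows or width < cols:
        raise ValueError("frame is smaller than the pin matrix")
    pins = []
    for r in range(rows):
        # Pixel rows covered by pin row r (top pin row <-> top of the image).
        y0, y1 = r * height // rows, (r + 1) * height // rows
        pin_row = []
        for c in range(cols):
            # Pixel columns covered by pin column c.
            x0, x1 = c * width // cols, (c + 1) * width // cols
            area = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Pin intensity = average luminosity of the area, scaled to [0, 1].
            pin_row.append(sum(area) / (len(area) * 255))
        pins.append(pin_row)
    return pins
```

With a 40x40 frame, each pin covers a 2x2 pixel area, so a bright 2x2 patch in the top-right corner of the image drives only the top-right pin.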
The possibilities of SS:
Sensory Substitution has yielded interesting experimental results over the years, both on its assistive and rehabilitative potential and on the insights it provides into how we perceive, integrate and make sense of sensory information.
Using SS devices, participants were able to:
- Recognize and distinguish simple shapes. 4 5
- Recognize and distinguish objects. 6
- Localize objects in space. 7 8
- Learn to draw. 9
- Navigate inside a maze. 10
- Drive a robot into a simple environment. 11
Furthermore, SS gives us an invaluable window into the still poorly understood ways in which our brain adapts and learns to use new sensory information, and thus new possibilities for interacting with its environment. Among others, studies with SS devices have shown that:
- Some optical illusions are reproducible through SS. 12
- Long-time users of The vOICe were able to perceive depth information through the device, despite the fact that it only provided “low-level” luminosity information. 13
- After some training, some behavioural reflexes start to appear, such as users protecting their head to brace for collision when an operator unwittingly changed the zoom level of the camera acting as the sensor of a visuo-tactile substitution device. 14
- Sensations become externalized only when the user can manipulate (move, rotate) the device: the stimulations start to be perceived as the product of external objects localized in the spatial surroundings of the user, and not as mere “random” stimulations when the person is able to link (through learning) their actions to the resulting changes in stimulations 15.
Based on the strong link between action and perception highlighted by research on SS, some authors have argued that SS should rather be called Sensorimotor Supplementation 7, in order to:
- Better highlight the crucial role of action in disambiguating sensory information and externalizing percepts.
- Avoid a misleading interpretation of the word substitution, since the information provided makes it possible to supplement a set of tasks but will not replace the lost modality, whether in terms of the possibilities it offers or of the subjective feeling (qualia) it elicits in the user.
1. Bach-y-Rita, P., Collins, C. C., Saunders, F. A., White, B., & Scadden, L. (1969). Vision substitution by tactile image projection. Nature, 221(5184), 963–964.
2. Elmannai, W., & Elleithy, K. (2017). Sensor-Based Assistive Devices for Visually-Impaired People: Current Status, Challenges, and Future Directions. Sensors, 17(3), 565.
3. Meijer, P. B. (1992). An experimental system for auditory image representations. IEEE Transactions on Biomedical Engineering, 39(2), 112–121.
4. Bach-y-Rita, P., Collins, C. C., Saunders, F. A., White, B., & Scadden, L. (1969). Vision substitution by tactile image projection. Nature, 221(5184), 963–964.
5. Sampaio, E., Maris, S., & Bach-y-Rita, P. (2001). Brain plasticity: ‘visual’ acuity of blind persons via the tongue. Brain Research, 908(2), 204–207.
6. Auvray, M., Hanneton, S., & O’Regan, J. K. (2007). Learning to perceive with a visuo-auditory substitution system: Localisation and object recognition with ‘The vOICe.’ Perception, 36(3), 416–430.
7. Lenay, C., Gapenne, O., Hanneton, S., Marque, C., & Genouëlle, C. (2003). Sensory substitution: Limits and perspectives. Touching for Knowing, 275–292.
8. Auvray, M., Hanneton, S., Lenay, C., & O’Regan, K. (2005). There is something out there: distal attribution in sensory substitution, twenty years later. Journal of Integrative Neuroscience, 4(04), 505–521.
9. Rovira, K., Gapenne, O., & Ammar, A. A. (2010). Learning to recognize shapes with a sensory substitution system: A longitudinal study with 4 non-sighted adolescents. In Development and Learning (ICDL), 2010 IEEE 9th International Conference on (pp. 1–6). IEEE.
10. Stoll, C., Palluel-Germain, R., Fristot, V., Pellerin, D., Alleysson, D., & Graff, C. (2015). Navigating from a Depth Image Converted into Sound. Applied Bionics and Biomechanics, 2015, 1–9.
11. Segond, H., Weiss, D., & Sampaio, E. (2005). Human Spatial Navigation via a Visuo-Tactile Sensory Substitution System. Perception, 34(10), 1231–1249.
12. Renier, L., Collignon, O., Poirier, C., Tranduy, D., Vanlierde, A., Bol, A., … Devolder, A. (2005). Cross-modal activation of visual cortex during depth perception using auditory substitution of vision. NeuroImage, 26(2), 573–580.
13. Ward, J., & Meijer, P. (2010). Visual experiences in the blind induced by an auditory sensory substitution device. Consciousness and Cognition, 19(1), 492–500.
14. Bach-y-Rita, P. (1972). Brain mechanisms in sensory substitution. New York: Academic Press.
15. Auvray, M., Philipona, D., O’Regan, J. K., & Spence, C. (2007). The perception of space and form recognition in a simulated environment: The case of minimalist sensory-substitution devices. Perception, 36(12), 1736–1751.
I. 2) Neuroplasticity:
The astonishing results obtained from Sensory Substitution research mainly rely on our brain’s formidable ability to adapt to new sources of information about its environment, through neuroplasticity.
I. 3) Sensory Substitution Devices:
Sensory Substitution Devices (SSD) are a special type of Human-Machine Interface (HMI) designed to provide additional (or no longer accessible) information about the environment, allowing their users to carry out tasks that were previously impossible or notably difficult.
SSDs distinguish themselves from regular wearable devices, such as smartphones, through the way they convey information to the user. By imitating how our senses communicate with our brain, SSDs allow their users to learn and internalize new ways to interact with their environment, or to rehabilitate lost or no-longer-used ones. Indeed, instead of symbolic visual or audio codes (i.e. images or language), SSDs communicate through low-level signals from which the brain can learn to extract regularities, thanks to its plasticity.
SSDs’ main properties:
SSDs usually comprise 3 main elements:
|Element||Role|
|Sensors||Capture the no-longer accessible information.|
|Mapping software||Transforms the gathered information into a suitable representation. This step can involve more or less processing of the sensors’ data.|
|Actuators||Provide this representation in a code fitted to the properties of the output (substitute) modality, through an ergonomic interface.|
Classification of SSDs:
SSDs work by remapping inputs from the device’s sensors to a specific code delivered by the actuators through the substitute modality, which serves as the communication interface with the user. For the same sensors and actuators, this mapping can be done in several ways, depending on what information is conveyed to the user (e.g. distance, colour, etc.), through what modality and what channel(s) it is sent, and how it is encoded.
Based on what information they provide:
SSDs can be more or less specialized, meaning that the information they convey is aimed at assisting the user with a specific subset of the tasks that are performed with the substituted modality.
Historically, the first SSDs provided a direct transposition of a camera’s field of view into a 2D tactile or audio image. The field then evolved progressively with the rise of more advanced and specialized devices aimed at assisting with specific needs, such as locating objects or navigation for blind people.
|Name||Task(s) to assist with||Information provided|
Other examples also include vestibular substitution 1, providing balance information through tactile feedback.
This specialization implies that different types of processing are applied to the sensors’ data in order to extract, from the low-level information, what is required for the tasks the device aims to supplement.
Based on the substitute modality they use:
SSDs can also differ in the interface and communication channel they use to interact with the user. The first choice to make is the output modality: audio or tactile.
Furthermore, the tactile modality comprises several possible communication channels: vibrations, pressure, temperature changes, electric stimulations, etc. Those channels correspond to different types of tactile receptors in the skin, which have different properties, such as adaptation speed, sampling rate, dynamic range of response, variable spatial density on the body, etc.
Those properties affect how much information can be conveyed through each channel in a given time frame, and how much energy is required to do so, which in turn affects the viability of each channel for sensory substitution purposes.
For example, how much pressure does an actuator have to apply to be felt? What range of vibration frequencies will be perceived and, within that range, how many discrete frequency steps can users accurately distinguish?
Based on how they encode information:
Information lies in variations: for the same output channel (such as the vibrotactile channel), a given piece of information can be encoded in very different ways, depending on how the SSD maps changes in the feature to convey onto changes in the properties of the selected output channel.
Let’s take the example from the thesis of Dr Mandil 2: a helicopter pilot wearing a vibrotactile vest designed to help him land his aircraft smoothly despite extremely poor visibility, by providing him with altitude information.
SSDs can encode the information they provide in a more or less abstract (vs symbolic) manner 3. In our example, coding the altitude can be done in several ways:
- Through a succession of vibrations moving downwards on the user’s back, which is symbolically linked to altitude change.
- Through changes of the vibration’s frequency, which is more abstract, since it cannot be directly interpreted without some training and focus from the user.
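The second (more abstract) option can be sketched as a simple linear mapping from altitude to vibration frequency. This is only an illustration of the idea, not the encoding used in the cited thesis: the function name, the 30 m altitude range, the 50 to 250 Hz frequency range and the choice of "lower altitude means higher frequency" are all assumptions.

```python
def altitude_to_frequency(altitude_m, max_altitude_m=30.0,
                          f_min_hz=50.0, f_max_hz=250.0):
    """Linearly map an altitude to a vibration frequency.

    Assumed convention: the lower the aircraft, the higher the frequency.
    All default ranges are placeholder values chosen for the example.
    """
    # Clamp the altitude to the encoded range [0, max_altitude_m].
    ratio = min(max(altitude_m / max_altitude_m, 0.0), 1.0)
    return f_max_hz - ratio * (f_max_hz - f_min_hz)
```

Such a mapping carries no intrinsic spatial meaning: the user has to learn which frequency corresponds to which altitude, which is why this kind of code requires more training than the spatial sweep.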
1. Alberts, B. B. G. T., Selen, L. P. J., Verhagen, W. I. M., & Medendorp, W. P. (2015). Sensory substitution in bilateral vestibular a-reflexic patients. Physiological Reports, 3(5), e12385.
2. Mandil, C. (2018). Informations vibrotactiles pour l’aide à la navigation et la gestion des contacts avec l’environnement. University of Caen-Normandy.
3. MacLean, K. E. (2008). Haptic Interaction Design for Everyday Interfaces. Reviews of Human Factors and Ergonomics, 4(1), 149–194.
I. 4) Research axes:
Despite Bach-y-Rita’s work dating back almost 50 years, the design of SSDs still raises several challenges today, which can be summarized into 3 main research axes:
Axis 1: What information to convey?
- For a given task to assist, what information should the SSD provide?
Axis 2: How to convey this information?
- What substitute modality to use for a given task, and how to recode the provided information in that modality for an intuitive use?
Axis 3: How to gather this information?
- What sensors should the device use to be able to access the required information in a reliable manner, while respecting the specified ergonomic constraints?
II) Theoretical Framework:
This section will present how we chose to address the first two research axes (what to convey and how to convey it), and the theories and models we based our hypotheses upon.
The third research axis “How to gather this information” will be investigated in a subsequent section.
II. 1) What to convey: spatial substitution
Whether using audition or touch as the substitute modality, the main challenge faced during the design of an SSD for VIP is that we simply can’t substitute all the information gathered by vision.
About a third of our brain is directly dedicated to the processing of visual inputs, and around two thirds is involved overall when areas associating visual inputs with other modalities are included. Because it involves many areas and pathways, the visual system is the topic of much current research, which tries to better understand the structure and functions of its various layers.
The tactile and auditory modalities, however, have a much more limited bandwidth, which greatly limits the amount of information we are able to substitute.
II. 1. a) The intuitiveness vs accuracy dilemma
As mentioned earlier, learning to use a new SSD is usually quite a tedious process, especially when the information provided is very complex.
For our project, we want a device that is almost readily usable, i.e. one that requires very little training time before the user can obtain from it the information required to perform the tasks it aims to assist with. Thus, we chose to leave most of the “feature extraction” to the device itself, instead of having the user learn it.
This leaves us with a new challenge to solve: what information about the spatial topography of their surroundings must be conveyed in order to assist people with orientation and navigation tasks in indoor and outdoor settings?
II. 1. b) Spatial Cognition
Spatial cognition theories investigate how we perceive, interpret, mentally represent and interact with the spatial properties of our environment. Gaining a better understanding of the underlying mechanisms of space perception and how those mechanisms adapt to blindness would allow researchers to pinpoint precisely what information is crucial to elicit spatial perception and learning, and how to provide it intuitively, by reinforcing known sensorimotor loops, thus improving the SSD’s ease of use and acceptance.
II. 1. c) Spatial Substitution: the spatial gist
Our device will provide a representation of the spatial topography of the VIP’s surroundings through a set of 4 features, which we call the spatial gist:
- The nearby obstacles on their current path, so that they know what to avoid and can adjust their trajectory in anticipation.
- The nearby path possibilities, such as the various streets branching from their current location, or the nearby “visible” corridors and door openings inside a building. This will allow VIP to feel the topography of their surroundings, and to perceive the different paths they can take to get to their destination.
- The surrounding Points of Interest (PoI), consisting of known salient elements of the environment, such as landmarks. This will allow VIP to localize themselves in relation to those elements.
- The destination of their journey, so that they know which way to go.
Our hypothesis is that providing those 4 spatial features will be enough to substitute for the loss of vision in orientation and navigation tasks. More specifically, we hypothesise that this spatial gist is a minimal necessary set of information which will allow people to form mental maps of their environment, and use those maps to efficiently and intuitively navigate in it. This is our spatial substitution model.
II. 2) How to convey this information:
II. 2. a) Action and perception:
II. 2. b) Ergonomic considerations:
II. 2. c) Our tactile encoding:
The spatial gist will be encoded as follows:
- The direction of the provided element will be given by activating the vibrator that points directly towards that element.
- The distance (between the VIP and the provided element) will be linked to the vibration’s intensity: the closer it gets, the more intense the vibration.
- The type of the element (i.e. obstacle, node, PoI, destination) will be differentiated through the signature of the vibration: the temporal succession of pulses, encoded through the number of pulses and the interval in between them.
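The three rules above can be sketched as a single encoding function. This is a minimal illustration under stated assumptions, not the project’s actual implementation: the number of vibrators around the waist (`N_VIBRATORS`), the rendering range (`MAX_RANGE_M`), the linear intensity law and the pulse signatures in `SIGNATURES` are all hypothetical values chosen for the example.

```python
N_VIBRATORS = 16      # assumption: vibrators evenly spaced around the waist
MAX_RANGE_M = 10.0    # assumption: elements farther than this are not felt

# Assumption: (number of pulses, interval between pulses in seconds) per type.
SIGNATURES = {
    "obstacle": (1, 0.0),
    "node": (2, 0.10),
    "poi": (3, 0.10),
    "destination": (4, 0.05),
}

def encode_element(bearing_deg, distance_m, kind):
    """Return (vibrator index, intensity in [0, 1], pulse signature).

    `bearing_deg` is the egocentric direction of the element:
    0 = straight ahead, positive angles clockwise.
    """
    step = 360.0 / N_VIBRATORS
    # Direction: pick the vibrator whose direction is closest to the bearing.
    index = int(round((bearing_deg % 360.0) / step)) % N_VIBRATORS
    # Distance: the closer the element, the more intense the vibration.
    intensity = max(0.0, 1.0 - distance_m / MAX_RANGE_M)
    # Type: a pulse pattern that identifies the kind of element.
    return index, intensity, SIGNATURES[kind]
```

For example, under these assumptions `encode_element(90, 5, "node")` selects the vibrator on the user’s right side, at half intensity, with the two-pulse signature.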
II. 3) How to gather this information:
III) First prototype: virtual navigation evaluation
We are currently in the process of evaluating our spatial substitution model and its tactile encoding through virtual navigation experiments done with VIP.
III. 1) Material:
III. 1. a) The evaluation environment:
The virtual environment we designed makes it easy to create 2D mazes in which the user can navigate by controlling the movements of an artificial agent through mouse and keyboard inputs. The only feedback the user gets about this virtual environment is the vibrotactile feedback of the belt.
III. 1. b) Interface: the TactiBelt
The spatial gist will be provided in an egocentric manner through a vibrotactile belt, the TactiBelt, which comprises 48 ERM (Eccentric Rotating Mass) vibrators split into 3 vertical layers. The vibrators are controlled by an Arduino Mega using PWM (Pulse-Width Modulation).
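To make the PWM control more concrete, the sketch below shows how a host program might address one of the 48 vibrators and convert a desired intensity into an 8-bit duty value. The actual firmware and wiring of the belt are not described here, so everything in it is an assumption: the even split into 16 vibrators per layer, the layer-major numbering, and the helper names `vibrator_id` and `pwm_duty`. The 0 to 255 range matches the default 8-bit resolution of Arduino’s `analogWrite`, but whether the belt uses that call is not stated.

```python
N_LAYERS = 3
PER_LAYER = 48 // N_LAYERS   # 16, assuming the vibrators are split evenly

def vibrator_id(layer, column):
    """Flat index (0..47) of a vibrator; hypothetical layer-major numbering."""
    if not (0 <= layer < N_LAYERS and 0 <= column < PER_LAYER):
        raise ValueError("layer or column out of range")
    return layer * PER_LAYER + column

def pwm_duty(intensity):
    """Convert an intensity in [0, 1] to an 8-bit PWM duty value (0..255)."""
    # Clamp out-of-range intensities before scaling.
    clamped = min(max(intensity, 0.0), 1.0)
    return round(clamped * 255)
```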
III. 2) Protocol:
III. 3) Results:
III. 4) Discussion:
IV) Second prototype:
We are simultaneously developing a second prototype with the proper sensing tools to allow it to work in real indoor and outdoor environments. This requires us to acquire real-world data on the user’s environment through a set of sensors, and to combine this data through sensor fusion algorithms in order to obtain the spatial gist features we have to provide.
This work could be summarized into 3 main axes:
- Detecting obstacles in the user’s path.
- Obtaining the map of the user’s surroundings.
- Localization and tracking of the user on the aforementioned map.
Once we have a map of the user’s surroundings, we can strip it of all unnecessary information and keep only the spatial gist elements: the positions of the nodes, of the PoIs, and of the user’s destination. By knowing (and continuously updating) the user’s position on that map, we can determine the position of those elements in the user’s egocentric reference frame, and transmit them through the belt.
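The change from map coordinates to the user’s egocentric frame can be sketched as follows. The conventions used here are assumptions made for the example (a flat local map in metres with x pointing east and y pointing north, and a heading measured clockwise from north), as is the function name `to_egocentric`.

```python
import math

def to_egocentric(user_xy, heading_deg, target_xy):
    """Return (distance, relative bearing in degrees) of a map element.

    Assumed map frame: x points east, y points north, units in metres.
    `heading_deg` is the user's heading, clockwise from north.
    The relative bearing lies in [-180, 180): 0 = straight ahead,
    positive = to the user's right.
    """
    dx = target_xy[0] - user_xy[0]
    dy = target_xy[1] - user_xy[1]
    distance = math.hypot(dx, dy)
    # Absolute bearing of the target, clockwise from north.
    absolute = math.degrees(math.atan2(dx, dy))
    # Subtract the heading and wrap the result into [-180, 180).
    relative = (absolute - heading_deg + 180.0) % 360.0 - 180.0
    return distance, relative
```

Recomputing this for every node, PoI and destination each time the position or heading estimate is updated is what makes the feedback egocentric: a user facing east perceives an element located to the north on their left.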
IV. 1) Obstacle detection:
Various methods exist to detect obstacles, involving different types of sensors able to estimate depth information, such as infrared, ultrasound, cameras, structured light sensors, LIDARs, etc. Each of them has pros and cons depending on the task constraints, which, for our system, are:
- The weight and size of the sensor
- Its cost
- It should work indoors and outdoors
- Its data can be processed in real-time
To be added: comparative summary of depth-estimation sensors
Based on this analysis, we decided to use stereoscopic cameras, placed on
Several algorithms exist to estimate depth information from visual data, each of them representing a certain trade-off between computational cost (and thus speed) and accuracy.
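Whatever the matching algorithm, depth estimation from a rectified stereo pair ultimately relies on the standard pinhole relation Z = f · B / d, where f is the focal length in pixels, B the baseline between the two cameras and d the disparity of a point between the two images. The sketch below only illustrates that relation; it is not one of the algorithms to be compared, and the function name and parameter values are assumptions.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (metres) of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity (or a failed match).
        return float("inf")
    return focal_length_px * baseline_m / disparity_px
```

Because depth is inversely proportional to disparity, a fixed matching error of a fraction of a pixel translates into a much larger depth error for distant obstacles than for nearby ones, which is one reason why the accuracy of the disparity estimate matters.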
To be added: comparative summary of vision depth-estimation algorithms
IV. 2) Geographical information retrieval:
If the user is navigating outdoors, the map of their surroundings can easily be obtained by querying online Geographical Information Systems (e.g. Google Maps) with the user’s coordinates, obtained through a GNSS positioning system (e.g. GPS).
However, it can be more challenging for indoor environments since:
- The user’s position cannot be obtained through classical GNSS positioning, requiring us to implement a custom Indoor Positioning System (cf. next section).
- The building’s floor plan may not exist, it may not be available online, and if it is, it may be represented in a graphical format, and often littered with unnecessary details, making it really difficult to exploit for our purposes.
Thus, if a floor plan does exist, the first step is to find a way to obtain it programmatically. Once obtained, the floor plan can be cleaned of all unnecessary information through image processing techniques.
The next step is to programmatically extract the nodes from this floor plan. The purpose of nodes is to represent pathway possibilities in a way that’s easy to interpret for the users. They correspond to positions on the floor plan where the current path changes direction, or where a new path begins.
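One possible way to express this definition in code is sketched below, under a strong simplifying assumption: the floor plan has already been reduced to a grid in which corridors are one cell wide (1 = walkable, 0 = wall). On such a grid, a node is a walkable cell that is a dead end or a junction (a path begins there) or whose two neighbours are not opposite each other (the path turns there). The grid format and the function name `extract_nodes` are hypothetical; this is not the project’s extraction method.

```python
def extract_nodes(grid):
    """Return the (row, col) cells of a corridor skeleton that qualify as nodes.

    `grid` is a list of rows; 1 marks a walkable cell, 0 a wall (assumed format).
    """
    nodes = []
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if not cell:
                continue
            # Offsets of the walkable 4-connected neighbours of this cell.
            offsets = [
                (dr, dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < len(grid)
                and 0 <= c + dc < len(row)
                and grid[r + dr][c + dc]
            ]
            if len(offsets) != 2:
                # Isolated cell, dead end, or junction: a path begins here.
                nodes.append((r, c))
            elif (offsets[0][0] + offsets[1][0] != 0
                  or offsets[0][1] + offsets[1][1] != 0):
                # The two neighbours are not opposite: the path turns here.
                nodes.append((r, c))
    return nodes
```

On a T-shaped corridor, this keeps the three corridor ends and the junction and discards the cells in between, which matches the idea of conveying only the path possibilities to the user.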
To be added: visual representation of nodes
Finally, concerning the PoIs, we can distinguish:
- Those automatically extracted by the system, such as landmarks or bus stops outside, or elevators or stairs indoors (if they are indicated by standard symbols on the floor plan).
- Those that can be manually added by the user, by pressing a specific button of the interface, which will place a PoI at their current position on the map.
A voice message can then be attached to a PoI to give more information about the place the person is currently at. Those PoIs and their associated voice notes can be shared across users if desired, allowing them, for example, to warn other VIP of streets made inaccessible by construction work.
IV. 3) Indoor and outdoor positioning:
Having this curated map of the surroundings will make it possible to inform the VIP of the nodes and PoIs within a certain radius. But to send this information in an egocentric manner, meaning that the vibrations change with the movements of the user, the system needs to be able to accurately pinpoint the user’s position on the map.
For outdoor environments, this can be accomplished through the GPS chip of the user’s smartphone, although the accuracy of GNSS positioning in cities can vary considerably with weather conditions, satellite coverage, and the narrowness of the streets (the urban canyon effect).
The positioning accuracy can be increased by combining the GNSS signal with data from other sensors, such as inertial measurements.
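As a very simplified illustration of such a combination, the sketch below performs one update of a complementary filter: the position is first propagated by dead reckoning from the inertial displacement, then pulled towards the GNSS fix when one is available. This is a stand-in chosen for brevity, not the fusion method of the project (practical systems often use a Kalman filter instead); the function name, the local metric coordinates and the `gnss_weight` value are assumptions.

```python
def fuse_position(previous_xy, inertial_step_xy, gnss_xy=None, gnss_weight=0.2):
    """One update of a complementary filter blending dead reckoning with GNSS.

    `inertial_step_xy` is the displacement estimated from inertial data since
    the previous update; `gnss_xy` is the latest GNSS fix in the same local
    frame, or None if no fix is available.
    """
    # Prediction: dead reckoning from the previous estimate.
    predicted = (previous_xy[0] + inertial_step_xy[0],
                 previous_xy[1] + inertial_step_xy[1])
    if gnss_xy is None:
        return predicted
    # Correction: move the prediction a fraction of the way to the GNSS fix.
    return tuple(p + gnss_weight * (g - p) for p, g in zip(predicted, gnss_xy))
```

A small `gnss_weight` trusts the smooth inertial estimate in the short term, while the repeated GNSS corrections keep the drift of dead reckoning bounded over time.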
For indoor environments, since GNSS data is not available, there is no universal or standard process to match the user’s position to a position in the reference frame of the floor plan. This requires us to design our own Indoor Positioning System (IPS).
A number of IPS solutions have been devised recently.
TODO: review of Visual-Inertial Odometry
TODO: review of Inertial WiFi-Magnetic Positioning
V) Second prototype evaluation:
V. 1) Material:
V. 2) Protocol:
V. 3) Results:
V. 4) Discussion:
VI) Miscellaneous information:
This project has been warmly welcomed by the VIP community and was awarded the “Applied research on disability” prize from the CCAH in 2017.
More details on the project’s official website