The German word for TV is Fernseher, meaning far-seer. I often think about that concept: the fixture of our living rooms that allows us to teleport our perspective to places far away. A mode of communion with others, of distraction, of learning. We are societally connected across the world like never before. We tend to live our lives situationally in our local communities, then at some point in our evenings we teleport our awareness into the lives of others for the snippet of time that came to be known as prime time. This slot of our societal calendar is reputed to hold the broadest span of collective conscious focus. It came by that moniker because marketers sought a slot during the evening hour of news or entertainment that would give their messages the broadest "share of voice" through the space-portal in this communal time of focus.
When terrestrial TV fragmented into multi-platform, multi-screen surfaces alongside the proliferation of the World Wide Web, our far-seeing capabilities drove demand for the high-bandwidth, low-latency server infrastructure that now makes person-to-person and person-to-server utilities affordable. Any single person may maintain a dozen streaming or point-to-point server connections for private messaging or content streams across their devices in a single day. It's an amazing platform we have for real-time communication over web protocols at present. It's dizzying to think back over a single lifetime of memory on how this shift took shape.
I now work with a visor that transmits my writing and art projects from a nearby device, rendering them either as a hologram in front of me or as a flat window, depending on the mode of work I'm engaged in and my preference to work spatially or linearly. Sometimes I'll reach up into the air and grab a virtual dial to adjust the settings of this environment. I'm surprised at how normal it has become to adapt to a virtual representation of a utility. Through neuroplasticity and habit, my brain has transitioned from needing a tactile mechanical button to using a digitized utility represented graphically in a simulation of a physical interface. Devices far away from me I can call out to with voice commands, and they respond. Remembering my family sitting in front of a small Sony Trinitron TV when I was a teenager, I find it hard to believe that the screen is now perceptually as large as half a room. It's easy to see how commonplace these media experiences are becoming, and fascinating to see how they are enhancing and shaping our lives.
As a teenager I remember seeing the old Star Trek movie The Wrath of Khan. In it, Admiral Kirk is pitted in an awful battle against a charismatic cult leader whom Kirk had unwittingly marooned on a barren planet. Now Khan was coming after him with a vengeance. Kirk and Spock, who regularly play 3D chess on a multi-level board, are used to thinking on multiple planes at once. They reason that Khan, having lived his whole life on the surface of a planet, would be unable to think of using extra dimensions to navigate, and so they out-maneuver him in the fateful battle to preserve the lives of all the souls aboard the Enterprise.
Growing up on the tools of the 1980s and the two-dimensional planes of graphical user interfaces, I got used to trackpads and mice moving abstract representations around a connected screen. With the emergence of WebXR (a 3D utility embedded in a 2D web browser that lets us inflate a third dimension around a piece of content), I am now able to launch through the screen into an immersive simulated world, much like the Holodeck portrayals of yesteryear's Star Trek, boldly going in my spacey visor into the computer-rendered worlds of our most imaginative 3D artists.
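That "launching through the screen" moment has a concrete shape in code. Below is a minimal sketch using the standard WebXR Device API; the helper name `enterImmersive` and the particular session options are my own illustrative choices, not from any specific app.

```javascript
// Minimal sketch of the WebXR flow: a flat 2D page asks the browser to
// "inflate" into an immersive, head-tracked 3D session.
async function enterImmersive(xrSystem) {
  // Feature-detect: not every browser or device exposes an XR system.
  if (!xrSystem) return null;
  const supported = await xrSystem.isSessionSupported("immersive-vr");
  if (!supported) return null;
  // This is the moment of launching through the screen: the page hands
  // rendering over to a stereoscopic session on the visor.
  return xrSystem.requestSession("immersive-vr", {
    optionalFeatures: ["local-floor", "hand-tracking"],
  });
}
```

In a real page you would call `enterImmersive(navigator.xr)` from a click handler, since browsers require a user gesture before granting an immersive session.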
Seeing Eye Humans
Occasionally, I get a notification on my phone that someone who doesn't have the use of their eyes needs assistance. I pick it up, and they point a camera at the thing they need assistance with. Their far-seeing device shows me their situation, and they ask me questions that I can advise on verbally. (Can you help me spot the driver coming to pick me up? Can you check for something in my environment that I can't find?) Ten million people in this network of seeing-eye-humans help other people in need of a helping eye. We as a community are a bionic panopticon of help for each other, somewhat like a Borg, but with better intent. Unlike Bentham's panopticon, we aren't a community spying on others in any sense beyond their requested support. Eventually people like me won't need to do this, of course. Machine learning companies are trying to offer the same utility for seeing tasks that they currently offer for voice requests on smart speakers.
I joined this volunteer support group after meeting a blind photographer in India, through his art. My discovery of his work was as improbable as his bold work to create his art with an assistive narration tool embedded in his camera. Microsoft had acquired AltspaceVR, a hosting platform where people shared 3D environments created with computer-aided design and animated 3D programming tools. When the company announced, during the global pandemic, that it was getting rid of its AR headsets and backing out of 3D hosting, I decided to go in and see everything that artists had created on the platform before its disappearance. (Microsoft, like many technology companies at that time, had to lay off thousands of staff as its businesses were hit by a global economic contraction driven by rapidly shrinking consumer spending.) One of the "worlds" I heard about was a 3D art gallery created by people who didn't have the use of their eyes. Wow! A 3D world for sighted people, made by artists who had limited sight? I had to see it. The world was a collection of sounds built on a vast landscape I had to traverse to reach the art gallery, a gigantic palace bearing a sign that read "Blind Burners: Burning Bright Without Much Sight."
I am a photographer, so I was curious to explore this gallery of photographic renderings. On the top of the palace I found the work of Marimuthu, a photographer and poet who lives in Tamil Nadu in the south of India. I'd been to Chennai decades ago on a trip to explore the traditions of Carnatic music. Marimuthu had created a gallery arrayed with scenes he'd photographed around his home. He narrated each photograph with a description of what his camera's descriptive image analysis said was in the picture. Usually I grasp a picture without paying detailed attention to all that is happening within it. Marimuthu came at photography from a different dimension. He "saw" through the description of what was rendered to a disembodied lens and made it into poetic, evocative words. An ivy was more than a plant. The sun was more than a bunch of bleached-out pixels on a piece of paper that you'd glance at. The amazing leap of his work, from something described to him by machine to a 3D art gallery where he re-narrated the scenes of his surroundings, was as if he were bringing me into his world, as if I were the one with limited sight and he had a super-power of words about what was happening in the scene. There I was in Tamil Nadu, virtualized and reinflated, with him telling me what to notice in his environment. His gallery is gone now, so you can't visit it anymore yourself. But there may be another 3D environment where he provides tours of his home sometime in the future. Marimuthu's testimonial about the impact of BeMyEyes on his life made me want to become one of those sets of extra eyes. I joined BeMyEyes and Envision as a contributing sight-assistant. These services work a bit like Chatroulette: they let people with sight limitations use the cameras in their phones to connect with remote sighted individuals who can narrate the scene to them. Over 10 million people volunteer like I do to help callers understand their surroundings.
It only takes a minute or two to lend an eye. But it’s very impactful for those who rely on this service.
Then one day while I was in grad school I got a ping from one of the alumni. Would I talk to Meta about adding a BeMyEyes socket to the smart glasses the company had newly announced? Blind people typically wear glasses to protect their eyes anyway, and the forward-facing cameras would be a great utility for customers who used the BeMyEyes service. I thought it incredibly unlikely that two distinct apps on a phone would do pass-through of private camera information from the glasses. But if the right precautions were taken, it seemed feasible. Envision had done something similar with Google Glass before Glass was discontinued by the manufacturer. So I reached out to Rusty Perez, the musician who was requesting the feature, and tried to network him into the group at Meta that handles accessibility features for the Quest, WhatsApp, Instagram and Facebook apps. Theoretically, I suggested, we could use Instagram Video's capability to do a collaboratively assisted meeting from the new glasses. Google Meet had a similar capability prior to the Glass deprecation, where one Glass-wearing user could stream their forward-looking camera into a shared session for others on the call. We pitched the idea of him doing a presentation for an augmented reality conference, with me giving seeing-eye-human voice-over guidance about visual elements of his surroundings as he went about his day, similar to the way BeMyEyes founder Hans Jørgen Wiberg, who has partial sight, had done for TEDx. Upon discussing the project with a friend of mine, he said, "Ah yes. The White Christmas episode of Black Mirror!" I hadn't seen it yet, but this was a story about a dating coach giving in-ear guidance to a shy lad who was uncomfortable going on dates alone. Because it's Black Mirror (which focuses on the "what could possibly go wrong" narrative arc of technology), you can imagine it doesn't turn out as hoped.
The good news is that Meta announced a collaboration with BeMyEyes shortly after this conversation, bringing Rusty's requested use case to the new smart glasses. And they one-upped expectations by offering a voice prompt capability, so the wearer didn't have to fumble with buttons to make it work.
Since this exploration, I've been testing and researching other kinds of assistive scenarios using other augmented reality device form factors. Apple, Pico and Rokid have all enabled hub-and-spoke AR device utilization in assistive technology use cases for enterprise customers. This model of assisted seeing may become commonplace in future consumer hardware once the support model and privacy considerations have been as robustly tested as they have been for our common use of web cameras today. But for now, my community of seeing-eye-humans is just a tap of a phone away for the tens of thousands of people who rely on BeMyEyes every day.
Hearing Ear Assistive Devices
When I was in college, I studied with one professor who was partially
blind and another who was completely blind. It didn't pose any
particular issue for the students. But it made for a particularly
interesting learning experience to study with a focus primarily on
sound. Professor Russel said the one benefit for students was an added
requirement: we would submit all our term papers as recordings of
us reading our work. “This is going to make your writing better. Because
if there is anything wrong with your prose, you're going to hear it
before I have to.” There is an old adage that seeing is believing. Our
eyes can take in chaotic information very easily, including badly
written essays. But they often skim past what is actually there on the
page. That semester I learned that things which look good on paper, to
the mind's eye, can ring false when read aloud. I learned that writing
for the ear is a much better goal than writing for the eye.
We can bend our minds around long sentences with parenthetical
clarifications and multiple dependent clauses. But the narrative-seeking
ear will often get lost in tangles of those complex sentences.
Whether a student believes in a paper, or crafted it hastily, is easy
to determine from the sound of their voice while reading it aloud.
My German teacher similarly once cautioned me against the complex
sentence structures I typically like to explore in my prose.
“Don't speak like you think. You'll get lost in the conjugations and
separable prefixes. Speak German in short sentences to be comprehended
easily,” she cautioned.
During my time working with Mozilla, I learned about
accessibility features built into the browser that could read pages
aloud for Firefox users. From this I became interested in tools that let
default audiovisual experiences be rendered for blind or deaf communities
easily, without complex hurdles. I sometimes used these features myself to
transcribe and translate texts from non-English webpages or audio into
English, or, for study purposes, from English into the languages of my
academic study (Japanese, German and French). The web is an amazing medium
because it is so mutable. We can come at content communally, however
differently abled we are by linguistic background or perceptual
impairment. The web is a great equalizer. It's a super-power for those
with extensible tools. This, I believed, was worthy of investment and
development effort.
In the early days of the crowdfunding platform Indiegogo, I had the opportunity to participate in a real-time speech translation program: a carried device with a microphone and a phone chip that sent speech to a remote server for processing, with output coming back to an earphone. My wife already uses speech transcription tools in movie theaters. This would do audio-to-audio language translation, we hoped. The campaign collapsed due to unforeseen costs to the startup. But in the decade following, other companies made bold forays into realizing the vision in web services delivered through app interfaces. The server capability to achieve this utility is now quite affordable for the everyday consumer, but the intake-output front end remains a cost hurdle for most who want a wearable accessory. Now that considerable competition is entering the space, devices are dropping in price, making this kind of consumer technology readily accessible to the masses via lightweight apps operating on their mobile devices.
During my own company's explorations into accessibility hardware, we focused on haptics. We theorized a tool for transliteration of audio communications into a kind of haptic Morse Code of the body, which the neuroplasticity of the brain could swiftly adapt to. Would an augmented hearing capability circumventing the ear be useful for people without functioning cochleae? It turned out that fixing the cochlea bionically was a more lasting and less cumbersome approach than routing sound to a tactile interface. Apple's latest wrist watches, however, have added the concept of an array of rhythmic patterns that can be associated with specific app-based triggers, similar to a haptic ringtone, so the wearer can distinguish between contextual messages. (Calendar notifications might tap, thud or buzz differently than a text message or a phone call, for instance.)
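The transliteration idea we theorized can be sketched simply: map each character to Morse-style symbols, then expand those symbols into timed vibration pulses a wearable's motor could play. This is an illustrative sketch, not our actual prototype code; the `toHapticPattern` helper, the partial symbol table and the pulse durations are arbitrary choices of mine.

```javascript
// A few Morse-style mappings for illustration (a real tool would map the
// full alphabet, digits and punctuation).
const MORSE = { a: ".-", e: ".", o: "---", s: "...", t: "-" };
const DOT_MS = 80, DASH_MS = 240, GAP_MS = 80;

// Turn text into a sequence of vibrate/pause steps for a haptic motor.
function toHapticPattern(text) {
  const pulses = [];
  for (const ch of text.toLowerCase()) {
    const code = MORSE[ch];
    if (!code) continue; // skip characters we haven't mapped
    for (const symbol of code) {
      pulses.push({ vibrateMs: symbol === "." ? DOT_MS : DASH_MS });
      pulses.push({ pauseMs: GAP_MS });
    }
    pulses.push({ pauseMs: GAP_MS * 3 }); // longer inter-letter gap
  }
  return pulses;
}
```

The interesting question was never the encoding, which is trivial as shown, but whether the skin and brain could learn to decode it at conversational speed.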
Accessibility research is a fascinating field. I saw demonstrations of urban design where spaces had been architected specifically with consideration for how a blind citizen would interact with them. The Braille-like dashes and dots on my city's sidewalks were there to guide those with a cane around the city. As I started noticing these accessibility-focused designs, I began to see my city differently. I heard lectures about accessible design for architects. “Design for the edges and you end up making things better for everyone,” one lecturer pointed out. Slanted curbs designed for the blind ended up also aiding those with crutches or wheelchairs, and even kept phone-encumbered people, not otherwise mobility- or visually-challenged but with their focus anywhere except obstacle navigation, from accidentally tripping into the road.
As I aged, I started to augment my own seeing with lenses. Because I grew up in a sunny place, my eyes started to blur with cataracts quite early. My optometrist said she could help me for a while, but eventually I would need new inorganic lenses installed. Reading glasses were great for a few years: I could magnify brightly lit paper pages and read just fine. Then I needed text illuminated by a bright screen, which would shine right through the organic clouds in my eyes as they appeared, so I read on a tablet. (Not illuminated in the sense of Illuminated Manuscripts, though it makes an interesting historical comparison.) Instead of leaning over my computer to read bright text, I got a bigger screen. Then I got a set of AR glasses, and eventually a head-mounted display. When the world illuminates instead of reflecting, I noticed, cataracts don't cloud vision. But eventually I realized that I needed to take my optometrist's advice and get new lenses in my actual eyes. I hadn't realized how much I'd lost, because I'd lost it gradually. Having visual acuity suddenly reintroduced after a decade of gradual blurring is a profound experience. I'm so grateful that in this era this is possible. The experience of gradually blurring vision is still a humbling memory, and it gives me a new lens of sympathy for my friends who experience diminished visual acuity.
Augmented seeing devices have gone through a rocky decade. I remember in 2014 my first experience wearing the Google Glass AR device after its first launch. I could get turn-by-turn directions while I biked. I could get text translated from Spanish to English visually, in real time, as I looked at a sign in Spanish (there are a lot in my home town). We appeared to be on the brink of a new assisted-seeing revolution, and there was promise that augmented hearing was just around the corner. That promise was briefly interrupted by a pandemic and by the Google researchers' paper "Attention Is All You Need," which ignited the neural-network machine learning revolution that yielded the Generative Pre-trained Transformer. This in turn redirected tremendous venture capital resources from hardware to machine learning investments for half a decade, as new companies sought to capture funds in Generative AI utilities. Another shift of funding is about to take place, beyond pure-invention use cases of Generative AI, toward tools directly related to situational awareness. That shift is significantly gated at present by hardware-specific connectivity challenges. But it's fair to say that within three to five years these tools will become significantly more relevant to on-the-spot assistive use cases like the ones BeMyEyes is demonstrating.
I currently participate in multiple forums where developers brainstorm new device and content extensibility, socketed apps, and tools to enhance our cognition, perception and creativity using AI, apps or hardware. There is a thriving community of users with various levels of sight and hearing and varying cognitive styles using AR and AI tools to navigate the world around them and create more richness within it.
Machine learning tools are just beginning to scratch the surface of what assisted cognition can enable for our future. Beyond correcting spelling and finding the fastest travel routes, the machine learning tools in the assisted seeing and hearing hardware we wear will give us all super-powers, making us the enhanced far-seers of tomorrow.