Saturday, February 14, 2026

Could we make a Gaussian Splat capture of Sagittarius A?

A couple of years ago I was studying the technique of capturing Gaussian Splats (3D holographs) with a group based out of England. We used software from the Google spinoff Niantic Labs to stitch thousands of photos together in a technique called photogrammetry. The compelling thing about Gaussian Splat captures is that, unlike a “mesh” or point-cloud 3D capture, every point in space captured in a Gaussian Splat can carry multiple hues per voxel. (A voxel is the 3D equivalent of a pixel: a point in a volume rather than on a 2D surface.) So as you move through the hologram, every point in space around you can show different colors that change as light reflects or refracts differently through each captured point in the scene. The benefit is that the holographs look volumetrically real, like the holodeck concept from Star Trek.

Around that time I was discussing Gaussian Splats with my friend Ben, who, coincidentally, is also from the UK. He had worked in light-field capture at Lytro, maker of a kind of 3D camera that let you adjust focus within a captured light field after the photo was taken. He had also worked for two other companies building Augmented Reality scene creation for mobile devices, glasses and head-mounted displays. So he had thought about and worked with these concepts of 3D photography and scene rendering across viewing platforms far longer than I had as a hobbyist.

In a discussion about neural radiance fields (NeRF, a predecessor of Gaussian Splats) we talked about how renderings stitched from photogrammetry captures across time could remove ephemeral items from a scene. Watch this scene (Waymo & UC Berkeley "Block NeRF" study) where a NeRF was generated from multiple Waymo cars' 2D photos taken as they drove through San Francisco, then stitched into a 3D model. By fusing many different captures over time, you see only what lasts, while things that are moving or ephemeral disappear because their existence is not confirmed between different measurements. This is a boon for privacy purposes, because what people generally want not to be public in the world is their own ephemeral presence in it. Cars, people, birds and leaves, for instance, are not reinforced in a light-field capture once the captures are stitched over time instead of over points in space at a single time. Note that in the Block NeRF video actual time is subtracted; the visual perspective of time applied later is artificial. All temporal things were removed simply by not being there when another photographic pass was made.
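To make the idea concrete, here is a minimal sketch of the underlying intuition (not the Block NeRF pipeline itself, which works on radiance fields rather than raw pixels): if you have several aligned photographs of the same scene taken at different times, a per-pixel temporal median keeps what persists and discards what was only there once. The file names and frame count below are hypothetical.

```python
import numpy as np
from PIL import Image

# Hypothetical: several photos of the same scene, taken at different times,
# already aligned to the same viewpoint (e.g., via registration or a tripod).
frame_paths = ["pass_01.jpg", "pass_02.jpg", "pass_03.jpg", "pass_04.jpg", "pass_05.jpg"]
frames = np.stack([np.asarray(Image.open(p), dtype=np.float32) for p in frame_paths])

# Per-pixel median across time: pixels showing the persistent background in
# most frames converge to that background; a car or pedestrian present in only
# one or two frames is voted out.
background = np.median(frames, axis=0).astype(np.uint8)

Image.fromarray(background).save("persistent_scene.jpg")
```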


Studying further on this, you’ll find dozens of resources in the radiance field community about the visual fidelity improvements that come from taking in multiple perspectives and using the consistencies between them to clarify blurry or low-light images. You can think of this like a dragonfly’s compound eye: each photoreceptor on its own may register only a coarse slice of a broad spectrum, but averaged in a mosaic with 20,000 other photoreceptors it can create a very detailed and precisely accurate capture of the light field, one that can even exceed human-level precision.
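A toy demonstration of why averaging perspectives helps, under the simplifying assumption that the views are already aligned and differ only by independent sensor noise: the noise in the average of N measurements shrinks roughly with the square root of N. Everything here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic "true" scene and a stack of noisy low-light observations of it.
true_scene = rng.uniform(0.0, 1.0, size=(240, 320))
num_views = 25
noisy_views = true_scene + rng.normal(0.0, 0.2, size=(num_views, 240, 320))

single_error = np.std(noisy_views[0] - true_scene)   # ~0.20
stacked = noisy_views.mean(axis=0)
stacked_error = np.std(stacked - true_scene)          # ~0.20 / sqrt(25) = ~0.04

print(f"noise of one view:         {single_error:.3f}")
print(f"noise of the 25-view mean: {stacked_error:.3f}")
```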

The extrapolated stitching between perspectives can be used to create “novel views”: synthesized, fictional but accurate renderings of what an interstitial viewpoint between two or more real perspectives would look like, so long as those perspectives share consistencies. A great example is a study a friend made of his drone flying through the woods and running into a tree limb. His drone couldn’t see the limb it crashed into in order to avoid it. But assembling hundreds of images into a Gaussian Splat depiction of the scene allows a novel-view perspective to see the obstacle that the drone could not see head-on.
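Generating a novel view requires, at minimum, a camera pose that sits between the real ones. Here is a small sketch of interpolating such an in-between pose from two known camera poses, using spherical linear interpolation for the orientation; rendering the scene from that pose is the job of the splat or NeRF renderer and is not shown. The example poses are made up.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Two hypothetical camera poses captured along the flight path:
# position in meters plus orientation as a quaternion (x, y, z, w).
pos_a, pos_b = np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.5, 1.0])
rot_ab = Rotation.from_quat([[0, 0, 0, 1],
                             [0, 0.259, 0, 0.966]])   # ~30 degree yaw

slerp = Slerp([0.0, 1.0], rot_ab)

def interpolate_pose(t):
    """Pose a fraction t of the way from camera A to camera B."""
    position = (1.0 - t) * pos_a + t * pos_b   # linear for translation
    orientation = slerp([t])[0]                # slerp for rotation
    return position, orientation.as_matrix()

pos, rot = interpolate_pose(0.5)
print(pos)   # halfway point, e.g. [1.0, 0.25, 0.5]
print(rot)   # 3x3 rotation matrix for the interpolated viewing direction
```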

I have done some amazing Gaussian Splat captures at home where I’ve focused the capture inside the home, on a porch or a scene with a window and then rotated the output ply-file to see what the inside of the house looks like from a novel view 20 feet outside the house looking back. It’s hard to explain how striking this experience is if you haven’t seen it yourself. But what happens here is my moving photoreceptor camera creates a depiction of the entire volume of the house and outside the house as can be seen from within. I could then move outside the house, go down to the scale of a blade of grass captured in that scene and walk behind the blade of grass which the camera identified as a solid object in the distance. I could then see what looking back at the house might appear like to an insect emerging from behind that blade of grass. It’s astounding to see how powerful this technology is.
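If you want to try this kind of vantage-swapping yourself, here is a rough sketch of loading an exported .ply and looking at it from a rotated viewpoint using the Open3D library. One caveat, and an assumption on my part: a generic point-cloud reader only picks up the splat centers (and perhaps base colors), not the full view-dependent Gaussian rendering, so this shows the geometry rather than the true splat appearance. The file name is hypothetical.

```python
import numpy as np
import open3d as o3d

# Hypothetical export from a Gaussian Splat capture of the house scene.
pcd = o3d.io.read_point_cloud("house_capture.ply")

# Spin the whole capture 180 degrees around the vertical axis so the viewer
# starts out "outside" looking back toward where the camera stood.
R = o3d.geometry.get_rotation_matrix_from_xyz((0.0, np.pi, 0.0))
pcd.rotate(R, center=pcd.get_center())

# Open an interactive viewer; from here you can zoom down to blade-of-grass
# scale and orbit behind individual points.
o3d.visualization.draw_geometries([pcd])
```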

My conversation with Ben one day went down a fascinating rabbit hole as we talked about the concept of time exposures across cameras. Typically, a time exposure is one camera holding its shutter open for a longer duration in a low-light context, allowing more photons through the iris-like shutter to achieve the desired exposure on the film. Time exposures don’t just get brighter, they get more accurate: by taking in more of the photons from the reference scene, they get sharper and truer to the scene as observed with the eyes. In our conversational framing we imagined a hypothetical lens made up of any number of people with any number of cameras capturing at any number of times across a period. If you abstract away the lens, the time, and the positions from which the photographers or observers captured the scene, you’d get an ultra-high-resolution image, like a UHD photo, but taken as a 3D volume. (You could even throw other spectra like X-ray or infrared into the mix!) Naturally, if you sprinkle time-lapsing into it, across derived novel perspectives you can re-animate the scene not as it was, but as it would have appeared from perspectives that were never actually captured, only inferred. As Ben pointed out, you could even color-correct and optimize known objects in the scene. That image of the moon not showing up well? Over-dub your blurry moon with a high-resolution NASA image of the moon, color- and size-adjusted to fit in the appropriate place in your captured photo. You could re-apply sunny-day hues to identifiable items in the scene based on the statistical average Pantone scale across a month or a specific season and position of the sun in the sky.

Riffing on this, we discussed the concept of a Gaussian Splat of a house photographed not in visible light, but in invisible radio spectra. Phones, Wi-Fi and Bluetooth radios are in nearly all homes and buildings now, bathing our environment in short-distance, high-precision wavelengths. So hypothetically, you could conduct the same Gaussian Splat capture I did with photons and replicate it using only the invisible spectra, getting a view of the house in terms of how each device’s radio waves bounce off or pass through the various walls depending on their physical density and their reflective or absorptive properties.

All these things are on the edges of our emerging technologies now. Many don’t have tremendous utility at present. But eventually they may. We don’t have a use for Mantis Shrimp level spectral Gaussian Splats or radio-wave radiance fields of homes, unless for engineering purposes. But it’s fun to think about how these new advances can apply to new sectors. 

Artist rendition of Pulsar illuminating a gas cloud

I was reading about the discovery of BLPSR, a pulsar believed to be near the center of our galaxy. As Carl Sagan explained at length in his series Cosmos in the 1980s, pulsars are rapidly spinning neutron stars, beacons we see all over the cosmos left behind by collapsing stars in the final throes of their existence. They are not particularly exciting in general. All they do is blink rapidly in the sky. But what is exciting about them is that they are very regular in their pulsing, which is set by the period of their spin. So their signals can be used as precise clocks to detect aberrations in spacetime caused by gravitational-wave ripples coming out of massive explosions between us and them, or passing through us locally, which we otherwise couldn't detect because our bodies ripple along with the spacetime warps of the environment containing us. The article in Scientific American pointed out that we could create a galaxy-scale Laser Interferometer Gravitational-Wave Observatory that would be far better than the ones we built on Earth, because a Milky Way LIGO would have the fidelity of an instrument peering across a distance of 26,000 light years. (LIGO on Earth is only 4 kilometers in size.)

The change in the interval of a pulsar's flashes over time would help us measure the bending of time and warping of space between us and the light source. We could use it as a telescope for super-massive explosions in the pulsar's proximity and beyond, like a microphone at the center of the galaxy! We could perhaps even detect aftershocks of the Big Bang 13.8 billion years ago, thought to be echoing back from the edges of time's beginning. (We can see light from the edge of the fireball of our origin already. We may soon be able to hear it as well.) When LIGO first captured the gravitational-wave ripple of two neutron stars colliding, they rendered the wave frequency into the audible range in this demonstration. A gravitational-wave microphone at our galactic center might capture some fantastic cacophony from our immediate neighborhood. And an interferometer the size of half the galaxy would pick up much longer wavelength variations, with higher fidelity, than any interferometer that can be built on Earth.
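The basic measurement behind this idea is simple to express: a pulsar timing array compares when each pulse actually arrives against when a steady spin says it should arrive, and the leftover "residuals" carry the imprint of anything that stretched or squeezed the intervening spacetime. A toy sketch of that bookkeeping, with made-up numbers standing in for real timing data:

```python
import numpy as np

# Hypothetical pulsar model: spin period in seconds and a reference epoch.
period = 0.089                      # an ~11 Hz pulsar, made up for illustration
t0 = 0.0

# Hypothetical observed pulse times of arrival (seconds), nudged slightly away
# from perfect regularity as a stand-in for a passing gravitational wave.
rng = np.random.default_rng(1)
n = np.arange(0, 5000)
true_toas = t0 + n * period
observed_toas = (true_toas
                 + 2e-7 * np.sin(2 * np.pi * true_toas / 120.0)   # injected wobble
                 + rng.normal(0, 5e-8, n.size))                   # measurement noise

# Timing residuals: how early or late each pulse arrived relative to the model.
phase = (observed_toas - t0) / period
residuals = (phase - np.round(phase)) * period

print(f"rms residual: {np.std(residuals) * 1e9:.1f} nanoseconds")
# A coherent, slowly varying pattern in these residuals (rather than pure
# noise) is what a pulsar-timing "microphone" would be listening for.
```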

Implementing a BLPSR gravitational observatory could be a fascinating development over the coming decades. But I had another funny thought inspired by my discussions with Ben. Just as a Wi-Fi or Bluetooth signal can act as a proxy light source that traverses walls, if we have an opportunity to monitor BLPSR over a number of years, and it happens to transit behind Sagittarius A, our galaxy’s black hole, then we can make a capture of Sagittarius A with much finer volume-based precision, isolating its Schwarzschild radius (half the diameter of its event horizon) very precisely by comparing the gravitational lensing that would happen when BLPSR is directly behind our black hole. We could also estimate the full width of Sagittarius A's accretion disc by noting exactly when BLPSR’s redshifted appearance disappears and when her blue-shifted pulses emerge. Just as the Cepheid variables Henrietta Swan Leavitt characterized became the “standard candles” of brightness later used to determine the rate of our universe’s expansion, the so-called Hubble constant, we may use a single source pulsar like BLPSR and its perturbations to scan through and around the Milky Way's environs and generate a multi-point-reference volumetric view that was impossible before.
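For a sense of the scales involved, here is a quick back-of-the-envelope calculation of the Schwarzschild radius in question, using the commonly cited mass of roughly four million Suns for Sagittarius A and the article's 26,000 light-year distance; the numbers are rounded and meant only as orders of magnitude.

```python
# Back-of-the-envelope: Schwarzschild radius of Sagittarius A and its
# apparent angular size from Earth. r_s = 2 G M / c^2.
G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8              # speed of light, m/s
M_sun = 1.989e30         # kg
M_sgrA = 4.3e6 * M_sun   # ~4.3 million solar masses (commonly cited estimate)

r_s = 2 * G * M_sgrA / c**2
print(f"Schwarzschild radius: {r_s:.2e} m  (~{r_s / 1.496e11:.2f} AU)")
# -> roughly 1.3e10 m, about 0.08-0.09 astronomical units

distance = 26_000 * 9.461e15          # 26,000 light years in meters
angular_radius_rad = r_s / distance
microarcsec = angular_radius_rad * 206_265 * 1e6
print(f"apparent angular radius: ~{microarcsec:.0f} microarcseconds")
# -> on the order of 10 microarcseconds, which is why resolving it takes
#    planet-scale interferometry (or clever tricks like a transiting pulsar).
```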

Beyond the LIGO-like applications, we may be able to use BLPSR, or other domestic pulsars in our galaxy if any others are found, as a camera flash for making time exposures of their light bending around their invisible companions. We could do so by merging, across time, all the frames of BLPSR's transit around an orbit and using her light blinks and time-warp-corrected intervals from the flash-rate changes to reveal Sagittarius A in a whole new light, one that at present can only be inferred from the motion of other objects. A Gaussian scan of Sagittarius A may even give us a better sense of visible versus invisible matter in proximity to a black hole.

GAL-CLUS-022058s is nicknamed the "Molten Ring."
We know what gravitational lensing around a spacetime warp looks like outside our galaxy from Hubble Space Telescope and JWST images. But those astronomical objects are profoundly distant from us, as are the light sources behind them that bend around the warp and illuminate the spacetime curvature. So to have a single-color flash from a known pulsar acting as a standard candle throughout its long-term journey could illuminate an orbit of our galactic center and give us a beautifully precise view of the hottest, densest part of our galaxy. Whether it looks like Kip Thorne's theoretical depiction of an up-close view of an accretion disk (from the movie Interstellar, below) or not remains to be seen. But if we are able to make a Gaussian Splat volumetric capture of our galaxy in the coming years, that would be a fantastic way for the next generation to explore the mysteries of dark matter and dark stars in our proximity.

Kip Thorne's theoretical light refraction model of a black hole





Saturday, March 15, 2025

Far-seeing Devices for Accessibility

The German word for TV is Fernseher, meaning far-seer. I often think about that concept of the fixture of our living rooms which allows us to teleport to perspectives of other places far away. A mode of communion with others, distraction, learning. We are societally connected across the world like never before. We tend to live our lives situationally in our local communities, then at some point in our evenings we teleport our awareness into the lives of others for the snippet of time that came to be known as prime time. This slot of our societal calendars is reputed to hold the broadest span of collective conscious focus. It earned that moniker from marketers who sought a slot during the evening hour of news or entertainment that would give their messages the broadest reach, the largest "share of voice" through the space-portal in this communal time of focus.

When terrestrial TV fragmented into multi-platform and multi-screen surface areas along with the proliferation of the world wide web, our far-seeing capabilities stretched the demand for high-bandwidth, low-latency server infrastructures that now allow affordable person-to-person or person-to-server utilities. Any single person may have a dozen streaming or point-to-point server connections maintaining private messaging or content streams to their devices in a single day. It's an amazing platform we have for real-time communication over web protocols at present. It's baffling to think back on how this shift took shape, one step at a time, within a single lifetime of memory.

I now work with a visor that transmits my writing and art projects from a nearby device to render either as a hologram in front of me, or as a flat window, depending on the mode of work I'm engaged with and my preference to work spatially or linearly. Sometimes I'll reach up in the air and grab onto a virtual dial to adjust the settings of this environment. I'm surprised at how normal it has become to adapt to a virtual representation of a utility. Through neuroplasticity and habit, my brain has transitioned from needing a tactile mechanical button, to using a digitalized utility represented graphically in a simulation of a physical interface. For devices far away from me I can call out to them with voice commands and they respond. Remembering my family sitting in front of a small Sony Trinitron TV when I was a teenager, it's hard to believe that now the screen is perceptually as large as half a room. It's easy to see how commonplace these media experiences are becoming. It's fascinating to see how they are enhancing and shaping our lives. 

As a teenager I remember seeing the old Star Trek movie The Wrath of Khan. In it, Admiral Kirk is pitted in an awful battle against a charismatic cult leader whom Kirk had unwittingly marooned on a barren planet. Now Khan was coming after him with a vengeance. Kirk and Spock, who regularly play 3D chess on a multi-dimensional chess board, are used to thinking on multiple levels at once. They reason that Khan, having lived his whole life on the surface of a planet, would be unable to think of using extra dimensions to navigate, and so they out-maneuver him in the fateful battle to preserve the lives of all the souls aboard the Enterprise.

Growing up on the tools of the 1980s and learning the graphical user interfaces of 2-dimensional planes, I got used to track-pads and mice moving abstract representations around on a connected screen. With the emergence of WebXR (a 3D utility embedded in a 2D web browser that allows us to inflate a 3rd dimension around a piece of content), I am now able to launch through the screen into an immersive simulated world, much like the Holodeck portrayals of yesteryear's Star Trek, and boldly go in my spacey visor into the computer-rendered worlds of our most imaginative 3D artists.

Seeing Eye Humans

Occasionally, I get a notification on my phone that someone who doesn't have the use of their eyes needs assistance. I pick it up and they point a camera at the thing they need assistance with. Their far-seeing device shows me their situation and they ask me questions that I can answer verbally. (Can you help me spot the driver coming to pick me up? Can you find something in my environment that I can't?) Ten million people in this network of seeing-eye-humans help other people in need of a helping eye. We as a community are a bionic panopticon of help for each other, somewhat like a Borg, but with better intent. Unlike Bentham's panopticon, we aren't a community spying on others in any sense other than their requested support. Eventually people like me won't need to do this, of course. Machine learning companies are trying to offer the same utility for seeing tasks that they currently offer for voice requests on smart speakers.

I joined this volunteer support group after meeting a blind photographer in India, through his art. My discovery of his work was as improbable as his bold effort to create that art with an assistive narration tool embedded in his camera. Microsoft had acquired a social VR platform called AltspaceVR in the last decade, where people shared 3D environments created with computer-aided design and animated 3D programming tools. When they announced they were getting rid of their AR headsets and backing out of 3D hosting during the global pandemic, I decided to go in and see everything that artists had created in the platform before its disappearance. (Microsoft, like many technology companies at that time, had to lay off thousands of staff as their businesses were impacted by the contraction of the global economy caused by rapidly contracting consumer spending.) One of the "worlds" I heard about was a 3D art gallery created by people who didn't have the use of their eyes. Wow! A 3D world created for sighted people, made by artists who had limited sight? I had to see it. The world was a collection of sounds built on a vast landscape I had to traverse to reach the art gallery, a gigantic palace with a sign reading "Blind Burners: Burning Bright Without Much Sight."

I am a photographer, so I was curious to explore this gallery of photographic renderings. On the top of the palace I found the work of Marimuthu, a photographer and poet who lives in Tamil Nadu in the south of India. I'd been to Chennai decades ago on a trip to explore the traditions of Carnatic music. Marimuthu had created a gallery arrayed with scenes he'd photographed from around his home. He narrated each photograph with a description of what his camera's descriptive image analysis said was in the picture. Usually I grasp a picture without paying detailed attention to all that is happening within the picture. Marimuthu came at photography from a different dimension. He "saw" through the description of what was rendered to a disembodied lens and made it into poetic evocative words. An ivy was more than a plant. The sun was more than a bunch of bleached out pixels on a piece of paper that you'd glance at.

The amazing leap of his work, from something described to him by a machine to a 3D art gallery he hosted to re-narrate the scenes of his surroundings, felt like he was bringing me into his world as if I were the one with limited sight and he had a super-power of words about what was happening in the scene. There I was in Tamil Nadu, virtualized and reinflated, with him telling me what to notice in his environment. His gallery is gone now, so you can't visit it anymore yourself. But there may be another 3D environment where he provides tours of his home sometime in the future. Marimuthu's testimonial about the impact of BeMyEyes on his life made me want to become one of those sets of extra eyes. I joined BeMyEyes and Envision as a contributing sight-assistant. These services work a bit like Chatroulette: they let people with sight limitations use the cameras in their phones to connect with remote sighted individuals who can narrate the scene to them. There are over 10 million people who volunteer as I do to help callers understand their surroundings. It only takes a minute or two to lend an eye. But it’s very impactful for those who rely on this service.

Then one day while I was in grad school I got a ping from one of the alumni. Would I talk to Meta about adding a BeMyEyes socket to the newly announced smart glasses the company was launching? Blind people typically wear glasses to protect their eyes anyway, and the forward-facing cameras would be a great utility for customers who used the BeMyEyes service. I thought it was incredibly unlikely that two distinct apps on a phone would pass through private camera information from the glasses. But if the right precautions were taken, it seemed feasible. Envision had done something similar with Google Glass before Glass was discontinued by the manufacturer. So I reached out to Rusty Perez, the musician who was requesting the feature, and tried to network him into the group at Meta that deals with accessibility features for the Meta, Quest, WhatsApp, Instagram and Facebook apps. Theoretically, I suggested, we could use Instagram Video's capability to do a collaborative assisted meeting from the new glasses. Google Meet had a similar capability prior to the Glass deprecation, where one Glass-wearing user could stream the forward-looking camera into a shared session for others on the call. We pitched the idea of him doing a presentation for an augmented reality conference, with me giving seeing-eye-human voice-over guidance about the visual elements of his surroundings as he went about his day, similar to the way BeMyEyes founder Hans Jørgen Wiberg, who has partial sight, had done for TEDx. Upon discussing the project with a friend of mine he said, "Ah yes. The White Christmas episode of Black Mirror!" I hadn't seen it yet, but this was a story about a dating coach giving in-ear guidance to a shy lad who was uncomfortable going on dates alone. Because it's Black Mirror (which focuses on the "what could possibly go wrong" narrative arc of technology) you can imagine it doesn't turn out as hoped. The good news is that Meta announced a collaboration with BeMyEyes shortly after this conversation, addressing Rusty's requested use case on the new smart glasses. And they one-upped expectations by offering a voice-prompt capability so that the wearer didn't have to fumble with buttons to make it work.

Since this exploration, I've been testing and researching other kinds of assistive scenarios using other augmented reality device form factors. Apple, Pico and Rokid have all enabled hub-and-spoke AR device utilization in assistive technology use cases for enterprise customers. This model of assisted seeing may become commonplace in future consumer hardware once the support model and privacy considerations have been robustly tested, as they have been for our common use of web cameras today. But for now, my community of seeing-eye-humans is just a tap of a phone away from the tens of thousands of people who rely on BeMyEyes every day.

Hearing Ear Assistive Devices

When I was in college, I studied with one professor who was partially blind and another who was completely blind. It didn’t cause any particular issue for the students. But it was a particularly interesting learning experience to study with a focus primarily on sound. Professor Russel said the one change for students was an added requirement that we submit all our term papers as recordings of us reading our work. “This is going to make your writing better. Because if there is anything wrong with your prose, you’re going to hear it before I have to.” There is an old adage that seeing is believing. Our eyes can take in chaotic information very easily, including badly written essays. But they often glide right past what is actually there on the page. That semester I learned that things which look good on paper, to the mind's eye, can ring false when read aloud. I learned that writing for the ear is a much better goal than writing for the eye. We can bend our minds around long sentences with parenthetical clarifications and multiple dependent clauses. But the narrative-seeking ear will often get lost in the tangles of those complex sentences. You can also hear it in a student's voice when they are reading a paper they don't believe, or one crafted hastily. My German teacher similarly once cautioned me against the complex sentence structure I typically like to explore in my prose. “Don’t speak like you think. You’ll get lost in the conjugations and separable prefixes. Speak German in short sentences to be comprehended easily,” she cautioned.

During my time working with Mozilla, I learned about accessibility features built into the browser that could read pages aloud for Firefox users. From this I became interested in tools that allowed default-audiovisual experiences to be rendered for blind or deaf communities easily, without complex hurdles. I sometimes used these features myself to transcribe and translate texts from non-English webpages or audio into English, or, for study purposes, from English into the languages of my academic study (Japanese, German and French). The web is an amazing medium because it is so mutable. We can come at content communally while being differently abled, whether by linguistic background or by anatomical impairments to perception. The web is a great equalizer. It’s a super-power for those with extensible tools. This, I believed, is worthy of investment and development effort.

In the early days of the crowdfunding platform Indiegogo, I had the opportunity to back a real-time speech translation program: a carried device with a microphone and a phone chip that would send speech to a remote server for processing, with the output coming back to an earphone. My wife already uses speech transcription tools in movie theaters. This would do audio-to-audio language translation, we hoped. The campaign collapsed due to unforeseen costs to the startup. But in the decade following, other companies made bold forays into realizing that vision as web services delivered through app interfaces. The server capability to achieve this utility is now quite affordable for the everyday consumer. But the intake-output front end remains a cost hurdle for most if they're looking for a wearable accessory. Now that considerable competition is entering the space, devices are dropping in price, making this kind of consumer technology readily accessible to the masses via lightweight apps operating on their mobile devices.

During my own company’s explorations into accessibility hardware, we focused on haptics. We theorized a tool for transliterating audio communications into a kind of haptic Morse code on the body, which the neuroplasticity of the brain could swiftly adapt to. Would an augmented hearing capability that circumvents the ear be useful for people without a functioning cochlea? It turned out that fixing the cochlea bionically was a more lasting and less cumbersome approach than routing sound to a tactile interface. Apple's latest advancements in wrist watches, however, have added an array of rhythmic patterns that can be associated with specific app-based triggers, similar to a haptic ringtone, so the wearer can distinguish between contextual messages. (Calendar notifications might tap, thud or buzz differently than a text message or a phone call, for instance.)

Accessibility research is a fascinating field. I saw demonstrations on urban design where spaces had been architected specifically with consideration for how a blind citizen would interact with them. The Braille-like dashes and dots on my city’s sidewalks were there to guide those with a cane around the city. As I started noticing these accessibility-focused designs, I began to see my city differently. I heard lectures about accessible design for architects. “Design for the edges and you end up making things better for everyone,” one lecturer pointed out. Slanted curbs designed for the blind ended up also aiding those with crutches or wheelchairs, and even kept phone-encumbered people from accidentally tripping into the road or onto the sidewalk, not because they were mobility or visually challenged, but because their focus was somewhere other than obstacle navigation.

As I aged, I started to augment my seeing with lenses myself. Because I grew up in a sunny place, my eyes started to blur with cataracts quite early. My optometrist said she could help me for a while. But eventually, I would need new inorganic lenses installed. Reading glasses were great for a few years. I could magnify brightly lit paper pages and read just fine. Then, I wanted to illuminate text through a bright screen which would shine right through the organic eye clouds as they appeared. I would need to read on a tablet that illuminated text. (This was not in the sense of Illuminated Manuscripts, but an interesting historical comparison.) Instead of leaning over my computer to read bright text, I got a bigger screen. Then I got a set of AR glasses and eventually a head-mounted-display. When the world illuminates instead of being reflected, cataracts don’t cloud vision, I noticed. But eventually I realized that I needed to take my optometrist’s advice and get new lenses in my actual eyes. I had no realization of how much I’d lost because I’d lost it gradually. Having visual acuity suddenly reintroduced after gradual blurring over a decade is a profound experience. I’m so grateful that in this era this is possible. The experience of gradually blurring vision is still a humbling memory. I have a new lens of sympathy for my friends who experience diminished visual acuity.

Augmented seeing devices have gone through a rocky decade. I remember in 2014 my first experience of wearing the Google Glass AR device after its first launch. I could get turn-by-turn directions while I biked. I could get text translated from Spanish to English visually, in real time, as I looked at a sign in Spanish (there are a lot) in my home town. We appeared to be on the brink of a new assisted-seeing revolution. And there was promise that augmented hearing was just around the corner. This future promise was briefly interrupted by a pandemic and by the Google research paper Attention Is All You Need, which ignited the neural network machine learning revolution that yielded the Generative Pre-trained Transformer. This in turn redirected tremendous venture capital resources from hardware to machine learning investments for half a decade as new companies sought to capture funds in Generative AI utilities. Another shift of funding is about to take place, beyond plain inventiveness, toward use cases of Generative AI tools directly related to situational awareness. That shift is significantly gated at present by hardware-specific connectivity challenges. But it's enough to say that within 3-5 years these tools will become significantly more relevant to assistive, on-the-spot use cases like the ones BeMyEyes is demonstrating.

I currently participate in multiple forums where developers brainstorm new extensibility of device and content features for socketed apps and tools to enhance our cognition, perception and creation using AI, apps or hardware tools. There is a thriving community of users with various levels of sight, hearing and varying cognitive styles using AR and AI tools to navigate and create more richness in the world around them. Machine learning tools are just beginning to scratch the surface of what assisted cognition can enable for our future. Beyond correcting spelling and learning the fastest travel routes, machine learning tools in the assisted seeing and hearing hardware we wear will give us all super powers to be the enhanced far-seers of tomorrow.

Sunday, January 5, 2025

Resurrecting the third dimension from the second

I remember my first time witnessing a hologram. In my case it was the Haunted Mansion ride at Disney World when I was six. As the slow-moving ride progressed, we peered down on a vast ballroom beneath us with dozens of ghostly figures appearing to dance in front of the physical furniture that adorned the room.

Haunted Mansion Walt Disney World
Walt Disney World - Ghosts in the Haunted Ballroom

I asked my brother to explain how the illusion worked. He had a book on optical illusions that we would pore over, fascinated and bewildered. How does that still image appear to be moving, I wondered? I could see the two distinct parts of the illusion if I covered either of my eyes. But it appeared to come alive and move with depth when seen with both eyes open, as my brain assembled the parts into a synthesized whole. In elementary school, I went to a science museum to learn more about holographic film. Holographs are easy to create in static film. It’s the perspective of having two eyes, each with its own extrapolated sense of the spatial location of what it sees, that creates the illusion of depth between two contrasting images, or two angles of a single image.

In Disney’s example, a large wall of holographic film stays still while the audience moves past it. Each person's shifting perspective as the ride moves forward reveals a different view through the film. Though all the patrons are seated in a shared audience position, the image each one sees is a different stage of motion of the ghostly characters as they pass by the static holographic film. The motion of the ride vehicle at each moment in time is what animates the scene. Disney had created an industry of moving film in front of people who sit still. In this illusion, the Disney team inverted the two: they animated the motion of the audience while the projection remained still. This form of holography is very expensive, but it makes sense at scale, like an amusement park ride. The other way is to give people holographic lenses to wear as glasses in a theater. The illusion can also be created at gradients between those two extremes, with the film near the user or progressively further away. Bifurcating the views between each eye in a way that the brain decides to merge them as a spatial volume is the real trick. Recently, intermediate-distance screens for rendering stereoscopic views have been coming to the consumer market. With these, you don't have to wear glasses because a film on the surface of the display creates the split-image perspective. The 3D displays of Sony and Leia achieve their optical illusion on the surface of a computer screen positioned within a yard of the viewer, extrapolating where the viewer's head is positioned and then flashing the two different images intermittently in different directions.

I am fascinated by content marketplaces and developer/publisher ecosystems for the internet. So my colleagues and I spend a lot of time talking about how the content creator and software developer side of the market will address content availability and discoverability now that stereoscopic head mounted displays (VR and AR headsets or 3D flat screens) are becoming more mainstream. The opportunity to deliver 3D movies to an audience primed to enjoy them is now significant enough to merit developer effort to distribute software in this space. Home 3D movie purchase and viewing is currently constrained by the cost of requiring, all at once, the 3D player + 3D screen + 3D movie. A simplified content delivery process over the web, or a combination of those three elements, will produce lower cost, broader availability and a greater scale of addressable audience.

Hollywood has been making stereoscopic content for cinematic distribution for decades and will continue to do so. Those budgets are very large and some of those budgets are trickling down to architectures that will make the overall cost drop for other content creators over time. As an example, Pixar’s 3D scene format "Universal Scene Description" is now open source and can be leveraged for holographic content creators in mobile and desktop computers today. USD support is now embedded in iPhone’s ARKit such that 3D holographic streaming on everybody’s computers is just a hand’s reach away if the content distribution network were in place.
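For a sense of how approachable USD has become, here is a tiny sketch using Pixar's open-source Python bindings (the pxr module from the usd-core package) to author a minimal scene file; the prim names and file name are arbitrary examples.

```python
from pxr import Usd, UsdGeom

# Author a minimal USD scene: a single sphere under a root transform.
stage = Usd.Stage.CreateNew("hello_scene.usda")

root = UsdGeom.Xform.Define(stage, "/World")
ball = UsdGeom.Sphere.Define(stage, "/World/Ball")
ball.GetRadiusAttr().Set(2.0)
ball.AddTranslateOp().Set((0.0, 2.0, 0.0))   # lift the sphere off the origin

stage.SetDefaultPrim(root.GetPrim())
stage.GetRootLayer().Save()
# The resulting hello_scene.usda is a plain-text file that USD-aware viewers
# and ARKit-era tools can open or convert (e.g., to .usdz).
```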

What can we do to bring 3D depth back to formerly released media? I remember my father taking me to see a 3D rendering of Creature from the Black Lagoon from the 1950s. Each frame of the movie was slightly offset, with a blue halo to one side and a red halo to the other side. When we sat in the theater wearing “anaglyph 3D” viewing glasses with red and blue lenses, each lens filtered out one of the offset images, and our minds assembled the red- and blue-shaded outlines of the actors and landscapes on the flat movie screen into a sensation of depth. Filming that movie required the actors to be stereoscopically filmed with two cameras so that it could be rendered in theaters that provided audiences with either anaglyph 3D or polarized-lens 3D glasses.
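The anaglyph trick is easy to reproduce today if you have a left/right stereo pair: put the left view's red channel and the right view's green and blue channels into a single image, and red/cyan glasses will route each view to the intended eye. A minimal sketch (the file names are placeholders, and the pair is assumed to be already rectified and the same size):

```python
import numpy as np
from PIL import Image

# Hypothetical rectified stereo pair, same resolution, RGB.
left = np.asarray(Image.open("left_eye.jpg").convert("RGB"))
right = np.asarray(Image.open("right_eye.jpg").convert("RGB"))

# Classic red/cyan anaglyph: red channel from the left view,
# green and blue channels from the right view.
anaglyph = np.dstack([left[:, :, 0], right[:, :, 1], right[:, :, 2]])

Image.fromarray(anaglyph).save("anaglyph.jpg")
# Viewed through glasses with a red left lens and cyan right lens, each eye
# receives mostly its own view, and the brain fuses the offset into depth.
```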

However, today, creating scenes that appear slightly different from each other, offset by the roughly two and a half inches of interpupillary distance between our eyes, is a tactic achievable with radiance field photography. (Read more about radiance fields and the formation of novel views here.) Leveraging neural radiance fields allows a scene that is otherwise a still photograph to be wiggled side to side, generating a slight sense of depth perception. An alternative approach is to generate the interpupillary offset artificially, isolating entities in the interpreted image using a process in the graphical processing unit (GPU) of the local device. AI rendering capabilities could enable us to synthesize depth into older movies that lacked stereoscopic cinematography.
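One simple way to fake that offset, assuming you already have a per-pixel depth estimate for the frame (for example from a monocular depth model, as sketched further below), is to shift near pixels sideways more than far pixels and hand each eye its own shifted copy. This is a crude forward warp that leaves small holes at disoccluded edges; real pipelines inpaint those. All names here are illustrative.

```python
import numpy as np

def synthesize_eye_view(img, depth, eye=+1, max_disparity_px=24):
    """Crudely warp a single RGB frame into a left (-1) or right (+1) eye view.

    img:   (H, W, 3) uint8 frame
    depth: (H, W) float map, larger = farther away
    """
    h, w = depth.shape
    # Nearer pixels get larger disparity (more sideways shift).
    near = depth.max() - depth
    disparity = (max_disparity_px * near / (near.max() + 1e-6)).astype(int)

    out = np.zeros_like(img)
    cols = np.arange(w)
    for y in range(h):
        target = np.clip(cols + eye * (disparity[y] // 2), 0, w - 1)
        out[y, target] = img[y, cols]   # forward-splat pixels to shifted columns
    return out

# Usage sketch: left = synthesize_eye_view(frame, depth, eye=-1)
#               right = synthesize_eye_view(frame, depth, eye=+1)
# Feed the pair to the anaglyph snippet above or to a stereoscopic display.
```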

Depth simulation in the gaming sector captivated my colleagues a few years back when we discussed the process of using a game engine to re-render a legacy video game on a PC and synthesize an artificial depth perspective. (See Luke Ross R.E.A.L VR Mods to understand how this effect is achieved.) PC modification of the HDMI signal output is a relatively simple trick of flickering different perspectives very fast between each eye in a VR headset. The spatial layout of a game environment is already coded by the game developer; at play time, the positions of the player and nearby objects are interpreted as the player's motion coordinates come in, then sent to the player's display as a flat image. The game works just fine if two cameras are inserted into the rendering instead of the single one in the original design. This allows a whole trove of legacy 2D games to be experienced anew, in a different way than players had first enjoyed them. Currently there is a very small market subset of PC gamers who are opting for this intermediary-layer re-rendering of flat-screen games to enjoy them in depth in VR. The "Flat2VR" community has a Discord channel dedicated to this concept of modding old games, with some developers offering gesture mapping to replace former game controller buttons. The magic of depth perception in modded PCVR games happens on the fly on the player's computer for each game, by having the game engine render two scenes to a VR headset's dual ocular screens. It requires an interpreting intermediary layer that the players themselves have to install.
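At its core, the two-camera trick is just building two view matrices from one tracked head pose, offset along the head's own left/right axis by half the interpupillary distance each. A small sketch of that math with made-up numbers, independent of any particular engine or mod:

```python
import numpy as np

def eye_view_matrices(cam_to_world, ipd_m=0.064):
    """Derive left/right eye view matrices from one camera-to-world pose.

    cam_to_world: 4x4 matrix; columns of the 3x3 block are right/up/forward.
    ipd_m: interpupillary distance in meters (~64 mm on average).
    """
    R = cam_to_world[:3, :3]
    center = cam_to_world[:3, 3]
    right_axis = R[:, 0]

    views = []
    for sign in (-1.0, +1.0):                  # left eye, then right eye
        eye_pos = center + sign * (ipd_m / 2.0) * right_axis
        view = np.eye(4)
        view[:3, :3] = R.T                     # inverse rotation
        view[:3, 3] = -R.T @ eye_pos           # inverse translation
        views.append(view)
    return views

# Usage sketch: feed each matrix to its own render pass, then present the two
# images to the headset's left and right screens (alternating or side by side).
left_view, right_view = eye_view_matrices(np.eye(4))
```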

In theory, the intermediary/interpretive layer that inflates the flat scene to be depth-rendered dynamically on the player's screen is not exceedingly complex. But it does require extra work from the GPU that encodes the two outputs to the player's screens. A similar feat couldn't be done easily by a mobile device or an ordinary game console. However, a set-top box could, in theory, render other types of media beyond games into depth perspective on the fly, just as the game-engine approach does for old games. The same alternating-eye method used in Flat2VR game modding could introduce a simulated sense of volume perspective into the background of a movie, even without leveraging a 3D game engine.

Apple has introduced a new capability to infer spatial depth back into the 2D photographs we have taken in the past as individual users. I enjoy viewing my photographs afresh with 3D interpreted depth perspective in my Vision Pro, even though the pictures were taken 20 years ago in 2D. These aren't holographic renderings, because the original depth of field wasn't captured in the pixels. The rendered image re-synthesizes an estimation of what the original scene looked like, with depth added based on distinctions between subject and background elements, computed on the device by a machine learning process trained across thousands of other photographs. Apple didn't need to receive a copy of the picture (in this case, of my brother standing in front of St. Mark's Cathedral in Venice) to estimate how far in front of the other people my brother was standing at the time of the photo. The position of his feet, appearing in front of the flat ground behind him, let the machine learning algorithm extrapolate his position spatially by assuming the ground is flat and that he was closer to me than the other people who appear smaller in the flat image. These photos are generally more fascinating to stare at than the originals because they lure the eyes to focus on different levels in the background of the original photo instead of just resting immediately on the initial subject.

Depth inference by machine learning could also achieve the same effect for older 2D movies from the last century, by the same means used to estimate depth in stills. This means a treasure trove of 3D movies could await at the other end of our streaming web services if they could be re-rendered for broadcast by a depth-analysis layer applied to the stream, or prior to streaming, similar to the real-time Flat2VR depth-rendering display mods.
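The per-frame depth estimation step is already accessible to hobbyists. Below is a rough sketch using the publicly released MiDaS model via torch.hub (repository and entry-point names as documented in the intel-isl/MiDaS readme at the time of writing; treat them as assumptions and check the current repo). It produces a relative depth map for one frame, which could then feed a warp like the one sketched earlier.

```python
import numpy as np
import torch
from PIL import Image

# Load the small MiDaS monocular depth model and its matching preprocessing
# transform from torch.hub (names per the intel-isl/MiDaS project readme).
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

frame = np.asarray(Image.open("movie_frame.png").convert("RGB"))  # placeholder path

with torch.no_grad():
    batch = transform(frame)            # resize + normalize for the model
    prediction = midas(batch)           # relative inverse-depth prediction
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=frame.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().numpy()

# 'depth' is relative (larger values are nearer in MiDaS output); rescale or
# invert as needed before using it with the eye-view warp sketched earlier.
```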

Whether our industry moves forward with machine learning re-rendering of back-catalog 2D movies for mass consumption on a central server (leveraging new distribution methods), or whether we take the path of doing depth-rendering in a set-top box for all content already delivered to the home dynamically, remains to be demonstrated and tested in the marketplace. In a market constrained by the availability of HMDs and stereoscopic flat screens, the latter approach makes more sense. Yet the decreasing cost to manufacture these new screens may mean studio efforts to depth-render an entire back catalog of films make sense for new dedicated 3D streaming channels across a diaspora of stereoscopic display options. This mode of development and distribution would possibly parallel what is already happening in the Flat2VR gaming community. In just two years of following the sector, I've seen several major game studios jump into re-releasing their own titles as dedicated VR games, sometimes with almost no embellishment beyond the original game's story other than depth, yet unlocking new audience access for the old titles just by doing so. However, the gaming market moves much faster than mainstream TV media, as gamers are already fully bought into the expensive GPUs and HMDs that make the art of 3D rendering a minor additional expense. My estimate is that it will take only one decade to make 3D viewing of all legacy 2D content an obvious and expected path for the consumer market. We may soon look at legacy 2D movies as a diminished experience, wanting our classic films embellished to show depth the way black-and-white movies are now being released as colorized versions on web streaming channels.

Having a movie studio re-render depth into all legacy content is expensive. But it makes sense at scale, rather than having all consumers buying a set-top converter to achieve the same ends. It's a question of how many people want to benefit from the enhanced means of viewing, and who we want to pay the cost of achieving the conversion. We have 2D movies now because it was the easiest and cheapest means to achieve mass media entertainment at scale at the time. Now as we reach a broader distribution of 3D displays, we approach the moment where we can re-inflate 3D into our legacy creations to make them more similar to the world we inhabit.





Monday, December 23, 2024

Reflections on the evolution toward volumetric photography

During college I read Stephen Jay Gould’s books on natural history in the animal kingdom with fascination. He writes extensively on the many different paths distinct species took to develop eyes through successive enhancements in different branches of the tree of life. Eye designs and their uses were selected for the survival benefits they conveyed over the simpler traits of their predecessors as biological competition increased over time. In science workshops as a youth, I designed pinhole cameras simulating the pupil of the eye and enjoyed taking apart old cameras to study how their shutters worked. When planes landed, I’d notice inverse images of the ground projected on the ceiling of the plane’s cabin, like a retinal image, through the pupil-like windows. I’d study and ponder General Relativity, the cosmic limit of light’s speed, and its implications for the nature of the cosmos and its origins.

With this obsessive fascination with light and seeing, it makes sense that I’ve been an avid photographer for decades. I always look out for the newest tools to capture and save light reflections of the places I travel. Now that stereoscopic and 360 cameras are coming into the mainstream, my hobby as a light collector has had to shift accordingly. My 2D photos can be conveyed well in my photo books. However, these newer forms of photographic media need to be shared and enjoyed through different means of re-reflection, depending on the nature of the scene captured. Photospheres from 360 cameras are best experienced as spherical projections, like a planetarium, which can only be seen in a head-mounted display (HMD). Stereoscopic depth images can be viewed on both 2D and 3D flat screens, but are also better appreciated in HMDs. For now, the cameras that capture these new scenes outnumber the HMDs sold in the consumer market. So the audience for these 3D scene captures is likely limited to those who craft them and share them peer-to-peer.

This will shift as more companies popularize their own new form factors for head-worn displays and glasses, be they pass-through visors offering augmented views of the normal world (Google Glass, Snap Spectacles, Magic Leap, Xreal) or dual-purpose devices with both AR and VR modes (Pico, Quest, Vive, Varjo, Vuzix, Vision Pro, Moohan, etc.). With such a plurality of devices for people to choose from, it’s clear there needs to be a common platform for photographers and motion capture enthusiasts to distribute their media. App-based content distribution is too limited to address the full breadth of the market's device form factors, though apps currently bridge us toward a future of unfettered browser platform support. It is clear that web-based utilities are necessary to facilitate ubiquitous access across the panoply of devices. This is what makes me very excited about the cross-device web coding standard WebXR, which is now supported across all major web browsers.

WebXR hosting and streaming will be the easiest path for content creators and photographers to give access to their captures and artistry. Affordable and comfortable HMDs or flat-screen stereoscopic displays (made by Leia or Sony for instance) are going to gradually scale up to supporting mass audiences who today can only experience this media on a borrowed or shared device. With Apple announcing official support for WebXR in Safari last year, and Google’s recent announcement of their renewed investments in 3D rendering with Android XR operating system for HMDs, the ease of access to experience spatial depth pictures and 3D movies will soon be broadly accessible on affordable hardware. Many people may not particularly want to see photospheres of Italian architecture or watch action movies, like Avatar, streamed in 3D depth because of being susceptible to motion sickness or vertigo. But for those who do, cost of access is soon not going to be such a limiting factor as it has been over the past decade. Price points for HMDs and 3D flat screens will fall due to increased manufacture scale, lower component cost and increased competition. Now that the hardware side of access is starting to become affordable, the content side is a new opportunity space that will grow in coming years for novice creators and photographers like me.

Last year marked my first leap into volumetric photography and videography. I’d read about it for years but, due to a lack of good public sharing platforms, hadn’t taken the leap to sharing anything of my own. There are a few compelling applications in HMD app marketplaces for peer sharing of stereoscopic video streaming, 360 and wide-angle photographs. But what I’m really excited about now is volumetric scanning of public scenes. This is yet another spin on photography, one that allows a feeling of presence in the space photographed because of the distributed perspective of the scene's light field, meaning different angles of reflection on all points of space inside the rendered volume. These aren't captured by one shutter click at one moment of time, but from multiple perspectives over a span of time, distilled into a static scene that can be navigated after capture. In volumetric captures, you can walk through the space as if you were there in the original scene. Your camera display, or your avatar in an HMD, shows you how any area of the point cloud looks from whatever position you navigate to within the still spatial image.

Mark Jeffcock Gaussian Splat - Forest of Arden
This kind of volumetric capture was formerly only accessible to professionals who used lidar (laser reflection) cameras to generate archeological, geological or municipal landscape scans, leveraging moving cars, satellites, drones or large camera rigs. This approach was used to create Nokia Here Maps (now branded Here), Keyhole (now branded Google Earth) as well as satellite maps of Earth’s crust that reveal archeological and geological formations, such as this scan of the Yucatán peninsula, which revealed the stone structures of a previously unknown Maya city hidden beneath the forest canopy.


Lidar scan of Maya city in Campeche region (courtesy BBC)

Years ago I had my first experience of a simulated archeological site when my brother-in-law gave me a tour of a reconstructed dig site in stereoscopic simulation. Thereafter, I sought out the founder of the Zamani Project, a team of archeologists who scan ancient anthropology sites in Africa and the Middle East to preserve their structures for remote study. Would it be possible to make these structures discoverable on the web, I wondered? I had explored the ancient cave city of Petra by foot and wandered the vast cities of Chichén Itzá and Uxmal in the Yucatán on my photographic expeditions. I wanted to go back to them virtually. But there isn’t currently a means to do this. Online map versions of these amazing sculptural sites were rendered as flat photographs. Then I learned more about aerial photogrammetry. In contrast to the lidar scans of the Zamani Project, photogrammetry allows a rough topography to be captured and synthesized in a computer-aided design program such as RealityCapture. It's much less expensive than lidar scanning. Upon meeting drone photogrammetry experts, I tried to price out a project to put Chichén Itzá on the web map in a way that would let people at home experience these archeological sites remotely. Cost wasn’t a particularly high barrier, I learned. Even I could afford to do it as a civilian enthusiast, it seemed. The challenging part was the authorization to pilot drones over an archeological site and subsequently republish the imagery, which needs to be obtained from the local government.

I’d first discovered photogrammetry when Microsoft announced its Photosynth service over a decade ago. The same way our eyes assemble our comprehension of 3D depth using the optical parallax of our two eyes' blended perspective, Photosynth would assemble photos from hundreds of perspectives to infer the geometry of large spatial objects. Though stitching photos into a 3D simulation can be somewhat expensive in terms of compute time, the increase in cell phone computing power means that the processing of these images can now be offloaded to the image-capturing mobile phone instead of the more expensive centralized rendering on a server. Over the past couple of years I followed the emergence of neural radiance field research, abbreviated as NeRF, which was re-invigorating the approach of making 3D map views accessible to the public. (Read about the photo stitching process demonstrated in the Block NeRF research coming out of UC Berkeley and Alphabet company Waymo, which assembled millions of photos taken from self-driving cars to create a navigable map of the city formed only from stills taken by the onboard cameras. Note also that moving objects and people's ephemeral presence are abstracted out of the still images over time, as their transience is removed from the underlying point cloud of the city. Privacy protection is therefore a useful attribute of scene capture over time as well.)

Block NeRF - Waymo camera neural radiance field scan of San Francisco
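The parallax principle Photosynth and NeRF pipelines build on can be tried at the smallest possible scale with two photos taken a known distance apart: block matching turns the horizontal shift of each patch into disparity, and disparity into depth. A rough sketch with OpenCV (the file names, focal length and baseline are placeholder values):

```python
import cv2
import numpy as np

# Two photos of the same scene taken from horizontally offset positions,
# ideally rectified so that matching points lie on the same image rows.
left = cv2.imread("view_left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("view_right.png", cv2.IMREAD_GRAYSCALE)

# Classic block matching: for each patch in the left image, find how far it
# shifted in the right image. Larger shift (disparity) means closer geometry.
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Turn disparity into metric depth: depth = focal_length * baseline / disparity.
focal_px = 1200.0      # assumed focal length in pixels
baseline_m = 0.10      # assumed 10 cm between the two camera positions
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_px * baseline_m / disparity[valid]

print("median scene depth (m):", np.median(depth[valid]))
```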

Last year I took online classes for photographers to learn how to do photogrammetry scans and self-publish them. These “holographic” images are called Gaussian Splats, the newest approach to photogrammetry for assembling sculptural depictions of 3D landscapes. The teacher of my course was Mark Jeffcock, a photogrammetry specialist based out of the UK who captures amazing landscapes and architecture using apps that allow Gaussian Splat images to be exported and uploaded for web access. (See his amazing capture of a sculpture he titles Knife Angel hosted on Arrival Space.)

Knife Angel by Mark Jeffcock
Mark Jeffcock Gaussian Splat - Knife Angel
Though spatial photos and videos using parallel cameras are just getting started in the mainstream, I anticipate this new standard of volumetric capture is going to keep pace with the others as an engaging way for VR HMD users to get into sharing real-world settings with each other. The question is how to make these volume captures broadly accessible. Both Arrival Space and Niantic are jumping into the opportunity to offer peer-to-peer hosting of Gaussian Splats. (There is a hint that Varjo and Meta may eventually introduce peer-to-peer sharing of Gaussian Splats as well, though probably only within an app-contained, logged-account context.)

If you are keen to explore some Gaussian Splats in your own HMD or 2D browser, I encourage you to visit Arrival Space to see some of the community scans that are shared by the 3D photographer community in the galleries there. Though I am just a beginner at volumetric captures you can start off in my gallery to see how I tell stories with the scans I make. Creating Gaussian Splat scene renders takes a lot of time, as areas of the scene will appear blurry when the photographer hasn't dwelt long enough on any certain portion of the scene. I still remember during my classes last year when Mark said, "This is a really good capture. But you forgot to point the camera at the ground." Because of this, our class was standing in a Gaussian Splat capture that had nothing to stand on. Becoming a true 3D photographer means that we have to think like cinematographers, capturing how a scene draws the eye over time. Or we need to use a camera lens broad enough to capture photons from angles that we typically ignore when we frame a photo. (Fish-eye lens splat capture is currently being researched, by a company called Splatica, which takes full fish-eye movie footage of a scene to render the 3D still capture.) I anticipate that in the coming year, dozens of new cameras and new hosting platforms will emerge to address peer-sharing of amateur photography. If this media captures your imagination, then this is a perfect time to become one of those photographers. If you have ambition to try your own hand at creating Gaussian Splats, start off by trying Scaniverse which allows easy exporting of the ply/spz 3D file types necessary for uploading to your own personal galleries on Arrival Space. I encourage you to get out and explore aspects of your environment and culture that you can share with others across the internet now that WebXR makes it possible for us to share spaces with those far away.
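If you do export your own splats, it can be reassuring to peek inside the file before uploading. Here is a small sketch using the plyfile library to check an exported .ply; the property names mentioned in the comments are the ones splat exports typically carry, which I'm treating as an assumption since tools vary.

```python
from plyfile import PlyData

# Hypothetical export from Scaniverse or a similar splat capture app.
ply = PlyData.read("my_capture.ply")
vertices = ply["vertex"]

print("number of splats/points:", vertices.count)
print("per-point properties:", vertices.data.dtype.names)
# Gaussian Splat exports usually carry more than x/y/z: expect fields such as
# opacity, per-axis scales and rotation components alongside color terms.
# If you only see x, y, z (and maybe colors), you likely exported a plain
# point cloud rather than a full splat.
```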

Note that if you are seeking to explore my travel scans in full 3D spatial depth, you'll need to open that URL in a 3D viewer or an HMD browser. You'll have a default avatar to explore these spaces and talk to other people; you can change it with a login. Get started at https://arrival.space/ with your own personalized URL.

For in-depth reading on Gaussian Splat capture, see more from the New York Times R&D team.

Sunday, November 24, 2024

What does it want to say? (An approach for chasing bugs)

Long ago when I was studying French, I came across an idiomatic way of asking about a word or thing. If you want a French speaker to help you define a word, you can ask, "What does it want to say?" (Qu'est-ce que ça veut dire?) I liked that the word is personified as a willful entity in this question's framing: the word has a desire or a will that needs to be considered. Over time, I came to think of technology problems in this way. If a computer wasn't functioning properly, I'd frame it as: What is the computer wanting to say or do? How am I seeing it try to do that? And what's the result? If an app or computer process ever behaves strangely, walking through the steps of how it communicates will sometimes help you troubleshoot. By narrowing down the steps in its way of reasoning and acting, you can often isolate the problem and facilitate the process so that the device can achieve its goal, which is a proxy for your goal in using the device.

If your phone or computer wants to find an internet connection, is there one? Is it a LAN cable, wifi, Edge, LTE, 3G, 5G? (Each differs slightly in how it connects and in how much data it can transmit, which can gate your device from accessing or sending some information.) If your device "wants" to use that internet connection to reach a website and refresh its content, what happens or doesn't happen when that request reaches the server and a response comes back? Gradually, from the keyboard to the screen to the operating system to the network to the connected service provider, you can observe each step in this flow. When I worked in Tokyo providing search engine services to internet portals, my clients would sometimes call me on my cell phone asking me to diagnose an issue. I could run a tracer query or traceroute through the network to test their servers' response and their connection to my company's servers. It's like the game of Pooh sticks from the Winnie the Pooh stories: drop a stick off one side of a bridge and watch for it to come out under the other side, racing your friends, or bears, to see whose stick emerges fastest. This is the same process as device-level troubleshooting. Who is saying what, and how?
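
As a rough illustration of walking that conversation step by step, here is a small Python sketch, using only the standard library, that checks name resolution, a TCP connection, and a full HTTPS round trip in order. The host is a placeholder for whatever service your device is trying to reach.

```python
# A minimal sketch of "walking the conversation" outward from your device:
# can we resolve a name, open a TCP connection, and complete an HTTPS request?
# The host below is a placeholder; swap in whatever service you are diagnosing.

import socket
import urllib.request

HOST = "www.example.com"   # placeholder target


def check(label, fn):
    try:
        print(f"[ok]   {label}: {fn()}")
    except Exception as exc:
        print(f"[fail] {label}: {exc}")


def dns_lookup():
    # Step 1: DNS. Does the name resolve, and to which addresses?
    return sorted({info[4][0] for info in socket.getaddrinfo(HOST, 443)})


def tcp_connect():
    # Step 2: Transport. Can we open a TCP connection to port 443?
    with socket.create_connection((HOST, 443), timeout=5):
        return "connection opened and closed cleanly"


def https_get():
    # Step 3: Application. Does a full HTTPS request/response round trip succeed?
    return f"HTTP status {urllib.request.urlopen('https://' + HOST + '/', timeout=5).status}"


check("DNS lookup", dns_lookup)
check("TCP connect :443", tcp_connect)
check("HTTPS GET", https_get)
```

Wherever the first failure appears is where the conversation is breaking down.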

I recently had to troubleshoot a perplexing connectivity issue. My father so enjoyed the process of going through it with me that he asked me to write about it, since he could find no web documentation of the issue while he was experiencing it. My father worked as a systems engineer for many years at IBM and wrote programs in dozens of languages, from early mainframe computers to modern-day Macs. He knows pretty much every trick a Mac can do and has followed computer web forums for years to expand his understanding of how Apple's operating systems shifted from the pre-System 10 days (before OS X) through the PowerPC and Intel chip eras to today's "Apple Silicon" machines. So his reaching out to me about a technical bug is a somewhat rare thing; it was usually the other way around. But I was able to narrow down the observable symptoms to several potential root causes until finally identifying the bug as a network issue rather than an operating system or hardware issue. In case this issue affects your home computer or network, or if you're just curious about the steps involved, here is how we sorted it out.

My father's computer had a peculiar symptom that he'd never seen in 30+ years of working on Macs, and that I'd not seen either. From the inside perspective, the symptoms were that he couldn't access bank websites on his new Mac devices, couldn't do a phone-home system re-install from Apple's servers, and couldn't contact Apple support from the machine's integrated communication system. The symptom didn't appear on his wife's older Intel-chip Mac running an older operating system. So he narrowed his conclusion down to either his Apple Silicon Mac or the recently upgraded operating system as the potential source of the issue. From the outside perspective, his FaceTime (VOIP calling) availability disappeared for me when I tried to reach him; Apple's servers were telling my devices that they couldn't find him on the network. So I suspected something was wrong with his ID, his account, or potentially a compromised device. Because I'd read press coverage of phishing tactics and how to avoid them, I started by triaging whether there was a potential malware issue with his machine. That didn't seem to be the case. Everything else on his Mac worked, except for apps that required web resources to be fetched securely by an internal web-dependent function. He checked with his banks to ensure there was no suspicious activity in his accounts and no recent attempts to reroute or reset his bank logins. We established a secondary channel of communication and confirmed that his account phone numbers had not been redirected. Once we sorted out that he wasn't subject to any immediate risk, we took to testing and ruling out other potential issues.

Source: Wiki Commons
Cutting to the chase: after ruling out as many factors as we could, we isolated the issue as an IPv6 problem. Over a decade prior I had attended a lecture on how the web industry was transitioning its process for issuing IP addresses ahead of the vastly larger number of devices expected to come online in future decades. IPv4, the scheme by which devices get addresses that identify them as unique entities across the web, was headed toward an exhaustion problem, a scaling issue loosely akin to the old Y2K problem at the turn of the 21st century. (That story is told elsewhere; it's analogous but not directly related, since Y2K had to do with date formats in computers rather than device namespace on the web.) The newer IPv6 scheme draws device addresses from a vastly wider range of values than IPv4, meaning far lower risk of addresses running out or of two devices colliding and creating network conflicts.

That lecture was deep in my memory, but it was triggered by a comment someone had made about the IPv6 transition resulting in higher network security in the future. It was relevant here because my father's computer was communicating in a high-trust context, with banks and Apple's network, in a way that wasn't being accepted by the parties on the other end. The banks were not responding because they didn't accept the incoming requests as valid for that high-trust context. Why would the older computer work and the newer computers not? Could there be a difference in how the newer Macs and their operating systems carry TLS (Transport Layer Security) traffic over the web? Sure enough, that turned out to be the shape of the issue. The banks and the Apple network were accepting the IPv4 traffic from the older Mac, but the newer Macs and their operating systems were transmitting IPv6 values which weren't getting through the network to establish the trust necessary to proceed. Once we configured his network to route the IPv6 traffic generated by the OS, his computer, browsers and applications started functioning flawlessly again. You can read much more in depth about IPv4 and IPv6 elsewhere, but suffice it to say that there was nothing wrong with his Mac. It was the attempt to communicate secure, device-unique values over the network that was failing.
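
For anyone who wants to reproduce the kind of test that isolated this, here is a hedged Python sketch that checks whether a host resolves and accepts a TCP connection on port 443 over IPv4 and IPv6 separately. The host name is a placeholder, and the results depend entirely on your own network configuration.

```python
# A rough sketch of the test that finally isolated the problem: for one host,
# check whether it resolves over IPv4 (A record) and IPv6 (AAAA record), and
# whether a TCP connection on port 443 succeeds over each family.
# The host is a placeholder; results depend on your own network.

import socket

HOST = "www.apple.com"   # substitute any high-trust site you are testing


def try_family(family, label):
    try:
        infos = socket.getaddrinfo(HOST, 443, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        print(f"{label}: no address found ({exc})")
        return
    addr = infos[0][4]                       # sockaddr tuple for this family
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(5)
    try:
        sock.connect(addr)
        print(f"{label}: connected to {addr[0]}")
    except OSError as exc:
        print(f"{label}: resolves to {addr[0]} but the connection failed ({exc})")
    finally:
        sock.close()


try_family(socket.AF_INET, "IPv4")
try_family(socket.AF_INET6, "IPv6")
```

If one family resolves but never connects while the other works fine, the trouble is in the network path rather than the machine, which is exactly the pattern we saw.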

I hope you don't run into a network routing problem like his over these winter holidays. Bank customer service, Apple customer service and even your internet provider may not be familiar with your home computing or network setup, and they may have difficulty understanding the issue based on how you describe it. But tracking down how the symptoms present will help you communicate with them, or with relatives, friends or technical support services, to resolve whatever challenges you face.

In the computing era we delegate our willful processes to these device "agents" that act over the web on our behalf. Just like humans, they can get tripped up on the way to saying things, or the channel through which to express them. Like studying a foreign language, we can examine the terms our agents use to help them communicate for us more effectively. When their speech breaks down, we have only to examine the vocabulary and steps they use to get across their "meaning" and thereby return them to functioning eloquently on our behalf.

(Info Link) For more on IPv6 see: https://en.wikipedia.org/wiki/IPv6

(Non-paid promotion of French) Learn more French from my favorite French podcast by Louis: https://podcasts.apple.com/us/podcast/learn-french-with-daily-podcasts/id191303933

(Non-paid shout out to CES) Special thanks to the Consumer Electronics Show for offering the lecture on IPv4 vs IPv6 that set us on the right track in this particular case.

Further details:

For those who want to follow the troubleshooting steps we used, the important clues and conclusions, and the route to isolating the problem behavior, the details are:

Key issues:

  • The important first clues were the non-working FaceTime VOIP service and his device's inability to connect to Apple servers. This showed it was a two-way problem across multiple applications, while non-sensitive web traffic was unhindered.
  • Testing the IP address configuration was the main key to resolving it. My computer registered an IPv6 address when querying whatismyipaddress.com from outside his home; his computer registered an IPv4 address but no IPv6 identity for its web traffic. (A quick command-line version of this check is sketched just after this list.)
  • Then, when I tested my computer in his home environment, it experienced the same issue as his. (I was running a more recent beta version of macOS than he was.) Replicating the bug with a different machine on a different version of the OS conclusively proved that the network was the gating source of the problem.
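
Here is the command-line version of that check referenced above, wrapped in a small Python sketch. It assumes curl is installed and uses the ipify echo service as one example of a what-is-my-IP endpoint; any equivalent service works.

```python
# Command-line version of the whatismyipaddress check, forcing each address
# family with curl's -4 / -6 flags. The echo endpoint (ipify) is just one
# example; any "what is my IP" service works the same way.

import subprocess


def external_ip(flag, label):
    try:
        result = subprocess.run(
            ["curl", "-s", flag, "--max-time", "10", "https://api64.ipify.org"],
            capture_output=True, text=True, check=True,
        )
        print(f"{label}: the outside world sees you as {result.stdout.strip()}")
    except (subprocess.CalledProcessError, FileNotFoundError):
        print(f"{label}: no route out over this address family (or curl is missing)")


external_ip("-4", "IPv4")
external_ip("-6", "IPv6")
```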

The questions and steps in our exploration that narrowed us down to the answer:

  • Had his IP addresses been flagged as a phishing or malware source, leading to banks blocking traffic? (Confirmed not.)
  • Had his phone number been re-routed recently in a way that hinted at a shift in the trust relationship the bank had with his account?
  • Operating System issue? Could we try a fresh install of a base operating system? (Not possible in his case because Apple Silicon Macs don't allow booting from an external machine the way Intel-chip Macs did. And neither of his Macs could revert to his wife's OS version because of incompatibility between recent Intel-era releases and the newer Apple Silicon devices.)
  • Browser issue? He was accessing banks via several browsers and all failed, while regular non-logged-in sites functioned fine across all browsers. (This meant it was not a browser-dependent issue. But because secure sites were failing to load, including Apple's, it made me suspect some kind of Transport Layer Security over TCP/IP problem.)
  • Cable internet access restrictions? Because some cable internet providers offer parental controls, I suspected Comcast might have inadvertently rolled out a traffic-throttling limit for some accounts or for all customers in a region. Did any of his friends who used the same provider complain of losing access?
  • We reset the DHCP settings of his computer, to no avail, once we suspected networking to be the problem.
  • Finally we bypassed his router, which was the main gating device, and thereby resolved the issue. We decided to keep using his router for non-sensitive traffic around the house, but not for sensitive or secure traffic from the newer devices in the home.

Sunday, October 15, 2023

Using VR comfortainment to bring an end to the US blood supply shortage

I pursued my MBA during a fascinating time for the world economy. We'd endured a pandemic that shut down significant portions of the economy for nearly a year, followed by surging interest rates as the government response to the pandemic produced significant inflation and subsequent layoffs in my region. While this was a dramatic time for the world, it was a compelling moment to return to academia and evaluate the impacts of natural and artificial stimuli on the global economy.

For our master's thesis we were asked to identify an opportunity in the economy that could be addressed by a new business entrant. In discussions with several members of my MBA cohort, we decided to focus on the blood supply shortage that emerged at the end of the pandemic. Why, we wondered, would the US fall into a blood crisis as the pandemic ended? Shouldn't that have happened during its peak in 2020 or 2021? It turns out that during the pandemic, surgeries and car crashes dropped at the same time that donations to the blood supply dropped. It was only after the pandemic ended that supply and demand got out of sync: in 2022 people started going to hospitals again (and getting injured at normal rates) while the donor pool had significantly shrunk and had not recovered its pre-pandemic rate of participation. So hospitals were running out of blood. More concerning, the drop in donor participation doesn't look like a short-term aberration. Something needs to shift in the post-pandemic world to return the US to a stable blood supply. This made it a fascinating subject for study.

As we began our studies we interviewed staff at blood banks and combed through the press to understand what was taking place. There were several key factors in the drop-off of donors. Long Covid had affected 6% of the US population, potentially reducing willingness to donate among individuals who'd participated before. (Even though blood banks accept donations from donors who have recovered from Covid, the feeling that one's health is not at full capacity affects one's sentiment about passing blood on to another.) At the same time there was a gradual attrition of the baby boomer generation from the donor pool, while younger donors were not replacing them due to generational cultural differences. Finally, the hybrid-work model companies adopted post-pandemic meant that blood-mobile drives at companies, schools and large organizations could no longer draw the turnout they once had at those locations.

The donation pool we've relied on for decades requires several things, so we tried to identify the aspects that were directly within the control of the blood banks:

  • First, an all-volunteer unpaid donor pool requires a large number of people in the US (~7 million) willing to help out of their own internal motivations and having ample time to do so. Changing people's attitudes toward volunteerism and blood donation is hard, and the marketing efforts to achieve it are expensive. In an era when more people are working multiple jobs, the flexibility to volunteer extra time is becoming constrained, and this time scarcity among would-be donors is likely to keep worsening relative to pre-pandemic times.
  • Second, there needs to be elasticity in the eligible donor pool to substitute for ill would-be donors in times of peak demand. Fortunately, this year the FDA started expanding eligibility criteria in reaction to the blood crisis, permitting people who were previously restricted from donating to participate. However, this policy matter is outside the control of blood banks themselves. Blood demand is seasonal, peaking in winter and summer, but donors' habits are consistent and they are difficult to entice when need spikes, due to their own seasonal illnesses or summer travel plans.
  • Third, and somewhat within the control of blood banks, is in-clinic engagement and behavior. Phlebotomists can try to persuade donors to upgrade their donation time during admission and pre-screening. This window, while an existing donor is sitting in the clinic, is the best time to promote persistent return behavior, and improving how this is done is the best immediate lever for bolstering the donor pool toward a resilient blood supply. But should we saddle our phlebotomists with the task of marketing and up-selling donor engagement?

Considering that there is no near-term solution to the population problem of the donor pool, we need to do something to bolster and expand the engagement of the donors we still have. In our studies we came across several interesting references. "If only one more percent of all Americans would give blood, blood shortages would disappear for the foreseeable future." (Source: Community Blood Center) This seems small, but currently approximately 6.8 million Americans donate blood, less than 3% of the population, so it's easy to see how a few million more donors would ease the problem. Yet the education and marketing needed to achieve that would be incredibly expensive, slow and arduous; it's hard to change that many minds in a short time frame.

Another comment from the same source gave us an avenue to proceed with optimism: "If all blood donors gave three times a year, blood shortages would be a rare event. The current average is about two." We agreed that this seemed like a much more achievable marketing target. In our team calls, Roy Tomizawa commented that we needed to find something that makes people want to be in the clinic environment beyond their existing personal motivation to help others. He suggested the concept of "comfortainment" as a strategy, whereby people could combine their interest in movie or TV content with the time they'd sit still in the clinic for blood donation, dialysis or other medical care. If we were to transform the clinic from its bright, fluorescent-lit environment into a calm, relaxing space, more people might wish to spend more time there.
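
As a back-of-envelope check on those two levers, the sketch below runs the arithmetic with the figures quoted above (6.8 million donors, an average of two donations a year). The US population figure (~330 million) and the assumption that new donors would give at the current average rate are mine, not the source's.

```python
# Back-of-envelope check of the two levers quoted above. The donor count (6.8M)
# and current average of two donations per year come from the cited source; the
# US population (~330M) and the assumption that new donors would donate at the
# current average rate are illustrative assumptions.

US_POPULATION = 330_000_000
CURRENT_DONORS = 6_800_000
AVG_DONATIONS_PER_YEAR = 2

baseline = CURRENT_DONORS * AVG_DONATIONS_PER_YEAR

# Lever 1: one more percent of all Americans start donating at the average rate.
lever1 = (CURRENT_DONORS + int(US_POPULATION * 0.01)) * AVG_DONATIONS_PER_YEAR

# Lever 2: the existing donor pool gives three times a year instead of two.
lever2 = CURRENT_DONORS * 3

print(f"baseline:                  ~{baseline / 1e6:.1f}M donations/yr")
print(f"+1% of Americans donating: ~{lever1 / 1e6:.1f}M donations/yr (+{(lever1 / baseline - 1) * 100:.0f}%)")
print(f"existing donors at 3x/yr:  ~{lever2 / 1e6:.1f}M donations/yr (+{(lever2 / baseline - 1) * 100:.0f}%)")
```

Under these assumptions the two levers land in roughly the same place, about fifty percent more collections per year, which is part of why nudging existing donors from two donations to three felt like the more achievable path.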

As a life-long donor, I've heard a lot of in-clinic promotions to donate more often. But during intake so many things are happening: 1) FDA screening questions, 2) temperature check, 3) blood pressure measurement, 4) hemoglobin/iron test, 5) verbal confirmation of no smoking or vaping. This battery of activity is an awkward time for phlebotomists to insert promotional campaigns about increasing engagement. One day I noticed some donors doing something different in the blood bank and asked about it. That's when I learned how the blood apheresis process differs from whole blood donation. It uses a centrifuge that collects more of a specific blood component at the time of draw from a single donor, returning the rest of the blood to the donor. Not only does this yield multiple units of blood product per draw, the recovery time between donations is shorter. Whole blood donations require about two months for the donor to replenish their blood naturally before donating whole blood again; apheresis donors lose less blood overall and can therefore return more often. The only downside is that it requires more of the donor's time in-clinic.

Because apheresis was the most flexible variable blood banks could adjust as demand and supply waxed and waned, our study zeroed in on optimizing this particular lever of supply to address the blood shortage. In a single apheresis draw, a donor can provide three units of platelets, far more than comes from a whole blood draw. This allows the blood bank to supply three units to hospitals immediately after the draw instead of having to centrifuge pooled post-donation units of whole blood from multiple donors. Platelets are uniquely needed for certain hospital patients, such as cancer patients and those with blood clotting disorders. As for other blood components, an apheresis draw can provide twice the red blood cells of a whole blood donation. And while a donor is providing platelets, they may also provide plasma in the same draw, which carries natural antibodies from healthy donors that can help patients with weakened immune systems.
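
To make the per-donor yield difference concrete, here is a rough sketch. The two-month whole blood recovery period and the three platelet units per apheresis draw come from the text above; the idea that an apheresis donor might visit roughly monthly is purely an illustrative assumption, since actual donation intervals are set by blood banks and regulators.

```python
# Rough per-donor yearly yield comparison. The ~2 month whole blood recovery
# period and the 3 platelet units per apheresis draw come from the text above;
# the roughly-monthly apheresis visit cadence is an illustrative assumption,
# since real donation intervals are set by blood banks and regulators.

WHOLE_BLOOD_DRAWS_PER_YEAR = 12 // 2       # one whole blood draw every ~2 months
APHERESIS_VISITS_PER_YEAR = 12             # assumption: roughly monthly visits
PLATELET_UNITS_PER_APHERESIS_DRAW = 3      # from the text

platelet_units = APHERESIS_VISITS_PER_YEAR * PLATELET_UNITS_PER_APHERESIS_DRAW

print(f"Whole blood: ~{WHOLE_BLOOD_DRAWS_PER_YEAR} donations/year, 1 unit each")
print(f"Platelet apheresis (illustrative monthly cadence): ~{platelet_units} platelet units/year")
print("The trade-off is roughly an extra hour of chair time per visit.")
```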

Hearing all this, you might think everybody should donate via apheresis. But the problem is the extra time needed: at least an additional hour of donor time. A donor planning on just a 15-minute blood draw may be reluctant to remain in apheresis for one to two hours, even if it triples or quadruples the benefit of their donation. Though this is one factor that can be adjusted immediately based on local hospital demand, asking donors to make the trade-off for the increased benefit can be a hard sell.

When I first tried apheresis, I didn't enjoy it very much, but that's because I don't like lying down and staring at fluorescent lights for long periods. Lying on the gurney for 15 minutes is easy and bearable. Having phlebotomists try to persuade hundreds of people to switch their donations to something much more inconvenient is a difficult challenge. Some blood banks offer post-donation coupons for movies or discounts on food and shopping to promote apheresis donations. My team wondered if we could bring the movies into the clinic, the way airlines introduced movies to assuage the hours of impatience people feel on flights. If people can earn two hours of cinema time after a donation by sitting still for two hours in the clinic, why not combine the two? Donors could watch IMAX-scale films at the clinic during the time they'd planned to be immobile anyway!

We interviewed other companies that had launched VR content businesses to help people manage stress or chronic pain, or to discover places they might want to travel to while at home. We then scoped what it would take to create a device and media distribution company that would help blood banks entice donors to come to the clinic more often, and for longer stays, with VR movies and puzzle games as the enticement. Introducing VR to apheresis draws doesn't create more work for phlebotomist staff; in fact, one phlebotomist can manage several apheresis donations at once because the process leaves an idle hour between needle placement and removal. So while we increase yield per donor, we also reduce the busywork of the phlebotomy team, introducing new cost efficiencies into overall clinic processing time.

Consumer-grade VR headsets have now decreased in price to the point where it would be easy to give every donor an IMAX-like experience of a movie or TV show for every two-hour donation. To test the potential of our proposed service, we conducted two surveys. We started with a survey of existing donors to see if they would be more inclined to attend a clinic that offered VR as an option. (We were cautious not to introduce an element that would make people visit the clinic less.) We found that most existing donors wouldn't be more compelled to donate just because of the VR offering; they already have their own convictions to donate. Yet one quarter of respondents said they'd be more inclined to donate at a clinic where the option existed than at one that did not offer VR. The second survey was for people who hadn't donated yet. There we heard significant interest in the VR enticement, specifically among a younger audience.

Fortunately, we were able to identify several existing potential collaborators who could make our media strategy easy for blood clinics to implement. First, we needed a way to address sanitation of devices between uses, for which we demoed the ultraviolet disinfection chambers manufactured by Cleanbox Technologies. If donors were to wear a head-mounted display, any device introduced to a clinical setting would need to be cleaned between uses. Cleanbox is able to meet the 99.99% device sterilization standard required for use in hospitals, making it the best solution for a blood clinic introducing VR to its comfortainment strategy.

Second, in order for the headsets to receive regular updates and telemetry software checks, we talked to ArborXR, whose platform allows a fleet of deployed headsets to be updated overnight through secure updates. This would take device maintenance concerns away from the on-site medical staff as well. Devices being sterilized, charged and updated overnight while not in use would allow a simple deployment alongside the apheresis devices already supplied to hospitals and blood banks through medical device distributors, or as a subsequent add-on.

Using the Viture AR glasses at an apheresis donation

While we hope that our study persuades some blood banks to introduce comfortainment strategies to reward their donors for their time spent in clinic, I’ve firmly convinced myself that this is the way to go. I now donate multiple times a year because I have something enjoyable to partake in while I’m sharing my health with others.

I’d like to thank my collaborators on this project, Roy Tomizawa, Chris Ceresini, Abigail Sporer, Venu Vadlamudi and Daniel Sapkaroski for their insights and work to explore this investment case and business model together. If you are interested in hearing about options for implementing VR comfortainment or VR education projects in your clinic or hospital, please let us know.

 

For our service promotion video we created the following pitch which focuses on benefits the media services approach brings to blood clinics, dialysis clinics and chemotherapy infusion services.






Special thanks to the following companies for their contribution to our research:

Quantic School of Business & Technology 

Vitalant Blood Centers

Tripp VR

Cleanbox Technologies

Viva Vita

Abbott Labs 

International VR & Healthcare Association

VR/AR Association

Augmented World Expo