Sunday, January 5, 2025

Resurrecting the third dimension from the second

I remember my first time witnessing a hologram. In my case it was the Haunted Mansion ride at Disney World when I was six. As the slow-moving ride progressed, we peered down on a vast ballroom beneath us, with dozens of ghostly figures appearing to dance in front of the physical furniture that adorned the room.

Walt Disney World - Ghosts in the Haunted Ballroom

I asked my brother to explain how the illusion worked. He had a book on optical illusions that we would pore over, fascinated and bewildered. How does that still image appear to be moving, I wondered? I could see the two distinct parts of the illusion if I covered either of my eyes, but the image came alive and moved with depth when I looked with both eyes open, as my brain assembled the parts into a synthesized whole. In elementary school, I went to a science museum to learn more about holographic film. Holograms are easy to create on static film. What creates the illusion of depth is binocular vision: each eye sees a slightly different image, or a slightly different angle on a single scene, and the brain extrapolates a sense of spatial location from the contrast between the two.
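The binocular cue described above can be made concrete with the standard pinhole-stereo relation. For two parallel eyes (or cameras) separated by a baseline B, a point at distance Z appears shifted horizontally between the two views by a disparity d = f·B/Z, where f is the focal length in pixels. The sketch below assumes that idealized geometry. The function names and the 63 mm baseline are illustrative and are not taken from any particular system.

```python
def disparity_px(depth_m: float, focal_px: float, baseline_m: float) -> float:
    """Horizontal shift (pixels) between the two views for a point depth_m away.

    Assumes an idealized pinhole model with parallel, horizontally offset eyes.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def depth_m(disparity: float, focal_px: float, baseline_m: float) -> float:
    """Invert the relation to recover distance (metres) from a measured disparity."""
    if disparity <= 0:
        raise ValueError("disparity must be positive; zero means infinitely far")
    return focal_px * baseline_m / disparity


if __name__ == "__main__":
    f, b = 1000.0, 0.063  # hypothetical focal length (px) and eye baseline (m)
    for z in (0.5, 2.0, 10.0):
        print(f"{z:5.1f} m -> {disparity_px(z, f, b):6.1f} px")
```

Disparity falls off as 1/Z. That is why stereo depth feels vivid up close and flattens out for distant scenery.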

In Disney's example, a large wall of holographic film stays still while the audience moves past it. Each person's shifting perspective as the ride moves forward reveals a different view through the film. All the patrons are seated in a shared audience position, yet each of them sees a different stage of the ghostly characters' motion as they pass the static film. The motion of the ride vehicles at each moment is what animates the scene. Disney had built an industry on moving film in front of people who sit still. In this illusion, the Disney team inverted the two, animating the audience while the projection stayed still. This form of holography is very expensive, but it makes sense at scale, as in an amusement park ride.

The other way is to give people holographic lenses to wear as glasses in a theater. Between those two extremes, the image-splitting film can sit anywhere from right in front of the viewer's eyes to progressively farther away. The real trick is to split the views between the two eyes so that the brain decides to merge them into a spatial volume. Recently, intermediate-distance screens for rendering stereoscopic views have been coming to the consumer market. With these you don't have to wear glasses, because a film on the surface of the display creates the split-image perspective. The 3D displays from Sony and Leia produce their optical illusions on the surface of a computer screen positioned within a yard of the viewer. They track where the viewer's head is positioned and then direct a different image toward each eye.
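Glasses-free displays like these must route a different image to each eye. Basic two-view parallax-barrier screens use a simplified version of the idea: they interleave the left and right views column by column, and a barrier or lens layer in front of the panel sends alternate columns to alternate eyes. The sketch below assumes NumPy image arrays of shape (height, width[, channels]). The `phase` parameter is a hypothetical stand-in for what a head-tracked display would adjust as the viewer moves. Sony's and Leia's actual pipelines are proprietary and more sophisticated than this, so treat it only as an illustration of the principle.

```python
import numpy as np


def interleave_columns(left: np.ndarray, right: np.ndarray, phase: int = 0) -> np.ndarray:
    """Combine two views into one frame by alternating pixel columns.

    With phase=0, even columns come from the left view and odd columns from the
    right view. phase=1 swaps them (a hypothetical head-tracking adjustment).
    """
    if left.shape != right.shape:
        raise ValueError("left and right views must have the same shape")
    if phase not in (0, 1):
        raise ValueError("phase must be 0 or 1")
    a, b = (left, right) if phase == 0 else (right, left)
    out = a.copy()             # even columns come from view `a`
    out[:, 1::2] = b[:, 1::2]  # odd columns come from view `b`
    return out
```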

I am fascinated by content marketplaces and developer/publisher ecosystems for the internet. My colleagues and I spend a lot of time talking about how content creators and software developers will address content availability and discoverability now that stereoscopic displays (VR and AR headsets, or 3D flat screens) are becoming more mainstream. The opportunity to deliver 3D movies to audiences primed to enjoy them is now large enough to merit developer effort to distribute software in this space. Buying and watching 3D movies at home is currently constrained by the cost of needing a 3D player, a 3D screen, and a 3D movie all at once. Simplifying content delivery over the web, or combining those three elements, would lower cost, broaden availability, and expand the addressable audience.

Hollywood has been making stereoscopic content for cinematic distribution for decades and will continue to do so. Those budgets are very large, and some of that investment is trickling down into architectures that will lower the overall cost for other content creators over time. For example, Pixar's 3D scene format, Universal Scene Description (USD), is now open source and can be leveraged by holographic content creators on mobile and desktop computers today. USD support is now built into Apple's ARKit on the iPhone. 3D holographic streaming on everybody's devices would therefore be within reach if a content distribution network were in place.
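USD has a human-readable text encoding, `.usda`. Here is a minimal layer in that format, generated with the Python standard library only. Pixar's own Python bindings (the `pxr` module) are the usual way to author USD programmatically. This sketch simply writes out text modeled on USD's introductory "hello world" tutorial, a transform containing a sphere. The prim names and the radius are illustrative.

```python
def minimal_usda(radius: float = 2.0) -> str:
    """Return a tiny USD scene: an Xform prim named "hello" holding a Sphere."""
    # Braces are doubled because this is an f-string.
    return f"""#usda 1.0
(
    defaultPrim = "hello"
)

def Xform "hello"
{{
    def Sphere "world"
    {{
        double radius = {radius}
    }}
}}
"""


if __name__ == "__main__":
    # Saving this text to a file such as "hello.usda" should yield a scene
    # that USD-aware tools can open.
    print(minimal_usda())
```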

What can we do to bring 3D depth back to previously released media? I remember my father taking me to a 3D screening of Creature from the Black Lagoon, originally released in the 1950s. Each frame of the movie was slightly offset, with a blue halo on one side and a red halo on the other. We sat in the theater wearing “anaglyph 3D” glasses with red and blue lenses. Each colored lens filtered out one of the two offset images, so each eye received a slightly different view of the actors and landscapes on the flat screen, and our minds perceived depth. Making that movie required filming the actors stereoscopically with two cameras. That way it could be shown in theaters that provided audiences with either anaglyph 3D or polarized 3D glasses.
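The color-filter principle can be sketched in a few lines. Given two aligned RGB views as NumPy arrays, a simple red-cyan anaglyph takes the red channel from the left-eye view and the green and blue channels from the right-eye view. A red filter over the left eye and a cyan filter over the right eye then each pass mostly their own view. This is the crudest "color anaglyph" recipe. Older red-blue systems and modern optimized methods (such as Dubois's) mix channels differently, and this sketch is not a reconstruction of how any particular film was printed.

```python
import numpy as np


def red_cyan_anaglyph(left_rgb: np.ndarray, right_rgb: np.ndarray) -> np.ndarray:
    """Merge a stereo pair into one red-cyan anaglyph image.

    Both inputs are (height, width, 3) RGB arrays of the same shape.
    """
    if left_rgb.shape != right_rgb.shape or left_rgb.shape[-1] != 3:
        raise ValueError("expected two RGB images of identical shape")
    out = np.empty_like(left_rgb)
    out[..., 0] = left_rgb[..., 0]     # red: passes the left (red) lens
    out[..., 1:] = right_rgb[..., 1:]  # green + blue (cyan): passes the right lens
    return out
```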

Today, however, radiance field photography can create two views that differ slightly from each other, offset by roughly the 2.5-inch interpupillary distance between human eyes. (Read more about radiance fields and the formation of novel views here.) Neural radiance fields allow a scene that is otherwise a still photograph to be wiggled from side to side, generating a slight sense of depth. An alternative approach synthesizes the interpupillary offset on the local device's graphics processing unit (GPU). It isolates the entities in the interpreted image (subject versus background, for instance) and shifts them. AI rendering capabilities could enable us to synthesize depth into older movies that were not shot with stereoscopic cinematography.
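The entity-isolation idea is close to a classic technique known as depth-image-based rendering (DIBR). Given one image and a per-pixel depth estimate, each pixel is shifted horizontally by half its stereo disparity to form a left-eye or right-eye view. For an old film, the depth estimate would have to come from a depth-estimation model. The sketch below assumes the depth map is already available as a NumPy array in metres. It uses a simple z-buffer so that nearer pixels win when two land on the same spot, and it leaves the disoccluded gaps ("holes") unfilled. Filling them plausibly is where AI inpainting would come in. The function and parameter names are illustrative.

```python
import numpy as np


def synthesize_view(image, depth, focal_px, baseline_m, eye):
    """Forward-warp `image` to a virtual eye using a per-pixel depth map.

    image: (H, W) or (H, W, C) array. depth: (H, W) array of positive distances.
    eye: -1 for the left eye, +1 for the right eye (each sits baseline_m / 2
    from the original camera). Returns (view, holes), where holes marks pixels
    that no source pixel landed on.
    """
    if eye not in (-1, 1):
        raise ValueError("eye must be -1 (left) or +1 (right)")
    depth = np.asarray(depth, dtype=float)
    if depth.shape != image.shape[:2] or (depth <= 0).any():
        raise ValueError("depth must match the image size and be positive")
    h, w = depth.shape
    disparity = focal_px * baseline_m / depth          # nearer -> larger shift
    shift = np.rint(-eye * disparity / 2).astype(int)  # left eye: content moves right
    view = np.zeros_like(image)
    zbuf = np.full((h, w), np.inf)                     # depth of the pixel written so far
    for y in range(h):
        for x in range(w):
            xt = x + shift[y, x]
            if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:
                zbuf[y, xt] = depth[y, x]
                view[y, xt] = image[y, x]
    holes = np.isinf(zbuf)  # disocclusions: nothing was warped here
    return view, holes
```

The double loop keeps the z-buffer logic explicit. A production version would vectorize it or run it on the GPU, as the paragraph above suggests.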

Depth simulation in the gaming sector captivated my colleagues a few years back, when we discussed using a game engine to re-render a legacy video game on a PC with a synthesized depth perspective. (See Luke Ross's R.E.A.L. VR mods to understand how this effect is achieved.) The modification is a relatively simple trick: in a VR headset, it alternates very quickly between the perspectives of the two eyes. The game developer has already coded the spatial layout of the game environment. During play, the positions of the player and nearby objects are computed from the player's input, and the scene is sent to the display as a flat image. The game works just fine if two cameras are inserted into the rendering instead of the single camera in the original design. This allows a whole trove of legacy 2D games to be experienced anew, in a different way than players first enjoyed them. Currently only a very small subset of PC gamers is opting for this intermediary re-rendering layer to enjoy flat-screen games more deeply in VR. The "Flat2VR" community has a Discord channel dedicated to modding old games this way, with some developers offering gesture mapping to replace the original game controller buttons. The magic of depth perception in modded PCVR games happens on the fly on the player's computer, where the game engine renders two scenes for the VR headset's two eye displays. It requires an interpreting intermediary layer that players have to install themselves.
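The "two cameras instead of one" idea can be sketched as pure geometry. Take the game's single camera pose and derive two eye cameras, each offset by half the interpupillary distance along the camera's right-hand axis. The sketch assumes a right-handed coordinate system, parallel (not toed-in) eye cameras, and an illustrative 63 mm IPD. It says nothing about how any specific mod hooks into a game's renderer, which is engine-specific.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("zero-length vector")
    return (v[0] / n, v[1] / n, v[2] / n)


@dataclass(frozen=True)
class Camera:
    position: Vec3
    forward: Vec3  # viewing direction
    up: Vec3


def stereo_pair(cam: Camera, ipd_m: float = 0.063) -> tuple[Camera, Camera]:
    """Split one camera into left/right eye cameras with parallel view axes."""
    right_axis = _normalize(_cross(cam.forward, cam.up))
    half = ipd_m / 2
    left_pos = tuple(p - half * r for p, r in zip(cam.position, right_axis))
    right_pos = tuple(p + half * r for p, r in zip(cam.position, right_axis))
    # Both eyes keep the original orientation; only their positions differ.
    return Camera(left_pos, cam.forward, cam.up), Camera(right_pos, cam.forward, cam.up)
```

Rendering the scene once from each returned camera, instead of once from the original, produces the two views a headset needs. Rendering only one eye per frame, alternating between them, is a cheaper variant.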

In theory, the intermediary, interpretive layer that inflates the flat scene so it can be depth-rendered dynamically on the player's screen is not exceedingly complex. But it does require extra work from the GPU, which must render two outputs for the player's screens. A mobile device or an ordinary game console couldn't easily manage a similar feat. A set-top box, however, could in theory add depth on the fly to other types of media beyond games, just as the game engine approach does for old games. The same alternating-eye method used in Flat2VR game modding could introduce a simulated sense of volume into the background of a movie, even without a 3D game engine.
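For video, a set-top box using the alternating-eye approach would essentially emit a frame-sequential stream. For each source frame it shows the left view and then the right view, while active-shutter glasses or a headset route each slot to the matching eye. The sketch below only computes that schedule. It assumes that some depth-rendering step has already synthesized the left/right pair for each frame. The generator name and the "L"/"R" labels are illustrative.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def frame_sequential(
    pairs: Iterable[Tuple[Frame, Frame]], source_fps: float
) -> Iterator[Tuple[float, str, Frame]]:
    """Yield (display_time_s, eye, frame) at twice the source frame rate."""
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    slot = 1.0 / (2.0 * source_fps)  # each eye gets half of every source frame period
    for i, (left, right) in enumerate(pairs):
        t = i / source_fps
        yield t, "L", left
        yield t + slot, "R", right
```

A 24 fps film therefore needs the display to switch eyes at least 48 times per second. Practical systems often flash each view more than once per frame to reduce flicker.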

Apple has introduced a capability that lets individual users infer spatial depth in 2D photographs they took in the past. I enjoy viewing my photographs afresh, with interpreted 3D depth, in my Vision Pro, even though the pictures were taken in 2D 20 years ago. These aren't holographic renderings, because the original depth information wasn't captured in the pixels. The rendered image is a re-synthesized estimate of what the original scene looked like. The depth is added on the device by a machine learning model trained on thousands of other photographs to distinguish subjects from background elements. Apple didn't need to receive a copy of the picture (in this case, of my brother standing in front of St. Mark's Cathedral in Venice) to know how far in front of other people my brother was standing at the time. Because his feet appear on the flat ground, the machine learning algorithm could extrapolate his position spatially. It assumes the ground is flat and that he was closer to me than the other people, who appear smaller in the flat image. These photos are generally more fascinating to stare at than the originals, because they lure the eyes to focus at different depths in the background instead of resting on the initial subject.
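The two cues described here have simple geometric forms under a pinhole camera model: feet touching a flat ground plane, and people appearing smaller with distance. The sketch below assumes a level camera at a known height above flat ground and a known horizon row. For the size cue, it also assumes a real-world body height. All numbers are illustrative. This is the classical geometry behind the intuition, not Apple's method, which relies on a learned model rather than explicit formulas like these.

```python
def depth_from_ground_contact(v_feet: float, v_horizon: float,
                              focal_px: float, camera_height_m: float) -> float:
    """Distance to a point on flat ground seen at image row v_feet.

    Assumes a level pinhole camera, rows increasing downward, and flat ground,
    so points lower in the frame (further below the horizon) are closer.
    """
    rows_below_horizon = v_feet - v_horizon
    if rows_below_horizon <= 0:
        raise ValueError("ground contact must appear below the horizon")
    return focal_px * camera_height_m / rows_below_horizon


def depth_from_apparent_height(pixel_height: float, focal_px: float,
                               assumed_height_m: float = 1.7) -> float:
    """Distance to a person of an assumed real height who spans pixel_height rows."""
    if pixel_height <= 0:
        raise ValueError("pixel height must be positive")
    return focal_px * assumed_height_m / pixel_height


if __name__ == "__main__":
    # Hypothetical shot: f = 1000 px, camera 1.6 m above ground, horizon at row 500.
    print(depth_from_ground_contact(660, 500, 1000, 1.6))  # subject's feet
    print(depth_from_ground_contact(580, 500, 1000, 1.6))  # feet of someone farther back
```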

Machine learning depth inference could achieve the same effect for older 2D movies from the last century, using the same means that estimate depth in stills. A treasure trove of 3D movies could then await at the other end of our streaming services, if they could be re-rendered for broadcast by a depth-analysis layer. That layer could be applied either to the stream itself or prior to streaming, much like the real-time Flat2VR depth-rendering display mods.
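Depth-rendered back catalogs would need to travel through ordinary video pipelines, which carry one image per frame. A common frame-compatible approach is half-width side-by-side packing. Each eye's view is squeezed to half width, and the two are placed next to each other in a single frame of the original size, which a 3D-capable display then unpacks. The sketch below assumes that some depth-analysis layer has already produced the left and right views. It uses plain averaging of column pairs as a stand-in for a proper resampling filter.

```python
import numpy as np


def _half_width(img: np.ndarray) -> np.ndarray:
    """Halve the width by averaging each pair of adjacent columns."""
    f = img.astype(np.float64)
    squeezed = (f[:, 0::2] + f[:, 1::2]) / 2.0
    return np.rint(squeezed).astype(img.dtype)


def pack_side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Pack a stereo pair into one frame the same size as either input."""
    if left.shape != right.shape:
        raise ValueError("views must have the same shape")
    if left.shape[1] % 2:
        raise ValueError("frame width must be even")
    # Left-eye half on the left, right-eye half on the right.
    return np.concatenate([_half_width(left), _half_width(right)], axis=1)
```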

Our industry might move forward by re-rendering back-catalog 2D movies with machine learning on a central server for mass consumption, leveraging a new distribution method. Or it might depth-render all content already delivered to the home dynamically, in a set-top box. Which path wins remains to be demonstrated and tested in the marketplace. In a market constrained by the adoption of HMDs and stereoscopic flat screens, the latter approach makes more sense. Yet the falling cost of manufacturing these new screens may make it worthwhile for studios to depth-render an entire back catalog of films for new dedicated 3D streaming channels across a wide range of stereoscopic displays. This mode of development and distribution could parallel what is already happening in the Flat2VR gaming community. In just the two years I've been following the sector, I've seen several major game studios re-release their own titles as dedicated VR games. Sometimes they add almost nothing to the original game beyond depth, yet they unlock a new audience for the old titles just by doing so. However, the gaming market moves much faster than mainstream TV media, because gamers have already bought into the expensive GPUs and HMDs that make 3D rendering a minor additional expense. My estimate is that it will take only a decade for 3D viewing of all legacy 2D content to become an obvious and expected path for the consumer market. In the near future we may see legacy 2D movies as a diminished experience and want our classic films embellished with depth, the way black-and-white movies are now being released in colorized versions on web streaming channels.

Having a movie studio re-render depth into all legacy content is expensive, but at scale it makes more sense than having every consumer buy a set-top converter to achieve the same end. It's a question of how many people want to benefit from the enhanced means of viewing, and who should pay the cost of the conversion. We have 2D movies now because they were the easiest and cheapest way to achieve mass media entertainment at scale at the time. Now, as 3D displays reach broader distribution, we are approaching the moment when we can re-inflate 3D into our legacy creations to make them more similar to the world we inhabit.