Monday, December 23, 2024

Reflections on the evolution toward volumetric photography

During college I read Stephen Jay Gould’s books on natural history with fascination. He writes extensively on the many different paths distinct species took to develop eyes through successive enhancements in different branches of the tree of life. Eye designs were selected for the survival benefits they conveyed, complexity accruing over the simpler traits of predecessors as biological competition intensified over time. In science workshops as a youth, I built pinhole cameras simulating the pupil of the eye and enjoyed taking apart old cameras to study how their shutters worked. When planes landed, I’d notice inverted images of the ground projected on the ceiling of the cabin, like retinal images, through the pupil-like windows. I’d ponder General Relativity, the cosmic limit of light’s speed, and its implications for the nature of the cosmos and its origins.

With this obsessive fascination with light and seeing, it makes sense that I’ve been an avid photographer for decades. I always look out for the newest tools to capture and save light reflections of the places I travel. Now that stereoscopic and 360 cameras are coming into the mainstream, my hobby as a light collector has had to shift accordingly. My 2D photos can be conveyed well in my photo books. However, these newer forms of photographic media need to be shared and enjoyed with different means of re-reflection, depending on the nature of the scene captured. Photospheres from 360 cameras are best experienced as spherical projections, like a planetarium, which can only be seen in a head-mounted display (HMD). Stereoscopic depth images can be viewed on both 2D and 3D flat screens, but are also better appreciated in HMDs. Today the cameras that capture these new scenes outnumber the HMDs sold into the consumer market, so the audience for these 3D scene captures is largely limited to those who craft them and share them peer-to-peer.

This will shift as more companies popularize their own form factors for head-worn displays and glasses, whether pass-through visors offering augmented views of the normal world (Google Glass, Snap Spectacles, Magic Leap, Xreal) or dual-purpose devices with both AR and VR modes (Pico, Quest, Vive, Varjo, Vuzix, Vision Pro, Moohan, etc.). With such a plurality of devices for people to choose from, there clearly needs to be a common platform for photographers and motion-capture enthusiasts to distribute their media. App-based content distribution is too limited to address the full breadth of device form factors on the market, though apps currently bridge us toward a future of unfettered browser support. Web-based utilities are necessary to facilitate ubiquitous access across the panoply of devices. This is what makes me very excited about the cross-device web standard WebXR, which is now supported across all major web browsers.
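To make that more concrete, here is a minimal sketch of how a web page can ask the browser whether an immersive WebXR session is available and fall back to an ordinary flat view when it isn’t. This is my own illustration, not code from any particular gallery; the two gallery-rendering functions are placeholders invented for the example.

```typescript
// Minimal WebXR capability check: offer an immersive view when the browser
// supports it, otherwise fall back to an ordinary 2D page.
// Sketch only; renderFlatGallery and enterImmersiveGallery are placeholders.

async function chooseViewingMode(): Promise<void> {
  const xr = (navigator as any).xr; // WebXR Device API, if the browser exposes it
  if (!xr) {
    renderFlatGallery();            // no WebXR at all: show plain 2D photos
    return;
  }
  // Ask whether a head-mounted "immersive-vr" session can be started.
  const immersiveOk: boolean = await xr.isSessionSupported("immersive-vr");
  if (immersiveOk) {
    // requestSession must be triggered by a user gesture (e.g. a button click).
    const session = await xr.requestSession("immersive-vr");
    enterImmersiveGallery(session); // hand the session to the 3D renderer
  } else {
    renderFlatGallery();            // phone or laptop: show the same scene flat
  }
}

// Placeholder implementations so the sketch stands alone.
function renderFlatGallery(): void { console.log("2D gallery"); }
function enterImmersiveGallery(session: unknown): void { console.log("XR session", session); }

// In practice, wire chooseViewingMode() to an "Enter VR" button.
```

The same page works on a phone, a laptop, or an HMD browser, which is exactly the cross-device promise that per-platform apps struggle to match.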

WebXR hosting and streaming will be the easiest path for content creators and photographers to give access to their captures and artistry. Affordable and comfortable HMDs or flat-screen stereoscopic displays (made by Leia or Sony, for instance) will gradually scale up to serve mass audiences who today can only experience this media on a borrowed or shared device. With Apple announcing official support for WebXR in Safari last year, and Google’s recent announcement of renewed investment in 3D rendering with the Android XR operating system for HMDs, spatial depth pictures and 3D movies will soon be broadly accessible on affordable hardware. Many people may not particularly want to see photospheres of Italian architecture or watch action movies like Avatar streamed in 3D depth, because they are susceptible to motion sickness or vertigo. But for those who do, cost of access will soon stop being the limiting factor it has been over the past decade. Price points for HMDs and 3D flat screens will fall due to increased manufacturing scale, lower component costs, and increased competition. Now that the hardware side of access is starting to become affordable, the content side is a new opportunity space that will grow in the coming years for novice creators and photographers like me.

Last year marked my first leap into volumetric photography and videography. I’d read about it for years, but due to a lack of good public sharing platforms, hadn’t gotten around to sharing anything of my own. There are a few compelling applications in HMD app marketplaces for peer sharing of stereoscopic video streams and 360 and wide-angle photographs. But what I’m really excited about now is volumetric scanning of public scenes. This is yet another spin on photography that allows a feeling of presence in the space photographed, because it captures a distributed perspective on the light field of the scene, meaning different angles of reflection for every point inside the rendered volume. These aren’t captured by one shutter click at one moment in time, but from multiple perspectives over a span of time, distilled into a static scene that can be navigated after capture. In volumetric captures, you can walk through the space as if you were there in the original scene. Your camera display, or your avatar in an HMD, shows how the point cloud looks from whatever position you navigate to within the still spatial image.

This kind of volumetric capture was formerly only accessible to professionals who used lidar (laser reflection) cameras to generate archeological, geological or municipal landscape scans from moving cars, satellites, drones, or large camera rigs. This approach was used to create Nokia Here Maps (now branded Here) and Keyhole (now branded Google Earth), as well as satellite maps of Earth’s crust that reveal archeological and geological formations, such as this scan of the Yucatán peninsula, which revealed the stone structures of a previously undiscovered Maya city hidden beneath the forest canopy.

Lidar scan of Maya city in Campeche region (courtesy BBC)

Years ago I had my first experience of a simulated archeological site when my brother-in-law gave me a tour of a reconstructed dig site in a stereoscopic simulation. Thereafter, I sought out the founder of the Zamani Project, a team of archeologists who scan ancient cultural heritage sites in Africa and the Middle East to preserve their structures for remote study. Would it be possible to make these structures discoverable on the web, I wondered? I had explored the ancient cave city of Petra by foot and wandered the vast cities of Chichén Itzá and Uxmal in the Yucatán on my photographic expeditions. I wanted to go back to them virtually. But there isn’t currently a means to do this; online map versions of these amazing sculptural sites are rendered as flat photographs. Then I learned more about aerial photogrammetry. In contrast to the lidar scans of the Zamani Project, photogrammetry allows a rough topography to be captured and synthesized in software such as RealityCapture, and it is much less expensive than lidar scanning. Upon meeting drone photogrammetry experts, I tried to price out a project to put Chichén Itzá on the web map in a way that would let people at home experience these archeological sites remotely. Cost wasn’t a particularly high barrier, I learned; even I could afford it as a civilian enthusiast. The challenging part was obtaining authorization from the local government to pilot drones over an archeological site and to republish the resulting scans.

I’d first discovered photogrammetry when Microsoft announced its Photosynth service over a decade ago. The same way our eyes assemble our comprehension of 3D depth from the optical parallax of two blended perspectives, Photosynth would assemble hundreds of photos taken from different perspectives to infer the geometry of large spatial objects. Though stitching photos into a 3D simulation can be expensive in compute time, the computing power of today’s cell phones means the processing can be offloaded to the image-capturing phone instead of relying on more expensive centralized rendering on a server. Over the past couple of years I followed the emergence of neural radiance field (NeRF) research, which re-invigorated the effort to make 3D map views accessible to the public. (Read about the photo-stitching process demonstrated in the Block-NeRF research from UC Berkeley and the Alphabet company Waymo, which assembled millions of photos taken from self-driving cars to create a navigable map of the city formed only from stills taken by the onboard cameras. Note also that moving objects and people’s ephemeral presence are abstracted out of the still images over time, as their transience is removed from the underlying point cloud of the city. Privacy protection is therefore a useful side effect of scene capture over time.)
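As a toy illustration of that parallax principle (my own sketch, not how Photosynth or Block-NeRF actually work internally), the depth of a point seen by two rectified cameras follows directly from how far it appears to shift between the two views:

```typescript
// Toy parallax triangulation: for a rectified pair of cameras, a point that
// shifts by disparityPx pixels between the two views sits at depth
// Z = focalLengthPx * baselineM / disparityPx.
// The numbers below are made up, purely to show the arithmetic.

function depthFromDisparity(
  focalLengthPx: number, // focal length expressed in pixels
  baselineM: number,     // distance between the two camera centres, in metres
  disparityPx: number    // horizontal shift of the same point between views
): number {
  if (disparityPx <= 0) return Infinity; // no shift => point effectively at infinity
  return (focalLengthPx * baselineM) / disparityPx;
}

// Example: ~65 mm eye spacing, 1000 px focal length, 10 px of parallax
console.log(depthFromDisparity(1000, 0.065, 10)); // ≈ 6.5 metres away
```

Photogrammetry tools generalize this two-view triangulation to hundreds of overlapping photographs taken from arbitrary positions, solving for camera poses and scene geometry together.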

Last year I took online classes for photographers to learn how to do photogrammetry scans and self-publish them. These “holographic” images are called Gaussian Splats, the newest approach to photogrammetry for assembling sculptural depictions of 3D landscapes. The teacher of my course was Mark Jeffcock, a photogrammetry specialist based in the UK who captures amazing landscapes and architecture using apps that allow Gaussian Splat images to be exported and uploaded for web access. (See his amazing capture of the sculpture Knife Angel, hosted on Arrival Space.)

Knife Angel by Mark Jeffcock

Though spatial photos and videos using parallel cameras are just getting started in the mainstream, I anticipate this new standard of volumetric capture will keep pace with the others as an engaging way for VR HMD users to share real-world settings with each other. The question is how to make these volume captures broadly accessible. Both Arrival Space and Niantic are jumping at the opportunity to offer peer-to-peer hosting of Gaussian Splats. (There are hints that Varjo and Meta may eventually introduce peer-to-peer sharing of Gaussian Splats as well, though probably only within an app-contained, logged-in-account context.)

If you are keen to explore some Gaussian Splats in your own HMD or 2D browser, I encourage you to visit Arrival Space to see some of the community scans shared by the 3D photographer community in the galleries there. Though I am just a beginner at volumetric captures, you can start off in my gallery to see how I tell stories with the scans I make. Creating Gaussian Splat scene renders takes a lot of time, as areas of the scene will appear blurry wherever the photographer hasn’t dwelt long enough on that portion of the scene. I still remember during my classes last year when Mark said, "This is a really good capture. But you forgot to point the camera at the ground." Because of this, our class was standing in a Gaussian Splat capture that had nothing to stand on. Becoming a true 3D photographer means that we have to think like cinematographers, capturing how a scene draws the eye over time. Or we need to use a camera lens broad enough to capture photons from angles we typically ignore when we frame a photo. (Fish-eye splat capture is currently being researched by a company called Splatica, which takes full fish-eye movie footage of a scene to render the 3D still capture.) I anticipate that in the coming year dozens of new cameras and hosting platforms will emerge to address peer sharing of amateur photography. If this media captures your imagination, this is a perfect time to become one of those photographers. If you have the ambition to try your own hand at creating Gaussian Splats, start off with Scaniverse, which allows easy exporting of the ply/spz 3D file types needed to upload to your own personal galleries on Arrival Space. I encourage you to get out and explore aspects of your environment and culture that you can share with others across the internet, now that WebXR makes it possible for us to share spaces with those far away.
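If you want to sanity-check a splat export before uploading it, the header of a .ply file is plain text, so a short script can report how many points it holds and which attributes each point carries. This is just a sketch of my own (the filename is a placeholder), not part of any official Scaniverse or Arrival Space tooling.

```typescript
// Peek at the header of an exported .ply file (Node.js sketch).
// PLY headers are ASCII and end at the line "end_header", so we can read the
// point count and per-point properties without any 3D library.
import { readFileSync } from "node:fs";

function inspectPlyHeader(path: string): void {
  const bytes = readFileSync(path);
  const text = bytes.toString("latin1");          // header portion is plain ASCII
  const end = text.indexOf("end_header");
  if (!text.startsWith("ply") || end === -1) {
    throw new Error("Not a PLY file");
  }
  const header = text.slice(0, end).split("\n").map((line) => line.trim());
  const vertexLine = header.find((line) => line.startsWith("element vertex"));
  const properties = header.filter((line) => line.startsWith("property"));
  console.log(vertexLine ?? "no vertex element found"); // e.g. "element vertex 1830421"
  console.log(`${properties.length} properties per point`);
}

inspectPlyHeader("my_scaniverse_export.ply"); // placeholder filename
```

A capture with very few points, or with large gaps in coverage, is usually one worth re-scanning before you publish it.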

Note that if you are seeking to explore my travel scans in 3D spatial depth, you'll need to open the gallery URL in a 3D viewer or an HMD browser. You'll have a default avatar to explore these spaces and talk to other people; you can change that with a login. Get started at https://arrival.space/ with your own personalized URL.

For in-depth reading on Gaussian Splat capture, see more from the New York Times R&D team.
