Tuesday, December 24, 2024

What does it want to say? (An approach for chasing bugs)

Long ago when I was studying French, I came across an idiomatic way of asking about the meaning of a word or thing. If you want a French speaker to help you define a word, you can ask, "What does it want to say?" (Qu'est-ce que ça veut dire?) I liked that the word is personified as a willful entity in this question's framing: the word has a desire or a will that needs to be considered. Over time, I came to think of technology problems in this way. If a computer wasn't functioning properly, I'd frame it as "What is the computer wanting to say or do? How am I seeing it try to do that? And what's the result?" If an app or computer process ever behaves strangely, walking through the steps of how the pieces communicate will sometimes help you troubleshoot. By narrowing down the steps in their way of reasoning and acting, you can often isolate the problem and help the device achieve its goal, which is a proxy for your goal in using the device.

If your phone or computer wants to find an internet connection, is there one? Is it a LAN cable, Wi-Fi, LTE, Edge, 3G, 5G? (Each differs slightly in how it connects and in how much data it can transmit, after all.) If your device "wants" to use that internet connection to reach a website and refresh its content, what happens, or doesn't happen, when that request to the server returns a response? Gradually, from the keyboard to the screen to the operating system to the network to the connected service provider, you can observe each step in this flow. When I worked in Tokyo providing search engine services to internet portals, my clients would sometimes call me on my cell phone asking me to diagnose an issue. I could run a lookup query or a traceroute through the network to test their servers' response and connection to my company's servers. It's like the game of Pooh sticks from the Winnie the Pooh stories. If you drop a stick on one side of a bridge, you can watch it come out the other side, and have races with your friends, or bears, to see whose sticks come through the fastest. Device-level troubleshooting is the same process. Who is saying what, to whom, and how? Is your email stuck in your outbox, for instance?
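As a concrete illustration of "listening" to the device, here is a minimal sketch in Swift, using Apple's Network framework, that reports what kind of connection the machine currently has and whether it supports IPv4 and IPv6. The framework calls are real, but the little program itself is only an illustrative diagnostic, not something from the troubleshooting session described below.

```swift
import Network
import Foundation

// Watch the system's current network path and describe it.
let monitor = NWPathMonitor()
monitor.pathUpdateHandler = { path in
    print("Connection status: \(path.status)")          // .satisfied means we have a route out
    print("Wi-Fi:     \(path.usesInterfaceType(.wifi))")
    print("Ethernet:  \(path.usesInterfaceType(.wiredEthernet))")
    print("Cellular:  \(path.usesInterfaceType(.cellular))")
    print("IPv4 supported: \(path.supportsIPv4)")
    print("IPv6 supported: \(path.supportsIPv6)")
}
monitor.start(queue: DispatchQueue(label: "path-monitor"))

// Keep the command-line process alive long enough to receive an update.
RunLoop.main.run(until: Date().addingTimeInterval(5))
```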

I recently had to troubleshoot a perplexing issue. My father so enjoyed the process of going through it with me that he asked me to write about it; he could find no web documentation of the issue while he was experiencing it. My father worked as a systems engineer for many years at IBM and wrote programs in dozens of languages, from early mainframe computers to modern-day Macs. He knows pretty much every trick a Mac can do and has followed computer web forums for years to expand his understanding of how the Apple operating systems shifted from before Mac OS X (System 10), through the PowerPC and Intel chip phases, to the modern "Apple Silicon" chips. So his reaching out to me about a technical bug is a rare thing; it was usually the other way around. But I was able to narrow the observable symptoms down to several potential root causes until finally identifying the bug as a network issue rather than an operating system or hardware issue. In case this issue affects your home computer or network, or if you're just curious about the steps involved, here is how we sorted it out.

My father's computer had a peculiar symptom that he'd never seen before in 30+ years of working on Macs, and that I'd never seen either. From the inside perspective, the symptoms were that he couldn't access bank websites on his new Mac devices. He couldn't do a phone-home system re-install from Apple's servers, nor contact Apple support from the machine's integrated communication system. The symptom didn't happen on his wife's older operating system on an Intel-chip Mac. So he narrowed his conclusion down to either his Apple Silicon Mac or the recently upgraded operating system as the potential source of the issue. From the outside perspective, his FaceTime availability disappeared for me, and I couldn't call him with Voice over IP tools. So I suspected something was wrong with his ID account, or potentially a compromised account. Because I'd read about phishing tactics in the press and how to avoid them, I started by triaging whether there was a potential malware issue with his machine. That didn't seem to be the case. Everything else on his Mac worked, except for those apps that required web resources to be fetched securely by an internal web-dependent function. He checked with his banks to ensure there was no suspicious activity in his accounts and no recent attempts to reroute or reset his bank logins. We established a secondary channel of communication and confirmed that his account phone numbers had not been redirected either. Once we sorted out that he wasn't at any particular immediate risk, we took to testing and ruling out other potential issues.

Cutting to the chase: ultimately, after ruling out as many factors as we could, we isolated the issue as an IPv6 problem. Over a decade earlier I had attended a lecture on how the web industry was transitioning to a new scheme for issuing IP addresses, in anticipation of the far broader range of devices expected to come online over the following decades. The IPv4 process for issuing IP addresses, by which devices identify themselves as unique entities across the web, was headed toward a scaling limit loosely akin to the Y2K issue at the turn of the 21st century. (You can read about that elsewhere; it's analogous but not directly related, as it had to do with date formats rather than device address space.) The newer IPv6 process for identifying devices over a network uses a much wider range of values than IPv4 addresses, meaning a lower risk of any two devices being confused with each other and creating network conflicts. This lecture was buried deep in my memory, but it was triggered by a comment someone had made about the IPv6 transition leading to better network security in the future. It was relevant here because my father's computer was somehow communicating in a way that wasn't being accepted by the banks, or by Apple itself. Could there be a difference in how Intel-chip Macs and the newer Apple Silicon Macs, with their respective operating systems, convey TLS (Transport Layer Security) traffic over the web? Sure enough, that was where the issue lay. We found that banks were accepting the IPv4 traffic from the older Macs, but the newer Macs and their respective OSes were trying to transmit over IPv6, and that traffic wasn't getting through his network. Once we configured his network to route the IPv6 traffic generated by the OS (and, we now suspect, expected by the banks), his computer, browsers and applications started functioning flawlessly again. You can read much more in depth about IPv4 and IPv6 elsewhere, but suffice it to say that there was nothing wrong with his Mac. It was the attempts to communicate secure, device-unique values over the network that were failing.
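To make the diagnosis concrete, here is a minimal sketch in Swift of the kind of test that distinguishes the two cases: it attempts a TCP connection to the same host twice, once forced to IPv4 and once forced to IPv6. The host name is just a placeholder, and this is an illustrative reconstruction rather than the exact tooling we used (which was mostly browsers and whatismyipaddress.com); on a network like my father's, the IPv6 attempt would stall while the IPv4 attempt succeeded.

```swift
import Network
import Foundation

// Try to open a TCP connection to a host, restricted to a single IP version.
func testConnection(host: String, ipVersion: NWProtocolIP.Options.Version) {
    let params = NWParameters.tcp
    if let ipOptions = params.defaultProtocolStack.internetProtocol as? NWProtocolIP.Options {
        ipOptions.version = ipVersion  // force .v4 or .v6
    }
    let connection = NWConnection(host: NWEndpoint.Host(host), port: 443, using: params)
    connection.stateUpdateHandler = { state in
        switch state {
        case .ready:
            print("\(ipVersion): connected to \(host)")
            connection.cancel()
        case .failed(let error):
            print("\(ipVersion): failed (\(error))")
        case .waiting(let error):
            print("\(ipVersion): waiting (\(error))")  // often where a broken IPv6 route shows up
        default:
            break
        }
    }
    connection.start(queue: .global())
}

// "example-bank.com" is a stand-in; substitute any site you need to reach.
testConnection(host: "example-bank.com", ipVersion: .v4)
testConnection(host: "example-bank.com", ipVersion: .v6)
RunLoop.main.run(until: Date().addingTimeInterval(10))
```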

I hope that you don't run into a network routing problem like the one he had. Bank customer service, Apple customer service and even your internet provider may not be familiar with your home computing or network setup. They also may have difficulty understanding what issues you face based on how you describe the problem. But tracking down how the symptoms present will help you communicate with them to resolve whatever challenges you face.

We delegate our processes to these device "agents" that act on the web on our behalf. Just like humans, they can get tripped up on the way to saying things, or the channel through which to express them. Like studying a foreign language, we can examine the terms our agents use to help them communicate for us more effectively. When their speech breaks down, we have only to examine the vocabulary and steps they use to get across their "meaning" and thereby return them to functioning eloquently on our behalf.

For more on IPv6 see: https://en.wikipedia.org/wiki/IPv6

Special thanks to the Las Vegas Consumer Electronics Show for offering the lecture on IPv4 vs IPv6 that set us on the right track in this particular case. For those who want to follow the troubleshooting steps we used, here are the important clues, the conclusions, and the route to isolating the problem behavior:

  • Important steps in the investigation were, first, the non-working FaceTime VoIP service and his device's inability to connect to Apple servers. Not only could he not reach Apple's network, Apple's network could not reach him either. It was a two-way problem across multiple applications, but non-sensitive web traffic was unhindered.
  • Testing the IP address configuration was the key to resolving it. My computer registered an IPv6 address when querying https://whatismyipaddress.com from outside his home. His computer registered an IPv4 address but not an IPv6 address. Then, when I tested my other computer in his home environment, it suffered the same issue as his. (I was using a more recent beta version of macOS than he was.) Replicating the bug with a different machine on a different version of the OS conclusively showed that the network was the gating factor in the problem.

The questions and steps in our exploration that narrowed down to the answer:

  • Had his IP addresses been flagged as a phishing or malware source, leading to banks blocking traffic? (Confirmed not.)
  • Operating system issue? Try a fresh install of a base operating system. (Not possible in his case, because Apple Silicon Macs don't allow booting from an external machine the way Intel-based Macs did with Target Disk Mode. Neither of his Macs could revert to his wife's OS version because of incompatibility between the Intel and Apple Silicon builds.)
  • Browser issue? He was accessing banks via several browsers and all failed, while regular, non-logged-in sites functioned fine. (This meant a browser-dependent issue wasn't causing the problem. But because secure sites were failing to load, including Apple's, it made me suspect some kind of problem with Transport Layer Security over TCP/IP, which is essentially where it ultimately turned out to be.)
  • Cable internet access restrictions? Because some cable internet providers offer parental controls, I suspected that Comcast might have inadvertently rolled out a traffic restriction to some accounts or to all customers in a region. Did any of his friends who used this provider complain of loss of access? (It was just him. Specifically, it was his newer Macs on the newest operating system release.)
  • We reset the DHCP settings of his computer, to no avail.
  • Finally, we bypassed his router, which resolved the issue. We decided to keep his router in service for non-sensitive traffic around the house, but not for sensitive or secure traffic.
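If you want to check the same thing on a Mac in your own house, a short sketch like the following (written in Swift for consistency with the other examples here) lists each network interface and whether it currently holds an IPv4 or IPv6 address. A machine behind a router that never hands out a usable IPv6 route will typically show only link-local IPv6 addresses (the ones beginning with fe80). This is an illustrative sketch, not a definitive diagnostic.

```swift
import Darwin
import Foundation

// Enumerate local network interfaces and print their IPv4/IPv6 addresses.
var ifaddrList: UnsafeMutablePointer<ifaddrs>? = nil
guard getifaddrs(&ifaddrList) == 0, let first = ifaddrList else {
    fatalError("getifaddrs failed")
}
defer { freeifaddrs(ifaddrList) }

var cursor: UnsafeMutablePointer<ifaddrs>? = first
while let entry = cursor {
    let ifa = entry.pointee
    if let sa = ifa.ifa_addr {
        let family = Int32(sa.pointee.sa_family)
        if family == AF_INET || family == AF_INET6 {
            var host = [CChar](repeating: 0, count: Int(NI_MAXHOST))
            if getnameinfo(sa, socklen_t(sa.pointee.sa_len),
                           &host, socklen_t(host.count),
                           nil, 0, NI_NUMERICHOST) == 0 {
                let name = String(cString: ifa.ifa_name)
                let label = (family == AF_INET6) ? "IPv6" : "IPv4"
                print("\(name)\t\(label)\t\(String(cString: host))")
            }
        }
    }
    cursor = ifa.ifa_next
}
```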

Sunday, October 15, 2023

Using VR comfortainment to bring an end to the US blood supply shortage

I completed my MBA during a fascinating time in our world economy. We'd endured a pandemic that shut down significant portions of the economy for nearly a year, followed by surging interest rates as government responses to the pandemic fed significant inflation and subsequent layoffs in my region. While this was a dramatic time for the world, it was a fascinating time to return to academia and evaluate the impacts of natural and artificial stimuli on the global economy.

For our master's thesis we were asked to identify an opportunity in the economy that could be addressed by a new business entrant. In discussions with several members of my MBA cohort, we decided to focus on the blood supply shortage that emerged at the end of the pandemic. Why, we wondered, would the US go into a blood crisis at the end of the pandemic? Shouldn't that have been expected during the peak of the pandemic in 2020 or 2021? It turns out that during the pandemic, surgeries and car crashes dropped at the same time that blood intake to the supply dropped. It was only after the pandemic ended that supply and demand got out of sync. In 2022 people started going to hospitals again (and getting injured at normal rates) while the blood donor pool had significantly shrunk and not recovered its pre-pandemic rate of participation. So hospitals were running out of blood. What's more concerning is that the drop in donor participation doesn't look like a short-term aberration. Something needs to shift in the post-pandemic world to return the US to a stable blood supply. This was a fascinating subject for study.

As we began our studies, we interviewed staff at blood banks and combed through the press to understand what was taking place. There were several key factors in the drop-off of donors. Long Covid had affected roughly 6% of the US population, potentially reducing willingness to donate among individuals who'd participated before. (Even though blood banks accept donations from donors who have recovered from Covid, the feeling that one's health is not at full capacity affects the sentiment one has about passing on blood to another.) At the same time, there was a gradual attrition of the baby boomer generation from the donor pool, while younger donors were not replacing them due to generational cultural differences. Finally, the hybrid-work model companies adopted post-pandemic meant that bloodmobile drives at companies, schools and large organizations could no longer draw the turnout they had formerly received at those locations.

The donation pool we’ve relied on for decades requires several things. So we tried to identify those aspects that were in the control of the blood banks directly:

  • First, an all-volunteer, unpaid donor pool requires a large number of people in the US (~7 million) willing to help out of their own internal motivations and having ample time to do so. Changing people's attitudes toward volunteerism and blood donation is hard, and the marketing efforts needed to achieve it are expensive. In an era when more people are having to work multiple jobs, the flexibility to volunteer extra time is becoming constrained. Time scarcity among would-be donors is likely to keep worsening in contrast to pre-pandemic times.
  • Second, there needs to be elasticity in the eligible donor pool to substitute for ill would-be donors in times of peak demand. Fortunately, this year the FDA has started expanding eligibility criteria in reaction to the blood crisis, permitting people who were previously restricted from donating to participate now. However, this policy matter is outside the control of blood banks themselves. Blood demand is seasonal, peaking in winter and summer. But donor behavior is fairly constant, and donors are difficult to entice when need spikes, due to their own seasonal illnesses or summer travel plans.
  • Third, and somewhat within the control of blood banks, is in-clinic engagement and behavior. Phlebotomists can try to persuade donors to upgrade their donation time during admission and pre-screening. This window of time, when an existing donor is sitting in the clinic, is the best time to promote persistent return behaviors. Improving how this is done is the best immediate lever for bolstering the donor pool toward a resilient blood supply. But should we saddle our phlebotomists with the task of marketing and up-selling donor engagement?

Considering that there is no near-term solution to the population problem of the donor pool, we need to do something to bolster and expand the engagement of the remaining donors we have. In our studies we came across several interesting references. "If only one more percent of all Americans would give blood, blood shortages would disappear for the foreseeable future." (Source: Community Blood Center) This seems small. But currently approximately 6.8 million Americans donate blood, less than 3% of the population, so it's easy to see how a few million more donors would assuage the problem. But the education and marketing needed to achieve this would be incredibly expensive, slow and arduous; it's hard to change that many minds in a short time frame. Yet this comment from the same source gave us an avenue to progress with optimism: "If all blood donors gave three times a year, blood shortages would be a rare event. The current average is about two." We agreed that this seemed like a much more achievable marketing strategy. In our team calls, Roy Tomizawa commented that we needed to find something that makes people want to be in the clinic environment beyond their existing personal motivations for helping others. He suggested the concept of "comfortainment" as a strategy, whereby people could combine their interest in movie or TV content with the time they'd sit still in the clinic for blood donation, dialysis or other medical care. If we were to transform the clinic from its bright, fluorescent-lit environment into a calm, relaxing space, more people might wish to spend more time there.

As a life-long donor, I've heard a lot of in-clinic promotions to increase the frequency of donation. But during intake so many things are happening: 1) FDA screening questions, 2) temperature check, 3) blood pressure measurement, 4) hemoglobin/iron test, 5) verbal confirmation of no smoking or vaping. This battery of activity is an awkward time for phlebotomists to insert promotional campaigns about increasing engagement. One day I noticed some donors were doing something different in the blood bank and I asked about it. That's when I learned how the blood apheresis process differs from whole blood donation. It involves a centrifuge device that collects more of a specific blood component at the time of draw from a single donor, returning the rest of the blood to the donor. Not only does this yield multiple individual units of blood product per draw, but the recovery time between donations is also shorter. Whole blood donations require about two months for the donor to replenish their blood naturally before donating whole blood again. Apheresis donors lose less of their overall blood and can therefore return more often. The only downside is that it requires more time from the donor in the clinic.

Because apheresis was the most flexible variable that blood banks could adjust as demand and supply waxed and waned, our study zeroed in on optimizing this particular lever of supply to address the blood shortage. In a single blood draw via apheresis, a donor can provide three units of platelets, far more than comes from a single whole blood draw. This allows the blood bank to supply three units to hospitals immediately after the draw instead of having to centrifuge pooled units of whole blood from multiple donors after donation. Platelets are particularly needed for certain hospital patients, such as cancer patients or those with blood clotting disorders. As for other blood components, an apheresis draw can provide two times the red blood cells that would otherwise be donated as whole blood. At the same time that a donor is providing platelets, they may also provide plasma in the same draw, which carries antibodies from healthy donors that can help patients with weakened immune systems.

Hearing all this, you might think that everybody should be donating via apheresis. But the problem is the extra time needed, at least an additional hour of donor time. A donor planning on just a 15-minute blood draw may be reluctant to remain in apheresis for one to two hours, even if it triples or quadruples the benefit of their donation. Though this is one factor that can be immediately adjusted based on local hospital demand, asking donors to make the trade-off for the increased benefit can be a hard sell.

When I first tried apheresis, I didn't enjoy it very much. But that's because I don't like lying down and staring at fluorescent lights for long periods of time. Lying on the gurney for 15 minutes is easy and bearable. Having phlebotomists try to persuade hundreds of people to change their donations to something much more inconvenient is a difficult challenge. Some blood banks offer post-donation coupons for movies or discounts on food and shopping to promote apheresis donations. My team wondered if we could bring the movies into the clinic the way that airlines had introduced movies to assuage the hours of impatience people feel sitting on flights. Having people earn two hours of cinema time after donation by sitting still for two hours in the clinic raises the question of why you couldn't combine the two. Donors could watch IMAX-scale films at the clinic when they'd planned to be immobile anyway!

We interviewed other companies that had launched VR content businesses to help people manage stress or chronic pain, or to discover places they may want to travel to while they're at home. We then scoped what it would take to create a device and media distribution company for blood banks, enticing donors to come to the clinic more often and for longer stays with VR movies and puzzle games. Introducing VR to apheresis draws doesn't create more work for phlebotomy staff. In fact, one phlebotomist can oversee several apheresis donations at once, because the process provides an hour of idle time between needle placement and removal. So while we increase yield per donor, we also reduce the busywork of the phlebotomy team, introducing new cost efficiencies into overall clinic processing time.

Consumer-grade VR headsets have now decreased in price to the level that it would be easy to give every donor an IMAX-like experience of a movie or TV show for every two-hour donation. To test the potential for our proposed service, we conducted two surveys. We started with a survey of existing donors to see if they would be more inclined to attend a clinic that offered VR as an option. (We were cautious not to introduce an element that would make people visit the clinic less.) We found that most existing donors wouldn't be more compelled to donate just because of the VR offering; they already have their own convictions to donate. Yet one quarter of respondents claimed they'd be more inclined to donate at a clinic where the option existed than at one that did not offer VR. The second survey was for people who hadn't donated yet. There we heard significant interest in the VR enticement, particularly among a younger audience.

Fortunately, we were able to identify several existing potential collaborators who could make our media strategy easy for blood clinics to implement. Specifically, we needed to find a way to address sanitation of devices between uses, for which we demoed the ultraviolet disinfection chambers manufactured by Cleanbox Technologies. If donors were to wear a head-mounted display, the clinic would need to ensure that any device introduced into a clinical setting had been cleaned between uses. Cleanbox is able to meet the 99.99% device sterilization standard required for use in hospitals, making it the best solution we found for a blood clinic introducing VR to its comfortainment strategy.

Second, in order for the headsets to have regular updates and telemetry software checks, we talked to ArborXR, whose platform would allow a fleet of deployed headsets to be updated overnight through a secure update process. This would also take device maintenance concerns away from the medical staff onsite. Devices being sterilized, charged and updated overnight while they weren't in use could facilitate a simple deployment alongside the apheresis devices already supplied to hospitals and blood banks through medical device distributors, or as a subsequent add-on.

Using the Viture AR glasses at an apheresis donation

While we hope that our study persuades some blood banks to introduce comfortainment strategies to reward their donors for their time spent in clinic, I’ve firmly convinced myself that this is the way to go. I now donate multiple times a year because I have something enjoyable to partake in while I’m sharing my health with others.

I’d like to thank my collaborators on this project, Roy Tomizawa, Chris Ceresini, Abigail Sporer, Venu Vadlamudi and Daniel Sapkaroski for their insights and work to explore this investment case and business model together. If you are interested in hearing about options for implementing VR comfortainment or VR education projects in your clinic or hospital, please let us know.

 

For our service promotion video we created the following pitch which focuses on benefits the media services approach brings to blood clinics, dialysis clinics and chemotherapy infusion services.






Special thanks to the following companies for their contribution to our research:

Quantic School of Business & Technology 

Vitalant Blood Centers

Tripp VR

Cleanbox Technologies

Viva Vita

Abbott Labs 

International VR & Healthcare Association

VR/AR Association

Augmented World Expo

Sunday, March 19, 2023

The evolution of VR spaces and experiences

Six years ago, Oculus (now part of Meta) launched the first consumer version of its VR headset, the Oculus Rift CV1. I had my first experience of that new media interface at San Francisco's Game Developers Conference (GDC). Oculus technicians escorted me into a sound-proof dark room and outfitted me with the headset attached to an overhead boom that kept the wires out of my way as I experienced free-motion simulated environments crafted in Epic's Unreal Engine world-building architecture. (This is the same developer environment that was used to create The Mandalorian TV series.) The memory of that demonstration is strong to this day because it was such a new paradigm of media experience. As I moved in a simulated world, the parallax of distant objects shifted differently from that of nearby objects. Everything appeared a bit like a cartoon, more colorful than the real world. But the sense of my presence in that world was incredibly compelling and otherwise realistic.

Yesterday I went into a physical VR gym in Richmond, California, with a dozen other people to try a simulated journey in which we would physically walk through a virtual replica of the International Space Station. It was profound to reflect on how much the technology has advanced in the six years since my first simulated solitary spacewalk at GDC. The hosts of the event walked us through a gradual orientation narrative, as if we were astronauts ascending an Apollo-era launch tower, before we were set free to roam the purely visual ISS, along with brief, previously filmed video greetings from real astronauts at the exact locations marked by green dots on the map to the right. When we approached the astronauts, glowing orbs showed the camera positions from which those greetings had been filmed on the ISS. By standing right where the astronauts had been during filming, we could see all the equipment and experience what it was like for them to live on the ISS.

In a recent interview with Wall Street Journal reporters, Philip Rosedale (the founder of Linden Lab) commented, "The appeal of VR is limited to those people who are comfortable putting on a blindfold and going into a space where other people may be present." Here I was, actually doing that in a crowd of people I had never met before. All I could see of those people was a ghostly image of their bodies and hand positions, with a gold, blue or green heart beacon indicating their role: fellow VR astronauts, family members for those in a group, or the event staff who kept an eye out for anyone having hardware or disorientation issues with the VR environment. Aside from an overheating headset warning and a couple of moments when the spatial positioning lost sync with the walls of the spaceship, I didn't have any particular issues. It was very compelling!

Six years ago at GDC, I remember a clever retort a developer shared with me at the unveiling of the Rift CV1. While waiting in line at the demo booth, I asked what he thought about the nascent VR technology. He said, "Oh, I think it will be like the Xbox Kinect. At first, nobody will have one and everyone will want one. Then, later, everyone will have one and nobody will want one!" Now, years later, we can look back to see what happened. VR hasn't yet reached very broad market penetration because of the rather high price of the hardware. But when the pandemic shuttered the outside world to us temporarily, many of us took to virtual workrooms to meet, socialize and work, and Meta was well positioned for this. Zoom conference calls felt like flashbacks to the Brady Bunch/Hollywood Squares grid of tic-tac-toe faces. Zoom felt oddly isolating in contrast to sharing spaces with people physically, and peering into people's homes also seemed a little disturbing. Several engineers and product managers I frequently meet with suggested we switch to VR instead. One of them challenged me to give a lecture in VR. So I researched how Oxford University was doing VR lectures in EngageVR and conducted my own lecture on the history of haptic consumer technology in an EngageVR lecture room. It was challenging at the time to juggle lecture slide navigation while simultaneously controlling my spatial experience of appearing as a lecturer in the classroom. But I succeeded in navigating the rough edges of the early platform's limitations. (EngageVR has drastically improved since then, introducing customizable galleries and broader support for imported media assets.)

While the experience felt rough at first, I found it much more compelling than using shared slides and grid camera views of the Zoom conference call format. So my colleagues collaborated with me to create a bespoke conference room where we could import dozens of lecture resources, videos, pdfs and 3D images. In this conference room a large group could assemble and converse in a more human-like way than staring into a computer camera. While we gave up the laptop camera with its tag-team game of microphone hand-off, we took up using VR visors where we could see everybody at once, oriented around us in a circle. Participants could mill around the room and study different exhibits from previous discussions while others of us were engrossed in the topic of the day.

I know that people like us are rather atypical because we adopt technology long before the mainstream consumer. But the interesting thing is that years later, even with the pandemic isolation waning, we all still prefer to convene in our virtual conference spaces! It typically comes down to two choices of where we convene. If it's a large group, we assemble in the lecture hall hosted on Spatial's web servers. These are fast-paced and scintillating group debates where we have to coordinate speakers by hand waving or by following the auditory cues of interjecting speakers. If it's four people or fewer, we use EngageVR or vTime, which allow for a more intimate discussion. Those platforms have us use virtual avatars that, unlike Spatial's, don't resemble our physical bodies or faces. But the conversational hand-off is easy because the natural auditory cues of each speaker come through clearly.

“Why does this simulated space feel more personal than the locked-gaze experience of a Zoom call?” I wondered. My thought is that people speak differently when they are being stared at (camera or otherwise) than when they have free moving gaze and a sense of personal space. Long ago I heard an interview with NPR radio show host, Terry Gross. She said that she never interviewed her guests on camera, as she preferred to listen closely only to their voice. Could this be the reason the virtual conference room feels more personal than the video conference?

From my years studying psychology, I remembered the idea of Neuro-Linguistic Programming, in which author Richard Bandler lectured that the motion of the eyes allows us to access and express different emotions tied to how we remember ideas and pictorial memories. In NLP's therapeutic uses, a therapist can understand traumatic memories discussed in the process of therapy based on how people express themselves with their eyes and bodies during memory recollection. Does freedom from camera-gaze perhaps permit greater psychological freedom in the VR context?

In lectures and essays by early VR pioneers, I kept hearing references to people who identified as neurodivergent preferring the virtualized environment. In my early study of autism spectrum disorder, I had read that one theory of ASD is that it involves an over-reaction to sensory stimuli. Often people who have ASD may avoid eye contact due to the intensity of social interaction. In casual contexts this behavior can be interpreted as an expression of disinterest or dislike. Perhaps virtualized presence in VR can address this issue of overstimulation, allowing participants to have a pared-down environmental context. In an intentionally fabricated space, everything there is present by design.

I still don't think the trough of the hype cycle is upon us for VR. (Considering my developer friend's theory about the land of VR disenchantment.) First, a robust VR setup is still too expensive for most people to experience. The "Infinite" ISS exhibit costs considerably more than watching an IMAX movie, its nearest rival medium. Yet soon Samsung, ByteDance, Pimax and Xiaomi are coming to market with new VR headsets that will drive down the cost of access and give most of the general public a chance to try it. I'm curious to see when we will get to that point of "everybody having it and nobody wanting it." I still find myself preferring the new medium's social interactions because they approximate proximity and real human behavior better than Zoom, even if they still have a layer of obvious artificiality.

A funny thing is that I have a particular proclivity for visits to space in my VR social sessions. After my GDC experience years ago, I downloaded the BBC's Home VR app, which simulates a semi-passive perspective of an astronaut conducting a spacewalk. This allowed me to relive my GDC experience, with a surprise twist involving space debris. Then I tried the Mission: ISS walk-through VR app, which gives users a simulated experience of floating around inside a realistic-looking ISS assembled from NASA photographs of the station. Then, when Meta announced its new Horizon Venues platform, I was able to go into a virtual IMAX-style theater with a gigantic half-dome screen rendered in front of hundreds of real-time avatars of people from around the world, to watch 360 videos taken from the ISS and produced for redistribution by Felix & Paul Studios. And finally, this week I was able to visit the Phi Studios physical walk-through. What I like about this progression is that the experience became more and more social, getting away from the feeling of the movie Gravity, of being isolated in space.

Yet for the most social experience of all, my friends and I like to go to a simulated space station hovering 250 miles above Earth, where we can sit and have idle conversations as a realistic-looking model of Earth spins beneath us. This is powered by a social app called vTime. When I go there with my colleagues, we inevitably end up talking about the countries we're orbiting over and relating experiences that are outside of our day-to-day lives. Perhaps it takes that sense of being so far removed from the humdrum daily environment to let the mind wander to topics spanning the globe, outside the narrow confines of our daily concerns. In one such conversation, my friend Olivier and I got into a long discussion about the history and culture of Mauritius, his home country, over which we were then flying. vTime's space station location only has four chairs for attendees at a time, so we use it for small group discussions only. If you ever get inspired to try VR with your friends, I recommend trying this venue for your team discussions. It's hard to say what is so compelling about this experience in contrast to gazing at people's eyes in a video conference. But even after the pandemic lockdown subsided and we could once again meet in person, I still find myself drawn back to this simulated environment. I believe that when every one of us has access to this, we will come to prefer it for remote meetings in lieu of the past decades' 2D panel plus camera.




Sunday, October 2, 2022

Coding computers with sign language

I am one of those people who searches slightly outside the parameters of the near-term actual with an eye toward the long-term feasible, for the purpose of innovation and curiosity. I'm not a futurist but a probable-ist, looking for ways to use the technologies and tools we have at our fingertips today to reach the adjacent opportunities those tools make possible. There are millions of people at any time thinking about how to apply any specific technology in novel ways to push its capabilities toward exciting new utilities. We often invent the same things using different techniques, the way that eyes and wings evolved via separate paths in nature, in what is called convergent evolution. I remember going to a Google developer event in 2010 and hearing the company announce a product that described my company's initiative down to every granular detail. At the time I wondered if someone in my company had jumped the fence. But I then realized that our problems and challenges are common. It's only the approaches to address them and the resources we have that are unique.

When I embarked on app development around the launch of the iPhone, I knew we were in a massive paradigm shift. I became captivated with the potential to use camera interfaces as inputs to control the actions of computers. We use web cameras to send messages person to person over the web. But we could also communicate commands directly to the machine if we add an interpretive layer that translates what the camera sees into input the computer understands.

This fascination with the potential future started when I was working with the first release of the iPad. My developer friends were toying with what we could do to extend the utility of the new device beyond the bundled apps. At the time, I used a Bluetooth keyboard to type, as speech APIs were crude and not yet interfacing well with the new device, and because the on-screen keyboard was difficult to use. One pesky thing I realized was that there was no mouse to communicate with the device. Apple permitted keyboards to pair, but it didn't support pairing a Bluetooth mouse. Every time I had to place the cursor, I had to touch the iPad, and it would flop over unless I took it in my hands.

I wanted to use it as an abstracted interface, and I didn't like the idea that the screen I was meant to read through would get fingerprints on it unless I bought a stylus to touch the screen with. I was acting in an old-school way, wanting to port my past computer interaction model to a new device, while Apple at the time wanted the iPad to be a tactile device, seeking to shift user expectations. I wanted my device to adapt to me rather than having me adapt to it. "Why can't I just gesture to the camera instead of touching the screen?" I wondered.

People say necessity is the mother of invention. I often think that impatience has sired as many inventions as necessity. In 2010 I started going to developer events to scope out use cases for real-time camera input. This kind of thing is now referred to as "augmented reality," where the computer overlays some aspect of our interaction with the world outside the computer itself. At one of these events, I met an inspirational computer vision engineer named Nicola Rohrseitz. I told him of my thought that we should have a touchless mouse input for devices that had a camera. He was thinking along the same lines. His wife played stringed instruments, and viola and cello players have trouble turning pages of sheet music or touching the screen of an iPad because their hands are full as they play! So gesturing with a foot or a wave was easier. A gesture could be captured by tracking motion as shifts of light and color across pixel locations on the camera chip. He was able to track those pixel shifts locally on the device and render them as input to an action on the iPad. He wasn't tracking the hand or foot directly; he was analyzing the images after they were written into random access memory (RAM). By doing this on the device, without sending the camera data to a web server, you avoid any kind of privacy risk from a remote connection. By having the iPad reason about what it was seeing, it could interpret the motion as a command and turn the page of sheet music on his wife's iPad. He built an app to achieve this for his wife's use. And she was happy. But it had much broader implications for other actions.
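To make the idea concrete, here is a minimal sketch, in Swift, of the kind of on-device motion detection described above. It compares two consecutive grayscale frames already sitting in memory and decides whether the change is concentrated on the left or the right half of the view, which an app could then map to a "page back" or "page forward" command. The frame format, threshold and noise floor are illustrative assumptions, not Nicola's actual implementation.

```swift
// Compare two grayscale frames (raw bytes in RAM) and infer a coarse gesture.
// Returns "pageBack", "pageForward", or nil if the change looks like noise.
func detectSwipe(previous: [UInt8], current: [UInt8],
                 width: Int, height: Int,
                 pixelThreshold: Int = 40) -> String? {
    precondition(previous.count == width * height && current.count == width * height)
    var leftChanged = 0
    var rightChanged = 0
    for y in 0..<height {
        for x in 0..<width {
            let i = y * width + x
            let diff = abs(Int(current[i]) - Int(previous[i]))
            if diff > pixelThreshold {
                if x < width / 2 { leftChanged += 1 } else { rightChanged += 1 }
            }
        }
    }
    // Ignore frames where too few pixels moved (sensor noise, small flickers).
    let noiseFloor = (width * height) / 50
    guard leftChanged + rightChanged > noiseFloor else { return nil }
    return leftChanged > rightChanged ? "pageBack" : "pageForward"
}
```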

What else could be done with signals "interpreted" from the camera beyond hand waves, we wondered? Sign language was the obvious one. We realized at the time that the challenge was too complex, because sign language isn't static shape capture. Though the ASL alphabet consists of static hand shapes, most signs for linguistic concepts involve a hand position shifting in a certain direction over a period of time. We couldn't have achieved this without first achieving figure/ground isolation, and the iPad camera at that time had no means of depth perception. Now, a decade later, Apple has introduced HEIC image capture (a more advanced image compression format than JPEG) with LiDAR depth information that can save layers of the image, much like the multiple layers in a Photoshop file.

Because we didn't have figure/ground isolation, Nicola created a generic gesture motion detection utility, which we applied to let users play video games on a paired device using hand motions and tilting rather than pushing buttons on a screen. We decided it would be fun to adapt the tools for distribution with game developers. Alas, we were too early with this particular initiative. I pitched the concept to one of the game studios in the San Francisco Bay Area. While they said the gameplay concept looked fun, they politely said there would have to be a lot more mobile gamers before there would be demand among those gamers to play in a further augmented way with gesture capture. The iPad had only recently come out, and there simply wasn't a significant market for our companion app yet.

Early attempts to infer machine models of human or vehicle motion would overlay an assumed body shape onto a perceived entity in the camera's view. In video taken from a driving car, the system might infer that every moving object in the field of view is another car. (So being a pedestrian or cyclist near self-driving cars became risky, because the seeing entity's assumptions about objects and behavior predicted different behaviors than pedestrians and cyclists actually exhibited.) In a conference demo on an expo floor, it is likely that most of what the camera sees are people, not cars. So the algorithm can be set to infer body position, represented by assumed skeletal overlays of legs connected to bodies, with presumed eyes atop those bodies. The program pictured below was intended for use in shop windows, to notice when someone was captivated by the items displayed in the window. For humans nearby, the eyes and the positions of arms were accurately projected; for humans far away, less so. (The Consumer Electronics Show demo did not capture any photographs of the people moving in front of the camera. I captured that separately with my own camera.)

Over the ensuing years, other exciting advancements brought the capture of hand gestures to the mainstream. With the emergence of VR developer platforms, the need for alternate input methods became even more critical than in the early tablet days. With head-mounted displays (HMDs) and glasses covering the eyes, it became quite obvious that conventional input methods like keyboard and mouse were going to be too cumbersome to render in the display view. So rather than trying to simulate a mouse and keyboard in the display, a team of developers at LeapMotion took the approach of using an infrared camera to detect hand position, then infer the knuckle and joint positions of the hands. Those inferred positions could in turn be rendered as input to any operating system, letting the OS figure out what the hands were signaling it to do at the same time as they were projected into the head-mounted display. (Example gesture captures could be mapped to commands for grabbing objects, gesturing for menu options, and so on.)

The views above are my hands detected by infrared from a camera sitting below the computer screen in front of me, then passed into the OS view on the screen, or into a VR HMD. The joint and knuckle positions are inferences based on a model inside the OS-hosted software. The disadvantage of LeapMotion was that it required setting up an infrared camera, plus some additional interfacing work between the OS and the program using the input. But the good news was that OS and hardware developers noticed, and could pick up where LeapMotion left off to bring these app-specific benefits to all users of next-generation devices. Another five years of progress, and the application of the same technology in the Quest replaces the x-ray-style view of the former approach with something that almost reads as the realistic presence of one's own hands.

 
HoloLens and Quest thereafter merged the formerly external hardware camera into the HMD itself, facing forward. This could then send gesture commands from the camera inputs to all native applications on the device, obviating the need for app developers to toil with the interpretive layer of joint detection inside their own programs. On the Quest platform, app developer adoption of those inputs is slow at present. But for those apps that do support it, you can use the "Hands API" to navigate main menu options and high-level app selection. A few apps like Spatial.io (pictured above) take the input method of the Hands API and use the inferred hand position to replace the role formerly filled by hardware controllers for Spatial content and motility actions. Because Spatial is a hosted virtual world platform, the Hands API gives the user the ability to navigate within the 3D space through more direct hand signals. This lets the user operate in the environment with their hands in a way resembling digital semaphore. Like Spider-Man's web-casting wrist gesture, a certain motion will teleport the user to a different coordinate in the virtually depicted 3D space. Pinching the fingers brings up command menus; hovering over an option and releasing the pinch selects the desired command. The entire menu of the Spatial app can be navigated with hand signals, much like the futuristic computer interfaces in Spielberg's film Minority Report. It takes a bit of confused experimentation before the user's neuroplasticity rewires their understanding of the new input method. (In the same way, learning the abstract motions of a mouse cursor or game-pad controls requires a short acclimatization period.)
 
This is a great advancement for the minority of people reported to be putting HMDs on their heads to use their computers. But what about the rest of us who don't want visors on our noggins? For those users, too, we can soon anticipate computer input from our motions in front of the machine of our choice. Already, the user-facing camera in iOS devices detects the full facial structure of the user. The depth vision of that camera enables mirroring of the shape of our facial features, so that the face can be used in a similar way to how old skeleton keys precisely matched the internal workings of bolt locks. Matching the precise shape of your face, plus detecting the pupils of your eyes looking at the screen, is a trustworthy indication that you are awake and presently expecting your phone to awaken as well. Pointing my camera at a photo of me doesn't unlock the phone, nor would someone pointing my phone at me while I'm not looking at it. As a fun demonstration of this capability, new emoji packs called "memoji" allow you to enliven a cartoon image of your selection with CGI animation that mirrors your facial gestures. Cinematographers have previously used body tracking to enable such animation for films including Lord of the Rings and Planet of the Apes. Now everybody can do the same thing with position-mirroring models hosted in their phones.

The next great leap of utility for cross-computer communication as well as computer programming will be enabling the understanding of other human communication beyond what our faces and mouths express. People video-conferencing use body language and gesture through the digital pipelines of our web cameras. Might gestural interactions be brought to all computers allowing conveyance of intent and meaning to the OS for command inputs?

At a recent Worldwide Developers Conference, Apple engineers demonstrated a concept for using machine pattern recognition to turn gestures into input commands to the operating system, extending the approach of the infrared camera technique. Apple's approach uses a set of training images stored locally on the device to infer input meaning. The barcode and symbol recognition method in the Vision API pairs a camera-matched input with a reference database. The matching database can of course be a web query to a large existing external database. But for a relatively small batch of linguistic pattern symbols such as American Sign Language, a collection of reference gestures can be hosted within device memory and paired with the meaning the user intends to convey, for immediate local interpretation without a call to an external web server. (This is beneficial for security and privacy reasons.)

In Apple's demonstration below, Geppy Parziale uses the embedded computer vision capability of the operating system to isolate the motion of two hands separately from the face and body. In this example he tracked the gesture of his right hand separately from the left hand making the gesture for "2." Now that mobile phones have figure/ground isolation and the ability to segment portions of the input image, enormously complex gestural sign language semiotics can be recognized in ways that Nicola and I envisioned a decade prior. The rudiments of interpretation via camera input can now represent the shift of meaning over time that forms the semiotics of complex human gestural expression.
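For developers curious what the building blocks look like, here is a minimal sketch using Apple's Vision framework hand-pose request, which returns inferred joint positions for up to two hands in a still image. The request and the joint names are real API; the pinch-detection math at the end is just an illustrative example of turning joints into a gesture, not Apple's demo code.

```swift
import Vision
import CoreGraphics

// Detect hand poses in a CGImage and report a simple "pinch" gesture per hand.
func detectHandGestures(in image: CGImage) throws {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 2   // track both hands, as in the demo described above

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    for (index, hand) in (request.results ?? []).enumerated() {
        // Joint positions come back as normalized (0...1) image coordinates with a confidence score.
        let thumbTip = try hand.recognizedPoint(.thumbTip)
        let indexTip = try hand.recognizedPoint(.indexTip)
        guard thumbTip.confidence > 0.3, indexTip.confidence > 0.3 else { continue }

        let dx = thumbTip.location.x - indexTip.location.x
        let dy = thumbTip.location.y - indexTip.location.y
        let distance = (dx * dx + dy * dy).squareRoot()

        // A small thumb-to-index distance is a crude proxy for a pinch gesture.
        let gesture = distance < 0.05 ? "pinch" : "open"
        print("Hand \(index): \(gesture) (thumb-to-index distance \(distance))")
    }
}
```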

 

I remember in high school going to my public library and plugging myself into a computer, via a QWERTY keyboard, to try to learn the language that computers expect us to comprehend. But with these fascinating new transitions in our technology, future generations may be able to "speak human" and "gesture human" to computers instead of having us spend years of our lives adapting to them! 

My gratitude, kudos and hats off to all the diligent engineers and investors who are contributing to this new capability in our technical platforms.

 

Friday, September 9, 2022

Looking it up with computer vision

My mother introduced me to a wide range of topics when I was growing up. She had fascinations with botany, ornithology, entomology and paleontology, among the so-called hard sciences. As a teacher, she had adopted certain practices she'd learned from studying child development and psychology in her master's degree program about the best way to help a young mind learn without just teaching at it. One of her greatest mantras from my childhood was "Let's look it up!" Naturally she probably already knew the Latin name for the plant, animal or rock I was asking about. But rather than just telling me, which would make me come to her again next time, she taught me to always be seeking the answers to questions on my own.

This habit of always looking things up proved a valuable skill when it came to learning languages beyond Latin terms. I would seek out new mysteries and complex problems everywhere I went. When I traveled through lands with complex written scripts different from English, I was fascinated to learn the etymologies of words and the ways that languages were shaped. Chinese and Japanese script became a particularly deep well that has rewarded me with years of fascinating study. Chinese pictographs are images that represent objects and narrative themes by shape rather than by sound, much like the gestures of sign language. I'd read that pictographic languages are considered right-brain dominant because understanding them depends on pattern recognition rather than on the decoding of alphabetic syllables and names, which is typically processed in the left brain. I had long been fascinated by psychology, so I thought that learning a right-brain language would give me an interesting new avenue to conceive of language differently and potentially thereby think in new ways. It didn't ultimately change me that much. But it did give me a fascinating depth of perspective into new cultures.

Japanese study became easier by degrees. The more characters I recognized, the faster the network of comprehensible compound words grew. The complexity of learning Japanese as a non-native had to do with representing language by brush strokes instead of phonemes. To look up a word you don't know how to pronounce, you must first find a particular shape within the broader character, called a radical. You then look through a list of potential matches, organized by total brush stroke count, that contain that specific radical. It takes a while to get used to. While living in Japan, I'd started with the paper dictionary look-up process, which is like using a slide rule to zero in on the character, which can then be researched elsewhere. Electronics manufacturers have since invented calculator-like dictionaries that speed up the process of searching by radical. Still, it typically took me 40-60 seconds with one of these kanji devices to identify a random character I'd seen for the first time. That's not so convenient when you're walking around outside in Tokyo. So I got in the habit of photographing characters for future reference when I had time for the somewhat tedious process.

Last month I was reviewing some vocabulary on my phone when I noticed that Apple had introduced optical character recognition (OCR) into the operating system of new iPhones. OCR is a process that has been around for years on desktop computers with expensive supplemental software. But having it at my fingertips made the lookup of kanji characters very swift. I could read any text through a camera capture and copy it into my favorite kanji dictionaries (jisho.org or the imiwa app). From there I could explore compound words using those characters and their potential translations. Phones have been able to read barcodes for a decade. Why hadn't the same capability been applied to Chinese characters until now? Just like barcodes, they are specific image blocks that map directly to specific meanings. My guess is that recognizing barcodes had a financial convenience behind it, and deciphering words for polyglots was an afterthought that only now became worth supporting. This is now my favorite feature of my phone!
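Under the hood, this Live Text capability rests on the same Vision framework that developers can call directly. Here is a minimal sketch of that text recognition request; the request and its properties are real API, though the language hints and the image-handling boilerplate are my own illustrative choices (CJK recognition requires a recent OS version).

```swift
import Vision
import CoreGraphics

// Run the Vision text recognizer over a CGImage and print what it reads.
func recognizeText(in image: CGImage) throws {
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            // Each observation can carry several candidate readings; take the best one.
            if let best = observation.topCandidates(1).first {
                print("\(best.string)  (confidence \(best.confidence))")
            }
        }
    }
    request.recognitionLevel = .accurate               // favor accuracy over speed
    request.usesLanguageCorrection = true
    request.recognitionLanguages = ["ja-JP", "en-US"]  // hint: Japanese plus English

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
}
```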

What's more, the same Vision API allows you to select any text from any language and even objects in pictures and send it to search engines for further assistance. For instance, if you remember taking a picture of a tree recently, but don't know what folder or album you put it in, the Spotlight search can allow you to query across your photo library on your phone even if you never tagged the photo with a label for "tree." Below you can see how the device-based OCR indexing looked for the occurrence of the word "tree" and picked up the image of the General Sherman Tree exhibit sign in my photo collection of a trip to Sequoia National Park. You can see how many different parts of the sign there were where the Vision API detected the word "tree" in a static image. 

But then I noticed that even if I put the word "leaf" into my Spotlight search, my Photos app would pull up images that had the shape of a leaf in them, often on trees or near flowers I had photographed. The automatic semantic identification takes place inside the Photos application with a machine learning process, which then has a hook to show relevant potential matches to the phone's search index. This works much like the face identification feature in the camera, which allows the phone to isolate and focus on faces in the viewfinder when taking a picture. There are several layers of technology that achieve this. The first is identifying figure/ground relationships in the photo, which is usually done at the time the photo is taken with the adjustable focus option selected by the user. (Automated focus hovers over the viewfinder when you're selecting the area of the photo to pinpoint as the subject or depth of focus.) Once the subject can be isolated from the background, a machine learning algorithm can run on a batch of photos to find inferred patterns, like whose face matches which person in your photo library.
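Apple doesn't expose the Photos app's internal indexing pipeline, but the Vision framework does offer a public image classification request that does something conceptually similar: it returns a list of labels (such as "tree" or "leaf") with confidence scores for an image, which an app could then feed into its own search index. The sketch below shows that public request; the confidence threshold and the idea of indexing on the labels are my assumptions about how such a feature could be wired up, not a description of Apple's actual implementation.

```swift
import Vision
import CoreGraphics

// Ask Vision for classification labels on an image and keep the confident ones,
// which a photo app could add to its own searchable index.
func classifyForSearchIndex(image: CGImage, minimumConfidence: Float = 0.3) throws -> [String] {
    let request = VNClassifyImageRequest()
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    let observations = request.results as? [VNClassificationObservation] ?? []
    return observations
        .filter { $0.confidence >= minimumConfidence }
        .map { $0.identifier }          // e.g. "tree", "leaf", "sign", "outdoor"
}
```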

From this you can imagine how powerful a semantic-discovery tool would be if you had such a camera in your eyeglasses, helping you read signs in the world around you, whether in a foreign language or even your own. It makes me think of Morgan Freeman's character "Easy Reader," who'd go around New York looking for signs to read in the popular children's show The Electric Company. The search engines of yester-decade looked for semantic connections between words written and hyperlinked on blogs. This utility we draw on every day uses machine-derived indications of significance: which subjects people write web pages about, and which subject pages the authors link those terms to. The underlying architecture of web search is all based on human action. A secondary layer of interpretation is then based on how often people click on results that address their query well. Algorithms make the inferences of relevancy, but it's human authorship of the underlying webpages, and human preference for those links thereafter, that informs the machine learning. Consider that all of web search is based on just what people decide to publish to the web. Then think about all that is not published to the web at present, such as much of the offline world around us. You can imagine the semantic connections that could be drawn through the interconnectedness of the tangible world we move through every day. Assistive devices that see the cues we humans use to thread together our spatially navigable society will map that web of inter-relations, a job for the optical web crawlers we'll employ over the next decade.
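
To make the link-based part of that concrete, here is a toy illustration of how human-authored links can be turned into a relevancy signal. It is a bare-bones power-iteration score over a tiny link graph, in the spirit of PageRank, and not any real search engine's algorithm.

```swift
/// A toy link-graph score in the spirit of PageRank: pages earn importance
/// from the pages that choose to link to them. Illustrative only.
func linkScores(links: [String: [String]], iterations: Int = 20, damping: Double = 0.85) -> [String: Double] {
    let pages = Set(links.keys).union(links.values.flatMap { $0 })
    var score = Dictionary(uniqueKeysWithValues: pages.map { ($0, 1.0 / Double(pages.count)) })

    for _ in 0..<iterations {
        var next = Dictionary(uniqueKeysWithValues: pages.map { ($0, (1.0 - damping) / Double(pages.count)) })
        for (page, outlinks) in links where !outlinks.isEmpty {
            let share = damping * score[page]! / Double(outlinks.count)
            for target in outlinks {
                next[target]! += share    // each link passes on a share of its author's page score
            }
        }
        score = next
    }
    return score
}

// Example: three blogs linking to one subject page lift its score above their own.
// linkScores(links: ["blogA": ["subject"], "blogB": ["subject"], "blogC": ["subject"], "subject": []])
```

Even in this toy, the scores come entirely from choices humans made about what to link to.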

To test out how the Vision API deals with ambiguity, you can throw a picture of any flower of varying shape or size at it. The image will be compared to potential matches inferred from a database of millions of flower images in the archives of Wikimedia Commons, the free-use media files which appear on Wikipedia. This is accessed via the "Siri knowledge" engine at the bottom of the screen on your phone when you look at an image (see the small star shape next to the "i" below). While Wikimedia Commons is a public database of free-use images, the approach could easily be expanded to any corpus of information in the future. For instance, there could be a semantic optical search engine that matches only against images in the Encyclopedia Britannica. Or if you'd just bought a book on classic cars, the optical search engine could fuzzy-match input data from your future augmented reality lenses against only the cars you see in the real world that match the model type you're interested in.
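
Apple doesn't disclose how the Siri Knowledge lookup actually matches a flower photo against reference imagery, but the Vision framework does expose an image "feature print" that lets a developer experiment with the same general idea: compute a compact descriptor for each image and rank reference images by distance. The reference set and the ranking below are my own assumptions for the sketch.

```swift
import Vision
import UIKit

/// Compute Vision's feature print (a compact image descriptor) for an image.
func featurePrint(for image: UIImage) -> VNFeaturePrintObservation? {
    guard let cgImage = image.cgImage else { return nil }
    let request = VNGenerateImageFeaturePrintRequest()
    try? VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
    return request.results?.first
}

/// Rank a small set of labeled reference photos by visual similarity to a query photo.
/// A sketch of similarity search, not the actual Siri Knowledge pipeline.
func closestMatches(to query: UIImage, references: [(label: String, image: UIImage)]) -> [(String, Float)] {
    guard let queryPrint = featurePrint(for: query) else { return [] }
    var ranked: [(String, Float)] = []
    for reference in references {
        guard let refPrint = featurePrint(for: reference.image) else { continue }
        var distance: Float = 0
        try? queryPrint.computeDistance(&distance, to: refPrint)   // smaller distance = more similar
        ranked.append((reference.label, distance))
    }
    return ranked.sorted { $0.1 < $1.1 }
}
```

The descriptor itself is generic; the interesting part is the corpus you compare against, which is why swapping Wikimedia Commons for a narrower catalog is such a natural extension.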


Our world is full of meanings that we interpret from it or layer onto it. The future semantic web of the spatial world won't be limited to only what is on Wikipedia. The utility of our future internet will be as boundless as our collective collaborative minds are. If we think of the great things Wikimedia has given us, including the birth of utilities like the brains of Siri and Alexa, you can understand that our machines only face the limits that humans themselves impose on the extensibility of their architectures. 

Thursday, August 18, 2022

On the evolution of mechanical pencils

When I was a physics student, my father started giving me mechanical pencils. You'd think all the possibilities would have been invented by the 1980s. But some of these pencils were incredible feats of engineering, with fun new ways to click out and retract the graphite. I think my father had a point: there was always more to be invented, even for something so simple. We'd frequently look at bridges, discuss ways they could be designed differently to distribute the weight, and talk through the engineers' decisions behind the common designs around us in the real world. Every time I'd invent something new, I'd diagram it for him with my pencils and he'd ask probing questions about the design choices I had made. One day, for instance, I'd invented a "Runner's Ratchet Shoe." I was an avid runner in those days and would get sore knees and shin splints. The Runner's Ratchet was a set of levers attached to the runner's ankle that would soften the impact of the shoe's downward motion, cushioning the shock to the knees while redirecting that downward force into a springing action propelling the runner forward. He looked at my drawing and exclaimed, "Congratulations for re-inventing the bicycle!" It took me a moment to see that the motion of the foot in my invention and on a bicycle was the same, while the muscle stress of my design was probably greater, negating the benefit I was pursuing in the first place.

When I went to college, he started giving me fountain pens. I asked him why he was giving me all these; I could just use a biro, after all. He said, "If you have a better pen, you'll write better thoughts." I could feel the effect of the instrument on the way I framed my thoughts. I was more careful and considered about what I wrote and how. Fountain pens slowed me down a little bit. They force you to write differently, and sometimes they alter the pace of your writing and therefore the way you see the initial thought. It feels as if you're committing something weighty to paper when you write with quill and ink. I came to enjoy the complexity over time. I took away the lesson that the instrument shapes the experience, an emphasis on the path over the goal. Complexity, challenge and adversity in any process can make the end product more refined.

One day, I was working on a new invention that, yet again, had complex moving gears and levers. It was another piece of sports equipment, which I'd named the Ski Pole Leg Support. This invention was again meant to address knee soreness, this time from hours of sitting on ski lifts with dangling legs tugged on by heavy skis and ski boots. The device would let skiers like me suspend the weight of the legs from the chair lift through the length of the ski pole, which would hold a retractable support under the base of the ski boot. As I was visualizing the motion of the device in my head, I thought that what I really needed was a pen that could draw in 3 dimensions directly, the way I saw the device in my mind's eye. That way I could demonstrate the machine as a spatial rendering rather than asking my professors to look at my 2D drawings and then re-imagine them as 3 dimensional objects, Da Vinci style.

With this new inspiration, I designed a concept for just such a drawing tool. It looked like a fishbowl with a pen-like proboscis, which would move through a viscous solution to draw lines that would stay suspended in place, supported by the viscosity of the medium. I realized that a fountain pen moving through the solution would disturb the suspension itself through friction, and therefore damage the rendered image as the drawing got more complex, unless the user drew from the bottom up. So, as an alternative, I imagined using an array of piezoelectric ultrasound speakers laid out in a grid in the base of the globe to direct shock waves through the solution, converging them on chosen points. Where the shock waves intersected, they would interfere constructively and therefore increase the fluid pressure at the desired drawing coordinates. The solution at those shock points would form a visible distillate from the solution's chemicals, which would allow the drawing to persist. (The same way that a cathode ray tube uses a single beam of electrons to paint a picture over a grid where the electrons strike the luminescent screen. But I'd use sound instead of electrons.) When the drawing was finished, you could preserve it temporarily, then erase it by spinning the globe. The rotation would shear the distillate apart so it could settle in the base, dissolve again and be reused, like the way a lava lamp re-melts its wax when it falls close to the heated base. I thought of it like a 3D version of the 2D Etch-a-sketch toy which was popular during my childhood. Might this drawing globe have market potential, I wondered?
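
I never built it, but the timing math behind that kind of acoustic focusing is simple enough to sketch. To make the waves from a grid of emitters arrive at a chosen point at the same instant, so that they add constructively there, each element just needs to be delayed by the difference between its distance to the focus and the farthest element's distance, divided by the speed of sound in the fluid. The speed-of-sound value and the grid layout below are assumptions for illustration.

```swift
import Foundation

struct Point3D { var x, y, z: Double }

/// Per-element firing delays (in seconds) so that pulses from every emitter in the
/// base grid arrive at the focus point simultaneously and add constructively.
/// A back-of-the-envelope sketch; assumes ~1480 m/s, roughly the speed of sound in water.
func focusingDelays(emitters: [Point3D], focus: Point3D, speedOfSound: Double = 1480) -> [Double] {
    let distances = emitters.map { e -> Double in
        let dx = focus.x - e.x, dy = focus.y - e.y, dz = focus.z - e.z
        return (dx * dx + dy * dy + dz * dz).squareRoot()
    }
    guard let farthest = distances.max() else { return [] }
    // The farthest element fires first (zero delay); nearer ones wait so all pulses coincide.
    return distances.map { (farthest - $0) / speedOfSound }
}

// Example: a 3x3 grid of emitters in the base (z = 0), focusing 10 cm above the center.
let grid = (0..<9).map { i in Point3D(x: Double(i % 3 - 1) * 0.02, y: Double(i / 3 - 1) * 0.02, z: 0) }
let delays = focusingDelays(emitters: grid, focus: Point3D(x: 0, y: 0, z: 0.10))
```

For that example geometry, the delays come out to just a few microseconds.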

Before I shared the concept with toy manufacturers, I met an inventor named Michael Zane, who had graduated from my college, Franklin and Marshall. He said he was willing to look over my product concepts and give me some advice. After he stared at my drawings a bit he gave me an approving glance. He said he liked my ideas. But he then commented, "If you have any interest or ability outside of inventing, pursue that with a passion!" He thought his career path was not something he'd wish on anybody. It was incredibly difficult to file and protect patents while trying to sell your products in a fiercely competitive international market, he explained. He told me stories of many inventors whose lives were consumed and hopes dashed by throwing too much of themselves into one idea. So his advice was to live a different life than the one he saw on paper as my future. I did go to the US Patent and Trademark Office to research prior filings in the sectors of several of my future patent ideas. But over the years I let my dreams of physical hardware inventions trickle out of my mind and focused my inventing on new problems and opportunities in the digital technology and internet space.

Looking back on my 3D Etch-a-sketch concept 30 years later, I see how a fountain pen for aquariums wasn't going to find mass-market fit, even if I'd thrown all my gusto behind it. Mr. Zane had saved me lots of frustration and door knocking. I'm very glad I pursued the other interests I had at the time. Mechanical pencils and fountain pens are cool. But your life should be about more than something that has been reinvented 1000 times since the dawn of art. The inventions I focused on over the last three decades were team efforts, not lone-entrepreneur stories. Coordinating groups of people in a company to build something new is the way to go these days. As the oft-quoted adage goes: if you want to go fast, go alone; if you want to go far, go together. My teams have won patents, changed industries and impacted the lives of millions of people. It would have been a different life story if I'd just pursued selling plastic drawing toys at the start of my career.

I say all this because I have been toying around with VR and AR products for the last 8 years, since my company decided to leap into the new industry. I'm starting to see echoes of what I'd wanted decades ago, now implemented in products. My colleagues and I go into VR to discuss technology and new product ideas. We tend to use Spatial.io, a virtual reality conferencing platform. One day I drew a diagram in the air with my fingers. I described a product concept I'd debated with one of my friends, Paul Douriaguine, a colleague from my time at a startup in Sydney. We had discussed the concept of using aerial photography and photogrammetry to assemble a virtual reproduction of an oil refinery or other physical facility or factory. We discussed using automated time-lapse images captured across multiple drone flights around the facility to watch for areas of discoloration that might indicate mold, rust or oil leaks, which could be used to prevent damage to the physical structure.

My rendering, pictured above, showed how a drone flight path, conducted autonomously or crowd-sourced, could capture structural images for analysis. The flight path and the camera angles a drone would follow were portrayed in green, around the facility depicted in blue. Then my friend John P Joseph, who had actually worked on oil facilities with his own AR company, jumped in and diagrammed how his team looked at the problem for long-distance pipeline maintenance and function monitoring.

Then my other friend, Olivier Yiptong, jumped in to talk about how to establish the server architecture needed to achieve, across pipes, facilities and flying devices, the service we had been describing in mechanical terms.

It was an amazing thing to watch. Three people with entirely different backgrounds (business, product and engineering) had assembled spontaneously. In the span of about 15 minutes, all of us were able to rapidly discuss different layers of a product and service initiative, reaching an understanding of a range of opportunities and limitations through a process that might have taken hours of preparation and presentation time in any other medium.

The experience made me reflect back in time. My first conception of the best way to draw an invention, 3 decades ago, was to make a product leap from paper to fishbowl globes. Here I was today, inside the globe with other clever folk, inventing in a shared virtualized space. In Spatial, I was able to diagram the concept in a fun and effective way, even if a bit sloppily because I didn't have a 3D ruler and protractor yet! (Wait a tick... what if... oh never mind...)

VR is an old idea receiving heaps of attention and investment right now. Just as some say that the Apollo missions were something the past could afford that the present could not, I think VR is an idea that couldn't have found a market when I was young, but that can actually address new use cases and interesting applications now. Perhaps it doesn't pass the Larry Page toothbrush test: something you'd want to use twice a day. But it is significantly valuable in what it can convey experientially. I find myself preferring it over the camera-gaze experience of pandemic-era video conferencing platforms. Now when my engineering and product expert friends want to meet, we typically opt for the VR meeting, as it feels more human to move and gesture in a space rather than sit transfixed, staring into a camera lens. Perhaps the recent pandemic created a great enough frustration in general society that we yearn to get back to a 3D experience, even in remote conference meetings. Seeing people through a flat screen while posing to be rendered in 2D is an artifice of past times. I suspect that people will eventually come to prefer meeting in 3D when they can't meet in person. 2D seems to be an awkward adolescent phase for our industry, in my view.

These VR designing pens are similar to the fountain pens and mechanical pencils of yesteryear, but in a medium that is only now being created for our next generation of inventors and future physicists. Our tools are imprecise at present. But in the coming years they will be honed, because of the obvious benefit and efficiency they bring in facilitating social connections and collaboration. Over the pandemic years I've gained a new kind of conversational venue that has caused more discussions to happen, in better ways, than the technologies I'd become used to before. I will continue these brainstorms in virtual 3D environments because, when separated from my team, I still want to communicate and share the way we are used to doing in physical space. There, we erase the separations in space while keeping the robust density of media we can share in our globe-like environment, unlimited by the restrictions of what can be crammed through a small camera aperture.

Friday, July 8, 2022

The emerging technology for stereoscopic media in the home

A decade ago, I followed the emergence at the Consumer Electronics Show of affordable 3D televisions for experiencing movies at home. One of the problems with adoption of the technology was the lack of media that mainstream audiences could view on the devices. Even if you had one of these advanced TVs, there was little viewable content streamed or sold to those same households. It was just too much work to pair the content with the device. Filming in stereoscope is a complex process, and it isn't as well supported by the commercial media channels into the home as it is for the cinema.

While stereoscopic headsets are now being released in significant volumes, following the wave of VR as a mainstream consumer experience, the content availability challenge still looms. (IDC projects 30 million headsets a year to be shipped by 2026 across multiple vendors. Meta claims 15 million sold to date.) This time the gaming sector is leading the charge in new media creation, with 3D virtual environment simulation built on world-building software platforms distributed by Unity and Epic Games. The video gaming industry dwarfs the scale of cinema in terms of media spend, with the US gaming sector alone totaling over $60 billion annually in contrast to cinema at $37 billion. So this time around, the 3D media story may be different. With the lower production cost of software-based media creation and the higher per-customer revenue stream of game sales, there will be more options than I had with my 3D Vizio TV.

I recently discovered the artistry of Luke Ross, an engineer who is bringing realistic 3D depth perception to legacy video games originally rendered in 2D. His technique currently adds three-dimensional "parallax" depth to a 2D scene by having the computer render offset images of the scene, presented to each eye of a head-mounted display sequentially. Leveraging the way our brains perceive depth in the real world, his technique persuades us that typically flat-perspective scenes are actually deep landscapes receding into the distance. The recent Disney series The Mandalorian was filmed using the same world-building programs used to make video game simulations of spacious environments. Jon Favreau, the show's creator, chose to film in studio using Unreal Engine instead of George Lucas-style on-location filming because it drastically extended the world landscapes he could reproduce on his limited budget. Converting The Mandalorian into Avatar-like 3D rendering for Vizio TVs or VR head-mounted displays would still be a huge leap for a studio to make, because of the complexity of fusing simulated and real sets. But when live action goes a step deeper and captures the actors' movements directly into 3D models, as in the approach of Peter Jackson's Lord of the Rings series, rapid rollouts to 2D and 3D markets simultaneously become far more feasible using Luke Ross's "alternate-eye rendering" (abbreviated AER).

Stereoscopic cameras have been around for a long time. Capturing parallax perspective and rendering that two-camera input to two display outputs is the relatively straightforward way to achieve 3D media. What is so compelling about the concept of AER is that the technique achieves depth perception through a kind of illusion that occurs in the brain's perception of synthesized frames. Having a stereoscopic play-through of every perspective a player/actor might navigate in a game or movie is exceedingly complex. So instead, Luke moves a single perspective through the trajectory, then has the display output jitter the camera slightly to the right and left in sequence. When the right-eye glimpse is shown, input to the left eye pauses. Then the alternate glimpse is shown to the left eye while the right eye's output pauses. You can envision this by blinking your right, then left eye while looking at your finger in front of your face. Each eye sees more behind the close object's edges than the other eye in that instant. So nearby objects appear to hover close to you against the background, which barely moves at all.
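
This isn't Luke Ross's actual code, but the core scheduling idea can be sketched in a few lines: on each frame, offset a single game camera to one side of the head position, render it for that eye only, and let the other eye hold its previous frame. The eye-separation value and the render/submit calls below are placeholders I've made up for illustration.

```swift
import simd

// A sketch of the scheduling behind alternate-eye rendering (AER): each frame,
// only one eye gets a freshly rendered, offset view; the other eye re-displays
// its previous frame. The renderer and submit calls are hypothetical stubs.

enum Eye { case left, right }

struct Texture {}                                                        // stand-in for a rendered frame
func renderScene(from position: SIMD3<Float>) -> Texture { Texture() }  // hypothetical stub
func submit(_ frame: Texture, to eye: Eye) {}                           // hypothetical stub

struct AERRenderer {
    var eyeSeparation: Float = 0.064      // assumed interpupillary distance, in meters
    private var lastFrame: [Eye: Texture] = [:]
    private var nextEye: Eye = .left

    mutating func renderFrame(headPosition: SIMD3<Float>, headRight: SIMD3<Float>) {
        let eye = nextEye
        let otherEye: Eye = (eye == .left) ? .right : .left
        nextEye = otherEye

        // Jitter the single game camera half the eye separation toward the eye being drawn.
        let offset: Float = (eye == .left ? -0.5 : 0.5) * eyeSeparation
        let cameraPosition = headPosition + headRight * offset

        let frame = renderScene(from: cameraPosition)
        lastFrame[eye] = frame

        submit(frame, to: eye)                 // fresh view for the active eye
        if let held = lastFrame[otherEye] {
            submit(held, to: otherEye)         // the other eye holds its last frame
        }
    }
}
```

The perceptual fusion happens in the viewer's brain; the code only has to keep the alternation steady and the offset consistent with a plausible eye separation.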

Vast landscapes of Final Fantasy VII appear more realistic with parallax depth rendering. 
https://www.theverge.com/2022/8/10/23300463/ffvii-remake-intergrade-pc-vr-luke-ross-mod

The effect, when you perceive it for the first time, astounds you with how realistic the portrayed landscape becomes. Experiencing it through a VR headset is like having a 3D IMAX in your home. The exciting thing is that game designers and directors don't have to rework their entire product to make this possible. AER can be done entirely in post-production. It is still a fair bit of work, but much more feasible to achieve at grand scale than rendering all legacy media anew in stereoscopic 3D. This makes me believe it will be only a short matter of time before this is commonly available to most readers of my blog. (Especially if I have anything to do with this process.)

You may not have a consumer VR headset at your disposal yet. But currently the HP Reverb, Pico, Meta Quest, and HTC Vive are all cheaper than my 3D Vizio TV was. The rendered experience of a 65-inch TV in your living room still typically fills less of your field of view than a wide-field-of-view VR headset does. So over the coming years, many more people may opt for the nearer screen over the larger screen. When they do, more people will start seeking access to 3D content, which now, thanks to Luke, has a more scalable way to reach this emerging audience.