In this pandemic world, advances in the use of visual media are helping firms to interact with their customers in new ways. Many brands are adopting XR and offering non-touch product experiences. Visual search has become a reality and the customer journey is getting shorter. However, the use of visual products has not been without controversy. While there is excitement around brand placement on TikTok, use of the platform has raised concerns about data privacy. Similarly, there is a strong case to be made for building a smart city and being prepared for the next pandemic, yet the negative buzz around image (facial) recognition is calling its future viability into question. The next frontier for many companies is visual artificial intelligence: the combination of computer vision, content formats (video, images and XR), natural language processing (NLP), and machine learning, applied to foundational use cases of customer analytics. The road is slippery, however, and the pace of innovation is rapid. Companies must tread carefully to realize the full potential of visual AI.

In an earlier piece, we identified strategic considerations for companies to realize benefits from voice AI. Many of those recommendations extend to visual AI as well. In addition, the following three strategies will help companies benefit from their visual AI initiatives:

Focus on Quantifiable Outcome Metrics

Visual AI can become the latest shiny object in your organization, given the typical use of vanity metrics to assess its returns. The consequences of doing so can be costly. You should, instead, push for the deployment of quantifiable outcome metrics to determine if the return on investment from visual AI is appropriate for your company. Consider the following examples of how companies have thought about the benefit-cost tradeoff. In the EdTech college recruitment market, sophisticated platforms are delivering personalized online virtual tours for prospective students, allowing better audience targeting that drives enrollment and yield rates for a diverse set of colleges at a financially challenging time of virtual attendance. The advertising industry is replacing the manual work of identifying end credits, done in order to place ads or index content, with automated AI recognition and metadata tagging capabilities that cost-effectively process a higher volume of content. Architecture and design firms are creating precise VR spaces before any construction begins to drive collaboration with vendors globally and achieve significant cost reductions. In sum, while there can be benefits to adopting visual AI for your customers, it is critical that you lay out the costs.

Utilize Computer Vision (CV) Platforms

Companies that have achieved good returns from their investments in visual AI have done so by implementing a CV platform that streamlines the core tasks (object detection, object tracking, semantic or instance segmentation, facial recognition, visual search, and optical character recognition) and uses pretrained models for the specific application, e.g., moderating user-uploaded content, recognizing celebrities, or detecting brands and logos. Consider the following success stories. By automatically detecting actions taking place in video, advertisers are able to place relevant ads accurately. The movie industry is applying machine vision systems to trailers to predict audience success, giving studios more flexibility over marketing, distribution, and release decisions. The emergence of haptics, or the use of touch sensation, combined with VR and AR, is allowing users to “touch” things in the virtual world, leading to instant feedback and emotional response with visual AI solutions.

Build a Crisis Room for Trust Breach

The interaction between audiences and companies is increasingly virtual and visual. While the use of social media is appealing as a way to reach a broader audience, there are significant hurdles for companies to build and maintain trust. Social media companies have taken steps to rank news sources by various measures of credibility to control the spread of disinformation, though the validity and reliability of the methodology are debatable. Moreover, given the stochastic nature of programmatic ad placement, brands can become associated with hate speech. Companies need to mitigate this risk by investing in a crisis room with personnel and tools that utilize visual AI to detect fraud, identify the influencers driving conversation, and plan quick reputation management responses. We propose that brands that allow consumers to “trust but verify” multimedia content targeted at them will be able to increase customer lifetime value (CLV), in addition to enhancing their brand equity.

There is increasing consumer concern about deepfakes, and rapid advances in visual AI now allow automated content generation (using natural language generation), which is blurring the line between human-generated and machine-generated visual content. However, during the COVID-19 pandemic, 38 percent of consumers surveyed have tried a new digital activity or subscription for the first time, and 72 percent of households with wireless internet now stream video on their connected TVs (CTVs). This forcing function, which is driving adoption and changes in customer behavior, offers a generation-changing opportunity for companies across the world. Visual AI, combined with other forms of sensory input including voice, touch, and smell, and with the emergence of newer forms of natural language processing (NLP), together referred to as perception AI, can lead to much deeper consumer engagement and better outcomes for companies and society.


Saurabh Goorha is a senior fellow at Wharton Customer Analytics and CEO of AffectPercept, a perception AI advisory and analytics firm. Raghuram Iyengar is Miers-Busch W’1885 Professor and professor of marketing at the Wharton School and faculty director of Wharton Customer Analytics.