From the modest beginnings of Audrey, a system built by Bell Labs in the early 1950s that could recognize only spoken digits, the business of voice has exploded. At CES 2020, Amazon announced there are “hundreds of millions of Alexa-enabled devices globally now.” Two key trends lie at the heart of this disruption in voice-enabled technologies: (1) the rapid adoption of the Internet of Things (IoT) and (2) advances in psycholinguistic data analytics and affective computing.

With the global penetration of smart devices, close to half of all online consumer searches are predicted to be voice-based by the end of 2020. Complementing the availability of smart devices is the rapid development of AI tools and data-modeling techniques for inferring emotion and intent from speech. For instance, neural-network language models are being combined with techniques from linguistics and experimental psychology to infer human intention in real time.

Consider the impact already realized: Microsoft Teams has hosted 200 million meeting participants in a single day, call centers have cut customer handling times by 40 percent, and voice shopping is predicted to become a $40 billion business within the next two years.

As companies globally embark on the journey of realizing benefits from voice analytics, what strategic considerations are in play? Here are our three recommendations:

1. Think Local

With governments making regulatory changes to support the “home-market effect,” there will be a higher concentration of innovations targeted at local consumers. As an example, consider India, which is home to more than 100 languages spoken by 10,000 or more people. That, combined with a rural literacy rate of about 65 percent, suggests enormous potential for voice-enabled technology as a force for inclusion. The recent development of a voice-activated response system during the COVID-19 pandemic provides one such example. Voice analytics will play a key role in many last-mile solutions, and like other such solutions, attention to the local context is necessary for success. Another local example is the Chinese government designating smart voice as one of its four priority areas for AI development; voice AI technology developed by iFLYTEK now handles an average of 4.5 billion interactions daily. Similarly, Yandex, which owns 58 percent of the search market in its native Russia, has a popular voice AI assistant, Alice, with around 35 million users.

2. Reinforce Privacy

Consumer trust is fragile, especially for voice AI, whose use requires consumers to opt in to IoT devices in their personal spaces. News of employees accessing consumer voice recordings captured by Alexa inspires little confidence on this front. We foresee a major competitive advantage for companies that take the lead in engendering trust by incorporating “Privacy by Design” (PbD), ensuring that personally identifiable information (PII) in systems, processes, and products is protected. The trend toward local solutions, noted in the earlier point, also suggests that consumer privacy laws (e.g., the GDPR in the E.U., the CCPA in California, the LGPD in Brazil, and the NDPR in Nigeria) may remain fragmented. This disparity in laws also offers companies an opportunity to earn consumer trust by promoting ideas like “PbD Inside” directly to consumers, analogous to the “Intel Inside” strategy, which turned a hidden internal chip into a brand, and that brand into billions in added sales and consumer trust for Intel.

3. Prepare for New Emotional Intelligence Tools

Advances in Artificial Emotional Intelligence (AEI) are going to allow for more nuanced reactions to human emotions. The affective computing market is estimated to grow to $41 billion by 2022, and “emotional inputs will create a shift from data-driven IQ-heavy interactions to deep EQ-guided experiences, giving brands the opportunity to connect to customers on a much deeper, more personal level.” Companies that carefully plan how Perception AI, which covers the gamut of sensory inputs including voice, vision, smell, and touch, can complement their offerings will find a competitive edge.

In the movie Her, Joaquin Phoenix plays a lonely soul, Theodore Twombly, who discovers the joys of friendship with a voice-enabled AI assistant, Samantha. Ted feels Samantha is not just a computer but her own person. The movie brings to life the debate that leaders in the technology industry are having about AI and its long-term impact on humanity. Will disembodied voice assistants enrich human lives in the future, perhaps even forming friendships as Ted did with Samantha? Or will voice AI become Frankenstein’s monster and bring harm to the human race?

Voice AI’s uniquely personal sensory experience democratizes the use of technology, giving it the potential to be a game-changer. For many companies, however, the journey will be fraught with risk if they do not curate local solutions, reinforce consumer privacy, and prepare for new emotional intelligence tools. Business leaders must weigh these considerations carefully to strategically own the future of Perception AI.


Saurabh Goorha is a Senior Fellow at Wharton Customer Analytics and CEO of AffectPercept, a Perception AI advisory and analytics firm. Raghuram Iyengar is Miers-Busch W’1885 Professor and Professor of Marketing at the Wharton School and Faculty Director of Wharton Customer Analytics. For more on the future of voice analytics and AI in the post-COVID world, read this WCA white paper authored by Goorha and Iyengar.