Unspoken

Unvocalized speech sensing for private voice interactions

In the evolution of audio technology, we’ve achieved remarkable duality in output: speakers for public listening and earphones for private, personal audio. Yet for input, we remain stuck in a single paradigm: the microphone. Whether dictating notes, issuing voice commands, or conversing with AI systems, audio input inherently broadcasts our interactions. This lack of privacy has significant implications, from awkward moments in public spaces to limitations in environments where sound isn’t an option.

Unspoken aims to challenge this status quo. By combining ultrasound and laser speckle sensing, we’ve developed a novel silent speech detection system that bridges the gap in audio input privacy. Offered in multiple form factors, Unspoken provides a private, wearable alternative to microphones—enabling users to interact with technology through subvocalized speech without making a sound.


The Case for Private Audio Input

The disparity between public and private modes of communication becomes apparent when we examine our reliance on microphones. Unlike earphones, which give audio output a private counterpart, microphones inherently capture sound in ways that are:

  • Audible to others, compromising privacy in shared spaces.
  • Sensitive to noise, making them unreliable in crowded or loud environments.
  • Culturally intrusive, particularly in settings where speaking aloud is discouraged or impossible.

By contrast, Unspoken envisions a world where private audio input is as seamless and ubiquitous as personal audio output. Whether dictating a response during a crowded commute, issuing commands in a quiet library, or interacting with AI systems in a secure environment, Unspoken makes private communication truly private.


How Unspoken Works: Combining Ultrasound and Laser Speckle Sensing

The core of Unspoken lies in its sensing technologies, which capture silent speech through subtle physiological signals:

1. Ultrasound-Based Micro Jaw Vibration Sensing

Ultrasound sensors detect minute vibrations in the jaw and throat as a user subvocalizes speech. These vibrations are imperceptible to others but carry the articulatory information necessary to reconstruct speech. This technique enables highly accurate detection without requiring audible sound.
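To illustrate the principle (this is a minimal sketch, not Unspoken’s production pipeline): a reflected ultrasound carrier is phase-modulated by surface motion, and coherent IQ demodulation recovers that motion. The sample rate, carrier frequency, and filter length below are illustrative assumptions.

```python
import numpy as np

FS = 48_000         # receiver sample rate (illustrative)
F_CARRIER = 20_000  # near-ultrasound carrier aimed at the jaw (illustrative)

def demodulate_vibration(received, fs=FS, fc=F_CARRIER):
    """Recover the low-frequency jaw vibration that phase-modulates
    the reflected ultrasound carrier, via coherent IQ demodulation."""
    t = np.arange(len(received)) / fs
    i = received * np.cos(2 * np.pi * fc * t)
    q = received * -np.sin(2 * np.pi * fc * t)
    # moving-average low-pass removes the 2*fc mixing product
    win = np.ones(96) / 96
    i_lp = np.convolve(i, win, mode="same")
    q_lp = np.convolve(q, win, mode="same")
    return np.unwrap(np.arctan2(q_lp, i_lp))  # phase is proportional to displacement

# synthetic check: carrier phase-modulated by a 150 Hz "jaw" tone
t = np.arange(0, 0.1, 1 / FS)
vibration = 0.5 * np.sin(2 * np.pi * 150 * t)
rx = np.cos(2 * np.pi * F_CARRIER * t + vibration)
recovered = demodulate_vibration(rx)
```

The recovered phase tracks the injected vibration up to a gain and offset; a real device would replace the moving-average filter with a properly designed low-pass and calibrate the phase-to-displacement scale.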

2. Laser Speckle Sensing for Surface Displacement

Laser speckle sensing captures microscopic surface movements on the skin, such as those caused by tongue and lip motion during subvocalization. These tiny displacements are analyzed using machine learning models to extract phoneme patterns, enabling precise speech reconstruction even in noisy or dynamic environments.
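The underlying measurement can be sketched as follows: successive speckle frames are cross-correlated, and the correlation peak gives the displacement between them. This is a minimal numpy sketch assuming integer-pixel shifts and synthetic speckle data; a real system would track sub-pixel motion at high frame rates.

```python
import numpy as np

def speckle_shift(frame_a, frame_b):
    """Estimate the (dy, dx) pixel displacement between two speckle
    frames using FFT-based cross-correlation."""
    fa = np.fft.fft2(frame_a - frame_a.mean())
    fb = np.fft.fft2(frame_b - frame_b.mean())
    corr = np.fft.ifft2(fb * np.conj(fa)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # indices past N/2 correspond to negative shifts (circular wrap)
    return tuple(p if p <= s // 2 else p - s
                 for p, s in zip(peak, corr.shape))

# synthetic speckle pattern displaced by (2, -3) pixels
rng = np.random.default_rng(0)
base = rng.random((64, 64))
moved = np.roll(base, shift=(2, -3), axis=(0, 1))
```

A sequence of such frame-to-frame shifts forms the displacement signal that the phoneme-extraction models consume.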

By combining these two modalities, Unspoken achieves robust, low-latency silent speech detection that adapts to diverse usage scenarios.
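One simple way to picture the fusion step (a hypothetical sketch, not Unspoken’s actual architecture) is early fusion: frame both sensor streams, summarize each frame, and concatenate the features so a downstream model sees one joint vector per frame. The hand-crafted features here stand in for learned embeddings.

```python
import numpy as np

FRAME, HOP = 256, 128  # analysis frame and hop sizes (illustrative)

def frame_features(signal, frame=FRAME, hop=HOP):
    """Frame a 1-D sensor stream and summarize each frame with two
    simple features (stand-ins for learned embeddings)."""
    n = 1 + (len(signal) - frame) // hop
    feats = np.empty((n, 2))
    for i in range(n):
        seg = signal[i * hop : i * hop + frame]
        feats[i, 0] = np.sqrt(np.mean(seg ** 2))              # RMS energy
        feats[i, 1] = np.mean(np.abs(np.diff(np.sign(seg))))  # zero-crossing rate
    return feats

def fuse(ultrasound_feats, speckle_feats):
    """Early fusion: align both streams in time and concatenate, so a
    downstream classifier sees one joint feature vector per frame."""
    n = min(len(ultrasound_feats), len(speckle_feats))
    return np.hstack([ultrasound_feats[:n], speckle_feats[:n]])

rng = np.random.default_rng(1)
joint = fuse(frame_features(rng.standard_normal(48_000)),
             frame_features(rng.standard_normal(48_000)))
```

Early fusion keeps latency low because both modalities are decided on together in a single forward pass, at the cost of requiring tight time alignment between the sensors.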


Interaction Design: A Form Factor for Every Context

One of the key design goals for Unspoken was to offer flexibility in how and where the technology is used. We developed multiple form factors to suit different contexts:

  • Neckbands: A lightweight, wearable option that fits seamlessly into everyday life, ideal for use with smartphones or AR/VR devices.
  • Headsets: Designed for enterprise and professional use, offering precise detection for collaborative or technical workflows.
  • Standalone Modules: Compact sensors that integrate with existing devices, from laptops to smart speakers.

These form factors ensure that Unspoken can be adopted across a wide range of scenarios, from private conversations to hands-free command environments.


Engineering Challenges and Innovations

Building Unspoken required solving several critical challenges at the intersection of HCI, sensing, and machine learning:

  1. Noise and Artifact Reduction: Detecting subvocalized speech requires filtering out non-speech signals caused by movement, ambient noise, or device placement. We developed advanced signal processing techniques to isolate meaningful patterns without sacrificing accuracy.
  2. Model Generalization: Subvocalization patterns vary significantly between individuals. To address this, we trained our models on diverse datasets while incorporating adaptive learning mechanisms that personalize detection to each user over time.
  3. Low-Power Operation: Wearable devices demand energy efficiency. By combining lightweight, edge-optimized models with energy-efficient sensors, Unspoken achieves real-time performance without draining battery life.
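A basic ingredient of the noise filtering in item 1 can be sketched as a band-pass that keeps an assumed subvocal articulation band while rejecting low-frequency motion drift and high-frequency hiss. The band edges, sensor rate, and tap count below are illustrative assumptions, not Unspoken’s actual parameters.

```python
import numpy as np

def bandpass_fir(fs, lo, hi, taps=1023):
    """Windowed-sinc FIR band-pass: subtract a low-cut low-pass from a
    high-cut low-pass, each normalized to unity gain at DC."""
    n = np.arange(taps) - (taps - 1) / 2
    def lowpass(fc):
        h = np.sinc(2 * fc / fs * n) * np.hamming(taps)
        return h / h.sum()
    return lowpass(hi) - lowpass(lo)

# assumed articulation band: 60-700 Hz at a 16 kHz sensor rate
FS = 16_000
h = bandpass_fir(FS, 60, 700)

# frequency response: bin k of a 16000-point FFT sits at k Hz
H = np.abs(np.fft.rfft(h, FS))
```

Applying `np.convolve(signal, h, mode="same")` then passes articulation-band energy while suppressing slow device-placement drift and broadband ambient noise; the production system layers adaptive and learned filtering on top of this kind of fixed front end.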

Why This Matters: Expanding the Interaction Paradigm

Unspoken isn’t just a technology—it’s a rethinking of how we interact with machines. By enabling private, silent speech input, it opens up possibilities that traditional microphones can’t support:

  • Increased Privacy: Interact with AI systems or voice-enabled devices without broadcasting your commands to the world.
  • Accessibility: Provide a new avenue for individuals with speech or mobility impairments to communicate effectively.
  • Enhanced Usability: Extend voice interaction to environments where sound isn’t practical, from libraries to noisy factory floors.

As we move toward a future where voice interfaces become increasingly central to our technology, Unspoken fills a critical gap. It ensures that voice input is as flexible, accessible, and private as the rest of our interaction toolkit.


Toward a More Personal Future

In designing Unspoken, our goal wasn’t just to create a better input device—it was to address a fundamental limitation in how we think about communication. By offering silent, private speech detection, Unspoken redefines what’s possible in voice interaction, making it more inclusive, adaptive, and human-centered.

As audio input evolves to meet the needs of a connected world, we believe privacy should be at its core. With Unspoken, we’re one step closer to that vision—a future where speaking doesn’t mean being overheard, and interaction feels as personal as thought itself.