Synesthesia: Detecting Screen Content via Remote Acoustic Side Channels

Daniel Genkin (University of Michigan)
Mihir Pattani (University of Pennsylvania)
Roei Schuster (Tel Aviv University, Cornell Tech)
Eran Tromer (Tel Aviv University, Columbia University)

Summary

We observe a new side-channel information leak: the visual content displayed on users' screens leaks into the faint sound emitted by the screens. This sound can be picked up by ordinary microphones built into webcams or screens, and is inadvertently transmitted to other parties, e.g., during a videoconference call, or preserved in archived recordings.

Thus, users' privacy may be compromised whenever voice is captured in screen proximity, which is very common: audio is recorded during video-chat calls using apps such as Skype or Hangouts, by “smart speakers” such as Amazon Echo or Google Home, by the user's smartphone (and its apps), and more. A motivated attacker can even capture these secret-carrying noises from a distance, using a parabolic microphone.

Empirically demonstrating various attack scenarios, we show how this channel can be used for real-time detection of on-screen text, or of users' input into on-screen virtual keyboards. We also demonstrate how an attacker can analyze the audio received during a video call, captured by the victim's own microphone, to infer whether the other side is browsing the web in lieu of watching the video call, and which web site is displayed on their screen.

Inferring screen content through a VoIP session (illustration): Alice's own webcam is directed at her face and away from the screen. However, sound from the webcam-embedded microphone is still transmitted to the attacker. By simply relaying Alice's voice, VoIP traffic also carries her screen's content.

Full paper


Q&A

Q1: How sensitive does audio-recording equipment have to be?

The attack can be performed using commodity microphones such as those embedded in webcams, within screens, in "smart speakers" and phones.

Figure (Microsoft LifeCam webcam; Google Home; LG V20 smartphone): Sound, along with screen content information, can be captured by microphones embedded in various commodity products.

Q2: What screens are vulnerable?

The leak stems from the visual rendering mechanism in PC screens. We tested dozens of LCD screens, with both CCFL and LED backlighting, of various models and manufacturers including Dell, Samsung, HP, ViewSonic, Philips, Soyo, and Apple. The screens we tested were manufactured as early as 2003 and as recently as 2017. Similar leakage behavior existed in all models, old and new alike.

To demonstrate this, we visualize the attacker's acquired signal (on a spectrogram), when displaying a known-in-advance, alternating pattern of color transitions on the screen ("Zebra", as described in Q6).

Figure (Samsung 920NW; HP ZR30w; Dell U3011t; Philips 170S4): We visualize the attacker's acquired signal from various screen models, all displaying similar patterns.
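
To make the idea of a spectrogram concrete, here is a toy Python sketch. It is not the paper's pipeline: it synthesizes a signal (an assumption standing in for a real recording) whose dominant tone changes halfway through, computes a magnitude spectrogram with a plain short-time DFT, and reports the strongest frequency in each frame. The function names, sample rate, and tone frequencies are all hypothetical choices made for illustration.

```python
import cmath
import math


def spectrogram(samples, frame_len, hop):
    """Return a magnitude spectrogram as a list of frames.

    Each frame is a list of frame_len // 2 + 1 DFT magnitudes, computed
    on a Hann-windowed slice of `samples`.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def peak_frequencies(frames, sample_rate, frame_len):
    """Return the frequency (Hz) of the strongest bin in every frame."""
    return [max(range(len(m)), key=m.__getitem__) * sample_rate / frame_len
            for m in frames]


# Hypothetical parameters, chosen so both tones fall exactly on DFT bins.
SAMPLE_RATE = 8000
FRAME_LEN = 128

# Synthetic stand-in for a recording: 512 samples at 1000 Hz, then 512 at 2000 Hz.
signal = []
for tone_hz in (1000, 2000):
    signal += [math.sin(2 * math.pi * tone_hz * n / SAMPLE_RATE)
               for n in range(512)]

frames = spectrogram(signal, FRAME_LEN, hop=FRAME_LEN)
peaks = peak_frequencies(frames, SAMPLE_RATE, FRAME_LEN)
print(peaks)
```

In a real recording the periodic component would be buried in noise and speech, so a practical analysis would need longer windows and averaging; the sketch only shows how a change in a periodic component appears as a change in the dominant frequency row of the time-frequency heatmap.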

Q3: How does microphone distance affect the attack?

Distance does affect the recording: the farther the microphone is from the screen, the lower the attacker's signal-to-noise ratio. Nevertheless, naturally placed microphones still capture exploitable signals. Acoustic signals can also be captured from up to 10 meters away, using dedicated equipment.

Figure (close-range phone; naturally-placed phone): Up-close vs. naturally-placed smartphone. The attacker's signal is attenuated when distancing the mic, but remains sufficiently clean and strong.
Figure (parabolic microphone): At-distance attack using a parabolic dish.

Q4: Are on-screen keyboards safer than physical ones against audio-based snooping?

Physical keyboard noises can reveal the identity of a key being pressed (whether by the difference in sound, or by the timing pattern) [1, 2, 3, 4, 5, 6, 7, 8, 9]. Virtual (on-screen) keyboards were considered safer, since they avoid mechanical key sounds. However, we show that virtual keyboards also expose keystrokes acoustically. Caution should be exercised when typing sensitive text such as passwords around audio recording equipment, whether a physical or an on-screen keyboard is used.

Q5: What can be done to mitigate this attack?

Mitigations are possible, but expensive. We can consider both hardware and software mitigations.

Hardware mitigations, such as eliminating the signal, masking it by emitting other noise, or shielding to obstruct it, (1) can be applied only to screens manufactured in the future, and (2) are each expected to incur significant overhead. Masking or shielding this relatively clean signal would require careful hardware instrumentation. Eliminating the signal requires a change in the design common to most computer screens.

Software countermeasures change the actual screen content, for example by crafting the pixel values to induce a uniform acoustic signal (regardless of screen visuals), or by adversarially fooling machine learning models such as those used in the paper. By definition, these countermeasures do not protect every program used with a leaky screen, only the software that implements them.

See the paper for more details about the above countermeasures.

Q6: What are "Zebras"?

Zebras are black-white stripes displayed on the screen (see the figure below). When a Zebra is displayed, the visuals-to-sound leak causes the periodic color transitions (the stripes) to correspond with sound frequencies. The smaller the stripes, the higher the induced sound frequencies. A Zebra displayed on the screen tends to be clearly visible on the acoustic signal's spectrogram (the time-frequency heatmap).

This phenomenon is due to the information leak, and is thus useful for visually gauging leakage.

Figure (Zebra): Visualization using Zebras: black-white transitions (stripes) correspond with sound frequencies.
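
The relationship between stripe size and frequency can be sketched in a few lines of Python. The code below builds a Zebra as a grid of pixel values and estimates a nominal frequency under a simplified model that is our assumption here, not a formula taken from this page: the screen is taken to draw its pixel rows one after another, completing all rows once per refresh, with blanking intervals ignored. The function names and the 1080-row, 60 Hz figures are hypothetical.

```python
def make_zebra(height, width, stripe_rows):
    """Return a height x width grid of 0 (black) / 255 (white) pixel values.

    Each horizontal stripe is `stripe_rows` pixel rows tall, so one full
    black-white period spans 2 * stripe_rows rows.
    """
    if stripe_rows <= 0:
        raise ValueError("stripe_rows must be positive")
    return [[255 if (y // stripe_rows) % 2 else 0] * width
            for y in range(height)]


def nominal_frequency_hz(stripe_rows, total_rows, refresh_hz):
    """Estimate the frequency induced by a Zebra.

    ASSUMED model: rows are drawn sequentially, total_rows per refresh,
    with no blanking interval. One black-white period covers
    2 * stripe_rows rows, so the pattern repeats line_rate / period times
    per second.
    """
    line_rate = total_rows * refresh_hz      # rows drawn per second
    period_rows = 2 * stripe_rows            # rows per black-white cycle
    return line_rate / period_rows


# Hypothetical screen: 1080 rows refreshed 60 times per second.
zebra = make_zebra(height=1080, width=4, stripe_rows=18)
for stripe_rows in (18, 9):
    print(stripe_rows, nominal_frequency_hz(stripe_rows, 1080, 60))
```

Under this model, halving the stripe height doubles the nominal frequency, which matches the qualitative statement above. Real screens include blanking lines and other timing details, so measured frequencies would differ from these nominal values.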

Q7: What about other leakage from screens?

Extraction of screen content via electromagnetic emanations ("van Eck phreaking" or screen "TEMPEST") is well known and studied, originally for CRT screens, and later also for modern flat-panel screens and digital interfaces. Such electromagnetic attacks require an antenna and radio receiver in physical proximity to the screen, and tuned to suitable radio frequencies. Acoustic emanations, relying on microphones (which are ubiquitous and open new attack scenarios), have not been previously addressed.

Q8: What about other acoustic attacks?

See the Wikipedia page on Acoustic Cryptanalysis. In a nutshell:

In prior work on acoustic leakage from CPUs, we showed that ongoing computation (such as cryptographic decryption or signing) induces acoustic noise from the CPU's power supply, from which secret data can be extracted. Eavesdropping on keyboard keystrokes has been extensively investigated (see Q4 above): keys can be distinguished by timing, or by their different sounds. Acoustic leakage has also been identified from hard disk head movements and from inkjet printers.

Predating modern computers is MI5's "ENGULF" technique (recounted in Peter Wright's book Spycatcher), whereby a phone tap was used to eavesdrop on the operation of an Egyptian embassy's Hagelin cipher machine, thereby recovering its secret key. Declassified US government publications describe "TEMPEST" acoustic leakage from mechanical and electromechanical devices, but make no mention of modern electronic computers.


Acknowledgments