Sensor Data: A Way to Make Smart Devices Secure?

by: Staff, Thu Jul 05 16:44:00 EDT 2012


Every tap to a smart device touch screen creates vibrations that vary based on the force and direction of the tap, how the device is held or cradled, and where on the screen the tap occurs. These vibrations are picked up by the device’s sensitive accelerometers and gyroscopes—the same ones that make it possible to rotate a screen or control a gaming action using simple motions—and encoded in the motion data generated by the sensors.

Because everyone taps differently and in a characteristic way, motion data may carry a sensor signature distinctive enough to serve as a biometric identifier to authenticate a smart phone user. The device could, for example, shut down if it detects an uncharacteristic tapping pattern.
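A minimal sketch of how such a check might work, assuming each tap has already been reduced to a numeric feature vector (the feature values and the distance threshold below are invented for illustration, not taken from the study):

```python
import math

def tap_signature(enroll_taps):
    """Average the feature vectors of a user's enrollment taps into a
    single 'signature' vector (a toy model of the owner's tapping style)."""
    n, dim = len(enroll_taps), len(enroll_taps[0])
    return [sum(t[i] for t in enroll_taps) / n for i in range(dim)]

def is_characteristic(tap, signature, threshold=1.0):
    """Accept a tap if its Euclidean distance from the enrolled signature
    is below a threshold; otherwise treat it as uncharacteristic."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(tap, signature)))
    return dist < threshold
```

A device could call `is_characteristic` on each incoming tap and lock itself after a run of rejections.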

But the same motion data that may allow devices to authenticate users may also pose a security risk if it’s possible to accurately correlate unique vibrations with the keys and icons being tapped. In the same way PCs have long been vulnerable to malware capable of spying on keyboard strokes to steal passwords and other private information, smart phones and tablets may be vulnerable to attackers accessing their motion data.

The security implication of inferring screen taps from motion data is not hard to imagine: an attacker could launch a background process and silently monitor the user's soft keyboard presses and icon taps to steal sensitive information. If motion data is to be of use for authenticating users, these vulnerabilities must first be understood.


An ensemble classification method for inferring screen taps

To test whether taps at different screen locations produce motion signatures distinctive enough to reveal where the screen was tapped, a team of researchers conducted a series of experiments, ultimately finding they were able to identify English letters with an accuracy of up to 80% on both Android and iPhone devices. The accuracy of predicting icons was even higher (up to 90%).

The exact methodology used to identify screen taps is detailed in the paper TapPrints: Your Finger Taps Have Fingerprints, being presented at MobiSys 2012, but the general outline of the investigation can be summarized as follows:

AT&T Researchers Emiliano Miluzzo, Alexander Varshavsky, Suhrid Balakrishnan along with Romit Roy Choudhury of Duke University collected data on 40,000 taps from 10 users over four weeks, labeling each tap with screen coordinates, letters or IDs of the tapped icons, time-stamped accelerometer and gyroscope readings, and timestamps to mark the beginning and ending of a tap.

From this training data, researchers extracted 273 features of sensor readings generated by a tap, including an energy amount, the striking force of the typing finger, the angular velocity of the device due to a tap, the sensor data frequency response, and correlation metrics between the accelerometer and gyroscope data.
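The paper's full 273-feature set isn't reproduced here, but a sketch of a few features of the kinds named above might look like the following (the exact definitions are assumptions for illustration, not the authors' formulas):

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length sample sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def tap_features(accel, gyro):
    """A handful of features computed over the (x, y, z) accelerometer
    and gyroscope samples spanning a single tap."""
    az = [a[2] for a in accel]
    return {
        "energy": sum(x * x + y * y + z * z for x, y, z in accel),
        "peak_force": max(abs(z) for z in az),  # striking-force proxy
        "peak_angular": max(math.sqrt(x * x + y * y + z * z)
                            for x, y, z in gyro),
        "accel_gyro_corr": pearson(az, [g[0] for g in gyro]),
    }
```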

Capturing the motion data was easy, requiring only the activation of off-the-shelf APIs provided by the device manufacturer. Because motion data is considered innocuous, it does not require the user’s permission to access (unlike GPS location information).

The data, however, is noisy and hard to classify; inferring a letter or an icon requires discriminating among 26 or 20 classes, respectively. For these reasons, the researchers chose an ensemble of machine learning classifiers that collectively covered parametric and nonparametric methods, and both linear and nonlinear techniques. The ensemble included k-nearest neighbor (kNN), multinomial logistic regression, support vector machine (SVM), random forest, and bagged decision trees.

Training the models took relatively few taps, needing only around 20 taps per letter/icon to achieve the highest accuracy, which was on average 55%.

The ensemble method also ensured that each model’s classification errors were uncorrelated (each model’s weaknesses were unique to it, so the use of multiple methods corrected for the weaknesses of individual models). The results of the individual classifiers were combined, and a majority-voting scheme was employed to obtain a final result.
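The majority-voting step itself is simple; a sketch, with the five classifier outputs invented for illustration:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine one predicted label per classifier into a final answer
    by simple majority (in CPython 3.7+, ties resolve in favor of the
    earliest-listed classifier)."""
    return Counter(predictions).most_common(1)[0][0]

# One made-up prediction per model: kNN, logistic regression, SVM,
# random forest, bagged decision trees.
final = majority_vote(["e", "e", "r", "e", "w"])   # → "e"
```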

It was relatively easy to improve on this accuracy, since most misclassifications localize to the region of the tap: a letter is confused only with nearby letters in the keyboard layout. Narrowing down the correct letter required considering only a small number of neighboring keys rather than the entire keyboard, so an attacker mounting a dictionary attack could quickly zero in on the correct ones. In general, the researchers found it would take only three or four attempts to correct an error.
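A toy sketch of this neighbor-based correction (the adjacency map covers only the keys used in the example, and the single-substitution strategy is a simplification of a real dictionary attack):

```python
# Partial QWERTY adjacency map; a real attacker would cover the
# whole layout.
NEIGHBORS = {
    "a": "qwsz", "s": "awedzx", "c": "xdfv", "t": "rfgy",
}

def correct_word(guess, dictionary):
    """If the inferred word isn't valid, retry single-letter
    substitutions drawn only from each key's physical neighbors,
    mirroring the observation that errors stay local."""
    if guess in dictionary:
        return guess
    for i, ch in enumerate(guess):
        for alt in NEIGHBORS.get(ch, ""):
            cand = guess[:i] + alt + guess[i + 1:]
            if cand in dictionary:
                return cand
    return guess
```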




Figure: This example shows how few repetitions are needed to improve the accuracy of inferring the letters tapped.


The models were trained on a portion of the data combined from all the users and tested on the remaining data from the same users. In other words, the models are somewhat tuned to each user.

But the researchers also found the models were easily adapted to users who hadn't been involved in the training process. While a new user's taps were initially identified with only 40% accuracy, the percentage rose rapidly as more data was collected from that user. It took only 13 additional taps per letter/icon for new users to reach the same high accuracy obtained for the users on whose data the models were trained.
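One way to picture this adaptation is a classifier whose per-class averages shift as labelled taps from a new user are folded in. The nearest-centroid model below is a toy stand-in for that idea, not the paper's method:

```python
class NearestCentroid:
    """Each class keeps a running mean feature vector; a tap is
    assigned to the class whose mean is closest. Folding in a few
    labelled taps from a new user shifts the means toward that
    user's tapping style."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def update(self, label, features):
        s = self.sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            s[i] += f
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, features):
        def sqdist(label):
            c = self.counts[label]
            return sum((features[i] - s / c) ** 2
                       for i, s in enumerate(self.sums[label]))
        return min(self.sums, key=sqdist)
```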

These experiments clearly demonstrated not only that motion data from accelerometers and gyroscopes can be used to accurately infer screen taps, but that it can be done using pre-trained models.

It should be pointed out that the researchers were motivated to examine motion data by a suspicion that it posed a possible threat, not by any known instance of an attacker exploiting it. In any case, inferring tap locations from motion data is far from trivial, requiring sophisticated machine-learning techniques.


Possible solutions to ensure user privacy

Researchers also identified possible countermeasures to make it difficult to infer tap locations:

  • Modify the device’s operating system to block or reduce access to motion sensor data while the keyboard is in use.
  • Require user permission before an app can access motion sensor data.
  • Reduce the quality, sampling rate, and amount of sensor data available to any application running in the background.
  • Engineer a device case that absorbs device motion, or implement a swipe-based keyboard.
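The third countermeasure, degrading sensor data for background apps, can be sketched as simple decimation plus quantization (the parameter values are illustrative, not from the paper):

```python
def degrade_stream(samples, keep_every=5, step=0.5):
    """Decimate a background app's sensor stream and coarsely quantize
    what remains, so the fine vibration structure of individual taps
    is no longer resolvable."""
    return [round(s / step) * step for s in samples[::keep_every]]
```

Foreground apps would still receive the full-rate stream, so screen rotation and games are unaffected.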

One simple step users can take themselves: walk or move while typing or tapping.

Researchers are also examining a combination of solutions that can be implemented at the application level and thus require no additional support from the operating system or hardware.

Once privacy-protection measures are in place, researchers will be able to turn their attention to investigating the use of sensor motion data as a biometric identifier, turning a possible security threat into a security enhancement.

For more information about the study, see the paper TapPrints: Your Finger Taps Have Fingerprints.

Figure: Sensor measurements correlate to screen locations. Accelerometer readings (in particular, along the z-axis) show distinct patterns during taps; similar patterns appear in the gyroscope data. The top line is a step function identifying letter taps, with the boundaries of two taps demarcated by dashed vertical lines.
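A toy version of that step function is a threshold test on the z-axis stream (the resting-level estimate and the threshold value are simplifying assumptions):

```python
def tap_mask(z_samples, threshold=1.5):
    """Step function over the z-axis accelerometer stream: 1 while a
    reading deviates from the resting level by more than a threshold,
    else 0. Runs of 1s mark candidate taps; the stream mean serves as
    a crude estimate of the resting level."""
    rest = sum(z_samples) / len(z_samples)
    return [1 if abs(z - rest) > threshold else 0 for z in z_samples]
```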


About the researchers

Emiliano Miluzzo is a Senior Member of the Technical Staff at AT&T Research, working at the intersection of mobile systems and applied machine learning. His research interests include mobile, pervasive, and distributed computing; mobile sensing systems; and big data analysis. He holds a PhD in Computer Science from Dartmouth College, and an MS and BS in Electrical Engineering from the University of Rome La Sapienza, Italy.

Alexander Varshavsky is a Senior Member of Technical Staff at AT&T Labs. His research interests include mobile and ubiquitous computing, context-awareness and security in mobile systems. He holds a PhD in Computer Science from the University of Toronto, Canada.

Suhrid Balakrishnan is a Senior Member of the Technical Staff at AT&T Research specializing in machine learning, and particularly interested in scalable, accurate and efficient algorithms for statistical learning. His current research focuses on predictive modeling for computational advertising, recommender systems, and sensor data. He has a PhD in Computer Science from Rutgers University, and a B.Tech. from I.I.T. Bombay.

Romit Roy Choudhury is an Associate Professor in the Departments of Electrical & Computer Engineering and Computer Science at Duke University, where he leads SyNRG, the Systems Networking Research Group. He has a PhD in Computer Science and an MS in Electrical and Computer Engineering, both from the University of Illinois at Urbana-Champaign, and a B.Tech in Computer Science from the Haldia Institute of Technology, India.