Sensor Data: A New Threat to Smart Phones?
PCs have long been vulnerable to malware that logs keystrokes to steal passwords, account numbers, and other private information. A similar threat may exist for smart phones and tablets due to the unique vibrations that result from tapping at specific locations on the touch screen. These vibrations are picked up by a smart phone’s sensitive accelerometer and gyroscope sensors and encoded in motion data easily collected through standard, off-the-shelf APIs.
The security implication of inferring screen taps from motion data is not hard to imagine: an attacker could launch a background process and silently monitor the user's soft keyboard presses and icon taps to steal sensitive information.
To test whether taps at different screen locations produce a signature identifiable enough to correlate with a screen location, a team of researchers conducted a series of experiments, ultimately finding they were able to identify English letters with an accuracy of up to 80% on both Android and iPhone devices. The accuracy of predicting icons was even higher (up to 90%).
The methodology used to identify screen taps will be detailed in a paper to be presented at MobiSys this June, but the general outline of the investigation can be summarized as follows:
AT&T researchers Emiliano Miluzzo, Alexander Varshavsky, and Suhrid Balakrishnan, along with Romit Roy Choudhury of Duke University, collected data on 40,000 taps from 10 users over four weeks, labeling each tap with its screen coordinates, the letter or ID of the tapped icon, time-stamped accelerometer and gyroscope readings, and timestamps marking the beginning and end of the tap.
From this training data, researchers extracted 273 features of sensor readings generated by a tap, including an energy amount, the striking force of the typing finger, the angular velocity of the device due to a tap, the sensor data frequency response, and correlation metrics between the accelerometer and gyroscope data.
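The paper's full 273-feature set isn't spelled out here, but two of the named feature types, a tap's energy and its frequency response, can be sketched in a few lines. The function below is an illustrative toy, not the researchers' implementation: it computes the energy of a tap's z-axis accelerometer window and finds its dominant frequency bin with a naive DFT.

```python
import math

def tap_features(az_samples):
    """Toy versions of two feature types the study describes:
    the energy of a tap's z-axis accelerometer window and its
    dominant frequency bin (a crude frequency response)."""
    n = len(az_samples)
    # Energy: sum of squared, mean-removed samples.
    mean = sum(az_samples) / n
    centered = [a - mean for a in az_samples]
    energy = sum(a * a for a in centered)
    # Naive DFT magnitudes to find the dominant frequency bin.
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        re = sum(centered[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(centered[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return {"energy": energy, "dominant_bin": best_bin}
```

A real attack would compute many such features per tap (striking force, angular velocity, cross-sensor correlations) and feed the resulting vector to the classifiers described below.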
Capturing the motion data was easy, requiring only the activation of off-the-shelf APIs provided by the device manufacturer. Because motion data is considered innocuous, it does not require the user’s permission to access (unlike GPS location information).
The data, however, is noisy and hard to classify; the prediction task of inferring a letter or an icon requires discriminating among 26 and 20 classes, respectively.
For these reasons, researchers chose an ensemble of machine learning classifiers that collectively covered parametric and nonparametric methods, and both linear and nonlinear techniques. The ensemble included k-nearest neighbor (kNN), multinomial logistic regression, support vector machine (SVM), random forest, and bagged decision trees.
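To make the ensemble concrete, here is a minimal sketch of its simplest member, k-nearest neighbor: a new tap's feature vector is compared against labeled training taps, and the majority label among the k closest wins. This is an illustrative implementation, not the researchers' code, and the feature vectors are assumed to be plain numeric tuples.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify a tap's feature vector by majority label among its
    k nearest training taps (squared Euclidean distance).
    `train` is a list of (feature_vector, label) pairs."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), label)
        for vec, label in train
    )
    top = [label for _, label in dists[:k]]
    return Counter(top).most_common(1)[0][0]
```

The other ensemble members (logistic regression, SVM, random forest, bagged trees) each produce a prediction for the same tap in an analogous way.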
The ensemble approach also exploited the fact that each model’s classification errors were largely uncorrelated: because each model’s weaknesses were unique to it, combining multiple methods compensated for the weaknesses of any individual model.
The results of the individual classifiers were combined, and a majority voting scheme was employed to obtain a final result.
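The voting step itself is simple. The sketch below assumes each classifier emits one label per tap and that ties go to the first-listed classifier; the paper's actual tie-breaking rule isn't specified here.

```python
from collections import Counter

def ensemble_vote(predictions):
    """Combine per-classifier predictions for one tap by majority
    vote. Ties go to the earliest classifier in the list (an
    assumption; the study's tie-breaking rule isn't given here)."""
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in classifier order so the first-listed model breaks ties.
    for p in predictions:
        if counts[p] == best:
            return p
```

For example, if kNN, SVM, and random forest all say "e" while the other two models disagree, the ensemble outputs "e".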
Training the models took relatively few taps: only around 20 taps per letter or icon were needed to reach peak accuracy, which averaged 55%.
It was relatively easy to improve on this accuracy, since most misclassifications tended to localize in the region of the tap: a letter was confused only with nearby letters in the keyboard layout. Narrowing down the correct letter thus required focusing only on a small number of neighboring letters rather than the entire keyboard, and an attacker using a dictionary attack could quickly zero in on the correct keys. In general, the researchers found it took only three or four attempts to correct an error.
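That correction step can be sketched as a constrained dictionary attack: expand each predicted letter to itself plus its physical neighbors on the keyboard, then keep only letter combinations that form real words. The adjacency map below is a partial, illustrative QWERTY layout, not data from the paper.

```python
from itertools import product

# Partial QWERTY adjacency map (illustrative only).
NEIGHBORS = {
    "q": "wa", "w": "qeas", "e": "wrsd", "r": "etdf", "t": "ryfg",
    "a": "qwsz", "s": "awedxz", "d": "serfcx", "f": "drtgvc",
    "o": "ipkl", "i": "uojk", "n": "bhjm", "c": "xdfv",
}

def candidate_words(predicted, dictionary):
    """Expand each predicted letter to itself plus its keyboard
    neighbors, then keep only combinations that are real words."""
    options = [p + NEIGHBORS.get(p, "") for p in predicted]
    return [w for w in ("".join(c) for c in product(*options))
            if w in dictionary]
```

A misclassified "wat" thus yields a handful of plausible words ("war", "eat") rather than forcing a search over the whole keyboard, which is why a few attempts usually suffice.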
The models were trained on part of the data from all the users and tested on the remaining data from those same users. In other words, the models were somewhat tuned to each user.
But researchers also found the models were easily adapted to users who hadn’t been involved in the training process. While new users’ taps were initially identified with only 40% accuracy, this percentage increased rapidly as more data was collected from the new users. It took only 13 additional taps per letter/icon for new users to reach the same high accuracy obtained for the users on whom the models were trained.
These experiments clearly demonstrated not only that motion data from accelerometers and gyroscopes can be used to accurately infer screen taps, but that it can be done using pre-trained models.
The experiments also pointed to possible countermeasures: moving the keyboard periodically, reducing the sampling rate of sensors so the data is less precise, and reducing access to sensor data when the keyboard is up (though this last measure would prevent the keyboard from rotating with the device).
While the motion data is clearly a vulnerability, the same data might be used to enhance security. The unique ways people handle their phones may create a sensor signature distinctive enough to serve as a biometric to authenticate a smart phone user. If the phone detects an uncharacteristic tapping pattern, it could shut down to prevent unauthorized use.
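The article only sketches this biometric idea, but one toy realization is an outlier test: store a per-user profile of typical tap features and flag any tap whose features stray too far from it. The function, threshold, and feature layout below are all assumptions for illustration, not a proposal from the researchers.

```python
def is_characteristic(tap, profile_mean, profile_std, z_max=3.0):
    """Toy biometric check: accept a tap as the owner's only if every
    feature lies within z_max standard deviations of the stored
    per-user profile. All parameters here are illustrative."""
    return all(
        abs(x - m) <= z_max * s
        for x, m, s in zip(tap, profile_mean, profile_std)
    )
```

A phone could then lock itself after a run of uncharacteristic taps, which is the shutdown behavior the article envisions.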
But before this avenue can be explored, the current problem of inferring tap location from sensor motion data must be brought to the attention of the research community and device manufacturers so the security ramifications can be addressed. More information will be contained in the paper to be presented in June at MobiSys.
Sensor measurements correlate to screen locations
Accelerometer sensor measurements (in particular, that of the z-axis) show distinct patterns during taps. Similar patterns can also be observed in the gyroscope data.
The top line shows a step function that identifies letter taps. The boundaries of two taps are demarcated by dashed vertical lines.
About the researchers
Emiliano Miluzzo is a Senior Member of the Technical Staff at AT&T Research working at the intersection of mobile systems and applied machine learning. His research interests include mobile, pervasive, and distributed computing, mobile sensing systems, and big data analysis. He holds a PhD in Computer Science from Dartmouth College, and an MS and BS in Electrical Engineering from the University of Rome La Sapienza, Italy.
Alexander Varshavsky is a Senior Member of Technical Staff at AT&T Labs. His research interests include mobile and ubiquitous computing, context-awareness and security in mobile systems. He holds a PhD in Computer Science from the University of Toronto, Canada.
Suhrid Balakrishnan is a Senior Member of the Technical Staff at AT&T Research specializing in machine learning, and particularly interested in scalable, accurate and efficient algorithms for statistical learning. His current research focuses on predictive modeling for computational advertising, recommender systems, and sensor data. He has a PhD in Computer Science from Rutgers University, and a B.Tech. from I.I.T. Bombay.
Romit Roy Choudhury is an Associate Professor in the Dept. of Electrical & Computer Engineering and Dept. of Computer Science at Duke University, where he leads the Systems Networking Research Group (SyNRG). He has a PhD in Computer Science and an MS in Electrical and Computer Engineering, both from the University of Illinois at Urbana-Champaign, and a B.Tech in Computer Science from Haldia Institute of Technology, India.