att_abstract={{The explosion of mobile devices in the past decade has brought with it an onslaught of unwanted SMS (Short Message Service) spam [1]. It was reported that the number of spam messages in the US rose 45% in 2011 to 4.5 billion messages [2]. Furthermore, a 2012 Pew Research Center study reported that more than 69% of mobile users have received text spam [3]. The sheer volume of spam messages not only inflicts an annoying user experience, but also incurs significant costs for cellular carriers and customers alike. Due to the proliferation of unwanted messages, SMS spam may be compared to email spam. However, in contrast to email spam, where the number of possible email addresses is unlimited, SMS spammers can more easily reach victims by, e.g., simply enumerating all numbers in the finite phone number space. This, combined with the wide adoption of mobile phones, makes SMS a medium of choice among spammers. Furthermore, the increasingly rich functionality provided by smart mobile devices also enables spammers to carry out more sophisticated attacks via both voice and data channels, e.g., using SMS spam to entice users to visit certain websites for product advertisement or other illicit activities.
Despite the importance and urgency of the SMS spam problem and its wide impact on cellular networks, the scarcity of representative spam datasets makes network-wide SMS spam studies a rather challenging task. The volume of SMS messages makes it difficult to collect SMS messages (with content) inside cellular networks. Meanwhile, a reliable automated approach for differentiating spam messages from legitimate ones is still lacking. For these reasons, existing research focuses primarily on building content-based SMS spam filtering at end user devices, e.g., [4,5], as opposed to studying large-scale SMS spam across the entire network. Though anonymized SMS records can be employed to characterize the network behaviors of individual phone numbers that initiate spamming, e.g., without the spam message content, it is difficult to correlate these spam numbers so as to understand how different phone numbers collaborate to launch large-scale spam campaigns.
To circumvent this challenge, in this paper we employ a novel data source – user (victim) generated spam reports (a.k.a. victim spam reports, or spam reports for short) – to study SMS spam in a large cellular network. As a means to combat SMS spam, many cellular network carriers have adopted and deployed an SMS spam reporting mechanism for mobile users. In particular, upon receiving a spam message, a victim can report it by forwarding it as a text message. Cellular carriers can then investigate and confirm the reported spam and finally restrict the offending spam phone numbers. Such victim spam reports not only contain the entire spam text, but also represent a more reliable and cleaner source of SMS spam samples, as all the spam messages contained in spam reports have been vetted and classified by mobile users using human intelligence.
In addition to detecting spammers, the content, as reported by the spam victims, also serves as a valuable asset for understanding spammers’ approaches and strategies. Taking advantage of this SMS spam reporting mechanism, in this paper we collect a year of spam reports from one of the largest cellular carriers in the US, containing approximately 543K spam messages, and carry out an extensive and multifaceted analysis of SMS spamming using these messages. Our research objectives are three-fold: 1) to devise an effective approach for identifying large-scale SMS spam campaigns that are initiated collaboratively by many offending phone numbers; 2) to assess the scale and impact of today’s SMS spam campaigns in large cellular networks; 3) to infer the intents and strategies of spammers behind these spam campaigns.
Methodology: To fulfill these objectives, we first carry out comprehensive and multi-dimensional studies of victim-supplied spam reports and use them as a proxy to help understand large-scale SMS spam campaigns. One key observation is that a majority of the spam messages contain a URL; we therefore adopt the definition of spam campaigns used in email spam studies [6] and group messages with embedded URLs pointing to the same site as an SMS spam campaign. For spam messages within each campaign, we apply a text mining tool, CLUTO, to further cluster them into spam activities, where messages belonging to the same activity closely resemble one another in text content except for a few words.
On top of the clustering results, we identify 10 dominant spam campaigns that contribute to nearly half of the spam reports, and conduct an in-depth analysis of these spam campaigns. Results: We find that all these campaigns are related to fraud sites which trick victims into submitting personal information, such as addresses and personal phone numbers, in order to claim a free gift card or to redeem a free mobile device. In addition, these campaigns can be long-lasting, with a life span from several months to a year. Moreover, many of these campaigns are launched by hundreds to thousands of spammers from geographically diverse states in the US and have affected mobile users across the entire country. Our analysis sheds light on the intentions and strategies of SMS spammers and provides unique insights for developing better methods for detecting SMS spam.
	att_tags={SMS Spam},
	author={Ann Skudlark},
	institution={{International Telecommunications Society (ITS) Biennial Conference}},
	title={{Characterizing SMS Spam in a Large Cellular Network via Mining Victim Spam Reports}},