Spam texts: What's being done?
Spam texts (SMS) are a growing problem for consumers and telecoms.
For consumers, they represent, at a minimum, an annoyance and an invasion of cell-phone privacy, and count against message limits for those without unlimited messaging. But spam texts do worse than annoy. They can trick consumers into clicking on embedded links or providing private information, and increasing numbers of spam texts now attempt to install malware.
For telecoms, spam is a drain on the network and resources. A single spammer can send thousands of spam texts an hour, bombarding a single cell tower and hogging signaling bandwidth needed for legitimate customers. Fielding spam-related complaints also siphons away customer-service resources.
To combat text-spamming, AT&T worked with other North American telecoms to create the free 7726 service, which lets anyone report spam texts to short code 7726 (see sidebar). Reports go first to a messaging security company for verification before being forwarded to telecoms, which can then take action such as preventing delivery of texts from phone numbers associated with an identified spammer.
The 7726 reporting service is having an effect. AT&T researchers, fraud teams, and engineers using 7726 reports, supplemented with other anti-spam methods, have seen a steady, week-by-week decline in the number of spam senders and in the number of spam messages sent by each sender.
The 7726 service is most effective when customers report spam immediately, but this rarely happens. Reports may not be sent until hours or even days after a spam text was received, giving spammers time to send out thousands of additional texts.
If spam is to be shut down, a quicker response is needed, and AT&T researchers are looking at network information to provide a faster, more comprehensive approach to fighting spam.
Why speed matters when reporting text spamming
The key to effectively shutting down spam texts is speed. If spammers can be shut down quickly, they can’t make money. Many spammers get paid by directing “leads” to another website, usually $1-$2 for each spam recipient who clicks a link embedded in the spam text or who enters an email address or other personal information.
But to make money, spammers must first spend money.
In North America, spammers typically buy prepaid cards anonymously and activate them with daily prepaid rate plans that include unlimited messaging. (Spamming patterns are different in other parts of the world with tight daily traffic limits.) Spammers may also invest in special equipment that holds 16-128 prepaid cards, allowing them to vastly increase the volume of texts.
The spammers’ business model depends on an expected per-message return exceeding the per-message costs. If a prepaid card costs $15, a spammer needs to generate more than $15 in revenue. If spammers can be shut down quickly, before turning up enough leads, their business model fails.
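As a rough illustration of this arithmetic, the sketch below computes how many paid leads a spammer would need before revenue exceeds the cost of one prepaid card, using the $15 card price and the $1-$2 per-lead payout quoted above. The function names and the messages_per_lead parameter are hypothetical and are not figures from AT&T.

```python
import math

def leads_to_break_even(card_cost: float, payout_per_lead: float) -> int:
    """Smallest number of paid leads whose revenue exceeds the card cost."""
    if payout_per_lead <= 0:
        raise ValueError("payout_per_lead must be positive")
    # Revenue must exceed (not merely equal) the cost, hence floor + 1.
    return math.floor(card_cost / payout_per_lead) + 1

def messages_to_break_even(card_cost: float, payout_per_lead: float,
                           messages_per_lead: int) -> int:
    """Messages needed if, on average, one lead results from every
    `messages_per_lead` texts (an assumed, illustrative response rate)."""
    return leads_to_break_even(card_cost, payout_per_lead) * messages_per_lead

if __name__ == "__main__":
    print(leads_to_break_even(15, 2))  # leads needed at $2 per lead
    print(leads_to_break_even(15, 1))  # leads needed at $1 per lead
```

Under these assumptions, a spammer shut down before collecting that many leads on a card loses money on it, which is why response time matters.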
But 7726 reports are too spotty and slow to serve as an effective, broad-based deterrent.
Only 1 in approximately 1700 spam messages is reported. Most customers do not know about 7726; many learn about it hours or days later when they call customer care to complain, and of these customers, only a fraction take the time to report.
Customer reports can be erroneous. When prompted by the 7726 service to enter the spam sender’s number, customers sometimes type the wrong number, maliciously enter a non-spam number, or simply do not respond.
Another shortcoming is that 7726 identifies only the reported number. Some spammers have well over 100 numbers in use simultaneously; shutting down a single number does little to dent the volume of spam texts on a spammer’s other (unreported) numbers.
Validating 7726 spam reports with network information
Because of errors associated with 7726, AT&T doesn’t act on a single 7726 report, but requires additional verification that the reported spam is actually spam. Originally, AT&T waited for multiple customer reports of the same number, but the extra wait for other 7726 reports allowed spammers to continue spamming.
Needing a faster way to validate 7726 reports, AT&T researchers began looking at network information, with its obvious advantages of speed and accuracy. Network information is created in real time as spam texts are generated and then transmitted through the network. Network information is also broad and deep; each text that passes over the network leaves multiple clues about its existence—from the cell tower that originally transmitted it to each of the routers along the route taken by a text.
Because spammers send thousands of texts, it stands to reason that many AT&T customers are receiving the exact same spam identified by a 7726 report. In the network, there is a way to check for this since every spam text (and every phone call) generates a billing record that notes the originating and receiving phone numbers, the time and date, and other information.
If a significant number of other AT&T customers received a text from the same 7726-reported number within roughly the same time frame, researchers could reasonably infer that these texts came from the same source. If so, these recipients identified through network information could serve to validate a single 7726 report.
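A minimal sketch of this validation check might look like the following. It assumes billing records reduced to sender, recipient, and timestamp; the BillingRecord type, the one-hour window, and the 50-recipient threshold are all hypothetical choices for illustration, since the article does not give the values AT&T uses.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Set

class BillingRecord(NamedTuple):
    sender: str        # originating phone number
    recipient: str     # receiving phone number
    sent_at: datetime  # time the text was sent

def distinct_recipients(records: Iterable[BillingRecord],
                        reported_sender: str,
                        reported_at: datetime,
                        window: timedelta = timedelta(hours=1)) -> Set[str]:
    """Recipients who got a text from the reported number near the report time."""
    start, end = reported_at - window, reported_at + window
    return {r.recipient for r in records
            if r.sender == reported_sender and start <= r.sent_at <= end}

def validates_report(records: Iterable[BillingRecord],
                     reported_sender: str,
                     reported_at: datetime,
                     min_recipients: int = 50,  # assumed threshold
                     window: timedelta = timedelta(hours=1)) -> bool:
    """Treat a single 7726 report as validated if enough distinct
    recipients received texts from the same number in the same window."""
    found = distinct_recipients(records, reported_sender, reported_at, window)
    return len(found) >= min_recipients
```

Counting distinct recipients rather than raw messages is a design choice in this sketch: it keeps a long back-and-forth conversation between two people from looking like bulk sending.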
As a test, researchers initially examined 1000 reported spam sources over three days, comparing how well the network-based spam detection method (numerous distinct network-identified recipients of a 7726-reported number) worked as a validation check versus the previous method of using multiple 7726 customer reports. No false positives were identified, and subsequent analysis with a sample size of over 30,000 reported sources also identified no false positives. Additionally, the network-based method was immune to malicious and erroneous reporting.
And it was much faster. Benchmarked against the old multiple-report method, the network-based method proved faster in almost 95% of the cases. In 60% of the cases, it was at least one hour faster, and in 50% of the cases, at least two hours faster.
Identifying spam sources using geographic information
Network information allows for much faster identification of the original source number that sent out the reported spam. But it also makes it possible to track the other numbers in use by a spammer, even if these numbers have not yet been reported by a customer.
Included in network information is the cell tower that originally transmitted the reported spam, an important piece of information since spammers often send all their spam texts from the same location. Provided the cell tower is an AT&T tower, researchers may be able to geographically correlate customer-reported spam with other large volumes of texts transmitted in the same time frame, from the same location, and at roughly the same volume, and thus identify the other numbers in use by a spammer, not just the reported number.
Researchers are now validating this method.
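The geographic correlation described above could be sketched roughly as follows. This is an illustration only, not AT&T's method: the TowerRecord type, the volume_ratio band, and the min_volume floor are hypothetical, and the output is a list of candidate numbers that would still need separate validation.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, NamedTuple

class TowerRecord(NamedTuple):
    sender: str        # originating phone number
    tower_id: str      # cell tower that first transmitted the text
    sent_at: datetime

def correlated_senders(records: Iterable[TowerRecord],
                       reported_sender: str,
                       window_start: datetime,
                       window_end: datetime,
                       volume_ratio: float = 0.5,  # assumed similarity band
                       min_volume: int = 100) -> List[str]:
    """Other numbers sending from the reported number's tower(s), in the
    same time frame, at roughly the same volume."""
    if not 0 < volume_ratio <= 1:
        raise ValueError("volume_ratio must be in (0, 1]")
    in_window = [r for r in records if window_start <= r.sent_at <= window_end]
    towers = {r.tower_id for r in in_window if r.sender == reported_sender}
    counts = Counter(r.sender for r in in_window if r.tower_id in towers)
    reported_volume = counts.get(reported_sender, 0)
    if reported_volume == 0:
        return []
    # "Roughly the same volume": within a band around the reported volume.
    low = max(min_volume, reported_volume * volume_ratio)
    high = reported_volume / volume_ratio
    return sorted(s for s, n in counts.items()
                  if s != reported_sender and low <= n <= high)
```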
Future work: identifying spam without a customer report
Spam detection is much faster using network information, but the entire process of tracking spam still hinges on receiving the initial customer report. The obvious next step is to see whether spam can be identified without the ground truth that 7726 reports provide.
Identifying spam solely by volume, time, and location is not reliable. Many legitimate customers also send out high volumes of texts all at once from the same cell tower—a pattern that by itself looks a lot like spamming.
To differentiate spam from legitimate texts, researchers are looking hard at other and more subtle spamming patterns. Are spammers more likely to send texts at specific hours or from specific locations? What devices do they use? Can their phone numbers be distinguished from phone numbers of legitimate users?
Another promising avenue might be to look at customers who receive spam. If some are more likely to receive spam than others—and there are initial indications some are—researchers may be able to create honey-pot accounts with the characteristics of spam-attracting accounts, luring spam text out into the open and creating a sort of gray zone that legitimate users don’t usually enter.
By passively monitoring such a space and seeing who travels it, researchers may be able to detect spam as it occurs. When combined with other validation checks, honey-pot accounts may help shut down spammers within minutes of their first spam.
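One way to picture the honey-pot idea is the sketch below, which flags any sender whose texts reach several distinct honey-pot accounts. The function name, the (sender, recipient) message format, and the min_hits threshold are hypothetical; the article describes this approach as a possibility, not a deployed system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def honeypot_suspects(messages: Iterable[Tuple[str, str]],
                      honeypots: Set[str],
                      min_hits: int = 2) -> Dict[str, int]:
    """Map each suspect sender to the number of distinct honey-pot
    accounts it texted, keeping senders with at least `min_hits`."""
    hits = defaultdict(set)
    for sender, recipient in messages:
        if recipient in honeypots:
            hits[sender].add(recipient)
    return {s: len(r) for s, r in hits.items() if len(r) >= min_hits}
```

Requiring more than one distinct honey-pot hit is an assumed safeguard against a legitimate user who simply mistypes a number; as the text notes, such a signal would be combined with other validation checks.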
Researching new anti-spam efforts will be, by necessity, an ongoing effort. Spammers themselves are evolving their methods to react to improved defenses, finding ways to get around filters and per-day message limits, and learning also to conceal their phone numbers. Spammers in countries with tight daily traffic limits now use malware to turn phones into spam-sending bots, a pattern that is expected to migrate to North America as North America tightens its defenses against prepaid message spam.
But as spammers change their patterns, they are creating new ones. Discovering these new patterns will require researchers to rework their algorithms, use new pieces of information, and learn how to correlate information in new ways. The depth and breadth of information available in the network gives researchers the best chance to do so.
Facts about spam-texting
Estimated spam texts in 2011: 2.5 billion
Spam traffic can contribute more than 20 times the volume of normal texting traffic at some cell towers.
Text spam is illegal (Telephone Consumer Protection Act, 1991, amended 2005).
How to report spam texts
If your carrier supports the free 7726 service (AT&T and many major carriers do):
1. Note the spammer’s short code or 10-digit phone number.
2. Forward the spam text to short code 7726 (spells “SPAM”).
3. Upon receiving a 7726 confirmation text, reply with the spammer’s number or short code. This returns another confirmation.
Your privacy is maintained; all complaints are forwarded to a messaging security company under contract with the carriers, where the spam is verified and the respective telecoms notified.
What NOT to do
Do not click on embedded links, even those labeled “stop receiving spam” or something similar. Responding in any way verifies that you have a working (and re-sellable) phone number.
About the researchers
Yu Jin is a Senior Member of Technical Staff in the Analysis & Optimization department at AT&T Labs. He received his PhD from the Department of Computer Science & Engineering at the University of Minnesota, Twin Cities in 2010. His research is data-oriented, involving analysis of massive network data for a wide range of applications, such as network profiling, measurement, and anomaly detection. His work has been published in many journals and conferences, including MobiSys, SIGMETRICS, CoNEXT, TKDD, ICNP, and INFOCOM.
Ann Skudlark is a Director in the Analysis & Optimization department at AT&T Labs. During her tenure at AT&T, Ann held positions in strategy and operations before joining AT&T Labs in the Consumer Lab in 1992. Ann’s interest is in systems and algorithms for risk management, with a key focus on fraud detection: finding malicious needles in calling-behavior haystacks.
Ann has published in the Journal of Information Technology and in numerous conference proceedings, including MobiSys, International Telecommunications Society, National Conference of Decision Science Institute, and INFORMS.
Ann is on the STEM Advisory Board for the NJ School Age Coalition.
Colin Goodall is a Lead Member of Technical Staff and a 13-year veteran of the Analysis & Optimization organization at AT&T Labs. Dr. Goodall came to AT&T following faculty appointments at Princeton (with John Tukey), Columbia, and Penn State, where he worked in statistical computing and visualization and in spatial and shape statistics. During this period, Colin helped to launch a technology company in the area of healthcare outcomes that went on to an IPO.
Colin Goodall's areas of focus at AT&T include extensive and long-time use of call detail records for fraud and spam detection and usage analysis, and building re-usable toolkits for data analysis. He has been awarded several patents.