Identifying out-of-service phone numbers

by: , Thu Jul 28 12:51:00 EDT 2011

How do you know if a given phone number actually works? You could call it, but what if you have 36 million numbers to check each month, as YELLOWPAGES.COM does?
It’s a technical challenge, requiring the ability to query massive amounts of data and an understanding of how systems interact deep within the network. AT&T Research is well positioned to do both.

YELLOWPAGES.COM (YPC) provides listing information for close to 36 million businesses. But how can anyone validate that this information is correct as supplied, and in particular that the 36 million listed phone numbers actually work? YELLOWPAGES.COM has turned the job over to AT&T Research. [Is YPC part of AT&T now?] [Did YPC do it before?]
Why AT&T Research? Because validating 36 million phone numbers every month is an immense, technically difficult task. It requires both the ability to handle massive amounts of data and a deep understanding of how networks work, because sometimes the only way to know for sure is to observe what goes on inside the network.
The job of identifying out-of-service numbers starts each month with AT&T Research retrieving the 36 million or so phone numbers listed by YELLOWPAGES.COM. Since checking numbers individually is time-consuming, the first steps focus on shrinking the pool of numbers to be checked.
1. Remove non-telephone numbers.
The first step is to filter out numbers that are obviously not phone numbers. AT&T runs a script with several checks. [is there a technical difficulty here? How long is the process? Is there actually a script? Do we talk about models?].
This step, which is run on all 36 million numbers, takes ?? hours/days.
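The article does not say which checks the step-1 script performs. Purely as an illustration, here is a minimal Python sketch assuming the listings are North American (NANP) 10-digit numbers; the function names and the particular checks are hypothetical, chosen because they follow well-known numbering rules (area codes and exchanges begin with a digit from 2 through 9, and N11 codes such as 411 and 911 are reserved service codes).

```python
import re
from typing import Iterable, List, Optional, Tuple


def normalize(raw: str) -> Optional[str]:
    """Keep only digits, drop an optional leading country code 1; return 10 digits or None."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None


def looks_like_phone_number(raw: str) -> bool:
    """Hypothetical plausibility checks for a NANP number."""
    digits = normalize(raw)
    if digits is None:
        return False  # wrong length or no digits at all
    area_code, exchange = digits[:3], digits[3:6]
    for code in (area_code, exchange):
        if code[0] in "01":  # NANP area codes and exchanges start with 2-9
            return False
        if code[1:] == "11":  # N11 codes (411, 911, ...) are service codes
            return False
    if len(set(digits)) == 1:  # placeholders such as 999-999-9999
        return False
    return True


def split_by_validity(numbers: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (plausible numbers in normalized form, rejected raw strings)."""
    plausible, rejected = [], []
    for raw in numbers:
        if looks_like_phone_number(raw):
            plausible.append(normalize(raw))
        else:
            rejected.append(raw)
    return plausible, rejected
```

Rejected strings are kept rather than discarded, because the numbers that fail this step are returned to YELLOWPAGES.COM at the end of the process.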

2. Remove numbers with recent activity.
If a number shows activity within the past month, it’s also removed from the pool of numbers to be checked.
To determine which of YPC’s numbers are active, researchers look them up in a database of 290 million active numbers, which includes both AT&T and non-AT&T numbers.
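At its core, this step is a membership test of each listed number against the universe of active numbers. A toy in-memory version, assuming numbers are normalized 10-digit strings and that the database lookup can be modeled as a Python set (the real lookup runs in Daytona, whose query interface is not shown here; the function name is hypothetical):

```python
from typing import Iterable, List, Set, Tuple


def remove_recently_active(pool: Iterable[str],
                           active_universe: Set[str]) -> Tuple[List[str], int]:
    """Return the numbers still to be checked and how many were dropped as active.

    `active_universe` stands in for the database of active numbers; a set gives
    constant-time average membership tests, the in-memory analogue of an indexed lookup.
    """
    remaining, dropped = [], 0
    for number in pool:
        if number in active_universe:
            dropped += 1  # activity seen this month: no need to check further
        else:
            remaining.append(number)
    return remaining, dropped
```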
Maintaining the database is an enormous undertaking. It requires not only the ability to store close to 300 million records (the current total of numbers) but also the ability to write queries that ?? [need help here. What makes querying difficult?]. For this, AT&T Research uses its Daytona database.
The universe of active phone numbers is continuously updated: new phone numbers are added and out-of-service numbers deleted. AT&T Research estimates that every day 0.2% of numbers become inactive and 0.3% become active for the first time. [verify this with Chris Volinsky] [who to talk to about updating this database to find new numbers?]
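Taken at face value, the estimated rates above imply that, on a universe of 290 million numbers, about 580,000 numbers go inactive and about 870,000 become active each day, a net gain of about 290,000. A minimal Python sketch of that arithmetic and of the daily update, assuming the universe is held as a set of numbers (function names are hypothetical):

```python
from typing import Set, Tuple


def daily_churn(universe_size: int, deactivate_rate: float = 0.002,
                activate_rate: float = 0.003) -> Tuple[int, int, int]:
    """Expected (deactivated, newly active, net change) per day at the given rates."""
    deactivated = round(universe_size * deactivate_rate)
    activated = round(universe_size * activate_rate)
    return deactivated, activated, activated - deactivated


def apply_daily_update(universe: Set[str], gone: Set[str], new: Set[str]) -> Set[str]:
    """Drop numbers that went out of service, then add first-time active numbers."""
    return (universe - gone) | new
```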
[What percentage of numbers are AT&T customers?]
AT&T of course knows its own numbers. For non-AT&T numbers, researchers infer existence from interactions with AT&T customers: those numbers are captured in call detail records (see sidebar). [is it possible for some active AT&T numbers not to appear in the database?]
Not every active number will appear in the universe of active numbers: a non-AT&T number that has no interaction with AT&T customers during a given month will be missed. [is this the only reason a non-AT&T number won’t show up in the universe?]
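The inference described above amounts to building one set from two sources. The sketch below assumes a simplified record holding only the calling number, called number, and date, and a 30-day window for "the past month"; the record layout and names are hypothetical, not the format of AT&T’s detail records. It also makes the gap just described concrete: a non-AT&T number with no calls to or from AT&T customers in the window never enters the set.

```python
from datetime import date, timedelta
from typing import Iterable, NamedTuple, Set


class CallRecord(NamedTuple):
    """Hypothetical minimal call detail record: who called whom, and when."""
    calling: str
    called: str
    day: date


def build_active_universe(att_numbers: Set[str], records: Iterable[CallRecord],
                          as_of: date, window_days: int = 30) -> Set[str]:
    """AT&T's own numbers plus every number seen in a call record within the window."""
    start = as_of - timedelta(days=window_days)
    universe = set(att_numbers)
    for rec in records:
        if start < rec.day <= as_of:
            # Records are assumed to involve at least one AT&T customer, so a
            # non-AT&T number can only enter the set through such a call.
            universe.add(rec.calling)
            universe.add(rec.called)
    return universe
```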
[Anything to say about the technical difficulties?]


This lookup takes about ?? hours/days. All numbers found in the database of active numbers are removed from the pool of numbers still to be checked.
3. Looking to the network for evidence that a number is out of service
At this point, researchers have eliminated about 2/3 of the original 36 million, leaving 1/3 of the original numbers whose status is not yet determined. [Are these all non-AT&T numbers?]
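In round numbers, taking the listing count as exactly 36 million, the split works out to:

```latex
\begin{aligned}
\text{removed} &\approx \tfrac{2}{3} \times 36\,000\,000 = 24\,000\,000,\\
\text{still to check} &\approx \tfrac{1}{3} \times 36\,000\,000 = 12\,000\,000.
\end{aligned}
```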
The next step is to actively investigate each remaining number by looking in the network for evidence that the number is out of service. Specifically, AT&T Research looks at the interaction between the switch and the phone number, picking up the tone that the phone and switch use to communicate status.
Every time a call is made, it is routed to a switch, which responds with a tone or code. (For instance, the dial tone you hear when you pick up the phone indicates that the line is live and a connection is open.) By pinging a number, researchers try to detect a disconnect tone (or the code for one).
[Bill, how far can I go here? Can I say “researchers essentially call the number but disconnect before the call is connected”?]
(In practice, some codes cannot be interpreted with certainty. In these ambiguous cases, researchers do not assume that the number is disconnected.)
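The rule above is deliberately one-sided: only a clear disconnect signal marks a number as out of service. A minimal Python sketch of that decision, in which the result labels are hypothetical placeholders (the actual tones and codes are not described in the text):

```python
from enum import Enum


class PingOutcome(Enum):
    OUT_OF_SERVICE = "out_of_service"
    NOT_CONFIRMED = "not_confirmed"


# Hypothetical labels for decoded switch responses; the real tones/codes are not given.
DISCONNECT_CODES = frozenset({"DISCONNECT_TONE", "DISCONNECT_CODE"})


def classify_ping(result: str) -> PingOutcome:
    """Mark a number out of service only on an unambiguous disconnect signal."""
    if result in DISCONNECT_CODES:
        return PingOutcome.OUT_OF_SERVICE
    # Ambiguous or unrecognized responses never count as a disconnect.
    return PingOutcome.NOT_CONFIRMED
```

Treating everything that is not a clear disconnect as unconfirmed trades some missed disconnects for fewer working numbers being wrongly reported as out of service.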
 [This is made up. Need to know more about the tones, what they are used for, what issues the tone? The phone or switch? Where does the code come from?]
Before starting the time-consuming pinging step (which can take up to two weeks), researchers first check whether a number has been pinged within the past six months. If it has, and a disconnect tone was returned, the number does not need to be re-pinged. [is this because out-of-service numbers cannot be recirculated for six months?]
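A minimal Python sketch of this pre-check, assuming a ping history keyed by number, "six months" approximated as 183 days, and that a recent disconnect result is carried forward as the number’s status; the history layout and names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

# "Six months" approximated as 183 days (assumption).
RECHECK_WINDOW = timedelta(days=183)

# Hypothetical ping history: number -> (date last pinged, whether a disconnect was returned).
PingHistory = Dict[str, Tuple[date, bool]]


def partition_for_pinging(pool: Iterable[str], history: PingHistory,
                          today: date) -> Tuple[List[str], List[str]]:
    """Split the pool into (numbers to ping, numbers already known to be disconnected)."""
    to_ping, known_disconnected = [], []
    for number in pool:
        last = history.get(number)
        if last is not None:
            pinged_on, was_disconnected = last
            if was_disconnected and today - pinged_on <= RECHECK_WINDOW:
                known_disconnected.append(number)  # recent disconnect result: reuse it
                continue
        to_ping.append(number)  # never pinged, pinged too long ago, or not disconnected
    return to_ping, known_disconnected
```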

For the numbers that remain, pinging is carried out in stages so as not to add undue traffic to the network at times of especially heavy congestion.
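One simple way to stage the work is to cap how many numbers are pinged per hour and to use only low-traffic hours. In the sketch below, the off-peak hours and the per-hour cap are hypothetical parameters, not AT&T’s actual settings. For scale: with 6 off-peak hours and a cap of 150,000 numbers per hour, the roughly 12 million numbers left after the database lookup (an upper bound, ignoring any skipped by the six-month check) would take 14 days; the cap was picked here only so that the arithmetic fits the stated two-week window.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical low-traffic hours (local time); the real schedule is not described.
OFF_PEAK_HOURS: Tuple[int, ...] = (0, 1, 2, 3, 4, 5)


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def staged_schedule(numbers: Iterable[str], per_hour: int,
                    hours: Sequence[int] = OFF_PEAK_HOURS) -> Iterator[Tuple[int, int, List[str]]]:
    """Assign each batch to (day offset, hour), filling the off-peak hours day by day."""
    for i, batch in enumerate(batched(numbers, per_hour)):
        day, slot = divmod(i, len(hours))
        yield day, hours[slot], batch


def days_needed(count: int, per_hour: int, hours: Sequence[int] = OFF_PEAK_HOURS) -> int:
    """Ceiling of count divided by the daily capacity."""
    per_day = per_hour * len(hours)
    return -(-count // per_day)
```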
4. Forwarding out-of-service numbers to YELLOWPAGES.COM.
At the end of the procedure, researchers send YPC a list of all out-of-service numbers along with the numbers that failed the validation in step 1.
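A minimal Python sketch of assembling that final list, assuming "out-of-service numbers" covers both fresh disconnect results and those carried forward from the six-month history; the reason labels and function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_report(out_of_service: Iterable[str],
                 failed_validation: Iterable[str]) -> List[Tuple[str, str]]:
    """Combine both lists into one sorted (number, reason) report; labels are hypothetical."""
    report: Dict[str, str] = {}
    for number in out_of_service:
        report[number] = "out_of_service"
    for entry in failed_validation:
        # Step-1 rejects leave the pool before any later step, so overlap is not expected.
        report.setdefault(entry, "failed_validation")
    return sorted(report.items())
```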
Estimated accuracy rate: ?????