Introduction to dialr

dialr is an R interface to Google’s libphonenumber Java library.

libphonenumber defines the PhoneNumberUtil class, which provides functions for extracting information from, and otherwise processing, parsed Phonenumber objects. A phone number must be parsed before any other operation (e.g. checking validity or formatting) can be performed.

dialr provides an interface to these functions to easily parse and process phone numbers in R.

Parsing phone numbers

A phone vector stores the parsed Java Phonenumber object for each element, together with the original raw text and the default region. The default region sets the context for interpreting numbers that are not written in international format.

To create a phone vector, use the phone() function. This takes a character vector of phone numbers to parse and a default region for phone numbers not stored in an international format (i.e. with a leading “+”).

library(dialr)

# Parse phone number vector
x <- c(0, 0123, "0404 753 123", "61410123817", "+12015550123")
x <- phone(x, "AU")

is.phone(x)
#> [1] TRUE
print(x)
#> # Parsed phone numbers: 5 total, 4 successfully parsed
#> [1] 0            123          0404 753 123 61410123817  +12015550123
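One detail of this example is easy to miss: the first two elements are R numerics, not strings. R reads 0123 as the number 123, and c() then coerces every element to character, which is why the raw text above shows "0" and "123" rather than "0123". The base R snippet below shows the coercion on its own; quoting each number keeps leading zeros intact.

```r
# Unquoted numbers are coerced by c(): 0123 is the number 123, so its leading zero is lost
unquoted <- c(0, 0123, "0404 753 123")
print(unquoted)   # "0" "123" "0404 753 123"

# Quoting each number keeps the text exactly as written
quoted <- c("0", "0123", "0404 753 123")
print(quoted)     # "0" "0123" "0404 753 123"
```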

Basic phone functions

is_parsed(x)    # Was the phone number successfully parsed?
#> [1] FALSE  TRUE  TRUE  TRUE  TRUE
is_valid(x)     # Is the phone number valid?
#> [1] FALSE FALSE  TRUE  TRUE  TRUE
is_possible(x)  # Is the phone number possible?
#> [1] FALSE FALSE  TRUE  TRUE  TRUE
get_region(x)   # What region (ISO country code) is the phone number from?
#> [1] NA   NA   "AU" "AU" "US"
get_type(x)     # What type of number is it (fixed line, mobile, etc.)?
#> [1] NA                     "UNKNOWN"              "MOBILE"              
#> [4] "MOBILE"               "FIXED_LINE_OR_MOBILE"

Comparing phone numbers

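Equality comparisons for phone numbers ignore formatting differences and compare the underlying phone numbers. Because parsing depends on the default region, the same text parsed with different default regions can compare unequal.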

phone("0404 753 123", "AU") == phone("+61404753123", "US")
#> [1] TRUE
phone("0404 753 123", "AU") == phone("0404 753 123", "US")
#> [1] FALSE
phone("0404 753 123", "AU") != phone("0404 753 123", "US")
#> [1] TRUE

Parsed phone numbers can also be compared with character strings, provided the strings are written in international format; a national-format string such as "0404 753 123" does not compare equal.

phone("0404 753 123", "AU") == c("+61404753123", "0404 753 123")
#> [1]  TRUE FALSE

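Use is_match() for more customisable comparisons. Setting detailed = TRUE returns libphonenumber’s match type for each element, and strict = FALSE also counts an NSN_MATCH (a match on the national significant number) as a match.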

is_match(phone("0404 753 123", "AU"), c("+61404753123", "0404753123", "1234"))
#> [1]  TRUE FALSE FALSE
is_match(phone("0404 753 123", "AU"), c("+61404753123", "0404753123", "1234"), detailed = TRUE)
#> [1] "EXACT_MATCH" "NSN_MATCH"   "NO_MATCH"
is_match(phone("0404 753 123", "AU"), c("+61404753123", "0404753123", "1234"), strict = FALSE)
#> [1]  TRUE  TRUE FALSE
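Read together, these three calls suggest how strict relates to the detailed result: with strict = TRUE only "EXACT_MATCH" counts, while strict = FALSE also accepts "NSN_MATCH". The base R sketch below encodes that mapping. match_to_logical() is a hypothetical helper, not part of dialr, and treating libphonenumber’s "SHORT_NSN_MATCH" as a loose match is an assumption not shown in the output above.

```r
# Hypothetical mapping from is_match(..., detailed = TRUE) results to logicals.
# Assumption: strict = FALSE also accepts SHORT_NSN_MATCH (not shown in the vignette output).
match_to_logical <- function(type, strict = TRUE) {
  accepted <- if (strict) "EXACT_MATCH" else c("EXACT_MATCH", "NSN_MATCH", "SHORT_NSN_MATCH")
  type %in% accepted
}

# Detailed results taken from the is_match() example above
detailed <- c("EXACT_MATCH", "NSN_MATCH", "NO_MATCH")
match_to_logical(detailed)                  # TRUE FALSE FALSE
match_to_logical(detailed, strict = FALSE)  # TRUE  TRUE FALSE
```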

Formatting phone numbers

The phone class has a format() method implementing libphonenumber’s core formatting functionality.

There are four phone number formats used by libphonenumber (see “Further reading” for details): "E164", "NATIONAL", "INTERNATIONAL" and "RFC3966". These can be specified with the format argument, or a default can be set with the dialr.format option.

If clean = TRUE (the default), all non-numeric characters are removed except for a leading +.
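As a rough illustration of this rule, here is a base R sketch. clean_phone() is a hypothetical helper, not dialr’s internal code: it keeps the digits, and keeps a "+" only when it is the first character left after stripping, so an RFC3966 string such as "tel:+61-404-753-123" becomes "+61404753123".

```r
# Hypothetical helper approximating format(..., clean = TRUE):
# keep digits, plus a "+" only if it leads the stripped string.
clean_phone <- function(x) {
  stripped <- gsub("[^0-9+]", "", x)                 # drop everything except digits and "+"
  lead <- ifelse(startsWith(stripped, "+"), "+", "") # remember a leading "+"
  out <- paste0(lead, gsub("+", "", stripped, fixed = TRUE))  # remove every other "+"
  out[is.na(x)] <- NA_character_                     # keep missing values missing
  out
}

clean_phone(c("tel:+61-404-753-123", "(201) 555-0123", "+1 201-555-0123", NA))
# returns c("+61404753123", "2015550123", "+12015550123", NA)
```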

x <- phone(c(0, 0123, "0404 753 123", "61410123817", "+12015550123"), "AU")

format(x, format = "RFC3966")
#> [1] NA             "+61123"       "+61404753123" "+61410123817" "+12015550123"
format(x, format = "RFC3966", clean = FALSE)
#> [1] NA                    "tel:+61-123"         "tel:+61-404-753-123"
#> [4] "tel:+61-410-123-817" "tel:+1-201-555-0123"

format(x, format = "E164", clean = FALSE)
#> [1] NA             "+61123"       "+61404753123" "+61410123817" "+12015550123"
format(x, format = "NATIONAL", clean = FALSE)
#> [1] NA               "123"            "0404 753 123"   "0410 123 817"  
#> [5] "(201) 555-0123"
format(x, format = "INTERNATIONAL", clean = FALSE)
#> [1] NA                "+61 123"         "+61 404 753 123" "+61 410 123 817"
#> [5] "+1 201-555-0123"
format(x, format = "RFC3966", clean = FALSE)
#> [1] NA                    "tel:+61-123"         "tel:+61-404-753-123"
#> [4] "tel:+61-410-123-817" "tel:+1-201-555-0123"

# Change the default
getOption("dialr.format")
#> [1] "E164"
format(x)
#> [1] NA             "+61123"       "+61404753123" "+61410123817" "+12015550123"
options(dialr.format = "NATIONAL")
format(x)
#> [1] NA           "123"        "0404753123" "0410123817" "2015550123"
options(dialr.format = "E164")
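Because dialr.format is an ordinary R option, the set-then-reset pattern above can be wrapped so the previous value is always restored, even if an error occurs. The sketch below uses only base R; with_option() is a hypothetical helper, not part of dialr (the withr package offers a similar with_options()).

```r
# Hypothetical helper: evaluate `code` with an option temporarily set,
# restoring the previous value afterwards (even if `code` throws an error)
with_option <- function(name, value, code) {
  old <- options(setNames(list(value), name))  # options() returns the previous values
  on.exit(options(old), add = TRUE)            # restore them when the function exits
  force(code)                                  # `code` is evaluated only after the option is set
}

options(dialr.format = "E164")
# With dialr loaded, this could wrap a call such as format(x)
with_option("dialr.format", "NATIONAL", getOption("dialr.format"))  # "NATIONAL"
getOption("dialr.format")                                            # back to "E164"
```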

If the home argument is supplied, the phone number is formatted for dialling from the specified country.

format(x, home = "AU")
#> [1] NA                "123"             "0404753123"      "0410123817"     
#> [5] "001112015550123"
format(x, home = "US")
#> [1] NA               "01161123"       "01161404753123" "01161410123817"
#> [5] "12015550123"
format(x, home = "JP")
#> [1] NA               "01061123"       "01061404753123" "01061410123817"
#> [5] "01012015550123"
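For numbers dialled across borders, the outputs above follow a simple pattern: the home country’s international call prefix (0011 from Australia, 011 from the US, 010 from Japan) followed by the E.164 number without its "+". The base R sketch below reproduces only that cross-border case with a hard-coded, hypothetical prefix table. Same-country numbers (such as the AU numbers with home = "AU", or the US number with home = "US") are formatted differently by libphonenumber and are not handled here.

```r
# Hypothetical sketch of cross-border dialling strings.
# The prefix table covers only the three home countries used in the examples above.
idd_prefix <- c(AU = "0011", US = "011", JP = "010")

dial_from_abroad <- function(e164, home) {
  stopifnot(home %in% names(idd_prefix))
  # Replace the leading "+" with the home country's international call prefix
  paste0(unname(idd_prefix[home]), sub("^\\+", "", e164))
}

dial_from_abroad("+12015550123", "AU")  # "001112015550123"
dial_from_abroad("+61404753123", "US")  # "01161404753123"
dial_from_abroad("+61404753123", "JP")  # "01061404753123"
```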

If strict = TRUE, numbers that are invalid according to is_valid() are returned as NA.

format(x)
#> [1] NA             "+61123"       "+61404753123" "+61410123817" "+12015550123"
format(x, strict = TRUE)
#> [1] NA             NA             "+61404753123" "+61410123817" "+12015550123"
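Put another way, strict = TRUE masks the formatted output wherever is_valid() is FALSE. The base R sketch below uses the vectors printed above as literals; mask_invalid() is a hypothetical helper, and with dialr loaded the same idea could be written as mask_invalid(format(x), is_valid(x)).

```r
# Hypothetical helper: set formatted numbers to NA wherever validity is FALSE
mask_invalid <- function(formatted, valid) {
  formatted[!valid] <- NA_character_
  formatted
}

# Literals copied from the format(x) and is_valid(x) output above
formatted <- c(NA, "+61123", "+61404753123", "+61410123817", "+12015550123")
valid <- c(FALSE, FALSE, TRUE, TRUE, TRUE)
mask_invalid(formatted, valid)
# returns c(NA, NA, "+61404753123", "+61410123817", "+12015550123")
```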

By default, as.character() returns the raw text phone number. Use raw = FALSE to return the output of format() instead.

as.character(x)
#> [1] "0"            "123"          "0404 753 123" "61410123817"  "+12015550123"
as.character(x, raw = FALSE)
#> [1] NA             "+61123"       "+61404753123" "+61410123817" "+12015550123"

Use with dplyr

dialr functions are vectorised and phone vectors can be stored as tibble columns, so they fit naturally into dplyr workflows.

# Use with dplyr
library(dplyr)

y <- tibble(id = 1:4,
            phone1 = c(0, 0123, "0404 753 123", "61410123817"),
            phone2 = c("03 9388 1234", 1234, "+12015550123", 0),
            country = c("AU", "AU", "AU", "AU"))

y %>%
  mutate_at(vars(matches("^phone")), ~phone(., country)) %>%
  mutate_at(vars(matches("^phone")),
            list(valid = is_valid,
                 region = get_region,
                 type = get_type,
                 clean = format))
#> # A tibble: 4 × 12
#>      id       phone1       phone2 country phone1_valid phone2_…¹ phone…² phone…³
#>   <int>      <phone>      <phone> <chr>   <lgl>        <lgl>     <chr>   <chr>  
#> 1     1           NA +61393881234 AU      FALSE        TRUE      <NA>    AU     
#> 2     2       +61123      +611234 AU      FALSE        FALSE     <NA>    <NA>   
#> 3     3 +61404753123 +12015550123 AU      TRUE         TRUE      AU      US     
#> 4     4 +61410123817           NA AU      TRUE         FALSE     AU      <NA>   
#> # … with 4 more variables: phone1_type <chr>, phone2_type <chr>,
#> #   phone1_clean <chr>, phone2_clean <chr>, and abbreviated variable names
#> #   ¹phone2_valid, ²phone1_region, ³phone2_region

Further reading

libphonenumber

GitHub

Frequently Asked Questions

Falsehoods Programmers Believe About Phone Numbers

javadocs

Phone number format standards

"E164": general format for international telephone numbers from ITU-T Recommendation E.164

"NATIONAL": national notation from ITU-T Recommendation E.123

"INTERNATIONAL": international notation from ITU-T Recommendation E.123

"RFC3966": “tel” URI syntax from the IETF tel URI for Telephone Numbers

ISO country codes

ISO 3166

Wikipedia