June 6, 2018

Airportr: a lightweight package for airport data

As someone whose day job is in the travel and tourism industry, I have to work with airport codes, names, and locations all the time.

airportr is a lightweight package to help deal with a few common airport-related tasks. It bundles openly licensed airport data from OpenFlights with several utility functions and does not require any API calls or dependencies beyond dplyr.

airportr is easy to install from GitHub (or soon CRAN).

#install.packages("devtools")
devtools::install_github("dshkol/airportr")
library(airportr)

Simple lookup functions

There are four simple lookup functions that take some kind of input, such as an airport name, an airport IATA/ICAO code, or a city name, and return structured and consistent data. This can be as simple as finding out what airport YYJ is:

airport_lookup("YYJ")
## [1] "Victoria International Airport"

Or the geographic coordinates of Lester B. Pearson Airport in Toronto:

airport_location("Lester B. Pearson International Airport")
## # A tibble: 1 x 2
##   Latitude Longitude
##      <dbl>     <dbl>
## 1     43.7     -79.6

Or the full available detailed data for CYEG:

dplyr::glimpse(airport_detail("CYEG"))
## Observations: 1
## Variables: 14
## $ `OpenFlights ID` <int> 49
## $ Name             <chr> "Edmonton International Airport"
## $ City             <chr> "Edmonton"
## $ Country          <chr> "Canada"
## $ IATA             <chr> "YEG"
## $ ICAO             <chr> "CYEG"
## $ Latitude         <dbl> 53.3097
## $ Longitude        <dbl> -113.58
## $ Altitude         <int> 2373
## $ UTC              <dbl> -7
## $ DST              <chr> "A"
## $ Timezone         <chr> "America/Edmonton"
## $ Type             <chr> "airport"
## $ Source           <chr> "OurAirports"

The lookup functions are designed to accept any of the three standard inputs, whether an IATA code, an ICAO code, or the full name of an airport, though specific input and output types can be set as function parameters. IATA and ICAO codes are more reliable and easier to use, because names need to match exactly and similarly named airports may exist in multiple countries. ICAO codes in particular are more complete than IATA codes, which do not cover all smaller and domestic airports. Lookups by airport name return potential similarly named matches, alongside a warning, if there is no exact match.

airport_lookup("Halifax", output_type = "IATA")
## Warning in airport_lookup("Halifax", output_type = "IATA"): No exact
## matches but some similar names in the database include:
## Halifax / CFB Shearwater Heliport
## Halifax / Stanfield International Airport
## Haifa International Airport
## Wadi Halfa Airport
airport_lookup("Halifax / Stanfield International Airport", output_type = "IATA")
## [1] "YHZ"
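
As a rough illustration of how an exact-then-similar fallback like this could work, here is a minimal sketch in base R. It is not airportr's actual implementation: `lookup_name()` and `airports_toy` are hypothetical names, the toy table only holds three airports that appear in this post, and approximate matching is done with `agrep()`, which may rank or select candidates differently from the package.

```r
# Toy stand-in for the airport table (three airports mentioned in this post).
airports_toy <- data.frame(
  Name = c("Halifax / Stanfield International Airport",
           "Victoria International Airport",
           "Edmonton International Airport"),
  IATA = c("YHZ", "YYJ", "YEG"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: try an exact name match first; if there is none,
# warn with approximately matching names and return NA.
lookup_name <- function(name, data = airports_toy) {
  exact <- data$IATA[data$Name == name]
  if (length(exact) > 0) return(exact)

  # agrep() performs approximate string matching against each name
  similar <- agrep(name, data$Name, ignore.case = TRUE, value = TRUE)
  if (length(similar) > 0) {
    warning("No exact matches but some similar names in the database include:\n",
            paste(similar, collapse = "\n"), call. = FALSE)
  } else {
    warning("No matches found.", call. = FALSE)
  }
  NA_character_
}

lookup_name("Victoria International Airport")  # exact match, returns the code
lookup_name("Halifax")                         # no exact match, warns and returns NA
```

Returning `NA` together with a warning, rather than stopping with an error, keeps a lookup usable inside a larger pipeline while still flagging the names that need attention.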

City lookups

Cities will often have multiple airports serving them. This is especially common for larger cities. Typically when working with airport origin/destination data, an analyst might need to identify what cities those airports actually serve. The city_airports() function helps with this.

city_airports("Chicago")
## # A tibble: 4 x 14
##   `OpenFlights ID` Name      City   Country IATA  ICAO  Latitude Longitude
##              <int> <chr>     <chr>  <chr>   <chr> <chr>    <dbl>     <dbl>
## 1             3747 Chicago … Chica… United… MDW   KMDW      41.8     -87.8
## 2             3818 Waukegan… Chica… United… UGN   KUGN      42.4     -87.9
## 3             3830 Chicago … Chica… United… ORD   KORD      42.0     -87.9
## 4             8593 Chicago … Chica… United… CGX   KCGX      41.9     -87.6
## # ... with 6 more variables: Altitude <int>, UTC <dbl>, DST <chr>,
## #   Timezone <chr>, Type <chr>, Source <chr>

Nearest airport lookups

Sometimes a city lookup is insufficient. Baltimore International Airport (BWI) serves Baltimore, but is typically grouped with other DC-area airports like DCA and IAD as a set of airports serving a particular metro area. We can look up airports that fall within a specified distance of one another using the airports_near() function, which takes an airport name or code as an argument alongside a distance radius in kilometres.

For example, to find all airports within 50 km of BWI:

airports_near("BWI", distance = 50)
## # A tibble: 8 x 14
##   `OpenFlights ID` Name       City  Country IATA  ICAO  Latitude Longitude
##              <int> <chr>      <chr> <chr>   <chr> <chr>    <dbl>     <dbl>
## 1             3489 Tipton Ai… Fort… United… FME   KFME      39.1     -76.8
## 2             3520 Ronald Re… Wash… United… DCA   KDCA      38.9     -77.0
## 3             3552 Andrews A… Camp… United… ADW   KADW      38.8     -76.9
## 4             3772 Phillips … Aber… United… APG   KAPG      39.5     -76.2
## 5             3849 Baltimore… Balt… United… BWI   KBWI      39.2     -76.7
## 6             8143 Montgomer… Gait… United… GAI   KGAI      39.2     -77.2
## 7             8935 Lee Airpo… Anna… United… ANP   KANP      38.9     -76.6
## 8             9183 Martin St… Balt… United… MTN   KMTN      39.3     -76.4
## # ... with 6 more variables: Altitude <int>, UTC <dbl>, DST <chr>,
## #   Timezone <chr>, Type <chr>, Source <chr>

And sometimes all you have is a pair of coordinates. The airports_around() function takes a pair of lat/lon coordinates in decimal degrees as arguments and returns all airports that fall within a given radius.

airports_around(49, -123, distance = 50)
## # A tibble: 12 x 14
##    `OpenFlights ID` Name     City   Country IATA  ICAO  Latitude Longitude
##               <int> <chr>    <chr>  <chr>   <chr> <chr>    <dbl>     <dbl>
##  1              104 Pitt Me… Pitt … Canada  "\\N" CYPK      49.2     -123.
##  2              156 Vancouv… Vanco… Canada  YVR   CYVR      49.2     -123.
##  3              175 Abbotsf… Abbot… Canada  YXX   CYXX      49.0     -122.
##  4              184 Victori… Victo… Canada  YYJ   CYYJ      48.6     -123.
##  5             3777 Belling… Belli… United… BLI   KBLI      48.8     -123.
##  6             5500 Vancouv… Vanco… Canada  CXH   CYHC      49.3     -123.
##  7             7083 Orcas I… Easts… United… ESD   KORS      48.7     -123.
##  8             7269 Ganges … Ganges Canada  YGG   CAX6      48.9     -123.
##  9             7273 Boundar… Bound… Canada  YDT   CZBB      49.1     -123.
## 10             7274 Langley… Langl… Canada  "\\N" CYNJ      49.1     -123.
## 11             8224 Victori… Patri… Canada  "\\N" CAP5      48.7     -123.
## 12             9749 Bedwell… Bedwe… Canada  YBW   CAB3      48.8     -123.
## # ... with 6 more variables: Altitude <int>, UTC <dbl>, DST <chr>,
## #   Timezone <chr>, Type <chr>, Source <chr>
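
Conceptually, a radius search like this comes down to computing the great-circle distance from the query point to every airport and keeping the rows under the threshold. The sketch below shows that idea in base R. It is an illustration under stated assumptions rather than the package's code: `haversine_km()` and `airports_within()` are hypothetical names, the toy table uses coordinates as printed (rounded) in the outputs in this post, and a mean earth radius of 6371 km is assumed.

```r
# Great-circle (Haversine) distance in km; inputs in decimal degrees.
# Vectorised over lat2/lon2. r = 6371 km is an assumed mean earth radius.
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad  <- pi / 180
  dphi    <- (lat2 - lat1) * to_rad
  dlambda <- (lon2 - lon1) * to_rad
  a <- sin(dphi / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlambda / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin() guards against rounding just above 1
}

# Toy table: coordinates rounded as they appear in the printed outputs.
airports_toy <- data.frame(
  IATA      = c("BWI", "DCA", "YEG"),
  Latitude  = c(39.2, 38.9, 53.3097),
  Longitude = c(-76.7, -77.0, -113.58),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows within `distance` km of the query point.
airports_within <- function(lat, lon, distance, data = airports_toy) {
  d <- haversine_km(lat, lon, data$Latitude, data$Longitude)
  data[d <= distance, , drop = FALSE]
}

airports_within(39.2, -76.7, distance = 50)$IATA
## [1] "BWI" "DCA"
```

With the rounded coordinates, DCA comes out at roughly 42 km from BWI, so it falls inside the 50 km radius, while YEG is thousands of kilometres away and is dropped.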

Airport distances

When working with origin/destination data, you sometimes need to calculate the distance between two airports. airport_distance() calculates the distance between any pair of three-letter IATA codes. Distances are calculated using the Haversine formula:

\[ d = 2r\arcsin\Bigg(\sqrt{\sin^2\frac{\varphi_2-\varphi_1}{2}+\cos(\varphi_1)\cos(\varphi_2)\sin^2\frac{\lambda_2-\lambda_1}{2}}\Bigg) \]

where \(r\) is the earth’s radius, \(\varphi_1\) and \(\varphi_2\) are the latitudes of the two airports in radians, \(\lambda_1\) and \(\lambda_2\) are their longitudes in radians, and \(d\) is the great circle distance between the two points.
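
The formula takes its name from the haversine function, \(\operatorname{hav}(\theta) = \sin^2(\theta/2)\). As a short derivation sketch, let \(\theta = d/r\) be the central angle between the two airports (\(\theta\) and \(a\) are labels introduced here for illustration). The spherical law of haversines gives

\[ \operatorname{hav}(\theta) = \operatorname{hav}(\varphi_2-\varphi_1) + \cos(\varphi_1)\cos(\varphi_2)\operatorname{hav}(\lambda_2-\lambda_1) \]

Writing \(a\) for the right-hand side, \(\sin(\theta/2) = \sqrt{a}\), so \(\theta = 2\arcsin\sqrt{a}\) and \(d = r\theta = 2r\arcsin\sqrt{a}\). As a quick sanity check, take two points on the equator a quarter turn apart, so \(\varphi_1 = \varphi_2 = 0\) and \(\lambda_2 - \lambda_1 = \pi/2\):

\[ a = \sin^2\frac{\pi}{4} = \frac{1}{2}, \qquad d = 2r\arcsin\sqrt{\tfrac{1}{2}} = 2r\cdot\frac{\pi}{4} = \frac{\pi r}{2} \]

which is one quarter of the circumference \(2\pi r\), as expected.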

The Haversine method is relatively accurate over most distances, but it does not account for the earth’s ellipsoidal shape and can result in errors of approximately 0.3% of the distance. Other methods, such as the Vincenty ellipsoid method, are more accurate and are implemented in the much more robust and comprehensive geosphere package.

Data

Airport data is from the OpenFlights Airport Database made available under the Open Database License.

Disclaimer on the data from OpenFlights:

This data is not suitable for navigation. OpenFlights does not assume any responsibility whatsoever for its accuracy, and consequently assumes no liability whatsoever for results obtained or loss or damage incurred as a result of application of the data. OpenFlights expressly disclaims all warranties, expressed or implied, including but not limited to implied warranties of merchantability and fitness for any particular purpose.

Wrapping up

This was a fun little project to take on to comprehensively address a few different common tasks I face at work. I hope that this lightweight package can be useful to others who work with similar data, and I encourage anyone with suggestions for how it can be made more useful to open an issue or PR on GitHub or send me an email.

© Dmitry Shkolnik 2020
