hockeyR

Getting started

library(hockeyR)

load_pbp()

As mentioned on the home page, the main purpose of the hockeyR package is to load raw NHL play-by-play data so that you don’t have to scrape and clean it yourself. The load_pbp() function does that for you. Its season argument is flexible: a season such as the 2020-21 NHL season can be written in more than one form.

To load more than one season, wrap the desired years in c(). For example, to get data for the last two years, you could enter load_pbp(c(2020, 2021)).
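The sketch below illustrates the flexible season argument. The three spellings in season_forms are assumed to all denote the 2020-21 season; that list is an assumption and should be checked against ?load_pbp. The helper compare_season_forms() is hypothetical (not part of hockeyR) and simply loads each spelling and reports the row counts, which should agree if the spellings really are equivalent. Each call downloads a full season, and an unsupported spelling will most likely raise an error.

```r
# Spellings assumed to all denote the 2020-21 season (an assumption;
# confirm the accepted formats in ?load_pbp).
season_forms <- list(2021, "2020-21", "2020-2021")

# Hypothetical helper: load the season under each spelling and return the
# number of rows obtained for each one.
# `loader` defaults to hockeyR::load_pbp and can be swapped for a stub.
compare_season_forms <- function(forms, loader = hockeyR::load_pbp) {
  vapply(forms, function(s) nrow(loader(s)), integer(1))
}

# compare_season_forms(season_forms)            # downloads each form in turn
# pbp_two <- hockeyR::load_pbp(c(2020, 2021))   # two seasons in one call
```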

get_game_ids()

If you want to load play-by-play data for a game that isn’t in the data repository, or you just want a single game and don’t need to load a full season, you’ll first need to find the numeric game ID. The get_game_ids() function can find it for you as long as you supply the date of the game in YYYY-MM-DD format. The function defaults to the current date as defined by your operating system.

# get single day ids
get_game_ids(day = "2017-10-17")
#> # A tibble: 11 x 9
#>       game_id season_full date   game_~1 home_~2 away_~3 home_~4 away_~5 game_~6
#>         <int> <chr>       <chr>  <chr>   <chr>   <chr>     <int>   <int> <chr>  
#>  1 2017020082 20172018    2017-~ 07:00 ~ New Yo~ Pittsb~       4       5 REG    
#>  2 2017020083 20172018    2017-~ 07:00 ~ Philad~ Florid~       5       1 REG    
#>  3 2017020084 20172018    2017-~ 07:00 ~ Washin~ Toront~       0       2 REG    
#>  4 2017020081 20172018    2017-~ 07:30 ~ New Je~ Tampa ~       5       4 REG    
#>  5 2017020085 20172018    2017-~ 07:30 ~ Ottawa~ Vancou~       0       3 REG    
#>  6 2017020086 20172018    2017-~ 08:00 ~ Nashvi~ Colora~       4       1 REG    
#>  7 2017020087 20172018    2017-~ 08:00 ~ Winnip~ Columb~       2       5 REG    
#>  8 2017020088 20172018    2017-~ 08:30 ~ Dallas~ Arizon~       3       1 REG    
#>  9 2017020089 20172018    2017-~ 09:00 ~ Edmont~ Caroli~       3       5 REG    
#> 10 2017020090 20172018    2017-~ 10:00 ~ Vegas ~ Buffal~       5       4 REG    
#> 11 2017020091 20172018    2017-~ 10:30 ~ San Jo~ Montré~       5       2 REG    
#> # ... with abbreviated variable names 1: game_time, 2: home_name, 3: away_name,
#> #   4: home_final_score, 5: away_final_score, 6: game_type
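Since get_game_ids() expects the date as YYYY-MM-DD, it can help to check the string before making the request. The following is a minimal sketch in base R; is_ymd() and ids_for_day() are hypothetical helpers, not hockeyR functions, and the only hockeyR call assumed is get_game_ids(day = ...) as used above.

```r
# Hypothetical helper (not part of hockeyR): TRUE when `x` is a real calendar
# date written as YYYY-MM-DD, e.g. "2017-10-17".
is_ymd <- function(x) {
  well_formed <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  parses <- !is.na(as.Date(x, format = "%Y-%m-%d"))
  well_formed & parses
}

# Hypothetical wrapper: validate the date, then request that day's game IDs.
# `fetch` defaults to hockeyR::get_game_ids and can be replaced by a stub.
ids_for_day <- function(day, fetch = hockeyR::get_game_ids) {
  if (!is_ymd(day)) stop("`day` must look like YYYY-MM-DD: ", day)
  fetch(day = day)
}

# ids <- ids_for_day("2017-10-17")
```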

You can instead supply a season to get_game_ids() to grab a full year’s worth of IDs as well as final scores, home and road teams, and game dates for each game in the given season.
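As a rough illustration of the season-level call, the sketch below keeps only regular-season games from a full season of IDs. It assumes the argument is named season and that 2018 denotes the season whose season_full value is "20172018"; both are assumptions to confirm in ?get_game_ids. The helper regular_season() is hypothetical and relies only on the game_type column and the "REG" value visible in the output above.

```r
# Hypothetical helper: keep only regular-season rows from a get_game_ids()
# result, using the game_type column ("REG" marks regular-season games).
regular_season <- function(ids) {
  ids[which(ids$game_type == "REG"), , drop = FALSE]
}

# Assumed call: argument name `season` and the meaning of 2018 are assumptions.
# season_ids <- hockeyR::get_game_ids(season = 2018)
# nrow(regular_season(season_ids))
```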

scrape_game()

This function scrapes a single game with a supplied game ID, which can be retrieved with get_game_ids(). Live game scraping has yet to undergo testing.

scrape_game(game_id = 2020030175)
#> # A tibble: 718 x 108
#>         xg event_id event~1 event secon~2 event~3 event~4 descr~5 period perio~6
#>      <dbl>    <dbl> <chr>   <chr> <chr>   <chr>   <chr>   <chr>    <int>   <dbl>
#>  1 NA       2.02e13 GAME_S~ Game~ <NA>    <NA>    <NA>    Game S~      1       0
#>  2 NA       2.02e13 CHANGE  Chan~ <NA>    Montré~ away    ON: Sh~      1       0
#>  3 NA       2.02e13 CHANGE  Chan~ Line c~ Toront~ home    ON: Wa~      1       0
#>  4 NA       2.02e13 FACEOFF Face~ <NA>    Toront~ home    Auston~      1       0
#>  5 NA       2.02e13 HIT     Hit   <NA>    Toront~ home    Zach H~      1      13
#>  6 NA       2.02e13 CHANGE  Chan~ On the~ Montré~ away    ON: Je~      1      24
#>  7 NA       2.02e13 CHANGE  Chan~ On the~ Toront~ home    ON: Al~      1      27
#>  8 NA       2.02e13 CHANGE  Chan~ On the~ Montré~ away    ON: Co~      1      29
#>  9  0.0921  2.02e13 SHOT    Shot  Wrist ~ Toront~ home    Alex G~      1      32
#> 10 NA       2.02e13 CHANGE  Chan~ On the~ Toront~ home    ON: Ja~      1      32
#> # ... with 708 more rows, 98 more variables: period_seconds_remaining <dbl>,
#> #   game_seconds <dbl>, game_seconds_remaining <dbl>, home_score <dbl>,
#> #   away_score <dbl>, event_player_1_name <chr>, event_player_1_type <chr>,
#> #   event_player_2_name <chr>, event_player_2_type <chr>,
#> #   event_player_3_name <chr>, event_player_3_type <chr>,
#> #   event_goalie_name <chr>, strength_state <glue>, strength_code <chr>,
#> #   strength <chr>, game_winning_goal <lgl>, empty_net <lgl>, ...
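The two functions can be chained: look up the day’s IDs, pick the game you want, then scrape it. A minimal sketch follows; pick_game_id() is a hypothetical helper (not part of hockeyR) that matches a pattern against the home_name and away_name columns returned by get_game_ids().

```r
# Hypothetical helper: game IDs whose home or away team name matches `pattern`.
pick_game_id <- function(ids, pattern) {
  hit <- grepl(pattern, ids$home_name) | grepl(pattern, ids$away_name)
  ids$game_id[hit]
}

# ids      <- hockeyR::get_game_ids(day = "2017-10-17")
# vegas_id <- pick_game_id(ids, "Vegas")            # one match in the listing above
# pbp      <- hockeyR::scrape_game(game_id = vegas_id)
```

If the pattern matches more than one game, pick_game_id() returns several IDs, so select a single one before passing it to scrape_game().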

scrape_day()

This is the backbone function that keeps the hockeyR-data repository up to date during the season. Supply a date (YYYY-MM-DD) and it will scrape play-by-play data for all games on that day. Live game scraping is still awaiting testing.

scrape_day("2015-01-06")
#> # A tibble: 6,472 x 109
#>       xg event_id event_t~1 event secon~2 event~3 event~4 descr~5 period perio~6
#>    <dbl>    <dbl> <chr>     <chr> <chr>   <chr>   <chr>   <chr>    <int>   <dbl>
#>  1    NA  2.01e13 GAME_SCH~ Game~ <NA>    <NA>    <NA>    Game S~      1       0
#>  2    NA  2.01e13 CHANGE    Chan~ <NA>    Buffal~ away    ON: Jo~      1       0
#>  3    NA  2.01e13 CHANGE    Chan~ Line c~ New Je~ home    ON: Pa~      1       0
#>  4    NA  2.01e13 FACEOFF   Face~ <NA>    Buffal~ away    Zemgus~      1       0
#>  5    NA  2.01e13 BLOCKED_~ Bloc~ <NA>    Buffal~ away    Andy G~      1      10
#>  6    NA  2.01e13 CHANGE    Chan~ On the~ Buffal~ away    ON: Ch~      1      36
#>  7    NA  2.01e13 GIVEAWAY  Give~ <NA>    New Je~ home    Giveaw~      1      38
#>  8    NA  2.01e13 TAKEAWAY  Take~ <NA>    New Je~ home    Takeaw~      1      41
#>  9    NA  2.01e13 CHANGE    Chan~ On the~ New Je~ home    ON: Ma~      1      41
#> 10    NA  2.01e13 CHANGE    Chan~ On the~ New Je~ home    ON: Ja~      1      48
#> # ... with 6,462 more rows, 99 more variables: period_seconds_remaining <dbl>,
#> #   game_seconds <dbl>, game_seconds_remaining <dbl>, home_score <dbl>,
#> #   away_score <dbl>, event_player_1_name <chr>, event_player_1_type <chr>,
#> #   event_player_2_name <chr>, event_player_2_type <chr>,
#> #   event_player_3_name <chr>, event_player_3_type <chr>,
#> #   event_goalie_name <chr>, strength_state <glue>, strength_code <chr>,
#> #   strength <chr>, game_winning_goal <lgl>, empty_net <lgl>, ...

If you can wait until the day after a game, the load_pbp() function is the only one you’ll need. If you’d like to scrape the data yourself immediately following a game, the other functions discussed here will do the job for you.
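That rule of thumb can be written down as a small decision helper. This is only a sketch: pbp_source() is hypothetical, the one-day lag is taken from the sentence above, and the exact time at which the repository refreshes is an assumption.

```r
# Hypothetical helper: suggest which function fits a given game date.
# Games played before `today` are assumed to be in the repository already.
pbp_source <- function(game_date, today = Sys.Date()) {
  if (as.Date(game_date) < as.Date(today)) "load_pbp" else "scrape_day"
}

# pbp_source("2015-01-06")   # a past date, so load_pbp() should suffice
# pbp_source(Sys.Date())     # today's games still need scraping
```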