This is my first blog post on data science. In this post, I will show you how to import data from Fantasy Premier League into R and perform exploratory data analysis.

The questions that came up in my mind before doing this analysis are:

  1. How has Wayne Rooney been performing over the last several seasons?
  2. How do Alexis Sánchez, Romelu Lukaku and Harry Kane line up side by side?

The data sources we will use in this tutorial are at the following links:

  1. https://fantasy.premierleague.com/drf/elements/
  2. https://fantasy.premierleague.com/drf/element-summary/{player_id}

The second link requires you to know the player_id of the player whose details you want.
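Because the second URL changes only with the player id, it can be built with paste0(). This is a minimal sketch: player_url() is a hypothetical helper name, and it assumes the URL pattern shown above.

```r
# Hypothetical helper: build the element-summary URL for one or more player ids
player_url <- function(player_id) {
  # paste0() is vectorised, so a vector of ids gives a vector of URLs
  paste0("https://fantasy.premierleague.com/drf/element-summary/", player_id)
}

player_url(161)
player_url(c(14, 285, 394))
```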

The packages we will need for this analysis are:

library(ggplot2)
library(dplyr)
library(tidyr)
library(rjson)
library(knitr)
library(plotly)

The data at these links is in JSON format, so we will use the rjson package to read it into R.

Reading data using the rjson package

json_file <- "https://fantasy.premierleague.com/drf/elements/"
json_data <- fromJSON(paste(readLines(json_file), collapse = ""))
# View length of json_data
length(json_data)
## [1] 580

Now the data is in R: json_data is a list of 580 players.

Converting the data to a data frame

players <- data.frame(do.call(rbind, lapply(json_data, rbind)))
# View head
head(players[,c(1:10)])
##   id      photo    web_name team_code status   code      first_name
## 1  1  48844.jpg      Ospina         3      a  48844           David
## 2  2  11334.jpg        Cech         3      a  11334            Petr
## 3  3  98980.jpg    Martinez         3      u  98980 Damian Emiliano
## 4  4  51507.jpg   Koscielny         3      a  51507         Laurent
## 5  5  17127.jpg Mertesacker         3      a  17127             Per
## 6  6 158074.jpg     Gabriel         3      u 158074 Gabriel Armando
##   second_name squad_number                       news
## 1      Ospina           13                           
## 2        Cech           33                           
## 3    Martinez           26 Season-long loan to Getafe
## 4   Koscielny            6                           
## 5 Mertesacker            4                           
## 6    de Abreu            5    Joined Valencia on 18/8
# Check the class of each column
sapply(players[,c(1:10)], class)
##           id        photo     web_name    team_code       status 
##       "list"       "list"       "list"       "list"       "list" 
##         code   first_name  second_name squad_number         news 
##       "list"       "list"       "list"       "list"       "list"
# Keep only the columns we need
players <- players %>% dplyr::select(id, first_name, second_name, total_points, yellow_cards, 
                                     red_cards, goals_scored, assists)
head(players) %>% knitr::kable()
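| id | first_name | second_name | total_points | yellow_cards | red_cards | goals_scored | assists |
|---:|:---|:---|---:|---:|---:|---:|---:|
| 1 | David | Ospina | 0 | 0 | 0 | 0 | 0 |
| 2 | Petr | Cech | 75 | 0 | 0 | 0 | 0 |
| 3 | Damian Emiliano | Martinez | 0 | 0 | 0 | 0 | 0 |
| 4 | Laurent | Koscielny | 65 | 3 | 0 | 0 | 0 |
| 5 | Per | Mertesacker | 14 | 0 | 0 | 1 | 0 |
| 6 | Gabriel Armando | de Abreu | 0 | 0 | 0 | 0 | 0 |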
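The sapply() output above shows that every column of players is a list, because each JSON record becomes a list of length-1 values. select() and kable() cope with that, but sorting or arithmetic is easier on ordinary vectors. Here is a minimal base-R sketch on invented toy records (not real FPL data); it assumes no field is NULL, since a NULL would make unlist() return a shorter vector.

```r
# Toy records shaped like the parsed JSON: a list of named lists (values are invented)
records <- list(
  list(id = 1, web_name = "PlayerA", total_points = 75),
  list(id = 2, web_name = "PlayerB", total_points = 14),
  list(id = 3, web_name = "PlayerC", total_points = 65)
)

# Same conversion as above: each record becomes a one-row list matrix
toy <- data.frame(do.call(rbind, lapply(records, rbind)))
sapply(toy, class)  # every column is a "list"

# Replace each list column with an atomic vector (assumes no NULL values)
toy[] <- lapply(toy, unlist)
sapply(toy, class)

# Numeric columns can now be sorted directly
toy[order(-toy$total_points), ]
```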

Finding the players whose stats we are interested in

We need to be careful with accented characters when searching for players. For example, searching for "Alexis Sanchez" (without the accent) finds no player, because the name is stored as "Alexis Sánchez".

interested_players <- c("Alexis Sánchez", "Romelu Lukaku", "Harry Kane", "Wayne Rooney")

players %>% 
  dplyr::mutate(full_name = paste(first_name, second_name)) %>%
  dplyr::filter(full_name %in% interested_players) %>% knitr::kable()
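| id | first_name | second_name | total_points | yellow_cards | red_cards | goals_scored | assists | full_name |
|---:|:---|:---|---:|---:|---:|---:|---:|:---|
| 14 | Alexis | Sánchez | 68 | 4 | 0 | 4 | 4 | Alexis Sánchez |
| 161 | Wayne | Rooney | 91 | 3 | 0 | 10 | 3 | Wayne Rooney |
| 285 | Romelu | Lukaku | 94 | 2 | 0 | 10 | 4 | Romelu Lukaku |
| 394 | Harry | Kane | 97 | 4 | 0 | 12 | 1 | Harry Kane |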
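The %in% match above is exact, so accents and letter case must match. One workaround is to compare simplified keys instead of raw names. This is a minimal sketch: strip_accents() and name_key() are hypothetical helpers that handle only the few accented letters listed (not full Unicode normalisation), and players_demo is a toy stand-in for players.

```r
# Hypothetical helper: map a few accented Latin letters to plain ASCII.
# Only the characters listed in `from` are handled.
strip_accents <- function(x) {
  from <- "\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1\u00c1\u00c9\u00cd\u00d3\u00da\u00d1"
  to   <- "aeiounAEIOUN"
  chartr(from, to, enc2utf8(x))
}

# Hypothetical helper: comparison key that ignores accents and letter case
name_key <- function(x) tolower(strip_accents(x))

# Toy data frame standing in for `players` (ids taken from the table above)
players_demo <- data.frame(
  id = c(14, 161),
  first_name = c("Alexis", "Wayne"),
  second_name = c("S\u00e1nchez", "Rooney"),
  stringsAsFactors = FALSE
)

search_names <- c("Alexis Sanchez", "wayne rooney")
full_name <- paste(players_demo$first_name, players_demo$second_name)

# Both players are found despite the missing accent and the lower case
players_demo[name_key(full_name) %in% name_key(search_names), ]
```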

Accessing Stats of individual players

Now that we have found the ids of the players we are interested in, let's pull their detailed data.

I wrote a function to pull this information for each player:

# A function to pull the season-by-season history of one player
player_stats <- function(player_id, player_name){
  url <- paste0("https://fantasy.premierleague.com/drf/element-summary/", player_id)
  json_data <- fromJSON(paste(readLines(url), collapse = ""))
  # json_data has 6 elements; we only use the first one: history_past
  # Convert history_past (one record per season) to a data frame
  player <- data.frame(do.call(rbind, lapply(json_data$history_past, rbind)))
  # Each column is a list; unnest() turns them into ordinary vectors
  player <- tidyr::unnest(player, cols = dplyr::everything())
  player$name <- player_name
  return(player)
}
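player_stats() downloads the data every time it is called, and in this post Rooney's data is fetched twice. If you re-run the analysis often, a small cache avoids repeated downloads. This is a minimal sketch: read_cached() is a hypothetical helper, the cache lives in tempdir() by default (so it disappears when the R session ends), and it never refreshes stale data.

```r
# Hypothetical helper: download a URL once, keep a local copy in cache_dir,
# and return the text as a single string (the form fromJSON() expects)
read_cached <- function(url, cache_dir = tempdir()) {
  # Turn the URL into a safe file name by replacing non-alphanumeric characters
  cache_file <- file.path(cache_dir, paste0(gsub("[^A-Za-z0-9]", "_", url), ".json"))
  if (!file.exists(cache_file)) {
    # First call: fetch the URL and save its lines to disk
    writeLines(readLines(url, warn = FALSE), cache_file)
  }
  # Every call: read from the local copy
  paste(readLines(cache_file, warn = FALSE), collapse = "")
}
```

Inside player_stats(), the download line would then become `json_data <- fromJSON(read_cached(paste0("https://fantasy.premierleague.com/drf/element-summary/", player_id)))`.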

The function above uses only the first list, history_past, which holds each player's record for previous seasons. The full response contains six lists:

  • List 1 : history_past
  • List 2 : fixtures_summary
  • List 3 : explain
  • List 4 : history_summary
  • List 5 : fixtures
  • List 6 : history

Viewing Wayne Rooney’s Stats

rooney <- player_stats(player_id = 161, player_name = "Wayne Rooney")
# Look at dimension of rooney
dim(rooney)
## [1] 11 26
# View the first 6 columns of the data frame
rooney[,1:6]
##      id season_name element_code start_cost end_cost total_points
## 1   262     2006/07        13017        115      122          184
## 2  1031     2007/08        13017        120      117          148
## 3  1660     2008/09        13017        110      107          135
## 4  2392     2009/10        13017        110      120          224
## 5  3029     2010/11        13017        120      118          142
## 6  3893     2011/12        13017        120      129          230
## 7  4602     2012/13        13017        120      116          143
## 8  5297     2013/14        13017        105      113          190
## 9  6016     2014/15        13017        105      106          132
## 10 6729     2015/16        13017        105       99          118
## 11 7434     2016/17        13017         90       86           76
# Names of columns
names(rooney)
##  [1] "id"               "season_name"      "element_code"    
##  [4] "start_cost"       "end_cost"         "total_points"    
##  [7] "minutes"          "goals_scored"     "assists"         
## [10] "clean_sheets"     "goals_conceded"   "own_goals"       
## [13] "penalties_saved"  "penalties_missed" "yellow_cards"    
## [16] "red_cards"        "saves"            "bonus"           
## [19] "bps"              "influence"        "creativity"      
## [22] "threat"           "ict_index"        "ea_index"        
## [25] "season"           "name"

Reshape function

For this tutorial, we will look only at goals_scored, total_points and assists, so I wrote another function that keeps these three variables and converts the data to tidy (long) format.

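player_reshape <- function(playerdf){
  playerdf %>% 
    # Keep the identifiers and the three variables of interest
    dplyr::select(id, season_name, name, total_points, goals_scored, assists) %>% 
    # Stack the three variables into variable/value pairs
    tidyr::gather(key = "variable", value = "value", -id, -season_name, -name)
}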
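gather() stacks the chosen columns into two columns, variable and value, repeating the identifier columns for each stacked column. To make that concrete, here is a minimal base-R sketch of the same wide-to-long step; to_long() is a hypothetical helper and the toy numbers are invented. (In current tidyr, gather() is superseded by pivot_longer().)

```r
# Hypothetical helper: wide-to-long reshape in base R, similar in spirit to tidyr::gather()
to_long <- function(df, id_cols, value_cols) {
  # One block of rows per value column, then stack the blocks
  blocks <- lapply(value_cols, function(v) {
    data.frame(df[id_cols], variable = v, value = df[[v]], stringsAsFactors = FALSE)
  })
  do.call(rbind, blocks)
}

# Toy wide data: one row per season (numbers are invented)
wide <- data.frame(
  season_name = c("2015/16", "2016/17"),
  name = "Player X",
  total_points = c(120, 80),
  goals_scored = c(8, 5),
  assists = c(4, 3),
  stringsAsFactors = FALSE
)

long <- to_long(wide, c("season_name", "name"), c("total_points", "goals_scored", "assists"))
long
```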

Creating data frames for players

rooney_reshape <- player_stats(player_id = 161, player_name = "Wayne Rooney") %>% player_reshape()
sanchez_reshape <- player_stats(player_id = 14, player_name = "Alexis Sanchez") %>% player_reshape()
kane_reshape <- player_stats(player_id = 394, player_name = "Harry Kane") %>% player_reshape()
lukaku_reshape <- player_stats(player_id = 285, player_name = "Romelu Lukaku") %>% player_reshape()
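One line per player works for four players; for a longer list, the same calls can be driven by a named vector of ids. This is a minimal sketch: combine_players() is a hypothetical helper that assumes fetch_fun takes player_id and player_name arguments like player_stats(), and the stub below (with invented data) only stands in for the real download so the example runs offline.

```r
# Hypothetical helper: fetch and reshape several players, then stack the results.
# fetch_fun must accept player_id and player_name (like player_stats());
# reshape_fun takes and returns a data frame (like player_reshape()).
combine_players <- function(ids, fetch_fun, reshape_fun) {
  pieces <- Map(function(id, nm) reshape_fun(fetch_fun(player_id = id, player_name = nm)),
                ids, names(ids))
  # unname() keeps the player names out of the row names
  do.call(rbind, unname(pieces))
}

# Offline demo with a stub in place of player_stats() (data are invented)
stub_fetch <- function(player_id, player_name) {
  data.frame(id = player_id, season_name = "2016/17", name = player_name,
             stringsAsFactors = FALSE)
}
demo <- combine_players(c("Wayne Rooney" = 161, "Harry Kane" = 394), stub_fetch, identity)
demo
```

With the real functions, the four calls above plus rbind() would become `combine_players(c("Wayne Rooney" = 161, "Alexis Sanchez" = 14, "Harry Kane" = 394, "Romelu Lukaku" = 285), player_stats, player_reshape)`.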

Combining all the data frames into one and visualizing

all_players <- rbind(rooney_reshape, sanchez_reshape, kane_reshape, lukaku_reshape)
head(all_players)
##     id season_name         name     variable value
## 1  262     2006/07 Wayne Rooney total_points   184
## 2 1031     2007/08 Wayne Rooney total_points   148
## 3 1660     2008/09 Wayne Rooney total_points   135
## 4 2392     2009/10 Wayne Rooney total_points   224
## 5 3029     2010/11 Wayne Rooney total_points   142
## 6 3893     2011/12 Wayne Rooney total_points   230

The data is tidy in the sense that each row holds a single observation: one value of one variable for one player in one season.
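A quick way to check that each row really is a single observation is to confirm that no combination of name, season_name and variable occurs twice. A minimal sketch with a hypothetical helper and invented toy rows:

```r
# Hypothetical check: in tidy long data, each key combination
# (here name, season_name, variable) should appear exactly once
is_one_row_per_obs <- function(df, key_cols) {
  anyDuplicated(df[key_cols]) == 0
}

# Toy long data (values are invented)
long_demo <- data.frame(
  name = c("Player X", "Player X", "Player X"),
  season_name = c("2016/17", "2016/17", "2015/16"),
  variable = c("total_points", "goals_scored", "total_points"),
  value = c(80, 5, 120),
  stringsAsFactors = FALSE
)

is_one_row_per_obs(long_demo, c("name", "season_name", "variable"))
```

On the real data, the call would be `is_one_row_per_obs(all_players, c("name", "season_name", "variable"))`.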

Now let's visualize the data with ggplot2.

# Store the plot so it can be reused for the interactive version
# value is converted with as.numeric() in case it is stored as text
p <- ggplot(all_players) + 
  geom_col(aes(season_name, as.numeric(value), fill = name), 
           position = "dodge", width = 0.5) + 
  facet_wrap(~ variable, ncol = 1, scales = "free_y") + 
  xlab("Premier League Season") + 
  ylab("Total Fantasy Points / Goals Scored / Assists") + 
  labs(fill = "Players") + 
  theme(legend.position = "top")
p

For an interactive version of the graph, pass the same plot object to the ggplotly() function from the plotly package.

library(plotly)
ggplotly(p)
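The chart answers both questions visually. For a quick numeric comparison, one option is to average each variable per season, which puts players with more seasons in the data on a comparable footing. This is a minimal base-R sketch on invented toy data in the same long shape as all_players; on the real data you would pass all_players instead of demo_players.

```r
# Toy long data in the same shape as all_players (values are invented)
demo_players <- data.frame(
  name = rep(c("Player X", "Player Y"), each = 4),
  season_name = rep(c("2015/16", "2015/16", "2016/17", "2016/17"), times = 2),
  variable = rep(c("goals_scored", "total_points"), times = 4),
  value = c("10", "150", "6", "110", "20", "200", "24", "220"),
  stringsAsFactors = FALSE
)

# value may not be numeric yet, so convert it first (as the plot does)
demo_players$value <- as.numeric(demo_players$value)

# Average per season for each player and variable
per_season <- aggregate(value ~ name + variable, data = demo_players, FUN = mean)
per_season
```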

I hope you guys learned how easy it is to bring Fantasy Premier League data into R and analyse it with great packages such as dplyr, tidyr and ggplot2.

Please leave your comments below.