IMGD 2905 Project 1

FIFA 18 Player Analytics

Due date: Monday, March 19th, 11:59pm



The goal of this project is to become familiar with part of the game analytics pipeline by applying it to EA's FIFA 18. You will import data into a spreadsheet program (e.g., Excel), then query, extract, and analyze the data. Analysis includes creating a report with charts and tables, and possibly disseminating the results in a presentation. This tool pipeline will be used in subsequent projects, including a more advanced analysis of game data.



Part 0 - Setup

Set up an environment for FIFA 18 game analytics.

Spreadsheet

At this point, the primary installation is a spreadsheet program. Choose the installation method appropriate for your operating system and install a spreadsheet:

Windows (Microsoft Office 2016):

https://it.wpi.edu/Article/Install-Microsoft-Office-2016-on-Windows

Linux (LibreOffice, Debian/Ubuntu):

sudo apt-get install libreoffice

Dataset

We will be using:

Aman Shrivastava. "FIFA 18 Complete Player Dataset", Kaggle Dataset, Last updated: December 2017.

The dataset is from the latest edition (2018) of EA FIFA, with 17,000+ players, each having 70+ attributes. The data is scraped from SoFIFA, which in turn pulls data from the PC version of FIFA 18 as it is updated and publishes it on the Web.

I have downloaded the relevant Kaggle dataset here:

fifa-18.csv

If you prefer, you can download the Shrivastava data from the source:

https://www.kaggle.com/thec03u5/fifa-18-demo-player-dataset/data

If so, you will need a Kaggle account/login, and you should download only the file:

CompleteDataset.csv

Once the data is downloaded, check that you can successfully open it with your spreadsheet. You should see one player per row, with the attributes in the columns.
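As an optional sanity check outside the spreadsheet, the short Python sketch below counts player rows and header columns using only the standard library. With the full dataset you would expect roughly 17,000+ rows and at least 70 columns. The file name and the UTF-8 encoding in the usage comment are assumptions; adjust them for your copy.

```python
import csv
from typing import Iterable, Tuple


def dataset_shape(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (player rows, columns) for CSV text whose first line is a header."""
    reader = csv.reader(lines)
    header = next(reader, [])  # first line holds the attribute names
    # csv.reader yields [] for completely blank lines; don't count those.
    rows = sum(1 for row in reader if row)
    return rows, len(header)


# Usage (file name and encoding are assumptions):
# with open("fifa-18.csv", newline="", encoding="utf-8") as f:
#     print(dataset_shape(f))
```

Any iterable of lines works, so an open file object can be passed directly.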

When done, proceed to part 1.


Part 1 - Rating versus Age Analysis

Analyze Overall Rating versus Age. This analysis must be in the form of a scatter plot chart.

Tips:

To draw the chart, use a spreadsheet (e.g., Microsoft Excel). A scatter plot (like many charts) needs the data in two columns, something like:

Rating, Age
90,     22
89,     25
88,     26
...

Note, the raw data has each column separated by a comma. This file format is known as "csv" for "comma separated values" and can be read into most spreadsheets.

Fortunately, the original dataset is already in the right format! You just need to select the two relevant columns, and then create/insert a chart from the data.
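The column selection described above can also be scripted. Below is a minimal Python sketch that pulls two named columns out of the CSV as (x, y) pairs, skipping rows with blank or non-numeric cells. The column names "Age" and "Overall" and the output file name are assumptions; check them against the header row of your copy.

```python
import csv
from typing import Iterable, List, Tuple


def two_columns(lines: Iterable[str], x_col: str, y_col: str) -> List[Tuple[float, float]]:
    """Return (x, y) pairs from two named CSV columns.

    Rows where either cell is blank or non-numeric are skipped.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    # Fail early with a readable message if a column name is wrong.
    for col in (x_col, y_col):
        if col not in header:
            raise KeyError(f"column {col!r} not in header")
    pairs = []
    for row in reader:
        try:
            pairs.append((float(row[x_col]), float(row[y_col])))
        except (TypeError, ValueError):
            continue  # blank (None or "") or non-numeric cell
    return pairs


# Usage (column and file names are assumptions):
# with open("fifa-18.csv", newline="", encoding="utf-8") as f:
#     pairs = two_columns(f, "Age", "Overall")
# with open("age-rating.csv", "w", newline="") as out:
#     w = csv.writer(out)
#     w.writerow(["Age", "Rating"])
#     w.writerows(pairs)
```

Age is written as the first column here because it is the independent variable; when you select two adjacent columns for a scatter chart, Excel typically uses the left-hand one for the x-axis.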

For help with selecting columns:

https://tinyurl.com/yczpub96

There are lots of online resources for making spreadsheet charts. One such is:

http://www.excel-easy.com/examples/scatter-chart.html

For the above link and all links below, if your version of Excel (or other spreadsheet) does not match the tutorial, press F1 for "Help" and then type your query (e.g., "create a scatter plot").


Part 2 - Rating versus Wage (per week) Analysis

Analyze Overall Rating versus Wage (per week) for two club teams, Real Madrid and Paris Saint-Germain. This analysis must be in the form of a scatter plot chart, with the two teams clearly differentiated.

Once done, compute and report (in table form):

Tips:

Note, the wage values in the dataset are already per week - no need to adjust further.

In order to do the intended analysis, it is helpful to cluster players from the same team together. One way to do this is to sort by a particular column (e.g., Club):

https://tinyurl.com/y8sdkv5e

Then, you can select just some of the rows (e.g., only the rows with Real Madrid players). Steps for selecting a set of rows can be found here:

https://exceljet.net/formula/highlight-rows-that-contain

Computing averages in spreadsheets is typically done with a built-in function. In Excel:

https://tinyurl.com/y6uwhu5v
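The sort, select, and average steps above can also be sketched in Python, which is handy for double-checking the numbers in your table. This is a minimal sketch under several assumptions: the column names "Club", "Overall" and "Wage", and a wage format such as "€565K" (plain numbers are also accepted). Club names must match the Club column exactly and may be spelled differently in the dataset (e.g., with a suffix), so check before relying on the result.

```python
import csv
from statistics import mean
from typing import Dict, Iterable, List


def parse_money(text: str) -> float:
    """Turn '€565K', '€1.5M' or a plain '565000' into a number of euros.

    The '€' prefix and K/M suffixes are an assumed format; plain numbers
    are passed through unchanged.
    """
    s = text.strip().lstrip("€")
    if s.endswith("K"):
        return float(s[:-1]) * 1_000
    if s.endswith("M"):
        return float(s[:-1]) * 1_000_000
    return float(s)


def club_summary(lines: Iterable[str], clubs: Iterable[str], club_col: str = "Club",
                 rating_col: str = "Overall", wage_col: str = "Wage") -> Dict[str, Dict[str, float]]:
    """Player count, average rating and average weekly wage for each requested club."""
    groups: Dict[str, List[dict]] = {c: [] for c in clubs}
    for row in csv.DictReader(lines):
        if row[club_col] in groups:  # keep only rows for the requested clubs
            groups[row[club_col]].append(row)
    return {
        club: {
            "players": len(rows),
            "avg_rating": mean(float(r[rating_col]) for r in rows),
            "avg_wage": mean(parse_money(r[wage_col]) for r in rows),
        }
        for club, rows in groups.items()
        if rows  # skip clubs that matched no rows (e.g., a misspelled name)
    }
```

A club that silently disappears from the result is a hint that its name did not match the dataset's spelling.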


Part 3 - Age Analysis

Analyze the age of all players. This analysis must be in the form of a histogram (the bucket size is up to you).

Tips:

Tips for drawing a histogram with Microsoft Excel can be found at:

http://www.excel-easy.com/examples/histogram.html

As with the other tips, if your version of Excel does not match the tutorial, press F1 for "Help" and then search for "Create a histogram".
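Excel's histogram tool does the bucketing for you, but it can help to see what "bucket size" means. The Python sketch below counts ages into consecutive buckets of a chosen width; the bucket boundaries (starting at multiples of the width) and the labels are illustrative choices, not something the assignment requires.

```python
from collections import Counter
from typing import Dict, Iterable


def age_histogram(ages: Iterable[int], bucket: int = 2) -> Dict[str, int]:
    """Count ages into consecutive buckets of width `bucket`.

    Buckets start at multiples of `bucket` (e.g. 16-17, 18-19 for bucket=2),
    and empty buckets between the youngest and oldest player are kept as 0
    so the histogram has no gaps.
    """
    if bucket < 1:
        raise ValueError("bucket must be at least 1")
    # Map each age to the lower edge of its bucket, then count.
    counts = Counter((age // bucket) * bucket for age in ages)
    if not counts:
        return {}

    def label(lo: int) -> str:
        return str(lo) if bucket == 1 else f"{lo}-{lo + bucket - 1}"

    # Counter returns 0 for buckets that never occurred.
    return {label(lo): counts[lo] for lo in range(min(counts), max(counts) + 1, bucket)}
```

Trying a few bucket widths (e.g., 1, 2, 5) is a quick way to pick one that shows the shape of the distribution without too much noise.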


Part 4 - Speed by Position Analysis

Analyze average Speed according to Preferred Position for all players. This analysis must be in the form of a bar chart.

Only the positions of striker (ST), center midfield (CM), center back (CB), goalkeeper (GK), and overall (averaged over all players in the dataset) need to be analyzed.

Tips:

Note, the speed attribute is labeled sprint_speed in the dataset.

Also note, some speed values have a "+" associated with them, e.g., "95+2". You can either ignore those players, add the values (e.g., treat that as a "97"), or use only the base (treat that as a "95"). Making these types of decisions is often part of data cleaning! Whatever you choose, you should note it clearly in your report.
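The three cleaning choices above can be written down precisely. This Python sketch implements them as a "policy" argument; the function and policy names are hypothetical, chosen only for illustration.

```python
from typing import Optional

POLICIES = ("ignore", "add", "base")


def clean_stat(text: str, policy: str = "add") -> Optional[int]:
    """Clean an attribute value that may carry a bonus, such as '95+2'.

    policy="ignore": return None so the caller can drop the player
    policy="add":    95+2 -> 97
    policy="base":   95+2 -> 95
    Plain values such as '88' are returned as integers under every policy.
    """
    if policy not in POLICIES:
        raise ValueError(f"policy must be one of {POLICIES}, not {policy!r}")
    text = text.strip()
    if "+" not in text:
        return int(text)
    if policy == "ignore":
        return None
    base, bonus = text.split("+", 1)
    return int(base) + int(bonus) if policy == "add" else int(base)
```

Whichever policy you pick, apply it to every player so the positions are compared on the same basis, and state it in your report.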

As with the other charts, the trick is to select only the rows with the desired position. One way to do this is with filters, filtering in data that matches a pattern (e.g., GK) and filtering out data that does not:

https://tinyurl.com/y8fh9dru

Note, you can take filtered data and copy it, say, to another spreadsheet for analysis.
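The filter-then-average step can be sketched in Python as well. The speed column name sprint_speed comes from the note above; the position column name "Preferred Positions" and its space-separated format (e.g., "ST LW") are assumptions, so check your file's header. A player is counted toward every position they list, which is one possible choice; counting only the first listed position is another. By default, "+" values are cleaned by keeping only the base.

```python
import csv
from statistics import mean
from typing import Callable, Dict, Iterable, List, Optional


def base_value(text: str) -> Optional[int]:
    """Default cleaning: keep only the base of a value like '95+2' (-> 95)."""
    return int(text.strip().split("+", 1)[0])


def speed_by_position(lines: Iterable[str], positions: Iterable[str],
                      pos_col: str = "Preferred Positions",
                      speed_col: str = "sprint_speed",
                      clean: Callable[[str], Optional[int]] = base_value) -> Dict[str, float]:
    """Average speed per requested position, plus "Overall" for all players.

    A player counts toward every position listed (space-separated) in
    pos_col; this is an assumption about how positions are stored.
    """
    by_pos: Dict[str, List[int]] = {p: [] for p in positions}
    everyone: List[int] = []
    for row in csv.DictReader(lines):
        speed = clean(row[speed_col])
        if speed is None:  # the cleaning policy chose to drop this player
            continue
        everyone.append(speed)
        for p in row[pos_col].split():
            if p in by_pos:
                by_pos[p].append(speed)
    result = {p: mean(v) for p, v in by_pos.items() if v}
    if everyone:
        result["Overall"] = mean(everyone)
    return result
```

Passing a different `clean` function (for example, one that returns None for any value containing "+") switches to the "ignore those players" option without changing the rest of the code.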


Part 5 - Your Choice Analysis

Pick another, as yet unanalyzed aspect of the FIFA 18 dataset (not a new dataset) that interests you and analyze it. This could include new, unanalyzed player attributes (e.g., skill) or unanalyzed team/club comparisons (e.g., MLS versus La Liga). Feel free to be creative!

This must include a chart and may include a table or other statistics.


Writeup

Write up a short report on the above analysis. Include a brief description of the methodology, particularly as it may relate to the results obtained. Have clearly labeled sections for each Part (e.g., Section 1). You do not need to include Part 0 - Setup in your report.

All results in the form of charts and tables should:

For reference, consider a good example of descriptive text to accompany a chart made through data analysis.

Remember, the independent variable goes on the x-axis (horizontal axis) and the dependent variable goes on the y-axis (vertical axis). The independent variable is the one that you manipulate, and the dependent variable is the one that you observe. Sometimes you do not really manipulate either variable; you observe them both. In that case, if you are testing the hypothesis that changes in one variable cause (or at least correlate with) changes in the other, put the variable that you think causes the changes on the x-axis.


Hints

The comments below are common mistakes from previous years. They may pertain to your project. They are not provided in any particular order.


Links

The FIFA 18 home page.

The Kaggle dataset for this project.

The original source for the Kaggle dataset.

A beginner's guide to soccer.

A FIFA 18 Data Visualizer you might try to examine various relationships.


Submission

The assignment is to be submitted electronically via the Instruct Assist website by 11:59pm on the day due.

The submission is a report in PDF, named:

    proj1-lastname.pdf

with your name in place of "lastname" above, of course.

To submit your assignment, log into the Instruct Assist website:

https://ia.wpi.edu/imgd2905/

Use your WPI username and password for access. Visit:

Tools → File Submission

Select "Project 1" from the dropdown and then "Browse" and select the assignment file (i.e., proj1-lastname.pdf).

Make sure to hit "Upload File" after selecting it!

If successful, there should be a line similar to:

Creator    Upload Time             File Name        Size    Status   Removal
Claypool 2018-03-12 21:40:07  proj1-claypool.pdf   3208 KB  On Time  Delete

Grading

All accomplishments are shown through the report. The point breakdown does not necessarily reflect effort or time on task. Rather, the scale is graduated so that increasingly more effort is required for the same reward (points).

Breakdown

Part 1 - 30% : The analysis of rating versus age represents a large chunk of the grade. Completing this part means a basic tool pipeline is set up and can be used, with a basic demonstration of one full set of analysis.

Part 2 - 25% : The analysis of rating versus wage is worth an additional chunk of the grade. Completing this demonstrates more sophisticated queries and analysis.

Part 3 - 20% : The analysis of age is worth an additional chunk of the grade. Completing this demonstrates an additional chart and analysis skill.

Part 4 - 15% : Comparing speed by position is worth an additional letter grade's worth of points. Doing so reinforces skills already demonstrated once.

Part 5 - 10% : Analyzing data of choice is worth a small fraction of the grade as it represents the "icing on the cake". It shows an additional set of queries as well as new insights that are self-driven.

Rubric

100-90. The submission clearly exceeds requirements. All parts of the project have been completed or nearly completed. The report is clearly organized and well-written, charts and tables are clearly labeled and described and messages provided about each part of the analysis.

89-80. The submission meets requirements. The first 3 parts of the project have been completed well, but not parts 4 or 5. The report is organized and well-written, charts and tables are labeled and described and messages provided about most of the analysis.

79-70. The submission barely meets requirements. The first 2 parts of the project have been completed or nearly completed, but not parts 3 or 4. The report is semi-organized and semi-well-written, charts and tables are somewhat labeled and described, but parts may be missing. Messages are not always clearly provided for the analysis.

69-60. The project fails to meet requirements in some places. The first part of the project has been completed or nearly completed, and maybe some of part 2, but not parts 3, 4 or 5. The report is not well-organized nor well-written, charts and tables are not labeled or may be missing. Messages are not always provided for the analysis.

59-0. The project does not meet requirements. No part of the project has been completed. The report is not well-organized nor well-written, charts and tables are not labeled and/or are missing. Messages are not consistently provided for the analysis.

Postmortem Feedback on Graded Projects

The comments below are in response to graded projects. They are not provided in any particular order.



Questions: imgd2905 question-answer forum