IMGD 2905 Project 1
League of Legends Player Analytics
Due date: Tuesday, March 21st, 11:59pm
The goal of this project is to set up tools for a game analytics pipeline and apply the pipeline to Riot's League of Legends (League). You will work through steps to set up tools that allow for querying, extraction and analysis of data. The pipeline will be exercised from basic queries to Riot's game data set, through analysis and presentation in charts and tables, and into a report for dissemination. The tool pipeline will be used for subsequent projects, including a more advanced analysis of League data.
Set up an environment for running the game analytics pipeline.
To do this, work completely through the guide to Setup Game Analytics Pipeline Tools.
Once all tools are properly installed and tested, proceed to Part 1.
For the League player (summoner) named "Faker" (Wiki), analyze the cumulative number of (ranked) LoL matches he has played since he started (using the start day as day 0).
This analysis must be in the form of a chart, something like:
To gather the needed data, use Python and the Riot API. Below is a Python script showing some basic queries. The lines with a # are comments, written to help understand what the code is doing. You are encouraged to: 1) run the below script to make sure it works, 2) study it carefully and modify it and re-run it as needed to gain a deep understanding of how it works, 3) copy, extend and modify it to provide the data you need for this project.
#!/usr/bin/python3
#
# basic.py - Do some basic queries using the Riot API.
#
# version 1.2
#

# Bring in Python imports needed for data processing.
# RiotWatcher from: https://github.com/pseudonym117/Riot-Watcher
from riotwatcher import RiotWatcher
import json
import time

# Replace below with my Riot developer key.
developer_key = 'my-developer-key-here'

# Get master RiotWatcher object that queries Riot API using my key.
r_w = RiotWatcher(developer_key)

# Get player with summoner name 'faker'.
# Wiki: https://en.wikipedia.org/wiki/Faker_(video_gamer)
player = r_w.get_summoner(name='faker')

# Print out player info.
print("Player")

# Get player's match list from match data.
match_data = r_w.get_match_list(player['id'])
match_list = match_data['matches']

# Loop through all matches, printing out champion id.
# Note: the champion id corresponds to the static_get_champion_list().
print("\nChampions played")  # "\n" puts out a newline (blank line).
count = 0
for match in match_list:
    count = count + 1  # tally the number matches
    champion = match["champion"]
    print(champion, end=", ")  # end=", " puts a comma and space after
print("\nTotal matches: %d" % count)  # %d is for integer

# Print out time of oldest (and last) match in list.
# Note: match times are in milliseconds since 1970.
i = len(match_list)-1  # in Python, the last item in a list is length-1.
match = match_list[i]
match_time_old = match['timestamp']
print("\n\nOldest match time: ", end="")
print(json.dumps(match_time_old, indent=3))
print(time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(match_time_old/1000)))

# Print out time of newest (and first) match in list.
i = 0  # in Python, the first item in a list is '0'.
match = match_list[i]
match_time_new = match['timestamp']
print("Newest match time: ", end="")  # end="" means don't add a newline
print(json.dumps(match_time_new, indent=3))
print(time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(match_time_new/1000)))

# Compute time elapsed between newest and oldest.
milliseconds = match_time_new - match_time_old
minutes = milliseconds / (1000 * 60)
hours = minutes / 60
days = hours / 24
years = days / 365
print("\nTime difference between newest and oldest")
print("hours: %d" % hours)  # %d is for integer
print("days: %d" % days)  # %d is for integer
print("years: %.2f" % years)  # %.2f is for real, 2 digits after decimal

# Write some data to file with commas (i.e., a csv)
with open('basic.csv', 'w') as csvfile:
    print("Hours, Days, Years", file=csvfile)
    print("%d, %d, %d" % (hours, days, years), file=csvfile)
# Note, file closes automatically here.
To draw the chart of "Games Played" versus "Time", use a spreadsheet (e.g., Microsoft Excel). Data needs to be in a format similar to:
Days, Match
0.00, 0
0.03, 1
0.06, 2
0.39, 3
0.42, 4
1.01, 5
1.05, 6
...
Note! The values extracted (and that you print out) may be in reverse chronological order (i.e., newest to oldest). This can be fine for generating a chart - you do not need to reverse them.
The values are aligned in vertical columns, with each column separated by a comma (,). This file format is known as "csv" for "comma separated values" and can be read into most spreadsheets, neatly placing the columns and rows into spreadsheet cells.
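As one possible way to produce data in this format, the sketch below converts a list of match timestamps into rows of elapsed days and match number. The timestamps in `sample_timestamps` are hypothetical values, and `cumulative_rows` is an illustrative helper name, not part of the Riot API or RiotWatcher. Real values would come from match['timestamp'] as in basic.py.

```python
# Hypothetical sample: match timestamps in milliseconds since 1970,
# listed newest to oldest (the order noted above for the match list).
sample_timestamps = [1489000000000, 1488913600000, 1488910000000]

MS_PER_DAY = 1000 * 60 * 60 * 24


def cumulative_rows(timestamps_ms):
    """Return (days since first match, match number) pairs, oldest first."""
    if not timestamps_ms:
        return []
    ordered = sorted(timestamps_ms)   # oldest match first
    start = ordered[0]                # the first match defines day 0
    # Match numbers start at 0, following the sample format above.
    return [((t - start) / MS_PER_DAY, i) for i, t in enumerate(ordered)]


rows = cumulative_rows(sample_timestamps)

# Build csv lines: a header, then one "days, match" line per match.
lines = ["Days, Match"] + ["%.2f, %d" % row for row in rows]
print("\n".join(lines))
```

Printing each line with file=csvfile inside a with open(...) block, as basic.py does, would write the same lines to a csv file instead of the screen.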
Tips for drawing a line chart with Microsoft Excel can be found at:
http://www.wikihow.com/Make-a-Line-Graph-in-Microsoft-Excel
Determine the Champion that Faker has played the most over all his matches.
Note, there are many ways to compute this! However, a recommended way is to list all the champion ids and compute the mode() in Excel. Tips for how to compute the mode are at:
https://tinyurl.com/jslhn2y
As in Part 1, data needs to be in a csv format, such as:
67, 236, 119, 67, 64, 67, 25, 202, ...
In the case of a tie, report any one of the tied Champions.
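Since there are many ways to compute the mode, here is a minimal sketch of doing it in Python rather than Excel, using the standard library's `collections.Counter`. The list reuses the sample ids shown above, and `most_played` is an illustrative helper name.

```python
from collections import Counter

# Sample champion ids, taken from the csv example above.
champion_ids = [67, 236, 119, 67, 64, 67, 25, 202]


def most_played(ids):
    """Return (champion id, times played) for the most frequent id."""
    counts = Counter(ids)
    # most_common(1) gives a one-item list holding the top (id, count) pair.
    # When ids are tied, it returns one of them, which is acceptable here.
    return counts.most_common(1)[0]


champion_id, times_played = most_played(champion_ids)
print("Most played champion id: %d (%d matches)" % (champion_id, times_played))
```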
For the final answer, the Champion must be reported by name (e.g., "Leona") and not by id (e.g., 89). To find the champion name associated with a champion id, refer to the "hello.py" script from the setup. The JSON object returned by static_get_champion_list() has an id field corresponding to the ids found from match['champion'].
Think about how to print out the JSON returned by static_get_champion_list().
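The lookup from id to name might be sketched as follows. The dictionary `champion_list` is a hypothetical stand-in for the returned JSON, and its layout (a "data" key holding one entry per champion, each with "id" and "name" fields) is an assumption that should be checked by printing the real object first.

```python
import json

# Hypothetical stand-in for the JSON returned by static_get_champion_list().
# ASSUMPTION: champions sit under a "data" key, each with "id" and "name".
champion_list = {
    "data": {
        "Annie": {"id": 1, "name": "Annie"},
        "Leona": {"id": 89, "name": "Leona"},
    }
}

# Pretty-print the JSON to inspect its actual layout (as basic.py does).
print(json.dumps(champion_list, indent=3))

# Build a lookup table from champion id to champion name.
id_to_name = {}
for entry in champion_list["data"].values():
    id_to_name[entry["id"]] = entry["name"]

# Translate an id found in match['champion'] into a name.
print(id_to_name[89])
```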
Repeat the above analysis, but do so for two separate groups of matches: the oldest half of Faker's matches and the newest half of Faker's matches. Determine if the mode changed (i.e., whether Faker changed his main champion halfway through his career).
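A minimal sketch of the split, assuming the champion ids are listed newest to oldest like the match list. The ids below are hypothetical, and `split_halves` and `mode_of` are illustrative helper names.

```python
from collections import Counter

# Hypothetical champion ids, listed newest match first.
ids_newest_first = [67, 67, 236, 67, 89, 89, 64, 89]


def split_halves(newest_first):
    """Return (oldest half, newest half), each in oldest-first order."""
    oldest_first = list(reversed(newest_first))
    middle = len(oldest_first) // 2
    # With an odd number of matches, the extra one lands in the newest half.
    return oldest_first[:middle], oldest_first[middle:]


def mode_of(ids):
    """Return the most frequent id (any one of them if tied)."""
    return Counter(ids).most_common(1)[0][0]


oldest_half, newest_half = split_halves(ids_newest_first)
print("Oldest half mode:", mode_of(oldest_half))
print("Newest half mode:", mode_of(newest_half))
print("Mode changed:", mode_of(oldest_half) != mode_of(newest_half))
```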
Pick another League player of choice (note, s/he must have played in competitive/ranked LoL matches in order to gather data) and do the same analysis of matches and champions played as that done for Faker.
There are many ways to find competitive/professional League players, but a pretty easy way is through a Google search. Or, any friends that have played competitive/ranked League matches can be used.
Draw a chart comparing your selected player to Faker (note, this means a chart that shows both data sets on one chart).
Provide a combined table with your selected player's champions played and with Faker's champions played.
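For the comparison chart, one option is to place both players' data in adjacent columns of a single csv so a spreadsheet can plot two series on one chart. The sketch below assumes each player's data is a list of (days, match) pairs. The values and the `side_by_side` helper are hypothetical.

```python
from itertools import zip_longest

# Hypothetical (days, match) rows for each player.
faker_rows = [(0.00, 0), (0.03, 1), (0.06, 2)]
other_rows = [(0.00, 0), (1.50, 1)]


def side_by_side(rows_a, rows_b):
    """Return csv lines with player A in columns 1-2, player B in 3-4."""
    lines = ["Faker Days, Faker Match, Other Days, Other Match"]
    # zip_longest pads the shorter list with None so no rows are dropped.
    for a, b in zip_longest(rows_a, rows_b):
        left = "%.2f, %d" % a if a is not None else ", "
        right = "%.2f, %d" % b if b is not None else ", "
        lines.append(left + ", " + right)
    return lines


lines = side_by_side(faker_rows, other_rows)
print("\n".join(lines))
```

The empty cells in the shorter series keep the columns aligned when the file is opened in a spreadsheet.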
Analyze the lengths (durations) of all of Faker's matches.
For the first analysis, produce a histogram of the number of matches versus the match length, broken into 5 minute intervals (i.e., the "bucket size" is 5 minutes), something like:
Tips for drawing a histogram with Microsoft Excel can be found at:
http://www.excel-easy.com/examples/histogram.html
But if your version of Excel does not match, your best bet is to hit F1 for "Help" and then search for "Create a histogram".
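The bucket counts can also be computed in Python before charting. The sketch below assumes match durations are given in seconds, which this document does not state, so check the real data. The durations and the `histogram_5min` helper are hypothetical.

```python
from collections import Counter

# Hypothetical match durations. ASSUMPTION: the unit is seconds.
durations_seconds = [1500, 1790, 1800, 2100, 2399, 2400]

BUCKET_MINUTES = 5


def histogram_5min(durations):
    """Count matches per 5-minute bucket, keyed by bucket start in minutes."""
    counts = Counter()
    for seconds in durations:
        # Integer division finds the bucket; e.g., 1790 s falls in 25-30 min.
        bucket_start = (seconds // (BUCKET_MINUTES * 60)) * BUCKET_MINUTES
        counts[bucket_start] += 1
    return dict(counts)


histogram = histogram_5min(durations_seconds)
for start in sorted(histogram):
    # Label each bucket with its full range so the interval is unambiguous.
    print("%d-%d, %d" % (start, start + BUCKET_MINUTES, histogram[start]))
```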
The match data can be obtained from the RiotWatcher call to get_match(), called with a match id (a number). In the case of this project, the matchId is one of the matches for Faker. For example, a code snippet may look like:
for match in match_list:
    fullMatch = r_w.get_match(match['matchId'])
Duration can be obtained from the field matchDuration in the full match data.
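The loop above might be extended to collect the durations into a list, as in the sketch below. So that it runs without a developer key or network access, the query function is passed in as a parameter and a hypothetical `fake_get_match` stands in for it. The match ids and durations are made-up values.

```python
def collect_durations(match_list, get_match):
    """Return the matchDuration of every match in match_list."""
    durations = []
    for match in match_list:
        full_match = get_match(match['matchId'])
        durations.append(full_match['matchDuration'])
    return durations


# Hypothetical stand-ins for the Riot data, used only for this demonstration.
fake_matches = {
    101: {'matchDuration': 1800},
    102: {'matchDuration': 2100},
}


def fake_get_match(match_id):
    """Pretend query: look up a full match by id in the fake data."""
    return fake_matches[match_id]


sample_match_list = [{'matchId': 101}, {'matchId': 102}]
durations = collect_durations(sample_match_list, fake_get_match)
print(durations)
```

With real data, the call would be collect_durations(match_list, r_w.get_match), using the objects from basic.py.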
For the second analysis, compute summary statistics on the match duration - the minimum, maximum, average and standard deviation of the match length. Present this result in a table.
Tips for computing summary statistics with Microsoft Excel can be found at:
http://www.excel-easy.com/functions/statistical-functions.html
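As an alternative to Excel, the same four statistics can be sketched with Python's standard `statistics` module. The durations below are hypothetical values in minutes. Note that `statistics.stdev` computes the sample standard deviation, and `statistics.pstdev` would give the population version instead.

```python
import statistics

# Hypothetical match durations, in minutes.
durations_minutes = [25.0, 30.0, 35.0, 40.0]


def summarize(values):
    """Return minimum, maximum, average and sample standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "average": statistics.mean(values),
        "stdev": statistics.stdev(values),  # needs at least two values
    }


summary = summarize(durations_minutes)
for name in ("min", "max", "average", "stdev"):
    # Two digits after the decimal keeps the table readable.
    print("%s, %.2f" % (name, summary[name]))
```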
For the writeup, compare analysis of Faker's data to overall trends from the League of Graphs - Game Durations.
Write up a short report on the above analysis. Include a brief description of the methodology, particularly as it may relate to the results obtained. Have clearly labeled sections for each Part (e.g., Section 1). You do not need to include Part 0 - Setup in your report.
All results in the form of charts and tables should:
For reference, consider a good example of descriptive text to accompany a chart made through data analysis.
Remember, the independent variable goes on the x-axis (horizontal axis) and the dependent variable goes on the y-axis (vertical axis). The independent variable is the one that you manipulate, and the dependent variable is the one that you observe. Note that sometimes you do not really manipulate either variable; you observe them both. In that case, if you are testing the hypothesis that changes in one variable cause (or at least correlate with) changes in the other, put the variable that you think causes the changes on the x-axis.
For all Riot API queries, there is a limit to how fast requests can be made. Making requests too rapidly will cause a query to be rejected and an error returned. To avoid this, pause between requests. In Python, this can be done by:
import time
...
time.sleep(1)  # sleep for 1 second
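Building on the pause above, a rejected query can be retried after sleeping. The sketch below is one way to do so under stated assumptions: `call_with_pause` is a hypothetical helper, and it catches the generic `Exception` only as a placeholder, because this document does not name the error RiotWatcher raises. The `flaky_request` function simulates two rejections so the example runs without network access.

```python
import time


def call_with_pause(request, retries=3, pause_seconds=1.0):
    """Call request(); on an error, sleep and try again up to `retries` times."""
    for attempt in range(retries):
        try:
            return request()
        except Exception:  # placeholder: catch the specific API error in real code
            if attempt == retries - 1:
                raise  # out of retries, so pass the error along
            time.sleep(pause_seconds)


# Hypothetical request that is rejected twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate limit rejection")
    return "ok"


# A short pause keeps the demonstration fast; 1 second matches the tip above.
result = call_with_pause(flaky_request, retries=3, pause_seconds=0.01)
print(result, "after", calls["count"], "calls")
```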
For project questions, please post your question on the imgd2905 question-answer forum. Both the professor and staff will look to answer all questions there, but students can also answer each other's questions. You may even find your question has been answered already!
The assignment is to be submitted electronically via the Instruct Assist Website by 11:59pm on the day due.
The submission is a report in PDF, named proj1-lastname.pdf
To submit your assignment, log into the Instruct Assist website:
https://ia.wpi.edu/imgd2905/
Use your WPI username and password for access. Visit:
Tools → File Submission
Select "Project 1" from the dropdown and then "Browse" and select the assignment file (i.e., proj1-lastname.pdf).
Make sure to hit "Upload File" after selecting it!
If successful, there should be a line similar to:
Creator | Upload Time | File Name | Size | Status | Removal |
---|---|---|---|---|---|
Claypool | 2017-03-12 21:40:07 | proj1-claypool.zip | 3208 KB | On Time | Delete |
All accomplishments are shown through the report. The point break down does not necessarily reflect effort or time on task. Rather, the scale is graduated to provide for increasingly more effort required for the same reward (points).
Part 1 | 60% | The analysis of matches played represents more than half of the grade. Completing this part means your tool pipeline is set up and can be used, with a basic demonstration of one full set of analysis. |
---|---|---|
Part 2 | 20% | The analysis of the champions is worth an additional 20%. Completing this demonstrates associating data from one script/table with another (that of hello.py), a common skill needed for data analytics. |
Part 3 | 13% | Comparing players is worth an additional letter grade worth of points. Doing so reinforces the skills already demonstrated one time. |
Part 4 | 7% | Analysis of match data is worth a small fraction of the grade as it represents the "icing on the cake". Analyzing the match data shows an additional set of queries as well as analysis of a new data structure. |
100-90. The submission clearly exceeds requirements. All parts of the project have been completed or nearly completed. The report is clearly organized and well-written, charts and tables are clearly labeled and described and messages provided about each part of the analysis.
89-80. The submission meets requirements. The first 3 parts of the project have been completed or nearly completed, but not part 4. The report is organized and well-written, charts and tables are labeled and described and messages provided about most of the analysis.
79-70. The submission barely meets requirements. The first 2 parts of the project have been completed or nearly completed, but not parts 3 or 4. The report is semi-organized and semi-well-written, charts and tables are somewhat labeled and described, but parts may be missing. Messages are not always clearly provided for the analysis.
69-60. The project fails to meet requirements in some places. The first part of the project has been completed or nearly completed, but not parts 2, 3 or 4. The report is not well-organized nor well-written, charts and tables are not labeled or may be missing. Messages are not always provided for the analysis.
59-0. The project does not meet requirements. No part of the project has been completed. The report is not well-organized nor well-written, charts and tables are not labeled and/or are missing. Messages are not consistently provided for the analysis.
The comments below are in response to graded projects. They are not provided in any particular order.
As several of you pointed out, the player Faker on the North American (NA) servers is not the "real" Faker. The real Faker plays mostly on the Korean servers. But the NA Faker is a good player and worked out fine for our analysis.
Number all pages.
Sections should be numbered and with a name/label (e.g., "Section 1. Matches" instead of "Section 1.").
Charts/figures do not need both a title and a caption. In general, use a caption only. Chart titles may be fine for a presentation.
Using color to differentiate lines is ok (even good), but they should also be distinguishable in black and white. Black and white is still the lowest common denominator - some people are color blind, many printers don't do color and even some devices do not (e.g., my PDF annotator does not display in color!). Differentiate with thickness and/or hashes and/or point types.
Excel provides horizontal lines and grids on charts by default. These are almost always not needed and provide unnecessary "ink", clouding the message.
Excel may choose axis marks that make little sense (e.g., 1, 7, 13 ...). Control the axis marks to provide something that is meaningful given the data.
If a chart has a single data series, a key/legend is not needed.
Figures are numbered differently than tables (e.g., Figure 1, Figure 2 and then Table 1 and Table 2). Also, use capitalization for each.
Histograms should not have spaces between the bars (or only small spaces). Also, a bar with one number below it (e.g., 25) is not clear - does it represent, e.g., 20-25 or 22.5 - 27.5?
Think about font size when placing a chart in a paper. In general, the font size of the embedded figure should be comparable to the paper itself.
Make sure to submit PDFs only. Word docs or other formats are not acceptable.
Write formally. Avoid conversational prose (e.g., "it was insanely large") and hyperbole. Similarly, do not use contractions (e.g., instead of "don't" use "do not") and do not use second person (e.g., "you", "your", and especially "y'all").
Check the number of significant digits on a computation. Just because Excel can provide a standard deviation of 7.8215132511 does not mean all digits should be reported. First, are they really significant or are they within the error of measurement (e.g., for time which is measured in seconds)? Second, does it really help the reader? e.g., better would be 7.82 or even 7.8.
If embedding a picture taken from a screen shot, be sure to choose a high resolution monitor, blow the picture up as large as possible and then take the screen capture. This will minimize the chance of it looking blurry in the document.