Due date: Monday, April 1st, 11:59pm
The goal of this project is to use a complete game analytics pipeline on a simple game with a level of your design. This will illustrate a common process in game development - analyzing player behavior/performance in game at various depths. You will design a level (a maze), get players to play your maze (as well as play others' mazes), analyze player data for your maze and compare player performance for your maze with other mazes. Results will be presented in a report.
For this part, you will design a level (a maze) and play your classmates' levels (mazes).
Read the Mazetool documentation (docx, pdf).
Using Chrome or Firefox (results with other Web browsers can vary) visit:
and type in your WPI username (without the @wpi.edu extension). Press enter.
Play some (3 to 5) randomly generated mazes to get the feel of the game. Design a few (2 to 3) custom mazes and play them to get a better feel for how maze design can impact the experience. For several mazes you complete, take a look at the data that is generated and sent by email to have a better understanding of the information that is collected each game.
When ready, design one maze that provides for a player experience that you think will be interesting. You should have a level-design experience in mind - e.g., a long maze, a tricky maze, both easy and tricky parts in the maze, a maze with backtracking, a direct-line maze, .... Playtest your maze yourself several times to help ensure you get the experience intended. Be sure to have your email (WPI username) entered and click "save". This will allow your maze to be loaded by others.
In preparation for your report:
A. take a screenshot of your maze, showing the player's starting location, all the gold pieces and the exit (cropping out the Web browser so you have just the maze). You will include this picture in your final report.
In preparation for your analysis:
Determine the shortest path to complete your maze in spaces.
Determine the fewest number of clicks needed to complete your maze.
Determine the shortest time needed to complete your maze by either: a) measuring by running the maze yourself as fast as possible, or b) determining the fewest number of spaces and estimating time based on speed.
Based on #3, decide on how short a time (e.g., 1 minute) is needed to complete your maze to count as a "win".
When and where indicated by the professor, play the maze of every other student in the class.
Open your Chrome/Firefox Web browser to:
From the list of usernames provided by the professor during the class session, enter the username of the first person on the list. (Tip! If an error is reported, you likely entered the wrong name. In that case, retry.) Touch the path to start to play, completing the maze as quickly as possible. Do this just once.
When you complete the maze (you reach the exit), refresh your Web browser (hit the refresh button or hit F5). Then, move on to the next username.
Repeat the above for everyone on the list (except yourself, of course).
To analyze the mazetool output data, you will use Python and a spreadsheet (e.g., Microsoft Excel).
Follow the instructions in the setup Python document to get Python installed and ready.
When done, examine the below parse.py Python script, which parses the Mazetool comma separated value (csv) output. You are encouraged to log in to your Jupyter account and:
Create a new folder called mazetool-data. Upload the example file sample.csv to this new folder.
Create a new Python Notebook (select "New" --> "Python 3").
Into your Notebook, paste in the below script. Modify the variable DIR to be the location of your sample.csv file (e.g., mazetool-data).
#
# parse.py - parse Mazetool output file(s).
#
# version 2.0
#

# Needed imports.
import csv
import os

DIR = "change-to-your-dir-name"  # e.g., mazetool-data

# Repeat (loop) for every file in directory.
for f in os.listdir(DIR):

    print("---------------------------------")

    # Only handle .csv files.
    if not f.endswith(".csv"):
        print("Ignoring:", f)
        continue

    # Print file information.
    filename = DIR + "/" + f
    print("File:", f)
    print("Full path:", filename)

    # Print out gold events.
    print("Gold (time number):")
    with open(filename, 'r') as csvfile:  # read from file
        reader = csv.DictReader(csvfile)  # treat as csv file
        for row in reader:
            if row['gold'] != '':  # compare values with !=, not 'is not'
                print(row['time'], row['gold'])
    print("-----------")

    # Print out click events.
    print("Clicks (time spaces):")
    with open(filename, 'r') as csvfile:  # read from file
        reader = csv.DictReader(csvfile)  # treat as csv file
        for row in reader:
            if row['click'] != '':
                print(row['time'], row['click'])
    print("-----------")

    # Print out exit event (it is always the last line in file).
    # Note, 'row' still holds the last line read by the loop above.
    print("Exit (time spaces):")
    print(row['time'], row['exit'])
Gold (time number):
01.586 1
01.786 2
16.404 3
18.806 4
23.911 5
32.215 6
35.217 7
37.719 8
38.119 9
38.518 10
-----------
Clicks (time spaces):
00.800 0
12.653 10
17.255 48
20.787 64
25.783 96
28.885 127
31.150 131
33.036 142
36.191 164
1:10.071 188
-----------
Exit (time spaces):
1:15.022 238
This same script can work on any Mazetool output.
Study the script carefully and modify it and re-run it as needed to gain a deep understanding of how it works.
For your analysis, copy, extend and modify it to provide the data you need.
One extension to the script you will likely use is to write data you want to analyze (e.g., by Excel in a chart) to a file. This can be done in Python fairly easily. Below is some sample code showing one way to do this. There are others that you can find by searching the Web.
#
# write.py - Show basic csv file writing.
#
# version 1.1
#

import math  # needed for sqrt()

# Output directory and file name.
DIR = "."  # '.' means current directory. Or try, e.g., mazetool-data
FILE = "basic.csv"

# Write some numbers to file with commas (i.e., a csv).
filename = DIR + "/" + FILE
with open(filename, 'w') as csvfile:

    # Print header.
    print("Loop, Square, Square Root,", file=csvfile)

    # Repeat (loop) for numbers 1 to 10 (range() stops before 11).
    for i in range(1, 11):

        # Print numbers: integer, integer, float.
        print("%d, %d, %f," % (i, i*i, math.sqrt(i)), file=csvfile)

# Note, file closes automatically here.
Note, if you cut and paste the above script to your Python Notebook (either a new one or the one you have created) it should work, but you won't see anything printed on the screen for output. Instead, the script will have created a file called "basic.csv" that you can open in your Jupyter account by double-clicking on it, or by selecting it, downloading it and opening it with Excel.
Note that the Mazetool output is in seconds only (e.g., 36.191) when under 1 minute, but when over one minute contains the minutes followed by a colon and the seconds (e.g., 1:10.071). The times can be converted to just seconds in all cases with code similar to:
# Convert Mazetool time format to seconds.
t = row['time']
if ":" in t:  # time went over one minute.
    (m, s) = t.split(':')  # split into minutes and seconds
    seconds = int(m) * 60 + float(s)  # convert to total seconds
else:  # time is less than one minute.
    seconds = float(t)  # convert text to a number
print(seconds)
Note that the above script is not a complete program - you can't just run it by itself (in fact, you will get an error "row not defined" if you try). Instead, you can use that script as part of another script. For example, try using the script at the end of parse.py to print the exit time in seconds.
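As one possible way to try the conversion on its own, the sketch below wraps it in a function and feeds it a stand-in for a row. The function name getSeconds and the hand-built row dictionary are assumptions for illustration only (a real row would come from csv.DictReader); the time and exit values are taken from the sample output above.

```python
def getSeconds(t):
    """Convert a Mazetool time string to seconds (as a float)."""
    if ":" in t:                       # over one minute, e.g., "1:10.071"
        (m, s) = t.split(":")          # split into minutes and seconds
        return int(m) * 60 + float(s)  # convert to total seconds
    return float(t)                    # under one minute, e.g., "36.191"

# Stand-in for a row read by csv.DictReader (hypothetical, for illustration).
row = {"time": "1:15.022", "exit": "238"}
print(getSeconds(row["time"]), row["exit"])
```

Inside parse.py, the same call could replace row['time'] in the final print statement.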
From the data files for people that played your maze, select two example runs that illustrate progression through your maze. One should be a "short" run where the player completed the maze quickly, likely with fewer mouse clicks, and one should be a "long" run where the player took longer to complete the maze, perhaps with more mouse clicks.
Analyze the data, providing a time-series chart of distance traveled (in spaces) versus time (in seconds). The distance should be obtained from the mouse click events. Chart trendlines (with lines and points) should be clearly indicated.
While two separate charts could be shown, better would be to have one chart with two trendlines (as done in the sample chart). In Excel, this may be done by first creating a scatter plot for the first data set (e.g., the "long" run):
"Insert" --> "Scatter Plot" --> "Scatter Plot with Lines and Markers"
Then, add the second data set (e.g., the "short" run) to the first graph by: 1) selecting the data and copying (ctrl-c), 2) selecting the chart, and 3) choosing:
"Home" --> "Paste" --> "Paste special"
making sure "Categories (X Values) in First Column" is selected.
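To get the chart data out of a Mazetool file in the first place, something like the following sketch might help. It assumes the column names used by parse.py ('time' and 'click', where 'click' holds the spaces traveled so far); the function names and the output header are made up for illustration.

```python
import csv

def to_seconds(t):
    """Convert a Mazetool time string (e.g., "1:10.071") to seconds."""
    if ":" in t:
        (m, s) = t.split(":")
        return int(m) * 60 + float(s)
    return float(t)

def click_series(filename):
    """Return a list of (seconds, spaces) pairs, one per click event."""
    series = []
    with open(filename, "r", newline="") as csvfile:
        for row in csv.DictReader(csvfile):
            if row["click"]:  # skip rows without a click event
                series.append((to_seconds(row["time"]), int(row["click"])))
    return series

def write_series(series, filename):
    """Write the pairs as a two-column csv file for charting."""
    with open(filename, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Time", "Spaces"])
        writer.writerows(series)
```

For example, write_series(click_series("mazetool-data/sample.csv"), "short-run.csv") would produce a file (the name short-run.csv is hypothetical) that can be opened in Excel and charted as described above.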
From all the data files for people that played your maze, tally up the number of people that "won" (completed your maze below the time you had specified as a winning time).
Make a pie chart of the number of wins and the number of losses.
To do this analysis, you will need data in the form:
Time,
115.396,
45.022,
46.924,
where each row is the exit time (the time the maze was completed), in seconds.
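One way to produce data in this form is sketched below. It relies on the exit event being the last line of each Mazetool file (as noted in parse.py) and on the 'time' column name; the function names are assumptions for illustration.

```python
import csv
import os

def to_seconds(t):
    """Convert a Mazetool time string (e.g., "1:15.022") to seconds."""
    if ":" in t:
        (m, s) = t.split(":")
        return int(m) * 60 + float(s)
    return float(t)

def exit_times(directory):
    """Return the exit time, in seconds, for each .csv file in directory."""
    times = []
    for f in sorted(os.listdir(directory)):
        if not f.endswith(".csv"):  # only handle .csv files
            continue
        with open(os.path.join(directory, f), "r", newline="") as csvfile:
            rows = list(csv.DictReader(csvfile))
        if rows:  # the exit event is the last line in the file
            times.append(to_seconds(rows[-1]["time"]))
    return times

def write_times(times, filename):
    """Write one time per row, under a "Time," header."""
    with open(filename, "w") as out:
        print("Time,", file=out)
        for t in times:
            print("%.3f," % t, file=out)
```

For example, write_times(exit_times("mazetool-data"), "exit-times.csv"). Writing the output file outside the data folder keeps it from being read as a Mazetool file on a later run.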
Once the time csv data is imported into Excel, the wins and losses can be tallied. In a separate column, provide an IF formula that checks if the time is less than the winning time. For example, if the winning time is under 30 seconds, the formula to check the time in cell A1 would be:
=IF(A1 < 30, "yes", "no")
Copy and paste this formula along the column for each row.
Then, use a COUNTIF formula to count up the "yes" and the "no" values. Something like:
=COUNTIF(B1:B3, "yes")
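As a cross-check on the spreadsheet tally, the same count can be sketched in Python. The function name tally and the 30-second winning time are assumptions for illustration; the three times are the example times used in this part.

```python
def tally(times, win_time):
    """Return (wins, losses), where a win is a time below win_time."""
    wins = sum(1 for t in times if t < win_time)
    return (wins, len(times) - wins)

WIN_TIME = 30.0  # hypothetical winning time, in seconds
(wins, losses) = tally([15.396, 45.022, 76.924], WIN_TIME)
print("yes,", wins)   # prints: yes, 1
print("no,", losses)  # prints: no, 2
```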
For example, a spreadsheet with the analysis of 3 maze times (15.396, 45.022, and 76.924 seconds), might look like:
A pie chart can be made by selecting the "yes" and "no" rows and the tally column (e.g., C1:C2 to D1:D2 in the example) and then:
"Insert" --> "Insert Pie" --> "2-D Pie"
Analyze the completion times for all the people that played your maze via a cumulative distribution chart. In addition, analyze and compare the distribution of completion times of your maze to the distribution of completion times of all other mazes (i.e., do not include your maze data in the "all" maze data).
The time data is the same as in part 2.
Time,
115.396,
45.022,
46.924,
...
But for this part, draw a cumulative distribution chart of the times.
To create a cumulative distribution chart:
=row()/count * 100
where count is the number of data rows.
A trend line for the second distribution (e.g., "all") can be added by the same method used in part 1 to add a trend line to a time series chart.
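The percentage formula can also be sketched in Python. This assumes the times are sorted from smallest to largest first and that rows are counted from 1 (i.e., there is no header row above the data); the function name is made up for illustration.

```python
def cumulative_distribution(times):
    """Return (time, percent) pairs, where percent is row / count * 100."""
    ordered = sorted(times)  # smallest time first
    count = len(ordered)     # number of data rows
    return [(t, (i + 1) / count * 100) for (i, t) in enumerate(ordered)]

# Example times from this part.
for (t, percent) in cumulative_distribution([115.396, 45.022, 46.924]):
    print("%.3f, %.1f," % (t, percent))
```

Plotting percent (y-axis) against time (x-axis) gives the cumulative distribution; the largest time always lands at 100 percent.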
Compare observed times for your maze with the "best" times you determined in Part 0.
Do some comparative analytics on individual mazes, including your own, compared to all mazes.
First, for all mazes and all runs (including your own), find the minimum, maximum, mean, median and mode for:
Report the results in a table.
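The five statistics can be computed in Excel or, as a rough sketch, in Python. The function name and the click counts below are made up for illustration. Note that when several values tie for most common, this sketch reports the one seen first.

```python
import statistics
from collections import Counter

def summarize(values):
    """Return the minimum, maximum, mean, median and mode of a list."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": Counter(values).most_common(1)[0][0],
    }

clicks = [7, 3, 7, 10, 5]  # hypothetical click counts, one per run
for (name, value) in summarize(clicks).items():
    print(name, value)
```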
For the variables of clicks, time and spaces, draw radar charts comparing your maze average clicks, average time and average spaces to the average clicks, average time and average spaces for all mazes.
In order to draw a radar chart in Excel, data needs to be in the form:
Clicks, Spaces, Time,
7, 128, 72,
Once in Excel, select both rows and all three columns and:
"Insert" --> "Insert Waterfall, Funnel, Stock, Surface, or Radar Chart" --> "Radar"
(Note, the above command may need to be adjusted depending upon your Excel version.)
In doing so, however, the comparative charts will not be very effective! This is because the number of clicks is typically far smaller than either spaces or time. To remedy this, normalize each average by dividing by the maximum for that dimension (previously computed for the table), i.e., clicks / max_clicks. This will produce a number from 0 to 1 that is comparable across variables. Data will be similar to:
Clicks, Spaces, Time,
0.428, 0.714, 0.98,
A radar plot drawn with the normalized data will be comparable.
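A small sketch of the normalization step follows. The averages are the example values shown earlier in this part, but the maximums are hypothetical placeholders (use the ones from your own table), so the printed numbers will not match the sample data above.

```python
def normalize(averages, maximums):
    """Divide each average by the maximum for the same variable."""
    return {name: averages[name] / maximums[name] for name in averages}

maze_avg = {"Clicks": 7, "Spaces": 128, "Time": 72}   # example averages
all_max = {"Clicks": 20, "Spaces": 250, "Time": 120}  # hypothetical maximums

normalized = normalize(maze_avg, all_max)
print("Clicks, Spaces, Time,")
print("%.3f, %.3f, %.3f," % (normalized["Clicks"],
                             normalized["Spaces"],
                             normalized["Time"]))
```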
Select three other mazes, randomly or based on your own interest (e.g., your friends). Draw similar radar charts for each, being sure to clearly identify which maze each chart came from.
Compare observed average clicks and average spaces for your maze with the "best" values (fewest clicks and shortest path) you determined in Part 0.
The entire Mazetool data set for all players on all mazes is available at:
https://web.cs.wpi.edu/~imgd2905/d19/projects/proj2/mazetool-data.zip
For your analysis, you should note aggregate characteristics about the data set (e.g., number of people, number of mazes played, etc).
Uploading multiple files (e.g., 300 or more Mazetool csv files) can be tedious. See the Python setup tips for how to unzip (and zip) multiple files in a Notebook.
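For reference, one general way to unzip an uploaded archive from a Notebook uses Python's standard zipfile module, as sketched below. This is not necessarily the method described in the Python setup tips, and the function name is made up for illustration.

```python
import zipfile

def unzip(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir; return their names."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(dest_dir)  # creates dest_dir if needed
        return archive.namelist()
```

For example, after uploading mazetool-data.zip, calling unzip("mazetool-data.zip", "mazetool-data") would place the csv files in the mazetool-data folder.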
For many kinds of analytics, including game analytics, organization is key. Paying special attention to filenames - raw data, scripts and csv data - will pay dividends as the project progresses and gets more complicated. This is especially important if you ever have to re-visit your analysis, something that is quite common in practice.
With this in mind, some suggestions on keeping organized:
Make small, individual scripts that provide data for one part of the needed analysis. For example, a script may just pull out all the completion (exit) times from a series of files. Nothing more. This could be used for Part 1.
Have a comment in each script that gives its name and says something meaningful about what it does. For example, "exit-times - records exit-times".
Have any output csv files produced by the script use the same name as the script. For example, exit-times.csv.
For spreadsheet analysis, have a separate file for each part in the analysis or, alternatively, a separate sheet for each part of the analysis. Name the file (or sheet) with the same name used for the script and the data. For example, exit-times.xlsx.
Have a brief README.txt file for your own notes that provides a one-line description of what each script does.
When embedding charts in a report, fonts may often shrink to the point they are not readable! To avoid this, as a guideline, compare the size of text inside a chart to the size of the text in the paper. They should be similar in size. If the chart text size is too small (or way too big), go back to the original chart and choose a font size that results in a final font size that is more proportional to the paper font. Note, this may require adjusting other aspects of the chart, such as axis tick marks and spacing.
The aspiring Python programmer might want to have an easier way to use the code to compute time in seconds. In general, small pieces of code like this can be separated into a "block" of code, called a function. With a function, you could write something like, for example, seconds = getSeconds(row['time']) to get the number of seconds, regardless of whether the format is pre-pended by minutes or not. Tutorials to make functions can be found online; one such document is:
Many of the grading comments applied to Project 1 are general and pertain to Project 2 as well. You should review the comments made to your Project 1 report and make sure to incorporate needed changes into your Project 2 report.
You should also check out the Postmortem Feedback on Graded Project 1s for general guidelines that also pertain to this Project 2.
Write up a short report.
For Part 0 (Level Design, Play and Analysis Tools), include details on your maze, describing the high level experience, showing a screen shot of your maze, and providing data on the "win" condition, shortest path, fewest clicks and fastest time estimates.
For each other part of the project, provide a brief section on the analysis in clearly labeled sections (e.g., Part 1 - Maze Running). Include a brief description of the methodology, particularly as it may relate to the results obtained.
All guidelines for presenting and describing charts should be adhered to.
The assignment is to be submitted electronically via Canvas by 11:59pm on the day due.
The submission is a report in PDF, named:
proj2-lastname.pdf
with your name in place of "lastname" above, of course.
To submit your assignment (proj2-lastname.pdf):
Open: IMGD2905-D19-D01
Navigate to: Assignments -> Project 2
Click: Submit Assignment
Click: Choose File
Select the pdf file: proj2-lastname.pdf
Click: Submit Assignment
Important - you must click the Submit Assignment button at the end or your file will not be submitted!
When successfully submitted, you should see a message similar to:
Submission
- Submitted!
Apr 1 at 11:52pm
All accomplishments are shown through the report. The point break down does not necessarily reflect effort or time on task. Rather, the scale is graduated to provide for increasingly more effort required for the same reward (points).
Part 0 - 10% : Building a maze and playing everyone else's maze.
Part 1 - 35% : Time series chart showing short and long maze runs.
Part 2 - 25% : Pie chart showing "win" fraction.
Part 3 - 20% : Cumulative distribution charts of maze times.
Part 4 - 10% : Table of maximums and radar charts comparing mazes.
100-90. The submission clearly exceeds requirements. All Parts of the project have been completed or nearly completed. The report is clearly organized and well-written, charts and tables are clearly labeled and described and messages provided about each Part of the analysis.
89-80. The submission meets requirements. Parts 0-3 of the project have been completed or nearly completed, but perhaps not Part 4. The report is organized and well-written, charts and tables are labeled and described and messages provided about most of the analysis.
79-70. The submission barely meets requirements. Parts 0-2 of the project have been completed or nearly completed, and some of Part 3, but not Part 4. The report is semi-organized and semi-well-written, charts and tables are somewhat labeled and described, but parts may be missing. Messages are not always clearly provided for the analysis.
69-60. The project fails to meet requirements in some places. Parts 0-1 of the project have been completed or nearly completed, and some of Part 2, but not Parts 3 or 4. The report is not well-organized nor well-written, charts and tables are not labeled or may be missing. Messages are not always provided for the analysis.
59-0. The project does not meet requirements. Besides Part 0, and maybe Part 1, no other part of the project has been completed. The report is not well-organized nor well-written, charts and tables are not labeled and/or are missing. Messages are not consistently provided for the analysis.
The comments below are in response to graded projects. They are not provided in any particular order.
For some, the postmortem feedback comments on Project 1 still hold (e.g., mistakes on that project are still being made on this one). Those should be reviewed.
The power for radar charts comes from comparison with other radar charts. So, when using them in a report, try to include all the radar charts on one page. This means resizing, as needed, so they can fit and changing fonts/lines so they are readable.
Make sure to indicate how radar charts are normalized (ditto for other charts/data analysis where normalization is done). In other words, if a value is divided by a maximum, say so and say where the maximum came from (e.g., maximum over all mazes? my maze?).
Generally, cumulative distribution charts should have a y-axis maximum of 100 (percent). Larger numbers on that axis are not possible.
When including methodology (e.g., describing how data was gathered to make a chart), put this before the chart itself.