IMGD 2905 Project 3

The goal of this project is to use the game analytics pipeline on raw game data to reinforce a common process in game development. You will obtain game session data, write scripts to parse and re-format the data (“wrangle”), and analyze the formatted data across many games to better understand key game attributes. Unlike in previous projects, specific forms of analysis (e.g., charts) are not prescribed to you; instead, you are to apply the analytics knowledge you have acquired to targeted analysis. Results are presented in a report.

Part 0 - Learn Hearthstone, Prepare Pipeline

Hearthstone is a free-to-play, online card game developed and published by Blizzard. The game is two-player and turn-based. Players select a Hero and a customizable deck of 30 cards with the goal of reducing the opponent’s health to zero. Winning matches and completing quests earn gold, allowing players to buy new cards for the next game.

Learn Hearthstone

For this part, you will gain a basic understanding of Hearthstone, at least enough to undertake the analysis required in this project.

Don’t worry about mastering the game (although you can if you wish and time allows!) - just get a general familiarity with the game rules, cards and playstyle.

Prepare Pipeline

Player data for Hearthstone used to be collected using the third-party Track-o-Bot, a small, easy-to-use app that automatically tracks a player’s Hearthstone matches. For this project, we will use public Track-o-Bot data previously gathered by Collect-o-Bot:

Until the end of 2019, data was published daily and compiled into monthly data sets at the end of each month. Unfortunately, as of January 2022, the Collect-o-Bot site is no longer available, and neither is much of the data it collected. Fortunately, some of the data remains, and we will analyze what is there.

You will analyze one month of 2019 Collect-o-Bot data, choosing any month that is available.

The data is in JavaScript Object Notation (JSON) format. JSON is a commonly used, open-standard format that encodes object data as human-readable text. Data is arranged into attribute-value pairs and arrays (lists) of data.
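To see how JSON maps onto Python data structures, here is a minimal sketch that parses a small, made-up document. Its field names (hero, opponent, coin, result) mirror those used by the parse script in this handout, but the text itself is invented for illustration and is not real Collect-o-Bot data.

```python
import json

# A tiny, made-up JSON document (NOT real Collect-o-Bot data) showing
# attribute-value pairs (JSON objects) and arrays (JSON lists).
text = '''
{
  "games": [
    {"hero": "Mage", "opponent": "Warrior", "coin": true, "result": "win"},
    {"hero": "Rogue", "opponent": "Mage", "coin": false, "result": "loss"}
  ]
}
'''

data = json.loads(text)               # JSON object -> dict, JSON array -> list
print(type(data).__name__)            # dict
print(type(data['games']).__name__)   # list
print(data['games'][0]['hero'])       # Mage
print(data['games'][1]['coin'])       # False (JSON false -> Python False)
```

JSON `true`/`false` become Python `True`/`False`, and JSON `null` becomes `None`, which matters when checking fields such as the coin or a missing value.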

An example of the JSON format for part of a Collect-o-Bot game file is in the Hints section.

#!/usr/bin/python3

#
# parse.py - Parse Hearthstone output file(s).
# version 1.3
#
# Print summary statistics for the games in one data file.
#

# Needed imports.
import json

FILE = "change-to-your-file-name"  # e.g., 2017-07-05.json or single.json

# Load file as json object (the with-block closes the file afterwards).
filename = FILE
with open(filename) as f:
    data = json.load(f)

# Print all game matchups (games numbered from 1).
for i, game in enumerate(data['games'], start=1):
    print("Game", i, end=": ")
    print(game['hero'], end=" vs ")
    print(game['opponent'])

# Print count of number of cards played each game.
print("Number of cards played:")
for i, game in enumerate(data['games'], start=1):
    cards_played = len(game['card_history'])
    print("  game", i, ":", cards_played)

# Print out total game count.
total_games = data['total_games']
print("Total games:", total_games)

# Count how many times main player won.
wins = 0
for game in data['games']:
    if game['result'] == 'win':
        wins += 1

# Print out number of wins.
print("Main player wins:", wins)

# Count how many times main player had coin.
coin_count = 0
for game in data['games']:
    if game['coin']:
        coin_count += 1

# Print out total number of coins main player had.
print("Main player had coin:", coin_count)

# Tally number of times main player had a certain Hero.
tally = {}
for game in data['games']:
    tally[game['hero']] = tally.get(game['hero'], 0) + 1

# Print tally out.
print("Main player heroes played:")
for hero in tally:
    print(hero, tally[hero])

Copy and paste the script into a new Python notebook. Then, change the value of FILE to the name of the (unzipped) Collect-o-Bot data file. You should see output similar to:

Study the script carefully. You will use, copy and modify it (in conjunction with other Python skills used in Project 2) for the analysis required in this project.

Part 1 - The Coin

Exploration: Which player wins most often - the player that starts or the player that goes second?

In turn-based games, going first can often be an advantage. For example, in chess the white player always starts and has a 5% higher chance of winning than the black player.

In Hearthstone, also a turn-based game, the player that goes first could have an advantage by being able to get a Minion out and do damage first. In an attempt to counteract this potential advantage, Hearthstone provides the player that goes second with The Coin - a unique spell card that costs 0 Mana to play and gives the player 1 additional Mana for that turn only.

For a month’s data, analyze the win rate for the player that starts compared to the win rate for the player that goes second (has the coin).

Be sure to report the number of games in your analysis (in addition to the month and year selected, of course).
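One way to compute these rates is sketched below. It assumes, as in the parse script, that each game record is from the main player's point of view: `coin` is True when the main player went second, and `result` is `'win'` or `'loss'` for the main player (any other value is skipped; `'loss'` as the losing value is an assumption to check against your data). The function name `coin_win_rates` is hypothetical.

```python
def coin_win_rates(games):
    """Return (first_player_rate, second_player_rate, games_counted).

    Assumes each record is from the main player's view: 'coin' is True
    when the main player went second, and 'result' is 'win' or 'loss'
    for the main player. Games with any other result are skipped.
    """
    first_wins = 0
    second_wins = 0
    counted = 0
    for game in games:
        if game['result'] not in ('win', 'loss'):
            continue
        counted += 1
        main_won = game['result'] == 'win'
        main_went_second = bool(game['coin'])
        # The coin holder went second, so the second player won when the
        # main player either had the coin and won, or lacked it and lost.
        if main_won == main_went_second:
            second_wins += 1
        else:
            first_wins += 1
    if counted == 0:
        return 0.0, 0.0, 0
    return first_wins / counted, second_wins / counted, counted


# Made-up sample records; with real data, pass data['games'] instead.
sample = [
    {'coin': True, 'result': 'win'},
    {'coin': False, 'result': 'win'},
    {'coin': False, 'result': 'loss'},
    {'coin': True, 'result': 'loss'},
]
first, second, n = coin_win_rates(sample)
print(f"Games: {n}, first: {first:.1%}, second (coin): {second:.1%}")
```

Counting each game once from the perspective of whoever went second uses both players' outcomes, since the opponent's result is the complement of the main player's. The returned count is the number of games to report.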

Part 2 - Heroes

Exploration: How many different Heroes are used? What is the popularity of each Hero? How often does each Hero win?

For a month’s data, analyze the distribution of Heroes used in all games. Also analyze the win/loss rate for each Hero (i.e., how likely a Hero is to win a game they are in).

Hint: A code sample that prints out Hero wins for the main player: hero-wins.py

Hint: Remember, in doing Hero analysis, each game has an opponent, too (e.g., game['opponent']).
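As a separate illustration (not the hero-wins.py sample), here is a minimal sketch that counts every Hero appearance for both the main player and the opponent, and credits the win to whichever side won. It assumes `result` is `'win'` or `'loss'` from the main player's view and skips other values; the name `hero_stats` is hypothetical.

```python
from collections import Counter


def hero_stats(games):
    """Return {hero: (appearances, wins, win_rate)} over both players.

    Assumes 'result' is 'win' or 'loss' from the main player's view;
    other values are skipped. A mirror match (same Hero on both sides)
    counts as two appearances with one win.
    """
    appearances = Counter()
    wins = Counter()
    for game in games:
        if game['result'] not in ('win', 'loss'):
            continue
        hero, opponent = game['hero'], game['opponent']
        appearances[hero] += 1
        appearances[opponent] += 1
        winner = hero if game['result'] == 'win' else opponent
        wins[winner] += 1
    return {h: (n, wins[h], wins[h] / n) for h, n in appearances.items()}


# Made-up sample records; with real data, pass data['games'] instead.
sample = [
    {'hero': 'Mage', 'opponent': 'Warrior', 'result': 'win'},
    {'hero': 'Warrior', 'opponent': 'Mage', 'result': 'win'},
    {'hero': 'Mage', 'opponent': 'Priest', 'result': 'loss'},
]
stats = hero_stats(sample)
# Most popular Heroes first.
for hero, (n, w, rate) in sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{hero}: {n} appearances, {w} wins, win rate {rate:.1%}")
```

Each game contributes two appearances, so a Hero's share of play can be taken as its appearances divided by twice the number of games counted.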

Part 3 - Play Rates

Exploration: How long are Hearthstone games? How many cards are played? What is the play rate?

For a month’s data, analyze the duration of games. Analyze the number of cards played. Analyze how fast (the play rate) cards are played.

Hint: The length of a game can be obtained with code similar to: game['duration']

You may find that some duration values are null (None in Python). Ignore these games for this section, and report how many were removed and/or the total number of games used in the duration computations. This kind of culling (i.e., removing some of the data) is often called “data cleaning” and is common in data analytics of all kinds.
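A minimal sketch of the cleaning step plus summary statistics follows. It assumes `duration` is in seconds (an assumption; check your data before reporting units) and, like the parse script, treats `len(card_history)` as the number of cards played. It also drops non-positive durations so the play rate never divides by zero; that extra guard goes beyond what the assignment asks, so report those games too if any appear. The name `play_rate_stats` is hypothetical.

```python
import statistics


def play_rate_stats(games):
    """Clean out games without a usable duration, then summarize.

    Assumes 'duration' is in seconds and that len(card_history) is the
    number of cards played (as in the parse script). Returns
    (removed_count, summary), where summary maps each measure to
    (mean, median, stdev).
    """
    durations, cards, rates = [], [], []
    removed = 0
    for game in games:
        duration = game.get('duration')
        # Data cleaning: drop null durations; non-positive ones are also
        # dropped here so the rate below never divides by zero.
        if duration is None or duration <= 0:
            removed += 1
            continue
        n_cards = len(game['card_history'])
        durations.append(duration)
        cards.append(n_cards)
        rates.append(n_cards / (duration / 60))  # cards per minute

    summary = {}
    for name, values in (('duration_s', durations), ('cards', cards),
                         ('cards_per_min', rates)):
        if values:
            spread = statistics.stdev(values) if len(values) > 1 else 0.0
            summary[name] = (statistics.mean(values),
                             statistics.median(values), spread)
    return removed, summary


# Made-up sample records; with real data, pass data['games'] instead.
sample = [
    {'duration': 600, 'card_history': [{}] * 20},
    {'duration': None, 'card_history': []},
    {'duration': 300, 'card_history': [{}] * 10},
]
removed, summary = play_rate_stats(sample)
print("Removed:", removed, "Used:", len(sample) - removed)
for name, (mean, median, stdev) in summary.items():
    print(f"{name}: mean={mean:.1f} median={median:.1f} stdev={stdev:.1f}")
```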

Part 4 - Choice

Exploration: Think of some Hearthstone exploration you would like to do. Consider the gameplay itself, the data available and possible use of the outcome.

Hint: While you are not to make your choice analysis trivial, you should also consider your capabilities (e.g., with Python) in doing your analysis - some options require more data wrangling than others.

Hint: Fields that might be of interest are: hero_deck, opponent_deck, mode, rank and user_hash. There may be others. To print, for example, the user_hash for the 3rd game (list indices start at 0):

  print(data['games'][2]['user_hash'])

Hint: If doing an in-depth analysis of the cards, the following command, as an example, prints the name of the 3rd card played in the 6th game:

  print(data['games'][5]['card_history'][2]['card']['name'])
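Building on the card path in that command (`card_history` entries, each with a `card` that has a `name`), here is a minimal sketch that tallies the most frequently played cards across all games. It counts every entry in `card_history` regardless of which player played it; the name `top_cards` is hypothetical.

```python
from collections import Counter


def top_cards(games, n=10):
    """Return the n most frequently played card names across all games."""
    counts = Counter()
    for game in games:
        for play in game['card_history']:
            counts[play['card']['name']] += 1
    return counts.most_common(n)


# Made-up sample records; with real data, pass data['games'] instead.
sample = [
    {'card_history': [{'card': {'name': 'The Coin'}},
                      {'card': {'name': 'Fireball'}},
                      {'card': {'name': 'Fireball'}}]},
    {'card_history': [{'card': {'name': 'Fireball'}}]},
]
print(top_cards(sample, 2))
```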

Hints

For each part, you might start with the parse.py script provided above. Make a copy of the script (e.g., in a new Jupyter notebook). Then, remove or comment out any lines that print output you do not need.

If it is helpful for learning and debugging code, a json-formatted file with just one game is here:

This file is formatted with indentation to make it easier to read in an editor. You can use it to better understand the structure of the data files.

After debugging your code with one game, you might then progress to one day of data. Once that works, proceed with your month analysis.

To “pretty-print” a JSON file to the screen, you might try the following script:

#
# pp.py - pretty-print json file.
#

# Needed imports.
import json

FILE = "change-to-your-file-name"  # e.g., single.json

# Open file, parse and print!
filename = FILE
with open(filename) as f:
    parsed = json.load(f)
print(json.dumps(parsed, indent=2, sort_keys=True))

Writeup

When describing the data set, make sure you indicate the month and year, as well as high-level information on the “size” of the data set, such as the number of games. You can do this once, say, in a “Methodology” section if all your analysis uses the same month, or in each section if not.

For each remaining part of the project, provide a brief, clearly labeled section describing the analysis.

Be sure to consider measures of central tendency and measures of spread, as appropriate.
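As one way to report both kinds of measures for any list of values (e.g., game durations or cards played), here is a minimal sketch using Python's standard `statistics` module. `statistics.quantiles` requires Python 3.8 or later, its default `'exclusive'` method is only one of several quartile conventions, and both `stdev` and `quantiles` need at least two values. The helper name `describe` is hypothetical.

```python
import statistics


def describe(values):
    """Summarize central tendency (mean, median) and spread (stdev, range, IQR)."""
    values = sorted(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method='exclusive'
    return {
        'n': len(values),
        'mean': statistics.mean(values),
        'median': statistics.median(values),
        'stdev': statistics.stdev(values),  # sample standard deviation
        'min': values[0],
        'max': values[-1],
        'iqr': q3 - q1,
    }


print(describe([1, 2, 3, 4, 5, 6, 7, 8]))
```

For skewed data such as game lengths, the median and interquartile range are often more informative than the mean and standard deviation alone.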

Submission

The assignment is to be submitted electronically via Canvas by 11:59pm on the day due.

Important - you must click the Submit Assignment button at the end or your file will not be submitted!

Grading

All accomplishments are shown through the report. The point breakdown does not necessarily reflect effort or time on task.

Breakdown

Part 1 (The Coin) - 25% : Analysis of the impact of going first on win rate.

Rubric

100-90. The submission clearly exceeds requirements. All Parts of the project have been completed or nearly completed. The report is clearly organized and well-written, charts and tables are clearly labeled and described, measures of central tendency and spread properly computed and explained, and messages provided about each Part of the analysis.

89-80. The submission meets requirements. Parts 1-3 of the project have been completed or nearly completed, but perhaps not Part 4. The report is organized and well-written, charts and tables are labeled and described, measures of central tendency and spread computed and explained, and messages provided about most of the analysis.

79-70. The submission barely meets requirements. Parts 1-2 of the project have been completed or nearly completed, and some of Part 3, but not Part 4. The report is semi-organized and semi-well-written, charts and tables are somewhat labeled and described, but parts may be missing. Measures of central tendency and spread may not always be computed or explained. Messages are not always clearly provided for the analysis.

69-60. The project fails to meet requirements in some places. Part 1 of the project has been completed or nearly completed, and some of Part 2, but not Parts 3 or 4. The report is neither well-organized nor well-written, and charts and tables are not labeled or may be missing. Measures of central tendency and spread may not always be computed or explained, or may even be misused. Messages are not always provided for the analysis.

59-0. The project does not meet requirements. Besides Part 0, and maybe Part 1, no other part of the project has been completed. The report is neither well-organized nor well-written, and charts and tables are not labeled and/or are missing. Measures of central tendency and spread are missing or, if in place, are misused. Messages are not consistently provided for the analysis.