IMGD 2905 Project 3

Hearthstone Analytics

Due date: Tuesday, April 9th, 11:59pm



The goal of this project is to use the game analytics pipeline on raw game data to reinforce a common process in game development. You will obtain game session data, write scripts to parse and re-format ("wrangle") the data, and analyze the formatted data across many games to better understand key game attributes. Unlike in previous projects, the forms of analysis (e.g., charts) are not prescribed for you. Instead, you are to apply the analytics knowledge you have acquired to do targeted analysis. Results are presented in a report.


Top | Part 0 | Part 1 | Part 2 | Part 3 | Part 4 | Writeup | Hints | Submit | Grade

Part 0 - Learn Hearthstone, Prepare Pipeline

Hearthstone is a free-to-play, online card game developed and published by Blizzard. The game is two-player and turn-based. Players select a Hero and a customizable deck of 30 cards with the goal of reducing the opponent's health to zero. Winning matches and completing quests earns gold, allowing players to buy new cards.

Learn Hearthstone

For this part, you will gain a basic understanding of Hearthstone, at least enough to undertake the analysis required in this project.

  1. Download and install Hearthstone

  2. Jump right in and play! And/or, check out Blizzard's guides (in the order you should read them):

    1. How to Play
    2. Heroes
    3. Innkeeper's Lessons

    Other 3rd party guides, videos and more can be found online. For a start:

    1. New and Beginner Players

Don't worry about mastering the game (although you can if you wish and time allows!) - just get a general familiarity with the game rules, cards and playstyle.

Prepare Pipeline

Player data for Hearthstone can be collected using the 3rd party Track-o-Bot, a small, easy-to-use app which automatically tracks a player's Hearthstone matches. While you can certainly collect Track-o-Bot data for your own games, that is not necessary for this project. Instead, we will use Collect-o-Bot data, a public repository of Track-o-Bot data. You will analyze one month of Collect-o-Bot 2019 data. Data is published daily and compiled into monthly data sets at the end of each month.

  1. Download the latest, full-month of data.

    Note: For testing purposes, you will want to use a data set that is smaller than one month. For example, you may want to look at data for just one day, say March 15, 2019.

  2. Download the latest, one-day of data.

    The data is in JavaScript Object Notation (JSON) format. JSON is an open-standard format that encapsulates object data in human-readable text. Data is arranged into attribute-value pairs and arrays (lists) of data.

    An example of the JSON format for part of a Collect-o-Bot game file is below:

#!/usr/bin/python3

#
# Parse Hearthstone output file(s).
# version 1.2
#
# Print some stuff.
#

# Needed imports.
import csv
import json
import sys

FILE="change-to-your-file-name"  # e.g., 2019-03-15.json

# Load file as json object (the with-block closes the file when done).
filename = FILE
with open(filename) as f:
  data = json.load(f)

# Print out total games.
total_games = data['total_games']
print("Total games: ", total_games)

# Print all Heroes (enumerate supplies the game index i).
for i, game in enumerate(data['games']):
  print("Game", i, end=": ")
  print(game['hero'], end=" vs ")
  print(game['opponent'])

## Figure out how often a 'Paladin' won.
wins = 0
count = 0
for game in data['games']:

  # Main player was a Paladin?
  if (game['hero'] == 'Paladin'):
    count += 1 
    if (game['result'] == 'win'):
      wins += 1

  # Opponent was a Paladin?
  if (game['opponent'] == 'Paladin'):
    count += 1
    if (game['result'] == 'loss'):
      wins += 1

print("Paladin games: ", count)
print("Paladin wins: ", wins)

Copy and paste the script. Then, change the value of FILE to the name of the Collect-o-Bot data file (unzipped). You should see output similar to:

 
  Total games:  320
  Game 0: Priest vs Warlock
  Game 1: Priest vs Paladin
  ...
  Game 318: Paladin vs Rogue
  Game 319: Paladin vs Mage
  Paladin games:  104
  Paladin wins:  59

Study the script carefully. You will use, copy and modify it (in conjunction with other Python skills used in Project 2) for the analysis required in this project.
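
If you prefer to chart in a spreadsheet, one possible wrangling step is to flatten the games into a CSV file. The following is only a sketch under stated assumptions, not a required approach: the names FIELDS, games_to_rows and write_csv are made up for illustration, and the sample games are invented. It uses only fields that appear in the script above (hero, opponent, result).

```python
import csv
import io

# Columns to export; these field names appear in the script above.
FIELDS = ["hero", "opponent", "result"]

def games_to_rows(games, fields=FIELDS):
    """Return one list per game holding the requested fields (None if absent)."""
    return [[game.get(name) for name in fields] for game in games]

def write_csv(games, out, fields=FIELDS):
    """Write a header row and one row per game to the open text file 'out'."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(games_to_rows(games, fields))

# Tiny made-up sample, only to show the shape of the output.
sample_games = [
    {"hero": "Priest", "opponent": "Warlock", "result": "win"},
    {"hero": "Paladin", "opponent": "Mage", "result": "loss"},
]
buffer = io.StringIO()
write_csv(sample_games, buffer)
print(buffer.getvalue())
```

With real data, you would pass data['games'] and a file opened for writing (for example, open("games.csv", "w", newline="")) in place of the in-memory buffer.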

Note, see the Hints Section for a script to pretty-print a JSON file.


Part 1 - Heroes

Exploration: How many different Heroes are used? What is the popularity of each Hero? How often does each Hero win?

For a month's data, analyze the distribution of Heroes used in all games. Also analyze the Win/Loss rate for each Hero.
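
As a possible starting point, the Paladin count from Part 0 can be generalized to every Hero. The sketch below assumes the same fields as that script (hero, opponent, result with values 'win' and 'loss'). The function names hero_stats and win_rates and the sample games are hypothetical. It counts both players in each game; whether to count both players or only the main player is an analysis choice you should state in your report.

```python
from collections import Counter

def hero_stats(games):
    """Return (played, won) Counters keyed by Hero, counting both players."""
    played = Counter()
    won = Counter()
    for game in games:
        hero, opponent, result = game["hero"], game["opponent"], game["result"]
        played[hero] += 1
        played[opponent] += 1
        if result == "win":        # main player won
            won[hero] += 1
        elif result == "loss":     # main player lost, so the opponent won
            won[opponent] += 1
    return played, won

def win_rates(played, won):
    """Return a dictionary mapping each Hero to wins divided by games played."""
    return {hero: won[hero] / played[hero] for hero in played}

# Tiny made-up sample, only for illustration.
sample_games = [
    {"hero": "Priest", "opponent": "Paladin", "result": "win"},
    {"hero": "Paladin", "opponent": "Mage", "result": "win"},
    {"hero": "Mage", "opponent": "Priest", "result": "loss"},
]
played, won = hero_stats(sample_games)
for hero, rate in sorted(win_rates(played, won).items()):
    print(hero, "games:", played[hero], "win rate:", rate)
```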


Part 2 - The Coin

Exploration: Which player wins most often - the player that starts or the player that goes second? Do some Heroes do better going second than others?

In turn-based games, going first can often be an advantage. For example, in chess the white player always starts and has a 5% higher chance of winning than the black player.

In Hearthstone, also a turn-based game, the player that goes first could have an advantage by being able to get a Minion out and do damage first. In an attempt to counteract this potential advantage, Hearthstone gives the player that goes second The Coin - a unique spell card that costs 0 Mana to play and gives the player 1 additional Mana for that turn only.

For a month's data, analyze the win rate for the player that starts compared to the win rate for the player that goes second (has The Coin). Analyze the relative win rate compared to the average for each Hero - in other words, how much does a particular Hero gain from The Coin?

Hint: In Python, if you are checking if a JSON value is true or false after loading it, you need to use True or False without quotes (i.e., not "True" or "False"). For example:

  if ( data['games'][5]['coin'] == True ):
    print("In game 5, the main player had The Coin")
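
Building on the hint above, the following sketch shows one way the overall first-player and second-player win rates might be computed. It assumes that coin is True when the main player had The Coin and that result is 'win' or 'loss' from the main player's point of view. The function name coin_win_rates and the sample games are hypothetical.

```python
def coin_win_rates(games):
    """Return (first_player_rate, second_player_rate) over decided games."""
    first_wins = 0
    second_wins = 0
    for game in games:
        result = game["result"]
        if result not in ("win", "loss"):
            continue  # skip anything that is not a clear win or loss
        main_player_won = (result == "win")
        main_has_coin = bool(game["coin"])  # True: main player went second
        # The player with The Coin won if the main player had it and won,
        # or if the main player did not have it and lost.
        if main_player_won == main_has_coin:
            second_wins += 1
        else:
            first_wins += 1
    decided = first_wins + second_wins
    if decided == 0:
        return None, None
    return first_wins / decided, second_wins / decided

# Tiny made-up sample, only for illustration.
sample_games = [
    {"coin": True, "result": "win"},
    {"coin": False, "result": "win"},
    {"coin": False, "result": "loss"},
    {"coin": False, "result": "win"},
    {"coin": True, "result": "loss"},
]
first_rate, second_rate = coin_win_rates(sample_games)
print("First player win rate:", first_rate)
print("Second player win rate:", second_rate)
```

For the per-Hero comparison, the same test could be applied while grouping the counts by Hero in a dictionary.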

Part 3 - Duration+

Exploration: How long are Hearthstone games? How many cards are played? How does the number of cards vary by Hero?

For a month's data, analyze the duration of games. Analyze the number of cards played and the number of cards played for each Hero.

Note, ascertaining the number of cards played by each Hero takes a bit of work. For this part, it is sufficient to analyze the cards played by either player in a game with each type of Hero.

You may find some duration values to be null. These games should be ignored for this section. You should report how many such games are removed and/or the total games used in the duration computations. This kind of culling (i.e., removing some of the data) is often called "data cleaning" and is common in data analytics of all kinds.
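
The cleaning step described above can be sketched as follows. This is a minimal example, assuming each game has a duration field and that a JSON null loads as None in Python. The function name clean_durations and the sample values are made up for illustration.

```python
import statistics

def clean_durations(games):
    """Return (durations, removed), dropping games whose duration is null."""
    durations = [g["duration"] for g in games if g.get("duration") is not None]
    removed = len(games) - len(durations)
    return durations, removed

# Tiny made-up sample; real values come from data['games'].
sample_games = [
    {"duration": 300},
    {"duration": None},
    {"duration": 420},
    {"duration": 540},
]
durations, removed = clean_durations(sample_games)
print("Games used:", len(durations), "Games removed:", removed)
print("Mean:", statistics.mean(durations))
print("Median:", statistics.median(durations))
```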

Hint: The function len() returns the length (i.e., number of items) of a Python list or dictionary.

  cards_played = len(data['games'][5]['card_history'])
  print(cards_played)

Hint: The below code prints out the total cards each Hero played across all games. This might be a useful start for some analysis.

#!/usr/bin/python3

# Needed imports.
import json
import sys

FILE="change-to-your-file-name"  # e.g., 2019-03-15.json

## Count total cards for all games by each hero

# Open file (the with-block closes the file when done).
filename = FILE
with open(filename) as f:
  data = json.load(f)

# Setup dictionary.
played = {} 

# Loop through all games.
total_games = data['total_games']
for game in data['games']:

  # Get hero. If new hero (not yet seen), initialize count.
  hero = game['hero']
  if hero not in played:
    played[hero] = 0

  # Loop through all cards in this game.
  for card in game['card_history']:
    if (card['player'] == 'me'):
      played[hero] = played[hero] + 1

  # Note, you can do same as above for opponent!

# Print total cards for each hero.
for hero in played:
  print("hero: ", hero)
  print("played cards: ", played[hero])
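
The note in the script above suggests repeating the count for the opponent. One way that might look is sketched below. It assumes that opponent entries in card_history carry the player value 'opponent', which you should confirm against your data file. The function name cards_by_hero and the sample game are hypothetical.

```python
from collections import Counter

def cards_by_hero(games):
    """Return a Counter of cards played per Hero, counting both players."""
    played = Counter()
    for game in games:
        for card in game["card_history"]:
            if card["player"] == "me":
                played[game["hero"]] += 1
            elif card["player"] == "opponent":  # assumed label, check your data
                played[game["opponent"]] += 1
    return played

# Tiny made-up sample, only for illustration.
sample_games = [
    {
        "hero": "Mage",
        "opponent": "Rogue",
        "card_history": [
            {"player": "me"},
            {"player": "opponent"},
            {"player": "me"},
        ],
    },
]
for hero, count in cards_by_hero(sample_games).items():
    print("hero:", hero, "played cards:", count)
```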

Part 4 - Choice

Exploration: Think of some Hearthstone exploration you would like to do. Consider the gameplay itself, the data available and possible use of the outcome.

Options include, but are not limited to:

Hint: Fields that might be of interest are: hero_deck, opponent_deck, mode, rank and user_hash. There may be others. To print, for example, the user_hash for the game at index 3 (the 4th game, since Python indices start at 0):

  print(data['games'][3]['user_hash'])

Hint: If doing an in-depth analysis of the cards, the command:

  print(data['games'][5]['card_history'][2]['card']['name'])

will print the name of the card at index 2 (the 3rd card) in the game at index 5 (the 6th game), since Python indices start at 0.
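
As one example of a card-level exploration, the sketch below tallies how often each card name appears across all games. It relies on the card_history, card and name fields used in the hint above. The function name most_played_cards and the sample games are made up for illustration, and this is only one of many possible directions for Part 4.

```python
from collections import Counter

def most_played_cards(games, top=3):
    """Return the 'top' most frequent card names as (name, count) pairs."""
    names = Counter()
    for game in games:
        for entry in game["card_history"]:
            names[entry["card"]["name"]] += 1
    return names.most_common(top)

# Tiny made-up sample, only for illustration.
sample_games = [
    {"card_history": [
        {"card": {"name": "Fireball"}},
        {"card": {"name": "The Coin"}},
        {"card": {"name": "Fireball"}},
    ]},
    {"card_history": [
        {"card": {"name": "Fireball"}},
        {"card": {"name": "The Coin"}},
        {"card": {"name": "Backstab"}},
    ]},
]
for name, count in most_played_cards(sample_games, top=2):
    print(name, count)
```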


Hints

If it is helpful for learning and debugging code, a json-formatted file with just one game is here:

This file is formatted with indentation to make it easier to read in an editor. You can use it to better understand the structure of the data files.
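
Another quick way to learn the structure is to list the field names. The sketch below returns the top-level keys and the keys of the first game. It assumes only the total_games and games fields used in the Part 0 script. The function name describe_keys and the sample data are hypothetical.

```python
def describe_keys(data):
    """Return (top-level keys, keys of the first game), each sorted."""
    top_level = sorted(data.keys())
    first_game = sorted(data["games"][0].keys()) if data["games"] else []
    return top_level, first_game

# Tiny made-up sample; real data comes from json.load() on your file.
sample_data = {
    "total_games": 1,
    "games": [{"hero": "Priest", "opponent": "Warlock", "result": "win"}],
}
top_level, first_game = describe_keys(sample_data)
print("Top-level keys:", top_level)
print("Keys in first game:", first_game)
```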

To "pretty-print" a JSON file to the screen, you might try the following script:

#
# pp.py - pretty-print json file.
#

# Needed imports.
import json
import sys

FILE="change-to-your-file-name"  # e.g., 2019-03-15.json

# Open file, parse and print!
filename = FILE
with open(filename) as f:
  parsed = json.load(f)
print(json.dumps(parsed, indent=2, sort_keys=True))

Many of the grading comments applied to previous projects (1 and 2) are general and pertain to Project 3 as well. You should review the comments made to your Project 1 and Project 2 reports and make sure to incorporate needed changes into your Project 3 report.

For general guidelines, you should also check out:


Writeup

Write up a short report.

Include appropriate descriptions for your data set, pipeline and methodology.

For each part of the project, provide a brief, clearly labeled section on the analysis.

Be sure to consider measures of central tendency and measures of spread, as appropriate.
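
If you compute these measures in Python rather than in a spreadsheet, the standard statistics module may help. The sketch below is one possible summary, assuming Python 3.8 or later (for statistics.quantiles). The function name summarize and the sample values are made up for illustration.

```python
import statistics

def summarize(values):
    """Return common measures of central tendency and spread for a list."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,                     # interquartile range
    }

# Made-up values, only for illustration.
sample_values = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(sample_values)
for name, value in summary.items():
    print(name, round(value, 2))
```

Which measures are appropriate depends on the shape of the data. For example, the median and interquartile range are less sensitive to a few very long games than the mean and standard deviation.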

All guidelines for presenting and describing charts should be adhered to.


Submission

The assignment is to be submitted electronically via Canvas by 11:59pm on the day due.

The submission is a report in PDF, named:

    proj3-lastname.pdf

with your name in place of "lastname" above, of course.

To submit your assignment (proj3-lastname.pdf):

Open: IMGD2905-D19-D01
Navigate to: Assignments -> Project 3
Click: Submit Assignment
Click: Choose File
Select the pdf file: proj3-lastname.pdf
Click: Submit Assignment

Important - you must click the Submit Assignment button at the end or your file will not be submitted!

When successfully submitted, you should see a message similar to:

Submission
- Submitted!
Apr 9 at 11:50pm


Grading

All accomplishments are shown through the report. The point breakdown does not necessarily reflect effort or time on task. Rather, the scale is graduated to provide for increasingly more effort required for the same reward (points).

Breakdown

Part 1 (Heroes) - 35% : Analysis of Hero selection and win rate.

Part 2 (The Coin) - 30% : Analysis of the impact of going first on win rate.

Part 3 (Duration+) - 20% : Analysis of game duration and cards played.

Part 4 (Choice) - 15% : Analysis of choice.

Rubric

100-90. The submission clearly exceeds requirements. All Parts of the project have been completed or nearly completed. The report is clearly organized and well-written, charts and tables are clearly labeled and described, measures of central tendency and spread properly computed and explained, and messages provided about each Part of the analysis.

89-80. The submission meets requirements. Parts 1-3 of the project have been completed or nearly completed, but perhaps not Part 4. The report is organized and well-written, charts and tables are labeled and described, measures of central tendency and spread computed and explained, and messages provided about most of the analysis.

79-70. The submission barely meets requirements. Parts 1-2 of the project have been completed or nearly completed, and some of Part 3, but not Part 4. The report is semi-organized and semi-well-written, charts and tables are somewhat labeled and described, but parts may be missing. Measures of central tendency and spread may not be always computed or explained. Messages are not always clearly provided for the analysis.

69-60. The project fails to meet requirements in some places. Part 1 of the project has been completed or nearly completed, and some of Part 2, but not Parts 3 or 4. The report is not well-organized nor well-written, charts and tables are not labeled or may be missing. Measures of central tendency and spread may not be always computed or explained or may even be misused. Messages are not always provided for the analysis.

59-0. The project does not meet requirements. Besides Part 0, and maybe Part 1, no other part of the project has been completed. The report is not well-organized nor well-written, charts and tables are not labeled and/or are missing. Measures of central tendency and spread are missing or, if in place, are misused. Messages are not consistently provided for the analysis.

Postmortem Feedback on Graded Projects

The comments below are in response to graded projects. They are not provided in any particular order.

Chart 1

Chart 2

Chart 3

Chart 4

Chart 5

proj3-part3.pptx

