Game Studies Corpus (GSC)





Video games are among the most influential entertainment media today. They affect large parts of society directly, as a means of entertainment, education and recreation, and indirectly, as seen in social gamification processes. A multibillion-dollar market has developed around the medium, and in some cases it is under legal scrutiny because of harmful effects on parts of the population.

Even so, video games are not yet a prominent subject of scientific research. One reason is that it is hard to decide how an analysis should be carried out. Video games consist of bytecode that cannot be interpreted in any constructive way, and they are mainly consumed through audiovisual means, which are hard enough to analyse in their own right. Another reason is one of the defining properties of video games: interactivity. How can we analyse something whose actual content is, to a large extent, created and controlled by the person consuming it?

The purpose of this project is to provide a solution to both of these problems and, hopefully, a robust starting point for empirical work in the field of Game Studies. Video game walkthroughs provide a textual representation of the game in question and contain exactly the information needed to complete it. These descriptions ignore the (theoretically infinite) variance of outcomes that results from the interactive element. In addition, they convert the content of a video game into text, a medium that is routinely analysed in many ways across various research environments.




Project Overview

Goal: The goal of this project is to publish a text corpus that compiles video game walkthroughs from various sources for textual analysis.
Project Coordinator: Dr. Jochen Tiepmar, Natural Language Processing Group, Leipzig University
Project Start: 12.02.2020
Project End: "When it's done"
Contact: jtiepmar(at)informatik.uni-leipzig.de




Downloadable Content

Please be aware that the data reflects the current state of the project and may change (hopefully only expand) as the project progresses.
The metadata centers on the walkthrough text corpus. It therefore does not contain everything that is available, only what could be mapped to a game walkthrough. For example, it does not contain all available Steam tags, only the tags of games that could be mapped to a collected walkthrough document.
CTS URNs are used as unique identifiers for the games to make the data interoperable with the planned Canonical Text Service. More info on that is available here. If you do not care about CTS, you can simply treat them as weird magical identifiers that connect the data points across the project.
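
Because every dataset is keyed by the same game identifier, records from different files can be combined with a simple lookup. The following is a minimal sketch of that idea, not part of the project's tooling. The URN strings, the sample values and the helper name join_by_urn are invented for illustration and do not reflect the identifiers actually used in the corpus.

```python
# Hypothetical identifiers and values, for illustration only.
genres = {
    "urn:cts:example:game-a": ["Adventure", "Puzzle"],
    "urn:cts:example:game-b": ["Strategy"],
}
platforms = {
    "urn:cts:example:game-a": ["PC", "Linux"],
    "urn:cts:example:game-c": ["iOS"],
}


def join_by_urn(**tables):
    """Merge several {urn: values} tables into {urn: {table_name: values}}."""
    merged = {}
    for name, table in tables.items():
        for urn, values in table.items():
            # Create the per-game record on first sight, then attach this table's values.
            merged.setdefault(urn, {})[name] = values
    return merged


combined = join_by_urn(genres=genres, platforms=platforms)
print(combined["urn:cts:example:game-a"])
```

A game that is missing from one dataset simply lacks that key in the merged record, which matches the note above that only data mappable to a walkthrough is included.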

Description | Data Format | Visualisation
Short Descriptions (RAWG) | Plain Text; HTML Table; Tab-separated key-value pairs | -
Gameplay Tags | Plain Text; HTML Table; Tab-separated key-value pairs with comma-separated values | Histogram; Combination Histogram
Genres | Plain Text; HTML Table; Tab-separated key-value pairs with comma-separated values | Histogram; Combination Histogram
Publisher Names | Plain Text; HTML Table; Tab-separated key-value pairs with comma-separated values | Histogram; Combination Histogram
Developer Names | Plain Text; HTML Table; Tab-separated key-value pairs with comma-separated values | Histogram; Combination Histogram
Supported Game Languages | Plain Text; HTML Table; Tab-separated key-value pairs with comma-separated values | Histogram; Combination Histogram
Supported Platforms (PC, Gameboy, iOS,...) | Plain Text; HTML Table; Tab-separated key-value pairs with comma-separated values | Histogram; Combination Histogram
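
The plain-text files listed above can be read with a few lines of code. The sketch below assumes one record per line, with the game identifier before the tab and a comma-separated value list after it. The sample data and the function names are invented for illustration. It also assumes that a "combination histogram" counts how often each complete set of values occurs for a game, which is an interpretation and not something the project states.

```python
from collections import Counter


def parse_metadata(text):
    """Parse 'key<TAB>v1,v2,...' lines into {key: [v1, v2, ...]}."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, raw = line.partition("\t")
        records[key] = [v.strip() for v in raw.split(",") if v.strip()]
    return records


def histogram(records):
    """Count how many games carry each individual value."""
    return Counter(v for values in records.values() for v in values)


def combination_histogram(records):
    """Count how often each full set of values occurs (order-independent)."""
    return Counter(
        tuple(sorted(set(values))) for values in records.values() if values
    )


# Hypothetical sample in the assumed format.
sample = (
    "urn:cts:example:game-a\tAdventure,Puzzle\n"
    "urn:cts:example:game-b\tAdventure\n"
    "urn:cts:example:game-c\tPuzzle, Adventure\n"
)
records = parse_metadata(sample)
print(histogram(records).most_common())
print(combination_histogram(records).most_common())
```

Splitting on commas would break values that themselves contain a comma, which can happen with publisher or developer names, so it is worth checking the real files before relying on this.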

The walkthroughs themselves will be published later (it's complicated).




Project Statistics

Number of Documents: 12904
Number of Games: 6231
Walkthrough Languages: deu, eng
Game Language Associations: 4634
Genre Associations: 1925
Gameplay Tags: 3619
Release Dates:
Developers: 3204
Publishers: 2836
Steam IDs: 1086
Platform Associations: 5239 (PC, Gameboy, iOS, Linux,...)




(Optional) Roadmap

Suggestions and hints are welcome.