Building a Playstation Network Recommendation System from Scratch
Can we build a system better than PS starting from nothing?
It is quite an understatement to say that I have been a massive fan of Playstation. I have owned each console, and while I have parted ways with my PS1 and PS2, I still own every other system, including the PSVita. According to Playstation’s “gaming stats,” I log around 2,000 hours each year. Why am I confessing this undying adoration for Playstation? To some extent, I hope it will help emphasize the following statement. The digital storefront for purchasing Playstation content is lackluster.
As my number of digital purchases skyrocketed over the last five years, I have had more chances to experience the PS store’s shortcomings firsthand. While the PS5 admittedly made some improvements over the PS3 and PS4, I rarely find myself actively shopping on the console; several third-party websites exist that present the information better. While I could criticize many different features (or the lack thereof), I want to focus on one area where I have at least some knowledge: player-level recommendation systems. With the variety of content available on the Playstation Network (PSN), an advanced implementation of such a system would benefit both players and company revenue.
Current State of Playstation’s Recommendations
With the PS5 launch, there were signs that the storefront could be more innovative. Hidden deep within a specific game’s product page, you can find a “Discover More Games Like This” feature (DMG), but the accuracy is hit or miss. The “Expand Your Games” section is also present, a feature carried over from the PS4 store. It filters on recently played games and suggests purchasing their DLC. Even if you already own the DLC, the store will still recommend it! For example, I already own the “Lara Croft Season Pass”:
On one of the main store pages, they replicate the DMG feature filtered on a single game, the one you most recently played. While these are all steps in the right direction, it feels like more significant opportunities are available for recommendations. Of course, everyone can be a critic. Perhaps someone would be more receptive to our feedback on how the storefront could better leverage recommendations if we first constructed a PSN recommender system. Let’s gain some credibility by building that system from scratch!
Building Our Own PSN Recommendation System
To construct a game recommendation system, we are going to need data. A reasonable first step would be to create a list of features that we feel are relevant. We can further break those down into three broad categories: player behaviors, player information, and game information.
We could generate features for this list for quite some time, but we’ve yet to address the elephant in the room: we are not Playstation, and we do not have access to most of the core features above. Technically, we could scrape most game information from online resources, but user information would be more challenging to track down. However, even if we could somehow scavenge all of this metadata on games and players, we would still be missing a critical factor in building our system: a target.
For example, some recommendation systems might attempt to rank items in an order roughly aligned with the purchase probability. Of course, there are plenty of nuances here, but the issue still stands. If we do not have a way to indicate a player’s preferences, we will have no way to learn the relevance of any given feature, which is a prominent issue. Behavioral attributes are often the most important in determining if someone is interested in a piece of content. Looking through our shortlist of what we have access to: purchases? Nope. Wishlist, clicking, or searching actions? No, no, no. Playing? No... well, wait a second. We might be able to access some information that proxies play!
The Trophy System
On some platforms, developers have implemented a system that rewards players for their progression in a game. On the Playstation Network, we can see that the trophy system represents this concept. A player can earn a trophy for a variety of reasons. Here are some of the main ones:
- General Game Progression — main questline, side quests, and completion of the game
- Depth of Game Skill — difficulty related, speed run related
- Collection Oriented — collectible related, grind related
- Multiplayer Oriented — co-op objectives, online ranks
For most games, general progression represents a considerable portion of the trophies. Therefore, we can assume that the percentage of trophies a player has received in a given game roughly correlates with their overall depth of play.
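As a rough sketch of that assumption, trophy progression can be computed as the share of a game’s trophies a player has earned. The record layout and IDs below are hypothetical, and the count is unweighted; PSN’s own progress figure may weight trophy grades differently.

```python
# Hypothetical records: (player_id, game_id, trophies_earned, trophies_available).
records = [
    ("p1", "game_a", 12, 48),
    ("p1", "game_b", 30, 40),
    ("p2", "game_a", 48, 48),
]


def trophy_progress(rows):
    """Return {(player, game): fraction of that game's trophies earned}."""
    progress = {}
    for player, game, earned, available in rows:
        if available <= 0:
            continue  # skip games that expose no trophy list
        progress[(player, game)] = earned / available
    return progress


progress = trophy_progress(records)
```

These raw fractions are only the starting point; they are not yet comparable across games or across players.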
Issues with using Trophy Data for Recommendations
Of course, there are three significant issues that we need to address first to ensure this is feasible. The first issue is the collection of the trophy data itself. We aren’t getting very far if we don’t have access to any trophy information. The second issue is trophy difficulty. For some games, trophies are painless to achieve, while others can be highly challenging. Sometimes an achievement is met automatically with story progression, but occasionally a player needs to carry out specific actions to receive it. We do not want to confuse games with more accessible trophies as being more deeply played than games with complex ones. The final issue is the player’s overall engagement with trophies. Some players will go out of their way to collect every achievement, while others may not seek out any, receiving only trophies rewarded through the main questline. Most players likely exist somewhere in between. We need a way to deal with these differing depths of engagement across players.
The Data Issue
For now, I will not dive heavily into the details of this process, but I will mention that there are somewhat hidden API endpoints on the PSN to gather this information. Of course, whether the data is available for a given player is a function of their privacy settings. The ability for a player to remove access to their trophy data will slightly bias our recommendation system, but there is currently no way to circumvent this issue. Fortunately, the trophy information of many players is accessible. Another thing to note for those interested in working with this API is that Playstation has implemented heavy rate limiting, so collecting large amounts of player data is somewhat painful. Before writing this article, I had accumulated the data on approximately 110 thousand players and 70 million trophies. Data collection was by far the most intensive task for this project. Don’t underestimate the data itself!
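For anyone facing similar rate limits, one common pattern is to retry with exponential backoff. This is a generic sketch, not the collection code used for this article: `fetch`, `RateLimited`, and the delay values are all hypothetical stand-ins, and no real PSN endpoint is shown.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the remote service asked us to slow down."""


def fetch_with_backoff(fetch, player_id, max_attempts=5, base_delay=1.0,
                       sleep=time.sleep):
    """Call fetch(player_id), doubling the wait after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return fetch(player_id)
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to exercise without actually waiting.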
The Game Difficulty Issue
For any game, we can derive the distribution of trophy progression across all players. We have access to various statistics of this distribution: the mean, the median, or a more specific percentile and the trophy progression it represents. Instead of using trophy percentage, which is heavily dependent on overall game difficulty, we can use progression percentiles across players, which act as a normalizing agent across all games. Here is an example:
If games A and B have median trophy progressions of 25% and 75%, then a player currently sitting at 50% in both games is above the median in A and below it in B. We can conclude that, relative to the typical player, they are more invested in A than in B. If we only referenced the overall trophy progression, we would have falsely assumed that the player has roughly the same interest in both games. While there may be more sophisticated ways of addressing this, percentiles are a simple way to quickly approximate player preferences against the population.
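The A-versus-B example can be sketched with a simple percentile rank. The progression values below are made up so that the medians match the example (25% and 75%), and the tie-handling rule (ties count as half) is my own assumption rather than anything prescribed here.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sorted_values, value):
    """Share of values below `value`, plus half of the values equal to it."""
    below = bisect_left(sorted_values, value)
    ties = bisect_right(sorted_values, value) - below
    return (below + 0.5 * ties) / len(sorted_values)


# Hypothetical trophy progress of every observed player, per game.
game_a = sorted([0.05, 0.10, 0.25, 0.25, 0.40, 0.60, 0.90])  # median 0.25
game_b = sorted([0.30, 0.55, 0.75, 0.75, 0.80, 0.95, 1.00])  # median 0.75

# The same 50% progress lands at very different percentiles.
rank_a = percentile_rank(game_a, 0.50)
rank_b = percentile_rank(game_b, 0.50)
```

With this toy data, the player sits above most of game A’s population and below most of game B’s, which is the distinction the raw 50% figure hides.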
The Player Engagement Issue
For most players, the percentiles will revolve around 50%. However, some players will not pay attention to trophies, and their percentiles will be much lower. One way to solve this is to carry out the same normalization process used for games except within each player: a percentile of percentiles. While this worked well in the previous step, it isn’t as robust when applied to a single player. Once again, an example will help:
If we had two players who had each played only three games, their percentiles across Games A, B, and C might be .05, .10, and .20 for player A and .80, .90, and 1.00 for player B. Even though their depth of engagement is on opposite ends of the spectrum, normalizing each player against their own engagement gives both players the same ordering of games. In short, Game C is each player’s top choice.
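Here is a minimal sketch of that “percentile of percentiles” step, using the two players from the example. The function name and the tie-handling rule are assumptions for illustration; the Spark code behind the final dataset may differ.

```python
from bisect import bisect_left, bisect_right


def within_player_ratings(game_percentiles):
    """Re-rank one player's game percentiles against that player's own games."""
    ordered = sorted(game_percentiles.values())
    ratings = {}
    for game, value in game_percentiles.items():
        below = bisect_left(ordered, value)
        ties = bisect_right(ordered, value) - below
        ratings[game] = (below + 0.5 * ties) / len(ordered)
    return ratings


# Population percentiles per game for the two players in the example.
player_a = {"A": 0.05, "B": 0.10, "C": 0.20}
player_b = {"A": 0.80, "B": 0.90, "C": 1.00}

ratings_a = within_player_ratings(player_a)
ratings_b = within_player_ratings(player_b)
```

Both players end up with identical ratings, with Game C on top. The sketch also shows why this step is less robust per player: with only three games, each rating can take just a handful of values.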
Final dataset
After collecting the data, restructuring it, and applying the transformations for game percentiles and player engagement, we end up with a long list of user-game pairs with a “rating”:
I have provided a GitHub gist here if you’re interested in the general Spark code that generates this from the initial data frame.
Implementing Recommendations in Spark
Now that we have our data, it is time to implement the rating algorithm. There are two main paradigms in most recommendation situations: content-based and collaborative filtering (CF). Due to our data, a content-based approach is not feasible, so we’re limited to a CF approach. CF has a few different structures, but this information is readily available elsewhere (with a much better explanation) if you’re interested in the details!
We’ll start with an out-of-the-box solution already available in Spark to determine if our trophy data is relevant for creating recommendations. Further exploration into a custom solution is warranted if we find these recommendations are decent. We’ll use Alternating Least Squares (ALS), which can perform a low-rank matrix completion while also leveraging parallelization in a row-by-row manner. Here is the initial setup:
from pyspark.ml.recommendation import ALS

als = ALS(
    userCol="user_index",
    itemCol="game_index",
    ratingCol="rating",
    nonnegative=True,
    maxIter=20,
    rank=200,
    regParam=0.1
)
Overall, there is little trickery here. The first three parameters of ALS refer to the users, items, and ratings. “Nonnegative” constrains the learned factors to be nonnegative, which is a reasonable fit here because all of our ratings are positive. We have some freedom in selecting the final three parameters: iterations, rank, and regularization. Iterations refers to the maximum number of times the algorithm will iterate. Rank is the size of the latent vectors we will produce for each item and user. Finally, the regularization parameter can be set, which works much as it does when applied to least squares regression. Usually, we would do some grid search across these parameters and use an evaluation metric such as RMSE or MAE to determine the “quality” of the system. However, time is of the essence, so we’ll throw spaghetti at the wall, and if it sticks, we’ll dig deeper later!
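For when that deeper dig happens, the two metrics mentioned above are simple to state. This plain-Python sketch uses made-up held-out ratings and predictions; in Spark itself, `RegressionEvaluator` with `metricName` set to `"rmse"` or `"mae"` would be the more natural tool.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between paired rating lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between paired rating lists."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical held-out ratings and the model's predictions for them.
held_out = [0.2, 0.5, 0.9]
predicted = [0.3, 0.5, 0.7]

rmse_score = rmse(held_out, predicted)
mae_score = mae(held_out, predicted)
```

RMSE penalizes the single larger miss more heavily than MAE does, which is usually the deciding factor between the two.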
Once the ALS model has been fit, we can look at the item factors or user factors to get a high-level understanding of what is meant by a “low-rank matrix completion.” Each player and item has a vector of length 200 representing them. These vectors point to the entity’s location in n-dimensional space and allow us to locate nearby entities. Here is a small slice of one vector:
Let’s finally find out if we’re getting any traction with our hacky trophy data and recommendation system combination!
Results
While user-level recommendations are more relevant, they are more complicated examples to parse. We’ll look at item similarities first to see how they compare with the “Discover More Games Like This” (DMG) feature on the PS Store. The easiest way to see if our recommender is working is to look up a sports or racing title, such as FIFA, NBA2K, or Gran Turismo. For Gran Turismo Sport, we should expect to see all racing games:
In the PS Store, most of the recommendations are racing games, except for “Puzzle Bobble 3D,” which feels a little strange. For our Spark ALS rankings, the top 40 games all incorporate driving as the primary focus. Also, note, I haven’t filtered out the reference game itself, so it will always be #1 in these lists.
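One plausible way to produce item-to-item rankings like these is cosine similarity over the item factors. I’m presenting this as an assumption, since the exact similarity measure isn’t spelled out here, and the titles and three-dimensional vectors below are invented stand-ins for the real 200-dimensional factors.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(item_factors, reference, k=3):
    """Return the k item names whose factors are closest to the reference."""
    ref = item_factors[reference]
    scored = [(cosine(ref, vec), name) for name, vec in item_factors.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical item factors, shortened to three dimensions.
item_factors = {
    "racer_1": [0.9, 0.1, 0.0],
    "racer_2": [0.8, 0.2, 0.1],
    "puzzle_1": [0.0, 0.1, 0.9],
    "soulslike_1": [0.1, 0.9, 0.2],
}

top = most_similar(item_factors, "racer_1", k=2)
```

Because the reference game is not filtered out, it is its own nearest neighbor and lands first in the list, just as in the rankings above.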
Let’s take a look at Dark Souls III:
The suggestions from the Playstation Store for Dark Souls III don’t seem as strong as Gran Turismo Sport. However, our rankings show some of the most common souls-like games. Also, Salt and Sanctuary is listed twice because it has a PS4 and Vita version!
What about Knack 2?
I’m not familiar with most of the suggestions from Playstation for Knack 2, but I do see a trend in the Spark ALS rankings — LEGO and other games designed for a younger audience!
More niche, the fighting game — Guilty Gear XRD revelator
I’m not an expert on fighting games, but it looks like both the PS Store and the Spark ALS rankings are recommending a bunch of fighting games.
How about Uncharted?
The results for our Spark ALS rankings aren’t too surprising — other Uncharted games, Tomb Raider, and other story-based action/adventure games. The PS Store recommendations feel a little off. They are showing more recent releases instead of similar games, and maybe there is a valid reason for this!
Deeper Issues
The results look promising, but we did cherry-pick to some extent. While most games have relevant recommendations, I observed a few issues when delving more deeply. Here are the main ones:
Time sensitivity — sometimes, two games that did not seem very similar would appear on each other’s lists. The general trend seemed to be a tighter coupling for games released in the same month. When looking at the trophy data, many users were active for only a year or two, so it makes sense that we would observe some time dependence properties. To fix this, we could filter to a subset of users, the ones currently active, or figure out a savvy way to learn the time component of the latent vector and normalize against it.
Games with Limited Users — when you only have a small percentage of users from the PSN, it is hard to get a signal on some more obscure or indie titles. For example, if you only have two instances of users who have played a particular game, their information dominates the recommendations for that title. To temporarily fix these issues, I hard filtered out games with fewer than ten players, which removed about 50% of the games on the PSN. Fortunately, this is primarily a data collection issue, and as we add data on more users, this problem will not be as prevalent.
Lack of Novelty — if you look over all of the recommendations, at times, you would be like, “well, duh.” For example, entering FIFA 20 and getting every other soccer game over the last ten years is not surprising or helpful. Most out-of-the-box frameworks for building recommender systems don’t have a natural way to surface more novel or less expected items. Even if we had perfect purchase and play history for all users on the PSN, we would no doubt need to modify our system to create more meaningful recommendations that push players to try new but related things.
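The hard filter on sparsely played games described above can be sketched in a few lines. The row layout and names are hypothetical, and the threshold of ten simply mirrors the cutoff mentioned earlier.

```python
from collections import Counter


def filter_sparse_games(ratings, min_players=10):
    """Keep (player, game, rating) rows only for games with enough players."""
    players_per_game = Counter()
    # Count each distinct (player, game) pair once.
    for _, game in {(p, g) for p, g, _ in ratings}:
        players_per_game[game] += 1
    return [row for row in ratings if players_per_game[row[1]] >= min_players]


# Hypothetical ratings: one well-covered game and one with only two players.
ratings = [(f"p{i}", "popular", 0.5) for i in range(12)]
ratings += [("p0", "obscure", 0.9), ("p1", "obscure", 0.8)]

kept = filter_sparse_games(ratings)
```

A softer alternative would be to keep every game and down-weight the sparse ones, but the hard cutoff is the simpler place to start.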
With all that said, we built this recommendation system on some pretty shallow assumptions! I do not doubt that by collecting more data, implementing validation tests, and assessing a broader set of parameters and algorithms, we would have a robust system built on trophies alone!
Final Notes — In Defense of Playstation
The question now becomes, why is Playstation struggling to reinvent their storefront around recommendations? We are likely being naive about all the processes surrounding a digital storefront. For example, there are no doubt “curators” who set up the design of many storefronts. If they have a defined cadence for changes, how do they successfully interact with a recommendation system to achieve their goals?
Also, additional constraints become apparent once we recognize other business objectives: certain games have different margins, and contracts with publishers could mean that specific games need to hit a certain number of views for an advertising campaign to be deemed a success. The monetary benefits of this business style may outweigh the importance of optimizing recommendations at the player level. If leadership does not believe a recommendation system will create as much monetary lift, it might get scrapped altogether in favor of business deals and contracts.
In the end, I think those two systems, with a lot of communication, can work much better than one without the other. We’ve shown a quick way to construct a decent system without much data or effort. Hopefully, we can tackle some of the more profound recommendations and business issues from the last few sections in a future article!