Recommendation Systems for Rotating Stores in Video Games (Part Three)

Assessing the value that personalization brings to your players.

Riley Howsden
10 min read · Feb 16, 2023

If you haven’t had the chance to look over the first two parts of this blog, you can do so at the following links:

Thanks to those who pointed out the Fortnite Reddit thread “the Item Shop needs an overhaul” as the motivation for further pursuing this series of articles.

In our last rotation, we left off with the core idea that differences in player preferences, and our ability to predict those differences, are crucial to understanding a rotating store’s potential performance. Moving forward, we want to determine whether players have a healthy spread of interest across the entire inventory, which is likely to be the case.

This post focuses on the high-level details in predicting patterns between individual users and items, although our goal isn’t to build a recommendation system outright. We will start with a conversation on impactful features that drive player decisions in the context of a rotating store. From there, we will discuss the difficulties of purchase “probabilities” and how they may not capture what one would expect. Finally, we will touch on how we can leverage our framework from the previous articles alongside our model outputs to assess the value of personalization more formally.

A Simple Start


Let’s start with an elementary assumption about the space in which we’re attempting to make these predictions. Our rotating store consists of a single item, and we randomly choose that item at some cadence, either globally or for each player. We observe whether each player purchases or passes on the content. To determine what factors most influence the player’s decision, we first need to speculate what some primary features could be in this scenario.

Features

Of course, many core features will be standard for purchase predictions in any setting. Is the item tethered to specific content in the game? If so, has the player interacted with that content? For example, how many hours have they played with a character or used a weapon? What is the overall popularity of the item? Do players with similar play patterns own the item? Do they use it? Have they recently seen this specific content in the game? We could extend this list of questions and features for quite some time. While we should include all of these in any model in this space, this article will only elaborate on the features unique to a rotating storefront.

Primarily, we want to focus on features available at the offer’s inception; collecting information after this point, such as the number of times the player has viewed the content, is valuable but will likely cause leakage in our predictions (a player has to click on something to purchase it). Let’s see what we can come up with!

Last Seen

Screenshot of Fortnite items and time last seen in store from fnbr.co

To extend our earlier conversation: the last time a user saw an item clearly holds weight; the longer an item has gone unseen, the higher the player’s demand. This effect likely scales with their baseline interest; while holding back a low-demand item will create only a tiny lift for the next rotation, the impact of doing so on a high-interest item will be much more significant.
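As a minimal sketch of how this feature might be computed (the exposure log and column names here are hypothetical, not from the post):

```python
import pandas as pd

# Hypothetical exposure log: one row each time an item appeared in a player's store.
exposures = pd.DataFrame({
    "player_id": [1, 1, 2],
    "item_id": [101, 101, 101],
    "shown_at": pd.to_datetime(["2023-01-03", "2023-01-20", "2023-01-10"]),
})
rotation_date = pd.Timestamp("2023-02-16")

# Days since each (player, item) pair was last shown;
# never-seen pairs could be assigned a large sentinel value instead.
last_shown = exposures.groupby(["player_id", "item_id"])["shown_at"].max()
days_since_seen = (rotation_date - last_shown).dt.days.rename("days_since_last_seen")
print(days_since_seen)
```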

Number of Rejections

Digging deeper into the last-seen feature, we find a log of interactions with the item itself. Each time we know the user has actively rejected a purchase, the probability of a future purchase should decrease. However, we need to understand other reasons why the transaction may not have occurred; perhaps it was the perfect recommendation, but the player could not afford the item at the time. Therefore, we must be cautious about setting firm rules that filter previously “rejected” content.

Location

Since the rotating store interface is often fixed to a single page with no scrolling, location will likely play a different role than it would in a direct-purchase storefront. Still, we should understand the impact that placement decisions have. After all, rearranging an item should be trivial if we set up our system correctly. The best way to envision this for our simple scenario of one item is that the offer exists on the landing page alongside publishing beats. We want to understand which location on that page is most beneficial for sales.

Player’s Currency Balance

League of Legend’s Your Shop and Player Balance Screenshot

Most service games have a secondary or tertiary currency the player has previously purchased. While I have never seen any indication of this being meaningful for predictions, it is still heavily speculated that having enough of this currency to cover the cost of an item impacts the probability of purchase. The idea is that there is a sharp cliff in interest once the item is outside the player’s current balance since the player must make a new charge. We would likely encode this feature as a simple dummy variable of “over” (0) and “under” (1).
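A minimal sketch of that encoding (hypothetical offer table and column names):

```python
import pandas as pd

# Hypothetical offer snapshot: item price and the player's premium-currency balance at rotation time.
offers = pd.DataFrame({
    "player_id": [1, 2, 3],
    "item_price": [1000, 1000, 1000],
    "currency_balance": [1500, 800, 1000],
})

# 1 = "under" (the item costs more than the player holds), 0 = "over" (they can already afford it).
offers["under_balance"] = (offers["item_price"] > offers["currency_balance"]).astype(int)
```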

Player Activity

Do we even expect the player to be active during the window in which we make the rotation? Player activity is a massive indicator; if the player never logs on, they cannot purchase the item. We could calculate a handful of features representing the player’s history that proxy their chances of seeing the store.

Cadence

Cadence interacts with player activity. However, it only becomes a meaningful feature if we expect players to have different time windows in which to purchase. If there are offers, such as flash sales, that are only available for a fraction of the typical cadence, this will impact the purchase probabilities.

Time Remaining

Screenshot of Store Countdown Timers from VALORANT

Depending on how we set up the structure of our predictions, we might consider the time remaining as a feature. In most systems, we predict the purchase probability over the entire window, but for others, we might want to indicate that probability over a single interaction with the store. In the latter case, information on the remaining time is likely crucial.

Other Features

The list above is partial, and it should be apparent that relaxing our assumption of only showing one item will introduce greater complexity in the feature space. For now, we’ll ignore those details so we can more deeply explore some issues with purchase “probabilities.”

Purchase “Probabilities”

From the features above, we could choose a classification model from a wide range of options, with the goal of outputting a prediction for each item and player combination. It is reasonable to assume that those predictions, which range between 0 and 1, represent actual probabilities. However, let’s not fool ourselves, as this is often not the case. Even for a basic model, such as logistic regression, it is unlikely that its outputs will represent actual purchase probabilities, especially when there is a high imbalance between outcomes. The best way to test this is to plot predicted values against actual observations. One way to do this is by creating a bucket for each range of predictions (0–1%, 1–2%, 2–3%, etc.) and calculating the average of the actual outcomes in each bucket. Below is an example of what this looks like for a variety of models:

Calibration Curves Image from Scikit-Learn

Notice that these all have a sort of “S” shape. Depending on the model, predicted probabilities are optimistic in some ranges and pessimistic in others. The model outputs are not probabilities, at least not accurate ones. How can we fix that?

Calibration

One solution is to apply an additional transformation, known as a sigmoid regressor, that maps these values onto better-calibrated probabilities.
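The original post rendered this as an image; as a reconstruction, the sigmoid calibrator (often called Platt scaling) has the form below, where f(x) is the raw model score and A and B are fit on held-out predictions:

$$p_{\text{calibrated}}(x) \;=\; \frac{1}{1 + \exp\!\big(A \cdot f(x) + B\big)}$$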

This logistic model works best if the calibration error is symmetrical; that is, the magnitude of the mismatch for the un-calibrated model is similar for both high and low outputs. Technically, the binary classes should show normally distributed characteristics with the same variance; this is often a reasonable assumption if the data is well-balanced across the two categories. However, outcomes are often heavily imbalanced. In our example, the number of purchase decisions is minimal compared to non-purchase ones. In this scenario, the calibration plot will look far from symmetric, something closer to what is seen below:

In this situation, it is better to use an isotonic regression to fix the probabilities. The general form may seem familiar:
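As a reconstruction of that general form (my notation): isotonic regression solves a weighted least-squares problem over examples sorted by their un-calibrated scores, with the fitted values constrained to be non-decreasing:

$$\hat{m} \;=\; \arg\min_{m_1 \le m_2 \le \dots \le m_n} \; \sum_{i=1}^{n} w_i \,\big(y_i - m_i\big)^2$$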

Even though this looks like linear regression on the surface, the constraints are looser, allowing isotonic regression to be more free-form and capture more non-linear behavior. The only condition is that the function it outputs must be non-decreasing. If the data set is large enough, isotonic regression will perform as well as or better than the sigmoid approach, simply because it assumes less about the distribution and has more power to correct an un-calibrated model’s distortions. If the regressor performs well, the output will look like this:

Note that there are two lines here; they simply overlap. By carrying our original model predictions through this additional calibration step, we better map our output “probabilities” to realized probabilities.
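To make this concrete, here is a minimal sketch using scikit-learn’s CalibratedClassifierCV and calibration_curve on synthetic stand-in data; in practice, X would hold the features discussed earlier and y the observed purchase outcomes:

```python
import matplotlib.pyplot as plt
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced stand-in for purchase data (~3% positive outcomes).
X, y = make_classification(n_samples=20_000, n_features=20, weights=[0.97], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# method="isotonic" fits a non-decreasing step function on out-of-fold predictions;
# method="sigmoid" would fit the Platt-style scaler shown earlier instead.
model = CalibratedClassifierCV(GradientBoostingClassifier(), method="isotonic", cv=5)
model.fit(X_train, y_train)
purchase_prob = model.predict_proba(X_test)[:, 1]

# Reliability check: bucket the predictions and compare against observed purchase rates.
prob_true, prob_pred = calibration_curve(y_test, purchase_prob, n_bins=10)
plt.plot(prob_pred, prob_true, marker="o", label="calibrated model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfect calibration")
plt.xlabel("Mean predicted purchase probability")
plt.ylabel("Observed purchase rate")
plt.legend()
plt.show()
```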

Valuing Personalization with Model Predictions

Now that we have something more akin to probabilities, we can shift our focus back towards the value of personalization, which, as we will see, is tightly connected to the accuracy of our model. Recall from the last article that our ability to assess the value of personalization in a rotating storefront is based mainly on identifying a spread in player preferences. What does this mean in terms of the model? If, given all of the features and an appropriately trained model, we cannot predict which users are more or less likely to purchase an item, then we shouldn’t expect any lift from personalization.

Fortunately, this will rarely be the case; we’ll have at least some ability to predict affinities. Once we can do this, we can substitute these predictions for the hand-wavy simulated purchase probabilities from scenario four in part one of this series. For each player, we can now stack-rank the content. Each player will have a likelihood of purchasing their most preferred item and a likelihood that they would have bought the population’s most preferred item. We can calculate the lift as follows:
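As a reconstruction of that calculation in my own notation: let p_{u,i} be player u’s calibrated purchase probability for item i. Then

$$i^{*}_{u} = \arg\max_i \, p_{u,i}, \qquad i^{*}_{\text{pop}} = \arg\max_i \, \frac{1}{N}\sum_{u} p_{u,i}, \qquad \text{lift}_u = \frac{p_{u,\,i^{*}_{u}}}{p_{u,\,i^{*}_{\text{pop}}}}$$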

By aggregating this across all players, we get an indicator of how much personalization would benefit our rotating storefront. We didn’t even have to launch a recommendation service to gauge that! Note that we would want to simulate this process over a more extended period, as the lift will shrink over time once players start purchasing their favorite content.

Price

While comparing player vs. population probabilities is essential for our valuation goals, we have left out an additional layer until now: price. Price will be an important, if not the most important, feature of an item. Not only is it a critical factor in a user’s purchase decision, but it also directly impacts the revenue generated. Instead of purchase probabilities, we should extend our focus to expected revenue:
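In notation (again my reconstruction), with p_{u,i}(price_i) the purchase probability at the listed price:

$$\mathbb{E}[R_{u,i}] \;=\; p_{u,i}(\text{price}_i) \times \text{price}_i$$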

Note that not only is the purchase probability dependent on price, but price also heavily influences the expected value. Therefore, an item with a lower purchase probability might be surfaced over another if the price differential between the two is large enough. Our formulation of lift would instead need to consider the ratio of expected values to determine the full potential of personalization.
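One way to write that extension (my notation), where the argmax items are now chosen by expected revenue rather than by purchase probability:

$$\text{lift}_u \;=\; \frac{\mathbb{E}[R_{u,\,i^{*}_{u}}]}{\mathbb{E}[R_{u,\,i^{*}_{\text{pop}}}]}, \qquad i^{*}_{u} = \arg\max_i \, \mathbb{E}[R_{u,i}]$$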

Final Thoughts

After three full articles, we have finally solidified a framework for evaluating the value of personalization in rotating stores. That said, there are many other compelling aspects of rotating stores worth exploring in future articles. A few that are top of mind:

  • Content Dependence — notice how, in this article, we assumed a straightforward setup with only one item. In the real world, we want to consider showing multiple items to a player at once, and the mere existence of another option is bound to change the player’s decision process. How do we adapt our models to account for this?
  • A Rotating Store Recommendation System — now that we have identified the value, how should we build out that value? Recommender systems are commonplace nowadays, but additional details beyond the norm exist in building one for a rotating shop.
  • Discount Optimization — at some point, we may want to offer discounts on older content; how do we determine which deal is the best to ensure we make a profit and don’t cannibalize sales instead?
  • Store Optimization — if we combine everything into a holistic picture of the store, how do we optimize all the pieces at once? What additional information must our recommender know? About discounts? About dependence? How do we optimize the entire basket of goods we show to a player?

If any of those topics sound interesting, I’ll see you in the next rotation.


Riley Howsden

“Half-Stack” machine learning propagandist in the gaming industry — a critic of all data presented deceitfully, unless it contains a meme, then it must be true.