Recommendation Systems for Rotating Stores in Video Games (Part Two)

Assessing the value that personalization brings to your players.

Riley Howsden
11 min read · Feb 18, 2021

I recommend readers look over the first part of this blog; that post is found here. Part three of this series is HERE.

In the last blog post, the goal was to explain a rotating store’s core concepts while introducing some light personalization details. We generated a quick estimate of personalization’s monetary impact on a rotating store. Unfortunately, that estimate painted a naively optimistic picture of personalization; this post digs into the areas where the previous one fell short. Let’s get started!

A quick note — since this post is focused on rotating stores in video games, I will often use the terms users and players interchangeably. That said, the rotating store use cases are broader than video games alone.

Throughput

Recall from the previous post that we assumed a player would receive a single item in their personalized rotating store, and that the item would change once a week. In reality, while it depends on the catalog’s depth, the number of items shown to a user is likely greater than one. On top of that, the cadence could vary in length. Since these two modifications work hand-in-hand, it is useful to formalize the concept of throughput as:

Throughput = # of Items * # of Updates

We could define throughput over any timeframe, but for now, we’ll focus on yearly throughput. In our toy example from before, the annual throughput would have been 1 * 52, or 52 items. This throughput can have a wide range; for example, a rotating store could show three items every day, over a thousand every year. Different systems can also share the same throughput; if we offered a player 13 items once a quarter, they would likewise see 52 items per year. Note that these items do not have to be, and often will not be, unique throughout the timeframe. Although it is natural to impose restrictions that prevent the same product from being offered twice in the same rotation, those restrictions will likely not carry over to an extended timeframe. In both of our examples above, the 52 items will probably not be 52 unique items unless we impose additional constraints.
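As a quick sanity check, the throughput arithmetic above can be written down in a few lines of Python (the function name is mine, not from the original script):

```python
def yearly_throughput(items_per_rotation: int, rotations_per_year: int) -> int:
    """Throughput = # of items per rotation * # of rotations in the timeframe."""
    return items_per_rotation * rotations_per_year

# One item rotating weekly and 13 items rotating quarterly
# both expose 52 item slots per year.
assert yearly_throughput(1, 52) == 52
assert yearly_throughput(13, 4) == 52

# A three-item daily store exposes over a thousand slots per year.
assert yearly_throughput(3, 365) == 1095
```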

Store Structure

In all the cases I have seen within gaming, the store’s underlying structure is static; a designer decides the total throughput and sets the layout. In theory, the store could be modular, supporting any number of tiles in any orientation; examples below:

Of course, we could also personalize this structure; perhaps one player is more likely to buy content laid out vertically rather than horizontally, while another is not. We could also modify the consistency across rotations: is it better for the store to always have the same structure, so players are familiar with it, or would a random design, one that changes with each rotation, be more appealing? While this may seem like a prime scenario for A/B testing, let’s not get ahead of ourselves; the gains from optimizing the content itself far outweigh those from optimizing the store’s structure. Placement and presentation are meaningful, but the primary motivator in a purchase is item affinity. We should avoid optimizing the store structure until we are serving content adequately.

Catalog Scale

It may come as no surprise that the value of recommendations is, in general, proportional to the catalog’s size. If we were to randomly select items each week for a player from a small inventory, the chance that their best content shows up within a reasonable time is relatively high. For example, if only four items were available and we were to select two of them, the lift that personalization brings is negligible. We should not extrapolate the results of these scenarios onto personalization efforts as a whole, as they are not equivalent.

However, with a rather extensive catalog, or a moderately sized catalog with low throughput, a very long time could pass before the player sees any relevant items if we choose at random. While they are not technically rotating stores, large e-commerce websites such as Amazon make this issue easy to see. With millions of products, personalization becomes both a difficult task and an important one; it is far less likely that you will stumble upon relevant products at random. Combining an accurate search engine with personalization is a must to combat the massive size of these catalogs.

Catalog Growth

If the number of available items within the store is not static, another difficulty emerges, especially if new item growth outpaces the throughput itself. One strategy the gaming industry uses to ensure that players do not miss fresh content is guaranteeing availability for a set time upon release. Usually, stores achieve this by adding a pre-scheduled “featured” area that is identical from one player to the next. However, if content drops come too frequently, a sacrifice must be made: either lower the amount of time an item is featured at release, or personalize the featured sections themselves. Under the former, a player may not have a chance to log in before other featured products take over. Under the latter, each player is guaranteed to miss a portion of the items, but the trade-off may be worth it to show more relevant products for a longer time. In most cases, the catalog’s growth will not outpace the rotation throughput, but it is still an important aspect to capture.

Long-Term Scenarios

Before we start building out the long-term scenarios, here is a recap of the main levers that should affect the outcomes of a rotating store:

  • Rotation Size — how many items will be available in a rotation?
  • Rotation Cadence — how often will the store rotate?
  • Catalog Size — how big is the catalog initially?
  • Catalog Growth — how does the catalog size increase with time?
  • Distribution — what is the underlying purchase distribution?

In the previous post, we went over some scenarios for the purchase distribution. At the population level, items were either wanted equally, represented by a uniform distribution, or wanted unequally, represented by a power distribution. In this post, we will use the latter for all of the upcoming simulations. I have created the script below to approximate how each of the above levers affects personalization.

To explain some of the core pieces:

  • The preset values for our scenario attempt to mimic a reasonable real-world system: a rotation cadence of once a week (52 rotations), six items in each rotation, a catalog size of a thousand, catalog growth of two items per week, and a rough list of purchase probabilities that follow a power distribution, averaging a 3% purchase rate. The only questionable preset is the spread, which simulates how varied users are in their preferences.
  • The “add_items” function does what one would expect: it adds items to the catalog. We use it to initialize the catalog and to grow it later on. The initialization is slightly naive, duplicating the given distribution until it reaches the correct size. If the catalog size is not a direct multiple of the distribution size, a few items are randomly sampled from the distribution to make up the difference.
  • In “add_items,” a list comprehension creates tuples with the structure (item_number, popular_pref, personal_pref). To simulate personal preference, we multiply the popular_pref number by the spread to get a range of numbers around the default purchase rate. This operation is somewhat arbitrary, but it is an easy way to ensure that the average purchase rate across items and users remains consistent; we’ll expand on it later.
  • The “rank_items” function sorts the items in the catalog; the following methods are supported:

Random — completely disregard any underlying preferences and shuffle the catalog.

Popular — sort the items based on their overall purchase probabilities.

Personal — sort the items based on individual purchase probabilities.

  • The “rotate_simulation” function runs a year-long simulation of the storefront for an individual. Each rotation determines if the person purchased the item, removes it from the catalog if needed, and adds new items. The critical piece here is simulating the purchase decision, and we reference the individual’s preference to determine this.
  • Rewriting this as a RotatingStore class would be a good idea. :)
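The embedded script does not survive here, so below is a minimal Python sketch of the pieces the bullets describe. The function names (add_items, rank_items, rotate_simulation) and the tuple structure come from the text; the concrete preset values and the power-style probability list are assumptions, not the original numbers.

```python
import random

# Presets mimicking the scenario described above (assumed values).
ROTATIONS = 52        # weekly cadence over one year
ROTATION_SIZE = 6     # items shown per rotation
CATALOG_SIZE = 1000   # initial catalog size
CATALOG_GROWTH = 2    # new items added per rotation
SPREAD = 0.5          # how varied individual preferences are
# Rough power-style purchase probabilities averaging a 3% rate.
BASE_DISTRIBUTION = [0.12, 0.06, 0.03, 0.015, 0.008, 0.004, 0.002, 0.001]


def add_items(catalog, n, next_id, spread=SPREAD):
    """Append n (item_number, popular_pref, personal_pref) tuples."""
    # Duplicate the base distribution until it reaches size n;
    # randomly sample the remainder if n is not a direct multiple.
    full, partial = divmod(n, len(BASE_DISTRIBUTION))
    base = BASE_DISTRIBUTION * full + random.sample(BASE_DISTRIBUTION, partial)
    catalog += [
        (next_id + i, p, p * (1 + spread * random.uniform(-1, 1)))
        for i, p in enumerate(base)
    ]
    return next_id + n


def rank_items(catalog, method):
    """Order the catalog by the chosen strategy."""
    if method == "random":
        shuffled = catalog[:]
        random.shuffle(shuffled)       # disregard preferences entirely
        return shuffled
    if method == "popular":
        return sorted(catalog, key=lambda item: item[1], reverse=True)
    if method == "personal":
        return sorted(catalog, key=lambda item: item[2], reverse=True)
    raise ValueError(f"unknown method: {method}")


def rotate_simulation(method):
    """Run a year of rotations for one player; return items purchased."""
    catalog = []
    next_id = add_items(catalog, CATALOG_SIZE, 0)
    purchases = 0
    for _ in range(ROTATIONS):
        for item in rank_items(catalog, method)[:ROTATION_SIZE]:
            if random.random() < item[2]:  # purchase driven by personal pref
                purchases += 1
                catalog.remove(item)       # owned items leave the catalog
        next_id = add_items(catalog, CATALOG_GROWTH, next_id)
    return purchases
```

Averaging rotate_simulation over many calls, each of which draws a fresh set of personal preferences, approximates simulating many individual players.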

In running the defaults for 10,000 individuals, the results are as follows:

The number of items sold under the random strategy is ~9, popular ~37, and personal ~67. The biggest thing to point out here is how poorly the random strategy fares; given this underlying distribution, even a generic popularity heuristic increases the number of sold items roughly 4x. Due to this lack of performance, we’ll ignore the random method from here on out; a popularity ranking should be a relatively straightforward modification to add anyway. We now shift our focus to popular vs. personal, a lift of 82% in the default case. Let’s look at how changes to the default scenario affect the ratio between these two numbers.

Catalog Variables

For more extensive catalogs, personalization is more valuable. In the same light, a quickly growing catalog also favors the personalization scenario.

Rotation Variables

As the rotation cadence increases, the value of personalization decreases.

As the rotation size increases, the value of personalization decreases.

The general idea here is that the more often the store rotates, the more it starts to mimic a direct purchase store’s dynamics. While the user cannot purchase each item every day, it won’t be long until an item shows up again, and therefore high-utility items are less likely to get buried. If a store team cannot deliver a personalized experience, increasing the rotation frequency may be a secondary solution.

Distributions

While the power distribution is a decent representation of item popularity within any catalog, we could simulate other distributions. Here are a few examples:

For the distributions above, the lift under the default scenario is 140% for uniform, 130% for linear, and 72% for plateau. The power distribution (82%) now feels underwhelming, and we can conclude that this choice was not inflating personalization’s numbers in our simulations.

To put this in more general technical terms, the higher the distribution’s entropy, the bigger the opportunity for personalization to shine. For example, the uniform distribution, which maximizes entropy, benefits the most from personalization. On the other hand, the purchase spread is limited in the plateau scenario, and the overall lift is much smaller.
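To make the entropy claim concrete, here is a small illustration (the item weights are made up for demonstration): the Shannon entropy of a normalized popularity distribution is maximized when every item is equally wanted.

```python
import math

def shannon_entropy(weights):
    """Entropy (in bits) of a popularity distribution, normalized to sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1, 1, 1, 1]   # every item equally wanted
power = [8, 4, 2, 1]     # a few items dominate

# The uniform case hits the maximum, log2(4) = 2 bits; the skewed
# case has less entropy and thus less room for personalization.
assert shannon_entropy(uniform) == 2.0
assert shannon_entropy(power) < shannon_entropy(uniform)
```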

Purchase Rate

In our default scenario, we set all distributions to have an average purchase rate of 3% across the catalog. Of course, lowering this number would mean an absolute drop in sales regardless of the strategy, but does it also degrade the importance of personalization?

Somewhat surprisingly, the trend indicates that the lower the average purchase rate from a distribution, the higher the lift is for personalization. Unfortunately, we can’t just switch on a better purchase rate in our rotating store design. Still, it should be reassuring that differences in purchase rates from our chosen 3% are negligible in the larger scheme.

Spread

All of the selections for our simulation seem relatively straightforward until we reach the suspicious spread variable. As mentioned earlier, spread estimates how different preferences are from one user to the next. If the spread is zero, there is nothing special about any individual; their preferences mimic those of the crowd. If the spread is 1, a user’s preference ranges between 0x and 2x that of the population. If the spread is 0.5, it ranges between 0.5x and 1.5x. Here is how this spread variable affects personalization:
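In code, that mapping is a one-liner; this sketch assumes the uniform multiplier described above (the function name is mine):

```python
import random

def personal_pref(popular_pref: float, spread: float) -> float:
    """Scale a population-level purchase probability into an individual one.

    spread = 0.0 -> everyone matches the crowd exactly;
    spread = 0.5 -> individuals range from 0.5x to 1.5x the population rate;
    spread = 1.0 -> individuals range from 0x to 2x.
    """
    return popular_pref * (1 + spread * random.uniform(-1, 1))

assert personal_pref(0.03, 0.0) == 0.03
assert 0.014 <= personal_pref(0.03, 0.5) <= 0.046
```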

This variable has a massive impact on personalization; it essentially is the value of personalization. From catalog size to rotation cadence, all of the other variables, while essential in assessing a rotating store’s overall design, are dwarfed by the underlying preference spread across users.

The question now becomes: is the spread we’re selecting even realistic? It seems quite optimistic, but it could also be pessimistic. For example, imagine there were 32 items in our store, all with a uniform purchase probability at the population level, except each user wants precisely one of these items and none of the others. The population distribution has maximum entropy, while each individual’s distribution has minimal entropy. In this scenario, the spread is far greater than anything we have simulated, and implementing some form of personalization is crucial. But is this even a real scenario? You may have noticed that this case mirrors a sports league, such as the NFL, where content is attached to one of 32 teams or to individual players; in these settings, polarized preferences across fanbases are commonplace. Content in video games is often tethered to specific characters, classes, or weapons, and the same effect occurs.

As for being overly optimistic, the real spread is likely to be normally distributed around the population preference, whereas we lazily distribute the spread uniformly. The devil is in the details when it comes to selecting this variable. However, this issue is broader than a single spread variable, as explained in the next section.

Prediction Accuracy

While we could continue to focus on the single spread variable, in reality we need a more general approach to understanding player variance. A single uniform spread is unlikely to be correct; instead, we need to accurately predict any given player’s purchase patterns, which is paramount for real-world systems. Again, if the actual preference distribution varies widely from one player to the next, personalization is powerful. Still, it won’t amount to much if we cannot make accurate predictions. If we have millions of users, getting a popularity metric right is not hard: just average purchase rates across all users for each item and sort items universally, removing items already owned by the individual. Getting probabilities for a particular player is more challenging, especially if we have limited data on them. We were once again naive in these simulations, assuming that we predicted each individual’s preferences flawlessly. In reality, even with the best models, we will only capture perhaps 50% to 80% of that additional value.
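The popularity baseline described above is simple enough to write down; here is a sketch (the item names, counts, and function name are hypothetical):

```python
def popularity_ranking(purchase_counts, total_users, owned):
    """Rank items by global purchase rate, dropping items the user owns."""
    rates = {item: count / total_users for item, count in purchase_counts.items()}
    return sorted((item for item in rates if item not in owned),
                  key=rates.__getitem__, reverse=True)

# Hypothetical catalog of three skins across 1,000 users.
counts = {"skin_a": 300, "skin_b": 150, "skin_c": 600}
assert popularity_ranking(counts, 1000, owned={"skin_c"}) == ["skin_a", "skin_b"]
```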

Summary

After all of that, the exact value personalization will bring to a rotating store still feels unclear. The truth is, it depends. On the one hand, there are signs that it can be quite impactful in most rotating store scenarios. On the other, we still need to come to terms with our ability, or lack thereof, to deeply understand our users and predict patterns amongst them. On the bright side, we now have a better framework for thinking about rotating stores, but our journey isn’t over yet. In part three, we’ll brainstorm ways to build accurate predictions and continue addressing the small nuances that we have ignored so far.

Here is a taste of a few of those:

  • Store “Freshness” — if personalization continues to show the same things for each user over multiple rotations, is that “staleness” a bad thing? If so, how might we address that?
  • Item Pricing — if our catalog consists of products with varying costs, how does that impact our selections?
  • Model Scope — if we can create accurate models from historical data and apply those models for future recommendations, will we yield the same results? Why or why not?

See you in the next rotation.


Riley Howsden

“Half-Stack” machine learning propagandist in the gaming industry — a critic of all data presented deceitfully, unless it contains a meme, then it must be true.