Recommendation Systems for Meal-Kit Services

A proposal to limit “Food Decision Fatigue” along with other applications for personalized understanding in the meal-kit industry

Riley Howsden
Jul 1, 2021
Photo by Icons8 Team on Unsplash

Where should we eat? Whether it is a small group of friends or just you and your spouse, everyone has been there, trying to decide where to go for the next meal. If a choice as seemingly simple as this can feel like a monumental task, choosing multiple recipes from an extensive list is far more daunting. Yet that is what most meal services expect of their customers. If, instead, we designed a system that reduced this friction by recommending and assigning default meals, we could create a smoother user experience and prevent unneeded attrition.

I’ve become a huge fan of meal delivery services over the past year, and I’ve pondered the unique problems these services face and how machine learning or optimization might solve them. This post details recommendation systems in this space and how one might rank recipes for users. In a previous article, I wrote about using these rankings to supplement the optimization process for allocating meals across users.

What is meal delivery?

The meal delivery service is a relatively new business model, originating in Sweden in 2007. In 2012, three meal kit companies entered the United States: Blue Apron, HelloFresh, and Plated. The market has since grown to support around two dozen providers, with each falling under one of two main umbrellas:

  • Meal kits: pre-portioned ingredients and recipes for preparing home-cooked meals; sometimes partially prepared
  • Meal delivery: pre-cooked meals

Companies differentiate themselves to appeal to different consumers, whether through a specific diet, such as keto or paleo, or another restriction, such as vegan or vegetarian. Regardless of how much preparation is included or what the meals contain, most of these services share one trait: they run on a subscription plan. Customers choose their meals at a fixed cadence, typically weekly, and can skip weeks or cancel altogether.

Meal Recommendations

Much of the appeal of meal-delivery services comes from convenience; the consumer’s work is reduced to a small subset of what it would otherwise take to bring a meal to the table. That said, a minor inconvenience occurs each week when the service sends a reminder to choose upcoming meals. Some customers may enjoy this step, but others may find it a chore. Choices should remain for those who want to make them, but a system should intelligently choose the default selections and communicate the high-level reasons for doing so. For example, how would an email stating, “We’re excited to unveil your top meals for the upcoming week! Get ready for your next box!” compare with a generic reminder to select meals?

Admittedly, suggesting a recommendation system should come as no surprise for almost any company, regardless of industry. However, one aspect of the meal-delivery service makes recommendations especially compelling: while customers rate meals, one could argue that they are implicitly scoring ingredients instead.

Photo by Cristiano Pinto on Unsplash

Therefore, a recipe’s contents, combined with the user’s past scores, become a strong indicator of future recipe preferences, even for a recipe that is brand new to all users. As long as some of its ingredients appeared in previous recipes, meal-kit recommendations can partially bypass the familiar cold-start problem that plagues recommender systems in other settings.
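To make the “implicitly scoring ingredients” idea concrete, here is a minimal sketch under simple assumptions: each ingredient inherits the average rating of the rated recipes that contained it, and an unseen recipe is scored by averaging the affinities of the ingredients the user has already encountered, falling back to the user’s overall average when nothing overlaps. The recipes, ratings, and function names are hypothetical, and plain averaging is only one of many ways to attribute recipe ratings to ingredients.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: recipe name -> set of ingredients
RECIPES = {
    "chicken_tikka": {"chicken", "yogurt", "garam masala", "onion"},
    "beef_tacos": {"beef", "tortilla", "onion", "lime"},
    "lentil_dal": {"lentils", "garam masala", "onion", "garlic"},
}

# Hypothetical 1-5 star ratings one user gave to recipes they received
USER_RATINGS = {"chicken_tikka": 5, "beef_tacos": 2, "lentil_dal": 4}


def ingredient_affinities(recipes, ratings):
    """Spread each recipe rating onto its ingredients, then average per ingredient."""
    collected = defaultdict(list)
    for recipe, rating in ratings.items():
        for ingredient in recipes[recipe]:
            collected[ingredient].append(rating)
    return {ingredient: mean(values) for ingredient, values in collected.items()}


def score_unseen_recipe(ingredients, affinities, fallback):
    """Score a brand-new recipe from the ingredients the user has already rated."""
    known = [affinities[i] for i in ingredients if i in affinities]
    return mean(known) if known else fallback


affinities = ingredient_affinities(RECIPES, USER_RATINGS)
fallback = mean(USER_RATINGS.values())  # the user's average when nothing overlaps
new_recipe = {"chickpeas", "garam masala", "onion", "spinach"}  # never served before
print(round(score_unseen_recipe(new_recipe, affinities, fallback), 2))
```

The new recipe gets a score even though no one has rated it, because two of its four ingredients carry over from the user’s history.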

Recipe Representation

The first step is to represent these recipes in a tractable form that a machine learning system can leverage. A basic way to describe ingredients numerically is one-hot encoding: each ingredient becomes a vector with as many dimensions as there are unique ingredients across all recipes, with a 1 in its own dimension and 0s everywhere else. A recipe can then be represented as a multi-hot vector with a 1 for every ingredient it includes.
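A minimal sketch of this encoding, assuming a tiny hypothetical set of recipes: the vocabulary is sorted so every run assigns the same dimension to the same ingredient, and each recipe becomes a multi-hot vector over that vocabulary.

```python
# Hypothetical toy recipes: name -> list of ingredients
RECIPES = {
    "garlic_soup": ["garlic", "onion", "stock"],
    "cracker_snack": ["saltine crackers", "cheese"],
}


def build_vocabulary(recipes):
    """Assign each unique ingredient a fixed dimension (sorted for reproducibility)."""
    ingredients = sorted({ing for items in recipes.values() for ing in items})
    return {ing: idx for idx, ing in enumerate(ingredients)}


def encode(ingredients, vocab):
    """Multi-hot vector: 1 where the recipe uses the ingredient, 0 elsewhere."""
    vector = [0] * len(vocab)
    for ing in ingredients:
        vector[vocab[ing]] = 1
    return vector


vocab = build_vocabulary(RECIPES)
for name, items in RECIPES.items():
    print(name, encode(items, vocab))
```

With a realistic catalog of thousands of ingredients, almost every entry in these vectors is 0, which is the sparsity problem discussed next.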

There are a few problems with this. First, these vectors are sparse and often immense, which increases computational cost and becomes less feasible at scale. Second, all ingredient vectors are orthogonal and equidistant from one another, so they capture no relational information between any two ingredients. Garlic is just as similar to onions as it is to saltine crackers; we need a better representation. Fortunately, embeddings apply here: in the same way that words have context within sentences, ingredients have context within recipes. The goal is to map each ingredient to a smaller n-dimensional space that allows for more meaningful comparisons.
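A quick check of the orthogonality claim: with one-hot vectors, every pair of distinct ingredients has a dot product of 0 and a Euclidean distance of √2, no matter which ingredients they are. The three-ingredient vocabulary is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations

INGREDIENTS = ["garlic", "onion", "saltine crackers"]  # hypothetical vocabulary


def one_hot(index, size):
    """Vector of zeros with a single 1 at the ingredient's own dimension."""
    return [1 if i == index else 0 for i in range(size)]


def pairwise_stats(ingredients):
    """Dot product and Euclidean distance for every pair of one-hot vectors."""
    vectors = {ing: one_hot(i, len(ingredients)) for i, ing in enumerate(ingredients)}
    stats = {}
    for a, b in combinations(ingredients, 2):
        dot = sum(x * y for x, y in zip(vectors[a], vectors[b]))
        stats[(a, b)] = (dot, math.dist(vectors[a], vectors[b]))
    return stats


for (a, b), (dot, dist) in pairwise_stats(INGREDIENTS).items():
    print(f"{a} vs {b}: dot={dot}, distance={dist:.3f}")
```

Every line prints the same numbers, which is exactly why one-hot vectors cannot tell us that garlic and onion belong together.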

For word embeddings, the primary learning approach leverages a word’s surroundings using a defined window of neighboring words. In our scenario, a recipe’s ingredients could be encoded in a sentence-like fashion. It may also prove beneficial to capture other critical information beyond a recipe’s components, such as cooking actions: “chop, drain, marinate, mix.” Putting it all together, “Season chicken breasts with salt and pepper. Rub with olive oil. Place on a baking sheet” would transform into [“season,” “chicken breasts,” “salt,” “pepper,” “rub,” “olive oil,” “bake”]. However, this is probably a tad optimistic, as parsing a recipe’s steps accurately would require additional pre-processing. A simpler version could take all of the ingredients in a recipe and randomly permute them, producing multiple versions of the same recipe from which a model can learn the overall context. In the end, the output is a compressed ingredient2vec embedding for each ingredient.
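The permutation idea can be sketched end to end in plain Python: shuffle each recipe’s ingredient list several times to create “sentences,” generate skip-gram (center, context) pairs within a window, and train a tiny skip-gram model with negative sampling. Everything here (the toy recipes, dimensions, learning rate, and function names) is a hypothetical illustration rather than the author’s implementation; in practice a library such as gensim’s `Word2Vec` would be the sensible choice, and a corpus this small will not produce meaningful similarities.

```python
import math
import random

# Hypothetical toy recipes; each is an unordered list of ingredients
RECIPES = [
    ["garlic", "onion", "olive oil", "tomato"],
    ["garlic", "onion", "ginger", "soy sauce"],
    ["saltine crackers", "cheese", "butter"],
    ["butter", "cheese", "flour", "milk"],
]


def permuted_corpus(recipes, copies, rng):
    """Several shuffled copies of each recipe, so no ingredient order is privileged."""
    corpus = []
    for recipe in recipes:
        for _ in range(copies):
            sentence = list(recipe)
            rng.shuffle(sentence)
            corpus.append(sentence)
    return corpus


def context_pairs(sentence, window):
    """(center, context) pairs within a fixed window, as in skip-gram word2vec."""
    for i, center in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                yield center, sentence[j]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_ingredient2vec(recipes, dim=8, window=2, copies=20, negatives=3,
                         epochs=5, lr=0.05, seed=0):
    """Tiny skip-gram with negative sampling; returns ingredient -> embedding."""
    rng = random.Random(seed)
    vocab = sorted({ing for recipe in recipes for ing in recipe})
    emb = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    out = {w: [0.0] * dim for w in vocab}  # separate "context" vectors
    corpus = permuted_corpus(recipes, copies, rng)
    for _ in range(epochs):
        for sentence in corpus:
            for center, context in context_pairs(sentence, window):
                # One observed pair (label 1) plus random non-context words (label 0)
                negs = [rng.choice(vocab) for _ in range(negatives)]
                samples = [(context, 1.0)] + [(w, 0.0) for w in negs if w != context]
                grad = [0.0] * dim
                for word, label in samples:
                    score = sigmoid(sum(a * b for a, b in zip(emb[center], out[word])))
                    g = lr * (label - score)
                    for k in range(dim):
                        grad[k] += g * out[word][k]
                        out[word][k] += g * emb[center][k]
                for k in range(dim):
                    emb[center][k] += grad[k]
    return emb


embeddings = train_ingredient2vec(RECIPES)
print(len(embeddings), "ingredients embedded in", len(embeddings["garlic"]), "dimensions")
```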

With this, we can directly compare how similar one ingredient is to another using the distance between these vectors. From here, we could represent a given recipe by naively averaging the embeddings of the recipe’s components to form a final embedding. Unfortunately, this inherently ignores some of the interaction between ingredients, and it may be preferable to create a recipe2vec embedding to capture this information.
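The naive averaging step is short enough to show directly. This sketch assumes hypothetical three-dimensional ingredient vectors (in practice these would come from a trained ingredient2vec model), averages them element-wise into a recipe vector, and compares recipes with cosine similarity.

```python
import math

# Hypothetical ingredient embeddings; a trained model would supply these
INGREDIENT_VECTORS = {
    "garlic": [0.9, 0.1, 0.0],
    "onion": [0.8, 0.2, 0.1],
    "tomato": [0.7, 0.0, 0.3],
    "cheese": [0.0, 0.9, 0.2],
    "flour": [0.1, 0.8, 0.1],
}


def recipe_embedding(ingredients, vectors):
    """Naive recipe vector: element-wise mean of its ingredient vectors."""
    known = [vectors[i] for i in ingredients if i in vectors]
    if not known:
        raise ValueError("recipe has no embedded ingredients")
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


salsa = recipe_embedding(["garlic", "onion", "tomato"], INGREDIENT_VECTORS)
pasta = recipe_embedding(["garlic", "tomato", "cheese"], INGREDIENT_VECTORS)
bread = recipe_embedding(["flour", "cheese"], INGREDIENT_VECTORS)
print(round(cosine(salsa, pasta), 3), round(cosine(salsa, bread), 3))
```

Because the mean is order-independent and ignores which ingredients appear together, two recipes with the same ingredient set but very different preparations collapse to the same vector, which is the interaction loss that motivates a dedicated recipe2vec model.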

User Preferences

Next, we need to extend this to the customer with a method for determining their preferences. If users select their own meals, we can treat each selection as implicit feedback that the user prefers that recipe over the unselected ones. However, direct feedback from the user after eating a meal would give a more accurate assessment. Ideally, we would combine both into an overall rating system that accommodates users who never leave explicit scores as well as those who do. Within the embedded space, we can then look for nearby recipes the user has already scored highly. As mentioned earlier, additional features exist beyond ingredients, such as preparation tasks, total time to complete, and tools needed. These may also drive satisfaction with a dish; for example, even if the taste is excellent, a long preparation time might deter a high rating.
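A minimal sketch of combining both feedback types and then looking for nearby recipes: explicit ratings are used where they exist, unrated selections are assumed to be worth a fixed score, and a new recipe is scored by a similarity-weighted average over its k nearest scored neighbors. The recipe vectors, the `IMPLICIT_SCORE` value of 4.0, and `k=2` are all hypothetical choices.

```python
import math

# Hypothetical 2-d recipe embeddings (e.g., averaged ingredient vectors)
RECIPE_VECTORS = {
    "tikka_masala": [0.9, 0.1],
    "butter_chicken": [0.85, 0.2],
    "lasagna": [0.1, 0.9],
    "chana_masala": [0.8, 0.15],  # new this week, not yet scored
}
EXPLICIT = {"tikka_masala": 5, "lasagna": 2}  # star ratings the user left
SELECTED_UNRATED = {"butter_chicken"}  # chosen but never rated (implicit signal)
IMPLICIT_SCORE = 4.0  # hypothetical: an unrated selection counts as 4 stars


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def combined_feedback(explicit, selected, implicit_score):
    """Explicit ratings take precedence; unrated selections get the assumed score."""
    feedback = {recipe: implicit_score for recipe in selected}
    feedback.update(explicit)
    return feedback


def predict(recipe, feedback, vectors, k=2):
    """Similarity-weighted average of the user's scores on the k nearest recipes.

    Assumes non-negative similarities, which holds for these toy vectors.
    """
    neighbors = sorted(
        ((cosine(vectors[recipe], vectors[r]), score)
         for r, score in feedback.items() if r != recipe),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    return sum(sim * score for sim, score in neighbors) / total


feedback = combined_feedback(EXPLICIT, SELECTED_UNRATED, IMPLICIT_SCORE)
print(round(predict("chana_masala", feedback, RECIPE_VECTORS), 2))
```

The disliked lasagna sits far away in the embedding space, so it falls outside the two nearest neighbors and does not drag down the prediction for the new curry.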

Photo by Jimmy Dean on Unsplash

To complicate things further, a single meal kit is often consumed by multiple people, and we should expect each person to rate the recipe differently. If scores from multiple household members were available, we should modify our targets to account for this, since we want to provide highly ranked meals for everyone at the table. A simple solution would be to average the scores across members, but taking the minimum score would ensure that we don’t suggest a recipe that one particular person detests. That said, I have yet to see a service that allows multiple people to rate recipes, so these improvements are wishful thinking.
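The two aggregation options can disagree, which is the whole point of choosing carefully. In this sketch with a hypothetical three-person household, averaging favors a polarizing dish, while taking the minimum (often called “least misery” in group recommendation) favors the dish nobody dislikes.

```python
from statistics import mean

# Hypothetical per-person ratings for two recipes in one household
HOUSEHOLD_RATINGS = {
    "spicy_curry": {"alex": 5, "sam": 5, "jordan": 2},
    "herb_chicken": {"alex": 4, "sam": 4, "jordan": 3},
}


def household_score(ratings, strategy="min"):
    """Collapse individual ratings into one target per meal kit."""
    values = list(ratings.values())
    if strategy == "mean":
        return mean(values)
    if strategy == "min":
        return min(values)  # least misery: avoid dishes anyone detests
    raise ValueError(f"unknown strategy: {strategy}")


def rank_meals(household, strategy):
    return sorted(household, key=lambda meal: household_score(household[meal], strategy),
                  reverse=True)


for strategy in ("mean", "min"):
    print(strategy, rank_meals(HOUSEHOLD_RATINGS, strategy))
```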


A looming issue remains: a lack of serendipity. This concept is closely related to novelty, but there is a subtle difference. In recommender systems, novelty often refers to something new but expected, while serendipity refers to something surprising but not necessarily new. For example, suppose we add a new Indian recipe to the menu. Recommending it to a user who typically eats Indian food is novel but not necessarily serendipitous. However, if we suggested an Ethiopian dish to that same user, it may have been previously unknown that such a recipe would appeal to them at all; this is serendipity.
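One common way to operationalize this distinction treats a recommendation as serendipitous only when it is both relevant and unexpected. The sketch below measures unexpectedness as one minus the highest cosine similarity to anything in the user’s history. The embeddings, their loose “cuisine” axes, and both thresholds are hypothetical.

```python
import math


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def unexpectedness(candidate, history):
    """1 minus the highest similarity to anything the user has already eaten."""
    return 1.0 - max(cosine(candidate, past) for past in history)


def is_serendipitous(candidate, history, predicted_rating,
                     min_rating=4.0, min_surprise=0.5):
    """Hypothetical rule: relevant (high predicted rating) AND unexpected."""
    return (predicted_rating >= min_rating
            and unexpectedness(candidate, history) >= min_surprise)


# Hypothetical 3-d embeddings; axes loosely "Indian", "Italian", "Ethiopian"
history = [[0.95, 0.05, 0.0], [0.9, 0.1, 0.05]]  # a mostly Indian history
new_indian = [0.92, 0.05, 0.03]
ethiopian = [0.2, 0.0, 0.95]

print(is_serendipitous(new_indian, history, predicted_rating=4.6))
print(is_serendipitous(ethiopian, history, predicted_rating=4.3))
```

The new Indian dish is relevant but unsurprising, so it counts as novel only; the Ethiopian dish clears both bars.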

Photo by Lai YuChing on Unsplash

All of this is important because diversity likely plays a crucial role in the overall satisfaction of many customers. For example, if one were to receive a box each week with five Italian meals, each meal could score well independently, but the limited variety may disappoint. In essence, the entire box’s score is not necessarily the sum of its parts; there may be a degradation in adjacent recipes’ ratings if they are too similar. For now, we will focus solely on encouraging serendipity in our rankings and deal with the entire box optimization elsewhere.
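The idea that a box is not the sum of its parts can be written as a simple objective: total predicted score minus a penalty on pairwise similarity among the recipes in the box. This is a minimal sketch with hypothetical embeddings, scores, and penalty weight; it only illustrates the scoring idea and is not the full box optimization mentioned above.

```python
import math
from itertools import combinations

# Hypothetical 2-d embeddings: axis 0 ~ "Italian-ness", axis 1 ~ "Thai-ness"
VECTORS = {
    "lasagna": [1.0, 0.0],
    "carbonara": [0.95, 0.05],
    "risotto": [0.9, 0.1],
    "green_curry": [0.1, 0.9],
}
# Hypothetical predicted scores for one user
SCORES = {"lasagna": 4.5, "carbonara": 4.5, "risotto": 4.5, "green_curry": 4.0}


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def box_score(recipes, scores, vectors, penalty=0.5):
    """Sum of individual scores minus a penalty on pairwise similarity.

    `penalty` is a hypothetical weight trading individual quality against variety.
    """
    quality = sum(scores[r] for r in recipes)
    redundancy = sum(cosine(vectors[a], vectors[b]) for a, b in combinations(recipes, 2))
    return quality - penalty * redundancy


all_italian = ["lasagna", "carbonara", "risotto"]
mixed = ["lasagna", "carbonara", "green_curry"]
for box in (all_italian, mixed):
    print(box, round(box_score(box, SCORES, VECTORS), 2))
```

With `penalty=0.0`, the all-Italian box wins on raw score; with the penalty, the mixed box comes out ahead even though one of its recipes scores lower on its own.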

How much serendipity should we allow? This decision sits at the crossroads of exploration and exploitation, a trade-off common to many machine learning problems. The safe short-term route is to exploit a user’s direct preferences, continually looping through the same meals they have already scored highly. Over time, however, the user may tire of the same menu and hope for something new. This is where exploration earns its value: if we can successfully push users toward new, relevant dishes that they love, we will have broadened their interests. The additional diversity across the menu should help with retention, so serendipity can create larger long-term gains.
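Epsilon-greedy is one of the simplest ways to encode this trade-off, used here purely as an illustrative choice: fill most slots from the user’s top-ranked recipes, but with a small probability per slot, swap in a recipe from outside their usual pattern. The recipe lists and the `epsilon` value are hypothetical; contextual bandits and Thompson sampling are common, more sophisticated alternatives.

```python
import random


def pick_meals(ranked, untried, n_meals, epsilon, rng):
    """Epsilon-greedy box filling.

    ranked:  recipes ordered by predicted score (exploitation candidates)
    untried: recipes outside the user's usual pattern (exploration candidates)
    epsilon: per-slot probability of exploring (hypothetical tuning knob)
    """
    exploit, explore = list(ranked), list(untried)
    rng.shuffle(explore)  # explore uniformly at random among untried recipes
    box = []
    while len(box) < n_meals and (exploit or explore):
        if explore and (rng.random() < epsilon or not exploit):
            box.append(explore.pop())
        else:
            box.append(exploit.pop(0))
    return box


ranked = ["lasagna", "carbonara", "margherita", "risotto", "gnocchi"]
untried = ["doro_wat", "pad_thai", "bibimbap"]
print(pick_meals(ranked, untried, n_meals=4, epsilon=0.25, rng=random.Random(7)))
```

Setting `epsilon=0` reproduces the pure-exploitation loop described above; raising it trades some short-term satisfaction for the chance of discovering a new favorite.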

Leveraging Others

In an earlier section, we relied on a single user’s preferences to determine whether a meal would be desirable for them. While this is valuable information, depending on it alone ensures that our recommendations will lack serendipity. For example, if a user scores all Italian meals highly, the system will continuously recommend Italian meals; in technical terms, the user is stuck in a local optimum of their own making. Conceptually, the solution is relatively simple: instead of focusing on a single user, look at multiple users who are similar to the current one. If those users have tried a dish outside of Italian food and scored it highly, we can leverage that knowledge to help the current user explore effectively. This approach is known as collaborative filtering, and when combined with a user’s scores on recipe contents, it yields a hybrid system with more well-rounded rankings.
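A minimal user-based collaborative filtering sketch, plus a hybrid blend with a content-based score. Similarity is cosine over mean-centered ratings (a Pearson-style measure), so a harsh rater who dislikes what you like gets a negative weight. The ratings, the content score of 4.2, and the blend weight `alpha` are hypothetical.

```python
import math
from statistics import mean

# Hypothetical ratings; "you" has only tried Italian dishes
RATINGS = {
    "you": {"lasagna": 5, "carbonara": 5, "risotto": 4},
    "user_b": {"lasagna": 5, "carbonara": 4, "risotto": 4, "doro_wat": 5},
    "user_c": {"lasagna": 1, "carbonara": 2, "doro_wat": 2},
}


def centered(ratings):
    """Subtract the user's mean so generous and harsh raters become comparable."""
    mu = mean(ratings.values())
    return {recipe: score - mu for recipe, score in ratings.items()}


def similarity(a, b):
    """Cosine similarity of mean-centered ratings over co-rated recipes."""
    ca, cb = centered(a), centered(b)
    common = ca.keys() & cb.keys()
    dot = sum(ca[r] * cb[r] for r in common)
    norm = (math.sqrt(sum(ca[r] ** 2 for r in common))
            * math.sqrt(sum(cb[r] ** 2 for r in common)))
    return dot / norm if norm else 0.0


def collaborative_score(user, recipe, ratings):
    """User-based CF: the user's mean plus neighbors' similarity-weighted deviations."""
    weighted, total = 0.0, 0.0
    for other, theirs in ratings.items():
        if other == user or recipe not in theirs:
            continue
        sim = similarity(ratings[user], theirs)
        weighted += sim * centered(theirs)[recipe]
        total += abs(sim)
    if total == 0:
        return None  # no comparable user has tried this recipe
    return mean(ratings[user].values()) + weighted / total


def hybrid_score(content, collaborative, alpha=0.5):
    """Hypothetical linear blend; alpha weights the content-based score."""
    if collaborative is None:
        return content
    return alpha * content + (1 - alpha) * collaborative


cf = collaborative_score("you", "doro_wat", RATINGS)
print(round(cf, 2), round(hybrid_score(4.2, cf), 2))  # 4.2: hypothetical content score
```

The Ethiopian dish reaches “you” only through user_b, a like-minded neighbor; the content-based score alone would have had nothing from your history to go on.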

Other Applications

While the primary use case for recommendations is to serve users directly, a company may lack the internal resources to build a capable machine learning product. Logistics across the company and constraints on the delivery service itself might also make it infeasible altogether. However, even from an analytics perspective, there are reasons to build ad-hoc recommendations for users based on their underlying preferences. Here are a couple:

General Inventory Management

Even if the system cannot deliver personalization to users, aggregating recommendations across all users is valuable for understanding overall demand across meals. Coupled with current inventory knowledge, either the amount of each ingredient held or the number of prepped meals, this lets us estimate a general demand function to guide fundamental decisions, such as prioritizing the production of future meals.
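A minimal sketch of turning aggregated recommendations into an inventory signal, assuming each user takes their top-k recommended meals: count expected kits per meal, expand them through a per-kit bill of materials, and compare the result with stock on hand. The rankings, bill of materials, inventory levels, and `picks_per_user=2` are all hypothetical.

```python
from collections import Counter

# Hypothetical per-user rankings (best first) from an offline recommender
USER_RANKINGS = {
    "u1": ["curry", "tacos", "pasta", "salad"],
    "u2": ["pasta", "curry", "salad", "tacos"],
    "u3": ["curry", "pasta", "tacos", "salad"],
}
# Hypothetical ingredient units needed per meal kit
BILL_OF_MATERIALS = {
    "curry": {"chicken": 2, "rice": 1},
    "tacos": {"beef": 2, "tortilla": 6},
    "pasta": {"penne": 1, "tomato": 3},
    "salad": {"lettuce": 1, "tomato": 2},
}
INVENTORY = {"chicken": 4, "rice": 10, "beef": 2, "tortilla": 12,
             "penne": 5, "tomato": 6, "lettuce": 5}


def meal_demand(rankings, picks_per_user):
    """Assume each user takes their top-k recommended meals this week."""
    return Counter(meal for ranked in rankings.values() for meal in ranked[:picks_per_user])


def ingredient_shortfall(demand, bom, inventory):
    """Units of each ingredient required beyond what is currently on hand."""
    needed = Counter()
    for meal, count in demand.items():
        for ingredient, units in bom[meal].items():
            needed[ingredient] += units * count
    return {ing: n - inventory.get(ing, 0)
            for ing, n in needed.items() if n > inventory.get(ing, 0)}


demand = meal_demand(USER_RANKINGS, picks_per_user=2)
print(dict(demand))
print(ingredient_shortfall(demand, BILL_OF_MATERIALS, INVENTORY))
```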

At first glance, it might seem adequate to aggregate star ratings and make decisions from there. While this would provide a reasonable approximation of demand, there is still value in generating user-level rankings before aggregation, as different users rate on different scales. Moreover, when a meal is brand new, there are no historical ratings at all, which rules out simple aggregation entirely. Even when ratings are available, the ranking exercise provides more granular detail for matching demand appropriately. After all, it better mirrors the decision a user makes in a given week: effectively, they rank and pick their top 3–5 meals for delivery. With those individual rankings, we can simulate demand more effectively and plan accordingly. They should also help determine the best default meal selection for the entire population.
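The gap between average stars and actual picks is easy to demonstrate. In this hypothetical three-user example, salad ties for the second-highest average rating yet is nobody’s top-2 pick, while the polarizing curry has the lowest average but is picked twice. Ranking within each user also sidesteps differences in rating scale, since only the order of a user’s own scores matters.

```python
from collections import Counter
from statistics import mean

# Hypothetical star ratings for this week's menu
RATINGS = {
    "u1": {"curry": 5, "tacos": 5, "salad": 4, "pasta": 2},
    "u2": {"curry": 5, "pasta": 5, "salad": 4, "tacos": 3},
    "u3": {"curry": 1, "tacos": 5, "pasta": 5, "salad": 4},
}


def mean_stars(ratings):
    """Naive aggregate: average rating per meal across users."""
    meals = {m for user_ratings in ratings.values() for m in user_ratings}
    return {m: mean(r[m] for r in ratings.values() if m in r) for m in meals}


def top_k_picks(ratings, k):
    """Simulate each user choosing their k highest-scored meals."""
    picks = Counter()
    for user_ratings in ratings.values():
        ranked = sorted(user_ratings, key=user_ratings.get, reverse=True)
        picks.update(ranked[:k])
    return picks


print({m: round(s, 2) for m, s in sorted(mean_stars(RATINGS).items())})
print(dict(top_k_picks(RATINGS, k=2)))
```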

Generating New Recipe Suggestions

If the business is deciding which new meals to make, an understanding of user preferences can help. By running candidate recipes and their ingredients through an ad-hoc recommender, we can build a distribution of expected rankings across our user base and identify the recipes most likely to succeed. Another layer might be to look for recipes that appeal specifically to users with a higher risk of attrition. There may not be enough meals to entice them to stay, and adding meals that cater to them, rather than to the entire population, might be a worthwhile trade-off.
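A minimal sketch of that evaluation: given a candidate recipe’s predicted score for each user (from any ad-hoc recommender), compute where it would rank against that user’s current options, then report the share of all users and of at-risk users for whom it would land in their top k. The menu scores, churn probabilities, `k=3`, and the 0.5 risk threshold are hypothetical.

```python
# Hypothetical predicted scores for each user's current menu (any recommender)
MENU_SCORES = {
    "u1": [4.8, 4.5, 4.1, 3.9, 3.2],
    "u2": [4.9, 4.7, 4.6, 4.4, 4.0],
    "u3": [3.6, 3.4, 3.1, 2.8, 2.5],
}
CHURN_RISK = {"u1": 0.10, "u2": 0.05, "u3": 0.60}  # hypothetical churn probabilities


def candidate_rank(candidate_score, menu_scores):
    """1-based position the candidate would take among a user's existing options."""
    return 1 + sum(score > candidate_score for score in menu_scores)


def evaluate_candidate(candidate_scores, menu_scores, churn_risk, k=3, risk_threshold=0.5):
    """Distribution of expected ranks, overall and among at-risk users."""
    ranks = {u: candidate_rank(candidate_scores[u], menu_scores[u]) for u in menu_scores}
    in_top_k = {u for u, rank in ranks.items() if rank <= k}
    at_risk = {u for u, p in churn_risk.items() if p >= risk_threshold}
    return {
        "ranks": ranks,
        "top_k_share": len(in_top_k) / len(ranks),
        "at_risk_top_k_share": len(in_top_k & at_risk) / len(at_risk) if at_risk else 0.0,
    }


# Predicted scores for one hypothetical candidate recipe
print(evaluate_candidate({"u1": 4.0, "u2": 4.2, "u3": 3.8}, MENU_SCORES, CHURN_RISK))
```

Here the candidate looks mediocre across the whole base but would become the top choice for the one user most likely to leave, exactly the kind of targeted trade-off described above.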

Potential Drawbacks

An unintended trade-off of recommender systems, and of this one in particular, is that if our rankings are too good, our website may see less traffic from subscribers. After all, we’ve already chosen a strong set of recipes and emailed the user what to expect next week; why would they make changes if everything already looks fantastic? So while our recommendation system solves one problem, our ability to grow the business through upsells or additional products could suffer.

Here, we need to think about the downstream effects of a recommendation system and about other business areas that could also leverage machine learning. For example, if we can adequately predict a user’s meal preferences, chances are high that we can also learn indicators that identify individuals more likely to purchase these upsells. If so, we can target those offers at consumers who would welcome them while limiting annoying, invasive interactions with our broader audience.

Final Thoughts

A recipe recommender system can leverage additional context through an understanding of its ingredients. Encoding those features usefully adds a layer of complexity to the problem, and in this article, we briefly explored one possible representation. While the embedding approach outlined above is one way to solve the problem, others exist, such as constructing a bipartite graph of ingredients and recipes to build a flavor network. The Nature article here and this Uber Eats blog post take a look at the graph approach. Omnomnom!


