How to Conquer Daily Fantasy Football Competitions using Optimization — Part Two

A method for determining an optimal lineup for FanDuel, DraftKings, or Yahoo DFS

Riley Howsden
12 min read · Nov 9, 2021
Photo by Adrian Curiel on Unsplash

In the last post, we focused on an optimization that was a watered-down version of our overall goal. However, in seeking a solution for a subset, that of only three wide receivers, we could more deeply explore the caveats of what is meant by "optimal." We looked at a few problematic approaches, from a brute force selection that inefficiently finds an optimal solution to a random sampling selection that, while less computationally taxing, doesn't guarantee an optimal solution. In the end, we arrived at an approach using integer programming that effectively prunes the space and determines the best lineup. I recommend glancing over that post before diving into this one, as it sets a foundation for the concepts we will discuss in detail.

In this post, we will use the previously detailed framework to find the optimal solution for an entire lineup. From there, we will discuss some other standard lineup configurations in fantasy sports and how this approach can fit those too. Finally, we'll finish the post with a few words about additional issues that can complicate setting a successful daily fantasy lineup.

Brute Force Reminder

In our last scenario, we had approximately 160 wide receivers to choose from, meaning that there were around 700k combinations for choosing three players. Pruning allowed us to decrease the number of "reasonable" candidates to 15, bringing the total combinations of three players down to around 500, which is easily tractable. However, once we turn to a full lineup, the number of combinations explodes, as we need to carry out this selection process for each position. Even with substantial pruning, the brute force search space becomes huge because it is the product of the possibilities for each of these decisions. In mathematical terms, with "i" representing the position, n_i the number of candidates at that position, and k_i the number of slots to fill:

$$\text{combinations} = \prod_{i} \binom{n_i}{k_i}$$

Also, while constructing a brute force solution was relatively trivial for a single position, it will require custom reworks as nuances in lineup rules appear. One of these nuances is when players can take on multiple positions; this is more common in NBA lineups and presents itself in the "FLEX" construct in fantasy sports, where various positions are eligible for the spot. Fortunately, as we will see below, the integer programming framework can handle these variations with relative ease.
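
To make the scale concrete, the counts quoted in this section can be reproduced with Python's `math.comb`. The full-lineup figure is only a rough sketch: it assumes, purely for illustration, that every position has been pruned to 15 candidates, and it ignores the FLEX slot. Neither assumption comes from the data.

```python
from math import comb, prod

# Wide receiver counts: roughly 160 candidates, 15 after pruning.
wr_all = comb(160, 3)     # about 700k ways to pick three
wr_pruned = comb(15, 3)   # about 500 ways to pick three

# Hypothetical full lineup: 15 pruned candidates per position (an assumption),
# with the slot counts of a standard roster and the FLEX slot ignored.
slots = {'QB': 1, 'RB': 2, 'WR': 3, 'TE': 1, 'DEF': 1}
full_lineup = prod(comb(15, k) for k in slots.values())

print(wr_all, wr_pruned, full_lineup)
```

Even under these generous pruning assumptions, the product runs to well over a hundred million lineups before FLEX is taken into account.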

Setting up the Data

Before we jump into the formulation for our final lineup, it is essential to structure our data in a way that the integer program can leverage. Most of the difficulty in solving mathematical programming problems lies in first creating a robust representation of the underlying data. For now, we will home in on one implementation, but note that others likely exist. The specific dataset we will be using is the results from the first week of the 2021 NFL season, found HERE.

For each player, we have three critical pieces of information about them:

  • Position: these create hard constraints on whether a player can be slotted into a given place.
  • Salary: as a whole, our lineup is constrained to some salary cap.
  • Points: the number of points the player scored or a prediction of an upcoming score; for our scenario, we will be using player points from a week that has already concluded.

Overall, our player data frame looks something like this:

Now, let's combine the position and points columns to construct a "points matrix." If you've heard of the concept of one-hot encoding, this matrix will look familiar; each position column will be populated by a value equal to the player points if the player is from that position and 0 otherwise:

On the surface, the value of this representation may seem unclear, but it should become somewhat more straightforward as we add in a column for the "FLEX" position. For the NFL, specifically FanDuel, the FLEX position can be filled by an RB, WR, or TE. Therefore, when adding this column, we set the player points equal to 0 unless they are from one of these positions.

To jump a little ahead of ourselves, this means that if the solver were to place a QB or DEF in the FLEX slot, we would receive 0 points for that slot. Technically, such an assignment is not allowed, but since we are maximizing total points, cases where the algorithm would choose it are, if not nonexistent, extremely unlikely. The code to generate this points matrix is below:

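import numpy as np

# Map each lineup slot to the player positions eligible to fill it.
pos_map = {
    'QB': ['QB'],
    'RB': ['RB'],
    'WR': ['WR'],
    'TE': ['TE'],
    'FLEX': ['RB', 'WR', 'TE'],
    # Not in the original map; needed because the config includes a DEF slot (label assumed).
    'DEF': ['DEF'],
}
# One column per slot: the player's points if eligible, 0 otherwise.
for p in pos_map:
    df[p] = np.where(df['Position'].isin(pos_map[p]), df['Pts'], 0)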

This code creates a column for each position and populates it with each player's points if they meet the requirement. Note that we made a position map to more effectively match a player to valid assignments. Now that we have modified our points representation to simplify optimization, let's turn to the optimization itself.
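
To see what the resulting matrix looks like, here is a minimal sketch of the same logic written with plain Python structures so that it runs without pandas. The three players and their numbers are made up for illustration, and `points_row` is a hypothetical helper, not part of the original code.

```python
# Slot -> positions eligible for it (same idea as pos_map in the article).
pos_map = {
    'QB': ['QB'], 'RB': ['RB'], 'WR': ['WR'], 'TE': ['TE'],
    'FLEX': ['RB', 'WR', 'TE'],
}

# Made-up players, for illustration only.
players = [
    {'Player': 'A', 'Position': 'QB', 'Pts': 25.0},
    {'Player': 'B', 'Position': 'RB', 'Pts': 18.5},
    {'Player': 'C', 'Position': 'WR', 'Pts': 12.0},
]

def points_row(player, pos_map):
    """Return {slot: points if the player is eligible for the slot, else 0}."""
    return {slot: (player['Pts'] if player['Position'] in eligible else 0)
            for slot, eligible in pos_map.items()}

points_matrix = [points_row(p, pos_map) for p in players]
for p, row in zip(players, points_matrix):
    print(p['Player'], row)
```

The RB and WR carry their points in both their own column and the FLEX column, while the QB contributes 0 in FLEX.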

Optimization Constraints and Objective

While we created a selection vector in the previous post, we now need a selection matrix to determine which players we have selected for each specific position. This representation is a sparse matrix of 0's and 1's, stating whether we have chosen a given player. We have aligned a position constraint vector underneath the selection matrix, indicating how many players the algorithm must select for each position. Attaching the salary onto this, we end up with something that effectively represents all of our constraints. Note that we have only shown a handful of players in the image, so the conditions themselves will not make much sense within this subset.

From this, we can see the constraints (three red boxes) that must be satisfied:

  1. The sum of each column in the selection matrix must equal the number of players in that position; for a standard lineup, we need precisely one quarterback, two running backs, three wide receivers, one tight end, a defense, and a single flex player. Note we have omitted the defense position for a more straightforward visual representation.
  2. The sum of each row in the selection matrix must be equal to or less than one; we may only select each player to take on one position.
  3. The total salary of the selected players, that is, the sum over positions of the product of each selection vector and the salary vector, must not exceed the salary cap.
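
The three constraints can also be written as a small checker for a candidate selection matrix. This is a sketch for intuition rather than part of the solver; `is_valid_selection` and the toy numbers are hypothetical.

```python
def is_valid_selection(selection, required, salaries, salary_cap):
    """Check a 0/1 selection matrix (rows = players, columns = slots)."""
    num_players = len(selection)
    num_slots = len(required)
    # 1. Each column must sum to the number of players required for that slot.
    for j in range(num_slots):
        if sum(selection[i][j] for i in range(num_players)) != required[j]:
            return False
    # 2. Each row sums to at most one: a player fills at most one slot.
    if any(sum(row) > 1 for row in selection):
        return False
    # 3. Total salary of the selected players must not exceed the cap.
    spent = sum(sum(row) * salaries[i] for i, row in enumerate(selection))
    return spent <= salary_cap

# Toy example: three players, two slots needing one player each.
selection = [[1, 0], [0, 1], [0, 0]]
print(is_valid_selection(selection, [1, 1], [5000, 4000, 3000], 10000))
```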

We also need to remind ourselves of the objective function we intend to maximize, the overall number of points scored. We need to calculate the element-wise product between the points and selection matrices, more formally known as the Hadamard product.

Hadamard product: element-wise product — image from Wikipedia

From here, a summation across all of the elements determines the total points our entire lineup was able to generate.
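
As a minimal sketch of the objective, the Hadamard product and its summation can be written for nested lists as follows. The matrices hold toy values, and `total_points` is a hypothetical helper.

```python
def total_points(points, selection):
    """Sum of the element-wise (Hadamard) product of two equal-shaped matrices."""
    return sum(p * s
               for p_row, s_row in zip(points, selection)
               for p, s in zip(p_row, s_row))

# Toy points matrix with columns (RB, WR, FLEX) and a matching 0/1 selection.
points = [
    [18.5, 0.0, 18.5],   # an RB: points in the RB and FLEX columns
    [0.0, 12.0, 12.0],   # a WR: points in the WR and FLEX columns
    [0.0, 9.0, 9.0],     # another WR
]
selection = [
    [1, 0, 0],   # first player fills RB
    [0, 1, 0],   # second player fills WR
    [0, 0, 1],   # third player fills FLEX
]
print(total_points(points, selection))
```

Only the entries where the selection matrix holds a 1 survive the product, so the sum counts each chosen player exactly once, in the slot they were assigned.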

Setting Up the IP Solver

Similar to the last post, we need to initialize the solver as follows:

from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver('SCIP')

# Columns of the points matrix, in the same order as the configured positions.
points = df[list(config['POSITIONS'])].values
salaries = df['Salary'].tolist()
players = df['Player'].tolist()
salary_cap = config['SALARY_CAP']
num_players = len(players)
num_pos = len(config['POSITIONS'])

Again, we'll be using the SCIP solver for this example, which you can learn more about here. It is important to note that additional proprietary solvers exist, such as Gurobi or CPLEX, so feel free to look into those as well if you're interested.

You may have noticed that the previous code references a configuration dictionary. Instead of hard-coding requirements into our constraints, we can take a more modular approach that we can revise quickly in the future. For our current lineup, the config, which must be defined before the setup code above runs, is:

config = {
    'SALARY_CAP': 60000,
    'POSITIONS': {
        'QB': 1,
        'RB': 2,
        'WR': 3,
        'TE': 1,
        'FLEX': 1,
        'DEF': 1
    }
}
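
Because the solver reads its columns from the configuration, every slot named in `config['POSITIONS']` needs a matching column in the points matrix. A small guard like the one below can catch a mismatch early. `missing_slots` is a hypothetical helper, and the position map shown deliberately leaves out DEF to show the check firing.

```python
config = {
    'SALARY_CAP': 60000,
    'POSITIONS': {'QB': 1, 'RB': 2, 'WR': 3, 'TE': 1, 'FLEX': 1, 'DEF': 1},
}
pos_map = {
    'QB': ['QB'], 'RB': ['RB'], 'WR': ['WR'], 'TE': ['TE'],
    'FLEX': ['RB', 'WR', 'TE'],
}

def missing_slots(config, pos_map):
    """Return configured slots that have no entry in the position map."""
    return sorted(set(config['POSITIONS']) - set(pos_map))

roster_size = sum(config['POSITIONS'].values())
print(roster_size, missing_slots(config, pos_map))
```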

Adding Constraints

Our first task is to express the constraints mentioned in a previous step with a structure that the solver can ingest. To set up the binary problem, we can start with:

x = {}
for i in range(num_players):
    for j, pos in enumerate(config['POSITIONS'].keys()):
        x[i, j] = solver.IntVar(0, 1, players[i] + " " + pos)

Remember that our integer variable is bounded by 0 and 1; each player can only take on a binary value of being selected or not. Of course, we loop through all player and position combinations, effectively representing the selection matrix. Now we need to constrain the number of players selected for each position.

for j, pos in enumerate(config['POSITIONS'].keys()):
    solver.Add(solver.Sum([x[i, j] for i in range(num_players)])
               == config['POSITIONS'][pos])

An additional constraint that we did not have in the previous blog post is that we need to ensure that the solver can only select each player once:

for i in range(num_players):
    solver.Add(solver.Sum([x[i, j] for j in range(num_pos)]) <= 1)

Finally, we also want to ensure that we can pay the players we've chosen, so we set the salary constraint:

total_salary = 0
for i in range(num_players):
    for j, pos in enumerate(config['POSITIONS'].keys()):
        total_salary += x[i, j] * salaries[i]
solver.Add(total_salary <= salary_cap)

Adding Objective

The last thing to add to our solver is the objective. In our case, we want the solver to maximize the overall points for the selected players:

objective_terms = []
for i in range(num_players):
    for j in range(num_pos):
        objective_terms.append(points[i][j] * x[i, j])
solver.Maximize(solver.Sum(objective_terms))

Solve

Now that we have laid out all the instructions, all there is left to do is let the solver run its course. This code effectively determines if the solver was able to converge to a suitable solution and prints the players that it has chosen:

status = solver.Solve()
if status in (pywraplp.Solver.OPTIMAL, pywraplp.Solver.FEASIBLE):
    for j, pos in enumerate(config['POSITIONS'].keys()):
        for i in range(num_players):
            # Binary variables come back as floats, so compare against 0.5.
            if x[i, j].solution_value() > 0.5:
                print(f'Choose {players[i]} for {pos}:',
                      f'{points[i][j]} points')
    print('Total Points = ', solver.Objective().Value())
else:
    print('No feasible lineup found.')

The solver returns the optimal lineup for week one of the 2021 NFL season as shown below:

Note that all players match their given position, and Amari Cooper, a WR, has filled the FLEX position.

Contest Modifications

The approach outlined above handles one of the more common contests in daily fantasy sports, setting a "full roster." However, there are many variations on contests, and we hope that the framework above can accommodate those as well. Let's see how we can quickly solve similar problems in the daily fantasy space with just a few changes to our code and a suitable configuration.

Superflex

A Superflex lineup is very similar to the scenario laid out above, with two main differences. The first is that a Superflex lineup does not field a DEF and has one fewer WR. The second is that it adds a Superflex position, which works like the FLEX position except that, in addition to RBs, WRs, and TEs, it can also be filled by a QB. To solve this problem, all we need to do is add a position to our points matrix, "SFLEX," and update the configuration for our solver. Instead of the format above, we now use:

{
    'SALARY_CAP': 60000,
    'POSITIONS': {
        'QB': 1,
        'RB': 2,
        'WR': 2,
        'TE': 1,
        'FLEX': 1,
        'SFLEX': 1
    }
}
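
A sketch of the corresponding position-map change is below. The "SFLEX" key follows the naming used above, the eligibility list is taken from the description of the Superflex slot, and `eligible_slots` is a hypothetical helper added only to inspect the result.

```python
pos_map = {
    'QB': ['QB'], 'RB': ['RB'], 'WR': ['WR'], 'TE': ['TE'],
    'FLEX': ['RB', 'WR', 'TE'],
    # Superflex: everything FLEX allows, plus QBs.
    'SFLEX': ['QB', 'RB', 'WR', 'TE'],
}

def eligible_slots(position, pos_map):
    """List the lineup slots a player of the given position can fill."""
    return [slot for slot, allowed in pos_map.items() if position in allowed]

print(eligible_slots('QB', pos_map))
print(eligible_slots('WR', pos_map))
```

With this entry in place, the same loop that builds the points matrix produces the "SFLEX" column without further changes.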

Single Game

While the extension to Superflex was relatively trivial, the single-game variation requires a small addition to our code. Single-game lineups consist of five positions that anyone can fill; effectively, their label is Anyflex, but we will use "AFLEX." Like our Superflex iteration, we need to add the "AFLEX" position to our points matrix.

However, it is essential to note that one of these "AFLEX" positions is different from the rest: it is an MVP position that anyone can fill. The critical difference is that this player's points are multiplied by 1.5 (150%). Therefore, the goal is to choose a player who will do exceptionally well and slot them into this position. How do we represent this in our points matrix? We need "MVP" and "AFLEX" entries in the position map that list every eligible position, plus a statement in our points creation that accounts for the multiplier. A simple way to do this is to add an if-else statement for each multiplier:

for p in pos_map:
    eligible = df['Position'].isin(pos_map[p])
    if p == 'MVP':
        # The MVP slot scores 150% of the player's points.
        df[p] = np.where(eligible, 1.5 * df['Pts'], 0)
    else:
        df[p] = np.where(eligible, df['Pts'], 0)

In other game modes, especially for other sports, it is common to have multipliers of 200%, 150%, and 120%, and this general solution holds for those situations. Now that we've accounted for the multipliers, all we need is a new configuration:

{
    'SALARY_CAP': 60000,
    'POSITIONS': {
        'MVP': 1,
        'AFLEX': 4
    }
}
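
When several multipliers are in play, a lookup table can replace the chain of if-else statements. The sketch below is one possible way to do that in plain Python. `slot_points` is a hypothetical helper, and the "STAR" and "PRO" slot names and the tiered values are placeholders, so check the contest rules for the actual labels and multipliers.

```python
ALL = ['QB', 'RB', 'WR', 'TE']  # positions used in this article; extend as needed

pos_map = {'MVP': ALL, 'AFLEX': ALL}
multipliers = {'MVP': 1.5}  # slots not listed default to 1.0

def slot_points(position, pts, slot, pos_map, multipliers):
    """Points a player contributes in a slot: multiplier * pts if eligible, else 0."""
    if position not in pos_map[slot]:
        return 0
    return multipliers.get(slot, 1.0) * pts

print(slot_points('WR', 20.0, 'MVP', pos_map, multipliers))    # 30.0
print(slot_points('WR', 20.0, 'AFLEX', pos_map, multipliers))  # 20.0

# Hypothetical contest with three boosted slots (names are placeholders).
tiered = {'MVP': 2.0, 'STAR': 1.5, 'PRO': 1.2}
print(slot_points('WR', 20.0, 'STAR', {'STAR': ALL}, tiered))  # 30.0
```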

Three Man

A 3-Man challenge is very similar to a single-game lineup in that any player on the list can fill any position. As before, one of those players is selected as the MVP, netting 150% of their actual point value. Compared to the contests we've seen so far, salaries are on a much different scale, but all we need to do is capture that in our configuration, as demonstrated below:

{
    'SALARY_CAP': 7,
    'POSITIONS': {
        'MVP': 1,
        'AFLEX': 2
    }
}
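
With only three slots, this contest is small enough to solve by brute force, which makes a handy cross-check for the solver's answer. The sketch below uses a made-up five-player pool with made-up salaries and points. `best_three_man` is a hypothetical helper, not the solver from this post.

```python
from itertools import combinations

# (name, salary, points): made-up values for illustration only.
pool = [('A', 4, 30.0), ('B', 3, 22.0), ('C', 2, 15.0),
        ('D', 1, 8.0), ('E', 1, 6.0)]

def best_three_man(pool, salary_cap=7, mvp_multiplier=1.5):
    """Try every trio and every MVP choice; return (score, mvp, others)."""
    best = None
    for trio in combinations(pool, 3):
        if sum(salary for _, salary, _ in trio) > salary_cap:
            continue  # breaks the salary cap
        for mvp in trio:
            others = [p for p in trio if p is not mvp]
            score = mvp_multiplier * mvp[2] + sum(p[2] for p in others)
            if best is None or score > best[0]:
                best = (score, mvp[0], sorted(p[0] for p in others))
    return best

print(best_three_man(pool))
```

As expected, the MVP slot goes to the highest scorer of the chosen trio, since that is where the multiplier adds the most points.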

Summary

We have illustrated how to optimize a full lineup for FanDuel when we have complete information on every player's points. We have also shown how to carry out minor modifications to adapt to a variety of other contests. While other daily fantasy sites might have different contests, the same kinds of modifications should extend to most of them.

Now you, too, can go out and win every single daily fantasy competition and become massively wealthy; well, not so fast. In all of the examples from the previous blog post to this one, we have assumed that we could predict with 100% accuracy the number of points each player would score. Using the knowledge of a previous week's statistics, we could better delve into setting up our optimization framework, but that only makes us champions of the past and not the future.

While knowing how to take previous results and optimize a lineup is crucial in our journey to conquering the daily fantasy world, the landscape is far more complex than expected. The next step is to make decent predictions of a player's future points and feed them into the optimization we have created here. Unfortunately, even once we've made reasonable predictions, the problem doesn't stop there. Here are some other considerations:

  • We must consider that player scores are not independent of each other; for example, a WR typically cannot score points without a QB also scoring points, and a DEF will often score points when an offensive player loses points. Our optimization does not currently deal with these interactions.
  • We must consider that we are interested not only in the expected points of a player but also in the variance of that point distribution. If our goal is to win large competitions, we must seek high-variance players for our lineup; the reward can outpace the risk, as many contests have a rapidly decaying prize structure. Therefore, we should predict not just the mean for a player but the overall distribution of what they might score.
  • Finally, and probably the most difficult to lock down, we must consider what other users choose for their lineup; if we end up winning a competition but have to split the winnings with thousands of other players, that is problematic. There is a game theory component that needs exploration; a way to over-index on players that we have higher confidence in than the overall population.
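
To illustrate the variance point above, consider two hypothetical players with the same expected score but different spreads. The sketch assumes normally distributed scores, which is a simplification, and uses made-up numbers throughout.

```python
from statistics import NormalDist

# Two hypothetical players: same mean, different standard deviations.
steady = NormalDist(mu=15, sigma=3)
volatile = NormalDist(mu=15, sigma=8)

# Suppose a contest-winning lineup needs a 25+ point game from this slot.
threshold = 25
p_steady = 1 - steady.cdf(threshold)
p_volatile = 1 - volatile.cdf(threshold)

print(f'steady:   {p_steady:.4f}')
print(f'volatile: {p_volatile:.4f}')
```

An optimizer that only sees the mean treats these two players as identical, yet under these assumptions the volatile one is far more likely to deliver a tournament-winning score.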

Perhaps I will throw together a few blogs in the future to address some of these concerns. While the optimization piece is still an exciting endeavor and can be used to solve more practical applications in the real world, recognize that there is still a lot of work that we need to carry out to make daily fantasy a worthwhile effort!

All images and code by the author unless noted otherwise.


Riley Howsden

“Half-Stack” machine learning propagandist in the gaming industry — a critic of all data presented deceitfully, unless it contains a meme, then it must be true.