Quicksum not automatically summing coefficients

回答済み

2024年05月02日 11:22

I am solving a quadratic model to calculate footballers player ratings, and am having trouble with memory as the objective gets very large. A big reason for this is that whenever i use quicksum or any other sum over a list of LinExpr, gurobi does not add the coefficients of decisicon variables that are the same. For example, 0.5*beta_AGE[25] + 0.3*beta_AGE[25] does not end up as 0.8*beta_AGE[25] but is maintained as having both present. This increases the amount of decision variables in various expressions and the objective exponentially. How do I make gurobi aggregate the coefficients when summing a list of either linexpressions or other expressions?

Code where quicksum is used:
and f_player:

関連

16件のコメント

Ronald van der Velden

Gurobi Staff

2024年05月02日 11:36
Indeed aggregation of coefficients per variable only happens when these expressions are added to the model as constraints. So when you say "trouble with memory", does that happen while constructing the model? If that's the case and you encounter the described situation (variables occurring more than once in an expression) very frequently, I would consider writing a tiny helper function to merge two linear expressions. You could also initially represent them as a dict (variables being key, coefficients being values), which could make it easier to efficiently add two expressions.
0
Elias Joa

2024年05月02日 11:44
It does indeed happen during construction of these expressions. To give some perspective there are 62000 games, and around 24 players per game, and all of these have a specific base rating decision variable, and 27 age variables (these are not specific for a player and are the ones i want coefficients to be aggregated for) and a competition effect variable for each competition. There are no constraints, simply a minimization of an objective function, and therefore the difference of adding 62000*24*27 (in the case of decision variables for age not aggregating their coefficients) and 62000*24+27 is large. I will look at creating a helper function. I guess then that we want to replace the quicksum with a loop that adds every linear expression resulting from f_player for all_player_ids?
0
Ronald van der Velden

Gurobi Staff

2024年05月02日 13:33
Exactly :-)
0

Riley Clement

Gurobi Staff

2024年05月06日 10:42 編集済み

As an example for anyone following, this could be such a helper function:

import itertools
from operator import itemgetter

vars = model.getVars()

def quicksum_acc(iter_pairs):
    # iter_pairs should be iterable of (coefficient, variable) pairs
    return gp.quicksum(
        vars[index]*sum(amt for _,amt in v)
        for index,v in itertools.groupby(((v.index,c) for c,v in iter_pairs), key=itemgetter(0))
    )

Then instead of

gp.quicksum(c*v for ....)

use

quicksum_acc((c,v) for ...)

Elias Joa

2024年05月06日 09:32
Thanks! Any idea on the time complexity of this function? Experiencing a massive increase in runtime, but indeed no memory problems anymore.
0
Riley Clement

Gurobi Staff

2024年05月06日 09:49
Not sure. Revisiting though, I think I've dropped some code between my original prototype and what I've pasted above... I'll see if I can fix it.
0
Elias Joa

2024年05月06日 10:19 編集済み
Great, thanks a lot! Furthermore, maybe there is a way to make the function actually able to input a list of LinExpressions, returning one LinExpression with coefficients aggregated. Like quicksum, only with aggregation of coefficients and variables.
0
Riley Clement

Gurobi Staff

2024年05月06日 10:39
Not sure if it's what you want but if you have a linear expression linexpr then linexpr._normalize() aggregates the coefficients.
0

Elias Joa

2024年05月06日 10:43

Oh wow i was not aware, so I can then use quicksum(List of linexressions)._normalize(). I made this function instead, which aggregates coefficients:

def quicksum_acc(lin_exprs):

    var_coeff_map = {}

    for expr in lin_exprs:

        if isinstance(expr, LinExpr):

            for i in range(expr.size()):

                var = expr.getVar(i)

                coeff = expr.getCoeff(i)

                if var not in var_coeff_map:

                    var_coeff_map[var] = coeff

                else:

                    var_coeff_map[var] += coeff

        elif isinstance(expr, (int, float)):  

            if expr != 0:  

                if None in var_coeff_map:  

                    var_coeff_map[None] += expr

                else:

                    var_coeff_map[None] = expr

   

    return quicksum(var_coeff_map[var] * var if var is not None else var_coeff_map[var] for var in var_coeff_map)

Riley Clement

Gurobi Staff

2024年05月06日 10:45
I've updated the above function to try a different approach which may be faster and more memory efficient. See how it goes?
0
Elias Joa

2024年05月06日 11:15
I ended up using ._normalize(). Will hopefully solve the issues. If not, the solution will be to keep track of all coefficients and creating the LinExpression only right before solving the model.
0
Elias Joa

2024年05月06日 14:53
Is there any documentation on ._normalize()? Cannot find it.
0
Riley Clement

Gurobi Staff

2024年05月06日 21:43
No there isn't. The leading underscore in the name is a convention to indicate that it is not part of the public API. Technically using these "private" methods is risky in any package because it they are subject to change without notification. I think it is relatively safe in this instance though. Good practice would involve automated testing of your code if this is something that is deployed into production, and I'd include tests for ._normalize to catch and unexpected changes.
0
Elias Joa

2024年05月07日 07:25
I see. Well thanks for tipsing me about it! Fixed all my memory problems, even allowing me to parallelize it and finish the rating calculations twenty times faster.
0
Elias Joa

2024年05月08日 07:59
An issue now is that we get large coefficients for the variables that often are added that are similar, while all the unique variables (for example per player) always has a coefficient of 1. This leads to the warning about large range in quadratic coefficient objective range. I tried setting numericfocus to 3 but it is still present. Is this something I should worry about? Any workarounds?

Furthermore, I am unsure whether to initialize objective as QuadExpr or LinExpr. LinExpr seems to not give any errors, but the numbers I get on the other side seems off.

Optimize a model with 0 rows, 1000 columns and 0 nonzeros
Model fingerprint: 0x87121378
Model has 30948 quadratic objective terms
Coefficient statistics:
Matrix range [0e+00, 0e+00]
Objective range [7e-02, 1e+01]
QObjective range [2e-06, 3e+04]
Bounds range [0e+00, 0e+00]
RHS range [0e+00, 0e+00]
Warning: Model contains large quadratic objective coefficient range
Presolve removed 0 rows and 7 columns
Presolve time: 0.01s
Presolved: 0 rows, 993 columns, 0 nonzeros
Presolved model has 30948 quadratic objective terms
Ordering time: 0.03s

Barrier statistics:
Free vars : 1966
AA' NZ : 4.599e+05
Factor NZ : 4.616e+05 (roughly 5 MB of memory)
Factor Ops : 2.954e+08 (less than 1 second per iteration)
Threads : 4

Objective Residual
Iter Primal Dual Primal Dual Compl Time
0 1.52929258e+02 1.52929258e+02 0.00e+00 1.43e+01 0.00e+00 0s
1 1.13016646e+02 1.20217080e+02 6.46e-10 1.42e+00 0.00e+00 0s
2 1.12980752e+02 1.19886569e+02 9.34e-08 1.35e+00 0.00e+00 0s
3 1.12947182e+02 1.19560159e+02 1.15e-07 1.29e+00 0.00e+00 0s
4 1.12918373e+02 1.19265495e+02 1.41e-07 1.23e+00 0.00e+00 0s
5 1.12779810e+02 1.17570389e+02 1.81e-07 9.07e-01 0.00e+00 0s
6 1.12761005e+02 1.17287230e+02 4.85e-07 8.53e-01 0.00e+00 0s
7 1.12731301e+02 1.16797017e+02 4.03e-07 7.61e-01 0.00e+00 0s
8 1.12714344e+02 1.16486488e+02 4.18e-07 7.03e-01 0.00e+00 0s
9 1.12698329e+02 1.16166539e+02 8.34e-07 6.44e-01 0.00e+00 0s
10 1.12676556e+02 1.15674203e+02 1.44e-06 5.53e-01 0.00e+00 0s
11 1.12658541e+02 1.15191211e+02 2.18e-06 4.64e-01 0.00e+00 0s
12 1.12623497e+02 1.13730180e+02 3.47e-06 1.99e-01 0.00e+00 0s
13 1.12615759e+02 1.12811990e+02 2.30e-06 3.48e-02 0.00e+00 0s
14 1.12615502e+02 1.12634995e+02 4.54e-07 3.45e-03 0.00e+00 0s
15 1.12615500e+02 1.12617433e+02 7.34e-08 3.42e-04 0.00e+00 0s
16 1.12615501e+02 1.12615693e+02 1.59e-08 3.39e-05 0.00e+00 0s
17 1.12615501e+02 1.12615521e+02 2.17e-09 3.36e-06 0.00e+00 0s
18 1.12615501e+02 1.12615504e+02 1.81e-10 3.33e-07 0.00e+00 0s
19 1.12615501e+02 1.12615502e+02 2.63e-11 3.45e-08 0.00e+00 0s

Barrier solved model in 19 iterations and 0.30 seconds (0.29 work units)
Optimal objective 1.12615501e+02

Optimized for date: 2015-07-02
Processed date 2015-07-02 (2/2)
0
Riley Clement

Gurobi Staff

2024年05月13日 04:54
Hi Elias,

Setting NumericFocus to higher values can be a good idea, but it won't remove the warning. NumericFocus does not reduce large coefficient ranges, it only increases the effort made to try and prevent numerical issues from occurring (usually at the expense of slowing the solver).

We have a section in the manual called Does my model have numerical issues? which may help. It is part of the Guidelines for Numerical Issues chapter. It's a little dry, the following video from our YouTube is a bit more palatable in my opinion and is certainly worth a watch: How to avoid numerical issues in optimization models.

I am unsure whether to initialize objective as QuadExpr or LinExpr.

If you initialize a gurobi.LinExpr but then add quadratic terms it will be promoted to a QuadExpr, so it shouldn't make a difference. Since it is ultimately a QuadExpr though I would initialize it as this, for clarity of code.

- Riley
0

サインインしてコメントを残してください。

Quicksum not automatically summing coefficients

コメント

お探しのものを見つけられませんでしたか？