Quicksum not automatically summing coefficients
回答済みI am solving a quadratic model to calculate footballers player ratings, and am having trouble with memory as the objective gets very large. A big reason for this is that whenever i use quicksum or any other sum over a list of LinExpr, gurobi does not add the coefficients of decisicon variables that are the same. For example, 0.5*beta_AGE[25] + 0.3*beta_AGE[25] does not end up as 0.8*beta_AGE[25] but is maintained as having both present. This increases the amount of decision variables in various expressions and the objective exponentially. How do I make gurobi aggregate the coefficients when summing a list of either linexpressions or other expressions?
Code where quicksum is used:
and f_player:
-
Indeed aggregation of coefficients per variable only happens when these expressions are added to the model as constraints. So when you say "trouble with memory", does that happen while constructing the model? If that's the case and you encounter the described situation (variables occurring more than once in an expression) very frequently, I would consider writing a tiny helper function to merge two linear expressions. You could also initially represent them as a dict (variables being key, coefficients being values), which could make it easier to efficiently add two expressions.
0 -
It does indeed happen during construction of these expressions. To give some perspective there are 62000 games, and around 24 players per game, and all of these have a specific base rating decision variable, and 27 age variables (these are not specific for a player and are the ones i want coefficients to be aggregated for) and a competition effect variable for each competition. There are no constraints, simply a minimization of an objective function, and therefore the difference of adding 62000*24*27 (in the case of decision variables for age not aggregating their coefficients) and 62000*24+27 is large. I will look at creating a helper function. I guess then that we want to replace the quicksum with a loop that adds every linear expression resulting from f_player for all_player_ids?
0 -
Exactly :-)
0 -
As an example for anyone following, this could be such a helper function:
import itertools
from operator import itemgetter
vars = model.getVars()
def quicksum_acc(iter_pairs):
# iter_pairs should be iterable of (coefficient, variable) pairs
return gp.quicksum(
vars[index]*sum(amt for _,amt in v)
for index,v in itertools.groupby(((v.index,c) for c,v in iter_pairs), key=itemgetter(0))
)Then instead of
gp.quicksum(c*v for ....)
use
quicksum_acc((c,v) for ...)
0 -
Thanks! Any idea on the time complexity of this function? Experiencing a massive increase in runtime, but indeed no memory problems anymore.
0 -
Not sure. Revisiting though, I think I've dropped some code between my original prototype and what I've pasted above... I'll see if I can fix it.
0 -
Great, thanks a lot! Furthermore, maybe there is a way to make the function actually able to input a list of LinExpressions, returning one LinExpression with coefficients aggregated. Like quicksum, only with aggregation of coefficients and variables.
0 -
Not sure if it's what you want but if you have a linear expression linexpr then linexpr._normalize() aggregates the coefficients.
0 -
Oh wow i was not aware, so I can then use quicksum(List of linexressions)._normalize(). I made this function instead, which aggregates coefficients:
def quicksum_acc(lin_exprs):
var_coeff_map = {}
for expr in lin_exprs:
if isinstance(expr, LinExpr):
for i in range(expr.size()):
var = expr.getVar(i)
coeff = expr.getCoeff(i)
if var not in var_coeff_map:
var_coeff_map[var] = coeff
else:
var_coeff_map[var] += coeff
elif isinstance(expr, (int, float)):
if expr != 0:
if None in var_coeff_map:
var_coeff_map[None] += expr
else:
var_coeff_map[None] = expr
return quicksum(var_coeff_map[var] * var if var is not None else var_coeff_map[var] for var in var_coeff_map)0 -
I've updated the above function to try a different approach which may be faster and more memory efficient. See how it goes?
0 -
I ended up using ._normalize(). Will hopefully solve the issues. If not, the solution will be to keep track of all coefficients and creating the LinExpression only right before solving the model.
0 -
Is there any documentation on ._normalize()? Cannot find it.
0 -
No there isn't. The leading underscore in the name is a convention to indicate that it is not part of the public API. Technically using these "private" methods is risky in any package because it they are subject to change without notification. I think it is relatively safe in this instance though. Good practice would involve automated testing of your code if this is something that is deployed into production, and I'd include tests for ._normalize to catch and unexpected changes.
0 -
I see. Well thanks for tipsing me about it! Fixed all my memory problems, even allowing me to parallelize it and finish the rating calculations twenty times faster.
0 -
An issue now is that we get large coefficients for the variables that often are added that are similar, while all the unique variables (for example per player) always has a coefficient of 1. This leads to the warning about large range in quadratic coefficient objective range. I tried setting numericfocus to 3 but it is still present. Is this something I should worry about? Any workarounds?
Furthermore, I am unsure whether to initialize objective as QuadExpr or LinExpr. LinExpr seems to not give any errors, but the numbers I get on the other side seems off.Optimize a model with 0 rows, 1000 columns and 0 nonzeros
Model fingerprint: 0x87121378
Model has 30948 quadratic objective terms
Coefficient statistics:
Matrix range [0e+00, 0e+00]
Objective range [7e-02, 1e+01]
QObjective range [2e-06, 3e+04]
Bounds range [0e+00, 0e+00]
RHS range [0e+00, 0e+00]
Warning: Model contains large quadratic objective coefficient range
Presolve removed 0 rows and 7 columns
Presolve time: 0.01s
Presolved: 0 rows, 993 columns, 0 nonzeros
Presolved model has 30948 quadratic objective terms
Ordering time: 0.03sBarrier statistics:
Free vars : 1966
AA' NZ : 4.599e+05
Factor NZ : 4.616e+05 (roughly 5 MB of memory)
Factor Ops : 2.954e+08 (less than 1 second per iteration)
Threads : 4Objective Residual
Iter Primal Dual Primal Dual Compl Time
0 1.52929258e+02 1.52929258e+02 0.00e+00 1.43e+01 0.00e+00 0s
1 1.13016646e+02 1.20217080e+02 6.46e-10 1.42e+00 0.00e+00 0s
2 1.12980752e+02 1.19886569e+02 9.34e-08 1.35e+00 0.00e+00 0s
3 1.12947182e+02 1.19560159e+02 1.15e-07 1.29e+00 0.00e+00 0s
4 1.12918373e+02 1.19265495e+02 1.41e-07 1.23e+00 0.00e+00 0s
5 1.12779810e+02 1.17570389e+02 1.81e-07 9.07e-01 0.00e+00 0s
6 1.12761005e+02 1.17287230e+02 4.85e-07 8.53e-01 0.00e+00 0s
7 1.12731301e+02 1.16797017e+02 4.03e-07 7.61e-01 0.00e+00 0s
8 1.12714344e+02 1.16486488e+02 4.18e-07 7.03e-01 0.00e+00 0s
9 1.12698329e+02 1.16166539e+02 8.34e-07 6.44e-01 0.00e+00 0s
10 1.12676556e+02 1.15674203e+02 1.44e-06 5.53e-01 0.00e+00 0s
11 1.12658541e+02 1.15191211e+02 2.18e-06 4.64e-01 0.00e+00 0s
12 1.12623497e+02 1.13730180e+02 3.47e-06 1.99e-01 0.00e+00 0s
13 1.12615759e+02 1.12811990e+02 2.30e-06 3.48e-02 0.00e+00 0s
14 1.12615502e+02 1.12634995e+02 4.54e-07 3.45e-03 0.00e+00 0s
15 1.12615500e+02 1.12617433e+02 7.34e-08 3.42e-04 0.00e+00 0s
16 1.12615501e+02 1.12615693e+02 1.59e-08 3.39e-05 0.00e+00 0s
17 1.12615501e+02 1.12615521e+02 2.17e-09 3.36e-06 0.00e+00 0s
18 1.12615501e+02 1.12615504e+02 1.81e-10 3.33e-07 0.00e+00 0s
19 1.12615501e+02 1.12615502e+02 2.63e-11 3.45e-08 0.00e+00 0sBarrier solved model in 19 iterations and 0.30 seconds (0.29 work units)
Optimal objective 1.12615501e+02Optimized for date: 2015-07-02
Processed date 2015-07-02 (2/2)0 -
Hi Elias,
Setting NumericFocus to higher values can be a good idea, but it won't remove the warning. NumericFocus does not reduce large coefficient ranges, it only increases the effort made to try and prevent numerical issues from occurring (usually at the expense of slowing the solver).
We have a section in the manual called Does my model have numerical issues? which may help. It is part of the Guidelines for Numerical Issues chapter. It's a little dry, the following video from our YouTube is a bit more palatable in my opinion and is certainly worth a watch: How to avoid numerical issues in optimization models.
I am unsure whether to initialize objective as QuadExpr or LinExpr.
If you initialize a gurobi.LinExpr but then add quadratic terms it will be promoted to a QuadExpr, so it shouldn't make a difference. Since it is ultimately a QuadExpr though I would initialize it as this, for clarity of code.
- Riley
0
サインインしてコメントを残してください。
コメント
16件のコメント