メインコンテンツへスキップ

Quicksum not automatically summing coefficients

回答済み

コメント

16件のコメント

  • Ronald van der Velden
    Gurobi Staff Gurobi Staff

    Indeed aggregation of coefficients per variable only happens when these expressions are added to the model as constraints. So when you say "trouble with memory", does that happen while constructing the model? If that's the case and you encounter the described situation (variables occurring more than once in an expression) very frequently, I would consider writing a tiny helper function to merge two linear expressions. You could also initially represent them as a dict (variables being key, coefficients being values), which could make it easier to efficiently add two expressions.

    0
  • Elias Joa
    Conversationalist
    First Question

    It does indeed happen during construction of these expressions. To give some perspective there are 62000 games, and around 24 players per game, and all of these have a specific base rating decision variable, and 27 age variables (these are not specific for a player and are the ones i want coefficients to be aggregated for) and a competition effect variable for each competition. There are no constraints, simply a minimization of an objective function, and therefore the difference of adding 62000*24*27 (in the case of decision variables for age not aggregating their coefficients) and 62000*24+27 is large. I will look at creating a helper function. I guess then that we want to replace the quicksum with a loop that adds every linear expression resulting from f_player for all_player_ids? 

    0
  • Ronald van der Velden
    Gurobi Staff Gurobi Staff

    Exactly :-)

    0
  • Riley Clement
    Gurobi Staff Gurobi Staff

    As an example for anyone following, this could be such a helper function:

    import itertools
    from operator import itemgetter

    vars = model.getVars()

    def quicksum_acc(iter_pairs):
        # iter_pairs should be iterable of (coefficient, variable) pairs
        return gp.quicksum(
          vars[index]*sum(amt for _,amt in v)
          for index,v in itertools.groupby(((v.index,c) for c,v in iter_pairs), key=itemgetter(0))
        )

    Then instead of

    gp.quicksum(c*v for ....)

    use

    quicksum_acc((c,v) for ...)

     

    0
  • Elias Joa
    Conversationalist
    First Question

    Thanks! Any idea on the time complexity of this function? Experiencing a massive increase in runtime, but indeed no memory problems anymore. 

    0
  • Riley Clement
    Gurobi Staff Gurobi Staff

    Not sure.  Revisiting though, I think I've dropped some code between my original prototype and what I've pasted above... I'll see if I can fix it.

    0
  • Elias Joa
    Conversationalist
    First Question

    Great, thanks a lot! Furthermore, maybe there is a way to make the function actually able to input a list of LinExpressions, returning one LinExpression with coefficients aggregated. Like quicksum, only with aggregation of coefficients and variables.

    0
  • Riley Clement
    Gurobi Staff Gurobi Staff

    Not sure if it's what you want but if you have a linear expression linexpr then linexpr._normalize() aggregates the coefficients.

    0
  • Elias Joa
    Conversationalist
    First Question

    Oh wow i was not aware, so I can then use quicksum(List of linexressions)._normalize(). I made this function instead, which aggregates coefficients:

    def quicksum_acc(lin_exprs):

        var_coeff_map = {}

        for expr in lin_exprs:

            if isinstance(expr, LinExpr):

                for i in range(expr.size()):

                    var = expr.getVar(i)

                    coeff = expr.getCoeff(i)

                    if var not in var_coeff_map:

                        var_coeff_map[var] = coeff

                    else:

                        var_coeff_map[var] += coeff

            elif isinstance(expr, (int, float)):  

                if expr != 0:  

                    if None in var_coeff_map:  

                        var_coeff_map[None] += expr

                    else:

                        var_coeff_map[None] = expr

       

      return quicksum(var_coeff_map[var] * var if var is not None else var_coeff_map[var] for var in var_coeff_map)
    0
  • Riley Clement
    Gurobi Staff Gurobi Staff

    I've updated the above function to try a different approach which may be faster and more memory efficient.  See how it goes?

    0
  • Elias Joa
    Conversationalist
    First Question

    I ended up using ._normalize(). Will hopefully solve the issues. If not, the solution will be to keep track of all coefficients and creating the LinExpression only right before solving the model. 

    0
  • Elias Joa
    Conversationalist
    First Question

    Is there any documentation on ._normalize()? Cannot find it. 

    0
  • Riley Clement
    Gurobi Staff Gurobi Staff

    No there isn't. The leading underscore in the name is a convention to indicate that it is not part of the public API. Technically using these "private" methods is risky in any package because it they are subject to change without notification. I think it is relatively safe in this instance though. Good practice would involve automated testing of your code if this is something that is deployed into production, and I'd include tests for ._normalize to catch and unexpected changes.

    0
  • Elias Joa
    Conversationalist
    First Question

    I see. Well thanks for tipsing me about it! Fixed all my memory problems, even allowing me to parallelize it and finish the rating calculations twenty times faster.

    0
  • Elias Joa
    Conversationalist
    First Question

    An issue now is that we get large coefficients for the variables that often are added that are similar, while all the unique variables (for example per player) always has a coefficient of 1. This leads to the warning about large range in quadratic coefficient objective range. I tried setting numericfocus to 3 but it is still present. Is this something I should worry about? Any workarounds? 

    Furthermore, I am unsure whether to initialize objective as QuadExpr or LinExpr. LinExpr seems to not give any errors, but the numbers I get on the other side seems off.  

     

    Optimize a model with 0 rows, 1000 columns and 0 nonzeros
    Model fingerprint: 0x87121378
    Model has 30948 quadratic objective terms
    Coefficient statistics:
      Matrix range     [0e+00, 0e+00]
      Objective range  [7e-02, 1e+01]
      QObjective range [2e-06, 3e+04]
      Bounds range     [0e+00, 0e+00]
      RHS range        [0e+00, 0e+00]
    Warning: Model contains large quadratic objective coefficient range
    Presolve removed 0 rows and 7 columns
    Presolve time: 0.01s
    Presolved: 0 rows, 993 columns, 0 nonzeros
    Presolved model has 30948 quadratic objective terms
    Ordering time: 0.03s

    Barrier statistics:
     Free vars  : 1966
     AA' NZ     : 4.599e+05
     Factor NZ  : 4.616e+05 (roughly 5 MB of memory)
     Factor Ops : 2.954e+08 (less than 1 second per iteration)
     Threads    : 4

                      Objective                Residual
    Iter       Primal          Dual         Primal    Dual     Compl     Time
       0   1.52929258e+02  1.52929258e+02  0.00e+00 1.43e+01  0.00e+00     0s
       1   1.13016646e+02  1.20217080e+02  6.46e-10 1.42e+00  0.00e+00     0s
       2   1.12980752e+02  1.19886569e+02  9.34e-08 1.35e+00  0.00e+00     0s
       3   1.12947182e+02  1.19560159e+02  1.15e-07 1.29e+00  0.00e+00     0s
       4   1.12918373e+02  1.19265495e+02  1.41e-07 1.23e+00  0.00e+00     0s
       5   1.12779810e+02  1.17570389e+02  1.81e-07 9.07e-01  0.00e+00     0s
       6   1.12761005e+02  1.17287230e+02  4.85e-07 8.53e-01  0.00e+00     0s
       7   1.12731301e+02  1.16797017e+02  4.03e-07 7.61e-01  0.00e+00     0s
       8   1.12714344e+02  1.16486488e+02  4.18e-07 7.03e-01  0.00e+00     0s
       9   1.12698329e+02  1.16166539e+02  8.34e-07 6.44e-01  0.00e+00     0s
      10   1.12676556e+02  1.15674203e+02  1.44e-06 5.53e-01  0.00e+00     0s
      11   1.12658541e+02  1.15191211e+02  2.18e-06 4.64e-01  0.00e+00     0s
      12   1.12623497e+02  1.13730180e+02  3.47e-06 1.99e-01  0.00e+00     0s
      13   1.12615759e+02  1.12811990e+02  2.30e-06 3.48e-02  0.00e+00     0s
      14   1.12615502e+02  1.12634995e+02  4.54e-07 3.45e-03  0.00e+00     0s
      15   1.12615500e+02  1.12617433e+02  7.34e-08 3.42e-04  0.00e+00     0s
      16   1.12615501e+02  1.12615693e+02  1.59e-08 3.39e-05  0.00e+00     0s
      17   1.12615501e+02  1.12615521e+02  2.17e-09 3.36e-06  0.00e+00     0s
      18   1.12615501e+02  1.12615504e+02  1.81e-10 3.33e-07  0.00e+00     0s
      19   1.12615501e+02  1.12615502e+02  2.63e-11 3.45e-08  0.00e+00     0s

    Barrier solved model in 19 iterations and 0.30 seconds (0.29 work units)
    Optimal objective 1.12615501e+02

    Optimized for date: 2015-07-02
    Processed date 2015-07-02 (2/2)

    0
  • Riley Clement
    Gurobi Staff Gurobi Staff

    Hi Elias,

    Setting NumericFocus to higher values can be a good idea, but it won't remove the warning.  NumericFocus does not reduce large coefficient ranges, it only increases the effort made to try and prevent numerical issues from occurring (usually at the expense of slowing the solver).

    We have a section in the manual called Does my model have numerical issues? which may help.  It is part of the Guidelines for Numerical Issues chapter.  It's a little dry, the following video from our YouTube is a bit more palatable in my opinion and is certainly worth a watch: How to avoid numerical issues in optimization models.

    I am unsure whether to initialize objective as QuadExpr or LinExpr.

    If you initialize a gurobi.LinExpr but then add quadratic terms it will be promoted to a QuadExpr, so it shouldn't make a difference.  Since it is ultimately a QuadExpr though I would initialize it as this, for clarity of code.

    - Riley

    0

サインインしてコメントを残してください。