figuring out resource requirements

Comments

5 comments

  • Silke Horn

    There are a few things I notice from your log:

    First of all, the error message from the scheduler says TIMEOUT. So I guess more time would help. :-)

    Why does it time out? Is there a hard limit on the duration of such a run on your cluster?

    Moreover, it looks as though your model is very difficult.

    According to your log, you have 2016314 integer and 0 binary variables, and your matrix range is [1e+00, 1e+00]. Is it possible that your integer variables are unbounded? Unbounded integer variables make a model extremely hard to solve, in particular when there is a huge number of them. If this is the case, you should either add bounds (as tight as possible) or relax the integrality requirement.

    In addition, you may be able to speed up the solver by setting the right parameters. To start, I think you could save some time by setting Method=2 instead of 4.

    Silke

  • Jennifer Gossels

    Hi Silke,

    Thanks very much for the response.  Yes, the university imposes a time limit of 168 hours.  I accidentally set this experiment to 150 hours, so there's a chance those extra 18 hours will make a big difference (I'm running it again with the longer time limit), but I'm not optimistic.

    By setting bounds on the variables, do you mean that I should tell the solver that the final values should lie within some known range?

    You suggest method=2 instead of method=3?  As I was writing the message above I noticed that the log said the concurrent spin time can be avoided by choosing method=3...

    Thanks again,

    Jennifer

  • Silke Horn

    Hi Jennifer,

    Yes, with the bounds I meant exactly what you say. Find an upper bound and a lower bound (if the lower bound is not already 0) for the integer variables. Otherwise, the bounds are assumed to be +/- 2 billion (2,000,000,000), and in almost all practical applications this is not reasonable.

    On the other hand, if an integer variable does need to have a very big range, you should reconsider whether it really needs to be integer, or whether you can remove the integrality condition and round the result at the end. E.g. if such a variable models an amount of money (say in cents) and needs a very large range (because your problem involves millions of dollars), then you can make it continuous and round to the nearest cent (or dollar, or multiple of 100 dollars) without affecting the real-world solution quality. Does that make sense?
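    For illustration, the rounding step at the end might look like this in Python (the solution value below is a made-up number, not from your model):

```python
# Hypothetical continuous solution value for a money variable, in cents.
raw_value = 1234567.8912

nearest_cent = round(raw_value)                         # nearest cent
nearest_100_dollars = round(raw_value / 10000) * 10000  # nearest $100, in cents

print(nearest_cent, nearest_100_dollars)
```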

    Setting good bounds or relaxing the integrality should make a much bigger difference for the running time than any parameter settings. (As an experiment, you could try making all your variables continuous and see whether this helps.)

    As for the method, 3 chooses non-deterministic concurrent (i.e., multiple algorithms in parallel), while 2 chooses the barrier. Setting it to 3 should provide a speedup, but since we already know from your first run that the barrier wins, you might as well set it to use the barrier only. (This could provide another, albeit probably small, speedup, since the barrier then does not have to share resources with the other algorithms.)
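    If you set parameters through a Gurobi parameter file rather than in code, this is a one-line entry (the file name is just an example):

```
# barrier.prm -- example Gurobi parameter file
Method 2
```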

    Silke

  • Jennifer Gossels

    Thank you so much for the very helpful explanation!  Am I correct that requesting more nodes or tasks per node will not help?  Do you think more memory will help?  The log says Factor NZ takes roughly 5.0 GB of memory, but the error message from the cluster says I used 84.66 GB.  I'm not sure if there is an easy way to convert the Factor NZ memory amounts to total amount of memory needed for the whole computation?

  • Silke Horn

    I think that more nodes or more tasks might make sense if you wanted to do concurrent optimization, but since the solver gets stuck in the root, I don't expect this to help.

    As for the memory, Gurobi does not log (or monitor) the total memory usage. So it is hard to say when these 84.66 GB were used, but I would guess this happened during the root relaxation (because of the barrier). Afterward, memory usage typically goes down significantly and then starts growing again. Can you get onto the machine and see (e.g. using top) how much memory Gurobi is using? If so, you should take note of these numbers and compare them for different times (e.g. towards the end of the root relaxation and afterward).
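    If top is awkward to use inside a batch job, a small script can log the solver's memory over time. This is a Linux-only sketch that reads /proc; on the cluster you would pass the Gurobi process ID rather than the script's own:

```python
import os

def rss_gb(pid):
    """Resident set size of a process in GB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / (1024 ** 2)  # kB -> GB
    return 0.0

# Example: check this script's own memory usage; substitute the solver's
# PID (e.g. found with `pgrep -f gurobi`) and call this periodically.
print(f"RSS: {rss_gb(os.getpid()):.3f} GB")
```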

