Model Creation Hangs
Hi All,
Following on from my previous post (Finding Eulerian Path in a Directed Graph With Minimal Edge Addition), my code has been working fine on my laptop and via SLURM on a cluster when run on sample data.
However, when I load the full data and run it in batch mode (i.e. with sbatch), it seems to hang on the creation of the model. If I run it in interactive mode, it works. I've traced it with print statements and the code gets no further than:
m = gp.Model("EPath")
In my batch job log file I get the print statement that is before this line, then the license file and TokenServer messages but not the print statement output that follows this line.
What is curious is that this is just the creation of an empty model, right? The next line of code, which prints that I've set the model name, is never executed (I left it for 24 hours once). So it appears to have nothing to do with the data or the size of the model, since the code never reaches the first variable declaration.
I've got plenty of memory available, and memory usage creeps up only very slowly. Has anyone encountered this or a similar issue?
My code is below.
Many thanks
Stuart
import time

import gurobipy as gp
from gurobipy import GRB

# NOTE: `start` (a time.perf_counter() value) is assumed to be set by the calling script.

def has_gurobi_sol(gr_csr):
    print("In gurobi function")
    var_edges = gr_csr.shape[1]
    var_nodes = gr_csr.shape[0]
    print("Calling model creation")
    m = gp.Model("EPath")
    print("Set model name")
    # Define the variables required
    x = m.addMVar(shape=var_edges, lb=1, vtype=GRB.INTEGER, name="x")
    y = m.addMVar(shape=var_nodes, lb=-GRB.INFINITY, vtype=GRB.INTEGER, name="y")
    print("Defined Gurobi Vars")
    print("Elapsed time {:.2f}".format((time.perf_counter() - start) / 60))
    # Define the constraints
    m.addConstr(y == gr_csr @ x, name="MatrixMul")
    m.addConstr(y.sum() == 0, name="SumEqZero")
    m.addConstr(y @ y <= 2, name="TwoOnesOnly")
    print("Starting optimisation")
    print("Elapsed time {:.2f}".format((time.perf_counter() - start) / 60))
    m.setObjective(x.sum(), GRB.MINIMIZE)
    m.optimize()
    # Defaults returned when no optimal solution is found
    dupes = "n/a"
    sol = []
    if m.status == GRB.OPTIMAL:
        dupes = m.PoolObjVal - var_edges
        sol = m.getAttr('x')
    return (m.status, dupes, sol)
-
Official comment
This post is more than three years old. Some information may not be up to date. For current information, please check the Gurobi Documentation or Knowledge Base. If you need more help, please create a new post in the community forum.
-
Hi Stuart,
So you are saying that the code works well on your local machine but runs into some issues when run in sbatch mode on the cluster? Are you able to get more output from the batch script to see whether some error occurs?
Could you try creating an environment and attaching your model to it?
myEnv = gp.Env()
m = gp.Model("EPath", env=myEnv)
Then, after the optimization is complete and you have retrieved all information of interest, you have to dispose of the model and the environment:
m.dispose()
myEnv.dispose()
Does this help?
Could you try executing one of the Gurobi examples?
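The create-then-dispose pattern suggested above can be wrapped so that disposal happens even if the solve raises an exception. The following is only a sketch: `run_with_disposal`, `make_env`, `make_model` and `work` are hypothetical names that do not come from the thread or from gurobipy, and the only Gurobi calls assumed are the ones already shown here (`gp.Env()`, `gp.Model(name, env=...)` and `dispose()`).

```python
def run_with_disposal(make_env, make_model, work):
    """Create an environment and a model, run work(model), then dispose both.

    work should return plain Python values (e.g. a list of floats) rather
    than Gurobi objects, because the model no longer exists afterwards.
    """
    env = make_env()
    try:
        model = make_model(env)
        try:
            return work(model)
        finally:
            model.dispose()  # dispose of the model first ...
    finally:
        env.dispose()  # ... then of the environment it was attached to


# Hypothetical usage with gurobipy (not executed here):
#
#   import gurobipy as gp
#
#   def solve(m):
#       ...  # add variables and constraints, call m.optimize()
#       return list(m.getAttr("x"))
#
#   solution = run_with_disposal(
#       gp.Env,
#       lambda env: gp.Model("EPath", env=env),
#       solve,
#   )
```

Passing the constructors in as arguments keeps the helper independent of a Gurobi installation, so the cleanup order can be checked with stand-in objects before running on the cluster.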
Best regards,
Jaromił
-
Hi Jaromił,
Thanks very much for the tips. No joy. I have got further, however, and I am now stuck at a different place.
It is now producing a solution. However, when I try to extract the value of my solution vector and return it from the function to the main program, it hangs. I have tried:
solution=m.getAttr("x")
solution=x.getAttr("x")
solution=[v.x for v in m.getVars()]
solution=x.x
However, the function return never executes. I've tried to use gdb, but it does not have the Python extensions, so it only shows me the C frames and there is almost nothing to see. If, instead of trying to pass anything back, I print the solution, e.g. simply print(x.x), it shows me the variables and their values, and the function does return. So I know it's in there :)
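When gdb cannot show Python frames, the standard-library `faulthandler` module is one alternative for seeing where a seemingly hung process is. This is a sketch under assumptions: the helper names `start_hang_watchdog` and `stop_hang_watchdog` are made up for illustration, and nothing in the thread indicates that this was tried.

```python
import faulthandler
import sys


def start_hang_watchdog(timeout_s, log_file=None):
    """Dump the Python traceback of every thread each timeout_s seconds.

    log_file must be a real file object (it needs a file descriptor) and
    must stay open until the watchdog is stopped. Defaults to sys.stderr.
    """
    target = sys.stderr if log_file is None else log_file
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)


def stop_hang_watchdog():
    """Cancel the periodic traceback dumps."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `start_hang_watchdog(600)` before the suspect function and `stop_hang_watchdog()` after it would write a traceback to the job's error log every ten minutes while the call is still running, showing which Python line is executing at that moment.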
It feels like I am missing something that may be obvious, e.g. that there is a particular method to use when you want to pass the solution out of a function call?
Any further help greatly appreciated.
Stuart
-
Hi Stuart,
Good to hear that you are making progress.
What exactly do you mean by "it hangs"?
All 4 ways of getting the solution vector you described should work and return a list of double values. You are saying that you are able to print it but it cannot be returned to another function, correct? Could you provide a minimal working example? I tested a very basic function call and it works:
def test(model):
    solution = model.getAttr("x")
    return solution

# construct and optimize model
# [...]
# get solution
s = test(model)
print(s)
Best regards,
Jaromił
-
Thanks so much for the replies and help. However, it seems it was all a red herring...
I had tried to rewrite things, for example by saving the state to disc at a certain point, then reloading and resuming in another script. When this also "hung", I walked away in frustration... The next day, it had completed and generated the sequence I was hoping for.
For some reason I can't explain (and I am continuing to see this and other curiosities when running code on SLURM), when a certain set of operations on large data follows, the logging stops at some earlier point in the code, even in a previous function. I think the improvements I had made did allow it to complete more quickly, which is why it was done the next day.
Also just FYI - I also see huge pauses in the execution. Yesterday a "running" program seemed to wait for 8 hours. The optimisation had completed in seconds, but the message to report this was not logged for a further 8 hours. All the timestamps generated by the following code execution (including the resulting files written) suggest it just did nothing for that time.
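One possible contributor to log messages appearing late, which the thread does not confirm, is output buffering: when Python's standard output is redirected to a file, as in a batch job, it is block-buffered rather than line-buffered, so print output can lag well behind the code that produced it. A minimal sketch of a flushing, timestamped print follows; the helper name `log` is hypothetical.

```python
import sys
import time


def log(message, stream=None):
    """Print a timestamped message and flush it to the stream immediately."""
    target = sys.stdout if stream is None else stream
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{stamp}] {message}", file=target, flush=True)


log("Calling model creation")
```

Running the interpreter with `python -u` or setting `PYTHONUNBUFFERED=1` has a similar effect without changing the code. Buffering would not account for the output files themselves carrying late timestamps, so it may explain only part of what is described here.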
Anyway - thanks again for helping.
-
Hi Stuart,
Good to hear that you were able to make some progress.
Could you elaborate on when exactly the logging stops? Do you have a log output you could share? Is it possible that the cluster you are using via SLURM is full and queueing kicks in, so that your jobs have to wait for quite some time? After you have solved a model, do you dispose of the model and the environment?
Best regards,
Jaromił
-
I had the same issue with Gurobi hanging on model creation when running on a SLURM cluster. The recommendation to do
gp_env = gp.Env()
model = gp.Model(env=gp_env)
(while also doing dispose()) fixed my problem.