Large MILP model on HPC
Awaiting user inputHi,
I have a MILP model for the unit commitment and dispatch of a district heating plant. The model is built in Matlab and later sent to Gurobi. The mat file and mps file of the model can be found here Gurobi in HPC.
I managed to solve simpler versions of this model in my computer. However, it looks like the presence of thermal storage, which allows more possible production patterns, and the existence of a production unit with economies of scale (linearized with piecewise linear function) makes the space of solutions explode. Since I could not solve it with my personal laptop, I have tried to use the HPC cluster available at my University https://www.lunarc.lu.se/systems/cosmos/.
The cluster uses slurm to schedule jobs and I have written the following instructions:
#SBATCH N 1 # Asking for 1 node
#SBATCH ntasks=1
#SBATCH cpuspertask=48 #A good practice is to request as many threads as available cores on the node
Furthermore, in parameters for Gurobi, I have also modified the number of threads property to match the number of cpus per task.
The previous instructions are based on the little I could find on the internet. However, I have no idea whether they are reasonable or not.
Once the job starts executing Gurobi starts running. However, after two days runs out of memory as shown below (The slurm output file with all the information can be found in the same folder as the models):
/var/spool/slurm/slurmd/job478223/slurm_script: line 25: 333245 Killed matlab batch "District_Heating_and_Cooling_Opimization_Bilbao_WH_cap_op_v1"
slurmstepd: error: Detected 1 oomkill event(s) in StepId=478223.batch. Some of your processes may have been killed by the cgroup outofmemory handler.
In addition, I am not sure if it is relevant but there seems to be a problem between Matlab 2023 and Gurobi 10.0.1 although I previously tried to solve simpler problems and this did not seem to be an issue. More recently, the responsible for the cluster has installed Gurobi 9.5, which he says should be compatible with Matlab 2023.
Inactive Modules:
1) Gurobi/10.0.1 3) SQLite/3.39.4 5) XZ/5.2.7 7) libreadline/8.2
2) Python/3.10.8 4) Tcl/8.6.12 6) libffi/3.4.4
Due to MODULEPATH changes, the following have been reloaded:
1) GMP/6.2.1 2) bzip2/1.0.8 3) ncurses/6.3
The following have been reloaded with a version change:
1) GCCcore/12.2.0 => GCCcore/10.3.0 3) zlib/1.2.12 => zlib/1.2.11
2) binutils/2.39 => binutils/2.36.1
With this information, I would really appreciate if anyone could help me out with the following questions.
1. Are there other "settings" for the cluster, I could modify in order to reach a solution (and ideally faster than a week)? In general, it would be nice to have some straightforward guidelines of what to choose.
2. Is there any modification of the problem that could ease its resolution? I could shorten the length of the dispatch period reducing the number of variables but I would rather avoid that if possible.
Thank you very much for the help!!!

Hi Luis,
Thank you for sharing the model.
Are there other "settings" for the cluster, I could modify in order to reach a solution (and ideally faster than a week)? In general, it would be nice to have some straightforward guidelines of what to choose.
For this it is best to contact the HPC IT responsible and discuss with them. After how much time did the machine run out of memory? You could try the SoftMemLimit parameter to still be able to retrieve the feasible solution when hitting a memory limit. You could also try reducing the number of Threads to reduce memory consumption.
Is there any modification of the problem that could ease its resolution? I could shorten the length of the dispatch period reducing the number of variables but I would rather avoid that if possible.
I think there may be a few things to try.
I see that in your model you have constraints of the form
R736219: C315527  0.1666666666666667 C788787 <= 0
You can rewrite those as
R736219: 6 C315527  C788787 <= 0
It is best to use integer coefficient as much as possible to avoid numerical inaccuracies. Maybe you can apply this transformation to other constraints as well.
I see a long list of constraint of the form
R52584: C52584  3.120769070299958 C788785 <= 0
R52585: C52585  3.120633981121157 C788785 <= 0
R52586: C52586  3.120499003754496 C788785 <= 0
R52587: C52587  3.12036413824225 C788785 <= 0
R52588: C52588  3.12022938462588 C788785 <= 0
R52589: C52589  3.120094742947666 C788785 <= 0
R52590: C52590  3.119960213248872 C788785 <= 0Are these approximating a nonlinear (possibly quadratic) convex function? The issue with such constraints usually is that they differ only very little in the only coefficient. The coefficient difference in ~1e4 which is very close to the default FeasibilityTol value. This may cause numerical trouble during the solution process.
Then there are constraints
R1367884:  C316177 + C368761  0.9999853078790991 C526512
+ 1.000014692120901 C526513 = 0.0104913986666371The two coefficients on the LHS are both almost one. The difference between 1 and the coefficients is ~1e5 which again is very close to the default FeasibilityTol value. How are these coefficients computed? Are these coefficients actually reciprocals? Then you could do the same transformation as above for R736219. In general, I recommend having a look at our Guidelines for Numerical Issues.
There is also another approach you might want to try. You can first run the model with SolutionLimit=1. This will tell Gurobi to stop as soon as a feasible solution has been found. You then save this solution point either via API calls or via the ResultFile parameter. Then you start the optimization process again but you provide the previously found solution as a MIP start, see How do I use MIP start? In addition, you set the NoRelHeurTime parameter to run the socalled No Relaxation Heuristic. This heuristic focuses on finding and improving feasible points without the need of a B&B tree. Since, you provide a feasible point, this heuristic's main goal would be to improve the given solution point. You could set the time used for the heuristic to something like NoRelHeurTime=7200, which would make the heuristic run for 2 hours. Maybe running for even longer makes sense. Setting Method=2 makes sense for your model as well, because the Barrier algorithm performs best on the root relaxation.
Best regards,
Jaromił0 
Hi Jaromił,
Thank you for your answer!
I try to answer your questions below.
After how much time did the machine run out of memory? It ran out of memory after roughly two days and a half (2 days and 14:37:14 for the entire job; the preparation of the problem takes just a few minutes).
Regarding the HPC, I will try to ask about it to the manager. However, there is something that you might be able to help me out with. I restricted the number of nodes to one; would I gain anything by increasing the number of nodes? What I could see so far in another model, is that when I moved from
#SBATCH N 1 # Asking for 1 node
#SBATCH ntasks=1to
#SBATCH N 1 # Asking for 1 node
#SBATCH ntasks=1
#SBATCH cpuspertask=48I experienced a drastic reduction of time. From a couple of hours to reach a MIP gap of 1% to just a couple of minutes. Furthermore, in the first option, it looked like only the memory of one core was used (5300 MB), rather than the entire node.
Regarding your suggestions on model improvements, I have a few questions.
1. The constraints of this type:
R52584: C52584  3.120769070299958 C788785 <= 0
They represent the following constraints:
\(Q_{{heat \ pump}_{heat_{river_i}}}  COP_i \cdot P_{hp} \leq 0 \)
Q_hp_heat_river_i represents the heat production of a heat pump in time step i. The COP_river_heat_i is the Coefficient of Performance of a heat pump (the efficiency), which is precalculated depending on several characteristics and varies every hour, and P_hp is the electric capacity of the heat pump.
I took all the decimals from the COP calculation, but it would be perfectly fine to round it to two decimals if that improves performance.
There is nonetheless a convex function, which is represented by the following equations, whose basis I took from slides 2126 of https://moodle2.units.it/pluginfile.php/383444/course/section/90792/MathOpt21_Lecture21_210415.pdf:
1. In rows 525842:1:525851, equations of the type:
\(\begin{align} y_1 \leq b_1 \\ y_n \leq b_{n1} + b_n \\ y_p \leq b_{p1} \end{align} \)
, where \(y_{n}\) are the weights \([0,1]\) that are used to approximate a power function with an exponent close to 0,5. \(b_{n}\) are binary variables.
2. Additionally, there are the following equality constraints, which correspond to rows 1472382, 1472383 and 1472384.
\( \sum_{n=1}^{p} \left( HT_n \cdot y_n \right )  Q_{WH_{max}} = 0 \)
\( \sum_{n=1}^{p} y_n = 1 \)
\( \sum_{n=1}^{p1} b_n = 1 \)
I have solved the model without the previous convex function in a matter of minutes. However, the introduction of that convex function plus another set of equations to represent a thermal energy storage for cold water, seems to be difficult.
Regarding your second question, the equation that you point out represents the loading, unloading and loss of a thermal energy storage as shown in the equation below.
\( Q_{TES_{heat_{in_{i}}}} + Q_{TES_{heat_{out_{i}}}} + \left( \frac{1}{2} \cdot \eta_{2_{heat}} + 1\right) \cdot TES_{heat_{i}} + \left( \frac{1}{2} \cdot \eta_{2_{heat}}  1\right) TES_{heat_{i1}} = \eta_{1_{heat}} \)
The equation could be simplified to the equation below. In that case, the example that you present would become:
\( Q_{TES_{heat_{in_{i}}}} + Q_{TES_{heat_{out_{i}}}} + TES_{heat_{i}}  \frac{1}{2} \cdot \eta_{2_{heat}} \cdot TES_{heat_{i1}} = \eta_{1_{heat}} \)
R1367884:  C316177 + C368761  0.000014692120901 C526512 + C526513 = 0.0104913986666371
Thank you too for the other suggestions. I will try to rewrite the equations using coefficients closer to the unity and will also try the other suggestions.
Best regards,
Luis
0 
Hi Luis,
After how much time did the machine run out of memory? It ran out of memory after roughly two days and a half (2 days and 14:37:14 for the entire job; the preparation of the problem takes just a few minutes).
This sounds acceptable. The number of nodes in the B&B tree probably grew to millions (or beyond) so it is expected that at some point the machine runs out of memory. If you have write permission, you could use the NodefileStart and NodefileDir parameters to avoid running out of memory. Gurobi will start to write nodes to a file. This will slow down the optimization process but it will avoid an outofmemory error.
The difference in performance is because
#SBATCH cpuspertask=48
allows more Threads to be used by Gurobi. I would for now stick to one node and use all CPUs a node has before using multiple nodes for one job. The reason for this is that often communication and synchronization between nodes is slow compared to communication and synchronization within a given node. I think 48 CPUs in a given node should be enough.
I experienced a drastic reduction of time. From a couple of hours to reach a MIP gap of 1% to just a couple of minutes
Given the complexity of the model, what MIPGap are you aiming for? Is a MIPGap of let's say 0.5% already enough for your application or do you really need a very tight MIPGap, i.e., a very tight proof of global optimality? It is always important to think about the tradeoff between quality of the solution, quality of the global optimality proof, and optimization performance. It is very often the case the proving the last fractions of % of MIPGap is the hardest part, so it might take hours or days to squeeze out the last 0.01% of MIPGap which would allow for convergence.
I took all the decimals from the COP calculation, but it would be perfectly fine to round it to two decimals if that improves performance.
In that case, it is better to keep all decimals. I just thought that the values come from a fraction like \(\frac{1}{6}\) so you could have multiplied the whole row by the denominator to get integer coefficients.
There is nonetheless a convex function, which is represented by the following equations, whose basis I took from slides 2126 of https://moodle2.units.it/pluginfile.php/383444/course/section/90792/MathOpt21_Lecture21_210415.pdf:
Unfortunately, I cannot access the slides.
that are used to approximate a power function with an exponent close to 0,5.
You can use the addGenConstrPow method together with the FuncNonlinear=1 parameter to directly handle the nonlinear term. This would avoid the introduction of binaries. If you have a nonlinear constraint of the form \(y^{0.5} \geq x\), then you could approximate \(y^{0.5}\) be it's tangents at \(n\) points. You would not need binary variables in this case because \(y^{0.5}\) is concave and thus the whole constraint \(y^{0.5} \geq x\) is convex. The approximation would look like
\[\begin{align*}
\frac{1}{2\sqrt(y'_i)}*y + (\sqrt{y'_i}\frac{1}{2\sqrt(y'_i)}*y'_i) \geq x \quad \forall y'_i \in Y_i
\end{align*}\]where \(Y_i\) is a set of approximation points in the domain of \(y\), the LHS of the constraints describe \(m\cdot y + b\) with slope \(m=\frac{1}{2\sqrt(y'_i)} = \frac{d}{dy}\sqrt(y) _{y'_i}\) and \(b = \sqrt{y'_i}\frac{1}{2\sqrt(y'_i)}\cdot y'_i\). Note that the number of approximation points \(y'_i\) decides about the accuracy of the approximation but also about the final runtime.
The equation could be simplified to the equation below. In that case, the example that you present would become:
R1367884:  C316177 + C368761  0.000014692120901 C526512 + C526513 = 0.0104913986666371
This looks better.
Best regards,
Jaromił0
Please sign in to leave a comment.
Comments
3 comments