Tips'n'Tricks

Creating tarballs the parallel way

Dear Users,

Creating tarballs can occasionally test your patience. Compressing directories is especially useful prior to transfer or archiving. Well, this too can be done in parallel. We have just created a wiki entry on the topic.
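
For example, GNU tar can hand the compression off to a parallel compressor. A minimal sketch, assuming pigz (a parallel gzip implementation) is available, e.g. as a module; file and directory names are placeholders:

$ tar -c -I pigz -f mydata.tar.gz mydata/            # let tar call pigz for compression
$ tar -cf - mydata/ | pigz -p 8 > mydata.tar.gz      # or pipe explicitly, here with 8 threads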

Your HPC-Team

Archiving from Mogon II

Dear Users,

We have set up our archiving scripts on Mogon II, so that the setup is identical on Mogon I and Mogon II.

You will find further information in our wiki.

Your HPC Team

Troubleshooting 101

Dear Users,

In order to shoot your troubles (with job scripts), you first need to know what went wrong. Right? Alas, SLURM's output can be somewhat obscure when job scripts do not use proper job steps. Some hints can be found in our wiki.
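
To illustrate, here is a minimal sketch of a batch script that wraps the actual work in a proper job step (program name and resource values are placeholders):

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --time=00:30:00
#SBATCH --mem=2G

# launching the program via srun creates a proper job step,
# so SLURM's accounting reports its run time and memory separately
srun ./my_program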

Also, be sure to check for available modules rather than compiling standard software yourself. In case some software is missing, you may use our little form to ask for a software package or library to be installed. When you report issues with software we installed, at least we know what you are talking about.
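
As a reminder, the module system lets you check what is already installed (the module name below is just an example; actual names and versions may differ):

$ module avail      # list the software modules we provide
$ module load gcc   # load a module
$ module list       # show what is currently loaded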

Best,

Your HPC Team

Using mogonfs

Dear Users,

We have occasionally mentioned that file transfers can be a lot faster if no encryption (as with scp) is applied. And for quite a while the wiki stated "example will follow asap" in the section on ftp. Well, a comprehensive introduction to (l)ftp is not our goal, but at least now we have the promised example.
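
For orientation, a minimal lftp sketch (host name, user name and directory names are placeholders; the actual server to use is given in the wiki):

$ lftp -u username ftp.example.org
lftp> mirror -R localdir remotedir   # upload a directory (reverse mirror)
lftp> mirror remotedir localdir      # download a directory
lftp> exit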

Your HPC-Team

"Going Productive"

Dear Users,

Noticed a low turnover of your jobs? One potential - and alas not so infrequent - cause is requesting too many resources.

OK, what is "too much" with regard to resources (CPU time, RAM)? Of course, you do not want to see your jobs crashing because they hit the run time or memory limit. Therefore you rather ask for a little overhead, and this is what we recommend as well. After all: what is the point if you lose time and have to re-submit?

Yet asking for 3 GB when the first 1000 jobs each used less than 0.5 GB will cause you to occupy slots where other users' jobs - and also your own - could be running. This assumes, of course, jobs with only one or a few slots. If your jobs take multiple nodes, you will be waiting unnecessarily for nodes with more memory. (We sometimes observe memory requirement ratios which are even worse.)

Likewise for run time limits: always asking for the maximum run time of a queue will impair backfilling, the mechanism which attempts to use all potential CPU time. As a consequence, your jobs will be pending longer than needed.
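
On Mogon II (SLURM), for instance, a realistic limit can be set in the job script instead of relying on the queue maximum; the values below are purely illustrative:

#SBATCH --time=02:00:00   # what the jobs actually need, plus a modest safety margin
#SBATCH --mem=1500M       # rather than the maximum available memory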

So, instead of stuffing the queues with untested jobs, we ask you to test a few jobs first (which may well require a generous overhead in terms of run time and memory). Look for the actual run time in the LSF report and also the actual memory used (its maximum value). It is then still fine to "round up" those values, just to be safe. However, try not to be too cautious, as requesting too much will result in a slower workflow for you.
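
For SLURM jobs on Mogon II, for example, the corresponding values can be queried from the accounting database once a job has finished:

$ sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,State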

We reserve the right to point out problematic usage to you. But remember: we offer courses, individual counseling, etc. Just ask for our help if in doubt.

Your HPC team

Shellcheck

Dear Users,

We frequently see jobs dying because of faulty scripts. This is part of the development cycle. After all: who is perfect?

There are better ways than trying to correct the script post mortem - ways that save time, too. One is to just check the syntax without executing the script:

$ bash -n scriptname

Another very powerful tool is ShellCheck, an online shell script checker.
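
ShellCheck can also be installed and run locally (for instance from your distribution's package repositories):

$ shellcheck scriptname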

Your HPC-Team