QLite seemed to perform reliably on the system described above using RedHat 6.2. However, on a larger system (70 nodes) with RedHat 7.3, it appears to be unreliable, with the queue locking daemon freezing for no apparent reason. Attempts to fix the problem under 7.3 suggest that changes in the OS are responsible, rather than the larger number of machines in the cluster. Note also that under RedHat 7.3 you must compile with -static. In the Makefile, change the line:
LINK = cc
to:
LINK = cc -static
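If you would rather script the Makefile change than edit it by hand, something like the following should work. This is only a sketch: the function name link_static is made up for illustration, and it assumes the line reads exactly "LINK = cc" with single spaces around the equals sign. It avoids sed's in-place option, which older versions of sed do not provide.

```sh
# Hypothetical helper (not part of QLite): switch the LINK line to static linking.
# Assumes the Makefile contains a line reading exactly "LINK = cc".
link_static() {
    makefile="${1:-Makefile}"
    # Write to a temporary file first so a failed sed cannot truncate the Makefile.
    sed 's/^LINK = cc$/LINK = cc -static/' "$makefile" > "$makefile.new" &&
        mv "$makefile.new" "$makefile"
}
```

Run it as "link_static Makefile" from the source directory and then rebuild. Running it a second time leaves the file unchanged, because the pattern only matches the unmodified line.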
Before investing too much time in fixing problems in QLite, consider the Sun Grid Engine as an alternative.
QLite is a very simple batch queueing system for distributing jobs across a farm of machines. It does not have the advanced features of a full batch queueing system such as NQS, DQS, or PBS, but it is very simple to install and use and has very low latency between jobs, making it ideal for queueing many thousands of short jobs. It has been run very happily with some 60,000 jobs distributed across 8 processors on 6 machines.
It consists of the following programs:
- Submits a script file to the QLite queue(s)
- Lists jobs in the QLite queue(s)
- The daemon run on each node which performs processing of jobs
- Shuts down the daemon on a node after waiting for the current job to finish.
- Suspends the daemons on all nodes after waiting for the current job to finish.
- A daemon to manage locking between the nodes.
QLite is freely available for use by not-for-profit organisations, and by commercial organisations provided they inform the author that they are using it. It may not be distributed without the author's permission; copies must be obtained from this site.
QLite is supplied as a gzipped tar file of source code.