Running Jobs on Clusters - Job Policy
Send comments to firstname.lastname@example.org
1. Job cleaning policy
To ensure efficient use of cluster computing resources and maintain the health of the system, the system cleaning tool will remove jobs submitted without the Condor job scheduler and jobs that have been running for more than 7 days, except those on open nodes and those protected by the sysadmin.
You are encouraged to break a big calculation into small pieces that can finish within 7 days, or write programs that can resume from a saved state.
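As an illustration of resuming from a saved state, here is a minimal Python sketch. The file name `checkpoint.json`, the function names, and the toy computation (summing integers) are all hypothetical; a real job would save whatever state it needs in order to continue. The `stop_after` parameter exists only to simulate the job being removed partway through.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical file name, not a cluster convention


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_state(path, state):
    """Write to a temporary file, then rename, so an interrupted write
    never leaves a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, n_steps, stop_after=None):
    """Sum the integers 0..n_steps-1, saving a checkpoint after every step.

    stop_after simulates the job being removed after that many steps
    in the current run; in that case None is returned.
    """
    state = load_state(path)
    done_this_run = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # interrupted before finishing
        state["total"] += state["next_step"]
        state["next_step"] += 1
        save_state(path, state)
        done_this_run += 1
    return state["total"]
```

Calling `run(CHECKPOINT, 10, stop_after=4)` and then `run(CHECKPOINT, 10)` completes the sum in two runs; the second call picks up at step 4 instead of starting over. In practice you would checkpoint less often than every step, at an interval that keeps the I/O cost small relative to the work done.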
2. Types of jobs
While most of the computing nodes are reserved for computing-intensive jobs, some nodes are open to interactive jobs and testing jobs.
Computing-intensive jobs are programs that can be left running on computing nodes with minimal supervision. These jobs usually take hours to days to complete.
Interactive jobs include most UNIX shell commands and programs with graphical user interfaces. These jobs interact with users at run time.
Testing jobs are programs under development. Developers need to launch test and debug code immediately to shorten the development cycle.
3. Open and reserved nodes
The following nodes are open to all users.
* The master node is open for user login, job submission, compiling serial jobs, and running interactive programs such as MATLAB and Mathematica.
* The Benny master node is also open for compiling and testing MPI programs, and for running Fluent and Evolver.
* Node 41 is open for testing MPI programs and running Fluent and Evolver.
Other nodes do not provide full access to regular users. Condor jobs are given the highest priority on the reserved computing nodes. There is no need to log in to these nodes.
* Master2 is the file server, license server, and Condor server of benny. Regular users are not granted login access.
* Nodes 1 to 40 are reserved for Condor serial jobs.
* Nodes 42 to 93 are available for both serial jobs and MPI jobs.
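The node ranges above can be summarized as a small lookup. This is only a sketch of the lists in this section; the function name and the returned labels are hypothetical, and the master nodes are not covered because they are not numbered.

```python
def node_role(n):
    """Return a short description of numbered computing node n,
    following the node lists in this section."""
    if not 1 <= n <= 93:
        raise ValueError("node number must be between 1 and 93")
    if n <= 40:
        return "reserved: Condor serial jobs"
    if n == 41:
        return "open: MPI testing, Fluent, Evolver"
    return "reserved: serial and MPI jobs"  # nodes 42 to 93


def accepts_mpi(n):
    """True if node n runs MPI programs (node 41 for testing, 42 to 93 via Condor)."""
    return 41 <= n <= 93
```

For example, `node_role(41)` returns the "open" label, while `accepts_mpi(12)` is `False` because nodes 1 to 40 are reserved for serial jobs.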
4. Submitting computing-intensive jobs with the job scheduler
Computing-intensive jobs must be submitted with the Condor job scheduler. The Condor documentation is here.
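To give a rough idea of what a submission involves, the Python sketch below builds a basic Condor submit description file for a serial job. The helper name, the program name `my_sim`, and the file names are hypothetical, and the `vanilla` universe is an assumption; this cluster may require other settings (MPI jobs in particular are configured differently), so check the Condor documentation before relying on it.

```python
def make_submit_description(executable, arguments="", tag="job"):
    """Return the text of a minimal Condor submit description file.

    Assumes a serial program run in the vanilla universe; output, error,
    and log file names are derived from the hypothetical 'tag'.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        f"arguments  = {arguments}",
        f"output     = {tag}.out",   # the job's standard output
        f"error      = {tag}.err",   # the job's standard error
        f"log        = {tag}.log",   # Condor's record of job events
        "queue",                     # submit one instance of the job
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the description to a file; 'my_sim' and 'input.dat' are placeholders.
    with open("run1.sub", "w") as f:
        f.write(make_submit_description("my_sim", "input.dat", tag="run1"))
```

The resulting file would then be passed to Condor from the master node with `condor_submit run1.sub`, and `condor_q` lists the jobs in the queue.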
If your interactive job requires more than 2 full CPUs and lasts longer than 4 hours, you need to contact the system administrator, who will look for an alternative solution that balances your needs and system performance.