INTRODUCTION

The Linux cluster gibbs has 1 master node and 9 slave nodes with a total of 20 Intel Xeon 2.4 GHz processors, providing parallel processing with a peak performance of 52 billion floating-point operations per second. The cluster has 12 GB of distributed memory and 300 GB of storage space. It also provides centralized file server capabilities and a tape library with 400 GB of storage space for archiving and securing data. Gibbs is connected to the UTMB network, so other UNIX workstations or PCs on or off campus can easily access the cluster.

Gibbs file system
The 9 slave nodes, named node1, node2 ... node9, are connected to the master node through a 100/1000 Mb/s switch box inside the cluster enclosure and share the master's file systems via NFS.
The master's /usr/local, /home and /data are mounted on every slave node, and /data holds all users' login home directories. Users can run the df command to view the mounted file systems.

Example df output on node1:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1              2063504    171176   1787508   9% /
/dev/hda7             18421552  12344632   5141136  71% /scratch
/dev/hda2             10317860     32848   9760892   1% /tmp
/dev/hda3              4127108   1697852   2219608  44% /usr
/dev/hda6               505605     53369    426132  12% /var
master:/usr/local      8064304   4798052   2856596  63% /usr/local
master:/home           8198780   1197648   6584652  16% /home
master:/data         356193944  85939788 252160552  26% /data
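
To pick out only the file systems served by the master from a listing like the one above, the df output can be filtered on its first column. This is a minimal sketch, assuming NFS entries appear as host:/path as in the example; the function name nfs_mounts is made up for illustration and is not an installed gibbs utility.

```shell
# nfs_mounts: read df-style output on stdin and print "source mountpoint"
# for every entry whose Filesystem column looks like host:/path.
# (Hypothetical helper; mount points containing spaces are not handled.)
nfs_mounts() {
    awk 'NR > 1 && index($1, ":/") > 1 { print $1, $NF }'
}

# Usage on a node: -P keeps each file system on a single line.
df -P | nfs_mounts
```

On node1 in the example above this would list the three master: entries and skip the local /dev/hda partitions.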
 
All user application programs are installed in /home/local or /usr/local, so they are available when working on any slave node.


GIBBS LOCAL NODES ACCESS HOW TO

(Text in bold red italics is input typed by the user.)

Access to a slave node:
Once you have logged on to gibbs, you can log in to any of its slave nodes.

To log on to a local node:
ssh node# (# goes from 1 to 9). You can then work on that node as if it were a workstation, and you can open as many xterms as you like on it.

To leave a node:
exit
This returns you to where you were before logging in.


 

Running commands remotely
From any node (including the master node) you can run a command on a remote node without logging in to it.

Examples:

List files and directories of /scratch/people on node6:
ssh node6 ls /scratch/people

To check node9 uptime:
ssh node9 uptime

To check node9 process status:
ssh node9 ps -efl | grep myprocess

To delete node9 /tmp/myfile:
ssh node9 rm -rf /tmp/myfile

To use scp to copy a file from the current directory to a directory on node7 (/scratch/people/myhome):
scp myfile node7:/scratch/people/myhome/

To transfer files via sftp between your working node and a remote node:
sftp node#
(# goes from 1 to 9; use the usual ftp commands such as put and get to move files between the nodes)
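
When the same command has to be run on several nodes, the ssh form shown above can be wrapped in a loop. The following is a minimal sketch, assuming the node names node1 to node9 used in this document; the helper name on_nodes and the DRY_RUN variable are hypothetical. The dsh program described in the next section does this job for you.

```shell
# on_nodes FIRST LAST COMMAND...
# Run COMMAND on nodeFIRST .. nodeLAST, one node after another.
# With DRY_RUN=1 the ssh command lines are printed instead of executed.
on_nodes() {
    first=$1
    last=$2
    shift 2
    i=$first
    while [ "$i" -le "$last" ]; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh node$i $*"
        else
            # -n keeps ssh from reading the loop's standard input
            ssh -n "node$i" "$@"
        fi
        i=$((i + 1))
    done
}

# Preview what would be run on nodes 1 to 3
DRY_RUN=1
on_nodes 1 3 uptime
```

Setting DRY_RUN back to 0 (or unsetting it) makes the same call actually contact the nodes.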


DSH HOW TO

dsh is a program that runs a single command on multiple cluster nodes or computers at the same time. It is written in Perl and C.
For details on how to run dsh, log on to gibbs and run man dsh.



Examples:
(Text in bold red italics is input typed by the user.)

A) Viewing directories or file names at different slave nodes

    List all *.exe files at /scratch/people/my_home_dir on all gibbs nodes:
    dsh -a 'ls /scratch/people/my_home_dir/*.exe'

    List the /scratch/people directory on nodes 6, 7 and 9:
    dsh -w node6,node7,node9 'ls -l /scratch/people' (note: -l is an option of ls, not of dsh)


B) Copying, moving and deleting directories or files to or from slave nodes

    Copying files to all slave nodes with the -ra options to keep the file attributes
    dsh -a 'cp -ra ~/myhome/myfile /scratch/people/myhome'
    dsh -a 'cp -ra /home/people/myhome/myfile.* /scratch/people/myhome'

  

    Or use scp (with -rp to keep the file attributes) if the master's and slaves' file systems cannot see each other
    dsh -a 'scp -rp master:/etc/myfile /etc'

    Copying files with updated file attributes (no -a option) to slave nodes 1 and 3
    dsh -w node1,node3 'cp /home/people/myhome/myfile.* /scratch/people/myhome'

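    Copying a file from slave node9 to the master node, into your current directory
    dsh -w node9 'cp /scratch/people/myhome/myfile .'  (the . is resolved on node9, so this only reaches the master if that directory is NFS-shared; you might not need it)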

    Deleting files or directories from all slave nodes with the -rf options
    dsh -a 'rm -rf /scratch/people/myhome/my_junk.*' (you may have problems without the -rf options)

    Deleting files from some nodes with the -rf options
    dsh -w node1,node2,node3 'rm -rf /scratch/people/myhome/my_junk.*'
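
Because rm -rf gives no second chance, it can help to list what a pattern matches on each node before deleting it. This is a minimal sketch, assuming dsh accepts -a followed by a quoted command as in the examples above; the function name preview_delete and the DSH override variable are hypothetical.

```shell
# preview_delete PATTERN
# List what PATTERN matches on every node, then ask before removing it.
# Set DSH to another command name to substitute a stand-in for dsh.
preview_delete() {
    pattern=$1
    ${DSH:-dsh} -a "ls -d $pattern"
    printf 'Delete these on all nodes? [y/N] '
    read -r answer
    case $answer in
        y|Y) ${DSH:-dsh} -a "rm -rf $pattern" ;;
        *)   echo "Nothing deleted." ;;
    esac
}

# Intended use on gibbs (not run here):
#   preview_delete '/scratch/people/myhome/my_junk.*'
```

The pattern is passed in single quotes so that it is expanded on the slave nodes rather than by the local shell.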


C) Viewing and appending file contents to or from slave nodes

    Viewing a file's contents with the cat, head and tail commands on some or all nodes
    (more cannot be used)

    dsh -w node1 'cat /scratch/people/myhome/myfile'
    dsh -a 'head /scratch/people/myhome/myfile'
    dsh -w node5,node1 'tail /scratch/people/myhome/myfile'

    Appending the contents of a file on a slave node to a file in your current directory using the cat command
    dsh -w node1 'cat /scratch/people/myhome/myfile >> myfile1'
    dsh -w node1 'cat /scratch/people/myhome/myfile' >> myfile1 (it runs, but it is wrong: here the redirection is done by your local shell on dsh's output, not on node1)
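
The difference between the two commands above is which shell performs the >> redirection: inside the quotes it is the shell on node1, outside the quotes it is the shell where dsh was started. The effect can be reproduced without a cluster. In this sketch sh -c stands in for the remote shell and two temporary directories stand in for the remote and local working directories; run_remote and all file names are illustrative only.

```shell
# Two scratch directories play the roles of the remote and local working dirs.
work=$(mktemp -d)
mkdir "$work/remote" "$work/local"
echo "some data" > "$work/remote/myfile"

# run_remote: stand-in for "dsh -w node1 CMD"; runs CMD in a separate shell
# whose working directory is the "remote" one.
run_remote() {
    ( cd "$work/remote" && sh -c "$1" )
}

cd "$work/local"

# Redirection inside the quotes: performed by the "remote" shell,
# so myfile1 is created in the remote directory.
run_remote 'cat myfile >> myfile1'

# Redirection outside the quotes: performed by the local shell,
# so myfile2 is created in the local directory.
run_remote 'cat myfile' >> myfile2

ls "$work/remote" "$work/local"
```

The listing shows myfile1 next to myfile in the remote directory and myfile2 alone in the local one.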


D) File pattern searching and comparing

    Using grep to search for patterns in files on slave nodes

    dsh -w node1 'cat /scratch/people/myhome/myfile | grep "my pattern"'
    dsh -a 'cat /scratch/people/myhome/myfile.* | grep "my pattern"'

    Using diff or cmp to compare files on slave nodes
    dsh -w node1 'diff ~/myfile /scratch/people/myhome/myprogram/myfile'
    dsh -w node1,node2 'diff /scratch/people/myhome/myfile /scratch/people/myhome/myprogram/myfile'
    Note: the second diff compares the two files on each node separately; it does not compare files between the nodes.
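
To compare copies of a file that live on different nodes, one option is to have each node print a checksum and then compare the checksums. This is a minimal sketch, assuming md5sum is installed on the nodes and that each line of dsh output contains the 32-character checksum somewhere on it; the helper name same_everywhere is hypothetical.

```shell
# same_everywhere: read checksum lines on stdin (one per node) and succeed
# only if every line carries the same MD5 checksum.
same_everywhere() {
    # Pull out the 32-hex-digit checksums, whatever surrounds them on the
    # line (node name, file name), then count the distinct values.
    distinct=$(grep -oE '[0-9a-f]{32}' | sort -u | wc -l | tr -d ' ')
    [ "$distinct" -eq 1 ]
}

# Intended use on gibbs (not run here):
#   dsh -w node1,node2 'md5sum /scratch/people/myhome/myfile' | same_everywhere \
#       && echo "identical" || echo "different"
```

A file that is missing on one node produces no checksum line for that node, so this check alone does not detect a missing copy.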


E) Other miscellaneous uses of dsh on slave nodes

    Checking process status of slave nodes
    dsh -a 'ps -efl | grep myprogram'
    dsh -w node1,node2,node3 'ps -efl | grep myprogram'

    Checking node uptimes
    dsh -a 'uptime'

    Checking the total number of slave nodes gibbs has
    dsh -a -q
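
One quirk of the ps | grep checks above: the grep process carries the search word in its own command line, so it tends to show up in the results. A common workaround is to put the first letter in brackets, because the pattern [m]yprogram matches the text myprogram but not the literal text [m]yprogram. A minimal sketch; the helper name self_excluding is hypothetical.

```shell
# self_excluding NAME
# Turn NAME into a grep pattern that does not match the grep command line
# itself, e.g. myprogram becomes [m]yprogram.
self_excluding() {
    first=$(printf '%s' "$1" | cut -c1)
    rest=$(printf '%s' "$1" | cut -c2-)
    printf '[%s]%s\n' "$first" "$rest"
}

pattern=$(self_excluding myprogram)
echo "$pattern"

# Intended use on gibbs (not run here):
#   dsh -a "ps -efl | grep '$pattern'"
```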

F) Interactive distributed shell
dsh can also be used interactively by running the dsh command with an option that selects the nodes you want it to operate on. After entering the dsh shell you will see a dsh prompt where you can type UNIX commands as you normally would under sh; every command you enter is run on all the nodes you selected. Note: commands that require user input to continue, such as more, vi and ftp, will not run properly.

    Running dsh on all nodes
    dsh -a

    Running dsh on nodes 3 and 5
    dsh -w node3,node5 (You can use the pwd command to confirm the nodes you have selected.)

DSH and SCP HOW TO


You can combine dsh and scp to copy or move files between file systems that are not NFS-mounted.
To copy a file from the master's /etc directory to the /etc directory of all nodes:

dsh -a "scp master:/etc/passwd /etc/passwd"

If, as the root user, you see "Permission denied, please try again", first copy the file to /home/mydir, which can be seen by all nodes, and then run
dsh -a "scp /home/mydir/passwd /etc"
to update the passwd file on all nodes.



UNIX COMMANDS FOR SYSTEM RESOURCES

To check system resources such as CPU, memory, disk, uptime and who is using CPU time:
cpu (follow the instructions; this is a locally written utility shell script. It is buggy, so please report any problems you see to yuxu@utmb.edu)

To kill a process on all nodes:
gkill processname (kills every process matching processname on all nodes)