[Images: profile, frontal, and top views]

Today is Sunday. Vancouver is sunny. It’s been quite a while since I last wrote anything. It took me a couple of weeks to finally get my taxes filed. Hmmm… Anyway, I’ve finally got some time to talk about Supercomputer.

This is going to be a series of 3 blog posts.

1. Compare All Raspberry Pi Variants

Refer to: Comparison of All Raspberry Pi Variants.

2. Four Raspberry Pis

Here, please make sure the 4 Raspberry Pis are assigned the following hostnames, respectively:

• pi01
• pi02
• pi03
• pi04
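The hostname itself is set on each Pi (e.g. with raspi-config). If you also want these names to resolve between the nodes, one option is a static /etc/hosts entry per node. A minimal sketch, assuming static IPs; the hostname-to-IP mapping is read off the hostfiles in section 3.1, and the helper name hosts_entries is hypothetical:

```python
# Sketch: render /etc/hosts lines so pi01..pi04 resolve on every node.
# ASSUMPTION: static IPs, taken from the hostfiles in section 3.1.
NODES = {
    "pi01": "192.168.1.253",
    "pi02": "192.168.1.251",
    "pi03": "192.168.1.249",
    "pi04": "192.168.1.247",
}


def hosts_entries(nodes: dict) -> str:
    """Return one '<ip>\t<hostname>' line per node, in /etc/hosts format."""
    return "".join(f"{ip}\t{name}\n" for name, ip in nodes.items())


if __name__ == "__main__":
    # Review the output, then append it to /etc/hosts on each Pi (needs sudo).
    print(hosts_entries(NODES), end="")
```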

2.1 pi01: Raspberry Pi 4 Model B Rev 1.4 8GB with Raspberry Pi OS (32-bit) with desktop 2020-05-27

In fact, at the very beginning, I tried the Pi4 64-bit Raspbian kernel, as follows:

However, since there are still quite a lot of issues with the Pi4 64-bit Raspbian kernel, I had to downgrade the system to Raspberry Pi OS (32-bit) with desktop 2020-05-27 in the end.

3. Raspberry Pi Cluster Configuration

This section draws heavily on the following blog posts:

Actually, the cluster can be configured however you wish. A typical configuration is 1 master with 3 workers, but which node should be the master? Is it really a good idea to ALWAYS designate the MOST powerful one as the master? This matters particularly in my case: my 4 Raspberry Pis are different models, so they have different computing capabilities.

3.1 Configure Hostfile

It’s always a good idea to create a hostfile on the master node. However, for the reasons mentioned above, there is NO priority among the nodes in my case, so I configured a hostfile on ALL 4 Raspberry Pis, each one listing its own IP address first:

Hostfile on pi01:
192.168.1.253 slots=4
192.168.1.251 slots=4
192.168.1.249 slots=4
192.168.1.247 slots=4

Hostfile on pi02:
192.168.1.251 slots=4
192.168.1.253 slots=4
192.168.1.249 slots=4
192.168.1.247 slots=4

Hostfile on pi03:
192.168.1.249 slots=4
192.168.1.253 slots=4
192.168.1.251 slots=4
192.168.1.247 slots=4

Hostfile on pi04:
192.168.1.247 slots=4
192.168.1.253 slots=4
192.168.1.251 slots=4
192.168.1.249 slots=4
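The per-node hostfiles above follow one pattern (the node’s own IP first, then the other three, 4 slots each), so they can be generated instead of typed by hand. A minimal sketch; the mapping is read off the hostfiles above and the helper name hostfile_for is hypothetical:

```python
# Sketch: generate each node's hostfile, own IP first, 4 slots per node.
NODES = {
    "pi01": "192.168.1.253",
    "pi02": "192.168.1.251",
    "pi03": "192.168.1.249",
    "pi04": "192.168.1.247",
}
SLOTS_PER_NODE = 4  # one slot per CPU core


def hostfile_for(node: str) -> str:
    """Return the hostfile text for `node`: itself first, then the others."""
    own_ip = NODES[node]
    others = [ip for name, ip in NODES.items() if name != node]
    lines = [f"{ip} slots={SLOTS_PER_NODE}" for ip in [own_ip] + others]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for name in NODES:
        print(f"# hostfile on {name}")
        print(hostfile_for(name))
```

Each printed block can then be saved as the hostfile on the corresponding Pi.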

3.2 SSH-KEYGEN

To run across multiple nodes of the cluster, we need SSH keys so that we don’t have to type a password every time we log into another node. So, on each Raspberry Pi, generate an SSH key with ssh-keygen -t rsa, and push the generated key onto the other 3 Raspberry Pis with ssh-copy-id. In the end, for a cluster of 4 Raspberry Pis, each of the 4 Raspberry Pis has 3 authorized keys (one for each of the other 3 Raspberry Pis) stored in /home/pi/.ssh/authorized_keys.
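The key distribution described above can be written out as the list of commands each node runs. A minimal sketch that only prints the commands (it does not execute anything); the user name pi is an assumption matching /home/pi, and the helper names are hypothetical:

```python
# Sketch: the one-off SSH setup per node, printed rather than executed.
# ASSUMPTION: every Pi uses the default "pi" account (home dir /home/pi).
NODES = {
    "pi01": "192.168.1.253",
    "pi02": "192.168.1.251",
    "pi03": "192.168.1.249",
    "pi04": "192.168.1.247",
}
USER = "pi"


def ssh_setup_commands(node: str) -> list:
    """Commands to run ON `node`: create its key, then push it to the 3 others."""
    cmds = ["ssh-keygen -t rsa"]
    cmds += [f"ssh-copy-id {USER}@{ip}" for name, ip in NODES.items() if name != node]
    return cmds


def authorized_key_owners(node: str) -> list:
    """Nodes whose public keys end up in /home/pi/.ssh/authorized_keys on `node`."""
    return [name for name in NODES if name != node]


if __name__ == "__main__":
    for name in NODES:
        print(f"# on {name}")
        print("\n".join(ssh_setup_commands(name)))
```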

4. Cluster Test

4.1 Command mpiexec

4.1.1 Argument: -hostfile and -n

For a cluster of 4 Raspberry Pis, there are 4 × 4 = 16 CPU cores in total, i.e. 16 slots in the hostfile. Therefore, the maximum value you can specify for the argument -n is 16. If you ask for more, you’ll get the following ERROR message:
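The limit is simply the sum of the slots=N values in the hostfile. A minimal sketch that checks a requested -n before launching, assuming every host line sets slots explicitly (as the hostfiles in section 3.1 do); the helper names are hypothetical:

```python
import re

PI01_HOSTFILE = """\
192.168.1.253 slots=4
192.168.1.251 slots=4
192.168.1.249 slots=4
192.168.1.247 slots=4
"""


def total_slots(hostfile_text: str) -> int:
    """Sum the slots=N values; assumes every host line sets slots explicitly."""
    total = 0
    for raw in hostfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = re.search(r"\bslots=(\d+)\b", line)
        if match is None:
            raise ValueError(f"no slots=N on line: {raw!r}")
        total += int(match.group(1))
    return total


def check_n(n: int, hostfile_text: str) -> None:
    """Raise if mpiexec -n asks for more processes than the hostfile offers."""
    available = total_slots(hostfile_text)
    if n > available:
        raise ValueError(f"-n {n} exceeds the {available} available slots")


if __name__ == "__main__":
    print(total_slots(PI01_HOSTFILE))  # 16
```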

4.2 mpi4py-examples

Run all examples with the argument --hostfile ~/hostfile, i.e. on all 16 cores at once.
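If you launch the runs from a script, the command line can be assembled as an argument list. A hypothetical helper, assuming Open MPI’s mpiexec, the examples being started with python3, and ~/hostfile as the hostfile path; --mca takes the parameter name and its value as two separate arguments:

```python
import os
import shlex


def mpiexec_cmd(script, n=16, hostfile="~/hostfile", mca=None):
    """Build an mpiexec argument list that runs `script` under python3."""
    # Expand "~" ourselves: without a shell, nothing else will.
    cmd = ["mpiexec", "--hostfile", os.path.expanduser(hostfile), "-n", str(n)]
    for key, value in (mca or {}).items():
        cmd += ["--mca", key, value]
    cmd += ["python3", script]
    return cmd


if __name__ == "__main__":
    print(shlex.join(mpiexec_cmd("prime.py")))
```

The list can be passed directly to subprocess.run(...); shlex.join (Python 3.8+) is only used to print a copy-pasteable version.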

4.2.3 mpi4py-examples 03-scatter-gather

Sometimes, if the parameter btl_tcp_if_include is not specified, the program hangs:

For an explanation, please refer to TCP: unexpected process identifier in connect_ack. Now, let’s specify the parameter as --mca btl_tcp_if_include "192.168.1.251/24,192.168.1.249/24,192.168.1.247/24".
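Each entry in that value carries a /24 prefix, so all three name the same subnet, and pi01’s address 192.168.1.253 lies inside it too. The sketch below (standard-library ipaddress only) shows this. Presumably a single 192.168.1.0/24 would select the same interface, but that is an assumption about Open MPI’s CIDR matching worth verifying on your own cluster:

```python
import ipaddress

BTL_INCLUDE = "192.168.1.251/24,192.168.1.249/24,192.168.1.247/24"


def networks(spec: str) -> set:
    """Return the distinct networks named by a comma-separated CIDR list."""
    return {ipaddress.ip_interface(item.strip()).network for item in spec.split(",")}


if __name__ == "__main__":
    nets = networks(BTL_INCLUDE)
    print(nets)  # all three entries collapse to one /24
    print(ipaddress.ip_address("192.168.1.253") in next(iter(nets)))  # pi01
```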

4.3 Example mpi4py prime.py

4.3.1 Computing Capability For Each CPU

Here, we’re taking mpi4py prime.py as our example.

Hostname Computing Time
pi01
pi02
pi03
pi04

Clearly, each CPU core on pi01/pi02 is roughly 3 times as fast as a core on pi03/pi04, which can also be estimated from the BogoMIPS value reported in /proc/cpuinfo:

$108.00\ \text{(pi01/pi02)} / 38.40\ \text{(pi03/pi04)} \approx 2.81 \approx 3$
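To repeat the estimate on your own Pis, the BogoMIPS lines can be pulled out of /proc/cpuinfo. A minimal sketch, assuming the usual "BogoMIPS : value" line format; keep in mind BogoMIPS is only a rough calibration number, not a benchmark, so the ratio is an estimate:

```python
import re


def bogomips(cpuinfo_text: str) -> list:
    """Extract the per-core BogoMIPS values from /proc/cpuinfo text."""
    pattern = re.compile(r"^bogomips\s*:\s*([0-9.]+)", re.IGNORECASE | re.MULTILINE)
    return [float(v) for v in pattern.findall(cpuinfo_text)]


def speed_ratio(fast: float, slow: float) -> float:
    """Rough per-core speed ratio estimated from two BogoMIPS values."""
    return fast / slow


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(bogomips(f.read()))  # this machine's values, if on Linux
    except OSError:
        pass  # no /proc/cpuinfo here
    print(round(speed_ratio(108.00, 38.40), 2))  # 2.81
```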

4.3.2 Computing Capability For Each Raspberry Pi

Clearly, each of my Raspberry Pis (pi01, pi02, pi03 and pi04) has 4 CPUs. So, let’s take a look at the results when specifying the argument -n 4.

Master  Workers           Computing Time
pi01    pi02, pi03, pi04
pi02    pi01, pi03, pi04
pi03    pi01, pi02, pi04
pi04    pi01, pi02, pi03

Clearly, making full use of the 4 CPUs with -n 4 is roughly 4 times faster than using just 1 CPU with -n 1.
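"Roughly 4 times faster" is a speedup statement. As a sketch using the standard definitions, speedup S = T(1) / T(n) and efficiency E = S / n; the timings in the usage line are placeholders, not measurements:

```python
def speedup(t_one_core: float, t_n_cores: float) -> float:
    """Classic speedup S = T(1) / T(n)."""
    return t_one_core / t_n_cores


def efficiency(t_one_core: float, t_n_cores: float, n: int) -> float:
    """Parallel efficiency E = S / n; 1.0 means perfectly linear scaling."""
    return speedup(t_one_core, t_n_cores) / n


if __name__ == "__main__":
    # Placeholder timings: 8.0 with -n 1, 2.0 with -n 4.
    print(speedup(8.0, 2.0), efficiency(8.0, 2.0, 4))  # 4.0 1.0
```

A speedup of about 4 on 4 cores therefore corresponds to an efficiency close to 1.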

4.3.3 Computing Capability For The Cluster

I carried out 2 experiments:

• Experiment 1 is done on all 4 nodes, with the following hostfile on the master (computing time: 42.22):
  192.168.1.253 slots=4
  192.168.1.251 slots=4
  192.168.1.249 slots=4
  192.168.1.247 slots=4
• Experiment 2 is done on the FASTEST 2 nodes, pi01 and pi02, with the following hostfile on the master (computing time: 29.56):
  192.168.1.253 slots=4
  192.168.1.251 slots=4

The results clearly show that:

• computing on a cluster of 4 Raspberry Pis with 16 CPUs is ALWAYS faster than running on a single node with 4 CPUs:

$42.22 \le 50$

• computing on the 2 fastest nodes is even faster than running on the cluster of 4 nodes, presumably because the slower cores on pi03/pi04 hold up the whole run. This clearly hints at the importance of Load Balancing:

$29.56 \le 42.22$

• the speed in Experiment 2 is roughly double that of a single node, pi01 or pi02:

$52\ \text{(pi01/pi02)} / 29.56\ \text{(Experiment 2)} \approx 1.76 \approx 2$
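The load-balancing observation can be illustrated with a toy model, a sketch under strong assumptions: work is perfectly divisible, communication is free, every process gets an equal share (it is not known here whether prime.py splits its work this way), and per-core speeds follow the roughly 3:1 BogoMIPS ratio. Under these assumptions the 2-fast-node run is predicted to be 1.5 times faster than the 4-node run (measured: 42.22 / 29.56 ≈ 1.43) and 2 times faster than a single pi01/pi02, while a speed-weighted split across all 4 nodes would in principle beat both:

```python
from fractions import Fraction

# ASSUMPTION: relative per-core speed taken from the BogoMIPS ratio (~3:1).
CORE_SPEED = {"pi01": 3, "pi02": 3, "pi03": 1, "pi04": 1}
CORES_PER_NODE = 4


def makespan_even(nodes, work=1):
    """Static even split: every core gets work/cores, so the slowest core decides."""
    share = Fraction(work, len(nodes) * CORES_PER_NODE)
    return max(share / CORE_SPEED[n] for n in nodes)


def makespan_weighted(nodes, work=1):
    """Speed-weighted split: shares proportional to speed, all cores finish together."""
    total_speed = sum(CORE_SPEED[n] * CORES_PER_NODE for n in nodes)
    return Fraction(work, total_speed)


if __name__ == "__main__":
    all4 = ["pi01", "pi02", "pi03", "pi04"]
    fast2 = ["pi01", "pi02"]
    print("even, 4 nodes:", makespan_even(all4))          # 1/16
    print("even, 2 fast nodes:", makespan_even(fast2))    # 1/24
    print("weighted, 4 nodes:", makespan_weighted(all4))  # 1/32
```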

Finally, as for Load Balancing, I may talk about it some time in the future.