A Cluster of Raspberry Pis (1) - Configuration

[Photos of the cluster: profile view, frontal view, top view]

Today is Sunday. Vancouver is sunny. It's been quite a while since I last wrote anything. It took me a couple of weeks to finally get my taxes filed. Hmmm... Anyway, I've finally got some time to talk about supercomputers:

This is going to be a series of 3 blog posts.

1. Compare All Raspberry Pi Variants

Refer to: Comparison of All Raspberry Pi Variants.

2. Four Raspberry Pis

Here, please make sure the 4 Raspberry Pis are assigned the following hostnames, respectively:

  • pi01
  • pi02
  • pi03
  • pi04

2.1 pi01: Raspberry Pi 4 Model B Rev 1.4 8GB with Raspberry Pi OS (32-bit) with desktop 2020-05-27

pi@pi01:~ $ hostname
pi01
pi@pi01:~ $ uname -a
Linux pi01 4.19.118-v7l+ #1311 SMP Mon Apr 27 14:26:42 BST 2020 armv7l GNU/Linux
pi@pi01:~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 10 (buster)
Release: 10
Codename: buster
pi@pi01:~ $ cat /proc/meminfo
MemTotal: 8104404 kB
MemFree: 7275484 kB
MemAvailable: 7712212 kB
Buffers: 41552 kB
Cached: 592328 kB
SwapCached: 0 kB
Active: 326536 kB
Inactive: 373352 kB
Active(anon): 65820 kB
Inactive(anon): 9980 kB
Active(file): 260716 kB
Inactive(file): 363372 kB
Unevictable: 16 kB
Mlocked: 16 kB
HighTotal: 7405568 kB
HighFree: 6735192 kB
LowTotal: 698836 kB
LowFree: 540292 kB
SwapTotal: 102396 kB
SwapFree: 102396 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 66020 kB
Mapped: 70740 kB
Shmem: 11760 kB
Slab: 68352 kB
SReclaimable: 35736 kB
SUnreclaim: 32616 kB
KernelStack: 1320 kB
PageTables: 1964 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 4154596 kB
Committed_AS: 498204 kB
VmallocTotal: 245760 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 656 kB
CmaTotal: 262144 kB
CmaFree: 223228 kB
pi@pi01:~ $ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 108.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 1
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 108.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 2
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 108.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 3
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 108.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

Hardware : BCM2835
Revision : d03114
Serial : 10000000bc6e6e05
Model : Raspberry Pi 4 Model B Rev 1.4

In fact, at the very beginning, I tried the Pi 4 64-bit Raspbian kernel, as follows:

pi@pi01:~ $ hostname
pi01
pi@pi01:~ $ uname -a
Linux pi01 5.4.44-v8+ #1320 SMP PREEMPT Wed Jun 3 16:20:05 BST 2020 aarch64 GNU/Linux
pi@pi01:~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 10 (buster)
Release: 10
Codename: buster
pi@pi01:~ $ cat /proc/meminfo
MemTotal: 7950652 kB
MemFree: 7749820 kB
MemAvailable: 7770884 kB
Buffers: 16092 kB
Cached: 105460 kB
SwapCached: 0 kB
Active: 91600 kB
Inactive: 47532 kB
Active(anon): 17832 kB
Inactive(anon): 8404 kB
Active(file): 73768 kB
Inactive(file): 39128 kB
Unevictable: 16 kB
Mlocked: 16 kB
SwapTotal: 102396 kB
SwapFree: 102396 kB
Dirty: 44 kB
Writeback: 0 kB
AnonPages: 17628 kB
Mapped: 26968 kB
Shmem: 8652 kB
KReclaimable: 16632 kB
Slab: 35720 kB
SReclaimable: 16632 kB
SUnreclaim: 19088 kB
KernelStack: 2304 kB
PageTables: 1284 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 4077720 kB
Committed_AS: 181984 kB
VmallocTotal: 262930368 kB
VmallocUsed: 7680 kB
VmallocChunk: 0 kB
Percpu: 688 kB
CmaTotal: 262144 kB
CmaFree: 256244 kB
pi@pi01:~ $ cat /proc/cpuinfo
processor : 0
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 1
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 2
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 3
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

Hardware : BCM2835
Revision : d03114
Serial : 10000000bc6e6e05
Model : Raspberry Pi 4 Model B Rev 1.4

However, there were still quite a lot of issues with the Pi 4 64-bit Raspbian kernel, so in the end I had to go back from it to Raspberry Pi OS (32-bit) with desktop 2020-05-27.

2.2 pi02: Raspberry Pi 4 Model B Rev 1.1 4GB with Raspberry Pi OS (32-bit) with desktop 2020-05-27

pi@pi02:~ $ hostname
pi02
pi@pi02:~ $ uname -a
Linux pi02 4.19.118-v7l+ #1311 SMP Mon Apr 27 14:26:42 BST 2020 armv7l GNU/Linux
pi@pi02:~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 10 (buster)
Release: 10
Codename: buster
pi@pi02:~ $ cat /proc/meminfo
MemTotal: 3999744 kB
MemFree: 3556984 kB
MemAvailable: 3761604 kB
Buffers: 42296 kB
Cached: 270688 kB
SwapCached: 0 kB
Active: 196952 kB
Inactive: 133824 kB
Active(anon): 18036 kB
Inactive(anon): 8376 kB
Active(file): 178916 kB
Inactive(file): 125448 kB
Unevictable: 16 kB
Mlocked: 16 kB
HighTotal: 3264512 kB
HighFree: 2965564 kB
LowTotal: 735232 kB
LowFree: 591420 kB
SwapTotal: 102396 kB
SwapFree: 102396 kB
Dirty: 40 kB
Writeback: 0 kB
AnonPages: 17848 kB
Mapped: 26708 kB
Shmem: 8624 kB
Slab: 54128 kB
SReclaimable: 27336 kB
SUnreclaim: 26792 kB
KernelStack: 1024 kB
PageTables: 1172 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 2102268 kB
Committed_AS: 184320 kB
VmallocTotal: 245760 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 656 kB
CmaTotal: 262144 kB
CmaFree: 223228 kB
pi@pi02:~ $ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 108.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 1
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 108.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 2
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 108.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 3
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 108.00
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

Hardware : BCM2835
Revision : c03111
Serial : 100000006c0c9b01
Model : Raspberry Pi 4 Model B Rev 1.1

2.3 pi03: Raspberry Pi 3 Model B Rev 1.2 1GB with Raspberry Pi OS (32-bit) with desktop 2020-05-27

pi@pi03:~ $ hostname
pi03
pi@pi03:~ $ uname -a
Linux pi03 4.19.118-v7+ #1311 SMP Mon Apr 27 14:21:24 BST 2020 armv7l GNU/Linux
pi@pi03:~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 10 (buster)
Release: 10
Codename: buster
pi@pi03:~ $ cat /proc/meminfo
MemTotal: 895500 kB
MemFree: 375644 kB
MemAvailable: 709712 kB
Buffers: 54464 kB
Cached: 317028 kB
SwapCached: 0 kB
Active: 244084 kB
Inactive: 197180 kB
Active(anon): 70032 kB
Inactive(anon): 6488 kB
Active(file): 174052 kB
Inactive(file): 190692 kB
Unevictable: 16 kB
Mlocked: 16 kB
SwapTotal: 102396 kB
SwapFree: 102396 kB
Dirty: 20 kB
Writeback: 0 kB
AnonPages: 69816 kB
Mapped: 80468 kB
Shmem: 6752 kB
Slab: 59000 kB
SReclaimable: 28552 kB
SUnreclaim: 30448 kB
KernelStack: 1600 kB
PageTables: 3252 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 550144 kB
Committed_AS: 837224 kB
VmallocTotal: 1163264 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 640 kB
CmaTotal: 8192 kB
CmaFree: 6280 kB
pi@pi03:~ $ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 1
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 2
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 3
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

Hardware : BCM2835
Revision : a02082
Serial : 000000009fcc6a22
Model : Raspberry Pi 3 Model B Rev 1.2

2.4 pi04: Raspberry Pi 3 Model B Rev 1.2 1GB with Raspberry Pi OS (32-bit) with desktop 2020-05-27

pi@pi04:~ $ hostname
pi04
pi@pi04:~ $ uname -a
Linux pi04 4.19.118-v7+ #1311 SMP Mon Apr 27 14:21:24 BST 2020 armv7l GNU/Linux
pi@pi04:~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 10 (buster)
Release: 10
Codename: buster
pi@pi04:~ $ cat /proc/meminfo
MemTotal: 895500 kB
MemFree: 417612 kB
MemAvailable: 727524 kB
Buffers: 50624 kB
Cached: 296456 kB
SwapCached: 0 kB
Active: 225904 kB
Inactive: 174124 kB
Active(anon): 53244 kB
Inactive(anon): 5968 kB
Active(file): 172660 kB
Inactive(file): 168156 kB
Unevictable: 16 kB
Mlocked: 16 kB
SwapTotal: 102396 kB
SwapFree: 102396 kB
Dirty: 16 kB
Writeback: 0 kB
AnonPages: 52960 kB
Mapped: 65096 kB
Shmem: 6240 kB
Slab: 57688 kB
SReclaimable: 28096 kB
SUnreclaim: 29592 kB
KernelStack: 1464 kB
PageTables: 2724 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 550144 kB
Committed_AS: 715652 kB
VmallocTotal: 1163264 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 640 kB
CmaTotal: 8192 kB
CmaFree: 6152 kB
pi@pi04:~ $ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 1
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 2
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

processor : 3
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4

Hardware : BCM2835
Revision : a22082
Serial : 000000003fc1b876
Model : Raspberry Pi 3 Model B Rev 1.2

3. Raspberry Pi Cluster Configuration

This section draws heavily on the following blogs:

  • Build a Raspberry Pi cluster computer
  • Build your own bare-metal ARM cluster
  • Installing MPI for Python on a Raspberry Pi Cluster
  • Instructables: How to Make a Raspberry Pi SuperComputer!

Actually, the cluster can be configured however you wish. A typical configuration is 1 master and 3 workers, but which one should be the master? Is it really a good idea to ALWAYS designate the MOST powerful one as the master? In my case in particular, the 4 Raspberry Pis are of different versions, so they differ in computing capability.

3.1 Configure Hostfile

It's always a good idea to create a hostfile on the master node. However, for the reasons mentioned above, there is NO priority among the nodes in my case, so I configured a hostfile on ALL 4 Raspberry Pis. Each node lists its own IP address first.

node    hostfile
pi01    192.168.1.253 slots=4
        192.168.1.251 slots=4
        192.168.1.249 slots=4
        192.168.1.247 slots=4
pi02    192.168.1.251 slots=4
        192.168.1.253 slots=4
        192.168.1.249 slots=4
        192.168.1.247 slots=4
pi03    192.168.1.249 slots=4
        192.168.1.253 slots=4
        192.168.1.251 slots=4
        192.168.1.247 slots=4
pi04    192.168.1.247 slots=4
        192.168.1.253 slots=4
        192.168.1.251 slots=4
        192.168.1.249 slots=4
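
The four hostfiles above follow one pattern: the local node first, then the remaining nodes in order. As a minimal sketch (the function name hostfile_for and the NODES mapping are my own illustration, not part of Open MPI), they could be generated like this:

```python
# Hostname -> IP mapping, copied from the hostfile table above.
NODES = {
    "pi01": "192.168.1.253",
    "pi02": "192.168.1.251",
    "pi03": "192.168.1.249",
    "pi04": "192.168.1.247",
}
SLOTS_PER_NODE = 4  # every Pi in this cluster has 4 CPU cores


def hostfile_for(local_node, nodes=NODES, slots=SLOTS_PER_NODE):
    """Return hostfile text with the local node's IP listed first."""
    ordered = [local_node] + [name for name in nodes if name != local_node]
    return "\n".join(f"{nodes[name]} slots={slots}" for name in ordered) + "\n"


if __name__ == "__main__":
    for name in NODES:
        print(f"# hostfile on {name}")
        print(hostfile_for(name))
```

The output for each node can then be saved as ~/hostfile on that node.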

3.2 SSH-KEYGEN

In order to test multiple nodes across the cluster, we need to generate SSH keys so that we don't have to type a password every time we log into another node. To do so, on each Raspberry Pi, generate an SSH key with ssh-keygen -t rsa and push the generated public key onto the other 3 Raspberry Pis with the command ssh-copy-id. In the end, for a cluster of 4 Raspberry Pis, there are 3 authorized keys (one for each of the other 3 Raspberry Pis) stored in the file /home/pi/.ssh/authorized_keys on each of the 4 Raspberry Pis.
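
To make the bookkeeping concrete, here is a small sketch that only prints the commands to run on each node; it does not execute anything. It assumes the user is pi on every node and that the hostnames pi01 to pi04 resolve on the network (otherwise the IP addresses from the hostfile would be used instead). The helper name key_distribution_commands is hypothetical.

```python
from itertools import permutations

HOSTS = ["pi01", "pi02", "pi03", "pi04"]
USER = "pi"  # assumption: the default user exists on every node


def key_distribution_commands(hosts=HOSTS, user=USER):
    """Map each host to the list of commands to run on that host."""
    # Every node first generates its own key pair.
    plan = {host: ["ssh-keygen -t rsa"] for host in hosts}
    # Then every node pushes its public key to each of the other nodes.
    for src, dst in permutations(hosts, 2):
        plan[src].append(f"ssh-copy-id {user}@{dst}")
    return plan


if __name__ == "__main__":
    for host, commands in key_distribution_commands().items():
        print(f"# run on {host}")
        for command in commands:
            print(command)
```

With 4 nodes this yields 4 * 3 = 12 ssh-copy-id calls in total, which matches the 3 authorized keys that end up on each node.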

4. Cluster Test

4.1 Command mpiexec

4.1.1 Argument: -hostfile and -n

pi@pi01:~ $ mpiexec -hostfile hostfile -n 16 hostname 
pi01
pi01
pi01
pi01
pi02
pi02
pi02
pi03
pi03
pi03
pi02
pi03
pi04
pi04
pi04
pi04

For a cluster of 4 Raspberry Pis with 4 cores each, the hostfile provides 4*4=16 slots in total. Therefore, the largest value that can be given to the argument -n is 16. With a larger value, you'll get the following ERROR message:

pi@pi01:~ $ mpiexec -hostfile hostfile -n 20 hostname 
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 20 slots
that were requested by the application:
hostname

Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
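
The limit behind this error can be checked directly from the hostfile. The following is a rough sketch, not Open MPI's own parser: it only understands the simple "address slots=N" lines used in this post, and it assumes that a host listed without slots= counts as 1 slot.

```python
HOSTFILE = """\
192.168.1.253 slots=4
192.168.1.251 slots=4
192.168.1.249 slots=4
192.168.1.247 slots=4
"""


def total_slots(hostfile_text):
    """Sum the slots= values of all non-empty, non-comment lines."""
    total = 0
    for line in hostfile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        slots = 1  # assumption of this sketch: no slots= means 1 slot
        for token in line.split()[1:]:
            if token.startswith("slots="):
                slots = int(token.split("=", 1)[1])
        total += slots
    return total


if __name__ == "__main__":
    available = total_slots(HOSTFILE)
    for requested in (16, 20):
        verdict = "fits" if requested <= available else "too many"
        print(f"-n {requested}: {verdict} ({available} slots available)")
```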

4.1.2 Execute Python Example mpi4py helloworld.py

pi@pi01:~ $ mpiexec -hostfile hostfile -n 16 python Downloads/helloworld.py 
Hello, World! I am process 1 of 16 on pi01.
Hello, World! I am process 5 of 16 on pi02.
Hello, World! I am process 6 of 16 on pi02.
Hello, World! I am process 7 of 16 on pi02.
Hello, World! I am process 4 of 16 on pi02.
Hello, World! I am process 15 of 16 on pi04.
Hello, World! I am process 12 of 16 on pi04.
Hello, World! I am process 13 of 16 on pi04.
Hello, World! I am process 14 of 16 on pi04.
Hello, World! I am process 2 of 16 on pi01.
Hello, World! I am process 0 of 16 on pi01.
Hello, World! I am process 3 of 16 on pi01.
Hello, World! I am process 9 of 16 on pi03.
Hello, World! I am process 10 of 16 on pi03.
Hello, World! I am process 11 of 16 on pi03.
Hello, World! I am process 8 of 16 on pi03.

4.2 mpi4py-examples

Run all examples with the argument --hostfile ~/hostfile, that is, on all 16 cores at once.

4.2.1 mpi4py-examples 01-hello-world

pi@pi01:~/Downloads/mpi4py-examples $ mpirun --hostfile ~/hostfile ./01-hello-world 
Hello! I'm rank 1 from 16 running in total...
Hello! I'm rank 2 from 16 running in total...
Hello! I'm rank 3 from 16 running in total...
Hello! I'm rank 0 from 16 running in total...
Hello! I'm rank 6 from 16 running in total...
Hello! I'm rank 7 from 16 running in total...
Hello! I'm rank 4 from 16 running in total...
Hello! I'm rank 5 from 16 running in total...
Hello! I'm rank 12 from 16 running in total...
Hello! I'm rank 10 from 16 running in total...
Hello! I'm rank 11 from 16 running in total...
Hello! I'm rank 13 from 16 running in total...
Hello! I'm rank 9 from 16 running in total...
Hello! I'm rank 14 from 16 running in total...
Hello! I'm rank 8 from 16 running in total...
Hello! I'm rank 15 from 16 running in total...

4.2.2 mpi4py-examples 02-broadcast

pi@pi01:~/Downloads/mpi4py-examples $ mpirun --hostfile ~/hostfile ./02-broadcast 
------------------------------------------------------------------------------
Running on 16 cores
------------------------------------------------------------------------------
[00] [0. 1. 2. 3. 4.]
[04] [0. 1. 2. 3. 4.]
[03] [0. 1. 2. 3. 4.]
[05] [0. 1. 2. 3. 4.]
[07] [0. 1. 2. 3. 4.]
[01] [0. 1. 2. 3. 4.]
[15] [0. 1. 2. 3. 4.]
[13] [0. 1. 2. 3. 4.]
[12] [0. 1. 2. 3. 4.]
[11] [0. 1. 2. 3. 4.]
[08] [0. 1. 2. 3. 4.]
[09] [0. 1. 2. 3. 4.]
[02] [0. 1. 2. 3. 4.]
[10] [0. 1. 2. 3. 4.]
[06] [0. 1. 2. 3. 4.]
[14] [0. 1. 2. 3. 4.]

4.2.3 mpi4py-examples 03-scatter-gather

Without the MCA parameter btl_tcp_if_include, the program sometimes hangs:

pi@pi01:~/Downloads/mpi4py-examples $ mpirun --np 16 --hostfile ~/hostfile  03-scatter-gather
------------------------------------------------------------------------------
Running on 16 cores
------------------------------------------------------------------------------
After Scatter:
[0] [0. 1. 2. 3.]
[1] [4. 5. 6. 7.]
[pi03][[1597,1],8][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[1597,1],10]
[pi01][[1597,1],0][btl_tcp_endpoint.c:626:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[1597,1],3]
[2] [ 8. 9. 10. 11.]
^C^Z
[1]+ Stopped mpirun --np 16 --hostfile ~/hostfile 03-scatter-gather

For an explanation, please refer to "TCP: unexpected process identifier in connect_ack". Now, let's specify the parameter as --mca btl_tcp_if_include "192.168.1.251/24,192.168.1.249/24,192.168.1.247/24".

pi@pi01:~/Downloads/mpi4py-examples $ mpirun --np 16 --hostfile ~/hostfile --mca btl_tcp_if_include "192.168.1.251/24,192.168.1.249/24,192.168.1.247/24"  03-scatter-gather
------------------------------------------------------------------------------
Running on 16 cores
------------------------------------------------------------------------------
After Scatter:
[0] [0. 1. 2. 3.]
[1] [4. 5. 6. 7.]
[2] [ 8. 9. 10. 11.]
[3] [12. 13. 14. 15.]
[4] [16. 17. 18. 19.]
[5] [20. 21. 22. 23.]
[6] [24. 25. 26. 27.]
[7] [28. 29. 30. 31.]
[8] [32. 33. 34. 35.]
[9] [36. 37. 38. 39.]
[10] [40. 41. 42. 43.]
[11] [44. 45. 46. 47.]
[12] [48. 49. 50. 51.]
[13] [52. 53. 54. 55.]
[14] [56. 57. 58. 59.]
[15] [60. 61. 62. 63.]
After Allgather:
[0] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[1] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[2] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[3] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[4] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[5] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[6] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[7] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[8] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[9] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[10] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[11] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[12] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[13] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[14] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
[15] [ 0. 2. 4. 6. 8. 10. 12. 14. 16. 18. 20. 22. 24. 26.
28. 30. 32. 34. 36. 38. 40. 42. 44. 46. 48. 50. 52. 54.
56. 58. 60. 62. 64. 66. 68. 70. 72. 74. 76. 78. 80. 82.
84. 86. 88. 90. 92. 94. 96. 98. 100. 102. 104. 106. 108. 110.
112. 114. 116. 118. 120. 122. 124. 126.]
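
A side note on the three entries passed to btl_tcp_if_include in the run above: they are written in CIDR notation, and since all three share the same /24 prefix, they describe one and the same subnet. The sketch below checks this with Python's standard ipaddress module. That a single entry such as 192.168.1.0/24 would therefore be enough is my assumption; I have not tested it on the cluster.

```python
import ipaddress

# The value passed to --mca btl_tcp_if_include in the run above.
ENTRIES = "192.168.1.251/24,192.168.1.249/24,192.168.1.247/24".split(",")

# IP addresses of pi01..pi04, taken from the hostfile.
NODE_IPS = ["192.168.1.253", "192.168.1.251", "192.168.1.249", "192.168.1.247"]

# Reduce every "address/prefix" entry to the network it belongs to.
networks = {ipaddress.ip_interface(entry).network for entry in ENTRIES}

if __name__ == "__main__":
    print("distinct networks:", sorted(str(net) for net in networks))
    for ip in NODE_IPS:
        inside = all(ipaddress.ip_address(ip) in net for net in networks)
        print(ip, "is covered" if inside else "is NOT covered")
```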

4.2.4 mpi4py-examples 04-image-spectrogram

4.2.5 mpi4py-examples 05-pseudo-whitening

4.2.6 NULL

4.2.7 mpi4py-examples 07-matrix-vector-product

pi@pi01:~/Downloads/mpi4py-example $ mpirun --np 16 --hostfile ~/hostfile --mca btl_tcp_if_include "192.168.1.251/24,192.168.1.249/24,192.168.1.247/24"  07-matrix-vector-product
============================================================================
Running 16 parallel MPI processes
20 iterations of size 10000 in 1.14s: 17.50 iterations per second
============================================================================

4.2.8 mpi4py-examples 08-matrix-matrix-product.py

4.2.9 mpi4py-examples 09-task-pull.py

pi@pi01:~/Downloads/mpi4py-examples $ mpirun --hostfile ~/hostfile python ./09-task-pull.py 
Master starting with 15 workers
I am a worker with rank 1 on pi01.
I am a worker with rank 2 on pi01.
I am a worker with rank 3 on pi01.
I am a worker with rank 4 on pi02.
I am a worker with rank 5 on pi02.
I am a worker with rank 6 on pi02.
I am a worker with rank 7 on pi02.
Sending task 0 to worker 2
Sending task 1 to worker 1
Sending task 2 to worker 3
Got data from worker 2
Sending task 3 to worker 2
Got data from worker 3
Sending task 4 to worker 3
Got data from worker 1
Got data from worker 2
Sending task 5 to worker 1
Sending task 6 to worker 2
Got data from worker 3
Sending task 7 to worker 3
Got data from worker 1
Got data from worker 2
Sending task 8 to worker 1
Sending task 9 to worker 2
Got data from worker 3
Sending task 10 to worker 3
Got data from worker 1
Got data from worker 2
Sending task 11 to worker 1
Sending task 12 to worker 2
Got data from worker 3
Sending task 13 to worker 3
Got data from worker 1
Got data from worker 2
Sending task 14 to worker 1
Sending task 15 to worker 2
Got data from worker 3
Sending task 16 to worker 3
Got data from worker 1
Got data from worker 2
Sending task 17 to worker 1
Sending task 18 to worker 2
Got data from worker 3
Sending task 19 to worker 3
Got data from worker 1
Sending task 20 to worker 1
Got data from worker 2
Sending task 21 to worker 2
Got data from worker 3
Sending task 22 to worker 3
Got data from worker 1
Sending task 23 to worker 1
Got data from worker 2
Got data from worker 3
Sending task 24 to worker 2
Sending task 25 to worker 3
Got data from worker 2
Got data from worker 1
Sending task 26 to worker 2
Got data from worker 3
Sending task 27 to worker 3
Got data from worker 2
Sending task 28 to worker 1
Sending task 29 to worker 2
Got data from worker 3
Sending task 30 to worker 3
Got data from worker 2
Got data from worker 1
Sending task 31 to worker 2
Got data from worker 2
Got data from worker 3
Worker 2 exited.
Worker 1 exited.
Worker 3 exited.
I am a worker with rank 15 on pi04.
I am a worker with rank 12 on pi04.
I am a worker with rank 8 on pi03.
I am a worker with rank 13 on pi04.
I am a worker with rank 9 on pi03.
I am a worker with rank 14 on pi04.
I am a worker with rank 10 on pi03.
I am a worker with rank 11 on pi03.
Worker 5 exited.
Worker 4 exited.
Worker 6 exited.
Worker 7 exited.
Worker 15 exited.
Worker 8 exited.
Worker 9 exited.
Worker 10 exited.
Worker 11 exited.
Worker 12 exited.
Worker 13 exited.
Worker 14 exited.
Master finishing

4.2.10 mpi4py-examples 10-task-pull-spawn.py

4.3 Example mpi4py prime.py

4.3.1 Computing Capability For Each CPU

Here, we're taking mpi4py prime.py as our example.

Hostname and computing time:

pi01:
pi@pi01:~ $ mpiexec -n 1 python prime.py 100000
Find all primes up to: 100000
Nodes: 1
Time elasped: 214.86 seconds
Primes discovered: 9592

pi02:
pi@pi02:~ $ mpiexec -n 1 python prime.py 100000
Find all primes up to: 100000
Nodes: 1
Time elasped: 212.2 seconds
Primes discovered: 9592

pi03:
pi@pi03:~ $ mpiexec -n 1 python prime.py 100000
Find all primes up to: 100000
Nodes: 1
Time elasped: 665.24 seconds
Primes discovered: 9592

pi04:
pi@pi04:~ $ mpiexec -n 1 python prime.py 100000
Find all primes up to: 100000
Nodes: 1
Time elasped: 684.64 seconds
Primes discovered: 9592

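Clearly, each CPU core on pi01/pi02 is roughly 3 times as fast as a core on pi03/pi04: \[ 665.24 \text{ (pi03)} / 214.86 \text{ (pi01)} \approx 3.1 \] This is close to the ratio of the BogoMIPS values reported in /proc/cpuinfo, although BogoMIPS is only a crude indicator rather than a real benchmark: \[ 108.00 \text{ (pi01/pi02)} / 38.40 \text{ (pi03/pi04)} \approx 2.8 \]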
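
The arithmetic behind this comparison can be reproduced in a few lines. The numbers are copied from the single-core runs and the /proc/cpuinfo dumps above; averaging the two boards of each generation is my own choice for this sketch.

```python
# Single-core times in seconds (mpiexec -n 1) from the runs above.
SINGLE_CORE_SECONDS = {"pi01": 214.86, "pi02": 212.2, "pi03": 665.24, "pi04": 684.64}

# BogoMIPS values from /proc/cpuinfo.
BOGOMIPS = {"pi4": 108.00, "pi3": 38.40}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


pi4_seconds = mean(SINGLE_CORE_SECONDS[h] for h in ("pi01", "pi02"))
pi3_seconds = mean(SINGLE_CORE_SECONDS[h] for h in ("pi03", "pi04"))

measured_ratio = pi3_seconds / pi4_seconds          # how much longer a Pi 3 core takes
bogomips_ratio = BOGOMIPS["pi4"] / BOGOMIPS["pi3"]  # ratio suggested by BogoMIPS

if __name__ == "__main__":
    print(f"measured per-core ratio: {measured_ratio:.2f}")
    print(f"BogoMIPS ratio:          {bogomips_ratio:.2f}")
```

Both ratios land near 3, with the measured one slightly higher than the BogoMIPS estimate.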

4.3.2 Computing Capability For Each Raspberry Pi

Each of my Raspberry Pis, namely

  • pi01: Raspberry Pi 4 Model B Rev 1.4 8GB
  • pi02: Raspberry Pi 4 Model B Rev 1.1 4GB
  • pi03 & pi04: Raspberry Pi 3 Model B Rev 1.2 1GB

has 4 CPU cores. So, let's take a look at the results when specifying the argument -n 4.

Master, workers, and computing time:

Master: pi01; workers: pi02, pi03, pi04
pi@pi01:~ $ mpiexec -n 4 python prime.py 100000
Find all primes up to: 100000
Nodes: 4
Time elasped: 50.92 seconds
Primes discovered: 9592

Master: pi02; workers: pi01, pi03, pi04
pi@pi02:~ $ mpiexec -n 4 python prime.py 100000
Find all primes up to: 100000
Nodes: 4
Time elasped: 52.83 seconds
Primes discovered: 9592

Master: pi03; workers: pi01, pi02, pi04
pi@pi03:~ $ mpiexec -n 4 python prime.py 100000
Find all primes up to: 100000
Nodes: 4
Time elasped: 171.81 seconds
Primes discovered: 9592

Master: pi04; workers: pi01, pi02, pi03
pi@pi04:~ $ mpiexec -n 4 python prime.py 100000
Find all primes up to: 100000
Nodes: 4
Time elasped: 171.7 seconds
Primes discovered: 9592

Clearly, making full use of all 4 CPU cores (-n 4) is roughly 4 times as fast as using just 1 core (-n 1).
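
To put numbers on "roughly 4 times", the sketch below computes the speedup (time on 1 core divided by time on 4 cores) and the parallel efficiency (speedup divided by the number of cores) for each board, using the timings measured above. The function names are just for illustration.

```python
# Times in seconds from the -n 1 and -n 4 runs above.
ONE_CORE = {"pi01": 214.86, "pi02": 212.2, "pi03": 665.24, "pi04": 684.64}
FOUR_CORES = {"pi01": 50.92, "pi02": 52.83, "pi03": 171.81, "pi04": 171.7}
CORES = 4


def speedup(host):
    """How many times faster the 4-core run is than the 1-core run."""
    return ONE_CORE[host] / FOUR_CORES[host]


def efficiency(host):
    """Speedup per core; 1.0 would be perfect scaling."""
    return speedup(host) / CORES


if __name__ == "__main__":
    for host in sorted(ONE_CORE):
        print(f"{host}: speedup {speedup(host):.2f}, efficiency {efficiency(host):.2f}")
```

All four boards come out between about 3.9 and 4.2. A value slightly above 4, as on pi01, is most likely run-to-run variation rather than a real effect.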

4.3.3 Computing Capability For The cluster

I carried out 2 experiments:

  • Experiment 1 is done on 4 nodes:
      • 1 Raspberry Pi 4 Model B Rev 1.4 8GB
      • 1 Raspberry Pi 4 Model B Rev 1.1 4GB
      • 2 Raspberry Pi 3 Model B Rev 1.2 1GB
  • Experiment 2 is done on the FASTEST 2 nodes:
      • 1 Raspberry Pi 4 Model B Rev 1.4 8GB
      • 1 Raspberry Pi 4 Model B Rev 1.1 4GB

Experiment 1, hostfile on master:
192.168.1.253 slots=4
192.168.1.251 slots=4
192.168.1.249 slots=4
192.168.1.247 slots=4

Experiment 1, computing time:
pi@pi01:~ $ mpiexec -np 16 --hostfile hostfile --mca btl_tcp_if_include "192.168.1.251/24,192.168.1.249/24,192.168.1.247/24" python prime.py 100000
Find all primes up to: 100000
Nodes: 16
Time elasped: 42.22 seconds
Primes discovered: 9592

Experiment 2, hostfile on master:
192.168.1.253 slots=4
192.168.1.251 slots=4

Experiment 2, computing time:
pi@pi01:~ $ mpiexec -np 8 --hostfile hostfile --mca btl_tcp_if_include "192.168.1.251/24" python prime.py 100000
Find all primes up to: 100000
Nodes: 8
Time elasped: 29.56 seconds
Primes discovered: 9592

The results are telling:

  • In these measurements, calculating on the cluster of 4 Raspberry Pis with 16 CPU cores is faster than running on any single node with 4 cores. \[ 42.22 < 50.92 \]
  • Calculating on the 2 fastest nodes is even faster than running on the cluster of 4 nodes. This clearly hints at the importance of load balancing: presumably the work is split evenly among the processes, so the run finishes only when the slow cores of pi03/pi04 are done. \[ 29.56 < 42.22 \]
  • Experiment 2 is roughly 1.8 times as fast as a single node of pi01 or pi02. \[ 52.83 \text{ (pi02)} / 29.56 \text{ (Experiment 2)} \approx 1.8 \]

As for load balancing, I may talk about it some time in the future.
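
The load-balancing effect can be illustrated with a very rough model built only from the single-core timings measured earlier. It rests on assumptions I have not verified against prime.py: that the work is divided evenly among the processes, that every unit of work costs the same, and that communication is free. Under these assumptions the run ends when the slowest core finishes its equal share, whereas an ideal scheduler would hand each core work in proportion to its speed.

```python
# Single-core times in seconds (mpiexec -n 1) from section 4.3.1.
SINGLE_CORE_SECONDS = {"pi01": 214.86, "pi02": 212.2, "pi03": 665.24, "pi04": 684.64}
CORES_PER_NODE = 4


def even_split_time(nodes):
    """Predicted time when every process gets the same share of the work.

    The slowest core determines when the whole run finishes.
    """
    processes = CORES_PER_NODE * len(nodes)
    return max(SINGLE_CORE_SECONDS[n] / processes for n in nodes)


def proportional_split_time(nodes):
    """Predicted time when work is shared in proportion to core speed."""
    # Each core completes 1 / seconds of the whole job per second.
    throughput = sum(CORES_PER_NODE / SINGLE_CORE_SECONDS[n] for n in nodes)
    return 1.0 / throughput


if __name__ == "__main__":
    all_nodes = ["pi01", "pi02", "pi03", "pi04"]
    fast_nodes = ["pi01", "pi02"]
    print(f"even split, 4 nodes:         {even_split_time(all_nodes):.2f} s")
    print(f"even split, 2 fastest nodes: {even_split_time(fast_nodes):.2f} s")
    print(f"proportional split, 4 nodes: {proportional_split_time(all_nodes):.2f} s")
```

The even-split predictions (about 42.8 s and 26.9 s) are in the same range as the measured 42.22 s and 29.56 s, and the proportional split suggests that all 4 nodes together could in principle finish in roughly 20 s.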