Performance of SSS-CORE

The performance of SSS-CORE has been evaluated from various angles. Summarized results of the evaluation are given below. The details of the experiments and the discussions are given in our papers.

In the following, "SPARCstation 20" refers to the Sun Microsystems SPARCstation 20 and compatible machines. We mainly used the Axil 320 model 8.1.1, which is compatible with the Sun Microsystems SPARCstation 20.


Performance of Fundamental System Calls

Conditions
workstation: SPARCstation 20 (85 MHz SuperSPARC × 1)
OS: SSS-CORE Ver. 1.1; SunOS 4.1.4
Cost of getting a task ID
SSS-CORE get_taskid() 1.12 µsec
SunOS getpid() 4.39 µsec
Costs of allocating/freeing memory (in µsec)
size (byte) 4K 16K 64K 256K 1M
SSS-CORE allocate 23.91 28.91 48.77 123.2 431.2
SSS-CORE free 19.49 20.36 23.91 36.23 99.06
SunOS sbrk() 133.2 375.8 894.3 1828 2020
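
The gap between the two systems is easier to see as a ratio. The following is a minimal sketch that divides each SunOS cost by the corresponding SSS-CORE cost, using only the figures in the tables above. The names (`cost_ratio`, `alloc_ratios`, and so on) are illustrative, and the pairing of sbrk() with the SSS-CORE allocation call simply follows the table; this page does not state how closely the two operations correspond.

```python
# Measured costs copied from the tables above (all in microseconds).
GET_TASKID_US = 1.12   # SSS-CORE get_taskid()
GETPID_US = 4.39       # SunOS getpid()

SIZES = ["4K", "16K", "64K", "256K", "1M"]
SSS_ALLOCATE_US = [23.91, 28.91, 48.77, 123.2, 431.2]
SUNOS_SBRK_US = [133.2, 375.8, 894.3, 1828, 2020]


def cost_ratio(baseline_us, measured_us):
    """Return how many times cheaper measured_us is than baseline_us."""
    return baseline_us / measured_us


task_id_ratio = cost_ratio(GETPID_US, GET_TASKID_US)
alloc_ratios = [
    cost_ratio(sunos, sss) for sunos, sss in zip(SUNOS_SBRK_US, SSS_ALLOCATE_US)
]

print(f"task ID lookup: {task_id_ratio:.1f}x")
for size, ratio in zip(SIZES, alloc_ratios):
    print(f"allocate {size:>4} bytes: {ratio:.1f}x")
```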

Fundamental Communication Performance of MBCF

On Gigabit Ethernet

Conditions
workstation: Sun Microsystems Ultra 60 (450 MHz UltraSPARC-II × 1)
NIC: Sun Microsystems GigabitEthernet/P 2.0 Adapter
network: (directly connected)
OS & Communication Protocol: SSS-CORE Ver. 2.3 with MBCF; Solaris 2.6 with TCP/IP
One-way latencies of MBCF/1000BASE-SX (in µsec)
data size (byte) 4 16 64 256 1024
MBCF 9.6 11.0 11.5 16.2 35.9
TCP/IP 95.08 95.22 95.39 99.45 114.15
Peak bandwidths of MBCF/1000BASE-SX (in Mbyte/sec)
data size (byte) 4 16 64 256 1024 1408
MBCF 2.29 5.67 22.30 55.41 78.22 80.92
TCP/IP 0.09 0.43 1.67 5.56 12.79 20.21

Although the software overhead of MBCF is small, the peak bandwidth of 80.92 Mbyte/sec does not reach the hardware limit of 125 Mbyte/sec. We suspect a bottleneck somewhere in the Ultra 60's hardware.
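
The 125 Mbyte/sec limit follows from the nominal 1 Gbit/sec line rate divided by 8 bits per byte. As a rough sketch, the fraction of that limit reached at the largest data size (1408 bytes) can be computed as below. The names are illustrative, and the calculation assumes 1 Mbyte = 10^6 bytes and ignores framing overhead.

```python
GIGABIT_LINE_RATE_BIT_PER_SEC = 1_000_000_000  # nominal 1000BASE-SX rate

# Peak bandwidths at 1408-byte data size, from the table above (Mbyte/sec).
PEAK_MBYTE_PER_SEC = {"MBCF": 80.92, "TCP/IP": 20.21}


def hardware_limit_mbyte_per_sec(bit_rate):
    """Convert a line rate in bit/sec to Mbyte/sec (1 Mbyte = 10**6 bytes)."""
    return bit_rate / 8 / 1_000_000


limit = hardware_limit_mbyte_per_sec(GIGABIT_LINE_RATE_BIT_PER_SEC)
utilization = {name: bw / limit for name, bw in PEAK_MBYTE_PER_SEC.items()}

for name, fraction in utilization.items():
    print(f"{name}: {fraction:.1%} of {limit:.0f} Mbyte/sec")
```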

On Fast Ethernet

Conditions
workstation: SPARCstation 20 (85 MHz SuperSPARC × 1)
NIC: Sun Microsystems Fast Ethernet SBus Adapter 2.0
network: SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB); Bay Networks BayStack 350T (switching 100BASE-TX HUB)
OS: SSS-CORE Ver. 1.1
One-way latencies of MBCF/100BASE-TX (in µsec)
data size (byte) 4 16 64 256 1024
MBCF_WRITE 24.5 27.5 34 60.5 172
MBCF_FIFO 32 32 40.5 73 210.5
MBCF_SIGNAL 49 52.5 60.5 93 227.5
Peak bandwidths of MBCF/100BASE-TX (in Mbyte/sec)
data size (byte) 4 16 64 256 1024 1408
MBCF_WRITE, half duplex 0.31 1.15 4.31 8.56 11.13 11.48
MBCF_WRITE, full duplex 0.34 1.27 4.82 9.63 11.64 11.93
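
One way to read the latency table is to separate a fixed per-message cost from a per-byte cost. The sketch below fits a straight line to the MBCF_WRITE latencies by ordinary least squares. The linear model is an assumption made here for illustration, not an analysis taken from the papers, and `fit_line` is a hypothetical helper.

```python
# One-way MBCF_WRITE latencies on 100BASE-TX, copied from the table above.
DATA_SIZE_BYTES = [4, 16, 64, 256, 1024]
MBCF_WRITE_LATENCY_US = [24.5, 27.5, 34, 60.5, 172]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(DATA_SIZE_BYTES, MBCF_WRITE_LATENCY_US)
print(f"fixed cost per message: {intercept:.1f} usec")
print(f"incremental cost:       {slope:.3f} usec/byte")
```

The same function can be applied to the MBCF_FIFO and MBCF_SIGNAL rows to compare their fixed costs.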

Communication Performance of MPI/MBCF

Conditions
workstation: SPARCstation 20 (85 MHz SuperSPARC × 1)
NIC: Sun Microsystems Fast Ethernet SBus Adapter 2.0
network: SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB); Bay Networks BayStack 350T (switching 100BASE-TX HUB)
OS & MPI implementation: SSS-CORE Ver. 1.1 with MPI/MBCF; SunOS 4.1.4 with MPICH Ver. 1.1 (using TCP)
Round-trip times of MPI with 100BASE-TX (in µsec)
message size (byte) 0 4 16 64 256 1024 4096
MPI/MBCF on SSS-CORE 71 85 85 106 168 438 1026
MPICH/TCP on SunOS 968 962 980 1020 1080 1255 2195
Peak bandwidths of MPI with 100BASE-TX (in Mbyte/sec)
message size (byte) 4 16 64 256 1024 4096 16384 65536
MPI/MBCF on SSS-CORE, half duplex 0.14 0.53 1.82 4.72 8.08 9.72 10.15 9.78
MPI/MBCF on SSS-CORE, full duplex 0.14 0.57 1.90 5.33 10.22 11.68 11.77 11.85
MPICH/TCP on SunOS, half duplex 0.02 0.09 0.35 1.27 3.54 6.04 5.59 7.00

Efficiency of MPI/MBCF for the NAS Parallel Benchmarks

Conditions
workstation: SPARCstation 20 (85 MHz SuperSPARC × 1)
NIC: Sun Microsystems Fast Ethernet SBus Adapter 2.0
network: SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB)
OS & MPI implementation: SSS-CORE Ver. 1.1 with MPI/MBCF; SunOS 4.1.4 with MPICH Ver. 1.1 (using TCP)
Execution results of the NAS Parallel Benchmarks
program [# of nodes] EP [8] MG [8] CG [8] IS [8] LU [8] SP [9] BT [9]
MPI/MBCF on SSS-CORE
execution time (sec) 15.14 7.48 11.02 3.02 160.36 154.91 67.30
speedup ratio to 1 node 7.99 5.24 6.27 3.33 6.26 8.11 9.16
communication rate (Mbyte/sec) 0.00 9.68 12.69 13.58 1.89 7.83 5.32
communication frequency (# of messages/sec) 4 4670 2138 466 1199 421 488
average message size (Kbyte) 0.00 2.07 5.94 29.14 1.58 18.60 10.90
MBCF_WRITE availability rate (%) 51.10 0.01 53.33 99.22 13.37 49.01 47.24
use of collective communication yes no no yes no no no
MPICH/TCP on SunOS
execution time (sec) 16.25 13.72 14.59 4.81 185.04 231.66 96.02
speedup ratio to 1 node 7.73 2.83 4.71 2.13 5.84 6.01 6.53
MPI/MBCF on SSS-CORE versus MPICH/TCP on SunOS
performance improvement ratio 1.07 1.83 1.32 1.59 1.15 1.50 1.43
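
The performance improvement ratio in the last row appears to be the MPICH/TCP execution time divided by the MPI/MBCF execution time; the sketch below reproduces it from the two execution-time rows. It also derives a parallel efficiency (speedup divided by node count), which is an additional metric introduced here for illustration and not one reported on this page. All names are illustrative.

```python
PROGRAMS = ["EP", "MG", "CG", "IS", "LU", "SP", "BT"]
NODES = [8, 8, 8, 8, 8, 9, 9]

# Rows copied from the table above.
MBCF_TIME_SEC = [15.14, 7.48, 11.02, 3.02, 160.36, 154.91, 67.30]
MPICH_TIME_SEC = [16.25, 13.72, 14.59, 4.81, 185.04, 231.66, 96.02]
MBCF_SPEEDUP = [7.99, 5.24, 6.27, 3.33, 6.26, 8.11, 9.16]

# Improvement ratio: how many times faster MPI/MBCF runs than MPICH/TCP.
improvement = [
    round(mpich / mbcf, 2) for mpich, mbcf in zip(MPICH_TIME_SEC, MBCF_TIME_SEC)
]

# Parallel efficiency: speedup relative to the ideal of one per node.
efficiency = [speedup / n for speedup, n in zip(MBCF_SPEEDUP, NODES)]

for name, ratio, eff in zip(PROGRAMS, improvement, efficiency):
    print(f"{name}: improvement {ratio:.2f}, efficiency {eff:.2f}")
```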

Performance of the RPC with MBCF

Conditions
workstation: SPARCstation 20 (85 MHz SuperSPARC × 1)
NIC: Sun Microsystems Fast Ethernet SBus Adapter 2.0
network: SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB)
OS & RPC implementation: SSS-CORE Ver. 1.1 with modified SUNRPC 4.0; SunOS 4.1.4 with SUNRPC 4.0
Round-trip latencies of RPC with 100BASE-TX (in µsec)
data size (byte) 4 256 512 1024
SSS-CORE, MBCF_SIGNAL 127 173 221 315
SSS-CORE, MBCF_FIFO 148 194 251 372
SunOS TCP 863 903 918 1033

Efficiency of RCOP for the SPLASH-2 suite

ADSM

Conditions
workstation: SPARCstation 20 (85 MHz SuperSPARC × 1)
NIC: Sun Microsystems Fast Ethernet SBus Adapter 2.0
network: Bay Networks BayStack 350T (switching 100BASE-TX HUB)
OS: SSS-CORE Ver. 1.1
runtime system: ADSM
Effects of optimization methods on LU-Contig (n = 512, b = 16)
optimization methods | execution time (sec) | # of consistency management codes | # of packets | amount of communication (Mbyte)
None | 28.20 | 5592 K | 5207 K | 47.73
runtime packet combining | 14.35 | 5592 K | 83.5 K | 113.00
static interprocedural redundancy elimination | 2.17 | 1.43 K | 7.73 K | 9.42
runtime packet combining & static interprocedural redundancy elimination | 2.16 | 1.43 K | 7.60 K | 9.27
Effects of optimization methods on Radix (#key = 1 M)
optimization methods | execution time (sec) | # of consistency management codes | # of packets | amount of communication (Mbyte)
None | 21.90 | 793 K | 3220 K | 76.72
runtime packet combining | 12.13 | 793 K | 75.8 K | 101.08
static interprocedural redundancy elimination | 1.57 | 2.08 K | 19.5 K | 13.47
runtime packet combining & static interprocedural redundancy elimination | 1.24 | 2.08 K | 10.1 K | 13.63
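
To compare the optimization methods at a glance, the rows above can be expressed as reduction factors relative to the unoptimized run. The sketch below does this for execution time and packet count. The `Row` type, the shortened method labels, and `factors_vs_baseline` are illustrative names; because the factors are ratios, no assumption about the meaning of "K" is needed.

```python
from typing import NamedTuple


class Row(NamedTuple):
    method: str
    time_sec: float
    packets_k: float  # number of packets, in units of K as in the table


# Values copied from the two tables above.
LU_CONTIG = [
    Row("none", 28.20, 5207),
    Row("packet combining", 14.35, 83.5),
    Row("redundancy elimination", 2.17, 7.73),
    Row("both", 2.16, 7.60),
]
RADIX = [
    Row("none", 21.90, 3220),
    Row("packet combining", 12.13, 75.8),
    Row("redundancy elimination", 1.57, 19.5),
    Row("both", 1.24, 10.1),
]


def factors_vs_baseline(rows):
    """Map each optimized method to (time reduction, packet reduction)."""
    base = rows[0]
    return {
        r.method: (base.time_sec / r.time_sec, base.packets_k / r.packets_k)
        for r in rows[1:]
    }


lu_factors = factors_vs_baseline(LU_CONTIG)
radix_factors = factors_vs_baseline(RADIX)

for name, table in (("LU-Contig", lu_factors), ("Radix", radix_factors)):
    for method, (time_x, packets_x) in table.items():
        print(f"{name}, {method}: time {time_x:.1f}x, packets {packets_x:.0f}x")
```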
Figure: Speedups on ADSM

UDSM

Conditions
SSS-CORE system
workstation: SPARCstation 20 (85 MHz SuperSPARC × 1)
NIC: Sun Microsystems Fast Ethernet SBus Adapter 2.0
network: Bay Networks BayStack 350T (switching 100BASE-TX HUB)
OS: SSS-CORE Ver. 1.1
runtime system: UDSM
AP1000+ system
MPP: Fujitsu AP1000+ (50 MHz SuperSPARC × 256)
OS: Cell-OS
runtime system: UDSM
Breakdown of execution time
Sync: synchronization
WC: write commitment
PF: page fault handler
Msg: remote message handlers
Task: execution of original application codes
Figure: Execution time of LU-Contig on 1 to 8 nodes
Figure: Execution time of Radix on 1 to 8 nodes
Figure: Speedups on UDSM (on SSS-CORE)

Mail to <info@ssscore.org>.
© 1998-2000 SSS-CORE Project Team.