To execute shared-memory-based parallel programs efficiently,
we introduce two compiler-assisted software cache schemes
that are well suited to the automatic optimization
of remote communication.
One scheme is a fully user-level software cache, User-level
Distributed Shared Memory (UDSM);
the other is
a page-based
cache, Asymmetric Distributed Shared Memory (ADSM),
which exploits the TLB/MMU only on read-access misses.
Under these schemes, we can apply several optimization
techniques that exploit the capability to perform medium-grained or
coarse-grained remote memory accesses, reducing both
the number and the volume of communications.
We also introduce a high-speed user-level communication and
synchronization scheme, ``Memory-Based
Communication Facilities (MBCF)'', which provides this capability
on a general-purpose system
with off-the-shelf communication hardware.
In this paper, we outline our approach and describe the UDSM and the ADSM,
the MBCF, and the optimization techniques for remote communication.
Finally, we present experimental results on the effects of the proposed approach,
obtained with our prototype optimizing compiler,
``Remote Communication Optimizer (RCOP)'', and the MBCF on Fast Ethernet.