To reduce the main-memory requirements of mirroring and the long latency of parity, we developed the parity caching method described in section 3. To summarize, each client reserves a small number of local pages to hold parity frames. When a page is swapped in or out, its parity frame is swapped in as well (if it is not already among the client's parity frames), and the new parity is computed. This method increases the number of page-ins and page-outs, because parity frames are swapped in and out in addition to program pages. To measure the additional overhead of parity caching and compare it to mirroring, we use execution-driven simulation on the same DEC Alpha 3000 model 300 workstation. We use ATOM [16], an object-file rewriting tool, which executes each application while simultaneously simulating the reliability policy we want to evaluate. The policies we evaluate are:
The applications we simulate are:
If all workstations are connected via a broadcast interconnection network, no extra page transfers are needed to implement a reliable policy. For example, in MIRRORING, each swapped-out page needs to be broadcast only once over the interconnection network to reach all servers. Similarly, PARITY_CACHING does not need extra parity frame transfers: if the workstations that keep the parity frames snoop on the interconnection network, they can intercept all swapped-in and swapped-out pages and update their parity records. If, however, the interconnection network is not broadcast-based, then extra page transfers are needed for the reliable policies. For example, MIRRORING doubles the number of page transfers for all swapped-out pages, while PARITY_CACHING increases the number of page transfers by a factor that depends on the effectiveness of caching. The exact magnitude of this factor is studied in our simulations, where we measure the number of pages swapped in (including parity pages) and swapped out (including parity and mirror pages) by each policy.

The results are plotted in Figures 5 and 6. We see that the number of pages swapped in for MIRRORING and NO_RELIABILITY is the same, but the number of pages swapped out for MIRRORING is twice that of NO_RELIABILITY. For PARITY_CACHING, both the number of pages swapped in and the number swapped out are within 5% of those for NO_RELIABILITY. The reason is that all applications exhibit some locality of reference. Pages swapped out within a short time interval under an LRU-like policy will probably also be swapped in within a short time interval, and pages that were initially swapped out close to each other belong to the same parity frame. Thus, as long as these pages are swapped close together in time, their parity frame will reside in the client's cache, and no extra page transfers to move the parity will be needed.
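To make the parity-frame bookkeeping concrete, the following C sketch outlines the update a client would perform on each page-out in a non-broadcast setting. It is a minimal illustration under stated assumptions, not our implementation: the page size, the number of cached parity slots, the eviction choice, and the helpers parity_frame_of() and fetch_parity_frame() are names introduced only for this example.

    /* Minimal sketch of the client-side parity-cache update on a page-out.
     * PAGE_SIZE, PARITY_SLOTS, parity_frame_of(), and fetch_parity_frame()
     * are illustrative assumptions, not part of the system described here. */
    #include <stdint.h>

    #define PAGE_SIZE    8192   /* bytes per page (illustrative)      */
    #define PARITY_SLOTS 8      /* parity frames cached at the client */

    typedef struct {
        long    frame_id;           /* which parity frame this slot holds */
        uint8_t data[PAGE_SIZE];    /* running XOR of the frame's pages   */
        int     valid;
    } parity_slot;

    static parity_slot cache[PARITY_SLOTS];

    /* Map a page to its parity frame; pages swapped out close together
     * are assumed to share a frame, as described in the text. */
    extern long parity_frame_of(long page_id);

    /* Fetch a parity frame from its server into a slot; each call costs
     * one extra page-in (and one page-out if a dirty victim is evicted). */
    extern void fetch_parity_frame(long frame_id, parity_slot *slot);

    /* Called on every page-out: XOR the old and new page contents into
     * the cached parity frame, fetching the frame only on a cache miss. */
    void parity_on_pageout(long page_id,
                           const uint8_t *old_contents,
                           const uint8_t *new_contents)
    {
        long frame_id = parity_frame_of(page_id);
        parity_slot *slot = 0;

        /* Look for the frame in the local parity cache. */
        for (int i = 0; i < PARITY_SLOTS; i++) {
            if (cache[i].valid && cache[i].frame_id == frame_id) {
                slot = &cache[i];
                break;
            }
        }

        /* Miss: bring the parity frame in, paying the extra transfer. */
        if (slot == 0) {
            slot = &cache[0];          /* victim choice elided (e.g., LRU) */
            fetch_parity_frame(frame_id, slot);
            slot->frame_id = frame_id;
            slot->valid = 1;
        }

        /* New parity = old parity XOR old page XOR new page. */
        for (int i = 0; i < PAGE_SIZE; i++)
            slot->data[i] ^= (uint8_t)(old_contents[i] ^ new_contents[i]);
    }

In this sketch, only a cache miss causes an extra parity-frame transfer; the locality argument above is what keeps such misses, and hence the measured overhead, small.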
We see that reliability comes at little extra cost: between 0% and 5%, depending on the nature of the interconnection network, the application, and the policy used. We believe this small overhead is a modest price to pay for the benefit provided.
Figure 5: Number of pages swapped in.
Figure 6: Number of pages swapped out.