How can a microprocessor maintain cache coherency?

Compiler-based cache coherence mechanisms perform an analysis of the code to determine which data items may become unsafe for caching, and they mark those items accordingly. The marked items are then noncacheable, and the operating system or hardware does not cache them. The simplest approach is to prevent any shared data variables from being cached.

This is too conservative, because a shared data structure may be exclusively used during some periods and may be effectively read-only during others. Cache coherence is an issue only during periods when at least one process may update the variable and at least one other process may access it. More efficient approaches analyze the code to determine safe periods for shared variables. The compiler then inserts instructions into the generated code to enforce cache coherence during the critical periods, as sketched below.
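As a minimal illustration of the idea, the sketch below uses two hypothetical routines, cc_invalidate_range and cc_flush_range, to stand in for whatever cache-management instructions a real compiler would emit around a critical period; the names, and the choice of C, are assumptions rather than any particular compiler's interface.

    #include <stddef.h>

    /* Hypothetical cache-management routines standing in for whatever
     * flush/invalidate instructions a real compiler would emit. */
    static void cc_flush_range(const void *addr, size_t len)      { (void)addr; (void)len; }
    static void cc_invalidate_range(const void *addr, size_t len) { (void)addr; (void)len; }

    int shared_counter;  /* a variable the analysis has marked as shared */

    void update_shared_counter(void)
    {
        /* Begin critical period: another process may access the variable. */
        cc_invalidate_range(&shared_counter, sizeof shared_counter); /* compiler-inserted */
        shared_counter++;
        cc_flush_range(&shared_counter, sizeof shared_counter);      /* compiler-inserted */
        /* End critical period: outside it, the variable may be cached freely. */
    }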

Hardware solutions provide dynamic recognition at run time of potential inconsistency conditions. Because the problem is dealt with only when it actually arises, caches are used more effectively, leading to improved performance over a software approach.

Directory protocols collect and maintain information about where copies of lines reside. Typically, there is a centralized controller that is part of the main memory controller, and a directory stored in main memory.

When an individual cache controller makes a request, the centralized controller checks the directory and issues the necessary commands for data transfer between memory and caches or between caches themselves.

It is also responsible for keeping the state information up to date; therefore, every local action that can affect the global state of a line must be reported to the central controller.
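A minimal sketch of this arrangement, assuming a full-map directory with one presence bit per processor (all names here are illustrative, not taken from any particular design):

    #include <stdint.h>

    #define NPROC 32  /* processors tracked by one full-map directory entry */

    typedef enum { UNCACHED, SHARED_STATE, MODIFIED } line_state_t;

    /* One directory entry per memory line: a global state plus a presence
     * bit for every cache that holds a copy. */
    typedef struct {
        line_state_t state;
        uint32_t     presence;  /* bit i set => cache i holds a copy */
    } dir_entry_t;

    /* Centralized controller's action on a write request from cache `req`:
     * invalidate every other copy, then record exclusive ownership. */
    void dir_handle_write(dir_entry_t *e, int req, void (*send_invalidate)(int))
    {
        for (int i = 0; i < NPROC; i++)
            if (i != req && (e->presence & (1u << i)))
                send_invalidate(i);        /* command sent to that cache */
        e->presence = 1u << req;
        e->state    = MODIFIED;
    }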

Write-invalidate and write-update policies are used for maintaining cache consistency. Under the write-invalidate protocol, when processor P1 writes X1 in its cache memory, all other copies are invalidated via the bus; invalidated blocks are also known as dirty, i.e., they should not be used. The write-update protocol instead updates all the cache copies via the bus; with a write-back cache, the memory copy is also updated. When a read miss occurs, a bus-read operation is initiated. If no dirty copy exists, then the main memory, which has a consistent copy, supplies a copy to the requesting cache memory.

If a dirty copy exists in a remote cache memory, that cache will inhibit the main memory's response and send a copy to the requesting cache memory.

In both cases, the cache copy enters the valid state after a read miss. On a write hit, if the copy is in the dirty or reserved state, the write is carried out locally and the new state is dirty. If the copy is in the valid state, a write-invalidate command is broadcast to all the caches, invalidating their copies; the shared memory is written through, and the resulting state is reserved after this first write. On a write miss, the block is brought in by sending a read-invalidate command, which invalidates all other cache copies; the local copy is then updated and ends in the dirty state.
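The write transitions just described can be summarized in a short sketch. The state names follow the text above; the bus operations are passed in as callbacks, and the whole thing is a write-once-style illustration under those assumptions rather than a complete protocol:

    typedef enum { INVALID, VALID, RESERVED, DIRTY } block_state_t;

    /* Processor-side write under the write-invalidate policy, using the
     * four states named above. */
    block_state_t on_processor_write(block_state_t s,
                                     void (*bus_write_invalidate)(void),
                                     void (*bus_read_invalidate)(void))
    {
        switch (s) {
        case DIRTY:
        case RESERVED:
            return DIRTY;               /* write locally, no bus traffic      */
        case VALID:
            bus_write_invalidate();     /* first write: invalidate all copies */
            return RESERVED;            /* memory was written through once    */
        case INVALID:
        default:
            bus_read_invalidate();      /* write miss: fetch the block and
                                         * invalidate other copies            */
            return DIRTY;
        }
    }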

On block replacement, a copy in the dirty state must be written back to main memory; when the copy is in the valid, reserved, or invalid state, no write-back takes place. When a multistage network is used to build a large multiprocessor with hundreds of processors, the snoopy cache protocols must be modified to suit the network's capabilities. Because broadcasting is very expensive to perform in a multistage network, the consistency commands are sent only to those caches that keep a copy of the block. On a bus-based system, by contrast, the bus is normally used to perform invalidates: the processor simply acquires bus access and broadcasts the address to be invalidated on the bus.

All processors continuously snoop on the bus, watching the addresses. Each processor checks whether the address on the bus is in its cache; if so, the corresponding data in the cache are invalidated. When a write to a shared block occurs, the writing processor must acquire bus access to broadcast its invalidation. If two processors attempt to write shared blocks at the same time, their attempts to broadcast an invalidate operation are serialized when they arbitrate for the bus.

The first processor to obtain bus access causes any other copies of the block it is writing to be invalidated. If the processors were attempting to write the same block, the serialization enforced by the bus also serializes their writes. We also need to locate a data item when a cache miss occurs. In a write-through cache this is easy, since all written data are sent to the memory, from which the most recent value of a data item can always be fetched.
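A minimal sketch of the snoop check described above, assuming a direct-mapped cache with 64-byte lines (the sizes and names are illustrative):

    #include <stdint.h>
    #include <stdbool.h>

    #define NLINES     256
    #define LINE_BYTES 64   /* assumed line size */

    typedef struct {
        uint32_t tag;
        bool     valid;
        bool     dirty;
    } cache_line_t;

    static cache_line_t cache[NLINES];

    /* Each snooping controller runs this check for every invalidation
     * address it sees on the bus. */
    void snoop_invalidate(uint32_t addr)
    {
        uint32_t index = (addr / LINE_BYTES) % NLINES;
        uint32_t tag   = (addr / LINE_BYTES) / NLINES;

        if (cache[index].valid && cache[index].tag == tag)
            cache[index].valid = false;   /* drop the stale copy */
    }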

For a write-back cache, the most recent value of a data item can be in a cache rather than in memory. The snooping process is used here also. All processors snoop on the address placed on the bus.

If a processor finds that it has a dirty copy of the requested cache block, it provides that cache block in response to the read request and causes the memory access to be aborted. In this module, we will examine the implementation of coherence with write-back caches. The tag bits, the dirty bit and the valid bit that we discussed with respect to caches are used here also.

The normal cache tags can be used to implement the process of snooping, the dirty bit to indicate whether the cache block was modified, and the valid bit to indicate the validity of the cache block. The only additional bit needed indicates whether or not a cache block is shared; for this, we can add an extra state bit associated with each cache block. When a write to a block in the shared state occurs, the cache generates an invalidation on the bus and marks the block as exclusive, as sketched below.
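A sketch of this write-hit handling with the extra shared bit; the structure and function names are assumptions for illustration:

    #include <stdbool.h>

    typedef struct {
        bool valid, dirty, shared;  /* the three state bits discussed above */
    } line_bits_t;

    /* Write hit in a write-back cache: only a write to a *shared* block
     * needs a bus invalidation; afterwards this cache holds the block
     * exclusively and later writes stay silent. */
    void on_write_hit(line_bits_t *l, void (*bus_invalidate)(void))
    {
        if (l->shared) {
            bus_invalidate();    /* broadcast once */
            l->shared = false;   /* block is now exclusive */
        }
        l->dirty = true;         /* memory will be updated on write-back */
    }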

No further invalidations will be sent by that processor for that block. The processor with the sole copy of a cache block is normally called the owner of the cache block.

If another processor later requests this cache block, the state must be made shared again. Since our snooping cache also sees any misses, it knows when the exclusive cache block has been requested by another processor and the state should be made shared.

Every bus transaction must check the cache-address tags, which could potentially interfere with processor cache accesses. One way to reduce this interference is to duplicate the tags. The interference can also be reduced in a multilevel cache by directing the snoop requests to the L2 cache, which the processor uses only when it has a miss in the L1 cache. For this scheme to work, every entry in the L1 cache must be present in the L2 cache, a property called the inclusion property.
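A sketch of how inclusion can be preserved, assuming a hypothetical l1_probe_and_invalidate interface: whenever the L2 evicts a line, it back-invalidates the L1 so that an L2 snoop miss safely implies an L1 miss.

    #include <stdint.h>
    #include <stdbool.h>

    /* Assumed L1 interface: probe for the line, invalidate it if present. */
    static bool l1_probe_and_invalidate(uint32_t addr) { (void)addr; return false; }

    /* Preserving inclusion: any line leaving the L2 must also leave the
     * L1, so that "miss in L2" implies "miss in L1" when filtering snoops. */
    void l2_evict(uint32_t addr)
    {
        (void)l1_probe_and_invalidate(addr);  /* back-invalidate the L1 copy */
        /* ...then write the L2 line back to memory if dirty and reuse it... */
    }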

If the snoop gets a hit in the L2 cache, then it must arbitrate for the L1 cache to update the state and possibly retrieve the data, which usually requires a stall of the processor. Each block of memory is in one of three states: shared, exclusive (modified), or invalid. A snooping coherence protocol is usually implemented by incorporating a finite-state controller in each node. This controller responds to requests both from the processor and from the bus, changing the state of the selected cache block, as well as using the bus to access data or to invalidate it.

The various activities are elaborated below.

Read request by the processor which is a hit — the cache block can be in the shared or modified state; this is a normal hit operation, and the data is read from the local cache.

Read request by the processor which is a miss.

This indicates that the cache block can be in any of the following three states:

Invalid — It is a normal miss, and the read request is placed on the bus. The requested block is brought from memory, and the state becomes shared.

Shared — It is a replacement miss, probably because of an address conflict; the read request is again placed on the bus.

Exclusive (modified) — It is also a replacement miss; the block is first written back to memory, and then the read request is placed on the bus.

Directory information can also be kept as a doubly-linked list of sharers, as in the following transactions. Cache B 20 issues a request to memory 28 for a sharable copy of a coherence block. According to the invention, the request also contains Cache B's master address 44 and the index 46 which Cache B needs in order to index into its cache, as represented by a data block 43 (as might be found in a temporary buffer associated with data received off the bus 24).

This two-element block of information is stored in the corresponding slot. The links merely express graphically the connection defined by the contents of the respective slots. The memory 28 will recognize that its own list was previously empty. Therefore, memory 28 places Cache B's link address and index in its nextmaster slot 34 and its previousmaster slot. The pointers 38 and 40 now point to the nextmaster slot 50 and previousmaster slot 52 of Cache B 20, and the memory link address and index are loaded into those slots.

Cache B 20 recognizes that the nextmaster slot 50 and the previousmaster slot 52 now contain links which point to memory 28, as indicated by the corresponding pointers. Cache B 20 then recognizes that its request has been completed. At the conclusion of the transactions in each instance, according to the invention, the link indexes of the previousmaster and the nextmaster, as well as the link addresses, are stored in the respective nextmaster slot and previousmaster slot of the requesting master.
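The slot layout the excerpt describes can be sketched as follows, with the bus transactions collapsed into direct slot updates for illustration (all type and field names are assumptions):

    #include <stdint.h>

    /* A link names a master and a storage location inside it: the bus
     * address to which the master responds, plus the index that master
     * needs to find the coherence block internally (each master may use
     * its own indexing scheme). */
    typedef struct {
        uint32_t master_addr;
        uint32_t index;
    } link_t;

    /* Per-coherence-block slots kept by memory and by each cache. */
    typedef struct {
        link_t nextmaster;      /* toward the most recently added sharer */
        link_t previousmaster;  /* toward memory / earlier sharers       */
        int    tag_valid;       /* 1 = valid, 0 = invalid                */
    } coherence_slots_t;

    /* Head insertion as in the Cache A example below: the new sharer is
     * spliced in between memory and the old head of the list. */
    void insert_at_head(coherence_slots_t *mem,  link_t mem_link,
                        coherence_slots_t *newm, link_t new_link,
                        coherence_slots_t *old_head)
    {
        newm->previousmaster     = mem_link;         /* new master's previous is memory */
        newm->nextmaster         = mem->nextmaster;  /* old head becomes its next       */
        old_head->previousmaster = new_link;         /* fix old head's back link        */
        mem->nextmaster          = new_link;         /* memory points at new master     */
        newm->tag_valid          = 1;                /* all links complete: tag valid   */
    }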

Cache A 18 issues a request 58 to memory 28 for a sharable copy of the same coherency block previously requested by Cache B 20.

The link value in slot 34 is thus replaced by the value Cache A. Cache A 18 also sends a transaction 66 to Cache B 20 to advise it that there is a new previousmaster. Cache A 18 has loaded its previousmaster slot 68 with the link index and address of the memory 28; Cache A 18 knows that the previousmaster link value is memory because the transaction 60 was issued by the memory 28.

In the next sequence, Cache B 20 knows the previousmaster slot value is Cache A because Cache A 18 initiated the transaction. All links are now completed, and the value of the tag 76 of Cache A 18 has been set to valid. Cache A 18 and its associated processor can thereupon employ all the content of the coherence block without risk of using invalid data.

To remove itself from the list, Cache A 18 institutes a transaction 80 to its previous master, namely memory 28, as indicated by its previousmaster value in slot 68. Cache A thereby passes the index and address of a new nextmaster link, which is the value Cache B stored in its nextmaster slot. Cache A's previousmaster slot 68 is used to get its previous master's index and address, which value is also passed in the transaction 80 to memory.

In this case, the previousmaster slot value is memory. In other words, the transaction 82 fixes Cache A's nextmaster's previousmaster link: the previousmaster slot 52 has been set to the value memory, as indicated by the corresponding pointer. Bad pointers 38, 49 and 70, 86 are corrected by the removal of Cache A 18, indicated by setting the value of its tag 76 to invalid.
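Continuing the coherence_slots_t sketch above, the removal just described amounts to the following, again with the two transactions collapsed into direct slot updates for illustration:

    /* Removal of a master M from the doubly-linked sharing list:
     * transaction 80 gives M's previous master a new nextmaster link,
     * transaction 82 fixes M's next master's previousmaster link, and
     * M then marks its own tag invalid. */
    void remove_from_list(coherence_slots_t *prev, coherence_slots_t *next,
                          coherence_slots_t *me)
    {
        prev->nextmaster     = me->nextmaster;      /* transaction 80 */
        next->previousmaster = me->previousmaster;  /* transaction 82 */
        me->tag_valid        = 0;                   /* copy no longer usable */
    }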

A frequent type of transaction is a flush or purge of a coherence block initiated by a cache which is not in the list. Memory 28 receives the transaction 88 and notes from its linked list that other copies of the coherence block exist. The transaction 90 is directed along its nextmaster link, indicated by pointer 84. Cache B 20 sets its tag to the value invalid.

The above examples are merely a sampling of the types of transactions which can be carried out reliably in a multiprocessor environment employing doubly-linked-list cache coherency protocols modified in accordance with the invention. Another example is a request for a private coherence block, namely, a cache coherence block which is not to be shared with other caches. The request is similar to that for a shared coherence block, with the following exception: memory 28 must issue and complete a flush operation prior to returning data to the requesting master.

This invention, particularly the indexing and the passing of indexes, allows general mixing of processors, each employing its own indexing scheme, on a complex network-like bus while maintaining cache coherence across all processors.

This invention becomes acutely important in large multiprocessor bus networks. It is unnecessary to issue broadcast transactions, which are difficult to recover from if cache coherency is a concern. The size of the bus network is limited only by the number of index bits. The nature of the use of the index is not affected by the cache in which it is stored.

The invention has now been explained with reference to specific embodiments. Other embodiments will be apparent to those of ordinary skill in the art. For example, while the method disclosed is described in connection with only a single central memory, a system could comprise multiple memories having a valid copy of elements of the subject data without departing from the spirit and scope of this invention. It is therefore not intended that the invention be limited, except as indicated by the appended claims.

I claim: 1. A method for maintaining cache coherence in a multiprocessor computer system having a potential for duplication of data in a plurality of storage locations identifiable as coherence blocks, said multiprocessor computer system employing doubly-linked lists for identifying a previous master of said data and a next master of said data, said method comprising the steps of: passing a current address of a current master from the current master to the previous master, wherein the current address is the location to which the current master will respond;

2. The method according to claim 1 wherein the current address and the current index are passed simultaneously as part of a transaction.

3. The method according to claim 1 wherein said plurality of storage locations include one central memory system for containing a valid copy of all data and at least two cache memories, each one of said cache memories being coupled to its own associated processor.

4. The method according to claim 3 wherein each cache memory stores at least a portion of said coherence blocks, each of the coherence blocks having associated therewith a first tag indicative of the presence of valid data or invalid data, and wherein each cache memory includes a slot for storing said current address and the current index of the previous master, and a slot for storing said current address and the current index of the next master.

5. A method for maintaining cache coherence in a multiprocessor computer system having a potential for duplication of data in a plurality of storage locations identifiable as coherence blocks, said multiprocessor computer system employing doubly-linked lists for identifying a previous master of said data and a next master of said data, said method comprising the steps of: (a) issuing from a first master to a second master a first transaction requesting sharable data in a coherence block of said second master, said second master being a repository of known valid data, said first transaction including at least an address of a requested coherence block, a first master address for said first master and a first index which identifies a storage location within the first master;

6. The method according to claim 5 further including the step of tagging each master as containing valid data upon completion of said updating of addresses and indexes.

7. The method of claim 5 wherein said updating step (h) includes acknowledging from said third master to said first master completion of the request of said first master for sharable data.


