CS 240 2018 Lecture 6: Chandy-Lamport Problem: Record a global snapshot - state for each process, and state for each channel System Model: * N processes in the system * There are two uni-directional communication channels between each ordered process pair Pi, Pj: Cij and Cji - Communication channels are FIFO-ordered (First in-First out) * No failures - All messages arrive intact, and are not duplicated Requirements: * Snapshot should not interfere with normal application actions, and it should not require application to stop sending messages * Each process is able to record its own state - Process state: Application-defined state or, in the worst case: its heap, registers, program counter, code, etc. (essentially the coredump) * Global state is collected in a distributed manner * Any process may initiate the snapshot * We'll assume just one snapshot run Chandy-Lamport Algorithm: * First: Initiator Pi records its own state Initiator Pi creates special messages called “Marker” messages - Not an application message, does not interfere with application messages for j=1 to N except i - Pi sends out a Marker message on outgoing channel Cij //(N-1) channels Pi starts recording the incoming messages on each of the incoming channels at Pi: Cji (for j=1 to N except i) * Then: Whenever a process Pi receives a Marker message on an incoming channel Cki if (this is the first Marker Pi is seeing) Pi records its own state first Marks the state of channel Cki as "empty" for j=1 to N except i - Pi sends out a Marker message on outgoing channel Cij //(N-1) channels - Pi starts recording the incoming messages on each of the incoming channels at Pi: Cji (for j=1 to N except i and k) else // already seen a Marker message Mark the state of channel Cki as all the messages that have arrived on it since recording was turned on for Cki * The algorithm terminates when: - All processes have received a Marker (To record their own state) - All processes have received a Marker on all the (N-1) incoming channels (To record the state of all channels) * After termination, if needed, a central server collects all these partial state pieces to obtain the full global snapshot