On Time Complexity of Distributed Algorithms for Generalized Deadlock Detection

Deadlock detection in distributed asynchronous systems, such as distributed database systems, computer networks and massively parallel systems, is peculiarly subtle and complex. This is because asynchronous systems are characterized by the lack of global memory and of a common physical clock, as well as by the absence of known bounds on relative processor speeds and transmission delays. These difficulties also complicate the performance analysis of distributed algorithms for deadlock detection.
 
This paper deals with the worst-case one-time complexity analysis of two well-known distributed algorithms for generalized deadlock detection. The time complexity is expressed as a function of the diameter $d$ and the length $l$ of the longest path of the wait-for-graph (WFG) characterizing a state of the distributed system. First, the algorithm proposed by Bracha and Toueg is considered. It is shown that its time complexity is $2d + 2l$. Then, we prove that the time complexity of the Kshemkalyani and Singhal algorithm is $(d + 1) + l$.


Introduction
Deadlock handling is a very important problem in various applications, including information and database systems, computer networks, massively parallel systems, etc. The problem is well defined and understood in centralized environments. However, in asynchronous distributed systems the problem remains peculiarly subtle and complex. This is because asynchronous systems are characterized by the lack of global memory and of a common physical clock, as well as by the absence of known bounds on relative processor speeds and message delays. As a result, no node has accurate knowledge of the whole system state. These complexities have led to significant errors in a number of published distributed deadlock detection algorithms.
Informally, deadlock refers to a situation in which some transactions (processes) request resources (services) from each other and then wait indefinitely for these requests to be satisfied, so that the progress of their execution is definitively halted. In this case, the execution can turn out to be completely useless unless proper and careful control is exercised.
In the deadlock detection approach, messages are sent (i.e., resources are granted) without any constraints. However, the state of the system is checked periodically, or when a deadlock is suspected, to determine whether a set of processes is deadlocked. This checking is performed by a deadlock detection algorithm. If a deadlock is discovered, recovery from it is done by aborting one or more deadlocked processes.
The suitability of a deadlock handling approach greatly depends on the environment characteristics and the assumed model of transaction (process) requests (see e.g., [16], [11]). The simplest possible request model is one in which a process can require at most one message (resource, lock) at a time. In the AND model (also known as the resource model), processes are permitted to request a set of messages simultaneously. A process cannot execute until it acquires all messages (resources) for which it is waiting ([5], [6]). This model represents, for instance, possible requests of transactions to lock several data items. Another model of requests is the OR model [...] An active process can spontaneously become passive, requiring some messages (resources). Associated with a passive process $P$ is its dependent set $DS$, the set of processes from which $P$ is expecting to receive messages. By definition, a process is not a member of its dependent set. For example, in the OR-AND model, the dependent set of a passive process is defined as $DS_1 \cup DS_2 \cup \dots \cup DS_q$, where $DS_i \subseteq \mathcal{P}$ for all $i$ and $\mathcal{P}$ denotes the set of all processes. The process becomes active only after a message from every process in $DS_1$, or a message from every process in $DS_2$, or ..., or a message from every process in $DS_q$ has arrived.
In order to abstract the activation condition of a passive process $P$ with dependent set $DS$, a predicate called $\mathit{fulfilled}(A)$ is used, where $A$ is a subset of $\mathcal{P}$, the set of all processes. The predicate $\mathit{fulfilled}(A)$ is true if the messages that have arrived from all processes belonging to the set $A$ are sufficient to activate process $P$ (since $P$ is passive, arrived messages are not consumed). Moreover, $\mathit{fulfilled}(\emptyset)$ is false and, for $DS \neq \emptyset$, $\mathit{fulfilled}(DS)$ is true. Of course, the following monotonicity property holds: if $X \subseteq Y$ and $\mathit{fulfilled}(X)$ is true, then $\mathit{fulfilled}(Y)$ is also true.
Definitions of fulfilled for other request models can be obtained as special cases of the above in the following manner:
basic k out of n: $q = 1$ and thus $\mathit{fulfilled}(A) \equiv |DS \cap A| \geq k$
OR-AND: $\forall j :: k_j = |DS_j|$ and thus $\mathit{fulfilled}(A) \equiv \exists j :: 1 \leq j \leq q :: DS_j \subseteq A$
OR: $\forall j :: |DS_j| = 1$ and thus $\mathit{fulfilled}(A) \equiv DS \cap A \neq \emptyset$, where $DS = DS_1 \cup DS_2 \cup \dots \cup DS_q$
AND: $q = 1$ and $k = |DS|$ and thus $\mathit{fulfilled}(A) \equiv DS \subseteq A$
Now we give precise definitions of a deadlock for the general request model. For the sake of convenience, the quantification $\forall P_i :: P_i \in B :: \dots$ means exactly $\forall i :: 1 \leq i \leq n \wedge P_i \in B :: \dots$. Thus, let $\mathit{deadlock}(B)$ be a predicate signifying that the nonempty set $B$ of processes is deadlocked. If $\mathit{deadlock}(B)$ is true at some instant during the execution of an application program, it remains true, as deadlock occurrence is persistent: $\mathit{deadlock}(B)$ is a stable property. Generally, depending on the request model, the formal definition of this predicate can take several forms. We formulate an abstract definition of deadlock which holds irrespective of the underlying request model. To do this, we introduce the following predicates [2]:
$\mathit{passive}_i$: true iff $P_i$ is passive.
$\mathit{arr}_i(j)$: true iff a message from $P_j$ has arrived and has not yet been consumed by $P_i$.
$\mathit{empty}_i(j)$: true iff all messages sent by $P_j$ to $P_i$ have arrived.
Moreover, let $ARR_i$ denote the set of all processes $P_j$ such that $\mathit{arr}_i(j)$ = true, and let $NE_i$ denote the set of all processes $P_j$ such that $\mathit{empty}_i(j)$ = false. By $\mathit{fulfilled}_i$ we denote the predicate fulfilled associated with $P_i$. An abstract definition for a set $B$ of processes to be deadlocked is given below, where $\mathcal{P}$ is the set of all processes:
$\mathit{deadlock}(B) \equiv B \subseteq \mathcal{P} \wedge B \neq \emptyset \wedge \forall P_i :: P_i \in B :: \mathit{passive}_i \wedge \neg \mathit{fulfilled}_i(ARR_i \cup NE_i \cup (\mathcal{P} \setminus B))$
It means that no $P_i \in B$ can be activated, even after all messages from all processes in $ARR_i \cup NE_i$ and from all processes that are not deadlocked have arrived, because these messages are not sufficient to satisfy the activation condition of any process in $B$.
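The request models and the abstract deadlock predicate above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: processes are represented by strings, each passive process carries its own fulfilled predicate as a function on sets, dependent sets are assumed nonempty, and all names (`k_out_of_n`, `and_model`, `or_model`, `or_and_model`, `deadlock`) are hypothetical helpers introduced here, not part of the original formalism.

```python
from typing import Callable, Dict, Iterable, Set

Proc = str
Fulfilled = Callable[[Set[Proc]], bool]


def k_out_of_n(ds: Set[Proc], k: int) -> Fulfilled:
    """Basic k out of n: fulfilled(A) iff |DS ∩ A| >= k (DS assumed nonempty)."""
    return lambda a: len(ds & a) >= k


def and_model(ds: Set[Proc]) -> Fulfilled:
    """AND: k = |DS|, i.e. fulfilled(A) iff DS is a subset of A."""
    return k_out_of_n(ds, len(ds))


def or_model(ds: Set[Proc]) -> Fulfilled:
    """OR: fulfilled(A) iff DS ∩ A is nonempty."""
    return k_out_of_n(ds, 1)


def or_and_model(groups: Iterable[Set[Proc]]) -> Fulfilled:
    """OR-AND: fulfilled(A) iff some DS_j is entirely contained in A."""
    frozen = [set(g) for g in groups]
    return lambda a: any(g <= a for g in frozen)


def deadlock(b: Set[Proc], procs: Set[Proc], passive: Dict[Proc, bool],
             fulfilled: Dict[Proc, Fulfilled],
             arr: Dict[Proc, Set[Proc]], ne: Dict[Proc, Set[Proc]]) -> bool:
    """Abstract predicate: B is a nonempty subset of the process set, and every
    member is passive and stays unfulfilled even if ARR_i, NE_i and all
    processes outside B deliver their messages."""
    if not b or not b <= procs:
        return False
    outside = procs - b
    return all(
        passive[p] and not fulfilled[p](arr[p] | ne[p] | outside)
        for p in b
    )


# Usage: a and b wait for each other (AND model), c is active, channels empty.
procs = {"a", "b", "c"}
passive = {"a": True, "b": True, "c": False}
fulfilled = {"a": and_model({"b"}), "b": and_model({"a"})}
empty = {p: set() for p in procs}
print(deadlock({"a", "b"}, procs, passive, fulfilled, empty, empty))  # True
print(deadlock({"a"}, procs, passive, fulfilled, empty, empty))       # False
```

The set {a} alone is not deadlocked in this sketch because b lies outside it and is therefore treated as able to send the awaited message, exactly as the term for processes outside $B$ in the definition prescribes.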
With the previous precise definition of generalized deadlocks, various formulations of the deadlock detection problem can be stated.
The problem of detection of a deadlock occurrence is to determine whether there exists a set $B$ such that $\mathit{deadlock}(B)$ is true. The result of a solution to this problem is a boolean $dd$ satisfying the following post-condition: $dd \equiv \exists B :: \mathit{deadlock}(B)$
The problem of detection of a deadlocked process is to determine whether a given process $P$ is deadlocked, i.e., whether there exists a set $B$ such that $\mathit{deadlock}(B)$ = true and $P \in B$. The result of a solution to this problem is a boolean $dd$ satisfying the following post-condition: $dd \equiv \exists B :: \mathit{deadlock}(B) \wedge P \in B$
The problem of detection of a deadlocked set is to find a set of deadlocked processes. The result of a solution to this problem is a set of processes $PD$ satisfying the following post-condition: $\mathit{deadlock}(PD) \vee (PD = \emptyset \wedge \neg \exists B :: \mathit{deadlock}(B))$
The problem of detection of the maximum deadlocked set is to find a deadlocked set which contains all deadlocked sets. This set is unique since the property of being deadlocked is closed under the set union operation. The result of a solution to this problem is a set of processes $PD$ satisfying the following post-condition: $(\mathit{deadlock}(PD) \vee PD = \emptyset) \wedge \mathit{maxdead}(PD)$, where $\mathit{maxdead}(PD) \equiv \forall B :: \mathit{deadlock}(B) \Rightarrow B \subseteq PD$
Note that the deadlock detection problems have been stated in order of increasing complexity. In addition, this order is consistent with an increasing amount of information available at the time the detection terminates. This additional information is important, as it is useful for efficient recovery from the deadlock.
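For a static snapshot with empty channels that is fully known at one place, the maximum deadlocked set can be obtained by a simple fixpoint reduction. The sketch below is a centralized illustration only, not one of the distributed algorithms analyzed in this paper. It assumes that active processes are exactly those without an entry in the `fulfilled` mapping and that every predicate is monotone, as required above; the function name `max_deadlocked_set` is hypothetical.

```python
from typing import Callable, Dict, Set


def max_deadlocked_set(procs: Set[str],
                       fulfilled: Dict[str, Callable[[Set[str]], bool]]) -> Set[str]:
    """Return the processes that can never be activated.

    `fulfilled` maps each passive process to its activation predicate;
    processes missing from the mapping are treated as active.
    """
    free = {p for p in procs if p not in fulfilled}   # active processes
    changed = True
    while changed:                                    # iterate to a fixpoint
        changed = False
        for p in procs - free:
            if fulfilled[p](free):                    # messages from free processes suffice
                free.add(p)
                changed = True
    return procs - free


def and_preds(wfg: Dict[str, Set[str]]) -> Dict[str, Callable[[Set[str]], bool]]:
    """AND model: a process needs messages from its whole dependent set."""
    return {p: (lambda a, ds=ds: ds <= a) for p, ds in wfg.items()}


# A cycle a <-> b plus c waiting for both a and an active process d.
cycle = {"a": {"b"}, "b": {"a"}, "c": {"a", "d"}}
print(sorted(max_deadlocked_set({"a", "b", "c", "d"}, and_preds(cycle))))  # ['a', 'b', 'c']
```

Because the deadlock property is closed under union, the complement of the reducible processes is the unique maximum deadlocked set in this static setting.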
Solving the deadlock detection problems is difficult in distributed systems because no process has accurate knowledge of the global system state, i.e., the global state is not visible to any process instantaneously. Thus, in practice, only states related to earlier observations can be obtained. However, during the collection of the local states of processes, these states are changing. Therefore, any deadlock detection algorithm in a distributed system can only ensure that:
1. A deadlock which has occurred before the initiation of the algorithm will be detected.
2. The detected set of deadlocked processes is indeed deadlocked at the moment when the detection algorithm terminates.
We evaluate the considered distributed deadlock detection algorithms according to their worst-case one-time complexity. The worst-case one-time complexity (hereafter, time complexity for short) of a distributed algorithm is the maximum time of a computation of the algorithm under the following assumptions: a process can execute any finite number of events in zero time, and the time between the sending and the receipt of a message is exactly one time unit.
Advances in Databases and Information Systems, 1997
Time complexity is expressed here as a function of WFG parameters, i.e., as a function of the WFG diameter $d$ and the length $l$ of the longest path in the WFG.
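To make the two parameters concrete, the following minimal sketch computes them for a small WFG given as an adjacency mapping. It rests on assumptions not spelled out in the text: $d$ is taken as the largest finite shortest-path distance between ordered pairs of nodes, and $l$ as the number of edges of the longest simple path. The function names and the sample graph (the edge set of the Fig. 2 example, as it can be read off the example descriptions) are illustrative. The longest-path search is exhaustive and therefore only suitable for small graphs.

```python
from collections import deque
from typing import Dict, List


def wfg_diameter(adj: Dict[str, List[str]]) -> int:
    """Largest finite shortest-path distance over all ordered node pairs (BFS)."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


def longest_simple_path(adj: Dict[str, List[str]]) -> int:
    """Length in edges of the longest simple path (exhaustive DFS)."""
    def dfs(u, seen):
        return max((1 + dfs(w, seen | {w}) for w in adj[u] if w not in seen),
                   default=0)
    return max((dfs(s, {s}) for s in adj), default=0)


# Edge set assumed for the Fig. 2 example: P_d is the only active process.
fig2 = {"a": ["b", "c", "d"], "b": ["c", "d"], "c": ["d"], "d": []}
print(wfg_diameter(fig2), longest_simple_path(fig2))  # 1 3
```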
A distributed deadlock detection algorithm can be perceived as a set of control processes $C_i$, called controllers. Each controller $C_i$ is associated with the application process $P_i$. The role of controller $C_i$ is, on the one hand, to observe the behavior of $P_i$ and, on the other hand, to cooperate with other controllers to consistently detect a deadlock occurrence, if any. In general, controllers need not be separated as special processes, since their task can be incorporated into application processes using the superposition rules. Thus, the separation of controllers is merely a matter of interpretation, and therefore controllers will not be explicitly specified further on.

Bracha and Toueg Algorithm ([1])
First we analyze the Bracha and Toueg algorithm (BT algorithm) for a colorless WFG representing a static system (consistent snapshot) with instantaneous communication.
By $OUT_v$ we denote the dependent set of $P_v$. In other words, $OUT_v$ is the set of nodes $P_u$ from which $P_v$ is expecting to receive messages. Let $IN_v$ be the set of nodes that are waiting for a message from $P_v$. It is assumed that the values of $IN_v$ and $OUT_v$ are readily available at each node $P_v$.
A WFG $G = (V, E)$ models a static "snapshot" of the state of the system that contains no messages in the communication channels. Since message transmission is instantaneous, we have $OUT_v = \{P_u \mid (P_v, P_u) \in E\}$ and $IN_v = \{P_w \mid (P_w, P_v) \in E\}$. In this case, we have $P_v \in IN_u$ if and only if $P_u \in OUT_v$.
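As a small illustration of these definitions, the sets $IN_v$ and $OUT_v$ can be derived from an edge list as sketched below. The function name `in_out_sets` and the representation of nodes as strings are assumptions made for this example only.

```python
from typing import Dict, Iterable, Set, Tuple


def in_out_sets(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]):
    """Return (IN, OUT), where an edge (v, u) means that P_v waits for P_u."""
    nodes = list(nodes)
    out: Dict[str, Set[str]] = {v: set() for v in nodes}
    inn: Dict[str, Set[str]] = {v: set() for v in nodes}
    for v, u in edges:
        out[v].add(u)   # P_u belongs to OUT_v
        inn[u].add(v)   # P_v belongs to IN_u
    return inn, out


inn, out = in_out_sets("abc", [("a", "b"), ("b", "c"), ("c", "a")])
print(sorted(out["a"]), sorted(inn["a"]))  # ['b'] ['c']
```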
The BT algorithm consists of two phases: notify, in which processes are notified that a deadlock detection algorithm has started, and grant, in which active or potentially active processes simulate the message transmission. All the processes that are made "active" as a result of this also simulate the message transmission. Deadlocked nodes are those never made "active" by grant. The grant phase is nested within the notify phase. This nesting ensures that the notify phase terminates only after the grant phase is over.
A formal description of the Bracha and Toueg algorithm is presented in the following.

BT Algorithm
0. Initialization for every node $P_v$: $OUT_v := \{P_u \mid (P_v, P_u) \in E\}$; $IN_v := \{P_u \mid (P_u, P_v) \in E\}$; notified, free := false, false; $A := \emptyset$;

Examples of BT Algorithm Application
Let us first analyze an application of the BT algorithm for generalized deadlock detection in the system state represented by the WFG depicted in Fig. 1a.
All processes $P_a$, $P_b$, $P_c$ and $P_d$ composing the process set are passive, and they are waiting for some messages from other processes of the set according to the WFG edges. For the sake of simplicity of presentation, we associate with each node (process) of the WFG two vectors representing the state of the algorithm with respect to that node. The first vector, denoted by PS (processing state), consists of two elements $PS_1$ and $PS_2$. Thus, PS = [$PS_1$, $PS_2$].
The number $PS_1$ depends on the value of the boolean variable notified and on the receipt of DONE messages. Initially, $PS_1$ is equal to zero. After the receipt of the first NOTIFY message by the analyzed detection process, its $PS_1$ gets the value one. Then, when the DONE message is received, $PS_1$ is changed to two. Similarly, $PS_2$ depends on the value of the boolean variable free and on the receipt of ACK messages. Initially, $PS_2$ is equal to zero. After the receipt of the first GRANT message by the analyzed detection process, $PS_2$ gets the value one. Then, when the ACK message is received, $PS_2$ is changed to two. The second vector associated with each node is denoted by CS (communication state). It consists of three elements $CS_{GRANT}$, $CS_{ACK}$ and $CS_{DONE}$, representing counters of the GRANT, ACK and DONE messages received, respectively. Initially all these counters are equal to zero. They are incremented accordingly on the receipt of a GRANT, ACK or DONE message. In the figure, the vector PS is placed inside a node and the vector CS next to the node.
Let $P_a$ in our example be the unique initiator of the deadlock detection. It calls the Notify procedure and thus, in sequence, changes PS to [1,0], sends NOTIFY messages to all processes composing its dependent set $OUT_a$, and waits for DONE messages confirming all NOTIFY messages, as depicted in Fig. 1b. Then, concurrently, processes $P_b$, $P_c$ and $P_d$ receive NOTIFY messages and, as a consequence, execute the Notify procedure, changing $PS_1$ appropriately and relaying NOTIFY messages to the processes of their dependent sets (see Fig. 1c). In the next step, NOTIFY messages arrive at processes whose notified flags are already true. Therefore, these processes simply send DONE messages in response (Fig. 1d). As a result, the processing state vectors of $P_b$, $P_c$ and $P_d$ are set to [2,0]. This enables them to send DONE messages to the initiator $P_a$. When the initiator receives all DONE messages, the algorithm terminates, reporting in the considered case a deadlock occurrence.
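The second example illustrating the behavior of the BT algorithm is presented in Fig. 2. The WFG is very similar to the previous one, but the process $P_d$ has no outgoing edges. This means that $P_d$ is active, as opposed to all other processes. We assume here the AND request model for all processes. Let $P_a$ again be the initiator of deadlock detection. It initiates the notify phase by sending NOTIFY messages to all its descendants, as shown in Fig. 2b. Thus, $P_a$ is the notify initiator for all other processes. When a NOTIFY message arrives at the active process $P_d$, which has an empty dependent set, $P_d$ calls the Notify procedure. This procedure changes $PS_1$ of $P_d$ to 2 (which means that $P_d$ need not wait for any DONE messages) and calls the Grant procedure. As a result, $PS_2$ of $P_d$ is set to 1 and GRANT messages are sent to all processes composing the set $IN_d$ of $P_d$. In this step, processes $P_b$ and $P_c$ change their $PS_1$ to 1 and forward NOTIFY messages to all processes of their dependent sets (see Fig. 2c). Thus, $P_d$ is the grant initiator for $P_a$, $P_b$ and $P_c$. This also means that $P_d$ will send the DONE message to its notify initiator only after receiving ACK responses to all its GRANT messages. In the next step (Fig. 2d), $P_c$, among others, receives GRANT from $P_d$ and forwards it to all its ancestors. It is important to note here that $P_b$ does not forward GRANT to $P_a$, as in the AND request model this can be done only when GRANT messages have arrived from all descendants. This condition is fulfilled in the next step (Fig. 2e), when $P_b$ receives the GRANT message from $P_c$, as $P_c$ is the grant terminator of $P_b$. In this case $P_b$ sends GRANT to $P_a$ (Fig. 2e). This implies that in the next steps the ACK message goes successively to nodes $P_b$, $P_c$ and $P_d$ (see Fig. 2f, Fig. 2g and Fig. 2h, respectively).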

Complexity Analysis of BT Algorithm
It is argued in [1], and then cited in the literature (e.g. [13], [16]), that the time complexity of the BT algorithm is $4d$.
However, in the example analyzed above (Fig. 2) the given WFG is characterized by $d = 1$, but the number of steps required to terminate the algorithm is 8, whereas $4d = 4$. To find the cause of this discrepancy, let us note that the grant phase is propagated (a GRANT message is sent to ancestors) by a passive process only when the GRANT message that sets the predicate fulfilled to true for the first time has finally arrived. In general this requires that GRANT messages from all descendants have arrived (e.g. for the AND request model). But the time required to reach this point depends on the longest path from the grant initiator to a given process. Indeed, let us consider a graph in which an initiator $P_x$ of a grant phase is connected to its ancestor $P_y$ through two disjoint backward paths of length $l_1$ and $l_2$, respectively, where $l_1 \geq l_2$. Thus, in the best case, when an arriving GRANT message is propagated at each node immediately, i.e. this message is enough for the predicate fulfilled to be true, the first GRANT message arrives at $P_y$ in $l_2$ time units. Now we assume that process $P_y$ needs two GRANT messages for the predicate fulfilled to evaluate to true. The second GRANT message requires $l_1$ time units, and hence both messages are ready to be consumed at time $\max(l_1, l_2)$. But in general, $l_1$ can be equal to $l$, where $l$ is the length of the longest path in the WFG. Thus, even if there are no delays in intermediate nodes, the passing of the GRANT message can take up to $l$ time units. Then, the ACK message is sent back to the grant initiator, and its transmission also takes up to $l$ time units.
On the other hand, let us note that an ACK message is sent immediately after receiving a GRANT message when the node has already been granted or the predicate fulfilled is not true. Hence, in general, the time complexity of the grant phase is $2l$. However, as the grant phase is nested in the notify phase, to find the time complexity of the whole BT algorithm we have to add to the grant phase complexity the time required to notify the grant initiator as well as the time required to send the DONE message from the grant initiator to the deadlock detection initiator. NOTIFY messages are propagated immediately to all processes composing the dependent set, so they need at most $d$ hops (time units) to reach any node from the deadlock detection initiator. The same holds for the time required to propagate the DONE message once the grant phase has terminated. Summing up the above analysis, we can formulate the following theorem.

Theorem 1.
The time complexity of the BT algorithm is $2d + 2l$.
The above theorem is confirmed by the example presented in Fig. 2. Indeed, as the considered WFG is characterized by $d = 1$ and $l = 3$, the number of hops required to terminate the algorithm should be $2 \cdot 1 + 2 \cdot 3 = 8$ in accordance with the theorem. Exactly this number of hops was needed in our example.
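The hop count for the Fig. 2 example can be reproduced with a small round-based simulation of the BT message exchange. This is a sketch under stated assumptions rather than the authors' implementation: every message takes exactly one time unit, local processing takes zero time, all processes use the AND request model, the blocking await statements are modelled by counters of outstanding DONE and ACK replies, and the edge set is the one read off the example description. All identifiers (`simulate_bt`, `pend_done`, `pend_ack`, ...) are hypothetical.

```python
def simulate_bt(nodes, edges, initiator):
    """Return (time_units, initiator_is_free) for one BT run, AND request model."""
    out = {v: [] for v in nodes}
    inn = {v: [] for v in nodes}
    for u, w in edges:                       # edge (u, w): P_u waits for P_w
        out[u].append(w)
        inn[w].append(u)

    notified, free, returned = set(), set(), set()
    granted = {v: set() for v in nodes}      # A_v: senders of received GRANTs
    pend_done, pend_ack = {}, {}             # outstanding DONE / ACK replies
    notify_reply, grant_reply = {}, {}       # whom to answer when a procedure returns
    in_flight = []                           # (dst, src, kind), delivered one unit later
    clock = {"now": 0, "end": None}

    def send(src, dst, kind):
        in_flight.append((dst, src, kind))

    def notify_returns(v):
        # Notify(v) returns once all DONEs arrived and a nested Grant(v), if any, returned.
        nested_grant_open = not out[v] and pend_ack[v] > 0
        if v in returned or pend_done[v] > 0 or nested_grant_open:
            return
        returned.add(v)
        if notify_reply[v] is None:          # the initiator's Notify returned
            clock["end"] = clock["now"]
        else:
            send(v, notify_reply[v], "DONE")

    def grant_returns(v):
        if pend_ack[v] > 0:
            return
        if grant_reply[v] is None:           # Grant was called from inside Notify
            notify_returns(v)
        else:
            send(v, grant_reply[v], "ACK")

    def grant(v, reply_to):
        free.add(v)
        grant_reply[v] = reply_to
        pend_ack[v] = len(inn[v])
        for w in inn[v]:
            send(v, w, "GRANT")
        grant_returns(v)

    def notify(v, reply_to):
        notified.add(v)
        notify_reply[v] = reply_to
        pend_done[v] = len(out[v])
        for w in out[v]:
            send(v, w, "NOTIFY")
        if not out[v]:                       # active process: start the grant phase
            grant(v, None)
        notify_returns(v)

    def deliver(dst, src, kind):
        if kind == "NOTIFY":
            if dst in notified:
                send(dst, src, "DONE")
            else:
                notify(dst, src)             # DONE is sent when Notify returns
        elif kind == "DONE":
            pend_done[dst] -= 1
            notify_returns(dst)
        elif kind == "GRANT":
            granted[dst].add(src)
            if dst not in free and granted[dst] >= set(out[dst]):
                grant(dst, src)              # ACK is sent when Grant returns
            else:
                send(dst, src, "ACK")
        else:                                # ACK
            pend_ack[dst] -= 1
            grant_returns(dst)

    notify(initiator, None)
    while in_flight and clock["end"] is None:
        clock["now"] += 1
        batch = in_flight[:]
        in_flight.clear()
        for message in batch:
            deliver(*message)
    return clock["end"], initiator in free


fig2_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
print(simulate_bt("abcd", fig2_edges, "a"))  # (8, True): 8 hops, no deadlock
```

For this graph the run ends after 8 time units with the initiator free, which agrees with $2d + 2l$ for $d = 1$ and $l = 3$. The simulation checks single instances only and is no substitute for the worst-case argument.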
In the BT algorithm discussed so far it has been assumed that each node knows both its dependent set and its IN set. The latter assumption does not seem very realistic in all possible cases, but the knowledge of the IN set is necessary to propagate GRANT messages and to correctly acknowledge an incoming GRANT message. For instance, in our example in Fig. 2, process $P_a$ should not reply with an ACK message until it is sure that its IN set is empty. This means that some extra processing is required to construct the WFG and to define the IN sets of all nodes. It takes only $d$ hops to notify all nodes and one extra hop to flush outgoing edges, but the termination of this processing must actually be perceived by the nodes before they send back any ACK. Although this processing can be partially overlapped with the deadlock detection process, it may require extra time. For example, in Fig. 2 processes actually perceive the incoming edges of the WFG only after receiving NOTIFY messages. Thus, GRANT messages cannot be sent out until the appropriate NOTIFY messages have been received. In the considered case this means that GRANT messages from $P_d$ to $P_b$ and $P_c$ will be postponed by one time unit.

Kshemkalyani and Singhal Algorithm ([12])
Let us now analyze a consistent snapshot without messages in transit (a static system with empty channels). The algorithm proposed by Kshemkalyani and Singhal, called here the KS algorithm, consists of two concurrent sweeps of messages. In the outward sweep, the algorithm records a consistent snapshot of a distributed wait-for-graph. In the inward sweep, it performs reduction of the recorded distributed WFG to check for a deadlock. These two sweeps can overlap in time at a process.
The distributed WFG is recorded using FLOOD messages during the outward sweep and is examined for deadlocks using ECHO messages during the inward sweep. When the initiator $P_i$ blocks, it records its local state and the time $t\_block_i$ at which it was last blocked, and sends FLOOD messages along its outgoing wait-for edges.
At the time a node $P_i$ receives the first FLOOD message along an existing incoming wait-for edge, it records its local state ($out_i$, $p_i$, $t\_block_i$, and this particular incoming wait-for edge). If the node happens to be blocked at this time, it sends FLOODs along its outgoing wait-for edges to ensure that all nodes in the reachability set of the initiator participate in the recording of the WFG in the outward sweep. If the node happens to be active at this time (i.e., it does not have any outgoing wait-for edges), then it initiates reduction of the incoming wait-for edge by returning an ECHO message on it.
ECHO messages perform reduction of the nodes and edges in the WFG by simulating the message transmission in the inward sweep. Assuming the $k$ out of $n$ request model, a node gets reduced at the time it has received $k = p$ ECHOs. When a node is reduced, it sends ECHOs along all the incoming wait-for edges incident on it in the WFG snapshot to continue the progress of the inward sweep. These ECHOs may in turn reduce other nodes. The initiator node detects a deadlock if it is not reduced when the deadlock detection algorithm terminates. The nodes in the WFG snapshot that have not been reduced are deadlocked.
In general, WFG reduction can begin at a nonleaf node before the recording of the WFG has been completed at that node. This happens when ECHOs arrive and begin reduction at a nonleaf node before FLOODs have arrived along all incoming wait-for edges and recorded the complete local WFG at that node. When a FLOOD on an incoming wait-for edge arrives at a node that is already reduced, the node simply returns an ECHO along that wait-for edge. Thus, the two activities of recording the WFG snapshot and reducing the nodes and edges in the WFG snapshot are done concurrently in a single phase, and no serialization is imposed between the two activities.
A termination detection technique based on weights [10,17] detects the termination of the algorithm by using SHORT messages (in addition to FLOODs and ECHOs). A weight of 1 at the initiator node, when the algorithm is initiated, is distributed among all FLOOD messages sent out by the initiator. When the first FLOOD is received at a nonleaf node along an existing WFG edge, the weight of the received FLOOD is distributed among the FLOODs sent along the outgoing wait-for edges at that node. Weights in all subsequent FLOODs arriving along existing WFG edges at a nonleaf node that is not yet reduced are returned to the initiator through SHORT messages.
When a FLOOD is received at a leaf node, its weight is returned in the ECHO message sent by the leaf node to the sender of the FLOOD. Note that an ECHO is like a reply in the simulated unblocking of processes. When an ECHO arriving at a node does not reduce the node, its weight is sent directly to the initiator through a SHORT message. When an ECHO that arrives at a node reduces that node, the weight of the ECHO is distributed among the ECHOs that are sent by that node along the incoming edges in its WFG snapshot. When an ECHO arrives at a reduced node, its weight is sent directly to the initiator through a SHORT message. The algorithm maintains the invariant that the sum of the weights in FLOOD, ECHO and SHORT messages plus the weight at the initiator (received in SHORT and ECHO messages) is always 1. The algorithm terminates when the weight at the initiator becomes 1, signifying that all WFG recording and reduction activity has completed.
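A node $P_i$ stores the local snapshot used to detect deadlocks in a data structure $LS_i$, which is an array of records. Record $LS_i[P_{init}]$ stores a snapshot at node $P_i$ corresponding to deadlock detection initiation by the initiator node $P_{init}$.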

Example of KS Algorithm Application
Let us analyze an application of the KS algorithm for generalized deadlock detection. The assumed WFG is the same as in the previous example presented in Fig. 2, and we again assume the AND request model for all processes.
The example considered now is depicted in Fig. 3.
Let $P_a$ again be the initiator of deadlock detection, and $P_d$ the only active process. First, the initiator $P_a$ sends FLOOD messages with weight 1/3 each to the processes composing its dependent set (see Fig. 3a).
As $P_d$ is active, it can immediately send an ECHO message back with weight equal to the weight received in the FLOOD (Fig. 3b). Process $P_c$ simply forwards the FLOOD message with weight 1/3 to its descendant, but $P_b$ forwards FLOOD messages to $P_c$ and $P_d$, dividing the incoming weight by two, i.e., associating weight 1/6 with each message. In the next step (Fig. 3c), $P_d$, being active, simply responds by sending ECHO messages to $P_c$ and $P_b$ with weights 1/3 and 1/6, respectively. $P_c$ sends a SHORT message to the initiator in response to the subsequent FLOOD message. Then, after receiving the ECHO with weight 1/3, $P_c$ forwards this message to its ancestors $P_a$ and $P_b$, sharing the received weight equally. $P_b$, however, in response to an ECHO which is not enough for its activation, sends a SHORT message with weight 1/6 back to $P_a$. In the last step, $P_b$ forwards the received ECHO message to $P_a$. The algorithm terminates when the deadlock detection initiator $P_a$ has received back messages with total weight equal to 1.
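As a bookkeeping check on this example, derived only from the weights stated above, the contributions that reach the initiator $P_a$ add up to the full weight:

```latex
\underbrace{\tfrac{1}{3}}_{\text{ECHO from } P_d}
+ \underbrace{\tfrac{1}{6}}_{\text{SHORT from } P_c}
+ \underbrace{\tfrac{1}{6}}_{\text{ECHO from } P_c}
+ \underbrace{\tfrac{1}{6}}_{\text{SHORT from } P_b}
+ \underbrace{\tfrac{1}{6}}_{\text{ECHO from } P_b}
= 1
```

The last of these contributions, the ECHO forwarded by $P_b$, is the one whose arrival makes the weight at the initiator equal to 1.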

Complexity Analysis of KS Algorithm
When comparing the BT algorithm with the KS algorithm, it is easy to note some similarities. The FLOOD message corresponds in some sense to NOTIFY, ECHO corresponds to GRANT and DONE, and SHORT partially substitutes for both DONE and ACK messages. The key difference results from the application of a weight-based termination detection algorithm. As a consequence, the KS algorithm is more time efficient and does not assume a priori knowledge of the IN sets. It has been argued that the KS algorithm has a time complexity of $2d$. However, the example analyzed above shows that we need 5 steps to terminate the algorithm instead of $2 \cdot 1 = 2$. Again, to find the cause of this discrepancy, let us note that to send ECHO messages through all incoming edges, FLOOD messages must first flush all edges. This needs, as is easy to verify, $d + 1$ steps. Then, following an argument similar to that for GRANT messages in the BT algorithm, let us note that to set the predicate fulfilled to true at $P_b$, both ECHO messages, from $P_c$ and from $P_d$, have to be received. In general, the time required to reach this state depends on the longer path linking the active process $P_d$ with $P_b$. Thus, the longest path in the WFG determines the time required to propagate ECHO messages from an active process to the deadlock detection initiator. The above observation leads directly to the following theorem.
Theorem 2.
The time complexity of the KS algorithm is $d + 1 + l$.
This result is consistent with the example presented in Fig. 3.
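The five steps of the Fig. 3 example can likewise be reproduced with a round-based sketch of the FLOOD, ECHO and SHORT exchange. It is a simplified illustration under stated assumptions, not the full KS algorithm: the WFG is static, every message (including a SHORT sent directly to the initiator) takes one time unit, all processes use the AND request model, the initiator simply absorbs weight that reaches it, and weights are exact fractions. The identifiers (`simulate_ks`, `need`, `recorded_in`, ...) are hypothetical.

```python
from fractions import Fraction


def simulate_ks(nodes, edges, initiator):
    """Return (time_units, initiator_deadlocked) for one simplified KS run."""
    out = {v: [] for v in nodes}
    for u, w in edges:                      # edge (u, w): P_u waits for P_w
        out[u].append(w)
    if not out[initiator]:                  # an active initiator is not deadlocked
        return 0, False

    recorded_in = {v: [] for v in nodes}    # incoming edges recorded in the snapshot
    need = {}                               # ECHOs still required; key present = snapshot taken
    reduced = set()
    in_flight = []                          # (dst, src, kind, weight)
    st = {"now": 0, "end": None, "weight": Fraction(0)}

    def send(src, dst, kind, w):
        in_flight.append((dst, src, kind, w))

    def give_back(src, w):
        # Return weight to the initiator: locally if src is the initiator, else via SHORT.
        if src == initiator:
            st["weight"] += w
            if st["weight"] == 1:
                st["end"] = st["now"]
        else:
            send(src, initiator, "SHORT", w)

    def spread(src, targets, kind, w):
        for t in targets:                   # split the weight equally
            send(src, t, kind, w / len(targets))

    def on_flood(v, u, w):
        if v not in need:                   # first FLOOD: record the local state
            need[v] = len(out[v])
            if out[v]:                      # blocked node: continue the outward sweep
                recorded_in[v].append(u)
                spread(v, out[v], "FLOOD", w)
            else:                           # active leaf: start the inward sweep
                reduced.add(v)
                send(v, u, "ECHO", w)
        elif v in reduced:
            send(v, u, "ECHO", w)
        else:
            recorded_in[v].append(u)
            give_back(v, w)

    def on_echo(v, w):
        if v not in reduced:
            need[v] -= 1
            if need[v] == 0:
                reduced.add(v)
                if v != initiator:
                    spread(v, recorded_in[v], "ECHO", w)
                    return
        give_back(v, w)                     # not reducing, already reduced, or initiator

    need[initiator] = len(out[initiator])
    spread(initiator, out[initiator], "FLOOD", Fraction(1))
    while in_flight and st["end"] is None:
        st["now"] += 1
        batch = in_flight[:]
        in_flight.clear()
        for dst, src, kind, w in batch:
            if kind == "FLOOD":
                on_flood(dst, src, w)
            elif kind == "ECHO":
                on_echo(dst, w)
            else:                           # SHORT arrives at the initiator
                give_back(dst, w)
    return st["end"], initiator not in reduced


fig3_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
print(simulate_ks("abcd", fig3_edges, "a"))  # (5, False): 5 steps, no deadlock
```

For this graph the weight at the initiator reaches 1 after 5 time units, in agreement with $d + 1 + l$ for $d = 1$ and $l = 3$. Exact fractions are used so that the termination test on the total weight is not affected by rounding.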

Conclusions
In this paper the problem of the worst-case one-time complexity analysis of two well-known distributed algorithms for generalized deadlock detection has been addressed. The time complexity has been expressed as a function of the diameter $d$ and the length $l$ of the longest path of the wait-for-graph (WFG) characterizing a state of a static distributed system with instantaneous communication. First, the algorithm proposed by Bracha and Toueg has been considered. It has been shown that its time complexity is $2d + 2l$. Then, we have proved that the time complexity of the Kshemkalyani and Singhal algorithm is $d + 1 + l$. These results improve on the time complexity estimates proposed so far.
In the above context it seems natural to state the problem of finding a lower bound for the worst-case one-time complexity of generalized deadlock detection in static as well as dynamic systems. From the discussion presented in this paper we can only conclude that the lower bound of the time complexity for distributed reduction of a given WFG is $O(l)$. Thus, further investigations along this line are required.
Moreover, additional analysis is necessary to estimate precisely how far the phase of consistent WFG construction can be overlapped with WFG reduction, especially in dynamic systems.
Another noteworthy point is the difference between the system model considered in this paper and the one comprising REQUEST, REPLY and CANCEL messages for resource allocation.The latter model seems to be less general but it gives extra opportunity to exploit these application messages in some preliminary phases of deadlock detection as it has been done in [1] and [12].

Figure 1: Example of BT Algorithm Application for Deadlock Detection
[Figure 2: second example of BT algorithm application]

Figure 3: Example of KS Algorithm Application for Distributed Deadlock Detection

1. procedure Notify(v); notified := true; for all $P_w \in OUT_v$ send($P_w$, NOTIFY); if $\neg \mathit{passive}_v$ then Grant(v); for all $P_w \in OUT_v$ await($P_w$, DONE);
2. Upon receipt by $P_v$ of NOTIFY from $P_u$: if $\neg$notified then Notify(v); send($P_u$, DONE);
3. procedure Grant(v); free := true; for all $P_w \in IN_v$ send($P_w$, GRANT); for all $P_w \in IN_v$ await($P_w$, ACK);
4. Upon receipt by $P_v$ of GRANT from $P_u$: $A_v := A_v \cup \{P_u\}$; if $\neg$free and $\mathit{fulfilled}_v(A)$ then Grant(v); send($P_u$, ACK);
Fig. 2: The WFG is very similar to the previous one, but the process $P_d$ has no outgoing edges. This means that $P_d$ is active, as opposed to all other processes. We assume here the AND request model for all processes. Let $P_a$ again be the initiator of deadlock detection. It initiates the notify phase by sending NOTIFY messages to all its descendants, as shown in Fig. 2b. Thus, $P_a$ is the notify initiator for all other processes. When a NOTIFY message arrives at the active process $P_d$, which has an empty dependent set, $P_d$ calls the Notify procedure. This procedure changes $PS_1$ of $P_d$ to 2 (which means that $P_d$ need not wait for any DONE messages) and calls the Grant procedure. As a result, $PS_2$ of $P_d$ is set to 1 and GRANT messages are sent to all processes composing the set $IN_d$ of $P_d$. In this step, processes $P_b$ and $P_c$ change their $PS_1$ [...]
Record $LS_i[P_{init}]$ stores a snapshot at node $P_i$ corresponding to deadlock detection initiation by the initiator node $P_{init}$.
9. Upon receipt by $P_i$ of SHORT($P_{init}$, $t\_init$, $w$): /* Executed by process $P_i$ (which is always init) on receiving a SHORT */ [...] for which $t\_init = t\_block_i$ [...] /* SHORT for an outdated snapshot */ $t\_init < t\_block_i$ [...] if $wt_{init} = 1$ then declare deadlock and abort.