Stepwise Development of Peer-to-Peer Systems

Peer-to-peer systems like Napster, Gnutella and Kazaa have recently become popular for sharing information. In this paper, we show how to design peer-to-peer systems within the action systems framework by combining UML diagrams. We present our approach via a case study of stepwise development of a Gnutella-like peer-to-peer system.


INTRODUCTION
Peer-to-peer systems like Napster, Gnutella and Kazaa have recently become popular for sharing information.People find in peer-to-peer applications a convenient solution to exchange resources via internet.Two factors have fostered the recent explosive growth of such systems: first, the low cost and high availablity of large numbers of computing and storage resource, and second, increased network connectivity.As these trends continue, the peer-to-peer paradigm is bound to be more popular [9].
Most current distributed systems like the Web follow the client-server paradigm in which a single server stores information and distributes it to clients upon their requests.Peer-to-peer systems, which consider that all nodes are equal for sharing information, on the other hand, follow a paradigm in which each node can store information of its own and establish direct connections with another node to download information.In this way, the peer-to-peer systems offer attractive advantages like enhanced load balancing, dynamic information repositories, redundancy and fault tolerance, content-based addressing and improved searching [11] over the traditional client-server systems.
Because of the surprisingly rapid deployment of some peer-to-peer applications and the great advantages of the peer-to-peer paradigm, we are motivated to conduct a study of peer-to-peer systems and achieve a way to develop such systems.After using and analyzing various peerto-peer clients on different platforms, we identified two common problems of clients: reliability and robustness (Most clients fail to provide satisfactory download service and some buggy ones even bring down the system during our test) and extendability (Most clients are implemented in a way that adding new services or functionalities results to lots of modifications to the original specifications).An attractive strategy to solve the first open problem is to use formal methods in designing peer-to-peer systems.Formal methods can help with reliability and robustness by minimizing errors in designing peer-to-peer systems.To improve extendability, we introduce a modular and object-oriented architecture for peer-to-peer systems.The benefit of objectorientation can be used to design and implement peer-to-peer systems in a reusable, composable and extendable way.
In this paper, we show how to design peer-to-peer systems within the action systems framework by combining UML diagrams.We present our approach via a case study of stepwise development of a Gnutella-like peer-to-peer system.We start by briefly describing the action systems framework 6th International Workshop in Formal Methods (IWFM'03) The British Computer Society to the required extent in Section 2. In Section 3 we give an initial specification of the Gnutella system.An abstract action system specification is derived in Section 4. In Section 5 we analyze and refine the system specification introduced in the previous section.We end in Section 6 with concluding remarks.

ACTION SYSTEMS
Action systems have proved to be very suitable for designing distributed systems [1,2,3,6,13].
The design and reasoning about action systems are carried out with refinement calculus [12].
In this section we will introduce OO-action systems [4], an object-oriented extension of action systems which we select as a foundation to work on.In this way, we can address our two open problems in a unified framework with benefits from both formal methods and object-orientation.
An OO-action system consists of a finite set of classes, each class specifying the behaviour of objects that are dynamically created and executed in parallel.

Actions
We will consider a fixed set Attr of attributes (variables) and assume that each attribute is associated with a nonempty set of values.Also, we consider a set Act of actions defined by the following grammar ÓÖØ × Ô Ü Ú Ü ¾ Î Here Ü is a list of attributes, Ú a list of values, Î a nonempty set of values, a predicate over attributes.Intuitively, ÓÖØ is the action which always deadlocks, × Ô is a stuttering action, Ü Ú is a multiple assignment, Ü ¾ Î is a random assignment, is a guard of an action, is an assertion, ½ ¾ is the sequential composition of the actions ½ and ¾ , and ½ ¾ is the nondeterministic choice between the action ½ and ¾ .
The guard of an action is defined in a standard way using weakest preconditions [14] ´ µ

ÛÔ´ falseµ
The action is said to be enabled when the guard is true.

Classes and objects
Let CName be a fixed set of class names and OName a set of valid names for objects.We will also consider a fixed set of object variables OVar assumed to be disjoint from Attr.The only valid values for object variables are the names for objects in OName.The set of object actions OAct is defined by extending the grammar of actions as follows Here ¾ Ø, Ò is an object variable, Ó is either an object name or the constants self or ×ÙÔ Ö (all three possibly resulting from the evaluation of an expression), is a class name, Ô a procedure name, and Ñ is a method name.Intuitively, Ò Ó stores the object name Ó into the object variable Ò, Ò Û´ µ creates a new object instance of the class , Ò Ò Û´ µ assigns the name of the newly created instance of the class to the object variable Ò, Ô is a procedure call, Ò Ñ is a call of the method Ñ of the object the name of which is stored in the object variable Ò, self.m is a call of the method Ñ declared in the same object, and ×ÙÔ Ö Ñ is a call of the method Ñ declared in the object that created the calling object.Note that method calls are always prefixed by an object variable or by the constant self or ×ÙÔ Ö.We define the guard ´Çµ of an object action Ç to be the guard of the action in Ø obtained by substituting every atomic object action of Ç with the action × Ô, where an atomic object action is The resulting statement is an action in Ø and hence its guard is well defined.A Ð ×× is a pair , where ¾ AE Ñ is the Ò Ñ of the class and is its Ó Ý, that is, a statement of the form A class body consists of an object action Ç and of four declaration sections.In the attribute declaration the shared attributes in the list Ý, marked with an asterisk £, describe the variables to be shared among all active objects.Therefore they can be used by instances of the class and also by object instances of other classes.Initially they get values Ý¼.The local attributes in the list Ü describe variables that are local to an object instance of the class, meaning that they can only be used by the instance itself.The variables are initialized to the value Ü¼.The list Ò of object variables describes a special kind of variables local to an object instance of the class.They contain names of objects and are used for calling methods of other objects.We assume that the lists Ü Ý and Ò are pairwise disjoint.A method Ñ Å describes a procedure of an object instance of the class.They can be called by actions of the objects themselves or by actions of another object instance of possibly another class.A method consists of a method name Ñ and an object action Å.A procedure Ô È describes a procedure that is local to the object instances of the class.It can be called only by actions of the object itself.Like a method, it consists of a procedure name Ô and an object action forming the body È.
The class body is a description of the actions to be executed repeatedly when the object instance of the class is activated.It can refer to attributes which are declared to be shared in another class, and to the object variables and the local attributes declared within the class itself.It can contain procedure calls only to procedures declared in the class and method calls of the form Ò Ñ or ×ÙÔ Ö Ñ to methods declared in other classes.Method calls self.m are allowed only if Ñ is a method declared in the same class.As for action systems, the execution of an object action is atomic.

OO-action system
An OO-action system ÇÇconsists of a finite set of classes such that the shared attributes declared in each are distinct and actions in each or bodies of methods and procedures declared in each do not contain Ò Û statements referring to class names not used by classes in ÇÇ.Local attributes, object variables, methods, and procedures are not required to be distinct.
There are some classes in ÇÇ, marked with an asterisk £.Execution starts by the creation of one object instance of each of these classes.Each object, when created, chooses enabled actions and executes them.Actions operating on disjoint sets of local and shared attributes, and object variables can be executed in parallel.They can also create other objects.Actions of different active objects can be executed in parallel if they are operating on disjoint sets of shared attributes.Objects interact by means of the shared attributes and by executing methods of other objects.

INITIAL SPECIFICATION OF THE GNUTELLA SYSTEM
Gnutella is a decentralized peer-to-peer file-sharing model developed in 2000 by Nullsoft, AOL subsidiary and the company that created WinAMP [10].The Gnutella model enables file sharing without using servers.
Unlike a centralized server network, the Gnutella network does not use a central server to keep track of all user files.To share files using the Gnutella model, a user starts with a networked computer A with a Gnutella servent, which works both as a server and a client.Computer A will connect to another Gnutella-networked computer B and then announce that it is alive to computer B. B will in turn announce to all its neighbours C, D, E, and F that A is alive.Those computers will recursively continue this pattern and announce to their neighbours that computer A is alive.Once computer A has announced that it is alive to the rest of the members of the peer-to-peer network, it can then search the contents of the shared directories of the peer-to-peer network.
Search requests are transmitted over the Gnutella network in a decentralized manner.One computer sends a search request to its neighbours, which in turn pass that request along to their neighbours, and so on.Figure 1 illustrates this model.The search request from computer A will be transmitted to all members of the peer-to-peer network, starting with computer B, then to C, D, E, F, which will in turn send the request to their neighbours, and so forth.If one of the computers in the peer-to-peer network, for example, computer F, has a match, it transmits the file information (name, location, etc.) back through all the computers in the pathway towards A (via computer B in this case).Computer A will then be able to open a direct connection with computer F and will be able to download that file directly from computer F.

ACTION SYSTEM SPECIFICATION OF THE GNUTELLA SYSTEM
When taking a step back, it is seen that the Gnutella system enables at least the following functionalities: 1. Servent can easily join and leave the peer-to-peer network.2. Servent can publish its content to the shared directories of the peer-to-peer network.3. Servent can search for and download files from the shared directories of the peer-to-peer network using keywords.
Based on the simple descriptions above, we can identify that servent should provide three basic services, connect service, lookup service and download service, as shown in Fig.The statechart diagram Fig. 4 shows the joint behaviour of servent.Each state is described by a set of attributes.We give unique preconditions for entering each state.Table 1 shows preconditions and invariants for every state of the servent.
An interesting issue to notice is that downloads can be initiated in two ways, i.e. either from a search result or by directly specifying target information.This design is reasonable because we do not always need to search the peer-to-peer network to get wanted files.In some cases, name and location information of files is already available.For example, file exchanges between two friends, who have already known each other's IP and shared contents.Take this into consideration, we provide two ways to initiate downloads.
The first version of action system specification of servent can be derived directly from Fig. 4 and Table 1.

ÒÙØ ÐÐ Ë ÖÚ ÒØ Ë ¡ ¡ ¡
where the class body in Table 2 consists of attribute declaration, initialisation and a loop of actions which are chosen for execution in a non-deterministic fashion when enabled.Each action is of the form Ë where is the guard and Ë is a statement to be executed when the guard evaluates to The British Computer Society Here connected is a boolean variable; keyword is the search criteria; target is the location information of shared resources in format filename@IP.
The next step is to apply the design in Fig. 3 to our initial specification, which results in three more classes ConnectService, LookupService and DownloadService.Now the system consists of a set of classes where is the name of the class and is its body.On the top level, we have components of servent The class ÒÙØ ÐÐ Ë ÖÚ ÒØ Ë , marked with an asterisk £, is the root class.At execution time one object of this class is initially created and this in turn creates other objects by executing Ò Û statements.
Let us look at the actions in Table 2.We can refine them according to service groups.For example, action where Offline and Online are defined in Table 1, can be refined into action

ÓÒÒ Ø ÓÒÒ Ø ÓÒÒ Øμ
In this paper, however, we skip refinement details here because we do not want to go into details of semantics of action systems [15] nor refinement rules of refinement calculus [12].
The body of the refined specification of servent is described in Table 3.It models a servent that provides three basic services (ConnectService, LookupService and DownloadService).When it connects itself to the peer-to-peer network, users can search the network via Ë ØÃ ÝÛÓÖ method and then download files from the search result.Or alternatively, users can directly give target information via Ë ØÌ Ö Ø method to download files.

REFINING GNUTELLA SERVENT
Ultimately, we need to derive an implementable specification for each service in the Guntella servent.Hence, every service component should be refined.We notice ConnectService and LookupService share a common functionality that enables appropriate message routing.It is reasonable to introduce a new component Router to the system as depicted in Fig. 5.This component will be in charge of routing all the incoming and outgoing messages of the servent.For DownloadService, we introduce a new component FileRepository in Fig. 5.It will act as a resource exchanger between servent and network.

Refining ConnectService
We start by considering ConnectService first.A Gnutella servent connects itself to the peer-topeer network by establishing a connection with another servent currently on the network, and this 6th International Workshop in Formal Methods (IWFM'03) The British Computer Society   kind of connection is passed around recursively.In order to model the communication between servents, we define a set of descriptors and inter-servent descriptor exchange rules as follows [7] Ping Used to actively discover hosts on the network.A servent receiving a Ping descriptor is expected to respond with one or more Pong descriptors.
Pong The response to a Ping.Includes the address of a connected Gnutella servent and information regarding the amount of data it is making available to the network.
Furthermore, we need to define the format of Ping descriptor and Pong descriptor.We use the message format in Table 4, where DescriptorID is a string uniquely identifying the descriptors on the network.TTL stands for Time To Live, which is the number of times the descriptor will be forwarded by servent before it is removed from the network.The TTL is the only mechanism for expiring descriptors on the network.Each servent will decrement the TTL before passing it on to another servent.When the TTL reaches 0, the descriptor will no longer be forwarded.The information carried by the message is encoded in info, whose format depends on the variable type.
The peer-to-peer nature of the Gnutella network requires servents to route network traffic appropriately.Intuitively, a servent will forward incoming Ping descriptors to all its directly connected servents.Pong descriptors may only be sent along the same path that carried the incoming Ping descriptor as shown in Fig. 6.This ensures that only servents that routed the Ping descriptors will see the Pong descriptor in response.A servent that receives a Pong descriptor with × Ö ÔØÓÖÁ Ò, but has not seen a Ping descriptor with × Ö ÔØÓÖÁ Ò should remove the Pong descriptor from the network.
The above routing rules can be illustrated in the statechart diagram Fig. 7. Using the same techniques as in the previous section, we can translate the diagram into action system specification and further refine it into Table 5.

The specification of
peers is a set storing its directly and indirectly connected servents information; and this IP is the IP address of the responding servent.The sequence of a connect session is summarized in Fig. 8.
In order to reuse the specification in Table 3, we will specify ConnectService without making any changes in its interface.The specification is shown in Table 6.When ConnectService is initiating, an instance of Router is created.Then it keeps checking state variable connected in the router and passing the status to servent.

Refining LookupService
When we think about LookupService, we follow almost the same paradigm as ConnectService to specify this component.A Gnutella servent starts a search request by broadcasting a Query message through the peer-to-peer network.Upon receiving a search request, the servent checks if any local stored files match the query and sends a QueryHit message back.We use following descriptors and routing rules to model the communication between servents [7] Query The primary mechanism for searching the distributed network.A servent receiving a Query descriptor will respond with a QueryHit if a match is found against its local data set.
QueryHit The response to a Query.This descriptor provides the recipient with enough information to acquire the data matching the corresponding Query.
The message format in Table 4 has to be revised to adopt the new descriptors.The message type now includes Ping, Pong, Query and QueryHit, so a minor change is made in Table 7.
The routing rules for Query -QueryHit traffic are similar to the rules for Ping -Pong traffic.A servent will forward incoming Query descriptors to all its directly connected servents.QueryHit descriptors may only be sent along the same path that carried the incoming Query descriptor as shown in Fig. 9.This ensures that only those servents that routed the query descriptors will see the QueryHit descriptor in response.A servent that receives a QueryHit descriptor with 6th International Workshop in Formal Methods (IWFM'03) The British Computer Society   Like the previous section, we first draw a statechart diagram for the Query -QueryHit routing rules.Then we translate it into action system specification and further refine it.
In Table 8 we have the body of Query -QueryHit router specification, which models a router that is in charge of routing Query -QueryHit traffic appropriately.Like a Ping -Pong router, it keeps receiving incoming message and replying with apporiate outgoing message .Here descriptorDB is a set storing descriptorID information; myKeyword is a string storing search criteria; cKeyword is a string storing comparison criteria; filename is a string storing destination filename; target is the shared resource location information in format and f is an object of class FileRepository which enables local file search service via Has and Find methods.Details of class FileRepository will be elaborated in the next section.
The Query -QueryHit router provides searching function via method SetKeyword.Once a keyword is set, a Query descriptor carrying search criteria is generated and broadcasted in the peerto-peer network via method SendQuery.In the mean time, the router keeps receiving Query and QueryHit descriptors.For an incoming Query descriptor, a query request is passed to FileRepository.According to the search result, a QueryHit descriptor is sent back in response via method SendQueryHit if a match is found, otherwise the Query descriptor is further forwarded via method ForwardMsg.Upon receiving a QueryHit descriptor, it checks its keyword field, and then sets target information to complete the search session.We summarize the sequence of a search session in Fig. 10.

Now we specify
LookupService with emphasis on specification reuse.The result is shown in Table 9.When LookupService is initiated, an instance of Router is created.It provides Search method via calling SetKeyword method in the router, and then returning the search result to servent.
Until now we have two action systems, Rc modeling Ping -Pong routing rules and Rl modeling Query -QueryHit routing rules.We notice the two action systems actually model different aspects of a full router.Furthermore, we can compose the two action systems together using prioritising composition [5] to derive the action system specification of a full router The British Computer Society   Ò Û´FileRepositoryµ

CONCLUDING REMARKS
We identified two open problems of existing peer-to-peer systems: reliability and robustness and extendability, and proposed strategies that can be used to solve them.The main contribution of this paper is an approach to stepwise development of peer-to-peer systems within the action systems framework by combining UML diagrams.We have presented our approach via a case study of stepwise development of a Gnutella-like peer-to-peer system.
Our experience shows that it is beneficial to combine informal methods like UML and formal methods like action systems together in the development of peer-to-peer systems.In the early stage, we try to catch the characteristic of the system using use case diagrams and statechart diagrams.Then formal specification in action systems framework is derived by further studying and elaborating details of these diagrams.In the later stage, sequence diagrams are used to graphically clarify the structure of the refined action system specification.
Moreover, we find OO-action systems very suitable for designing such kind of systems.The formal nature of OO-action systems makes it a good tool to built reliable and robust systems.Meanwhile, the object-oriented aspect of OO-action systems helps to built systems in an extendable way, which will generally ease and accelerate the design and implementation of new services or functionalities.Furthermore, the final set of classes in the OO-action system specification is easy to be implemented in popular OO-languages like Java, C++ or C#.
Peer-to-peer systems have been evolving very quickly.Besides Gnutella, another promising choice is JXTA [8] from Sun, which has been generating lots of attention.In the future work, we plan to further investigate this new standard.Moreover, we plan to stepwise implement our action system specification and develop it into a real peer-to-peer system.

TABLE 1 :
Preconditions and invariants for states

TABLE 2 :
Initial specification of servent

TABLE 3 :
Refined specification of servent

TABLE 4 :
Message format

TABLE 5 :
Specification of Ping -Pong router

TABLE 6 :
Specification of ConnectService Workshop in Formal Methods (IWFM'03)The British Computer Society

TABLE 8 :
Specification of Query -QueryHit router Given target information, download action is enabled and servent initiates a download.A download request is sent to the target servent, and then a file is downloaded via HTTP protocol.Afterwards, fileDB is refreshed in order to reflect the change of adding new files to the repository.The sequence of a download session is summarized in Fig.11.The last step is to specify DownloadService.From the result in Table11, we can see that when DownloadService is initiating, an instance of FileRepository is created.It enables Download by calling SetTarget method in FileRepository.
[16]Refining DownloadServiceDownloadService is relatively simple compared to ConnectService and LookupService.The primary function of this component is to enable a servent to download files from other servents.Once a servent receives a QueryHit descriptor, it may initiate the direct download of one of the files described by the descriptor's result set.Or alternatively, users can initiate the download directly by giving complete target information.Files are downloaded out-of-network, i.e. a direct connection between the source and target servent is established in order to perform the data transfer.File data is never transferred over the peer-to-peer network.Additionally, this component provides the local file query function for other servents.It should be in charge of a local file database which provides data services like add, delete, update, query and refresh etc.Moreover, it should take full control of local files.Hence we introduce a new component FileRepository in Table10, which will satisfy the above requirements for DownloadService.First of all, we provide SetTarget method to enable file downloads.To make things simple, we assume that fileDB is simply a set of relations Ý file .We use relation notations[16]dom and ran for domain and range operations, and ¡ as a domain restriction operator, defined by Ë ¡ Ö Ü Ý Ü Ý ¾ Ö Ü ¾ Ë .For incoming Query descriptors, Has and Find methods are provided to enable local file searches.FileRepositoryÅ ×× Å×6th International Workshop in Formal Methods (IWFM'03)The British Computer Society

TABLE 10 :
Specification of file repository 6th International Workshop in Formal Methods (IWFM'03)The British Computer Society