Response Time Evaluation in Ethernet-based Automation Architectures

In this paper, a new method to evaluate the response time in switched Ethernet automation architectures is developed. It is based on modeling the whole system as timed event graphs and on the resulting state representation in Max-Plus algebra. After solving the state equations and merging them according to the operation of the system, we obtain an algorithm giving the reaction delay of the architecture. A deeper analysis of these equations yields analytical formulas for the direct calculation of the response time as a function of the features of the architecture. The minimal and maximal bounds of the response time are also calculated. To check the validity of the results, simulations of the algorithm and experimental measurements on a laboratory platform are used. Finally, a comparison with results obtained using a classical method shows the interest and effectiveness of this new approach.


INTRODUCTION
Ethernet is increasingly used as a fieldbus in automation architectures. Indeed, it offers many advantages in both interoperability and performance. To benefit from these advantages, many suppliers and researchers have worked on this alternative. Unfortunately, they define specific protocols that are not always compatible with standard networks. In this study, on the contrary, we are interested in switched Ethernet networks that use open standard protocols, particularly the client-server Modbus TCP/IP protocol [1]. However, the introduction of new elements such as switches in the network makes it difficult to evaluate the performance of the architecture. Because resources are shared by many parallel flows, it is not easy to predict the delay a message suffers in a node of the network. Different investigations have assessed these delays using network calculus [2] and simulation [3]. However, like the majority of studies, these works focus on the end-to-end delay of the network and ignore both the PLCs (programmable logic controllers) and the RIOMs (remote input output modules). As a matter of fact, the modules of the PLC are not synchronized and the RIOMs are shared by many applications. This leads to considerable delays that have to be taken into account. To our knowledge, the studies that consider the whole architecture are so far often based on simulation or experimental measurements [4]. For instance, a method based on exhaustive state space exploration and model-checking [5], [6], [7] makes it possible to know whether a time bound is reached or not. Its disadvantages are that it does not provide the distribution of response delays and that the number of states in the model explodes. Another method relies on a model in the form of hierarchical timed colored Petri nets and its simulation with CPNTools [8], [9]. This method provides good evaluations of the delays but is time-consuming, and the critical states are not guaranteed to be visited. In our study, we propose a formal method to assess the time delay by the use of Max-Plus algebra equations representing the dynamics of the whole system.
Our work is organized as follows. First, in section 2.1, the automation architecture is described. Then, a classical evaluation method based on extreme assumptions is presented. It is used in a comparison to show the improvements that our results bring. In section 3, we recall some fundamentals of Max-Plus algebra and timed event graphs (TEGs). Then, we move on to the modeling of the system using these tools in section 4. Different configurations are considered. We begin in section 4.1 with a first case with one PLC and one RIOM, before considering a more complex architecture in section 4.2. After solving the equations in section 4.1.2, we obtain a calculation algorithm that scans all the states of the system if the simulation length exceeds a critical period specified in section 4.1.3. In section 4.1.4, by following the same principle and a deeper analysis of the solutions, we arrive at analytical formulas giving the response time of the architecture. Then, in section 5, simulations of the algorithm and experimental measurements on the laboratory platform PRISME [10] are used to check the validity of the analytical results. A comparison with the results of the classical method shows the benefit of our results.

Switched Ethernet-based automation architectures
In this study, the messages are exchanged in the automation architecture according to the Modbus TCP/IP client-server protocol. The communication module of the PLC is the client and the RIOMs are the servers. The considered PLC is made up of two modules: the CPU (central processing unit), which executes the user program, and the Ethernet board, which sends requests (combined requests: read and write data) to the RIOMs. They operate cyclically but are not synchronized, which makes it difficult to set the values of their periods so as to reach the desired temporal performance. The CPU periodically performs the following tasks: reading the inputs, executing the user program and updating the outputs. The only constraint is that the cycle period be respected. Independently of the CPU, the Ethernet board sends requests to the RIOMs and waits until the cycle time elapses. If all answers have arrived, it begins a new cycle; otherwise it waits until all answers come back. The PLC and the RIOMs of this study are linked via store-and-forward Ethernet switches. These switches have no quality of service and only check the validity of a frame before forwarding it. Finally, frame loss is not considered and no time-out has to be taken into account.
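The cycle rule of the Ethernet board described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the paper: all requests of a cycle are taken to leave at the cycle start, each RIOM has a fixed round-trip time, and the names `next_scan_start` and `scan_starts` are hypothetical.

```python
def next_scan_start(start, scan_period, answer_times):
    """Start of the next scanning cycle of the Ethernet board.

    The board waits for the cycle time to elapse AND for every answer
    to come back, so the next start is the later of the two.
    """
    return max(start + scan_period, max(answer_times))


def scan_starts(scan_period, round_trips, n_cycles):
    """Successive cycle starts for fixed (assumed) round-trip times."""
    starts = [0.0]
    for _ in range(n_cycles):
        s = starts[-1]
        # Assumption: every request leaves at the cycle start s.
        answers = [s + rt for rt in round_trips]
        starts.append(next_scan_start(s, scan_period, answers))
    return starts


# Answers back within the 10 ms period: the board stays periodic.
print(scan_starts(10.0, [3.0, 4.5], 3))
# One answer takes 12 ms: each cycle is stretched by the late answer.
print(scan_starts(10.0, [12.0], 2))
```

The `max` in `next_scan_start` is the kind of synchronization that Max-Plus algebra expresses as a linear operation.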

Classical calculus of the response time bounds
The response time is the delay between the occurrence of an event on the controlled process (plant) and the occurrence, on the plant, of the reaction event issued by the controller. This delay results from both the behaviour of the PLC and the communication protocol. The messages moving in the system suffer a delay at each visited device of the architecture (FIGURE 1). At the top of FIGURE 1, we find the plant, which is the source of the event and the destination of its consequence. On the figure, we assigned a number to each important transition of the information in the system. The arrow (1) represents the occurrence of an event from the plant. The delay $d_f$ is due to the filtering of the data coming from the sensor. This information is taken into account once the next request arrives. It is then processed before the corresponding response is returned (arrow 2). The response suffers a network delay before arriving at the Ethernet board. Once in the input buffer of the Ethernet board, it is copied to the shared memory, which takes a copy time. At the next cycle of the CPU, it is read and used in the computation that updates the output (arrow 5). The output is taken into account at the next beginning of the scanning cycle (arrow 6). Again, this consequence of the event suffers a network delay before it gets to the destination (RIOM'). After a processing time in the destination remote module RIOM', the consequence reaches the plant (arrow 8). During this trip, the delays between two transitions are not only the grey lines shown on the figure but also the gaps. They can be distinguished according to their origins [9] as follows:
- Waiting for resource availability (load): a shared resource is free only after processing all waiting frames.
- Waiting for synchronization.
Thus, the delay that the data suffer between their generation by the plant and their forwarding to the network comprises, in addition to the filtering and processing times, the delay due to concurrent requests that share the module RIOM and the time spent waiting for the arrival of the corresponding request at the module. The delay between the arrival of the response at the Ethernet board (arrow 3) and its reading for use in the CPU computation (arrow 4) comprises the time to process the waiting frames and the delay before the beginning of the next CPU cycle. Once in the CPU, the information is affected by a delay that includes the time before the result is sent during the next scanning cycle. When a request is sent, each visited switch of the network adds a delay; likewise, when the response comes back, the network imposes a delay on it. In both directions, this delay comprises the time to wait for the switch availability and the intrinsic delay to process a frame in the switches. Finally, when the result arrives at the destination, it is affected by a delay that includes $d^{6}_{load}$, the delay due to concurrent requests that share the module RIOM'.
Summing up the previous decomposition, the response time is the sum of all these delays. In the classical method, the dependences between the components of the architecture are ignored. The time bounds are obtained under extreme assumptions: the response time is maximal (resp. minimal) if the parallel processes are maximally desynchronized (resp. totally synchronized) and the loads are maximal (resp. minimal) in all the previously calculated delays.
In our work, we consider an architecture with one PLC and N remote modules (RIOMs). All the modules are scanned in an invariant order by this PLC only. Using the previous method, we obtain the bounds (2) of the response time, which involve the extreme (minimal and maximal) network-induced delays. We will see later in this study that the results (2) are not realistic and are far from the real bounds of the response time. Indeed, even though the processes are not synchronized, there is always a dependence between them that constrains the operation of the system. This is what our work proves by the use of a formal method based on Max-Plus algebra and timed event graphs.
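The reasoning of the classical method, which adds up the extreme value of every delay independently, can be sketched as follows. The function name `classical_bounds` and all numerical values are hypothetical and only serve as an illustration; they are not the parameters of the studied platform.

```python
def classical_bounds(stages):
    """Sum per-stage extremes independently, as the classical method does.

    stages: list of (fixed_delay, min_wait, max_wait) tuples, one per
    device crossed by the information. All values are hypothetical.
    """
    lower = sum(fixed + wait_min for fixed, wait_min, _ in stages)
    upper = sum(fixed + wait_max for fixed, _, wait_max in stages)
    return lower, upper


# Hypothetical delays in microseconds: (processing, min wait, max wait).
stages = [
    (300, 0, 10_000),   # source RIOM: wait up to one scanning period
    (250, 0, 250),      # network, answer direction
    (2_000, 0, 5_000),  # CPU: wait up to one CPU period
    (250, 0, 250),      # network, request direction
    (300, 0, 0),        # destination RIOM
]
print(classical_bounds(stages))
```

Because every waiting time is pushed to its extreme at once, the interval obtained this way ignores the dependence between the cyclic processes, which is why it can be much wider than the delays actually observed.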

Max-Plus algebra
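A set $D$ endowed with an internal law $\oplus$ is a monoid if $\oplus$ is associative and has a neutral element $\varepsilon$. A semiring is a commutative monoid with another internal law $\otimes$ that is distributive over $\oplus$, has a neutral element $e$ and admits $\varepsilon$ as an absorbing element. In the Max-Plus formalism, the structure $\mathbb{R}_{\max} = (\mathbb{R} \cup \{-\infty\}, \max, +)$ is a commutative dioid with the usual maximum and addition operations for $\oplus$ and $\otimes$ respectively. The neutral element of the maximum is $\varepsilon = -\infty$ and that of the addition is $e = 0$. This algebra is extended to vectors and matrices. For $n \in \mathbb{N}$ and $v, w \in \mathbb{R}_{\max}^{n}$, the vector $v \oplus w$ has the components $(v \oplus w)_i = \max(v_i, w_i)$ for $i = 1$ to $n$. With $A, B \in \mathbb{R}_{\max}^{n \times n}$, the matrix multiplication in $\mathbb{R}_{\max}$ is written $A \otimes B$ or $A.B$, where: $(A \otimes B)_{ij} = \bigoplus_{k=1}^{n} A_{ik} \otimes B_{kj} = \max_{k} (A_{ik} + B_{kj})$. The Kleene star of a square matrix $M$ is $M^{*} = \bigoplus_{i=0}^{\infty} M^{i}$, where $M^{0}$ is the unit matrix with $e$ only on the diagonal and $\varepsilon$ elsewhere. Then, for $A \in \mathbb{R}_{\max}^{n \times n}$ and $b \in \mathbb{R}_{\max}^{n}$, $x = A^{*} b$ is the minimal solution of the inequality $x \geq A x \oplus b$.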
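The operations of this section can be illustrated with a minimal Python sketch. The helper names (`mp_add`, `mp_mul`, `mp_eye`, `mp_star`) and the example matrix are hypothetical. The Kleene star is truncated after the power $n-1$, which is only valid under the assumption that the matrix has no circuit of positive weight; otherwise the infinite sum does not converge.

```python
EPS = float("-inf")  # epsilon, neutral element of the maximum
E = 0.0              # e, neutral element of the addition


def mp_add(A, B):
    """Max-plus sum of two matrices: entrywise maximum."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mp_mul(A, B):
    """Max-plus product: (A (x) B)_ij = max_k (A_ik + B_kj)."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mp_eye(n):
    """Unit matrix: e on the diagonal, epsilon elsewhere."""
    return [[E if i == j else EPS for j in range(n)] for i in range(n)]


def mp_star(A):
    """Kleene star, truncated at A^(n-1).

    Assumption: A has no circuit of positive weight.
    """
    n = len(A)
    result, power = mp_eye(n), mp_eye(n)
    for _ in range(n - 1):
        power = mp_mul(power, A)
        result = mp_add(result, power)
    return result


# Hypothetical example: x2 depends on x1 with a delay of 2 time units.
A = [[EPS, EPS],
     [2.0, EPS]]
b = [[1.0],
     [0.0]]
x = mp_mul(mp_star(A), b)       # candidate minimal solution x = A* b
print(x)
print(mp_add(mp_mul(A, x), b))  # should reproduce x (fixed point)
```

In this example the two printed vectors coincide, which checks on one instance that $A^{*} b$ satisfies $x = A x \oplus b$.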

Timed event graphs
An event graph is an ordinary Petri net in which every place has at most one upstream and one downstream transition. An event graph is timed if delays are assigned to the transitions or to the places. We denote by $n$ the number of transitions with at least one upstream place and by $m$ the number of source transitions. The only place linking two given transitions is identified by these transitions, and a delay is associated with it. The timed graph of the example leads to an equation that is linear in Max-Plus algebra. In general, the behaviour of a TEG can be expressed by a Max-Plus linear equation, in which one of the matrices corresponds to the delays of the places downstream of the source transitions and $A_0^{*}$ is the Kleene star of $A_0$. In a manner analogous to usual linear systems, this form can be brought to a state space representation by replacing the places that hold several tokens with sequences of places and additional transitions. Hence, we obtain an extended system with a state vector $x(k)$ that belongs to $\mathbb{R}_{\max}^{N}$, where $N = n + n'$ and $n'$ is the number of transitions added. The new system is described by a Max-Plus linear state equation. These formulations point out that the behaviour of a timed event graph is deterministic, depending only on the source transitions and the initial conditions [12]. This dependence can be made explicit by unrolling the state equation from the initial conditions.
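As an illustration of such a state representation, the following sketch iterates the first-order form $x(k) = A \otimes x(k-1) \oplus B \otimes u(k)$ commonly used in the Max-Plus literature. It is an assumption that this matches the state equation of the paper. The matrices, the initial state and the input dates are hypothetical and do not come from the figures of this study.

```python
EPS = float("-inf")  # epsilon of the Max-Plus algebra


def mp_matvec(A, x):
    """Max-plus matrix-vector product: y_i = max_j (A_ij + x_j)."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]


def mp_vec_add(v, w):
    """Max-plus sum of two vectors: componentwise maximum."""
    return [max(a, b) for a, b in zip(v, w)]


def simulate(A, B, x0, inputs):
    """Iterate x(k) = A (x) x(k-1) (+) B (x) u(k) and keep every state."""
    x, history = x0, [x0]
    for u in inputs:
        x = mp_vec_add(mp_matvec(A, x), mp_matvec(B, u))
        history.append(x)
    return history


# Hypothetical two-transition graph: x1 = start of a cycle, x2 = its end.
A = [[EPS, 1.0],   # a new cycle starts 1 time unit after the previous end
     [EPS, 4.0]]   # successive ends are at least 4 time units apart
B = [[0.0],        # a cycle cannot start before the input date u(k)
     [3.0]]        # and it ends at least 3 time units after u(k)
x0 = [0.0, 3.0]
inputs = [[2.0], [12.0], [13.0]]

for k, state in enumerate(simulate(A, B, x0, inputs)):
    print(k, state)
```

The state at step $k$ depends only on the initial state and on the input dates, which is the deterministic behaviour mentioned above.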

AUTOMATION ARCHITECTURE MODELING AND RESPONSE TIME EVALUATION
In this paper, two main cases are considered. The first determines the response time of architectures with one PLC and one RIOM, as in FIGURE 3; the second studies a more complex system.

Case 1: one PLC and one RIOM
Two situations are to be considered in control architectures. The response time may be, for instance, the delay between the detection of a danger and the triggering of an alarm. In this case, the evaluation of the maximal bound of the reaction time has top priority. However, if the control concerns, for example, the speed of a robot arm, it is the distribution of the response time rather than its bounds that matters most. In our work, we consider the general case. The response delay is calculated using only the state of the system and the time of occurrence of an event, regardless of its consequences.

TEG model and Max-Plus equations
According to the description of the architecture in section 2.1, we obtain the model of FIGURE 4. It comprises two independent TEGs: one on the left that models the CPU and another on the right for the rest of the system. As a matter of fact, the model is abstract and represents the state of the two parts of the architecture independently, abstracting away the meaning of the tokens circulating in the TEGs. Therefore, the link between the CPU and the Ethernet board is hidden. By merging the equations of the two TEGs and introducing secondary variables, we can follow the frames along their route through the whole architecture and get the time delay of each step. The grey arrows represent the source (data coming from the sensor) and the output (data toward the plant). They are not considered at this stage, since the system is not constrained and data are available at the output of the sensor as long as it is functional. By applying the method of section 3.2 to the model of the architecture with the initial conditions of FIGURE 4, we obtain the Max-Plus equations (7) and (8). These systems are linear in Max-Plus algebra and can be rewritten in the form (5). We assigned them different indexes ($k$ and $l$) to indicate that they are not synchronized, exactly like the CPU and the Ethernet board. This is the additional and main difficulty of this study.

Equations resolution and simulation algorithm
Solving these equations in $\mathbb{R}_{\max}$ gives the firing dates of the transitions. Among these solutions, only the equations representing the following events are of interest:
- Reading and beginning of processing in the CPU ($\theta_1$).
- End of processing in the CPU and output update ($\theta_2$).
- Beginning of scanning and sending of a request ($\theta_4$).
- Reception of an answer in the shared memory ($\theta_{10}$).
Indeed, these are the events that link the CPU and the Ethernet board. We first introduce the time to wait for the reception of an answer, which gives the time at which the answer is received at each scanning cycle. The delay is minimal if the data coming from the sensor are used in the RIOM processing immediately after they are generated. This gives the minimal delay (not the global minimum) for the event, in which $d_f$ is the delay due to the data filtering in the sensor.
On the contrary, the delay is maximal if the data arrive immediately after the beginning of the RIOM processing of the previous scanning cycle. The corresponding expression is always valid, and this situation occurs when the update frequency of the sensor output is lower than the scanning frequency. Otherwise, some events may be overwritten and never used in any processing. This case is not considered in this study, even though formula (13) does not exclude it: it is of no interest, since no response time is assigned to the overwritten events and they only represent lost processing time. Thus, we have an algorithm for evaluating the response time of the architecture to any occurring event. It is fast and easily implemented. The delays and the features of the components of the system are introduced as parameters, so it can be applied flexibly to different configurations of the architecture. Simulations of this algorithm are used to check the validity of the formulas obtained later in this study.
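The kind of computation such an algorithm performs can be sketched as follows. This is not the algorithm of the paper, which is derived from the Max-Plus equations (7) and (8), but a simplified illustration under explicit assumptions: both cycles are strictly periodic (answers always return within the scanning period), all delays are constant integers in microseconds, and every parameter name and value below is hypothetical.

```python
def ceil_div(a, b):
    """Ceiling of a / b for integers (b > 0)."""
    return -(-a // b)


def next_start(t, period, offset=0):
    """First cycle start at or after time t for a periodic process."""
    return offset + ceil_div(t - offset, period) * period


def response_time(t_event, p):
    """Follow one event along the chain theta_10 -> theta_1 -> theta_2 -> theta_4."""
    ready = t_event + p["d_filter"]                     # data filtered by the sensor
    # First scanning cycle whose request reaches the RIOM once the data is ready.
    scan = next_start(ready - p["d_request"], p["t_scan"])
    answer = scan + p["d_round_trip"]                   # answer in shared memory (theta_10)
    cpu_start = next_start(answer, p["t_cpu"], p["cpu_offset"])  # theta_1
    cpu_end = cpu_start + p["d_cpu"]                    # output updated (theta_2)
    send = next_start(cpu_end, p["t_scan"])             # next scanning cycle (theta_4)
    reaction = send + p["d_request"] + p["d_riom"]      # consequence reaches the plant
    return reaction - t_event


# Hypothetical parameters, in microseconds.
params = {
    "t_cpu": 5_000, "t_scan": 10_000, "cpu_offset": 0,
    "d_filter": 100, "d_request": 250, "d_round_trip": 1_000,
    "d_cpu": 2_000, "d_riom": 300,
}

# Sweep the event date over one common period of both cycles.
delays = [response_time(t, params) for t in range(0, 10_000, 50)]
print(min(delays), max(delays))
```

Sweeping the event date over a whole common period of the two cycles is what allows the minimum and maximum of the simulated delays to be read as bounds for this simplified model.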

Critical period of simulation
In the architecture, we have two cyclic applications that are not synchronized. Despite this, the whole system remains cyclic, with a period that is a common integer multiple of the CPU period and of the scanning period; such integer multipliers always exist. This period is minimal if the two multipliers are chosen coprime (which is of course always possible).
We can conclude that the algorithm is formal and that all possible states of the system are scanned if the simulation length exceeds this critical period. So, the global maximum and minimum are the maximum and minimum of the values calculated by simulating this algorithm for at least one critical period.
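A possible way to compute such a critical period is sketched below, assuming that both periods are given as exact rational numbers in the same time unit. The function name `critical_period` and the symbols `p` and `q` for the two integer multipliers are hypothetical.

```python
from fractions import Fraction


def critical_period(t_cpu, t_scan):
    """Smallest T with T = p * t_cpu = q * t_scan, p and q integers.

    Writing t_scan / t_cpu = p / q in lowest terms makes p and q coprime,
    which gives the minimal common period.
    """
    ratio = Fraction(t_scan) / Fraction(t_cpu)
    p, q = ratio.numerator, ratio.denominator
    return p, q, p * Fraction(t_cpu)


# CPU period 5 ms and scanning period 10 ms.
print(critical_period(Fraction(5), Fraction(10)))
# Hypothetical scanning period of 7.5 ms.
print(critical_period(Fraction(5), Fraction("7.5")))
```

Exact fractions are used instead of floating-point numbers so that the ratio of the two periods reduces to lowest terms without rounding errors.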

Analytical calculus of response time
We use the results (11), (12) and the principle of the algorithm. Let $\alpha$ and $\tau_r$ be respectively the quotient and the remainder of the Euclidean division of $T_r$ by $T_{CPU}$, so that $T_r = \alpha \, T_{CPU} + \tau_r$; the derivation also uses a coefficient $\beta < 1$. Because of the computational complexity shown later, we begin with the case where $r$ is an integer and generalize afterwards.

International Workshop on Verification and Evaluation of Computer and Communication Systems
VECoS 2008
• First case (integer $r$): The firing dates at each scanning cycle are combined with those of the CPU cycles to obtain the bounds (23), under the condition C1 (it is the optimal case). In practice, the condition C1 is often satisfied and the results (23) are valid, because the scanning period is by far longer than the period of the CPU. The results are very interesting, since the bounds calculated in (23) are constant and are therefore the global minimum and maximum. Indeed, the network and RIOM processing delays are assumed constant. This is not the case in architectures where the resources are shared with acyclic traffic and the imposed delays are variable, which is so far outside the scope of our study. The results (23) are then generalized to a wider condition.
• General case: Following the same steps at each scanning cycle, the global bounds of the response time and the local delay are obtained.

Case 2: One PLC and N RIOMs
The modeling of the PLC does not change; only the number of RIOMs is different. The requests are sent from the Ethernet board in an invariant order, and the switch applies a FIFO policy without quality of service. In this more general case, the TEG of the CPU remains the same but the other one becomes more complex. Indeed, we introduce additional equations to model the FIFO policy and solve the problem of sharing resources: the switch and the Ethernet board. The scanned RIOMs are assigned indexes according to the order in which their requests are sent: we associate the index $i$ with the RIOM receiving the $i$-th request from the Ethernet board. With an analysis similar to that of the first case with one RIOM, and with the same notations, results of the same form are obtained. The results are very interesting: to get a small response time, we should assign a large index to the source and a small one to the destination, so the order of the RIOMs is important. However, we have to keep in mind that the condition governing the calculation of the delay depends on $T_r$ and $\alpha$ (see (31)). So, the optimal case is obtained by pushing this assignment as far as possible and stopping just before the condition changes.
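The effect of the FIFO policy on requests sent in an invariant order can be illustrated with the classical recurrence for a shared resource with a constant service time. This sketch is an assumption about how such a resource behaves, not the set of equations used in the paper; the function name `fifo_departures` is hypothetical, and the 250 microsecond service time simply reuses the 0.25 ms frame transmission time quoted in the validation section.

```python
def fifo_departures(arrivals, service):
    """Departure times from a FIFO resource with a constant service time.

    Max-plus recurrence: d(i) = max(a(i), d(i-1)) + service,
    i.e. a frame is served once it has arrived AND the previous one has left.
    """
    departures = []
    previous = float("-inf")  # no frame has left before the first one
    for a in arrivals:
        previous = max(a, previous) + service
        departures.append(previous)
    return departures


# Three requests handed to the switch at the same time (microseconds).
print(fifo_departures([0, 0, 0], 250))
# Two requests far enough apart not to wait for each other.
print(fifo_departures([0, 1_000], 250))
```

When the requests are sent back to back, each additional position in the sending order adds one service time to the departure date, which is consistent with the importance of the order of the RIOMs noted above.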

VALIDATION OF THE METHOD
To check the validity of the model and of the results developed previously, we consider two architecture configurations (FIGURE 3 and FIGURE 5). We compare the results of the algorithm and of the formulas with experimental measurements taken on the patented laboratory platform [10].
In the second configuration, we are interested in the causality delay between an event generated on the input of the RIOM R4 and its consequence on the output of the RIOM R5. The histograms of FIGURE 6 represent a series of 10,000 measurements and simulations of the algorithm for this configuration. This architecture is more general than those of the study (two switches). This is done to show that the results can be extended to more complex systems. The CPU period is set to 5 ms and the scanning period to 10 ms with a jitter of 15%. The jitter is considered in the algorithm and the formulas: we used the maximal (resp. minimal) scanning period to calculate the maximal (resp. minimal) bound of the response time. We obtained the results of TABLE 1. In both cases, we can already conclude that the formulas are valid, since the maximal delays are greater than all measured delays and the minimal delays are smaller than all of them. As expected, the results of the simulation and of the formulas are exactly the same in all cases, since they are based on the same principle. The gaps with respect to the measurements are in all cases smaller than 3.27% for the analytical formulas and the simulations. For the mean of the response times, this gap is smaller than 2.01% in both cases. A random event generator is used in simulation to obtain a realistic distribution of delays (to offset the effects of the jitter). Indeed, the shapes of the measurement and simulation histograms are very similar (FIGURE 6). However, the results calculated using the classical method are very far from the real bounds: we note a gap of more than 50% with respect to the measured minimal or maximal bounds. Such results do not support any conclusion about the performance of the architecture. Using them in the synthesis of a controller, for instance, would lead to unsatisfactory performance.
If we compare the results of both configurations (case 1 and case 2), we note a difference of about 0.25 ms between the maximal bounds (resp. minimal bounds). It is exactly the transmission time of a frame; since the switches are very fast, the considered configurations are very similar. The main difference is the use of one RIOM as the event source and another as the consequence destination, with a difference of one in their order (R4 and R5). This result supports the general formulas of section 4.2.

CONCLUSION
In this work, we modeled different configurations of automation architectures by means of TEGs and Max-Plus algebra. We obtained an algorithm and analytical formulas for the formal calculation of the response time. The comparison of the results with experimental measurements allowed us to check the validity of both the algorithm and the formulas. Their comparison with the results of the classical method made their effectiveness clearer. Thus, with these formulas, it is easy to choose an adequate configuration of the components of an architecture to fulfil the desired temporal requirements. For further studies, it would be interesting to consider more general automation architectures with many PLCs and RIOMs. Finally, we plan to study an overall networked control system in order to evaluate the role that our results can play in the synthesis of control strategies.
firing times of the n and transitions of system for the time.The matrix

The four places of the CPU TEG, with their delays, model respectively the phases of waiting, user program execution (reading and writing included), CPU busy and finally CPU idle. One of these places is not necessary: it is represented only to show more clearly the end of a CPU cycle by the firing of its downstream transition. The same holds for a second place. So, we easily note the periodic operation of the CPU. A further place models the scanning period of the Ethernet board, and a token in it means the board is busy during at least this period. The sending of a request starts with the firing of one transition and finishes with the firing of the next, after the time needed to send a frame. A token in the following place means the request is sent and the Ethernet board is waiting for the answer. Two places model the network (switch) delays imposed on the sent requests and on the returned answers. The separation of these places is possible since the lines are full duplex and no conflict or collision can occur. To avoid overcrowding the model, the network delay also includes the time necessary to copy the response from the input buffer of the Ethernet board to the memory shared with the CPU (indeed, this time is by far smaller than the other delays of the architecture). Three places represent the RIOM. This module stays waiting until a request arrives in its input buffer. By firing the next transition, the processing starts and goes on for a time $\tau_{10}$. At the end, the RIOM puts the answer in its output buffer before forwarding it to the network.

When an answer arrives ($\theta_{10}$), it is taken into account at the next beginning of the CPU cycle ($\theta_1$). It is read and used in the computation in the CPU. Once the processing finishes, the result is written to the memory of the Ethernet board ($\theta_2$). It is taken into account at the next beginning of the scanning cycle ($\theta_4$) and sent to the RIOM.

Here $r$ denotes the integer part and $\varepsilon_r$ the fractional part of the same quantity. The optimality condition C2 is more restrictive than C1. This is an important result, which suggests setting the scanning period as a multiple of the CPU period (after minimizing T, of course) in order to reduce the maximal bound of the response time of the architecture.

$N_S$ and $N_D$ are the indexes assigned respectively to the event source (S) and to the destination (D) of its consequence. On FIGURE 5 for example, $N_S = 4$ and $N_D = 5$. A further term is the time necessary to wait for the reception of the answer from the event source.
To study the dynamic behaviour of a timed event graph, we associate with each transition the dates of its successive firings, with a separate notation for source transitions. On the following global and local conditions: