Modeling complex systems with VeriJ

This paper presents VeriJ, a language designed for modeling complex supervisory control problems. VeriJ is based on a subset of the Java language, extended with a few constructs specific to supervisory control. This makes it possible to build VeriJ models in industrial-strength integrated development environments such as Eclipse, and to execute (simulate) them directly with a Java debugger. With the aim of performing controller synthesis in a further step, VeriJ models are translated into hierarchical finite state machines (HFSM) representing the control flow graph, using modern model transformation techniques and tools. The semantics of these HFSM is then given as a pushdown system, leading to a concise and expressive representation of the underlying discrete event system. We illustrate our modeling and transformation approach with a VeriJ model of the Nim game, for which finding a winning strategy for a player can be seen as a control problem.


INTRODUCTION
Context. Supervisory control of discrete event systems is a formal approach for automatically computing a controller from a given control objective. Given a discrete model M of a system and an objective expressed as a formula ϕ, the control problem asks whether there exists a controller C such that M controlled by C satisfies ϕ. This problem can also be viewed as a game in which the controller looks for a winning strategy against all possible actions of the environment.
While algorithms solving the control problem are well known (Ramadge and Wonham 1987), two obstacles limit the practical application of these techniques in industry. First, like most state-space exploration techniques, the algorithms scale poorly to large and complex systems. Second, from an engineer's point of view, the investment needed to learn to manipulate formal models such as automata is often considered too costly.
Contribution. We propose to model complex discrete event systems using VeriJ (Zhang 2010), a language based on a subset of Java. Complex systems may involve a large number of components and handle, for instance, lists of dynamic size. Examples include automated transport systems like the one partially studied in (Bérard et al. 2008). Concurrency is an important feature of complex systems, but it is not yet implemented and is left out of this paper. In addition to Java instructions, VeriJ includes a small set of control-specific elements for specifying which actions are controllable, and the control objective. Since the language is based on Java, most engineers are already comfortable expressing the semantics of their systems as programs. It also lets users benefit directly from mature, industrial-strength development environments such as Eclipse (syntax checks, code completion, etc.), including the ability to run a VeriJ model interactively with the standard Java debugger.
Discussion and related work. To perform formal analysis of VeriJ specifications, we need to build a transition system from the Java-based source code. Two main alternatives are possible for this step: direct translation from source code to a formal model, or compilation to bytecode with the Java compiler followed by analysis at bytecode level. We now compare the techniques used in related work and present our proposal.
Direct translation to some variant of Control Flow Graph (CFG) preserves a high level of abstraction in the resulting system model. However, it may be difficult in general to capture all syntactic elements of the language, and care must be taken not to deviate from a compiler's interpretation of the source code. This approach was used for instance in the Bandera project (Corbett et al. 2000), where the translation target was the so-called Bandera Intermediate Representation, akin to a finite state machine. It was also the case in early versions of the Java Path Finder (JPF) (Havelund 1999), a software model-checker with Promela as target: Promela is the input language of the Spin model-checker (Holzmann 1997), again based on communicating finite automata.
The other option consists in using a standard compiler to derive the semantics of the source code, and performing the verification at the bytecode (e.g. for Java) or assembly language (e.g. for C) level. This approach solves issues related to software artifacts, such as external libraries for which no source code is available, and is hence the preferred option for full-fledged software model-checkers. It also reduces the number of cases to handle when implementing the verification tool, as the variety of opcodes is rather limited. However, it forces one to work at a very low level of abstraction, on models that are much larger in raw number of instructions, or even to resort to executing the code to derive its interpretation. This is the choice taken in recent versions of JPF (Brat et al. 2000; Gvero et al. 2008), which rely on a dedicated backtrackable Java Virtual Machine (JVM) that provides non-deterministic choices and control over thread scheduling.
In this work, we choose to directly translate the VeriJ input into a variant of control flow graphs called Hierarchical Finite State Machines (HFSM), which preserve the structure of the source code and a high level of abstraction. Parsing of the input is partially handled by existing Java analysis tools: MoDisco is able to raise the source code to a model instance of a standard Java metamodel. The subsequent transformation to HFSM relies on model-to-model (M2M) transformation techniques using the Atlas Transformation Language (ATL), a state-of-the-art model transformation plugin within Eclipse. Since we target supervisory control rather than full software model-checking, we need an efficient expression of the system's transition relation. For this reason, JVM-based solutions (as in JPF) are not appropriate in our case.
Once a control flow graph has been obtained, there remains the question of how to provide its semantics. A straightforward approach in simple cases consists in inlining all calls, resulting in a single finite state machine (FSM) for the whole input program. FSMs are the natural input language of many model-checkers, which makes this solution attractive. However, a plain FSM may become very large due to the duplication of behaviors. Moreover, inlining is inapplicable when recursion is involved, since it would produce infinite structures. Another popular approach is to interpret the CFG with pushdown semantics: PushDown Systems (PDS) give a compact and accurate representation of procedure calls thanks to a stack in the system states. This is the choice taken, for instance, in jMoped (Suwimonteerabuth et al. 2007), Magic (Chaki et al. 2006) and MOPS (Chen and Wagner 2002), as well as in our own approach.
Outline. The overall scheme of our approach is depicted in Figure 1. The transformation is split into two steps, preprocessing and compilation, both using model transformation techniques involving metamodels.
Rather than a truly complex system, we use as a running example in this paper a simple two-player game, Nim, where the control problem consists in finding a winning strategy for one of the players.
The paper is structured as follows. Section 2 presents VeriJ and its metamodel. Section 3 is devoted to the compilation of VeriJ into Hierarchical Finite State Machines, and Section 4 presents the pushdown semantics.

VERIJ
VeriJ is meant to bridge the gap between a programming language and the input of a controller synthesis tool.

VeriJ definition
Designed as a Domain Specific Language (DSL) for verification and control synthesis, VeriJ consists of a subset of the Java language, including elements such as basic data types, arithmetic operators, assignments, decision and control statements, and construction of classes with instantiation, with the addition of the specific classes described below. VeriJ does not support features such as casts, exceptions, visibility, inheritance, libraries or native code (see Figure 2(a)).
We now present VeriJ specific features, shown in Figure 2(b).
• Because low-level Java collections such as sets and lists are complex to deal with, we introduce the VeriJList type instead, which handles basic collections with a small set of operations. For example, in Java, even the basic collection ArrayList involves hundreds of lines of complex source code. VeriJList provides a high level of abstraction for collections.
• Additionally, VeriJ introduces concepts useful for the verification process, such as random and non-deterministic choice (NDChoice). In VeriJ, random is a random integer generator built on the standard Math.random(), which produces an integer between a min and a max. The NDChoice method is a random boolean generator. Both random and NDChoice specify the free-choice semantics of a system and are used to build the transition relation of the target model. The playerID parameter of these two methods identifies who makes each choice. In other words, each move is labeled by its player, which is essential for verification and controller synthesis.
These concepts are also implemented as Java classes, which allows us to run and debug VeriJ models using standard Java tools. Our implementation of the random methods supports user input, trace recording and replay, as well as standard simulation.
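A minimal sketch of how such choice primitives can be written as plain Java, so that a model stays runnable under a debugger. The class Choices, the record Move and the replay constructor are hypothetical illustrations, not the actual VeriJ library; the sketch only assumes the signatures random(min, max, playerID) and NDChoice(playerID) suggested by the text. Every choice is labeled by its player and appended to a trace, and a recorded trace can be replayed to reproduce a run.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for VeriJ's choice primitives (not the actual VeriJ library). */
final class Choices {
    /** One recorded choice: the player who made it and the value chosen. */
    record Move(int playerID, int value) {}

    private final List<Move> trace = new ArrayList<>();
    private final Iterator<Move> replay; // null in simulation mode

    /** Simulation mode: values are drawn with Math.random(). */
    Choices() {
        this.replay = null;
    }

    /** Replay mode: values are read back from a previously recorded trace. */
    Choices(List<Move> recorded) {
        this.replay = recorded.iterator();
    }

    /** Free choice of an integer in [min, max] by player playerID. */
    int random(int min, int max, int playerID) {
        int value;
        if (replay != null) {
            Move m = replay.next();
            if (m.playerID() != playerID || m.value() < min || m.value() > max) {
                throw new IllegalStateException("trace does not fit this call: " + m);
            }
            value = m.value();
        } else {
            // Math.random() is in [0, 1), so value ends up in [min, max].
            value = min + (int) (Math.random() * (max - min + 1));
        }
        trace.add(new Move(playerID, value));
        return value;
    }

    /** Free boolean choice by player playerID, encoded here as random(0, 1, playerID). */
    boolean NDChoice(int playerID) {
        return random(0, 1, playerID) == 1;
    }

    /** The choices made so far, each labeled by its player. */
    List<Move> trace() {
        return List.copyOf(trace);
    }
}
```

In replay mode the sketch rejects a trace entry whose player or value does not fit the current call, which is one simple way to detect that a model has changed since the trace was recorded.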

Example of a VeriJ model: the Nim game
We now present the Nim game, used as a running example throughout the paper. Here, we want to solve a control problem: finding a winning strategy for one of the players. In the final transition system, each action must be assigned to a player and a set of failure states must be defined. A standard algorithm based on a backward fixpoint is then applied, as in (Zhang et al. 2010).
Given a set of matches arranged in several rows, with $2i-1$ matches in the $i$-th row, the Nim game consists of two players alternately picking a number of matches from a selected row. The player who takes the last match loses. This game was completely solved in 1901 by Charles Bouton (Bouton 1901). We use here the variant presented in (Ziller 2002). Such a game can be seen as an instance of a controller synthesis problem: the game is modeled as a transition system in which the players take turns, and the goal is to determine whether there exists a winning strategy for one of the two players. In other words, one of the players, the controller, tries to find a winning strategy against all possible moves of the other player, who represents the environment.
The source code of the Nim game is large, so we only present a subset of it in Figure 3, to show how modeling and simulation are done in VeriJ.
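To make the control problem concrete, the following sketch decides, under the rule stated above (the player who takes the last match loses), whether the player about to move can force a win. It is a brute-force illustration, not the synthesis algorithm of (Zhang et al. 2010): a memoized recursion over the finite, acyclic game graph, which computes the same set of winning positions as a backward fixpoint would. The class NimSolver and the optional per-move bound maxTake are hypothetical, and the exact variant of (Ziller 2002) may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Brute-force solver for the rule "whoever takes the last match loses".
 * Memoized recursion over the finite, acyclic game graph; it yields the same
 * winning positions as a backward fixpoint (attractor) computation.
 */
final class NimSolver {
    private final int maxTake; // hypothetical per-move bound; Integer.MAX_VALUE means unbounded
    private final Map<String, Boolean> memo = new HashMap<>();

    NimSolver(int maxTake) {
        this.maxTake = maxTake;
    }

    /** True iff the player about to move can force a win from these row sizes. */
    boolean toMoveWins(int[] rows) {
        int[] key = rows.clone();
        Arrays.sort(key); // the order of rows does not matter
        String k = Arrays.toString(key);
        Boolean cached = memo.get(k);
        if (cached != null) {
            return cached;
        }
        // With no match left, the opponent took the last one and lost.
        boolean win = Arrays.stream(key).noneMatch(r -> r > 0);
        for (int i = 0; i < key.length && !win; i++) {
            for (int take = 1; take <= Math.min(key[i], maxTake) && !win; take++) {
                int[] next = key.clone();
                next[i] -= take;
                // A move wins if it leaves the opponent in a losing position.
                win = !toMoveWins(next);
            }
        }
        memo.put(k, win);
        return win;
    }

    /** Initial board: 2i - 1 matches in row i, for i = 1..nbRows. */
    static int[] initialRows(int nbRows) {
        int[] rows = new int[nbRows];
        for (int i = 1; i <= nbRows; i++) {
            rows[i - 1] = 2 * i - 1;
        }
        return rows;
    }
}
```

With four rows (1, 3, 5 and 7 matches) and no bound on the number of matches taken, toMoveWins returns false: the first player loses against a perfect opponent. This agrees with Bouton's analysis, since 1 XOR 3 XOR 5 XOR 7 = 0 and some row holds more than one match.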
• The class Board only uses a variable Matches, whose type is VeriJList.Its constructor (lines 36 to 42) constructs the set of matches in successive rows by calling operation add (from
• A call to random (line 56) is used in method chooseNBtake to randomly generate the number of matches taken in one move. Similarly, NDChoice (line 69) is used in method chooseRow to randomly choose the row from which matches are taken in that move. Both random and NDChoice carry the label of the current player through playerID.
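The structure just described can be mirrored in ordinary Java, as sketched below. This is a hypothetical reconstruction for illustration only, not the code of Figure 3: a java.util.List stands in for VeriJList, the interface Chooser stands in for VeriJ's labeled choice primitives, and the row is drawn among non-empty rows with a random-style choice rather than with NDChoice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for VeriJ's labeled choice: an integer in [min, max] chosen by playerID. */
interface Chooser {
    int random(int min, int max, int playerID);
}

/** Illustrative Java mirror of the Nim board (not the paper's Figure 3 code). */
final class Board {
    // Matches left in each row; plays the role of the VeriJList Matches.
    private final List<Integer> matches = new ArrayList<>();

    Board(int nbRows) {
        for (int i = 1; i <= nbRows; i++) {
            matches.add(2 * i - 1); // row i starts with 2i - 1 matches
        }
    }

    /** Player playerID picks one of the non-empty rows (assumes the game is not over). */
    int chooseRow(Chooser c, int playerID) {
        List<Integer> nonEmpty = new ArrayList<>();
        for (int r = 0; r < matches.size(); r++) {
            if (matches.get(r) > 0) {
                nonEmpty.add(r);
            }
        }
        return nonEmpty.get(c.random(0, nonEmpty.size() - 1, playerID));
    }

    /** Player playerID picks how many matches to take from row, between 1 and maxTake. */
    int chooseNBtake(Chooser c, int row, int maxTake, int playerID) {
        return c.random(1, Math.min(matches.get(row), maxTake), playerID);
    }

    void take(int row, int n) {
        matches.set(row, matches.get(row) - n);
    }

    boolean gameOver() {
        return matches.stream().allMatch(m -> m == 0);
    }

    List<Integer> rows() {
        return List.copyOf(matches);
    }
}
```

Passing the choice source as a parameter keeps the board logic independent of whether choices come from simulation, user input or a replayed trace.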

VeriJ compilation
To apply formal analysis to VeriJ models, VeriJ source code is compiled into a Hierarchical Finite State Machine (HFSM), i.e. a set of finite state machines. Recall that the transformation (Figure 1) includes preprocessing and compilation, both using model transformation techniques involving metamodels.
Since VeriJ is a subset of Java as described in Figure 2(a), its metamodel is primarily derived from the Java metamodel, by removing 46 of the 126 metaclasses of Java and adding the new elements. Even with these metaclasses removed, the VeriJ metamodel is still too large to be displayed in this paper. For an in-depth description and complete details of the Java metamodel, we refer the reader to http://wiki.eclipse.org/MoDisco/Components/Java/Documentation/0.9.
In the preprocessing step, we extract the Java model of the application, conforming to the Java metamodel, from either the Java source code or the VeriJ source code, using MoDisco. MoDisco is a model-driven framework providing tools to support software modernization; among these tools, discoverers automatically create models of existing systems. The VeriJ model, conforming to the VeriJ metamodel, is obtained from the extracted Java model by pruning unnecessary information and building the VeriJ-specific elements. Given the Java metamodel and the VeriJ metamodel, this step is carried out by a set of rules, Java2VeriJ.atl, written in the Atlas Transformation Language (ATL) framework. Part of the transformation code is shown in Figure 6. Rule VeriJRandom (lines 9 to 18) shows how a MethodInvocation element in Java is selected and transformed into a random element in VeriJ.
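The rule-based step can be pictured with a small Java analogue of a declarative matched rule: a guard that selects source elements, and a target element built from each match. Everything below is hypothetical and heavily simplified (arguments are integer literals, and the records MethodInvocation and VeriJRandom are not the MoDisco or VeriJ metaclasses); it only assumes the argument order random(min, max, playerID). The guard mirrors the helper isVeriJMethod(), which tests the proxy attribute of the invoked method.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified source element: a method call as produced by a Java discoverer. */
record MethodInvocation(String methodName, boolean proxy, List<Integer> args) {}

/** Hypothetical target element of the VeriJ model. */
record VeriJRandom(int min, int max, int playerID) {}

final class Java2VeriJSketch {
    /** Analogue of the helper isVeriJMethod(): the invoked method is a proxy (not defined in the model's own source). */
    static boolean isVeriJMethod(MethodInvocation s) {
        return s.proxy();
    }

    /**
     * Analogue of a matched rule: the guard selects the source element and the
     * body builds the target. Non-matching elements are left to other rules.
     * Assumes the argument order random(min, max, playerID).
     */
    static Optional<VeriJRandom> veriJRandom(MethodInvocation s) {
        if (isVeriJMethod(s) && s.methodName().equals("random") && s.args().size() == 3) {
            List<Integer> a = s.args();
            return Optional.of(new VeriJRandom(a.get(0), a.get(1), a.get(2)));
        }
        return Optional.empty();
    }
}
```

A user-defined method that happens to be named random is not a proxy, so the guard leaves it untouched, which is the point of checking isVeriJMethod() before the name.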

HIERARCHICAL FINITE STATE MACHINE
The next step of our approach is to compile VeriJ models into discrete event systems (Figure 1). The control flow of a VeriJ model can be syntactically described by a Hierarchical Finite State Machine (HFSM), which is a finite set of finite automata linked together according to program instructions. This model, together with its semantics as a pushdown system, is intended to be the input of the verification tool. We first describe HFSM and the corresponding model transformation, and then give the pushdown semantics in Section 4.

Definition
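A finite state machine (FSM) over a finite alphabet $\Sigma$ is a tuple $F = (S, \delta, s_0, s_f)$ where $S$ is a finite nonempty set of states, $\delta$ is a partial mapping from $S \times \Sigma$ to $S$, $s_0 \in S$ is the initial state and $s_f \in S$ is a unique final state.
Definition 1 A Hierarchical Finite State Machine (HFSM) is a finite set $\mathcal{F}$ of FSMs, with an initial FSM $F_0 \in \mathcal{F}$ and alphabet $\Sigma \cup \{r_F \mid F \in \mathcal{F}\}$, where $\Sigma$ is a finite alphabet and each $r_F$ is a symbol not in $\Sigma$.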
The initial FSM $F_0$, which represents the main method, is the entry point of $\mathcal{F}$. The states $s_0$ and $s_f$ of $F_0$ are respectively the initial and final states of the HFSM $\mathcal{F}$. Each finite state machine $F \in \mathcal{F}$ corresponds to a function called by the program executed from the main method. A transition within $F$ corresponds to an instruction of the function. When the instruction is a method invocation, the label of the transition is a reference to another FSM, and so on. Thus the letter $r_F$ denotes a reference to FSM $F$, while basic instructions are elements of $\Sigma$.
Figure 5 shows the core part of the HFSM metamodel, the actual complete metamodel being much larger. A model is composed of a set of FSMs. Each FSM is composed of states and transitions, together with its own local variables and a class name (the class to which it belongs). A Transition has a source state, a destination state and a TransitionExpression. The TransitionExpression is an expression denoting either a simple statement (e.g. variable declaration, assignment, etc.) or an HfsmExpression (method invocation), which refers to an FSM, hence bringing in the hierarchy.
In the HFSM metamodel, both the parameters of a method declaration and the fields of a class are represented by the Variable metaclass.
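As an illustration of the HFSM definition above, an HFSM can be encoded with a few Java records, where each transition label is either a basic instruction of Σ or a reference r_F to another FSM. The names Hfsm, Instr, Ref and reachableFrom are hypothetical, and this is not the HFSM metamodel of Figure 5 (which also carries variables and class names). The method reachableFrom collects the FSMs reachable through references using a visited set, so it terminates even when methods are recursive.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified encoding of an HFSM (hypothetical; not the HFSM metamodel itself). */
final class Hfsm {
    /** A transition label: a basic instruction of Σ, or a reference r_F to another FSM. */
    interface Label {}
    record Instr(String text) implements Label {}
    record Ref(String fsmName) implements Label {}

    record Transition(String src, Label label, String dst) {}

    /** F = (S, δ, s0, sf); here S is implicit in the transitions. */
    record Fsm(String name, String initial, String fin, List<Transition> transitions) {}

    final Map<String, Fsm> fsms; // the finite set of FSMs, indexed by name
    final String main;           // name of the initial FSM F0

    Hfsm(Map<String, Fsm> fsms, String main) {
        this.fsms = fsms;
        this.main = main;
    }

    /** Names of the FSMs reachable from start through references; terminates even with recursion. */
    Set<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(start));
        while (!todo.isEmpty()) {
            String f = todo.pop();
            if (!seen.add(f)) {
                continue; // already visited
            }
            for (Transition t : fsms.get(f).transitions()) {
                if (t.label() instanceof Ref r) {
                    todo.push(r.fsmName());
                }
            }
        }
        return seen;
    }
}
```

Keeping references as names rather than direct object pointers mirrors the r_F letters of the definition and makes recursive FSMs easy to represent.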

From VeriJ to HFSM
As mentioned above, the compilation of VeriJ (Figure 1) consists of a model transformation from a system specified in VeriJ to its HFSM model. Using the VeriJ and HFSM metamodels, we wrote a set of ATL rules, VeriJ2Hfsm.atl, which create the states and transitions of each FSM in the HFSM model. To visualize the resulting HFSM model, stored in XML Metadata Interchange (XMI) format, we create the corresponding .dot file with the FSM2Dot project and then generate the hierarchical FSM diagrams shown in Figure 8 using Graphviz. The transformation associates with each method of the program an FSM with the same name, whose initial and final states are named respectively by the FSM's name followed by "S.ini" and "S.fin".
Each transition is obtained from a statement of the VeriJ model, and takes the name of the statement as its label "transExpr" of type TransitionExpression. In particular, since a block statement in VeriJ is a special subclass of Statement that contains statements, each block is transformed into a list of transitions in an FSM. Examples are the body of a method declaration and the bodies of control statements such as the If Statement (without else, from Transition S.ini.2 to Transition S.ini.3; with else, from Transition S.ini.3 to Transition S.fin), the While Statement (from main S.ini.4 to main S.fin) and the For Statement (from Board S.ini.1 to Board S.fin).
States inside an FSM are named as follows: given the source and destination states of the transition obtained from a Block statement, the states in the resulting list of transitions are named by appending an ordinal number, or a string indicating the structure of a control statement. For example, in Figure 8(a), main S.ini.1 denotes the source state of the first transition, and main S.ini.4.whilebody.1 represents the source state of the Block statement forming the while body.
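The visualization step can be sketched as follows: one FSM is rendered as a Graphviz digraph, with one labeled edge per transition. This is a hypothetical stand-in, not the code of the FSM2Dot project. State names are quoted because names such as main S.ini.1 contain spaces and dots.

```java
import java.util.List;

/** Hypothetical sketch of an FSM-to-DOT export (not the FSM2Dot project's code). */
final class FsmToDot {
    record Transition(String src, String label, String dst) {}

    /** Renders one FSM as a Graphviz digraph with one labeled edge per transition. */
    static String toDot(String fsmName, List<Transition> transitions) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph \"").append(esc(fsmName)).append("\" {\n");
        for (Transition t : transitions) {
            sb.append("  \"").append(esc(t.src())).append("\" -> \"")
              .append(esc(t.dst())).append("\" [label=\"")
              .append(esc(t.label())).append("\"];\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    /** Escapes backslashes and double quotes for DOT quoted strings. */
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The resulting text can be rendered with Graphviz, for example with dot -Tpng main.dot -o main.png.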
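The numbering can be sketched for a purely sequential block. This is one plausible reading of the scheme, inferred from state names quoted in the text (for instance, in the main FSM the While Statement runs from main S.ini.4 to main S.fin). The class BlockNaming and its method chain are hypothetical, and nested control statements (suffixes such as whilebody) are not handled. The i-th statement of the block runs from prefix.i to prefix.(i+1), and the last one ends in the destination state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of state naming for a sequential block of statements. */
final class BlockNaming {
    record Transition(String src, String stmt, String dst) {}

    /**
     * The i-th statement (1-based) goes from prefix.i to prefix.(i+1);
     * the last statement ends in dst instead.
     */
    static List<Transition> chain(String prefix, String dst, List<String> stmts) {
        List<Transition> out = new ArrayList<>();
        int k = stmts.size();
        for (int i = 1; i <= k; i++) {
            String from = prefix + "." + i;
            String to = (i == k) ? dst : prefix + "." + (i + 1);
            out.add(new Transition(from, stmts.get(i - 1), to));
        }
        return out;
    }
}
```

Deriving names from positions makes them stable across runs, which helps when comparing diagrams of successive versions of a model.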

Example of the Nim game
The result of the transformation is now illustrated on the Nim game example from Section 2.2. Part of the associated set of FSMs is depicted in Figure 8. We give details about the transformation on this example, referring to pieces of code from Figure 3. The two steps above thus set up a consistent chain of model transformations, which makes maintenance and refinement easy. ATL also provides traceability mechanisms for the transformations in the form of TraceAdder (Jouault 2005). Another Eclipse plug-in from the Atlas group, Atlas Model Weaver (AMW), also offers a means to generate an ATL execution trace in a weaving model. Once the HFSM model is generated from the VeriJ model, we can build the corresponding pushdown system.

FROM HFSM TO PUSHDOWN SYSTEMS
It is well known that FSMs are not expressive enough to describe program semantics: for example, method calls and returns need to be correctly matched, including in recursive schemes, and local variables of different procedure calls need to be distinguished. Pushdown systems (PDS), introduced in (Oettinger 1961; Chomsky 1962), are a natural choice for modeling method calls and interprocedural program behaviours, by adding a (possibly unbounded) stack to a finite set of control states. As mentioned in the introduction, pushdown systems are used in several recent verification tools (e.g. jMoped for Java, and MOPS or Magic for C) to describe programs when recursion is involved. In fact, pushdown systems give a formal semantics to programs.

Definitions
The basic definition of pushdown automata is the following.
Definition 2 A Pushdown Automaton (PDA) is a tuple $\mathcal{P} = (P, \Gamma, \Delta, c_0)$, where $P$ is the set of states or control locations, $\Gamma$ is the alphabet of stack symbols, $\Delta \subseteq (P \times \Gamma) \times (P \times \Gamma^*)$ is a finite set of transition rules, and $c_0 \in P \times \Gamma$ is the initial configuration. A transition rule $\delta \in \Delta$ is written $(p, z) \to (p', \alpha)$ with $p, p' \in P$, $z \in \Gamma$ and $\alpha \in \Gamma^*$.
The semantics of $\mathcal{P}$ is given as a transition system with $P \times \Gamma^*$ as the set of configurations. For a rule $\delta = (p, z) \to (p', \alpha)$ in $\Delta$ and a nonempty word $\gamma = z\beta \in \Gamma^*$ with $z \in \Gamma$ and $\beta \in \Gamma^*$, there is a transition $(p, \gamma) \xrightarrow{\delta} (p', \alpha\beta)$. In other words, $z$ is the topmost stack symbol, and each transition pops $z$ and pushes the word $\alpha$. Figure 9 shows an example of a pushdown automaton with its semantics. In Figure 9(a), $P = \{p, q\}$, $\Gamma = \{A, B\}$, $\Delta$ contains the three rules $(p, A) \to (p, AB)$, $(p, A) \to (q, \varepsilon)$ and $(q, B) \to (q, \varepsilon)$, and the initial configuration is $c_0 = (p, A)$. The PDA has an infinite set of configurations, depicted in Figure 9(b).
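The one-step semantics of this example can be executed directly. In the sketch below (class and record names are hypothetical), a configuration is a control state together with a stack written as a string whose top is the leftmost character; applying a rule pops the top symbol z and pushes α.

```java
import java.util.ArrayList;
import java.util.List;

/** The PDA of Figure 9(a); configurations are (state, stack) with the stack top first. */
final class Pda {
    record Rule(char p, char z, char pNext, String alpha) {}
    record Config(char state, String stack) {}

    static final List<Rule> RULES = List.of(
        new Rule('p', 'A', 'p', "AB"),  // (p, A) -> (p, AB)
        new Rule('p', 'A', 'q', ""),    // (p, A) -> (q, ε)
        new Rule('q', 'B', 'q', ""));   // (q, B) -> (q, ε)

    /** One-step successors: for each applicable rule, z·β becomes alpha·β. */
    static List<Config> successors(Config c) {
        List<Config> out = new ArrayList<>();
        if (c.stack().isEmpty()) {
            return out; // no rule applies to an empty stack
        }
        char z = c.stack().charAt(0);
        String beta = c.stack().substring(1);
        for (Rule r : RULES) {
            if (r.p() == c.state() && r.z() == z) {
                out.add(new Config(r.pNext(), r.alpha() + beta));
            }
        }
        return out;
    }
}
```

Starting from (p, A) and iterating successors yields the configurations (p, AB^n) and (q, B^n) for n ≥ 0, which is consistent with the infinite set of configurations mentioned above.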

Extended PDS
In order to represent object references, we slightly extend the definition of pushdown automata to include an explicit representation of the heap.
Let $X$ be a set of variables, including the variable this.
For each $x \in X$, let $D_x$ be the (finite) range of $x$. The set $D_x$ is either the set of values of a primitive type (restricted here to int and boolean) or the set of references $\mathit{Ref} = \{\$0, \$1, \$2, \ldots\}$ containing heap addresses. In particular, $D_{this} = \mathit{Ref}$. The default value of each range is denoted $\bot$: for instance 0 for int, false for boolean and \$0 for $\mathit{Ref}$.
A valuation is a partial mapping $v : X \to D$, where $D = \bigcup_{x \in X} D_x$, such that $v(x) \in D_x$ for all $x \in Dom(v) = \{x \in X \mid v(x) \text{ is defined}\}$. We denote by $V$ the set of all valuations, with $\emptyset$ for the valuation such that $Dom(v) = \emptyset$. For $y \in X$ and $d \in D_y$, define $v[y \mapsto d]$ by $v[y \mapsto d](y) = d$ and $v[y \mapsto d](x) = v(x)$ for $x \in Dom(v) \setminus \{y\}$.
Given an HFSM $\mathcal{F}$ with initial FSM $F_0$, we write $\Sigma' = \Sigma \cup \{r_F \mid F \in \mathcal{F}\}$ and $F = (S_F, \Sigma', \delta_F, s_{0,F}, s_{f,F})$ for each $F \in \mathcal{F}$. We set $S = \bigcup_{F \in \mathcal{F}} S_F$; the stack alphabet is $\Gamma = S \times V$ (recall that $V$ is the set of all valuations), and we define the set of configurations by $Q = (C \times V)^* \times V \times \Gamma^*$, where $C$ is the set of class names of the VeriJ model (with $\emptyset$ for the empty class name). A configuration $q = (h, g, \gamma) \in Q$ consists of:
• $h \in (C \times V)^*$, the heap state. An empty heap is described by $\varepsilon$ (also represented in the figure by $\$0 : \bot$), and the letters are of the form $(c, w) \in C \times V$, where $c$ is a class name and $w \in V$ is a valuation for the attributes of an object of this class. A nonempty heap is described as $h = (c_1, w_1) \ldots (c_n, w_n)$ for some $n \geq 1$, and adding an object to $h$ is written $h.(c, w)$ for some new $(c, w) \in C \times V$.
• $g \in V$, the global variable state, is the valuation of static variables and of the temporary variables used for return statements. In the HFSM metamodel, a variable in $g$ has the boolean tag global of Variable set to true.
• $\gamma \in \Gamma^*$, the stack state, where each element $(s, v) \in \Gamma$ is composed of an FSM location and a variable valuation.
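A direct Java encoding of these configurations can help when prototyping the semantics. In this hypothetical sketch, values are plain Objects (integers, booleans, or strings such as "$1" for references), a valuation is an immutable map whose key set plays the role of Dom(v), the heap address $i is the i-th heap cell, and the top of the stack is the first list element.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of configurations q = (h, g, γ) of the extended PDS. */
final class ExtPds {
    /** Valuation v : X -> D as an immutable map; Dom(v) is the key set. */
    record Valuation(Map<String, Object> map) {
        static final Valuation EMPTY = new Valuation(Map.of());

        /** v[y -> d]: same as v, except that y is mapped to d. */
        Valuation updated(String y, Object d) {
            Map<String, Object> m = new HashMap<>(map);
            m.put(y, d);
            return new Valuation(Map.copyOf(m));
        }
    }

    /** Heap cell (c, w): a class name and a valuation of the object's fields. */
    record Cell(String className, Valuation fields) {}

    /** Stack symbol (s, v): an FSM location and a valuation of local variables. */
    record Frame(String location, Valuation locals) {}

    /** q = (h, g, γ): heap address $i is cell i (1-based); the stack top is index 0. */
    record Config(List<Cell> heap, Valuation globals, List<Frame> stack) {}

    /** Initial configuration q0 = (ε, ∅, (main0, ∅)). */
    static Config initial(String main0) {
        return new Config(List.of(), Valuation.EMPTY, List.of(new Frame(main0, Valuation.EMPTY)));
    }
}
```

Immutable valuations make each configuration a value, so configurations can be compared and stored in sets during state-space exploration.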
We now explain how the HFSM is interpreted in terms of PDS actions. The initial configuration of the PDS is $q_0 = (\varepsilon, \emptyset, (main_0, \emptyset))$, where $main_0$ denotes the initial state of $F_0$. A transition from configuration $q = (h, g, \gamma)$ to configuration $q' = (h', g', \gamma')$ is written $q \xrightarrow{t} q'$, for some transition $t : s \xrightarrow{\delta} s'$ of an FSM of $\mathcal{F}$, with $\delta \in \Sigma \cup \{r_F \mid F \in \mathcal{F}\}$. The stack state evolves from $\gamma = z\beta$ to $\gamma' = \alpha\beta$, where $\alpha \in \Gamma^*$ is a word of length $|\alpha| \leq 2$. In this context, it is sufficient to consider rules of maximal length 2, which corresponds to a method invocation: after popping the topmost stack symbol, the return address is pushed and then, on top of it, the initial state of the called method.
We finally give several examples of rules from a configuration $q = (h, g, \gamma)$, assuming the size of the heap is $|h| = n$ and $\gamma = z\beta$, with $z = (s, v)$ the topmost stack symbol. In each case, the rule corresponds to a transition $t : s \xrightarrow{\delta} s'$.
(ii) Class instantiation of the form $\delta$: x = new Class; For a variable x declared as a reference variable, this operator allocates memory for the new object in the heap and returns a reference to that memory cell. The successor state is $q' = (h', g, \gamma')$ with $h' = h.(c, w)$, where $c$ is the class name Class and $w$ assigns default values to all fields. Moreover, $\gamma' = \alpha\beta$ with $\alpha = (s', v \cup [x \mapsto \$(n+1)])$. Immediately afterwards, rule (iii) (method invocation) is applied for the constructor call.
(iii) Method invocation of the form $\delta$: ob.m(arg_1, arg_2, ...); Let $M$ be the FSM associated with method m(), with $m_0$ its initial state, $m_f$ its final state and $x_1, x_2, \ldots$ its parameters. The successor state is $q' = (h, g, \gamma')$ with $\gamma' = \alpha\beta$ and $\alpha = (m_0, v_0)(s', v)$, where $v_0$ is the valuation defined by $v_0(x_i) = arg_i$ for $i = 1, 2, \ldots$ and $v_0(this) = \$j$ if the reference of object ob is $\$j$. Note that reaching the final state $m_f$ of $M$ (with a return statement) pops the topmost stack symbol, hence returning control to $s'$, the successor state within the FSM that emitted the call.
(iv) Assignment of the form $\delta$: x = expression; Let $d$ be the value resulting from the evaluation of the expression. When x is a field stored in the heap cell $(c, w)$, the successor is $q' = (h', g, \gamma')$, where $h'$ is obtained from $h$ by changing $(c, w)$ into $(c, w[x \mapsto d])$, $\gamma' = \alpha\beta$ and $\alpha = (s', v)$. When x is a static variable, the successor is $q' = (h, g', \gamma')$, where $g' = g[x \mapsto d]$, and again $\gamma' = \alpha\beta$ with $\alpha = (s', v)$.
Examples of these rules are given in the next section on the Nim game, together with the model transformation step.
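The stack discipline of rules (ii) and (iii), together with the return step, can be sketched on immutable configurations. The class PdsRules, its record types, the location names and the string encoding of references are hypothetical simplifications; expression evaluation and rule (iv) are left out. Note that invoke replaces the top frame by two symbols, (m0, v0) above (s', v), matching the bound |α| ≤ 2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of rules (ii), (iii) and return on simplified PDS configurations. */
final class PdsRules {
    record Cell(String className, Map<String, Object> fields) {}
    record Frame(String location, Map<String, Object> locals) {}
    /** q = (h, g, γ); the stack top is index 0, heap address $i is the i-th cell. */
    record Config(List<Cell> heap, Map<String, Object> globals, List<Frame> stack) {}

    /** Copy of m with k mapped to v (the update m[k -> v]). */
    private static Map<String, Object> put(Map<String, Object> m, String k, Object v) {
        Map<String, Object> copy = new HashMap<>(m);
        copy.put(k, v);
        return Map.copyOf(copy);
    }

    /** γ = zβ becomes αβ: drop the top frame z, then put alpha on top of β. */
    private static List<Frame> popPush(List<Frame> stack, List<Frame> alpha) {
        List<Frame> out = new ArrayList<>(alpha);
        out.addAll(stack.subList(1, stack.size()));
        return List.copyOf(out);
    }

    /** (ii) x = new C: append (C, defaults) to the heap, bind x to $(n+1), move to next. */
    static Config newObject(Config q, String x, String cls, Map<String, Object> defaults, String next) {
        List<Cell> heap = new ArrayList<>(q.heap());
        heap.add(new Cell(cls, Map.copyOf(defaults)));
        Frame top = q.stack().get(0);
        Frame moved = new Frame(next, put(top.locals(), x, "$" + heap.size()));
        return new Config(List.copyOf(heap), q.globals(), popPush(q.stack(), List.of(moved)));
    }

    /** (iii) ob.m(args): α = (m0, v0)(next, v), so |α| = 2. */
    static Config invoke(Config q, String obRef, String m0, List<String> params, List<Object> args, String next) {
        Map<String, Object> v0 = new HashMap<>();
        v0.put("this", obRef);
        for (int i = 0; i < params.size(); i++) {
            v0.put(params.get(i), args.get(i));
        }
        Frame top = q.stack().get(0);
        List<Frame> alpha = List.of(new Frame(m0, Map.copyOf(v0)), new Frame(next, top.locals()));
        return new Config(q.heap(), q.globals(), popPush(q.stack(), alpha));
    }

    /** Reaching the callee's final state: pop the top frame (α = ε). */
    static Config ret(Config q) {
        return new Config(q.heap(), q.globals(), popPush(q.stack(), List.of()));
    }
}
```

Applying newObject, invoke and ret in sequence to a configuration whose only frame is a main location follows the pattern of the object creation Nim nim = new Nim(): the object is stored at $1, the constructor frame binds this to $1, and the return pops back to the caller's successor location.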

PDS model of the Nim game
Figure 10 shows part of the PDS of the Nim game. The main method of the Nim game contains the object creation δ: Nim nim = new Nim();. In the PDS, it is decomposed into three steps: (i) the reference variable declaration Nim nim, (ii) the instantiation nim = new Nim and (iii) the initialization by invoking the constructor nim.Nim(). This pushes the initial state of Nim onto the stack ((Nim S0, [this → $1]) in the 4th configuration).
In the constructor declaration Nim(), the class instance creation statement this.board = new Board(); is decomposed into two steps: this.board = new Board and board.Board(). This pushes the initial state of Board onto the stack ((Board S0, [this → $2]) in the 6th configuration).
For lack of space, we skip the configurations related to further statements up to the return of the call to the constructor of Board. In the 7th configuration, the final state Nim S1 is reached. Then, upon return from the call to the constructor of Nim, the topmost stack symbol is popped, which leads to the 8th configuration.

CONCLUSION AND PERSPECTIVES
This paper presents VeriJ, a language for modeling complex systems with controller synthesis as the goal. Based on a limited subset of Java, VeriJ also contains specific elements for addressing controllability. This approach combines the ease of specification in Java, including the facilities of an integrated development environment, with the use of existing verification tools acting on standard transition systems.
Compilation of VeriJ specifications into transition systems, as well as a preprocessing step, is performed by a complete chain of model-to-model transformations, using state-of-the-art model-driven engineering frameworks such as MoDisco and ATL. These transformations (Java2VeriJ.atl and VeriJ2Hfsm.atl) produce, from a program, a set of hierarchically structured finite automata, interpreted with pushdown system semantics, hence allowing recursion.
Unlike software model-checking, which targets a broad range of programs, we focus on model generation, with controller synthesis as the primary goal. Our approach is well suited to software engineers or domain experts wishing to use existing tools for controller synthesis.
Future work includes checking the correctness of the two sets of ATL rules, as recently done in (Planas et al. 2011; Ehrig and Ermel 2008), and linking this model generation with controller synthesis, which has shown promising results in terms of scalability (Zhang et al. 2010).
A further goal is to apply this technique to industrial-size complex systems, thus completing the centralized control of an automated highway system initiated in (Bérard et al. 2008). We expect this approach to become part of an industrial toolkit for the modeling, verification and control synthesis of complex system specifications.

Figure 1: From source code to formal model.

Figure 4 shows the class diagram of the Nim game obtained from the VeriJ source code. It contains four classes: TestNim, Nim, Board and Constants. Classes Nim and Board constitute the core part of the model. This diagram was produced from the VeriJ code using a standard UML tool.

Figure 6 (excerpt):
```
 5
 6  helper context MMJava!MethodInvocation def:
 7      isVeriJMethod() : Boolean = self.method.proxy;
 8
 9  rule VeriJRandom {
10      from s : MMJava!MethodInvocation (
11          s.isVeriJMethod() and s.method.name.
```

Figure 8: HFSMs of the Nim game obtained by applying VeriJ2Hfsm.atl.

Figure 8(a) presents the main FSM of the class TestNim. Figures 8(c) and 8(b) come from the class Nim and present its constructor and the function Transition, respectively. Figure 8(d) gives the constructor of class Board. We now describe the hierarchical structure of these FSMs. In Figure 8(a), the invocations of the constructor Nim() from S.ini.1 to S.ini.2, of nim.gameover() from S.ini.3 to S.ini.4, and of nim.Transition() from S.ini.4.whilebody to S.ini.4.whilebody.1 make up the first level of the hierarchy. For instance, the invocation of Transition references the FSM in Figure 8(b), while the constructor invocation references the FSM in Figure 8(c). In addition, the invocation of Board(), which references the FSM in Figure 8(d), makes up the second level of the hierarchy.
(i) Non-static variable declaration of the form $\delta$: type x = initializer; This statement adds to $v$ a valuation for x, assigning it the result $d$ of the evaluation of initializer, which changes the stack state. The successor state is $q' = (h, g, \gamma')$ with $\gamma' = \alpha\beta$ and $\alpha = (s', v \cup [x \mapsto d])$.

Figure 10: Part of the PDS of the Nim game.
NBRow and MaxNBtaken, which are transformed into literal values during the transformation from VeriJ to HFSM. It does not have any other static variables, hence the global variable state is reduced to ∅ in this case. Whenever an instantiation happens with the new operator, an object is added to the heap state, and its fields hold the default values of their respective types.