Migrating Data-Oriented Applications to a Relational Database Management System

The need to update and reengineer legacy information systems, in order to improve business processes and upgrade technology, is becoming increasingly pressing for many organizations in the current economic climate. This often involves migrating systems from centralized to distributed client-server architectures. These information systems are often written in COBOL, store their data in files, and use text-based user interfaces. In this paper, a method to migrate such an information system is presented. The target environment in our case is a relational database management system. In our method, the migration is carried out incrementally, in small steps. The migration process is driven by the data files. These files are first arranged and then migrated in a defined order. An algorithm that defines the order of migration of the data files is described, and an example is presented to illustrate our method.


Introduction
Reengineering has become a significant activity in many software organizations. It has come about mainly from the need to:
- take maximum advantage of new technology
- improve business processes and meet new requirements
- improve maintainability and reduce maintenance cost

That is, the second strategy is much safer and more feasible [5] than the first one, especially when the existing system is too big and complex. That is why we adopt it in our method. Therefore, the revolutionary strategy may be of use in the case of small applications.

We initially intended to build a database gateway between the legacy system and the target system so that they could communicate and exchange data (Figure 1), i.e., legacy modules access the target database and target modules access the legacy COBOL data files too. Built this way, however, the gateway is labor-intensive and time-consuming to develop. So, we have chosen to reduce the gateway functionality by defining a one-way database gateway (Figure 2) in the following direction: legacy system → target system. This means that only legacy modules can get data from the target database. The target modules are not authorized to access legacy COBOL data files.

The migration method
Based on the evolutionary strategy, our method achieves an incremental migration. Moreover, the migration process is driven by the data files of the subject system. This means that the migration of the modules depends on the migration of the data files.

Figure 2: The one-way database gateway

Principle of the method
The migration is achieved according to this principle: when a given data file is migrated to the target database, it defines a set of modules that will be migrated (rewritten) before the next candidate data file is transferred. These modules are those that now access only the target database, since all of the legacy data files they used have already been migrated.
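The principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `plan_migration`, the file names F1 to F3, the module names M1 to M3, and their usage sets are all hypothetical. The sketch assumes the files are already arranged in migration order.

```python
def plan_migration(ordered_files, module_files):
    """Interleave file and module migrations.

    ordered_files: legacy data files, already arranged in migration order.
    module_files:  maps each legacy module to the set of legacy files it uses.
    Returns a list of ("file", name) and ("module", name) steps.
    """
    migrated_files = set()
    pending = {m: set(files) for m, files in module_files.items()}
    steps = []
    for data_file in ordered_files:
        migrated_files.add(data_file)
        steps.append(("file", data_file))
        # A module qualifies once every file it uses has been migrated,
        # i.e. it would now access only the target database.
        ready = sorted(m for m, used in pending.items() if used <= migrated_files)
        for module in ready:
            steps.append(("module", module))
            del pending[module]
    return steps


# Hypothetical system: names and usage sets are made up for illustration.
plan = plan_migration(
    ["F1", "F2", "F3"],
    {"M1": {"F1"}, "M2": {"F1", "F2"}, "M3": {"F2", "F3"}},
)
for kind, name in plan:
    print(kind, name)
```

With these inputs, each file migration is immediately followed by the one module it unblocks. Any module still in `pending` at the end would use a file missing from the ordered list.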
The example in Figure 3 illustrates this principle. It presents three intermediate states of a migration process in which a data file (T1) and the module (M1) that uses it have already been migrated to the target system (Step (a)). Assume now that the candidate file to migrate is F2. It will be replaced by the table T2, as shown in Step (b). Let us now see which module satisfies the condition of migration. There is one module (M2) that accesses only the target tables, so it is the next component to migrate (Step (c)). We then go on with the remaining data files in the same way.

Figure 3: Step (a), Step (b), Step (c)

N.B.: The list of legacy data files is ordered beforehand, and when a legacy data file or module is migrated, it is removed from the set of legacy data files or modules, respectively.

Now, if we consider the set of data files to migrate, a question must be asked: in which order should they be migrated? But first, let us see why arranging these files matters.

Why arrange the data files
As we saw, there is a need to arrange the legacy data files and migrate them in a defined order. Indeed, we have noticed during our experiments that the order in which the data files are migrated can simplify or complicate the migration. Hence, we have developed an algorithm that arranges the legacy data files before migrating them. The order of migration of the data files is defined so that:
- the legacy code is altered as little as possible, to minimize risk
- the database gateway that ensures data transfer between the legacy system and the target system is simplified.

Let us see an example to illustrate this problem. Here are two legacy data files that will be migrated to a relational database:
- Customer (Numcus, Namecus, Fstnamecus, Towncus)
- Order (Numord, Numcus, Namecus, dateord)

and a legacy module M that uses the file Order, and in particular its property Namecus.
The normalized target relational tables will be:
- Customer (Numcus, Namecus, Fstnamecus, Towncus)
- Order (Numord, Numcus, dateord)

Notice that the property Namecus is removed from the target table Order. Now, assume we migrate the file Order before migrating the file Customer. Two situations can arise:
- the legacy module M has not been migrated yet (Figure 4)
- the legacy module M has been rewritten in the target system (Figure 5)

In the first case, the legacy module M cannot get the information Namecus from the target database. Hence, we are compelled to alter the code of this module so that it can get this information from the legacy file Customer. But, as we have said above, we try to avoid altering legacy code as far as possible in order to minimize risk. Indeed, when several files are involved, altering legacy code can lead to serious problems and make the migration difficult. In the second case, we have to build a database gateway in the direction target IS → legacy IS to get the information Namecus from the legacy file Customer. This complicates the development of the database gateway and hence the migration.
That is why the order of migration of the files is very important and directly affects the complexity of the migration process [16]. To determine this order, we use an algorithm that takes as inputs both the legacy data files and the target relational tables.
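To make the favourable order concrete, the following sketch assumes that Customer has been migrated before Order, so that both target tables exist. It uses Python's standard `sqlite3` module as a stand-in for the target RDBMS. The function names `build_target_db` and `read_legacy_order` and the sample rows are hypothetical, and the column types are assumptions. It shows how a one-way gateway could rebuild the legacy Order record, including Namecus, for a module M that has not been migrated yet.

```python
import sqlite3


def build_target_db():
    """Create the normalized target tables with made-up sample rows."""
    con = sqlite3.connect(":memory:")
    # "Order" is an SQL keyword, so the table name is quoted.
    con.executescript("""
        CREATE TABLE Customer (
            Numcus INTEGER PRIMARY KEY, Namecus TEXT,
            Fstnamecus TEXT, Towncus TEXT);
        CREATE TABLE "Order" (
            Numord INTEGER PRIMARY KEY,
            Numcus INTEGER REFERENCES Customer(Numcus),
            dateord TEXT);
        INSERT INTO Customer VALUES (1, 'Dupont', 'Anne', 'Lille');
        INSERT INTO "Order" VALUES (100, 1, '1996-09-10');
    """)
    return con


def read_legacy_order(con, numord):
    """Rebuild the legacy record (Numord, Numcus, Namecus, dateord).

    Namecus no longer lives in the Order table, so it is fetched
    from Customer through a join on Numcus.
    """
    return con.execute(
        'SELECT o.Numord, o.Numcus, c.Namecus, o.dateord '
        'FROM "Order" AS o JOIN Customer AS c ON c.Numcus = o.Numcus '
        'WHERE o.Numord = ?',
        (numord,),
    ).fetchone()


con = build_target_db()
record = read_legacy_order(con, 100)
print(record)
```

If Order were migrated first, the Customer table would not yet exist in the target database and this join could not be written, which is the difficulty described above.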

The migration steps
Step 1: Preliminary step.
In this step, we list all of the modules of the legacy system and the data files they use. Then, we map the legacy data files to relational tables. This task is important for the next steps.
Step 2: Arrange the data files.
The order of migration of the files is very important and directly affects the complexity of the migration process. To determine this order, we use an algorithm that takes as inputs both the legacy data files and the target relational tables. It performs three operations:
- Construction of the matrices of the legacy files and the corresponding target tables.
- Detection of the interdependence of the legacy files.
- Construction and then interpretation of the matrix of precedence.

Section 4 presents this algorithm in detail.
Step 3: Plan the migration.
Once the data files are arranged, we use the algorithm DDM (cf. paragraph 3.1) to get a global view of the migration process, as in Figure 3, and to estimate the complexity of the database gateway to be developed.
Step 4: Prepare the migration.
Migrating legacy data files to the target database involves downloading, converting and uploading large amounts of data. A program that can carry out this operation has to be developed. Moreover, the database gateway should be designed to ensure the mediation between the legacy modules and the target database. As for the modules, before migrating a legacy module, we have to design the target module and its interface in the target graphical environment.
Step 5: Carry out the migration.
At this step, we implement what was specified in the previous step, i.e.:
- implement the target relational schema and migrate the data
- build the database gateway
- rewrite the target modules and interfaces

Step 6: Run the changes.

At this step, we run the changes, i.e., the migrated components, and we go back to Step 4 to prepare the migration of the next component.
Step 7: Switch from the legacy IS to the target IS.
When all of the legacy modules and files are migrated, we switch from the legacy IS to the target IS.

Algorithm of precedence
This algorithm arranges the legacy data files and defines an order of precedence for them.

Construction of the matrices F (for files) and T (for tables)
These matrices represent the legacy files and the target tables with their corresponding properties. When a property belongs to a given file, we put 1 in the corresponding cell; otherwise we put 0 (Table 1).
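As an illustration, the matrices F and T for the Customer/Order example of Section 3.2 can be built as follows. This is a minimal sketch: the helper name `build_matrix` and the column ordering of the properties are assumptions made for the example.

```python
def build_matrix(schemas, properties):
    """One row per file (or table), one column per property; 1 = present."""
    return [[1 if p in fields else 0 for p in properties]
            for fields in schemas.values()]


# Column order of the properties is an arbitrary choice for this sketch.
properties = ["Numcus", "Namecus", "Fstnamecus", "Towncus", "Numord", "dateord"]

legacy_files = {
    "Customer": ["Numcus", "Namecus", "Fstnamecus", "Towncus"],
    "Order": ["Numord", "Numcus", "Namecus", "dateord"],
}
target_tables = {
    "Customer": ["Numcus", "Namecus", "Fstnamecus", "Towncus"],
    "Order": ["Numord", "Numcus", "dateord"],
}

F = build_matrix(legacy_files, properties)   # legacy files
T = build_matrix(target_tables, properties)  # target tables

for name, f_row, t_row in zip(legacy_files, F, T):
    print(name, "F:", f_row, "T:", t_row)
```

The only cell that differs between F and T is Namecus in the Order row, which is the property removed by normalization.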

Detection of the interdependence of the legacy files
This operation locates the files that lose some of their properties through normalization. To do this, we use the following formula: R = 2*F - T. The resulting matrix R can be interpreted according to the following table, which shows the possible cases that can appear in the target tables:

F   T   R    Interpretation
0   1   -1   Creation of a property
1   0   2    Suppression of a property
1   1   1    Unchanged property

Table 2: Interpretation of the resulting matrix

Given two legacy files F1 and F2 with a common property P and their corresponding target tables T1 and T2, we say that F1 is dependent on F2, i.e., F1 should migrate after F2, when P is kept in T2 and removed from T1.
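The formula and the dependence rule can be sketched as follows, again on the Customer/Order example. The function names `resulting_matrix` and `find_dependencies` are hypothetical, and the matrices are written out by hand with the column order Numcus, Namecus, Fstnamecus, Towncus, Numord, dateord, which is an assumption of this sketch.

```python
def resulting_matrix(F, T):
    """Element-wise R = 2*F - T."""
    return [[2 * f - t for f, t in zip(f_row, t_row)]
            for f_row, t_row in zip(F, T)]


def find_dependencies(names, properties, R):
    """Return (dependent, prerequisite, property) triples.

    File i depends on file j when some property is removed from i
    (R = 2) and kept unchanged in j (R = 1).
    """
    deps = []
    for i, dependent in enumerate(names):
        for j, prerequisite in enumerate(names):
            if i == j:
                continue
            for k, prop in enumerate(properties):
                if R[i][k] == 2 and R[j][k] == 1:
                    deps.append((dependent, prerequisite, prop))
    return deps


names = ["Customer", "Order"]
properties = ["Numcus", "Namecus", "Fstnamecus", "Towncus", "Numord", "dateord"]
F = [[1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1]]  # legacy files
T = [[1, 1, 1, 1, 0, 0], [1, 0, 0, 0, 1, 1]]  # target tables

R = resulting_matrix(F, T)
dependencies = find_dependencies(names, properties, R)
print(R)
print(dependencies)
```

Here the single value 2 in R marks Namecus as removed from Order, and the matching 1 in the Customer row makes Order dependent on Customer.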
Based on the interpretation given above, we can say that in Table 3, Fm is dependent on F2 because P1 will be kept in T2 and removed from Tm.

In the precedence matrix, a file Fi of a given row precedes a file Fj of a given column when the value of their corresponding cell is 1. When a given row is null, the corresponding file does not precede any file. When a given column is null, the corresponding file is not preceded by any file. When both the row and the column of a given file are null, the file neither precedes nor is preceded by any file.
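These reading rules can be expressed in a short sketch. The function names `precedence_matrix` and `describe`, the file names F1 to F3, and the single dependency used as input are hypothetical. The sketch assumes dependencies are given as (dependent, prerequisite) pairs.

```python
def precedence_matrix(names, dependencies):
    """P[i][j] = 1 when file i precedes file j (j depends on i)."""
    index = {name: i for i, name in enumerate(names)}
    P = [[0] * len(names) for _ in names]
    for dependent, prerequisite in dependencies:
        P[index[prerequisite]][index[dependent]] = 1
    return P


def describe(names, P):
    """Apply the row/column reading rules to each file."""
    notes = {}
    for i, name in enumerate(names):
        row_null = not any(P[i])                  # precedes no file
        col_null = not any(row[i] for row in P)   # preceded by no file
        if row_null and col_null:
            notes[name] = "independent"
        elif row_null:
            notes[name] = "precedes no file"
        elif col_null:
            notes[name] = "preceded by no file"
        else:
            notes[name] = "precedes and is preceded"
    return notes


# Hypothetical input: F3 depends on F2, F1 is unrelated.
names = ["F1", "F2", "F3"]
P = precedence_matrix(names, [("F3", "F2")])
notes = describe(names, P)
print(P)
print(notes)
```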

An example
The following example will illustrate the described algorithm.

Order of migration of the files
Once the resulting matrix is interpreted, we can represent the order of migration of the files by a directed graph, as in Figure 6. The files that are not linked in the graph are independent, so they can be migrated at any time.
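One way to turn such a graph into a concrete migration order is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). This is a substituted technique for illustration, not something the method prescribes. The function name `migration_order` and the file Product are hypothetical.

```python
from graphlib import TopologicalSorter


def migration_order(names, P):
    """Order files so that every file comes after the files preceding it.

    P[i][j] == 1 means file i precedes file j.
    """
    sorter = TopologicalSorter()
    for j, name in enumerate(names):
        predecessors = [names[i] for i in range(len(names)) if P[i][j] == 1]
        sorter.add(name, *predecessors)
    return list(sorter.static_order())


# Customer precedes Order; Product is a hypothetical independent file.
names = ["Customer", "Order", "Product"]
P = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
]
order = migration_order(names, P)
print(order)
```

Independent files may appear anywhere in the result, which matches the remark that they can be migrated at any time. If the precedence matrix contained a cycle, `static_order` would raise `graphlib.CycleError`; how such a case should be handled is not covered here.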

Conclusion
This paper dealt with a topical problem in industry: migrating a legacy IS. We have proposed an incremental method for migrating a centralized IS. In our method, we tried to capitalize on the legacy IS components (e.g., data and design) and reuse them in the target IS so as to save money and time. The process of migration is driven by data files that are arranged beforehand. The remaining components of the IS are migrated according to the defined order of the files. Finally, it should be stressed that our method eases the migration by defining a one-way gateway, which reduces the size of the gateway.

Figure 1: The database gateway

Table 3: An example of a resulting matrix

Construction and interpretation of the precedence matrix
The precedence matrix is presented in Table 4. It is obtained from the resulting matrix and gives the interdependence of the legacy files in a more readable way.

Table 4: The precedence matrix

It can be interpreted as follows:

Table 9: Precedence matrix

Advances in Databases and Information Systems, 1996