Modernising an application involves an essential stage of analysing and understanding the existing code.
How can we extract the business logic, and how can we represent it so that it can be used by code generation tools?
We take a look at reverse engineering tools and approaches.
Translating legacy code into a modern language and a modern architecture requires a stage in which the business logic is extracted. The aim of modernisation is to re-implement business functions successfully and consistently in a new technical environment. If the code can also be optimised and simplified during this stage, so that it becomes less technical, so much the better.
From code to model
The first problem with logic extraction is how to represent the extracted logic. With the advent of object-oriented languages, the market adopted UML, a modelling language standard promoted by the OMG. An ecosystem has grown up around UML in which logic is extracted as models, an approach variously called reverse modelling, model discovery or model-driven reverse engineering.
The modernisation process generally involves three stages:
- Reverse engineering: this stage aims to understand the legacy application, discover its logic and architecture, and represent the system at a higher level of abstraction using models (class diagrams, finite-state machines, workflows…)
- Transformation or refactoring: the models are analysed and transformed into a specification of the modernised system.
- Generation: the transformed models are used to generate code for the target environment, based on architecture patterns defined for the target.
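The three stages can be illustrated with a deliberately tiny sketch. It assumes a legacy source made only of COBOL-style `COMPUTE` statements, and its helpers (`reverse_engineer`, `transform`, `generate`) are hypothetical names invented for illustration: they lift the statements into a model, adapt the model to modern naming conventions and generate a Python function from it. Real tools work with full grammars and far richer models; this only shows how each stage hands its output to the next.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """One business rule in the abstract model: target = expression."""
    target: str
    expression: str


# Hypothetical input: only statements of the form "COMPUTE NAME = EXPR." are recognised.
COMPUTE = re.compile(r"^\s*COMPUTE\s+([A-Z][A-Z0-9-]*)\s*=\s*(.+?)\.\s*$")
# COBOL requires spaces around the minus operator, so a hyphen inside a word belongs to the name.
IDENTIFIER = re.compile(r"[A-Z][A-Z0-9-]*")


def reverse_engineer(legacy_source: str) -> list[Assignment]:
    """Stage 1 (hypothetical helper): lift recognised legacy statements into the model."""
    model = []
    for line in legacy_source.splitlines():
        match = COMPUTE.match(line)
        if match:
            model.append(Assignment(match.group(1), match.group(2)))
    return model


def modern_name(legacy_name: str) -> str:
    """GROSS-AMOUNT -> gross_amount"""
    return legacy_name.lower().replace("-", "_")


def transform(model: list[Assignment]) -> list[Assignment]:
    """Stage 2 (hypothetical helper): rewrite the model to follow target naming conventions."""
    return [
        Assignment(
            modern_name(rule.target),
            IDENTIFIER.sub(lambda m: modern_name(m.group(0)), rule.expression),
        )
        for rule in model
    ]


def generate(model: list[Assignment], name: str, inputs: list[str]) -> str:
    """Stage 3 (hypothetical helper): emit Python source; the last assignment is the result."""
    lines = [f"def {name}({', '.join(inputs)}):"]
    lines += [f"    {rule.target} = {rule.expression}" for rule in model]
    lines.append(f"    return {model[-1].target}")
    return "\n".join(lines)


LEGACY = """
       COMPUTE GROSS-AMOUNT = UNIT-PRICE * QUANTITY.
       COMPUTE NET-AMOUNT = GROSS-AMOUNT - DISCOUNT.
"""

print(generate(transform(reverse_engineer(LEGACY)), "net_amount",
               ["unit_price", "quantity", "discount"]))
```

The printed function computes `gross_amount = unit_price * quantity` and then `net_amount = gross_amount - discount`. Keeping the model separate from both the legacy syntax and the generated code is what allows the transformation stage to work on neither.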
Several vendors and projects offer tools that reverse engineer code into UML, such as Altova, Blu Age, dScribe, green, Imagix, Modelio, phpModeler…
MoDisco (Model Discovery) is an Eclipse sub-project which is part of the EMF (Eclipse Modeling Framework) project. It is a model-driven reverse engineering framework, led by Hugo Bruneliere at Inria, used to extract both models and business processes (BPMN) from legacy code. Various vendors and companies are working on and with this extraction engine; in particular, it is at the core of the ARTIST open source project.
The ARTIST project is an ambitious European collaboration involving both industrial partners (ATC, Atos, Engineering, SparxSystems, Spikes) and research centres (Fraunhofer, ICCS, Inria, Tecnalia, TU Wien). Its aim is to make it easier to modernise non-cloud applications into cloud-based and service-oriented architectures. Launched in October 2012, ARTIST has already released a methodological framework and generic tools.
Image – The ARTIST methodology [source]
From code to diagrams
Beyond UML models, some tools generate other kinds of diagrams from code.
Aivosto offers Visustin, an application that generates flow charts and UML activity diagrams from source code. Visustin supports a wide range of programming languages.
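Visustin is a commercial product and its internals are not described here, but the general idea of turning code into a flow chart can be sketched for Python source with the standard `ast` module. The hypothetical `flowchart_dot` function below walks the first function in a module, turns conditions into diamonds and other statements into boxes, and emits the chart in Graphviz DOT format. Loops and other compound statements are collapsed into a single box to keep the sketch short.

```python
import ast
import itertools


def flowchart_dot(source: str) -> str:
    """Sketch (hypothetical helper): a Graphviz DOT flow chart for the first function in `source`."""
    func = next(n for n in ast.parse(source).body if isinstance(n, ast.FunctionDef))
    ids = itertools.count()
    nodes, edges = [], []

    def node(label: str, shape: str = "box") -> str:
        name = f"n{next(ids)}"
        label = label.replace('"', '\\"').replace("\n", "\\n")  # escape for DOT
        nodes.append(f'  {name} [label="{label}", shape={shape}];')
        return name

    def link(entries, target: str) -> None:
        for src, label in entries:
            attr = f' [label="{label}"]' if label else ""
            edges.append(f"  {src} -> {target}{attr};")

    start = node(f"start {func.name}", "ellipse")
    end = node("end", "ellipse")

    def build(stmts, entries):
        """Chain `stmts` after the (node, edge label) pairs in `entries`; return open exits."""
        for stmt in stmts:
            if isinstance(stmt, ast.If):
                cond = node(ast.unparse(stmt.test), "diamond")
                link(entries, cond)
                entries = build(stmt.body, [(cond, "yes")]) + build(stmt.orelse, [(cond, "no")])
            elif isinstance(stmt, ast.Return):
                ret = node(ast.unparse(stmt))
                link(entries, ret)
                link([(ret, "")], end)
                entries = []  # nothing falls through a return
            else:
                # Loops and other compound statements are collapsed into one box here.
                box = node(ast.unparse(stmt))
                link(entries, box)
                entries = [(box, "")]
        return entries

    link(build(func.body, [(start, "")]), end)
    return "digraph flow {\n" + "\n".join(nodes + edges) + "\n}"


SOURCE = '''
def shipping_fee(weight, express):
    fee = 5
    if weight > 10:
        fee = fee + 2 * (weight - 10)
    if express:
        return fee * 2
    return fee
'''

print(flowchart_dot(SOURCE))
```

Saving the output to a file and running `dot -Tpng flow.dot -o flow.png` renders the chart with Graphviz.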
X-Analysis targets RPG and COBOL on the IBM i platform (AS/400). It extracts business rules and the relational data model, and generates Java as output.
Round Trip Engineering Objects (RTEO) is a project that represents object-oriented code (Java, C#, PHP5) in an XML language, XOL (XML Object Language). This representation is intended to support refactoring, forward engineering and architecture documentation, as well as the extraction of a PIM (platform-independent model).
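To give a feel for what an XML representation of object-oriented code makes possible, here is a small sketch for Python classes. The element names (`model`, `class`, `extends`, `method`, `parameter`, `field`) form a hypothetical vocabulary invented for this example; it is not the XOL schema. Once the code is in XML, ordinary XML tooling can query it, for example to find every subclass affected by a planned refactoring.

```python
import ast
import xml.etree.ElementTree as ET


def classes_to_xml(source: str) -> str:
    """Represent a module's classes in a hypothetical XML vocabulary (not XOL)."""
    root = ET.Element("model")
    for cls in (n for n in ast.parse(source).body if isinstance(n, ast.ClassDef)):
        cls_el = ET.SubElement(root, "class", name=cls.name)
        for base in cls.bases:
            ET.SubElement(cls_el, "extends", name=ast.unparse(base))
        for item in cls.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            method = ET.SubElement(cls_el, "method", name=item.name)
            for arg in item.args.args:
                if arg.arg != "self":
                    ET.SubElement(method, "parameter", name=arg.arg)
            if item.name == "__init__":
                # Fields: only top-level "self.x = ..." assignments in the constructor.
                for stmt in item.body:
                    if isinstance(stmt, ast.Assign):
                        for target in stmt.targets:
                            if (isinstance(target, ast.Attribute)
                                    and isinstance(target.value, ast.Name)
                                    and target.value.id == "self"):
                                ET.SubElement(cls_el, "field", name=target.attr)
    return ET.tostring(root, encoding="unicode")


SOURCE = '''
class Account:
    def __init__(self, owner):
        self.owner = owner
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount


class SavingsAccount(Account):
    def add_interest(self, rate):
        self.deposit(self.balance * rate)
'''

model = ET.fromstring(classes_to_xml(SOURCE))
# Query the representation: which classes extend Account?
subclasses = [c.get("name") for c in model.findall("class")
              if any(e.get("name") == "Account" for e in c.findall("extends"))]
print(subclasses)
```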
From code to architecture
One of the main challenges of reverse engineering is architecture discovery. The modernisation process is generally represented by a horseshoe model:
During the reverse engineering stage, the process follows a vertical path upwards, from the source code to the functions and then to the architecture. The transformation stage follows a horizontal path from the source platform to the target platform. Finally, the forward engineering stage goes back down, from the target architecture model to the functional and technical source code of the target.
This model is the basis of the OMG’s ADM (Architecture-Driven Modernisation) methodology, which shows how the impact of a transformation depends on the level at which it is applied. At the technical architecture level, the transformation is quicker but its impact is weaker, whereas at the highest level, the business architecture, the transformation is slower but its impact is stronger.
Image – Architecture-driven modernisation horseshoe model [Source]
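The upward, code-to-architecture leg of the horseshoe can be sketched with one common and simple technique: recovering module dependencies from import statements, then lifting them to the level of architectural layers. The sketch assumes Python modules, a hypothetical three-layer target (`ui`, `service`, `data`, where a module's layer is its package prefix) and a rule that dependencies may only point downwards. None of these assumptions comes from ADM itself.

```python
import ast
from collections import defaultdict

# Hypothetical target architecture, top to bottom; every module must belong to one of these.
LAYERS = ["ui", "service", "data"]


def module_dependencies(modules: dict[str, str]) -> dict[str, set[str]]:
    """Code level: map each module to the other analysed modules it imports."""
    deps = defaultdict(set)
    for name, source in modules.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]  # relative imports without a module name are skipped
            else:
                continue
            deps[name].update(t for t in targets if t in modules and t != name)
    return dict(deps)


def layer(module: str) -> str:
    return module.split(".")[0]


def layer_graph(deps: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Architecture level: collapse module dependencies into layer dependencies."""
    return sorted({(layer(src), layer(dst))
                   for src, targets in deps.items() for dst in targets
                   if layer(src) != layer(dst)})


def upward_dependencies(deps: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Flag dependencies that point to a higher layer than their source."""
    return sorted((src, dst) for src, targets in deps.items() for dst in targets
                  if LAYERS.index(layer(dst)) < LAYERS.index(layer(src)))


MODULES = {  # hypothetical legacy modules
    "ui.orders": "import service.billing\n",
    "service.billing": "from data.customers import find_customer\n",
    "data.customers": "import ui.orders  # a data module calling back into the UI\n",
}

deps = module_dependencies(MODULES)
print(layer_graph(deps))
print(upward_dependencies(deps))
```

The layer graph is the more abstract view; the list of upward dependencies points at the places where the recovered architecture departs from the intended one.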
From code to algorithm
Classic reverse engineering tools analyse the application’s source code, database queries and batch commands. These analyses can discover the application’s logic and architecture with varying degrees of automation, and represent them as UML models.
The advantage of UML is that it is independent of the tools used: models can be extracted with one tool and code generated with another vendor’s tool. Its main disadvantage, however, is that it is limited to the object-oriented paradigm.
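One classic artefact produced by analysing database queries is a CRUD matrix, which records which programs create, read, update or delete which tables. The sketch below assumes the SQL statements have already been pulled out of the legacy programs (for example from embedded SQL blocks); the program names, table names and statements are invented, and the regular expressions are deliberately naive where a real tool would use a SQL parser.

```python
import re
from collections import defaultdict

# Deliberately naive patterns; a production tool would use a real SQL parser.
WRITES = [
    ("C", re.compile(r"^\s*INSERT\s+INTO\s+(\w+)", re.IGNORECASE)),
    ("U", re.compile(r"^\s*UPDATE\s+(\w+)", re.IGNORECASE)),
    ("D", re.compile(r"^\s*DELETE\s+FROM\s+(\w+)", re.IGNORECASE)),
]
READS = re.compile(r"\b(?:FROM|JOIN)\s+(\w+)", re.IGNORECASE)


def crud_matrix(programs: dict[str, list[str]]) -> dict[tuple[str, str], set[str]]:
    """Map (program, table) to the set of C/R/U/D operations the program performs on it."""
    matrix = defaultdict(set)
    for program, statements in programs.items():
        for sql in statements:
            rest = sql
            for letter, pattern in WRITES:
                match = pattern.match(sql)
                if match:
                    matrix[(program, match.group(1).lower())].add(letter)
                    rest = sql[match.end():]  # only scan what follows the written table
                    break
            for table in READS.findall(rest):
                matrix[(program, table.lower())].add("R")
    return dict(matrix)


def render(matrix: dict[tuple[str, str], set[str]]) -> str:
    """Lay the matrix out as fixed-width text: one row per program, one column per table."""
    programs = sorted({p for p, _ in matrix})
    tables = sorted({t for _, t in matrix})
    rows = ["".ljust(10) + "".join(t.ljust(10) for t in tables)]
    for p in programs:
        cells = ("".join(sorted(matrix.get((p, t), set()), key="CRUD".index)) for t in tables)
        rows.append(p.ljust(10) + "".join(c.ljust(10) for c in cells))
    return "\n".join(rows)


PROGRAMS = {  # hypothetical statements already extracted from the legacy programs
    "BILLING": [
        "SELECT amount FROM orders JOIN customers ON orders.cust_id = customers.id",
        "UPDATE orders SET billed = 1 WHERE id = :id",
    ],
    "PURGE": ["DELETE FROM orders WHERE id IN (SELECT order_id FROM archive)"],
}

print(render(crud_matrix(PROGRAMS)))
```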
CodeCase Software offers a modernisation approach based on extracting the logic in the form of algorithms. An algorithm is finer-grained than a model but a more universal concept: it is a recipe for solving a problem, independent of language and technical platform.
The tool analyses everything related to the code: the application’s source code and database queries, as well as configuration files, framework dependencies, APIs, etc. It discovers and understands the overall logic of the application and extracts algorithms in a universal meta-language, the Unified Meta-Model (UMM). This multi-paradigm pivot format can be used to modernise procedural languages (C, COBOL, Fortran…) as well as object-oriented languages (C++, C#, Delphi, Java, VB, VBA…).
This makes it possible to modernise from and to any of these languages, and to do so iteratively. The generated source code can be refactored: the algorithms optimised, the code factored, entire sections replaced with frameworks and APIs… This code can then be fed back into the tool to produce new code that is simpler, more efficient and less technical.
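The pivot idea can be illustrated with a deliberately small intermediate representation. It is not the UMM, whose structure is not described here; the node types (`BinOp`, `Assign`, `If`, `Return`, `Function`) and the `generate` function are hypothetical, invented for this sketch. One algorithm is captured once in language-neutral nodes and then emitted both as procedural C and as Python.

```python
from dataclasses import dataclass
from typing import Union

# A hypothetical, minimal pivot format (not the UMM). Leaves are variable names (str)
# or integer constants (int); operators are limited to ones spelled alike in C and Python.


@dataclass
class BinOp:
    op: str  # one of + - * < >
    left: "Expr"
    right: "Expr"


Expr = Union[str, int, BinOp]


@dataclass
class Assign:
    target: str
    value: Expr


@dataclass
class If:  # no else branch, to keep the sketch short
    cond: Expr
    then: list


@dataclass
class Return:
    value: Expr


@dataclass
class Function:
    name: str
    params: list[str]
    body: list


def expr(e: Expr, nested: bool = False) -> str:
    if not isinstance(e, BinOp):
        return str(e)
    text = f"{expr(e.left, True)} {e.op} {expr(e.right, True)}"
    return f"({text})" if nested else text


def assigned(stmts: list) -> list[str]:
    """Names assigned anywhere in `stmts`, in first-seen order."""
    names = []
    for s in stmts:
        if isinstance(s, Assign):
            names.append(s.target)
        elif isinstance(s, If):
            names += assigned(s.then)
    return list(dict.fromkeys(names))


def generate(fn: Function, lang: str) -> str:
    """Emit the same algorithm as Python or as C (the C back end assumes every value is an int)."""
    c = lang == "c"
    semi = ";" if c else ""
    if c:
        out = [f"int {fn.name}({', '.join('int ' + p for p in fn.params)}) {{"]
        local_names = [n for n in assigned(fn.body) if n not in fn.params]
        if local_names:
            out.append(f"    int {', '.join(local_names)};")
    else:
        out = [f"def {fn.name}({', '.join(fn.params)}):"]

    def emit(stmts, pad):
        for s in stmts:
            if isinstance(s, Assign):
                out.append(f"{pad}{s.target} = {expr(s.value)}{semi}")
            elif isinstance(s, Return):
                out.append(f"{pad}return {expr(s.value)}{semi}")
            else:  # If, assumed to have a non-empty body
                out.append(f"{pad}if ({expr(s.cond)}) {{" if c else f"{pad}if {expr(s.cond)}:")
                emit(s.then, pad + "    ")
                if c:
                    out.append(f"{pad}}}")

    emit(fn.body, "    ")
    if c:
        out.append("}")
    return "\n".join(out)


price = Function("price", ["qty", "unit"], [
    Assign("total", BinOp("*", "qty", "unit")),
    If(BinOp(">", "total", 100), [Assign("total", BinOp("-", "total", 10))]),
    Return("total"),
])

print(generate(price, "python"))
print(generate(price, "c"))
```

The design choice worth noting is that both back ends read the same tree, so a change made once to the tree (an optimised expression, a removed branch) reaches every target language, which is one way to picture the iterative loop described above.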
In the report Reverse Engineering: A Roadmap, the authors evaluate reverse engineering tools and comment specifically on their lack of effectiveness, which they attribute to two factors: slow adoption and a limited ability to analyse and understand programs. In conclusion, the most promising approach in the reverse engineering field is continuous program understanding, which applies reverse engineering continuously throughout the life cycle of the application, as part of continuous improvement.
Article written by Pierre Tran.