IRP programming : an efficient way to reduce inter-module coupling

From WikiOsp
Jump to: navigation, search

DOI: 10.13140/RG.2.1.3833.0406

Contents

Abstract[edit]

(this text is a draft)

Introduction[edit]

Let suppose we already have a computer program able to calculate, some entities S1, .., Sn and that we want to add to it a new functionality : the computation of a new entity A by applying a known function a_build to S1, .., Sn.

The usual or direct way of programming the computation of A is the following :

1. get the values of the arguments S1, ..., Sn.
2. apply a_build to them.

As you know, step 1 is rather difficult. It needs a deep knowledge of the code and is rarely straightforward. The Si calculation are usually embedded in part of code doing something else. Moreover each of them needs the values of other entities T1, ... Tm to be computed etc... .

It is clear that everything would be much simpler if the x_build functions had no arguments.

It is exactly what the IRP way of programming is doing :

It consists in encapsulating the gathering of the calculation of the arguments Si and the application of a_build on them in an intermediate A-specific and argumentless function a_provide which looks like :

 function a_provide {

   S1 = s1_provide 
   ....
   Sn = sn_provide

   a_build (S1, ... , Sn) 

 }

As you wee, each Si is also calculated by calling its own specific and argumentless function si_provide.

This simple encapsulation trick, avoid the programmer to worry about the calculations of the intermediate values Si, Ti ... Wi. Those calculations are recursively delegated to their specific provide functions until some data is reached.

We can summarized very schematically the program flow by the tree below :

                                       a_provide 
                   /                                         \
            s1_provide                  ....               sn_provide 
          /           \                                  /           \    
    t1_provide ... tm_provide                      w1_provide ... wk_provide
        /               \                              /               \
    data_1     ...    data_...                      data_...           data_s


As already pointed out, in programs written according to the direct way strong though unwanted coupling may occur between modules. Some authors even claim that this is unavoidable (see OOSC2 by Bertrand Meyer).

We show here that the IRP way avoids such spurious couplings. This method is called Implicit Reference to Parameters, because it relies on hiding the arguments (parameters) of the functions devoted to the building of the main entities of a program. As such, IRP does not depend on the programming language. Though it requires some additional design effort, it introduces constraints which enforce a more efficient design.

IRP can be applied to reshape step by step an already written code, keeping it running during the whole process. The resulting code is faster, easier to read and extremely easy to maintain and parallelize.[1]

As a consequence, within the frame of IRP, code development can be done by several unconnected teams.

The first part of this paper will describe a very simplified version of the IRP method, based on a simple example. The second part will introduce an improved version. The third part will discuss the consequences of using IRP for functional and object-oriented languages. The fourth part will introduce a general tool for writing programs within the frame of IRP.

Part I : the basic principles of the IRP method[edit]

Some definitions[edit]

Entity of interest (EI)[edit]

In this paper an entity is any element of a program which represents a computable value. Among all entities, some play a major role because they are the skeleton of the program architecture. These entities are called entities of interest or EIs, in order to distinguish them from the vast amount of local and of low interest entities most programs deal with.

Within the frame of IRP, the design of a program consists precisely in defining its EIs. Therefore, they are the abstract representations of the main concepts of the considered (e.g. scientific) domain, namely, all those that are associated to computable values of interest. Only EIs are concerned with the IRP method.

For instance, in a classical molecular dynamics code, the EIs are the entities used to represent a molecule or the components of the force field while in a quantum chemistry code, the EIs are the entities used to represent the wavefunction, the operators, the integrals, etc....

The promotion of an entity to the status of EI is arbitrary. So, it may happen that an entity not taken into account at first by IRP, can later be included and vice-versa. We shall see that, as a consequence of the low inter-module coupling, this has very little impact on the overall development of the code, namely, a change in the status of an entity will never imply a major reorganization and rewriting of the code.[remark 1]

Remark
EIs represent values of entities specific to some domain. Therefore they cannot be functions (which are used to build EIs values and not to represent them, this is done with data structures).

Modules[edit]

A module is an element of the code where an EI is defined and its functionalities implemented. In Fortran90 and in OCaml it is a module, in O-O languages it is a class[remark 2]

Building function[edit]

Let us call a_build, the building function of a given EI named A. When applied to the values of its arguments, a_build produces a_value, the value of A.

Father and sons[edit]

When b_value, the value of an EI named Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): B , is an argument of the building function a_build, we will say that A is the father of Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): B and Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): B a son of A.

Needs relation[edit]

A father and any of its sons are connected by the relation "necessary for the building of" noted :

Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): (A~needs~B)

or

Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): (A~\rightarrow~B)


This relation is transitive:

Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): if~(A~\rightarrow~B)~and~(B~\rightarrow~C)~then~(A~\rightarrow~C)

.

but asymmetric

Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): if~(A~\rightarrow~B)~then~\neg~(B~\rightarrow~A)


Because of this relation the way a program written in IRP runs looks very similar to the way a makefile runs.

Rules 1 and 2[edit]

In IRP, the building functions have specific properties :

Rule 1 (unicity of a building function)
 
for each A there is one and only one building function (a_build) and to each building function a_build corresponds a unique A. In other words, there is a bijection between A and a_build.


Rule 2 (unicity of sons list)
 
the arguments of a building function are the values of a well defined list of EI : its sons. By well defined we mean that the function is not partial : all arguments must have values when the function is applied.

Provider function[edit]

Let us consider the calculation of the value a_value of whichever EI, for example, A.
Rule 1 implies that there is only one way to calculate a_value : a call to its unique building function a_build.
While Rule 2 implies that the values of the arguments of a_build (b1_value, b2_value, ..., bn_value) can be calculated by calling the well defined list of n functions b1_build, b2_build, ... ,bn_build, because Rule 1 applies also to B_1, B_2, ... ,B_n (bi_value = bi_build ( ... ) ).

Therefore :

the execution of a_build and the provision of the values of its arguments can be encapsulated in an argumentless function ( a_provide ), applying (pseudo-)recursively the same mechanism to the arguments entities.

The argumentless function a_provide will look like :

    a_provide 
      b1_value = b1_provide
      b2_value = b2_provide
      ...
      bn_value = bn_provide
      result = a_build (b1_value, b2_value, ... , bn_value)
    end

and the bi_provide used to calculate the sons entities B_i will look like :

     bi_provide 
      ci1_value = ci1_provide
      ci2_value = ci2_provide
      ...
      cim_value = cim_provide
      result = bi_build (ci1_value, ci2_value, ... , cim_value)
    end
Because a povider function is argumentless, this mechanism garantees that a_value does not depend on the context in which it is called in the program.

There is no more to be worried about how to provide the values of the arguments of a_build : they will be recursively computed until their specific data are reached (see examples below). Therefore, the main difficulty of the standard way[2] of programming is avoided.

It must be emphasized that for using a function bi_provide nothing has to be known about the way a Bi have been computed. The only requirement is that someone has previously implemented the code of bi_provide somewhere in the program.

Rule 1 and Rule 2 ensure that the code will work in any case or fail at link if one of the provide function has not been implemented (Note that in Caml the failure will even occur at compilation).

Provider Summary in Caml[edit]

 let provide tag =
   let tag_son_list = formula tag in
   let value_son_list = 
     List.map Son.provide tag_son_list
   in
   build value_son_list

An example for Part I[edit]

There are many ways to implement the examples below depending on the language used (see Examples wikifrm:IRP_Programming_Example_OCaml_Triangle) These examples are written in a pseudo-language which does not exit, in the simplest way as possible.

A program to print the surface of a triangle[edit]

We want to write a program which, given the 9 floating-point values defining the coordinates of the 3 vertices of a triangle, will calculate its surface and print it.

Design of the program[edit]

We proceed from top to bottom, designing each needed EI one after the other.

Design of a Triangle[edit]

The EI Triangle is designed as a data structure containing the couple Point (summit) and Segment (basis)

Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): {\rm Triangle} : \begin{cases} {\rm Point} \\ {\rm Segment} \end{cases}


A Triangle :
  
                      Point (summit)
                              
                         A      
                        / \
                       /   \
                      /     \
                     B------>C  
                                      
                       Segment (basis)

So, within the frame of this design, the EI Triangle has two sons, namely, a Point and Segment.

Design of a Segment[edit]

The EI Segment is designed as a data structure containing a couple of Points (head and tail)

Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): {\rm Segment} : \begin{cases} {\rm Point} \\ {\rm Point} \end{cases}


A Segment :
  
          A-------->B
                         
        Point     Point 
       (tail)    (head)

The EI Segment has two sons, namely, a couple of Points.

Design of a Point[edit]

The EI Point is designed as a data structure containing a triplet of floats (x, y and z)

Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): {\rm Point} : \begin{cases} {\rm Float} \\ {\rm Float} \\ {\rm Float} \end{cases}


A Point :
  
         (Float, Float, Float)

So, within the frame of this design, the EI Point has three sons, namely, a triplet of Floats.

A possible implementation[edit]

The provider for a Point[edit]
  • the building function will look like :
  point_build (x, y, z)
   result = data_structure_of_x_of_y_and_z
  end 

It returns a data structure (a triplet in this case) containing the values of x, y and z.

  • the provider function will look like :
  point_provide
    x = float_read
    y = float_read
    z = float_read
    result = point_build (x, y, z)
  end
  • float_read is a function which reads the next float from the input stream. It is at the same time a float_provider and a float_build.
  • Note that a float entity is not an EI, it is a data (a concept that will be defined hereafter).
The provider for a Segment[edit]
  • the building function will look like :
  segment_build (a, b)
   result = data_structure_of_a_and_b
  end 

It returns a data structure containing the values of Points A and B.

  • the provider function will look like :
  segment_provide
    tail = point_provide
    head = point_provide
    result = segment_build (tail, head)
  end
The provider for a Triangle[edit]
  • the building function will look like :
  triangle_build (summit, basis)
   result = data_structure_of_summit_and_basis
  end 

It returns a data structure containing the values of Summit and of Basis.

Building a Triangle[edit]
  • The only way to build a triangle anywhere in the program will be to apply the provider function triangle_provide, which looks like:
  triangle_provide
    summit = point_provide
    basis = segment_provide
    result = triangle_build (summit, basis)
  end
  • Note that the surface of a triangle, for instance, is not considered as an EI in this design: it is treated as a simple property that can be computed as soon as the triangle is evaluated (as soon as the coordinates of the points are defined). Indeed, it can be calculated by the function triangle_surface which looks like :
  triangle_surface (triangle)
    h = triangle_height_length (triangle)
    bc = triangle_basis_length (triangle)
    result = (h * bc) / 2.
  end
The main module[edit]

the main module is reduced to the calculation of the triangle t by a simple call to triangle_provide followed by the calculation of its surface s.

Then s is printed .

  main 
    t = triangle_provide 
    s = triangle_surface (t) 
  
    print "surface is ",t
  end

The execution of the program[edit]

main
 triangle_provide
   point_provide
     float_read 0.0
     float_read 0.0
     float_read 0.0
   segment_provide
     point_provide
       float_read 0.5
       float_read 1.0
       float_read 0.0
     point_provide
       float_read 1.0
       float_read 0.0
       float_read 0.0
  triangle_surface
  surface is 0.25

Examples of implementations[edit]

Fortran[edit]

a Fortran programming environment which helps the development of large Fortran codes by applying the IRP method.

Java[edit]

Consequences : Low coupling beween modules[edit]

Part II : The consequences of designing a program as a tree[edit]

Some new definitions[edit]

The production-tree[edit]

Being asymmetric and transitive the relation needs forms an acyclic tree. Its vertices are the EIs and its edges the relation itself.


                                         my_triangle:Triangle
                                                    |
                                     ----------------------------------
                                    /                                  \
                             summit:Point                          basis:Segment
                                   |                                    | 
                             --------------                   ---------------------                           
                            /      |       \                 /                     \            
                          x:Float y:Float z:Float       tail:Point              Head:Point 
                            f1      f2      f3              |                       |
                                                       -----------             -----------
                                                      /     |     \           /     |     \
                                                     /      |      \         /      |      \ 
                                                   x:Float y:Float z:Float x:Float y:Float z:Float
                                                     f4      f5      f6      f7      f8      f9
    
  
                          The tree of a Triangle with its 9 calculations conditions
                          A vertical bar | or a / or a \ means needs

The target[edit]

For example the tree of Root A (the tree of A for short) is drawn below :

                              A                     
                            /   \
                          B1     B2                 
                        / | \    |                  
                     C11 C12 C13 C21                  

A has no father it is also called the target of the program.

The data[edit]

Each path of the tree climbs down until a leaf is reached.
A leaf is a data to be provided from the outside of the program. It is not built it is read : its building function is a read.

For example, suppose C21 were a data then we would have :

    c21_provide 
       result = c21_read
    end

The nodes[edit]

Any EI which is not a leaf is a node.

A path[edit]

We shall use a specific and restrictive definition for a path. In IRP the path of a node N is the list of node descriptors (names for short) encountered when going from the target of the program to the node N (included).

The path of Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): C_{12} is (A, B1, C12).

The list (B1, C12) is a sub-path of C12.

Here A, B1, C12 are node descriptors.

A sub-tree[edit]

The sub-tree of a node N is the tree having the node N as target.
Each node has a unique path and also a unique sub-tree.

A value[edit]

The sub-tree ends with the data, which, when read, recursively define all the values of the nodes in the same sub-tree.

The values of the leaves of a node are called the calculation-conditions of the node.

Therefore:

Property 1:
In IRP the value of a node is a pure function of its calculation-conditions

Consequence I : Persistence[edit]

From the property 1 above it is easy to make persistent any node N of the program. It is only necessary to built a data-base key from the calculation-conditions of N. These are collected recursively by joining up the calculation-conditions of the sons of N. A dry-run can do that prior to any other calculation. The key provider may look like :

    a_key_provide  
      b1_key = b1_key_provide
      b2_key = b2_key_provide
      result = a_key_build (b1_key, b2_key)
    end

Once the key is built (for example as a md5 key) the value of the node N can be stored in a data-base using that key. Therefore the provider can be modified in order not to calculate values already calculated elsewhere in the program (or even earlier) :

    a_provide 
      a_key = a_key_provide
      if is_stored (a_key) 
      then result = retrieve (a_key)
      else
        b1_value = b1_provide
        b2_value = b2_provide
        a_value = a_build (b1_value, b2_value)
        store (a_key, a_value)
        result = a_value
    end

Consequence II : Parallelism[edit]

Remarks on two possible bad designs[edit]

a triangle designed as 3 points[edit]

Designing a triangle as 3 points, would be too much unprecise : the summit would have been impossible to define a priori then and any of the 3 points would have played the same role which would have been very confusing. By naming each point we have avoided any source of confusion.

a triangle designed as a point and a segment[edit]

Designing a triangle from a point (the summit) and a segment, is not enough precise : the two points of the segment are equivalent and can be confused.

a triangle designed as 3 segments[edit]

The same critic as upper can be done here : each segment has to be defined as "basis" "left edge" and "right edge". Furthermore, designing a triangle as 3 segments (AB, BC, CA), would introduce a spurious coupling between the segments : the head a segment is necessarily the tail of an other and its tail the head of the third one.

This can be avoided by doing the following :


                    Triangle  ABC
  
                       AC
                       |  
                      AB    
                    /  |
                   A   BC
                     /   \
                    B     C


1 read "B" and "C" to build BC
2 read "A" and extract "B" out off BC (to enforce the constraint that "B" is shared by AB and BC), build AB
3 extract "A" out off AB, extract BC out off AB then extract "C" out off BC, build AC.
a triangle designed as 3 segments[edit]

Designing a triangle as 3 segments would be extremely difficult because of the conditions on the triangle length which could not be defined in the design (compilation) but only at the execution.

Simple examples[edit]

An abstract example[edit]

                                  t:T
                                  /\
                                 /  \
                                /    \
                              u:U    v:V
                              / \    / \
                            d1  d2 u:U w:W
                                   / \  |
                                  d1 d2 d3
  d1, d2, d3 are integer.
 
  t_build (u, v)
    result = u + v
  end
  u_build (d1, d2)
    result = d1 + d2
  end
  v_build (u, w)
    result = u + w
  end
  w_build (d3)
    result = d3
  end
  u_provide 
   d1 = integer_read
   d2 = integer_read
   result = u_build (d1, d2)
  end 
  w_provide 
   d3 = integer_read
   result = w_build (d3)
  end 
  v_provide 
   u = u_provide 
   w = w_provide
   result = v_build (u, w)
  end 
  t_provide 
   u = u_provide 
   v = v_provide 
   result = t_build (u, v)
  end 
  main
    print t
  end

The concept of node-Path and automatisation of iterative processes in IRP method[edit]

The concepts of Symbols and Formulas : towards automatisation of entity design[edit]

Symbols[edit]

A symbol names a class of entities having the same nature

The production-tree as a symbolic descriptor[edit]

The world of values as a dual of the symbolic world[edit]

Formulas[edit]

A formula is used to translate the symbol of an EI into the symbols of its sons. It is a data_structure F made off son symbols.

By applying a mapping to a formula F we can obtain the same data structure P filled with the son paths and thus calculate the sons values with the provide function of each son.

Example : a molecule tag formula is a tree of fragment tags, a fragment tag a list of block tags.

In Caml a tag is not a data structure it is a "pure" descriptive type it "contains" nothing, while a formula is a data structure which contains the tags of the son entities.

Conclusion[edit]

The basic idea under IRP is very simple get a value wihout any reference to the parameters needed to calculate this value. The consequences on the properties of the resulting code are not and it is possible that not all properties have yet been explored.

Anybody having used IRP is astonished by how easy is programming with it. It woud be even easier if people in charge of writting compilers could implement IRP in their language (i.e. the provide mecanism as part of the language).

An other application could be to write an OS on these principles : the package dependence would then be automatic and the growth of the system would have little effect on its maintainability.

References[edit]

  1. mettre la référence de CHAMP
  2. Cite error: Invalid <ref> tag; no text was provided for refs named standard_way

Public notes and remarks[edit]

Passages to be argumented[edit]


Passages to be clarified[edit]


Passages to be commented[edit]


Passages to be discussed[edit]


Passages to be emphasized[edit]


Passages to be examplified[edit]


Remarks[edit]

  1. We shall see that, as a consequence of the low inter-module coupling, this has very little impact on the overall development of the code, namely, a change in the status of an entity will never imply a major reorganization and rewriting of the code.
  2. we shall see that the blending of type and data-structure in a class is confusing

Specific Questions[edit]


General questions on the whole text[edit]


  • localization : ~/sources/irp_top/paper/Reduction_of_inter-module_coupling_using_IRP_programming_scheme.mw