GEPRO 3.1.6 Background

Gepro (Système de Gestion de la Production) has its origins in a program created in the early Eighties by two translators, one of whom became first the Director-General for Translation and Publishing Services and latterly the Director-General for Personnel. This shows what can happen to Gepro developers if they are not careful. The original system was written in BASIC on the Wang OIS system then in use for word processing in the European Parliament. It attempted, with some success, to track translation work in the language divisions [1]. It became apparent that the OIS system had its limitations as a programming environment, and the translation directorate was upgraded to the Wang VS system. With a "real" programming environment available, the first version of Gepro as we know it today was in place by early 1985. It was written mainly in COBOL (ugh), because the clever user-interface design tools were only supported in that language; some parts, notably date-handling routines, were implemented in a mix of Assembler and PL/1. The underlying "database" was an indexed-sequential file. Gepro I also implemented the first mail system in the European Parliament; it was based on the fact that every Gepro user had access to a printer directly connected to the mainframe computer, and it was easy to write a program to allow users to print on any printer. It is still not possible to get Groupwise to print mail automatically. (Probably a good thing, actually.)

Gepro I ran successfully until the end of the decade, when the European Parliament introduced the "Système Bureautique" and created the Direction de l'Informatique et Télécommunications (DIT), with the intention of harmonising the various disparate IT systems then in use. It was decreed that, thenceforward, all databases should use Oracle Corporation's RDBMS [2] product, running on UNIX platforms, while the users' "workstations" should be IBM-compatible personal computers running DOS. After several false starts, due mainly to outside contractors totally misunderstanding the way Gepro and the European Parliament worked, Gepro II went live in 1991. It was written by Alan Carlisle and Harry Tan, an independent contractor who was to prove invaluable, nay essential, to the Gepro project over the next few years.

Because of the limitations of the users' machines, it was decided to use them as "dumb terminals" (i.e. running a terminal emulator), and to place all the front-end logic on the server. This was accomplished using JYACC's Application Manager with the database extensions, known as JAM/DBi. This product used the standard UNIX curses library to provide the end-user with a screen-oriented interface to the system. The hardcopy mail system (actually intended to print translation fiches in the divisions) was implemented using the printing capability of the terminal emulator (AmbraTerm).

It was always apparent that Gepro II would not run beyond 31/12/1999, and work started on the replacement in the summer of 1998. There was an inevitable scramble towards the end of 1999, due to misunderstandings as to requirements. The new machine was delivered without a C compiler because of budget restrictions; what use the bean counters thought a computer would be without a compiler I don't know. Then they delivered a C++ compiler which wouldn't build the Perl and Apache systems needed for the new version of Gepro. Despite all this, Gepro III was successfully rolled out in time for the beginning of the last year of the twentieth century.

The users' view of the system

For most users, the interface is one or more Java applets running in a Web browser (Internet Explorer 5 is the current standard). Having logged in, the user is shown a tabbed dialog tailored to his or her profile; for example, users in DG2 (committees and delegations, a major "customer" of the translation directorate) may introduce requests for translation and track their progress, but may not actually schedule a job.

A typical screenshot from the Client applet

The flow of work, in general, is as follows: a client user [3] introduces a request for translation. This is known colloquially as a "pré-FdR". Certain data must be supplied, such as the document type, the overall deadline, the meeting date, the target languages and so on. [In the next minor release, the original text(s) may also be attached electronically to the request.] Until the request is released to Planning, the job has version number -1; on release, the version changes to 0. This allows clients to edit their requests before submitting them. Once the request is released, the Planning service schedules the job, making any necessary modifications. The principal modification is the addition of internal deadlines for translation and printing, which cannot be specified by the client. When the job is scheduled, a copy of the job details is printed automatically in the target divisions. This is usually the first intimation the divisions have of an incoming task. [4] The job version then moves to 1.

A recent modification allows the Tabling Office [5] to intercept work from certain requesters in DG2. On the basis of a filter controlled by the TO, the pré-FdR release code may set the version number of the job to -2, so that the Tabling Office "add-on", a hastily-written CGI script, can see it. When the TO has completed its work, the job's version number is set to 0, and Planning schedules the job as above.
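The version numbers described above behave like a small state machine. The following Python sketch is only an illustration of that reading: the names `JobVersion`, `release`, `tabling_office_done` and `schedule` are hypothetical and do not come from the Gepro code, which is written in Perl. Only the numeric values (-2, -1, 0, 1) and the order of the transitions are taken from the text.

```python
from enum import IntEnum


class JobVersion(IntEnum):
    """Version numbers named in the text; the member names are invented."""
    AWAITING_TABLING_OFFICE = -2  # intercepted by the Tabling Office filter
    DRAFT = -1                    # pré-FdR, still editable by the client
    RELEASED = 0                  # released, waiting for Planning
    SCHEDULED = 1                 # scheduled by Planning


def release(version, intercepted_by_to):
    """Client releases a pré-FdR: -1 becomes -2 if the TO filter matches, else 0."""
    if version != JobVersion.DRAFT:
        raise ValueError("only a draft request (version -1) can be released")
    if intercepted_by_to:
        return JobVersion.AWAITING_TABLING_OFFICE
    return JobVersion.RELEASED


def tabling_office_done(version):
    """The Tabling Office finishes its work: -2 becomes 0."""
    if version != JobVersion.AWAITING_TABLING_OFFICE:
        raise ValueError("job is not waiting for the Tabling Office")
    return JobVersion.RELEASED


def schedule(version):
    """Planning schedules a released job: 0 becomes 1."""
    if version != JobVersion.RELEASED:
        raise ValueError("only a released job (version 0) can be scheduled")
    return JobVersion.SCHEDULED
```

A request caught by the Tabling Office filter would thus pass through -1, -2, 0 and 1, while an ordinary request skips the -2 step.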

The task must then be booked in by the division. [6] After the translation is completed, the task is booked out, and the translated text is made available to the next stage of production. This may be the printshop, the original requester, or, increasingly, the European Parliament Document Exchange System (EPADES-2).

Certain translation tasks are sent to the External Translation Service (formerly the Freelance Unit (FLU)). There is a one-way link to the ETS's own job management system, "Fluid"; as the divisions retain the responsibility for the overall quality of translations, they must still book out work if it has been sent to external translation.

Because of the architecture of Gepro 3, users are not required to use the supplied front-end; they are free to develop their own interfaces if they have the resources, and this is what the printshop has done. Using the published API, their own in-house production system interfaces with Gepro.

Jobs may be modified after scheduling. This usually happens because an original text has been amended after the request for its translation has been accepted, or because a target language is added or removed. If Planning (the only authority who can introduce a modification) so decides, affected divisions may be required to book in the modifications. 

The architecture

It was decided to adopt the currently fashionable three-layer architecture for Gepro III. The constraints were

It was decided to implement a home-grown API protocol over BSD Sockets. The protocol is text-based and consists of a series of keyword-value pairs, separated by linefeed characters, optionally preceded by carriage return characters. This has the following advantages:

It is recognised that DCE RPC is almost universally available; however, its use was not considered for two reasons. The first was that an earlier attempt to use RPC in Gepro 2 met with disaster because the Microsoft program MIDL could not compile our interface definition file; the second was that a stream of keyword-value pairs is more easily parsed by the language chosen for the application server. Also, the simple socket approach probably generates less network traffic.
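To show how little code a stream of keyword-value pairs needs, here is a minimal parser sketch in Python (the real server is written in Perl). The text does not say how a keyword is separated from its value, so the `=` separator, the ASCII encoding, and the keywords `ACTION`, `JOB` and `LANG` are all assumptions made for illustration. Only the line structure, LF optionally preceded by CR, comes from the description above.

```python
def parse_message(raw: bytes) -> dict:
    """Parse keyword-value lines ending in LF, each optionally preceded by CR."""
    fields = {}
    for line in raw.decode("ascii").split("\n"):
        line = line.rstrip("\r")   # the CR before each LF is optional
        if not line:
            continue               # skip blank lines, e.g. after the final LF
        # ASSUMPTION: '=' separates keyword and value; the source does not say.
        keyword, _, value = line.partition("=")
        fields[keyword.strip().upper()] = value.strip()
    return fields


def format_message(fields: dict) -> bytes:
    """Serialise a dictionary back into LF-terminated keyword-value lines."""
    return "".join(f"{key}={value}\n" for key, value in fields.items()).encode("ascii")


# Hypothetical request mixing CRLF and bare LF line endings.
request = parse_message(b"ACTION=BOOK_IN\r\nJOB=12345\nLANG=EN\n")
```

The result is an ordinary dictionary, which is the Python counterpart of the Perl associative array.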

The application server is written in Perl 5. This language was chosen because of its rich set of high-level functions for handling text and data. In particular, support for associative arrays ("dictionaries") and regular expressions is built into the language. We estimate that the use of Perl cut development time by a factor of four or more. The chief objection to Perl, namely that a script must be compiled every time it is run, is not an issue; the compilation is done once only, when the listener starts; thereafter new copies of the server are forked off as required.
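The "compile once, fork per client" pattern can be sketched as follows. This is a Python analogue for Unix-like systems, not the Gepro listener itself: `load_handlers`, `handle_connection`, `serve` and the `PING` request are hypothetical names, and the one-line request format is a simplification.

```python
import os
import signal
import socket


def load_handlers():
    """Stand-in for the expensive start-up work (in Perl, compiling the script)."""
    return {"PING": lambda: "STATUS=OK\n"}


def handle_connection(conn, handlers):
    """Read one request, send the reply and close; runs in the forked child."""
    with conn:
        request = conn.recv(4096).decode("ascii").strip()
        handler = handlers.get(request)
        reply = handler() if handler else "STATUS=UNKNOWN\n"
        conn.sendall(reply.encode("ascii"))


def serve(listener, handlers):
    """Accept connections forever, forking one child per client."""
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)  # let the kernel reap children
    while True:
        conn, _addr = listener.accept()
        if os.fork() == 0:
            # Child: the handlers are already in memory, nothing is reloaded.
            listener.close()
            handle_connection(conn, handlers)
            os._exit(0)
        conn.close()  # Parent: the child owns this connection now.
```

The point of the design is that `load_handlers()` is called once, before `serve()`; every forked child inherits the result, so the start-up cost is not paid per request.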

The database

The database engine is Oracle Corporation's RDBMS product version 8.1.6 (a.k.a. 8i release 2). There are three principal tables: G_JOB, which holds global data for the job, such as overall deadline, document type, meeting date, requester code etc.; G_LANG_TASK, which holds data specific to the divisions (pages to be translated, translation deadline etc.); and G_PSDR_TASK, which holds information about the printing and distribution services (number of copies required, release date etc.). Another table, G_CODES, contains standard data, such as requester codes, ISO language codes, document type codes etc. Code type zero serves as a human-readable contents list for the rest of the table.
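The idea of using code type zero as a contents list can be illustrated with a small self-contained sketch. It uses Python's built-in `sqlite3` module as a stand-in for Oracle, and the column names (`code_type`, `code`, `label`) and sample rows are invented for the example; the source gives only the table name G_CODES and the role of type zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ASSUMPTION: hypothetical column layout; the real G_CODES schema is not given.
conn.execute("CREATE TABLE g_codes (code_type INTEGER, code TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO g_codes VALUES (?, ?, ?)",
    [
        (0, "1", "ISO language codes"),   # type 0 rows describe the other types
        (0, "2", "Document type codes"),
        (1, "EN", "English"),
        (1, "FR", "French"),
        (2, "RPT", "Report"),             # invented document type
    ],
)


def describe_code_types(conn):
    """Return {code_type: description}, read from the type-zero rows."""
    rows = conn.execute(
        "SELECT code, label FROM g_codes WHERE code_type = 0 ORDER BY code"
    )
    return {int(code): label for code, label in rows}


def codes_of_type(conn, code_type):
    """Return {code: label} for a single code type."""
    rows = conn.execute(
        "SELECT code, label FROM g_codes WHERE code_type = ? ORDER BY code",
        (code_type,),
    )
    return dict(rows.fetchall())
```

A maintainer who does not know what code type 2 means can look it up in the table itself instead of in separate documentation.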

There are corresponding "history" tables which contain completed and cancelled jobs. At present it is still possible to see jobs from mid-1991; given that the cost of mass storage devices continues to fall, there is no reason why we should not keep our entire history online for the foreseeable future.

Future developments

Gepro as it stands has just about reached the limit of its usefulness. The world it was designed to model, that of hardcopy printing and physical distribution, is fifteen years in the past; we now have the World Wide Web, on-line communications, and all the other curses, or rather blessings, of the modern world. That being the case, a project has been put in place to develop a replacement for Gepro; it is hoped that the new system will be in place by the end of 2006. However, as EPADES-2, on which the EP document production systems depend so heavily, is also being redeveloped in the same time frame, it would be very unwise to place more than a trivial wager on that date.

[1] It should be noted that the directorate for translation (DG7B) is divided into twenty language (or "linguistic") divisions, one for each working language, plus the Section de l'Informatique, Linguistique et Documentation (SILD), which supplies IT and terminological support to the rest of the directorate. References to "the divisions" mean "the language divisions".

[2] Relational Database Management System

[3] In Gepro terminology, a client is one who requests a translation, i.e. a customer of the translation service.

[4] Gepro terminology distinguishes between a "job", which is basically what the clients deal with, and a "task", which is the production of a text in a specific target language.

[5] The Tabling Office was set up at the Secretary-General's behest in the summer of 2003. There was an immediate requirement to hook it into Gepro, which was met by developing the add-on as CGI (Common Gateway Interface) scripts. This is not ideal, but it works nearly all the time, and was very quick to develop (about six weeks). A new application, the Tabling Office Portal, which will link to Gepro using the EPADES-2 "workflow" system, is due to come into operation in July 2005. [In fact, the TOP project was still not complete when the author left the EP at the end of August 2006.]

[6] A "bolt-on extra", called TRAMSYS (Translation Management System), may be invoked at this point. This allows a division to track work allocated to officials, their presence/absence etc. At present it is used by six of the eleven divisions; a possible future scenario makes its use mandatory, when the Freelance divisional application is integrated with Gepro. [The TFlow translation unit management system is due to replace the Gepro TRAMSYS subsystem in late 2006.]