A new architecture for the DDB: 6 questions and their answers

Since the end of May 2018, the results of the special project “DDB 2017”, under which the basic architecture of the DDB was completely modernised, have been online. The development project was funded by the Federal Ministry of the Interior with a total of one million euros, coordinated and managed by the DDB project coordination, and implemented together with various partners – above all FIZ Karlsruhe.

Here are the most important questions and answers about the background and innovations of the project, answered together by Uwe Müller, Director of Technology, Development, Service at the German Digital Library and Stephan Bartholmei, responsible for product development and innovation.

The Questions

1. Why was the special project "DDB 2017" set up?

2. What changes or what advantages for existing and future data partners - culture and knowledge facilities - will result specifically from this new infrastructure?

3. Apart from the changes in the backend, are there any functional changes in the frontend of the DDB portal that will become visible to the users of the DDB with the activation of the new system?

4. Is the new architecture immune from sharing the fate of its predecessor? What is being done to prevent the DDB from reaching performance limits again in a few years?

5. New usage scenarios usually bring new, different requirements. Which requirements are these and to what extent is the new architecture ready for them?

6. Has the transition to the new system been completely smooth?

7. Materials and Downloads

The Answers 

1. Why was the special project "DDB 2017" set up?

The Deutsche Digitale Bibliothek (German Digital Library) merges data – this is its core business – and quite a lot of it. To do this, we use a base system with components for data storage, for loading and processing the data, and for access via machine interfaces (API) and the portal. The previously used system, which was based on a file-system storage architecture, was almost ten years old, including the conception phases, and had reached its limits at various points. These include, for instance, the duration of processing, but also the very limited possibilities for using or expanding the data stored in the DDB for scenarios outside the portal. To remain viable for the future, we had to improve and therefore decided to introduce a new architecture with current technologies for the base system, with which we want to achieve two concrete goals.

On the one hand, we want to increase performance. This means that we can load more data in less time and update it in shorter cycles. We were able to achieve this goal by shifting to modern backend technologies: the NoSQL database Apache Cassandra and the compute framework Apache Spark both run, thanks to their distributed design, on a cluster and are therefore scalable for future needs. To become faster, we also rely on improved technical support for work processes.

On the other hand, the shift to the new system architecture gives us the opportunity to realize completely new functionalities and usage scenarios based on them. This includes the analysis of the data, visualizations and data enhancement in the broadest sense.

2. What changes or what advantages for existing and future data partners - culture and knowledge institutions - will result specifically from this new infrastructure? 

In the future, we will above all be faster in processing when importing and updating databases. The new architecture also makes us more transparent for data partners: with “DDBdash” we have developed a web-based administration console that can be used to trigger and monitor certain data clearing processes step by step from the outside – first by the specialist departments of the DDB, later also by our data partners.

This will noticeably accelerate technical processes – even if new data partners will still require a lot of intellectual and manual work. For some of the improvements enabled by the special project “DDB 2017” to take effect, all existing data must also be re-imported once the new software architecture has been commissioned.

DDBdash Views

DDBdash now provides a web-based process control component. This covers, among other things, the provision of data.
DDBdash can be used to transform the provided input data from the different formats and to load them into the different target systems (test system and production system) (Ingest).

3. Apart from the changes in the backend, are there any functional changes in the frontend of the DDB portal that will become visible to the users of the DDB with the activation of the new system?

Yes, there are – especially in four fields: the realization of a completely new Object Viewer, a clear improvement of the search function, the introduction of so-called organization pages and the bundling of the editorial content in the newly developed “DDBjournal”. 

a) The Object Viewer

The new Viewer can now be used for viewing books and other printed or multi-page materials and for using them directly and completely within the DDB portal – including internal navigation, scrolling, zooming, rotation, and more. 

The Viewer has thumbnail (bottom) and tree views (left) as well as functions for scrolling, zooming in and out, and rotating pages.

b) Improvements in the Search

The internal search has been extended by a suggest function, which becomes active when a search query returns few or no hits – for example because of incorrect entries. In addition, thanks to lemmatization, the search now also finds records containing different word forms – especially inflections and compounds.

The example “Frederick the Great” or “Fredrick the Great”

The auto-correction feature gives suggestions for (presumably) misspelled search words - now also for multi-word search.
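
The suggestion behaviour described above can be illustrated with a minimal sketch. It is not the DDB implementation: the vocabulary, the function name `suggest_query` and the similarity cutoff are assumptions chosen for illustration, and Python's standard `difflib` module stands in for whatever the portal's search engine actually uses.

```python
import difflib

# Hypothetical vocabulary of indexed terms; a real search index is far larger.
VOCABULARY = ["frederick", "the", "great", "prussia", "berlin"]


def suggest_query(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a corrected query: each unknown word is replaced by its
    closest known term, provided the similarity reaches the cutoff."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches returns the best candidates first.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)
```

Calling `suggest_query("Fredrick the Great")` handles each word of the multi-word query separately, so only the misspelled word is replaced and words without a close match are left unchanged.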

c) The Organization Pages

The introduction of the organization pages in the DDB portal has moved us a few steps forward in data networking. Like the personal pages that have already been in use for a long time, they are a cross-link between objects and organizations or institutions. These include, in addition to cultural and knowledge facilities that provide data to the DDB, other entities – such as institutions or companies involved in the production, finding or investigation of objects (publishers, industrial companies, research institutions) or institutions and organizations that are the subject of a book.

Example 1: The organization page of the European Parliament

Institutions and organizations associated with DDB objects will now be displayed on organization pages based on the common standard data and their association with the objects.

d) The DDBjournal

With the DDBjournal we bring together the editorial content in the DDB portal with a uniform and completely newly structured access. The editorial articles can be accessed via an overview page as well as via different categories that are displayed in an internal navigation. The “calendar pages”, the personal pages and virtual exhibitions are also anchored in the internal navigation. Additional navigation options include indexing, the “recommended articles” and “most read” features, and the archive. Thus, editorial formats such as “We are the DDB”, “Topic of the Month”, “The Digital Horizon”, news and background articles can be presented more attractively and found faster.

The overview page of the Journal

The websites of the old system are now merged in the DDBjournal, with overview page, separate internal navigation, indexing and archive.

4. Is the new architecture immune from sharing the fate of its predecessor? What is being done to prevent the DDB from reaching performance limits again in a few years?

Performance problems can be triggered by various factors. In addition to the sheer volume of data stored in the DDB, functions newly introduced in the years after the beta launch of the DDB in 2012 also influence the speed of certain data-processing operations. 

For example, in 2013 we extended the search area of the DDB with personal pages. Persons linked in the metadata of a cultural object with a unique identifier have their own personal page in the portal of the DDB. The structure of these personal pages requires automatic verification of all cultural objects for personal identifiers at regular intervals. 

In the previous architecture, we only had one search index to create such a list of all cultural objects – however, a search index is not really suitable for such tasks. In the new architecture, such transactions are performed on a database table, allowing the reduction of the required time from two weeks to two hours. 
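
A minimal sketch may help to show why a table suits this task: every row is read exactly once and the object ids are grouped by person identifier in the same pass. The in-memory list, the row layout and the name `collect_person_links` are assumptions for illustration only; in the actual system this kind of job would run on the cluster rather than in a single Python process.

```python
from collections import defaultdict

# Hypothetical rows of an object table: (object_id, person identifiers).
OBJECT_ROWS = [
    ("obj-1", ["person-a", "person-b"]),
    ("obj-2", []),
    ("obj-3", ["person-a"]),
]


def collect_person_links(rows):
    """Scan every row once and group object ids by person identifier."""
    links = defaultdict(list)
    for object_id, person_ids in rows:
        for person_id in person_ids:
            links[person_id].append(object_id)
    return dict(links)
```

The result maps each person identifier to the objects that reference it, which is the information a personal page needs. Objects without any identifier, such as `obj-2` here, simply contribute nothing.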

At the same time, the database table used for this is smaller by a factor of 1000 than similar tables used by Netflix or Facebook. Thus, there is sufficient room for enhancements in the foreseeable future and the new architecture can be extended accordingly.

To do this, new computers with medium or lower performance only have to be added to the core of the new architecture, the so-called processing cluster. With this “horizontal” scaling, in contrast to “vertical” scaling, which increases the performance of individual computers until the technical limits are reached, the performance problem can be replaced by a financing problem: as long as there are enough funds for new computers to expand the processing cluster, there are virtually no limits to growth.
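
The idea of horizontal scaling can be sketched in a few lines, under simplifying assumptions: records are assigned to nodes by a stable hash, so adding nodes spreads the same data over more machines. The functions `assign_node` and `load_per_node`, the record ids and the node counts are hypothetical, and plain modulo hashing is a simplification; distributed stores typically use more refined schemes such as consistent hashing.

```python
import zlib
from collections import Counter


def assign_node(record_id, node_count):
    """Map a record id to a node index with a stable hash (CRC32 modulo node count)."""
    return zlib.crc32(record_id.encode("utf-8")) % node_count


def load_per_node(record_ids, node_count):
    """Count how many records each node would hold."""
    counts = Counter(assign_node(r, node_count) for r in record_ids)
    return [counts.get(i, 0) for i in range(node_count)]


# Hypothetical data set: the same records on a 4-node and an 8-node cluster.
record_ids = [f"object-{i}" for i in range(10_000)]
small = load_per_node(record_ids, 4)
large = load_per_node(record_ids, 8)
```

Both lists account for all 10,000 records, but in `large` they are split across twice as many nodes, so no single node holds more than the busiest node in `small`. This is the sense in which capacity grows with the number of machines rather than with the power of any one machine.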

5. New usage scenarios usually bring new, different requirements. Which requirements are these and to what extent is the new architecture ready for them?

A lot will change in the coming years in the portal offerings and the API of the DDB.

A focal point will be the analysis of data for a variety of purposes. An important analysis purpose, for example, is to check and improve the quality of metadata in the DDB. Our data partners, who may not have their own capacities for this, can also benefit from this. The analysis results will also improve the order of the search results; a ranking app has already been developed for this as a prototype.

In the future, the DDB will process and store more and more data from more and more external and internal sources. This makes data management an important task and a focus of the DDB. The flexibility of the now modularized apps, together with the possibilities of the storage technologies underlying the new architecture, is an important prerequisite for this.

Along with the DDB portal and the German archive portal (Archivportal-D), further affiliate or subportals will be created on the API side for specific content or target groups, starting with the national newspaper portal. 

The DDB will also be more closely linked to data platforms from other domains, for example as part of existing partnerships with school cloud providers and university teaching and learning platforms. An important aspect of these links is the return flow of usage data and user-generated data into the DDB, such as automatic indexing or classification derived from the use of cultural objects in teaching materials for a particular topic. The new architecture creates the prerequisites necessary for storing and analysing such data.

6. Has the transition to the new system been completely smooth?

The new system has been running stably since the end of May. It fulfils our expectations both functionally and operationally. We started by running the new system in a parallel preview mode – among other things to be able to detect and fix errors and problems.

In the meantime, the shift has taken place – which means that the redesigned DDB system can now be accessed under the known URL.

Should errors be detected, they can be reported to us at: feedback [at] deutsche-digitale-bibliothek.de

7. Materials and downloads 

Additional materials and screenshots are also available on our material page. 

Press release 17.07.2018: Performance, Speed, New Usage Scenarios – Modernisation of Total Architecture Secures Sustainability of the DDB
[PDF] [Text version]  

Please send press inquiries to Astrid Müller (Communication, Press, Marketing Deutsche Digitale Bibliothek): a.mueller [at] hv.spk-berlin.de