ARCHE’s underlying system, initially based on the open-source repository software Fedora Commons version 4, was completely reworked in 2020 and is now based on a bespoke software stack. All data, stable identifiers (PIDs), and the functionality of both the user interface and the APIs that expose data to external applications were preserved.
Resources published via the repository are assigned a Handle-based persistent identifier issued by the PID service run by the GWDG. By using these PIDs, we ensure that the resources remain referenceable and citable even if their actual location or the underlying repository system should change in the future.
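As a minimal sketch of how a Handle-based PID stays resolvable independently of the repository location, the following Python snippet turns a Handle into a URL on the public Handle proxy (hdl.handle.net). The function name and the sample Handle are hypothetical and serve only as an illustration; they are not taken from ARCHE.

```python
from urllib.parse import quote

# Public proxy of the Handle System; it redirects to the current location.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(handle: str) -> str:
    """Turn a Handle of the form 'prefix/suffix' into a resolvable proxy URL."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a valid Handle: {handle!r}")
    # Percent-encode anything that is not safe inside a URL path.
    return HANDLE_PROXY + quote(prefix, safe=".") + "/" + quote(suffix, safe="/")


if __name__ == "__main__":
    # Hypothetical Handle, used only to show the shape of the result.
    print(handle_to_url("12345/EXAMPLE-0000-0000"))
```

Because citations carry only the Handle, the target behind the proxy can be updated when a resource moves, and existing references keep working.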
The system runs in a Docker environment on one of the virtual machines hosted on dedicated servers maintained by the Computing Centre of the Academy (ARZ). A multi-layered backup strategy protects the data against loss in an emergency (see Storage Procedures).
The technological development of the repository infrastructure is a continuous process, which is driven by the qualified staff of ACDH-CH and supported by the Academy’s computing centre. Furthermore, the team is embedded in a broad network of data centres via the research infrastructures CLARIN and DARIAH as well as the working group Datenzentren of the DHd alliance.
Software Stack
The system is built in a modular, service-oriented manner, consisting of multiple interconnected components that communicate through well-defined APIs. The full code is available on GitHub. Detailed technical documentation can be found under https://acdh-oeaw.github.io/arche-docs.
The main software stack, implemented in PHP, consists of the following components:
- arche-core provides the REST API for CRUD operations and transaction support. Writing to the repository is only possible through this component. Its REST API is documented at https://app.swaggerhub.com/apis/zozlak/arche.
- arche-doorkeeper implements ACDH-CH-specific business logic. It integrates with arche-core using arche-core’s handlers system.
- arche-resolver is the service for handling the URI namespace in use (in our case https://id.acdh.oeaw.ac.at). It resolves URIs against identifiers in the repository and provides redirection to proper dissemination methods.
- arche-oaipmh provides an OAI-PMH endpoint for the repository.
- arche-core-gui is ARCHE's graphical user interface for browsing its content as well as the API endpoint for other tools like the metadata editor. This part of the system is based on Drupal and makes use of some of its features, such as multilingual support and static pages.
- arche-core-gui-api is ARCHE's graphical user interface API endpoint.
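As an illustration of how an external client might address the OAI-PMH endpoint provided by arche-oaipmh, the sketch below builds request URLs from the verbs defined by the OAI-PMH 2.0 protocol. The base URL and the helper name are assumptions made for this example; the actual endpoint address is not stated here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; replace with the real OAI-PMH base URL.
OAI_BASE_URL = "https://repository.example.org/oaipmh/"

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    return OAI_BASE_URL + "?" + urlencode({"verb": verb, **arguments})


if __name__ == "__main__":
    # Request all records in the mandatory Dublin Core format.
    print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
```

A harvester would fetch such URLs over HTTP and parse the XML responses; that part is left out to keep the sketch independent of any network access.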
A reference deployment of the components is provided by:
- arche-docker, a Docker image providing the runtime environment for the ARCHE software stack.
- arche-docker-config, a set of example configuration settings.
Client Libraries
To ease the development of ARCHE components and client software, a set of client libraries is provided. Documentation of the libraries is available at https://acdh-oeaw.github.io/arche-docs.
- arche-lib provides a convenient and uniform PHP API for repository search and CRUD operations. The API can communicate with the repository either through the REST API provided by arche-core (with support for both read and write operations) or through direct database access, which is limited to read-only operations. The REST mode is aimed at external clients. Direct database access is used by the internal repository components arche-doorkeeper, arche-resolver, arche-oaipmh and arche-gui.
- arche-lib-schema provides object mappings for the ACDH-CH ontology. It is used by arche-doorkeeper and arche-gui.
- arche-lib-disserv provides a PHP API for handling of dissemination services, i.e. matching a repository resource with proper dissemination services and creating a redirection URL. It is used by arche-resolver and arche-gui.
- arche-lib-ingest provides a high-level PHP API for ingesting data into ARCHE. To this end RDF graphs with metadata are parsed and ingested along with files from a given directory. The application is used by curators to ingest data into ARCHE.
Dissemination Services
The repository hosts a wide variety of data types. While it provides a uniform default view on all the collections and digital objects with metadata and description, it also integrates smoothly with a growing set of specialised web applications designed to process and visualise specific data types and formats. These dissemination services run independently of the repository proper and are dynamically registered to be applied to certain types of data in the repository, so that, for example, a TEI (or any other XML) document can be rendered and viewed as HTML, geographical data plotted on a map, or graph-based data visualised as an interactive network.
The binding of resources to dissemination services is configured dynamically, based on characteristics of the resources. It is primarily governed by the format of the resources, but the mechanism allows flexible matching on any metadata property.
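The matching idea can be sketched as follows. This is a simplified illustration, not the arche-lib-disserv implementation: the class, the property names, the service names and the URLs are all hypothetical. Each service lists the metadata values it requires, and a resource is bound to every service whose requirements it satisfies.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class DisseminationService:
    """Hypothetical description of a registered dissemination service."""
    name: str
    url_template: str  # "{id}" is replaced by the encoded resource identifier
    required: dict = field(default_factory=dict)  # property -> required value


def matching_services(resource: dict, services: list) -> list:
    """Return the services whose required properties all match the resource."""
    return [
        service
        for service in services
        if all(resource.get(prop) == value for prop, value in service.required.items())
    ]


def redirect_url(resource: dict, service: DisseminationService) -> str:
    """Create the redirection URL for a resource and a matched service."""
    return service.url_template.format(id=quote(resource["id"], safe=""))


# Hypothetical registrations: one bound by format only, one by format and class.
SERVICES = [
    DisseminationService(
        "tei2html", "https://tei2html.example.org/?id={id}",
        {"format": "application/tei+xml"},
    ),
    DisseminationService(
        "map", "https://map.example.org/?id={id}",
        {"format": "application/geo+json", "class": "Resource"},
    ),
]

if __name__ == "__main__":
    resource = {"id": "https://example.org/id/123", "format": "application/tei+xml"}
    for service in matching_services(resource, SERVICES):
        print(service.name, redirect_url(resource, service))
```

Keeping the requirements as plain property/value pairs means that a new service can be registered without changing the matching code, which mirrors the dynamic registration described above.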
Storage Procedures
Data storage and backup procedures are essential to our data management system. Redundancy is the key to preserving data and avoiding loss due to deterioration of physical storage, malicious threats or other emergencies. The primary server storage uses a RAID-6 configuration, which sustains read and write operations with up to two concurrent disk failures.
Our backup policy follows a multi-layered setup: a full metadata dump is taken daily and kept for a 7-day retention period. Every week, a full metadata dump and an incremental binaries dump are taken and stored indefinitely on a NetApp network storage provided by the ARZ. The NetApp storage has a mirror in a second location.
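The retention rule described above can be expressed as a small decision function. This is only a sketch under stated assumptions: the function and the "daily"/"weekly" labels are hypothetical, and a daily dump is assumed to expire once it is 7 or more days old.

```python
from datetime import date, timedelta

# Retention period for daily metadata dumps, as stated in the policy.
DAILY_RETENTION = timedelta(days=7)


def keep_dump(kind: str, taken_on: date, today: date) -> bool:
    """Decide whether a dump is still retained on the given day."""
    if kind == "weekly":
        # Weekly dumps are stored indefinitely.
        return True
    if kind == "daily":
        # Assumption: a daily dump is dropped once it is 7 or more days old.
        return today - taken_on < DAILY_RETENTION
    raise ValueError(f"unknown dump kind: {kind!r}")


if __name__ == "__main__":
    today = date(2024, 1, 8)
    print(keep_dump("daily", date(2024, 1, 2), today))   # 6 days old
    print(keep_dump("daily", date(2024, 1, 1), today))   # 7 days old
    print(keep_dump("weekly", date(2020, 1, 1), today))  # kept indefinitely
```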