Filenames, Formats and Metadata


Use only alphanumeric characters in file names. Avoid special characters such as quotes, punctuation marks, characters with diacritics, spaces and slashes. Underscores (_) and hyphens (-) are allowed. For further guidance, see the recommendations from IANUS.
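The naming rule above can be checked automatically before deposit. The following is a minimal sketch, not an official ARCHE tool. It assumes that "alphanumeric" means ASCII letters and digits, and that a single dot before the file extension is tolerated. The function name is_safe_filename is hypothetical.

```python
import re

# Assumption: only ASCII letters, digits, '_' and '-' are allowed in the name,
# followed by at most one ".extension" made of ASCII letters and digits.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def is_safe_filename(name: str) -> bool:
    """Return True if the whole file name matches the pattern above."""
    return _SAFE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["field_notes-2017.txt", "Grabungsfoto (1).jpg", "straße.txt"]:
        print(candidate, is_safe_filename(candidate))
```

Names with spaces, parentheses, diacritics or path separators are rejected. Names with several dots (for example archive.tar.gz) are also rejected under this strict reading; relax the pattern if that is too narrow for your data.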


You are strongly encouraged to provide the resources in standard formats acknowledged by the respective research communities. We will support you in converting the data if this is necessary and feasible.

Suitable formats should be widely used and, if possible, comply with open and non-proprietary standards. Files should not be password protected, encrypted or compressed in a lossy way. If files depend on references to other files, fonts or other external data, these objects should be deposited as well, or at least described, for example in a plain text README file. Whenever you can choose the encoding, choose UTF-8 without the byte order mark (BOM) (see FAQ).
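Whether a text file is UTF-8 without a BOM can be verified with a few lines of Python. This is a minimal sketch using only the standard library. The function names is_utf8_without_bom and strip_utf8_bom are hypothetical and not part of any ARCHE tooling.

```python
import codecs


def is_utf8_without_bom(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 and does not start with a BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data with a leading UTF-8 BOM removed, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

To check a file, read it in binary mode, for example with pathlib.Path("notes.txt").read_bytes() (the file name is a placeholder), and pass the result to is_utf8_without_bom. Note that a successful decode does not prove the file was written as UTF-8; it only shows that the bytes are valid UTF-8.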

If file conversions become necessary, potential loss of information should be minimised. If lossless conversion into an open or recommended format cannot be achieved, the original files will be kept together with the converted versions.

The preferred format for annotated textual data in our repository is TEI/XML (Text Encoding Initiative) with metadata in the teiHeader. Additionally, all language resources have to be described in CMDI (Component Metadata Infrastructure). We will gladly support you in creating this metadata. For an overview of recommended standard formats, see the CLARIN standards recommendations.

For formats not covered by the CLARIN standards, including general text formats and media formats, refer to our table of preferred and accepted formats. The table is based on the formats listed at IANUS and at the Archaeology Data Service.

Preferred and accepted formats in ARCHE (August 2017). Each row lists the file extension(s), the format name and the status. Preferred formats are suitable for long-term preservation. Accepted formats require conversion.

  pdf PDF/A-1 preferred
  pdf PDF/A-2 preferred
  pdf PDF/A-3 accepted
  pdf other PDF variants accepted
  odt Open Document Format preferred
  docx Office Open XML Document (Microsoft) preferred
  doc Microsoft Word accepted
  rtf Rich Text Format accepted
  sxw OpenOffice XML accepted
  txt plain text preferred
  xml eXtensible Markup Language preferred
  sgml Markup text preferred
  html, htm HyperText Markup Language preferred
  dtd document type definition preferred
  xsd xml schema definition preferred
  tiff, tif Baseline TIFF v. 6, uncompressed preferred
  dng Adobe Digital Negative preferred
  png Portable Network Graphics accepted
  jpeg, jpg Joint Photographic Experts Group accepted
  gif Graphics Interchange Format accepted
  bmp Bit-Mapped Graphics Format (Microsoft) accepted
  psd Photoshop (Adobe) accepted
  cpt CorelPaint accepted
  jp2, jpx JPEG2000 accepted
  svg Scalable Vector Graphics 1.1, uncompressed preferred
  cgm Computer Graphics Metafile, WebCGM accepted
  dxf Drawing Interchange Format (Autodesk) accepted
  dwg Drawing (Autodesk) accepted
  ps, eps PostScript, Encapsulated PostScript accepted
  ai, indd Adobe Illustrator, Adobe InDesign accepted
  dwf Design Web Format (Autodesk) accepted
  csv Comma Separated Values preferred
  tsv Tab Separated Values preferred
  ods Open Document Format preferred
  xlsx Office Open XML Workbook (Microsoft) preferred
  sxc OpenOffice XML accepted
  xls Microsoft Excel accepted
  siard Software Independent Archiving of Relational Databases preferred
  sql Structured Query Language preferred
  json JavaScript Object Notation accepted
  mdb, accdb Microsoft Access Databases accepted
  fp5, fp7, fmp12 FileMaker Databases accepted
  dbf dBase Databases accepted
  bak, db, dmp binary export formats for databases accepted
  odb Open Document Databases accepted
  mkv Matroska preferred
  mj2 Motion JPEG 2000 accepted
  mp4 MPEG-4 accepted
  mxf Material eXchange Format accepted
  mpeg MPEG-2 accepted
  avi Audio Video Interleave accepted
  mov QuickTime File Format accepted
  asf, wmv Advanced Systems Format (ASF/WMV) accepted
  ogg, ogv, ogx, ogm, spx Ogg accepted
  flv, f4v Flash accepted
  flac Free Lossless Audio Codec preferred
  wav Waveform Audio File Format preferred
  bwf Broadcast Wave Format preferred
  wav RF64/MBWF accepted
  aac, mp4 Advanced Audio Coding/MP4 accepted
  mp3 MP3 accepted
  aiff Audio Interchange File Format accepted
  wma Windows Media Audio accepted
  x3d eXtensible 3D Graphics preferred
  dae COLLADA preferred
  obj Wavefront .obj file preferred
  ply Polygon File Format preferred
  vrml Virtual Reality Modeling Language accepted
  u3d Universal 3D Format accepted
  stl Standard Tessellation Language accepted
  xhtml, xht Extensible HyperText Markup Language preferred
  mht, mhtml MIME Encapsulation of Aggregate HTML Documents preferred
  warc WebArchive preferred
  maff Mozilla Archive Format accepted
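
For a quick pre-deposit check, the table above can be turned into a lookup by file extension. The sketch below is illustrative only: it contains a small subset of the rows, and the names FORMAT_STATUS and format_status are hypothetical. Extensions whose status depends on the variant (for example pdf, wav, mp4 or tif) are left out on purpose, because the extension alone does not decide between preferred and accepted.

```python
from pathlib import PurePath

# Subset of the table above; extend as needed. Variant-dependent
# extensions such as "pdf" or "wav" are deliberately not included.
FORMAT_STATUS = {
    "odt": "preferred",
    "txt": "preferred",
    "xml": "preferred",
    "csv": "preferred",
    "flac": "preferred",
    "doc": "accepted",
    "jpg": "accepted",
    "jpeg": "accepted",
    "xls": "accepted",
    "mp3": "accepted",
}


def format_status(filename: str) -> str:
    """Return 'preferred', 'accepted' or 'not listed' based on the file extension."""
    extension = PurePath(filename).suffix.lstrip(".").lower()
    return FORMAT_STATUS.get(extension, "not listed")
```

A result of "not listed" only means the extension is missing from this sample dictionary, not that the format is rejected by the repository.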


Metadata should answer basic questions about your data so that others can understand, discover and share it. Good metadata provides information about how the data was produced, who was involved in producing it and what the data is about. Using metadata is an essential part of complying with the FAIR Data Principles, which aim to make data Findable, Accessible, Interoperable, and Reusable (see FAQ).

Metadata can cover different levels, such as collection level, file level and even data unit level. Ideally, metadata is recorded accurately and as completely as possible, using a standard format. The Archaeology Data Service and IANUS provide format-agnostic collection-level metadata that can be applied to all types of domains. In addition, the respective sections of IANUS’ IT-Empfehlungen present file-level metadata, which is generally more technical and depends heavily on the data type and the methods used.

The metadata required when depositing in ARCHE is detailed in the table of metadata requirements. In addition to collection-level and file-level metadata, ARCHE also uses project-level metadata. Mandatory fields are marked as such, but filling in the recommended fields is essential for making data easier to find, understand and cite. The ARCHE metadata schema is also available in OWL format, annotated and extensively documented; a tabular representation of it exists as well.

Properties are listed for projects, collections and resources.
m = mandatory, r = recommended, o = optional, and * = property can be used multiple times.