CONVENTIONS
Filenames
Use only alphanumeric characters in file and directory names. Avoid special characters such as quotes, punctuation marks, characters with diacritics, spaces, and slashes. Underscores (_) and hyphens (-) are allowed. For further guidance, see the recommendations from IANUS.
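The naming rule above can be checked mechanically. The following is a minimal sketch, not an official ARCHE tool: the function names `is_safe_name` and `suggest_name` are hypothetical, and allowing dots to separate a file extension is an assumption, since the rule itself only mentions alphanumeric characters, underscores, and hyphens.

```python
import re
import unicodedata

# Assumption: ASCII letters, digits, "_" and "-" are allowed, plus
# dot-separated extensions such as ".tif" (not stated in the guideline).
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)*")


def is_safe_name(name: str) -> bool:
    """Return True if the whole name matches the allowed character set."""
    return _SAFE_NAME.fullmatch(name) is not None


def suggest_name(name: str) -> str:
    """Propose a cleaned-up name; re-check the result with is_safe_name."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace runs of whitespace with a single underscore.
    underscored = re.sub(r"\s+", "_", ascii_only.strip())
    # Drop every remaining character outside the allowed set.
    return re.sub(r"[^A-Za-z0-9_.-]", "", underscored)
```

A name such as `site_plan-01.tif` passes the check, while a name containing a space or an umlaut does not and would need to be renamed before deposition.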
PREFERENCES
Formats
You are strongly encouraged to provide the resources in standard formats adopted by the respective research communities. We will support you in converting the data if this is necessary and feasible.
Suitable formats should be widely used and, if possible, comply with open and non-proprietary standards. Files should not be password protected, encrypted, or compressed in a lossy way. If files depend on references to other files, fonts, or other external data, these objects should be deposited as well, or at least described, e.g. in a plain-text README file. Whenever you can choose the encoding, use UTF-8 without the byte order mark (BOM) (see [FAQ]).
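Whether a text file meets the encoding recommendation can be tested before deposition. The sketch below uses only the Python standard library; the function names are hypothetical and the check is illustrative, not part of the ARCHE ingest workflow.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 byte order mark, if any."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def is_utf8_without_bom(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 and do not start with a BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

To apply it to a file, read the file in binary mode (`open(path, "rb").read()`) and pass the resulting bytes to either function.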
If file conversions become necessary, potential loss of information should be minimised. If lossless conversion into an open or recommended format cannot be achieved, the original files will be kept together with the converted versions.
The preferred format for annotated textual data in our repository is TEI/XML (Text Encoding Initiative) with metadata in teiHeaders. Additionally, all language resources have to be described in CMDI (Component Metadata Infrastructure), automatically generated based on the ARCHE metadata. For an overview of recommended standard formats, please consult the CLARIN standards recommendations.
For other formats which are not covered in the CLARIN standards, for general text formats, and media formats, please refer to the table for preferred and accepted formats provided by ARCHE. The table is based on the formats listed at IANUS and at the Archaeology Data Service.
Preferred and accepted formats in ARCHE (08. 2017). Preferred formats are suitable for long-term preservation. Accepted formats require conversion.
EXTENSION | FORMAT NAME & VERSION | PREFERENCE |
---|---|---|
DATASET | ||
csv | Comma-Separated Values | preferred |
dbf | dBase database file | accepted |
siard | Software Independent Archiving of Relational Databases | preferred |
xml | eXtensible Markup Language | preferred |
fp5, fp7, fmp12 | FileMaker Databases | accepted |
bak | binary export formats for databases | accepted |
accdb | Microsoft Access Databases | accepted |
db | binary export format for databases | accepted |
dmp | binary export formats for databases | accepted |
json | JavaScript Object Notation | accepted |
mdb | Microsoft Access Databases | accepted |
odb | Open Document Databases | accepted |
xls | Microsoft Excel | accepted |
ods | Open Document Format | preferred |
sql | Structured Query Language | preferred |
tsv | Tab Separated Values | preferred |
xlsx | Office Open XML Workbook (Microsoft) | preferred |
IMAGE | ||
dxf | Drawing Interchange Format (Autodesk) | accepted |
jpg, jpeg | Joint Photographic Expert Group | accepted |
png | Portable Network Graphics | accepted |
svg | Scalable Vector Graphics 1.1, uncompressed | preferred |
tif, tiff | Baseline TIFF v. 6, uncompressed | preferred |
ai, indd | Adobe Illustrator, Adobe InDesign | accepted |
bmp | Bit-Mapped Graphics Format (Microsoft) | accepted |
cgm | Computer Graphics Metafile, WebCGM | accepted |
cpt | CorelPaint | accepted |
dwf | Design Web Format (Autodesk) | accepted |
dwg | Drawing (Autodesk) | accepted |
eps, ps | PostScript, Encapsulated PostScript | accepted |
jp2, jpx | JPEG2000 | accepted |
psd | Photoshop (Adobe) | accepted |
dng | Adobe Digital Negative | preferred |
AUDIO / VIDEO | ||
gif | Graphics Interchange Format | accepted |
mkv | Matroska | preferred |
aac | Advanced Audio Coding | accepted |
mp4 | MP4 | accepted |
aiff | Audio Interchange File Format | accepted |
asf | Advanced Systems Format (ASF/WMV) | accepted |
avi | Audio Video Interleave | accepted |
f4v | Flash | accepted |
mj2 | Motion JPEG 2000 | accepted |
mov | QuickTime File Format | accepted |
mp3 | MP3 | accepted |
mp4 | MPEG-4 | accepted |
mpeg | MPEG-2 | accepted |
mxf | Material eXchange Format | accepted |
ogg, ogm, ogv, ogx, spx | Ogg | accepted |
wav | RF64/MBWF | accepted |
wma | Windows Media Audio | accepted |
wmv | Advanced Systems Format (ASF/WMV) | accepted |
bwf | Broadcast Wave Format | preferred |
flac | Free Lossless Audio Codec | preferred |
wav | Waveform Audio File Format | preferred |
TEXT DOCUMENTS | ||
html | HyperText Markup Language | preferred |
pdf | other PDF variants | preferred |
txt | Plain Text | preferred |
doc | Microsoft Word | accepted |
maff | Mozilla Archive Format | accepted |
rtf | Rich Text Format | accepted |
sxc | Open Office XML | accepted |
docx | Office Open XML Document (Microsoft) | preferred |
dtd | Document type definition | preferred |
htm, html | HyperText Markup Language | preferred |
mht, mhtml | MIME Encapsulation of Aggregate HTML Documents | preferred |
odt | Open Document Format | preferred |
sgml | Standard Generalized Markup Language | preferred |
warc | WebArchive | preferred |
xht, xhtml | Extensible HyperText Markup Language | preferred |
xsd | XML Schema definition | preferred |
3D DATA | ||
obj | Wavefront .obj file | preferred |
ply | Polygon File Format, Stanford Triangle Format | preferred |
x3d | eXtensible 3D Graphics | preferred |
stl | Standard Tessellation Language | accepted |
u3d | Universal 3D Format | accepted |
vrml | Virtual Reality Modeling Language | accepted |
dae | COLLADA | preferred |
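
A table like the one above lends itself to a simple lookup that flags files needing conversion. The sketch below is illustrative only: the dictionary copies a handful of rows from the table rather than the full list, and the names `FORMAT_PREFERENCE` and `preference_for` are hypothetical.

```python
from pathlib import PurePath

# A small excerpt of the table above; extend it as needed.
FORMAT_PREFERENCE = {
    "csv": "preferred",
    "xls": "accepted",
    "tif": "preferred",
    "tiff": "preferred",
    "jpg": "accepted",
    "jpeg": "accepted",
    "flac": "preferred",
    "mp3": "accepted",
    "txt": "preferred",
    "doc": "accepted",
}


def preference_for(filename: str):
    """Return 'preferred', 'accepted', or None if the extension is not listed."""
    # Compare extensions case-insensitively and without the leading dot.
    extension = PurePath(filename).suffix.lower().lstrip(".")
    return FORMAT_PREFERENCE.get(extension)
```

Files for which the lookup returns `accepted` would require conversion, and a result of `None` means the extension is not covered by this excerpt and should be checked against the full table.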
WHO, WHAT, WHEN, HOW
Metadata
Metadata should answer basic questions about your data so that others can understand, discover, and share it. Good metadata provides information about how the data was produced, who was involved in its creation, and what the data is about. Using metadata is an essential part of complying with the FAIR Data Principles, which aim to make data Findable, Accessible, Interoperable, and Reusable (see [FAQ]).
Metadata can cover different levels, such as collection-level, file-level, and even data unit-level description. Ideally, metadata is recorded accurately and as completely as possible, making use of a standard format. The Archaeology Data Service and IANUS provide format-agnostic collection-level metadata which can be applied to all types of domains. Additionally, the respective sections of IANUS’ IT-Empfehlungen present file-level metadata, which in general is more technical and depends heavily on the data type and the methods used.
The metadata required when depositing in ARCHE is explained in the table for metadata requirements. At ARCHE, additional project-level metadata is used alongside collection-level and file-level metadata. Mandatory fields required by ARCHE are marked as such, but using recommended fields is essential for increased findability, understandability, and citability of data. ARCHE’s OWL-format metadata schema is annotated with extensive documentation and is also available in a tabular representation.
Properties are listed for projects, collections and resources.
m = mandatory, r = recommended, o = optional, and * = property can be used multiple times.
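The m / r / o / * notation can be read as a small set of checks on a metadata record. The following sketch shows one way to express it. The property names (`title`, `description`, `contributor`), the `SCHEMA` dictionary, and the function `check_record` are all hypothetical placeholders and do not reproduce the actual ARCHE schema.

```python
# Hypothetical schema: property name -> (level, repeatable).
# level is "m" (mandatory), "r" (recommended) or "o" (optional);
# repeatable corresponds to the "*" marker.
SCHEMA = {
    "title": ("m", False),
    "description": ("r", False),
    "contributor": ("o", True),
}


def check_record(record: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means none were found.

    The record maps each property name to a list of values.
    """
    problems = []
    for prop, (level, repeatable) in schema.items():
        values = record.get(prop, [])
        if level == "m" and not values:
            problems.append(f"missing mandatory property: {prop}")
        elif level == "r" and not values:
            problems.append(f"missing recommended property: {prop}")
        if len(values) > 1 and not repeatable:
            problems.append(f"property used more than once: {prop}")
    return problems
```

In practice a missing mandatory property would block a deposit, whereas a missing recommended property would only be worth a warning. The sketch reports both in the same list for brevity.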
GUIDANCE
Important Information