An Aegean History and Archaeology Written through Radiocarbon Dates

Context
The Aegean area has so far lagged behind several other parts of Europe and the Mediterranean in not offering any major listing of its considerable radiocarbon record, despite decades of radiocarbon sampling at major sites and worldwide radiocarbon-led debates, such as over the dating of the Santorini eruption (e.g. [1–8]). The dataset provided here is the outcome of a project “An Aegean Prehistory Written in Radiocarbon Dates” and it offers the most complete list so far of published radiocarbon dates from Greece. Some 3159 14C dates from 353 sites located in Greece have been discovered or cross-checked by harmonizing records from several existing radiocarbon databases, searching original publications and checking preliminary reports from both international and Greek sources. The project was designed to complement and enhance wider research agendas considering the interplay between human population, land use and long-term environmental processes, especially a Leverhulme Trust funded project known as “Changing the Face of the Mediterranean” (RPG-2015-031, PI Neil Roberts), for which the current radiocarbon data were used as part of a regional case study paper ([9]; for the special issue, see [10]). In a variety of contexts worldwide, the assessment of radiocarbon date lists as aggregate time series, often via summed probability distributions, has become popular for modelling human population change (e.g. [11]). Their collation with additional archaeological and environmental records, such as pollen cores and site data, has offered further opportunities to detect regional differences in long-term socio-ecological development. Although the original aim of the project was to retrieve as exhaustive a list as possible of published radiocarbon dates in the Aegean area from only the Mesolithic to Iron Age (ca. 12–2 kya), it became evident that the number of published dates covering the area of modern Greece was far lower than expected. Four main reasons are identified for this issue:
- There is a small core of radiocarbon dates, especially for Aegean prehistory, originally published in English and re-used extensively in subsequent attempts by researchers to better define individual chronological subperiod boundaries, period outsets (e.g. the beginning of the Neolithic) or important events (e.g. the Santorini eruption).
- A substantial number of measurements (ca. 17%) come from purely or partially non-anthropic contexts, as part of investigation strategies involving boreholes aimed at reconstructing past environmental or geomorphological conditions.
- In the Greek literature, there has been a marked tendency for archaeologists to report calibrated dates without clear reference to conventional (pre-calibration) radiocarbon ages or supplementary data (e.g. context details).
- Later periods (after ca. 2500 BP) are underrepresented (far fewer dates) due to the lack of an academic tradition of collecting radiocarbon dates for Classical, Medieval and more recent periods of archaeology.
In this respect, the final dataset includes all dates encountered in the literature regardless of research context (archaeological, environmental or material conservation studies) or chronological period. Contextual information regarding the sampling procedure has also been recorded and sites have been identified and located as accurately as possible. Considerable effort has been directed towards cleaning data and refining terminology, with a view to data ingest into the ARIADNEplus portal. As a result, we hope that in terms of both data structure and content, the current date-list aggregate will form an important radiocarbon data reference for Greece and continue to grow through further input and re-use.

Spatial coverage
Description: The dataset covers the area of modern Greece. Figure 1 shows the study area and the distribution of sites with archaeological and environmental samples.

Temporal coverage
Dates range from the Middle/Late Palaeolithic (ca. 60,000 cal. BC) to early modern times.

(2) Methods
The creation of this dataset was only possible due to the growing availability of open data records and relevant digital scholarship [12]. To approach the data collection, we combined secondary sources of already compiled radiocarbon datasets with other available online sources that might be screened and harvested for radiocarbon data.

Steps
More specifically, we have extracted lists of Greek dates from available 14C date lists and databases [13–20]. Archaeology. All the above were checked for all those volumes that were available online up to 2016. Radiocarbon lists were further extracted from several monographs, chapters in edited volumes, websites and site reports. Finally, Y. Maniatis (one of the authors here) provided clarifications for partially published radiocarbon dates (i.e. dates published by archaeologists only as calibrated timespans) from the NCSR Demokritos Radiocarbon Laboratory.

Sampling strategy
In addition to providing basic and alternative lab codes (stored in "LabID" and "OthLabID" respectively), date codes in the radiocarbon dating databases searched ("OtherDateCode"), the conventional (pre-calibration) radiocarbon age ("CRA") and its 1-standard-deviation error ("Error"), we have collected several further data fields per date:
- isotopic fractionation of the stable carbon isotope Carbon-13 (δ13C), which allows clear assessment of fractionation and reservoir effects and also helps in understanding changing water stress across regions and through time ("DC13"),
- other measurements related to the reported data, e.g. Percent Modern Carbon (pMC) ("Oth Measures"),
- notes on the technique/method used to process the sample ("DateMethod"),
- basic information on the sample material ("Material") as well as genus or species level identifications where possible ("Species").
For each date, we report its original publication and all subsequent works referencing it, including online databases or publicly accessible data archives. In this respect, a considerable amount of contextual and supplementary information associated with each date has been included. The collection procedure focused on data quality control, by cross-checking all attributes associated with the radiocarbon dates and addressing possible inconsistencies in the published records. Where conflicting statements (e.g. on sample age or deviation error) were encountered in the sources, we based our final database entry on the most complete and detailed descriptions, on a preference for original publications over compiled secondary sources (e.g. databases), and on a comparison with later (paper) publications to check for possible measurement revisions (e.g. [21]). Problematic cases are reported in the "Problems" field, while alternative measurements, together with links to their respective references, have been included in the "Comments" field; both fields are contained in the "C14Samples" table.
The geographic location of samples has been assigned latitude/longitude coordinates in decimal degrees (fields "Longtitude" and "Latitude") recorded in WGS84 (EPSG:4326). Each location has been coded according to its perceived accuracy using four assessment levels (A: sub-site quality, +/-10 m; B: within +/-1 km; C: moderate accuracy, within Admin Region; D: unknown accuracy, within Country). In cases of large sites (e.g. Knossos) where it was possible to locate samples within smaller or neighbouring research areas (e.g. Unexplored Mansion), this differentiation is also reflected in the "SiteName" field. However, the published dataset includes downgraded location coordinates that have been grouped by site name, in line with looting prevention policies by the Greek Ministry of Culture and Tourism. Researchers wishing to obtain the accurate coordinates can contact the lead author. Current Greek administrative divisions ("AdminRegion" and "Country") have been used to group samples.
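As a minimal sketch of how the four accuracy levels described above might be used when filtering records, the Python snippet below keeps only rows at or above a chosen precision. The field name "LocAccuracy" and the example rows are hypothetical, since the name of the accuracy field is not given here; only the A to D levels themselves come from the text.

```python
# Accuracy levels as described in the text (A is the most precise, D the least).
ACCURACY_LEVELS = {
    "A": "sub-site quality, +/-10 m",
    "B": "within +/-1 km",
    "C": "moderate accuracy, within Admin Region",
    "D": "unknown accuracy, within Country",
}
ORDER = "ABCD"  # precision ranking, best first


def at_least(code, threshold):
    """True if `code` is as precise as, or more precise than, `threshold`."""
    if code not in ACCURACY_LEVELS or threshold not in ACCURACY_LEVELS:
        return False
    return ORDER.index(code) <= ORDER.index(threshold)


def filter_by_accuracy(rows, threshold, field="LocAccuracy"):
    """Keep rows whose accuracy code meets the threshold.

    `field` is a hypothetical column name used only for illustration.
    """
    return [row for row in rows if at_least(row.get(field), threshold)]


# Invented example rows (not taken from the dataset).
example_rows = [
    {"SiteName": "Site 1", "LocAccuracy": "A"},
    {"SiteName": "Site 2", "LocAccuracy": "C"},
    {"SiteName": "Site 3", "LocAccuracy": "B"},
    {"SiteName": "Site 4", "LocAccuracy": "D"},
]
precise_rows = filter_by_accuracy(example_rows, "B")
```

Note that the published coordinates are downgraded, so a filter of this kind is mainly meaningful for the accurate coordinates available on request.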
In terms of contextual information, the records include typological distinctions between site types ("SiteType"), notes on the stratigraphic context of each sample ("SiteContext"), chronological distinctions related to intra-site phasing ("SitePhase") and broader chronological periods related to the sample's cultural context ("CulturalPeriod").

Quality Control
After the completion of the data collection stage we undertook painstaking steps in data cleaning and checking. To resolve data discrepancies, we had to re-visit and cross-check already screened sources. In descriptive fields, such as "SiteContext", readily available information was edited to achieve a standardized contextual notation for each site and to mitigate differences in site context reporting between publications (e.g. Franchthi Cave). In fields where a term list could be established, we tried to standardize entries as much as possible and to map the resulting terms to reference vocabularies or thesauri (see relevant tables). We used the Getty Art and Architecture Thesaurus (AAT) to map terms in the "SiteType", "Material" and "Species" fields, and the Getty Thesaurus of Geographic Names (TGN) to standardize individual entries in the "SiteName" and "AdminRegion" fields. Finally, we employed PeriodO and the Greek Historical Periods Vocabulary (URI: http://semantics.gr/authorities/vocabularies/historical-periods) from the National Documentation Centre (NDC) [22] to further normalize the chronological periods reported in the "CulturalPeriod" field. All mappings are also being made available to ARIADNEplus to enhance data interoperability in the ARIADNEplus portal.

Constraints
In every respect, the dataset remains far from ideal. For example, almost 48% of the dates do not have associated δ13C values, whereas ca. 32% are not associated with any defined chronological period. Radiocarbon dates have been reported in the literature with varying degrees of associated information. Although the most recent publications are generally more detailed, reporting continues to vary significantly between laboratories and individual reports. Although we have placed special emphasis on quality control while moving through the labyrinth of different paper and digital sources, certain constraints inevitably arise from the original data reporting sources. In this regard, it remains up to the user to assess data reliability and proceed with appropriate caution. The data archive to which this paper points should be considered a versioned first release: we encourage all users to inform the lead author of any possible errors and we will seek to produce and update a more dynamic online repository accordingly.
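Completeness figures of the kind quoted above can be recomputed by counting blank entries per field. The following Python sketch illustrates the idea on a handful of invented rows; the field names "DC13" and "CulturalPeriod" are those named in this paper, while the values and lab codes are made up and the treatment of blanks as missing is an assumption about how empty cells are encoded.

```python
def missing_share(rows, field):
    """Fraction of rows whose `field` is absent, None or blank."""
    if not rows:
        return 0.0
    missing = 0
    for row in rows:
        value = row.get(field)
        if value is None or not str(value).strip():
            missing += 1
    return missing / len(rows)


# Invented rows for illustration only (not real measurements).
rows = [
    {"LabID": "XX-0001", "DC13": "-25.1", "CulturalPeriod": "Neolithic"},
    {"LabID": "XX-0002", "DC13": "", "CulturalPeriod": ""},
    {"LabID": "XX-0003", "DC13": None, "CulturalPeriod": "Bronze Age"},
    {"LabID": "XX-0004", "DC13": "-19.8", "CulturalPeriod": "Neolithic"},
]

dc13_missing = missing_share(rows, "DC13")
period_missing = missing_share(rows, "CulturalPeriod")
```

Running the same function over the full C14Samples table would allow users to check per-field coverage before deciding which dates suit their analysis.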

(3) Dataset description
The dataset contains a single tab delimited text file (.txt) of the 14C dates for Greece (C14Samples) plus a BibTeX format bibliography (References). A further tab delimited text file has been included to document the main file fields and their domain values (C14Greece_fields). The project's relational database was originally in MS Access and a version of this has been made available as a SQL dump file containing DDL (Data Definition Language) and DML (Data Manipulation Language) statements for reconstructing the database (C14Greece_dump). Apart from the main C14Samples table, the full relational database contained seven more tables with standardized domain values for the following fields contained in the main table: 1) AdminRegion, 2) CulturalPeriod, 3) Material, 4) SiteName, 5) SiteType, 6) Species, 7) Source. Values from tables 1 and 4 have been mapped to TGN, those of table 2 to PeriodO and NDC chronological periods, and those of tables 3, 5 and 6 to the AAT. Table 7 included the original transcription of the source reference, which eventually resulted in the BibTeX file, but was also maintained in the original database.

Object name
C14Samples.txt - single file (Tab Delimited Text, UTF8 encoding) providing the data for all 14C samples. It corresponds to the original database main table.
C14Greece_fields.txt - single file (Tab Delimited Text, UTF8 encoding) containing field type definitions and domain values for all content included in the project's database.
References.bib - single BibTeX file containing references cited for all 14C samples recorded.
C14Greece_dump.sql - single file containing the main data table and seven additional tables for domain values and references.
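A tab delimited UTF8 file such as C14Samples.txt can be read with standard tools. The Python sketch below is one possible approach, assuming that the first row of the file holds the field names and that "CRA" and "Error" contain plain numbers; the two sample rows and their lab codes are invented stand-ins, not real measurements.

```python
import csv
import io


def read_samples(handle):
    """Parse tab-delimited rows into dicts, converting CRA and Error to floats.

    Assumes a header row with field names; blank cells become None.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for field in ("CRA", "Error"):
            value = (row.get(field) or "").strip()
            row[field] = float(value) if value else None
        rows.append(row)
    return rows


# Invented two-row stand-in for the real file.
sample_text = (
    "LabID\tCRA\tError\tSiteName\n"
    "XX-0001\t7000\t50\tExample Site\n"
    "XX-0002\t5200\t\tSector Φ\n"
)
samples = read_samples(io.StringIO(sample_text))

# For the real file, something along these lines should work:
# with open("C14Samples.txt", encoding="utf-8", newline="") as fh:
#     samples = read_samples(fh)
```

Opening the file explicitly as UTF-8 matters here because context descriptions from Greek sources may contain Greek characters.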

Dataset Creators
The researcher responsible for data entry was Markos Katsianis. Online radiocarbon listings and literature sources were provided by Andrew Bevan, who supervised the data recording and standardisation process. Records were restructured, cleaned and standardized by Giorgos Styliaras. Yannis Maniatis provided additional records from the NCSR Demokritos Radiocarbon Laboratory and helped solve discrepancies between entries. Terminology mappings were performed by Giorgos Styliaras and Markos Katsianis.

Language
English. Greek literature has been included in modern Greek. Site context descriptions from Greek sources may contain Greek characters (e.g. Sector Φ).

(4) Reuse potential
This dataset comprises the largest single collection so far of radiocarbon data for the Aegean region, covering the equivalent area of modern Greece. It provides a comprehensive resource for accessing detailed chronological data from specific sites or wider regions within Greece. Sites have been located to the highest possible accuracy and, although their coordinates have been downgraded to a radius of ca. 500 m to discourage illegal uses, their positions can still be used in archaeological site mappings. The creation and circulation of radiocarbon databases of this kind follows wider efforts to share openly licensed, georeferenced large-scale datasets in archaeology and beyond. On a broader level, the dataset allows the Greek radiocarbon listings to be added to data collections of other kinds from the Aegean region and to be used in comparative agendas of greater geographical scope. One key reuse potential relates to the use of aggregate lists of anthropogenic radiocarbon data as a proxy for human population change, for example via summed probability distributions (SPDs). Further potential might relate to enhanced interpretation of Aegean prehistoric archaeological sequences at regional levels or to chronological comparison between different regions. Also, the juxtaposition of large lists of radiocarbon dates with other scientific data, such as pollen cores, macrobotanical remains, skeletal assemblages or archaeological settlement survey datasets, offers further opportunities to approach long-term socio-environmental trajectories and questions.
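To give a rough idea of the summing step behind an SPD, the Python sketch below adds up one normal distribution per date, built from its "CRA" and "Error" values, on a grid of uncalibrated 14C years BP. This is a deliberate simplification and not the method used in the studies cited: a proper SPD sums calibrated probability distributions obtained with a calibration curve, normally using dedicated software. The three dates are invented.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def summed_distribution(dates, start, stop, step=10):
    """Sum one normal density per (cra, error) pair over a regular grid.

    The grid runs from `start` to `stop` inclusive, in uncalibrated
    14C years BP. No calibration is applied (illustrative only).
    """
    grid = list(range(start, stop + 1, step))
    summed = [0.0] * len(grid)
    for cra, error in dates:
        for i, year in enumerate(grid):
            summed[i] += normal_pdf(year, cra, error)
    return grid, summed


# Invented (CRA, Error) pairs, not taken from the dataset.
dates = [(7000, 50), (7050, 60), (5200, 40)]
grid, summed = summed_distribution(dates, 4500, 7500, step=10)

# Grid year with the highest summed density.
peak_year = grid[summed.index(max(summed))]
```

In this toy case the two overlapping dates around 7000 BP reinforce each other, which is the basic intuition behind reading peaks in an SPD as periods of denser dated activity.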