(1) Overview

Context

The present dataset was created under the scope of the Leverhulme Trust funded project “Changing the Face of the Mediterranean” (grant number: RPG-2015-031, PI Neil Roberts) as a part of two regional case studies ([1, 2], for the special issue see [3]) and of the ERC project “CLASS – Climate, Landscape, Settlement and Society: Exploring Human-Environment Interaction in the Ancient Near East” (grant number: 802424, PI Dan Lawrence). The two projects explore trends in human-environment interactions in the Mediterranean basin and the Near East, respectively, between the Late Pleistocene and the Late Holocene. In both projects, long-term demographic trends have been compared with climatic shifts and vegetation change in order to highlight possible causal relationships between them.

In this context, large lists of radiocarbon dates from archaeological contexts can be calibrated and counted up (summed in the manner of a histogram) as a proxy for population, based on the assumption that the more people living in a given region, the more the archaeological deposits, the more organic materials, and the more radiocarbon samples collected and dated. As a consequence, summed probability distributions (SPDs) of calibrated radiocarbon dates have become particularly popular for inferring demographic trends in prehistory and for assessing population response to climate shifts and the anthropic impact on landscapes [4, 5, 6, 7, 8, 9, 10]. The radiocarbon dates provide a better chronological resolution than archaeo-demographic proxies based on typo-chronological schemes (e.g., raw site count, aggregated estimated settlement size, number of burials, count of potsherds), where reliance on relatively long-lived artefactual data types means phases may span several centuries. Hence, comparing radiocarbon-inferred demographic trends with palaeoclimatic records and fossil pollen cores offers further opportunities to understand long-term socio-ecological trajectories.
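The summing step described above can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption: each date is treated as a normal distribution centred on its radiocarbon age, as if the calibration curve were the identity line. A real analysis would first calibrate each date against a curve such as IntCal20. The function names and the three sample dates are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def toy_spd(dates, start_bp, end_bp):
    """Sum per-date densities over a 1-year grid from start_bp down to end_bp.

    dates: list of (age_bp, error) pairs.
    ASSUMPTION: identity calibration curve (calendar age == radiocarbon age),
    used only to keep the sketch self-contained.
    Returns (grid, summed), two lists of equal length.
    """
    grid = list(range(start_bp, end_bp - 1, -1))
    summed = [0.0] * len(grid)
    for age, error in dates:
        for i, year in enumerate(grid):
            summed[i] += normal_pdf(year, age, error)
    return grid, summed

# Hypothetical dates: two close together, one isolated.
dates = [(9000, 50), (9050, 60), (7000, 40)]
grid, spd = toy_spd(dates, 10000, 6000)
peak_year = grid[spd.index(max(spd))]
print(peak_year)
```

Because the curve is left unnormalised, each date contributes an area of about one, so the two overlapping dates produce a higher peak than the isolated one. This is the sense in which more dates in a time window are read as a stronger population signal.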

Although radiocarbon dates from the Near East are available in some extant digital repositories and databases, these resources face several challenges, such as inconsistent chronological and geographical coverage, missing or inaccurate spatial location, missing updates from recent publications, lack of interoperable file formats, and unspecified licenses. These issues hamper the interoperability and the reuse of radiocarbon dates for comparative studies and do not adhere to the FAIR (findable, accessible, interoperable, reusable) principles for scientific data management and stewardship [11]. Comparisons of long-term demographic trends across the whole Near East require a synthetic dataset of radiocarbon dates that is collated under a single protocol and easily accessible to a wider audience.

Here we provide the largest list so far of published radiocarbon dates for the whole Near East from the Late Pleistocene to Late Holocene between 15,000 and 1,500 cal. yr. BP. A total of 11,027 radiocarbon dates from 1,023 archaeological sites have been collated and harmonised from extant online digital repositories, original digital and print publications and archaeological excavation reports (Figure 1). The NERD dataset is hosted on the collaborative online platform GitHub, where registered users can contribute to the content of the original repository. NERD therefore offers a novel dynamic resource that can be updated at any time with new radiocarbon dates without the restrictions and limitations of traditional static databases.

Figure 1 

Map of the Near East showing the spatial distribution of radiocarbon samples.

Spatial coverage

Description: The dataset covers the whole Near East, an area of approximately 5,900,000 sq. km (Figure 1). The radiocarbon dates are distributed over the following countries: Armenia, Azerbaijan, Bahrain, Cyprus, Egypt, Georgia, Israel, Iraq, Jordan, Kuwait, Palestine, Russia, Saudi Arabia, Syria, Turkey, United Arab Emirates, and Yemen.

The decimal degree coordinates of the minimum-bounding box are given in the geographic coordinate system WGS 84.

Northern boundary: + 42.7400

Southern boundary: + 13.0204

Eastern boundary: + 61.3377

Western boundary: + 25.7084
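As a quick sanity check, the bounding box above can be used to test whether a coordinate pair falls inside the area covered by the dataset. The following Python sketch uses the four boundary values given in the text; the function name and the two test points are hypothetical.

```python
# Bounding box from the text (decimal degrees, WGS 84).
NORTH, SOUTH = 42.7400, 13.0204
EAST, WEST = 61.3377, 25.7084

def in_bounding_box(longitude, latitude):
    """Return True if the point lies inside the dataset's bounding box."""
    return WEST <= longitude <= EAST and SOUTH <= latitude <= NORTH

print(in_bounding_box(35.0, 32.0))   # a point in the southern Levant
print(in_bounding_box(2.35, 48.86))  # a point in western Europe
```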

Temporal coverage

Dates range from the Late Pleistocene to the Late Holocene (15,000 – 1,500 cal. yr. BP/13,000 BC – 500 AD).
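The BP and BC/AD figures above are related by the convention that "present" is AD 1950. A minimal Python sketch of the conversion follows; the function name is hypothetical. Applied to the limits above it gives 13051 BC and AD 450, which suggests that the calendar equivalents quoted in the text are rounded.

```python
def cal_bp_to_bc_ad(cal_bp):
    """Convert calibrated years BP (before AD 1950) to a BC/AD label.

    There is no year 0 in the BC/AD system, so 1950 cal BP is 1 BC.
    """
    year = 1950 - cal_bp
    if year > 0:
        return f"AD {year}"
    return f"{1 - year} BC"

print(cal_bp_to_bc_ad(15000))
print(cal_bp_to_bc_ad(1500))
```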

(2) Methods

The creation of this dataset was eased by the availability of openly accessible digital online repositories and by a growing consensus among scholars about the benefits of data sharing to advance knowledge [12]. The collation of all radiocarbon dates included in the present dataset was enabled by a desk-based synthesis integrating and harmonising secondary published sources and extant online databases.

Steps

The radiocarbon dates were collected in four different steps. Firstly, we extracted and merged uncalibrated radiocarbon dates from existing online digital archives and databases (ArAGATS project: [13]; BANADORA: [14]; [15]; CalPal: [16]; CONTEXT: [17]; [18]; IRPA/KIK: [19]; ORAU: [20]; PPND: [21]; RADON: [22]; TAY project: [23]; 14SEA: [24]).

Secondly, we enhanced the list of radiocarbon dates by a comprehensive screening of all the most relevant journals available online up to November 2021. Further dates were extracted from several monographs, chapters in edited volumes, archaeological excavation reports, and websites. We have not been able to include unpublished dates or those published in non-digital grey literature. The total number of references scrutinised to build our dataset is 719. As a result, we collected 11,027 radiocarbon dates from 1,023 archaeological sites. To our knowledge, this is the largest collation of published radiocarbon dates for the whole Near East.

Thirdly, we standardised the data and comparatively cross-checked all spatial coordinates from extant online digital archives and publications to guarantee maximum accuracy in the geographic location.

Lastly, each radiocarbon date has been assigned the SiteID of its site of provenance, so that every date is linked to a specific site. Wherever possible, all information related to the radiocarbon samples has been recorded, such as “LabID”, “Material”, and “Species”.

Sampling strategy

The dataset derives from existing publications relevant to the chronological scope of interest, spanning from the Late Pleistocene to the Late Holocene (15,000 – 1,500 cal. yr. BP). We tried to record all the known radiocarbon dates published until November 2021 that fall within our chronological scope of interest.

Quality Control

After the data collection stage, we cleaned and checked all the information inputted, as the collated radiocarbon dates come from various and inconsistent secondary sources. All radiocarbon dates have been checked for duplicates and have been standardised. For instance, some radiocarbon dates come from sites that have multiple site names and we assigned them a unique descriptive field “SiteID” and standardised the “SiteName” to avoid duplication of sites. Likewise, we have checked duplicated radiocarbon laboratory codes and removed those that refer to the same raw date. The “LabID” was standardised according to the following list: https://radiocarbon.webhost.uits.arizona.edu/sites/default/files/Labs-2021_09_03.pdf. For those radiocarbon dates having more than one lab ID, we created a second field named “OthLabID”, which provides the alternative lab ID.
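The duplicate check on laboratory codes can be illustrated with a short Python sketch. It is not the procedure actually used to build the dataset. It assumes that spaces, hyphens and letter case are the only formatting differences between two spellings of the same lab code, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict

def normalise_lab_id(lab_id):
    """Uppercase and drop spaces and hyphens, so 'OxA-1234' equals 'OXA 1234'.

    ASSUMPTION: these are the only formatting differences between sources.
    """
    return lab_id.upper().replace(" ", "").replace("-", "")

def find_duplicate_lab_ids(rows):
    """Group DateIDs by normalised LabID; return groups with more than one."""
    groups = defaultdict(list)
    for row in rows:
        if row["LabID"]:  # skip dates with a missing lab code
            groups[normalise_lab_id(row["LabID"])].append(row["DateID"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Hypothetical rows using field names from the dataset.
rows = [
    {"DateID": 1, "LabID": "OxA-1234"},
    {"DateID": 2, "LabID": "OXA 1234"},
    {"DateID": 3, "LabID": "GrN-500"},
    {"DateID": 4, "LabID": ""},
]
print(find_duplicate_lab_ids(rows))
```

A match on the lab code alone only flags candidates. Whether two entries really refer to the same raw date still has to be confirmed against “CRA” and “Error”.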

In the field “Source”, we reported the original publication of each date and all subsequent works. During the data cleaning stage, we looked for any possible inconsistencies in the publications that could result in conflicting statements in the attributes (e.g., conventional pre-calibration radiocarbon age (“CRA”), 1-standard deviation error (“Error”), the material of the organic sample (“Material”), duplicated lab ID) associated with the radiocarbon dates. When we encountered discrepancies in the sources, the final entry in our dataset was based on the most detailed descriptions, on the original publication, on a comparison between multiple later publications and on any more recent measurement revision. Any issue associated with a specific date is reported in the field “Problems” and a more extensive description is recorded in the “Comments” field.

The geographic locations of the radiocarbon samples (“Longitude” and “Latitude”) were assigned in decimal degrees according to the World Geodetic System 1984 (WGS84, EPSG:4326). The location quality (“LocQual”) has been ranked according to five levels of accuracy: A (centroid of the archaeological site from which the organic sample has been collected), B (within +/– 2 km), C (within +/– 5 km), D (within +/– 10 km), E (within +/– 20 km). Most dates have the highest level of accuracy (n = 8851, category A), while only 129 have a coarse level of spatial accuracy in categories D and E. The radiocarbon dates have been systematically georeferenced using all available sources, such as extant digital repositories, archaeological survey maps, satellite images, websites, coordinates from original publications, and digital tools (e.g., Google Earth). The coordinates from extant online digital archives (e.g., RADON, CalPal, BANADORA) have been manually cross-checked, because it is often unclear whether those coordinates represent the centroid of the archaeological site or of the closest town or village. Radiocarbon samples with no geographic coordinates or an unclear spatial location have not been included in the dataset. We are aware that publishing the geographic coordinates could make archaeological sites more vulnerable to looting activities. However, we did not obfuscate the locations of radiocarbon dates because they come from excavated archaeological sites whose coordinates are widely available through unrestricted, publicly accessible resources (e.g., published archaeological survey reports, monographs, digital publications, geographic tags in Google Earth, existing online databases).
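Users who need a given spatial precision can filter on “LocQual”. The Python sketch below maps each category to the uncertainty stated above; representing category A (site centroid) as 0 km is an assumption made for the comparison, and the function name and sample rows are hypothetical.

```python
# Maximum positional uncertainty in km for each LocQual category, per the text.
# ASSUMPTION: category A (site centroid) is represented as 0 km.
LOC_QUAL_KM = {"A": 0, "B": 2, "C": 5, "D": 10, "E": 20}

def filter_by_location_quality(rows, max_km):
    """Keep rows whose location uncertainty does not exceed max_km."""
    return [row for row in rows if LOC_QUAL_KM[row["LocQual"]] <= max_km]

# Hypothetical rows.
rows = [
    {"DateID": 1, "LocQual": "A"},
    {"DateID": 2, "LocQual": "C"},
    {"DateID": 3, "LocQual": "E"},
]
kept = filter_by_location_quality(rows, 5)
print([row["DateID"] for row in kept])
```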

Constraints

Although we have undertaken a scrupulous effort in data cleaning and checking, the present dataset may still contain some errors possibly inherited from multiple resources that were screened or due to human error. Hence, we would encourage all users of the NERD dataset to inform us of any errors so that we can correct them and update the dynamic online repository.

There are several caveats when dealing with radiocarbon dates. Unlike other areas of the world where archaeological work is routinely integrated into planning and construction industries through a long tradition of regulated commercial archaeology (e.g., North America, central and northern Europe), in southwestern Asia most archaeological investigations are carried out by academic projects. As a consequence, this region is more prone to spatial and chronological biases due to investigators' interests (e.g., the spread of farming, the human response to rapid climatic shifts such as the 4.2 and 3.2 kya events) than areas with a long tradition of commercial archaeology. Regarding chronological biases, radiocarbon dates are particularly prevalent in the Late Pleistocene and Early Holocene due to the absence of high-resolution typo-chronological schemes. Conversely, for later historical periods archaeologists rely more on short-lived pottery types and historical media (e.g., clay tablets, seals and coins) than on radiometric dating for dating archaeological layers. A certain reluctance to use radiocarbon dating among archaeologists excavating Iron Age sites is also explained by the Hallstatt radiocarbon calibration plateau (ca. 2750–2350 cal. yr. BP), which makes it difficult to obtain refined radiocarbon-based chronologies. On a pan-regional scale, the present dataset guarantees good chronological coverage until ~3000 cal. yr. BP, while some regions such as Mesopotamia, Anatolia and Iran show a research-biased drop in the available radiocarbon dates from ~4000–3500 cal. yr. BP onwards, as shown recently by Palmisano et al. [9].

Regarding spatial bias, 35% of the radiocarbon dates come from the Levant (modern Israel, Palestine, Lebanon, Jordan and western Syria), while other regions such as Iran, Arabia and southern Mesopotamia are underrepresented (Figure 2). Furthermore, 70% of the Levantine dates come from modern Israel and Palestine. This is due to the higher research budgets of Israeli archaeological teams interested in producing or improving an absolute chronology.

Figure 2 

Kernel density map of radiocarbon samples.

Additionally, certain chronological periods are more likely to be sampled than others. A clear research bias in the southern Levant is due to the interest of many archaeologists in providing a better chronology for the Early Bronze Age sub-periods (ca. 3800–2500 BC) and the Late Bronze Age/Iron Age transition (ca. 1200–950 BC) [25, 26].

The radiocarbon dates have been collected from a variety of existing sources with varying levels of published information. For instance, not all samples have published information about the site context, sample material, sample species, etc., although we tried to recover as much missing information as possible. For example, 87% of the listed dates have information about the sample material (e.g., charcoal, bone, wood), but only 28% have information about species (e.g., wheat seeds, olive stones). Most of our dataset is composed of long-lived unidentified charcoal samples that could be affected by the so-called old wood effect. Their use could affect the resulting summed probability distributions (SPDs) of calibrated radiocarbon dates that have been widely used by palaeo-demographers for modelling past human population dynamics. To explore this issue, we produced SPDs of unnormalised calibrated radiocarbon dates from short-lived radiocarbon samples (e.g., bones, collagen, seeds, grains) and from all dates. The SPDs including all dates (Figure 3, in grey) and only short-lived dates (Figure 3, in green) are highly correlated (r = 0.93, p-value = 0.01, Pearson), and the so-called old wood effect seems not to have significantly affected the resulting post-calibration probability densities.

Figure 3 

Summed probability distributions of unnormalised calibrated radiocarbon dates including all samples (in grey) and the short-lived samples (in green).
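The comparison between the two curves relies on Pearson's correlation coefficient, computed over the values of the two SPDs on a shared grid of calendar years. The Python sketch below shows the coefficient only; it does not reproduce the reported p-value, and the two short series are hypothetical stand-ins for real SPD values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical SPD values on the same grid of calendar years.
all_dates = [0.10, 0.30, 0.55, 0.40, 0.20]
short_lived = [0.02, 0.08, 0.15, 0.12, 0.05]
print(round(pearson_r(all_dates, short_lived), 3))
```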

(3) Dataset description

The dataset comprises three files: a list of radiocarbon dates (nerd.csv), a list of references (References.txt) for the published radiocarbon dates stored in nerd.csv (see field “Source”), and a file with metadata and field descriptions for the attributes of the radiocarbon samples (README.md; see Table 1).

Table 1

Description of the fields in the file nerd.csv.


DATAFIELD DESCRIPTION

DateID (numeric) unique identifier for the radiocarbon sample

LabID (character) unique identifier for the lab’s radiocarbon sample

OthLabID (character) unique alternative identifier for the lab’s radiocarbon sample

Problems (character) problems related to the radiocarbon sample (e.g. missing Lab Id, duplicated Lab Ids, etc.)

CRA (numeric) conventional radiocarbon age (pre-calibration), expressed in years before present (BP)

Error (numeric) Standard error of radiocarbon date in years

DC13 (numeric) Isotopic fractionation of stable carbon isotopes Carbon-13

Material (character) material of the radiocarbon sample

Species (character) species of the radiocarbon sample

SiteID (numeric) unique identifier of the site from which the radiocarbon sample has been collected

SiteName (character) name of archaeological site

SiteContext (character) original archaeological context from which the radiocarbon sample was collected

SiteType (character) type of archaeological site

Country (character) country from which the radiocarbon sample was collected

Longitude (numeric) WGS84 eastings

Latitude (numeric) WGS84 northings

LocQual (character) scale defining the accuracy of the spatial coordinates of radiocarbon samples

Source (character) source from which the radiocarbon samples have been collected

Comment (character) Comments about the issues reported in the field “Problems”
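The fields in Table 1 can be read with Python's standard csv module. The sketch below converts the measurement and coordinate fields to numbers and leaves the rest as text. Treating empty cells and the string "NA" as missing values is an assumption about the file, not something stated in the text, and the function name and the single sample row are hypothetical.

```python
import csv
import io

# Fields from Table 1 that are converted to float in this sketch.
NUMERIC_FIELDS = ("CRA", "Error", "DC13", "Longitude", "Latitude")

def read_nerd(handle):
    """Read rows from an open CSV handle, converting numeric fields to float.

    ASSUMPTION: empty cells and the string "NA" denote missing values.
    """
    rows = []
    for row in csv.DictReader(handle):
        for field in NUMERIC_FIELDS:
            value = (row.get(field) or "").strip()
            row[field] = float(value) if value and value != "NA" else None
        rows.append(row)
    return rows

# Hypothetical one-row sample with a subset of the Table 1 fields.
sample = io.StringIO(
    "DateID,LabID,CRA,Error,DC13,SiteID,SiteName,Longitude,Latitude,LocQual\n"
    "1,XX-0001,9000,50,,10,Hypothetical Site,35.0,32.0,A\n"
)
rows = read_nerd(sample)
print(rows[0]["CRA"], rows[0]["Error"], rows[0]["DC13"])
```

With the real file, the same function could be called as `read_nerd(open("nerd.csv", newline="", encoding="utf-8"))`, assuming the file is UTF-8 encoded.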

The NERD dataset has been published in accordance with the FAIR principles. Several steps were needed to make our data easily findable and reusable (see Figure 4). After storing all radiocarbon dates in a .csv file, we published it via GitHub, which is an ideal platform for modifying, deleting, and rewriting the content of a project but not for lasting referencing purposes. For the latter purpose, Zenodo offers more security for permanently archiving a specific dataset by assigning a Digital Object Identifier (DOI), which makes the repository more findable and citable. Hence, we authorised the GitHub account of the leading author to be connected to Zenodo. This allows Zenodo to archive the repository stored on GitHub and issue a DOI for it, a step that is triggered by creating a release in GitHub. Each time we created a new version of the repository, we created a new release that can be tracked in the repository on GitHub. Thanks to this step, every released version of the repository is stored both on GitHub and on Zenodo. The strength of this approach is that the dataset remains dynamic while each version is permanently archived online. In addition, the collaborative platform GitHub allows any registered user to contribute to NERD by submitting new radiocarbon dates not included in our dataset through a pull request. The final stage of the roadmap to the FAIR principles is to publish a data paper on NERD in order to make it more visible to the research community.

Figure 4 

The road map to the FAIR principles (findable, accessible, interoperable, reusable).

Finally, NERD is also currently integrated into the R package c14bazAAR, which allows for the querying, managing and merging of various open-access radiocarbon archives [27]. This makes NERD more accessible and usable within the scientific community.

Object name

nerd.csv

References.txt

README.md

Data type

Secondary data, and processed data from originally published materials.

Format names and versions

.csv, .txt, .md

Creation dates

The dataset was created in three different stages. 70% of the dataset was created during 2016–2018 as part of the Leverhulme Trust funded project “Changing the face of the Mediterranean: land cover and population since the advent of farming”. The remaining portion of the dataset was created between October 2019 and December 2021 under the umbrella of the ERC project “CLASS – Climate, Landscape, Settlement and Society: Exploring Human-Environment Interaction in the Ancient Near East” and of an Alexander von Humboldt Research Fellowship.

Dataset Creators

The researcher responsible for the data entry and management was Alessio Palmisano. A portion of radiocarbon dates for the Levant were added by Fayrouz Ibrahim and Alexander Kabelindde. Andrew Bevan was also heavily involved in checking and restructuring parts of the dataset.

Language

English

License

Creative Commons License CC-BY 4.0: https://creativecommons.org/licenses/by/4.0/

Publication date

08/12/2021

(4) Reuse potential

This data set represents what is to our knowledge the largest existing collation of radiocarbon dates for the whole Near East from the Late Pleistocene to the Late Holocene (15,000 – 1,500 cal. yr. BP).

The locations of the archaeological sites yielding radiocarbon samples have been recorded with the highest possible accuracy and represent a basic resource for georeferencing past Near Eastern settlements. This large dataset can be combined with other lists of radiocarbon dates for comparative studies and research agendas focussing on a broader geographical scope. The anthropogenic radiocarbon dates can be used as a proxy of human population for building summed probability distributions (SPDs), an established technique for inferring past population fluctuations across space and time. The large size of the present dataset and its structure make it suitable for a wide range of spatio-temporal analyses, with the potential for careful handling of the underlying temporal uncertainty [28, 29]. The present dataset can be integrated with archaeological artefacts and architectural remains for revising established typo-chronological schemes and building new ones. Furthermore, the combination of this large list of radiocarbon dates with other scientific data, such as fossil pollen cores, palaeoclimatic records, and archaeological survey data, opens new comparative perspectives in the study of long-term human-environment interactions at different chronological and spatial scales of analysis.