The Cultural Evolution of Neolithic Europe. EUROEVOL Dataset 1: Sites, Phases and Radiocarbon Data

Context

These datasets were collected in the scope of the EUROEVOL project, and collectively represent the largest repository of archaeological data from Neolithic Europe, at the time of publishing. The time frame of the Neolithic in this part of the world broadly encompasses 8000–4000 BP (before 1950 AD), and is characterised by the introduction of domestic plants and animals from the Near East alongside the development of novel ceramic and lithic technologies via two routes of dispersal: a maritime route linking the Levant with the Aegean coast of Turkey and Greece into the western Mediterranean, and a more continental route linking central and northwest Anatolia with the more eastern part of Bulgaria and into continental central Europe [1][2][3]. This spread of early agro-pastoral lifeways also correlates with fundamental changes in past human demography, ecology and social organization [4][5].
The aim of the EUROEVOL project was to explain the patterns of stability and change associated with the spread and establishment of farming in Neolithic Europe in the light of new perspectives on human cultures and societies derived from evolutionary theory. The project focused on the western half of temperate Europe, where the available data are best. The project's most important conclusion is that the introduction of farming to Europe did not lead to a steady population increase, but was characterised by a pattern of 'boom' and 'bust' in many regions [5][6]. We did not find evidence that these trends could be accounted for by climate change alone, suggesting that it was internal factors in these early societies that led to them exceeding the sustainable limits of their socio-economic systems. In keeping with this, we found correlations between the population patterns and changing economic patterns [7], as well as with investment in conspicuous monument construction and in the incidence of evidence for violence, which appears to be associated with societies exceeding their limits. We have also shown that the cultural transmission processes that produce distinctive patterns of similarity and difference in the archaeological record have recognisable signatures that can be identified from the archaeological material [8][9][10]. In addition, we have assessed the relationship between different dating approaches for the European Neolithic and demonstrated the underlying shape of the intensity of European Neolithic cultures through time [11].
The radiocarbon data were collated to ensure the greatest possible coverage of this period. Nevertheless, data have been included from slightly outside this range to ensure comprehensive coverage, circa 10 to 3.8 kyr BP.

(2) Methods
The majority of data in this dataset was obtained directly from researchers and colleagues across Europe, as well as from publications, Masters theses, PhD theses and occasional unpublished reports. There are three main components of the dataset: Site level data, Phase level data and the radiocarbon data. The dataset utilises the same recording system as the archaeobotanical and faunal datasets, as evidenced in the full published MySQL database found at http://discovery.ucl.ac.uk/1469811/.

Steps
The data comprising the CommonSites table was collected in several stages. Stage 1 involved the import of Site names and georeferencing from extant databases (e.g. RADON) or from spreadsheets sent by colleagues and collaborators. At the same time, new sites were manually entered based on original site reports used for the faunal and archaeobotanical data collection. Stage 2 involved the assignment of several unique EUROEVOL attributes, such as a unique Site identification number (SiteID) and the modern country in which the site is found. Stage 3 involved a comprehensive checking of all spatial coordinates to ensure maximum accuracy in georeferencing.
The CommonPhase table was constructed to provide the main linking field between the CommonSites table and the radiocarbon, faunal and archaeobotanical datasets. The collection of phase specific data involved using either source culture information from the acquired radiocarbon datasets, or more commonly from the original faunal and archaeobotanical reports. The use of the term 'Phase' in these datasets refers to data aggregated at the level of the cultural unit, for example LBK, Michelsberg, Chasséen, etc. These broad cultural units were found to be the most common level of aggregation in the faunal and archaeobotanical reports, and therefore offered maximum comparative potential between the different datasets. Once the culture had been identified, that phase was then assigned a unique Phase identifier (PhaseCode), which could be linked to the Site through the SiteID. Hence a single site can have multiple phases, and all phases must have a SiteID. The final stage in the CommonPhase data collection was to assign a standardised culture to avoid duplication based on differences in spelling (e.g. Chassey and Chasséen) and, where applicable, a standardised subculture, as well as a broad archaeological period (e.g. Early Neolithic, Middle Neolithic, etc.). As with the assigned cultures, the broad archaeological period retained the original assignments made in the excavation/analytical report. This field should therefore be treated with caution due to the incompatibility in regional chronologies.
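The phase construction described above can be illustrated with a minimal Python sketch. It is not the project's own code: the alias table, the function names and the example rows are all hypothetical, and only the field names SiteID, PhaseCode and Culture are taken from the text.

```python
from collections import defaultdict

# Hypothetical spelling variants mapped to one standardised culture name.
CULTURE_ALIASES = {
    "chassey": "Chasséen",
    "chasséen": "Chasséen",
    "chasseen": "Chasséen",
    "lbk": "LBK",
    "michelsberg": "Michelsberg",
}


def standardise_culture(raw):
    """Return the standardised culture name, or the trimmed input if unknown."""
    key = raw.strip().lower()
    return CULTURE_ALIASES.get(key, raw.strip())


def phases_by_site(phases):
    """Group PhaseCodes under their SiteID; every phase must carry a SiteID."""
    grouped = defaultdict(list)
    for phase in phases:
        if not phase.get("SiteID"):
            raise ValueError(f"Phase {phase.get('PhaseCode')!r} has no SiteID")
        grouped[phase["SiteID"]].append(phase["PhaseCode"])
    return dict(grouped)


# Invented example rows, not taken from the dataset.
example_phases = [
    {"PhaseCode": "P1", "SiteID": "S1", "Culture": "Chassey"},
    {"PhaseCode": "P2", "SiteID": "S1", "Culture": "Michelsberg"},
    {"PhaseCode": "P3", "SiteID": "S2", "Culture": "chasséen "},
]

for phase in example_phases:
    phase["Culture"] = standardise_culture(phase["Culture"])

site_index = phases_by_site(example_phases)
```

In this sketch site S1 carries two phases, and both spellings of Chasséen collapse to a single standardised value, mirroring the one-to-many link between sites and phases.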
The radiocarbon data was obtained primarily from extant databases and colleagues and collaborators from across Europe (see acknowledgements below for a full list of contributors). Occasional radiocarbon dates from published reports were also manually entered into the database. Where available, all information relating to the sample was entered, including Labcode, C14Age, C14STD, Material and MaterialSpecies. Material and MaterialSpecies were then standardised to permit systematic analysis on these fields.

Quality Control
We have adopted a fully inclusive approach to the data collection, including all radiocarbon data, irrespective of the date at which the sample was processed. The justification for this is to provide a comprehensive repository of all radiocarbon samples processed from relevant archaeological sites. In regard to the cultural affiliation of individual radiocarbon samples, we have applied more stringent quality control. This was achieved in two stages. Stage 1 involved the removal of outliers by systematically checking all radiocarbon dates that fell outside the reported standard date range of their associated culture. This involved a review of published records in order to identify samples that were considered problematic, perhaps as a result of data entry errors, old wood effect or low collagen count, or that had been identified as outliers following further stratigraphic or Bayesian analysis. Whilst these samples are kept in the database, all cultural affiliations, and therefore the associated PhaseCodes, were removed.
All records have been thoroughly checked for duplications and to ensure that they are standardised wherever possible. For example, we encountered numerous sites that have multiple site names, and these therefore had to be assigned just a single name in order to avoid duplication. Equally, we found occasional duplicate Labcodes for the radiocarbon samples, which were investigated and either corrected in the database, or removed due to uncertainty as to which radiocarbon result was the correct one.
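As an illustration of the first screening step only, the sketch below flags dates that fall outside a culture's reported range and shows how affiliation can be removed while the sample itself is retained. It is a simplified assumption-laden example: the range table, its values and both function names are hypothetical, and the manual review of published records described above is not modelled.

```python
# Hypothetical culture ranges in uncalibrated years BP: (oldest, youngest).
# The values are invented for illustration and are not published ranges.
EXAMPLE_RANGES_BP = {"ExampleCulture": (6500, 6000)}


def flag_outliers(samples, culture_ranges):
    """Return Labcodes whose C14Age lies outside the range of their culture."""
    flagged = []
    for sample in samples:
        bounds = culture_ranges.get(sample.get("Culture"))
        if bounds is None:
            continue  # no culture, or no known range: nothing to compare
        oldest, youngest = bounds
        if not (youngest <= sample["C14Age"] <= oldest):
            flagged.append(sample["Labcode"])
    return flagged


def strip_affiliation(sample):
    """Return a copy that keeps the date but drops Culture and PhaseCode."""
    cleaned = dict(sample)
    cleaned["Culture"] = None
    cleaned["PhaseCode"] = None
    return cleaned
```

A flagged Labcode is only a candidate for review; in the workflow described above, the decision to remove cultural affiliation rested on the published record rather than on the range check alone.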

Constraints
Although we have undertaken painstaking steps in data cleaning and checking, a considerable portion of this dataset was received without source publications or original reports. As such we initially inherited a number of errors, and it may still be possible for some to remain. We would encourage all users to double check for duplications, and if possible to inform us of any possible errors in order for us to correct the online repository.
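A duplicate check of the kind recommended above might look like the following minimal Python sketch. The normalisation rule (ignore case, spaces and punctuation) and the function names are assumptions made for illustration; users may prefer a stricter or looser rule depending on how Labcodes are written in their copy of the data.

```python
from collections import defaultdict


def normalise_labcode(labcode):
    """Uppercase and keep only letters and digits, so 'xx 1001' matches 'XX-1001'."""
    return "".join(ch for ch in labcode.upper() if ch.isalnum())


def duplicate_labcodes(samples):
    """Return {normalised Labcode: [samples]} for codes that occur more than once."""
    groups = defaultdict(list)
    for sample in samples:
        groups[normalise_labcode(sample["Labcode"])].append(sample)
    return {code: rows for code, rows in groups.items() if len(rows) > 1}


def is_conflicting(rows):
    """True when duplicates disagree on the reported age or standard deviation."""
    return len({(row["C14Age"], row["C14STD"]) for row in rows}) > 1
```

Duplicates that agree on C14Age and C14STD are likely repeated entries of the same measurement, whereas conflicting duplicates need checking against the source publication.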
It is important also to remind users that both the SiteID and PhaseCode are unique identifiers, which we have assigned for the purpose of the EUROEVOL project. They do not correspond with published site identification numbers or phase numbers.
There are a number of caveats to keep in mind when reusing the radiocarbon data. These datasets were received with varying levels of associated information. For example, not all samples came with associated cultures, the sample material or species of the material, or full isotopic values. Whilst we spent considerable time trying to recover this information from source publications, we were unable to adequately phase all samples, or comprehensively assign all samples their associated information. Therefore, only 45% of the radiocarbon dataset has been assigned a PhaseCode, linking the radiocarbon dates to the other datasets, i.e. the faunal and archaeobotanical data.
The remaining 55% of C14 Samples are therefore linked only at the site level through SiteID. This means that, unusually, the radiocarbon data is linked to the site via two different relationships: 1) C14 sample to phase to site (for samples that have been assigned a phase), and 2) C14 sample to site (for all samples, including those that have not been assigned a phase). Meanwhile 76% of the data has the sample material (e.g. wood, bone, etc.), but only 27% has the material species and only 12% has associated C13 values. The most reliable fields in the radiocarbon dataset are undoubtedly the Labcode, C14Age and C14STD. Whilst we have attempted as stringent quality control as possible, we were ultimately constrained by the original data sources, especially in regard to the context and cultural affiliation of the individual samples. It therefore falls to the user to assess the reliability of the original data, especially where samples lack a phase code, or other contextual information.
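The two relationships can be sketched with Python's built-in sqlite3 module. The schema below is a reduced, hypothetical version of the three tables: the column types, the choice of keys and all example rows are assumptions for illustration, and the published SQL dump should be consulted for the actual definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CommonSites (SiteID TEXT PRIMARY KEY, Sitename TEXT);
CREATE TABLE CommonPhases (
    PhaseCode TEXT PRIMARY KEY,
    SiteID TEXT NOT NULL REFERENCES CommonSites(SiteID),
    Culture TEXT
);
CREATE TABLE C14Samples (
    Labcode TEXT PRIMARY KEY,
    SiteID TEXT NOT NULL REFERENCES CommonSites(SiteID),
    PhaseCode TEXT REFERENCES CommonPhases(PhaseCode),  -- NULL when unphased
    C14Age INTEGER,
    C14STD INTEGER
);
-- Invented rows, not taken from the dataset.
INSERT INTO CommonSites VALUES ('S1', 'Example Site A'), ('S2', 'Example Site B');
INSERT INTO CommonPhases VALUES ('P1', 'S1', 'ExampleCulture');
INSERT INTO C14Samples VALUES
    ('XX-1', 'S1', 'P1', 6000, 50),
    ('XX-2', 'S1', NULL, 5500, 60),
    ('XX-3', 'S2', NULL, 7000, 80);
""")

# Relationship 2: sample to site. Every sample is reachable this way; the
# LEFT JOIN adds the culture where a phase exists and NULL where it does not.
via_site = conn.execute("""
    SELECT c.Labcode, s.Sitename, p.Culture
    FROM C14Samples AS c
    JOIN CommonSites AS s ON s.SiteID = c.SiteID
    LEFT JOIN CommonPhases AS p ON p.PhaseCode = c.PhaseCode
    ORDER BY c.Labcode
""").fetchall()

# Relationship 1: sample to phase to site. Only phased samples are returned.
via_phase = conn.execute("""
    SELECT c.Labcode, s.Sitename
    FROM C14Samples AS c
    JOIN CommonPhases AS p ON p.PhaseCode = c.PhaseCode
    JOIN CommonSites AS s ON s.SiteID = p.SiteID
    ORDER BY c.Labcode
""").fetchall()
```

Queries that go through the phase table silently drop unphased samples, so analyses intended to cover the full radiocarbon dataset should join on SiteID directly.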
As noted above, the broad archaeological periods (e.g. Early, Middle, Late Neolithic) should be treated with caution. Whilst they are useful within a regional context, the inconsistency in regional chronologies means that the Early Neolithic in one region may not be contemporaneous with the Early Neolithic in another.

Object name
CommonSites - two files providing the data (EUROEVOL09-07-201516-34_CommonSites.csv) and field type definitions (CommonSites_fields.csv) for all sites within the database, including a unique SiteID, Sitename, latitude/longitude, and modern Country.
CommonPhases - two files providing the data (EUROEVOL09-07-201516-34_CommonPhases.csv) and field type definitions (CommonPhases_fields.csv) for all phasecodes within the database, including a unique PhaseCode, SiteID, Culture and SubCulture where applicable, Period and the SiteType (e.g. settlement, cemetery, etc.).

Data type
Primary and secondary data.

Format names and versions
.csv, SQL.

Creation dates
Some records were created in 2007-2010 as part of the AHRC funded 'Origins and Spread of Stock-Keeping' (OSSK) Project. However, the majority of records, and the current MySQL database, were created in 2010-2015.

Dataset Creators
The researchers responsible for the data entry were Kevan Edinborough, Katie Manning, Sue Colledge and Tim Kerig. Records were also added and cleaned by Atakan Guven, and all data was checked and restructured by Enrico Crema and Adrian Timpson.

Language
English.

Repository location
The full relational database is available as a SQL dump file and the individual tables (CommonSites, CommonPhases and C14Samples) are available as .csv files at http://discovery.ucl.ac.uk/1469811/.

(4) Reuse potential
This dataset comprises the largest single collation of site level and radiocarbon data for the European Neolithic. We envisage numerous ways in which this data could be reused. Primarily, the CommonSites table offers an important resource for basic georeferencing and potential mapping of archaeological sites dating to this period. The radiocarbon data can be reused in many ways, from simply accessing the chronological data for a specific site, to more advanced statistical analyses of radiocarbon data, such as the sort of Summed Probability Distribution (SPD) techniques used in the EUROEVOL project [5][6], or more nuanced combinations of, for example, Bayesian and SPD analyses. The EUROEVOL data is particularly re-usable because the sample sizes are so large (4,757 sites and 14,053 radiocarbon samples), permitting novel spatio-temporal analyses. The relational structure of this data also provides considerable potential for multivariate analyses, taking into account not only the spatial and chronological information in this dataset, but also the palaeoecological and palaeoeconomic information in the faunal and archaeobotanical datasets (EUROEVOL datasets 2 and 3, http://discovery.ucl.ac.uk/1469811/).
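To give a sense of how the C14Age and C14STD fields feed into summed analyses, the following Python sketch adds one normal density per date over a grid of uncalibrated radiocarbon years. This is a deliberate simplification and not the method used in the EUROEVOL project: a genuine SPD first calibrates each date against a calibration curve, which is omitted here. The function names, the grid step and the example dates are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def summed_density(dates, start_bp, end_bp, step=10):
    """Sum one normal density per (C14Age, C14STD) pair.

    The grid runs from start_bp (older) down to end_bp (younger) in
    uncalibrated years BP. No calibration is applied.
    """
    grid = list(range(start_bp, end_bp - 1, -step))
    total = [0.0] * len(grid)
    for age, sd in dates:
        for i, year in enumerate(grid):
            total[i] += normal_pdf(year, age, sd)
    return grid, total


# Invented dates spanning the temporal coverage stated in the paper.
example_dates = [(6000, 50), (6100, 40)]
grid, total = summed_density(example_dates, start_bp=10000, end_bp=3800)
```

Because each date contributes a density that integrates to one, the area under the summed curve approximates the number of dates on the grid. For real analyses the dates should be calibrated first, for example with an established radiocarbon calibration package.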

Figure 1: Map of north-western Europe showing sample locations.