(1) Overview


These datasets were collected as part of the project “Changing the Face of the Mediterranean” and currently represent the largest repository of archaeological data for central Italy from the Late Mesolithic (ca. 8,000 BC) to the fall of the Roman Empire (AD 500). This project aims to reconstruct long-term trends in population dynamics and vegetation change from the introduction of farming to Medieval times (ca. 10,000–1,000 BP), on a pan-Mediterranean scale, in order to assess the relationship between human population dynamics and the accompanying transformation of the Mediterranean environment. The project compares radiocarbon dates, archaeological survey data and pollen records from several case study regions over the longue durée [1].

The dataset discussed here forms one of the six case studies of the project. Central Italy’s long history of extensive archaeological excavations and systematic survey projects (e.g. Forma Italiae Project by the Istituto di Topografia [2], Tiber Valley Project by the British School at Rome [3], research by the Archaeological Superintendence of Lazio and Tuscany, etc.) makes this region an unusually privileged case study for assessing demographic trends and settlement patterns across space and time (Figure 1). Most of the studies we synthesise here were first stimulated by the work of Ward Perkins, who introduced modern landscape archaeology to central Italy [4, 5]. The Institute of Topography of the Sapienza University of Rome renewed territorial research in Lazio and Tuscany with the Forma Italiae project, a large-scale complete archaeological map and gazetteer of Italy composed of 46 volumes published between 1926 and 2017 [6, 7, 8]. Furthermore, the Roman school of pre- and proto-history promoted territorial and spatial studies with particular emphasis on the transition from the smaller dispersed hilltop villages (generally about 2–3 ha in size) of the Final Bronze Age (1175/1150–1020/950 BC) to the larger proto-urban centres (50–200 ha) distributed over the lowlands and plateaus [9, 10]. Given the high intensity of archaeological research in Italy, here we offer the first systematic study and collation of settlements and radiocarbon data from a wide range of published sources arranged in a spatial database.

Figure 1 

Map showing the a) distribution of radiocarbon samples and b) sites (the blue polygons indicate the boundary of the archaeological surveys).

Spatial coverage

The portion of central Italy examined here covers around 50,000 sq.km:

  • Description: Tuscany, Lazio and a small part of western Umbria.
  • Geographic Coordinate system: World Geodetic System (WGS) 1984.
  • Datum: World Geodetic System (WGS) 1984.
  • Northern boundary: 44.24017 (decimal degrees).
  • Southern boundary: 41.20348.
  • Eastern boundary: 14.02845.
  • Western boundary: 10.01920.
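
As a minimal sketch (not part of the published archive), the bounding coordinates listed above can be used to screen whether a WGS84 point falls within the rectangular extent of the dataset. The function name and the two test locations are illustrative assumptions; note that a rectangular box is only a coarse proxy for the actual study area.

```python
# Bounding box taken from the "Spatial coverage" list (decimal degrees, WGS84).
BBOX = {"north": 44.24017, "south": 41.20348, "east": 14.02845, "west": 10.01920}


def in_study_area(lat, lon, bbox=BBOX):
    """Return True if a WGS84 point lies inside the rectangular bounding box.

    Hypothetical helper: the box is larger than Tuscany, Lazio and western
    Umbria, so a True result does not guarantee the point is in those regions.
    """
    return bbox["south"] <= lat <= bbox["north"] and bbox["west"] <= lon <= bbox["east"]


# Rome (approx. 41.9 N, 12.5 E) and Milan (approx. 45.46 N, 9.19 E).
print(in_study_area(41.9, 12.5))
print(in_study_area(45.46, 9.19))
```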

Temporal coverage

8,000 BC–AD 500

The radiocarbon data were collated over a slightly broader chronological range, starting from circa 10,000 BC and ending at AD 1000, to ensure comprehensive coverage.

(2) Methods


Archaeological settlement data were collected in two stages. First, maps from 59 archaeological survey reports covering an overall area of ca. 10,000 sq. km were scanned and georeferenced (to an unprojected LatLon coordinate system, WGS84 datum, Figure 1b). Second, settlement data were recorded, where possible, as georeferenced polygons per cultural period and, where this was impossible, as circular buffers based on published estimates of site size per occupation period. The term ‘period’ here refers to familiar archaeological episodes in the region such as Neolithic, Early Bronze Age, Iron Age, etc. These cultural units were found to be the most common level of aggregation and standardization in the archaeological survey reports we summarised, but were typically expressed without any absolute calendric dates. By recording both the stated cultural period and approximate start and end dates in calendrical years, we have sought to provide maximum comparative potential across different archaeological excavation reports and surveys, standardizing period-based terminology where necessary (see Table 1 for the chronological scheme adopted). For instance, the occupation period of an Early Bronze Age site is translated, in calendric years, into a time span between 2300 and 1700 BC. A sub-periodization of each cultural period is provided when the chronological information retrieved from the original site gazetteers was more precise.
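
The circular-buffer fallback described above rests on simple geometry: a site with a published area A is represented by a circle of radius r = sqrt(A/π). The following is a minimal sketch under that assumption only; the function name is hypothetical and this is not the authors' GIS workflow. Because the data are stored in unprojected coordinates, a radius in metres would in practice need to be applied in a projected coordinate system.

```python
import math


def buffer_radius_m(area_ha):
    """Radius in metres of a circle whose area equals `area_ha` hectares."""
    area_m2 = area_ha * 10_000.0  # 1 ha = 10,000 square metres
    return math.sqrt(area_m2 / math.pi)


# Site sizes mentioned in the text: small hilltop villages (about 2 ha)
# and proto-urban centres (50 and 200 ha).
for ha in (2.0, 50.0, 200.0):
    print(ha, "ha ->", round(buffer_radius_m(ha), 1), "m")
```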

Table 1

A chronological scheme for central Italy.

Period Absolute dates

Mesolithic 10,000/9,500–6000/5800 BC
Early Neolithic 6000/5800–4500 BC
Middle Neolithic 4500–3500 BC
Late Neolithic 3500–3000 BC
Eneolithic 3000–2300 BC
Early Bronze Age 2300–1700 BC
Middle Bronze Age 1700–1325/1300 BC
Late Bronze Age (Recent Bronze Age) 1325/1300–1175/1150 BC
Late Bronze Age (Final Bronze Age) 1175/1150–1020/950 BC
Early Iron Age 1020/950–750/725 BC
Late Iron Age (Orientalizing Age) 750/725–580 BC
Archaic Period 580–480 BC
Post-Archaic Period 480–350 BC
Republican Period 350–30 BC
Early Imperial Period 30 BC–100 AD
Mid-Imperial Period 100–300 AD
Late Imperial Period 300–500 AD
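
To illustrate how a stated cultural period translates into start and end dates, the sketch below encodes those rows of Table 1 that have a single boundary date on each side. The dictionary, the function name and the signed-year convention (negative for BC, positive for AD) are assumptions made for illustration; periods with alternative boundaries (e.g. 1325/1300) are left out rather than resolved arbitrarily.

```python
# Signed calendar years: negative = BC, positive = AD (assumed convention).
PERIODS = {
    "Middle Neolithic": (-4500, -3500),
    "Late Neolithic": (-3500, -3000),
    "Eneolithic": (-3000, -2300),
    "Early Bronze Age": (-2300, -1700),
    "Archaic Period": (-580, -480),
    "Post-Archaic Period": (-480, -350),
    "Republican Period": (-350, -30),
    "Early Imperial Period": (-30, 100),
    "Mid-Imperial Period": (100, 300),
    "Late Imperial Period": (300, 500),
}


def phase_span(period):
    """Return (start, end, span) for a period name.

    The span is a plain difference of signed years; it ignores the absence
    of a year 0, which only matters for spans crossing the BC/AD boundary.
    """
    start, end = PERIODS[period]
    return start, end, end - start


print(phase_span("Early Bronze Age"))
```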

One major caveat is that it was only possible to estimate a site’s two-dimensional spatial extent per phase (e.g. measured in hectares and visualised as a polygonal footprint on a map) for those larger multi-period sites that had also been extensively excavated and/or surveyed methodically. A total of 7,383 sites and 10,971 occupation phases have been collected using the above approach (these numbers make it clear that many sites were occupied in multiple periods). In addition, although the wider definition of an archaeological site might refer not only to dwelling places but also to temporary activity areas (e.g. campsites), industrial zones (e.g. mines) and cemeteries, here we restricted data collection mainly to those places identified as human habitation sites or possible habitations. Nevertheless, we also recorded other types of sites (e.g. caves, necropolises, etc.) belonging to those cultural periods (e.g. Mesolithic and Neolithic) that often fall outside the chronological scope of archaeological surveys or leave only ephemeral traces of dwelling places.

Radiocarbon data were collected in four steps. In step 1, uncalibrated radiocarbon dates were collected from online sources and extant databases (e.g. EUBAR, RADON, EUROEVOL, etc.). In step 2, further dates were entered manually after what we judge to be a fairly exhaustive search of published reports, journal articles, etc. (up to 2016). Step 3 involved a comprehensive check of all spatial coordinates in order to maximise the accuracy of georeferencing. In step 4, a unique SiteID was assigned to each radiocarbon date in order to link it to a specific site of provenance. Where possible, all information related to the radiocarbon samples has been recorded, such as “LabID”, “Material”, and “Species”. A total of 816 radiocarbon dates have been collected.

Sampling strategy

The datasets provided here were derived from existing publications spanning the chronological scope of interest. In particular, it is worth noting that the archaeological settlement data derive from excavations and surveys, particularly the latter, that involved a variety of different methods and investigative intensities. Some pedestrian field surveys in central Italy, for example, have been far more intensive than others. The collation of raw radiocarbon dates from all known published resources does not represent a systematic sample because it is biased by research budgets, research interests and the preference of excavators for dating later periods with, for instance, datable coins or fine-ware pottery.

Quality Control

All records in the attribute table of the settlement data have been checked and the cultural periods have been standardised wherever possible. The estimated site extents have been refined by using multiple sources wherever possible, above all for the larger multi-period sites that had also been extensively excavated and/or surveyed systematically. We have checked and refined the site size estimates originally provided by published archaeological survey reports by calculating the area of the spatial footprint of these sites for individual phases as stored in the georeferenced vector polygons.

All radiocarbon dates have been checked for duplicates and standardised. For instance, some radiocarbon dates came from sites that had multiple site names and we assigned them a unique site ID in order to avoid duplication. Likewise, we have checked duplicated radiocarbon laboratory codes and removed those that refer to the same raw date. The approximate location of the sites from which the radiocarbon dates were taken has been added whenever possible.
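
The duplicate check on laboratory codes can be sketched as follows. This is an illustrative assumption about how such a check might be coded, not the authors' cleaning procedure: the field names LabID, CRA, Error and SiteID follow the radiocarbon table, whereas the helper functions, the normalisation rule and the sample records are hypothetical.

```python
def normalise_lab_id(lab_id):
    """Uppercase and keep only letters and digits, so that spelling variants
    such as 'LabX-1001' and 'labx 1001' compare as equal (assumed rule)."""
    return "".join(ch for ch in lab_id.upper() if ch.isalnum())


def drop_duplicate_dates(records):
    """Keep the first record for each (LabID, CRA, Error) combination."""
    seen = set()
    kept = []
    for rec in records:
        key = (normalise_lab_id(rec["LabID"]), rec["CRA"], rec["Error"])
        if key in seen:
            continue  # same laboratory code and same raw date: a duplicate
        seen.add(key)
        kept.append(rec)
    return kept


# Hypothetical records; the second repeats the first under a variant spelling.
records = [
    {"SiteID": 1, "LabID": "LabX-1001", "CRA": 4000, "Error": 40},
    {"SiteID": 1, "LabID": "labx 1001", "CRA": 4000, "Error": 40},
    {"SiteID": 2, "LabID": "LabY-2002", "CRA": 3200, "Error": 35},
]
print(len(drop_duplicate_dates(records)))
```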


The settlement data have been recorded as georeferenced polygons per cultural period and, where this was impossible, as circular buffers based on published estimates of site size per cultural period. The estimated sizes are nevertheless graded on a quality scale (from A to F) on the basis of the information provided by the original published sources (see the file sites.txt for a description of the size quality key). The start and end of the stated cultural period provide absolute calendric years for each site phase, based on typo-chronological schemes (themselves built from previous archaeologists’ assessments of diagnostic classes of material culture in the region). Of course, these chronological definitions have different degrees of uncertainty, expressed in the form of a time span of existence, given the varying accuracy of artefacts (e.g. pottery) in dating site phases and periods. Certain site durations can span several centuries if they were only dated via long-lived pottery types (e.g. ‘Neolithic’).

Although a considerable amount of time has been spent cleaning and checking the radiocarbon dates, most of the raw dates have been collected from existing online repositories and databases, while only a minority have been entered manually from original sources. Therefore, some errors will have been inherited from these existing sources. The spatial coordinates of each uncalibrated radiocarbon date have been assigned a quality code to indicate our relative confidence in their accuracy. We manually georeferenced only those dates with missing or obviously wrong coordinates and have not checked one by one the longitude and latitude provided by existing databases.

The radiocarbon dates have been collected from a variety of existing sources with varying levels of published information. For instance, not all samples have published information about the cultural period, site context, sample material, sample species, etc., although we tried to recover as much missing information as possible. For example, only 75% of the listed dates have information about the sample material (e.g. charcoal, bone, wood, etc.), and only 25% have information about species (e.g. samples on wheat seeds, olive stones, etc.). In summary, the most complete and reliable fields in the radiocarbon data are the SiteID, LabID, SiteName, the conventional radiocarbon age (CRA) and the measurement error (Error).

(3) Dataset description

Object name

surveys – a vector polygon representing the spatial extent of the archaeological surveys from which settlement data have been collected (as .shp and with associated files).

sites – a set of three files respectively providing a vector polygon (in .shp and associated files) and a spreadsheet (.csv) of sites located in central Italy, and a field description for the attributes of the sites (.txt).

References – a file providing a list of references (.txt file) for the published settlement data stored in the Surveys.shp and Sites.shp vector files (see field “Source”).

radiocarbon – a set of three files providing radiocarbon raw dates (.csv), a field description for the attributes of the radiocarbon samples (.txt), and a script for generating unnormalised and normalised Summed Probability Distribution (SPD) of radiocarbon dates (.R file).

timeseries – a script for generating time series of sites raw count, summed estimated settlement size, aoristic sum, randomised start date of sites, and SPD weighted randomised start date of sites (.R file).

spd_unnorm – an R object storing the summed probability distribution of central Italian radiocarbon dates (.RData).
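
For readers unfamiliar with the aoristic sum produced by the timeseries script, the following is a minimal sketch of the general technique. It is an illustration in Python, not the R code in the archive; the function name, the 200-year bins and the two example phases are assumptions. Each site phase contributes a total weight of one, spread uniformly over its date range, and each time bin receives the share of that weight that overlaps it.

```python
def aoristic_sum(phases, bin_starts, bin_width):
    """Aoristic sum of site phases over equal-width time bins.

    phases: list of (start, end) in signed calendar years, with start < end.
    bin_starts: list with the start year of each bin.
    """
    totals = [0.0] * len(bin_starts)
    for start, end in phases:
        duration = end - start
        for i, b0 in enumerate(bin_starts):
            b1 = b0 + bin_width
            overlap = min(end, b1) - max(start, b0)
            if overlap > 0:
                totals[i] += overlap / duration  # share of the phase in this bin
    return totals


# Hypothetical input: one Early Bronze Age phase (2300-1700 BC) and one
# shorter phase (2000-1800 BC), binned in 200-year steps from 2400 BC.
bins = list(range(-2400, -1600, 200))
phases = [(-2300, -1700), (-2000, -1800)]
print([round(v, 3) for v in aoristic_sum(phases, bins, 200)])
```

Because every phase carries a weight of one, the bin totals add up to the number of phases whenever the bins cover the full date range.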

Data type

Primary and secondary data, and processed data from originally published materials.

Format names and versions

.csv, .shp, .txt, .R, .RData

Creation dates

The datasets were created in 2016–2017 as part of the Leverhulme Trust funded “Changing the face of the Mediterranean: land cover and population since the advent of farming” project.

Dataset Creators

The researcher responsible for the data entry was Alessio Palmisano. Radiocarbon data were also added by Janie Gammans and Martina Castagnacci, and cleaned and restructured by Andrew Bevan.





Repository location

The full datasets are available at https://doi.org/10.14324/000.ds.1575442.

Publication date


(4) Reuse potential

This data set represents what is, to our knowledge, the largest existing collation of archaeological settlement data and radiocarbon dates for central Italy from the Late Mesolithic (ca. 8,000 BC) to the fall of the Roman Empire (AD 500). The vector polygon (.shp) and the spreadsheet (.csv) data named Sites represent, in our view, a very good quality resource, providing basic georeferencing of central Italian settlements. The provision of settlement data as time-sliced polygons offers considerable potential for assessing changes in site location and extent, wider spatial configurations of sites in the landscape and regional site size hierarchies across time. The radiocarbon dates in turn provide an important opportunity for statistical analyses both for individual sites and in aggregate via techniques such as summed probability distribution (SPD), an established technique for inferring population fluctuations across time [11, 12, 13]. In other words, an SPD is the result of summing, in the manner of a histogram, the calibrated radiocarbon dates of each organic sample, which are expressed in the form of probability statements with error ranges. It builds on the premise that the more people living in a given region, the more garbage, the more organic material, and the more radiocarbon samples collected and dated. The large size of the present dataset (7,383 sites, 10,971 site occupation phases, 816 radiocarbon dates) and its structure make it suitable for a wide range of spatio-temporal analyses, with the potential for careful handling of the underlying temporal uncertainty in this evidence [14, 15, 16]. The digital archive related to this paper also provides reproducible analysis in the form of two scripts written in the R statistical computing language, which model population trends across time using both settlement and radiocarbon data (for more details about methods and approach see [17]).
In archaeological demography, the divergences and convergences among the patterns defined by these two sets of data provide powerful insights and a wider range of explanations in describing population fluctuations both in the time span as a whole and in particular sub-periods [17].
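
The summing step behind an SPD can be sketched in a deliberately simplified form. The example below treats each date as a normal distribution centred on its conventional radiocarbon age (CRA) with the measurement error (Error) as standard deviation, and sums these on the uncalibrated radiocarbon timescale. This is an assumption made to keep the example self-contained: a real SPD, such as the one generated by the R script in the archive, first calibrates each date against a calibration curve, and that step is omitted here. The function names and sample dates are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def summed_distribution(dates, grid):
    """Sum the per-date densities at each grid year (uncalibrated years BP).

    dates: list of (CRA, Error) pairs; grid: list of years BP.
    """
    return [sum(normal_pdf(t, cra, err) for cra, err in dates) for t in grid]


# Hypothetical dates: two close together and one several centuries younger.
dates = [(4000, 40), (4050, 50), (3200, 35)]
grid = list(range(2800, 4501))  # one-year steps
spd = summed_distribution(dates, grid)

# With a one-year step, the summed curve integrates to roughly the number of
# dates, and clusters of dates show up as higher peaks.
print(round(sum(spd), 3))
```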