The MedAfriCarbon Radiocarbon Database and Web Application. Archaeological Dynamics in Mediterranean Africa, ca. 9600–700 BC

The MedAfrica Project
The MedAfrica Project (Leverhulme Trust, RPG-2016-261; PI: Cyprian Broodbank, RA Giulio Lucarini; duration from 2017 to 2019) set out to produce the first comprehensive, empirical and interpretative synthesis of long-term social and economic dynamics on the African flank of the Mediterranean between the beginning of the Holocene (ca. 9600 BC) and the arrival of Phoenicians and Greeks (800–600 BC), and to identify major factors shaping the patterns detected. The project aimed to answer the following four research questions:

1. What were the principal ways of life across this region and timespan, and why was farming apparently so limited and late in uptake?
2. What do internal interactions reveal about changing forms of mobility and exchange, as well as intensity of connectivity and isolation, and with what implications for socio-economic activity and possibly identities?
3. Why does the archaeological evidence reduce so drastically between the 4th millennium BC and the colonial Iron Age?
4. Was Mediterranean African pre-Phoenician maritime engagement (beyond the obvious exception of the Nile Delta) as limited as assumed by the current mainstream narrative?
To address these questions, a comprehensive up-to-date database of published radiocarbon dates was assembled and, more unusually, these were also given systematic associations with key cultural markers (e.g. the co-presence or absence of domestic/wild species and various material culture traits), allowing, for the first time in a generation, overall chronological and cultural trajectories for the prehistory of this region to emerge. These questions have been explored as part of the interpretative synthesis presented in [3]. Here, in contrast, we present the raw database of, currently, 1584 fully-specified archaeological radiocarbon dates (plus 3 dates published without error) from 367 sites (plus one site, Fontaine Rahhal, Morocco, whose only associated date was one which was published without error) upon which that synthesis was based, alongside an associated web application (hereafter web app), which facilitates data exploration and informal analysis. The density of samples from North Africa is much lower than in most other parts of the Mediterranean (cf. [9]), and as with other regions its spatial and chronological distribution reveals several research biases. For example, almost a third of the total dates come from early Egyptian royal and elite cemeteries along the Nile between Abu Roash and Illahun. In contrast there are no radiocarbon dates from the Sirte hinterland, an area that, since the last century, has largely remained terra incognita. By collating all published radiocarbon dates, the MedAfriCarbon database nonetheless offers a novel resource with which to approach the region holistically, without the constraints of local periodisations and with a deep time perspective.

Spatial coverage
The dataset covers the entirety of Mediterranean Africa (Figure 1), an area that we define as running from Atlantic Morocco to the Suez isthmus, bordered by the sea-coast and a few offshore islands to the north, with the southern border shifting due to the fluctuations of Holocene environments in the Sahara, but roughly running along the Atlas ranges in the west, and as far south as the vertex of the Nile Delta and the Fayum in the far east. Figure 1 shows the study area and the distribution of the archaeological sites that yielded 14C dates. The coordinates (in WGS84 decimal degrees) of the minimum bounding box defined by the site coordinates are as follows:

2) Methods: MedAfriCarbon database and web-app
The creation of the MedAfriCarbon database was enabled by a desk-based synthesis of published radiocarbon dates, with the further addition of presence/absence data about major cultural traits, as well as faunal and botanical remains by species and domestic/wild status for each site.
The MedAfriCarbon database includes all known and published archaeological 14C dates within the study area, with the exception of those dates that are not associated with anthropic activities. In terms of existing date-lists, the Egyptian Radiocarbon Database (ORAU, Oxford, https://c14.arch.ox.ac.uk/egyptdb/db.php) was particularly relevant for dates associated with Egyptian Dynastic contexts. The lead author also checked a total of c. 2000 published papers about North Africa's Late Pleistocene/Holocene archaeology to extract published 14C dates. Among these, 385 references (publications and databases) reported 14C dates from Mediterranean African sites and/or information about economic and cultural associations, and these were all entered in the MedAfriCarbon bibliography. For each date, we reported not only its first publication, but also all subsequent works referencing it. This enabled the gathering of a considerable amount of contextual information associated with each date. This process also ensured data quality control, allowing a cross-check of all attributes associated with the radiocarbon dates and detection of possible inconsistencies in the published information. Unresolved inconsistencies were reported in the "Problems" field and further details were included in the "Notes" field, both contained in the "dateTable" table. The only dates for which we limited the references to a single one were those associated with Egyptian Dynastic funerary contexts, which are numerous and well-studied, and therefore seemed a lesser priority than rebalancing our study toward other parts of Mediterranean Africa. We also included 21 dates that were published without a lab code; we gave them a purpose-made lab code (NoLabID) followed by a sequential number from 1 to 21 (e.g. NoLabID-01). Three dates were published without the uncalibrated error and were included in the database, but excluded from our summed probability distribution analyses.
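The two bookkeeping conventions described at the end of this paragraph can be illustrated with a minimal sketch. The function names, dictionary keys, and sample values below are hypothetical and are not taken from the deposited files; only the "NoLabID-01" code pattern comes from the text.

```python
def placeholder_lab_code(n):
    """Build the purpose-made code for the n-th date published without a lab code."""
    if n < 1:
        raise ValueError("sequence numbers start at 1")
    return f"NoLabID-{n:02d}"


def has_error(record):
    """True if the record carries an uncalibrated error (hypothetical 'error' key)."""
    return record.get("error") is not None


# Hypothetical records: ages, errors, and key names are illustrative only.
records = [
    {"lab_code": placeholder_lab_code(1), "cra": 6200, "error": 80},
    {"lab_code": "X-0002", "cra": 5900, "error": None},
]

# Dates without an error stay in the database but are left out of SPD input.
spd_input = [r for r in records if has_error(r)]
```

Under these assumptions, the second record would remain available for lookup while being skipped by any analysis that needs an error term.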
Retrieving geographical coordinates recorded over the last 50 years required the integration of multiple location descriptions and geographic references. In most cases, the identification was straightforward, and coordinates were confirmed by comparison with maps and satellite images (e.g. Google Earth). In other cases, the process was harder, particularly where locations had been published with obsolete or uncommon geographic systems and data formats. The diversity of standards and formats reflects the local history of modern African countries. A tangible sign of the implicit effects of map-based cultural colonialism can be seen in the fact that only a few publications state the format and system to which the coordinates refer, implying that the system requires no explanation.
All the coordinates found in the literature and included in the database have been converted and recorded using the common geographic latitude-longitude reference system, in decimal degrees and using the WGS84 ellipsoid (EPSG 4326). The two coordinate fields are followed by a field ("Location_Quality") reporting the coordinate quality on a scale from A (highest accuracy) to D (lowest accuracy). Level A was assigned to published coordinates whose high quality was confirmed after direct contact with colleague(s) who excavated the sites, or to sites that were easily identifiable on Google Earth (e.g. the Haua Fteah Cave). Level B was assigned to published coordinates whose quality was not confirmed by colleague(s) who excavated the site. Level C was assigned to sites whose coordinates were estimated to be within a range of ±1000 m; this was also the case for sites with unknown coordinates, but whose location was shown on a published map that we manually georeferenced. Level D was given to sites whose exact position is unknown and whose reported coordinates often correspond to the location of the closest inhabited centre.
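For the simplest conversion case, a coordinate published in degrees, minutes and seconds can be turned into decimal degrees as sketched below. The function name and sample values are hypothetical. The sketch assumes that the source coordinate already refers to WGS84; coordinates published in other datums or in projected grids would additionally need a datum or projection transformation, which is not shown here.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    hemisphere is one of "N", "S", "E", "W"; south and west are negative.
    Assumes the input already refers to WGS84 (no datum shift is applied).
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Hypothetical coordinate pair, for illustration only.
lat = dms_to_decimal(32, 54, 0, "N")  # approximately 32.9
lon = dms_to_decimal(22, 3, 0, "E")   # approximately 22.05
```

Keeping the hemisphere as an explicit argument avoids ambiguity for sites west of the Greenwich meridian, such as those in Morocco, whose longitudes are negative in decimal degrees.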
The definition of site phases allows the grouping of radiocarbon dates within the same site according to their chronological proximity, and enables consistency in terms of the attribution of cultural and environmental attributes. The phase of a given site was uniquely identified (using a key "Phase_ID") using a combination of the site code followed by a sequential number corresponding to the number of phases detected, and with 1 assigned to the oldest phase (e.g. MA001-1). When available, phases were matched to the ones proposed by the site excavators. When this information was not available, we arbitrarily created phases by grouping all radiocarbon dates included within a 200-year time span and assigning to them the same cultural and environmental associations/attributes. In order to discern between these cases, we included the field "Phase_by_Excavator", set to either "yes", "no", or "unique", with the last option referring to sites yielding a single radiocarbon date.
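The project-defined grouping can be sketched as follows. The text does not specify the exact grouping rule or which age value was compared, so this sketch makes two assumptions: each date is represented by a single age in years BP, and a new phase starts whenever a date is more than 200 years younger than the oldest date of the current phase. The function name and sample ages are hypothetical.

```python
def assign_phases(site_code, ages_bp, span=200):
    """Group ages (years BP) into phases no wider than `span` years.

    Returns a list of (Phase_ID, [ages]) tuples, oldest phase first.
    The greedy windowing rule is an assumption made for this sketch.
    """
    phases = []
    for age in sorted(ages_bp, reverse=True):      # oldest (largest BP) first
        if phases and phases[-1][1][0] - age <= span:
            phases[-1][1].append(age)              # still inside the window
        else:
            phase_id = f"{site_code}-{len(phases) + 1}"
            phases.append((phase_id, [age]))       # open a new phase
    return phases


# Hypothetical ages for a site coded MA001.
example = assign_phases("MA001", [7150, 6400, 7300, 6350, 7120])
```

With these sample ages the three oldest dates fall into phase MA001-1 and the two youngest into MA001-2, matching the convention that 1 denotes the oldest phase.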
Cultural and environmental associations recorded in the "phase" table include presence/absence indications for several domestic/wild species and selected generic material culture types. For each attribute, the recorded options were: "yes" (definite presence); "<" (definite presence in low frequency); "?" (unconfirmed presence); "no" (absence of evidence); and "n/a", which refers to cases where a specific study/analysis has not yet been carried out and so no data are available. Given their very specific cultural context, we decided not to record economic and cultural associations for the 14C dates coming from the Egyptian royal and elite cemeteries between Abu Roash and Illahun. Their cultural association cells were therefore left blank. Information about cultural association was drawn from the literature cited and linked to the culture table.
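When reusing these attributes, it may help to make the coding explicit before counting anything. The sketch below is one possible reading of the option codes listed above; the attribute name "Pottery", the phase records, and the helper names are hypothetical, and treating both "yes" and "<" as definite presence is a choice made for this illustration.

```python
# Option codes as listed in the text; the blank entry reflects cells left
# empty for the Egyptian royal and elite cemetery dates.
OPTION_MEANINGS = {
    "yes": "definite presence",
    "<": "definite presence in low frequency",
    "?": "unconfirmed presence",
    "no": "absence of evidence",
    "n/a": "no specific study carried out yet",
    "": "association not recorded",
}


def is_definitely_present(value):
    """True only for the two codes that assert definite presence."""
    return value.strip() in ("yes", "<")


def count_definite(phases, attribute):
    """Count phase records in which `attribute` is definitely present."""
    return sum(is_definitely_present(p.get(attribute, "")) for p in phases)


# Hypothetical phase records and attribute name, for illustration only.
phases = [
    {"Phase_ID": "MA001-1", "Pottery": "yes"},
    {"Phase_ID": "MA001-2", "Pottery": "<"},
    {"Phase_ID": "EG010-1", "Pottery": ""},
    {"Phase_ID": "DZ003-1", "Pottery": "?"},
]
n_pottery = count_definite(phases, "Pottery")
```

Keeping "no" (absence of evidence) distinct from "n/a" and from blank cells matters here, since only the first records a negative observation.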
Site phases were also assigned to one or more cultural periods, as defined by the scholar(s) who excavated the site. In order to incorporate multiple interpretations and cultural affiliations provided by different scholars, a many-to-many relationship was established between phases and cultural periods.

3) Dataset Description
Object name
The database is made up of four main tables: 1) Date; 2) Site; 3) Phase; 4) Cultural Phase; plus a BibTeX format bibliography and a number of link tables (Figure 2). Although the project database was relational, the deposited versions of the tables are presented as individual CSV (comma separated values) format files or, in the case of the bibliographic database, in BibTeX format. Schemas and the meaning of options are defined in more detail in a set of CSV files (under metadata/schema_ or options_).

Date table ("dateTable.csv")
The date table contains core information pertaining to each radiocarbon date: a unique identifier (based on the standard laboratory identifiers, or alternative identifiers), the CRA (Conventional Radiocarbon Age), the associated error, isotopic signatures when available (e.g. δ13C values), the material dated, the dating method, suggested calibration curve, local reservoir 14C value and local reservoir 14C error. The date table also includes the fields "Sample_ID", which reports, when available, the label of the dated sample, and "Site_Context", which includes all the available information relative to the exact context of provenance (area, square, stratigraphic unit) of the dated sample within the site and any remaining "Problems" or further "Notes".
Date entries have a many-to-one relationship with Phase, and hence with Site (i.e. more than one date can be linked to the same Phase and Site), via the field "Phase_ID". This provides the primary contextual and geographic links between dates. Publications in which a given piece of data is first published or referred to are linked to individual dates in a many-to-many link table ("DateRefLink.csv").

Site table ("siteTable.csv")
The site table contains core spatial and typological information regarding the sites from which dates in the date table have been sampled. This includes the name of the site, a unique abbreviated identifier (made up of the international code of the country where the site is located followed by a sequential number, e.g. MA001), the type of site (e.g. open-air, cave, shell midden, funerary complex, etc.), the modern country where the site is located (indicated by the ISO 3166-1 alpha-2 international country code), and a single geographical coordinate pair representing the site geometrically as a point. Toponyms are given either as transliterated from Arabic and Berber, or as the Classical name of the site. We did not usually face major problems of inconsistency in the ways these were quoted in the different publications. When inconsistencies were present, they were minor, and never prevented us from correctly linking the two (or more) versions in which they were quoted; in such cases we used the most common version reported in the literature. Most radiocarbon dates were directly associated with specific sites. Only in a few cases (in particular Biskra [DZ], Djerid [TN], Illahun [EG], Lisht [EG], and Saqqara [EG]) were 14C dates reported as coming from a general area and not from a particular site. Each of these areas was associated with slightly different coordinates in the source publication. In order to keep track of these different coordinates we decided to name the sites using the name of the area followed by a sequential number in square brackets (e.g. "Illahun [1]").

Phase table ("phaseTable.csv")
The phase table contains core cultural and environmental association information for a particular, project-defined chronological phase at a site.
Phases have a many-to-one relationship with Sites (i.e. multiple phases can be linked to the same site) via the Site_ID field, and a one-to-many relationship with Dates (i.e. multiple dates can be linked to the same phase).
Phases are also linked to Cultures through a culture link table ("cultureLink.csv").
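Because the deposited tables are flat CSV files, the date, phase, and site relationships have to be re-joined by the user. The sketch below shows one way to follow the date to phase to site chain using only the Python standard library. Only the "Phase_ID" and "Site_ID" field names are taken from the text; every other column name and all values are hypothetical, and small inline strings stand in for dateTable.csv, phaseTable.csv, and siteTable.csv.

```python
import csv
import io

# Tiny inline stand-ins for dateTable.csv, phaseTable.csv and siteTable.csv.
# Only "Phase_ID" and "Site_ID" are field names given in the text; all other
# column names and all values are hypothetical.
DATES = """Lab_ID,CRA,Error,Phase_ID
X-0001,7300,60,MA001-1
X-0002,6400,50,MA001-2
X-0003,5200,45,EG010-1
"""
PHASES = """Phase_ID,Site_ID
MA001-1,MA001
MA001-2,MA001
EG010-1,EG010
"""
SITES = """Site_ID,Site_Name
MA001,Example Cave
EG010,Example Cemetery
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_dates_to_sites(dates, phases, sites):
    """Follow date -> phase -> site and return one merged dict per date."""
    phase_by_id = {p["Phase_ID"]: p for p in phases}
    site_by_id = {s["Site_ID"]: s for s in sites}
    merged = []
    for d in dates:
        phase = phase_by_id[d["Phase_ID"]]   # many dates -> one phase
        site = site_by_id[phase["Site_ID"]]  # many phases -> one site
        merged.append({**d, **phase, **site})
    return merged


rows = join_dates_to_sites(read_rows(DATES), read_rows(PHASES), read_rows(SITES))
```

With real files, `read_rows` would be replaced by opening each CSV from the deposit; the dictionary lookups raise a `KeyError` on a dangling identifier, which doubles as a basic integrity check.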

Culture table ("cultureTable.csv")
Dates are also associated with traditional cultures or cultural phase definitions (e.g. "Bronze Age"). Naturally these concepts should be treated with caution, but they are included since they are so well embedded in the literature. In order to facilitate the grouping of different cultural classifications that fall within the same macro-cultural definition, we decided to group cultures hierarchically, with up to 4 levels of detail (e.g. level 1: Bronze Age; 2: Middle Kingdom; 3: Dynasty 12; 4: Amenemhet III).
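One way to work with such a hierarchy is to treat each culture as a path of up to four names and to filter on a chosen level. The sketch below assumes a tuple representation, which is not necessarily how cultureTable.csv stores the levels. The first path reproduces the example in the text; the other entries and the function name are hypothetical.

```python
def matches_level(culture_path, level, name):
    """True if the culture at the given 1-based level equals `name`."""
    if not 1 <= level <= 4:
        raise ValueError("the hierarchy has at most 4 levels")
    return len(culture_path) >= level and culture_path[level - 1] == name


paths = [
    ("Bronze Age", "Middle Kingdom", "Dynasty 12", "Amenemhet III"),
    ("Bronze Age", "Middle Kingdom", "Dynasty 12"),  # hypothetical entry
    ("Bronze Age", "Example Period"),                # hypothetical entry
    ("Example Macro-culture",),                      # hypothetical entry
]

# Select every classification that falls under "Middle Kingdom" at level 2,
# regardless of how many finer levels are specified.
middle_kingdom = [p for p in paths if matches_level(p, 2, "Middle Kingdom")]
```

Filtering at a coarse level in this way groups finer classifications under the same macro-cultural definition without discarding the detail recorded at levels 3 and 4.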
The culture table links to a cultural reference link table ("cultureRefLink.csv"), which includes the BibTeX references to publications reporting general cultural or economic contextual information about this culture or cultural phase, but not necessarily radiocarbon dates.

Other contents of the MedAfriCarbon deposit directory
• README.md: introductory documentation to the database and to the format of the metadata schema files

MedAfriCarbon web app
The MedAfriCarbon web app provides a user-friendly graphical interface through which to explore the database, based on the Shiny platform [4], and running on a server at the University of Cambridge (https://theia.arch.cam.ac.uk/MedAfriCarbon). The app is structured as a tabbed dashboard allowing spatial, temporal, and attribute-based queries, on-the-fly calibration of each date, the creation of summed probability distribution (SPD) analyses of radiocarbon dates (via the rcarbon package [2]), as well as access to the linked bibliography and custom downloads of selected subsets.

Creation Dates
Records were created from January 2017 to October 2019 as part of the Leverhulme Trust funded MedAfrica Project: Archaeological deep history and dynamics of Mediterranean Africa, ca. 9600–700 BC.

Dataset Creators
The primary researcher responsible for the data collation was Giulio Lucarini.

4) Reuse Potential
Large 14C databases have been widely published in recent years and often serve as the basis for creating summed probability distributions (SPD), a type of analysis that has been used as a proxy for exploring long-term population change [8,11]. For the Mediterranean and Saharan regions, several SPD-led analyses have been conducted in the past [5,6,9], and have also been the object of criticism, mainly on the grounds that they predominantly reflect research bias toward certain regions over others (e.g. [10], but see [7]). Most published radiocarbon databases provide little to no information beyond details of the dated sample. The MedAfriCarbon database and app are unusual and more widely re-usable than most, because they integrate radiocarbon dates with economic and cultural variables. This allows a more contextual use of the 'dates as data' approach, one that is not limited to the reconstruction of prehistoric population change (cf. [1]), but also provides the basis for defining the movement and diffusion of particular species as well as the cultural dynamics of the groups who populated Mediterranean Africa during the Holocene. The scope of this work can be extended to the Saharan and Sahelian regions, and possibly linked with the existing databases for the northern coast of the Mediterranean.
Colleagues who wish to share freshly published archaeological 14C dates from Mediterranean Africa, or to report any missing or incorrect information in the published dataset of dates and economic/cultural associations, are welcome to contact Giulio Lucarini (giulio.lucarini@cnr.it).