The ARCHIPELAGO Archaeological Isotope Database for the Japanese Islands

ARCHIPELAGO is an archaeological and historical database of land and sea food resources utilised in the Japanese Islands. Here we present a dataset of human bone and hair carbon and nitrogen stable isotope measurements from Japanese archaeological sites covering the time span from the Upper Palaeolithic to the mid-nineteenth century. Reflecting the results of over 30 years of research, the dataset contains 1476 entries and covers the entire Japanese archipelago, although the data are more highly concentrated in coastal regions.

(1) OVERVIEW
CONTEXT
ARCHIPELAGO is an integrated archaeological and historical database of land and sea food resources utilised by humans in the Japanese Islands. Here, we present the first dataset from this initiative: human bone and hair carbon and nitrogen stable isotope measurements from archaeological sites in the Japanese archipelago covering the temporal range from the Upper Palaeolithic until the end of the early modern Tokugawa period (ca. 19,000 BC to AD 1868).
The use of stable isotope analysis of archaeological human remains to reconstruct past diets began in the late 1970s [1,2]. The technique explores how certain food groups exhibit differences in their stable isotope compositions for certain chemical elements, most often carbon or nitrogen. The isotopic compositions of human tissues reflect those of consumed foods and stable carbon and nitrogen analysis of hair keratin and of collagen extracted from bone or teeth are particularly informative of protein sources, whereas stable carbon isotope measurements of human bone carbonate or tooth enamel reflect the mix of all food macronutrients [3,4]. By 1987, stable isotope analysis was being applied to Japanese prehistory by Brian Chisholm and Hiroko Koike [5,6,7] and by Takeru Akazawa and Masao Minagawa [8,9]. Minoru Yoneda, a student of Akazawa, began his research in the early 1990s with analyses of Kitamura and other sites in Nagano prefecture [10]. Hiroto Takamiya included isotopic analyses (conducted by Brian Chisholm) of sites in Okinawa and Hokkaido in his 1997 PhD dissertation [11]. Building on these early applications, stable isotope analysis has become widely employed in Japan over the last decade or so, in salvage archaeology as well as in university-based research projects. A brief history of isotope archaeology in Japan up to 2004 is provided by Chisholm [12]. Recent overviews of applications to the Neolithic Jōmon and Bronze Age Yayoi periods were published by Kusaka [13] and Yoneda and Yamazaki [14], respectively.
The results of stable isotope analyses have only gradually been incorporated into interpretations of Japanese archaeology. Many texts do not mention the technique at all [e.g., 15,16]. Some publications have used isotope analysis to explore the diversity of Jōmon diets in Neolithic Japan [17,18], but the potential of the technique for Japanese history and archaeology remains underused. According to Chisholm's summary of the major results of archaeological isotope analysis in Japan, by 2004 there were a number of broad conclusions that seemed to be supported [12]. The first was that C4 plants played, at best, only a minor role in human diets in the archipelago. This assumption reflects a rice-centred view of Japanese history and needs to be re-evaluated in light of growing evidence for millet cultivation and other economic activities from the Yayoi period onwards [19]. A second finding was a very high marine component in the diets of prehistoric populations in Hokkaido, especially those of the Iron Age/medieval Okhotsk culture. This seems to be generally supported by later research, although details regarding the role of trade still need further research [20,21]. Thirdly, by the early 2000s it was not yet possible to find clear evidence of a dietary change at the Jōmon-Yayoi transition when full-scale cereal farming reached Japan. This point certainly requires new analysis. Finally, except in Hokkaido, few gender differences in diets in prehistoric Japan had been recognised. Despite huge advances in the quantity of isotopic data from Japan since 2004, there remains a real need to investigate these and other questions of historical relevance.

Spatial coverage
The dataset covers the Japanese archipelago (Figure 1), a land area of 364,545 km².

Temporal coverage
ca. 20,000 BC to AD 1868. Table 1 provides the periodisation employed in our dataset. Samples from the Jōmon (Neolithic) and early modern Tokugawa periods were the most common (Figure 2).

STEPS
Published stable carbon and nitrogen isotope data (δ13C and δ15N) were collected for premodern Japan up to the Meiji Restoration (1868). The majority of samples were derived from excavated human skeletons, but historic hair from the early modern Tokugawa period (1603-1868) was also included.
Where radiocarbon dates were not available, samples were cross-dated on the basis of pottery and other artefacts. Table 1 shows current widely-accepted dates for archaeological and historical periods in Japan, with the prehistoric chronology taken from Barnes [22].
Geographic coordinates follow those reported by the organisation responsible for the excavation as published in the site report. These were reported using the Japanese Geodetic Datum (JGD2000) or World Geodetic System 84 (WGS 84). The reporting of geographic coordinates in Japanese archaeological site reports became standard practice after 2004. Coordinates for sites published prior to that date were estimated from Google Earth, with an estimated accuracy that is typically better than 10 km.

QUALITY CONTROL
Whenever provided we included in the isotopic collection the standard parameters (collagen yield, %C, %N, atomic C/N) for assessment of bone collagen preservation [23], the principal type of tissue included in our data collection. We did not exclude data for which reported parameter values were outside of the recommended range since such data can still be useful for sample preservation studies. Furthermore, such data can be easily filtered prior to a study on human diet.

CONSTRAINTS
Bone preservation is generally poor in the acid soils found in Japan. Shell middens, wetland sites and sites on limestone geology provide major exceptions and most of the Neolithic Jōmon samples are from coastal shell middens. Skeletal remains from the Kofun period (AD 250-710) are biased towards elite burials in burial mounds (kofun in Japanese). Under Buddhist influence, cremation was commonly practiced during the Nara (710-794) and Heian (794-1185) periods and human skeletal remains from these centuries are rare. Our initial compilation was focused on the study of ancient human diets using bulk stable isotope measurements (e.g. on bulk extracted bone collagen or hair keratin). Data from less commonly employed isotopic proxies (e.g. sulphur, hydrogen, etc.), from isotopic proxies relative to the study of human mobility (e.g. oxygen or strontium isotopes), single compound isotope measurements (e.g. amino acids), and isotopic measurements from archaeological plants or animals were not included. Planned future data collections will add these data.
It was not possible to complete data input for all individuals. When this occurred, the fields for which data was not available were left blank. For four entries chronological data was not available and in the corresponding field 'Period tags' a question mark was entered. We will update our data collection whenever new data for already recorded individuals becomes available.

(3) DATASET DESCRIPTION
The dataset consists of a single table (available as "Japan human SI data v2.csv" and as "Japan human SI data v2.xlsx") deposited at the data platform of the Pandora initiative (https://pandoradata.earth/) within the ARCHIPELAGO community (https://pandoradata.earth/organization/archipelago). The ARCHIPELAGO isotopic database is also a member of the IsoMemo initiative, which brings together a network of isotopic databases (https://isomemo.com) and includes a Webapp for querying and modelling isotopic data (https://isomemoapp.com/).
The data table consists of fields organized into thematic groups. Each data entry is identified by a unique sequential key (Entry_ID). The data submitter may include additional comments not covered by the existing fields (Comments), and identify the data submitter by name (ID_submitter).
The archaeological site and sample context are described in several fields: a site name (Site_name), a short description of the type of site (Site_description), a short description of the burial context (Context_description), a context identifier as given in the original publication (Context_ID), the name of the locality at which the site is located (Locality), the corresponding region (Region), and the site altitude in metres (Altitude). Latitude (Latitude) and longitude (Longitude) are given using the WGS84 metric coordinate system. Each archaeological individual from which the sample was taken is identified using the identification provided in the original publication (Individual_ID), followed by a short description of the burial (Burial_type_skeletal_context). Additional sample description includes taxon (Taxon) and the corresponding name in common language (Taxon_common_name). Our dataset currently contains only human data, but in the future we plan to expand it to include other taxa. Osteological information includes sex identification (Sex), a text description of age (Age_category_individual), numeric values in years for the minimum (Min_age_individual) and maximum (Max_age_individual) biological age of the individual at death, and the type of bone or hair material sampled from the individual (Sample_type).
Biological age categories for skeletal individuals followed the published reports. These reports used standard bioarchaeological categories based on dental and skeletal age [24]: infant: birth-3 years; child: 3-12 years; adolescent: 12-20 years; young adult: 20-35 years; middle adult: 35-50 years; old adult: 50+ years. In some cases, particularly with older publications, the ages of these categories may differ slightly. The age >55 is sometimes used for 'older adults' in the Japanese literature. Cases where the same values are reported for minimum and maximum individual ages represent average estimated age. Some of the studies used here report very precise age estimates for sub-adults [e.g., 25]. In such cases our dataset follows these estimates, which are derived from different methods described in the studies concerned.
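The category boundaries listed above can be written as a small lookup table, for instance when a report gives only an age category and numeric bounds are needed. This is a minimal sketch in Python: the names AGE_CATEGORIES and age_range are hypothetical and not part of the dataset, and marking the open upper bound of the 'old adult' class with None is an assumption.

```python
from typing import Optional, Tuple

# Age boundaries in years, taken from the categories listed in the text.
# None marks the open upper bound of the "old adult" class (an assumption).
AGE_CATEGORIES = {
    "infant": (0, 3),
    "child": (3, 12),
    "adolescent": (12, 20),
    "young adult": (20, 35),
    "middle adult": (35, 50),
    "old adult": (50, None),
}


def age_range(category: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (min_age, max_age) in years, or (None, None) if unknown."""
    key = category.strip().lower()
    return AGE_CATEGORIES.get(key, (None, None))
```

Because older publications may use slightly different boundaries, values reported in the source publication should take precedence over this table.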
The chronological range of the sample is given by a minimum age (Min_chronology) and maximum age (Max_chronology) in years BC and AD, with years BC expressed by negative numbers. Age assignment followed a hierarchical approach. Whenever available we employed direct dates from samples (e.g. radiocarbon dates, for which the calibrated 95% range is reported) or from coeval samples from the same archaeological context. If necessary, corrections for marine radiocarbon reservoir effects were applied to direct radiocarbon measurements of human bones (see below). A dataset field (Dietary_model_selection) identifies the type of Bayesian model employed to estimate the dietary contributions from marine carbon. If no secure dating was available from the sample context, we employed the site's chronology, usually as given in the archaeological report. If this was also not available, we employed the full range of the cultural period to which the sample was assigned. A further field identifies the dating method employed (Dating_method). Also included are fields for uncalibrated direct radiocarbon dates on the sample (14C) and their uncertainty (14_unc). Period tags are also used to provide traditional chronological information (Period_tags).
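The hierarchical age assignment described above amounts to a simple fall-through rule. The following Python sketch illustrates it under stated assumptions: the function name, argument names and returned labels are hypothetical (they are not the values stored in Dating_method), and a direct date is given priority over a date from a coeval sample, which the text does not specify.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # (min_year, max_year); years BC are negative


def assign_chronology(direct: Optional[Range],
                      context: Optional[Range],
                      site: Optional[Range],
                      period: Optional[Range]) -> Tuple[Optional[Range], str]:
    """Pick the most specific available date range, in order of preference."""
    candidates = [
        ("direct date", direct),        # e.g. calibrated 95% range
        ("context date", context),      # coeval sample, same context
        ("site chronology", site),      # from the archaeological report
        ("cultural period", period),    # full range of the period
    ]
    for label, rng in candidates:
        if rng is not None:
            low, high = sorted(rng)     # ensure min <= max
            return (low, high), label
    return None, "undated"
```

For example, a sample with no direct or context date but a site chronology of 300 BC to AD 250 would be assigned (-300, 250) with the label "site chronology".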
Measurements of stable carbon (delta_13C_coll) and nitrogen (delta_15N_coll) isotopic ratios in bone collagen and hair keratin are reported together with measurement quality indicators, the percentage of elemental carbon (%C), the percentage of elemental nitrogen (%N), the carbon to nitrogen atomic ratio (C/N), and the collagen yield for bone samples (Collagen_yield).
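Since entries with out-of-range quality indicators are retained in the dataset, users interested in diet will usually filter on these fields first. The Python sketch below shows one way to do so. The thresholds (atomic C/N between 2.9 and 3.6, collagen yield of at least 1%) are commonly used values for bone collagen and are an assumption here, since the recommended ranges of [23] are not restated in this paper; they are not appropriate for hair keratin. The function names are hypothetical, and blank cells are assumed to be empty strings.

```python
from typing import Dict, Optional


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert a table cell to float; blank or non-numeric cells give None."""
    try:
        return float(value) if value not in (None, "") else None
    except ValueError:
        return None


def passes_collagen_qc(row: Dict[str, str],
                       cn_min: float = 2.9, cn_max: float = 3.6,
                       min_yield: float = 1.0) -> bool:
    """True if reported indicators fall inside the assumed acceptance ranges.

    Missing indicators are not treated as failures, because many reports
    omit them; a stricter rule may be preferable for some studies.
    """
    cn = to_float(row.get("C/N"))
    yld = to_float(row.get("Collagen_yield"))
    if cn is not None and not (cn_min <= cn <= cn_max):
        return False
    if yld is not None and yld < min_yield:
        return False
    return True
```

The thresholds are exposed as parameters so that a different preservation criterion can be substituted without changing the function.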
A reference in the format author(s)/year of publication/title identifies the source publication or publications from which the data were collected (Reference), in addition to a link to the publication whenever available (Link), a Digital Object Identifier as a persistent identifier (DOI), and the publication date or dates (Publication_date). Macrons were not used for Japanese titles in the list.
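As an illustration of how the table described in this section might be read, the Python sketch below parses a CSV stream with the standard library, turns blank cells into None, and sets aside entries whose period tag is a question mark. It is a sketch under assumptions: the function names are hypothetical, the in-memory sample rows are invented, and only a handful of the fields described above are shown.

```python
import csv
import io
from typing import Dict, List, Optional


def read_entries(handle) -> List[Dict[str, Optional[str]]]:
    """Read the data table, turning blank cells into None."""
    reader = csv.DictReader(handle)
    return [{key: ((value.strip() or None) if value is not None else None)
             for key, value in row.items()}
            for row in reader]


def chronology_known(entry: Dict[str, Optional[str]]) -> bool:
    """False for entries whose period tag is missing or a question mark."""
    return entry.get("Period_tags") not in (None, "?")


# Small in-memory stand-in for the real file (all values are invented).
sample = io.StringIO(
    "Entry_ID,Site_name,Min_chronology,Max_chronology,Period_tags\n"
    "1,Example A,-3000,-2500,Jomon\n"
    "2,Example B,,,?\n"
)
entries = read_entries(sample)
dated = [e for e in entries if chronology_known(e)]
```

To read the deposited file instead of the in-memory sample, the same function can be passed a handle opened with open("Japan human SI data v2.csv", newline=""); the file's text encoding is not stated in this paper and may need to be specified.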

(4) BAYESIAN MODELLING OF DIRECT HUMAN RADIOCARBON MEASUREMENTS
Our dataset contains 292 human samples for which chronological information is based on direct bone radiocarbon measurements. These measurements may be influenced by the consumption of aquatic foods, in particular marine foods from the seas surrounding Japan, which typically result in radiocarbon ages apparently older than the actual chronology of the analysed individual. This effect is known as a dietary marine radiocarbon reservoir effect (dietary MRE) and its correction requires an estimate of the contribution from marine carbon to human bone collagen plus an estimate of the MRE of consumed marine foods given that these can vary in space and time. Below we describe the use of Bayesian modelling to perform such a correction for the direct human radiocarbon measurements within our dataset.
Dietary estimates of marine carbon contributions to human bone collagen were obtained following similar procedures to those described in [26,27]. Briefly, we used the Bayesian software ReSources, available via the IsoMemo Webapp (https://isomemoapp.com/), to generate the dietary estimates. This software is an upgraded version of the Bayesian software FRUITS, allowing for the implementation of different Bayesian mixing model variants [28]. We considered three models: model 1, which relies only on δ13C measurements and includes four food groups (C3 terrestrial plants, terrestrial mammals, marine fish and marine shellfish); model 2, which relies on both δ13C and δ15N measurements and includes the same four food groups (C3 terrestrial plants, terrestrial mammals, marine fish and marine shellfish); and model 3, which relies on both δ13C and δ15N measurements and includes five food groups (C3 terrestrial plants, C4 terrestrial plants, terrestrial mammals, marine fish and marine shellfish). Models that do not include C4 plants (e.g., millets) typically yield dietary estimates of higher precision, given that C4 plants have δ13C values similar to those of marine foods.
Prior to the arrival of broomcorn and foxtail millet after 1000 BC, barnyard millet (Echinochloa utilis) was also likely cultivated by some Jōmon groups [29,30]. According to the radiocarbon database of the National Museum of Japanese History (https://www.rekihaku.ac.jp/up-cgi/login.pl?p=param/esrd/db_param), the earliest directly dated Echinochloa remains from Japan are two seeds from Middle Jōmon Tomi-no-sawa (Aomori), here recalibrated to 95% credible intervals of 3006-2703 BC and 2879-2851 BC. However, most finds with direct dates are from the historic era. The majority of archaeological finds of barnyard millet also have a limited distribution in southwest Hokkaido/northeast Honshu. The contribution of Echinochloa to the overall prehistoric diet in Japan was probably limited and as such we only considered C4 plants as a potential significant food source from 1000 BC onwards.
In terms of model selection, for human samples with both δ13C and δ15N values reported, we employed model 3 when either limit of the 95% credible interval, calibrated without any dietary correction, was equal to or later than 1200 BC, and model 2 for older samples. The reference cut-off was set earlier than 1000 BC because marine dietary intake shifts radiocarbon ages towards older values, so mixed consumers of C4 plants and marine foods who lived after 1000 BC could appear older from their radiocarbon measurements. The average surface marine radiocarbon reservoir is c. 400 years, so our reference cut-off of 1200 BC, 200 years before 1000 BC, would correspond to c. 50% marine carbon dietary contributions (this is an approximate estimate given potential differences in local marine radiocarbon reservoir effects and fluctuations in terrestrial and marine calibration curves). Such high levels of marine consumption are not expected following the introduction of farming. For 30 individuals only δ13C values were available. Fortunately, for these cases the calibrated radiocarbon ranges prior to dietary corrections clearly separated individuals dating older or younger than 1200 BC. For individuals dating younger we applied no model, as higher stable carbon isotopic ratios likely reflect millet rather than marine consumption, while for the remaining individuals we employed model 1. Within the dataset, individuals with direct bone radiocarbon dates are tagged as 'none', 'model 1', 'model 2', or 'model 3' under the field 'Dietary_model_selection'.
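The selection rule can be summarised as a short decision function. The Python sketch below is illustrative only and rests on one interpretive assumption: since C4 plants were considered a potential food source only from 1000 BC onwards, the C4-inclusive model 3 is assigned to individuals whose uncorrected calibrated range reaches 1200 BC or later, and the models without C4 plants to older individuals. The function and constant names are hypothetical. A second helper reproduces the arithmetic behind the c. 50% figure.

```python
from typing import Optional

CUTOFF_YEAR = -1200  # 1200 BC; years BC are negative, as in the dataset


def select_dietary_model(cal_min: int, cal_max: int,
                         d13c: Optional[float],
                         d15n: Optional[float]) -> str:
    """Return a 'Dietary_model_selection' tag for a directly dated individual.

    cal_min and cal_max are the limits of the 95% calibrated range obtained
    before any dietary correction.
    """
    reaches_cutoff = max(cal_min, cal_max) >= CUTOFF_YEAR
    if d13c is not None and d15n is not None:
        return "model 3" if reaches_cutoff else "model 2"
    if d13c is not None:
        # Only d13C available: millet and marine signals cannot be separated
        # for later individuals, so no model is applied to them.
        return "none" if reaches_cutoff else "model 1"
    return "none"  # assumption: without isotope values no model can be run


def implied_marine_fraction(age_shift_years: float,
                            reservoir_years: float = 400.0) -> float:
    """Marine carbon fraction needed to shift a date by age_shift_years."""
    return age_shift_years / reservoir_years
```

With a 400-year reservoir, implied_marine_fraction(200) gives 0.5, the c. 50% marine carbon contribution that a 200-year shift from 1000 BC to 1200 BC would require.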
Typical measurement uncertainties for δ13C and δ15N are c. 0.2‰. However, given that our data compilation contains data produced by different labs, we need to take into account inter-lab differences as reported in previous studies [33]. Furthermore, isotopic differences may also occur within a bone given different tissue renewal rates across it, which is relevant when comparing radiocarbon and stable isotope results as these may be sampled from different sections of a bone [34]. Different skeletal elements may also exhibit large isotopic differences due to different renewal rates, although for the most part it is likely that the same bone or similar bones were used for both radiocarbon and stable isotope analyses. In our modelling, we set the uncertainties for isotopic measurements in humans at 0.5‰.
A final aspect to be considered in Bayesian dietary modelling concerns the offsets between dietary macronutrients and human tissues, and metabolic routing mechanisms. For human bone collagen δ15N we took as reference an offset of 5.5 ± 0.5‰ relative to dietary protein [26]. In the case of human bone collagen δ13C we considered a routed model in which dietary protein contributed 74 ± 4% of the collagen signal and the remaining 26% originated from dietary carbohydrates/lipids [4]. The employed δ13C offset between diet and human bone collagen was 4.8 ± 0.5‰. We also employed a Bayesian prior that limited the contribution of dietary protein to between 10 and 35% of total calories in accordance with physiological studies [35].
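To make the role of these parameters concrete, the Python sketch below computes a point prediction of collagen isotope values from bulk dietary values using the routing fraction and offsets given above. It is a deliberately simplified forward calculation and not the ReSources model: uncertainties, food-group mixing, macronutrient concentrations and the protein prior are all ignored, and the function name and example input values are hypothetical.

```python
from typing import Tuple


def predict_collagen(d13c_protein: float, d13c_energy: float,
                     d15n_protein: float,
                     protein_routing: float = 0.74,
                     d13c_offset: float = 4.8,
                     d15n_offset: float = 5.5) -> Tuple[float, float]:
    """Point prediction of collagen (d13C, d15N) from bulk dietary values.

    d13c_protein: d13C of dietary protein (per mil)
    d13c_energy:  d13C of dietary carbohydrates/lipids (per mil)
    d15n_protein: d15N of dietary protein (per mil)
    """
    # Routed carbon signal: 74% from protein, the remaining 26% from energy.
    d13c_routed = (protein_routing * d13c_protein
                   + (1.0 - protein_routing) * d13c_energy)
    return d13c_routed + d13c_offset, d15n_protein + d15n_offset


# Invented dietary values for illustration only.
d13c_coll, d15n_coll = predict_collagen(-25.0, -26.0, 4.0)
```

With these invented inputs the routed dietary δ13C is 0.74 × (-25.0) + 0.26 × (-26.0) = -25.26‰, giving a predicted collagen δ13C of -20.46‰ and a collagen δ15N of 9.5‰.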
The ReSources software generates different dietary estimates but for chronological purposes estimates of marine carbon contribution towards bone collagen are the relevant ones. These were expressed as a mean and standard deviation and represent the sum of estimates obtained separately for marine fish and shellfish. Full specification of the models is given in R workspace files ("Model 1.Rdata", "Model 2.RData", "Model 3.RData") made available at the ARCHIPELAGO depository. Estimates were generated for all individuals irrespective of their C/N atomic ratio values. However, as mentioned previously, bone samples for which C/N atomic ratio values are outside acceptable ranges should not be employed for dietary studies and are included in our dataset only for preservation studies.
In addition to dietary estimates, a marine reservoir effect human dietary correction also requires an estimate of the magnitude of the MRE of consumed marine species. For these we lack approximate contemporaneous values for each individual. Thus, we relied on radiocarbon measurements on modern pre-bomb marine samples listed in the Marine Reservoir Correction Database [36]. Using this data, we produced an estimate of the spatial variations in marine ΔR (representing the local MRE offset from the marine calibration curve) along the coast of Japan (Figure 3) using the Bayesian model AverageR available via the IsoMemo Webapp [37,38]. Average and standard deviation values for ΔR of marine foods for each individual within the dataset were obtained by considering an area within a 100 km radius around assigned geographical coordinates.
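The final step, summarising ΔR within a 100 km radius of a site, can be sketched as follows. This Python example assumes that the spatial ΔR estimates are available as a list of (latitude, longitude, ΔR) points, for example exported from a gridded surface; it does not reproduce the AverageR model itself, the function names are hypothetical, and taking the plain mean and standard deviation of the nearby values is an assumption about how the summary was formed.

```python
import math
from statistics import mean, pstdev
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def local_delta_r(site_lat: float, site_lon: float,
                  points: Iterable[Tuple[float, float, float]],
                  radius_km: float = 100.0) -> Optional[Tuple[float, float]]:
    """Mean and SD of delta-R values at points within radius_km of the site.

    Returns None when no point falls inside the radius.
    """
    nearby = [dr for lat, lon, dr in points
              if haversine_km(site_lat, site_lon, lat, lon) <= radius_km]
    if not nearby:
        return None
    return mean(nearby), pstdev(nearby)
```

Note that this summary reflects only the spread of the nearby estimates; any uncertainty attached to each individual estimate would need to be propagated separately.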
Radiocarbon calibration was done using the Bayesian chronological software OxCal v. 4.4 [39]. We employed a mixed curve model consisting of the terrestrial IntCal20 calibration curve for the northern hemisphere [40] and the Marine20 marine calibration curve [41], where the contribution from Marine20 is the isotope-based, normally distributed dietary estimate for each individual. Calibrated radiocarbon results for each individual are reported as the 95% highest posterior density interval in the dataset fields "Min_chronology" and "Max_chronology".
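A mixed-curve calibration of this kind could be set up in OxCal along the lines sketched below. The Python helper only assembles the OxCal command text for one individual; it does not run OxCal. The OxCal commands used (Curve, Delta_R, Mix_Curve, R_Date) and the curve file names are written from general knowledge of OxCal 4.4 and should be checked against the OxCal manual; the helper name, the curve labels and the example values are hypothetical, and this is not the authors' own script.

```python
def oxcal_mixed_curve(name: str, c14_age: int, c14_unc: int,
                      delta_r: float, delta_r_unc: float,
                      marine_pct: float, marine_pct_unc: float) -> str:
    """Build OxCal command text for one date calibrated on a mixed curve.

    marine_pct and marine_pct_unc are the mean and SD (in percent) of the
    estimated marine carbon contribution to bone collagen.
    """
    lines = [
        "Plot()",
        "{",
        ' Curve("IntCal20", "intcal20.14c");',
        ' Curve("Marine20", "marine20.14c");',
        # Local offset from the marine curve (delta-R and its uncertainty).
        f' Delta_R("LocalMarine", {delta_r:g}, {delta_r_unc:g});',
        # Mix terrestrial and local marine curves by the dietary estimate.
        f' Mix_Curve("Mixed", "IntCal20", "LocalMarine", '
        f'{marine_pct:g}, {marine_pct_unc:g});',
        f' R_Date("{name}", {c14_age}, {c14_unc});',
        "};",
    ]
    return "\n".join(lines)


# Invented values for illustration only.
script = oxcal_mixed_curve("Individual 1", 4500, 30, -100.0, 40.0, 35.0, 8.0)
```

The resulting text can be pasted into OxCal; the 95% range it reports would then correspond to the values stored in Min_chronology and Max_chronology.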

CREATION DATES
Records created from November 2019 to May 2021.

Figure 3. Bayesian estimates of the spatial distribution of marine ΔR around Japan relying on radiocarbon measurements on marine samples (x points) listed in the Marine Reservoir Correction Database [36].

DATASET CREATORS
The primary researcher responsible for the metadata structure and modelling was Ricardo Fernandes and for data collation was Mark Hudson.

(5) REUSE POTENTIAL
The collected dataset combines isotopic data, informative of diet, with chronological, osteological, cultural, and other types of archaeological and historical information. This provides the basis for future research on the association between ancient diets and socioeconomic status, cultural and religious choices, and local palaeo-environmental and palaeo-climatic conditions, among others. We also aim to investigate spatial and diachronic trends in dietary patterns across Japan using the modelling tools available via IsoMemo (https://isomemoapp.com/).