(1) Overview


The dataset was collected in the scope of the ‘Lifestyle as an Unintentional Identity in the Neolithic’ project and currently represents (to the best of our knowledge) the largest collection of data on Neolithic settlement activities in Central Europe. The geographical scope is limited to the Morava River basin and the eastern part of Bohemia, i.e., the Czech Republic and small parts of Austria and Slovakia. See Figure 1 for the distribution of sampled sites and a precise definition of the study area. The time period of interest spans from the end of the Linear Pottery culture (LBK) at around 4900 BCE to the end of the Funnel Beaker culture (TRB) around 3300 BCE, i.e., the Neolithic and Eneolithic periods.

Figure 1 

Map showing the distribution of sites (black circles). The regions included in the study are left uncovered, while the remaining area is masked by a transparent white overlay. Main rivers are shown in dark blue and country borders as dashed black lines.

Site locations are represented as points (see Figure 1) with associated information on relative chronology. This information is recorded at two levels of detail: (1) general pottery traditions (cultures) and (2) detailed pottery groups (culture phases). These chronological phases are also generalised into slices of time (periods) of 200 years each. The general pottery traditions covered are, in chronological order, the Stroked Pottery culture (SBK), the Lengyel culture (LgK), the Proto-Eneolithic (which includes the Jordanów culture and the Michelsberg culture) and the Funnel Beaker culture (TRB). At the level of detailed pottery groups, early and late phases are differentiated within SBK, LgK and TRB.

The dataset discussed here represents the part of the ‘Lifestyle as an Unintentional Identity in the Neolithic’ project that focuses on settlement patterns in the studied regions. Other parts of the project focus on the diet of the given populations, studied through stable isotopes and tooth microwear, and on chronological modelling of pottery styles based on radiocarbon data [1]. The density of radiocarbon dates is still low for some of the studied periods and/or regions, and thus it has not yet been possible to fully integrate these two types of evidence. For a comprehensive collection of radiocarbon dates from the Czech Republic see [2].

The project’s most important finding is that the dynamics of settlement patterns in two separate regions (the eastern part of Bohemia and the Morava River basin), both with similar environmental conditions of temperate Europe, are not synchronous. Changes in settlement patterns therefore cannot necessarily be attributed to changes in archaeological culture or in symbolic pottery style.

Spatial coverage

Description: Central Europe, Czech Republic (Morava River basin, eastern part of Bohemia), small parts of Slovakia and Lower Austria.

Boundary coordinates are given in the World Geodetic System (WGS) 1984 (EPSG: 4326), while the coordinate system of the dataset is S-JTSK/Krovak East North (EPSG: 5514).

Northern boundary: 50.60559

Southern boundary: 48.19618

Eastern boundary: 17.84843

Western boundary: 14.34401
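As a minimal sketch, the boundary values above can serve as a coarse pre-filter for deciding whether a WGS 84 coordinate could fall within the dataset's extent at all. The function name in_study_bbox and the two test coordinates are illustrative assumptions, not part of the dataset. The bounding box is larger than the actual study area polygon, so a positive result is not proof of membership.

```python
# Boundary coordinates as given in the text, WGS 84 (EPSG:4326), decimal degrees.
NORTH, SOUTH = 50.60559, 48.19618
EAST, WEST = 17.84843, 14.34401


def in_study_bbox(lat: float, lon: float) -> bool:
    """Return True if a WGS 84 point lies inside the bounding box.

    Coarse check only: the box is larger than the study area polygon
    stored in study_area.gml.
    """
    return SOUTH <= lat <= NORTH and WEST <= lon <= EAST


if __name__ == "__main__":
    # Approximate city coordinates, used purely as illustrative input.
    print(in_study_bbox(49.195, 16.608))  # Brno
    print(in_study_bbox(52.520, 13.405))  # Berlin
```

Because the dataset itself is stored in S-JTSK/Krovak East North (EPSG: 5514), site coordinates would need to be reprojected to WGS 84 before a check of this kind is applied.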

Temporal coverage

4900 BCE to 3300 BCE, i.e., from the post-LBK Neolithic period to the end of TRB in Central Europe.

(2) Methods

A settlement site is defined here with a minimalist approach as the repetitive occurrence of pottery fragments in space.


The data were collected from heterogeneous sources: mainly published works (see the list of cited works in the references.csv file); excavation and find reports deposited in the archives of the Institute of Archaeology, Czech Academy of Sciences, Brno, and the Institute of Archaeology, Slovak Academy of Sciences; legacy databases, i.e., the National Heritage Monument List of the Czech Republic; unpublished records of various museums (Moravian Museum, Regional Museum in Mikulov, Masaryk Museum in Hodonín, Boskovice Region Museum, Vyškov Region Museum); and, sporadically, personal communications. Each site was defined in space by a point located in the centre of a reported accumulation of Neolithic pottery fragments and other finds. The minimum distance to distinguish between two sites was arbitrarily set at 200 m between the edges of bordering pottery accumulations. Cave sites were not taken into account because they are not considered settlement sites under our definition and often lie outside the main settlement region. The reported relative chronological characteristics derived from the typology of the pottery were recorded for each of the sites. The locations of raw material sources were derived from [3].
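The 200 m separation rule can be illustrated with a small sketch. It assumes, purely for illustration, that each pottery accumulation is approximated by a circle (centre and radius in metres, in a projected coordinate system); the text does not state how accumulation edges were actually delineated, and whether a gap of exactly 200 m counts as separate is also an assumption here. The names edge_gap and are_separate_sites are hypothetical.

```python
import math

MIN_EDGE_GAP_M = 200.0  # threshold stated in the text


def edge_gap(a, b):
    """Gap in metres between the edges of two circular accumulations.

    Each accumulation is a tuple (x, y, radius) in metres. Treating
    accumulations as circles is an assumption of this sketch only.
    A negative result means the two circles overlap.
    """
    centre_dist = math.hypot(a[0] - b[0], a[1] - b[1])
    return centre_dist - a[2] - b[2]


def are_separate_sites(a, b, min_gap=MIN_EDGE_GAP_M):
    """True if the edge-to-edge gap reaches the minimum distance."""
    return edge_gap(a, b) >= min_gap


if __name__ == "__main__":
    first = (0.0, 0.0, 50.0)      # made-up accumulation
    second = (400.0, 0.0, 100.0)  # made-up accumulation
    print(edge_gap(first, second), are_separate_sites(first, second))
```

Note that the rule is measured between edges, not between the centre points by which sites are stored, so two published points can be more than 200 m apart and still derive from accumulations that were merged.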

The data was collected using various versions of LibreOffice Calc and QGIS, and all the subsequent data processing and manipulation was performed in R [4]. This included standardisation of the relative chronology terminology, assigning unique identifiers and preserving them across spatial data and the flat tables, etc.

Sampling strategy

The dataset was derived from the aforementioned published, unpublished and legacy sources, and only sites that we were able to locate in space with good precision were included. The strategy was to cover the selected study area (see the polygon in the study_area.gml file) in order to gather as much data as possible. After the initial phase of data gathering, the study region was limited to the eastern part of Bohemia and the Morava River basin area (see Figure 1 and the polygons in the regions.gml file). Nonetheless, biases are present in the dataset. They result from the different intensities and practices of archaeological research in the regions and/or states included in the study area, which are in turn caused by various factors, e.g., research interests, distance to large cities, budgets, data management and archiving. See Issue 58 of Internet Archaeology for details on how archaeological data is managed in Austria [5], the Czech Republic [6] and Slovakia [7].

Quality Control

Only settlement sites that we were able to locate in space with a positional uncertainty of roughly 1 km or less are included in the dataset. Sites with problematic information and uncertainties in spatial and/or chronological definitions were not included in the published version of the database. We employ standardised vocabularies across the database and, where possible, the chronological phases were linked to the Periods Vocabulary of the Archaeological Map of the Czech Republic (AMCR) at Periodo (http://n2t.net/ark:/99152/p0wctqt). The data were extensively checked to eliminate duplicate records and to prevent the omission of sites that share a name in the literature but have different spatial definitions.

The data were not automatically adopted from legacy databases, but records were checked one by one; therefore, some were excluded and the database as a whole need not fully correspond to the given source.


Constraints

Each settlement site is represented only as a point in space, and the extent or size of the original settlement is not taken into account because this information is seldom available. This introduces certain biases: a large settlement with enclosures, etc., is represented in the same way as a cluster of potsherds collected during a fieldwalking survey.

Another constraint is that some sites are repeatedly mentioned under different names and, on the other hand, different sites can be recorded under one local name covering a large spatial extent. Although this was controlled for in the dataset, it must be acknowledged.

(3) Dataset description

Object name

  • sites.csv – the main list of sites with unique identifiers in the ID field; an ID starting with B means the site lies in the eastern Bohemia part of the study area, while an ID starting with M means the site lies in the Morava River basin area. The orig_id field contains the identifier by which the site is referenced in cited works, and the site field contains the site name;
  • pot_traditions.csv – contains site IDs, the chrono field giving the general pottery tradition and the period field giving the time slice (one of nine) in which the site occurs;
  • pot_groups.csv – has the same fields as the previous file, except that the chrono field contains the detailed pottery group;
  • references.csv – a list of references; where possible, excavation reports are linked to their source in the Digital Archive of the Archaeological Map of the Czech Republic (https://digiarchiv.aiscr.cz/). The ref_id column is linked to the sites in the sites.csv file through the references_sites.csv file;
  • references_sites.csv – connects the references.csv and sites.csv files.
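The relationships between the flat tables listed above can be sketched with the Python standard library. The field names ID, orig_id, site, chrono, period and ref_id come from the descriptions above; that the linking files use exactly the columns ID and ref_id is an assumption, and all rows, identifiers and period labels below are made up for illustration. The helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up samples standing in for the real files.
SITES_CSV = """ID,orig_id,site
B0001,x1,Example A
M0001,x2,Example B
"""
POT_TRADITIONS_CSV = """ID,chrono,period
B0001,SBK,p1
M0001,LgK,p2
M0001,TRB,p5
"""
REFERENCES_SITES_CSV = """ID,ref_id
B0001,r1
M0001,r1
M0001,r2
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def region_of(site_id):
    """Derive the region from the first letter of the site ID."""
    regions = {"B": "eastern Bohemia", "M": "Morava River basin"}
    return regions.get(site_id[:1], "unknown")


def build_site_index(sites, traditions, ref_links):
    """Attach chronology and reference IDs to each site record."""
    chrono = defaultdict(list)
    for row in traditions:
        chrono[row["ID"]].append((row["chrono"], row["period"]))
    refs = defaultdict(list)
    for row in ref_links:
        refs[row["ID"]].append(row["ref_id"])
    return {
        row["ID"]: {
            "site": row["site"],
            "region": region_of(row["ID"]),
            "chronology": chrono[row["ID"]],
            "ref_ids": refs[row["ID"]],
        }
        for row in sites
    }


if __name__ == "__main__":
    index = build_site_index(
        read_rows(SITES_CSV),
        read_rows(POT_TRADITIONS_CSV),
        read_rows(REFERENCES_SITES_CSV),
    )
    for site_id, record in index.items():
        print(site_id, record)
```

With the real files, read_rows would be replaced by opening each CSV from disk; the column names should be checked against the file headers first.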

Geodata (in S-JTSK/Krovak East North coordinate reference system):

  • site_locations.gml and site_locations.xsd – settlement site locations. The ID field gives a unique identifier for each site. The accuracy column indicates how accurately the site location is defined: value 1 means an accurate location (instrumentally measured); value 2 means a precision of hundreds of metres, i.e., the site location is known by the local name, street name, etc.; and value 3 means the location is not very accurate, within a range of roughly 1 km. The surface field is TRUE if the site is defined only on the basis of a surface survey, and the altitude field gives the altitude in metres;
  • study_area.gml and study_area.xsd – polygon giving the borders of the initial study area where data was collected. For any analysis, the spatial extent given in the regions.gml file should be used;
  • regions.gml and regions.xsd – polygons defining the extent of the studied regions (i.e., the eastern part of Bohemia and the Morava River basin). The polygons were created by buffering the site locations by 10 km and cropping the resulting polygons with the polygon of the study area (study_area.gml file);
  • raw_material_sources.gml and raw_material_sources.xsd – locations of raw material sources as points or lines. Points are based on places where prehistoric procurement activities are known or outcrops of the given raw materials are present. Lines give the border of the raw material occurrence in the case of erratic flint or river courses in which the raw materials can be procured. The rm column gives an abbreviated name of the raw material and the type field is either l for chipped stone tools or p for polished stone tools.
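The way the region polygons were built (10-km buffers around site locations) implies a simple membership test, sketched here with the standard library: a location lies inside the union of circular buffers if it is within 10 km of at least one site. The sketch assumes coordinates in metres in a projected system such as S-JTSK/Krovak East North, uses made-up coordinates, and omits the subsequent cropping by the study area polygon. The function name in_site_buffer is hypothetical, and results at the buffer edge may differ slightly from the published polygons, since GIS buffers approximate circles with polygons.

```python
import math

BUFFER_M = 10_000.0  # 10-km buffer distance stated in the text


def in_site_buffer(point, sites, buffer_m=BUFFER_M):
    """True if `point` lies within `buffer_m` of at least one site.

    `point` is an (x, y) tuple and `sites` a list of (x, y) tuples,
    all in metres. Cropping by the study area is not modelled here.
    """
    px, py = point
    return any(math.hypot(px - sx, py - sy) <= buffer_m for sx, sy in sites)


if __name__ == "__main__":
    sites = [(0.0, 0.0), (25_000.0, 0.0)]  # made-up site coordinates
    print(in_site_buffer((5_000.0, 5_000.0), sites))
    print(in_site_buffer((12_500.0, 0.0), sites))
```

For actual analyses the polygons in regions.gml remain the reference extent, as recommended above.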


Vocabularies:

  • voc_periods.csv – contains period labels;
  • voc_pot_traditions.csv – contains pottery tradition labels; where possible, the periodo_link field maps the period to the AMCR Periods Vocabulary at Periodo (http://n2t.net/ark:/99152/p0wctqt);
  • voc_pot_groups.csv – contains pottery group labels, with the same fields as the previous file;
  • voc_pot_groups_facets.csv – general labels for pottery groups;
  • voc_raw_materials.csv – list of raw material abbreviations in the rm column of raw_material_sources.gml file with full names derived from [3].

Data type

Secondary and processed data collected largely from published studies (for a detailed list of citations, see file references.csv) and excavation reports.

Format names and versions


Flat tables are stored as comma-separated values (CSV). Spatial information is stored in the Geography Markup Language (GML) file format, i.e., a .gml file and an associated schema definition file (.xsd).

Creation dates

The dataset was created in several phases between 2010 and 2020. At first, data in the Thaya River basin, a segment of the Morava River basin, were collected as part of the PhD thesis of the second author [8]. Later, in the course of the ‘Lifestyle as an Unintentional Identity in the Neolithic’ project (2019 to 2021), the geographical scope was expanded to the whole Morava River basin and eastern part of Bohemia.

Dataset Creators

František Trampota (Department of Archaeology and Museology, Masaryk University) collected most of the data.

Pavel Burgert (Institute of Archaeology Prague, Czech Academy of Sciences) provided some of the data for the eastern part of Bohemia.


Language

English; site names are in Czech (or Slovak and German in the respective areas).


License

CC-BY 4.0, Creative Commons Attribution 4.0 International License.

Repository location

The data is deposited at the Zenodo repository under DOI https://doi.org/10.5281/zenodo.5653180.

Publication date

Version 1.1.0 (DOI https://doi.org/10.5281/zenodo.5768049) of the dataset was published 8/12/2021.

(4) Reuse potential

The data was initially collected for the analysis of first-order effects [9]. It is thus well suited for a thorough exploration of relationships between sites and environmental variables. Predictive modelling and site location analysis are among the options for further work with the dataset. An analysis of the settlement patterns present in the data, i.e., the underlying second-order properties and point patterns [9], is a possibility that should not be overlooked.
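As one simple illustration of a second-order summary, the sketch below computes the Clark-Evans nearest-neighbour ratio for a set of points. This is a generic, widely used statistic chosen here for brevity; it is not claimed to be the approach discussed in [9]. The sketch assumes projected coordinates in metres, a known study area size in square metres, and applies no edge correction. The function names and sample coordinates are illustrative.

```python
import math


def nearest_neighbour_distances(points):
    """Distance from each point to its nearest neighbour (brute force)."""
    result = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        result.append(nearest)
    return result


def clark_evans_ratio(points, area):
    """Observed divided by expected mean nearest-neighbour distance.

    Under complete spatial randomness the expected mean distance is
    1 / (2 * sqrt(n / area)). Values below 1 suggest clustering and
    values above 1 suggest regular spacing. No edge correction.
    """
    observed = sum(nearest_neighbour_distances(points)) / len(points)
    expected = 0.5 / math.sqrt(len(points) / area)
    return observed / expected


if __name__ == "__main__":
    # Made-up regular grid of four points in a 2 km x 2 km window.
    grid = [(500.0, 500.0), (1500.0, 500.0), (500.0, 1500.0), (1500.0, 1500.0)]
    print(clark_evans_ratio(grid, area=2000.0 * 2000.0))
```

Given the recovery biases described under sampling strategy, any such ratio computed on the dataset would reflect research intensity as well as past settlement behaviour.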

Thanks to a very general data model, the data can be easily aggregated and analysed together with data on settlement activity in other time periods, so that models covering larger time scales can be derived. The dataset or its subsets can also be easily used in teaching GIS-based methods of analysis in archaeology.
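A minimal sketch of such an aggregation is counting distinct sites per region and time slice from rows shaped like those in pot_traditions.csv. The field names ID and period follow the dataset description; that the site identifier column in that file is called ID is an assumption, and the rows and period labels below are made up. The function name sites_per_period is hypothetical.

```python
from collections import Counter

REGION_BY_PREFIX = {"B": "eastern Bohemia", "M": "Morava River basin"}


def sites_per_period(rows):
    """Count distinct sites per (region, period) pair.

    `rows` are dictionaries with at least the keys "ID" and "period".
    A site listed under several traditions in the same period is
    counted once.
    """
    seen = {(row["ID"], row["period"]) for row in rows}
    return Counter(
        (REGION_BY_PREFIX.get(site_id[:1], "unknown"), period)
        for site_id, period in seen
    )


if __name__ == "__main__":
    rows = [  # made-up records
        {"ID": "B1", "chrono": "SBK", "period": "p1"},
        {"ID": "B1", "chrono": "LgK", "period": "p1"},
        {"ID": "M1", "chrono": "LgK", "period": "p1"},
        {"ID": "M2", "chrono": "LgK", "period": "p1"},
    ]
    for key, count in sorted(sites_per_period(rows).items()):
        print(key, count)
```

Counts of this kind can be placed alongside equivalent tallies from other datasets, provided the time slices are aligned first.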

The reuse potential is somewhat limited by the temporal uncertainty of this evidence, given that the time dimension is represented only by relative chronological phases. This is because radiocarbon dates are still largely missing for some of the chronological phases and/or regions [1, 2]. Enhancing this settlement data with radiocarbon dates would show temporal dynamics in unprecedented detail.