The use of Z-scores to facilitate morphometric comparisons between African Plio-Pleistocene hominin fossils: An example of method

FUNDING: National Research Foundation (South Africa)

South Africa and East Africa each have a rich palaeoanthropological heritage, but the taxonomy of fossil hominins from these regions is controversial. In this study, two morphometric methods related to the quantification of variability in morphology have been applied to pairwise comparisons of linear measurements of hominoid crania and mandibles. The log-transformed standard error of the m-coefficient (‘log sem’) is calculated from linear regressions. Like Procrustes Distances (PD), log sem statistics can serve to quantify variation in the shape of a cranium or mandible in the context of a constellation of landmarks. In this study, PD and log sem statistics are integrated and standardised using Z-scores, and applied probabilistically to Plio-Pleistocene hominins. As a test case, OH 7 and OH 24 as reference specimens of Homo habilis are compared to fossils representing other taxa. There is a wide spectrum of variation in Z-scores for specimens attributed to early Homo dated within the period between circa 1.8 Ma and 2 Ma. In terms of morphometric variation predating 1.8 Ma, Z-scores (Z<2) for Australopithecus afarensis, A. africanus and Homo habilis display a small range of variability. This study serves as a demonstration of a method whereby log sem and PD can be used together to facilitate an objective assessment of morphological variability, applicable in palaeontological contexts.


Introduction
The taxonomy of fossil hominins appears increasingly complex given recent discoveries and announcements of new species. 1 This is true for early hominins, with new Australopithecus species recently named, and also with respect to questions about the emergence of the genus Homo and of modern humans. Taxonomic attributions and affiliations are therefore among the major issues in palaeoanthropology, including, for example, the debate between Tobias and Robinson regarding specimens attributed to Australopithecus or early Homo. [2][3][4][5] By 1992, Tobias 6 triumphantly declared that Homo habilis was 'widely accepted as a good taxon'. However, just a few years later, Wood and Collard 7 suggested that H. habilis as well as H. rudolfensis should be placed within Australopithecus. Analytical and statistical tools need to be developed to help clarify complex pictures such as this one, notably for fossil specimens that cannot be sampled for DNA analyses.
In this study, our aim was to present a methodological approach combining two morphometric methods to quantitatively assess the degree of variability among individuals and provide a probabilistic reference that can be applied to the hominin fossil record for clarification of taxonomic attribution. This approach has the potential to add value for taxonomic debates. As a test case, it is applied to specimens attributed to H. habilis and other Plio-Pleistocene taxa. The primary objective of our current morphometric analyses was to demonstrate the novel approach without as yet attempting to resolve particular problems of taxonomy.

Materials
A hominin mandible catalogued as OH 7 (virtual reconstruction 8 shown in Figure 1), the holotype of H. habilis, was discovered at Olduvai Gorge (Bed I) in Tanzania in deposits dated to 1.8 million years ago (Ma). 9 OH 24 is a contemporaneous skull of the same species from the same site. 10,11 The two specimens can be used as reference material for morphometric comparisons with other fossils (mandibles or crania) attributed to Australopithecus africanus, A. afarensis, H. erectus and H. rudolfensis.
The materials used in this study (Table 1) relate primarily to fossils from three general time periods: firstly, specimens attributed to A. afarensis, circa 3 million years ago; secondly, specimens attributed to A. africanus, circa 2.5 Ma; and, thirdly, specimens attributed to early Homo, between circa 1.8 Ma and 2 Ma. A. afarensis mandibles from Ethiopia 12-14 include AL 288-1 ('Lucy'), AL 200, AL 822 and AL 400-1. A mandible of A. africanus from Sterkfontein in South Africa is Sts 52b, while Sts 5 (the skull of 'Mrs Ples') from the same site also represents the latter species. 15 Sts 71 (a partial skull) from Sterkfontein was originally attributed to A. africanus but has also been referred to as A. prometheus. 16

Methods
In a morphometric approach based on pairwise comparisons, Thackeray and Odes 29 calculated 'log sem' statistics associated with regression analyses to compare OH 24 with other crania, based on measurements from original specimens published by Wood. 23 The log sem results reflect variability in skull shape. In the comparison between OH 24 and Sts 5 (almost complete skulls), 54 measurements are common to both specimens. In the comparison between OH 24 and Sts 71 (a partial skull), 44 measurements are common to both specimens.
A log sem value was obtained from a comparison between our measurements of OH 7 and Sts 52b mandibles, using landmarks based mainly on points associated with mesiodistal and bucco-lingual diameters (excluding third lower molars because the specimens do not represent fully adult individuals). Forty measurements per specimen [...] 30 respectively. Sts 52b was selected for comparison in this study because of an apparent degree of morphological similarity with OH 7, as discussed by Tattersall. 31
Using Procrustes Distances (PD), Spoor et al. 8 compared Plio-Pleistocene hominin specimens with a focus on OH 7, based on more than 50 landmarks. For purposes of our study, we used PD data made available through Spoor and Gunz (personal communication to JFT, 2020). Here we integrate the two kinds of shape-related statistics (PD and log sem) by expressing them as standardised Z-scores in relation to data obtained from humans and extant great apes.
The AL 400-1 mandible of A. afarensis has been compared with OH 7 on the basis of the PD method. AL 400-1 has also been compared with OH 7 using the log sem statistic. The difference between Z-scores is expected to be relatively small if the approaches yield consistent results.

Log sem statistics for crania
The log sem statistic has been previously used in analyses of linear measurements obtained from crania of modern specimens in natural history museums [32][33][34] and Plio-Pleistocene hominins. 29,32,34 In this method, measurements are subjected to pairwise comparisons, using least squares linear regression to generate an equation of the form y = mx + c, where m is the slope and c is the intercept. In an initial study of pairs of specimens of the same (extant) species in 1997, Thackeray et al. 32 reported central tendency of the log-transformed standard error of the m-coefficient, known as 'log sem', which is a measure of the degree of scatter around the regression line, reflecting variability in shape. Central tendency of log sem was also found using larger samples, associated with a mean log sem value of -1.61 reported in 2007 by Thackeray. 33 At least for hominoids, the mean log sem value of -1.61±0.1 was recognised in 2016 as a typical degree of intraspecific morphological variation in extant species. 34
In response to views expressed by Gordon and Wood, 35 Thackeray and Dykes 34 emphasised the need to make pairwise comparisons with specimen A on the x-axis and specimen B on the y-axis, and vice versa. Two log sem values are obtained, and the absolute difference between them is termed 'delta log sem'. The mean delta log sem is small (generally ≤ 0.03) when pairs of specimens of the same species are compared. By contrast, delta log sem values are large (generally >> 0.03) when specimens of different species are compared. 34 Thackeray and Dykes 34 stated that the number of measurable dimensions (k) obtained from pairs of specimens should be maximised as far as possible to ensure robustness of the log sem statistic. When this is done, with the number of measurements for pairwise comparisons being greater than 20, there is a tendency for the mean log sem to stabilise around a value of circa -1.6. 34,35
In their analyses of cranial measurements of Pan troglodytes, P. paniscus, Gorilla gorilla and H. sapiens (using more than 20 measurable dimensions as published by Gordon and Wood 35 ), Thackeray and Dykes 34 obtained the following results from pairwise comparisons:
1. Mean log sem = -1.612±0.129 (n=8072 pairwise regressions) reflects what is considered to be a typical degree of intraspecific variation within hominoids.
These results, based on a very large number of regressions, clearly show that there is indeed a significant difference between the log sem values calculated for intraspecific and interspecific comparisons, which is related to similarity (or dissimilarity) in shape.
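The log sem and delta log sem calculations described above can be sketched as follows. This is a minimal illustration, not the authors' own implementation: it assumes a base-10 logarithm and the usual ordinary least squares standard error of the slope, and the function names and the measurement values are hypothetical. Only eight measurements are used for brevity, whereas the text recommends more than 20.

```python
import math


def log_sem(x, y):
    """log10 of the standard error of the slope m in the OLS fit y = m*x + c.

    Assumption: base-10 logarithm; the base is not stated in the text.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx                      # slope
    c = mean_y - m * mean_x            # intercept
    rss = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    se_m = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    return math.log10(se_m)            # fails if the fit is perfect (se_m == 0)


def delta_log_sem(a, b):
    """Absolute difference between the two log sem values (A on x, then B on x)."""
    return abs(log_sem(a, b) - log_sem(b, a))


# Hypothetical linear measurements (mm) shared by two specimens.
specimen_a = [12.1, 10.4, 13.0, 11.2, 8.9, 35.5, 42.0, 27.3]
specimen_b = [11.8, 10.9, 12.6, 11.5, 9.3, 36.2, 40.8, 28.1]

print("log sem (A on x):", round(log_sem(specimen_a, specimen_b), 3))
print("log sem (B on x):", round(log_sem(specimen_b, specimen_a), 3))
print("delta log sem:   ", round(delta_log_sem(specimen_a, specimen_b), 3))
```

Running the regression in both directions matters because the standard error of the slope depends on which specimen is placed on the x-axis; delta log sem summarises that asymmetry in a single number.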
A criticism that has been levelled against the log sem approach relates to which variables are being measured. Remarkable as it may seem, the degree of intraspecific variability reflected by a mean log sem value of circa -1.61±0.1 has been obtained, not only from cranial variables, but also from measurements from teeth. 34

Procrustes Distances for mandibles
Bookstein 36 and Duta 37 have described the method whereby PD are calculated, reflecting differences in shape between objects (in this case, mandibles). PD statistics serve to quantify the difference between landmarks by using the square root of the sum of squared differences in positions of those landmarks.
Spoor et al. 8 calculated PD values from pairwise comparisons of OH 7 and other hominins, using landmarks indicated in their Fig. 2f and 2g. PD were also calculated for purposes of comparisons with H. sapiens, P. troglodytes and G. gorilla. The mean PD for the extant hominoids, based on data obtained by Spoor and Gunz (personal communication to JFT, 2020), provides a 'within group' (conspecific) frame of reference.
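To make the PD definition concrete, the following is a minimal sketch of a pairwise Procrustes Distance between two landmark configurations. It assumes an ordinary (two-specimen) superimposition: each configuration is centred, scaled to unit centroid size and optimally rotated before the square root of the summed squared landmark differences is taken. The function name and the toy landmarks are hypothetical, and the superimposition used by Spoor et al. may differ in detail (for example, in how scaling or multiple specimens are handled).

```python
import numpy as np


def procrustes_distance(a, b):
    """Procrustes distance between two (k landmarks x d dimensions) arrays.

    Sketch under stated assumptions: translation, scale and rotation are
    removed; reflections are not allowed.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError("both inputs must be k x d arrays of the same shape")
    # Remove position: centre each configuration on its centroid.
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    # Remove size: scale each configuration to unit centroid size.
    norm_a, norm_b = np.linalg.norm(a0), np.linalg.norm(b0)
    if norm_a == 0 or norm_b == 0:
        raise ValueError("landmarks must not all coincide")
    a0, b0 = a0 / norm_a, b0 / norm_b
    # Remove orientation: best rotation of b onto a via SVD.
    u, _, vt = np.linalg.svd(b0.T @ a0)
    r = u @ vt
    if np.linalg.det(r) < 0:           # exclude mirror images
        u[:, -1] *= -1
        r = u @ vt
    # Square root of the sum of squared differences in landmark positions.
    return float(np.sqrt(np.sum((a0 - b0 @ r) ** 2)))


# Toy 2D landmarks: a square, and a rectangle of different proportions.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
rectangle = np.array([[0, 0], [2, 0], [2, 1], [0, 1]], dtype=float)
print(procrustes_distance(square, rectangle))
```

The same function accepts 3D landmarks (k x 3 arrays), since the centring, scaling and rotation steps do not depend on the number of dimensions.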

Z-scores
The mean and standard deviation for PD for pairwise comparisons of extant hominoids (0.089±0.021) are analogous to the mean and standard deviation for log sem values for extant hominoids (-1.61±0.1). Both statistics can be expressed as probabilistic Z-scores, where a Z-value of 0 corresponds to the mean PD of 0.089 and also to the mean log sem value of -1.61. One standard deviation above or below the mean corresponds to a Z-score of 1 or -1, respectively. Likewise, two standard deviations above or below the mean correspond to Z-score values of 2 and -2, respectively.
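The standardisation can be sketched in a few lines. The reference means and standard deviations are those quoted above for extant hominoids; the class and constant names are hypothetical, and the sketch assumes the standard definition Z = (value - mean) / standard deviation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Mean and standard deviation of a statistic in extant hominoids."""
    mean: float
    sd: float

    def z(self, value: float) -> float:
        # Standard Z-score: distance from the reference mean in SD units.
        return (value - self.mean) / self.sd


# Reference values quoted in the text for conspecific extant hominoids.
PD_REF = Reference(mean=0.089, sd=0.021)
LOG_SEM_REF = Reference(mean=-1.61, sd=0.1)

# A PD or log sem value equal to the reference mean has Z = 0;
# a value one standard deviation above the mean has Z = 1.
print(PD_REF.z(0.089), PD_REF.z(0.110))
print(LOG_SEM_REF.z(-1.61), LOG_SEM_REF.z(-1.51))
```

Because a larger PD and a larger (less negative) log sem both indicate greater dissimilarity in shape, positive Z-scores point in the same direction on either scale, which is what allows the two statistics to be compared.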

Results
Results from this study are presented in Table 1. Figure 2 is a visualisation of the results without a time scale.
The AL 400-1 mandible of A. afarensis has a Z-value of 0.71 when it is compared with OH 7 using the PD statistical method. A Z-value of 1.0 is obtained when AL 400-1 is compared to OH 7 using the log sem statistical method. The difference of only 0.29 is relatively small, reflecting consistency.
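As a worked check of this comparison, the Z-scores can be converted back to the scale of each statistic using the extant hominoid reference values given in the Z-scores section (PD 0.089±0.021; log sem -1.61±0.1). The back-calculated values below are implied by the rounded Z-scores and are only approximate; they are not values reported by the authors, and the function name is hypothetical.

```python
def value_from_z(z, mean, sd):
    """Invert Z = (value - mean) / sd to recover the underlying statistic."""
    return mean + z * sd


# AL 400-1 compared with OH 7, using the Z-scores quoted in the text.
z_pd, z_log_sem = 0.71, 1.0

pd_implied = value_from_z(z_pd, mean=0.089, sd=0.021)        # approx. 0.104
log_sem_implied = value_from_z(z_log_sem, mean=-1.61, sd=0.1)  # approx. -1.51
z_gap = abs(z_log_sem - z_pd)                                 # 0.29 SD units

print(round(pd_implied, 3), round(log_sem_implied, 2), round(z_gap, 2))
```

Both methods place AL 400-1 within about one standard deviation of the conspecific mean, which is the sense in which the two Z-scores are described as consistent.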
As indicated in Table 1 and Figure 2, the Z-score for the Sts 52b jaw (compared to the OH 7 mandible) is similar to that which has been obtained for the Sts 5 skull (compared to the OH 24 cranium), and similar also to the Z-score for the Sts 71 skull (also compared to OH 24).

Discussion and conclusions
We integrated two statistics (PD and log sem) by expressing them as standardised Z-scores. As the important type specimen of H. habilis, the OH 7 mandible has been used as a frame of reference for quantifying PD between it and other hominin mandibles. The contemporary OH 24 skull has been used for purposes of calculating log sem statistics in the context of pairwise comparisons between it and other hominin skulls (notably Sts 5 and Sts 71) from South Africa. With OH 7 and OH 24 representing the same species, measurements of mandibles and skulls have been used to obtain PD and log sem statistics, respectively, expressed on a common Z-score scale relative to values for conspecific extant hominoids.
There is a wide spectrum of variation in Z-scores for Tanzanian and Kenyan specimens attributed to early Homo dated within the period between circa 1.8 Ma and 2 Ma. We recognise that our study is based on data derived from only a few specimens within the hypodigms of certain taxa (especially with respect to the two Australopithecus species). In addition, we include only data from the cranium and mandible, and, more narrowly, the mandibular data reflect only the shape of the dental arcade in the context of a constellation of landmarks. However, we have demonstrated a method whereby Z-scores allow us to integrate data from mandibles such as OH 7, Sts 52b and AL 288-1 ('Lucy') and skulls such as OH 24, Sts 71 and Sts 5 ('Mrs Ples'), in a probabilistic context.
Ideally, probabilistic approaches (as in the use of Z-scores) can be used to support one potential taxonomic attribution over another, as examples of sigma taxonomy, 39 defined as 'the classification of taxa in terms of probabilities of conspecificity, without assuming distinct boundaries between species', whereas alpha taxonomy generally does assume clear limits. 39,40 The probabilistic method of the kind presented in this study can supplement alpha taxonomy by providing an objective assessment of morphological variability, applicable in palaeontological contexts. One of the limitations relates to the fact that fossils are often fragmentary such that the number of measurable dimensions (k) is relatively small. Ideally log sem and PD values should be calculated from complete and undistorted specimens. Despite these limitations, we recommend that our approach using Z-scores should be explored further in the context of additional cases, to include assessment of the transition between Australopithecus and Homo.