Statistical classification of South African seasonal divisions on the basis of daily temperature data

FUNDING: University of the Free State; DSI-NRF Centre of Excellence for Palaeoscience

Across South Africa, a wide range of activities is influenced by differences in seasonality. In a South African context, there is little consensus on the timing of seasonal boundaries. Inconsistency exists through the use of ad-hoc approaches to define seasonal boundaries across South Africa. In this paper, we present one of the first uniform statistical classifications of South African seasonal divisions on the basis of daily temperature data. Daily maximum and minimum temperature data were obtained from 35 selected South African Weather Service meteorological stations that had sufficiently complete data sets and homogeneous time series, spanning the period 1980–2015. A Euclidean cluster analysis was performed using Ward’s D method. We found that the majority of the stations can be classified into four distinct seasons, with the remaining 12 stations’ data best classified into three seasons, using Tavg as the classifier. The statistically classified seasonal brackets include summer (October/November/December/January/February/March), early autumn (April) and late autumn (May), winter (June/July/August), and spring (September). Exploring the boundaries of seasons, the start of summer and end of winter months follow a southwest to northeast spatial pattern across the country. Summers start later and winters end later in the southwestern parts of the country, whereas in the northeast, summers start earlier and winters end earlier.


Introduction
Seasonal differences in climate, day length, and plant activity form the primary environmental control on a wide range of activities. These activities include economic and agricultural practices 1, which are affected by the length and timing of growing seasons 2,3, the related timing of sowing and harvest, and the necessity for irrigation and fertilisation. Climatic seasons also influence resource management and energy demand 4, tourism 5, social and economic planning 6, hydrology 1 and health 7. The capacity to accurately determine the start and end of seasons is thus of critical importance. However, in the South African context, there is little consensus as to distinct seasonal boundaries. This lack of consensus is unusual. For most parts of the northern hemisphere 8,9, including Italy 10, the USA 11, China 12, Poland 13 and Finland 14, seasonal boundaries are well established and clearly communicated.
South Africa is classified as a semi-arid country 15 which is situated in the mid-latitudes and subtropics 16,17. The South African climate is influenced by major synoptic systems, including the semi-permanent subtropical high-pressure systems 16 and the variability of the Inter-Tropical Convergence Zone (ITCZ) 18-20. The resultant continental anticyclones, ridging anticyclones, westerly waves, tropical easterly waves, and cut-off lows 16,21,22 produce a pronounced climatic seasonality across the country 20. However, there is little consensus on the timing of seasons in South Africa, and approaches in defining seasonal boundaries often vary on the basis of the application (Table 1). 18 The South African Weather Service (SAWS) even highlights that there is no official designation or definition of seasons. 23 The most basic classifications commonly used are astronomical, meteorological and phenological divisions. 23 Astronomical summer is defined as the period from the summer solstice to the autumnal equinox. By this classification, autumn is defined to conclude at the winter solstice, and winter spans the winter solstice to the spring equinox. 23,24 It is well known that the earth-sun geometry affects the seasons; however, there is no direct link between astronomical seasons and mean weather variations. 24,25 Meteorological classification refers to the subdivision of the year into four equal-length periods of 3 months each, most commonly used in the temperate latitudes. 23 In South Africa, the meteorological classification of temperature seasonality is the most widely used (Table 1). 2,26-29 Most agroclimatological studies use the conventional break of 3 months but may extend it to the farming season of the specific regions that are under investigation and use two distinct seasons, summer and winter (Table 1). 30
Climate modelling projections and studies analysing the influence of climatic factors often use six run-on seasons that coincide with synoptic circulations, with the latter coinciding with epidemiological seasons of heightened disease risk. 31 Some climatologists classify seasons on an ad-hoc basis that is appropriate to specific regions, with classifications such as hot season, cold season, post-rainy season and growing season, with no direct relationship with calendar months. These seasons are defined for convenience and to suit the data output rather than to drive sensible analytical processes. 25-27 Phenological studies can also be used to define annual seasonality, and to reveal shifts in the timing of phenological events 32, as phenological shifts are often directly related to changes in local air temperatures 33.
Classification of South African seasons based on rainfall patterns is similarly complicated due to the variety of rainfall regimes, and thus likewise no standard definition has been adopted 18, and discrepancies exist between the seasonal brackets of rainfall and temperature-related classifications 34. The differences are further complicated by the influence of distance from large water bodies, and the variation in heat from the Indian and Atlantic Oceans. 24
Here we present one of the first statistical classifications of South African seasonal divisions on the basis of daily temperature data for 35 weather stations spanning the country. We argue that this method represents a more standardised and objective approach to the classification of seasons, particularly in a region that spans the subtropics and mid-latitudes.

Study region
South Africa is located within the latitudes 22-35°S and longitudes 17-33°E, and is bordered by the Atlantic Ocean in the west and southwest and the Indian Ocean to the south and southeast. It shares political boundaries with Mozambique, Zimbabwe, Botswana, Namibia and Eswatini (Swaziland), and encloses Lesotho (Figure 1). 35 The climate of South Africa, in particular temperature, is governed by the complex interaction between the subtropical location, the altitude of the interior plateau, the position of the subcontinent with respect to the major atmospheric circulation features, and the oceans on all sides except the north. 19,20,36 The subcontinent lies within the subtropics, with rainfall dominated by convective storms in the north and mid-latitude cyclones to the south. 16,37 The influence of the tropical and temperate pressure regimes, and the intra-annual migration of the ITCZ, result in pronounced seasonal differences in rainfall and temperature patterns over South Africa. 19 The ITCZ shifts with the monthly and seasonal changes of the sun's maximum insolation and the location of dominant atmospheric high- and low-pressure systems. 19,37 The high-pressure systems sit over the southern tip of the subcontinent in summer, and over the interior during winter. These high-pressure systems are interrupted by mid-latitude cyclones. 38 The influences of the subtropical high-pressure belt, and the mid-latitude westerlies with associated fronts, vary significantly inter- and intra-annually over the subcontinent. 38 These interactions between tropical and temperate disturbances have significant consequences for the weather of the subcontinent. 16 The orography of South Africa influences the temperature distribution over the country such that the escarpment forms a climatic division between the high plateau and the low-lying coastal regions in the east and southeast (Figure 1). 19
The southern and eastern escarpments are the regions with the lowest temperatures, due to the decrease in temperature with altitude. 39,40 The oceans surrounding South Africa influence the temperatures experienced along the coastal areas. 39,40 The Indian Ocean, on the east, is warmed by the western boundary Agulhas Current, while the Atlantic, on the west coast, is cooled by the eastern boundary Benguela Current (Figure 1). 19,39 All these factors result in a broad east-west temperature gradient, with the Northern Cape experiencing the lowest rainfall and highest temperatures in the country. 39,40

Data and methodology
For this study, daily maximum and minimum temperature data were obtained from 35 selected SAWS meteorological stations (Figure 1; Table 2) that had a minimum of 30 years of data, sufficiently complete data sets and homogeneous time series, spanning the period 1980-2015. These stations were selected as they span the country, ranging from 22°S to 35°S and 15°E to 33°E with an intended 1° interval (Figure 1). Before performing any statistical techniques, exploratory data analysis was applied to investigate data homogeneity, given inevitable changes in aspects including observation sites, station relocation, observation practices/procedures and time. 29 However, in the context of the study, sudden increases or decreases in values over a prolonged period would not have significantly influenced the results. Visible outliers in the data series were checked by comparison with data from surrounding stations spanning the period of interest as well as reports of anomalous weather in the media.
It has been found that IDW interpolates station data accurately. 51 Additionally, annual mean graphs were produced for each of the temperature metrics.
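As a rough illustration of the IDW interpolation mentioned here, the sketch below estimates a value at an unsampled location as a distance-weighted mean of station values. It is not the ArcGIS implementation: the function name `idw_interpolate`, the power parameter of 2, the use of plain planar distance in degrees, and the station coordinates and values are all assumptions made for illustration.

```python
import math


def idw_interpolate(stations, target, power=2.0):
    """Inverse Distance Weighted estimate at `target`.

    `stations` is a list of (x, y, value) tuples; `target` is an (x, y) pair.
    Distances are plain planar distances (an assumption for this sketch).
    """
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        dist = math.hypot(x - tx, y - ty)
        if dist == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / dist ** power  # closer stations weigh more
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations: (longitude, latitude, start month of summer as a number)
stations = [(18.0, -34.0, 11.0), (28.0, -26.0, 10.0), (31.0, -29.0, 10.0)]
estimate = idw_interpolate(stations, (25.0, -29.0))
```

Because the weights are positive and normalised, the estimate always lies between the smallest and largest station values, which is one reason interpolated maps cannot show extremes that no station recorded.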

Cluster analysis
Results will mainly focus on Tavg, with reference to Tmax and Tmin only where statistically relevant. The cluster analysis reveals that the majority of the stations, 23 out of the 35, are most appropriately classified into four seasons (Table 3). The remaining 12 stations are best classified into three seasons. All the stations in the Limpopo, KwaZulu-Natal and North-West Provinces are clustered into four seasons, and those in the Eastern Cape are clustered into three seasons. All these stations have a statistically strong grouping (CPCC>0.7) and distinct cluster structures (ASW>0.5), except for Cedara (ASW=0.47) in KwaZulu-Natal, with a weaker cluster structure. The weaker cluster structures are also prominent in the Eastern Cape stations (ASW<0.5). The cluster analysis results revealed that Dohne (CPCC=0.7009) in the Eastern Cape has the lowest quality of grouping among the 35 stations analysed.
The three stations in the Free State, two of which (Bethlehem and Welkom) are classified into four seasons and one (Bloemfontein) into three seasons, have a good quality grouping (CPCC>0.7) and distinct cluster structure (ASW>0.5). A similar degree of confidence in cluster structures is found in both the Mpumalanga stations, Skukuza and Carolina, which are divided into three and four seasons, respectively. These two stations have a higher quality grouping (CPCC>0.8) than those in the Free State. A higher quality grouping is also prevalent in both the Gauteng stations, with Johannesburg Int (International) classified into four seasons, and Zuurbekom into three. However, a weaker cluster structure is calculated for Zuurbekom (ASW=0.45).
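The reading rule applied to these results (CPCC above 0.7 for a strong grouping, ASW above 0.5 for a distinct cluster structure) can be written down explicitly. The sketch below is only a restatement of those two cut-offs; the function names are hypothetical, and the example values are the ones quoted in the text.

```python
def grouping_quality(cpcc, threshold=0.7):
    """Label the grouping quality from the cophenetic correlation coefficient."""
    return "strong" if cpcc > threshold else "weak"


def cluster_structure(asw, threshold=0.5):
    """Label the cluster structure from the average silhouette width."""
    return "distinct" if asw > threshold else "weak"


# Values quoted in the text
dohne_tavg = grouping_quality(0.7009)   # Dohne, Tavg
dohne_tmax = grouping_quality(0.6714)   # Dohne, Tmax
cedara = cluster_structure(0.47)        # Cedara, Tavg
zuurbekom = cluster_structure(0.45)     # Zuurbekom, Tavg
```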

The data sets of the selected stations were subjected to quality control. As a first step, all dates and times were checked, and values were rounded to two decimal places to maintain consistency throughout. Missing weather station data were replaced with data from an adjacent station within a 10-km radius of the site, or, if that was not possible, with the 5-day running average. If data were not available for more than five consecutive days, that period was excluded from the analysis.
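The gap-filling rules above can be sketched in a few lines. This is one possible reading, not the authors' procedure: the function name `fill_gaps` is hypothetical, missing days are represented as `None`, the neighbouring station is assumed to have already been chosen within the 10-km radius, and the 5-day running average is assumed to be a centred window over the values that are available.

```python
def fill_gaps(series, neighbour=None, window=5, max_gap=5):
    """Fill missing daily values (None) following the quality-control rules.

    1. Use the neighbouring station's value for that day when it exists.
    2. Otherwise use the mean of available values in a centred `window`.
    3. Leave runs longer than `max_gap` days as None so they can be excluded.
    """
    # Step 1: substitute neighbouring-station values where they exist.
    base = list(series)
    if neighbour is not None:
        for k, value in enumerate(base):
            if value is None and neighbour[k] is not None:
                base[k] = neighbour[k]

    # Step 2: fill the remaining short gaps with a running average.
    filled = list(base)
    half = window // 2
    n = len(base)
    i = 0
    while i < n:
        if base[i] is not None:
            i += 1
            continue
        j = i
        while j < n and base[j] is None:
            j += 1  # j ends one past the run of missing days
        if j - i <= max_gap:
            for k in range(i, j):
                nearby = [v for v in base[max(0, k - half):k + half + 1]
                          if v is not None]
                if nearby:
                    filled[k] = round(sum(nearby) / len(nearby), 2)
        i = j  # runs longer than max_gap are left as None
    return filled


# Hypothetical daily maximum temperatures with one short and one long gap
daily = [20.0, None, 22.0, None, None, None, None, None, None, 25.0]
cleaned = fill_gaps(daily)
```

In this example the single missing day is filled with 21.0, while the six-day gap stays empty and would be dropped from the analysis.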

Cluster analysis
Cluster analysis was performed using Ward's D method, defined by the Euclidean distance between variables, utilising the cluster, vegan and rioja packages in R. 41-43 Euclidean cluster analysis was initially supervised at four seasonal divides and validated by using the dendogram package average silhouette width (ASW) calculation. 44 The ASW value measures the degree of confidence in between-group distances and strength of within-group homogeneity. 45 If not significant, two, three, five and six seasonal divides were used serially until the cluster was significant, using orders of magnitude put forward by Kaufman and Rousseeuw 46 as reference for measures. The ASW was calculated, together with the cophenetic correlation coefficient (CPCC) for interpretation, evaluation and validation of consistencies within the cluster and groupings. 42,47 The CPCC measures the correlation between the original pairwise distance matrix and the cophenetic distance matrix of the dendrogram. This allows for the verification of the quality of the grouping. 47-49 The closer the cophenetic correlation coefficient is to a value of one, the better the grouping quality. 49 The cluster analysis results for maximum (Tmax), minimum (Tmin) and average (Tavg) temperatures are given in Table 3. To investigate the spatial patterns, the cluster analysis outputs, and start and end dates of summer and winter, were spatially interpolated using the Inverse Distance Weighted (IDW) method using ArcGIS software. 50
For the majority of stations (23), classification using Tmax returns three seasons, with 9 stations classified into two seasons, and only 3 classified into four seasons (Table 3). The majority of these stations have a strong grouping (CPCC>0.7), except for Dohne (CPCC=0.6714), located in the Eastern Cape. However, the degree of confidence in the cluster structures is low due to the weak cluster structures (ASW<0.5) for most of the stations except for Johannesburg Int, Zuurbekom, Mara, Mahikeng, Springbok and Cape Town with an ASW>0.5. The cluster analysis for Tmin classifies the majority of the stations (21) into four seasons, with 13 stations classified into three seasons, and only 1 station (Springbok) classified into two seasons. Similar to Tmax, the grouping quality for the stations is good. However, the cluster structures for most of the stations are distinct with only 19 stations returning an ASW<0.5.
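The clustering and validation steps described in the methodology (Ward clustering on Euclidean distances, ASW for cluster structure, CPCC for grouping quality, and trying four divides first before two, three, five and six) can be sketched as follows. This is an illustrative Python analogue, not the authors' R workflow. Several things are assumptions: each month is treated as one observation described by its mean temperature in each year, the monthly values are invented, ASW>0.5 is used as the acceptance rule, and SciPy's `ward` linkage is used, which works on unsquared Euclidean distances and may not match the R variant applied in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def average_silhouette_width(dist, labels):
    """Mean silhouette width; singleton clusters contribute 0 by convention."""
    labels = np.asarray(labels)
    widths = []
    for i in range(len(labels)):
        same = labels == labels[i]
        if same.sum() == 1:
            widths.append(0.0)
            continue
        a = dist[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(dist[i, labels == other].mean()     # mean distance to nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        widths.append((b - a) / max(a, b))
    return float(np.mean(widths))


def classify_seasons(monthly, k_order=(4, 2, 3, 5, 6), asw_min=0.5):
    """Return (k, labels, asw, cpcc) for the first k whose ASW exceeds asw_min."""
    pairwise = pdist(monthly, metric="euclidean")
    tree = linkage(monthly, method="ward")
    cpcc, _ = cophenet(tree, pairwise)  # correlation of original vs cophenetic distances
    dist = squareform(pairwise)
    best = None
    for k in k_order:
        labels = fcluster(tree, t=k, criterion="maxclust")
        asw = average_silhouette_width(dist, labels)
        if asw > asw_min:
            return k, labels, asw, float(cpcc)
        if best is None or asw > best[2]:
            best = (k, labels, asw, float(cpcc))
    return best  # no k passed the threshold: fall back to the highest ASW


# Hypothetical monthly mean temperatures (rows: Jan..Dec, columns: three years)
monthly = np.array([
    [23.6, 23.4, 23.5], [23.5, 23.6, 23.4], [23.0, 23.1, 23.2],
    [18.0, 18.2, 17.9], [14.0, 14.1, 13.9], [10.4, 10.2, 10.3],
    [10.0, 10.1, 10.2], [10.3, 10.4, 10.2], [18.1, 17.9, 18.0],
    [23.1, 23.0, 23.2], [23.3, 23.4, 23.2], [23.5, 23.5, 23.6],
])
k, labels, asw, cpcc = classify_seasons(monthly)
```

With this invented data the four-cluster solution groups the warm months together, pairs April with September, isolates May, and groups June to August. Real station data are far less cleanly separated, which is why the ASW and CPCC values matter.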
Spatial analysis of the cluster analysis results (Figure 2) indicates that most parts of the country experience three seasons, with the greatest spatial variability visible in Tmax. Similarities in the classification of seasons are visible for Tmax, Tmin and Tavg, but more so for Tmin and Tavg. The western and central regions of the country, and parts of the Eastern Cape, have three distinct seasons, when classified using Tmax, Tmin and Tavg. Areas surrounding Springbok are similarly classified as having only two distinct seasons.

Seasonal timetable
Several variations of monthly classifications have been calculated (Tables 3 and 4), which will be referred to as 'seasonal brackets'.

Start and end dates of summer and winter
A distinct southwest to northeast spatial pattern is apparent for the start and end dates of summer and winter across all the temperature metrics (Figures 3a-f and 4a-f). Summer broadly commences earlier in the northeastern and interior parts of the country and later along the southwestern parts and the south coast. The earliest start of summer is visible in Tmax for the northern parts of the country, and in parts of KwaZulu-Natal for Tavg. The earliest end of summer is calculated using Tmin, with summer in parts of KwaZulu-Natal and Gauteng ending in February (Figure 3e). The greatest variability in the spatial patterns is recorded for Tmax, for which summer ends in April in the northern and southern regions, similar to Tavg for the southern region. However, there is a consensus amongst the temperature metrics that, for most parts of the country, summer ends in March, whereas for the western parts of the country, the season ends a month later in April.
For the majority of the country, winter starts in June, with the season starting earlier for a few interior regions. The greatest spatial variability in the timing of the start of winter is observed in Tmax (Figure 4a). For areas in the Western Cape and Northern Cape, winter is classified as starting in May, similar to the start of winter using Tmin. For parts of KwaZulu-Natal, winter is calculated to start as early as April. Regarding the end of winter, the greatest spatial variability is similarly observed for calculations using Tmax, for which winter ends latest in the southwest of the country and along the east coast. For the western parts of the Northern Cape, winter ends later using Tmax and Tmin. Similar to the end-of-summer maps, a distinct southwest to northeast spatial movement of end dates is visible for all the temperature metrics used (Figure 4d-f). For the southwestern parts of the country, winter ends later, whereas moving northeastwards to the interior, the winter months end earlier, except for some parts in Gauteng, Limpopo and the North-West.
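Start and end months such as those mapped here can be read off a station's monthly season labels once the clustering has assigned each month to a season. The sketch below shows one way to do that when a season wraps across the end of the calendar year. The function name `season_bounds` is hypothetical, the season is assumed to form a single unbroken run of months, and the example labels follow the country-wide seasonal brackets given in the abstract.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def season_bounds(labels, season):
    """Return (start_month, end_month) of `season` from 12 monthly labels.

    Assumes the season is one contiguous run of months, which may wrap
    from December into January.
    """
    n = len(labels)
    members = [i for i in range(n) if labels[i] == season]
    if not members:
        raise ValueError(f"no month is labelled {season!r}")
    if len(members) == n:
        return MONTHS[0], MONTHS[-1]  # the season covers the whole year
    # The start is the member month whose preceding month is another season.
    start = next(i for i in members if labels[(i - 1) % n] != season)
    # The end is the member month whose following month is another season.
    end = next(i for i in members if labels[(i + 1) % n] != season)
    return MONTHS[start], MONTHS[end]


# Monthly labels (January to December) following the brackets in the abstract
labels_by_month = ["summer", "summer", "summer", "early autumn", "late autumn",
                   "winter", "winter", "winter", "spring",
                   "summer", "summer", "summer"]
summer = season_bounds(labels_by_month, "summer")
winter = season_bounds(labels_by_month, "winter")
```

Applied per station, the resulting start and end months are the kind of values that would then be interpolated spatially.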

Discussion and conclusion
We present one of the first statistical classifications of seasons across South Africa using daily temperatures. Daily temperature data across the country were used as a distinctive marker to classify the seasons due to the detectability of temperature changes compared to rainfall across South Africa. Through statistical analysis and results captured in the seasonal timetable (Table 4), new seasonal brackets are put forward in accordance with the agreement of seasons and temperatures among stations used in this research.
Aggregated for the whole country, based on Tmax, Tmin and Tavg, our results show that the weather stations agree that the following seasonal brackets can be used: summer (October/November/December/January/February/March), early autumn (April) and late autumn (May), winter (June/July/August), and spring (September). These proposed seasonal brackets challenge our 'common knowledge' of four equal length seasons of 3 months each 2,26-29, and the ad-hoc approaches some researchers use in South Africa 25-27. Noticeable similarities occur between the two seasonal divisions of months used to define farming seasons 30 as well as monthly summer divisions related to the positions of South Africa related to disease-risk seasons. 7 However, the proposed longer duration of summer and shorter spring seasons may conflict with the agricultural practices used currently, in particular, the current observed length and timing of the growing season across the country. 1-3 Additionally, these proposed seasonal brackets may assist in the explanation of current delays and advances in seasonal phenological events 33, and challenges in the tourism sector where most outdoor attractions are dependent on the seasonal climate 5.
However, the high spatio-temporal variability in temperatures (e.g. annual mean temperatures; Figure 5) presents a complex picture of seasonality. This presents challenges in defining seasonal brackets for a given location or region, particularly where regional climate regimes change within a small geographic area 24, and due to the complexity of South Africa's climate 29. Discrepancies have been found among the different temperature metrics. However, the majority of the stations (23 out of the 35) are divided into four seasons, using Tavg as the classifier, with the remaining 12 stations clustered into three seasons. Interestingly, some stations within the same province (e.g. Johannesburg Int and Zuurbekom in Gauteng) have different seasonal groupings. With closer inspection, these differences may occur due to the location and elevation of the stations (Table 2). For example, it has been found that built-up areas such as Johannesburg may be warmer in late winter than rural areas due to the urban heat island 52 and higher elevations tend to be cooler than lower elevations 53. Taking the above-mentioned into consideration, the importance of selecting the relevant temperature metric, e.g. Tmax, Tmin and Tavg, is highlighted for analysis purposes, as this selection can return different results as portrayed in the results.
In general, the findings of the start and end dates of summer and winter (Figures 3 and 4) coincide with the pressure regimes, as well as the intra-annual migration of the ITCZ. 19,37 The results indicate that summer starts later (ending earlier) and winter starts earlier (ending later) in the southwestern and southern regions of the country. These results coincide with the movement of the cold front of the mid-latitude cyclones during the winter months. 38 During summer, the southward movement of the ITCZ and the position of the subtropical high-pressure system are associated with warmer conditions, which may result in the patterns found. Summers start earlier, and winters start and end later in the northeastern parts of the country. These patterns are found independently from the notable link between temperatures and weather systems. The patterns also show the annual progression of temperatures which follow a southwest to northeast spatial pattern across the country.
The key limitation of this study is the nature of the temperature data sets. The data sets are not perfect and inherent errors may be present for a number of reasons. 29 Furthermore, inhomogeneity is not likely to play a significant role in this study as the consistency was ensured by using only SAWS data sets. 54 Mean daily temperature data were quantified using Tmax and Tmin; this is a limitation as hourly temperature readings may provide more accurate values of mean daily temperatures. 54 Furthermore, we acknowledge that station measurements are unable to display complete areal coverage as these are location-specific 54,55, which is particularly an issue for the interpolated maps presented throughout. A limited number of stations that have long-term temperature records was selected using a broad grid approach, as discussed, to get a relatively good spatial representation of the country. To overcome this limitation, future research may benefit from the inclusion of temperature data from additional weather stations from other organisations, such as the South African Agricultural Research Council. Such addition would, however, require greater efforts at data homogenisation and quality checking, which introduce a further set of limitations.
Finally, this research provides an insight into the complexity of seasonality across South Africa, as well as direction for climate-relevant research with temperature data as the primary input. Possibly the most significant contribution of this research is the newly proposed seasonal brackets using temperature metrics. The knowledge presented here is crucial for agricultural practices, resource management, tourism and other temperature-dependent activities, especially to develop adaptive strategies in monitoring seasonal changes in temperatures under climate change.