Deep learning for photovoltaic defect detection using variational autoencoders

A challenge often faced in computer vision tasks is the lack of sufficient data to train these models effectively. We propose the use of variational autoencoders (VAEs) as a method to artificially expand the data set in order to improve the classification task in this context. Three convolutional neural network (CNN) architectures (InceptionV3, ResNet50 and Xception) were used to classify the images. Our results provide evidence that CNN models can effectively detect and classify PV faults from thermal images and that VAEs provide a viable option in this application to improve model accuracy when training data are limited.


Introduction
The growing realisation that fossil fuels are not a long-term solution to global energy demand has led to the exploration of alternative, environmentally sustainable energy resources. 1 In recent years, solar power has emerged as a leading renewable energy technology and is being rapidly adopted globally. 2 South Africa is well placed to benefit from this drive towards renewable energy, particularly from photovoltaic (PV) systems, as the average annual 24-h global solar radiation for South Africa is 220 W/m², which is higher than the 150 W/m² observed for parts of the United States and the 100 W/m² for Europe and the United Kingdom. 3 The maintenance of such PV systems is often labour intensive and costly, particularly when faults in the system go undetected. Such faults can result in major energy loss, system shutdowns, financial loss, and safety breaches. It is thus crucial to detect and identify such faults to improve the efficiency, reliability, and safety of these systems. 4 Dunderdale et al. 5 investigated the use of convolutional neural network (CNN) architectures on infrared (IR) thermal images to detect and classify module-level faults within PV systems in South Africa. Their results showed that this approach can provide a quick and effective solution to this problem. The challenge with many of these applications, however, is that CNN models typically require a considerably large data set for effective training. In many smaller applications, and at the start of such an initiative, these data are not readily available. In the present study, we propose the use of variational autoencoders (VAEs) to create synthetic training images based on a small sample of collected data. The CNN testing data were sampled prior to VAE training to ensure complete separation of the testing and training data. 
This approach can artificially increase the size of an image data set, thus overcoming this barrier to entry and opening up computer vision applications to smaller PV operations. We used the InceptionV3 6 , ResNet50 7 and Xception 8 CNN models to determine the effectiveness of this data augmentation approach and to extend the original study of Dunderdale et al. 5

Photovoltaic systems and fault detection
PV modules absorb energy from sunlight and convert this energy into electricity through a process called the 'photovoltaic effect'. 9 These PV systems are typically composed of one or more modules, an inverter and other mechanical and electrical hardware, all of which are susceptible to faults. PV faults can lead to a prolonged reduction in power output or the complete failure of a cell, module or system. 10 The detection of faults is therefore critical to the optimal functioning of a PV system. However, in large-scale PV plants, the inspection of solar modules is typically a manual and time-consuming process. Recent studies have therefore used various techniques to improve this process.
Faults can be classified as originating on either the direct current (DC) side or the alternating current (AC) side of the PV system. 11 Garoudja et al. 4 proposed a model-based fault-detection approach for the early detection of faults on the DC side of PV systems and the identification of whether shading was present. This approach made use of extracted original design manufacturer model parameters and their associated residuals. Although this study provided useful results, noisy and correlated data degraded the fault detection quality.
Fault detection using electroluminescence and thermal imagery has gained interest over the last few years owing to the relative ease with which data can be collected. Electroluminescence imaging consists of applying a direct current to a PV module and measuring the resulting photoemissions using an IR-sensitive or charge-coupled device camera. 12 This type of imaging is normally done in a dark room. Fioresi et al. 13 used electroluminescence imagery to identify cracks and contact faults in PV cells, with promising results. However, this was done manually and proved time-consuming. Dos Reis Benatto et al. 14 proposed the use of daylight electroluminescence images captured by a drone to detect faults in PV systems. This daylight-based system was able to capture electroluminescence images during high solar irradiance, but produced lower-quality images than indoor and stationary systems. More recent studies, such as those of Demirci et al. 15 and Tang et al. 16 , have investigated the use of CNNs to automate this detection process using electroluminescence images. These studies obtained maximum accuracies of 76% and 83%, respectively, but identified small data set sizes and long training times as limitations.
Unlike electroluminescence images, IR thermography images are created from the IR radiation emitted by the object: a thermal camera detects the temperature at the surface of an object and converts this temperature into colour-assigned electrical signals according to the intensity reading. 17 IR thermography has been applied as an effective tool for detecting faults in PV modules, and the recent development of unmanned aerial vehicles (UAVs) has made such fault detection more cost-effective for large-scale PV plants. 18 A UAV (e.g. a drone) equipped with a thermal camera can be flown over a PV system to capture images for analysis. Faults are identified as localised areas of higher heat, or 'hot spots', on the PV modules. These hot spots occur because the faults impede the flow of electricity, and the resulting excess energy is dissipated as heat in these areas. Hot spots are evident in IR imaging as areas of discolouration, typically with darker colours indicating hotter regions. Using IR thermography in this manner has the potential for widespread adoption because the fault detection process can be automated through statistical modelling. 19 Ancuta et al. 20 investigated the use of IR thermography as well as solar module measurements, such as module surface temperature, for PV fault analysis, and showed that PV faults become evident as hot spots in IR images, with different fault types exhibiting different hot spot patterns. The identification of fault types in that study was done manually, without the aid of computer vision and classification techniques. Tsanakas et al. 21 implemented wide-area orthophoto IR thermography to detect and classify faults in large-scale PV plants. In addition to IR thermography, electrical performance characterisation using current-voltage (I-V) characteristic curves, as well as electroluminescence images, was used to validate the results successfully. 
The preliminary results showed that all detected faults were diagnosed, classified, and quantified in terms of fault type and electrical power loss per module. Jaffrey et al. 22 developed a PV fault analysis algorithm for thermal images of PV modules based on a six-class fuzzy logic categorical framework, which successfully classified faults.
In a pioneering study in South Africa, Dunderdale et al. 5 used thermal images for the detection and classification of faults in PV systems. In the first (detection) phase of the study, each panel was classified as either faulty or non-faulty. Faulty panels were then further classified according to the type of fault they exhibited. The study made use of feature-based approaches with support vector machine and random forest classifiers, as well as CNNs, for detection and classification. The CNN approaches performed better for fault classification, obtaining an 89.5% average cross-validated accuracy compared with a maximum accuracy of 82.9% for the feature-based approach.

Photovoltaic fault types
The study of Dunderdale et al. 5 proposed the classification of PV module faults as block faults, patchwork faults, single-cell faults, soiling faults, and string faults. These fault types were defined according to the shape of the hot spots present on thermal images. Depictions of these faults, as they would appear on thermal images, are provided in Figure 1.
A block fault is identified as a vertical band exhibiting temperatures significantly higher than those of the rest of the module. A single-cell fault can be identified as a small rectangular shape exhibiting higher temperatures than the rest of the module. Patchwork and string faults are considered extensions of single-cell faults, where multiple rectangular shapes exhibit temperatures higher than the rest of the module. Patchwork faults occur as single-cell faults in a random pattern across a module, while string faults consist of single-cell faults occurring in a straight vertical line on a module. Lastly, soiling faults are typically difficult to identify, mostly because they can differ in size, intensity, and shape.

Computer vision and image classification
Deep learning, a specialised form of machine learning, is fundamental to modern computer vision tasks. Computer vision refers to a computer's ability to perceive and understand three-dimensional (3D) shapes and objects from two-dimensional (2D) imagery using mathematical techniques and algorithms. 23 In many applications, the purpose of the computer vision task is image classification, that is, identifying and classifying images according to their attributes or contents. In such applications, supervised CNNs are the most widely used technique. 24 Basic neural networks are designed to mimic the workings of biological neurons, which receive an input (or stimulus), process it, and respond accordingly. In their simplest form, these artificial neural networks (ANNs) consist of exactly this: an input layer, one or more hidden layers, and an output layer. Deep neural networks can be thought of as 'stacked' ANNs, or ANNs with numerous hidden layers, with deep learning referring to the process of training or building these networks. Recent studies report that deep learning models achieve state-of-the-art accuracy in many application areas, such as object recognition (computer vision) 25 and natural language processing 26 .
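The layered structure just described can be sketched in a few lines of plain Python. This is a minimal illustration only: the weights are hypothetical hand-picked numbers rather than trained values (in practice they are learned with backpropagation), and ReLU hidden activations with a softmax output are common choices assumed here, not details taken from this study.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: y_i = sum_j weights[i][j] * x[j] + bias[i]."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise rectified linear activation."""
    return [max(0.0, a) for a in v]


def softmax(v):
    """Turn raw output scores into class probabilities that sum to 1."""
    m = max(v)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """Pass x through the hidden layers (ReLU) and the output layer (softmax).

    `layers` is a list of (weights, bias) pairs; the last pair is the output
    layer. Stacking more hidden layers is what makes the network 'deep'.
    """
    *hidden, (out_w, out_b) = layers
    h = x
    for w, b in hidden:
        h = relu(dense(h, w, b))
    return softmax(dense(h, out_w, out_b))


# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 output classes.
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]),
]
probs = forward([0.5, -1.0], layers)
```

With these weights the hidden layer outputs [0.5, 0.0, 0.0] after ReLU, so the first class receives the larger probability (about 0.62).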
In image classification tasks, the input layer of a neural network receives a digital image as a matrix of pixel values. The hidden layer(s) process the image in an effort to provide an accurate output: the label or classification of the image contents. CNNs are a class of artificial neural networks that use a convolution operation in at least one layer of the network, although in most cases this operation is used in multiple layers. 27 A convolution operation is a linear operation between matrices I and H, and can be defined as 28 :

$$I'(u,v) = \sum_{(i,j) \in R_H} I(u-i,\, v-j) \cdot H(i,j),$$

where I(u,v) indicates the element located at row u and column v of the matrix (or digital image) I, H(i,j) denotes the element at position (i,j) of the filter kernel matrix H, which specifies the weights assigned to each pixel in the convolution operation, and R_H is the set of positions covered by the kernel. The output matrix of the convolution operator is denoted I'.
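A direct, unoptimised transcription of the linear convolution I'(u,v) = Σ_{(i,j)∈R_H} I(u−i, v−j)·H(i,j) is sketched below in plain Python. It assumes an odd-sized kernel whose origin (0, 0) is its centre and computes only the positions where the kernel fits entirely inside the image. Deep learning libraries implement this far more efficiently and, strictly speaking, compute cross-correlation (no kernel flip); because CNN kernel weights are learned, that distinction makes no practical difference.

```python
def convolve2d(image, kernel):
    """Linear convolution I'(u, v) = sum over (i, j) in R_H of I(u - i, v - j) * H(i, j).

    The kernel H has odd height and width and its origin (0, 0) is its centre,
    so R_H = {-ri..ri} x {-rj..rj}. Only positions where the whole kernel fits
    inside the image are computed, so out[0][0] corresponds to (u, v) = (ri, rj).
    """
    kh, kw = len(kernel), len(kernel[0])
    if kh % 2 == 0 or kw % 2 == 0:
        raise ValueError("kernel height and width must be odd")
    ri, rj = kh // 2, kw // 2
    rows, cols = len(image), len(image[0])
    out = []
    for u in range(ri, rows - ri):
        out_row = []
        for v in range(rj, cols - rj):
            total = 0.0
            for i in range(-ri, ri + 1):
                for j in range(-rj, rj + 1):
                    # kernel[i + ri][j + rj] stores H(i, j) with its origin at the centre
                    total += image[u - i][v - j] * kernel[i + ri][j + rj]
            out_row.append(total)
        out.append(out_row)
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shift = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]  # only H(0, 1) = 1
result = convolve2d(image, shift)  # [[4.0]]
```

The kernel flip is visible in the example: convolution picks up I(1, 0) = 4, whereas cross-correlation would pick up I(1, 2) = 6.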
In supervised computer vision tasks, CNNs are trained to detect and classify images based on a given set of input data (training images). This is achieved using the backpropagation algorithm, which is ubiquitous in the field of neural networks. CNNs have proven highly successful in image classification, where the networks typically learn elementary shapes in the initial layers and more complex details in the deeper layers. 27 Typically, a large set of training images is required to ensure that the trained model can perform the classification task with sufficient accuracy on new and unseen data. In certain real-world applications, such as the one investigated in this study, there is a dearth of usable training images, which makes the development of accurate classifiers a challenge. To address this challenge, Dunderdale et al. 5 used several data augmentation approaches, including rotating, flipping, and inverting the training images, to increase the sample size. This provided a moderate improvement in classification accuracy. The drawback of this approach is that the number of meaningful and possible augmentations is limited. Additionally, certain augmentations (rotation, in this case) produce misleading and unlikely scenarios. For example, a block fault indicates a vertical hot spot running the length of the panel. Under 90° and 270° rotation, the hot spot would run horizontally, which is at odds with the class definition. A possible remedy for the data scarcity problem is to generate random synthetic images based on the identified fault classes. This can be achieved using deep generative modelling approaches such as generative adversarial networks and VAEs. 29 VAEs were used in this study because generative adversarial networks, although known to produce higher-quality images, are difficult to train. 
VAEs are more stable during training and generate satisfactory images for the current application.
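The limitation of rotation and flip augmentation can be made concrete by enumerating every such transformation of a toy image. The 4 × 4 'block fault' below (one hot vertical column) is a hypothetical example. Any square image has at most eight distinct rotated or flipped variants. For this block fault, half of the distinct variants are horizontal bands, which is exactly the implausible case described above.

```python
def rotate90(img):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotations_and_flips(img):
    """Return every distinct image reachable by 90-degree rotations and flips.

    Rotations and mirror flips of a square image form a group of 8
    transformations, so at most 8 distinct variants exist.
    """
    variants = []
    current = img
    for _ in range(4):
        for candidate in (current, flip_horizontal(current)):
            if candidate not in variants:
                variants.append(candidate)
        current = rotate90(current)
    return variants


def has_horizontal_band(img):
    """True if some row is entirely 'hot' (value 1)."""
    return any(all(p == 1 for p in row) for row in img)


# Hypothetical 4x4 'block fault': one hot vertical column (1 = hot, 0 = normal).
block = [[0, 1, 0, 0] for _ in range(4)]
variants = rotations_and_flips(block)
```

Because the toy fault is symmetric, only four distinct variants exist at all, and two of them are horizontal bands that no longer match the block fault definition.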

Variational autoencoders
VAEs are generative models that have the ability to synthesise numerous complex data points in a potentially high-dimensional space 30 (e.g. digital images) using a given set of training data. Using a generative approach, VAEs can create non-identical images which are similar to the images on which they are trained. As a result, small data sets can be artificially inflated to include any number of synthetic or simulated observations (or images). As a large data set is a common requirement for CNN training in computer vision tasks, VAEs can be used to synthesise training images when inadequate training observations are available.
VAEs are a special case of a traditional autoencoder, which is made up of two connected and jointly trained neural networks: an encoder and a decoder. The encoder reduces or constricts the representation of the input data to a given set of dimensions or units, and the decoder attempts to re-create the original input from this reduced representation in the latent space. This is illustrated in Figure 2, where an image I is passed through the encoder and represented as a reduced vector in the latent space. The decoder then takes this encoded vector and attempts to reconstruct the image (denoted I') from the reduced representation. The encoder and decoder are trained to minimise a loss function L(I,I'), known as the reconstruction loss, which measures the difference between the original data (image) and the reconstructed data (image), ensuring that the input and output images are as similar as possible.
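A toy illustration of the encoder, latent space, decoder pipeline and the reconstruction loss L(I,I') follows. The encoder and decoder are hypothetical fixed functions acting on a 4-pixel 'image'; a real autoencoder learns both networks by minimising the loss. Mean squared error is used for L here as a common choice, which is an assumption rather than a detail from this study.

```python
def encode(x):
    """Hypothetical fixed encoder: average adjacent pixel pairs (n -> n/2 latent units)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def decode(z):
    """Hypothetical fixed decoder: repeat each latent unit twice (n/2 -> n pixels)."""
    return [v for v in z for _ in range(2)]


def reconstruction_loss(x, x_hat):
    """Mean squared error between the input I and its reconstruction I'."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


image = [1.0, 1.0, 4.0, 2.0]
latent = encode(image)           # [1.0, 3.0]: a 2-unit bottleneck
reconstruction = decode(latent)  # [1.0, 1.0, 3.0, 3.0]
loss = reconstruction_loss(image, reconstruction)  # 0.5
```

The first pixel pair survives the bottleneck exactly, while the differing second pair is averaged away, and that lost detail is what the reconstruction loss measures.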
VAEs expand on this basic functioning by imposing a probabilistic structure on the latent variables and introducing a random sampling step in the latent space. For each input object I, the VAE determines a k × 1 vector of means (μ_I) and a k × 1 vector of standard deviations (σ_I), creating a single mean-standard deviation pair for each of the k variables in the latent space. Instead of sending the encoded latent values directly to the decoder, as in the traditional autoencoder, VAEs sample an individual value from N(μ_I,j, σ_I,j²) for each latent variable j, j = 1,…,k. Because this random sampling step is not differentiable, the network cannot be trained directly through backpropagation; this is overcome by constructing the latent variable realisations as

$$z = \mu_I + \sigma_I \odot \varepsilon,$$

where ε (k × 1) is a random observation vector from a multivariate standard normal distribution and ⊙ represents the elementwise vector product. 27 This is known as the 'reparameterisation trick'. 31 The loss function used for training the VAE consists of two competing terms: the reconstruction loss, L(I,I'), and the regularisation loss, R(I,I'). The regularisation loss uses the Kullback-Leibler divergence to measure the degree to which the distribution of the latent variables diverges from the multivariate standard normal distribution. 27 The loss function is thus represented as

$$\mathcal{L}_{\mathrm{VAE}}(I,I') = L(I,I') + \lambda\, R(I,I'),$$

where λ > 0 is the regularisation parameter. By using this loss function, the decoder learns to generate images (or outputs) that are similar, but not identical, to the data on which it is trained. 27 This allows for the creation of distinct yet similar images which can be used as training data for a computer vision model.
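The reparameterisation trick and the regularised loss L(I,I') + λR(I,I') can be sketched in plain Python as follows. Two assumptions apply: R is taken to be the closed-form Kullback-Leibler divergence between N(μ_I, diag(σ_I²)) and the standard normal, which is the usual VAE choice, and the encoder outputs are hypothetical numbers. Plain Python has no automatic differentiation, so this shows only the arithmetic. In TensorFlow/Keras the same expressions would be written with tensor operations so that gradients can flow through μ_I and σ_I. Many implementations also have the encoder output log-variances instead of σ for numerical stability.

```python
import math
import random


def reparameterise(mu, sigma, rng):
    """Reparameterisation trick: z = mu + sigma * eps (element-wise), eps ~ N(0, I).

    The randomness lives only in eps, so in an autodiff framework z is a
    differentiable function of mu and sigma.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ); requires every sigma > 0."""
    return 0.5 * sum(s * s + m * m - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))


def vae_loss(reconstruction_loss, mu, sigma, lam=1.0):
    """Total loss L(I, I') + lambda * R(I, I'), taking R as the KL term above."""
    return reconstruction_loss + lam * kl_to_standard_normal(mu, sigma)


# Hypothetical encoder outputs for one image with k = 2 latent variables.
mu, sigma = [0.5, -1.0], [0.8, 1.2]
z = reparameterise(mu, sigma, random.Random(0))
total = vae_loss(0.25, mu, sigma, lam=1.0)
```

The KL term is zero only when every latent variable already follows a standard normal (μ = 0, σ = 1), which is the sense in which R pulls the latent distribution towards N(0, I) while L pulls reconstructions towards the input.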

Original data
The thermal image data for this study were collected from three different PV plants in South Africa. Owing to the privacy agreement with the data supplier, the locations of these PV plants cannot be disclosed. All three sites make use of crystalline silicon PV modules. A total of 398 thermal images of defective PV modules were collected using a UAV equipped with a FLIR Tau 2 640 thermal imaging camera. The captured images were then cropped to show individual PV modules. 5 This resulted in a final data set of 376 thermal images indicating modules with hot spots or faults, as well as an additional 400 images of non-faulty modules. The data supporting this study's results can be obtained on request from the authors. Dunderdale et al. 5 used a four-class classification, namely block faults, patchwork faults, string faults and single-cell faults. In the current study, string and patchwork fault data were grouped into a single class of 'patchwork' faults, because the string fault class can be considered a special case of the patchwork fault class in which all affected cells occur in a straight vertical line rather than in a random or scattered pattern. 32 This is illustrated in Figure 3. Using this new classification, the 376 fault images are categorised into three distinct classes. The composition of the data set by fault class is given in Table 1; the three classes are easily distinguished by examining a thermal image.
Table 1: Composition of the data set by fault type 32

Fault class Image example Proportion
Block 34%

Synthetic images
The VAE approach outlined in the 'Variational autoencoders' section above was used to expand the data set. Prior to VAE training, 75 random images were sampled from the original data set and set aside for later use in CNN testing. Thereafter, separate VAEs were trained for the block, single-cell, and patchwork PV fault classes, and each VAE generated 900 additional images for its class. An example of the image generation for the single-cell fault class is provided in Figure 4.
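Once a VAE has been trained, new images are generated by drawing latent vectors from the standard normal prior and passing them through the decoder. The sketch below shows only this generation step, with one decoder per fault class to mirror the per-class VAEs. The linear decoders, the 3-pixel 'images' and the 2-dimensional latent space are hypothetical stand-ins; the source does not describe the VAE architecture used.

```python
import random


def make_linear_decoder(weights, bias):
    """Hypothetical stand-in for a trained decoder: image = W z + b (flattened)."""
    def decode(z):
        return [b + sum(w * zj for w, zj in zip(row, z)) for row, b in zip(weights, bias)]
    return decode


def generate_synthetic(decoder, latent_dim, n_images, rng):
    """Draw z ~ N(0, I) in the latent space and decode each draw into an image."""
    images = []
    for _ in range(n_images):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        images.append(decoder(z))
    return images


rng = random.Random(42)
# One (hypothetical) decoder per fault class, mirroring the per-class VAEs.
decoders = {
    "block": make_linear_decoder([[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]], [0.5, 0.5, 0.5]),
    "single-cell": make_linear_decoder([[0.2, 0.0], [0.0, 0.2], [0.0, 0.0]], [0.3, 0.3, 0.9]),
    "patchwork": make_linear_decoder([[0.0, 0.1], [0.1, 0.0], [0.1, 0.0]], [0.7, 0.2, 0.7]),
}
synthetic = {cls: generate_synthetic(dec, 2, 900, rng) for cls, dec in decoders.items()}
```

Training a separate model per class means the class label of every synthetic image is known by construction, so no labelling of generated images is needed.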
For each class of VAE-generated images, manual data cleaning was also performed to remove any potentially 'noisy' synthesised images, that is, images showing random variations in colour or brightness that might influence the results of the analysis. When combined with the original data, the final data set consisted of 2881 images, as shown in Table 2. This data set included the 75 randomly sampled images that were reserved for testing the three CNN architectures after training.

Data summary and validation
Testing and validation are essential steps in any statistical or machine learning implementation. For this study, approximately 20% of the original images were randomly set aside to create the testing data sets.
The testing data sets were sampled from the original data both prior to model training and prior to the generation of synthetic images using VAEs, which ensured that the testing data were completely unseen in both assessments. The accuracy of each trained classifier (one trained on the original data and one on the VAE-augmented data) was determined on its respective testing data set. The composition of the data sets used for the classification analysis is provided in Table 2.
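The key safeguard is ordering: the test images are split off first, and only the remaining training images are passed to the VAEs. A minimal sketch of such a split is given below. Holding out the same fraction of every class (stratification) is an assumption made for illustration; the source states only that the test images were sampled at random. The class counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction, rng):
    """Split indices into (train, test), holding out test_fraction of each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]  # copy so the grouping is not shuffled in place
        rng.shuffle(idxs)
        n_test = round(test_fraction * len(idxs))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels; only the training indices would be passed to the VAEs.
labels = ["block"] * 50 + ["single-cell"] * 30 + ["patchwork"] * 20
train_idx, test_idx = stratified_holdout(labels, 0.2, random.Random(1))
```

Because the VAEs never see the held-out indices, no synthetic image can be a near-copy of a test image, which is what keeps the reported test accuracies honest.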

Classification methods: Convolutional neural networks
In this study, CNNs form the deep learning approach used to make decisions regarding the detection and classification of faults in PV modules. CNNs eliminate the need for manual feature extraction, as they extract features directly from the raw image data. 33 The CNNs are first trained on the training data with the corresponding classification labels, which allows them to find and extract relevant features automatically.
Python 3.6.5 (64-bit) was used to implement the three CNN models/architectures: InceptionV3, ResNet50 and Xception. Python was chosen for its vast number of available packages and its easy-to-access online support community. We also made use of the TensorFlow package 34 and the Keras 35 interface, as they allow pre-trained CNN architectures to be downloaded, implemented, and adjusted in Python.
The Inception CNN architecture was initially introduced by Szegedy et al. 36 in 2014. The InceptionV3 architecture (the third version of Inception) was later released by Google and introduced to the Keras core in 2015.
The InceptionV3 architecture allowed for higher computational efficiency with fewer parameters. The ResNet (or ResNet50) CNN architecture was introduced by Microsoft in 2015. It was designed to allow a large number of convolutional layers while maintaining strong performance, as the effectiveness of earlier CNN architectures dropped off as more layers were added. The Xception CNN architecture was proposed in 2016 by François Chollet, the creator and chief maintainer of the Keras library. This architecture is an extension of the Inception architecture that replaces the standard Inception modules with depth-wise separable convolutional layers. 37 These three architectures were chosen because the ResNet50 and InceptionV3 architectures placed first and second, respectively, in the 2015 ImageNet Large Scale Visual Recognition Challenge, and the Xception architecture is an extension of one of these high-performing architectures.
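A depth-wise separable convolution factors a standard convolution into a per-channel spatial filter followed by a 1 × 1 convolution that mixes channels. The weight counts for the two forms can be compared with simple arithmetic. The layer sizes below (3 × 3 kernel, 64 input and 128 output channels) are arbitrary illustrative choices, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in channels to c_out channels (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Depth-wise k x k filter per input channel, then a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in   # one k x k filter for each input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


standard = standard_conv_params(3, 64, 128)    # 73728
separable = separable_conv_params(3, 64, 128)  # 576 + 8192 = 8768
ratio = standard / separable                   # roughly 8.4
```

For this illustrative layer the separable form needs roughly 8.4 times fewer weights, which is the kind of saving that lets Xception use its parameters more efficiently.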
All architectures were trained and optimised on the training data and their performance, or classification accuracy, was determined on the testing data which were removed prior to training.

Photovoltaic fault detection
Before PV faults can be classified, they first need to be detected. Each of the three CNN models was trained on the training data and evaluated on the held-out testing data; greyscale images were used for this task.
Each of the proposed CNN models obtained 100% testing accuracy, indicating perfect out-of-sample performance. These results agree with those of Dunderdale et al. 5 , who also obtained maximum testing accuracies of 100% for fault detection using two CNN architectures, namely MobileNet and VGG-16. Table 3 provides the confusion matrix for PV fault detection for each of the architectures. In classification models, such an outcome may raise the concern that the models are overfitting the data. However, in this case, the accuracy was determined on unseen test data, and the classification task was a straightforward one, as faulty and non-faulty modules were typically simple to differentiate.
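A confusion matrix such as the one in Table 3 tabulates true classes against predicted classes, and overall accuracy is the share of test images on its diagonal. The sketch below computes both; the five labels and predictions are hypothetical and are not results from this study.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def overall_accuracy(m):
    """Fraction of test images on the diagonal (correctly classified)."""
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / sum(map(sum, m))


# Hypothetical predictions for five test images (not results from the study).
classes = ["faulty", "non-faulty"]
y_true = ["faulty", "faulty", "non-faulty", "non-faulty", "faulty"]
y_pred = ["faulty", "faulty", "non-faulty", "faulty", "faulty"]
cm = confusion_matrix(y_true, y_pred, classes)  # [[3, 0], [1, 1]]
acc = overall_accuracy(cm)                      # 0.8
```

A 100% testing accuracy, as reported for detection, corresponds to a confusion matrix whose off-diagonal entries are all zero.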

Photovoltaic fault classification
The results of the classification analysis described in 'Data and methodology' for the VAE-augmented data set are given in Table 4 for each of the architectures. The values in Table 4 are the row percentages of the confusion matrix; that is, the value of 3.8% in the first row and second column indicates that the InceptionV3 model incorrectly classified 3.8% of the block faults as patchwork faults. The overall classification accuracy was 92% for the InceptionV3 and Xception models and 89.3% for the ResNet50 model, indicating that all the models performed well in this application. To identify the preferred model, both overall accuracy and fault-wise accuracy were considered, so that the selected model would perform well on all three fault types. Of the best-performing models (i.e. InceptionV3 and Xception), Table 4 indicates that the Xception model performed best for the classification of block and patchwork faults, while the InceptionV3 model performed best for single-cell faults. This indicates that the Xception model may perform better for multiple-cell faults, which make up a sizeable proportion of faults in practice and are also of interest to operators because this type of fault can significantly reduce module power. 38 These results suggest that the Xception model may be preferred in practical applications. For comparative purposes, and to determine whether the use of VAE-generated synthetic images in training improved the classification accuracy, the results of the same models trained on the original data set are provided in Table 5. The overall accuracy was 90.4% for the Xception model, and 89.0% and 87.7% for the ResNet50 and InceptionV3 models, respectively. As with the previous results, the Xception model performed best for the multiple-cell faults, and all models performed relatively poorly in classifying the patchwork fault class. 
These findings are in agreement with the results of Dunderdale et al. 5 Again, the Xception model may be considered the best performer, as it has the highest accuracy and is the most consistent. Furthermore, the similar traits observed for the models trained on the original data and those trained on the VAE-augmented data support the use of VAEs for data inflation: because similar characteristics are observed for both approaches, there is evidence that the VAE process generates relevant and useful images that are in line with those in the original data set. Table 6 compares the testing accuracies of the three CNN models for the VAE-augmented and original data sets.
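Tables 4 and 5 report row percentages of the confusion matrix, so each row sums to 100% and the diagonal gives the share of each true fault class that was classified correctly; the 3.8% example in the text is an off-diagonal entry. The conversion is sketched below. The class order (block, single-cell, patchwork) and the counts are hypothetical, not the study's data.

```python
def row_percentages(m):
    """Convert a count confusion matrix to row percentages (each row sums to 100)."""
    result = []
    for row in m:
        total = sum(row)
        result.append([100.0 * c / total for c in row])
    return result


def per_class_accuracy(m):
    """Diagonal of the row-percentage matrix: % of each true class classified correctly."""
    pct = row_percentages(m)
    return [pct[i][i] for i in range(len(m))]


# Hypothetical counts for (block, single-cell, patchwork); not the study's data.
counts = [[18, 2, 0], [1, 8, 1], [0, 3, 17]]
pct = row_percentages(counts)         # [[90.0, 10.0, 0.0], [10.0, 80.0, 10.0], [0.0, 15.0, 85.0]]
fault_wise = per_class_accuracy(counts)  # [90.0, 80.0, 85.0]
```

Reading the diagonal in this way is what the fault-wise comparison of InceptionV3 and Xception relies on, since two models with the same overall accuracy can differ class by class.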
The results in Table 6 provide evidence that artificially inflating the data set with synthetic images generated by VAEs improves the classification accuracy of the fitted models. This indicates that, in applications where only small data sets are available, using VAEs to generate artificial training data based on the original data can lead to improved classification accuracies. Because small data sets are a common problem in many applications, the results suggest that VAEs provide a viable method for data inflation which can lead to improved discrimination in classification models. The use of the VAE-augmented data set increased accuracy by between 0.3 and 4.3 percentage points. The size of the gain appears to depend on the architecture on which the model is based, as the structurally similar InceptionV3 and Xception models both improved considerably more than the ResNet50 model. For the models trained on the original data set, the results were similar to those of Dunderdale et al. 5 , who achieved a maximum accuracy of 89.5% for a four-category problem. The improved accuracy observed for the Xception model could simply be a result of the present study being reduced to a three-category problem. However, the improvement in accuracy resulting from the artificial inflation of data through VAEs is a notable extension of the work of Dunderdale et al. 5
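As a quick arithmetic check, the gains can be recomputed from the overall accuracies reported above (Tables 4 and 5). They are absolute differences in percentage points, not relative improvements.

```python
# Overall test accuracies (%) as reported in this section.
original = {"InceptionV3": 87.7, "ResNet50": 89.0, "Xception": 90.4}
vae_augmented = {"InceptionV3": 92.0, "ResNet50": 89.3, "Xception": 92.0}

# Absolute gains in percentage points, rounded to one decimal place.
gains = {m: round(vae_augmented[m] - original[m], 1) for m in original}
# {'InceptionV3': 4.3, 'ResNet50': 0.3, 'Xception': 1.6}
```

The smallest gain (ResNet50, 0.3 points) and the largest (InceptionV3, 4.3 points) bound the range quoted in the text.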

Conclusion
The CNN models trained on the VAE-augmented data set detected PV faults with 100% testing accuracy for all three architectures. These results match those of Dunderdale et al. 5 , who also reported 100% testing accuracy for fault detection, and indicate that the proposed method is highly effective in distinguishing between faulty and non-faulty PV modules using thermal images.
For fault classification, the VAE-augmented approach achieved an overall testing classification accuracy for the InceptionV3 and Xception models of 92%, with the ResNet50 model achieving an accuracy of 89.3%. This indicates that all models performed well in the classification task. Further investigation of the fault-wise accuracy found that the Xception model performed better in identifying multiple-cell faults of PV modules and tended to have a consistently higher accuracy for each fault type. As such, the Xception model is recommended ahead of the InceptionV3 and ResNet50 models.
The comparative analysis performed in this study showed that the models trained on the VAE-augmented data consistently outperformed those trained on the original data set. This improvement was more evident for the InceptionV3 and Xception models than for the ResNet50 model, which suggests that the accuracy gains from VAE augmentation of a training data set may be model dependent.
Compared with the results of Dunderdale et al. 5 , the use of VAE-augmented training data improved model accuracies for fault classification. Although Dunderdale et al. 5 report results on a four-category problem, the combination of the string and patchwork faults is supported both by their similar appearance on thermal images and by similar groupings used in related studies. 39 The VAE approach used in this study proved successful in artificially increasing the size of a data set and is therefore recommended in applications where limited data are available for analysis. This finding suggests that the entry point for using computer vision methods in practice is lower than originally thought, as smaller data sets can be inflated with synthetic VAE-generated images to train effective and accurate classification models.