Harmonization of ground-measured solar data

In this document

This document describes how Solargis combines ground-measured solar data from multiple instruments into a single validated dataset, and explains the two harmonization methods used.

Overview

Ground-measured data is rarely gap-free. Instrument failures, maintenance events, soiling, and quality control exclusions all result in periods of missing or flagged data. For applications such as PV performance evaluation, these gaps can lead to incomplete or incorrect assessment if not addressed.

Harmonization is the process of merging quality-assessed data records from multiple instruments into a single continuous, validated dataset. It is performed after quality assessment (QA) has been completed - only data that has passed QA is eligible for harmonization. The result is a bankable, gap-free time series suitable for performance analysis and site adaptation.

Note: Harmonization requires deep expertise in the characteristics of the measured data, including instrument-specific biases and systematic errors. It should not be attempted without appropriate tools and qualified personnel. A simple spreadsheet application is not suitable for this work.

Harmonization methods

Solargis uses two harmonization methods: the averaging method and the segmentation method. The choice depends on the application and the desired properties of the resulting dataset.

Averaging method

The averaging method combines measurements from multiple instruments into a single value at each time step, with outlier filtering applied first.

The procedure is as follows:

  1. Remove outliers: for each time step, include only those instruments whose measured value falls within 3% of the mean of all valid measurements.

  2. Find the average: for each time step, calculate the average of all values that passed the filter in step 1. If no instrument falls within the 3% tolerance, use the simple average of all valid measurements at that time step.

  3. Fill the gaps: for time steps with no valid measurements from any instrument, fill the gap with site-adapted Solargis time series data.
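The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Solargis implementation: the function names, the use of `None` to mark missing or QA-rejected values, and the reading of "within 3% of the mean" as a tolerance relative to the mean of all valid values are assumptions made for the example.

```python
from statistics import mean

TOLERANCE = 0.03  # the 3% tolerance from step 1


def harmonize_step(values, fallback=None, tolerance=TOLERANCE):
    """Combine the readings of all instruments at one time step.

    `values` holds one reading per instrument; None marks a missing or
    QA-rejected reading (an assumption of this sketch).
    """
    valid = [v for v in values if v is not None]
    if not valid:
        # Step 3: no valid measurement, use the model value instead.
        return fallback
    centre = mean(valid)
    # Step 1: keep only readings within the tolerance of the mean.
    kept = [v for v in valid if abs(v - centre) <= tolerance * abs(centre)]
    # Step 2: average the kept readings; if none passed the filter,
    # fall back to the simple average of all valid readings.
    return mean(kept) if kept else centre


def harmonize_series(instruments, model):
    """Apply harmonize_step to every time step.

    `instruments` is a list of equally long series (one per instrument);
    `model` is the site-adapted series used to fill remaining gaps.
    """
    steps = zip(*instruments)
    return [harmonize_step(step, fallback=m) for step, m in zip(steps, model)]


# Hypothetical irradiance readings (W/m2) from three instruments.
instruments = [
    [800.0, 500.0, None],
    [802.0, 300.0, None],
    [840.0, None, None],
]
model = [805.0, 410.0, 120.0]
print(harmonize_series(instruments, model))
```

In the first time step the third instrument lies outside the tolerance and is dropped, in the second no instrument passes the filter so the simple average is used, and in the third the model value fills the gap.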

The main advantage of this method is that the resulting dataset follows a signal representative of the entire PV plant area rather than a single instrument location. The computation process is straightforward and easily reproducible by a third party.

Limitations

The key limitation is signal smoothing. When cloud cover is highly variable - for example, during fast-moving broken cloud conditions - averaging across instruments attenuates the short-term variability present in the original measurements. The distribution of measured values is not preserved.
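The smoothing effect can be illustrated with a small, entirely hypothetical example: two instruments see the same cloud shadow one time step apart. The values below are invented for illustration and do not come from any measured dataset.

```python
from statistics import mean, pstdev

# Hypothetical readings (W/m2): the same cloud shadow reaches sensor_b
# one time step later than sensor_a.
sensor_a = [900.0, 900.0, 300.0, 300.0, 900.0, 900.0]
sensor_b = [900.0, 900.0, 900.0, 300.0, 300.0, 900.0]

# Plain per-step average of the two instruments.
averaged = [mean(pair) for pair in zip(sensor_a, sensor_b)]

print("std sensor_a:", round(pstdev(sensor_a), 1))
print("std sensor_b:", round(pstdev(sensor_b), 1))
print("std averaged:", round(pstdev(averaged), 1))
print("averaged:", averaged)
```

The averaged series has a lower standard deviation than either input, and it contains intermediate values (600 W/m2) that neither instrument measured, which is why the distribution of measured values is not preserved.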

Segmentation method

The segmentation method aims to preserve the natural variability of the original measurements by selecting the best available instrument for each time segment rather than averaging across instruments.

The procedure is as follows:

  1. Divide the data: Divide all input datasets into regular time segments.

  2. Find the best data: For each segment, select the most representative candidate instrument based on defined criteria.

  3. Reconstruct the time series: Concatenate the selected candidates to produce a single dataset covering the full measurement period.

The selection of the best candidate per segment uses three criteria:

  • Data completeness: The proportion of valid records within the segment.

  • Mean value similarity: How close the segment mean of the individual candidate is to the mean of all candidates.

  • Profile similarity: Minimization of differences between the individual candidate's profile and the averaged multi-instrument profile.

If no candidate meets the primary criteria, looser criteria are applied. Remaining gaps are filled with site-adapted Solargis model values.
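The procedure and criteria above can be sketched as follows. The document does not state how the three criteria are weighted, what thresholds are used, or what the looser criteria are, so everything specific here is an assumption made for illustration: the 0.9 completeness threshold, the ranking by the sum of mean difference and profile difference, the fallback to the most complete candidate, and all function names.

```python
from statistics import mean


def reference_profile(segments):
    """Per-step mean of the valid values across all candidate instruments."""
    profile = []
    for step in zip(*segments):
        valid = [v for v in step if v is not None]
        profile.append(mean(valid) if valid else None)
    return profile


def score(segment, reference):
    """Return (completeness, mean_diff, profile_diff) for one candidate."""
    pairs = [(v, r) for v, r in zip(segment, reference) if v is not None]
    completeness = len(pairs) / len(segment)
    if not pairs:
        return 0.0, float("inf"), float("inf")
    mean_diff = abs(mean(v for v, _ in pairs) - mean(r for _, r in pairs))
    profile_diff = mean(abs(v - r) for v, r in pairs)
    return completeness, mean_diff, profile_diff


def pick_best(segments, min_completeness=0.9):
    """Index of the selected candidate (threshold and ranking are assumed)."""
    reference = reference_profile(segments)
    scores = [score(s, reference) for s in segments]
    strict = [i for i, s in enumerate(scores) if s[0] >= min_completeness]
    if strict:
        # Primary criteria: complete enough, then closest to the reference.
        return min(strict, key=lambda i: scores[i][1] + scores[i][2])
    # Looser criteria (assumed): take the most complete candidate.
    return max(range(len(segments)), key=lambda i: scores[i][0])


def harmonize_segments(instruments, model, length, min_completeness=0.9):
    """Select one instrument per segment and fill its gaps from `model`."""
    result = []
    for start in range(0, len(model), length):
        segments = [series[start:start + length] for series in instruments]
        best = segments[pick_best(segments, min_completeness)]
        fallback = model[start:start + length]
        result.extend(m if v is None else v for v, m in zip(best, fallback))
    return result


# Hypothetical readings (W/m2); None marks missing or QA-rejected values.
a = [100.0, 200.0, 300.0, None, None, None]
b = [110.0, 190.0, 310.0, 400.0, None, 600.0]
c = [150.0, 260.0, 350.0, 410.0, 500.0, None]
model = [105.0, 205.0, 305.0, 405.0, 505.0, 605.0]
print(harmonize_segments([a, b, c], model, length=3))
```

In the first segment all candidates are complete and instrument `b` is closest to the multi-instrument profile. In the second segment no candidate reaches the assumed threshold, so the most complete one is used and its remaining gap is filled from the model series. Every output value is either an original measurement or a model value, never an average.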

The segmentation method preserves the original measured values and their variability - the smoothing effect seen in the averaging method is avoided, and the value distribution matches the original measurements. The trade-off is that the method is more complex and harder to reproduce independently than the averaging method.

Choosing the right method

Both methods produce a validated, gap-free dataset. The choice between them depends on the intended use:

  • The averaging method is preferred when the dataset should represent the whole PV plant area, when reproducibility by a third party is important, or when the downstream analysis does not require preservation of short-term variability.

  • The segmentation method is preferred when preserving the original signal variability is important, such as for detailed performance analysis or statistical studies of the measured data.

Note: In both methods, gaps remaining after all valid instrument data is exhausted are filled using Solargis site-adapted time series. This ensures the final dataset is continuous and suitable for bankable analysis.

Further reading