In this document
This document describes best practices for evaluating solar power forecast accuracy, covering reference data selection, quality control, time period and data resolution recommendations, and forecast horizon considerations. It also explains the key factors that influence forecast accuracy and how to set realistic expectations.
Overview
Accurate solar power forecasting is essential for efficient energy grid management and optimal utilization of solar energy resources. Forecast accuracy is evaluated by comparing forecast values against reference values, enabling operators and data providers to understand how well predictions align with actual conditions.
Meaningful accuracy evaluation depends on several interdependent factors:
reference data quality and type,
the evaluation time period,
appropriate data resolution,
and a consistent approach to forecast horizons.
It is equally important to set realistic expectations — accuracy is inherently constrained by atmospheric variability, location characteristics, and the physical limitations of Numerical Weather Prediction (NWP) models.
Reference data for forecast accuracy evaluation
Forecast accuracy is evaluated by comparing forecast values against reference values. Selecting the right reference data and ensuring its quality is the most critical step in the process. The two main reference parameters are PV power output and solar radiation data (GHI / GTI).
PV power output
PV power output is the most commonly used reference parameter because it directly reflects operational plant performance. It is obtained from sensors or meters installed at the site, monitoring real-time power generation over the same time period as the forecast.
The following data quality issues must be considered when preparing the evaluation dataset:
Inverter outages: Malfunctioning inverters cause sudden drops in PV power output, compromising evaluation reliability.
Curtailment: Grid-imposed power limitations produce output values that do not reflect actual generation potential and must be excluded.
Data logger issues: Malfunctioning logging devices result in illogical or erroneous values that must be excluded.
Natural phenomena: Heavy snow or dense fog suppresses PV power output without corrupting the data. Consider carefully whether to include such periods, as NWP models struggle to predict these events accurately.
Figure 1: Example of curtailed PV power output (green) vs. forecast PV power output (red)
Figure 2: Example of reference PV power output suppressed by snow cover
Figure 3: Example of a partial drop in PV power output caused by an inverter issue
Figure 4: Example of data logger issue reflected in PV power output values
Solar radiation data: GHI and GTI
GHI is measured by horizontally mounted pyranometers and GTI by pyranometers mounted in the plane of the array; DNI is measured by pyrheliometers. All of these instruments are sensitive and require proper maintenance to deliver reliable reference data.
Key maintenance issues to consider:
Regular calibration: Infrequent or omitted calibration increases measurement uncertainty and produces erroneous data.
Regular cleaning: Accumulated dirt blocks solar radiation, producing underestimated measurements and, over time, long-term sensor drift.
Proper placement and alignment: Misaligned instruments, or instruments shaded by surrounding structures, consistently underestimate irradiation and must not be used as a reference.

Figure 5: Effect of soiling on GHI instrument measurements.

Figure 6: Systematic shading effect visible in measured GHI and DNI data
Measured vs. satellite-derived reference data
Measured data: Directly reflects ground conditions but is subject to the quality issues described above.
Satellite-derived data: Solar radiation estimated from satellite imagery. Continuous and gap-free — a significant advantage over measured data. However, as an algorithmic estimate, it carries its own uncertainties: thin clouds may be underestimated; bright surfaces such as snow, water, and desert may be overestimated; and localized effects such as aerosols, cloud shadows, and microclimates may not be captured accurately.
Note: When high-quality measured data is available, it is the recommended reference as it best represents ground-truth conditions. Satellite-derived data is a reliable alternative when measured data is unavailable or of compromised quality.
Quality control
Quality control ensures the reference dataset is reliable before evaluation begins.
For PV power output, identify and address:
suspicious outliers (negative values or values above installed capacity),
missing data, sensor and inverter malfunctions,
and data logging errors.
For GHI / GTI, identify:
instrument misalignment,
shading by surrounding objects,
and sensor soiling.
Important: Corrupted or suspect data points should be removed from the evaluation dataset or replaced with satellite-derived counterparts. The goal is a reference dataset that reflects actual conditions as accurately as possible.
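The checks above can be sketched in code. The following is a minimal illustration using pandas, assuming the reference is a Series of PV power in MW; the function name, the six-sample threshold used to detect a stuck data logger, and the optional satellite backfill are assumptions for the example, not a prescribed procedure.

```python
import numpy as np
import pandas as pd

def qc_pv_reference(pv, capacity_mw, satellite=None):
    """Flag suspicious PV reference points; optionally backfill them
    with satellite-derived estimates, otherwise leave them missing."""
    clean = pv.copy()
    # Negative values and values above installed capacity are physically
    # implausible -> exclude.
    clean[(clean < 0) | (clean > capacity_mw)] = np.nan
    # Identical non-zero values repeated over many consecutive steps often
    # indicate a stuck data logger (the threshold of 6 samples is illustrative).
    run_len = clean.groupby((clean != clean.shift()).cumsum()).transform("size")
    clean[(run_len >= 6) & clean.notna() & (clean > 0)] = np.nan
    # Replace flagged points with satellite-derived counterparts if available.
    if satellite is not None:
        clean = clean.combine_first(satellite)
    return clean
```

Flagged points simply remain missing (NaN) and so drop out of the evaluation; the satellite series, when supplied, fills them in as recommended above.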
Evaluation time period and data resolution
Recommended time period
It is recommended to use the most recent 12 months of data for forecast accuracy evaluation. A full year ensures:
Seasonal coverage: All seasons are represented, capturing the full range of atmospheric conditions that affect solar radiation and forecast accuracy.
Variability assessment: A sufficient sample of weather patterns, times of day, and seasonal transitions allows a reliable assessment of model performance.
Avoidance of short-term bias: Transient weather anomalies have less influence over a full year than over short periods.
Note: Evaluating over only a few days or weeks is not statistically reliable. All forecasting models occasionally produce poor results — this is expected and should not trigger conclusions about overall forecast quality.
Recommended data resolution
Reference data must be resampled to match the temporal resolution of the forecast data before evaluation. High-frequency measurements (1-minute or 5-minute) should not be used directly to evaluate forecasts operating at hourly or 15-minute resolution, because:
NWP models operate at coarser temporal and spatial resolutions and are designed to capture large-scale patterns, not short-term fluctuations.
Microscale processes — rapid cloud formation, turbulence, local terrain effects — are not fully represented in NWP physics.
Using mismatched resolutions introduces apparent discrepancies that reflect the resolution difference rather than actual forecast error.
Tip: Aggregate high-frequency reference measurements to match the forecast data resolution (e.g., 15-minute or hourly) before comparing. This produces more meaningful results.
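As a sketch of this tip, the following resamples made-up 5-minute reference measurements to the hourly resolution of a forecast using pandas (timestamps and values are illustrative):

```python
import pandas as pd

# 5-minute reference measurements (illustrative values, kW)
ref_5min = pd.Series(
    [410, 395, 430, 388, 402, 415, 390, 408, 399, 412, 405, 397],
    index=pd.date_range("2024-02-22 08:00", periods=12, freq="5min"),
)

# Aggregate to hourly means so the reference matches an hourly forecast;
# each interval is stamped with its start time.
ref_hourly = ref_5min.resample("1h", label="left", closed="left").mean()
# ref_hourly.iloc[0] -> 404.25, the mean of the twelve 5-minute samples
```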

Figure 7: Difference in variability between forecast data and 5-minute reference measurements
Forecast horizons
Understanding forecast horizons
Forecast horizon refers to the future time period for which a prediction is made.
Standard forecast horizon definitions for reference
H0 — intra-hour
H1 — hour-ahead
H2 — two hours-ahead
D0 — intra-day
D1 — day-ahead
D2 — two days-ahead
D3 — three days-ahead
Historical vs. operational forecasts
Historical forecast files contain predictions for a single, fixed horizon — evaluation can be performed directly on the file contents.
Operational forecast files may contain predictions from H0 up to D14 in a single time series. Evaluating these together is not meaningful, because uncertainty increases systematically with horizon length. Mixing horizons produces results that do not accurately represent any individual horizon's performance.
Important: Filter the forecast horizon of interest from operational forecast files and aggregate it into a single time series before evaluation. Consistent horizon evaluation ensures results are meaningful and comparable.
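Filtering one horizon can be sketched as follows, assuming the operational file has been loaded into a table with one row per forecast value and illustrative column names `issue_time`, `target_time`, and `value`; day-ahead (D1) is taken here to mean targets falling on the calendar day after the issue day.

```python
import pandas as pd

def extract_day_ahead(df):
    """Keep only day-ahead (D1) rows from an operational forecast table
    and return them as a single time series indexed by target time."""
    lead_days = (df["target_time"].dt.normalize()
                 - df["issue_time"].dt.normalize()).dt.days
    d1 = df[lead_days == 1]
    return d1.set_index("target_time")["value"].sort_index()

forecasts = pd.DataFrame({
    "issue_time": pd.to_datetime(["2024-02-21 06:00"] * 3 + ["2024-02-22 06:00"]),
    "target_time": pd.to_datetime(["2024-02-21 12:00", "2024-02-22 12:00",
                                   "2024-02-23 12:00", "2024-02-23 12:00"]),
    "value": [1.0, 2.0, 3.0, 4.0],
})
d1_series = extract_day_ahead(forecasts)  # keeps only the lead-time-1-day rows
```

The same pattern applies to any other horizon; only the lead-time condition changes.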
Forecast horizon and uncertainty
Shorter forecast horizons yield higher accuracy — there is less time for atmospheric conditions to change, and models benefit from more recent observational data. For longer horizons, small errors in initial conditions grow exponentially over time due to the chaotic nature of the atmosphere. Accuracy expectations must therefore be calibrated to the horizon — standards appropriate for H2 cannot be applied to D2.
Note: Forecast accuracy deteriorates as the forecast horizon extends — for all locations and climate types. This is a fundamental characteristic of atmospheric prediction.
The following tables show PV power output forecast accuracy (Bias, MAD, and RMSD) across six forecast horizons (H0 to D2) for three example locations. Values are averages of 2022 and 2023.
| Leadtime | Bias [MW] | Bias [%] | MAD [MW] | MAD [%] | RMSD [MW] | RMSD [%] |
|---|---|---|---|---|---|---|
| H0 | 0.020 | 0.2 | 0.370 | 4.2 | 0.674 | 7.7 |
| H1 | 0.041 | 0.5 | 0.479 | 5.4 | 0.845 | 9.6 |
| H2 | 0.119 | 1.4 | 0.503 | 5.7 | 0.842 | 9.6 |
| D0 | 0.151 | 1.7 | 0.538 | 6.1 | 0.893 | 10.1 |
| D1 | 0.155 | 1.8 | 0.602 | 6.8 | 0.983 | 11.2 |
| D2 | 0.152 | 1.7 | 0.649 | 7.4 | 1.034 | 11.8 |
Table 1: Forecast accuracy by horizon — Bratislava, Slovakia, Europe (daylight time)
| Leadtime | Bias [MW] | Bias [%] | MAD [MW] | MAD [%] | RMSD [MW] | RMSD [%] |
|---|---|---|---|---|---|---|
| H0 | -0.015 | -0.4 | 0.209 | 5.9 | 0.384 | 10.8 |
| H1 | 0.028 | 0.8 | 0.272 | 7.6 | 0.485 | 13.6 |
| H2 | 0.098 | 2.8 | 0.294 | 8.3 | 0.504 | 14.2 |
| D0 | 0.115 | 3.2 | 0.316 | 8.9 | 0.540 | 15.2 |
| D1 | 0.092 | 2.6 | 0.325 | 9.2 | 0.546 | 15.4 |
| D2 | 0.082 | 2.3 | 0.335 | 9.4 | 0.555 | 15.6 |
Table 2: Forecast accuracy by horizon — Johannesburg, South Africa, Africa (daylight time)
| Leadtime | Bias [MW] | Bias [%] | MAD [MW] | MAD [%] | RMSD [MW] | RMSD [%] |
|---|---|---|---|---|---|---|
| H0 | 0.09 | 0.7 | 1.06 | 8.8 | 1.57 | 13.1 |
| H1 | 0.25 | 2.1 | 1.29 | 10.7 | 1.85 | 15.4 |
| H2 | 0.60 | 5.0 | 1.42 | 11.8 | 1.96 | 16.3 |
| D0 | 0.70 | 5.8 | 1.51 | 12.6 | 2.09 | 17.4 |
| D1 | 0.89 | 7.4 | 1.67 | 13.9 | 2.29 | 19.1 |
| D2 | 0.95 | 7.9 | 1.72 | 14.3 | 2.35 | 19.6 |
Table 3: Forecast accuracy by horizon — Singapore, Asia (daylight time)
Factors affecting forecast accuracy
Forecast accuracy varies based on location, season, and the nature of the weather being predicted. Understanding these factors is essential for setting realistic expectations.
Seasonality
Seasonal weather patterns significantly affect PV power output forecast accuracy. NWP models perform better during stable atmospheric conditions and worse during rapidly changing weather. The following regional examples illustrate this:
Réunion Island: Two seasons with moderate differences — humid (November to April) and drier (May to October). Forecast accuracy expectations can remain relatively consistent year-round.
Central Mexico: Tropical climate with a rainy season from approximately May/June to September/October. Lower accuracy is expected during the rainy season due to more frequent weather fluctuations.
Southern France: Clear, stable summers (June–August) with high forecast accuracy; unstable autumn and winter (September–February) with higher uncertainty; improving conditions in spring (March–May).
Central Vietnam: Stable dry season (January–August) and a rainy season with typhoons and tropical storms (September–December). October and November present the highest uncertainty, with weather capable of shifting from clear skies to intense downpours within minutes.

Figure 8: Seasonality of forecast deviations — Réunion Island, PV power output (2023)

Figure 9: Seasonality of forecast deviations — Central Mexico, PV power output (2023)

Figure 10: Seasonality of forecast deviations — Southern France, PV power output (2023)

Figure 11: Seasonality of forecast deviations — Central Vietnam, PV power output (2023)
Location characteristics
A PV plant's location influences forecast accuracy through three main factors:
Climatic conditions: Stable, predictable climates — such as arid desert regions — yield higher forecast accuracy. Regions with highly variable weather, frequent cloud cover changes, or monsoon patterns are more challenging to forecast.
Topography: Mountainous terrain creates complex weather through orographic effects, leading to more frequent and sudden cloud cover changes. Flat terrain generally produces more predictable conditions.
Proximity to water and vegetation: Coastal areas and dense forest regions can produce localized weather phenomena — fog, lake-effect clouds, humidity-driven convection — that NWP models struggle to predict accurately.
The following tables compare forecast accuracy for two contrasting locations — Singapore (variable tropical weather) and Dubai (stable, arid climate):
| Leadtime | Bias [MW] | Bias [%] | MAD [MW] | MAD [%] | RMSD [MW] | RMSD [%] |
|---|---|---|---|---|---|---|
| H0 | 0.09 | 0.7 | 1.06 | 8.8 | 1.57 | 13.1 |
| H1 | 0.25 | 2.1 | 1.29 | 10.7 | 1.85 | 15.4 |
| H2 | 0.60 | 5.0 | 1.42 | 11.8 | 1.96 | 16.3 |
| D0 | 0.70 | 5.8 | 1.51 | 12.6 | 2.09 | 17.4 |
| D1 | 0.89 | 7.4 | 1.67 | 13.9 | 2.29 | 19.1 |
| D2 | 0.95 | 7.9 | 1.72 | 14.3 | 2.35 | 19.6 |
Table 4: Forecast accuracy by horizon — Singapore, Asia (daylight time)
| Leadtime | Bias [MW] | Bias [%] | MAD [MW] | MAD [%] | RMSD [MW] | RMSD [%] |
|---|---|---|---|---|---|---|
| H0 | 0.06 | 0.3 | 0.38 | 2.0 | 0.77 | 4.2 |
| H1 | 0.03 | 0.2 | 0.40 | 2.2 | 0.84 | 4.5 |
| H2 | 0.02 | 0.1 | 0.40 | 2.2 | 0.81 | 4.4 |
| D0 | 0.02 | 0.1 | 0.44 | 2.4 | 0.85 | 4.6 |
| D1 | 0.02 | 0.1 | 0.45 | 2.4 | 0.88 | 4.7 |
| D2 | 0.01 | 0.0 | 0.48 | 2.6 | 0.91 | 4.9 |
Table 5: Forecast accuracy by horizon — Dubai, United Arab Emirates (daylight time)
Dubai's Bias, MAD, and RMSD values are significantly lower than Singapore's across all horizons, reflecting the impact of climate stability on forecast accuracy.
The challenge of timing
A key forecasting challenge is predicting the exact moment a weather pattern changes. In solar power forecasting, a one-hour timing error — for example, clouds arriving at 16:00 instead of the predicted 15:00 — can produce large forecast deviations with real financial consequences for operators reporting to DSOs or TSOs.
The following examples show typical timing challenges in day-ahead forecasts (24–48 hour horizon):
A correctly predicted drop in PV power output occurs one hour later than forecast, producing a ~100 MW deviation at the predicted event time — despite the prediction being directionally correct.
Correctly predicted alternating clear-sky and overcast conditions are offset by approximately one hour throughout the day, producing alternating positive and negative deviations even when the daily energy total is accurate.

Figure 12: NWP day-ahead forecast vs. reference PV power output — steep drop timing example (Apr 6–7, 2021)

Figure 13: NWP day-ahead forecast vs. reference PV power output — variable weather timing example (Aug 11–12, 2022)
Note: Timing errors are inherent to atmospheric prediction and apply to all NWP models. They should be evaluated in the context of a full evaluation period, not individual events.
Forecast accuracy evaluation metrics
Forecast deviation (forecast error) is the numerical difference between forecasted and reference values of PV power output or GHI / GTI at each time step, expressed as an absolute value or percentage. Calculating forecast deviation is the first step in forecast accuracy evaluation.
| Date | Time UTC+0 | Forecast [kWh] | Reference [kWh] | Forecast deviation [kWh] |
|---|---|---|---|---|
| 22.02.2024 | 8:00 | 400 | 383 | 17 |
| 22.02.2024 | 9:00 | 456 | 471 | -15 |
| 22.02.2024 | 10:00 | 564 | 610 | -46 |
| 22.02.2024 | 11:00 | 753 | 636 | 117 |
| 22.02.2024 | 12:00 | 673 | 526 | 147 |
| 22.02.2024 | 13:00 | 593 | 663 | -70 |
| 22.02.2024 | 14:00 | 498 | 623 | -125 |
| 22.02.2024 | 15:00 | 489 | 467 | 22 |
Table 6: Base inputs for forecast accuracy evaluation — PV power output example
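The summary metrics used in the tables above — Bias, MAD, and RMSD — are commonly computed as the mean, mean absolute value, and root-mean-square of the per-timestep deviations. Applying these standard definitions to the deviations in Table 6:

```python
import math

# Forecast deviations from Table 6 (forecast minus reference, kWh)
deviations = [17, -15, -46, 117, 147, -70, -125, 22]
n = len(deviations)

bias = sum(deviations) / n                            # mean deviation
mad = sum(abs(d) for d in deviations) / n             # mean absolute deviation
rmsd = math.sqrt(sum(d * d for d in deviations) / n)  # root-mean-square deviation

print(f"Bias = {bias:.2f} kWh, MAD = {mad:.2f} kWh, RMSD = {rmsd:.2f} kWh")
# Bias = 5.88 kWh, MAD = 69.88 kWh, RMSD = 85.83 kWh
```

The percentage columns in the accuracy tables express these quantities relative to a normalization level (for example, installed capacity); whichever basis is used should be stated alongside the results.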
Setting realistic forecast accuracy expectations
The final forecast product is a time series of predicted values, but producing it involves downloading, pre-processing, transforming, and post-processing large volumes of data from multiple NWP and cloud motion vector models. Model combinations vary by location, season, horizon, and resolution. All models have inherent limitations that must be factored into accuracy expectations.
Realistic expectations should be based on analysis of at least 12 months of historical forecast performance, considering seasonality, time of day, forecast horizon, and location. Short evaluation periods — days or weeks — are not statistically reliable and can produce misleading conclusions.
Important: No forecast model produces accurate predictions every day. Accuracy expectations should be agreed upon between forecast data providers and PV power plant operators to ensure a shared, realistic understanding of what is achievable for a given location, season, and horizon.