This article describes the methodology behind the Solargis hail risk model. It covers the input datasets, machine-learning model, output definition, and guidance on interpreting the results.
Usage in Solargis platform: This model is used in Solargis Prospect.
Overview
The Solargis hail risk model quantifies long-term exposure to damaging hail environments in a spatially and temporally consistent way at global scale. The model approach is inspired by peer-reviewed research on hail hazard modeling (Torralba et al., 2023; Prein & Holland, 2018).
Hail poses a distinct threat to photovoltaic assets. Standard PV modules are generally certified to resist moderate hail impacts, typically tested with ice spheres approximately 25 mm in diameter. Larger hailstones can cause glass fracture, micro-cracks, and latent degradation that manifests over several years. Hailstones larger than 50 mm far exceed these certification conditions and often destroy standard PV modules outright. As utility-scale PV continues to expand into regions prone to severe convective activity, quantifying long-term hail risk has become a critical input for:
site selection and feasibility assessments
mounting structure and tracker configuration decisions
PV module and glass specification choices
insurance evaluation and risk transfer strategies
The methodology combines the ERA5 atmospheric reanalysis with a random forest machine-learning model trained on observed hail events. Hourly atmospheric conditions are evaluated, and results are aggregated into a long-term climatological metric: the average number of hail-risk days per year.
The Solargis hail risk model is designed for strategic and planning applications and does not predict individual hailstorms or provide operational warnings.
Important: Short-range, event-based hail mitigation is addressed in the Solargis Forecast product.
Input datasets
ERA5 atmospheric reanalysis
The primary input is the ERA5 atmospheric reanalysis (Copernicus Climate Change Service), which reconstructs the historical state of the atmosphere by combining numerical weather prediction models with a wide range of observations.
ERA5 is selected for this application because it provides:
global spatial coverage with uniform methodology
multi-decadal temporal consistency
physically coherent representation of convective environments
ERA5 supplies the atmospheric conditions that control hail formation, including instability, vertical wind shear, thermodynamic structure, moisture availability, and freezing-level characteristics.
Hail observation datasets
Observed hail events are used for model training and independent validation. Two datasets are used with a strict separation of roles:
SPC hail reports (NOAA Storm Prediction Center): the only dataset used for model training. SPC provides a long, methodologically consistent record of hail observations over the United States.
ESWD hail reports (European Severe Storms Laboratory): used only for independent validation. ESWD data is not included in training and serves to verify that the model generalizes beyond the SPC reporting system.
Because ESWD reports play no part in training, good performance against them indicates that the model has learned physically meaningful hail-favorable environments rather than dataset-specific reporting artifacts.
Model design
Atmospheric features
The model input features represent key physical processes relevant to hail development:
atmospheric instability supporting strong updrafts
vertical wind shear influencing storm organization
thermodynamic structure affecting hail growth and melting
moisture availability in the lower and mid-troposphere
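The sketch below shows how one of these processes can be turned into a numeric feature. It computes bulk vertical wind shear from ERA5-style u and v wind components at two levels. The Solargis feature list and its exact definitions are not given here. The 0-6 km layer, the function name `bulk_shear`, and the example winds are assumptions for illustration only.

```python
import math

def bulk_shear(u_low: float, v_low: float, u_high: float, v_high: float) -> float:
    """Magnitude (m/s) of the vector wind difference between two levels.

    With winds near the surface and about 6 km above ground, this is the
    commonly used 0-6 km bulk wind shear.
    """
    return math.hypot(u_high - u_low, v_high - v_low)

# Hypothetical winds: 5 m/s westerly near the surface;
# 25 m/s westerly plus a 15 m/s southerly component aloft
print(f"{bulk_shear(5.0, 0.0, 25.0, 15.0):.1f} m/s")  # 25.0 m/s
```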
Single-time evaluation strategy
Atmospheric conditions are evaluated at a single ERA5 analysis time step, with no temporal aggregation or sliding window applied at the feature level. This approach reduces the risk of overfitting, improves interpretability, and helps the Solargis hail risk model learn physically meaningful hail-favorable environments rather than event-specific temporal extremes. Temporal persistence is addressed only during the output aggregation stage.
Machine-learning model
A random forest classification model translates atmospheric conditions into a probabilistic hail-risk signal. Random forests are well suited for climatological applications because they:
capture non-linear relationships between atmospheric predictors
model interactions among multiple variables
remain robust in the presence of noisy or incomplete observations
limit overfitting through ensemble averaging
The model is trained exclusively on SPC hail observations, including both hail and non-hail environments to ensure balanced learning. Temporal separation between training and testing samples is applied using fixed multi-day blocks, reducing artificial skill inflation caused by temporal autocorrelation. Multiple randomized splits are used to assess model stability.
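The temporally blocked splitting described above can be sketched as follows. This is a minimal illustration, not the Solargis implementation. The 7-day block length, the 30% test fraction, and the function name `blocked_split` are hypothetical choices, and daily dates stand in for the hourly training samples.

```python
import random
from datetime import date, timedelta

def blocked_split(days, block_len=7, test_frac=0.3, seed=0):
    """Split dates into train/test sets by whole, fixed-length blocks of days.

    All days in the same block land on the same side of the split, so
    strongly autocorrelated neighboring days never straddle it.
    """
    start = min(days)
    # Block index = number of whole block_len-day periods since the first day
    block_of = {d: (d - start).days // block_len for d in days}
    blocks = sorted(set(block_of.values()))
    rng = random.Random(seed)
    n_test = max(1, round(test_frac * len(blocks)))
    test_blocks = set(rng.sample(blocks, n_test))
    train = [d for d in days if block_of[d] not in test_blocks]
    test = [d for d in days if block_of[d] in test_blocks]
    return train, test

# 70 consecutive days form 10 blocks of 7 days; 3 blocks go to the test set
days = [date(2020, 5, 1) + timedelta(days=i) for i in range(70)]
for seed in range(3):  # several randomized splits with the same block structure
    train, test = blocked_split(days, seed=seed)
    print(f"seed {seed}: {len(train)} train days, {len(test)} test days")
```

Whole blocks move together, so two samples from the same storm episode cannot end up on opposite sides of the split. This keeps temporal autocorrelation from inflating test skill. Changing the seed produces the multiple randomized splits used to check stability.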
Model validation
Several alternative feature configurations and temporal aggregation strategies were evaluated during development, including the use of convective inhibition and short temporal windows. These experiments assessed model sensitivity and potential overfitting, and informed the selection of the final production setup.

Figure 1: Model robustness across alternative feature configurations and temporal strategies. False-negative and false-positive rates are averaged over multiple temporally blocked train/test splits. Results demonstrate stable performance across configurations and independent datasets, indicating limited overfitting and robust generalization.
Model performance is evaluated against both SPC and ESWD observations. Validation confirms that the Solargis hail risk model, although trained exclusively on SPC data, performs comparably when evaluated against ESWD observations. This indicates that it captures physically meaningful hail-favorable environments rather than dataset-specific artifacts. Validation focuses on the model's ability to:
identify hail-favorable environments associated with damaging hail
maintain stable false-negative and false-positive rates across datasets
generalize beyond the spatial and reporting characteristics of the SPC system
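The sketch below computes the false-negative and false-positive rates reported in Figure 1 using the conventional definitions, then averages them over several splits. This article does not state the exact definitions used for Figure 1. The labels below are made-up toy values, not model results.

```python
def error_rates(observed, predicted):
    """Return (false-negative rate, false-positive rate) for binary labels.

    FNR = FN / (FN + TP): share of observed hail cases the model missed.
    FPR = FP / (FP + TN): share of non-hail cases flagged as hail-favorable.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)
    fn = sum(1 for o, p in pairs if o and not p)
    fp = sum(1 for o, p in pairs if not o and p)
    tn = sum(1 for o, p in pairs if not o and not p)
    fnr = fn / (fn + tp) if fn + tp else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return fnr, fpr

def mean_rates(splits):
    """Average FNR and FPR over several (observed, predicted) test splits."""
    rates = [error_rates(obs, pred) for obs, pred in splits]
    n = len(rates)
    return sum(r[0] for r in rates) / n, sum(r[1] for r in rates) / n

# Toy labels for two test splits (1 = hail, 0 = no hail)
splits = [
    ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]),
    ([1, 1, 0, 0], [1, 1, 0, 1]),
]
fnr, fpr = mean_rates(splits)
print(f"mean FNR = {fnr:.3f}, mean FPR = {fpr:.3f}")  # mean FNR = 0.125, mean FPR = 0.375
```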
Output and climatological aggregation
Hourly hail-risk signal
The Solargis hail risk model produces an hourly probabilistic signal representing the likelihood that the atmospheric environment is favorable for damaging hail.
Hail-risk day definition
Hourly probabilities are aggregated at the daily scale. A hail-risk day is identified when high-confidence hail-favorable conditions persist for several hours within the same day. This approach emphasizes sustained exposure to hail-favorable environments rather than isolated, short-lived peaks.
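The sketch below illustrates the daily aggregation. The article leaves two parameters unspecified, so both values here are hypothetical: the probability `threshold` that counts as high confidence, and the minimum number of hours `min_hours`. The sketch also assumes that "persist" means consecutive hours. Counting hours that are not contiguous would be an equally plausible reading.

```python
def is_hail_risk_day(hourly_prob, threshold=0.8, min_hours=3):
    """Flag a day when high-confidence hail-favorable hours persist.

    hourly_prob: hourly hail-risk probabilities for one grid cell and one day.
    Returns True if at least `min_hours` consecutive hours reach `threshold`.
    """
    run = longest = 0
    for p in hourly_prob:
        run = run + 1 if p >= threshold else 0  # reset the run on any low hour
        longest = max(longest, run)
    return longest >= min_hours

spike = [0.1] * 24
spike[16] = 0.95  # a single short-lived peak
sustained = [0.1] * 14 + [0.85, 0.9, 0.9, 0.82] + [0.1] * 6  # four high hours in a row
print(is_hail_risk_day(spike), is_hail_risk_day(sustained))  # False True
```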
Post-processing and normalization
A monotonic scaling transformation is applied to the final aggregated output to improve interpretability and visual consistency of the long-term climatology. This transformation:
compresses the dynamic range of hail-day frequency
preserves spatial patterns and relative risk ranking
does not modify the underlying machine-learning model or its predictions
The transformation is applied solely to the final climatological output to enhance comparability across regions.
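The article does not name the scaling function. As a stand-in, the sketch below uses `math.log1p`, one common strictly increasing transform, to show the two properties listed above: the spread of values shrinks, while the ordering of grid cells stays the same. The cell values are invented for illustration.

```python
import math

def scale(hail_days_per_year: float) -> float:
    """Stand-in monotonic transform; log1p maps 0 to 0 and compresses large values."""
    return math.log1p(hail_days_per_year)

# Invented annual hail-risk day values for four grid cells
raw = {"cell_a": 0.2, "cell_b": 1.0, "cell_c": 5.0, "cell_d": 20.0}
scaled = {cell: scale(v) for cell, v in raw.items()}

def spread(values):
    return max(values) - min(values)

print(f"raw spread {spread(raw.values()):.2f}, scaled spread {spread(scaled.values()):.2f}")
# A strictly increasing transform leaves the ranking of cells unchanged
print(sorted(raw, key=raw.get) == sorted(scaled, key=scaled.get))  # True
```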
Long-term climatology
Daily hail-risk indicators are aggregated over multiple years to produce the climatological metric: the average number of hail-risk days per year (Figure 2). Average monthly aggregates indicate seasonal variability for analyzed locations (Figure 3).

Figure 2: Global hail risk model output (2015–2024). The map highlights regions with recurrent damaging hail environments and is optimized for long-term risk comparison rather than event-level prediction.

Figure 3: Average number of days per month with potential for a significant hail event, showing seasonal variability for three selected sites (see locations in Figure 2).
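The sketch below aggregates daily hail-risk flags into an annual average and monthly averages. It assumes the input covers whole calendar years for a single grid cell. The function name `climatology` and the flagged dates are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def climatology(daily_flags):
    """Return (mean hail-risk days per year, {month: mean hail-risk days}).

    daily_flags: {date: bool} for one grid cell, covering whole calendar years.
    """
    n_years = len({d.year for d in daily_flags})
    per_year = sum(daily_flags.values()) / n_years
    monthly = defaultdict(int)
    for d, flagged in daily_flags.items():
        if flagged:
            monthly[d.month] += 1
    per_month = {m: monthly[m] / n_years for m in range(1, 13)}
    return per_year, per_month

# Two full years with four hypothetical hail-risk days
flagged_days = {date(2015, 5, 10), date(2015, 6, 3), date(2015, 6, 20), date(2016, 5, 12)}
flags = {}
d = date(2015, 1, 1)
while d <= date(2016, 12, 31):
    flags[d] = d in flagged_days
    d += timedelta(days=1)

per_year, per_month = climatology(flags)
print(per_year)                    # 2.0
print(per_month[5], per_month[6])  # 1.0 1.0
```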
How to interpret the data
Damaging hail events are inherently localized. Values are reported per 0.25° grid cell, which corresponds to an area of approximately 28 × 28 km near the equator; the east-west extent of a cell decreases toward the poles. A value of five, for example, indicates the potential for several localized damaging hail events within this area in an average year. It does not indicate widespread or continuous exposure.
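The 28 km figure is the north-south extent of a 0.25° cell. Because meridians converge toward the poles, the east-west extent shrinks with the cosine of latitude. The sketch below estimates both dimensions under a spherical-Earth approximation with a mean radius of 6371 km. `cell_size_km` is a hypothetical helper, not part of any Solargis API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth approximation

def cell_size_km(lat_deg: float, res_deg: float = 0.25) -> tuple[float, float]:
    """Approximate (north-south, east-west) extent in km of a lat/lon grid cell."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0  # about 111.2 km per degree
    ns = res_deg * km_per_deg
    # Meridians converge toward the poles, so the east-west extent scales with cos(latitude)
    ew = ns * math.cos(math.radians(lat_deg))
    return ns, ew

for lat in (0.0, 30.0, 45.0, 60.0):
    ns, ew = cell_size_km(lat)
    print(f"lat {lat:4.0f}: {ns:.1f} km x {ew:.1f} km (about {ns * ew:.0f} km^2)")
```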
The global distribution of significant hail potential is highly uneven (Figure 2):
Grey areas: no atmospheric conditions conducive to significant hail formation
Green areas: occasional favorable conditions, up to approximately one potential event per year
Blue areas: up to five potential events per year
Violet areas: more than five potential events per year
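The legend classes above can be expressed as a simple lookup. This is a sketch only. The article gives approximate boundaries, so treating exactly zero as grey and making the upper bounds inclusive are assumptions.

```python
def risk_class(hail_days_per_year: float) -> str:
    """Map an average annual value to the map legend color described above."""
    if hail_days_per_year <= 0:
        return "grey"    # no conditions conducive to significant hail
    if hail_days_per_year <= 1:
        return "green"   # occasional, up to about one potential event per year
    if hail_days_per_year <= 5:
        return "blue"    # up to five potential events per year
    return "violet"      # more than five potential events per year

print([risk_class(v) for v in (0.0, 0.4, 3.0, 7.0)])  # ['grey', 'green', 'blue', 'violet']
```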
Seasonal variability differs markedly across regions. Figure 3 presents three locations, each averaging approximately seven hail-risk days per year, but with substantially different seasonal patterns:
Goya, Corrientes (Argentina): hail-favorable conditions occur in most months, indicating weak seasonal modulation
Hays, Kansas (USA): hail potential is concentrated in summer, with a peak from May to August
Asansol, West Bengal (India): dominant activity occurs in the pre-monsoon season, with strong peaks in April and May
Tip: For PV operators, characterizing seasonal variability supports timely planning of monitoring, emergency response, and mitigation measures throughout the year.
Limitations
The following limitations apply when interpreting hail risk model output:
The model does not resolve individual convective storms.
Short-lived extreme hail events may not be captured individually, because the hail-risk day definition emphasizes conditions that persist for several hours.
Results depend on the quality and spatial resolution of the ERA5 reanalysis (0.25°, approximately 28 km).
Observational datasets used for training and validation contain reporting biases, particularly toward populated areas.
These limitations are inherent to large-scale climatological analyses of severe convective phenomena.
Relationship to hail forecast products
The Solargis hail risk model and Solargis Hail Forecast address complementary aspects of hail risk and together form a comprehensive framework for managing hail risk throughout the full lifecycle of PV assets:
Hail risk model quantifies long-term exposure and supports planning, site selection, and design decisions.
Hail forecast provides short-range warnings for operational mitigation.
Further reading
"Modelling hail hazard over Italy with ERA5 large-scale variables": Torralba, V.; Hénin, R.; Cantelli, A.; Scoccimarro, E.; Materia, S.; Manzato, A.; Gualdi, S.
"Global estimates of damaging hail hazard": Prein, A.F.; Holland, G.J.
"Severe Weather Database Files (1955–2024)": NOAA Storm Prediction Center.
"European Severe Weather Database (ESWD)": European Severe Storms Laboratory (ESSL).
"Hail forecast": Solargis knowledge base.
"Validation of hail forecast data": Solargis Knowledge Base.
"Prospect map layers": Solargis Knowledge Base.