HPC-generated data advance wind energy research
Researchers at Aarhus University have used GenomeDK via the DeiC Throughput HPC resources to generate large-scale, synthetic datasets for training data-driven wind-farm models.
What determines how much power a wind farm can extract? The answer may seem obvious - the wind - but in reality, the situation is far more complex. Wind continuously changes speed and direction, and within a wind farm, the turbines also influence each other, as a turbine can alter the airflow for those downstream. This makes it difficult both to predict energy production and to control wind-farm flow to improve how turbines operate together.
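The wake effect described above is often approximated with simple engineering formulas. As a minimal illustration (not the authors' model), the classical Jensen/Park wake model estimates the velocity deficit a turbine imposes on the flow downstream; all parameter values here are illustrative:

```python
import math

def jensen_wake_deficit(x, rotor_radius=50.0, ct=0.8, k=0.05):
    """Fractional velocity deficit at distance x [m] downstream of a turbine
    (top-hat Jensen/Park wake model; parameter values are illustrative)."""
    if x <= 0:
        raise ValueError("x must be downstream of the rotor")
    a = 1.0 - math.sqrt(1.0 - ct)              # initial deficit from the thrust coefficient
    return a / (1.0 + k * x / rotor_radius) ** 2

# Wind speed felt by a turbine standing 500 m downstream of another
v_free = 10.0                                  # free-stream wind speed [m/s]
v_waked = v_free * (1.0 - jensen_wake_deficit(500.0))
```

Models of this kind are fast but rest on strong simplifications, which is precisely the limitation the article turns to next.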
From “black-box” to physics-guided data-driven approach
The engineering models traditionally used to estimate wind turbine power output are highly simplified and struggle to capture the complex wind-farm flow physics, especially when turbines are placed close together. This challenge motivated Tenure-track Assistant Professor Navid Zehtabiyan-Rezaie and Associate Professor Mahdi Abkar from the Department of Mechanical and Production Engineering at Aarhus University to pursue a different approach:
“Modeling, optimization, and control of wind farms are mainly conducted through low-order empirical models. These models are based on a set of simplistic physical assumptions, causing large errors and uncertainties in design and control strategies in wind-energy projects. To address this, we aimed at a paradigm shift from a ‘purely data-driven’ approach towards ‘physics-guided data-driven’ modeling,” says Tenure-track Assistant Professor Navid Zehtabiyan-Rezaie.
The researchers’ goal was therefore to develop a new generation of models that can learn from large volumes of data while remaining grounded in the physical laws governing wind-farm flow. Achieving this required access to a large, systematic, and well-controlled dataset on which the data-driven models could be trained.
HPC-driven simulations as the path to data
Experiments and field measurements are vital for wind-energy research, but publicly available data are still scarce, particularly for benchmarking data-driven models across sites and conditions.
To address this, the researchers had to generate the data themselves using high-resolution CFD simulations that reproduce wind-farm flow physics under controlled conditions.
DeiC Throughput HPC is a collective term for the national HPC resources allocated via DeiC to three different HPC facilities: GenomeDK, Computerome, and Sophia.
Computational Fluid Dynamics (CFD) simulations numerically solve the governing equations of fluid motion to predict flow behavior. Instead of relying solely on field measurements, CFD enables detailed, high-resolution modeling of wind and turbulence in complex environments, providing insights into flow structures and interactions that are difficult to capture experimentally.
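The numerical principle behind CFD can be sketched in a few lines. The toy example below (an assumption for illustration, far simpler than the 3D Navier-Stokes equations with turbulence modeling used in real wind-farm CFD) marches a discretized governing equation forward in time, here the 1D viscous diffusion of a velocity profile:

```python
import numpy as np

# Toy sketch of the CFD idea: discretize a governing equation of fluid
# motion on a grid and step it forward in time. Here: du/dt = nu * d2u/dx2.
nu = 0.1                       # viscosity [m^2/s]
nx, dx, dt = 101, 0.1, 0.01    # grid size, spacing, time step (dt < dx^2 / 2nu for stability)
u = np.zeros(nx)
u[40:60] = 1.0                 # initial velocity "jet"

for _ in range(200):           # explicit Euler time stepping
    u[1:-1] += nu * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])

# The sharp jet smooths out as viscosity diffuses momentum across the grid.
```

Production solvers differ enormously in scale and sophistication, but the core loop, updating a gridded flow field according to discretized physics, is the same, which is why such simulations demand HPC resources at realistic resolutions.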
According to Navid, their first application for national HPC resources through DeiC Throughput in 2022 was decisive in reaching the scale and resolution required in the project:
“As a researcher, it is part of my work to explore available opportunities and identify the right resources, and I became aware of the DeiC call while searching for solutions that could support the computational needs of our projects. Based on the call guidelines, we decided that DeiC Throughput was the right setup for our project.”
Aarhus University provides guidance and support to its own researchers in terms of orientation to and use of national data management services and HPC resources.
The DeiC front office at Aarhus University is part of the Research Data Office, which provides support to AU's researchers in handling research data in a broad sense. The office consists of the former Data Protection Unit, the Open Science Coordinator, the Data Management Advisory Service and the URIS and Export Control Advisory Service.
Through the DeiC front office, researchers at AU can access extra computing power on supercomputers via local, national and international HPC resources. Depending on the specific resource, an application must be submitted; in some cases, applications are only accepted when a call is open.
A virtual wind-farm laboratory
With access to the GenomeDK facility through DeiC Throughput, Navid and his colleagues were able to carry out extensive simulations of wind farms under a wide range of atmospheric conditions, turbine spacings, and operational parameters. Using the OpenFOAM software package, the researchers built parameterized CFD models of wind farms and simulated how wind flows through them and how turbines interact aerodynamically.
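A parameterized simulation campaign of this kind can be driven by a small script that enumerates the conditions and generates one case per combination. The sketch below is hypothetical (parameter names, values, and file layout are illustrative assumptions; the actual OpenFOAM case setup is described in the researchers' publications):

```python
import itertools
import json
import pathlib

# Hypothetical campaign driver: enumerate atmospheric and layout parameters
# and create one case directory per combination. Values are illustrative.
wind_speeds = [8.0, 10.0, 12.0]     # hub-height inflow wind speed [m/s]
spacings = [5, 7]                   # turbine spacing [rotor diameters]
intensities = [0.06, 0.10]          # inflow turbulence intensity [-]

root = pathlib.Path("campaign")
combos = itertools.product(wind_speeds, spacings, intensities)
for i, (v, s, ti) in enumerate(combos):
    case = root / f"case_{i:03d}"
    case.mkdir(parents=True, exist_ok=True)
    (case / "params.json").write_text(json.dumps(
        {"wind_speed": v, "spacing_D": s, "turbulence_intensity": ti}))
    # Each case would then be meshed and solved with OpenFOAM utilities
    # (e.g. blockMesh plus a RANS solver), typically submitted as an HPC job.
```

Automating the sweep this way keeps the campaign systematic and reproducible, which matters when the output is meant to serve as training data.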
The CFD simulations effectively functioned as a virtual wind-farm laboratory, allowing the researchers to reproduce wind behavior with a remarkably high level of detail and extract data from the simulations as if they were measurements from a real wind farm. In this way, the synthetic dataset became a realistic training foundation for new models.
From heavy simulations to lightweight models
Based on the large dataset, the researchers were able to develop new, lightweight physics-guided data-driven models. While the CFD simulations themselves required massive computational resources, the training of data-driven models could be conducted on the local computing facility. The resulting models are fast enough to be used in the practical planning and operation of wind farms and can achieve accuracy approaching that of high-fidelity CFD.
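To make the "physics-guided data-driven" idea concrete, the sketch below (an illustrative assumption, not the authors' model) fits a lightweight surrogate for turbine power on synthetic simulation-style data. The physics guidance here is the choice of v³ as a feature, since aerodynamic power scales with the cube of wind speed (P = ½ρACₚv³):

```python
import numpy as np

# Illustrative physics-guided surrogate: linear regression on a feature
# chosen from physics (v**3), fitted to noisy simulation-style power data.
rng = np.random.default_rng(0)
v = rng.uniform(4.0, 12.0, 500)                   # synthetic wind speeds [m/s]
cp, rho, area = 0.45, 1.225, np.pi * 50.0**2      # illustrative turbine constants
power = 0.5 * rho * area * cp * v**3 / 1e6        # "ground truth" power [MW]
power += rng.normal(0.0, 0.05, v.size)            # measurement-like noise

X = np.column_stack([v**3, np.ones_like(v)])      # physics-informed feature + bias
coef, *_ = np.linalg.lstsq(X, power, rcond=None)  # cheap to train and evaluate
pred = X @ coef                                   # fast surrogate predictions
```

Unlike the CFD simulations that produced the training data, evaluating such a surrogate takes microseconds, which is what makes it usable in planning and operational settings.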
Reliable performance enabled a smooth workflow
Navid emphasizes that the high availability and fast turnaround times of GenomeDK were crucial to the project’s success. The simulations required both substantial computational power and significant storage capacity, and the combination of performance, capacity, and stable workflow made it possible to complete the simulation work without interruptions. Initial setup was supported by GenomeDK’s HPC support team, after which the project progressed smoothly:
“The resources made available through DeiC Throughput enabled simulations that would simply not have been feasible within our standard computing resources. The high availability and stable workflow of GenomeDK allowed us to conduct large CFD campaigns with confidence, turning high-fidelity simulations into a practical foundation for data-driven wind-farm modeling.”
As the project evolved, Navid applied for and received additional compute time through subsequent national HPC calls, enabling him to expand the simulations and explore new scenarios.
Lessons learned and future scaling
If the project were to be repeated, Navid would focus on automating more of the data preparation and post-processing steps:
“Automating more of the pre-processing and post-processing pipeline would streamline large simulation batches. Introducing more advanced compression strategies would also reduce storage demands.”
The researchers are also open to scaling up to even larger resources in the future, including GPU-based HPC systems, should future studies require simulations of larger wind farms or more complex scenarios.
Researchers share both data and methods
The researchers have chosen to make the dataset and methodology available to others. In their scientific publications, they describe how the simulations are constructed—from domain setup and atmospheric modelling to turbine representation and the turbulence models used. This enables other researchers to build their own datasets following the same principles and to work from a shared, comparable foundation.
A part of the synthetic dataset has also been made publicly available and can serve as a reference for future studies. This provides a common basis for testing and comparing new methods across research environments. According to Navid, such openness is essential if data-driven models are to evolve from research tools into solutions that can be applied in practice:
“The successful experience in fields like computer vision, finance, and turbulence research shows that sharing data and code and collaborative work among research teams are essential to transform data-driven wind-farm models into reliable tools applicable in the real world.”
The researchers' results have been published in a number of scientific publications and on Zenodo.
Zehtabiyan-Rezaie, N., Abkar, M. (2024). An extended k-ε model for wake-flow simulation of wind farms. Renewable Energy, 222, 119904. https://doi.org/10.1016/j.renene.2023.119904
Amarloo, A., Zehtabiyan-Rezaie, N., Abkar, M. (2024). A progressive data-augmented RANS model for enhanced wind-farm simulations. Energy, 313, 133762. https://doi.org/10.1016/j.energy.2024.133762
Zehtabiyan-Rezaie, N., Iosifidis, A., Abkar, M. (2022). Data-driven fluid mechanics of wind farms: A review. Journal of Renewable and Sustainable Energy, 14(3), 032703. (Review article) https://doi.org/10.1063/5.0091980
Zehtabiyan-Rezaie, N., Iosifidis, A., Abkar, M. (2023). Physics-Guided Machine Learning for Wind-Farm Power Prediction: Toward Interpretability and Generalizability. PRX Energy, 2, 013009. Open Access. https://doi.org/10.1103/PRXEnergy.2.013009
Zehtabiyan-Rezaie, N., Iosifidis, A., Abkar, M. (2025). Supporting data – Physics-guided machine learning for wind-farm power prediction: Toward interpretability and generalizability. Dataset, version v1. Open data. https://doi.org/10.5281/zenodo.15593164