On 11 October 2025, rooftop solar generated more electricity than every coal-fired power station in Australia combined. For about three hours on a mild spring afternoon, the National Electricity Market was running on sunshine, wind, and hope that nothing went wrong. That last part, the hope, is the problem we need to solve.

Australia's grid is undergoing the fastest energy transition of any developed nation. Rooftop solar alone now exceeds 20 GW of installed capacity, spread across more than 3.5 million households. Add utility-scale solar farms and you've got a generation fleet that produces enormous volumes of clean energy but can't tell you with much certainty how much it'll produce tomorrow afternoon. Traditional weather-based forecasting gets you part of the way there. Machine learning gets you the rest.

Why Forecasting Accuracy Matters More Than You Think

Every megawatt-hour of solar generation that wasn't predicted creates a problem somewhere in the system. Overforecast, and the solar doesn't show up as planned: frequency deviations, emergency interventions, and in extreme cases, load shedding. Underforecast, and AEMO has dispatched gas peakers or held battery reserves that weren't needed, costing money and producing unnecessary emissions.

The financial stakes are significant. In the NEM, the price for a single 5-minute dispatch interval can land anywhere from the floor of -$1,000/MWh (yes, negative: generators paying to stay online) up to $17,500/MWh. If you're a retailer or large industrial consumer managing exposure to these prices, the difference between a good and a bad solar forecast is measured in millions of dollars per year.
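
To see why, a deliberately rough back-of-envelope helps. Every figure in it (portfolio size, error rate, imbalance cost) is an assumption chosen to make the arithmetic concrete, not anyone's real numbers:

```python
# Every figure below is an assumption for illustration, not a real portfolio.
portfolio_mw = 300                 # assumed solar portfolio under management
daylight_hours = 365 * 8           # rough hours per year with meaningful output
imbalance_cost = 60                # assumed average $/MWh cost of settling forecast errors

def annual_error_cost(mae_fraction: float) -> float:
    """Cost of forecast error at a given MAE, expressed as a fraction of capacity."""
    return portfolio_mw * mae_fraction * imbalance_cost * daylight_hours

# Improving day-ahead MAE from 8% to 6% of capacity is worth ~$1m/year under these assumptions.
saving = annual_error_cost(0.08) - annual_error_cost(0.06)
```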

For grid operators, forecasting error directly translates to the volume of reserves that must be held. AEMO currently procures Frequency Control Ancillary Services (FCAS) based partly on expected renewable variability. Better forecasts mean lower FCAS requirements, which means lower costs passed through to consumers. The Australian Energy Market Commission has estimated that improved renewable forecasting could save $200-400 million annually in system costs by 2030.

The ML Approaches That Work

Solar forecasting isn't a single problem. It's several problems wearing a trenchcoat. The approach that works best depends on the forecast horizon, the spatial resolution you need, and the data you have available.

Gradient Boosted Trees (XGBoost, LightGBM)

For day-ahead and intra-day forecasting at individual site level, gradient boosted trees remain remarkably hard to beat. They're fast to train, interpretable, handle missing data gracefully, and perform well with the tabular feature sets common in energy forecasting: numerical weather prediction outputs, historical generation, time-of-day encoding, calendar features, and site metadata.

In practice, a well-tuned LightGBM model with good features will outperform most deep learning approaches for horizons of 1-48 hours at individual sites. I've seen this pattern repeatedly. Teams spend months building LSTM architectures only to find that a gradient boosted model with better feature engineering delivers equivalent or superior accuracy at a fraction of the training cost.

The key is feature engineering. Raw NWP data (global horizontal irradiance, temperature, cloud cover, wind speed) is just the starting point. Derived features matter enormously: clear-sky index ratios, rolling cloud variability metrics, lagged generation values, inter-site correlation features, and seasonal decomposition components. A good feature set makes a simple model powerful.
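
Here's a sketch of what that looks like with LightGBM. The column names, lag choices, and hyperparameters are illustrative, and the clear-sky GHI column is assumed to come from a clear-sky model such as pvlib's:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hourly site data on a DatetimeIndex with columns ghi_nwp, temp_nwp,
    cloud_nwp, ghi_clearsky, gen_mw (names are placeholders)."""
    out = df.copy()
    # Clear-sky index: how much of the theoretical maximum the NWP expects to arrive.
    out["csi_nwp"] = out["ghi_nwp"] / out["ghi_clearsky"].clip(lower=1.0)
    # Lagged generation and rolling cloud variability.
    out["gen_lag_24h"] = out["gen_mw"].shift(24)
    out["cloud_std_3h"] = out["cloud_nwp"].rolling(3).std()
    # Cyclical calendar encodings so midnight and Dec-Jan wraparounds behave.
    hour, doy = out.index.hour, out.index.dayofyear
    out["hour_sin"], out["hour_cos"] = np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24)
    out["doy_sin"], out["doy_cos"] = np.sin(2 * np.pi * doy / 365), np.cos(2 * np.pi * doy / 365)
    return out.dropna()

FEATURES = ["csi_nwp", "ghi_nwp", "temp_nwp", "cloud_nwp", "cloud_std_3h",
            "gen_lag_24h", "hour_sin", "hour_cos", "doy_sin", "doy_cos"]

def train(train_df: pd.DataFrame) -> lgb.LGBMRegressor:
    model = lgb.LGBMRegressor(n_estimators=2000, learning_rate=0.03, num_leaves=63)
    model.fit(train_df[FEATURES], train_df["gen_mw"])
    return model
```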

LSTMs and Recurrent Architectures

Long Short-Term Memory networks earn their keep in two scenarios: very short-term forecasting (nowcasting, 5 minutes to 4 hours ahead) and portfolio-level forecasting where you're predicting aggregate output across many sites simultaneously.

For nowcasting, the advantage of LSTMs is their ability to learn temporal patterns in the generation signal itself. Cloud shadows, which cause the rapid ramps that make solar so challenging for grid operators, create characteristic patterns in the generation time series. An LSTM trained on high-frequency generation data (1-5 minute intervals) can learn to anticipate these ramp events better than NWP-based approaches, which lack the spatial and temporal resolution to capture individual cloud movements.

The architecture that works best in practice is an encoder-decoder LSTM with attention. The encoder processes a lookback window of recent generation and weather data, the decoder produces the forecast sequence, and the attention mechanism allows the model to weight different parts of the input history differently depending on current conditions. Sunny, stable conditions? The model attends to seasonal patterns. Rapidly changing cloud cover? It focuses on the most recent observations.
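
A minimal PyTorch sketch of that idea, using simple dot-product attention and an illustrative lookback and horizon; production versions add quantile heads, decoder-side weather inputs, and proper training loops:

```python
import torch
import torch.nn as nn

class Seq2SeqSolarNowcaster(nn.Module):
    """Encoder-decoder LSTM with dot-product attention over the lookback window.
    Input: (batch, lookback, n_features) of recent generation + weather observations.
    Output: (batch, horizon) point forecasts. Sizes are illustrative."""
    def __init__(self, n_features: int, hidden: int = 64, horizon: int = 48):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(1, hidden, batch_first=True)
        self.out = nn.Linear(hidden * 2, 1)            # decoder state + attention context

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        enc_out, (h, c) = self.encoder(x)              # enc_out: (B, T, H)
        dec_in = x[:, -1:, -1:]                        # seed with the latest generation value
                                                       # (assumed to be the last feature column)
        preds = []
        for _ in range(self.horizon):
            dec_out, (h, c) = self.decoder(dec_in, (h, c))                # (B, 1, H)
            # Attention: weight encoder states by similarity to the current decoder state.
            scores = torch.bmm(dec_out, enc_out.transpose(1, 2))          # (B, 1, T)
            context = torch.bmm(torch.softmax(scores, dim=-1), enc_out)   # (B, 1, H)
            step = self.out(torch.cat([dec_out, context], dim=-1))        # (B, 1, 1)
            preds.append(step)
            dec_in = step                              # feed the prediction back in
        return torch.cat(preds, dim=1).squeeze(-1)     # (B, horizon)
```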

Transformers

Transformer architectures are the newest entrant and the most exciting for certain problem types. Their ability to model long-range dependencies without the vanishing gradient problems of RNNs makes them natural candidates for capturing seasonal and weather-regime patterns that span days or weeks.

The Temporal Fusion Transformer (TFT) has shown particular promise for solar forecasting. It combines multi-horizon forecasting with interpretable attention mechanisms, quantile regression for uncertainty estimation, and the ability to incorporate both static metadata (site location, panel capacity, tilt angle) and time-varying inputs (weather forecasts, calendar features).
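
If you want to try it, the open-source pytorch-forecasting library ships a TFT implementation. A minimal setup sketch follows; the column names are placeholders and constructor arguments can differ between library versions:

```python
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer
from pytorch_forecasting.metrics import QuantileLoss

def build_tft(df: pd.DataFrame) -> TemporalFusionTransformer:
    """df: long-format frame, one row per site per hour. Column names are placeholders."""
    training = TimeSeriesDataSet(
        df,
        time_idx="time_idx",                      # integer step counter
        target="gen_mw",
        group_ids=["site_id"],
        max_encoder_length=7 * 24,                # a week of history in
        max_prediction_length=48,                 # two days out
        static_categoricals=["site_id"],
        static_reals=["capacity_kw", "tilt_deg"],                         # site metadata
        time_varying_known_reals=["ghi_nwp", "temp_nwp", "hour_sin", "hour_cos"],
        time_varying_unknown_reals=["gen_mw"],
    )
    return TemporalFusionTransformer.from_dataset(
        training,
        hidden_size=64,
        attention_head_size=4,
        dropout=0.1,
        loss=QuantileLoss(),                      # quantile forecasts rather than a single point
    )
```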

Where transformers really shine is in fleet-level forecasting for large portfolios. When you're predicting aggregate output across hundreds of distributed sites, the cross-attention mechanisms can learn spatial correlation patterns that simpler models miss. If clouds are moving southeast at 40 km/h and sites to the northwest are already seeing output drops, a transformer can propagate that information to improve forecasts for downstream sites.

The downside is cost. Transformers need more data, more compute, and more expertise to train well. For a single utility-scale site, they're overkill. For a retailer managing 50,000 rooftop systems, they're starting to justify the investment.

Australia's Unique Grid Challenges

Australia's grid presents forecasting challenges you won't find in European or North American literature, and that matters because most published solar forecasting research comes from those regions.

The duck curve is a canyon here. Australia's ratio of distributed solar to total demand is the highest in the world. On mild, sunny days, net demand (total demand minus rooftop solar) drops so dramatically during the middle of the day that it creates operational challenges no other grid has faced at this scale. Minimum operational demand in South Australia has gone negative. Forecasting the depth and timing of this trough is critical for managing the system.
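
The quantity to forecast here is net demand and its midday trough. A trivial sketch, assuming you already have regional operational demand and an estimate of distributed PV output as time series:

```python
import pandas as pd

def duck_curve_trough(demand_mw: pd.Series, rooftop_pv_mw: pd.Series) -> pd.DataFrame:
    """Depth and timing of the daily net-demand trough.
    Both inputs are timestamp-indexed series for one region."""
    net = demand_mw - rooftop_pv_mw                # net demand = demand minus behind-the-meter solar
    return pd.DataFrame({
        "trough_mw": net.resample("1D").min(),     # how deep the midday dip goes (can be negative)
        "trough_time": net.resample("1D").apply(lambda d: d.idxmin().time()),
    })
```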

Climate variability is extreme. Australian weather patterns are dominated by phenomena that create forecasting challenges: ENSO cycles affecting cloud patterns across entire seasons, the Indian Ocean Dipole shifting rainfall and cloud cover across southern and central Australia, and cut-off lows that produce rapid, difficult-to-predict weather changes. Models trained on benign European conditions don't transfer well.

The grid is long and thin. The NEM stretches from Far North Queensland to Tasmania, covering 5,000 km and multiple climate zones. Transmission constraints mean that surplus solar in one region can't always reach demand in another. Forecasting needs to be spatially disaggregated to be operationally useful.

Behind-the-meter is massive. More than half of Australia's solar capacity sits on rooftops behind the meter, invisible to AEMO's direct monitoring. Forecasting this distributed fleet requires fundamentally different approaches than forecasting a utility-scale farm with SCADA data. You're estimating aggregate behaviour across millions of small systems with varying orientations, ages, shading conditions, and degradation rates.
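
One common way to attack this (not the only one) is sample-based upscaling: take a monitored subset of systems, normalise by capacity, and scale to the region's total installed capacity. A simplified sketch that glosses over the representativeness corrections a real system needs:

```python
import pandas as pd

def upscale_behind_the_meter(sample_gen_kw: pd.DataFrame,
                             sample_capacity_kw: pd.Series,
                             region_capacity_mw: float) -> pd.Series:
    """Estimate regional behind-the-meter PV output from a monitored sample.
    sample_gen_kw: timestamps x system_id generation; sample_capacity_kw: capacity per system."""
    # Capacity-normalised output of each monitored system, averaged across the sample.
    normalised = sample_gen_kw.div(sample_capacity_kw, axis=1)
    fleet_capacity_factor = normalised.mean(axis=1)
    # Scale to the region's total installed capacity. This ignores orientation, shading and
    # degradation differences between the sample and the wider fleet; real systems correct for them.
    return fleet_capacity_factor * region_capacity_mw
```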

Virtual Power Plants and ML Orchestration

Virtual Power Plants represent the convergence of solar forecasting with real-time control. A VPP aggregates distributed energy resources (rooftop solar, home batteries, controllable loads) and dispatches them as a coordinated fleet in response to market signals and grid needs.

ML plays a dual role in VPP orchestration. First, it provides the generation and consumption forecasts that determine what the VPP can offer to the market. If your VPP aggregates 10,000 homes with solar and batteries, you need to predict how much solar each cluster will generate, how much each home will consume, and therefore how much stored energy is available for dispatch. The accuracy of these forecasts directly determines the revenue the VPP can capture and the penalties it avoids for failing to deliver on market commitments.
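
At its crudest, the offerable quantity is whatever the forecasts say will be left over. A toy per-interval calculation, with every fleet parameter assumed; a real VPP works per cluster and nets off forecast uncertainty before committing anything to market:

```python
def dispatchable_mw(solar_forecast_mw: float,
                    load_forecast_mw: float,
                    stored_mwh: float,
                    battery_power_mw: float,
                    interval_hours: float = 0.5) -> float:
    """Rough upper bound on what an aggregated fleet could export in one interval."""
    surplus = max(solar_forecast_mw - load_forecast_mw, 0.0)           # spare solar after home load
    from_storage = min(battery_power_mw, stored_mwh / interval_hours)  # energy- and power-limited
    return surplus + from_storage
```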

Second, ML optimises the dispatch decisions themselves. Battery charge/discharge scheduling, demand response activation, and export curtailment all require real-time optimisation that balances multiple objectives: maximising market revenue, maintaining battery health, respecting customer comfort preferences, and meeting grid reliability obligations. Reinforcement learning approaches are showing promise here, learning dispatch policies that adapt to changing market conditions and fleet characteristics.
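
Reinforcement learning sits at one end of the spectrum. At the other, even a deterministic linear program over a forecast price curve captures the core revenue-versus-state-of-charge trade-off, and makes a useful baseline to compare learned policies against. A sketch using cvxpy, with all battery parameters assumed:

```python
import cvxpy as cp
import numpy as np

def schedule_battery(prices: np.ndarray, capacity_mwh: float = 10.0,
                     power_mw: float = 5.0, eff: float = 0.9, dt: float = 0.5):
    """Price-taking charge/discharge schedule against a forecast price curve.
    Deterministic and single-asset: a baseline, not the RL policy discussed above."""
    n = len(prices)
    charge = cp.Variable(n, nonneg=True)       # MW drawn from the grid
    discharge = cp.Variable(n, nonneg=True)    # MW exported to the grid
    soc = cp.cumsum(charge * eff - discharge / eff) * dt     # state of charge (MWh)
    constraints = [charge <= power_mw, discharge <= power_mw,
                   soc >= 0, soc <= capacity_mwh]
    revenue = cp.sum(cp.multiply(prices, discharge - charge)) * dt
    cp.Problem(cp.Maximize(revenue), constraints).solve()
    # Real schedulers add degradation costs, uncertainty, and a constraint preventing
    # simultaneous charge and discharge; this is only the bare arbitrage core.
    return charge.value, discharge.value
```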

The Australian VPP market is growing rapidly. SA Power Networks' VPP trial, Tesla's South Australian VPP, AGL's virtual power plant, and numerous smaller aggregators are all deploying ML-driven forecasting and optimisation. The AEMC's rule changes enabling aggregated bids from small resources are accelerating this trend.

Building a Forecasting Stack

For organisations looking to build or improve solar forecasting capabilities, here's what a production-grade stack looks like:

  • Data ingestion. NWP data from the Bureau of Meteorology (ACCESS model), satellite imagery (Himawari-9), SCADA or smart meter data for generation actuals, and third-party weather forecast APIs for ensemble diversity.
  • Feature engineering pipeline. Automated feature computation, clear-sky modelling (pvlib is excellent for this; see the sketch after this list), spatial interpolation for distributed fleet estimation, and data quality checks that catch sensor failures before they corrupt your training data.
  • Model training and selection. Multiple model types trained in parallel with automated hyperparameter optimisation. Ensemble the top performers. Retrain on a rolling basis as new data arrives and seasonal patterns shift.
  • Probabilistic outputs. Point forecasts aren't enough. Grid operators and traders need uncertainty quantification. Quantile regression, conformal prediction, or ensemble spread methods should produce prediction intervals alongside central estimates.
  • Monitoring and feedback. Continuous tracking of forecast accuracy across horizons, seasons, and weather regimes. Automated alerts when model performance degrades. Regular comparison against persistence and climatological baselines to ensure your models are adding genuine value.

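As a taste of the clear-sky modelling step, here's a minimal pvlib sketch (the coordinates are roughly Adelaide, purely for illustration):

```python
import pandas as pd
from pvlib.location import Location

# Clear-sky reference for a site; coordinates and altitude are illustrative.
site = Location(latitude=-34.93, longitude=138.60, tz="Australia/Adelaide", altitude=50)
times = pd.date_range("2025-01-01", "2025-01-02", freq="5min", tz=site.tz)
clearsky = site.get_clearsky(times, model="ineichen")   # DataFrame with ghi, dni, dhi columns

def clear_sky_index(measured_ghi: pd.Series) -> pd.Series:
    """Ratio of measured to clear-sky GHI: ~1 on clear days, well below 1 under cloud.
    Also a cheap data-quality check -- sustained values far above 1 usually mean a bad sensor."""
    return (measured_ghi / clearsky["ghi"].clip(lower=1.0)).clip(upper=1.5)
```
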
The common mistake is over-investing in model architecture and under-investing in data quality and feature engineering. A mediocre model with excellent data will outperform a brilliant model with noisy inputs every time. Start with the data.

What's Coming

Foundation models for weather and energy are emerging. Google's GraphCast and Huawei's Pangu-Weather have demonstrated that ML can match or exceed traditional NWP models for medium-range weather forecasting at a fraction of the computational cost. Purpose-built foundation models for energy forecasting, pre-trained on global generation data and fine-tuned for specific markets, are likely within two years.

The implication for Australian energy companies is straightforward: ML-driven forecasting isn't a competitive advantage for much longer. It's becoming table stakes. The organisations that invest now in data infrastructure, modelling capability, and operational integration will be the ones best positioned when the market expects it from everyone.

Want to explore ML forecasting for your energy portfolio?

We help Australian energy companies build production-grade forecasting systems that improve grid integration, optimise trading positions, and enable VPP orchestration.

Book a Discovery Call