Big bang? When ‘big data’ gets too big

Has the incredible sophistication of big data in the airline business gone too far? In some cases, yes, says EyeforTravel guest columnist Tom Bacon. He argues that applying it in too granular a way can actually lead to worse performance.

Airlines helped invent ‘big data’. After all, the revenue management department compiles bookings by fare, by market, by day and by days-before-departure, and then uses this gargantuan database to develop regression-based algorithms for forecasting future bookings. If that isn’t enough to make you sit up, consider this: a 50-aircraft airline will typically generate 7.5 million forecasts based on 50+ million datapoints each night! Yes, we forecast the number of passengers who will pay $129 for a 7am September 4th flight from Chicago to Los Angeles between 30 and 35 days before departure! Similarly, we forecast the total revenue for the last Detroit-New York LaGuardia flight of the day on the Friday before Easter!
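To make that scale concrete, the sketch below models one cell of such a forecast grid using the dimensions named above, then divides the nightly datapoints by the number of forecasts. The `ForecastKey` type and its field names are hypothetical, not any RM vendor's schema; the two totals are the figures quoted above, and because the datapoint count is "50+ million", the resulting ratio is a lower bound.

```python
from typing import NamedTuple, Tuple


class ForecastKey(NamedTuple):
    """One cell of a hypothetical price-point forecast grid (illustrative only)."""
    market: str                  # origin-destination market
    departure: str               # flight date and local departure time
    fare: int                    # price point in US dollars
    dbd_window: Tuple[int, int]  # days-before-departure window (earliest, latest)


# The example cell from the text: $129, 7am September 4th, Chicago to Los Angeles,
# booked between 30 and 35 days before departure.
example_cell = ForecastKey("Chicago-Los Angeles", "Sep 4 07:00", 129, (35, 30))

FORECASTS_PER_NIGHT = 7_500_000    # figure quoted for a 50-aircraft airline
DATAPOINTS_PER_NIGHT = 50_000_000  # quoted as "50+ million", so a lower bound

# Average number of datapoints behind each individual forecast
points_per_forecast = DATAPOINTS_PER_NIGHT / FORECASTS_PER_NIGHT
print(f"{points_per_forecast:.1f} datapoints per forecast")  # 6.7 datapoints per forecast
```

On average, then, each of those millions of forecasts rests on only about seven observations.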

Certainly this is impressive, and many airlines (and other industries) have since taken these ‘big data’ analytics to even ‘bigger’ levels. In my view, however, this sophistication has in some cases gone too far. Statisticians know that forecasts built from highly granular inputs don’t always produce more accurate aggregate forecasts. Yet many airlines seem to believe that the more granular the forecast, the more precise it will be, and the better the resulting performance.

Once upon a time I worked for a national economic forecasting firm that developed quarterly forecasts for various macroeconomic metrics, including gross domestic product, inflation and personal income. The forecasts came from an integrated model of the economy built on statistically derived drivers. We then used these macro forecasts to forecast performance in specific industries for our corporate clients. Total US demand for goods was not built up from a separate micro forecast for each individual good, which would arguably have been a far more granular approach. Instead, we developed accurate aggregate forecasts, based on macro activity and an integrated model of the economy, and applied them in turn to generate forecasts for micro-level activities such as demand for TV advertising and regional telephone services.

Picking it apart

For many airlines, however, the demand forecast for a particular flight is simply the sum of the forecasts for each price point. This approach introduces considerable statistical error that, more often than not, makes the result less accurate. Rather than delivering greater precision, the more granular approach often drives large aggregate forecast errors.

- Flights and fares are often substitutes for each other. Forecasting $129 and $159 fares separately and then summing them, rather than forecasting demand at a more aggregate level, ignores this substitution effect. In fact, demand for $159 fares could be increasing BECAUSE availability of $129 fares is declining. By forecasting the combined demand for fares that substitute for each other, the model can be much more robust.

- Flights and fares also tend to move together (statistically, their demand is positively correlated). As overall demand rises with a stronger market or economy, demand at each fare level tends to rise too. Again, the best way to capture this market-wide effect is to model the correlated fares together rather than forecast each one separately.

- At the most granular level there is considerable noise, or volatility. Fitting a simple regression model to historical demand for some of the higher fares, for example, is often futile.

  - Forecasting the 1.0 passenger expected to pay $500 on a certain flight, at a certain time, on a certain future date carries a great deal of inherent error. An ‘average’ demand of 1.0 passenger could reflect a single group of five corporate passengers who travelled together once in the base period, while in most periods there was no such demand at all.

  - More robust patterns can emerge from more aggregated data. What is the next likely number in this series: 0, 4, 1, 0, 5, 2? Forecasting demand at price points that carry only one to three passengers is false precision. A clearer pattern is likely to emerge from larger numbers that sum several price points, for example: 10, 12, 15, 12, 18, 20.
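The two series above, and the five-passenger group, can be checked with Python's standard `statistics` module. The sketch uses the coefficient of variation (standard deviation divided by the mean) as a simple, assumed measure of relative noise; it is not a forecasting method, only a way to compare how volatile each series is relative to its level. The five-period base for the group example is also an assumption, chosen because it is what makes one booking of five passengers average out to 1.0.

```python
import statistics


def coefficient_of_variation(series):
    """Population standard deviation divided by the mean: relative volatility."""
    return statistics.pstdev(series) / statistics.mean(series)


# Series quoted in the text
single_price_point = [0, 4, 1, 0, 5, 2]         # one sparse price point
summed_price_points = [10, 12, 15, 12, 18, 20]  # several price points added together

print(round(coefficient_of_variation(single_price_point), 2))   # 0.96
print(round(coefficient_of_variation(summed_price_points), 2))  # 0.24

# Assumed five-period base: five corporate passengers booked together once.
# The "average" is 1.0 passenger, but the typical (median) period saw no demand.
group_once = [5, 0, 0, 0, 0]
print(f"mean={statistics.mean(group_once):.1f}, median={statistics.median(group_once)}")
# mean=1.0, median=0
```

Relative to its level, the single price point is roughly four times as volatile as the aggregated series, which is why a trend is so much easier to see in the second one.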

Of course, to optimise fare availability, revenue management must forecast both aggregate demand for a flight AND demand at the various price points. As in my economic forecasting example, however, the better approach is to forecast at a more aggregate level (say, demand of 150 passengers) and then use that forecast, in turn, to produce the micro-level forecasts the RM system needs (for example, 3% of aggregate demand is likely to be for $299 fares).
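A minimal sketch of that top-down step: forecast the flight total once, then split it across price points by share. Only the 150-passenger total and the 3% share for $299 fares come from the example above; the other fares and shares, and the `allocate` helper itself, are hypothetical. In practice the shares would themselves be estimated from (aggregated) booking history.

```python
import math


def allocate(aggregate_forecast, fare_shares):
    """Split a flight-level demand forecast across price points by share."""
    total_share = sum(fare_shares.values())
    if not math.isclose(total_share, 1.0):
        raise ValueError(f"fare shares must sum to 1, got {total_share}")
    return {fare: aggregate_forecast * share for fare, share in fare_shares.items()}


# Hypothetical share of flight demand at each price point (USD -> share);
# only the 3% share for $299 fares is taken from the text.
fare_shares = {129: 0.55, 159: 0.30, 199: 0.12, 299: 0.03}

by_fare = allocate(150, fare_shares)
for fare, passengers in sorted(by_fare.items()):
    print(f"${fare}: {passengers:.1f} passengers")
```

Because every price-point figure is a share of one aggregate forecast, the flight total is estimated once, from the least noisy data, rather than accumulating the errors of many sparse cells.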

Airline case study

I recently worked with an airline that found it was forecasting flight demand at too micro a level: at each price point for each O&D (origin-destination market). A 150-passenger aggregate forecast for a flight could be the sum of 200 separate micro forecasts. In aggregate, the flight-level forecast was consistently overstated, at times by as much as 40%! The RM system responded to this inflated demand by closing fare availability on most flights. The highly granular forecasting approach left the airline with an essentially unusable system, and the revenue management team prudently overrode the system-generated forecasts much of the time.
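The case study does not say why the summed forecasts ran high. One mechanism that can produce an upward bias, offered here purely as an illustrative assumption, is that demand forecasts cannot be negative: if each sparse price-point forecast is clipped at zero before summing, the noise-driven negative forecasts are removed while the noise-driven positive ones are kept. The sketch fits a simple linear-trend regression (ordinary least squares) to 200 hypothetical sparse cells. Because least squares is linear in the data, the unclipped micro forecasts would sum exactly to the aggregate forecast, so any gap comes from the clipping alone.

```python
import random


def trend_forecast(history, steps_ahead=1):
    """Fit y = a + b*t by ordinary least squares and extrapolate steps_ahead periods."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx
    return y_mean + slope * (n - 1 + steps_ahead - t_mean)


def compare(n_cells=200, n_periods=8, seed=1):
    """Return (sum of clipped micro forecasts, single aggregate forecast)."""
    rng = random.Random(seed)
    # Hypothetical sparse demand: each price-point cell books 0, 1 or 2 passengers
    # per period, with zero the most common outcome.
    cells = [[rng.choice([0, 0, 0, 1, 2]) for _ in range(n_periods)]
             for _ in range(n_cells)]

    # Bottom-up: forecast every cell, clip negatives to zero, then add them up.
    micro_total = sum(max(trend_forecast(cell), 0.0) for cell in cells)

    # Top-down: add the histories first, then forecast the flight total once.
    flight_history = [sum(period) for period in zip(*cells)]
    aggregate = trend_forecast(flight_history)
    return micro_total, aggregate


micro_total, aggregate = compare()
print(f"bottom-up {micro_total:.1f} vs aggregate {aggregate:.1f}")
```

The clipped bottom-up figure can never fall below the aggregate one, and with many sparse cells it is typically higher. This only illustrates the direction of one possible effect; it is not a model of the airline in the case study.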

The RM vendor acknowledged the problem and helped develop a forecasting approach at a more aggregate level, and the new approach dramatically improved forecast accuracy. We clustered demand by fare rules, recognising that fares sharing the same rules substitute for each other more readily, while the least restricted tickets (fully changeable and fully refundable), for example, may represent more distinct demand.
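The clustering step might look something like the sketch below: fare-level booking histories are summed within each fare-rule group before any forecasting is done. The rule labels, prices, booking counts and the `cluster_history` helper are all hypothetical; the point is that shifts between substitutable fares (here, demand moving from $129 to $159 as availability of the cheaper fare declines) cancel out within a cluster, leaving a steadier series to forecast.

```python
# Hypothetical mapping of price point (USD) to fare-rule group
FARE_RULES = {
    129: "restricted", 159: "restricted", 199: "restricted",
    499: "unrestricted", 599: "unrestricted",  # fully changeable and refundable
}


def cluster_history(history_by_fare, fare_rules):
    """Sum per-fare booking histories into one history per fare-rule cluster."""
    clustered = {}
    for fare, history in history_by_fare.items():
        rule = fare_rules[fare]
        running = clustered.setdefault(rule, [0] * len(history))
        clustered[rule] = [a + b for a, b in zip(running, history)]
    return clustered


# Hypothetical bookings per period for each fare
history = {
    129: [9, 7, 4, 1],   # cheapest fare: availability shrinking
    159: [3, 5, 8, 10],  # picks up the displaced $129 demand
    199: [1, 1, 2, 3],
    499: [2, 0, 1, 1],
    599: [0, 1, 0, 1],
}

clustered = cluster_history(history, FARE_RULES)
print(clustered)  # {'restricted': [13, 13, 14, 14], 'unrestricted': [2, 1, 1, 2]}
```

Individually, the $129 and $159 histories trend sharply in opposite directions; summed within their rule group, they form an almost flat series that a forecasting model can handle far more reliably.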

Of course, ‘big data’ has proven tremendously useful in airline revenue management. Other industries have followed airlines into this arena and applied big data successfully in whole new ways. However, airlines also show that big data on its own is not the answer to every business problem. Applied in too granular a way, it can lead to worse performance. Airlines, and other users of big data, certainly need to be careful not to apply it inappropriately.

This guest article is by Tom Bacon, a former executive at five different airlines and an industry consultant in revenue optimisation. His views are his own and do not reflect those of

