April 2018, San Francisco
Data science and the search for truth
In an ideal world, data specialists would agree on a single version of the truth, but that is easier said than done
Getting data into an organised, clean, coherent and usable state is tricky, but it is something that data professionals are constantly striving for.
Speaking on a keynote panel at EyeforTravel’s recent Amsterdam show, Andrei Grintchenko, head of business intelligence at IATA, an association which supports global aviation standards, said: “Fragmented data, from multiple sources or the same data from multiple sources is your worst nightmare.”
According to Grintchenko, having one single version of truth (SVOT) is of paramount importance; without it, any data-based discussion makes no sense. While agreeing on what this SVOT should be is difficult, companies moving towards being data- and analytics-driven need, at the very least, to have that conversation.
Finding your place on the continuum
In what proved to be an interactive panel session, Mark Shilton, Skyscanner’s principal data scientist, who was in the audience, said agreeing on a single version of truth is, of course, the ideal but he’d never seen that happen in any organisation. Firms are always somewhere on “a continuum between finding a single version of truth and total chaos”.
Every company is different, however, so it is important to assess one’s own place in that journey. Grintchenko said it was important to have the right organisational structure, and people equipped with the right tools and mindset. Without this, he said, data-driven decisions could be silly, irrational, and even dangerous.
Without the right organisational structure, data-driven decisions could be silly, irrational, and even dangerous.
For some businesses, finding that single version of truth is even more of a challenge. Yann Raoul, CEO of Gopili, a metasearch business for ground transportation, said in his corner of the industry there were many different data standards and formats. “I looked at ten different players and there were 12 different ones,” he said.
Gopili has nevertheless created a metasearch product, which aggregates data from these different carriers for its audience of travellers in seven countries across Europe. However, as the volumes of structured and unstructured data have grown, it has become harder to work with. According to Raoul, a lot of things today fall outside the scope of what would have been considered the “classical data industry”.
Raoul said one of the five main challenges he’d identified was managing huge volumes of data. To put into perspective just how much is around: in his opening address, Leo Langford, EyeforTravel conference director, pointed out that today only 0.5% of available data is used.
Reliability is another challenge for Gopili, which aggregates data from multiple external sources. “In our business we drive users from metasearch to partner websites for the transaction. We have our own internal tracking system but also work with external and client tracking systems. When we do the report on the number of clicks and transactions there is sometimes an error margin of 20% [between the different data sources]. In our industry we must have a maximum 5% margin of error – anything higher kills the business.”
In our industry we must have a maximum 5% margin of error – anything higher kills the business
To address this, Gopili has built its own internal system, which creates alerts if the data error looks too high. When the reliability of data is in question, it is stored somewhere. “All decisions we take from aggregating metadata is only based on reliable data,” Raoul said.
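Raoul does not describe how Gopili’s alerting system is built. Purely as an illustrative sketch, assuming a simple relative-gap check between an internal count and a partner-reported count (the function names and the 5% default threshold are invented for this example, the threshold echoing the figure Raoul cites), such a comparison might look like:

```python
def discrepancy(internal: int, external: int) -> float:
    """Relative gap between two counts, as a fraction of the internal figure."""
    if internal == 0:
        return float("inf") if external else 0.0
    return abs(internal - external) / internal

def check_sources(internal: int, external: int, threshold: float = 0.05):
    """Compare two tracking sources; returns (ok, margin).

    ok is False when the gap exceeds the threshold, which is when
    a system like the one described would raise an alert and set
    the questionable data aside rather than feed it into reports.
    """
    margin = discrepancy(internal, external)
    return margin <= threshold, margin
```

For example, `check_sources(1000, 1180)` reports a 18% gap and flags the pair, while `check_sources(1000, 1040)` passes at 4%.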
Other challenges facing data professionals include new EU and local regulations (GDPR and ePrivacy will be two of the main keywords for 2018), choosing the right tools in what is still an immature market and building a data-driven internal culture.
IATA is a bit luckier with its data, which, according to Grintchenko, “is a little cleaner at the entry point because of a relatively high degree of standardisation”.
However, IATA also needs to put some effort into normalising and aggregating data for consumption, which involves putting in place procedures to compare it with other data sources and market information to ensure a minimal margin of error.