“Hello this is John Doe. Interested in some data?” This is how a reporter at the German newspaper Süddeutsche Zeitung was contacted in February 2015 via an encrypted chat service. This source would eventually leak 2.6 terabytes of information detailing how a Panamanian law firm helped clients to set up anonymous offshore companies and bank accounts.
This data was finally revealed to the world in April 2016 as the “Panama Papers”, with Mossack Fonseca named as the company implicated.
What happened in between these two events is an almost cloak-and-dagger tale of the enormous effort by hundreds of investigative journalists, all made possible by the extensive use of technology.
According to the German newspaper, “the source wanted neither financial compensation nor anything else in return, apart from a few security measures.” No personal meetings ever occurred and communication was always encrypted. He did indicate to the newspaper that his life was in danger.
It must be stated that, although it is not necessarily illegal to have offshore bank accounts, many wealthy individuals and criminal organizations hide money in these accounts to avoid paying taxes in their own country. The purpose of this post is not to implicate any individuals, companies or firms that assisted in or benefitted from the Mossack Fonseca scheme. It is purely a glimpse into the technology behind this leak – at least the bit that was publicly revealed.
Size of the leak
Over the next couple of months, the source would systematically release pieces of data to the German reporter. According to Süddeutsche Zeitung, the files shared broke down as follows:
- Emails (4 804 618)
- Database formats (3 047 306)
- PDF documents (2 154 264)
- Image files (1 117 026)
- Text documents (320 166)
- Other (2 242)
To put it into context: the Ashley Madison hack of 2015 was reported to be around 30 GB of data; the Sony Pictures leak of 2014 a massive 230 GB. At 2.6 TB, the Panama Papers leak is more than 10 times the size of the Sony leak!
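The comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above and assumes decimal units (1 TB = 1000 GB); the variable names are purely illustrative.

```python
# File counts per category, as listed by Süddeutsche Zeitung above.
file_counts = {
    "Emails": 4_804_618,
    "Database formats": 3_047_306,
    "PDF documents": 2_154_264,
    "Image files": 1_117_026,
    "Text documents": 320_166,
    "Other": 2_242,
}

# Reported leak sizes in gigabytes (assumption: 1 TB = 1000 GB).
leak_sizes_gb = {
    "Panama Papers": 2600,
    "Sony Pictures": 230,
    "Ashley Madison": 30,
}

total_files = sum(file_counts.values())
ratio_vs_sony = leak_sizes_gb["Panama Papers"] / leak_sizes_gb["Sony Pictures"]
ratio_vs_ashley_madison = leak_sizes_gb["Panama Papers"] / leak_sizes_gb["Ashley Madison"]

print(f"Total files: {total_files:,}")                       # Total files: 11,445,622
print(f"vs. Sony Pictures: {ratio_vs_sony:.1f}x")            # vs. Sony Pictures: 11.3x
print(f"vs. Ashley Madison: {ratio_vs_ashley_madison:.1f}x") # vs. Ashley Madison: 86.7x
```

In other words, the leak held roughly 11.4 million files and was about 11 times the size of the Sony Pictures leak and nearly 87 times the size of the Ashley Madison one.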
How the leak began
According to sources, the leak started as a fairly “normal” hack of Mossack Fonseca’s (MF) email servers. Also typical of these hacks was that MF was neither open about it nor quick to respond. As seen in so many of these cases, this is partly down to a desire to limit damage to the company’s public image, partly to directors or boards who lack the necessary technical knowledge, and partly to insufficient technical staff to deal with the breach.
In addition, Forbes reported that the main portal MF’s clients used to access their sensitive information ran on a three-year-old version of Drupal, 7.23, which had known vulnerabilities at the time that could be exploited by hackers.
So, while MF was ‘dealing’ with the situation, the anonymous source was siphoning off huge amounts of sensitive customer information and sending it to the German newspaper.
Please note that I am not promoting or glamorizing hacking at all. What I can say is that, even though hacking is illegal, there is a public tolerance towards it when a ‘bad’ company is hacked. Take the hacks of Ashley Madison vs. Sony Pictures – the former is perceived as a valiant act, like Robin Hood stealing from the rich to give to the poor.
Finding needles in a giant haystack
The German newspaper appointed a five-person team that worked tirelessly for two months to verify whether the data was genuine.
It very soon became apparent that one of the major aims in the vast majority of cases was to conceal the true identities of the owners.
Trying to connect the dots in this web of complex secret transactions almost became an addiction for the team. “We often messaged each other at crazy times, like 2 a.m. or 4 a.m., about the newest findings,” one of the reporters said.
But the sheer amount of data proved too much for this small investigative team. Imagine trying to find a cash receipt for a purchase made during a holiday 15 years ago and then cross-referencing it with an email from a travel agency to confirm that you personally booked that holiday. Maybe not so difficult if you scanned and saved the receipt with a properly named filename and kept archives of your email conversations. Now try finding that information buried inside 2.6 TB of data without knowing any names up front, or even that such a relationship exists in the first place…
In the end, the newspaper could not sift through the emails and account ledgers (covering nearly 30 years) on its own. It had to seek help and found it in the form of the International Consortium of Investigative Journalists.
Help from Down Under
The International Consortium of Investigative Journalists (ICIJ) is a non-profit organization based in Washington D.C. that has coordinated several previous projects investigating financial data leaks.
Apart from Süddeutsche Zeitung, the ICIJ invited many other influential newspapers and news agencies from across the world to form a coalition with the common goal of investigating the Panama Papers. These included Le Monde (France), The Guardian and the BBC (Britain) and La Nación (Argentina). Eventually 400 journalists joined forces in this effort.
From the start it was critical that the ICIJ investigation remained secret. But data still had to be shared between hundreds of journalists across the globe. To achieve this, many software systems and packages had to be utilized – some open source, others proprietary.
Journalists had to ensure that all files and their replicas were spread across different encrypted hard drives, using VeraCrypt software to lock up the information.
Süddeutsche Zeitung decided to use the software of an Australian-based company called Nuix, which had assisted the ICIJ in leak investigations before.
Nuix specializes in turning huge amounts of unstructured data into an indexed and searchable database. Its origins date back to the year 2000, when a group of computer scientists at a Sydney university were exploring ways to process large amounts of data at high speed.
The newspaper’s journalists started by uploading the millions of documents to high-performance computers that were never connected to the internet, both to prevent the story from “breaking too early” and to protect the data from those seeking to destroy it.
Once the documents were uploaded, they used optical character recognition (OCR) software to transform scanned images, such as ID documents or signed contracts, into human-readable and searchable files. They could then start analyzing the data by applying the search algorithms provided by Nuix’s software. This allowed journalists to formulate questions that would in turn kick off a backend database search for matching data – much as web-based search engines work. The ability to index and analyze all types of data was the real key to the success of the project.
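To make the idea of indexing concrete, here is a minimal sketch of an inverted index, the basic data structure behind most text search engines. It is not Nuix’s actual implementation, which is proprietary; the document IDs and texts are hypothetical stand-ins for text extracted from emails and OCR’d scans.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical documents standing in for OCR'd scans and emails.
documents = {
    "email_001": "Please register the new offshore company in Panama",
    "scan_017": "Certificate of incorporation for the offshore company",
    "email_042": "Invoice for travel expenses",
}

index = build_index(documents)
print(sorted(search(index, "offshore company")))  # ['email_001', 'scan_017']
print(sorted(search(index, "Panama offshore")))   # ['email_001']
```

Because the index is built once up front, each query only has to intersect a few small sets instead of rereading every document, which is what makes searching millions of files practical.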
Nuix actually stands for New Universal Intelligence Exchange engine, the name given to the software by the Australian computer scientists who developed it. The driving force behind Nuix was Jim McInerney, who formed the company but passed away in 2004. Jim and his team originally started by processing email files on a large scale, but they soon developed techniques to reverse engineer all major file formats, including some complex and proprietary ones like TIFF images.
Just before Jim’s death, Nuix won a contract with the Australian Department of Defence. His family tried to run the company after his death, but eventually had to bring in a professional management team in 2006, led by new CEO Eddie Sheehy, who had worked with companies like Cisco before.
Some unexpected interest in the company resulted from the financial crisis of 2008. A lot of money was lost in global financial markets due to the property bubble crash and people demanded answers. Software was supposed to help figure out what went wrong and who was to blame. Nuix became one of those solutions.
Nuix is certainly not the only firm that provides data processing solutions – even the ICIJ used other software, as we will see later. Nuix made its name by being very fast at processing huge amounts of data.
Understanding files “at the level of ones and zeroes is what allows Nuix to achieve reliability and speed at scale,” Eddie explains on their website.
The ICIJ’s technical army
The ICIJ is by no means just a group of ‘old-school’ investigative journalists. They have all the means and expertise required to operate in today’s digital and data-driven world.
Tools and software that were used in previous leak investigations were re-used or enhanced during this investigation. A lot of these tools were open-source.
Their search tool was based on Apache Solr, combined with Apache Tika, a toolkit that extracts text and metadata from different file types such as PDF so that the content can be indexed.
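As a rough illustration of what a query against a Solr-based search tool looks like, the sketch below builds a URL for Solr’s standard `/select` handler using the common `q`, `fq`, `rows` and `wt` parameters. The host, the core name `leak_documents` and the field name `file_type` are all hypothetical; the ICIJ’s actual setup has not been published in this detail.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, core, text, filters=None, rows=10):
    """Build a URL for Solr's standard /select search handler."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    for field, value in (filters or {}).items():
        # Each fq (filter query) narrows the results without affecting ranking.
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


url = build_solr_query(
    "http://localhost:8983",       # 8983 is Solr's default port; host is hypothetical
    "leak_documents",              # hypothetical core name
    "offshore company",
    filters={"file_type": "pdf"},  # hypothetical field name
)
print(url)
```

Sending a GET request to such a URL would return the matching documents as JSON, assuming a Solr instance with that core and schema is running.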
They utilized a database search program called Blacklight that allowed the teams to hunt for specific names, countries or sources. On top of that, there was also a real-time translation service for documents that were created in other languages. (Journalists primarily used English as the communication language.)
Each news organization took its own precautions, restricting access to the secure computers that were used to connect to the ICIJ’s servers and ensuring that these were not accessible through its newsroom’s regular network.
The news broke
When the findings of the Panama Papers were released to the world on 3 April 2016, they immediately caught the public’s attention – mainly because some well-known and powerful individuals were implicated.
But although the story ensured some sensational newspaper headlines for many weeks, it really was the hard work and collaborative effort of hundreds of individuals working behind the scenes that made it possible. All with the help of technology.