
Connect: Harnessing the “science of where” to unlock the value of geoscience data
August 5, 2020
By Marilynn Larkin
Elsevier is using Geofacets to integrate BHP data with Elsevier data, enabling a global natural resources leader to rapidly respond to new opportunities

BHP’s current data setup requires users to query and manage multiple data repositories. Elsevier is working with BHP to create a geoscience portal that will make their proprietary data discoverable alongside other relevant research and data.
Geoscientists at BHP, a multinational mining, metals and petroleum company based in Melbourne, Australia, were spending too much time searching for, georeferencing and formatting the data they needed to support time-sensitive decisions and innovative product development. The lack of organization and central data management even led the company to repurchase data it already had.
Now, as part of a multi-phase project, Elsevier and BHP are creating a geoscience portal powered by Elsevier’s Geofacets. The portal will make the company’s proprietary data discoverable alongside relevant research sourced from leading geoscience publishers and organizations.
“Effective, efficient management of data and information is critical for success and for managing risk in every industry,” said Geofacets Senior Technical Product Manager John Skero, “and natural resource exploration and production is no exception.”
Yet, as he noted in a recent webinar, geoscience information is exploding, with data volumes exceeding 10 terabytes daily. Even as the data pours in, it can be buried in multiple repositories siloed in different parts of an organization, and it can be difficult to retrieve because it appears in diverse formats – graphs, well logs, photos, seismic profiles and stratigraphic columns.

The result: missed insights, lost opportunities to respond to time-sensitive offers, unnecessary expenditures and wasted manpower, as geoscientists working in exploration spend about 60 percent to 80 percent of their time searching for information they may already have, and another 20 percent georeferencing and formatting that information.
Research shows that two-thirds of natural resource exploration professionals say analytics is the most important capability for transforming their company. For companies like BHP, creating a portal that gives rapid access to current and archived geochemical, geophysical, spatial and land management information can be a good solution. According to Skero, this approach brings competitive advantages by facilitating more rapid discovery of investment opportunities and faster, more informed decisions.
Maximizing insights from multiple datasets and formats
“I don’t know what I know” is a problem many of Elsevier’s customers face across industries, and they are constantly looking for tools and solutions to help them in their digital transformation journey, said Gilad Hoshen, Senior Director of Industry Solutions at Elsevier. For BHP, having access to legacy data – information the company already has on board but may not be aware of – has been a core need.

Discussing the challenges of managing data at BHP, Giovanna Gamboa, BHP’s Superintendent of Data Management, said the company’s legacy data spans more than 100 years of global exploration. Founded in 1885, BHP merged with Billiton in 2001 and has since acquired many other companies, each with its own data sources. “That legacy data is largely unstructured,” she said, “but we know it has high value. We need to unlock its value to help us find new mines around the world.”

The “global legacy project,” as Gamboa calls it, is at the heart of Elsevier’s collaboration with BHP. It involves recovering 100 terabytes of information spread across about 80 million files and integrating that information with the company’s current production database, ultimately making this information rapidly searchable alongside public information in a geoscience portal (slide 19).
It’s the kind of project Elsevier’s Skero enjoys digging into:
“I have a passion for working with enormous amounts of data that are like pieces to a puzzle – in this case, commercial data, proprietary data, government data and published data.”

The end goal: BHP will be able to search for all relevant data via the geoscience portal with Geofacets.
The Geofacets platform enables the team to put the pieces together. Skero used the analogy of a popular entertainment-streaming device:
“Think of a Roku stick that you plug into the back of your television. On that stick, there’s Netflix, Apple TV, Hulu, Showtime – all these different channels, each with its own programs. The portal is like that Roku stick. It allows you to do a single search on a movie and it points you to the channel where you can find it. Everything is aggregated so you can find whatever you need in one place, through a single search.”
But before the portal can become a reality, three things need to be done with the actual data. First, the metadata – i.e., basic information such as file size, author, date created, date modified, as well as specialized information such as geolocation and fingerprinting – needs to be extracted and structured. Then, the data must be enriched with different taxonomies, and finally it must be normalized and delivered.
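To make the first step more concrete, here is a minimal Python sketch of basic metadata extraction. It is an illustration only, not the Geofacets pipeline: the names FileMetadata, extract_metadata and find_duplicates are hypothetical, and “fingerprinting” is assumed here to mean a content hash, which is one simple way to notice that the same file exists in two places. Author and geolocation are left out because they depend on the file format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
import hashlib


@dataclass
class FileMetadata:
    """Hypothetical record for the basic metadata of one file."""
    path: str
    size_bytes: int
    modified_utc: datetime
    sha256: str  # assumption: "fingerprint" is modeled as a content hash


def extract_metadata(path: Path) -> FileMetadata:
    """Collect file size, modification time and a content hash for one file."""
    stat = path.stat()
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return FileMetadata(
        path=str(path),
        size_bytes=stat.st_size,
        modified_utc=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        sha256=digest,
    )


def find_duplicates(records):
    """Group paths by fingerprint; keep only fingerprints seen more than once."""
    groups = {}
    for rec in records:
        groups.setdefault(rec.sha256, []).append(rec.path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Running extract_metadata over every file in a repository and passing the results to find_duplicates would flag identical files stored under different names, the kind of duplication that can lead a company to buy data it already owns.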
Normalization and delivery is arguably the most difficult part of the process, Skero said. “With BHP, we’re working with hundreds of different data formats. We need to get them into searchable file types (e.g., RTF, CSV, GeoTiff). Then we need to have a single way to interrogate the data. For instance, you may have multiple different scaled measurements of porosity or temperature or total organic carbon that need to be standardized in order to interrogate for analytics correctly.”
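The standardization Skero describes can be pictured with a small Python sketch. It assumes, purely for illustration, that porosity should end up as a fraction and temperature in degrees Celsius; the CONVERTERS table and the normalize function are hypothetical names, not part of any Elsevier product.

```python
# Conversion functions keyed by (quantity, source unit). Each one returns the
# value in the assumed standard unit: porosity as a fraction, temperature in
# degrees Celsius.
CONVERTERS = {
    ("porosity", "fraction"): lambda v: v,
    ("porosity", "percent"): lambda v: v / 100.0,
    ("temperature", "celsius"): lambda v: v,
    ("temperature", "fahrenheit"): lambda v: (v - 32.0) * 5.0 / 9.0,
    ("temperature", "kelvin"): lambda v: v - 273.15,
}


def normalize(quantity: str, value: float, unit: str) -> float:
    """Convert a measurement to the standard unit for its quantity."""
    key = (quantity.lower(), unit.lower())
    if key not in CONVERTERS:
        # Fail loudly rather than guess at an unknown unit.
        raise ValueError(f"No conversion defined for {quantity!r} in {unit!r}")
    return CONVERTERS[key](value)
```

With a table like this, a porosity reported as 12.5 percent in one dataset and 0.125 in another resolves to the same number, so the two can be compared in a single query. Raising an error on unknown units is a deliberate choice in this sketch: a silently mis-scaled value would be worse than a missing one.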
This is where the analytics and artificial intelligence capabilities built into Geofacets and similar Elsevier platforms come in. An example is natural language processing (NLP), a subset of AI, which makes geographical information easier to uncover from within journal articles and other documents. If a document contains the word “turkey,” for instance, Geofacets can quickly analyze the context. Is it near the word “Thanksgiving” or “bird”? Or the word “Istanbul”?
“That can tell us if the content is about a country, a place or a thing,” Skero said.
And that’s the kind of information that needs to be processed across every single document in the database.
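The “turkey” example can be mimicked with a toy keyword-window heuristic in Python. This is a deliberately simple stand-in, not the NLP used in Geofacets: the cue-word lists and the classify_mention function are invented for illustration, and a production system would rely on trained models rather than hand-picked words.

```python
import re

# Hypothetical cue words chosen for this example only.
PLACE_CUES = {"istanbul", "ankara", "anatolia", "basin", "province"}
THING_CUES = {"thanksgiving", "bird", "roast", "dinner", "poultry"}


def classify_mention(text: str, term: str = "turkey", window: int = 5) -> str:
    """Label mentions of `term` as 'place', 'thing' or 'unknown'.

    Counts cue words within `window` tokens on either side of each mention
    and returns whichever category has more matches.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    place_score = thing_score = 0
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        nearby = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
        place_score += sum(t in PLACE_CUES for t in nearby)
        thing_score += sum(t in THING_CUES for t in nearby)
    if place_score > thing_score:
        return "place"
    if thing_score > place_score:
        return "thing"
    return "unknown"
```

A sentence such as “Seismic profiles from the basin north of Istanbul, Turkey” scores as a place, while “We roasted a turkey for Thanksgiving dinner” scores as a thing. A mention with no cue words nearby stays “unknown” rather than being forced into a category.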
Phased approach to data integration
Given the enormity of the project, Elsevier and BHP agreed to take a phased approach to analyzing and processing subsets of files and documents over about two years. Skero said this approach is different from what he’s seen other vendors do:
“A lot of companies come to the client with a one-size-fits-all mentality and try to make the project fit their format. That may work in some cases, but we prefer to give a customized experience.”
Phase one, which took about five months and involved 20 legacy projects and 10,000 files, was positioned as a proof-of-concept that would enable the BHP and Elsevier teams to work together, come up with best practices, and look at the initial results with a view toward building and strengthening the findings in future phases. Like many pilot data initiatives, it involved a mix of manual and automated data processing.
Now, in phase two, Elsevier and BHP are in the midst of a six-month effort involving 300 legacy data projects and about 1 million files. This phase includes productionized scaling enhancements and significantly more automation, according to Skero.
Phase three, scheduled for 2021, will include 2,000 legacy projects made up of 6 million files. Going forward, some 6,000 projects consisting of more than 15 million files will be brought into the system, published and integrated with BHP’s production database. At that point, BHP’s Gamboa said, the system should be ready to support multiple user groups, including the company’s data scientists, data managers, upper management and geoscientists.
What users said
After phase one, Elsevier surveyed BHP users on how satisfied they were with the data transformation project thus far. Ninety percent of respondents said they were “extremely satisfied” and “extremely likely” to recommend to others, according to Elsevier’s Hoshen. The team received high marks for communication, timeliness of deliverables, documentation of requirements, project management and experience/technical skill.
“Overseeing this project from the start, I’ve been experiencing the tremendous strength of our partnership approach,” Hoshen said. “Putting the customer at the center of all phases of data management, and focusing on cross-teams collaboration – with product, technology, operations, data science and life science groups – is enabling us to create a dynamic portal that will support BHP’s global leadership pipeline development for years to come.”
Geofacets in a nutshell
Simply put, “Geofacets is the science of where with the power of what,” says Geofacets Senior Technical Product Manager John Skero. “We’re taking data and showing you where it’s at, but also how to find it quickly – which is a powerful picture and helps address the ‘I don’t know what I don’t know.’” (slide 27)
As the largest digital, spatially aware database of scientific publications in the world, it provides access to:
Data from journals enhanced with geographic information and smart metadata to facilitate searching.
More than 2.2 million maps, tables, graphs, photos, stratigraphic columns, wells, cores, seismic profiles and x-sections.
Direct integration in ArcGIS with GeoTiffs, Shapefiles, and WMS/WFS feature services.
Geofacets enables customers to:
Cut costs by reducing third-party data purchases, consultant fees, and pre-production financial risks.
Facilitate a faster turnaround for time-sensitive decisions such as farm-ins, lease sales and pre-data room analysis.
Safely integrate proprietary data with public data for comprehensive search results.
Contributor

Marilynn Larkin
Writer and Editor for medical, scientific and consumer audiences
The Lancet