Gravitational waves discovery shows why software should be every scientist’s business
July 26, 2021
By Ian Evans
“Software goes hand in hand with scientific discovery.” — Kate Keahey, PhD, Editor-in-Chief, SoftwareX special issue on gravitational waves discovery
Caption: Dr Kate Keahey is Editor-in-Chief of SoftwareX’s “Special issue on Software that Contributed to Gravitational Wave Discovery.”
Over the past half century, software has emerged as one of the most critical vehicles for scientific discovery. From text mining and running simulations to the rapid processing of vast amounts of data, it has become fundamental to the advancement of science. Software is a vital part of the research landscape, and most researchers will benefit from understanding its possibilities, limitations and the requirements for building it.
However, software is under-appreciated, as Dr Kate Keahey, Senior Computer Scientist at Argonne National Laboratory and the Consortium for Advanced Science and Engineering of the University of Chicago, explained:
A lot of the people who channel their originality and creative thoughts into the development of software can see their careers languish, rather than getting results. I want to change that.
Kate is also Editor-in-Chief of the Elsevier-published open access journal SoftwareX, which publishes research software. In May 2021, the journal published a special issue on software that contributed to the gravitational waves discovery.
Speaking from her home in Chicago, Kate described the detection of gravitational waves as “one of the most important scientific discoveries of modern times.” It resulted in a shared Nobel Prize for Rainer Weiss and Kip S Thorne, together with Barry C Barish, and opened up an entirely new way of studying the cosmos. For Kate, it’s a fantastic case study for the way software supports discovery:
To register gravitational waves, the research team needed to build interferometric gravitational wave detectors. That’s a whole new kind of scientific instrument that needs to be built and fine-tuned for incredible sensitivity. Just to build that dedicated instrument requires simulations that themselves demand an enormous amount of computational power.
That’s just to get going. Kate pointed out that the instruments then need dedicated software to operate. And the software requirements don’t end there, she said:
Once you’ve got to this point where you’re actually capable of recording things, the instruments will start spewing out a lot of data — vast amounts of data, the vast majority of which won’t be what you’re interested in. So you have to clean the data; you have to extract the elements that are of interest.
That in itself requires another type of computational process to establish which data are signal and which are noise:
With that software you can learn about the data. As you make more discoveries about the data, you can then improve the software tools and refine that further. So you see, software goes hand in hand with scientific discovery, and the tools continually improve the more you discover.
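The cleaning step Kate describes, separating candidate signals from detector noise, can be illustrated with a deliberately simple sketch. This is not LIGO’s actual pipeline (which uses matched filtering and far more sophisticated noise modeling); it is a toy example in which a hypothetical helper, `flag_candidate_segments`, flags windows of a time series whose energy stands out against a crude noise estimate:

```python
import numpy as np

def flag_candidate_segments(x, window=256, k=5.0):
    """Flag windows whose energy exceeds k times the median window
    energy (the median serves as a crude estimate of the noise floor)."""
    n = len(x) // window
    energies = np.array([np.sum(x[i * window:(i + 1) * window] ** 2)
                         for i in range(n)])
    threshold = k * np.median(energies)
    return np.where(energies > threshold)[0]

# Toy data: Gaussian noise with one injected sinusoidal burst.
rng = np.random.default_rng(0)
x = rng.normal(size=4096)
x[1024:1152] += 8.0 * np.sin(np.linspace(0, 40 * np.pi, 128))
print(flag_candidate_segments(x))  # flags the window containing the burst
```

Real detector noise is neither white nor stationary, which is exactly why, as Kate notes, the analysis software has to keep improving as more is learned about the data.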
Why should you publish your software?
Not only does the discovery of gravitational waves provide a great example of why software matters, it also demonstrates why a journal that publishes software matters — and why software should be something every researcher familiarizes themselves with. Kate continued:
For one thing, publishing your software in a journal like SoftwareX gives credit to the people who worked on the software. It can be cited, and in the world we live in, this is often the unit of scholarly recognition.
Furthermore, the journal is a vehicle for dissemination. Researchers may find that someone else has already produced a piece of software that solves the issue they’re trying to solve, but even when you can’t transpose a piece of software directly from the journal to your own research project, the publications in SoftwareX can show you how someone else approached a similar problem:
If someone wants to understand what role software plays in discovery, the gravitational waves special issue is an excellent place to go. More broadly, the journal shows you how different types of software projects have different dynamics, and deal with different types of challenges. It’s a good venue when you’re assessing what software instruments your project might need.
Image above, top: Omicron spectrogram of LIGO-Hanford detector’s data around the time of GW150914. The whitened data are projected in multiple time–frequency planes characterized by a constant Q value, and the signal-to-noise ratio is measured for each tile. In this representation, all Q planes are stacked up and combined into one; the tile with the highest signal-to-noise ratio is displayed on top. Bottom: Omicron spectrogram of LIGO-Livingston detector’s data around the time of GW170817, using data after glitch subtraction. (Source: Florent Robinet et al: "Omicron: A tool to characterize transient noise in gravitational-wave detectors," SoftwareX, July–December 2020)
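The multi-resolution scan the caption describes can be approximated in a few lines. The sketch below is not Omicron’s actual algorithm: it uses ordinary short-time spectrograms at several window lengths (`nperseg` standing in for the Q planes) and picks the loudest tile across all planes, where loudness is power normalized by that plane’s median, a rough proxy for a per-tile signal-to-noise ratio. The helper name `loudest_tile` and the injected 300 Hz burst are illustrative assumptions:

```python
import numpy as np
from scipy.signal import spectrogram

def loudest_tile(x, fs, nperseg_list=(64, 128, 256)):
    """Scan several time-frequency resolutions and return
    (nperseg, time, frequency) of the tile with the highest
    median-normalized power across all planes."""
    best_snr, best = -np.inf, None
    for nperseg in nperseg_list:
        f, t, Sxx = spectrogram(x, fs=fs, nperseg=nperseg)
        snr = Sxx / np.median(Sxx)  # crude per-plane normalization
        i, j = np.unravel_index(np.argmax(snr), snr.shape)
        if snr[i, j] > best_snr:
            best_snr, best = snr[i, j], (nperseg, t[j], f[i])
    return best

fs = 4096
rng = np.random.default_rng(1)
x = rng.normal(size=fs)  # 1 second of white noise
x[2000:2200] += 5.0 * np.sin(2 * np.pi * 300 * np.arange(200) / fs)
nperseg, t_peak, f_peak = loudest_tile(x, fs)
print(f"loudest tile: nperseg={nperseg}, t~{t_peak:.2f}s, f~{f_peak:.0f}Hz")
```

Tools like Omicron refine this basic idea with true constant-Q tiling and careful whitening, which is what makes them sensitive enough to characterize transient detector noise.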
As Kate describes, it’s valuable to understand how software evolves during a project so you can plan around that evolution. In that way, the software requirements around the discovery of gravitational waves carried important lessons:
One example that one of my co-editors brought up was that we all thought the creation of software for something like this would be a really controlled, directed process. You know, a central collaborative laboratory that creates a framework, establishes requirements and mandates who will contribute which elements to the framework.
What the work described in the special issue revealed, however, was that the process was much more fluid. When you’re building software on a massive research project involving entirely new tools and new kinds of data, you don’t know right from the start what your requirements are going to be. The special issue reveals a software development process that left room for improvisation, collaborative efforts, and grassroots contributions. Kate said:
It turns out that it was really a mix of a central effort that was very directed, combined with things that came from the wider community, that solved problems encountered along the way. There are some big lessons there for other large discovery projects — it’s not something you can scope out fully in advance because your instruments have to go where the process of discovery leads them, and the essence of discovery is that it is not something you can predict.
With that in mind, documenting and publishing the software process means that other researchers can learn about how to structure the organizing framework for software development, or the best practice for managing data provenance. As Kate explained:
That’s why it’s very much a multi-disciplinary journal, not just a computer science journal. People from different disciplines can understand how others approached similar problems. I think if you’re working in science, sooner or later you will need some level of software literacy, even if it’s just understanding what might be possible and the resources required for development.
A tool to collaborate across time
Indeed, software is both a tool that benefits from a collaborative approach, and a tool that facilitates a collaborative approach. There’s often talk about multidisciplinary collaboration and multinational collaboration, but the nature of continually improving software enables collaboration through time.
When William Borucki, the principal investigator behind NASA’s historic Kepler mission, spoke to Elsevier Connect, he emphasized the ways in which software improvements would yield new discoveries from the data:
There are probably 100 or more signals from small planets buried in the Kepler data that have not been found because not all the structures in the noise have been removed. One day, people who use more powerful analysis methods might be able to find them. I really like the idea that in the future, people will dig through the data and find those small planets.
Kate agreed, reiterating that when you understand and account for software requirements, you set up a pipeline for collaboration not just with what’s gone before but with the generations of researchers who will follow:
In a sense we’re preserving evidence for future generations. There may be something in the data that we didn’t notice immediately, or that we can’t notice because we don’t yet have the capability to process the data. But if you understand software, if you understand its needs and how it iterates, you can create datasets in a way that means future generations can build on your work.