# Case study: ORCESTRA

"When our server infrastructure was stuck in customs in Barbados, we got a Raspberry Pi with a 1 TB SD card running a Kubo IPFS node. Within hours, 50 scientists on site were sharing data laptop-to-laptop. That's when we realized IPFS could be at the core of our data infrastructure."

# Overview

In this case study, you'll learn how ORCESTRA (opens new window), a coordinated international atmospheric science campaign, uses IPFS to ensure that scientific datasets are verifiable, openly accessible, and collaboratively distributed across institutions worldwide.

# What is ORCESTRA

ORCESTRA (opens new window) (Organized Convection and EarthCARE Studies over the Tropical Atlantic) is an international field campaign that launched in early 2024 to study tropical mesoscale convective systems: the storm systems that play a significant role in the Earth's weather and climate dynamics.

The campaign brings together over twenty scientific institutions spanning Europe, North America, and Africa. Eight sub-campaigns (three airborne, one land-based, and four at sea) coordinate aircraft, ships, ground stations, and satellites to collect atmospheric measurements across the tropical Atlantic.

ORCESTRA represents the kind of large-scale scientific collaboration where data infrastructure can make or break a mission: dozens of research groups generating terabytes of observational data that must be shared, verified, and preserved across institutional boundaries.

# ORCESTRA by the numbers

20+
Research institutions
8
Sub-campaigns
50+
Participating organizations
8+
Countries involved

# The story

Scientific campaigns like ORCESTRA face a fundamental infrastructure challenge: how do you enable real-time data sharing among researchers spread across aircraft, ships, and ground stations, often in remote locations with limited connectivity, while ensuring that every dataset remains verifiable and tamper-proof?

Traditional approaches involve centralized servers and institutional data portals. But centralized infrastructure introduces single points of failure, and when teams are in the field, connectivity to distant data centers can't be taken for granted.

ORCESTRA's IPFS story began not with a grand architectural plan, but with a practical crisis. During one of the field campaigns in Barbados, the team's planned server infrastructure was delayed in customs. With roughly 50 scientists on site needing to share and collaborate on data, they needed a solution fast.

A Raspberry Pi with a 1 TB SD card became that solution. Running a Kubo (opens new window) IPFS node, the device enabled local data sharing from laptop to laptop, provided temporary storage on the Pi node, and facilitated eventual transfer to a data center in Hamburg. No central server required; just content-addressed, peer-to-peer data sharing that worked.

This improvised setup revealed something important: IPFS wasn't just a workaround, it was a better fit for how field science actually works. As the campaign evolved and datasets grew, the team expanded IPFS from an emergency fix to the core of their data publishing infrastructure.

# How ORCESTRA works

ORCESTRA's eight sub-campaigns span sea, air, and land, collecting atmospheric measurements such as temperature, humidity, wind, radiation, aerosols, and cloud properties. This observational data is structured as multidimensional arrays and stored primarily in the Zarr (opens new window) format, a cloud-native format optimized for chunked, distributed access to large scientific datasets.

# From collection to publication

When researchers collect data in the field, it follows a path from raw measurements to published, citable datasets:

  1. Collection: Instruments on aircraft, ships, and ground stations capture measurements
  2. Processing: Raw data is quality-controlled, calibrated, and structured into Zarr datasets
  3. Publishing: Processed datasets are added to IPFS, generating content identifiers (CIDs) that uniquely and verifiably identify each dataset
  4. Discovery: Datasets are catalogued in a metadata-rich browser, making them findable by the wider community
  5. Retrieval: Scientists worldwide access data through IPFS gateways or directly from peers

# How ORCESTRA uses IPFS

ORCESTRA uses IPFS to make scientific data openly accessible, verifiable, and resilient.

The raw data is processed by the at the Max Planck Institute for Meteorology, who process the data for publishing, where the end result is a set of CIDs corresponding to data from the different sub campaigns. allowing anyone who retrieves the data can independently verify they received exactly what was published, with no trust required in the specific server it was fetched from.

The architecture involves several coordinated components:

# IPFS nodes for collaborative hosting

A team at the Max Planck Institute for Meteorology processes the data from the different teams into Zarr and publishes them to IPFS with a fleet of [Kubo] nodes, ensuring some redundancy. The CID for the whole data set is published via pinlist.yaml on Github (opens new window) with the CID of the whole data set, giving full snapshot history of the growing data set from all sub campaigns.

# A metadata-rich data browser

The ORCESTRA data browser (opens new window) provides a web interface for discovering and retrieving datasets. Built on top of Climate and Forecast (CF) conventions (opens new window) metadata embedded in the Zarr datasets, the browser lets researchers search by variable, time range, sub-campaign, and other dimensions, then retrieve data directly via IPFS.

The browser leverages both Helia, the TypeScript implementation of IPFS and

# Pinset tracking on GitHub

The campaign maintains a "pinset" (a list of datasets and their CIDs) in a GitHub repository (opens new window). This serves a dual purpose: it provides CID discovery, so researchers can find which CIDs correspond to which datasets, and provenance tracking, since the Git history records when datasets were published and updated. Other institutions can use this pinset to replicate the entire collection on their own IPFS nodes.

# Python ecosystem integration

Through ipfsspec (opens new window), ORCESTRA data integrates seamlessly with the Python data science ecosystem. Scientists can open datasets directly with familiar tools:

import xarray as xr

ds = xr.open_dataset(
    "ipfs://bafybeif52irmuurpb27cujwpqhtbg5w6maw4d7zppg2lqgpew25gs5eczm",
    engine="zarr"
)

ipfsspec implements the fsspec (opens new window) interface, the same abstraction layer used by xarray, pandas, Dask for remote data access. This means IPFS retrieval works anywhere these tools expect a filesystem, with no special handling needed.

# IPFS benefits

ORCESTRA's adoption of IPFS demonstrates several properties that matter for open scientific data:

# Data integrity without trust

Every dataset on IPFS is identified by a CID derived from its content. When a researcher retrieves a dataset, they can verify it matches the published CID; there is no need to trust the server, the network, or any intermediary. For science built on reproducibility, this is foundational.

# Resilient, decentralized access

With multiple institutions hosting the same content-addressed data, there is no single point of failure. If one provider goes offline, others continue serving the same verified data. This resilience matters for long-lived scientific datasets that need to remain accessible beyond the lifetime of any single project or server.

# Collaborative distribution

IPFS enables a model where providing data is a collaborative effort. Any institution can pin ORCESTRA datasets on their own nodes, contributing bandwidth and availability without any coordination protocol beyond the shared pinset. The more institutions that participate, the more resilient and performant the system becomes.

# Open, auditable data sharing

Because every dataset, its CID, and its metadata are public, anyone can audit the data: verify its integrity, replicate it, or build upon it. This aligns with the principles of open science and the FAIR data principles (opens new window) (Findable, Accessible, Interoperable, Reusable) that increasingly govern publicly funded research.

# Field-ready infrastructure

The Barbados Raspberry Pi story illustrates a property of IPFS worth highlighting: it works at the edge. In field conditions with limited connectivity, IPFS enables local peer-to-peer sharing without dependence on remote infrastructure. Data collected locally can be shared immediately among nearby peers and synced to institutional servers when connectivity is available.

# ORCESTRA & IPFS: the future

As ORCESTRA's datasets continue to grow and are used by research groups worldwide, the IPFS-based infrastructure positions the project for long-term sustainability. Datasets published today remain verifiable and retrievable as long as any node in the network continues to provide them, whether that's an ORCESTRA server, a university research group, or an individual scientist's node.

The campaign's approach also serves as a reference for other scientific communities. By demonstrating that content-addressed, peer-to-peer data sharing works at the scale of an international field campaign, ORCESTRA shows a practical path forward for scientific data infrastructure: one that prioritizes verifiability, openness, and collaboration over centralized control.

Note: Details in this case study are current as of early 2026. The ORCESTRA campaign and its data infrastructure continue to evolve.