The Data Lake Model developed in ESCAPE's WP2 federates different distributed storage systems via a high-level Data Management layer: Rucio. The storage endpoints that constitute the global Data Lake system are the Rucio Storage Elements (RSEs), each of which maps to a storage endpoint offered by a resource provider.
Rucio provides file upload and download by mapping the client-server interaction onto the RSEs, which offer different transfer protocols (HTTP/WebDAV, root, GridFTP, S3).
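The mapping from a client request to a protocol-specific endpoint can be pictured with a minimal sketch. It is a toy model, not Rucio's API: the `StorageElement` class, the `resolve_url` helper, the RSE names, the hosts and the flat `scope/name` path layout are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StorageElement:
    """Toy stand-in for an RSE: a name plus the base URL of each protocol it offers."""
    name: str
    endpoints: dict = field(default_factory=dict)  # scheme -> base URL


def resolve_url(rse, scope, name, preferred=("davs", "root", "gsiftp", "s3")):
    """Return a URL for the first scheme in `preferred` that the RSE supports."""
    for scheme in preferred:
        base = rse.endpoints.get(scheme)
        if base is not None:
            # Assumed flat layout: <base>/<scope>/<name>
            return f"{base.rstrip('/')}/{scope}/{name}"
    raise LookupError(f"{rse.name} offers none of {preferred}")


# Hypothetical RSEs and hosts, for illustration only.
onprem = StorageElement("SITE-A", {
    "root": "root://storage.example.org:1094/escape",
    "davs": "davs://storage.example.org:443/escape",
})
cloud = StorageElement("CLOUD-B", {"s3": "s3://bucket.example.com/escape"})

print(resolve_url(onprem, "test_scope", "file.dat"))
print(resolve_url(cloud, "test_scope", "file.dat"))
```

In this sketch the client only states a preference order; an endpoint that speaks only S3, such as a commercial cloud bucket, is reached through the same call as an on-premises endpoint.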
Rucio is a policy-driven, rule-based Data Management system that provides data lifecycle capabilities for users and experiments: file replication levels, placement rules and the implementation of policies. The goal of this study is to evaluate the integration of heterogeneous resources within the ESCAPE Data Lake and to assess how flexibly the Data Lake model and the system can accommodate occasional, on-demand resource integration using standard interfaces. In particular, integration with commercial cloud resources from AWS and Google has been evaluated, including both storage and CPU resources via the Swift/S3 protocols.
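The idea of a rule-based placement policy can be illustrated with a small sketch. It is a simplified model under stated assumptions, not Rucio's rule engine: the `Rule` class, the `missing_replicas` helper, the single-tag matching and the RSE names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Toy replication rule: keep `copies` replicas on RSEs carrying attribute `tag`."""
    copies: int
    tag: str


def missing_replicas(rule, rse_tags, current):
    """Return the RSEs that should receive a new replica to satisfy the rule.

    rse_tags: dict mapping RSE name -> set of attribute tags
    current:  set of RSE names that already hold a replica
    """
    # Sort eligible RSEs by name so the selection is deterministic.
    eligible = sorted(name for name, tags in rse_tags.items() if rule.tag in tags)
    held = [name for name in eligible if name in current]
    needed = rule.copies - len(held)
    if needed <= 0:
        return []  # rule already satisfied
    candidates = [name for name in eligible if name not in current]
    if len(candidates) < needed:
        raise ValueError("not enough eligible RSEs to satisfy the rule")
    return candidates[:needed]


# Hypothetical Data Lake with two on-premises and two commercial cloud RSEs.
rse_tags = {
    "SITE-A": {"onprem"},
    "SITE-B": {"onprem"},
    "CLOUD-AWS": {"cloud"},
    "CLOUD-GCP": {"cloud"},
}
print(missing_replicas(Rule(copies=2, tag="cloud"), rse_tags, current={"SITE-A"}))
```

Because the rule refers to an attribute rather than to a named endpoint, adding a cloud resource in this sketch only requires registering a new entry with the matching tag; the rule itself is unchanged.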
Part of this study has been performed with the support of the CloudBank project, with funding for AWS and Google Cloud Platform (GCP).