Sattanathan Subramanian, Paweł Sztromwasser, Pål Puntervoll and Kjell Petersen
eScience workflows use orchestration for integrating and coordinating distributed and heterogeneous scientific resources, which are increasingly exposed as web services. The rate…
Abstract
Purpose
eScience workflows use orchestration for integrating and coordinating distributed and heterogeneous scientific resources, which are increasingly exposed as web services. The rate of growth of scientific data makes eScience workflows data‐intensive, challenging existing workflow solutions. Efficient methods of handling large data in scientific workflows based on web services are needed. The purpse of this paper is to address this issue.
Design/methodology/approach
In a previous paper the authors proposed Data‐Flow Delegation (DFD) as a means to optimize orchestrated workflow performance, focusing on SOAP web services. To improve the performance further, they propose pipelined data‐flow delegation (PDFD) for web service‐based eScience workflows in this paper, by leveraging from the domain of parallel programming. Briefly, PDFD allows partitioning of large datasets into independent subsets that can be communicated in a pipelined manner.
Findings
The results show that the PDFD improves the execution time of the workflow considerably and is capable of handling much larger data than the non‐pipelined approach.
Practical implications
Execution of a web service‐based workflow hampered by the size of data can be facilitated or improved by using services supporting Pipelined Data‐Flow Delegation.
Originality/value
Contributions of this work include the proposed concept of combining pipelining and Data‐Flow Delegation, an XML Schema supporting the PDFD communication between services, and the practical evaluation of the PDFD approach.