A framework for the integration of heterogeneous distributed computing infrastructures
- Joaquín Ezpeleta Mateo (Director)
- Pedro Álvarez Pérez-Aradros (Director)
Defense university: Universidad de Zaragoza
Date of defense: 29 June 2016
- Manuel Lama Penín (Chair)
- Javier Fabra Caro (Secretary)
- Carlos Pedrinaci Godoy (Committee member)
Type: Thesis
Abstract
The computational requirements and complexity of current applications grow continuously as technology improves. Both academic and business organizations have provided themselves with computing clusters, grids and clouds in order to satisfy these requirements. Furthermore, public clouds have appeared, offering the possibility of executing applications on a pay-per-use basis. Although some large-scale applications have been executed in these environments, users still face many problems when running applications on them: applications are strongly coupled to specific computing infrastructures and therefore cannot be reused across them, migrating applications between infrastructures is hard, and users experience long delays because of an insufficient number of available resources and peak loads.

In this dissertation the integration of several heterogeneous computing infrastructures is explored as a response to these problems. This approach makes it possible to offer a larger pool of resources to users and applications, helping them to solve their computational problems and addressing some of the aforementioned issues. However, it also imposes some challenges. On the one hand, applications must be decoupled from specific infrastructures so that they can be executed on different ones. On the other hand, proper techniques must be defined to manage application complexity and to take full advantage of the integration approach.

Thus, this thesis proposes a framework that provides a global solution to the problem of integrating heterogeneous distributed computing infrastructures and executing applications on them. The framework architecture is based on a Linda-based message bus used as the communication channel between applications, infrastructures and framework components. This architectural model favours the flexibility, adaptability and scalability of the framework, making it able to respond to the dynamic and evolving nature of this execution scenario. Additionally, implementing the message bus on top of the Amazon SQS service makes it highly reliable and scalable. As a result, the framework transparently integrates heterogeneous infrastructures (isolated resources, clusters, grids and clouds) and executes applications on them without coupling those applications to specific computing environments.

Furthermore, the framework includes a set of components that provide different functionalities. On the one hand, mediation components manage the heterogeneity of the integrated infrastructures and favour their integration; each mediator is responsible for interacting with a specific infrastructure, encapsulating its particularities. On the other hand, several management components enhance the application lifecycle: a data movement component moves input and output data between infrastructures; a hierarchical fault management strategy improves the job completion rate; a simulation-based meta-scheduling approach selects the most appropriate private infrastructure to execute each application depending on infrastructure load; and an advanced resource provisioning technique minimizes the cost of applications executed in public clouds. Finally, a self-configuration component autonomically adjusts the number of mediators in use, providing scalability guarantees to the framework.
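As a purely illustrative sketch (the class, method and parameter names, the queue URL and the region are assumptions, not taken from the thesis), the Linda-style coordination underlying the message bus could be emulated on top of Amazon SQS roughly as follows: `out` maps to `send_message`, a blocking `in` maps to long-polling `receive_message` plus `delete_message`, and, since SQS offers no content-based retrieval, template matching is performed on the client side.

```python
import json
import boto3


class SQSTupleSpace:
    """Minimal Linda-like tuple space emulated over a single Amazon SQS queue.
    Tuples are JSON-encoded dictionaries; template matching is client-side."""

    def __init__(self, queue_url, region="eu-west-1"):
        self.sqs = boto3.client("sqs", region_name=region)
        self.queue_url = queue_url

    def out(self, tuple_):
        # 'out': publish a tuple to the shared space (the SQS queue).
        self.sqs.send_message(QueueUrl=self.queue_url,
                              MessageBody=json.dumps(tuple_))

    def in_(self, template):
        # Blocking 'in': long-poll until a tuple matching the template
        # (equal values for every field in the template) appears, then
        # remove it from the space and return it.
        while True:
            resp = self.sqs.receive_message(QueueUrl=self.queue_url,
                                            MaxNumberOfMessages=1,
                                            WaitTimeSeconds=20)
            for msg in resp.get("Messages", []):
                tuple_ = json.loads(msg["Body"])
                if all(tuple_.get(k) == v for k, v in template.items()):
                    self.sqs.delete_message(QueueUrl=self.queue_url,
                                            ReceiptHandle=msg["ReceiptHandle"])
                    return tuple_
                # Not a match: make the message visible again for other consumers.
                self.sqs.change_message_visibility(
                    QueueUrl=self.queue_url,
                    ReceiptHandle=msg["ReceiptHandle"],
                    VisibilityTimeout=0)
```

Under such a model, a cluster mediator could consume job tuples with something like `bus.in_({"type": "job", "infrastructure": "cluster"})` and publish results back with `out`, so that applications never address a concrete infrastructure directly.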
The framework has supported the execution of various large-scale applications. First, it was applied to the semantic annotation of a large-scale repository of learning objects, solving in only 178 days a problem initially estimated to require more than 1,600 years of CPU time. Second, it was used to execute large-scale distributed process discovery techniques in data-intensive environments, managing the execution of such applications thanks to the framework's ability to run MapReduce applications on Apache Hadoop. The execution of these applications has therefore proven the effectiveness of the framework for solving large-scale computing-intensive and data-intensive applications.
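For the data-intensive case, a Hadoop mediator might wrap MapReduce job submission along these lines; this is a hypothetical sketch (the job fields and the `run_mapreduce_job` helper are illustrative, not part of the thesis), reusing the bus abstraction from the previous example and the standard `hadoop jar` command.

```python
import subprocess


def run_mapreduce_job(bus, job):
    """Illustrative Hadoop mediator step: launch the MapReduce jar described
    by a job tuple and report the outcome through the message bus."""
    cmd = ["hadoop", "jar", job["jar"], job["main_class"],
           job["input_path"], job["output_path"]]
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = "completed" if result.returncode == 0 else "failed"
    bus.out({"type": "result", "job_id": job["id"], "status": status})


# A mediator would run this in a loop, taking only jobs tagged for Hadoop:
#   job = bus.in_({"type": "job", "infrastructure": "hadoop"})
#   run_mapreduce_job(bus, job)
```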