Projects
Current Projects
Enabling the Big Data Pipeline Lifecycle on the Computing Continuum (DataCloud) (H2020, 2021-2024)
With the recent developments in technologies such as the Internet of Things, massive amounts of data are being generated and often become Dark Data, i.e., data that are collected but not used and turned into value. Big Data pipelines are composite pipelines for processing data with non-trivial properties and characteristics, commonly referred to as the Vs of Big Data (e.g., volume, velocity, variety, veracity, value, etc.). They are essential for leveraging Dark Data, but tapping their potential requires going beyond the current approaches and frameworks for Big Data processing. At the same time, the Computing Continuum – federating Cloud services with emerging Edge and Fog computing paradigms – enables new opportunities for supporting Big Data pipelines, although challenges remain in the efficient management of heterogeneous and untrusted resources across the Computing Continuum. The overall vision of the DataCloud project is the creation of a novel paradigm for Big Data pipeline processing over heterogeneous resources encompassing the Computing Continuum, covering the complete lifecycle of managing Big Data pipelines.
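As a purely illustrative sketch (not DataCloud's actual pipeline language or tooling), the snippet below shows how such a pipeline might be described as plain data, with each step assigned a placement on the Computing Continuum; all step names, images, and resource fields are hypothetical.

# Illustrative only: a hypothetical Big Data pipeline description in which
# each step is placed on a different layer of the Computing Continuum.
pipeline = {
    "name": "sensor-analytics",
    "steps": [
        {"name": "ingest",    "image": "ingest:1.0",    "placement": "edge",
         "requires": {"cpu": 1, "memory_gb": 0.5}},
        {"name": "aggregate", "image": "aggregate:1.0", "placement": "fog",
         "requires": {"cpu": 2, "memory_gb": 4}},
        {"name": "train",     "image": "train:1.0",     "placement": "cloud",
         "requires": {"cpu": 8, "memory_gb": 32, "gpu": 1}},
    ],
}

def placement_plan(p):
    # Map each step to its target layer: the kind of decision that lifecycle
    # tooling would revisit when deploying and adapting the pipeline.
    return {step["name"]: step["placement"] for step in p["steps"]}

print(placement_plan(pipeline))  # {'ingest': 'edge', 'aggregate': 'fog', 'train': 'cloud'}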
Real-time Analytics for Internet of Sports (RAIS) (H2020 Marie Curie ITN, 2019-2023)
Over the past few years, we have been witnessing an increasing presence and usage of wearable sensing and quantified-self devices. The rise of embedded and wearable computing is expected to bring the next revolution of the Internet of Sports, enhancing fitness, performance, health, productivity and safety, as well as creating new jobs and opening new markets. Nevertheless, at a European level, there is a recognized shortage of highly skilled researchers, scientists and engineers with transferable skills and entrepreneurial experience, trained in building IT platforms and service infrastructures capable of hosting innovative collective sensing services and applications. The RAIS consortium, comprising six beneficiaries and seven fully committed partner organizations, aspires to establish the core for a fertile multidisciplinary research and innovation community with a strong entrepreneurial culture that will advance: (i) wearable sport-sensing and quantified-self devices and accompanying middleware, and (ii) technologies for Big Data mining and analytics that are needed to capture a broad range of users’ sports- and wellness-related information. The main objective of RAIS is to provide world-class training for a next generation of researchers, computer scientists, and data engineers, emphasizing a strong combination of advanced understanding in both theoretical and experimental approaches, methodologies and tools required to develop decentralized, scalable, and secure collective sensing infrastructures and platforms. RAIS will focus on developing new technologies for Big Data Analytics on the Edge, Data Stream Processing, Distributed and Decentralized Machine Learning, and Blockchain, as well as Security/Privacy.
Continuous Deep Analytics (CDA) (SSF, 2018-2022)
Modern end-to-end data pipelines are highly complex and unoptimized. They combine code from different frontends (e.g., SQL, Beam, Keras), declared in different programming languages (e.g., Python, Scala), and execute across many backend runtimes (e.g., Spark, Flink, TensorFlow). Data and intermediate results take a long and slow path through excessive materialization and conversions, down to different, partially supported hardware accelerators. End-to-end guarantees are typically hard to reason about due to the mismatch of processing semantics across runtimes. The Continuous Deep Analytics (CDA) project aims to shape the next generation of software for scalable, data-driven applications and pipelines. Our work binds state-of-the-art mechanisms in compiler and database technology together with hardware-accelerated machine learning and distributed stream processing.
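To make the fragmentation concrete, here is a minimal, standard-library-only caricature (not CDA code): each hypothetical stage pretends to live in a different runtime, so intermediate results are materialized to disk and converted between formats instead of being fused into a single optimized program.

# Illustrative only: a fragmented pipeline with per-stage materialization.
import csv, json, os, tempfile

workdir = tempfile.mkdtemp()

def sql_like_stage(rows):
    # "Frontend 1": filter rows, then materialize as CSV for the next runtime.
    path = os.path.join(workdir, "filtered.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(r for r in rows if r[1] > 0)
    return path

def feature_stage(csv_path):
    # "Frontend 2": re-parse the CSV and convert to JSON for yet another runtime.
    out = os.path.join(workdir, "features.json")
    with open(csv_path) as f:
        feats = [{"id": r[0], "x": float(r[1]) ** 2} for r in csv.reader(f)]
    with open(out, "w") as f:
        json.dump(feats, f)
    return out

def ml_stage(json_path):
    # "Backend": reload everything once more just to compute a result.
    with open(json_path) as f:
        feats = json.load(f)
    return sum(d["x"] for d in feats) / len(feats)

print(ml_stage(feature_stage(sql_like_stage([("a", 1.0), ("b", -2.0), ("c", 3.0)]))))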
ExtremeEarth (H2020, 2019-2021)
ExtremeEarth concentrates on developing techniques and software that enable the extraction of information and knowledge from big Copernicus data using deep learning and extreme geospatial analytics, and on developing two use cases based on this information and knowledge together with other relevant non-EO data sets. ExtremeEarth will impact developments in the Integrated Ground Segment of Copernicus and the Sentinel Collaborative Ground Segment. ExtremeEarth tools and techniques can be used for extracting information and knowledge from big Copernicus data and making it available as linked data, thereby allowing the easy development of applications by developers with minimal or no knowledge of EO techniques, file formats, data access protocols, etc.
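As a hedged illustration of the linked-data idea (not the project's actual endpoint or vocabulary), the sketch below uses the SPARQLWrapper library against a hypothetical SPARQL endpoint to retrieve EO-derived facts without touching EO file formats or data access protocols.

# Illustrative only: querying hypothetical EO-derived linked data via SPARQL.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = "http://example.org/extremeearth/sparql"  # hypothetical endpoint
sparql = SPARQLWrapper(endpoint)
sparql.setQuery("""
    PREFIX eo: <http://example.org/eo#>   # hypothetical vocabulary
    SELECT ?area ?iceConcentration
    WHERE {
        ?obs eo:coversArea ?area ;
             eo:iceConcentration ?iceConcentration .
        FILTER(?iceConcentration > 0.8)
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["area"]["value"], row["iceConcentration"]["value"])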
Past Projects
Erasmus Mundus Joint Doctorate in Distributed Computing (EMJD-DC) (EU/EACEA, 2011-2020)
EMJD-DC is an international doctoral programme in Distributed Computing. Students carry out their research work over up to four years in two universities from different countries, with additional mobility to industry in most projects. Joint training schools cover both scientific topics and transferable skills, such as project and scientific management, communication, and innovation techniques. EMJD-DC initially awards double degrees; however, a task is evaluating the implementation of a Joint Degree. The research projects address some of the key technological challenges of our time, mainly but not exclusively: ubiquitous data-intensive applications, scalable distributed systems (including Cloud computing and P2P models), adaptive distributed systems (autonomic computing, green computing, decentralized and voluntary computing), and applied distributed systems (distributed algorithms and systems, developed in an interdisciplinary manner in existing and emerging fields to address industrial and societal needs in the European and worldwide context). The consortium partners assembled in EMJD-DC have a high international reputation in the research fields described above. They complement each other very well in their fields of research specialisation and in the corresponding training offers. The first language of all training and research activities is English, but students are exposed to local languages.
A Big Data Analytics Framework for a Smart Society (BIDAF) (KK-stiftelsen (KKS), 2014-2019)
The overall aim of the BIDAF project is to significantly further research within massive data analysis, by means of machine learning, in response to the increasing demand for retrieving value from data in all of society. This will be done by creating a strong distributed research environment for big data analytics. Challenges on several levels must be addressed: (i) platforms to store and process the data, (ii) machine learning algorithms to analyze the data, and (iii) high-level tools to access the results.
StreamLine (H2020, 2016-2018)
Streamline is funded by the European Union’s Horizon 2020 research and innovation programme to enhance the European data platform Apache Flink to handle both stream data and batch data in a unified way. The project includes both research and use cases to validate the results. The project has the following objectives: (i) to research, design, and develop a massively scalable, robust, and efficient processing platform for data at rest and data in motion in a single system, (ii) to develop a highly accurate, massively scalable, data stream-oriented machine learning library based on new algorithms and approximate data structures, (iii) to provide a unified interactive programming environment that is user-friendly, i.e., easy to deploy in the cloud, and to validate its success as measured by well-defined KPIs, (iv) to implement a real-time contextualization engine, enabling analytical and predictive models to take real-world context into account, and (v) to develop a multi-faceted, effective dissemination of Streamline results to the research, academic, and international community, especially targeting SMEs, developers, data analysts, and the open source community.
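As a rough illustration of objective (i) only (a sketch using Apache Flink's present-day unified Table API via PyFlink, not software produced by Streamline itself), the snippet below runs the same relational program over a bounded datagen source once in batch execution mode ("data at rest") and once in streaming execution mode ("data in motion").

# Illustrative only: one program, executed in batch mode and in streaming mode.
from pyflink.table import EnvironmentSettings, TableEnvironment

def word_lengths(settings):
    t_env = TableEnvironment.create(settings)
    # A small bounded source generated by Flink's built-in datagen connector.
    t_env.execute_sql("""
        CREATE TABLE words (
            word STRING
        ) WITH (
            'connector' = 'datagen',
            'rows-per-second' = '5',
            'number-of-rows' = '20',
            'fields.word.length' = '8'
        )
    """)
    # The same relational query runs unchanged regardless of execution mode.
    t_env.execute_sql("SELECT word, CHAR_LENGTH(word) AS len FROM words").print()

word_lengths(EnvironmentSettings.in_batch_mode())      # data at rest
word_lengths(EnvironmentSettings.in_streaming_mode())  # data in motion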
iSocial (FP7 Marie Curie ITN, 2013-2017)
The rapid proliferation of Online Social Networking (OSN) sites is expected to reshape the Internet’s structure, design, and utility. We believe that OSNs create a potentially transformational change in consumer behavior and will have a far-reaching impact on traditional industries of content, media, and communications. The iSocial ITN aspires to bring a transformational change in OSN provision, pushing the state of the art from centralized services towards totally decentralized systems that will pervade our environment and seamlessly integrate with future Internet and media services. OSN decentralization can address privacy considerations and improve service scalability, performance and fault tolerance in the presence of an expanding base of users and applications. The project will pursue the vision of a decentralized Ubiquitous Social Networking Layer and the development of a novel distributed computing substrate that provides Decentralized Online Social Networking (DOSN) services and supports the seamless development and deployment of new social applications and services, in the absence of central management and control. The iSocial consortium envisions the emergence of distributed and scalable overlay networking and distributed storage infrastructures that will provide support for open social networks and for innovative social network applications, preserving end-user privacy and information ownership. The main objective of iSocial is to provide world-class training for a next generation of researchers, computer scientists, and Web engineers, emphasizing a strong combination of advanced understanding in both theoretical and experimental approaches, methodologies and tools required to develop DOSN platforms. iSocial is divided into four interconnected research topics, which include important research challenges with high exploitation potential: (i) overlay infrastructure for decentralized online social networking services, (ii) data storage and distribution, (iii) security, privacy and trust, and (iv) modelling and simulation.
A Community networking Cloud in a Box (CLOMMUNITY) (FP7 EU-project, 2013-2015)
Community networking is an emerging model for the Future Internet across Europe and beyond, in which communities of citizens can build, operate and own open IP-based networks, a key infrastructure for individual and collective digital participation. The CLOMMUNITY project aims at addressing the obstacles faced by communities of citizens in bootstrapping, running and expanding community-owned networks that provide community services organised as community clouds. This requires solving specific research challenges imposed by the need for: (i) self-managing and scalable (decentralized) infrastructure services for the management and aggregation of a large number of widespread, low-cost, unreliable networking, storage and home computing resources; and (ii) distributed platform services that support and facilitate the design and operation of elastic, resilient and scalable service overlays and user-oriented services built on top of them, providing a good quality of experience at the lowest economic and environmental cost. This will be achieved through experimentally driven research, using the FIRE CONFINE community networking testbed, the participation of large user communities (20,000+ people) and software developers from several community networks, and by extending existing cloud service prototypes in a cyclic participatory process of design, development, experimentation, evaluation and optimization for each challenge. The consortium includes two representative community networks with a large number of end-users and developers who use diverse applications (e.g., content distribution, multimedia communication, community participation), as well as service providers, research institutions with experience and prototypes in the key related areas, and a recognized international organisation for the dissemination of the outcomes.
E2E-Clouds (SSF, 2012-2017)
E2E-Clouds was a five-year research project financed by the Swedish Foundation for Strategic Research. The goal of the project was to develop an End-to-End information-centric Cloud (E2E-Cloud) for data-intensive services and applications. The E2E-Cloud is a distributed and federated cloud infrastructure that meets the challenge of scale by aggregating, provisioning and managing computational, storage and networking resources from multiple centers and providers. Like some current data-center clouds, it manages computation and storage in an integrated fashion for efficiency, but adds wide-scale distribution.
Portable and Predictable Performance on Heterogeneous Embedded Manycores (PaPP) (FP7 EU-project, 2012-2015)
Modern advanced products use embedded computing systems with exacting requirements on execution speed, timeliness, and power consumption. It is a grand challenge to guarantee these requirements across product families and in the face of rapid technological evolution, as current development practices cannot manage performance requirements the same way they manage functional requirements. Even worse, with the proliferation of complex parallel target platforms, it becomes more difficult to design a system that reaches a given performance goal with just the minimum amount of resources, managed right. Today the only solution to this problem is to over-design systems: systems are pragmatically equipped with an overcapacity that likely avoids under-performance, but for this very reason they are more expensive and consume more resources than necessary. The project aims at making performance predictable in every development phase, from the modelling of the system, through its implementation, to its execution, by allowing for early specification and analysis of system performance, its adaptation to different hardware platforms, and an adaptive runtime system. During the project, the developed methods and tools will be evaluated on a number of industrial use cases and demonstrators in three application domains important to European industry: Multimedia, Avionics and space, and Mobile communication. This approach will ensure that the methods and tools developed are both usable and effective.
ENabling Technologies for a Programmable Many-CORE (ENCORE) (FP7 EU-project, 2010-2013)
The consumer quest for computing power is insatiable. In the past, chip manufacturers could increase processing power by simply increasing the speed of the processor core. In recent years, however, these manufacturers have come up against a natural barrier to this approach. In response to the resulting performance wall, desktop computer companies have followed the example of server producers by adding more cores to their products, and the producers of mobile devices are now following suit. To keep up with demand, the current trend in computer systems is to double the number of cores in contemporary processors approximately every two years, leading to hundreds of cores per chip in the near future. Developing applications that harness this computational power, however, is a complex, laborious task that often requires specialized training. Moreover, applying traditional programming methods can negatively impact processing efficiency and drive up power consumption. The ENCORE project focuses on alleviating these problems by proposing a programming model for multi-cores and delivering an integrated set of tools that simplify software development for many-core systems with increased portability and scalability, while at the same time providing high performance and maintaining power efficiency for real-world applications. Specifically, ENCORE aims to reduce the number of lines of code required to adapt an application for multi-core by 90%, which translates into less development time and potentially faster, cheaper time-to-market.
Peer-to-Peer Live Streaming (PeerTV) (Vinnova, 2007-2010)
The PeerTV project aims to develop, deploy and validate peer-to-peer media streaming platforms that address three key requirements not currently met by existing broadband infrastructures: (i) efficient utilization of the upload bandwidth available at peers, to reduce the amount of bandwidth that needs to be centrally provisioned and paid for by TV broadcasters, (ii) reducing the playback latency and increasing the playback continuity of video through constructing novel topologies, and (iii) minimizing the amount (and cost) of network traffic for Internet Service Providers (ISPs) through building an infrastructure that is aware of the underlying autonomous systems.
Self Management for Large-Scale Distributed Systems (SELFMAN) (FP6 EU-project, 2006-2009)
The goal of SELFMAN is to make large-scale distributed applications self-managing by combining the strong points of component models and structured overlay networks. One of the key obstacles to deploying large-scale applications running on networks such as the Internet is the issue of management. Currently, many specialized personnel are needed to keep large Internet applications running. SELFMAN will contribute to removing this obstacle, and thus enable the development of many more Internet applications. In the context of SELFMAN, we define self-management along four axes: self-configuration (systems configure themselves according to high-level management policies), self-healing (systems automatically handle faults and repair them), self-tuning (systems continuously monitor their performance and adjust their behaviour to optimize resource usage and meet service-level agreements), and self-protection (systems protect themselves against security attacks). SELFMAN will provide self-management by combining a component model with a structured overlay network.
Self-* Grid: Dynamic Virtual Organizations for Schools, Families, and All (Grid4All) (FP6 EU-project, 2006-2008)
Grid4All aims to enable domestic users and non-profit organisations, such as schools and small enterprises, to share their resources and to access massive grid resources when needed, envisioning a future in which access to resources is democratised and cooperative. Examples include home users of image-editing applications, school projects such as volcanic eruption simulations, or small businesses doing data mining. Cooperation examples include joint homework between pupils or international collaboration. Grid4All's goals entail a system pooling large amounts of cheap resources; a dynamic system satisfying spikes of demand; using self-management techniques to scale; supporting isolated, secure, dynamic, geographically distributed user groups; and using secure peer-to-peer techniques to federate large numbers of small-scale resources into large-scale grids. We target small communities such as domestic users, schools and SMEs, harnessing their resources in addition to resources from operated IT centres to form on-demand, service-oriented grids, avoiding preconfigured infrastructures. The technical issues addressed are security, support for multiple administrative and management authorities, P2P techniques for self-management/adaptivity/dynamicity, on-demand resource allocation, heterogeneity, and fault tolerance. The proof-of-concept applications include e-learning tools for collaborative editing in schools and a digital content processing service accessible by residential end-users.
EVERGROW, a European Research Project on the Future Internet (EU-project, 2004-2008)
The goal of the project is to build the science-based foundations for the global information networks of the future. Not only will networks soon provide us with access to all the world's knowledge, but society as a whole will become network-based, from private life and business to industry and the processes of government. The demands on the future Internet will be high. We can already see how the complexity of the Internet is continually increasing, and we know a great deal about the problems this will cause. Above all, a number of today's highly manual processes must be automated, such as network management, network provisioning and network repair on all levels.
CoreGrid (EU-project, 2004-2008)
The CoreGRID Network of Excellence (NoE) aims at strengthening and advancing scientific and technological excellence in the area of Grid and Peer-to-Peer technologies. To achieve this objective, the Network brings together a critical mass of well-established researchers (161 permanent researchers and 164 PhD students) from forty-one institutions who have constructed an ambitious joint programme of activities. This joint programme of activity is structured around six complementary research areas that have been selected on the basis of their strategic importance, their research challenges and the recognised European expertise to develop next generation Grid middleware.
Peer-to-Peer Implementation and Theory (PEPITO) (FP5 EU-project, 2002-2004)
Traditional centralised system architectures are increasingly inadequate. We lack a good understanding of future decentralised peer-to-peer (P2P) models for collaboration and computing, both of how to build them robustly and of what can be built. The PEPITO project will investigate completely decentralised models of P2P computing.
Information Cities (ICITIES) (EU-project, 2000-2003)
The Information Cities project models the aggregation and segregation patterns in a virtual world of infohabitants (humans, virtual firms, on-line communities and software agents acting on their behalf). The objective is to capture aggregate patterns of virtual organisation emerging from interaction over the evolving information infrastructure: a virtual place where millions (or billions) of infohabitants meet, co-operate and trade; a stable and scalable micro-environment that supports the efficient provision of many e-commerce and personal services, and allows for the continuous creation of new activities and relationships. To investigate the conditions for the emergence and evolution of Information Cities, we will develop an open multi-agent environment, flexible and adaptive to the dynamic nature of the Information Society.