Hadoop and Data Lakes (English)
Discussion surrounding Hadoop and data lakes is as relevant as ever. The Hadoop ecosystem is considered THE technological breakthrough for enabling companies to capitalize on the big data revolution. A data lake, in turn, is viewed as a broad data management concept and a prerequisite for data-driven companies. It promises a fast, efficient, low-cost way to manage, use and analyze any amount of data from different systems with varying structures. As a source for any type of analytic task, it can also claim to be the technological backbone of digitalization and the (big) datafication of the entire economy.
Hadoop, a top-level project of The Apache Software Foundation, is an open source Java framework for scalable, distributed applications. It includes a collection of components for administering, accessing and analyzing structured and unstructured data. Hadoop is capable of managing huge amounts of polystructured data and adding value to new or established IT technologies. It is especially well-suited as a platform for implementing big data projects and is often viewed as a technology for data lake deployments. The concept of a data lake, however, can extend far beyond that, depending on how broadly the term is defined. A data lake often focuses on data availability and providing downstream applications with schema-free data that is close to its raw format regardless of origin.
The following BARC user survey provides answers and insights. It explores the status quo of Hadoop and data lakes in general and real experiences from Hadoop use cases across the globe. It tackles important questions including:
- How widespread is the current usage of Hadoop and data lakes in companies? What are their plans for the future?
- How do companies utilize Hadoop or plan to use it?
- How do companies currently use data lakes?
- What problems do they face?
- What real-world benefits does Hadoop bring? What projects have companies implemented already?
- How is the technological implementation set up?
This study was conducted independently by BARC. Thanks to sponsorship from Cloudera, SAS and Teradata, it can be published free of charge. BARC would like to thank you, our readers, in advance for your participation in future surveys as well. Your support is essential to fuel this research with empirical data.
The study is available as a free download after login or registration in the BARC customer area.
Authors: Timm Grosser, Jacqueline Bloemen, Melanie Mack, Jevgeni Vitsenko