Difference between revisions of "Data Lake"

Line 1:
- A Data Lake is similar to that of a data warehouse, but it allows for the flow and storage of unstructured data sources in addition to the structured data in an enterprise data warehouse or data mart. The idea of a lake is such that water flows from various paths into the reservoir and then flows out.
+ A data lake is a central repository that allows the storage and flow of structured and unstructured data sources. This concept is akin to a lake with multiple streams or sources to fill up a reservoir and store data as is, before it is allowed to flow out to various applications within an organization.
  =Functions of a Data Lake=
Line 22:
  =References=
+ https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
  Submitted by Tom Nahass
  [[Category:BMI512-FALL-20]]

Revision as of 19:11, 26 October 2020

A data lake is a central repository that stores both structured and unstructured data from many sources. The name comes from the image of a lake: multiple streams feed a reservoir, where data is stored as is until it flows out to various applications within an organization.

Functions of a Data Lake

Data Ingestion

  • Tools

Data Storage and Retention

  • Tools

Data Processing

  • Tools

Data Access

  • Tools
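
The four functions listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name TinyDataLake, the use of a local directory as storage, the source names, and the sample records are all hypothetical, and a production data lake would use dedicated tools for each function. The point it tries to show is that data is stored as is at ingestion and interpreted only when it is read.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


class TinyDataLake:
    """Toy data lake backed by a local directory (hypothetical, for illustration)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def ingest(self, source: str, name: str, payload: bytes) -> Path:
        """Ingestion and storage: write the bytes unchanged under the source's folder."""
        target = self.root / source / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
        return target

    def process(self, path: Path) -> list[dict]:
        """Processing: interpret the raw bytes only when they are read."""
        text = path.read_text(encoding="utf-8")
        if path.suffix == ".json":
            data = json.loads(text)
            return data if isinstance(data, list) else [data]
        if path.suffix == ".csv":
            return list(csv.DictReader(io.StringIO(text)))
        # Unstructured data: wrap the raw text in a single record.
        return [{"text": text}]

    def access(self, source: str) -> list[dict]:
        """Access: return every record from one source, in file-name order."""
        records: list[dict] = []
        for path in sorted((self.root / source).iterdir()):
            records.extend(self.process(path))
        return records


def demo() -> dict:
    """Ingest three differently shaped sample files, then read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = TinyDataLake(Path(tmp) / "lake")
        lake.ingest("lab", "results.csv", b"patient,value\nA,5.1\nB,6.3\n")
        lake.ingest("notes", "visit.txt", b"Patient reports mild headache.")
        lake.ingest("devices", "pulse.json", b'[{"patient": "A", "bpm": 72}]')
        return {source: lake.access(source) for source in ("lab", "notes", "devices")}


if __name__ == "__main__":
    print(demo())
```

Note that ingest never inspects the payload, so structured (CSV, JSON) and unstructured (free text) data are accepted alike; the format only matters in process.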


Difference from Data Warehouse

Data Swamp

A data lake that grows unruly, for example because data is stored without the metadata needed to find and interpret it, is called a data swamp.
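
One way to make "unruly" concrete is to check which stored objects lack descriptive metadata. The sketch below assumes, purely for illustration, that the lake keeps a catalog mapping each object name to an owner and a description; the function name, the required fields, and the sample entries are hypothetical and are not taken from the source.

```python
def uncataloged(stored: set[str], catalog: dict[str, dict]) -> list[str]:
    """Return stored object names with no catalog entry or with an empty required field."""
    required = ("owner", "description")  # assumed minimum metadata, for illustration
    missing = []
    for name in sorted(stored):
        entry = catalog.get(name, {})
        if not all(entry.get(field) for field in required):
            missing.append(name)
    return missing


# Hypothetical lake contents and catalog.
stored = {"lab/results.csv", "notes/visit.txt", "devices/pulse.json"}
catalog = {
    "lab/results.csv": {"owner": "lab team", "description": "Daily lab values"},
    "notes/visit.txt": {"owner": "", "description": "Free-text visit notes"},
}

if __name__ == "__main__":
    # devices/pulse.json has no entry; notes/visit.txt has an empty owner.
    print(uncataloged(stored, catalog))
```

A growing list of uncataloged objects would be one possible warning sign that a lake is drifting toward a swamp.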


References

https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/

Submitted by Tom Nahass