Databricks has now made all of Delta Lake open source, including all APIs. The product’s storage layer was made open source in 2019. Delta Lake can be used to create data lakehouses, which enable data warehousing and machine learning directly on the data lake.
Delta Lake manages the step where data is fed into an organization’s data lake. It stores data in Apache Parquet format and is designed for use in HDFS-based data lakes and cloud storage.
Databricks was founded by the original developers of Apache Spark and specializes in commercial technologies that use Spark. Delta Lake is a unified analytics engine and associated table format built on top of Apache Spark, and until it was made open source it was only available as part of Databricks Delta, the company's proprietary stack.
Since the storage layer was made open source, the project has attracted more than 190 contributors from more than 70 organizations, nearly two-thirds of whom are outside Databricks, including contributors from companies such as Apple, IBM, Microsoft, Disney, Amazon and eBay.
Delta Lake comes with standalone readers/writers that allow any Python, Ruby, or Rust client to write data directly to Delta Lake without requiring a big data engine such as Apache Spark, as well as open source connectors for engines including Apache Flink, Presto, and Trino. The open source announcement unlocks features that until now were only available in Databricks.
Delta Lake 2.0, the latest version of Delta Lake, features enhancements including support for Z-Ordering, Change Data Feed, dynamic partition overwrites, and dropping columns. Z-Ordering is a technique for co-locating related information in the same set of files. Delta Lake uses this co-locality in its data-skipping algorithms, and the developers say it greatly reduces the amount of data Delta Lake on Apache Spark has to read.
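To make the Z-Ordering idea concrete, here is a minimal Python sketch. It is an illustration only and not Delta Lake's implementation: the function names (`interleave_bits`, `write_files`, `files_read`) and the toy "files" are assumptions made up for this example. It sorts rows by a Z-order (Morton) key built by interleaving the bits of two columns, splits them into files with per-file min/max statistics, and counts how many files a filter on either column would have to read.

```python
from itertools import product


def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Z-order (Morton) key: interleave the low `bits` bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z


def write_files(rows, rows_per_file):
    """Split rows into toy 'files' and record min/max per column for each file."""
    files = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        stats = {
            "x": (min(r[0] for r in chunk), max(r[0] for r in chunk)),
            "y": (min(r[1] for r in chunk), max(r[1] for r in chunk)),
        }
        files.append((stats, chunk))
    return files


def files_read(files, column, value):
    """Count files whose min/max range may contain `value`; the rest are skipped."""
    return sum(
        1 for stats, _ in files
        if stats[column][0] <= value <= stats[column][1]
    )


# 256 rows covering every (x, y) pair with x, y in 0..15, 16 rows per file.
rows = list(product(range(16), range(16)))
sorted_by_x = write_files(sorted(rows), 16)
z_ordered = write_files(sorted(rows, key=lambda r: interleave_bits(*r)), 16)

for name, files in (("sorted by x", sorted_by_x), ("z-ordered", z_ordered)):
    print(name, "| x == 5:", files_read(files, "x", 5),
          "| y == 5:", files_read(files, "y", 5))
```

In this toy layout, sorting by x alone serves a filter on x from a single file but forces a filter on y to touch all 16 files, whereas the Z-ordered layout needs 4 files for either filter. That trade-off is the reason for clustering on several columns at once.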
Delta Lake 2.0 is available now.