Data Lake vs Integrated Data Warehouse ETL

Data Warehouse ETL vs Data Lake ETL

JsonEDI fundamentally changes the cost calculus between a Data Lake and an Integrated Data Warehouse (IDW). A Data Lake is loosely integrated data, typically stored in Hadoop. An IDW holds tightly integrated data stored in a relational database, in Hadoop, or in both. Without JsonEDI, the level of effort (LOE) for data warehouse ETL is prohibitively high, but analytics and reporting are easier. A Data Lake has lower ETL costs but shifts the LOE to analytics and reporting. It’s the “pay me now or pay me later” decision.

JsonEDI eliminates the rationale for a Data Lake by lowering integration costs below the higher analytics and reporting costs that a Data Lake incurs. Any analyst or data scientist will gladly choose integrated data over non-integrated data. So how does JsonEDI reduce data warehouse ETL costs?

First of all, JsonEDI’s metadata-driven data integration is fundamentally much faster than manual ETL coding. There are several ways to automate metadata creation: from the source schema, from the destination schema, or even from the schema of the incoming JSON document. These options are outlined in other documentation on this site.
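To make the idea of deriving metadata from an incoming JSON document concrete, here is a minimal sketch in Python. It is not JsonEDI's API or metadata format: the function name infer_metadata, the dotted field paths, and the type names are all assumptions chosen for illustration.

```python
import json


def infer_metadata(doc, prefix=""):
    """Map each field path in a parsed JSON object to a simple type name.

    Hypothetical helper for illustration only; not part of JsonEDI.
    """
    columns = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and record its fields under a dotted path.
            columns.update(infer_metadata(value, prefix=f"{path}."))
        elif isinstance(value, list):
            columns[path] = "array"
        elif isinstance(value, bool):
            # bool must be tested before int because bool is a subclass of int.
            columns[path] = "boolean"
        elif isinstance(value, int):
            columns[path] = "integer"
        elif isinstance(value, float):
            columns[path] = "float"
        elif value is None:
            columns[path] = "unknown"
        else:
            columns[path] = "string"
    return columns


# Sample document invented for this example.
sample = json.loads(
    '{"order_id": 17, "total": 42.5,'
    ' "customer": {"name": "Ada", "vip": true},'
    ' "lines": [{"sku": "A1"}]}'
)
metadata = infer_metadata(sample)
```

A real tool would likely sample many documents and reconcile conflicting types. This sketch inspects a single document only.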

Semi-structured source data (even hierarchical data) can be normalized into relational tables without any coding or metadata creation. The tables can even be created in real time if required. How much simpler can we make it? JsonEDI fundamentally breaks down the wall between schema and schema-less data.
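As a rough illustration of what normalizing hierarchical data into tables involves, the following Python sketch splits a nested JSON document into a parent table and child tables linked by a foreign key. It shows the general technique only and is not how JsonEDI is implemented. The function names, the "_id" surrogate key, and the "<parent>_id" foreign-key column are assumptions made for this example.

```python
def normalize(doc, table, tables=None, parent=None):
    """Split a nested JSON object into flat rows grouped by table name.

    Hypothetical helper for illustration only; not part of JsonEDI.
    """
    if tables is None:
        tables = {}
    rows = tables.setdefault(table, [])
    row = {"_id": len(rows) + 1}  # surrogate key, unique within the table
    if parent is not None:
        parent_table, parent_id = parent
        row[f"{parent_table}_id"] = parent_id  # foreign key to the parent row
    rows.append(row)
    _fill(row, doc, "", table, tables)
    return tables


def _fill(row, obj, prefix, table, tables):
    """Copy scalars into the row, flatten nested objects, spin lists off."""
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: fold its fields into the same row with a prefix.
            _fill(row, value, f"{name}_", table, tables)
        elif isinstance(value, list):
            # Repeating group: each element becomes a row in a child table.
            for item in value:
                child = item if isinstance(item, dict) else {"value": item}
                normalize(child, name, tables, parent=(table, row["_id"]))
        else:
            row[name] = value


# Sample document invented for this example.
order = {
    "order_id": 17,
    "customer": {"name": "Ada"},
    "lines": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
tables = normalize(order, "orders")
```

Here the order becomes one row in an "orders" table, and each element of "lines" becomes a row in a "lines" table that points back to its order through "orders_id".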

Master Data Management (MDM) and JsonEDI fit together naturally to form a powerful data governance solution. A good MDM solution has typically raised the LOE of the IDW because the ETL programmer needs to manually integrate MDM processing, but with JsonEDI it’s a natural extension with minimal additional LOE.

Together, JsonEDI’s data integration solutions dramatically reduce ETL LOE for the IDW, well past the point where the poorly integrated Data Lake architecture is worth considering.