Use Case: Load Elasticsearch ETL from SQL

Multi-table SQL data merged into hierarchical Json document using Elasticsearch ETL

Nearly all OLTP centric database vendors (both SQL or NoSQL) struggle with text search. While text search engines (like Elasticsearch) are problematic with maintaining ACID transactions required of OLTP. So many business have a need to maintain separate data stores of OLTP from text search. They are then forced to choose between implementing dual data stores in the application’s data tier or need a low latency data integration solution (i.e. an Elasticsearch ETL tool) .

But there is usually a big problem, OLTP data tends to have complex multi-table schemas and multiple ways that data can change. For optimal performance, Elasticsearch requires all data in a text search to reside in the same index. So all the complex OLTP data must be merged into a single Elasticsearch Json based index. So maintaining data synchronization between the OLTP  database and Elasticsearch is very problematic.

JsonEDI’s Json ETL based processing with agnostic source data makes partial and/or hierarchical Json documents easy to load into Elasticsearch.  JsonEDI eliminates the need for the application data tier to maintain data consistency between OLTP and Elasticsearch. Both full loads and low latency data changes (DML) are managed via JsonEDI.

Here is a real world use case:

Data architects want to move text searches off SQL Server onto ElasticSearch. Columns from 8 tables must be merged into a single hierarchical json document for a single index on Elasticsearch. For technical reasons the application’s data tier cannot maintain consistency between OLTP and Elasticsearch.
Note: Change Data Capture (CDC) has been implemented to log OLTP data changes.

JsonEDI is able to pull the CDC data and map columns into a partial Json document. The partial Json is then merged into a single document on ElasticSearch. By pulling only data that changed in the CDC tables, versus retrieving all the data needed for the full Json document, we are reducing the load on the source server. CDC is used for incremental loads and the base tables are queried if a full load is required.

For additional features of JsonEDI or to discuss your company’s data integration needs feel to contact Fred Zimmerman via LinkedIN.