Cut Your ETL Development Time in Half With Metadata
by Chris Bond, on August 16, 2016
Do you find yourself or your developers doing the same ETL work over and over again? You might be able to cut out all of that redundant work with Metadata Driven ETL Development. Not sure if it's right for you? Here's an overview of our process!
First, here's how to identify if this approach is right for you:
You're Doing a Lot of Redundant/Repetitive Work
Maybe your workflow consists of fairly simple transformations across all your sources. Are you truncating all tables in a schema and reloading them with new data from a different schema? Do you have to make a job in your ETL tool for each of these?
The Workload is High
Are you loading 2 tables or 2,000? A large number of sources is a great indicator that you should forego your old ways and dive into the future. It's not a big deal if you have to do the same ETL for a few tables. This also has to be weighed against how complex your logic is. If it's super complex logic for 10 tables, maybe it's better to go with the manual way, but if it's just a delete statement on each with a certain where clause, go with dynamic!
Sound like you? Here's how it works:
Integrate Metadata into Workflow
This is the tricky part. Obviously, metadata consists of tons of different things. You'll be implementing pieces of the information schema (or sys tables), your own business defined metadata (which tables should be processed which ways)
The trick here is to first locate where the metadata you need is located. If you're in SQL Server, the INFORMATION_SCHEMA is your new best friend. Give him a call, take him out for a beer, he's there for you. You can find information such as column names and data types, and often that is all you need. Simply plug that data into the common framework you build out and let the sparks fly.
But be careful, it's easy to mistake a small use case for a universal pattern. This can create a huge tangled mess if you don't know what to look for!
Identify Patterns in ETL Code/Work
Maybe you're only loading a single subject area in your data warehouse, and every table in that subset starts with a 'TRANSACTION_'. You can select all the tables in that schema where TABLE_NAME is like 'TRANSACTION_%', append your string with some hardcoded SQL and you've got a dynamically generated statement!
Proceed With Caution
The one downfall of metadata driven ETL development is that there's no standard formula to follow and it's not one-size-fits-all. This process will look different for everyone, which is why if you don't know what you're doing - things can go downhill very quickly. This is why it's crucial that you work with a development team that knows exactly what they're doing - like Arkatechture!