The advent of the relational database model was a major achievement in data science. The rules around what constitutes a relational database make it easier to design, build, and manage databases and to build applications and reports around them.
Still, relational databases suffered from a lack of standardized tools to create, manipulate, and extract data in relational databases. Hence the development of Structured Query Language (SQL), an international standard computing language that vastly simplifies all aspects of database management, from defining tables and columns to implementing extract-transform-load (ETL) processes to executing complex table joins that combine data from different tables into useful information. Because it strictly adheres to relational database rules, SQL helps maintain relational database integrity.
Thanks to its status as an international standard and its widespread use in many types of database systems, from Oracle to MySQL to Microsoft SQL Server and beyond, SQL also serves another useful purpose: enabling SQL lineage tools.
What is SQL lineage?
Simply put, it’s data lineage based on SQL; that is, it’s a data lineage tool that focuses on the SQL code that is used to build, maintain, and manage data sources, database tables, and the reports that leverage the data in those tables.
In many cases, all you need to know about data’s journey from source to database system (or systems) to destination report can be gleaned from the SQL code.
This can be done manually, of course, by having a human open and examine all the stored procedures and other artifacts that incorporate SQL. However, this approach requires in-depth knowledge of SQL, obsessive attention to detail, and a lot of free time.
A better approach is the use of automated data lineage tools, which find and parse the SQL automatically and build the data lineage relationships without the need for human intervention. And because SQL has a standard syntax and is used across so many disparate database systems, SQL lineage tools are system-agnostic.
Not all databases are relational, and non-relational databases are becoming popular in certain contexts. However, relational databases and SQL are still the bread-and-butter for the vast majority of enterprise database systems, and as long as that’s the case, SQL lineage will be a key component of any automated data lineage tool.