Exploring the Opportunities and Risks for Generative AI and Corporate Databases: Data Use and Dependability
In our initial article, we examined the less-discussed but rapidly expanding use of generative artificial intelligence (GenAI) systems that leverage large language models (LLMs) to query data stored in corporate databases. To assess the risks of this evolving conduit to information, and the guidelines that might govern it, practitioners in both corporate and legal settings must understand the many uses (and lineage) of structured and semi-structured data within their organizations, as outlined in that article.
To help guide how practitioners assess the impacts of this technology, we will explore the risks and potential downstream implications associated with changes to how structured and semi-structured data is used and accessed through GenAI, as well as the continued need to ensure accuracy in results.
Structured and Semi-Structured Data: Foundations of Organizational Environments
Relational databases have formed the backbone of organizational data environments for many years. Companies rely on relational databases to support various critical back-office tasks, from invoice, inventory, and supply chain management to compliance, human resources, reporting, and data analytics. Whether stored on-premises, in the cloud, or in a hybrid of the two, databases provide a foundation for the consistency and integrity of vast amounts of information. The data stored in databases is in great demand by “data consumers”—users across a company’s landscape who require a variety of information to perform their functions and accomplish their goals.
Beyond supporting corporate functionality and analysis, relational databases play a crucial role in many legal matters. Corporate database content, including adjacent system logs, is increasingly a source of information responsive to legal matters. As a result, legal professionals frequently work with database owners to obtain, organize, and analyze this information.
Organizations have also turned to semi-structured formats to store data. Semi-structured data types include XML, HTML, CSV, and even email, and generally contain tags or other markers (rather than a defined schema) to separate different elements. NoSQL (nonrelational) databases, which are seeing increased adoption, are one way of storing semi-structured data. These tools allow organizations to handle diverse and rapidly generated records, including the customer sales orders and system log files inherent in e-commerce. The proliferation of data lakes within organizations has also resulted in further reliance on semi-structured data, because data lakes can hold disparate data types from various operational feeds.
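To make the distinction between a defined schema and tags or markers concrete, the following is a minimal sketch in Python. The table name, element names, and sample values are all hypothetical and chosen only for illustration; they do not describe any particular corporate system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: a relational table declares its columns up front, so every
# row has the same fields. (Hypothetical "orders" table.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (1001, "Acme Corp", 250.0))
structured_row = conn.execute(
    "SELECT order_id, customer, total FROM orders"
).fetchone()

# Semi-structured: each record carries its own tags, so two records in the
# same file need not contain the same elements. (Hypothetical XML sample.)
xml_text = """
<orders>
  <order id="1002"><customer>Acme Corp</customer><total>99.50</total></order>
  <order id="1003"><customer>Globex</customer><item sku="A-1" qty="2"/></order>
</orders>
""".strip()
root = ET.fromstring(xml_text)

# Collect the set of child tags used by each <order> record.
tags_per_order = [{child.tag for child in order} for order in root.findall("order")]

# Tags present in every record versus tags that appear in only some records.
common_tags = set.intersection(*tags_per_order)
optional_tags = set.union(*tags_per_order) - common_tags
```

In this sketch the relational row always has the same three columns, while only the customer element is common to both XML records. A person (or a GenAI system) querying the second kind of source cannot assume that a given field exists in every record.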
Accuracy and Dependability of Data
Concerns about the accuracy and dependability of data also extend to legal professionals because they, too, are data consumers. For example, attorneys searching for relevant data within a database have typically relied on database owners or a database analyst team to provide reports and assist with data analysis. With the implementation of GenAI, attorneys can query these datasets directly, potentially unlocking insights that would not otherwise have been found. However, attorneys are unlikely to possess the comprehensive understanding of the datasets and controls that is necessary to ensure the accurate interpretation of data retrieved through GenAI.
Conclusion