How can a GPT model support reference data management?

For some time now, ChatGPT has been gaining attention for its potential in various industries. At first glance, the field of reference data management may not seem like an ideal fit for the capabilities of this AI technology. However, upon further examination, it becomes clear that ChatGPT can play a vital role in streamlining and automating many of the processes associated with this activity.

Some examples of how ChatGPT can assist in reference data management include:

Generating documentation and metadata for reference data, which can improve data organization and searchability.
Creating indexes for reference data, which can facilitate easier access and retrieval of data.
Generating reports and analyses from reference data, which can provide valuable insights and support decision-making.
Sharing reference data through APIs, which can increase data accessibility and collaboration within an organization.
Automating curatorial processes, such as validation and updating of data, which can reduce human error and improve data quality.

To use ChatGPT for these tasks, you submit reference data (e.g. in CSV format) to the model. Whether to do so depends, of course, on how much you trust the “intelligence” of the model, and each user must assess this for themselves. Based on the submitted data, ChatGPT can generate metadata, documentation, indexes, reports, and other objects in commonly accepted formats. Script templates in several popular programming languages can be generated in the same way. In my opinion, the responses may not be ideal, but they provide a very good starting point. Those managing reference data can focus on the important elements and entrust the rest to the AI.
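As a minimal sketch of the submission step, the Python below turns a small CSV sample into a text prompt asking for column documentation. The sample table, the function name `build_metadata_prompt`, and the prompt wording are all illustrative assumptions. Actually sending the prompt is left out, since that depends on the client and account you use.

```python
import csv
import io

# Hypothetical reference data; a real table would be read from a file.
SAMPLE_CSV = """code,name,region
PL,Poland,Europe
JP,Japan,Asia
BR,Brazil,South America
"""

def build_metadata_prompt(csv_text: str, max_rows: int = 20) -> str:
    """Build a prompt asking a model to document the columns of a CSV sample."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Keep the header plus at most max_rows data rows to limit what is sent.
    sample_rows = rows[: max_rows + 1]

    # Re-serialize with the csv module so quoting stays valid.
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(sample_rows)

    return (
        "You are helping document a reference data table.\n"
        "For each column, propose a description, a data type, "
        "and an example value.\n\n"
        "CSV sample:\n" + buffer.getvalue()
    )

prompt = build_metadata_prompt(SAMPLE_CSV)
print(prompt)
```

Limiting the sample to a handful of rows is a deliberate choice here: the model usually needs only the header and a few representative values to propose documentation, and a smaller prompt means less data leaves your environment.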

“Fine-tuning” the model using your data sources opens up new possibilities. The model’s responses will then take into account the information provided. This can be the foundation for creating your own API that supports the use of reference data. A model prepared in this way offers even more interesting possibilities for validating and updating data.

As previously mentioned, the primary issue is the level of trust in an external model. In response to my question, the AI concluded as follows:

“It is also good practice to only send to the model the data that is necessary to accomplish the task, and also ensure that the data sent does not contain confidential information that should not be shared.” Nothing more to add.
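One way to act on this advice is to strip a table down to an explicit allow-list of columns before anything is sent. The sketch below assumes a hypothetical table with an `owner_email` column that should stay private. The column names and the helper `keep_columns` are invented for illustration.

```python
import csv
import io

# Hypothetical table: owner_email is treated as confidential.
RAW_CSV = """code,name,owner_email
PL,Poland,anna@example.com
JP,Japan,ken@example.com
"""

def keep_columns(csv_text: str, allowed: list[str]) -> str:
    """Return the CSV with only the allowed columns, in the given order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=allowed,
        extrasaction="ignore",  # silently drop columns not on the allow-list
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

safe_csv = keep_columns(RAW_CSV, ["code", "name"])
print(safe_csv)
```

An allow-list is used rather than a block-list so that a column added to the table later is excluded by default until someone decides it is safe to share.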

Additionally, it’s important to consider the cost of using the model. Pricing is based on the number of tokens processed, so the cost grows with the volume of data and queries sent to the model. The total cost can be estimated accurately only by running tests on a specific project.
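Before running such tests, a rough upper-level estimate can be sketched in a few lines. Both inputs below are assumptions: the ratio of about four characters per token is only a common rule of thumb for English text, and the price per 1,000 tokens is a placeholder that must be replaced with the current rate for the model you use. The estimate covers only the text sent. Tokens in the model's response would add to it.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from character length (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)

def estimate_cost(text: str, price_per_1k_tokens: float,
                  chars_per_token: float = 4.0) -> float:
    """Estimated cost of sending the text once, in the currency of the price."""
    tokens = estimate_tokens(text, chars_per_token)
    return tokens / 1000 * price_per_1k_tokens

# Placeholder price; check the provider's current price list.
PRICE_PER_1K_TOKENS = 0.002

payload = "code,name,region\n" + "PL,Poland,Europe\n" * 500
print(estimate_tokens(payload), "tokens (approx.)")
print(round(estimate_cost(payload, PRICE_PER_1K_TOKENS), 4), "estimated cost")
```

A real tokenizer will give different counts, especially for non-English text and for data with many codes or numbers, so treat the result as an order-of-magnitude figure.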

A few hours of testing showed that using the model behind ChatGPT can free experts from tedious tasks and allow them to focus on truly important issues. However, it’s important to consider the confidentiality of the data we send to the model and the potential cost of sending data and queries to it. Given the rapid development of cloud solutions, these don’t seem to be significant obstacles.