Efficient metadata management: import of dictionary data from xls

One of the most common challenges for a Metastudio DRM business administrator is creating a dictionary based on an Excel spreadsheet where the dictionary was previously maintained.

Completing such a task requires at least three steps:

  1. Establishing the structure of the dictionary table in the database.
  2. Registering the dictionary in Metastudio.
  3. Importing data from the spreadsheet where the data was previously maintained.

For simple dictionaries, this is a relatively straightforward task. But what do we mean by a “simple” dictionary? It is a dictionary characterized by:

  • A small number of columns (a few).
  • A small number of records (a few hundred).
  • Ordered dictionary data (codes of a fixed length/structure, descriptions, etc.).

The situation is more challenging with dictionaries that have been maintained as Excel spreadsheets for an extended period of time, where the content deviates to a greater or lesser extent from the original assumptions and contains additional notes, values inconsistent with the originally anticipated “type” of column, etc. In such cases, each of the above steps will require careful attention.

Considerations for Setting up a Dictionary Table in a Database

Setting up the structure of a dictionary table in a database will require preparing a DDL script with the appropriate syntax for the database engine in which the dictionary table is to be created. Let’s start with column names. The most obvious approach to completing this task is to follow these steps:

  1. Open the Excel spreadsheet containing the dictionary data.
  2. Copy the row with the headers to a separate sheet and transpose it during pasting. This will give us a list of the columns for the target dictionary.
  3. If the column names are descriptive in nature, technical names for each column will need to be created. Consider that often there is a written or conventionally agreed-upon naming scheme for columns in a project, such as underscores instead of spaces, specific prefixes or suffixes, or a distinctive way of shortening names.

We now have column names for our dictionary. Next, it’s time to determine the type and length/precision for each column. This is where the “stumbling blocks” typically arise, as at this stage we must consider one of the subsequent steps to be carried out – namely, data import. Why? Properly defining the structure of the dictionary will require an approach that is appropriate for its intended use. What do I mean by this?

If the dictionary is to be populated only once (known as “zero loading”) and then maintained through the Metastudio DRM interface, it is more appropriate to precisely define each column. For example, if a 2-character ISO country code is to be placed in a given column, we would define it as varchar2(2) in Oracle. If there is any discrepancy between the imported data and the anticipated structure, we will have to manually correct the data in the spreadsheet – but we only do this once.

If the dictionary is to be updated multiple times throughout its “life” through import from an Excel spreadsheet, it makes more sense to consider all potential cases by analyzing the data in each column, adding some “buffer” for column length, and defining the validation of data entered “manually” through the Metastudio DRM interface by adding appropriate validators for each column.

At this stage, it is also worth considering the addition of technical columns, including at least a column with a primary key: for popular databases (Oracle, Ms SQL Server), the easiest way is to add an identity column with automatic value generation. If the dictionary is to be audited, columns that store audit information, such as Modified by, Modification date, and optionally Created by and Creation date, should be included. If values in the dictionary are to be historized, appropriate columns should be added, for example, using a slowly changing dimension type 2 (SCD Type 2) model and adding columns for the start date and end date of the record’s validity period. If values in the dictionary are to be validated and the result of this validation is to be stored in the dictionary, a column with the validation status should be added.

If the dictionary will only be modified through the Metastudio DRM interface (manual editing, import from Excel), it is not recommended to use additional database restrictions (such as CHECK or dedicated triggers). It is more appropriate to use application functions – this will avoid potential “collisions” between definitions at the database and Metastudio DRM levels.

Considerations for registering a dictionary in Metastudio

When registering a dictionary in Metastudio, one should remember to verify column types that were set automatically during the reading of table metadata, including verification of the primary key correctness, verification of automatically set column types (for example, DATE type in Oracle is registered as DATETIME, which may not be desirable), correct definition of audit columns, and so on. In the next step, one should take care of defining properties such as validity periods or default values. If the dictionary is to be filled only once, the first step should be to import data from Excel, and only in the next stage perform activities related to dictionary parameterization, such as

  • setting column validators,
  • adding domains,
  • and setting default formatting.

Common Problems When Importing Data from Excel into Metastudio DRM

Importing data from an Excel spreadsheet where the data has been maintained so far can be challenging. The most common problems encountered during the import process are:

  1. Formulas in the columns we want to import: the data itself that will be imported is located in one of the XML files that make up the “package” in the form of a file that we see in the operating system as .xlsx (to check this, just change the extension of the spreadsheet from .xlsx to .zip and check the effect). If there is a formula in the column instead of a value, the mechanism for retrieving data from the spreadsheet sees the content as a string, and the import will probably fail.
  2. Data type errors, usually involving dates or numbers stored as text. How can we tell? When we open such a spreadsheet, we will see a warning message (a green triangle in the corner of the cell) next to the problematic cell. For example, if a cell has a value of 23.45 but is stored as text, the decimal separator will not be recognized correctly during import because the XML file always uses a “dot” as the decimal separator, while the “comma” we see when we open Excel is only the format imposed for display.
  3. Hidden columns that may cause problems when matching imported data to the dictionary structure.
  4. Various notes, comments entered directly into the cell value.

To sum up, before starting the import process, it is worth thoroughly reviewing the spreadsheet to be imported (or rather the specific worksheet containing the data we want to import into the dictionary) to identify and remove the above problems. This will save us a lot of time and possibly nerves when trying to import data into the dictionary.

Ta strona używa plików Cookies. Dowiedź się więcej o celu ich używania i możliwości zmiany ustawień Cookies w przeglądarce.