Database Management

With the DB process, it will be possible to carry out data extraction operations, data loading, deletion, and cleaning of GE.db tables

The available tables are:

datasource
connector
term_group
term_category
term
ds_column
prefix
wordterm
termmap
wordmap

Python function

get_data

The get_data() function allows extracting data from the GE database and loading this data into a Pandas DataFrame structure or CSV File.

It has an intelligent filter mechanism that allow you to perform data selections simply through a conversion layer of function arguments and SQL syntax. This allows the same input arguments regardless of implemented database management system.

Parameters:

Only the table parameter will be mandatory, the others being optional, and will model the data output. In the case of only informing the table, the function will return a DataFrame with all the columns and values of the table.

table: str
datasource, connector, ds_column, term_group, term_category, term, prefix, wordterm, termmap, wordmap

path: str
With this parameter, the function will save the selected data in a file in the directory informed as the parameter argument. In this scenario, data will not be returned in the form of a Dataframe; only a Boolean value will be returned, informing whether the file was generated or not

columns: list[“str”]
Columns that will be selected for output. They must be informed with the same name as the database. It is possible to load other data from other tables as long as it correlate. For example, suppose the table only has the term field and not the category field. In that case, you can inform as an argument: “term_id__term_category_id__category”, the system selected the ID of the term, consulted the ID of the category in the Term table, and went to the Category table to choose the category

columns_out: list[“str”]
If you want to rename the header of the output fields to more familiar names, you can use this parameter, passing the desired names in the same sequential sequence in the parameter columns

datasource: Dict{“str”:list[”str”]}
Filter argument. It is used to filter datasource, with the dictionary key being the selection argument and the dictionary value being the datasources selected as the filter. Without this parameter, the function will return all datasources

connector: Dict{“str”:list[”str”]}
Filter argument. It uses the same logic as the datasource, but applied to the connector field

word: Dict{“str”:list[”str”]}
Filter argument. It uses the same logic as the datasource, but applied to the word field

term: Dict{“str”:list[”str”]}
Filter argument. It uses the same logic as the datasource, but applied to the term field

term_category: Dict{“str”:list[”str”]}
Filter argument. It uses the same logic as the datasource, but applied to the term_categorty field

term_group: Dict{“str”:list[”str”]}
Filter argument. It uses the same logic as the datasource, but applied to the term_group field

Return:

Pandas Dataframe or Boolean (If the parameter path is informed, the function will generate the file; if successful, it will return the TRUE. Otherwise, it will return FALSE)

Examples:
>>> from igem.ge import db
>>> db.get_data(
      table=”datasource”,
      datasource={“datasource__in”: [“ds_01”,”ds_02”]},
      columns=[“id”,”datasource”],
      columns_out=[“Datasource ID”, “Datasource Name”],
      path=”{your_path}/datasource.csv”
      )
>>> df = db.get_data(
      table="connector",
      connector={"connector__start": ["conn_ds"]},
      datasource={"datasource_id__datasource__in": ["ds_01"]},
      columns=["connector", "status"]
      )
>>> x = db.get_data(
      table="termmap",
      term={"term_id__term": "chem:c112297"},
      path="{your_path},
      )
   If x:
      print("file created")

load_data

Loads data from a CSV file into the IGEM database. This process does not update existing data, it only inserts new records.

Parameters:

table: str
datasource, connector, ds_column, term_group, term_category, term, prefix, wordterm, termmap, wordmap

path: str
full path and file name to load

Layout of data file:

Datasource:
(datasource, description, category, website)

Connector:
(connector, datasource, description, update_ds, source_path, source_web, source_compact, source_file_name, source_file_format, source_file_sep, source_file_skiprow, target_file_name, target_file_format)

Ds_column:
(connector, status, column_number, column_name, pre_value, single_word)

Term_group:
(term_group, description)

Term_category:
(term_category, description)

Term:
(term, category, group, description)

Prefix:
(pre_value)

Wordterm:
(term, word, status, commute)

Termmap:
(ckey, connector, term_1, term_2, qtd_links)

Wordmap:
(cword, datasource, connector, term_1, term_2, word_1, word_2, qtd_links)

We can generate an example file with the get_data() function and manipulate and load it with the new data.

Return:

Boolean: (TRUE if the process occurred without errors and FALSE if had some errors).

Examples:
>>> from igem.ge import db
>>> db.load_data(
      table="datasource”
      path=”{your_path}/datasource.csv”
      )

delete_data

Allows deleting a record from the given table. The deletion will be carried out in all records related to the informed parameter. For example, if we delete a datasource, the connectors, ds_columns, and termmap associated with the datasource will be deleted.

Parameters:

Only the table parameter will always be requested, the others will depend on the selected table, functioning as a record that will be eliminated.

table: str

(datasource, connector, ds_column, term_group, term_category, term, prefix, wordterm, termmap, wordmap, workflow) - datasource: Dict{“str”:list[”str”]} - connector: Dict{“str”:list[”str”]} - word: Dict{“str”:list[”str”]} - term: Dict{“str”:list[”str”]} - term_category: Dict{“str”:list[”str”]} - term_group: Dict{“str”:list[”str”]} - prefix: Dict{“str”:list[”str”]}

(Filter argument. It is used to filter the field, with the dictionary key being the selection argument and the dictionary value being the field selected as the filter. Without this parameter, the function will return all values of the field.)

Return:

Boolean: (TRUE if the process occurred without errors and FALSE if had some errors).

Examples:
>>> from igem.ge import db
>>> db.delete_data(
      table='datasource',
      datasource={'datasource__in': [ds_01]}
      )

truncate_table

will delete all records from a table, never use this function, with excess if the need is to restart a new instance of the database, free up log table space or in test environments.

Parameters:

table: str
(datasource, connector, dst, term_group, term_category, term, prefix, wordterm, termmap, wordmap, workflow, logs)

If inform table=”all”, the function will truncate all table on GE database. The other tables of the IGEM system will be maintained.

Return:

Boolean: (TRUE if the process occurred without errors and FALSE if had some errors).

Examples:
>>> from igem.ge import db
>>> db.truncate_table(
         table='datasource'
         )

backup

Backup the database with the internal keys. It can be performed at once for all GE.db tables

Parameters:

table: str
(datasource, connector, dst, term_group, term_category, term, prefix, wordterm, termmap, wordmap, workflow, logs)

path_out: str
Folder path to store the generated backup files

If inform table=”all”, the function will backup all table on GE database.

Return:

Boolean: (TRUE if the process occurred without errors and FALSE if had some errors).

Examples:
>>> import igem
>>> igem.ge.db.backup(
         table="",
         path_out="/root/back")

restore

Restore the database with the internal keys. It can be performed at once for all GE.db tables

Parameters:

table: str
(datasource, connector, dst, term_group, term_category, term, prefix, wordterm, termmap, wordmap, workflow, logs)

path_out: str
Folder path to store the generated backup files

If inform table=”all”, the function will restore all table on GE database.

Return:

Boolean: (TRUE if the process occurred without errors and FALSE if had some errors).

Examples:
>>> import igem
>>> igem.ge.db.restore(
         table="",
         path_out="/root/back")

Command Line

Within the parameters, inform the same ones used for the functions, as well as the arguments, example:

$ $ python manage.py db --get_data 'table="datasource", datasource={“datasource__in”: [“ds_01”,”ds_02”]}'

Get data:

$ python manage.py db --get_data {parameters}

Load data:

$ python manage.py db --load_data {parameters}

Delete data:

$ python manage.py db --delete_data {parameters}

Delete all table:

$ python manage.py db --truncate_table {parameters}

Backup (get data with internal ID):

$ python manage.py db --backup {parameters}

Restore (load data with internal ID):

$ python manage.py db --restore {parameters}