critcatworks.database package

Submodules

critcatworks.database.extdb module

critcatworks.database.extdb.fetch_simulations(extdb_connect, simulation_ids)

Fetches simulation records by simulation id

Parameters

extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.
simulation_ids (1D ndarray) – unique identifiers of the simulation collection.

Returns

documents of the simulation collection

Return type

list

critcatworks.database.extdb.gather_all_atom_types(calc_ids, simulations)

Helper function to determine all atom types in the dataset

Parameters

calc_ids (list) – ids of the simulation collection
simulations (list) – simulation documents

Returns

a sorted unique list of atomic numbers in the dataset

Return type

list
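The idea behind this helper can be sketched without a database. The function below is a hypothetical stand-in, not the library's implementation: it assumes that each simulation document carries its id under `_id` and its atomic numbers under `atoms["numbers"]`, and it uses plain lists instead of numpy arrays.

```python
def collect_atom_types(calc_ids, simulations):
    """Return the sorted unique atomic numbers of the selected documents.

    Hypothetical stand-in for gather_all_atom_types. Assumes every document
    has an "_id" field and stores atomic numbers in atoms["numbers"].
    """
    wanted = set(calc_ids)
    atom_types = set()
    for document in simulations:
        if document["_id"] in wanted:
            atom_types.update(int(z) for z in document["atoms"]["numbers"])
    return sorted(atom_types)


# Two made-up simulation documents: a Pt2H and a CuPtO structure.
simulations = [
    {"_id": 1, "atoms": {"numbers": [78, 78, 1]}},
    {"_id": 2, "atoms": {"numbers": [29, 78, 8]}},
]

atom_types = collect_atom_types([1, 2], simulations)
print(atom_types)
```

A sorted, duplicate-free list like this is convenient when a descriptor needs one fixed species list for the whole dataset.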

critcatworks.database.extdb.get_external_database(extdb_connect)

A helper function to connect to a mongodb database.

Parameters: extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.
Returns: address to database
Return type: pymongo object
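As an illustration of the extdb_connect dictionary, the sketch below builds a standard MongoDB connection URI from the five documented keys. The helper `build_mongo_uri` and all values (host, user, password, database name) are placeholders invented for this example; how the library itself opens the connection is not shown here.

```python
from urllib.parse import quote_plus


def build_mongo_uri(extdb_connect):
    """Assemble a MongoDB connection URI (hypothetical helper).

    Username and password are percent-encoded, which the MongoDB URI
    format requires for characters such as '@' and '/'.
    """
    return "mongodb://{user}:{password}@{host}/{db}?authSource={auth}".format(
        user=quote_plus(extdb_connect["username"]),
        password=quote_plus(extdb_connect["password"]),
        host=extdb_connect["host"],
        db=extdb_connect["db_name"],
        auth=extdb_connect["authsource"],
    )


# Placeholder credentials, for illustration only.
extdb_connect = {
    "host": "db.example.org:27017",
    "username": "alice",
    "password": "p@ss/word",
    "authsource": "admin",
    "db_name": "testdb",
}

uri = build_mongo_uri(extdb_connect)
print(uri)
```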

critcatworks.database.extdb.update_machine_learning_collection(method, extdb_connect, workflow_id=-1, method_params={}, descriptor='soap', descriptor_params={}, training_set=[], validation_set=[], test_set=[], prediction_set=[], metrics_training={}, metrics_validation={}, metrics_test={}, output={}, **kwargs)

A new document is added to the machine_learning collection of the mongodb database. It contains records of all types of workflows. The documents should be in a specific format. Any arguments can be specified; however, the arguments below should be given consistently to allow for comprehensive database querying.

Parameters

workflow_id (int) – ID of workflow which the machine learning run was part of
method (str) – name of the ML method: krr, nn, …
method_params (dict) – Parameters of the method
descriptor (str) – name of the descriptor: soap, mbtr, lmbtr, cm, …
descriptor_params (dict) – Parameters of the descriptor used
training_set (1D ndarray) – list of simulation IDs used for training
validation_set (1D ndarray) – list of simulation IDs used in validation. If empty, cross-validation was used.
test_set (1D ndarray) – list of simulation IDs used in testing. If empty, only validation was used
prediction_set (1D ndarray) – list of simulation IDs used for prediction.
metrics_training (dict) – dictionary of ("metric name": value) pairs on the training set. key (str): name of the metric; value (float): calculated value
metrics_validation (dict) – dictionary of ("metric name": value) pairs on the validation set. key (str): name of the metric; value (float): calculated value
metrics_test (dict) – dictionary of ("metric name": value) pairs on the test set. key (str): name of the metric; value (float): calculated value
output (dict) – relevant training output info

Returns

A dictionary with the provided arguments plus a unique id provided by the database.

Return type

dict
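To show the shape of such a record, here is a minimal sketch that assembles the documented fields into a plain dictionary without touching a database. `build_ml_record` is a hypothetical helper; the metric names and parameter values are made up, and the unique id that the database adds on insertion is not modelled.

```python
import json


def build_ml_record(method, workflow_id=-1, method_params=None,
                    descriptor="soap", descriptor_params=None,
                    training_set=(), validation_set=(), test_set=(),
                    prediction_set=(), metrics_training=None,
                    metrics_validation=None, metrics_test=None,
                    output=None, **kwargs):
    """Collect the documented fields in one dictionary (hypothetical helper)."""
    record = {
        "workflow_id": workflow_id,
        "method": method,
        "method_params": dict(method_params or {}),
        "descriptor": descriptor,
        "descriptor_params": dict(descriptor_params or {}),
        # ID collections are converted to plain lists of ints
        "training_set": [int(i) for i in training_set],
        "validation_set": [int(i) for i in validation_set],
        "test_set": [int(i) for i in test_set],
        "prediction_set": [int(i) for i in prediction_set],
        "metrics_training": dict(metrics_training or {}),
        "metrics_validation": dict(metrics_validation or {}),
        "metrics_test": dict(metrics_test or {}),
        "output": dict(output or {}),
    }
    record.update(kwargs)  # any additional arguments are stored as given
    return record


record = build_ml_record(
    "krr",
    workflow_id=7,
    method_params={"alpha": 1e-3},       # made-up hyperparameter
    training_set=[1, 2, 3],
    test_set=[4],
    metrics_training={"mae": 0.05},      # made-up metric value
    metrics_test={"mae": 0.08},
    note="free-form extra field",
)
print(json.dumps(record, indent=2))
```

The sketch uses `None` and empty tuples as defaults rather than `{}` and `[]`, so that no mutable default object is shared between calls.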

critcatworks.database.extdb.update_simulations_collection(extdb_connect, **kwargs)

A new document is added to the simulations collection of the mongodb database. It contains records of all manipulation steps of a structure, in particular the initial structure, structure after DFT relaxation, structure with added or removed adsorbates, etc. The documents should be in a specific format. Any arguments can be specified; however, the optional arguments below should be given consistently to allow for comprehensive database querying.

Parameters

extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.
source_id (int) – ID of the parent simulation that originated this, -1 if none
workflow_id (int) – ID of workflow when instance was added, -1 if none
wf_sim_id (int) – ID of simulation (unique within the workflow this belongs to)
atoms (dict) – dictionary with information about the atoms. It should be in the following format:

    numbers (1D ndarray) : list of atomic numbers as numpy array [N] of ints
    positions (2D ndarray) : positions as numpy matrix [Nx3] of doubles
    constraints (2D ndarray) : frozen flags, a matrix [Nx3] of int [optional]; 1 = frozen, 0 = free
    pbc (bool) : use periodic boundaries
    cell (2D ndarray) : matrix 3x3 with cell vectors on the rows
    celldisp (1D ndarray) : displacement of cell from origin
    info (dict) : field for additional information related to structure

nanoclusters (list of ATOMS dict) – list of dictionaries with information about the nanocluster(s). The dictionaries should have the following form:

    reference_id (int) : ID of the simulation where this cluster was made, -1 if original
    atom_ids (1D ndarray) : atom indices in the ATOMS dictionary of the simulation record

adsorbates (list of dict) – list of dictionaries with information about the adsorbate(s). The dictionaries should have the following form:

    reference_id (int) : ID of the simulation to use as reference
    atom_ids (1D ndarray) : atom indices in the ATOMS dictionary of the simulation record
    site_class (str) : class of adsorption site: "top", "bridge", "hollow", "4-fold hollow"
    site_ids (1D ndarray) : list of atom ids (in simulation record) that define the adsorption site

substrate (list of dict) – list of dictionaries with information about the substrate(s). The dictionaries should have the following form:

    reference_id (int) : ID of the parent support simulation, -1 if no parent
    atom_ids (1D ndarray) : atom indices in the corresponding ATOMS dictionary

operations (list) – List of dictionaries, each describing one operation. Always with respect to the parent simulation if applicable. The dictionaries can be of arbitrary form.
inp (dict) – property/value pairs describing the simulation input. The dictionary can be of arbitrary form.
output (dict) – property/value pairs output by the calculation. The dictionary can be of arbitrary form.

Returns

A dictionary with the provided arguments plus a unique id provided by the database.

Return type

dict
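The document layout described above can be sketched as a plain Python dictionary. Everything below is illustrative: the geometry is a made-up Pt3 cluster with one H adsorbate, plain lists stand in for numpy arrays, and `check_atoms_dict` is a hypothetical helper that only verifies the documented shapes.

```python
def check_atoms_dict(atoms):
    """Return the number of atoms after basic shape checks (hypothetical helper)."""
    n_atoms = len(atoms["numbers"])
    to_check = {"positions": atoms["positions"]}
    if "constraints" in atoms:  # constraints are optional
        to_check["constraints"] = atoms["constraints"]
    for key, rows in to_check.items():
        if len(rows) != n_atoms or any(len(row) != 3 for row in rows):
            raise ValueError("{} must have shape [Nx3]".format(key))
    cell = atoms["cell"]
    if len(cell) != 3 or any(len(row) != 3 for row in cell):
        raise ValueError("cell must be a 3x3 matrix")
    return n_atoms


atoms = {
    "numbers": [78, 78, 78, 1],
    "positions": [[0.0, 0.0, 0.0], [2.8, 0.0, 0.0],
                  [1.4, 2.4, 0.0], [1.4, 0.8, 1.5]],
    "constraints": [[0, 0, 0] for _ in range(4)],  # 0 = free
    "pbc": False,
    "cell": [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]],
    "celldisp": [0.0, 0.0, 0.0],
    "info": {},
}

simulation = {
    "source_id": -1,      # no parent simulation
    "workflow_id": -1,    # not part of a workflow
    "wf_sim_id": 0,
    "atoms": atoms,
    "nanoclusters": [{"reference_id": -1, "atom_ids": [0, 1, 2]}],
    "adsorbates": [{"reference_id": -1, "atom_ids": [3],
                    "site_class": "hollow", "site_ids": [0, 1, 2]}],
    "substrate": [],
    "operations": [],
    "inp": {},
    "output": {},
}

n_atoms = check_atoms_dict(simulation["atoms"])
print(n_atoms)
```

Note that atom_ids and site_ids index into the atoms dictionary of the same record, so every index has to be smaller than the number of atoms.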

critcatworks.database.extdb.update_workflows_collection(username, password, creation_time, extdb_connect, parameters={}, name='UNNAMED', workflow_type='NO_TYPE', **kwargs)

A new document is added to the workflows collection of the mongodb database, usually at the beginning of the workflow run. It contains records of all types of workflows. The documents should be in a specific format. Any arguments can be specified; however, the arguments below should be given consistently to allow for comprehensive database querying.

Parameters

extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.
username (str) – user who executed the workflow
creation_time (str) – time of creation of the workflow
parameters (dict) – workflow-specific parameters
name (str) – custom name of workflow
workflow_type (str) – custom type of workflow

Returns

Contains the keys username, name, workflow_type, creation_time, parameters and _id, the latter being a unique id provided by the database.

Return type

dict

critcatworks.database.format module

critcatworks.database.format.adsorbate_pos_to_atoms_lst(adspos, adsorbate_name)

Helper function to turn adsorbate positions into ase.Atoms objects; the species is defined by adsorbate_name. Attention: it works with only one adsorbate atom. In the future, cluskit might generalize to return a list of adsorbates already in ase format.

Parameters

adspos (2D ndarray) – positions of the adsorbate atoms
adsorbate_name (str) – chemical symbol of the adsorbate atoms

Returns

ase.Atoms objects of single atoms at each position

Return type

list

critcatworks.database.format.ase_to_atoms_dict(atoms)

Helper function to convert an ase.Atoms object into its corresponding python dictionary

Parameters: atoms (ase.Atoms) – ase.Atoms object
Returns: Corresponding python dictionary
Return type: dict

critcatworks.database.format.atoms_dict_to_ase(atoms_dict)

Helper function to convert an ATOMS dictionary into an ase.Atoms object

Parameters

atoms_dict (dict) – dictionary with information about the atoms. It should be in the following format:

    numbers (1D ndarray) : list of atomic numbers as numpy array [N] of ints
    positions (2D ndarray) : positions as numpy matrix [Nx3] of doubles
    constraints (2D ndarray) : frozen flags, a matrix [Nx3] of int [optional]; 1 = frozen, 0 = free
    pbc (bool) : use periodic boundaries
    cell (2D ndarray) : matrix 3x3 with cell vectors on the rows
    celldisp (1D ndarray) : displacement of cell from origin
    info (dict) : field for additional information related to structure

Returns

Corresponding ase.Atoms object

Return type

ase.Atoms

critcatworks.database.format.join_cluster_adsorbate(cluster, adsorbate)

Helper function to merge the structures cluster and adsorbate while retaining information about the ids

Parameters

cluster (ase.Atoms) – nanocluster structure
adsorbate (ase.Atoms) – single adsorbate

Returns

ase.Atoms object of merged structure, ids of the nanocluster, ids of the adsorbate

Return type

tuple
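The bookkeeping behind such a merge can be sketched with plain dictionaries instead of ase.Atoms objects. The function below is a hypothetical illustration, not the library's code: it assumes that the adsorbate atoms are appended after the cluster atoms, so the cluster keeps its indices and the adsorbate receives the following ones.

```python
def join_structures(cluster, adsorbate):
    """Merge two structures and report which indices belong to which part.

    Hypothetical stand-in for join_cluster_adsorbate. Structures are plain
    dicts with "numbers" and "positions"; the adsorbate is appended last.
    """
    n_cluster = len(cluster["numbers"])
    n_adsorbate = len(adsorbate["numbers"])
    merged = {
        "numbers": list(cluster["numbers"]) + list(adsorbate["numbers"]),
        "positions": [list(p) for p in cluster["positions"]]
                     + [list(p) for p in adsorbate["positions"]],
    }
    cluster_ids = list(range(n_cluster))
    adsorbate_ids = list(range(n_cluster, n_cluster + n_adsorbate))
    return merged, cluster_ids, adsorbate_ids


# Made-up Pt2 cluster and a single H adsorbate.
cluster = {"numbers": [78, 78], "positions": [[0.0, 0.0, 0.0], [2.8, 0.0, 0.0]]}
adsorbate = {"numbers": [1], "positions": [[1.4, 0.0, 1.5]]}

merged, cluster_ids, adsorbate_ids = join_structures(cluster, adsorbate)
print(merged["numbers"], cluster_ids, adsorbate_ids)
```

Index lists of this kind are what the atom_ids fields of the nanoclusters and adsorbates entries in a simulation document refer to.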

critcatworks.database.format.read_descmatrix(fw_spec)

Helper function to read a descriptor matrix required for machine learning. It is stored as a file, since large arrays make fireworks slow.

Parameters

fw_spec (dict) – Only the key 'descmatrix' is read. It expects a string with the absolute path to the file.

Returns

descriptor matrix with M features x N datapoints

Return type

2D np.ndarray

critcatworks.database.format.write_descmatrix(descmatrix)

Helper function to write a descriptor matrix required for machine learning. It is stored as a file, since large arrays make fireworks slow.

Parameters: descmatrix (2D np.ndarray) – descriptor matrix with M features x N datapoints
Returns: absolute path to file
Return type: str
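The pattern behind read_descmatrix and write_descmatrix, storing a large array in a file and passing only its path through the fw_spec, can be sketched with the standard library. The file format (JSON), the helper names and the temporary-file location are assumptions made for this example; the on-disk format the library actually uses is not documented here.

```python
import json
import os
import tempfile


def write_matrix(matrix):
    """Write a nested list to a temporary file and return its absolute path."""
    fd, path = tempfile.mkstemp(prefix="descmatrix_", suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump(matrix, handle)
    return os.path.abspath(path)


def read_matrix(fw_spec):
    """Read the matrix back; only the key 'descmatrix' of fw_spec is used."""
    with open(fw_spec["descmatrix"]) as handle:
        return json.load(handle)


matrix = [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]]

# Only the short path string travels in the spec, not the array itself.
fw_spec = {"descmatrix": write_matrix(matrix)}
restored = read_matrix(fw_spec)

os.remove(fw_spec["descmatrix"])  # clean up the temporary file
print(restored)
```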

critcatworks.database.mylaunchpad module

critcatworks.database.mylaunchpad.create_launchpad(username, password, server='serenity', lpadname=None)

Creates the fireworks launchpad on specific preset servers.

Parameters

username (str) – username for the mongodb database
password (str) – password for the mongodb database
server (str) – server name: "serenity" (default) or "atlas"
lpadname (str) – name of the fireworks internal database. If not given, the name is inferred.

Returns

Launchpad for internal fireworks use.

Return type

fireworks object

critcatworks.database.read module

class critcatworks.database.read.NCReadTask(*args, **kwargs)

Bases: fireworks.core.firework.FiretaskBase

Task to read nanocluster structures from xyz files.

Parameters

path (str) – Absolute path to a directory containing structures readable by ASE
cell_factor (float) – enlarges cell size to x times the diameter of the structure

Returns

Firework action, update fw_spec

Return type

FWAction

optional_params = ['cell_factor']

required_params = ['path']

run_task(fw_spec)

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters: fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special "_fw_env" key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, "foo" may be named "foo1" in resource 1 and "foo2" in resource 2. The FWorker env can specify {"foo": "foo1"}, which maps an abstract variable "foo" to the relevant "foo1" or "foo2". You can then write a task that uses fw_spec["_fw_env"]["foo"] that will work across all these multiple resources.
Returns: (FWAction)

class critcatworks.database.read.NCStartFromDatabaseTask(*args, **kwargs)

Bases: fireworks.core.firework.FiretaskBase

Task to set up starting structures from nanoclusters (ASE atoms objects).

Parameters

db_ids_lst (list) – list of simulation ids in external database
ext_db (pymongo) – external database pymongo object. Defaults to using extdb_connect (dictionary containing the keys host, username, password, authsource and db_name).

Returns

Firework action, update fw_spec

Return type

FWAction

optional_params = []

required_params = ['db_ids_lst', 'ext_db']

run_task(fw_spec)

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters: fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special "_fw_env" key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, "foo" may be named "foo1" in resource 1 and "foo2" in resource 2. The FWorker env can specify {"foo": "foo1"}, which maps an abstract variable "foo" to the relevant "foo1" or "foo2". You can then write a task that uses fw_spec["_fw_env"]["foo"] that will work across all these multiple resources.
Returns: (FWAction)

class critcatworks.database.read.NCStartFromStructuresTask(*args, **kwargs)

Bases: fireworks.core.firework.FiretaskBase

Task to set up starting structures from nanoclusters (ASE atoms objects).

Parameters: ase_atoms_lst (list) – list of ASE atoms objects in dictionary format
Returns: Firework action, update fw_spec
Return type: FWAction

required_params = ['ase_atoms_lst']

run_task(fw_spec)

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters: fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special "_fw_env" key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, "foo" may be named "foo1" in resource 1 and "foo2" in resource 2. The FWorker env can specify {"foo": "foo1"}, which maps an abstract variable "foo" to the relevant "foo1" or "foo2". You can then write a task that uses fw_spec["_fw_env"]["foo"] that will work across all these multiple resources.
Returns: (FWAction)

critcatworks.database.read.read_structures(path, spec={}, cell_factor=2.5)

Sets up Firework to read nanocluster structures from structure files (e.g. xyz). In the second line of the input file, it looks for the keywords E, energy, total_energy, TotalEnergy, totalenergy. It stores the first value found in the field output.total_energy.

The structures are stored in individual documents of the simulation collection.

Parameters

path (str) – absolute path to the directory where the structure files (e.g. xyz format) can be found.
spec (dict) – optional additional entries for the fw_spec
cell_factor (float) – enlarges cell size to x times the diameter of the structure

Returns

NCReadWork Firework

Return type

Firework
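The energy lookup in the second line of a structure file can be sketched as follows. This is an assumption-laden illustration rather than the library's parser: it assumes the comment line uses a `key=value` or `key: value` style and that "first value found" means the first keyword, in the listed order, that has a number attached. `parse_total_energy` is a hypothetical name.

```python
import re

# Keywords in the order given in the description above.
ENERGY_KEYS = ("E", "energy", "total_energy", "TotalEnergy", "totalenergy")
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"


def parse_total_energy(comment_line):
    """Return the energy found in an xyz comment line, or None."""
    for key in ENERGY_KEYS:
        # The lookbehind keeps a short key such as "E" from matching
        # inside a longer word or after an underscore.
        pattern = r"(?<![A-Za-z_])" + re.escape(key) + r"\s*[=:]?\s*(" + NUMBER + r")"
        match = re.search(pattern, comment_line)
        if match:
            return float(match.group(1))
    return None


# A made-up two-atom xyz file; the second line is the comment line.
xyz_text = """2
total_energy=-42.5 source=example
Pt 0.0 0.0 0.0
Pt 2.8 0.0 0.0
"""

energy = parse_total_energy(xyz_text.splitlines()[1])
print(energy)
```

Under the description above, a value found this way would end up in the field output.total_energy of the simulation document.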

critcatworks.database.read.read_structures_locally(path, cell_factor=2.5)

Helper function to read structures locally. Can be used within a firework or outside.

Parameters

path (str) – absolute path to the directory where the structure files (e.g. xyz format) can be found.
cell_factor (float) – enlarges cell size to x times the diameter of the structure

Returns

list of ase.Atoms objects with a manipulated cellsize field.

Return type

list
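The effect of cell_factor can be illustrated with a small sketch. It assumes that the diameter is the largest interatomic distance and that the resulting cell is cubic; neither detail is stated in this reference, and both helper names are hypothetical.

```python
import itertools
import math


def cluster_diameter(positions):
    """Largest distance between any two atoms (assumed definition of diameter)."""
    if len(positions) < 2:
        return 0.0
    return max(math.dist(a, b) for a, b in itertools.combinations(positions, 2))


def cubic_cell(positions, cell_factor=2.5):
    """3x3 cell matrix with side length cell_factor times the diameter."""
    side = cell_factor * cluster_diameter(positions)
    return [[side, 0.0, 0.0], [0.0, side, 0.0], [0.0, 0.0, side]]


# Made-up three-atom structure; the two outermost atoms are 5.0 apart.
positions = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [1.0, 1.0, 0.0]]

diameter = cluster_diameter(positions)
cell = cubic_cell(positions, cell_factor=2.5)
print(diameter, cell[0][0])
```

With the default factor of 2.5, a structure of diameter 5.0 gets a cell of side 12.5, which leaves empty space around the cluster on all sides.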

critcatworks.database.read.start_from_database(db_ids_lst, ext_db=None, spec={})

Sets up Firework to retrieve nanocluster structures from the simulation collection of the mongodb database. In atoms.info, it looks for the keywords E, energy, total_energy, TotalEnergy, totalenergy. It stores the first value found in the field output.total_energy.

The structures are stored in individual documents of the simulation collection.

critcatworks.database.read.start_from_structures(ase_atoms_lst, spec={})

Sets up Firework to read nanocluster structures from ASE atoms objects. The structures are copied to new individual documents of the simulation collection. References to the current workflow, the parent nanocluster and the source are updated.

Parameters

ase_atoms_lst (list) – list of ASE atoms objects in dictionary format
spec (dict) – optional additional entries for the fw_spec

Returns

Firework action, update fw_spec

Return type

FWAction

critcatworks.database.update module

class critcatworks.database.update.InitialTask(*args, **kwargs)

Bases: fireworks.core.firework.FiretaskBase

Custom Firetask to initialize a new workflow instance in the database. Additionally, initializes a few entries in the fw_spec.

optional_params = ['extdb_connect']

required_params = ['username', 'password', 'parameters', 'name', 'workflow_type']

run_task(fw_spec)

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters: fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special "_fw_env" key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, "foo" may be named "foo1" in resource 1 and "foo2" in resource 2. The FWorker env can specify {"foo": "foo1"}, which maps an abstract variable "foo" to the relevant "foo1" or "foo2". You can then write a task that uses fw_spec["_fw_env"]["foo"] that will work across all these multiple resources.
Returns: (FWAction)

critcatworks.database.update.initialize_workflow_data(username, password, parameters, name='UNNAMED', workflow_type='UNNAMED', extdb_connect={})

Creates a custom Firework object to initialize the workflow. It updates the workflow collection and makes a few entries in the fw_spec.

Parameters

username (str) – username for the mongodb database
password (str) – password for the mongodb database
parameters (dict) – workflow-specific input parameters
name (str) – custom name of the workflow
workflow_type (str) – custom workflow type
extdb_connect (dict) – dictionary optionally containing the keys host, authsource and db_name. All fields have a default value.

Returns

InitialWork

Return type

Firework object
