critcatworks.database package

Submodules

critcatworks.database.extdb module

critcatworks.database.extdb.fetch_simulations(extdb_connect, simulation_ids)[source]

Fetches simulation records by simulation id

Parameters
  • extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.

  • simulation_ids (1D ndarray) – unique identifiers of the simulation collection.

Returns

documents of the simulation collection

Return type

list
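
As a minimal sketch, the extdb_connect dictionary described above could be assembled and checked before it is handed to fetch_simulations. The helper check_extdb_connect and all values below are hypothetical and not part of critcatworks; only the five key names come from the parameter description.

```python
# Keys named in the parameter description of extdb_connect.
REQUIRED_KEYS = ("host", "username", "password", "authsource", "db_name")


def check_extdb_connect(extdb_connect):
    """Return the sorted list of required keys missing from the dictionary.

    Hypothetical helper for illustration; not part of critcatworks.
    """
    return sorted(key for key in REQUIRED_KEYS if key not in extdb_connect)


# Placeholder values; replace them with the details of your own MongoDB server.
extdb_connect = {
    "host": "mongodb.example.org",
    "username": "myuser",
    "password": "mypassword",
    "authsource": "admin",
    "db_name": "testdb",
}

missing = check_extdb_connect(extdb_connect)
print(missing)
```

An empty result means all five keys are present; the dictionary could then be passed on together with the array of simulation ids.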

critcatworks.database.extdb.gather_all_atom_types(calc_ids, simulations)[source]

Helper function to determine all atom types in the dataset

Parameters
  • calc_ids (list) – ids of the simulation collection

  • simulations (list) – simulation documents

Returns

a sorted unique list of atomic numbers in the dataset

Return type

list
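
The idea behind this helper can be sketched in plain Python. The document layout assumed here (an "_id" field and an "atoms" dictionary with a "numbers" entry) is an assumption based on the simulation format described for update_simulations_collection, and unique_atom_types is a hypothetical stand-in; the actual implementation may select and index documents differently.

```python
def unique_atom_types(calc_ids, simulations):
    """Collect the sorted, unique atomic numbers of the selected simulations.

    Assumes each document has an "_id" and an "atoms" dictionary with a
    "numbers" entry; this layout is an assumption for illustration.
    """
    wanted = set(calc_ids)
    atom_types = set()
    for doc in simulations:
        if doc["_id"] in wanted:
            atom_types.update(doc["atoms"]["numbers"])
    return sorted(atom_types)


simulations = [
    {"_id": 1, "atoms": {"numbers": [78, 78, 1]}},  # Pt2H
    {"_id": 2, "atoms": {"numbers": [29, 29, 8]}},  # Cu2O
    {"_id": 3, "atoms": {"numbers": [79]}},         # Au, not selected below
]
print(unique_atom_types([1, 2], simulations))
```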

critcatworks.database.extdb.get_external_database(extdb_connect)[source]

A helper function to connect to a mongodb database.

Parameters

extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.

Returns

address to database

Return type

pymongo object

critcatworks.database.extdb.update_machine_learning_collection(method, extdb_connect, workflow_id=-1, method_params={}, descriptor='soap', descriptor_params={}, training_set=[], validation_set=[], test_set=[], prediction_set=[], metrics_training={}, metrics_validation={}, metrics_test={}, output={}, **kwargs)[source]

A new document is added to the machine_learning collection of the mongodb database. It contains records of all types of workflows. The documents should be in a specific format. Any arguments can be specified; however, certain arguments below should be given consistently to allow for comprehensive database querying.

Parameters
  • workflow_id (int) – ID of workflow which the machine learning run was part of

  • method (str) – name of the ML method: krr, nn, …

  • method_params (dict) – Parameters of the method

  • descriptor (str) – name of the descriptor: soap, mbtr, lmbtr, cm, …

  • descriptor_params (dict) – Parameters of the descriptor used

  • training_set (1D ndarray) – list of simulation IDs used for training

  • validation_set (1D ndarray) – list of simulation IDs used in validation. If empty, cross-validation was used.

  • test_set (1D ndarray) – list of simulation IDs used in testing. If empty, only validation was used

  • prediction_set (1D ndarray) – list of simulation IDs used for prediction.

  • metrics_training (dict) – dictionary of (“metric name”: value) pairs on the training set. Each key (str) is the name of a metric; each value (float) is the calculated value.

  • metrics_validation (dict) – dictionary of (“metric name”: value) pairs on the validation set. Each key (str) is the name of a metric; each value (float) is the calculated value.

  • metrics_test (dict) – dictionary of (“metric name”: value) pairs on the test set. Each key (str) is the name of a metric; each value (float) is the calculated value.

  • output (dict) – relevant training output info

Returns

A dictionary with the provided arguments plus a unique id provided by the database.

Return type

dict
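
To illustrate which fields are worth filling in consistently, the following sketch assembles such a record as a plain dictionary. The helper build_ml_record is hypothetical: it only mirrors the documented argument names, does not contact a database, and therefore adds no database id. The parameter and metric names in the example call (alpha, mae) are placeholders.

```python
def build_ml_record(method, workflow_id=-1, method_params=None,
                    descriptor="soap", descriptor_params=None,
                    training_set=(), validation_set=(), test_set=(),
                    prediction_set=(), metrics_training=None,
                    metrics_validation=None, metrics_test=None,
                    output=None, **kwargs):
    """Assemble a machine_learning-style record as a plain dictionary.

    Hypothetical helper; it mirrors the documented argument names only.
    """
    record = {
        "method": method,
        "workflow_id": workflow_id,
        "method_params": dict(method_params or {}),
        "descriptor": descriptor,
        "descriptor_params": dict(descriptor_params or {}),
        # Store the id collections as plain lists of ints.
        "training_set": [int(i) for i in training_set],
        "validation_set": [int(i) for i in validation_set],
        "test_set": [int(i) for i in test_set],
        "prediction_set": [int(i) for i in prediction_set],
        "metrics_training": dict(metrics_training or {}),
        "metrics_validation": dict(metrics_validation or {}),
        "metrics_test": dict(metrics_test or {}),
        "output": dict(output or {}),
    }
    record.update(kwargs)  # any extra fields are stored as given
    return record


record = build_ml_record(
    "krr",
    workflow_id=12,
    method_params={"alpha": 1e-3},
    training_set=[1, 2, 3],
    test_set=[4],
    metrics_test={"mae": 0.12},
)
# An empty validation set signals that cross-validation was used.
print(record["validation_set"])
```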

critcatworks.database.extdb.update_simulations_collection(extdb_connect, **kwargs)[source]

A new document is added to the simulations collection of the mongodb database. It contains records of all manipulation steps of a structure, in particular the initial structure, the structure after DFT relaxation, the structure with added or removed adsorbates, etc. The documents should be in a specific format. Any arguments can be specified; however, the optional arguments below should be given consistently to allow for comprehensive database querying.

Parameters
  • extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.

  • source_id (int) – ID of the parent simulation that originated this, -1 if none

  • workflow_id (int) – ID of workflow when instance was added, -1 if none

  • wf_sim_id (int) – ID of simulation (unique within the workflow this belongs to)

  • atoms (dict) –

    Dictionary with information about the atoms. It should be in the following format:

    numbers (1D ndarray) : list of atomic numbers as numpy array [N] of ints
    positions (2D ndarray) : positions as numpy matrix [Nx3] of doubles
    constraints (2D ndarray) : frozen flags, a matrix [Nx3] of int [optional]; 1 = frozen, 0 = free
    pbc (bool) : use periodic boundaries
    cell (2D ndarray) : matrix 3x3 with cell vectors on the rows
    celldisp (1D ndarray) : displacement of cell from origin
    info (dict) : field for additional information related to structure

  • nanoclusters (list of ATOMS dict) –

    List of dictionaries with information about the nanocluster(s). The dictionaries should have the following form:

    reference_id (int) : ID of the simulation where this cluster was made, -1 if original
    atom_ids (1D ndarray) : atom indices in the ATOMS dictionary of the simulation record

  • adsorbates (list of dict) –

    List of dictionaries with information about the adsorbate(s). The dictionaries should have the following form:

    reference_id (int) : ID of the simulation to use as reference
    atom_ids (1D ndarray) : atom indices in the ATOMS dictionary of the simulation record
    site_class (str) : class of adsorption site: “top”, “bridge”, “hollow”, “4-fold hollow”
    site_ids (1D ndarray) : list of atom ids (in simulation record) that define the adsorption site

  • substrate (list of dict) –

    List of dictionaries with information about the substrate(s). The dictionaries should have the following form:

    reference_id (int) : ID of the parent support simulation, -1 if no parent
    atom_ids (1D ndarray) : atom indices in the corresponding ATOMS dictionary

  • operations (list) – List of dictionaries, each describing one operation. Always with respect to the parent simulation if applicable. The dictionaries can be of arbitrary form.

  • inp (dict) – property/value pairs describing the simulation input. The dictionary can be of arbitrary form.

  • output (dict) – property/value pairs output by the calculation. The dictionary can be of arbitrary form.

Returns

A dictionary with the provided arguments plus a unique id provided by the database.

Return type

dict
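
A simulation document following the format above might look like the sketch below. Plain Python lists stand in for the numpy arrays, the coordinates and the adsorbate reference_id are made-up illustration values, and atom_ids_in_range is a hypothetical consistency check that is not part of critcatworks.

```python
# Pt3 cluster with one H adsorbate. Plain lists stand in for the numpy
# arrays named in the format description; coordinates are illustrative.
atoms = {
    "numbers": [78, 78, 78, 1],
    "positions": [
        [0.0, 0.0, 0.0],
        [2.8, 0.0, 0.0],
        [1.4, 2.4, 0.0],
        [1.4, 0.8, 1.0],
    ],
    "pbc": False,
    "cell": [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]],
    "celldisp": [0.0, 0.0, 0.0],
    "info": {},
}

simulation = {
    "source_id": -1,    # no parent simulation
    "workflow_id": -1,  # not attached to a workflow
    "wf_sim_id": 0,
    "atoms": atoms,
    "nanoclusters": [{"reference_id": -1, "atom_ids": [0, 1, 2]}],
    "adsorbates": [{
        "reference_id": 5,  # hypothetical id of a reference simulation
        "atom_ids": [3],
        "site_class": "hollow",
        "site_ids": [0, 1, 2],
    }],
    "substrate": [],
    "operations": [{"add_adsorbate": 1}],  # arbitrary form
    "inp": {},
    "output": {},
}


def atom_ids_in_range(document):
    """Check that every referenced atom index exists in the atoms dictionary."""
    n_atoms = len(document["atoms"]["numbers"])
    groups = document["nanoclusters"] + document["adsorbates"] + document["substrate"]
    return all(0 <= i < n_atoms for group in groups for i in group["atom_ids"])


print(atom_ids_in_range(simulation))
```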

critcatworks.database.extdb.update_workflows_collection(username, password, creation_time, extdb_connect, parameters={}, name='UNNAMED', workflow_type='NO_TYPE', **kwargs)[source]

A new document is added to the workflows collection of the mongodb database (usually at the beginning of the workflow run). It contains records of all types of workflows. The documents should be in a specific format. Any arguments can be specified; however, certain arguments below should be given consistently to allow for comprehensive database querying.

Parameters
  • extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.

  • username (str) – user who executed the workflow

  • creation_time (str) – time of creation of the workflow

  • parameters (dict) – workflow-specific parameters

  • name (str) – custom name of workflow

  • workflow_type (str) – custom type of workflow

Returns

Contains the keys username, name, workflow_type, creation_time, parameters and _id, the latter being a unique id provided by the database.

Return type

dict
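
The shape of the returned document can be sketched without a database. The helper build_workflow_record is hypothetical, the ISO timestamp format is an assumption (the documentation only says creation_time is a string), the parameter name n_clusters is a placeholder, and the _id key is omitted because it is assigned by the database.

```python
from datetime import datetime, timezone


def build_workflow_record(username, creation_time, parameters=None,
                          name="UNNAMED", workflow_type="NO_TYPE"):
    """Assemble the documented workflow fields as a plain dictionary.

    Hypothetical helper; the database would add the "_id" key on insertion.
    """
    return {
        "username": username,
        "name": name,
        "workflow_type": workflow_type,
        "creation_time": creation_time,
        "parameters": dict(parameters or {}),
    }


# The timestamp format is an assumption of this sketch.
creation_time = datetime.now(timezone.utc).isoformat()
record = build_workflow_record("myuser", creation_time,
                               parameters={"n_clusters": 10})
print(sorted(record))
```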

critcatworks.database.format module

critcatworks.database.format.adsorbate_pos_to_atoms_lst(adspos, adsorbate_name)[source]

Helper function to turn adsorbate positions into ase.Atoms objects, with the species defined by adsorbate_name. Attention: this works only for adsorbates consisting of a single atom. In the future, cluskit might generalize to return a list of adsorbates already in ase format.

Parameters
  • adspos (2D ndarray) – positions of the adsorbate atoms

  • adsorbate_name (str) – chemical symbol of the adsorbate atoms

Returns

ase.Atoms objects of single atoms at each position

Return type

list

critcatworks.database.format.ase_to_atoms_dict(atoms)[source]

Helper function to convert an ase.Atoms object into its corresponding python dictionary

Parameters

atoms (ase.Atoms) – ase.Atoms object

Returns

Corresponding python dictionary

Return type

dict

critcatworks.database.format.atoms_dict_to_ase(atoms_dict)[source]

Helper function to convert an ATOMS dictionary into an ase.Atoms object

Parameters

atoms_dict (dict) –

Dictionary with information about the atoms. It should be in the following format:

numbers (1D ndarray) : list of atomic numbers as numpy array [N] of ints
positions (2D ndarray) : positions as numpy matrix [Nx3] of doubles
constraints (2D ndarray) : frozen flags, a matrix [Nx3] of int [optional]; 1 = frozen, 0 = free
pbc (bool) : use periodic boundaries
cell (2D ndarray) : matrix 3x3 with cell vectors on the rows
celldisp (1D ndarray) : displacement of cell from origin
info (dict) : field for additional information related to structure

Returns

Corresponding ase.Atoms object

Return type

ase.Atoms
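
Before converting, it can help to check that an ATOMS dictionary has the shapes listed above. The following is a minimal sketch with nested lists standing in for numpy arrays; atoms_dict_problems is a hypothetical helper and not part of critcatworks.

```python
def atoms_dict_problems(atoms_dict):
    """Return a list of shape problems found in an ATOMS-style dictionary.

    Hypothetical check for illustration; nested lists stand in for arrays.
    """
    problems = []
    n_atoms = len(atoms_dict["numbers"])
    positions = atoms_dict["positions"]
    if len(positions) != n_atoms or any(len(row) != 3 for row in positions):
        problems.append("positions must have shape [Nx3]")
    cell = atoms_dict["cell"]
    if len(cell) != 3 or any(len(row) != 3 for row in cell):
        problems.append("cell must have shape [3x3]")
    constraints = atoms_dict.get("constraints")  # optional field
    if constraints is not None:
        if len(constraints) != n_atoms or any(len(row) != 3 for row in constraints):
            problems.append("constraints must have shape [Nx3]")
    return problems


good = {
    "numbers": [1, 1],
    "positions": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]],
    "cell": [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]],
}
# Same structure, but one position row is missing.
bad = dict(good, positions=[[0.0, 0.0, 0.0]])

print(atoms_dict_problems(good))
print(atoms_dict_problems(bad))
```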

critcatworks.database.format.join_cluster_adsorbate(cluster, adsorbate)[source]

Helper function to merge the structures cluster and adsorbate while retaining information about the ids

Parameters
  • cluster (ase.Atoms) – nanocluster structure

  • adsorbate (ase.Atoms) – single adsorbate

Returns

ase.Atoms object of merged structure, ids of the nanocluster, ids of the adsorbate

Return type

tuple
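
The bookkeeping of ids during a merge can be sketched with plain dictionaries instead of ase.Atoms objects. The function join_structures is a hypothetical stand-in, and placing the cluster atoms before the adsorbate atoms is an assumption of this sketch rather than documented behaviour.

```python
def join_structures(cluster, adsorbate):
    """Merge two structures given as dictionaries with 'numbers' and 'positions'.

    Returns the merged structure plus the index lists of both parts.
    Placing the cluster atoms first is an assumption of this sketch.
    """
    n_cluster = len(cluster["numbers"])
    n_adsorbate = len(adsorbate["numbers"])
    merged = {
        "numbers": cluster["numbers"] + adsorbate["numbers"],
        "positions": cluster["positions"] + adsorbate["positions"],
    }
    cluster_ids = list(range(n_cluster))
    adsorbate_ids = list(range(n_cluster, n_cluster + n_adsorbate))
    return merged, cluster_ids, adsorbate_ids


cluster = {"numbers": [78, 78], "positions": [[0.0, 0.0, 0.0], [2.8, 0.0, 0.0]]}
adsorbate = {"numbers": [1], "positions": [[1.4, 0.0, 1.5]]}

merged, cluster_ids, adsorbate_ids = join_structures(cluster, adsorbate)
print(cluster_ids, adsorbate_ids)
```

The two id lists are the kind of information that the atom_ids fields of the nanoclusters and adsorbates entries in a simulation record hold.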

critcatworks.database.format.read_descmatrix(fw_spec)[source]

Helper function to read a descriptor matrix required for machine learning. It is stored as a file, since large arrays make fireworks slow.

Parameters

fw_spec (dict) – Only the key ‘descmatrix’ is read. It expects a string with the absolute path to file

Returns

descriptor matrix with M features x N datapoints

Return type

2D np.ndarray

critcatworks.database.format.write_descmatrix(descmatrix)[source]

Helper function to write a descriptor matrix required for machine learning. It is stored as a file, since large arrays make fireworks slow.

Parameters

descmatrix (2D np.ndarray) – descriptor matrix with M features x N datapoints

Returns

absolute path to file

Return type

str
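
The pattern behind write_descmatrix and read_descmatrix, storing a large array in a file and passing only its path through the fw_spec, can be sketched as follows. The JSON format, the temporary file location and the helper names write_matrix and read_matrix are assumptions made for illustration; the file format used by critcatworks is not stated here.

```python
import json
import os
import tempfile


def write_matrix(matrix):
    """Write a nested list to a JSON file and return its absolute path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump(matrix, handle)
    return os.path.abspath(path)


def read_matrix(fw_spec):
    """Read the matrix back; only the key 'descmatrix' is used."""
    with open(fw_spec["descmatrix"]) as handle:
        return json.load(handle)


matrix = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Only the short path string travels in the spec, not the array itself.
fw_spec = {"descmatrix": write_matrix(matrix)}
restored = read_matrix(fw_spec)
os.remove(fw_spec["descmatrix"])  # clean up the temporary file
```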

critcatworks.database.mylaunchpad module

critcatworks.database.mylaunchpad.create_launchpad(username, password, server='serenity', lpadname=None)[source]

Creates the fireworks launchpad on specific preset servers.

Parameters
  • username (str) – username for the mongodb database

  • password (str) – password for the mongodb database

  • server (str) – server name: “serenity” (default) or “atlas”

  • lpadname (str) – name of the fireworks internal database. If not given, the name is inferred.

Returns

Launchpad for internal fireworks use.

Return type

fireworks object

critcatworks.database.read module

class critcatworks.database.read.NCReadTask(*args, **kwargs)[source]

Bases: fireworks.core.firework.FiretaskBase

Task to read nanocluster structures from xyz files.

Parameters
  • path (str) – Absolute path to a directory containing structures readable by ASE

  • cell_factor (float) – enlarges cell size to x times the diameter of the structure

Returns

Firework action, update fw_spec

Return type

FWAction

optional_params = ['cell_factor']
required_params = ['path']
run_task(fw_spec)[source]

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters

fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources.

Returns

(FWAction)

class critcatworks.database.read.NCStartFromDatabaseTask(*args, **kwargs)[source]

Bases: fireworks.core.firework.FiretaskBase

Task to set up starting structures from nanoclusters (ASE atoms objects).

Parameters
  • db_ids_lst (str) – list of simulation ids in external database

  • ext_db (pymongo) – external database pymongo object. Defaults to using extdb_connect (dictionary containing the keys host, username, password, authsource and db_name).

Returns

Firework action, update fw_spec

Return type

FWAction

optional_params = []
required_params = ['db_ids_lst', 'ext_db']
run_task(fw_spec)[source]

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters

fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources.

Returns

(FWAction)

class critcatworks.database.read.NCStartFromStructuresTask(*args, **kwargs)[source]

Bases: fireworks.core.firework.FiretaskBase

Task to set up starting structures from nanoclusters (ASE atoms objects).

Parameters

ase_atoms_lst (str) – list of ASE atoms objects in dictionary format

Returns

Firework action, update fw_spec

Return type

FWAction

required_params = ['ase_atoms_lst']
run_task(fw_spec)[source]

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters

fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources.

Returns

(FWAction)

critcatworks.database.read.read_structures(path, spec={}, cell_factor=2.5)[source]

Sets up a Firework to read nanocluster structures from structure files (e.g. xyz). In the second line of the input file, it looks for the keywords E, energy, total_energy, TotalEnergy and totalenergy, and stores the first value found in the field output.total_energy.

The structures are stored in individual documents of the simulation collection.

Parameters
  • path (str) – absolute path to the directory where the structure files (e.g. xyz format) can be found.

  • spec (dict) – optional additional entries for the fw_spec

  • cell_factor (float) – enlarges cell size to x times the diameter of the structure

Returns

NCReadWork Firework

Return type

Firework
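
The keyword lookup in the second line of an xyz file can be illustrated with a small regular expression. This is a sketch under stated assumptions: the accepted separators ("=", ":" or whitespace), the number format and the helper name find_total_energy are hypothetical, and the actual parsing in critcatworks may differ.

```python
import re

# Keywords listed in the description; matching is case-sensitive.
ENERGY_PATTERN = re.compile(
    r"\b(?:E|energy|total_energy|TotalEnergy|totalenergy)"
    r"\s*[=:]?\s*"
    r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)


def find_total_energy(xyz_text):
    """Return the first energy value on the comment line, or None.

    The comment line is the second line of an xyz file.
    """
    lines = xyz_text.splitlines()
    if len(lines) < 2:
        return None
    match = ENERGY_PATTERN.search(lines[1])
    return float(match.group(1)) if match else None


xyz_text = """2
H2 molecule energy=-1.25 units=eV
H 0.0 0.0 0.0
H 0.0 0.0 0.74
"""
print(find_total_energy(xyz_text))
```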

critcatworks.database.read.read_structures_locally(path, cell_factor=2.5)[source]

Helper function to read structures locally. Can be used within a firework or outside.

Parameters
  • path (str) – absolute path to the directory where the structure files (e.g. xyz format) can be found.

  • cell_factor (float) – enlarges cell size to x times the diameter of the structure

Returns

list of ase.Atoms objects with a manipulated cellsize field.

Return type

list
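
The effect of cell_factor can be sketched as follows. Taking the diameter as the largest distance between any two atoms and using a single edge length for a cubic cell are both assumptions of this sketch; the documentation only states that the cell is enlarged to cell_factor times the diameter. The helper cubic_cell_edge is hypothetical.

```python
import itertools
import math


def cubic_cell_edge(positions, cell_factor=2.5):
    """Return cell_factor times the largest interatomic distance.

    A single atom has no pair distances, so its diameter is taken as 0.0.
    """
    diameter = max(
        (math.dist(a, b) for a, b in itertools.combinations(positions, 2)),
        default=0.0,
    )
    return cell_factor * diameter


# Pair distances are 3, 4 and 5, so the diameter is 5.
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
edge = cubic_cell_edge(positions)
print(edge)
```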

critcatworks.database.read.start_from_database(db_ids_lst, ext_db=None, spec={})[source]

Sets up a Firework to retrieve nanocluster structures from the simulation collection of the mongodb database. In atoms.info, it looks for the keywords E, energy, total_energy, TotalEnergy and totalenergy, and stores the first value found in the field output.total_energy.

The structures are stored in individual documents of the simulation collection.

critcatworks.database.read.start_from_structures(ase_atoms_lst, spec={})[source]

Sets up Firework to read nanocluster structures from ASE atoms objects. The structures are copied to new individual documents of the simulation collection. References to the current workflow, the parent nanocluster and the source are updated.

Parameters
  • ase_atoms_lst (str) – list of ASE atoms objects in dictionary format

  • spec (dict) – optional additional entries for the fw_spec

Returns

Firework action, update fw_spec

Return type

FWAction

critcatworks.database.update module

class critcatworks.database.update.InitialTask(*args, **kwargs)[source]

Bases: fireworks.core.firework.FiretaskBase

Custom Firetask to initialize a new workflow instance in the database. Additionally, initializes a few entries in the fw_spec.

optional_params = ['extdb_connect']
required_params = ['username', 'password', 'parameters', 'name', 'workflow_type']
run_task(fw_spec)[source]

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters

fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources.

Returns

(FWAction)

critcatworks.database.update.initialize_workflow_data(username, password, parameters, name='UNNAMED', workflow_type='UNNAMED', extdb_connect={})[source]

Creates a custom Firework object to initialize the workflow. It updates the workflow collection and makes a few entries in the fw_spec.

Parameters
  • username (str) – username for the mongodb database

  • password (str) – password for the mongodb database

  • parameters (dict) – workflow-specific input parameters

  • name (str) – custom name of the workflow

  • workflow_type (str) – custom workflow type

  • extdb_connect (dict) – dictionary optionally containing the keys host, authsource and db_name. All fields have a default value.

Returns

InitialWork

Return type

Firework object

Module contents