critcatworks.database package
Submodules
critcatworks.database.extdb module
-
critcatworks.database.extdb.fetch_simulations(extdb_connect, simulation_ids)
Fetches simulation records by simulation ID.
- Parameters
extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.
simulation_ids (1D ndarray) – unique identifiers of the documents to fetch from the simulation collection.
- Returns
documents of the simulation collection
- Return type
list
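One plausible way to express such a lookup is a MongoDB filter on `_id` with the `$in` operator. The sketch below assumes the simulation documents are keyed by an integer `_id`; `build_id_query` and `find_in_memory` are hypothetical helpers, and the latter only mimics what the server would do with the filter so the example runs without a database.

```python
def build_id_query(simulation_ids):
    """Build a MongoDB filter that selects documents whose _id is in simulation_ids."""
    # Plain ints: BSON encoders generally reject NumPy integer types
    return {"_id": {"$in": [int(i) for i in simulation_ids]}}


def find_in_memory(documents, query):
    """Hypothetical stand-in for collection.find(query), supporting only the $in filter above."""
    wanted = set(query["_id"]["$in"])
    return [doc for doc in documents if doc["_id"] in wanted]


collection = [
    {"_id": 1, "output": {"total_energy": -10.5}},
    {"_id": 2, "output": {"total_energy": -11.2}},
    {"_id": 3, "output": {"total_energy": -9.8}},
]
query = build_id_query([3, 1])
docs = find_in_memory(collection, query)
print([doc["_id"] for doc in docs])  # [1, 3]
```

A `$in` filter does not guarantee that results come back in the order of the requested IDs, so a caller that relies on that order would need to re-sort the returned documents.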
-
critcatworks.database.extdb.gather_all_atom_types(calc_ids, simulations)
Helper function to determine all atom types in the dataset.
- Parameters
calc_ids (list) – ids of the simulation collection
simulations (list) – simulation documents
- Returns
a sorted list of the unique atomic numbers in the dataset
- Return type
list
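The behaviour described above can be sketched as a set union over the atomic numbers of the selected documents. This is a hypothetical re-implementation, not the package's code: it assumes each document stores its ID under `_id` and its atomic numbers under `atoms.numbers`, as in the ATOMS format of the simulations collection.

```python
def gather_atom_types_sketch(calc_ids, simulations):
    """Hypothetical: sorted unique atomic numbers of the documents listed in calc_ids.

    Assumes doc["_id"] holds the simulation ID and doc["atoms"]["numbers"]
    holds the atomic numbers.
    """
    by_id = {doc["_id"]: doc for doc in simulations}
    atom_types = set()
    for calc_id in calc_ids:
        atom_types.update(int(z) for z in by_id[calc_id]["atoms"]["numbers"])
    return sorted(atom_types)


sims = [
    {"_id": 7, "atoms": {"numbers": [79, 79, 1]}},     # Au2H
    {"_id": 8, "atoms": {"numbers": [46, 79, 8, 1]}},  # PdAuOH
]
print(gather_atom_types_sketch([7, 8], sims))  # [1, 8, 46, 79]
```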
-
critcatworks.database.extdb.get_external_database(extdb_connect)
A helper function to connect to a MongoDB database.
- Parameters
extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.
- Returns
address to database
- Return type
pymongo object
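To make the expected shape of extdb_connect concrete, the sketch below turns such a dictionary into a standard MongoDB connection URI. `build_mongo_uri` is a hypothetical helper and the values are placeholders; whether get_external_database builds a URI or passes the fields to the client separately is not stated here. Username and password are percent-encoded with `urllib.parse.quote_plus`, as the PyMongo documentation recommends for special characters.

```python
from urllib.parse import quote_plus


def build_mongo_uri(extdb_connect):
    """Hypothetical helper: turn an extdb_connect dict into a MongoDB connection URI."""
    user = quote_plus(extdb_connect["username"])  # percent-encode special characters
    password = quote_plus(extdb_connect["password"])
    host = extdb_connect["host"]
    auth = quote_plus(extdb_connect["authsource"])
    return f"mongodb://{user}:{password}@{host}/?authSource={auth}"


extdb_connect = {
    "host": "localhost:27017",  # placeholder values, not real credentials
    "username": "alice",
    "password": "p@ss word",
    "authsource": "admin",
    "db_name": "critcat",
}
uri = build_mongo_uri(extdb_connect)
print(uri)  # mongodb://alice:p%40ss+word@localhost:27017/?authSource=admin

# With pymongo installed, a database handle could then be obtained with, e.g.:
#   from pymongo import MongoClient
#   db = MongoClient(uri)[extdb_connect["db_name"]]
```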
-
critcatworks.database.extdb.update_machine_learning_collection(method, extdb_connect, workflow_id=-1, method_params={}, descriptor='soap', descriptor_params={}, training_set=[], validation_set=[], test_set=[], prediction_set=[], metrics_training={}, metrics_validation={}, metrics_test={}, output={}, **kwargs)
Adds a new document to the machine_learning collection of the MongoDB database. The collection contains records from all types of workflows. The documents should follow a specific format: any keyword arguments can be given, but the arguments below should be provided consistently so that the database can be queried comprehensively.
- Parameters
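extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.
workflow_id (int) – ID of the workflow that the machine learning run was part of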
method (str) – name of the ML method: krr, nn, …
method_params (dict) – Parameters of the method
descriptor (str) – name of the descriptor: soap, mbtr, lmbtr, cm, …
descriptor_params (dict) – Parameters of the descriptor used
training_set (1D ndarray) – list of simulation IDs used for training
validation_set (1D ndarray) – list of simulation IDs used in validation. If empty, cross-validation was used.
test_set (1D ndarray) – list of simulation IDs used in testing. If empty, only validation was used.
prediction_set (1D ndarray) – list of simulation IDs used for prediction.
metrics_training (dict) – dictionary of (“metric name”: value) pairs on the training set; key (str): name of the metric, value (float): calculated value
metrics_validation (dict) – dictionary of (“metric name”: value) pairs on the validation set; key (str): name of the metric, value (float): calculated value
metrics_test (dict) – dictionary of (“metric name”: value) pairs on the test set; key (str): name of the metric, value (float): calculated value
output (dict) – relevant training output info
- Returns
A dictionary with the provided arguments plus a unique id provided by the database.
- Return type
dict
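As an illustration of the metrics arguments, the sketch below computes a {"metric name": value} dictionary and assembles the document that such a call would describe. The metric names "mae" and "rmse", the helper names and the workflow values are assumptions for illustration; the real function inserts into MongoDB, which adds the unique id.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return a {"metric name": value} dict, e.g. for metrics_validation (names are assumptions)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


def build_ml_document(method, workflow_id=-1, descriptor="soap", **kwargs):
    """Hypothetical: the document that would be inserted (the database adds _id on insert)."""
    doc = {"workflow_id": workflow_id, "method": method, "descriptor": descriptor}
    doc.update(kwargs)  # any further keyword arguments are stored as-is
    return doc


metrics = regression_metrics([0.0, 1.0, 2.0], [0.5, 1.0, 1.5])
doc = build_ml_document(
    "krr",
    workflow_id=12,
    training_set=[1, 2, 3, 4],
    validation_set=[],  # empty: cross-validation was used
    test_set=[5, 6, 7],
    metrics_validation=metrics,
)
print(doc["metrics_validation"])
```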
-
critcatworks.database.extdb.update_simulations_collection(extdb_connect, **kwargs)
Adds a new document to the simulations collection of the MongoDB database. The collection records every manipulation step of a structure, for example the initial structure, the structure after DFT relaxation, or the structure with added or removed adsorbates. The documents should follow a specific format: any keyword arguments can be given, but the optional arguments below should be provided consistently so that the database can be queried comprehensively.
- Parameters
extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.
source_id (int) – ID of the parent simulation that originated this, -1 if none
workflow_id (int) – ID of workflow when instance was added, -1 if none
wf_sim_id (int) – ID of simulation (unique within the workflow this belongs to)
atoms (dict) –
dictionary with information about the atoms, in the following format:
numbers (1D ndarray) : atomic numbers as an [N] array of ints
positions (2D ndarray) : positions as an [Nx3] matrix of doubles
constraints (2D ndarray) [optional] : frozen flags as an [Nx3] matrix of ints, 1 = frozen, 0 = free
pbc (bool) : whether periodic boundaries are used
cell (2D ndarray) : 3x3 matrix with the cell vectors as rows
celldisp (1D ndarray) : displacement of the cell from the origin
info (dict) : additional information related to the structure
nanoclusters (list of dict) –
list of dictionaries with information about the nanocluster(s), of the following form:
reference_id (int) : ID of the simulation where this cluster was made, -1 if original
atom_ids (1D ndarray) : atom indices in the ATOMS dictionary of the simulation record
adsorbates (list of dict) –
list of dictionaries with information about the adsorbate(s), of the following form:
reference_id (int) : ID of the simulation to use as reference
atom_ids (1D ndarray) : atom indices in the ATOMS dictionary of the simulation record
site_class (str) : class of adsorption site: “top”, “bridge”, “hollow” or “4-fold hollow”
site_ids (1D ndarray) : atom indices (in the simulation record) that define the adsorption site
substrate (list of dict) –
list of dictionaries with information about the substrate(s), of the following form:
reference_id (int) : ID of the parent support simulation, -1 if no parent
atom_ids (1D ndarray) : atom indices in the corresponding ATOMS dictionary
operations (list) – list of dictionaries, each describing one operation relative to the parent simulation, if there is one. The dictionaries can be of arbitrary form.
inp (dict) – property/value pairs describing the simulation input. The dictionary can be of arbitrary form.
output (dict) – property/value pairs output by the calculation. The dictionary can be of arbitrary form.
- Returns
A dictionary with the provided arguments plus a unique id provided by the database.
- Return type
dict
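The nested fields above are easier to read in a concrete instance. The sketch below is a made-up simulation document for a 4-atom Pt cluster with one H atom on a bridge site (all IDs and coordinates are hypothetical; plain lists stand in for the ndarrays), together with a hypothetical consistency check that the cluster and adsorbate `atom_ids` cover every atom exactly once and that each adsorption site consists of cluster atoms.

```python
simulation = {
    "source_id": 41,  # hypothetical parent simulation (the bare cluster)
    "workflow_id": 3,
    "wf_sim_id": 17,
    "atoms": {
        "numbers": [78, 78, 78, 78, 1],
        "positions": [[0.0, 0.0, 0.0], [2.4, 0.0, 0.0], [1.2, 2.1, 0.0],
                      [1.2, 0.7, 2.0], [1.2, -0.9, 0.6]],
        "pbc": False,
        "cell": [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]],
        "celldisp": [0.0, 0.0, 0.0],
        "info": {},
    },
    "nanoclusters": [{"reference_id": 41, "atom_ids": [0, 1, 2, 3]}],
    "adsorbates": [{"reference_id": 41, "atom_ids": [4],
                    "site_class": "bridge", "site_ids": [0, 1]}],
    "substrate": [],
    "operations": [{"add_adsorbate": "H"}],  # arbitrary form
    "inp": {},
    "output": {},
}


def check_partition(sim):
    """Hypothetical check for a cluster-plus-adsorbate record without substrate:
    every atom belongs to exactly one part, and sites are made of cluster atoms."""
    n_atoms = len(sim["atoms"]["numbers"])
    cluster_ids = [i for nc in sim["nanoclusters"] for i in nc["atom_ids"]]
    ads_ids = [i for ads in sim["adsorbates"] for i in ads["atom_ids"]]
    sites_ok = all(set(a["site_ids"]) <= set(cluster_ids) for a in sim["adsorbates"])
    return sorted(cluster_ids + ads_ids) == list(range(n_atoms)) and sites_ok


print(check_partition(simulation))  # True
```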
-
critcatworks.database.extdb.update_workflows_collection(username, password, creation_time, extdb_connect, parameters={}, name='UNNAMED', workflow_type='NO_TYPE', **kwargs)
Adds a new document to the workflows collection of the MongoDB database, usually at the beginning of the workflow run. The collection contains records of all types of workflows. The documents should follow a specific format: any keyword arguments can be given, but the arguments below should be provided consistently so that the database can be queried comprehensively.
- Parameters
extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name.
username (str) – user who executed the workflow
creation_time (str) – time of creation of the workflow
parameters (dict) – workflow-specific parameters
name (str) – custom name of workflow
workflow_type (str) – custom type of workflow
- Returns
Contains the keys username, name, workflow_type, creation_time, parameters and _id, the latter being a unique id provided by the database.
- Return type
dict
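The returned record can be pictured with an in-memory stand-in for the collection. In this sketch `InMemoryCollection` and `add_workflow` are hypothetical, the integer `_id` counter imitates "a unique id provided by the database", the ISO 8601 format of creation_time is an assumption, and the workflow type and parameters are made-up values. The sketch does not store the password.

```python
from datetime import datetime


class InMemoryCollection:
    """Hypothetical stand-in for a MongoDB collection that hands out integer ids."""

    def __init__(self):
        self.documents = []

    def insert(self, doc):
        doc = dict(doc, _id=len(self.documents) + 1)  # next free integer id
        self.documents.append(doc)
        return doc


def add_workflow(collection, username, creation_time, parameters=None,
                 name="UNNAMED", workflow_type="NO_TYPE"):
    """Sketch of the workflow record with the keys listed above."""
    record = {
        "username": username,
        "name": name,
        "workflow_type": workflow_type,
        "creation_time": creation_time,
        "parameters": parameters or {},
    }
    return collection.insert(record)


workflows = InMemoryCollection()
wf = add_workflow(workflows, "alice", datetime(2019, 5, 1, 12, 0).isoformat(),
                  parameters={"adsorbate_name": "H"}, workflow_type="hydrogen_coverage")
print(wf["_id"], wf["creation_time"])  # 1 2019-05-01T12:00:00
```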
critcatworks.database.format module
-
critcatworks.database.format.adsorbate_pos_to_atoms_lst(adspos, adsorbate_name)
Helper function to turn adsorbate positions into ase.Atoms objects, with the species defined by adsorbate_name. Attention: this works with only one adsorbate atom. In the future, cluskit might be generalized to return a list of adsorbates already in ASE format.
- Parameters
adspos (2D ndarray) – positions of the adsorbate atoms
adsorbate_name (str) – chemical symbol of the adsorbate atoms
- Returns
ase.Atoms objects of single atoms at each position
- Return type
list
-
critcatworks.database.format.ase_to_atoms_dict(atoms)
Helper function to convert an ase.Atoms object into the corresponding Python dictionary.
- Parameters
atoms (ase.Atoms) – ase.Atoms object
- Returns
Corresponding python dictionary
- Return type
dict
-
critcatworks.database.format.atoms_dict_to_ase(atoms_dict)
Helper function to convert an ATOMS dictionary into an ase.Atoms object.
- Parameters
atoms_dict (dict) –
dictionary with information about the atoms, in the following format:
numbers (1D ndarray) : atomic numbers as an [N] array of ints
positions (2D ndarray) : positions as an [Nx3] matrix of doubles
constraints (2D ndarray) [optional] : frozen flags as an [Nx3] matrix of ints, 1 = frozen, 0 = free
pbc (bool) : whether periodic boundaries are used
cell (2D ndarray) : 3x3 matrix with the cell vectors as rows
celldisp (1D ndarray) : displacement of the cell from the origin
info (dict) : additional information related to the structure
- Returns
Corresponding ase.Atoms object
- Return type
ase.Atoms
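Before conversion, it can help to check that an ATOMS dictionary has the documented shapes. The validator below is a hypothetical sketch that works on plain lists or arrays and is not part of the package; the CO molecule is a made-up example. With ASE installed, the fields map naturally onto the `ase.Atoms` constructor (`numbers=`, `positions=`, `cell=`, `pbc=`, `celldisp=`, `info=`), though how the frozen flags are translated into ASE constraints is not specified here.

```python
def validate_atoms_dict(atoms_dict):
    """Hypothetical shape check for an ATOMS dictionary; returns a list of problems."""
    problems = []
    n_atoms = len(atoms_dict["numbers"])
    positions = atoms_dict["positions"]
    if len(positions) != n_atoms or any(len(row) != 3 for row in positions):
        problems.append("positions must be an N x 3 matrix")
    constraints = atoms_dict.get("constraints")  # optional field
    if constraints is not None:
        if len(constraints) != n_atoms or any(len(row) != 3 for row in constraints):
            problems.append("constraints must be an N x 3 matrix")
        elif any(flag not in (0, 1) for row in constraints for flag in row):
            problems.append("constraint flags must be 1 (frozen) or 0 (free)")
    cell = atoms_dict["cell"]
    if len(cell) != 3 or any(len(row) != 3 for row in cell):
        problems.append("cell must be a 3 x 3 matrix with cell vectors as rows")
    if len(atoms_dict.get("celldisp", [0.0, 0.0, 0.0])) != 3:
        problems.append("celldisp must have 3 components")
    return problems


co = {
    "numbers": [6, 8],
    "positions": [[0.0, 0.0, 0.0], [0.0, 0.0, 1.13]],
    "constraints": [[1, 1, 1], [0, 0, 0]],  # carbon frozen, oxygen free
    "pbc": False,
    "cell": [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]],
    "celldisp": [0.0, 0.0, 0.0],
    "info": {},
}
print(validate_atoms_dict(co))  # []
```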
-
critcatworks.database.format.join_cluster_adsorbate(cluster, adsorbate)
Helper function to merge the cluster and adsorbate structures while retaining information about their atom ids.
- Parameters
cluster (ase.Atoms) – nanocluster structure
adsorbate (ase.Atoms) – single adsorbate
- Returns
ase.Atoms object of the merged structure, ids of the nanocluster, ids of the adsorbate
- Return type
tuple
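The id bookkeeping of such a merge reduces to index arithmetic when the cluster comes first. The sketch below assumes that ordering and uses plain (symbol, position) tuples instead of ase.Atoms so that it runs without ASE; `join_with_ids` is hypothetical. For ase.Atoms objects, `cluster + adsorbate` also appends the second structure after the first, so the same index ranges would apply.

```python
def join_with_ids(cluster, adsorbate):
    """Hypothetical sketch: concatenate two structures given as lists of
    (symbol, (x, y, z)) tuples, cluster first, and report which indices
    of the merged structure belong to each part."""
    merged = list(cluster) + list(adsorbate)
    cluster_ids = list(range(len(cluster)))
    adsorbate_ids = list(range(len(cluster), len(merged)))
    return merged, cluster_ids, adsorbate_ids


cluster = [("Pt", (0.0, 0.0, 0.0)), ("Pt", (2.4, 0.0, 0.0)), ("Pt", (1.2, 2.1, 0.0))]
adsorbate = [("H", (1.2, 0.7, 1.6))]
merged, cluster_ids, adsorbate_ids = join_with_ids(cluster, adsorbate)
print(cluster_ids, adsorbate_ids)  # [0, 1, 2] [3]
```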
-
critcatworks.database.format.read_descmatrix(fw_spec)
Helper function to read a descriptor matrix required for machine learning. The matrix is stored as a file, since large arrays make FireWorks slow.
- Parameters
fw_spec (dict) – only the key ‘descmatrix’ is read. It should hold a string with the absolute path to the file.
- Returns
descriptor matrix with M features x N datapoints
- Return type
2D np.ndarray
-
critcatworks.database.format.write_descmatrix(descmatrix)
Helper function to write a descriptor matrix required for machine learning. The matrix is stored as a file, since large arrays make FireWorks slow.
- Parameters
descmatrix (2D np.ndarray) – descriptor matrix with M features x N datapoints
- Returns
absolute path to file
- Return type
str
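The write/read pair above keeps only a short path string in the fw_spec instead of the array itself. The round trip below is a stdlib sketch of that pattern; the JSON file format, the temporary-file location and both helper names are assumptions, and the package may well use a NumPy-based format instead.

```python
import json
import os
import tempfile


def write_descmatrix_sketch(descmatrix, directory=None):
    """Hypothetical: store a descriptor matrix in a file and return its absolute path."""
    fd, path = tempfile.mkstemp(suffix=".json", dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump([[float(x) for x in row] for row in descmatrix], handle)
    return os.path.abspath(path)


def read_descmatrix_sketch(fw_spec):
    """Hypothetical: read the matrix back; only fw_spec["descmatrix"] is used."""
    with open(fw_spec["descmatrix"]) as handle:
        return json.load(handle)


matrix = [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]]
path = write_descmatrix_sketch(matrix)
fw_spec = {"descmatrix": path}  # only the path string travels in the spec
restored = read_descmatrix_sketch(fw_spec)
os.remove(path)  # clean up the temporary file
```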
critcatworks.database.mylaunchpad module
-
critcatworks.database.mylaunchpad.create_launchpad(username, password, server='serenity', lpadname=None)
Creates the FireWorks launchpad on one of the preset servers.
- Parameters
username (str) – username for the mongodb database
password (str) – password for the mongodb database
server (str) – server name: “serenity” (default) or “atlas”
lpadname (str) – name of the fireworks internal database. If not given, the name is inferred.
- Returns
Launchpad for internal fireworks use.
- Return type
fireworks object
critcatworks.database.read module
-
class critcatworks.database.read.NCReadTask(*args, **kwargs)
Bases: fireworks.core.firework.FiretaskBase
Task to read nanocluster structures from xyz files.
- Parameters
path (str) – absolute path to a directory containing structures readable by ASE
cell_factor (float) – enlarges the cell size to x times the diameter of the structure
- Returns
Firework action, update fw_spec
- Return type
FWAction
-
optional_params = ['cell_factor']
-
required_params = ['path']
-
run_task(fw_spec)
This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.
- Parameters
fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources.
- Returns
(FWAction)
-
class critcatworks.database.read.NCStartFromDatabaseTask(*args, **kwargs)
Bases: fireworks.core.firework.FiretaskBase
Task to set up starting structures from nanoclusters (ASE atoms objects).
- Parameters
db_ids_lst (list) – list of simulation ids in the external database
ext_db (pymongo) – external database pymongo object. Defaults to using extdb_connect (dictionary containing the keys host, username, password, authsource and db_name).
- Returns
Firework action, update fw_spec
- Return type
FWAction
-
optional_params = []
-
required_params = ['db_ids_lst', 'ext_db']
-
run_task(fw_spec)
This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.
- Parameters
fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources.
- Returns
(FWAction)
-
class critcatworks.database.read.NCStartFromStructuresTask(*args, **kwargs)
Bases: fireworks.core.firework.FiretaskBase
Task to set up starting structures from nanoclusters (ASE atoms objects).
- Parameters
ase_atoms_lst (list) – list of ASE atoms objects in dictionary format
- Returns
Firework action, update fw_spec
- Return type
FWAction
-
required_params = ['ase_atoms_lst']
-
run_task(fw_spec)
This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.
- Parameters
fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources.
- Returns
(FWAction)
-
critcatworks.database.read.read_structures(path, spec={}, cell_factor=2.5)
Sets up a Firework to read nanocluster structures from structure files (e.g. xyz). In the second line of each input file, it looks for the keywords E, energy, total_energy, TotalEnergy and totalenergy, and stores the first value found in the field output.total_energy.
The structures are stored in individual documents of the simulation collection.
- Parameters
path (str) – absolute path to the directory where the structure files (e.g. xyz format) can be found.
spec (dict) – optional additional entries for the fw_spec
cell_factor (float) – enlarges the cell size to x times the diameter of the structure
- Returns
NCReadWork Firework
- Return type
Firework
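The keyword search on the second line of an xyz file can be sketched as follows. This hypothetical parser assumes extended-xyz style key=value tokens (quoted values are handled with `shlex`) and takes the first matching token in line order; whether the real function instead prioritizes by the order of the keyword list is not specified above.

```python
import shlex

ENERGY_KEYS = ("E", "energy", "total_energy", "TotalEnergy", "totalenergy")


def energy_from_comment(comment_line):
    """Hypothetical parser: value of the first key=value token whose key is an
    energy keyword, or None if no such token is present."""
    try:
        tokens = shlex.split(comment_line)  # keeps quoted values such as Lattice="..." together
    except ValueError:
        tokens = comment_line.split()  # unbalanced quotes: fall back to whitespace splitting
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep and key in ENERGY_KEYS:
            try:
                return float(value)
            except ValueError:
                continue  # keyword present but value not numeric; keep looking
    return None


line = 'Lattice="20 0 0 0 20 0 0 0 20" energy=-1523.42 pbc="F F F"'
print(energy_from_comment(line))  # -1523.42
```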
-
critcatworks.database.read.read_structures_locally(path, cell_factor=2.5)
Helper function to read structures locally. It can be used within a Firework or outside of one.
- Parameters
path (str) – absolute path to the directory where the structure files (e.g. xyz format) can be found.
cell_factor (float) – enlarges the cell size to x times the diameter of the structure
- Returns
list of ase.Atoms objects with adjusted cell sizes.
- Return type
list
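The cell_factor parameter can be illustrated with a cubic cell whose edge is cell_factor times the diameter of the structure. In this hypothetical sketch the diameter is taken as the largest interatomic distance and the cell is assumed to be cubic; the package may define either differently.

```python
import itertools
import math


def cubic_cell_for(positions, cell_factor=2.5):
    """Hypothetical: 3x3 cubic cell (vectors as rows) with edge = cell_factor * diameter,
    where the diameter is the largest interatomic distance."""
    diameter = max(
        (math.dist(a, b) for a, b in itertools.combinations(positions, 2)),
        default=0.0,  # a single atom has zero diameter
    )
    edge = cell_factor * diameter
    return [[edge, 0.0, 0.0], [0.0, edge, 0.0], [0.0, 0.0, edge]]


positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(cubic_cell_for(positions))  # diameter 5.0, so edge 12.5
```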
-
critcatworks.database.read.start_from_database(db_ids_lst, ext_db=None, spec={})
Sets up a Firework to retrieve nanocluster structures from the simulation collection of the MongoDB database. In atoms.info, it looks for the keywords E, energy, total_energy, TotalEnergy and totalenergy, and stores the first value found in the field output.total_energy.
The structures are stored in individual documents of the simulation collection.
-
critcatworks.database.read.start_from_structures(ase_atoms_lst, spec={})
Sets up a Firework to read nanocluster structures from ASE atoms objects. The structures are copied to new individual documents of the simulation collection. References to the current workflow, the parent nanocluster and the source are updated.
- Parameters
ase_atoms_lst (str) – list of ASE atoms objects in dictionary format
spec (dict) – optional additional entries for the fw_spec
- Returns
Firework action, update fw_spec
- Return type
FWAction
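Copying structures into new documents can be sketched as wrapping each ATOMS dictionary in a fresh simulation record that points at the current workflow. In this hypothetical sketch the source and parent nanocluster references are set to -1 (treating the structures as originals), which is an assumption; the real task may set them differently.

```python
def copy_into_workflow(atoms_dicts, workflow_id, first_wf_sim_id=0):
    """Hypothetical: one new simulation document per ATOMS dictionary.

    source_id and the nanocluster reference_id are -1 here, i.e. the
    structures are treated as originals (an assumption of this sketch).
    """
    documents = []
    for offset, atoms in enumerate(atoms_dicts):
        n_atoms = len(atoms["numbers"])
        documents.append({
            "source_id": -1,
            "workflow_id": workflow_id,
            "wf_sim_id": first_wf_sim_id + offset,  # unique within the workflow
            "atoms": atoms,
            "nanoclusters": [{"reference_id": -1, "atom_ids": list(range(n_atoms))}],
            "adsorbates": [],
            "substrate": [],
            "operations": [],
            "inp": {},
            "output": {},
        })
    return documents


docs = copy_into_workflow([{"numbers": [79, 79]}, {"numbers": [46]}], workflow_id=5,
                          first_wf_sim_id=10)
print([d["wf_sim_id"] for d in docs])  # [10, 11]
```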
critcatworks.database.update module
-
class critcatworks.database.update.InitialTask(*args, **kwargs)
Bases: fireworks.core.firework.FiretaskBase
Custom Firetask to initialize a new workflow instance in the database. Additionally, it initializes a few entries in the fw_spec.
-
optional_params = ['extdb_connect']
-
required_params = ['username', 'password', 'parameters', 'name', 'workflow_type']
-
run_task(fw_spec)
This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.
- Parameters
fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources.
- Returns
(FWAction)
-
critcatworks.database.update.initialize_workflow_data(username, password, parameters, name='UNNAMED', workflow_type='UNNAMED', extdb_connect={})
Creates a custom Firework object to initialize the workflow. It updates the workflows collection and makes a few entries in the fw_spec.
- Parameters
username (str) – username for the mongodb database
password (str) – password for the mongodb database
parameters (dict) – workflow-specific input parameters
name (str) – custom name of the workflow
workflow_type (str) – custom workflow type
extdb_connect (dict) – dictionary optionally containing the keys host, authsource and db_name. All fields have a default value.
- Returns
InitialWork
- Return type
Firework object
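Because every field of extdb_connect has a default, a caller only needs to pass the fields that differ. The merge below is a hypothetical sketch of how such defaults could be combined with the credentials into the five keys used by the extdb functions. The default values are placeholders, not the package's real defaults. The sketch takes `None` instead of `{}` as its default argument to avoid sharing one mutable dictionary between calls.

```python
# Placeholder defaults for illustration only; the package's real defaults are not documented here.
HYPOTHETICAL_DEFAULTS = {"host": "localhost", "authsource": "admin", "db_name": "ncdb"}


def complete_extdb_connect(username, password, extdb_connect=None):
    """Hypothetical: fill missing connection fields with defaults and add the credentials."""
    merged = dict(HYPOTHETICAL_DEFAULTS)
    merged.update(extdb_connect or {})  # user-supplied keys win over defaults
    merged["username"] = username
    merged["password"] = password
    return merged


conn = complete_extdb_connect("alice", "secret", {"db_name": "my_project"})
print(sorted(conn))  # ['authsource', 'db_name', 'host', 'password', 'username']
```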