critcatworks.database package
Submodules
critcatworks.database.extdb module
- 
critcatworks.database.extdb.fetch_simulations(extdb_connect, simulation_ids)
- Fetches simulation records by simulation id.
- Parameters
- extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name. 
- simulation_ids (1D ndarray) – unique identifiers of the simulation collection. 
 
- Returns
- documents of the simulation collection 
- Return type
- list 
 
- 
critcatworks.database.extdb.gather_all_atom_types(calc_ids, simulations)
- Helper function to determine all atom types in the dataset.
- Parameters
- calc_ids (list) – ids of the simulation collection 
- simulations (list) – simulation documents 
 
- Returns
- a sorted unique list of atomic numbers in the dataset 
 
- Return type
- list 
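The idea behind this helper can be sketched in a few lines. The function name `collect_atom_types` and the document layout (an `_id` key plus an `atoms` dictionary with a `numbers` entry) are assumptions made for illustration; the actual implementation may differ.

```python
def collect_atom_types(calc_ids, simulations):
    """Return the sorted unique atomic numbers of the selected simulations.

    Hypothetical stand-in, not the library code. Assumes every document is a
    dict with an "_id" key and an "atoms" dict holding a "numbers" sequence.
    """
    wanted = set(calc_ids)
    atom_types = set()
    for doc in simulations:
        if doc["_id"] in wanted:
            atom_types.update(int(z) for z in doc["atoms"]["numbers"])
    return sorted(atom_types)


# Illustrative documents; only the fields used above are filled in.
simulations = [
    {"_id": 1, "atoms": {"numbers": [78, 78, 1]}},
    {"_id": 2, "atoms": {"numbers": [29, 78, 8]}},
    {"_id": 3, "atoms": {"numbers": [79]}},  # not selected below
]
atom_types = collect_atom_types([1, 2], simulations)
print(atom_types)
```

Collecting into a set first keeps the result unique regardless of how often an element occurs across the dataset.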
 
- 
critcatworks.database.extdb.get_external_database(extdb_connect)
- A helper function to connect to a MongoDB database.
- Parameters
- extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name. 
- Returns
- address to database 
- Return type
- pymongo object 
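To make the shape of extdb_connect concrete, the sketch below checks the five documented keys and assembles a standard MongoDB connection URI from them. The helper `build_mongo_uri` is hypothetical, and whether the library connects through such a URI is an assumption; only the key names come from the description above.

```python
from urllib.parse import quote_plus

REQUIRED_KEYS = ("host", "username", "password", "authsource", "db_name")


def build_mongo_uri(extdb_connect):
    """Build a MongoDB connection URI from an extdb_connect dictionary.

    Hypothetical helper for illustration only; it does not open a connection.
    """
    missing = [key for key in REQUIRED_KEYS if key not in extdb_connect]
    if missing:
        raise KeyError("extdb_connect is missing keys: " + ", ".join(missing))
    # Credentials must be percent-encoded inside a URI.
    user = quote_plus(extdb_connect["username"])
    password = quote_plus(extdb_connect["password"])
    return "mongodb://{}:{}@{}/{}?authSource={}".format(
        user,
        password,
        extdb_connect["host"],
        extdb_connect["db_name"],
        quote_plus(extdb_connect["authsource"]),
    )


# Placeholder values, not real credentials or servers.
extdb_connect = {
    "host": "localhost:27017",
    "username": "alice",
    "password": "p@ss/word",
    "authsource": "admin",
    "db_name": "testdb",
}
uri = build_mongo_uri(extdb_connect)
print(uri)
```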
 
- 
critcatworks.database.extdb.update_machine_learning_collection(method, extdb_connect, workflow_id=-1, method_params={}, descriptor='soap', descriptor_params={}, training_set=[], validation_set=[], test_set=[], prediction_set=[], metrics_training={}, metrics_validation={}, metrics_test={}, output={}, **kwargs)
- Adds a new document to the machine_learning collection of the MongoDB database. It contains records of all types of workflows. The documents should follow a specific format. Any arguments can be specified; however, the arguments below should be given consistently to allow for comprehensive database querying.
- Parameters
- workflow_id (int) – ID of workflow which the machine learning run was part of 
- method (str) – name of the ML method: krr, nn, … 
- method_params (dict) – Parameters of the method 
- descriptor (str) – name of the descriptor: soap, mbtr, lmbtr, cm, … 
- descriptor_params (dict) – Parameters of the descriptor used 
- training_set (1D ndarray) – list of simulation IDs used for training 
- validation_set (1D ndarray) – list of simulation IDs used in validation. If empty, cross-validation was used. 
- test_set (1D ndarray) – list of simulation IDs used in testing. If empty, only validation was used 
- prediction_set (1D ndarray) – list of simulation IDs used for prediction. 
- metrics_training (dict) – dictionary of (“metric name”: value) pairs on the training set; key (str): name of the metric, value (float): calculated value 
- metrics_validation (dict) – dictionary of (“metric name”: value) pairs on the validation set; key (str): name of the metric, value (float): calculated value 
- metrics_test (dict) – dictionary of (“metric name”: value) pairs on the test set; key (str): name of the metric, value (float): calculated value 
- output (dict) – relevant training output info 
 
- Returns
- A dictionary with the provided arguments plus a unique id provided by the database. 
 
- Return type
- dict 
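A record following the parameter list above could look like the dictionary built below. The helper `build_ml_record` is a hypothetical sketch: it only assembles the fields and does not contact a database, so no unique id is attached. The hyperparameter and metric names are placeholders.

```python
import json


def build_ml_record(method, workflow_id=-1, method_params=None,
                    descriptor="soap", descriptor_params=None,
                    training_set=(), validation_set=(), test_set=(),
                    prediction_set=(), metrics_training=None,
                    metrics_validation=None, metrics_test=None, output=None):
    """Assemble a machine_learning document with the documented fields.

    Hypothetical helper, not the library function.
    """
    def as_ids(ids):
        # Plain ints keep the record JSON-serializable.
        return [int(i) for i in ids]

    return {
        "workflow_id": workflow_id,
        "method": method,
        "method_params": dict(method_params or {}),
        "descriptor": descriptor,
        "descriptor_params": dict(descriptor_params or {}),
        "training_set": as_ids(training_set),
        "validation_set": as_ids(validation_set),
        "test_set": as_ids(test_set),
        "prediction_set": as_ids(prediction_set),
        "metrics_training": dict(metrics_training or {}),
        "metrics_validation": dict(metrics_validation or {}),
        "metrics_test": dict(metrics_test or {}),
        "output": dict(output or {}),
    }


record = build_ml_record(
    "krr",
    workflow_id=12,
    method_params={"alpha": 1e-3},      # placeholder hyperparameter
    training_set=[1, 2, 3],
    test_set=[4],                       # validation_set left empty
    metrics_training={"mae": 0.05},     # placeholder metric values
    metrics_test={"mae": 0.08},
)
print(json.dumps(record, indent=2))
```

The sketch uses None and empty tuples as defaults instead of mutable `{}` and `[]` defaults, which avoids sharing state between calls.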
 
- 
critcatworks.database.extdb.update_simulations_collection(extdb_connect, **kwargs)
- Adds a new document to the simulations collection of the MongoDB database. The collection contains records of all manipulation steps of a structure, in particular the initial structure, the structure after DFT relaxation, the structure with added or removed adsorbates, etc. The documents should follow a specific format. Any arguments can be specified; however, the optional arguments below should be given consistently to allow for comprehensive database querying.
- Parameters
- extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name. 
- source_id (int) – ID of the parent simulation that originated this, -1 if none 
- workflow_id (int) – ID of workflow when instance was added, -1 if none 
- wf_sim_id (int) – ID of simulation (unique within the workflow this belongs to) 
- atoms (dict) – dictionary with information about the atoms. It should be in the following format:
  - numbers (1D ndarray) : list of atomic numbers as numpy array [N] of ints
  - positions (2D ndarray) : positions as numpy matrix [Nx3] of doubles
  - constraints (2D ndarray) : frozen flags, a matrix [Nx3] of int [optional]; 1 = frozen, 0 = free
  - pbc (bool) : use periodic boundaries
  - cell (2D ndarray) : matrix 3x3 with cell vectors on the rows
  - celldisp (1D ndarray) : displacement of cell from origin
  - info (dict) : field for additional information related to structure
- nanoclusters (list of ATOMS dict) – list of dictionaries with information about the nanocluster(s). The dictionaries should have the following form:
  - reference_id (int) : ID of the simulation where this cluster was made, -1 if original
  - atom_ids (1D ndarray) : atom indices in the ATOMS dictionary of the simulation record
- adsorbates (list of dict) – list of dictionaries with information about the adsorbate(s). The dictionaries should have the following form:
  - reference_id (int) : ID of the simulation to use as reference
  - atom_ids (1D ndarray) : atom indices in the ATOMS dictionary of the simulation record
  - site_class (str) : class of adsorption site: “top”, “bridge”, “hollow”, “4-fold hollow”
  - site_ids (1D ndarray) : list of atom ids (in simulation record) that define the adsorption site
- substrate (list of dict) – list of dictionaries with information about the substrate(s). The dictionaries should have the following form:
  - reference_id (int) : ID of the parent support simulation, -1 if no parent
  - atom_ids (1D ndarray) : atom indices in the corresponding ATOMS dictionary
- operations (list) – list of dictionaries, each describing one operation, always with respect to the parent simulation if applicable. The dictionaries can be of arbitrary form. 
- inp (dict) – property/value pairs describing the simulation input. The dictionary can be of arbitrary form. 
- output (dict) – property/value pairs output by the calculation. The dictionary can be of arbitrary form. 
 
- Returns
- A dictionary with the provided arguments plus a unique id provided by the database. 
 
- Return type
- dict 
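The nested format described above is easier to follow with a concrete instance. The sketch below builds a simulation record for two platinum atoms with one hydrogen adsorbate. All ids, coordinates and the energy are illustrative, plain lists stand in for the ndarrays, and `check_atoms_dict` is a hypothetical consistency check that is not part of the package.

```python
def check_atoms_dict(atoms):
    """Return True if the per-atom and cell entries have consistent shapes."""
    n_atoms = len(atoms["numbers"])
    positions_ok = (len(atoms["positions"]) == n_atoms
                    and all(len(p) == 3 for p in atoms["positions"]))
    constraints = atoms.get("constraints")  # optional entry
    constraints_ok = constraints is None or (
        len(constraints) == n_atoms and all(len(c) == 3 for c in constraints))
    cell_ok = len(atoms["cell"]) == 3 and all(len(v) == 3 for v in atoms["cell"])
    return positions_ok and constraints_ok and cell_ok


# ATOMS dictionary: illustrative values, not a relaxed geometry.
atoms = {
    "numbers": [78, 78, 1],
    "positions": [[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [0.0, 0.0, 1.6]],
    "constraints": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],  # 1 = frozen, 0 = free
    "pbc": False,
    "cell": [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]],
    "celldisp": [0.0, 0.0, 0.0],
    "info": {},
}

simulation = {
    "source_id": -1,           # no parent simulation
    "workflow_id": -1,         # not added by a workflow
    "wf_sim_id": 0,
    "atoms": atoms,
    "nanoclusters": [{"reference_id": -1, "atom_ids": [0, 1]}],
    "adsorbates": [{
        "reference_id": 3,     # hypothetical id of a reference simulation
        "atom_ids": [2],
        "site_class": "top",
        "site_ids": [0],       # the H atom sits on top of atom 0
    }],
    "substrate": [],
    "operations": [],
    "inp": {},
    "output": {"total_energy": -1.0},  # placeholder value
}

print(check_atoms_dict(simulation["atoms"]))
```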
 
- 
critcatworks.database.extdb.update_workflows_collection(username, password, creation_time, extdb_connect, parameters={}, name='UNNAMED', workflow_type='NO_TYPE', **kwargs)
- Adds a new document to the workflows collection of the MongoDB database, usually at the beginning of the workflow run. The collection contains records of all types of workflows. The documents should follow a specific format. Any arguments can be specified; however, the arguments below should be given consistently to allow for comprehensive database querying.
- Parameters
- extdb_connect (dict) – dictionary containing the keys host, username, password, authsource and db_name. 
- username (str) – user who executed the workflow 
- creation_time (str) – time of creation of the workflow 
- parameters (dict) – workflow-specific parameters 
- name (str) – custom name of workflow 
- workflow_type (str) – custom type of workflow 
 
- Returns
- Contains the keys username, name, workflow_type, creation_time, parameters and _id, the latter being a unique id provided by the database. 
 
- Return type
- dict 
 
critcatworks.database.format module
- 
critcatworks.database.format.adsorbate_pos_to_atoms_lst(adspos, adsorbate_name)
- Helper function to turn adsorbate positions into ase.Atoms objects, with the species defined by adsorbate_name. Attention: it works with only one adsorbate atom. In the future, cluskit might generalize to return a list of adsorbates already in ase format.
- Parameters
- adspos (2D ndarray) – positions of the adsorbate atoms 
- adsorbate_name (str) – chemical symbol of the adsorbate atoms 
 
- Returns
- ase.Atoms objects of single atoms at each position 
- Return type
- list 
 
- 
critcatworks.database.format.ase_to_atoms_dict(atoms)
- Helper function to convert an ase.Atoms object into its corresponding Python dictionary.
- Parameters
- atoms (ase.Atoms) – ase.Atoms object 
- Returns
- Corresponding python dictionary 
- Return type
- dict 
 
- 
critcatworks.database.format.atoms_dict_to_ase(atoms_dict)
- Helper function to convert an ATOMS dictionary into an ase.Atoms object.
- Parameters
- atoms_dict (dict) – dictionary with information about the atoms. It should be in the following format:
  - numbers (1D ndarray) : list of atomic numbers as numpy array [N] of ints
  - positions (2D ndarray) : positions as numpy matrix [Nx3] of doubles
  - constraints (2D ndarray) : frozen flags, a matrix [Nx3] of int [optional]; 1 = frozen, 0 = free
  - pbc (bool) : use periodic boundaries
  - cell (2D ndarray) : matrix 3x3 with cell vectors on the rows
  - celldisp (1D ndarray) : displacement of cell from origin
  - info (dict) : field for additional information related to structure
- Returns
- Corresponding ase.Atoms object 
- Return type
- ase.Atoms 
 
- 
critcatworks.database.format.join_cluster_adsorbate(cluster, adsorbate)
- Helper function to merge the structures cluster and adsorbate while retaining information about the ids.
- Parameters
- cluster (ase.Atoms) – nanocluster structure 
- adsorbate (ase.Atoms) – single adsorbate 
 
- Returns
- ase.Atoms object of merged structure, ids of the nanocluster, ids of the adsorbate 
 
- Return type
- tuple 
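The bookkeeping of ids during such a merge can be sketched without ASE by using plain ATOMS-like dictionaries. The function `join_structures` is hypothetical, and it assumes that the adsorbate atoms are appended after the cluster atoms; the ordering used by the real function is not stated in this reference.

```python
def join_structures(cluster, adsorbate):
    """Merge two ATOMS-like dicts and report which atom ids came from where.

    Hypothetical sketch: adsorbate atoms are appended after the cluster
    atoms, so the cluster keeps its indices and the adsorbate gets the rest.
    """
    n_cluster = len(cluster["numbers"])
    n_adsorbate = len(adsorbate["numbers"])
    joined = {
        "numbers": list(cluster["numbers"]) + list(adsorbate["numbers"]),
        "positions": [list(p) for p in cluster["positions"]]
        + [list(p) for p in adsorbate["positions"]],
    }
    cluster_ids = list(range(n_cluster))
    adsorbate_ids = list(range(n_cluster, n_cluster + n_adsorbate))
    return joined, cluster_ids, adsorbate_ids


# Illustrative coordinates only.
cluster = {"numbers": [78, 78], "positions": [[0.0, 0.0, 0.0], [2.5, 0.0, 0.0]]}
adsorbate = {"numbers": [1], "positions": [[0.0, 0.0, 1.6]]}

joined, cluster_ids, adsorbate_ids = join_structures(cluster, adsorbate)
print(joined["numbers"], cluster_ids, adsorbate_ids)
```

Returning the two id lists alongside the merged structure is what allows the nanoclusters and adsorbates entries of a simulation record to be filled in afterwards.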
 
- 
critcatworks.database.format.read_descmatrix(fw_spec)
- Helper function to read a descriptor matrix required for machine learning. It is stored as a file, since large arrays make fireworks slow.
- Parameters
- fw_spec (dict) – Only the key ‘descmatrix’ is read. It expects a string with the absolute path to file 
- Returns
- descriptor matrix with M features x N datapoints 
 
- Return type
- 2D np.ndarray 
 
- 
critcatworks.database.format.write_descmatrix(descmatrix)
- Helper function to write a descriptor matrix required for machine learning. It is stored as a file, since large arrays make fireworks slow.
- Parameters
- descmatrix (2D np.ndarray) – descriptor matrix with M features x N datapoints 
- Returns
- absolute path to file 
- Return type
- str 
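The pattern behind the two descmatrix helpers is to keep only a file path in the fw_spec and the array itself on disk. The sketch below illustrates that round trip with the standard library. The names `write_matrix` and `read_matrix` are hypothetical, and JSON with nested lists is used as a stand-in for whatever array format the package actually writes.

```python
import json
import os
import tempfile


def write_matrix(descmatrix, directory=None):
    """Write a 2D list of floats to a new file and return its absolute path."""
    handle, path = tempfile.mkstemp(
        suffix=".json", prefix="descmatrix_", dir=directory)
    with os.fdopen(handle, "w") as stream:
        json.dump(descmatrix, stream)
    return os.path.abspath(path)


def read_matrix(fw_spec):
    """Read the matrix back; only the 'descmatrix' key of fw_spec is used."""
    with open(fw_spec["descmatrix"]) as stream:
        return json.load(stream)


descmatrix = [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]]     # 2 features x 3 datapoints
fw_spec = {"descmatrix": write_matrix(descmatrix)}  # the spec carries a path only
restored = read_matrix(fw_spec)
os.remove(fw_spec["descmatrix"])                    # clean up the example file
print(restored == descmatrix)
```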
 
critcatworks.database.mylaunchpad module
- 
critcatworks.database.mylaunchpad.create_launchpad(username, password, server='serenity', lpadname=None)
- Creates the fireworks launchpad on specific preset servers.
- Parameters
- username (str) – username for the mongodb database 
- password (str) – password for the mongodb database 
- server (str) – server name: “serenity” (default) or “atlas” 
- lpadname (str) – name of the fireworks internal database. If not given, the name is inferred. 
 
- Returns
- Launchpad for internal fireworks use. 
- Return type
- fireworks object 
 
critcatworks.database.read module
- 
class critcatworks.database.read.NCReadTask(*args, **kwargs)
- Bases: fireworks.core.firework.FiretaskBase
- Task to read nanocluster structures from xyz files.
- Parameters
- path (str) – absolute path to a directory containing structures readable by ASE 
- cell_factor (float) – enlarges cell size to x times the diameter of the structure 
 
- Returns
- Firework action, update fw_spec 
- Return type
- FWAction 
 - 
optional_params= ['cell_factor']
 - 
required_params= ['path']
 - 
run_task(fw_spec)
- This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.
- Parameters
- fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources. 
- Returns
- (FWAction) 
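The “_fw_env” mechanism described for fw_spec can be illustrated with plain dictionaries. The helper `resolve_command` is hypothetical; it falls back to the abstract name itself when the FWorker env does not define a mapping, which is a design choice of this sketch rather than FireWorks behaviour.

```python
def resolve_command(fw_spec, name):
    """Map an abstract name to the resource-specific one via fw_spec["_fw_env"].

    Hypothetical helper: returns the abstract name unchanged if the FWorker
    env carried by the spec has no entry for it.
    """
    env = fw_spec.get("_fw_env", {})
    return env.get(name, name)


# Two workers define the same abstract variable differently.
spec_resource_1 = {"_fw_env": {"foo": "foo1"}}
spec_resource_2 = {"_fw_env": {"foo": "foo2"}}

print(resolve_command(spec_resource_1, "foo"))
print(resolve_command(spec_resource_2, "foo"))
```

A task written against the abstract name therefore runs unchanged on both resources.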
 
 
- 
class critcatworks.database.read.NCStartFromDatabaseTask(*args, **kwargs)
- Bases: fireworks.core.firework.FiretaskBase
- Task to set up starting structures from nanoclusters (ASE atoms objects).
- Parameters
- db_ids_lst (str) – list of simulation ids in external database 
- ext_db (pymongo) – external database pymongo object. Defaults to using extdb_connect (dictionary containing the keys host, username, password, authsource and db_name). 
 
- Returns
- Firework action, update fw_spec 
- Return type
- FWAction 
 - 
optional_params= []
 - 
required_params= ['db_ids_lst', 'ext_db']
 - 
run_task(fw_spec)
- This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.
- Parameters
- fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources. 
- Returns
- (FWAction) 
 
 
- 
class critcatworks.database.read.NCStartFromStructuresTask(*args, **kwargs)
- Bases: fireworks.core.firework.FiretaskBase
- Task to set up starting structures from nanoclusters (ASE atoms objects).
- Parameters
- ase_atoms_lst (str) – list of ASE atoms objects in dictionary format 
- Returns
- Firework action, update fw_spec 
- Return type
- FWAction 
 - 
required_params= ['ase_atoms_lst']
 - 
run_task(fw_spec)
- This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.
- Parameters
- fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources. 
- Returns
- (FWAction) 
 
 
- 
critcatworks.database.read.read_structures(path, spec={}, cell_factor=2.5)
- Sets up a Firework to read nanocluster structures from structure files (e.g. xyz). In the second line of the input file, it looks for the keywords E, energy, total_energy, TotalEnergy and totalenergy, and stores the first value found in the field output.total_energy. The structures are stored in individual documents of the simulation collection.
- Parameters
- path (str) – absolute path to the directory where the structure files (e.g. xyz format) can be found. 
- spec (dict) – optional additional entries for the fw_spec 
- cell_factor (float) – enlarges cell size to x times the diameter of the structure 
 
- Returns
- NCReadWork Firework 
- Return type
- Firework 
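The keyword lookup in the second line of an xyz file can be sketched as follows. The accepted syntax (`key=value`, `key: value` or `key value`) and the choice to return the first match in reading order are assumptions of this sketch; only the keyword list is taken from the description above. `parse_total_energy` is a hypothetical name.

```python
import re

ENERGY_KEYS = ("E", "energy", "total_energy", "TotalEnergy", "totalenergy")
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

# The lookbehind stops a keyword from matching inside a longer word.
ENERGY_PATTERN = re.compile(
    r"(?<![A-Za-z_])(" + "|".join(ENERGY_KEYS) + r")\s*[=:]?\s*(" + NUMBER + r")"
)


def parse_total_energy(comment_line):
    """Return the first energy value found in an xyz comment line, else None."""
    match = ENERGY_PATTERN.search(comment_line)
    return float(match.group(2)) if match else None


# Illustrative file content; the energy value is a placeholder.
xyz_text = """3
TotalEnergy: -76.4 method=PBE
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000
"""
comment = xyz_text.splitlines()[1]  # second line of the xyz file
energy = parse_total_energy(comment)
print(energy)
```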
 
- 
critcatworks.database.read.read_structures_locally(path, cell_factor=2.5)
- Helper function to read structures locally. Can be used within a firework or outside.
- Parameters
- path (str) – absolute path to the directory where the structure files (e.g. xyz format) can be found. 
- cell_factor (float) – enlarges cell size to x times the diameter of the structure 
 
- Returns
- list of ase.Atoms objects with a manipulated cellsize field. 
- Return type
- list 
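One way to read cell_factor is shown below: the cell edge is set to cell_factor times the diameter of the structure. The sketch assumes that the diameter is the largest interatomic distance and that the resulting cell is cubic; neither detail is stated in this reference, and `cubic_cell_for` is a hypothetical name.

```python
import itertools
import math


def cubic_cell_for(positions, cell_factor=2.5):
    """Return a cubic 3x3 cell whose edge is cell_factor times the diameter.

    Hypothetical sketch: the diameter is taken as the largest distance
    between any two atoms (0.0 for a single atom).
    """
    diameter = max(
        (math.dist(a, b) for a, b in itertools.combinations(positions, 2)),
        default=0.0,
    )
    edge = cell_factor * diameter
    return [[edge, 0.0, 0.0], [0.0, edge, 0.0], [0.0, 0.0, edge]]


# Illustrative coordinates: the largest distance is 5.0 (a 3-4-5 triangle).
positions = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
cell = cubic_cell_for(positions)
print(cell[0][0])
```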
 
- 
critcatworks.database.read.start_from_database(db_ids_lst, ext_db=None, spec={})
- Sets up a Firework to retrieve nanocluster structures from the simulation collection of the MongoDB database. In atoms.info, it looks for the keywords E, energy, total_energy, TotalEnergy and totalenergy, and stores the first value found in the field output.total_energy. The structures are stored in individual documents of the simulation collection. 
- 
critcatworks.database.read.start_from_structures(ase_atoms_lst, spec={})
- Sets up a Firework to read nanocluster structures from ASE atoms objects. The structures are copied to new individual documents of the simulation collection. References to the current workflow, the parent nanocluster and the source are updated.
- Parameters
- ase_atoms_lst (str) – list of ASE atoms objects in dictionary format 
- spec (dict) – optional additional entries for the fw_spec 
 
- Returns
- Firework action, update fw_spec 
- Return type
- FWAction 
 
critcatworks.database.update module
- 
class critcatworks.database.update.InitialTask(*args, **kwargs)
- Bases: fireworks.core.firework.FiretaskBase
- Custom Firetask to initialize a new workflow instance in the database. Additionally, initializes a few entries in the fw_spec.
 - 
optional_params= ['extdb_connect']
 - 
required_params= ['username', 'password', 'parameters', 'name', 'workflow_type']
 - 
run_task(fw_spec)
- This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.
- Parameters
- fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources. 
- Returns
- (FWAction) 
 
 
- 
critcatworks.database.update.initialize_workflow_data(username, password, parameters, name='UNNAMED', workflow_type='UNNAMED', extdb_connect={})
- Creates a custom Firework object to initialize the workflow. It updates the workflow collection and makes a few entries in the fw_spec.
- Parameters
- username (str) – username for the mongodb database 
- password (str) – password for the mongodb database 
- parameters (dict) – workflow-specific input parameters 
- name (str) – custom name of the workflow 
- workflow_type (str) – custom workflow type 
- extdb_connect (dict) – dictionary optionally containing the keys host, authsource and db_name. All fields have a default value. 
 
- Returns
- InitialWork 
- Return type
- Firework object