For Future Developers

Become a developer if you want to add more functionality to critcatworks. Raising an issue on github would be the first step. Do not hesitate to contact us if you have any questions.

The goal of critcatworks is to automate nanocluster-surface related research. In critcatworks belongs everything which joins already available building blocks into a complex workflow.

If you want to improve the workflow manager side, Fireworks is the dependency to work on.

If you want to instead create a nanocluster tool, it most likely belongs in cluskit (unless very simple). Since cluskit is also developed in this group, a concerted effort to make that tool available both in cluskit and in critcatworks can be tackled (contact us in that case). Make sure that the functionality is not already in cluskit!

How to Write Custom Firetasks

Writing your custom Firetask is easy. You just need to wrap your function in a class with some decorations beforehand and afterwards. Before you start implementing your first Firetask, make sure to have a basic knowledge about Fireworks.

from fireworks import explicit_serialize, FiretaskBase, FWAction

@explicit_serialize
class MyCustomTask(FiretaskBase):
    """
    Custom Firetask template.

    Args:
        required_parameter1 (any):  you can read any parameters
                                    during creationg of this task

        required_parameter2 (any):  lists, dictionaries, arrays, etc. are all fine,
                                    but no pure python objects

        optional_parameter1 (any):  Remember to add them to the list below
    Returns:
        FWAction : Firework action, updates fw_spec
    """
    _fw_name = 'MyCustomTask'
    required_params = ['required_parameter1', 'required_parameter2']
    optional_params = ['optional_parameter1']

    def run_task(self, fw_spec):
        # those values cannot be modified during runtime of the workflow
        optional = self.get("optional_parameter1", "default_value")
        important_parameter = self["required_parameter1"]
        another_parameter = self["required_parameter2"]


        # you can also get information from the firework spec (this can be
        #modified during runtime of the workflow)
        analysis_ids = fw_spec.get("temp", {}).get("analysis_ids", [1, 2, 3])
        # analysis_ids becomes calc_ids and is stored later
        calc_ids = analysis_ids

        # run your custom code
        mycustom_dct = {1 :2, 3 : 4}


        # check where this file gets written
        with open('mycustomfile.txt', 'w') as outfile:
            json.dump(mycustom_dct, outfile)

        # fireworks
        # Store information for future jobs to fetch and/or to keep record
        fw_spec["calc_ids"] = calc_ids

        # important to remove those, otherwise they would
        # overwrite the next Firework's _category and name
        fw_spec.pop("_category")
        fw_spec.pop("name")

        # always return a FWAction object.
        # other arguments can deviate or defuse the workflow
        return FWAction(update_spec=update_spec)

Reading from and Writing to Permanent database

For interacting with an external database, consider using the functions in critcatworks.database.extdb.

The function get_external_database connects you to a database using extdb_connect.

Then, for instance fetch_simulations can get multiple simulations by id.

Lastly, update_simulations_collection uploads one simulation document to the database.

For other functionalities in critcatworks.database.extdb consult the code documentation.

Fireworks Spec Entries

The current workflows use the following fw_spec entries. It is recommended to adhere to the structure but is not prohibited in any way.

simulations (dict)

simulation collection entries for this workflow. Usually, simulations are not stored here, since large amounts of documents would slow the workflow manager down

workflow (dict)

relevant information about this workflow, entry for workflow collection

machine_learning (dict)

machine_learning instances of this workflow entries for machine_learning collection

n_calcs_started (int)

number of calculations which have already been started

extdb_connect (dict)

Connection information to permanent mongodb database containing the keys host, username, password, authsource and db_name.

temp (dict)
calc_paths (list of str)

paths to the dft calculations, sorted by adsorbate ids

calc_ids (list of int)

ids of simulations in permanent database

is_converged_list (list of int)

1 - converged, 0 - not converged calculation, same order as calc_paths

fps_ranking (list of int)

adsorbate ids ordered by FPS ranking

analysis_ids (list of int)

calculation ids which have been analysed and where analysis can be processed

calc_analysis_ids_dict (dict)

keys are calculation ids before DFT values are calculation ids which have been analysed

cohesive_energy_dct (dict)

for each chemical formula key, the value corresponds to a dict of simulation indices and cohesive energies (total energies if no atomic energies were given)

descmatrix (str)

path to numpy array. 2D-matrix descriptor, row representing datapoint

property (list of str)

property of interest to machine learning

last_machine_learning_id (int)

id of last machine learning step

reference_energy (float)

reference energy for the adsorbate. Can be the total energy of the isolated adsorbate molecule or a different reference point

free_energy_correction (float)

constant shift in free energy. This is relevant for the coverage ladder target energy range.

branch_dct (dict)

keys of parent simulations with values being lists of child simulations

direction (bool)

1 - adding adsorbate 0 - removing adsorbate

ne_dct (dict)

stores total energies of all calculations with respect to the number of adsorbates and their ids

n_adsorbates_root (int)

number of adsorbates of the root structure

n_adsorbates (int)

number of adsorbates of the current step

is_return (bool)

current state of the coverage ladder workflow. If True, the ladder search is on the way back to the root level

is_new_root (bool)

If True, the last simulation has resulted in a new root simulation

open_branches (list)

each element is a tuple containing parent simulation ids and direction

root_history (list)

ordered ids of root simulations during the course of the workflow, starting with the start_id

step_history (list)

each entry is a tuple of a list of calculation ids and a direction indicator

calc_parents (dict)

keys of simulation ids with values being parent simulation ids

start_id (int)

unique identifier of the simulation which is used to start the workflow