.. _developer:

For Future Developers
=====================

Become a developer if you want to add more functionality to critcatworks. Raising an issue on GitHub is the first step. Do not hesitate to contact us if you have any questions.

The goal of critcatworks is to automate nanocluster-surface related research. Everything that joins already available building blocks into a complex workflow belongs in critcatworks.

If you want to improve the workflow manager side, Fireworks is the dependency to work on.

If you instead want to create a nanocluster tool, it most likely belongs in cluskit (unless it is very simple). Since cluskit is also developed in this group, we can make a concerted effort to make that tool available in both cluskit and critcatworks (contact us in that case). Make sure that the functionality is not already in cluskit!


How to Write Custom Firetasks
-----------------------------

Writing a custom Firetask is easy. You just need to wrap your function in a class, with some boilerplate before and after it. Before you implement your first Firetask, make sure you have a basic knowledge of Fireworks.

.. code-block:: python

    import json

    from fireworks import explicit_serialize, FiretaskBase, FWAction

    @explicit_serialize
    class MyCustomTask(FiretaskBase):
        """ 
        Custom Firetask template.

        Args:
            required_parameter1 (any):  you can read any parameters
                                        during creation of this task

            required_parameter2 (any):  lists, dictionaries, arrays, etc. are all fine, 
                                        but no pure python objects

            optional_parameter1 (any):  Remember to add them to the list below
        Returns:
            FWAction : Firework action, updates fw_spec
        """
        _fw_name = 'MyCustomTask'
        required_params = ['required_parameter1', 'required_parameter2']
        optional_params = ['optional_parameter1']

        def run_task(self, fw_spec):
            # these task parameters are set when the task is created
            # and cannot be modified during runtime of the workflow
            optional = self.get("optional_parameter1", "default_value")
            important_parameter = self["required_parameter1"]
            another_parameter = self["required_parameter2"]


            # you can also get information from the firework spec
            # (this can be modified during runtime of the workflow)
            analysis_ids = fw_spec.get("temp", {}).get("analysis_ids", [1, 2, 3])
            # analysis_ids becomes calc_ids and is stored later
            calc_ids = analysis_ids
            
            # run your custom code
            mycustom_dct = {1: 2, 3: 4}

            
            # check where this file gets written
            with open('mycustomfile.txt', 'w') as outfile:
                json.dump(mycustom_dct, outfile)

            # Fireworks: store information for future jobs to fetch
            # and/or to keep a record
            fw_spec["calc_ids"] = calc_ids

            # important to remove those, otherwise they would
            # overwrite the next Firework's _category and name
            # (the default of None avoids a KeyError if a key is absent)
            fw_spec.pop("_category", None)
            fw_spec.pop("name", None)

            # always return a FWAction object.
            # other arguments can add detours or defuse the workflow
            return FWAction(update_spec=fw_spec)
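
The spec handling at the end of ``run_task`` can be isolated in a small helper, which makes it easy to check without a running workflow. The following is a minimal sketch and not part of critcatworks: the function name ``prepare_update_spec`` is hypothetical, and storing ``calc_ids`` under ``temp`` is an assumption based on the spec layout listed under Fireworks Spec Entries.

.. code-block:: python

    import copy


    def prepare_update_spec(fw_spec, calc_ids):
        """Return a cleaned copy of fw_spec for FWAction(update_spec=...).

        Hypothetical helper for illustration; not part of critcatworks.
        """
        # work on a deep copy so the caller's spec is left untouched
        update_spec = copy.deepcopy(fw_spec)

        # assumption: calc_ids are stored under the "temp" entry
        update_spec.setdefault("temp", {})["calc_ids"] = list(calc_ids)

        # drop keys that would otherwise overwrite the next Firework's
        # _category and name; the default avoids a KeyError if they are absent
        update_spec.pop("_category", None)
        update_spec.pop("name", None)
        return update_spec


    # made-up example spec
    spec = {"_category": "dft", "name": "step_1", "temp": {"analysis_ids": [1, 2, 3]}}
    new_spec = prepare_update_spec(spec, spec["temp"]["analysis_ids"])
    print(new_spec)

Inside ``run_task``, the result could then be returned as ``FWAction(update_spec=prepare_update_spec(fw_spec, calc_ids))``.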


Reading from and Writing to the Permanent Database
--------------------------------------------------

For interacting with an external database, consider using the functions in :code:`critcatworks.database.extdb`.

The function *get_external_database* connects you to a database using the connection information in *extdb_connect*.

Then, for instance, *fetch_simulations* can get multiple simulations by id.

Lastly, *update_simulations_collection* uploads one simulation document to the database.

For other functionalities in :code:`critcatworks.database.extdb` consult the code documentation.
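
To illustrate what the connection information amounts to, the sketch below turns an ``extdb_connect`` dictionary into a standard MongoDB connection URI using only the standard library. This is not necessarily how critcatworks connects internally: the helper name ``build_mongodb_uri`` and the example credentials are made up, and the keys are those listed for ``extdb_connect`` under Fireworks Spec Entries.

.. code-block:: python

    from urllib.parse import quote_plus


    def build_mongodb_uri(extdb_connect):
        """Build a MongoDB connection URI from an extdb_connect dictionary.

        Hypothetical helper for illustration; not part of critcatworks.
        """
        # percent-encode credentials so characters like '@' or ':' are safe
        user = quote_plus(extdb_connect["username"])
        password = quote_plus(extdb_connect["password"])
        host = extdb_connect["host"]
        db_name = extdb_connect["db_name"]
        # assumption: fall back to the target database if no authsource is given
        authsource = extdb_connect.get("authsource", db_name)
        return f"mongodb://{user}:{password}@{host}/{db_name}?authSource={authsource}"


    # made-up connection details
    example_connect = {
        "host": "localhost:27017",
        "username": "myuser",
        "password": "p@ss:word",
        "authsource": "admin",
        "db_name": "testdb",
    }
    uri = build_mongodb_uri(example_connect)
    print(uri)

A URI of this form can be passed to a MongoDB client such as ``pymongo.MongoClient``.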


Fireworks Spec Entries
-----------------------

The current workflows use the following *fw_spec* entries. Adhering to this structure is recommended but not enforced.

:simulations (dict): 
    simulation collection entries for this workflow.
    Usually, simulations are not stored here, since large amounts
    of documents would slow the workflow manager down

:workflow (dict): 
    relevant information about this workflow,
    entry for workflow collection

:machine_learning (dict): 
    machine_learning instances of this workflow
    entries for machine_learning collection

:n_calcs_started (int): 
    number of calculations which have already been started

:extdb_connect (dict):
    Connection information to the permanent MongoDB database, containing the keys host, username, password,
    authsource and db_name.

:temp (dict):

    calc_paths (list of str)
        paths to the DFT calculations, sorted by adsorbate ids
    calc_ids (list of int) 
        ids of simulations in permanent database
    is_converged_list (list of int)
        1 - converged, 
        0 - not converged calculation, 
        same order as calc_paths
    fps_ranking (list of int)
        adsorbate ids ordered by FPS ranking
    analysis_ids (list of int)
        calculation ids which have been analysed and where analysis can be processed

    calc_analysis_ids_dict (dict)
        keys are calculation ids before DFT
        values are calculation ids which have been analysed
    cohesive_energy_dct (dict)
        for each chemical formula key, the value corresponds
        to a dict of simulation indices and cohesive energies
        (total energies if no atomic energies were given)

    descmatrix (str)
        path to a NumPy array containing the 2D descriptor matrix, each row representing one data point
    property (list of str)
        property of interest to machine learning
    last_machine_learning_id (int)
        id of last machine learning step

    reference_energy (float) 
        reference energy for the adsorbate. Can be the total energy of the isolated adsorbate molecule or a different reference point

    free_energy_correction (float) 
        constant shift in free
        energy. This is relevant for the coverage ladder
        target energy range.

    branch_dct (dict)
        keys of parent simulations
        with values being lists of child simulations
    direction (bool)
        1 - adding adsorbate
        0 - removing adsorbate

    ne_dct (dict) 
        stores total energies of all calculations with respect to the number of adsorbates and their ids
    n_adsorbates_root (int) 
        number of adsorbates of the root structure
    n_adsorbates (int) 
        number of adsorbates of the current step
    
    is_return (bool)
        current state of the coverage ladder workflow. If True, 
        the ladder search is on the way back to the root level
    is_new_root (bool)
        If True, the last simulation has resulted in a new
        root simulation

    open_branches (list)
        each element is a tuple containing parent simulation ids and direction 

    root_history (list)
        ordered ids of root simulations during the course of
        the workflow, starting with the start_id

    step_history (list)
        each entry is a tuple of
        a list of calculation ids
        and a direction indicator

    calc_parents (dict)
        keys of simulation ids
        with values being parent simulation ids

    start_id (int)
        unique identifier of the simulation which is used to start the workflow
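
The relationships between some of the ``temp`` entries can be illustrated with plain Python. The sketch below uses made-up values and two hypothetical helpers (``converged_calc_ids`` and ``parents_from_branches``) that are not part of critcatworks. It assumes that ``calc_ids`` follows the same order as ``calc_paths`` and ``is_converged_list``, and that each simulation has a single parent.

.. code-block:: python

    def converged_calc_ids(temp):
        """Return the ids of converged calculations (flag 1 in is_converged_list)."""
        return [
            calc_id
            for calc_id, flag in zip(temp["calc_ids"], temp["is_converged_list"])
            if flag == 1
        ]


    def parents_from_branches(branch_dct):
        """Invert a parent -> children mapping into a child -> parent mapping."""
        calc_parents = {}
        for parent, children in branch_dct.items():
            for child in children:
                calc_parents[child] = parent
        return calc_parents


    # made-up example values following the structure described above;
    # the branch_dct key is a string because MongoDB documents only
    # allow string keys
    temp = {
        "calc_paths": ["/scratch/calc_0", "/scratch/calc_1", "/scratch/calc_2"],
        "calc_ids": [101, 102, 103],
        "is_converged_list": [1, 0, 1],
        "branch_dct": {"101": [102, 103]},
    }

    print(converged_calc_ids(temp))
    print(parents_from_branches(temp["branch_dct"]))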