critcatworks.ml package

Submodules

critcatworks.ml.convergence module

class critcatworks.ml.convergence.CheckConvergenceTask(*args, **kwargs)

Bases: fireworks.core.firework.FiretaskBase

Task to check the convergence of the database. If not converged, the workflow continues.

Parameters
  • threshold (float) – If the convergence_criterion (by default the MAE of the property) is below the given threshold, the workflow is defused early.

  • convergence_criterion (str) – Machine learning metric on which the decision to stop the workflow is based. Defaults to mae (mean absolute error).

Returns

Firework action, updates fw_spec, possibly defuses the workflow

Return type

FWAction
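The stopping decision described above amounts to comparing one error metric against the threshold. The following is a minimal sketch, assuming the metrics of the latest machine learning run are available as a dictionary keyed by criterion name; `is_converged` and that dictionary layout are hypothetical and not part of critcatworks. In the actual task, a positive result would be turned into an FWAction that defuses the workflow.

```python
def is_converged(metrics, threshold, convergence_criterion="mae"):
    """Return True if the chosen metric is strictly below the threshold.

    `metrics` is a hypothetical dict of error metrics from the most
    recent machine learning run, e.g. {"mae": 0.12, "mse": 0.03}.
    """
    return metrics[convergence_criterion] < threshold


# Hypothetical metrics from the most recent machine learning run.
latest = {"mae": 0.08, "mse": 0.01}

if is_converged(latest, threshold=0.1):
    print("converged: the remaining workflow would be defused")
else:
    print("not converged: the workflow continues")
```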

optional_params = []
required_params = ['threshold', 'convergence_criterion']
run_task(fw_spec)

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters

fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special "_fw_env" key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, "foo" may be named "foo1" in resource 1 and "foo2" in resource 2. The FWorker env can specify {"foo": "foo1"}, which maps an abstract variable "foo" to the relevant "foo1" or "foo2". You can then write a task that uses fw_spec["_fw_env"]["foo"] that will work across all these multiple resources.

Returns

(FWAction)
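The "_fw_env" indirection described for fw_spec is a plain dictionary lookup. The sketch below shows the same abstract key resolving differently on two resources; `resolve_env` is a hypothetical helper, and its fallback `default` is an addition (indexing fw_spec["_fw_env"]["foo"] directly raises KeyError when the setting is missing).

```python
def resolve_env(fw_spec, key, default=None):
    """Look up an abstract setting in the FWorker env stored under "_fw_env"."""
    return fw_spec.get("_fw_env", {}).get(key, default)


# On resource 1 the FWorker env maps the abstract name "foo" to "foo1".
spec_resource1 = {"_fw_env": {"foo": "foo1"}}
# On resource 2 the same abstract name maps to "foo2".
spec_resource2 = {"_fw_env": {"foo": "foo2"}}

print(resolve_env(spec_resource1, "foo"))  # foo1
print(resolve_env(spec_resource2, "foo"))  # foo2
```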

critcatworks.ml.convergence.check_convergence(threshold, convergence_criterion='mae')

Creates a Firework from CheckConvergenceTask, which checks the convergence of the database. If not converged, the workflow continues.

Parameters
  • threshold (float) – If the convergence_criterion (by default the MAE of the property) is below the given threshold, the workflow is defused early.

  • convergence_criterion (str) – Machine learning metric on which the decision to stop the workflow is based. Defaults to mae (mean absolute error).

Returns

Firework CheckConvergenceWork

Return type

Firework

critcatworks.ml.krr module

class critcatworks.ml.krr.MLTask(*args, **kwargs)

Bases: fireworks.core.firework.FiretaskBase

Machine Learning Task. It predicts the property of all uncomputed structures in the workflow. It is trained on all converged structures. Cross-validation is used to infer the optimal machine learning hyperparameters. Currently, only KRR (kernel ridge regression) is implemented.

A new document is added to the machine_learning collection.

Parameters

target_path (str) – absolute path to the target directory (needs to exist) on the computing resource.

Returns

Firework action, updates fw_spec, possibly defuses the workflow upon failure.

Return type

FWAction
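The division of the data that MLTask relies on (converged structures for training, uncomputed structures for prediction) can be illustrated with a small sketch. It assumes each structure is a hypothetical (id, label) pair with None marking structures whose property has not been computed; the real task reads this information from the database. The two resulting lists play the roles of train_test_ids and to_predict_ids in ml_krr.

```python
def partition_structures(records):
    """Split structure ids into a training/test pool and a prediction pool.

    Each record is a hypothetical (id, label) pair; label is None for
    structures whose property has not been computed yet.
    """
    train_test_ids = [sid for sid, label in records if label is not None]
    to_predict_ids = [sid for sid, label in records if label is None]
    return train_test_ids, to_predict_ids


records = [(0, -0.42), (1, None), (2, -0.17), (3, None), (4, 0.05)]
train_test_ids, to_predict_ids = partition_structures(records)
print(train_test_ids)  # [0, 2, 4]
print(to_predict_ids)  # [1, 3]
```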

optional_params = []
required_params = ['target_path']
run_task(fw_spec)

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters

fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special "_fw_env" key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, "foo" may be named "foo1" in resource 1 and "foo2" in resource 2. The FWorker env can specify {"foo": "foo1"}, which maps an abstract variable "foo" to the relevant "foo1" or "foo2". You can then write a task that uses fw_spec["_fw_env"]["foo"] that will work across all these multiple resources.

Returns

(FWAction)

critcatworks.ml.krr.get_mae(target_path)

Creates a Firework from MLTask. It predicts the property of all uncomputed structures in the workflow. It is trained on all converged structures. Cross-validation is used to infer the optimal machine learning hyperparameters. Currently, only KRR (kernel ridge regression) is implemented.

A new document is added to the machine_learning collection.

Parameters

target_path (str) – absolute path to the target directory (needs to exist) on the computing resource.

Returns

Firework containing MLTask

Return type

Firework

critcatworks.ml.krr.ml_krr(features, labels, train_test_ids, to_predict_features, to_predict_ids, alpha_list=array([1.e-01, 1.e-02, 1.e-03, 1.e-04, 1.e-05, 1.e-06, 1.e-07, 1.e-08, 1.e-09]), gamma_list=array([1.e-01, 1.e-02, 1.e-03, 1.e-04, 1.e-05, 1.e-06, 1.e-07, 1.e-08, 1.e-09]), kernel_list=['rbf'], sample_size=0.8, is_scaled=False, n_cv=5, path='.')

Helper function to estimate the generalization error (MAE, MSE) of kernel ridge regression. By default, the hyperparameters alpha and gamma are scanned on a logarithmic scale. The data set is split randomly into a training set and a test set, with the ratio of the split defined by sample_size. The training set is used for cross-validation.

Parameters
  • features (2D ndarray) – descriptor input for the machine learning algorithm for training/testing

  • labels (1D ndarray) – property labels for the machine learning algorithm for training/testing

  • train_test_ids (1D ndarray) – pythonic ids (of features and labels) for training and testing.

  • to_predict_features (2D ndarray) – descriptor input for the machine learning algorithm for prediction

  • to_predict_ids (1D ndarray) – pythonic ids (of features and labels) omitted from training and testing.

  • alpha_list (list) – Regularization parameters to scan. Defaults to np.logspace(-1, -9, 9)

  • gamma_list (list) – Kernel scaling parameters to scan. Defaults to np.logspace(-1, -9, 9)

  • kernel_list (list) – List of kernel functions (see sklearn documentation for options). Defaults to ['rbf']

  • sample_size (float) – Defines the ratio of the training-test split. Defaults to 0.8

  • is_scaled (bool) – If set to True, the features are scaled. Defaults to False

  • n_cv (int) – Number of cross-validation splits. Defaults to 5

  • path (str) – Path to which the machine learning output is written. Defaults to the current working directory

Returns

machine learning results with the following keys:

ids_train, ids_test, ids_predicted, method_params, output (.label_predicted, .label_train, .label_test), metrics_test, metrics_validation, metrics_training

Return type

dict
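The procedure described for ml_krr (random training/test split, cross-validated scan of alpha and gamma, final error on the held-out test set) can be sketched in plain NumPy. This is a minimal sketch, not the critcatworks implementation: the docstring points to the sklearn documentation for kernels, so the real function presumably relies on scikit-learn estimators, whereas here the closed-form KRR solve (K + alpha I) c = y with an RBF kernel exp(-gamma ||x - x'||^2) is written out explicitly. The helper names, the toy sine data, the smaller hyperparameter grid, and reading sample_size as the training fraction are all assumptions.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def krr_fit(x, y, alpha, gamma):
    """Dual coefficients c solving (K + alpha * I) c = y."""
    k = rbf_kernel(x, x, gamma)
    return np.linalg.solve(k + alpha * np.eye(len(x)), y)


def krr_predict(x_train, coef, x_new, gamma):
    """Predictions K(x_new, x_train) @ c."""
    return rbf_kernel(x_new, x_train, gamma) @ coef


def cv_grid_search(x, y, alpha_list, gamma_list, n_cv=5, seed=0):
    """Return (alpha, gamma, mean validation MAE) of the best grid point."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_cv)
    best = None
    for alpha in alpha_list:
        for gamma in gamma_list:
            fold_maes = []
            for i in range(n_cv):
                val = folds[i]
                trn = np.concatenate([f for j, f in enumerate(folds) if j != i])
                coef = krr_fit(x[trn], y[trn], alpha, gamma)
                pred = krr_predict(x[trn], coef, x[val], gamma)
                fold_maes.append(np.mean(np.abs(pred - y[val])))
            score = float(np.mean(fold_maes))
            if best is None or score < best[2]:
                best = (alpha, gamma, score)
    return best


# Toy data: 60 one-dimensional samples of sin(x).
rng = np.random.default_rng(42)
x = rng.uniform(-3.0, 3.0, size=(60, 1))
y = np.sin(x[:, 0])

# Random split with 80 % of the samples used for training (sample_size=0.8).
perm = rng.permutation(len(x))
n_train = int(0.8 * len(x))
tr, te = perm[:n_train], perm[n_train:]

alphas = np.logspace(-1, -6, 6)
gammas = np.logspace(0, -2, 3)
alpha, gamma, cv_mae = cv_grid_search(x[tr], y[tr], alphas, gammas, n_cv=5)

# Refit on the full training set and report the test-set MAE.
coef = krr_fit(x[tr], y[tr], alpha, gamma)
test_mae = float(np.mean(np.abs(krr_predict(x[tr], coef, x[te], gamma) - y[te])))
print(f"alpha={alpha:g} gamma={gamma:g} cv_mae={cv_mae:.4f} test_mae={test_mae:.4f}")
```

Bounding alpha from below at 1e-6 in this toy grid keeps K + alpha I reasonably conditioned; the documented default grid goes down to 1e-9.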

critcatworks.ml.krr.predict_and_error(learner, x_test, x_train, y_test)

Helper function to predict the property on a training and a test set.

Parameters
  • learner (sklearn.learner) – learner object that was fitted on the training set

  • x_test (2D ndarray) – test data scaled accordingly

  • x_train (2D ndarray) – training data which is scaled

  • y_test (1D ndarray) – labels of the test data

Returns

mae, mse, y_pred, train_y_pred, learner

Return type

tuple
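The two error measures returned by predict_and_error, mae and mse, presumably compare the predictions on the test set with y_test. The sketch below spells out their standard definitions in plain Python; `mae_mse` is a hypothetical helper, not part of the module.

```python
def mae_mse(y_true, y_pred):
    """Mean absolute error and mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return mae, mse


# Errors are 0.5, 0.0 and -1.0, so MAE = 0.5 and MSE = 1.25 / 3.
mae, mse = mae_mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
print(mae, mse)
```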

critcatworks.ml.krr.scale_data(x_train, x_test, is_mean=True)

Helper function to scale the data with respect to the mean.

Parameters
  • x_train (2D ndarray) – training data which is scaled

  • x_test (2D ndarray) – test data scaled accordingly

  • is_mean (bool) – If True (default), the data are centered around the mean of x_train. If False, the data are scaled between 0 and 1.

Returns

the scaled arrays x_train, x_test

Return type

tuple
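The essential point of scale_data is that the scaling statistics come from x_train alone and are then applied unchanged to x_test ("test data scaled accordingly"). A minimal NumPy sketch follows; the helper name is hypothetical, and treating the mean-based mode as standardization (subtract the training mean, divide by the training standard deviation, as scikit-learn's StandardScaler does) is an assumption, since the docstring only says "centered around the mean".

```python
import numpy as np


def scale_data_sketch(x_train, x_test, is_mean=True):
    """Scale features with statistics of x_train only, then apply them to x_test."""
    if is_mean:
        # Standardize: centre on the training mean, divide by the training std.
        centre = x_train.mean(axis=0)
        spread = x_train.std(axis=0)
    else:
        # Min-max: map the training range of each feature onto [0, 1].
        centre = x_train.min(axis=0)
        spread = x_train.max(axis=0) - centre
    # Avoid dividing by zero for constant features.
    spread = np.where(spread == 0, 1.0, spread)
    return (x_train - centre) / spread, (x_test - centre) / spread


x_train = np.array([[0.0, 10.0], [2.0, 30.0]])
x_test = np.array([[1.0, 20.0]])
print(scale_data_sketch(x_train, x_test))
print(scale_data_sketch(x_train, x_test, is_mean=False))
```

With min-max scaling, test values outside the training range land outside [0, 1], which is the expected behavior when only training statistics are used.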

critcatworks.ml.krr.split_scale_data(x_data, y_data, ids_data, sample_size, is_scaled)

Helper function to split and scale the data.

Parameters
  • x_data (2D ndarray) – features of the training and test data

  • y_data (1D ndarray) – labels of the training and test data

  • ids_data (1D ndarray) – complete list of ids of data points used for training and testing

  • sample_size (float) – The ratio of the training-test split is defined by this

  • is_scaled (bool) – True scales the features centered around the mean

Returns

ndarrays x_train, x_test (split from x_data), y_train, y_test (split from y_data), and ids_train, ids_test (split from ids_data)

Return type

tuple
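What matters in the split is that one random permutation is applied to features, labels and ids alike, so that ids_train still identifies the rows of x_train and y_train. The sketch below shows only the splitting half (scaling would follow with the training statistics, as for scale_data). `split_data` and its `seed` argument are hypothetical, and sample_size is assumed to be the training fraction.

```python
import numpy as np


def split_data(x_data, y_data, ids_data, sample_size, seed=None):
    """Randomly split features, labels and ids with one shared permutation.

    sample_size is treated as the fraction of data used for training (assumption).
    Returns x_train, x_test, y_train, y_test, ids_train, ids_test.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x_data))
    n_train = int(round(sample_size * len(x_data)))
    tr, te = order[:n_train], order[n_train:]
    return x_data[tr], x_data[te], y_data[tr], y_data[te], ids_data[tr], ids_data[te]


x = np.arange(20, dtype=float).reshape(10, 2)  # row i is [2i, 2i + 1]
y = np.arange(10, dtype=float)                 # label i belongs to row i
ids = np.arange(100, 110)                      # id 100 + i belongs to row i

x_train, x_test, y_train, y_test, ids_train, ids_test = split_data(x, y, ids, 0.8, seed=1)
print(ids_train, ids_test)
```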

critcatworks.ml.krr.write_output(learner, sample_size, ml_method, mae, mse, runtype, ids_test, y_test, y_pred, ids_train, y_train, train_y_pred, to_predict_ids, y_to_predict, path)

Helper function to write the machine learning output.

Module contents