critcatworks.ml package

Submodules

critcatworks.ml.convergence module

class critcatworks.ml.convergence.CheckConvergenceTask(*args, **kwargs)

Bases: fireworks.core.firework.FiretaskBase

Task to check the convergence of the database. If not converged, the workflow continues.

Parameters

threshold (float) – If the convergence_criterion (default MAE of property) is below the given threshold, the workflow is defused early
convergence_criterion (str) – Type of machine learning criterion, based on which to stop the workflow. Defaults to mae (MAE)

Returns

Firework action, updates fw_spec, possibly defuses the workflow

Return type

FWAction

optional_params = []

required_params = ['threshold', 'convergence_criterion']

run_task(fw_spec)

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters: fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources.
Returns: (FWAction)
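The "_fw_env" lookup described above can be illustrated with plain dictionaries. This is a minimal sketch that does not import FireWorks; the helper name `resolve_env_name` and the two example specs are hypothetical and only mirror the "foo" example from the parameter description.

```python
def resolve_env_name(fw_spec, abstract_name):
    """Return the resource-specific value that the FWorker env maps to abstract_name."""
    return fw_spec["_fw_env"][abstract_name]


# Hypothetical specs as two different FWorkers might pass them to run_task.
spec_on_resource_1 = {"_fw_env": {"foo": "foo1"}}
spec_on_resource_2 = {"_fw_env": {"foo": "foo2"}}

# The same task code resolves to a different concrete name on each resource.
print(resolve_env_name(spec_on_resource_1, "foo"))  # prints foo1
print(resolve_env_name(spec_on_resource_2, "foo"))  # prints foo2
```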

critcatworks.ml.convergence.check_convergence(threshold, convergence_criterion='mae')

Checks the convergence of the database. If not converged, the workflow continues.

Parameters

threshold (float) – If the convergence_criterion (default MAE of property) is below the given threshold, the workflow is defused early
convergence_criterion (str) – Type of machine learning criterion, based on which to stop the workflow. Defaults to mae (MAE)

Returns

Firework CheckConvergenceWork

Return type

Firework
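The stopping rule described above can be illustrated without FireWorks. This is a minimal sketch, assuming the latest machine learning metrics are available as a plain dict keyed by criterion name; the function `is_converged` and the layout of `metrics` are hypothetical and not part of critcatworks.

```python
def is_converged(metrics, threshold, convergence_criterion="mae"):
    """Return True when the chosen criterion is below the threshold.

    `metrics` is a hypothetical dict such as {"mae": 0.12, "mse": 0.03}.
    """
    value = metrics[convergence_criterion]
    return value < threshold


latest_metrics = {"mae": 0.12, "mse": 0.03}  # made-up numbers for illustration

# MAE 0.12 is not below 0.1: the workflow would continue.
print(is_converged(latest_metrics, threshold=0.1))  # prints False
# MAE 0.12 is below 0.2: the workflow would be defused early.
print(is_converged(latest_metrics, threshold=0.2))  # prints True
```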

critcatworks.ml.krr module

class critcatworks.ml.krr.MLTask(*args, **kwargs)

Bases: fireworks.core.firework.FiretaskBase

Machine learning task. It predicts the property of all uncomputed structures in the workflow and is trained on all converged structures. Cross-validation is used to infer the optimal machine learning hyperparameters. Currently, only KRR (kernel ridge regression) is implemented.

A new document is added to the machine_learning collection.

Parameters

target_path (str) – absolute path to the target directory (needs to exist) on the computing resource.

Returns

Firework action, updates fw_spec, possibly defuses the workflow upon failure.

Return type

FWAction

optional_params = []

required_params = ['target_path']

run_task(fw_spec)

This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.

Parameters: fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special “_fw_env” key that contains the env settings of the FWorker calling this method. This provides for abstracting out certain commands or settings. For example, “foo” may be named “foo1” in resource 1 and “foo2” in resource 2. The FWorker env can specify { “foo”: “foo1”}, which maps an abstract variable “foo” to the relevant “foo1” or “foo2”. You can then write a task that uses fw_spec[“_fw_env”][“foo”] that will work across all these multiple resources.
Returns: (FWAction)

critcatworks.ml.krr.get_mae(target_path)

Creates a Firework from MLTask. The task predicts the property of all uncomputed structures in the workflow and is trained on all converged structures. Cross-validation is used to infer the optimal machine learning hyperparameters. Currently, only KRR (kernel ridge regression) is implemented.

A new document is added to the machine_learning collection.

Parameters

target_path (str) – absolute path to the target directory (needs to exist) on the computing resource.

Returns

Firework containing the MLTask

Return type

Firework

critcatworks.ml.krr.ml_krr(features, labels, train_test_ids, to_predict_features, to_predict_ids, alpha_list=array([1.e-01, 1.e-02, 1.e-03, 1.e-04, 1.e-05, 1.e-06, 1.e-07, 1.e-08, 1.e-09]), gamma_list=array([1.e-01, 1.e-02, 1.e-03, 1.e-04, 1.e-05, 1.e-06, 1.e-07, 1.e-08, 1.e-09]), kernel_list=['rbf'], sample_size=0.8, is_scaled=False, n_cv=5, path='.')

Helper function to estimate the generalization error (MAE, MSE). By default, the hyperparameters alpha and gamma are scanned on a logarithmic scale. The data set is split randomly into a training set and a test set; the ratio of the split is defined by sample_size. The training set is used for cross-validation.

Parameters

features (2D ndarray) – descriptor input for the machine learning algorithm for training/testing
labels (1D ndarray) – property labels for the machine learning algorithm for training/testing
train_test_ids (1D ndarray) – pythonic ids (of features and labels) for training and testing.
to_predict_features (2D ndarray) – descriptor input for the machine learning algorithm for prediction
to_predict_ids (1D ndarray) – pythonic ids (of features and labels) omitted from training and testing.
alpha_list (list) – Regularization parameters to scan. Defaults to np.logspace(-1, -9, 9)
gamma_list (list) – Kernel function scaling parameters to scan. Defaults to np.logspace(-1, -9, 9)
kernel_list (list) – List of kernel functions (see sklearn documentation for options). Defaults to [‘rbf’]
sample_size (float) – The ratio of the training-test split is defined by this. Defaults to 0.8
is_scaled (bool) – If set to True, the features are scaled. Defaults to False
n_cv (int) – Number of cross-validation splits. Defaults to 5
path (str) – path to which the machine learning output is written. Defaults to the current working directory

Returns

machine learning results with the following keys: ids_train, ids_test, ids_predicted, method_params, output (.label_predicted, .label_train, .label_test), metrics_test, metrics_validation, metrics_training

Return type

dict
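To make the size of the default hyperparameter scan concrete, the sketch below enumerates the default grids in plain Python. It assumes an exhaustive grid search in which every combination is fitted once per cross-validation split; the source does not state how ml_krr performs the scan internally, and the variable names `grid` and `n_fits` are illustrative only.

```python
from itertools import product

# Same values as np.logspace(-1, -9, 9): 1e-1, 1e-2, ..., 1e-9.
alpha_list = [10.0 ** -k for k in range(1, 10)]
gamma_list = [10.0 ** -k for k in range(1, 10)]
kernel_list = ["rbf"]
n_cv = 5

# Every (kernel, alpha, gamma) combination an exhaustive scan would visit.
grid = list(product(kernel_list, alpha_list, gamma_list))
print(len(grid))  # prints 81

# Assumption: one fit per combination and per cross-validation split.
n_fits = len(grid) * n_cv
print(n_fits)  # prints 405
```

Extending kernel_list or either parameter list multiplies the number of fits accordingly, which is worth keeping in mind when the training set grows.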

critcatworks.ml.krr.predict_and_error(learner, x_test, x_train, y_test)

Helper function to predict the property on a training and a test set.

Parameters

learner (sklearn.learner) – learner object with which the training set was fitted
x_test (2D ndarray) – test data scaled accordingly
x_train (2D ndarray) – training data which is scaled
y_test (1D ndarray) – labels of the test data

Returns

mae, mse, y_pred, train_y_pred, learner

Return type

tuple
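The two error metrics returned here, MAE and MSE, follow their standard definitions. The sketch below computes them in plain Python on made-up numbers; the function names are illustrative and are not the ones used inside critcatworks.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_test = [1.0, 2.0, 3.0]  # made-up labels
y_pred = [1.5, 2.0, 2.0]  # made-up predictions

mae = mean_absolute_error(y_test, y_pred)  # (0.5 + 0.0 + 1.0) / 3
mse = mean_squared_error(y_test, y_pred)   # (0.25 + 0.0 + 1.0) / 3
print(mae, mse)
```

MSE penalizes the single large error more strongly than MAE does, which is why the two metrics can rank models differently.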

critcatworks.ml.krr.scale_data(x_train, x_test, is_mean=True)

Helper function to scale the data with respect to the mean.

Parameters

x_train (2D ndarray) – training data which is scaled
x_test (2D ndarray) – test data scaled accordingly
is_mean (bool) – If set to False, the data are scaled to the range between 0 and 1. Otherwise, they are centered around the mean of x_train.

Returns

the scaled arrays x_train, x_test

Return type

tuple
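The two scaling modes can be sketched in plain Python as follows. This is an illustration under assumptions, not the critcatworks implementation: it assumes that mean scaling only subtracts the per-feature mean of x_train (no division by the standard deviation) and that the statistics are always taken from x_train and then applied to x_test. The function name `scale_columns` is hypothetical.

```python
def scale_columns(x_train, x_test, is_mean=True):
    """Scale both sets per feature, using statistics of x_train only."""
    n_features = len(x_train[0])
    columns = [[row[j] for row in x_train] for j in range(n_features)]
    if is_mean:
        # Assumption: center on the training mean, leave the spread unchanged.
        shift = [sum(c) / len(c) for c in columns]
        width = [1.0] * n_features
    else:
        # Min-max scaling: training data end up between 0 and 1.
        shift = [min(c) for c in columns]
        width = [(max(c) - min(c)) or 1.0 for c in columns]  # guard constant columns

    def transform(rows):
        return [[(row[j] - shift[j]) / width[j] for j in range(n_features)]
                for row in rows]

    return transform(x_train), transform(x_test)


x_train = [[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]]
x_test = [[1.0, 40.0]]

centered_train, centered_test = scale_columns(x_train, x_test, is_mean=True)
unit_train, unit_test = scale_columns(x_train, x_test, is_mean=False)
print(centered_test)  # prints [[-1.0, 20.0]]
print(unit_test)      # prints [[0.25, 1.5]]
```

Because the test set is transformed with training statistics, min-max scaled test values can fall outside the range 0 to 1, as the second feature does here.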

critcatworks.ml.krr.split_scale_data(x_data, y_data, ids_data, sample_size, is_scaled)

Helper function to split and scale the data.

Parameters

x_data (2D ndarray) – features of the training and test data
y_data (1D ndarray) – labels of the training and test data
ids_data (1D ndarray) – complete list of ids of datapoints used for training and testing
sample_size (float) – The ratio of the training-test split is defined by this
is_scaled (bool) – True scales the features centered around the mean

Returns

ndarrays x_train, x_test (split from x_data), y_train, y_test (split from y_data), ids_train, ids_test (split from ids_data)

Return type

tuple
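The splitting step can be sketched in plain Python as below. It assumes that sample_size is the fraction of data points assigned to the training set, which the source does not state explicitly; the function name `split_data` and the `seed` argument are hypothetical. The return order follows the one documented above.

```python
import random


def split_data(x_data, y_data, ids_data, sample_size, seed=0):
    """Randomly split features, labels and ids with one shared permutation."""
    order = list(range(len(ids_data)))
    random.Random(seed).shuffle(order)

    # Assumption: sample_size is the training fraction.
    n_train = int(round(sample_size * len(order)))
    train, test = order[:n_train], order[n_train:]

    def pick(data, positions):
        return [data[i] for i in positions]

    return (pick(x_data, train), pick(x_data, test),
            pick(y_data, train), pick(y_data, test),
            pick(ids_data, train), pick(ids_data, test))


# Made-up data: 10 points with one feature each; ids are arbitrary labels.
x_data = [[float(i)] for i in range(10)]
y_data = [2.0 * i for i in range(10)]
ids_data = [100 + i for i in range(10)]

x_train, x_test, y_train, y_test, ids_train, ids_test = split_data(
    x_data, y_data, ids_data, sample_size=0.8)
print(len(ids_train), len(ids_test))  # prints 8 2
```

Using one shared permutation keeps each feature row, label and id aligned after the split, so predictions can later be traced back to the original data points.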

critcatworks.ml.krr.write_output(learner, sample_size, ml_method, mae, mse, runtype, ids_test, y_test, y_pred, ids_train, y_train, train_y_pred, to_predict_ids, y_to_predict, path)

Helper function to write the machine learning output.

Module contents
