critcatworks.ml package
Submodules
critcatworks.ml.convergence module
class critcatworks.ml.convergence.CheckConvergenceTask(*args, **kwargs)
Bases: fireworks.core.firework.FiretaskBase
Task to check the convergence of the database. If not converged, the workflow continues.
- Parameters
threshold (float) – If the convergence_criterion (default MAE of property) is below the given threshold, the workflow is defused early
convergence_criterion (str) – Machine learning error metric on which the decision to stop the workflow is based. Defaults to mae (MAE)
- Returns
Firework action, updates fw_spec, possibly defuses the workflow
- Return type
FWAction
optional_params = []
required_params = ['threshold', 'convergence_criterion']
run_task(fw_spec)
This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.
- Parameters
fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special "_fw_env" key that holds the env settings of the FWorker calling this method. This allows certain commands or settings to be abstracted. For example, "foo" may be named "foo1" on resource 1 and "foo2" on resource 2. The FWorker env can specify {"foo": "foo1"}, which maps the abstract variable "foo" to the relevant "foo1" or "foo2". You can then write a task that uses fw_spec["_fw_env"]["foo"] and works across all of these resources.
- Returns
(FWAction)
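The "_fw_env" lookup described above can be sketched in plain Python. This is only an illustration of the mapping idea: the helper name `resolve_name`, the example specs, and the fallback to the abstract name are assumptions of this sketch, not part of critcatworks or FireWorks.

```python
def resolve_name(fw_spec, abstract_name="foo"):
    """Map an abstract name to the resource-specific one via "_fw_env".

    Falling back to the abstract name when no mapping exists is a choice
    made for this sketch only.
    """
    env = fw_spec.get("_fw_env", {})
    return env.get(abstract_name, abstract_name)


# Specs as two different workers might present them (illustrative values).
spec_resource_1 = {"_fw_env": {"foo": "foo1"}}
spec_resource_2 = {"_fw_env": {"foo": "foo2"}}

print(resolve_name(spec_resource_1))  # foo1
print(resolve_name(spec_resource_2))  # foo2
```

The same task code therefore resolves to a different concrete name on each resource without any branching on the resource itself.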
critcatworks.ml.convergence.check_convergence(threshold, convergence_criterion='mae')
Checks the convergence of the database. If not converged, the workflow continues.
- Parameters
threshold (float) – If the convergence_criterion (default MAE of property) is below the given threshold, the workflow is defused early
convergence_criterion (str) – Machine learning error metric on which the decision to stop the workflow is based. Defaults to mae (MAE)
- Returns
Firework CheckConvergenceWork
- Return type
Firework
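The convergence rule documented for check_convergence and CheckConvergenceTask can be sketched as a single comparison. This is a minimal illustration under stated assumptions, not the package's implementation: the function name `is_converged` and the `metrics` dictionary are hypothetical, and only `threshold` and `convergence_criterion` come from the documented parameters.

```python
def is_converged(metrics, threshold, convergence_criterion="mae"):
    """Return True when the chosen error metric is below the threshold.

    `metrics` is a hypothetical dict such as {"mae": 0.12, "mse": 0.03};
    the real task obtains its values from the workflow's stored results.
    """
    return metrics[convergence_criterion] < threshold


# An MAE of 0.08 is below the threshold of 0.1: the workflow would be
# defused early. An MAE of 0.25 is not: the workflow would continue.
print(is_converged({"mae": 0.08}, threshold=0.1))  # True
print(is_converged({"mae": 0.25}, threshold=0.1))  # False
```

In a Firetask, the True branch would typically correspond to returning `FWAction(defuse_workflow=True)`.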
critcatworks.ml.krr module
class critcatworks.ml.krr.MLTask(*args, **kwargs)
Bases: fireworks.core.firework.FiretaskBase
Machine Learning Task. It predicts the property of all uncomputed structures in the workflow. It is trained on all converged structures. Cross-validation is used to infer the optimal machine learning hyperparameters. Currently, only KRR (kernel ridge regression) is implemented.
A new document is added to the machine_learning collection.
- Parameters
target_path (str) – Absolute path to the target directory on the computing resource. The directory must already exist.
- Returns
Firework action, updates fw_spec, possibly defuses workflow upon failure.
- Return type
FWAction
optional_params = []
required_params = ['target_path']
run_task(fw_spec)
This method gets called when the Firetask is run. It can take in a Firework spec, perform some task using that data, and then return an output in the form of a FWAction.
- Parameters
fw_spec (dict) – A Firework spec. This comes from the master spec. In addition, this spec contains a special "_fw_env" key that holds the env settings of the FWorker calling this method. This allows certain commands or settings to be abstracted. For example, "foo" may be named "foo1" on resource 1 and "foo2" on resource 2. The FWorker env can specify {"foo": "foo1"}, which maps the abstract variable "foo" to the relevant "foo1" or "foo2". You can then write a task that uses fw_spec["_fw_env"]["foo"] and works across all of these resources.
- Returns
(FWAction)
critcatworks.ml.krr.get_mae(target_path)
Creates Firework from MLTask. It predicts the property of all uncomputed structures in the workflow. It is trained on all converged structures. Cross-validation is used to infer the optimal machine learning hyperparameters. Currently, only KRR (kernel ridge regression) is implemented.
A new document is added to the machine_learning collection.
- Parameters
target_path (str) – Absolute path to the target directory on the computing resource. The directory must already exist.
- Returns
Firework that runs MLTask
- Return type
Firework
critcatworks.ml.krr.ml_krr(features, labels, train_test_ids, to_predict_features, to_predict_ids, alpha_list=array([1.e-01, 1.e-02, 1.e-03, 1.e-04, 1.e-05, 1.e-06, 1.e-07, 1.e-08, 1.e-09]), gamma_list=array([1.e-01, 1.e-02, 1.e-03, 1.e-04, 1.e-05, 1.e-06, 1.e-07, 1.e-08, 1.e-09]), kernel_list=['rbf'], sample_size=0.8, is_scaled=False, n_cv=5, path='.')
Helper function to estimate the generalization error (MAE, MSE). By default, the hyperparameters alpha and gamma are scanned on a logarithmic scale. The data set is split randomly into a training set and a test set; the ratio of the split is defined by sample_size. The training set is used for cross-validation.
- Parameters
features (2D ndarray) – descriptor input for the machine learning algorithm for training/testing
labels (1D ndarray) – property labels for the machine learning algorithm for training/testing
train_test_ids (1D ndarray) – pythonic ids (of features and labels) for training and testing.
to_predict_features (1D ndarray) – descriptor input for the machine learning algorithm for prediction
to_predict_ids (1D ndarray) – pythonic ids (of features and labels) omitted from training and testing.
alpha_list (list) – Regularization parameter. Defaults to np.logspace(-1, -9, 9)
gamma_list (list) – Kernel function scaling parameter. Defaults to np.logspace(-1, -9, 9)
kernel_list (list) – List of kernel functions (see sklearn documentation for options). Defaults to [‘rbf’]
sample_size (float) – The ratio of the training-test split is defined by this. Defaults to 0.8
is_scaled (bool) – If set to True, the features are scaled. Defaults to False
n_cv (int) – Number of cross-validation splits. Defaults to 5
path (str) – Path to which the machine learning output is written. Defaults to the current working directory
- Returns
- machine learning results with the following keys:
ids_train, ids_test, ids_predicted, method_params, output (.label_predicted, .label_train, .label_test), metrics_test, metrics_validation, metrics_training
- Return type
dict
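The procedure described for ml_krr (random split, logarithmic hyperparameter grid, cross-validation on the training set, error on the test set) can be approximated with scikit-learn. The sketch below is not the package's code: the synthetic data, the use of `GridSearchCV`, the MAE-based scoring, and the reading of sample_size=0.8 as the training fraction are all assumptions made for illustration.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for descriptors and property labels.
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 4))
labels = features @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.01 * rng.normal(size=60)

# Random split; 80 % of the samples are assumed to go to training.
x_train, x_test, y_train, y_test = train_test_split(
    features, labels, train_size=0.8, random_state=0
)

# Logarithmic grids matching the documented defaults.
param_grid = {
    "alpha": np.logspace(-1, -9, 9),
    "gamma": np.logspace(-1, -9, 9),
    "kernel": ["rbf"],
}

# 5-fold cross-validation on the training set only.
search = GridSearchCV(
    KernelRidge(), param_grid, cv=5, scoring="neg_mean_absolute_error"
)
search.fit(x_train, y_train)

# Generalization error estimated on the held-out test set.
mae_test = mean_absolute_error(y_test, search.predict(x_test))
print(search.best_params_, mae_test)
```

Keeping the test set out of the grid search means the reported MAE is not biased by the hyperparameter selection.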
critcatworks.ml.krr.predict_and_error(learner, x_test, x_train, y_test)
Helper function to predict the property on a training and a test set.
- Parameters
learner (sklearn.learner) – learner object with which the training set was fitted
x_test (2D ndarray) – test data scaled accordingly
x_train (2D ndarray) – training data which is scaled
y_test (1D ndarray) – labels of the test data
- Returns
mae, mse, y_pred, train_y_pred, learner
- Return type
tuple
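A minimal sketch of what such a helper might compute, assuming MAE and MSE are evaluated on the test set only. The names `MeanLearner` and `predict_and_error_sketch` are hypothetical; the toy learner merely stands in for a fitted scikit-learn estimator with a `predict` method.

```python
import numpy as np


class MeanLearner:
    """Toy stand-in for a fitted estimator: always predicts the training mean."""

    def __init__(self, y_train):
        self.mean_ = float(np.mean(y_train))

    def predict(self, x):
        return np.full(len(x), self.mean_)


def predict_and_error_sketch(learner, x_test, x_train, y_test):
    y_pred = learner.predict(x_test)         # predictions on the test set
    train_y_pred = learner.predict(x_train)  # predictions on the training set
    mae = float(np.mean(np.abs(y_pred - y_test)))
    mse = float(np.mean((y_pred - y_test) ** 2))
    return mae, mse, y_pred, train_y_pred, learner


x_train = np.zeros((3, 2))
x_test = np.zeros((2, 2))
learner = MeanLearner(y_train=np.array([1.0, 2.0, 3.0]))  # training mean is 2.0
mae, mse, y_pred, train_y_pred, _ = predict_and_error_sketch(
    learner, x_test, x_train, y_test=np.array([1.0, 4.0])
)
print(mae, mse)  # 1.5 2.5
```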
critcatworks.ml.krr.scale_data(x_train, x_test, is_mean=True)
Helper function to scale the data with respect to the mean.
- Parameters
x_train (2D ndarray) – training data which is scaled
x_test (2D ndarray) – test data scaled accordingly
is_mean (bool) – if set to False, scaled between 0 and 1. Otherwise, scaled centered around the mean of x_train.
- Returns
the scaled arrays x_train, x_test
- Return type
tuple
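The two scaling modes can be sketched with NumPy. This is an illustration only: the name `scale_data_sketch` is hypothetical, dividing by the standard deviation in the mean-centered mode is an assumption, and the statistics are taken from x_train and reused for x_test so that the test data is "scaled accordingly".

```python
import numpy as np


def scale_data_sketch(x_train, x_test, is_mean=True):
    if is_mean:
        # Center on the training mean and divide by the training std
        # (dividing by the std is an assumption of this sketch).
        shift = x_train.mean(axis=0)
        scale = x_train.std(axis=0)
    else:
        # Map the training range of each feature onto [0, 1].
        shift = x_train.min(axis=0)
        scale = x_train.max(axis=0) - x_train.min(axis=0)
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant features
    return (x_train - shift) / scale, (x_test - shift) / scale


x_train = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]])
x_test = np.array([[1.0, 15.0]])

centered_train, centered_test = scale_data_sketch(x_train, x_test)
unit_train, unit_test = scale_data_sketch(x_train, x_test, is_mean=False)
print(unit_test)  # [[0.25 0.25]]
```

Because the test set reuses the training statistics, scaled test values may fall outside [0, 1] in the second mode.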
critcatworks.ml.krr.split_scale_data(x_data, y_data, ids_data, sample_size, is_scaled)
Helper function to split and scale the data.
- Parameters
x_data (2D ndarray) – features of the training and test data
y_data (1D ndarray) – labels of the training and test data
ids_data (1D ndarray) – complete list of ids of datapoints used for training and testing
sample_size (float) – The ratio of the training-test split is defined by this
is_scaled (bool) – True scales the features centered around the mean
- Returns
ndarrays x_train, x_test (split from x_data), y_train, y_test (split from y_data), ids_train, ids_test (split from ids_data)
- Return type
tuple
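The splitting step can be sketched as one shared random permutation applied to features, labels, and ids, so that the three stay aligned. The name `split_data_sketch` and the `seed` argument are hypothetical, sample_size is assumed to be the training fraction, and scaling is left out for brevity.

```python
import numpy as np


def split_data_sketch(x_data, y_data, ids_data, sample_size, seed=0):
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(ids_data))  # one shared random order
    n_train = int(round(sample_size * len(ids_data)))
    train, test = order[:n_train], order[n_train:]
    return (x_data[train], x_data[test],
            y_data[train], y_data[test],
            ids_data[train], ids_data[test])


x_data = np.arange(20.0).reshape(10, 2)
y_data = np.arange(10.0)
ids_data = np.arange(100, 110)

x_tr, x_te, y_tr, y_te, ids_tr, ids_te = split_data_sketch(
    x_data, y_data, ids_data, sample_size=0.8
)
print(len(ids_tr), len(ids_te))  # 8 2
```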