dscribe.descriptors.mbtr module

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.mbtr.MBTR(k1=None, k2=None, k3=None, normalize_gaussians=True, normalization='none', flatten=True, species=None, periodic=False, sparse=False)

Bases: dscribe.descriptors.descriptor.Descriptor

Implementation of the Many-body tensor representation up to \(k=3\).

You can choose which terms to include by providing a dictionary in the k1, k2 or k3 arguments. This dictionary should contain information under three keys: “geometry”, “grid” and “weighting”. See the examples below for how to format these dictionaries.

You can use this descriptor for finite and periodic systems. When dealing with periodic systems or when using machine learning models that use the Euclidean norm to measure distance between vectors, it is advisable to use some form of normalization.
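To illustrate why normalization matters for Euclidean-distance models, here is a minimal sketch of what the “l2_each” option described under the normalization parameter amounts to: each k-term is scaled to unit Euclidean length on its own. The helper name l2_normalize_each and the term values are hypothetical and only serve the illustration; this is not the library's internal code.

```python
import math

def l2_normalize_each(terms):
    """Scale every k-term to unit Euclidean norm (all-zero terms are left unchanged)."""
    normalized = {}
    for key, values in terms.items():
        norm = math.sqrt(sum(v * v for v in values))
        normalized[key] = [v / norm for v in values] if norm > 0 else list(values)
    return normalized

# Hypothetical, made-up outputs for two k-terms with very different magnitudes.
terms = {"k1": [3.0, 4.0], "k2": [0.0, 60.0, 80.0]}
normalized = l2_normalize_each(terms)
```

After this scaling, no single k-term dominates the Euclidean distance between two descriptor vectors just because its raw values are larger.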

For the geometry functions the following choices are available:

\(k=1\):
- “atomic_number”: The atomic numbers.
\(k=2\):
- “distance”: Pairwise distance in angstroms.
- “inverse_distance”: Pairwise inverse distance in 1/angstrom.
\(k=3\):
- “angle”: Angle in degrees.
- “cosine”: Cosine of the angle.
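
The geometry functions listed above can be sketched in plain Python as follows. The function names mirror the option names, but the code is only an illustration and not the library's implementation. In particular, placing the angle at the middle atom B of a triplet A, B, C is an assumption made for this sketch.

```python
import math

def distance(a, b):
    """Pairwise distance (same length unit as the coordinates, e.g. angstroms)."""
    return math.dist(a, b)

def inverse_distance(a, b):
    """Pairwise inverse distance."""
    return 1.0 / math.dist(a, b)

def cosine(a, b, c):
    """Cosine of the angle A-B-C, measured at the middle point b (assumption)."""
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    return dot / (math.hypot(*ba) * math.hypot(*bc))

def angle(a, b, c):
    """Angle A-B-C in degrees; the cosine is clamped to guard against rounding."""
    cos = max(-1.0, min(1.0, cosine(a, b, c)))
    return math.degrees(math.acos(cos))

# Three hypothetical positions forming a right angle at b.
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 2.0, 0.0)
right_angle = angle(a, b, c)
```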

For the weighting the following functions are available:

\(k=1\):
- “unity”: No weighting.
\(k=2\):
- “unity”: No weighting.
- “exp”: Weighting of the form \(e^{-sx}\)
\(k=3\):
- “unity”: No weighting.
- “exp”: Weighting of the form \(e^{-sx}\)

The exponential weighting is motivated by the exponential decay of screened Coulombic interactions in solids. In the exponential weighting, the parameter threshold determines the value of the weighting function below which the remaining terms are ignored, and the parameter scale corresponds to \(s\). The meaning of \(x\) changes for different terms as follows:

\(k=2\): \(x\) = Distance between A->B
\(k=3\): \(x\) = Distance from A->B->C->A.
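
As a worked illustration of the exponential weighting, the sketch below evaluates \(e^{-sx}\) and solves \(e^{-sx} = \text{threshold}\) for the value of \(x\) beyond which terms would be dropped. The helper names exp_weight, cutoff and perimeter are hypothetical; the scale and threshold values are taken from the k2 example in the parameter list.

```python
import math

def exp_weight(x, scale):
    """Weight of the form exp(-scale * x)."""
    return math.exp(-scale * x)

def cutoff(scale, threshold):
    """Value of x at which the weight drops to the threshold: x = -ln(threshold) / scale."""
    return -math.log(threshold) / scale

def perimeter(a, b, c):
    """x for k=3: length of the closed path A->B->C->A."""
    return math.dist(a, b) + math.dist(b, c) + math.dist(c, a)

scale, threshold = 0.75, 1e-2
x_cut = cutoff(scale, threshold)  # roughly 6.14 for these values

# Hypothetical 3-4-5 right triangle: the perimeter is 12, which lies beyond x_cut.
x_triplet = perimeter((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0))
triplet_is_ignored = exp_weight(x_triplet, scale) < threshold
```

A smaller threshold or a smaller scale pushes the cutoff further out, so more pairs or triplets contribute, at a higher computational cost.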

In the grid setup min is the minimum value of the axis, max is the maximum value of the axis, sigma is the standard deviation of the gaussian broadening and n is the number of points sampled on the grid.
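
The role of the four grid settings can be sketched as follows. This assumes an evenly spaced axis that includes both end points and evaluates the gaussian pointwise; the library's own discretization may differ in detail, and the helper names make_axis and gaussian are hypothetical.

```python
import math

def make_axis(grid):
    """Evenly spaced axis with grid['n'] points from grid['min'] to grid['max'] inclusive."""
    n = grid["n"]
    step = (grid["max"] - grid["min"]) / (n - 1)
    return [grid["min"] + i * step for i in range(n)]

def gaussian(x, mu, sigma, normalize=True):
    """Gaussian centred at mu; with normalize=False the prefactor is dropped."""
    value = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    if normalize:
        value /= sigma * math.sqrt(2 * math.pi)
    return value

# Grid settings taken from the k3 example; a single hypothetical angle of 90 degrees.
grid = {"min": 0, "max": 180, "sigma": 5, "n": 50}
axis = make_axis(grid)
broadened = [gaussian(x, 90.0, grid["sigma"]) for x in axis]

# With the normalized form, the area under the sampled curve is close to 1.
step = axis[1] - axis[0]
area = sum(broadened) * step
```

If sigma is much smaller than the spacing between grid points, a narrow peak can fall between two points and be poorly resolved, so n and sigma are best chosen together.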

If flatten=False, the dense np.ndarrays for each k-term are returned in a dictionary under the keys “k1”, “k2” and “k3”. These arrays are of dimension (n_elements x n_elements x n_grid_points), where the elements are sorted in ascending order by their atomic number.

If flatten=True, a one-dimensional output of size (n_features,) is returned, where n_features is given by get_number_of_features(). Depending on the sparse parameter, this output is either a dense np.ndarray or a sparse.COO matrix. The k-terms appear in ascending order, and within each k-term the distributions at each entry (i, j, h) of the tensor are ordered in an ascending order by (i * n_elements) + (j * n_elements) + (h * n_elements).

This implementation does not support the use of a non-identity correlation matrix.

Parameters

k1 (dict) –

Setup for the k=1 term. For example:

k1 = {
    "geometry": {"function": "atomic_number"},
    "grid": {"min": 1, "max": 10, "sigma": 0.1, "n": 50}
}

k2 (dict) –

Dictionary containing the setup for the k=2 term. Contains setup for the used geometry function, discretization and weighting function. For example:

k2 = {
    "geometry": {"function": "inverse_distance"},
    "grid": {"min": 0.1, "max": 2, "sigma": 0.1, "n": 50},
    "weighting": {"function": "exp", "scale": 0.75, "threshold": 1e-2}
}

k3 (dict) –

Dictionary containing the setup for the k=3 term. Contains setup for the used geometry function, discretization and weighting function. For example:

k3 = {
    "geometry": {"function": "angle"},
    "grid": {"min": 0, "max": 180, "sigma": 5, "n": 50},
    "weighting" : {"function": "exp", "scale": 0.5, "threshold": 1e-3}
}

normalize_gaussians (bool) – Determines whether the gaussians are normalized to an area of 1. Defaults to True. If False, the normalization factor is dropped and the gaussians have the form \(e^{-(x-\mu)^2/2\sigma^2}\).
normalization (str) –
Determines the method for normalizing the output. The available options are:
- “none”: No normalization.
- “l2_each”: Normalize the Euclidean length of each k-term individually to unity.
- “n_atoms”: Normalize the output by dividing it by the number of atoms in the system. If the system is periodic, the number of atoms is determined from the given unit cell.
flatten (bool) – Whether the output should be flattened to a 1D array. If False, a dictionary of the different tensors is provided, containing the values under the keys “k1”, “k2”, and “k3”.
species (iterable) – The chemical species as a list of atomic numbers or as a list of chemical symbols. Note that this is not the set of elements present in an individual system, but should contain all the elements that will ever be encountered when creating the descriptors for a set of systems. Keeping the number of chemical species as low as possible is preferable.
periodic (bool) – Set to true if you want the descriptor output to respect the periodicity of the atomic systems (see the pbc-parameter in the constructor of ase.Atoms).
sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.
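
Putting the parameters together, a descriptor could be set up roughly as in the sketch below. The k1, k2 and k3 dictionaries are the examples given above, and the constructor arguments follow the signature at the top of this page. The species list ["H", "O"] and the helper name build_mbtr are illustrative choices. The import is deferred into the function so that the configuration can be inspected without dscribe installed.

```python
# Term setups copied from the parameter examples above.
k1 = {
    "geometry": {"function": "atomic_number"},
    "grid": {"min": 1, "max": 10, "sigma": 0.1, "n": 50},
}
k2 = {
    "geometry": {"function": "inverse_distance"},
    "grid": {"min": 0.1, "max": 2, "sigma": 0.1, "n": 50},
    "weighting": {"function": "exp", "scale": 0.75, "threshold": 1e-2},
}
k3 = {
    "geometry": {"function": "angle"},
    "grid": {"min": 0, "max": 180, "sigma": 5, "n": 50},
    "weighting": {"function": "exp", "scale": 0.5, "threshold": 1e-3},
}

def build_mbtr():
    """Create an MBTR descriptor for systems containing H and O (illustrative choice)."""
    from dscribe.descriptors.mbtr import MBTR  # requires dscribe to be installed

    return MBTR(
        species=["H", "O"],       # every element that will ever be encountered
        k1=k1,
        k2=k2,
        k3=k3,
        periodic=False,           # finite systems only in this sketch
        normalization="l2_each",  # unit Euclidean length per k-term
        flatten=True,
        sparse=False,
    )
```

Calling build_mbtr().create(system) with an ase.Atoms object would then return a flattened dense vector, as described for create() below.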

check_grid(grid)

Used to ensure that the given grid settings are valid.

Parameters: grid (dict) – Dictionary containing the grid setup.

create(system, n_jobs=1, only_physical_cores=False, verbose=False)

Return MBTR output for the given systems.

Parameters

system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.
n_jobs (int) – Number of parallel jobs to instantiate. Parallelizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the number of CPUs used is calculated as n_cpus + n_jobs, where n_cpus is the number of CPUs reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.
only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), virtual CPUs are also counted. If set to True, only physical CPUs are counted.
verbose (bool) – Controls whether to print the progress of each job to the console.
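
The rule for negative n_jobs values can be sketched as below. The helper name resolve_n_jobs is hypothetical, clamping the result to at least one job is an assumption of this sketch, and os.cpu_count() reports logical CPUs, so the only_physical_cores=True case is not covered here.

```python
import os

def resolve_n_jobs(n_jobs, n_cpus=None):
    """Translate a possibly negative n_jobs into a job count using n_cpus + n_jobs."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1  # logical CPUs as reported by the OS
    if n_jobs < 0:
        n_jobs = n_cpus + n_jobs
    return max(1, n_jobs)  # assumption: never fall below one job

# On a hypothetical machine with 8 CPUs, n_jobs=-1 resolves to 7 jobs.
jobs = resolve_n_jobs(-1, n_cpus=8)
```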

Returns

MBTR for the given systems. The return type depends on the ‘sparse’ and ‘flatten’-attributes. For flattened output a single numpy array or sparse.COO matrix is returned. If the output is not flattened, dictionaries containing the MBTR tensors for each k-term are returned.

Return type

np.ndarray | sparse.COO | list

create_single(system)

Return the many-body tensor representation for the given system.

Parameters: system (ase.Atoms | System) – Input system.
Returns: The return type is specified by the ‘flatten’ and ‘sparse’-parameters. If the output is not flattened, a dictionary containing the MBTR outputs as numpy arrays is created. Each output is under a “kX” key. If the output is flattened, a single concatenated output vector is returned, either as a sparse or a dense vector.
Return type: dict | np.ndarray | sparse.COO

get_k1_axis()

Used to get the discretized axis for the geometry function of the k=1 term.

Returns: The discretized axis for the k=1 term.
Return type: np.ndarray

get_k2_axis()

Used to get the discretized axis for the geometry function of the k=2 term.

Returns: The discretized axis for the k=2 term.
Return type: np.ndarray

get_k3_axis()

Used to get the discretized axis for the geometry function of the k=3 term.

Returns: The discretized axis for the k=3 term.
Return type: np.ndarray

get_location(species)

Can be used to query the location of a species combination in the flattened output.

Parameters

species (tuple) – A tuple containing a species combination as chemical symbols or atomic numbers. The tuple can be for example ("H"), ("H", "O") or ("H", "O", "H").

Returns: slice containing the location of the specified species combination. The location is given as a python slice-object, that can be directly used to target ranges in the output.
Return type: slice
Raises: ValueError – If the requested species combination is not in the output or if an invalid species is given.
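
The returned slice can be applied directly to a flattened output vector. The sketch below mimics this for the k=1 term only, under the assumption that the k=1 block stores one run of n grid points per species in ascending order of atomic number. The helper k1_location and the feature values are hypothetical stand-ins and not the library's implementation.

```python
def k1_location(symbol, species_sorted, n_grid):
    """Slice of the k=1 block that belongs to one species (assumed layout)."""
    if symbol not in species_sorted:
        raise ValueError(f"Species {symbol!r} is not part of the output.")
    i = species_sorted.index(symbol)
    return slice(i * n_grid, (i + 1) * n_grid)

# Made-up flattened k=1 output for species H and O with 4 grid points each.
species_sorted = ["H", "O"]
n_grid = 4
features = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]

loc = k1_location("O", species_sorted, n_grid)
oxygen_part = features[loc]  # the slice indexes the vector directly
```

With the real descriptor, the equivalent step would be to index the output of create() with the slice returned by get_location().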

get_number_of_features()

Used to inquire the final number of features that this descriptor will have.

Returns: Number of features for this descriptor.
Return type: int
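
For a rough idea of how the feature count scales, the sketch below assumes that symmetric species combinations are stored only once, so that k=2 contributes one distribution per unordered pair and k=3 one per central element and unordered pair. This counting and the helper name count_features are assumptions made for illustration; the value returned by get_number_of_features() is authoritative.

```python
def count_features(n_elements, n_k1=0, n_k2=0, n_k3=0):
    """Estimated feature count given the number of grid points n_kX of each included term."""
    n_pairs = n_elements * (n_elements + 1) // 2  # unordered pairs, including (i, i)
    return (
        n_elements * n_k1               # k=1: one distribution per element
        + n_pairs * n_k2                # k=2: one per unordered pair (assumption)
        + n_elements * n_pairs * n_k3   # k=3: one per element and unordered pair (assumption)
    )

# Two species with 50 grid points in every term: 100 + 150 + 300 features.
n_features = count_features(2, n_k1=50, n_k2=50, n_k3=50)
```

The cubic growth of the k=3 term with the number of elements is the reason the species list is best kept as short as possible.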

property k1

property k2

property k3

property normalization

property species