dscribe.descriptors.lmbtr module

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.lmbtr.LMBTR(k2=None, k3=None, normalize_gaussians=True, normalization='none', flatten=True, species=None, periodic=False, sparse=False)

Bases: dscribe.descriptors.mbtr.MBTR

Implementation of the local (per chosen atom) variant of the Many-body tensor representation up to k=3.

Notice that the species of the central atom is not encoded in the output, but is instead represented by a chemical species X with atomic number 0. This allows LMBTR to also be used on general positions that do not correspond to real atoms. The surrounding environment is encoded by the two- and three-body interactions with neighbouring atoms. If there is a need to distinguish the central species, one can for example train a different model for each central species.

You can choose which terms to include by providing a dictionary in the k2 or k3 arguments. The k1 term is not used in the local version. This dictionary should contain information under three keys: “geometry”, “grid” and “weighting”. See the examples below for how to format these dictionaries.

You can use this descriptor for finite and periodic systems. When dealing with periodic systems or when using machine learning models that use the Euclidean norm to measure distance between vectors, it is advisable to use some form of normalization.

For the geometry functions the following choices are available:

\(k=2\):
- “distance”: Pairwise distance in angstroms.
- “inverse_distance”: Pairwise inverse distance in 1/angstrom.
\(k=3\):
- “angle”: Angle in degrees.
- “cosine”: Cosine of the angle.
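
The four geometry functions listed above can be illustrated with a minimal sketch in plain Python. The helper names are hypothetical and are not part of the library, and taking the middle point as the vertex of the angle is an assumption made only for this illustration.

```python
import math

def distance(a, b):
    """Pairwise distance, in the same length unit as the input coordinates."""
    return math.dist(a, b)

def inverse_distance(a, b):
    """Pairwise inverse distance (1 / length unit)."""
    return 1.0 / math.dist(a, b)

def cosine(a, b, c):
    """Cosine of the angle a-b-c, with b assumed to be the vertex."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(ba, bc))
    return dot / (math.hypot(*ba) * math.hypot(*bc))

def angle(a, b, c):
    """Angle a-b-c in degrees, with b assumed to be the vertex."""
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cos_value = max(-1.0, min(1.0, cosine(a, b, c)))
    return math.degrees(math.acos(cos_value))

# Hypothetical coordinates in angstroms.
center = (0.0, 0.0, 0.0)
h1 = (1.0, 0.0, 0.0)
h2 = (0.0, 2.0, 0.0)

print(distance(center, h2))          # k=2, "distance"
print(inverse_distance(center, h2))  # k=2, "inverse_distance"
print(angle(h1, center, h2))         # k=3, "angle"
print(cosine(h1, center, h2))        # k=3, "cosine"
```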

For the weighting the following functions are available:

\(k=2\):
- “unity”: No weighting.
- “exp”: Weighting of the form \(e^{-sx}\)
\(k=3\):
- “unity”: No weighting.
- “exp”: Weighting of the form \(e^{-sx}\)

The exponential weighting is motivated by the exponential decay of screened Coulombic interactions in solids. In the exponential weighting, the parameter threshold determines the value of the weighting function below which the remaining terms are ignored, and the parameter scale corresponds to \(s\). The meaning of \(x\) changes for different terms as follows:

\(k=2\): \(x\) = Distance between A->B
\(k=3\): \(x\) = Distance from A->B->C->A.
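
As a rough illustration of how scale and threshold interact, the sketch below evaluates \(e^{-sx}\) and solves \(e^{-sx} = \text{threshold}\) for the value of \(x\) beyond which terms would be ignored. The function names are hypothetical, and the sketch assumes that terms are dropped once their weight falls below the threshold; it is not the library's implementation.

```python
import math

def exp_weight(x, scale):
    """Exponential weighting e^{-s x} with s = scale."""
    return math.exp(-scale * x)

def cutoff(scale, threshold):
    """Value of x at which e^{-s x} has decayed to the given threshold."""
    return -math.log(threshold) / scale

def perimeter(a, b, c):
    """x for the k=3 term: the path length A->B->C->A."""
    return math.dist(a, b) + math.dist(b, c) + math.dist(c, a)

# With scale=0.75 and threshold=1e-2, pairs farther apart than this
# distance would receive a weight below the threshold.
k2_cutoff = cutoff(scale=0.75, threshold=1e-2)
print(k2_cutoff)

# For k=3 the same formula bounds the triangle perimeter instead.
a, b, c = (0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)
print(exp_weight(perimeter(a, b, c), scale=0.5))
```

Under this reading, a larger scale or a larger threshold shortens the effective interaction range.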

In the grid setup min is the minimum value of the axis, max is the maximum value of the axis, sigma is the standard deviation of the gaussian broadening and n is the number of points sampled on the grid.
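
To make the grid parameters concrete, here is a minimal sketch of Gaussian broadening of a single value onto a grid. It assumes the grid consists of n evenly spaced points that include both min and max, which is not stated above, and the helper names are hypothetical.

```python
import math

def make_grid(min_value, max_value, n):
    """n evenly spaced points from min_value to max_value (endpoints included)."""
    step = (max_value - min_value) / (n - 1)
    return [min_value + i * step for i in range(n)]

def broaden(value, grid, sigma, normalize=True):
    """Sample a Gaussian centred at `value` with standard deviation `sigma`."""
    # With normalize=False the prefactor is dropped, leaving e^{-(x-mu)^2 / 2 sigma^2}.
    prefactor = 1.0 / (sigma * math.sqrt(2.0 * math.pi)) if normalize else 1.0
    return [
        prefactor * math.exp(-((x - value) ** 2) / (2.0 * sigma ** 2))
        for x in grid
    ]

grid = make_grid(0.0, 180.0, 181)
curve = broaden(90.0, grid, sigma=5.0)

# A simple Riemann sum approximates the area under the normalized curve.
dx = grid[1] - grid[0]
area = sum(curve) * dx
print(area)
```

If sigma is small compared with the grid spacing, the sampled curve no longer resolves the peak, so n and sigma are best chosen together.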

If flatten=False, a list of dense np.ndarrays for each k in ascending order is returned. These arrays are of dimension (n_elements x n_elements x n_grid_points), where the elements are sorted in ascending order by their atomic number.

If flatten=True, a sparse.COO is returned. This sparse matrix is of size (n_features,), where n_features is given by get_number_of_features(). This vector is ordered so that the different k-terms are ordered in ascending order, and within each k-term the distributions at each entry (i, j, h) of the tensor are ordered in an ascending order by (i * n_elements) + (j * n_elements) + (h * n_elements).

This implementation does not support the use of a non-identity correlation matrix.

Parameters

species (iterable) – The chemical species as a list of atomic numbers or as a list of chemical symbols. Notice that this is not the set of atomic numbers present in an individual system, but should contain all the elements that are ever going to be encountered when creating the descriptors for a set of systems. Keeping the number of chemical species as low as possible is preferable.
periodic (bool) – Set to true if you want the descriptor output to respect the periodicity of the atomic systems (see the pbc-parameter in the constructor of ase.Atoms).

k2 (dict) –

Dictionary containing the setup for the k=2 term. Contains setup for the used geometry function, discretization and weighting function. For example:

k2 = {
    "geometry": {"function": "inverse_distance"},
    "grid": {"min": 0.1, "max": 2, "sigma": 0.1, "n": 50},
    "weighting": {"function": "exp", "scale": 0.75, "threshold": 1e-2}
}

k3 (dict) –

Dictionary containing the setup for the k=3 term. Contains setup for the used geometry function, discretization and weighting function. For example:

k3 = {
    "geometry": {"function": "angle"},
    "grid": {"min": 0, "max": 180, "sigma": 5, "n": 50},
    "weighting": {"function": "exp", "scale": 0.5, "threshold": 1e-3}
}

normalize_gaussians (bool) – Determines whether the gaussians are normalized to an area of 1. Defaults to True. If False, the normalization factor is dropped and the gaussians have the form \(e^{-(x-\mu)^2/2\sigma^2}\).
normalization (str) –
Determines the method for normalizing the output. The available options are:
- “none”: No normalization.
- “l2_each”: Normalize the Euclidean length of each k-term individually to unity.
flatten (bool) – Whether the output should be flattened to a 1D array. If False, a dictionary of the different tensors is provided, containing the values under keys: “k1”, “k2”, and “k3”.
sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.
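
The “l2_each” normalization option described above can be sketched as follows. This is only an illustration in plain Python with hypothetical helper names and made-up input values; each k-term is scaled separately so that its Euclidean length becomes 1.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]

def l2_each(terms):
    """Normalize every k-term on its own, so no term dominates the others."""
    return {name: l2_normalize(values) for name, values in terms.items()}

# Made-up, already flattened k2 and k3 terms.
raw = {"k2": [3.0, 4.0], "k3": [1.0, 2.0, 2.0]}
normalized = l2_each(raw)
print(normalized)
```

Normalizing the terms individually keeps a term with many features or large values from dominating a Euclidean distance between descriptor vectors.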

create(system, positions=None, n_jobs=1, only_physical_cores=False, verbose=False)

Return the LMBTR output for the given systems and given positions.

Parameters

system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.
positions (list) – Positions where to calculate LMBTR. Can be provided as cartesian positions or atomic indices. If no positions are defined, the LMBTR output will be created for all atoms in the system. When calculating LMBTR for multiple systems, provide the positions as a list for each system.
n_jobs (int) – Number of parallel jobs to instantiate. Parallelizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the number of CPUs used is calculated as n_cpus + n_jobs, where n_cpus is the number of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.
only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.
verbose (bool) – Controls whether to print the progress of each job to the console.
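
The rule for negative n_jobs values can be sketched as below. The helper is hypothetical and only mirrors the n_cpus + n_jobs formula stated above; clamping the result to at least one job is an added assumption, and os.cpu_count() reports logical CPUs, so it corresponds to only_physical_cores=False.

```python
import os

def resolve_n_jobs(n_jobs, n_cpus=None):
    """Translate an n_jobs argument into a concrete number of jobs."""
    if n_cpus is None:
        # Logical CPU count as reported by the OS; fall back to 1 if unknown.
        n_cpus = os.cpu_count() or 1
    if n_jobs < 0:
        # Formula from the parameter description: n_cpus + n_jobs.
        # The lower bound of 1 is an assumption made for this sketch.
        return max(1, n_cpus + n_jobs)
    return n_jobs

print(resolve_n_jobs(-1, n_cpus=8))  # one CPU left free
print(resolve_n_jobs(4, n_cpus=8))   # positive values are used as given
```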

Returns

The LMBTR output for the given systems and positions. The return type depends on the sparse attribute. The first dimension is determined by the number of positions and systems, and the second dimension is determined by get_number_of_features().

Return type

np.ndarray | scipy.sparse.csr_matrix

create_single(system, positions=None)

Return the local many-body tensor representation for the given system and positions.

Parameters

system (ase.Atoms | System) – Input system.
positions (iterable) – Positions or atom indices of the points from which local_mbtr is created. Can be a list of integers or a list of xyz-coordinates. If integers are provided, the atoms at those indices are used as centers. If positions are provided, new atoms are added at those positions. If no positions are provided, all atoms in the system will be used as centers.

Returns

The local many-body tensor representations of given positions, for k terms, as an array. These are ordered as given in positions.

Return type

1D ndarray

get_location(species)

Can be used to query the location of a species combination in the flattened output.

Parameters

species (tuple) – A tuple containing a species combination as chemical symbols or atomic numbers. The central atom is marked as species "X". The tuple can be for example ("X", "H").

Returns

slice containing the location of the specified species combination. The location is given as a python slice-object, that can be directly used to target ranges in the output.

Return type

slice

Raises

ValueError – If the requested species combination is not in the output or if invalid species are defined.
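
Because the return value is an ordinary Python slice, it can index a flattened feature vector directly. The sketch below uses a made-up vector and a made-up slice as stand-ins; the actual offsets depend on the descriptor configuration and are not taken from the library.

```python
# Stand-in for a flattened descriptor output with 12 features.
features = [float(i) for i in range(12)]

# Stand-in for the slice that a location query would return for one
# species combination; the bounds 4 and 8 are purely illustrative.
location = slice(4, 8)

# Select only the features that belong to that species combination.
block = features[location]
print(block)

# The same slice can be used to modify that range, e.g. to mask it out.
masked = list(features)
masked[location] = [0.0] * len(block)
print(masked)
```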

get_number_of_features()

Used to inquire the final number of features that this descriptor will have.

The number of features for the LMBTR is calculated as follows:

For the pair term (k=2), only pairs where at least one of the atoms is the central atom (in periodic systems the central atom may connect to itself) are considered. This means that there are only as many combinations as there are different elements to pair the central atom with (n_elem). This number of combinations is then multiplied by the discretization of the k=2 grid.

For the three-body term (k=3), only triplets where at least one of the atoms is the central atom (in periodic systems the central atom may connect to itself) and where k >= i (symmetry) are considered. This means that as k runs from 0 to n-1, where n is the number of elements, there are n + k combinations that satisfy this rule. This sum becomes: \(\sum_{k=0}^{n-1} (n + k) = n^2 + n(n-1)/2\). This number of combinations is then multiplied by the discretization of the k=3 grid.

Returns: Number of features for this descriptor.
Return type: int
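
The counting rules above can be checked with a short sketch. The function names are hypothetical, and whether n_elem should include the placeholder species X is not stated here, so it is simply passed in as a parameter.

```python
def n_k2_features(n_elem, n_grid):
    """k=2: one distribution per element paired with the central atom."""
    return n_elem * n_grid

def n_k3_combinations(n_elem):
    """k=3: sum of (n + k) for k = 0 .. n-1, as described in the text."""
    return sum(n_elem + k for k in range(n_elem))

def n_k3_features(n_elem, n_grid):
    """k=3: number of combinations times the grid discretization."""
    return n_k3_combinations(n_elem) * n_grid

n_elem = 3
# The explicit sum agrees with the closed form n^2 + n(n-1)/2.
closed_form = n_elem ** 2 + n_elem * (n_elem - 1) // 2
print(n_k3_combinations(n_elem), closed_form)

# Total feature count when both terms use 50 grid points.
total = n_k2_features(n_elem, 50) + n_k3_features(n_elem, 50)
print(total)
```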

property normalization

property species