dscribe.descriptors.soap module¶

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.soap.SOAP(rcut=None, nmax=None, lmax=None, sigma=1.0, rbf='gto', weighting=None, crossover=True, average='off', species=None, periodic=False, sparse=False, dtype='float64')[source]¶

Bases: dscribe.descriptors.descriptor.Descriptor

Class for generating a partial power spectrum from Smooth Overlap of Atomic Orbitals (SOAP). This implementation uses real (tesseral) spherical harmonics as the angular basis set and provides two orthonormalized alternatives for the radial basis functions: spherical primitive gaussian type orbitals (“gto”) or the polynomial basis set (“polynomial”).

For reference, see:

“On representing chemical environments, Albert P. Bartók, Risi Kondor, and Gábor Csányi, Phys. Rev. B 87, 184115, (2013), https://doi.org/10.1103/PhysRevB.87.184115

“Comparing molecules and solids across structural and alchemical space”, Sandip De, Albert P. Bartók, Gábor Csányi and Michele Ceriotti, Phys. Chem. Chem. Phys. 18, 13754 (2016), https://doi.org/10.1039/c6cp00415f

“Machine learning hydrogen adsorption on nanoclusters through structural descriptors”, Marc O. J. Jäger, Eiaki V. Morooka, Filippo Federici Canova, Lauri Himanen & Adam S. Foster, npj Comput. Mater., 4, 37 (2018), https://doi.org/10.1038/s41524-018-0096-5

Parameters

rcut (float) – A cutoff for local region in angstroms. Should be bigger than 1 angstrom for the gto-basis.
nmax (int) – The number of radial basis functions.
lmax (int) – The maximum degree of spherical harmonics.
sigma (float) – The standard deviation of the gaussians used to expand the atomic density.
rbf (str) –
The radial basis functions to use. The available options are:
- "gto": Spherical gaussian type orbitals defined as \(g_{nl}(r) = \sum_{n'=1}^{n_\mathrm{max}}\,\beta_{nn'l} r^l e^{-\alpha_{n'l}r^2}\)
- "polynomial": Polynomial basis defined as \(g_{n}(r) = \sum_{n'=1}^{n_\mathrm{max}}\,\beta_{nn'} (r-r_\mathrm{cut})^{n'+2}\)
weighting (dict) –
Contains the options which control the weighting of the atomic density. Leave unspecified if you do not wish to apply any weighting. The dictionary may contain the following entries:
- "function": The weighting function to use. The following are currently supported:
  - "poly": \(w(r) = \left\{ \begin{array}{ll} c(1 + 2 (\frac{r}{r_0})^{3} -3 (\frac{r}{r_0})^{2}))^{m}, \ \text{for}\ r \leq r_0\\ 0, \ \text{for}\ r > r_0 \end{array}\right.\)
    
    This function goes exactly to zero at \(r=r_0\). If you do not explicitly provide rcut in the constructor, rcut is automatically set to r0. You can provide the parameters c, m and r0 as additional dictionary items.
    
    For reference see:
    ”Caro, M. (2019). Optimizing many-body atomic descriptors for enhanced computational performance of machine learning based interatomic potentials. Phys. Rev. B, 100, 024112.”
  - "pow": \(w(r) = \frac{c}{d + (\frac{r}{r_0})^{m}}\)
    
    If you do not explicitly provide rcut in the constructor, rcut will be set as the value at which this function decays to the value given by the threshold entry in the weighting dictionary (defaults to 1e-2), You can provide the parameters c, d, m, r0 and threshold as additional dictionary items.
    
    For reference see:
    ”Willatt, M., Musil, F., & Ceriotti, M. (2018). Feature optimization for atomistic machine learning yields a data-driven construction of the periodic table of the elements. Phys. Chem. Chem. Phys., 20, 29661-29668. “
  - "exp": \(w(r) = \frac{c}{d + e^{-r/r_0}}\)
    
    If you do not explicitly provide rcut in the constructor, rcut will be set as the value at which this function decays to the value given by the threshold entry in the weighting dictionary (defaults to 1e-2), You can provide the parameters c, d, r0 and threshold as additional dictionary items.
- "w0": Optional weight for atoms that are directly on top of a requested center. Setting this value to zero essentially hides the central atoms from the output. If a weighting function is also specified, this constant will override it for the central atoms.
crossover (bool) – Determines if crossover of atomic types should be included in the power spectrum. If enabled, the power spectrum is calculated over all unique species combinations Z and Z’. If disabled, the power spectrum does not contain cross-species information and is only run over each unique species Z. Turned on by default to correspond to the original definition
average (str) –
The averaging mode over the centers of interest. Valid options are:
- "off": No averaging.
- "inner": Averaging over sites before summing up the magnetic quantum numbers: \(p_{nn'l}^{Z_1,Z_2} \sim \sum_m (\frac{1}{n} \sum_i c_{nlm}^{i, Z_1})^{*} (\frac{1}{n} \sum_i c_{n'lm}^{i, Z_2})\)
- "outer": Averaging over the power spectrum of different sites: \(p_{nn'l}^{Z_1,Z_2} \sim \frac{1}{n} \sum_i \sum_m (c_{nlm}^{i, Z_1})^{*} (c_{n'lm}^{i, Z_2})\)
species (iterable) – The chemical species as a list of atomic numbers or as a list of chemical symbols. Notice that this is not the atomic numbers that are present for an individual system, but should contain all the elements that are ever going to be encountered when creating the descriptors for a set of systems. Keeping the number of chemical species as low as possible is preferable.
periodic (bool) – Set to True if you want the descriptor output to respect the periodicity of the atomic systems (see the pbc-parameter in the constructor of ase.Atoms).
sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.
dtype (str) –
The data type of the output. Valid options are:
- "float32": Single precision floating point numbers.
- "float64": Double precision floating point numbers.

create(system, positions=None, n_jobs=1, only_physical_cores=False, verbose=False)[source]¶

Return the SOAP output for the given systems and given positions.

Parameters

system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.
positions (list) – Positions where to calculate SOAP. Can be provided as cartesian positions or atomic indices. If no positions are defined, the SOAP output will be created for all atoms in the system. When calculating SOAP for multiple systems, provide the positions as a list for each system.
n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the used cpus will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.
only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.
verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns

The SOAP output for the given systems and positions. The return type depends on the ‘sparse’-attribute. The first dimension is determined by the amount of positions and systems and the second dimension is determined by the get_number_of_features()-function. When multiple systems are provided the results are ordered by the input order of systems and their positions.

Return type

np.ndarray | sparse.COO

create_single(system, positions=None)[source]¶

Return the SOAP output for the given system and given positions.

Parameters

system (ase.Atoms | System) – Input system.
positions (list) – Cartesian positions or atomic indices. If specified, the SOAP spectrum will be created for these points. If no positions are defined, the SOAP output will be created for all atoms in the system.

Returns

The SOAP output for the given system and positions. The return type depends on the ‘sparse’-attribute. The first dimension is given by the number of positions and the second dimension is determined by the get_number_of_features()-function.

Return type

np.ndarray | sparse.COO

derivatives(system, positions=None, include=None, exclude=None, method='auto', return_descriptor=True, attach=False, n_jobs=1, only_physical_cores=False, verbose=False)[source]¶

Return the descriptor derivatives for the given systems and given positions.

Parameters

system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.
positions (list) – Positions where to calculate the descriptor. Can be provided as cartesian positions or atomic indices. Also see the “attach”-argument that controls the interperation of locations given as atomic indices. If no positions are defined, the descriptor output will be created for all atoms in the system. When calculating descriptor for multiple systems, provide the positions as a list for each system.
include (list) – Indices of atoms to compute the derivatives on. When calculating descriptor for multiple systems, provide either a one-dimensional list that if applied to all systems or a two-dimensional list of indices. Cannot be provided together with ‘exclude’.
exclude (list) – Indices of atoms not to compute the derivatives on. When calculating descriptor for multiple systems, provide either a one-dimensional list that if applied to all systems or a two-dimensional list of indices. Cannot be provided together with ‘include’.
method (str) – The method for calculating the derivatives. Provide either ‘numerical’, ‘analytical’ or ‘auto’. If using ‘auto’, the most efficient available method is automatically chosen.
attach (bool) – Controls the behaviour of positions defined as atomic indices. If True, the positions tied to an atomic index will move together with the atoms with respect to which the derivatives are calculated against. If False, positions defined as atomic indices will be converted into cartesian locations that are completely independent of the atom location during derivative calculation.
return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. Notice that it typically is faster to calculate both in one go.
n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the number of jobs will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.
only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.
verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns

If return_descriptor is True, returns a tuple, where the first item is the derivative array and the second is the descriptor array. Otherwise only returns the derivatives array. The derivatives array is a either a 4D or 5D array, depending on whether you have provided a single or multiple systems. If the output shape for each system is the same, a single monolithic numpy array is returned. For variable sized output (e.g. differently sized systems, different number of centers or different number of included atoms), a regular python list is returned. The dimensions are: [(n_systems,) n_positions, n_atoms, 3, n_features]. The first dimension goes over the different systems in case multiple were given. The second dimension goes over the descriptor centers in the same order as they were given in the argument. The third dimension goes over the included atoms. The order is same as the order of atoms in the given system. The fourth dimension goes over the cartesian components, x, y and z. The fifth dimension goes over the features in the default order.

derivatives_single(system, positions, indices, method='numerical', attach=False, return_descriptor=True)[source]¶

Return the SOAP output for the given system and given positions.

Parameters

system (ase.Atoms) – Atomic structure.
positions (list) – Positions where to calculate SOAP. Can be provided as cartesian positions or atomic indices. If no positions are defined, the SOAP output will be created for all atoms in the system. When calculating SOAP for multiple systems, provide the positions as a list for each system.
indices (list) – Indices of atoms for which the derivatives will be computed for.
method (str) – ‘numerical’ or ‘analytical’ derivatives. Numerical derivatives are implemented with central finite difference. If not specified, analytical derivatives are used when available.
attach (bool) – Controls the behaviour of positions defined as atomic indices. If True, the positions tied to an atomic index will move together with the atoms with respect to which the derivatives are calculated against. If False, positions defined as atomic indices will be converted into cartesian locations that are completely independent of the atom location during derivative calculation.
return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. This is true by default as it typically is faster to calculate both in one go.

Returns

If return_descriptor is True, returns a tuple, where the first item is the derivative array and the second is the descriptor array. Otherwise only returns the derivatives array. The derivatives array is a 4D numpy array. The dimensions are: [n_positions, n_atoms, 3, n_features]. The first dimension goes over the SOAP centers in the same order as they were given in the argument. The second dimension goes over the included atoms. The order is same as the order of atoms in the given system. The third dimension goes over the cartesian components, x, y and z. The last dimension goes over the features in the default order.

get_basis_gto(rcut, nmax, lmax)[source]¶

Used to calculate the alpha and beta prefactors for the gto-radial basis.

Parameters

rcut (float) – Radial cutoff.
nmax (int) – Number of gto radial bases.

Returns

The alpha and beta prefactors for all bases up to a fixed size of l=10.

Return type

(np.ndarray, np.ndarray)

get_basis_poly(rcut, nmax)[source]¶

Used to calculate discrete vectors for the polynomial basis functions.

Parameters

rcut (float) – Radial cutoff.
nmax (int) – Number of polynomial radial bases.

Returns

Tuple containing the evaluation points in radial direction as the first item, and the corresponding orthonormalized polynomial radial basis set as the second item.

Return type

(np.ndarray, np.ndarray)

get_cutoff_padding()[source]¶: The radial cutoff is extended by adding a padding that depends on the used used sigma value. The padding is chosen so that the gaussians decay to the specified threshold value at the cutoff distance.

get_location(species)[source]¶

Can be used to query the location of a species combination in the the flattened output.

Parameters: species (tuple) – A tuple containing a pair of species as chemical symbols or atomic numbers. The tuple can be for example (“H”, “O”).
Returns: slice containing the location of the specified species combination. The location is given as a python slice-object, that can be directly used to target ranges in the output.
Return type: slice
Raises: ValueError – If the requested species combination is not in the output or if invalid species defined.

get_number_of_features()[source]¶

Used to inquire the final number of features that this descriptor will have.

Returns: Number of features for this descriptor.
Return type: int

init_derivatives_array(n_centers, n_atoms, n_features)[source]¶: Return a zero-initialized numpy array for the derivatives.

init_descriptor_array(n_centers, n_features)[source]¶: Return a zero-initialized numpy array for the descriptor.

init_internal_array(n_centers, n_types, n, lMax)[source]¶

init_internal_dev_array(n_centers, n_atoms, n_types, n, lMax)[source]¶

prepare_centers(system, cutoff_padding, positions=None)[source]¶: Validates and prepares the centers for the C++ extension.

property species¶