dscribe.descriptors package

Submodules

dscribe.descriptors.acsf module

Copyright 2019 DScribe developers

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.acsf.ACSF(r_cut, g2_params=None, g3_params=None, g4_params=None, g5_params=None, species=None, periodic=False, sparse=False, dtype='float64')[source]

Bases: DescriptorLocal

Implementation of Atom-Centered Symmetry Functions.

Notice that the species of the central atom is not encoded in the output, only the surrounding environment is encoded. In a typical application one can train a different model for each central species.

For reference, see:

“Atom-centered symmetry functions for constructing high-dimensional neural network potentials”, Jörg Behler, The Journal of Chemical Physics, 134, 074106 (2011), https://doi.org/10.1063/1.3553717

Parameters:
  • r_cut (float) – The smooth cutoff value in angstroms. This cutoff value is used throughout the calculations for all symmetry functions.

  • g2_params (n*2 np.ndarray) – A list of pairs of \(\eta\) and \(R_s\) parameters for \(G^2\) functions.

  • g3_params (n*1 np.ndarray) – A list of \(\kappa\) parameters for \(G^3\) functions.

  • g4_params (n*3 np.ndarray) – A list of triplets of \(\eta\), \(\zeta\) and \(\lambda\) parameters for \(G^4\) functions.

  • g5_params (n*3 np.ndarray) – A list of triplets of \(\eta\), \(\zeta\) and \(\lambda\) parameters for \(G^5\) functions.

  • species (iterable) – The chemical species as a list of atomic numbers or as a list of chemical symbols. Notice that this is not the atomic numbers that are present for an individual system, but should contain all the elements that are ever going to be encountered when creating the descriptors for a set of systems. Keeping the number of chemical species as low as possible is preferable.

  • periodic (bool) – Set to true if you want the descriptor output to respect the periodicity of the atomic systems (see the pbc-parameter in the constructor of ase.Atoms).

  • sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.

create(system, centers=None, n_jobs=1, only_physical_cores=False, verbose=False)[source]

Return the ACSF output for the given systems and given centers.

Parameters:
  • system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.

  • centers (list) – Indices of the atoms to use as ACSF centers. If no centers are defined, the output will be created for all atoms in the system. When calculating output for multiple systems, provide the centers as a list for each system.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the used cpus will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns:

The ACSF output for the given systems and centers. The return type depends on the ‘sparse’-attribute. The first dimension is determined by the amount of centers and systems and the second dimension is determined by the get_number_of_features()-function. When multiple systems are provided the results are ordered by the input order of systems and their centers.

Return type:

np.ndarray | sparse.COO

create_single(system, centers=None)[source]

Creates the descriptor for the given system.

Parameters:
  • system (ase.Atoms | System) – Input system.

  • centers (iterable) – Indices of the atoms around which the ACSF will be returned. If no centers defined, ACSF will be created for all atoms in the system.

Returns:

The ACSF output for the given system and centers. The first dimension is given by the number of centers and the second dimension is determined by the get_number_of_features()-function.

Return type:

np.ndarray

property g2_params
property g3_params
property g4_params
property g5_params
get_number_of_features()[source]

Used to inquire the final number of features that this descriptor will have.

Returns:

Number of features for this descriptor.

Return type:

int

property r_cut
property species
validate_derivatives_method(method, attach)[source]

Used to validate and determine the final method for calculating the derivatives.

dscribe.descriptors.coulombmatrix module

Copyright 2019 DScribe developers

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.coulombmatrix.CoulombMatrix(n_atoms_max, permutation='sorted_l2', sigma=None, seed=None, sparse=False)[source]

Bases: DescriptorMatrix

Calculates the zero padded Coulomb matrix.

The Coulomb matrix is defined as:

C_ij = 0.5 Zi**exponent, when i = j

= (Zi*Zj)/(Ri-Rj), when i != j

The matrix is padded with invisible atoms, which means that the matrix is padded with zeros until the maximum allowed size defined by n_max_atoms is reached.

To reach invariance against permutation of atoms, specify a valid option for the permutation parameter.

For reference, see:

“Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning”, Matthias Rupp, Alexandre Tkatchenko, Klaus-Robert Mueller, and O. Anatole von Lilienfeld, Phys. Rev. Lett, (2012), https://doi.org/10.1103/PhysRevLett.108.058301

and

“Learning Invariant Representations of Molecules for Atomization Energy Prediction”, Gregoire Montavon et. al, Advances in Neural Information Processing Systems 25 (NIPS 2012)

Parameters:
  • n_atoms_max (int) – The maximum nuber of atoms that any of the samples can have. This controls how much zeros need to be padded to the final result.

  • permutation (string) –

    Defines the method for handling permutational invariance. Can be one of the following:

    • none: The matrix is returned in the order defined by the Atoms.

    • sorted_l2: The rows and columns are sorted by the L2 norm.

    • eigenspectrum: Only the eigenvalues are returned sorted by their absolute value in descending order.

    • random: The rows and columns are sorted by their L2 norm after applying Gaussian noise to the norms. The standard deviation of the noise is determined by the sigma-parameter.

  • sigma (float) – Provide only when using the random-permutation option. Standard deviation of the gaussian distributed noise determining how much the rows and columns of the randomly sorted matrix are scrambled.

  • seed (int) – Provide only when using the random-permutation option. A seed to use for drawing samples from a normal distribution.

  • sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.

create(system, n_jobs=1, only_physical_cores=False, verbose=False)[source]

Return the Coulomb matrix for the given systems.

Parameters:
  • system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the used cpus will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns:

Coulomb matrix for the given systems. The return type depends on the ‘sparse’-attribute. The first dimension is determined by the amount of systems.

Return type:

np.ndarray | sparse.COO

create_single(system)[source]
Parameters:

system (ase.Atoms) – Input system.

Returns:

The zero padded matrix as a flattened 1D array.

Return type:

ndarray

derivatives_numerical(d, c, system, indices, return_descriptor=True)[source]

Return the numerical derivatives for the given system. :param system: Atomic structure. :type system: ase.Atoms :param indices: Indices of atoms for which the derivatives will be

computed for.

Parameters:

return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. This is true by default as it typically is faster to calculate both in one go.

Returns:

If return_descriptor is True, returns a tuple, where the first item is the derivative array and the second is the descriptor array. Otherwise only returns the derivatives array. The derivatives array is a 3D numpy array. The dimensions are: [n_atoms, 3, n_features]. The first dimension goes over the included atoms. The order is same as the order of atoms in the given system. The second dimension goes over the cartesian components, x, y and z. The last dimension goes over the features in the default order.

dscribe.descriptors.descriptor module

Copyright 2019 DScribe developers

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.descriptor.Descriptor(periodic, sparse, dtype='float64')[source]

Bases: ABC

An abstract base class for all descriptors.

Parameters:
  • periodic (bool) – Whether the descriptor should take PBC into account.

  • sparse (bool) – Whether the output should use a sparse format.

  • dtype (str) – The output data type.

check_atomic_numbers(atomic_numbers)[source]

Used to check that the given atomic numbers have been defined for this descriptor.

Parameters:

species (iterable) – Atomic numbers to check.

Raises:
  • ValueError – If the atomic numbers in the given system are not

  • included in the species given to this descriptor.

abstract create(system, *args, **kwargs)[source]

Creates the descriptor for the given systems.

Parameters:
  • system (ase.Atoms) – The system for which to create the descriptor.

  • args – Descriptor specific positional arguments.

  • kwargs – Descriptor specific keyword arguments.

Returns:

A descriptor for the system.

Return type:

np.array | scipy.sparse.coo_matrix

create_parallel(inp, func, n_jobs, static_size=None, only_physical_cores=False, verbose=False, prefer='processes')[source]

Used to parallelize the descriptor creation across multiple systems.

Parameters:
  • inp (list) – Contains a tuple of input arguments for each processed system. These arguments are fed to the function specified by “func”.

  • func (function) – Function that outputs the descriptor when given input arguments from “inp”.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the number of jobs will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • output_sizes (list of ints) – The size of the output for each job. Makes the creation faster by preallocating the correct amount of memory beforehand. If not specified, a dynamically created list of outputs is used.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

  • prefer (str) –

    The parallelization method. Valid options are:

    • ”processes”: Parallelization based on processes. Uses the “loky” backend in joblib to serialize the jobs and run them in separate processes. Using separate processes has a bigger memory and initialization overhead than threads, but may provide better scalability if perfomance is limited by the Global Interpreter Lock (GIL).

    • ”threads”: Parallelization based on threads. Has bery low memory and initialization overhead. Performance is limited by the amount of pure python code that needs to run. Ideal when most of the calculation time is used by C/C++ extensions that release the GIL.

Returns:

The descriptor output for each given input. The return type depends on the desciptor setup.

Return type:

np.ndarray | sparse.COO | list

derivatives_parallel(inp, func, n_jobs, derivatives_shape, descriptor_shape, return_descriptor, only_physical_cores=False, verbose=False, prefer='processes')[source]

Used to parallelize the descriptor creation across multiple systems.

Parameters:
  • inp (list) – Contains a tuple of input arguments for each processed system. These arguments are fed to the function specified by “func”.

  • func (function) – Function that outputs the descriptor when given input arguments from “inp”.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the number of jobs will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • derivatives_shape (list or None) – If a fixed size output is produced from each job, this contains its shape. For variable size output this parameter is set to None

  • derivatives_shape – If a fixed size output is produced from each job, this contains its shape. For variable size output this parameter is set to None

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

  • prefer (str) –

    The parallelization method. Valid options are:

    • ”processes”: Parallelization based on processes. Uses the “loky” backend in joblib to serialize the jobs and run them in separate processes. Using separate processes has a bigger memory and initialization overhead than threads, but may provide better scalability if perfomance is limited by the Global Interpreter Lock (GIL).

    • ”threads”: Parallelization based on threads. Has bery low memory and initialization overhead. Performance is limited by the amount of pure python code that needs to run. Ideal when most of the calculation time is used by C/C++ extensions that release the GIL.

Returns:

The descriptor output for each given input. The return type depends on the desciptor setup.

Return type:

np.ndarray | sparse.COO | list

format_array(input)[source]

Used to format a float64 numpy array in the final format that will be returned to the user.

abstract get_number_of_features()[source]

Used to inquire the final number of features that this descriptor will have.

Returns:

Number of features for this descriptor.

Return type:

int

property periodic
property sparse
validate_derivatives_method(method)[source]

Used to validate and determine the final method for calculating the derivatives.

dscribe.descriptors.descriptorglobal module

Copyright 2019 DScribe developers Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.descriptorglobal.DescriptorGlobal(periodic, sparse, dtype='float64')[source]

Bases: Descriptor

An abstract base class for all global descriptors.

Parameters:
  • periodic (bool) – Whether the descriptor should take PBC into account.

  • sparse (bool) – Whether the output should use a sparse format.

  • dtype (str) – The output data type.

derivatives(system, include=None, exclude=None, method='auto', return_descriptor=True, n_jobs=1, only_physical_cores=False, verbose=False)[source]

Return the descriptor derivatives for the given system(s). :param system: One or

many atomic structures.

Parameters:
  • include (list) – Indices of atoms to compute the derivatives on. When calculating descriptor for multiple systems, provide either a one-dimensional list that if applied to all systems or a two-dimensional list of indices. Cannot be provided together with ‘exclude’.

  • exclude (list) – Indices of atoms not to compute the derivatives on. When calculating descriptor for multiple systems, provide either a one-dimensional list that if applied to all systems or a two-dimensional list of indices. Cannot be provided together with ‘include’.

  • method (str) – The method for calculating the derivatives. Supports ‘numerical’ and ‘auto’. Defaults to using ‘auto’ which corresponds to the the most efficient available method.

  • return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. Notice that it typically is faster to calculate both in one go.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the number of jobs will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns:

If return_descriptor is True, returns a tuple, where the first item is the derivative array and the second is the descriptor array. Otherwise only returns the derivatives array. The derivatives array is a either a 3D or 4D array, depending on whether you have provided a single or multiple systems. If the output shape for each system is the same, a single monolithic numpy array is returned. For variable sized output (e.g. differently sized systems,different number of included atoms), a regular python list is returned. The dimensions are: [(n_systems,) n_atoms, 3, n_features]. The first dimension goes over the different systems in case multiple were given. The second dimension goes over the included atoms. The order is same as the order of atoms in the given system. The third dimension goes over the cartesian components, x, y and z. The fourth dimension goes over the features in the default order.

derivatives_numerical(d, c, system, indices, return_descriptor=True)[source]

Return the numerical derivatives for the given system. This is the default numerical implementation using python. You should overwrite this with a more optimized method whenever possible.

Parameters:
  • d (np.array) – The derivatives array.

  • c (np.array) – The descriptor array.

  • system (ase.Atoms) – Atomic structure.

  • indices (list) – Indices of atoms for which the derivatives will be computed for.

  • return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. This is true by default as it typically is faster to calculate both in one go.

derivatives_single(system, indices, method='numerical', return_descriptor=True)[source]

Return the derivatives for the given system. :param system: Atomic structure. :type system: ase.Atoms :param indices: Indices of atoms for which the derivatives will be

computed for.

Parameters:
  • method (str) – The method for calculating the derivatives. Supports ‘numerical’.

  • return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. This is true by default as it typically is faster to calculate both in one go.

Returns:

If return_descriptor is True, returns a tuple, where the first item is the derivative array and the second is the descriptor array. Otherwise only returns the derivatives array. The derivatives array is a 3D numpy array. The dimensions are: [n_atoms, 3, n_features]. The first dimension goes over the included atoms. The order is same as the order of atoms in the given system. The second dimension goes over the cartesian components, x, y and z. The last dimension goes over the features in the default order.

dscribe.descriptors.descriptorlocal module

Copyright 2019 DScribe developers Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.descriptorlocal.DescriptorLocal(periodic, sparse, dtype='float64', average='off')[source]

Bases: Descriptor

An abstract base class for all local descriptors.

Parameters:
  • periodic (bool) – Whether the descriptor should take PBC into account.

  • sparse (bool) – Whether the output should use a sparse format.

  • dtype (str) – The output data type.

derivatives(system, centers=None, include=None, exclude=None, method='auto', return_descriptor=True, attach=False, n_jobs=1, only_physical_cores=False, verbose=False)[source]

Return the descriptor derivatives for the given systems and given centers.

Parameters:
  • system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.

  • centers (list) – Centers where to calculate the descriptor. Can be provided as cartesian positions or atomic indices. Also see the “attach”-argument that controls the interperation of centers given as atomic indices. If no centers are defined, the descriptor output will be created for all atoms in the system. When calculating descriptor for multiple systems, provide the centers as a list for each system.

  • include (list) – Indices of atoms to compute the derivatives on. When calculating descriptor for multiple systems, provide either a one-dimensional list that if applied to all systems or a two-dimensional list of indices. Cannot be provided together with ‘exclude’.

  • exclude (list) – Indices of atoms not to compute the derivatives on. When calculating descriptor for multiple systems, provide either a one-dimensional list that if applied to all systems or a two-dimensional list of indices. Cannot be provided together with ‘include’.

  • method (str) – The method for calculating the derivatives. Provide either ‘numerical’, ‘analytical’ or ‘auto’. If using ‘auto’, the most efficient available method is automatically chosen.

  • attach (bool) – Controls the behaviour of centers defined as atomic indices. If True, the centers tied to an atomic index will move together with the atoms with respect to which the derivatives are calculated against. If False, centers defined as atomic indices will be converted into cartesian locations that are completely independent of the atom location during derivative calculation.

  • return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. Notice that it typically is faster to calculate both in one go.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the number of jobs will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns:

If return_descriptor is True, returns a tuple, where the first item is the derivative array and the second is the descriptor array. Otherwise only returns the derivatives array. The derivatives array is a either a 4D or 5D array, depending on whether you have provided a single or multiple systems. If the output shape for each system is the same, a single monolithic numpy array is returned. For variable sized output (e.g. differently sized systems, different number of centers or different number of included atoms), a regular python list is returned. The dimensions are: [(n_systems,) n_centers, n_atoms, 3, n_features]. The first dimension goes over the different systems in case multiple were given. The second dimension goes over the descriptor centers in the same order as they were given in the argument. The third dimension goes over the included atoms. The order is same as the order of atoms in the given system. The fourth dimension goes over the cartesian components, x, y and z. The fifth dimension goes over the features in the default order.

derivatives_numerical(d, c, system, centers, indices, attach=False, return_descriptor=True)[source]

Return the numerical derivatives for the given system. This is the default numerical implementation with python. You should overwrite this with a more optimized method whenever possible.

Parameters:
  • d (np.array) – The derivatives array.

  • c (np.array) – The descriptor array.

  • system (ase.Atoms) – Atomic structure.

  • centers (list) – Centers where to calculate SOAP. Can be provided as cartesian positions or atomic indices. If no centers are defined, the SOAP output will be created for all atoms in the system. When calculating SOAP for multiple systems, provide the centers as a list for each system.

  • indices (list) – Indices of atoms for which the derivatives will be computed for.

  • attach (bool) – Controls the behaviour of centers defined as atomic indices. If True, the centers tied to an atomic index will move together with the atoms with respect to which the derivatives are calculated against. If False, centers defined as atomic indices will be converted into cartesian locations that are completely independent of the atom location during derivative calculation.

  • return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. This is true by default as it typically is faster to calculate both in one go.

derivatives_single(system, centers, indices, method='numerical', attach=False, return_descriptor=True)[source]

Return the derivatives for the given system. :param system: Atomic structure. :type system: ase.Atoms :param centers: Centers where to calculate SOAP. Can be

provided as cartesian positions or atomic indices. If no centers are defined, the SOAP output will be created for all atoms in the system. When calculating SOAP for multiple systems, provide the centers as a list for each system.

Parameters:
  • indices (list) – Indices of atoms for which the derivatives will be computed for.

  • method (str) – The method for calculating the derivatives. Supports ‘numerical’.

  • attach (bool) – Controls the behaviour of centers defined as atomic indices. If True, the centers tied to an atomic index will move together with the atoms with respect to which the derivatives are calculated against. If False, centers defined as atomic indices will be converted into cartesian locations that are completely independent of the atom location during derivative calculation.

  • return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. This is true by default as it typically is faster to calculate both in one go.

Returns:

If return_descriptor is True, returns a tuple, where the first item is the derivative array and the second is the descriptor array. Otherwise only returns the derivatives array. The derivatives array is a 4D numpy array. The dimensions are: [n_centers, n_atoms, 3, n_features]. The first dimension goes over the SOAP centers in the same order as they were given in the argument. The second dimension goes over the included atoms. The order is same as the order of atoms in the given system. The third dimension goes over the cartesian components, x, y and z. The last dimension goes over the features in the default order.

init_derivatives_array(n_centers, n_indices)[source]

Return a zero-initialized numpy array for the derivatives.

init_descriptor_array(n_centers)[source]

Return a zero-initialized numpy array for the descriptor.

validate_derivatives_method(method, attach)[source]

Used to validate and determine the final method for calculating the derivatives.

dscribe.descriptors.descriptormatrix module

Copyright 2019 DScribe developers

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.descriptormatrix.DescriptorMatrix(n_atoms_max, permutation='sorted_l2', sigma=None, seed=None, sparse=False, dtype='float64')[source]

Bases: DescriptorGlobal

A common base class for two-body matrix-like descriptors.

Parameters:
  • n_atoms_max (int) – The maximum nuber of atoms that any of the samples can have. This controls how much zeros need to be padded to the final result.

  • permutation (string) –

    Defines the method for handling permutational invariance. Can be one of the following:

    • none: The matrix is returned in the order defined by the Atoms.

    • sorted_l2: The rows and columns are sorted by the L2 norm.

    • eigenspectrum: Only the eigenvalues are returned sorted by their absolute value in descending order.

    • random: The rows and columns are sorted by their L2 norm after applying Gaussian noise to the norms. The standard deviation of the noise is determined by the sigma-parameter.

  • sigma (float) – Provide only when using the random-permutation option. Standard deviation of the gaussian distributed noise determining how much the rows and columns of the randomly sorted matrix are scrambled.

  • seed (int) – Provide only when using the random-permutation option. A seed to use for drawing samples from a normal distribution.

  • sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.

create_single(system)[source]
Parameters:

system (ase.Atoms | System) – Input system.

Returns:

The zero padded matrix either as a 1D array.

Return type:

ndarray

get_eigenspectrum(matrix)[source]

Calculates the eigenvalues of the matrix and returns a list of them sorted by their descending absolute value.

Parameters:

matrix (np.ndarray) – The matrix to sort.

Returns:

A list of eigenvalues sorted by absolute value.

Return type:

np.ndarray

get_matrix(system)[source]

Used to get the final matrix for this descriptor.

Parameters:

system (ase.Atoms | System) – Input system.

Returns:

The final two-dimensional matrix for this descriptor.

Return type:

np.ndarray

get_number_of_features()[source]

Used to inquire the final number of features that this descriptor will have.

Returns:

Number of features for this descriptor.

Return type:

int

sort(matrix)[source]

Sorts the given matrix by using the L2 norm.

Parameters:

matrix (np.ndarray) – The matrix to sort.

Returns:

The sorted matrix.

Return type:

np.ndarray

sort_randomly(matrix, sigma)[source]

Given a coulomb matrix, it adds random noise to the sorting defined by sigma. For sorting, L2-norm is used.

Parameters:

matrix (np.ndarray) – The matrix to randomly sort.

sigma:
float: Width of gaussian distributed noise determining how much the

rows and columns of the randomly sorted coulomb matrix are scrambled.

Returns:

The randomly sorted matrix.

Return type:

np.ndarray

unflatten(features, n_systems=None)[source]

Can be used to “unflatten” a matrix descriptor back into a 2D array. Useful for testing and visualization purposes.

Parameters:
  • features (np.ndarray) – Flattened features.

  • n_systems (int) – Number of systems. If not specified a value will be guessed from the input features.

Returns:

The features as a 2D array.

Return type:

np.ndarray

zero_pad(array)[source]

Zero-pads the given matrix.

Parameters:

array (np.ndarray) – The array to pad

Returns:

The zero-padded array.

Return type:

np.ndarray

dscribe.descriptors.ewaldsummatrix module

Copyright 2019 DScribe developers

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.ewaldsummatrix.EwaldSumMatrix(n_atoms_max, permutation='sorted_l2', sigma=None, seed=None, sparse=False, dtype='float64')[source]

Bases: DescriptorMatrix

Calculates an Ewald sum matrix for the a given system.

Each entry M_ij of the Ewald sum matrix will contain the Coulomb energy between atoms i and j calculated with the Ewald summation method. In the Ewald method a constant neutralizing background charge has been added to counteract the positive net charge.

The total electrostatic interaction energy in the system can calculated by summing the upper diagonal part of the matrix, including the diagonal itself.

A screening parameter a controls the width of the Gaussian charge distributions in the Ewald summation, but the final matrix elements will be independent of the value of the screening parameter a that is used, as long as sufficient cutoff values are used.

For reference, see:

“Crystal Structure Representations for Machine Learning Models of Formation Energies”, Felix Faber, Alexander Lindmaa, Anatole von Lilienfeld, and Rickard Armiento, International Journal of Quantum Chemistry, (2015), https://doi.org/10.1002/qua.24917

and

“Ewald summation techniques in perspective: a survey”, Abdulnour Y. Toukmaji, John A. Board Jr., Computer Physics Communications, (1996) https://doi.org/10.1016/0010-4655(96)00016-1

and

“R.A. Jackson and C.R.A. Catlow. Computer simulation studies of zeolite structure. Mol. Simul., 1:207-224, 1988, https://doi.org/10.1080/08927022.2013.840898

Parameters:
  • n_atoms_max (int) – The maximum nuber of atoms that any of the samples can have. This controls how much zeros need to be padded to the final result.

  • permutation (string) –

    Defines the method for handling permutational invariance. Can be one of the following:

    • none: The matrix is returned in the order defined by the Atoms.

    • sorted_l2: The rows and columns are sorted by the L2 norm.

    • eigenspectrum: Only the eigenvalues are returned sorted by their absolute value in descending order.

    • random: The rows and columns are sorted by their L2 norm after applying Gaussian noise to the norms. The standard deviation of the noise is determined by the sigma-parameter.

  • sigma (float) – Provide only when using the random-permutation option. Standard deviation of the gaussian distributed noise determining how much the rows and columns of the randomly sorted matrix are scrambled.

  • seed (int) – Provide only when using the random-permutation option. A seed to use for drawing samples from a normal distribution.

  • sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.

create(system, accuracy=1e-05, w=1, r_cut=None, g_cut=None, a=None, n_jobs=1, only_physical_cores=False, verbose=False)[source]

Return the Ewald sum matrix for the given systems.

Parameters:
  • system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.

  • accuracy (float) – The accuracy to which the sum is converged to. Corresponds to the variable \(A\) in https://doi.org/10.1080/08927022.2013.840898. Used only if g_cut, r_cut and a have not been specified. Provide either one value or a list of values for each system.

  • w (float) – Weight parameter that represents the relative computational expense of calculating a term in real and reciprocal space. This has little effect on the total energy, but may influence speed of computation in large systems. Note that this parameter is used only when the cutoffs and a are set to None. Provide either one value or a list of values for each system.

  • r_cut (float) – Real space cutoff radius dictating how many terms are used in the real space sum. Provide either one value or a list of values for each system.

  • g_cut (float) – Reciprocal space cutoff radius. Provide either one value or a list of values for each system.

  • a (float) – The screening parameter that controls the width of the Gaussians. If not provided, a default value of \(\alpha = \sqrt{\pi}\left(\frac{N}{V^2}\right)^{1/6}\) is used. Corresponds to the standard deviation of the Gaussians. Provide either one value or a list of values for each system.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the used cpus will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns:

Ewald sum matrix for the given systems. The return type depends on the ‘sparse’-attribute. The first dimension is determined by the amount of systems.

Return type:

np.ndarray | sparse.COO

create_single(system, accuracy=1e-05, w=1, r_cut=None, g_cut=None, a=None)[source]
Parameters:
  • system (ase.Atoms | System) – Input system.

  • accuracy (float) – The accuracy to which the sum is converged to. Corresponds to the variable \(A\) in https://doi.org/10.1080/08927022.2013.840898. Used only if g_cut, r_cut and a have not been specified.

  • w (float) – Weight parameter that represents the relative computational expense of calculating a term in real and reciprocal space. This has little effect on the total energy, but may influence speed of computation in large systems. Note that this parameter is used only when the cutoffs and a are set to None.

  • r_cut (float) – Real space cutoff radius dictating how many terms are used in the real space sum.

  • g_cut (float) – Reciprocal space cutoff radius.

  • a (float) – The screening parameter that controls the width of the Gaussians. If not provided, a default value of \(\alpha = \sqrt{\pi}\left(\frac{N}{V^2}\right)^{1/6}\) is used. Corresponds to the standard deviation of the Gaussians.

get_matrix(system)[source]

The total energy matrix. Each matrix element (i, j) corresponds to the total interaction energy in a system with atoms i and j.

Parameters:

system (ase.Atoms | System) – Input system.

Returns:

Ewald matrix.

Return type:

np.ndarray

dscribe.descriptors.lmbtr module

Copyright 2019 DScribe developers

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.lmbtr.LMBTR(geometry=None, grid=None, weighting=None, normalize_gaussians=True, normalization='none', species=None, periodic=False, sparse=False, dtype='float64')[source]

Bases: DescriptorLocal

Implementation of the Local Many-body tensor representation.

You can use this descriptor for finite and periodic systems. When dealing with periodic systems or when using machine learning models that use the Euclidean norm to measure distance between vectors, it is advisable to use some form of normalization. This implementation does not support the use of a non-identity correlation matrix.

Notice that the species of the central atom is not encoded in the output, but is instead represented by a chemical species X with atomic number 0. This allows LMBTR to be also used on general positions not corresponding to real atoms. The surrounding environment is encoded by the two- and three-body interactions with neighouring atoms. If there is a need to distinguish the central species, one can for example train a different model for each central species.

Parameters:
  • geometry (dict) –

    Setup the geometry function. For example:

    "geometry": {"function": "distance"}
    

    The geometry function determines the degree \(k\) for MBTR. The order \(k\) tells how many atoms are involved in the calculation and thus also heavily influence the computational time.

    The following geometry functions are available:

    • \(k=2\)
      • "distance": Pairwise distance in angstroms.

      • "inverse_distance": Pairwise inverse distance in 1/angstrom.

    • \(k=3\)
      • "angle": Angle in degrees.

      • "cosine": Cosine of the angle.

  • grid (dict) –

    Setup the discretization grid. For example:

    "grid": {"min": 0.1, "max": 2, "sigma": 0.1, "n": 50}
    

    In the grid setup min is the minimum value of the axis, max is the maximum value of the axis, sigma is the standard deviation of the gaussian broadening and n is the number of points sampled on the grid.

  • weighting (dict) –

    Setup the weighting function and its parameters. For example:

    "weighting" : {"function": "exp", "r_cut": 10, "threshold": 1e-3}
    

    The following weighting functions are available:

    • \(k=2\)
      • "unity": No weighting.

      • "exp": Weighting of the form \(e^{-sx}\)

      • "inverse_square": Weighting of the form \(1/(x^2)\)

    • \(k=3\)
      • "unity": No weighting.

      • "exp": Weighting of the form \(e^{-sx}\)

      • "smooth_cutoff": Weighting of the form \(f_{ij}f_{ik}\),

        where \(f = 1+y(x/r_{cut})^{y+1}-(y+1)(x/r_{cut})^{y}\)

    The meaning of \(x\) changes for different terms as follows:

    • For \(k=2\): \(x\) = Distance between A->B

    • For \(k=3\): \(x\) = Distance from A->B->C->A.

    The exponential weighting is motivated by the exponential decay of screened Coulombic interactions in solids. In the exponential weighting the parameters threshold determines the value of the weighting function after which the rest of the terms will be ignored. Either the parameter scale or r_cut can be used to determine the parameter \(s\): scale directly corresponds to this value whereas r_cut can be used to indirectly determine it through \(s=-\log()\):.

    The inverse square and smooth cutoff function weightings use a cutoff parameter r_cut, which is a radial distance after which the rest of the atoms will be ignored. For the smooth cutoff function, additional weighting key sharpness can be added, which changes the value of \(y\). If a value for it is not provided, it defaults to 2.

  • normalize_gaussians (bool) – Determines whether the gaussians are normalized to an area of 1. Defaults to True. If False, the normalization factor is dropped and the gaussians have the form. \(e^{-(x-\mu)^2/2\sigma^2}\)

  • normalization (str) –

    Determines the method for normalizing the output. The available options are:

    • "none": No normalization.

    • "l2": Normalize the Euclidean length of the output to unity.

  • species (iterable) – The chemical species as a list of atomic numbers or as a list of chemical symbols. Notice that this is not the atomic numbers that are present for an individual system, but should contain all the elements that are ever going to be encountered when creating the descriptors for a set of systems. Keeping the number of chemical speices as low as possible is preferable.

  • periodic (bool) – Set to true if you want the descriptor output to respect the periodicity of the atomic systems (see the pbc-parameter in the constructor of ase.Atoms).

  • sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.

  • dtype (str) –

    The data type of the output. Valid options are:

    • "float32": Single precision floating point numbers.

    • "float64": Double precision floating point numbers.

create(system, centers=None, n_jobs=1, only_physical_cores=False, verbose=False)[source]

Return the LMBTR output for the given systems and given centers.

Parameters:
  • system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.

  • centers (list) – Centers where to calculate LMBTR. Can be provided as cartesian positions or atomic indices. If no centers are defined, the LMBTR output will be created for all atoms in the system. When calculating LMBTR for multiple systems, provide the centers as a list for each system.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the used cpus will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns:

The LMBTR output for the given systems and centers. The return type depends on the ‘sparse’-attribute. The first dimension is determined by the amount of centers and systems and the second dimension is determined by the get_number_of_features()-function.

Return type:

np.ndarray | scipy.sparse.csr_matrix

create_single(system, centers=None)[source]

Return the local many-body tensor representation for the given system and centers.

Parameters:
  • system (ase.Atoms | System) – Input system.

  • centers (iterable) – Centers for which LMBTR is created. Can be a list of integer numbers or a list of xyz-coordinates. If integers provided, the atoms at that index are used as centers. If cartesian positions are provided, new atoms are added at that position. If no centers are provided, all atoms in the system will be used as centers.

Returns:

The local many-body tensor representations of given centers, for k terms, as an array. These are ordered as given in centers.

Return type:

1D ndarray

property geometry
get_location(species)[source]

Can be used to query the location of a species combination in the the flattened output.

Parameters:
  • species (tuple) – A tuple containing a species combination as

  • as (chemical symbols or atomic numbers. The central atom is marked) –

  • example (species "X". The tuple can be for) –

  • "H")

Returns:

slice containing the location of the specified species combination. The location is given as a python slice-object, that can be directly used to target ranges in the output.

Return type:

slice

Raises:
  • ValueError – If the requested species combination is not in the

  • output or if invalid species defined.

get_number_of_features()[source]

Used to inquire the final number of features that this descriptor will have.

The number of features for the LMBTR is calculated as follows:

For the pair term (k=2), only pairs where at least one of the atom is the central atom (in periodic systems the central atom may connect to itself) are considered. This means that there are only as many combinations as there are different elements to pair the central atom with (n_elem). This number of combinations is the multiplied by the discretization of the k=2 grid.

For the three-body term (k=3), only triplets where at least one of the atoms is the central atom (in periodic systems the central atom may connect to itself) and the k >= i (symmetry) are considered. This means that as k runs from 0 to n-1, where n is the number of elements, there are n + k combinations that fill this rule. This sum becomes: \(\sum_{k=0}^{n-1} n + k = n^2+(n-1)*n/2\). This number of combinations is the multiplied by the discretization of the k=3 grid.

Returns:

Number of features for this descriptor.

Return type:

int

property grid
property normalization
property species
property weighting

dscribe.descriptors.mbtr module

Copyright 2019 DScribe developers

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.mbtr.MBTR(geometry=None, grid=None, weighting=None, normalize_gaussians=True, normalization='none', species=None, periodic=False, sparse=False, dtype='float64')[source]

Bases: DescriptorGlobal

Implementation of the Many-body tensor representation.

You can use this descriptor for finite and periodic systems. When dealing with periodic systems or when using machine learning models that use the Euclidean norm to measure distance between vectors, it is advisable to use some form of normalization. This implementation does not support the use of a non-identity correlation matrix.

Parameters:
  • geometry (dict) –

    Setup the geometry function. For example:

    "geometry": {"function": "atomic_number"}
    

    The geometry function determines the degree \(k\) for MBTR. The order \(k\) tells how many atoms are involved in the calculation and thus also heavily influence the computational time.

    The following geometry functions are available:

    • \(k=1\)
      • "atomic_number": The atomic number.

    • \(k=2\)
      • "distance": Pairwise distance in angstroms.

      • "inverse_distance": Pairwise inverse distance in 1/angstrom.

    • \(k=3\)
      • "angle": Angle in degrees.

      • "cosine": Cosine of the angle.

  • grid (dict) –

    Setup the discretization grid. For example:

    "grid": {"min": 0.1, "max": 2, "sigma": 0.1, "n": 50}
    

    In the grid setup min is the minimum value of the axis, max is the maximum value of the axis, sigma is the standard deviation of the gaussian broadening and n is the number of points sampled on the grid.

  • weighting (dict) –

    Setup the weighting function and its parameters. For example:

    "weighting" : {"function": "exp", "r_cut": 10, "threshold": 1e-3}
    

    The following weighting functions are available:

    • \(k=1\)
      • "unity": No weighting.

    • \(k=2\)
      • "unity": No weighting.

      • "exp": Weighting of the form \(e^{-sx}\)

      • "inverse_square": Weighting of the form \(1/(x^2)\)

    • \(k=3\)
      • "unity": No weighting.

      • "exp": Weighting of the form \(e^{-sx}\)

      • "smooth_cutoff": Weighting of the form \(f_{ij}f_{ik}\),

        where \(f = 1+y(x/r_{cut})^{y+1}-(y+1)(x/r_{cut})^{y}\)

    The meaning of \(x\) changes for different terms as follows:

    • For \(k=2\): \(x\) = Distance between A->B

    • For \(k=3\): \(x\) = Distance from A->B->C->A.

    The exponential weighting is motivated by the exponential decay of screened Coulombic interactions in solids. In the exponential weighting the parameters threshold determines the value of the weighting function after which the rest of the terms will be ignored. Either the parameter scale or r_cut can be used to determine the parameter \(s\): scale directly corresponds to this value whereas r_cut can be used to indirectly determine it through \(s=-\log()\):.

    The inverse square and smooth cutoff function weightings use a cutoff parameter r_cut, which is a radial distance after which the rest of the atoms will be ignored. For the smooth cutoff function, additional weighting key sharpness can be added, which changes the value of \(y\). If a value for it is not provided, it defaults to 2.

  • normalize_gaussians (bool) – Determines whether the gaussians are normalized to an area of 1. Defaults to True. If False, the normalization factor is dropped and the gaussians have the form. \(e^{-(x-\mu)^2/2\sigma^2}\)

  • normalization (str) –

    Determines the method for normalizing the output. The available options are:

    • "none": No normalization.

    • "l2": Normalize the Euclidean length of the output to unity.

    • "n_atoms": Normalize the output by dividing it with the number of atoms in the system. If the system is periodic, the number of atoms is determined from the given unit cell.

    • "valle_oganov": Use Valle-Oganov descriptor normalization, with system cell volume and numbers of different atoms in the cell.

  • species (iterable) – The chemical species as a list of atomic numbers or as a list of chemical symbols. Notice that this is not the atomic numbers that are present for an individual system, but should contain all the elements that are ever going to be encountered when creating the descriptors for a set of systems. Keeping the number of chemical speices as low as possible is preferable.

  • periodic (bool) – Set to true if you want the descriptor output to respect the periodicity of the atomic systems (see the pbc-parameter in the constructor of ase.Atoms).

  • sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.

  • dtype (str) –

    The data type of the output. Valid options are:

    • "float32": Single precision floating point numbers.

    • "float64": Double precision floating point numbers.

create(system, n_jobs=1, only_physical_cores=False, verbose=False)[source]

Return MBTR output for the given systems.

Parameters:
  • system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the used cpus will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns:

MBTR for the given systems. The return type depends on the ‘sparse’ attribute.

Return type:

np.ndarray | sparse.COO

create_single(system)[source]

Return the many-body tensor representation for the given system.

Parameters:

system (ase.Atoms | System) – Input system.

Returns:

A single concatenated output vector is returned, either as a sparse or a dense vector.

Return type:

np.ndarray | sparse.COO

derivatives_analytical(d, c, system, indices, return_descriptor)[source]
property geometry
get_location(species)[source]

Can be used to query the location of a species combination in the the output.

Parameters:

species (tuple) – A tuple containing a species combination as chemical symbols or atomic numbers. The tuple can be for example (“H”), (“H”, “O”) or (“H”, “O”, “H”).

Returns:

slice containing the location of the specified species

combination. The location is given as a python slice-object, that can be directly used to target ranges in the output.

Return type:

slice

Raises:

ValueError – If the requested species combination is not in the output or if invalid species defined.

get_number_of_features()[source]

Used to inquire the final number of features that this descriptor will have.

Returns:

Number of features for this descriptor.

Return type:

int

property grid
property normalization
property species
validate_derivatives_method(method)[source]

Used to validate and determine the final method for calculating the derivatives.

property weighting
dscribe.descriptors.mbtr.check_geometry(geometry: dict)[source]

Used to ensure that the given geometry settings are valid.

Parameters:

geometry – Dictionary containing the geometry setup.

dscribe.descriptors.mbtr.check_grid(grid: dict)[source]

Used to ensure that the given grid settings are valid.

Parameters:

grid (dict) – Dictionary containing the grid setup.

dscribe.descriptors.mbtr.check_weighting(k: int, weighting: dict, periodic: bool)[source]

Used to ensure that the given weighting settings are valid.

Parameters:
  • k – The MBTR degree.

  • weighting – Dictionary containing the weighting setup.

  • periodic – Whether the descriptor is periodic or not.

dscribe.descriptors.sinematrix module

Copyright 2019 DScribe developers

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.sinematrix.SineMatrix(n_atoms_max, permutation='sorted_l2', sigma=None, seed=None, sparse=False, dtype='float64')[source]

Bases: DescriptorMatrix

Calculates the zero padded Sine matrix for different systems.

The Sine matrix is defined as:

Cij = 0.5 Zi**exponent | i = j

= (Zi*Zj)/phi(Ri, Rj) | i != j

where phi(r1, r2) = | B * sum(k = x,y,z)[ek * sin^2(pi * ek * B^-1 (r2-r1))] | (B is the matrix of basis cell vectors, ek are the unit vectors)

The matrix is padded with invisible atoms, which means that the matrix is padded with zeros until the maximum allowed size defined by n_max_atoms is reached.

For reference, see:

“Crystal Structure Representations for Machine Learning Models of Formation Energies”, Felix Faber, Alexander Lindmaa, Anatole von Lilienfeld, and Rickard Armiento, International Journal of Quantum Chemistry, (2015), https://doi.org/10.1002/qua.24917

Parameters:
  • n_atoms_max (int) – The maximum nuber of atoms that any of the samples can have. This controls how much zeros need to be padded to the final result.

  • permutation (string) –

    Defines the method for handling permutational invariance. Can be one of the following:

    • none: The matrix is returned in the order defined by the Atoms.

    • sorted_l2: The rows and columns are sorted by the L2 norm.

    • eigenspectrum: Only the eigenvalues are returned sorted by their absolute value in descending order.

    • random: The rows and columns are sorted by their L2 norm after applying Gaussian noise to the norms. The standard deviation of the noise is determined by the sigma-parameter.

  • sigma (float) – Provide only when using the random-permutation option. Standard deviation of the gaussian distributed noise determining how much the rows and columns of the randomly sorted matrix are scrambled.

  • seed (int) – Provide only when using the random-permutation option. A seed to use for drawing samples from a normal distribution.

  • sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.

create(system, n_jobs=1, only_physical_cores=False, verbose=False)[source]

Return the Sine matrix for the given systems.

Parameters:
  • system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the used cpus will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns:

Sine matrix for the given systems. The return type depends on the ‘sparse’-attribute. The first dimension is determined by the amount of systems.

Return type:

np.ndarray | sparse.COO

get_matrix(system)[source]

Creates the Sine matrix for the given system.

Parameters:

system (ase.Atoms | System) – Input system.

Returns:

Sine matrix as a 2D array.

Return type:

np.ndarray

dscribe.descriptors.soap module

Copyright 2019 DScribe developers

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.soap.SOAP(r_cut=None, n_max=None, l_max=None, sigma=1.0, rbf='gto', weighting=None, average='off', compression={'mode': 'off', 'species_weighting': None}, species=None, periodic=False, sparse=False, dtype='float64')[source]

Bases: DescriptorLocal

Class for generating a partial power spectrum from Smooth Overlap of Atomic Orbitals (SOAP). This implementation uses real (tesseral) spherical harmonics as the angular basis set and provides two orthonormalized alternatives for the radial basis functions: spherical primitive gaussian type orbitals (“gto”) or the polynomial basis set (“polynomial”).

For reference, see:

“On representing chemical environments, Albert P. Bartók, Risi Kondor, and Gábor Csányi, Phys. Rev. B 87, 184115, (2013), https://doi.org/10.1103/PhysRevB.87.184115

“Comparing molecules and solids across structural and alchemical space”, Sandip De, Albert P. Bartók, Gábor Csányi and Michele Ceriotti, Phys. Chem. Chem. Phys. 18, 13754 (2016), https://doi.org/10.1039/c6cp00415f

“Machine learning hydrogen adsorption on nanoclusters through structural descriptors”, Marc O. J. Jäger, Eiaki V. Morooka, Filippo Federici Canova, Lauri Himanen & Adam S. Foster, npj Comput. Mater., 4, 37 (2018), https://doi.org/10.1038/s41524-018-0096-5

Parameters:
  • r_cut (float) – A cutoff for local region in angstroms. Should be bigger than 1 angstrom for the gto-basis.

  • n_max (int) – The number of radial basis functions.

  • l_max (int) – The maximum degree of spherical harmonics.

  • sigma (float) – The standard deviation of the gaussians used to expand the atomic density.

  • rbf (str) –

    The radial basis functions to use. The available options are:

    • "gto": Spherical gaussian type orbitals defined as \(g_{nl}(r) = \sum_{n'=1}^{n_\mathrm{max}}\,\beta_{nn'l} r^l e^{-\alpha_{n'l}r^2}\)

    • "polynomial": Polynomial basis defined as \(g_{n}(r) = \sum_{n'=1}^{n_\mathrm{max}}\,\beta_{nn'} (r-r_\mathrm{cut})^{n'+2}\)

  • weighting (dict) –

    Contains the options which control the weighting of the atomic density. Leave unspecified if you do not wish to apply any weighting. The dictionary may contain the following entries:

    • "function": The weighting function to use. The following are currently supported:

      • "poly": \(w(r) = \left\{ \begin{array}{ll} c(1 + 2 (\frac{r}{r_0})^{3} -3 (\frac{r}{r_0})^{2}))^{m}, \ \text{for}\ r \leq r_0\\ 0, \ \text{for}\ r > r_0 \end{array}\right.\)

        This function goes exactly to zero at \(r=r_0\). If you do not explicitly provide r_cut in the constructor, r_cut is automatically set to r0. You can provide the parameters c, m and r0 as additional dictionary items.

        For reference see:

        ”Caro, M. (2019). Optimizing many-body atomic descriptors for enhanced computational performance of machine learning based interatomic potentials. Phys. Rev. B, 100, 024112.”

      • "pow": \(w(r) = \frac{c}{d + (\frac{r}{r_0})^{m}}\)

        If you do not explicitly provide r_cut in the constructor, r_cut will be set as the value at which this function decays to the value given by the threshold entry in the weighting dictionary (defaults to 1e-2), You can provide the parameters c, d, m, r0 and threshold as additional dictionary items.

        For reference see:

        ”Willatt, M., Musil, F., & Ceriotti, M. (2018). Feature optimization for atomistic machine learning yields a data-driven construction of the periodic table of the elements. Phys. Chem. Chem. Phys., 20, 29661-29668. “

      • "exp": \(w(r) = \frac{c}{d + e^{-r/r_0}}\)

        If you do not explicitly provide r_cut in the constructor, r_cut will be set as the value at which this function decays to the value given by the threshold entry in the weighting dictionary (defaults to 1e-2), You can provide the parameters c, d, r0 and threshold as additional dictionary items.

    • "w0": Optional weight for atoms that are directly on top of a requested center. Setting this value to zero essentially hides the central atoms from the output. If a weighting function is also specified, this constant will override it for the central atoms.

  • average (str) –

    The averaging mode over the centers of interest. Valid options are:

    • "off": No averaging.

    • "inner": Averaging over sites before summing up the magnetic quantum numbers: \(p_{nn'l}^{Z_1,Z_2} \sim \sum_m (\frac{1}{n} \sum_i c_{nlm}^{i, Z_1})^{*} (\frac{1}{n} \sum_i c_{n'lm}^{i, Z_2})\)

    • "outer": Averaging over the power spectrum of different sites: \(p_{nn'l}^{Z_1,Z_2} \sim \frac{1}{n} \sum_i \sum_m (c_{nlm}^{i, Z_1})^{*} (c_{n'lm}^{i, Z_2})\)

  • compression (dict) –

    Contains the options which specify the feature compression to apply. Applying compression can slightly reduce the accuracy of models trained on the feature representation but can also dramatically reduce the size of the feature vector and hence the computational cost. Options are:

    • "mode": Specifies the type of compression. This can be one of:

      • "off": No compression; default.

      • "mu2": The SOAP feature vector is generated in an element-agnostic way, so that the size of the feature vector is now independent of the number of elements (see Darby et al. below for details). It is still possible when using this option to construct a feature vector that distinguishes between elements by supplying element-specific weighting under “species_weighting”, see below.

      • "mu1nu1": Implements the mu=1, nu=1 feature compression scheme from Darby et al.: \(p_{inn'l}^{Z_1,Z_2} \sum_m (c_{nlm}^{i, Z_1})^{*} (\sum_z c_{n'lm}^{i, z})\). In other words, each coefficient for each species is multiplied by a “species-mu2” sum over the corresponding set of coefficients for all other species. If this option is selected, features are generated for each center, but the number of features (the size of each feature vector) scales linearly rather than quadratically with the number of elements in the system.

      • "crossover": The power spectrum does not contain cross-species information and is only run over each unique species Z. In this configuration, the size of the feature vector scales linearly with the number of elements in the system.

    • "species_weighting": Either None or a dictionary mapping each species to a species-specific weight. If None, there is no species-specific weighting. If a dictionary, must contain a matching key for each species in the species iterable. The main use of species weighting is to weight each element differently when using the “mu2” option for compression.

    For reference see:

    ”Darby, J.P., Kermode, J.R. & Csányi, G. Compressing local atomic neighbourhood descriptors. npj Comput Mater 8, 166 (2022). https://doi.org/10.1038/s41524-022-00847-y

  • species (iterable) – The chemical species as a list of atomic numbers or as a list of chemical symbols. Notice that this is not the atomic numbers that are present for an individual system, but should contain all the elements that are ever going to be encountered when creating the descriptors for a set of systems. Keeping the number of chemical species as low as possible is preferable.

  • periodic (bool) – Set to True if you want the descriptor output to respect the periodicity of the atomic systems (see the pbc-parameter in the constructor of ase.Atoms).

  • sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.

  • dtype (str) –

    The data type of the output. Valid options are:

    • "float32": Single precision floating point numbers.

    • "float64": Double precision floating point numbers.

property compression
create(system, centers=None, n_jobs=1, only_physical_cores=False, verbose=False)[source]

Return the SOAP output for the given systems and given centers.

Parameters:
  • system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.

  • centers (list) – Centers where to calculate SOAP. Can be provided as cartesian positions or atomic indices. If no centers are defined, the SOAP output will be created for all atoms in the system. When calculating SOAP for multiple systems, provide the centers as a list for each system.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the used cpus will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns:

The SOAP output for the given systems and centers. The return type depends on the ‘sparse’-attribute. The first dimension is determined by the amount of centers and systems and the second dimension is determined by the get_number_of_features()-function. When multiple systems are provided the results are ordered by the input order of systems and their positions.

Return type:

np.ndarray | sparse.COO

create_single(system, centers=None)[source]

Return the SOAP output for the given system and given centers.

Parameters:
  • system (ase.Atoms | System) – Input system.

  • centers (list) – Cartesian positions or atomic indices. If specified, the SOAP spectrum will be created for these points. If no centers are defined, the SOAP output will be created for all atoms in the system.

Returns:

The SOAP output for the given system and centers. The return type depends on the ‘sparse’-attribute. The first dimension is given by the number of centers and the second dimension is determined by the get_number_of_features()-function.

Return type:

np.ndarray | sparse.COO

derivatives_analytical(d, c, system, centers, indices, attach, return_descriptor=True)[source]

Return the analytical derivatives for the given system. :param system: Atomic structure. :type system: ase.Atoms :param indices: Indices of atoms for which the derivatives will be

computed for.

Parameters:

return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. This is true by default as it typically is faster to calculate both in one go.

Returns:

If return_descriptor is True, returns a tuple, where the first item is the derivative array and the second is the descriptor array. Otherwise only returns the derivatives array. The derivatives array is a 3D numpy array. The dimensions are: [n_atoms, 3, n_features]. The first dimension goes over the included atoms. The order is same as the order of atoms in the given system. The second dimension goes over the cartesian components, x, y and z. The last dimension goes over the features in the default order.

derivatives_numerical(d, c, system, centers, indices, attach, return_descriptor=True)[source]

Return the numerical derivatives for the given system. :param system: Atomic structure. :type system: ase.Atoms :param indices: Indices of atoms for which the derivatives will be

computed for.

Parameters:

return_descriptor (bool) – Whether to also calculate the descriptor in the same function call. This is true by default as it typically is faster to calculate both in one go.

Returns:

If return_descriptor is True, returns a tuple, where the first item is the derivative array and the second is the descriptor array. Otherwise only returns the derivatives array. The derivatives array is a 3D numpy array. The dimensions are: [n_atoms, 3, n_features]. The first dimension goes over the included atoms. The order is same as the order of atoms in the given system. The second dimension goes over the cartesian components, x, y and z. The last dimension goes over the features in the default order.

get_basis_gto(r_cut, n_max, l_max)[source]

Used to calculate the alpha and beta prefactors for the gto-radial basis.

Parameters:
  • r_cut (float) – Radial cutoff.

  • n_max (int) – Number of gto radial bases.

Returns:

The alpha and beta prefactors for all bases up to a fixed size of l=10.

Return type:

(np.ndarray, np.ndarray)

get_basis_poly(r_cut, n_max)[source]

Used to calculate discrete vectors for the polynomial basis functions.

Parameters:
  • r_cut (float) – Radial cutoff.

  • n_max (int) – Number of polynomial radial bases.

Returns:

Tuple containing the evaluation points in radial direction as the first item, and the corresponding orthonormalized polynomial radial basis set as the second item.

Return type:

(np.ndarray, np.ndarray)

get_cutoff_padding()[source]

The radial cutoff is extended by adding a padding that depends on the used used sigma value. The padding is chosen so that the gaussians decay to the specified threshold value at the cutoff distance.

get_location(species)[source]

Can be used to query the location of a species combination in the the flattened output.

Parameters:

species (tuple) – A tuple containing a pair of species as chemical symbols or atomic numbers. The tuple can be for example (“H”, “O”).

Returns:

slice containing the location of the specified species combination. The location is given as a python slice-object, that can be directly used to target ranges in the output.

Return type:

slice

Raises:

ValueError – If the requested species combination is not in the output or if invalid species defined.

get_number_of_features()[source]

Used to inquire the final number of features that this descriptor will have.

Returns:

Number of features for this descriptor.

Return type:

int

init_internal_array(n_centers, n_types, n, l_max)[source]
init_internal_dev_array(n_centers, n_atoms, n_types, n, l_max)[source]
prepare_centers(system, centers=None)[source]

Validates and prepares the centers for the C++ extension.

property species
validate_derivatives_method(method, attach)[source]

Used to validate and determine the final method for calculating the derivatives.

dscribe.descriptors.valleoganov module

class dscribe.descriptors.valleoganov.ValleOganov(species, function, n, sigma, r_cut, sparse=False, dtype='float64')[source]

Bases: MBTR

Shortcut for implementing the fingerprint descriptor by Valle and Oganov for using MBTR. Automatically uses the right weighting and normalizes the output, and can only be used for periodic systems.

For more information on the weighting and normalization used here as well as the other parameters and general usage of the descriptor, see the MBTR class.

Parameters:
  • species (iterable) – The chemical species as a list of atomic numbers or as a list of chemical symbols. Notice that this is not the atomic numbers that are present for an individual system, but should contain all the elements that are ever going to be encountered when creating the descriptors for a set of systems. Keeping the number of chemical species as low as possible is preferable.

  • function (str) –

    The geometry function. The order \(k\) tells how many atoms are involved in the calculation and thus also heavily influence the computational time.

    The following geometry functions are available:

    • \(k=2\)
      • "distance": Pairwise distance in angstroms.

    • \(k=3\)
      • "angle": Angle in degrees.

  • n (int) – Number of discretization points.

  • sigma (float) – Standard deviation of the gaussian broadening

  • r_cut (float) – Radial cutoff.

  • sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.

  • dtype (str) –

    The data type of the output. Valid options are:

    • "float32": Single precision floating point numbers.

    • "float64": Double precision floating point numbers.

Module contents

Copyright 2019 DScribe developers

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.