Atom-centered Symmetry Functions

Atom-centered Symmetry Functions (ACSFs) [1] can be used to represent the local environment near an atom by using a fingerprint composed of the output of multiple two- and three-body functions that can be customized to detect specific structural features.

Notice that the DScribe output for ACSF does not include the species of the central atom in any way. If the chemical identify of the central species is important, you may want to consider a custom output stratification scheme based on the chemical identity of the central species, or the inclusion of the species identify in an additional feature. Training a separate model for each central species is also possible.

The ACSF output is stratified by the species of the neighbouring atoms. In pseudo-code the ordering of the output vector is as follows:

for Z in atomic numbers in increasing order:
    append value for G1 to output
    for G2 in g2_params:
        append value for G2 to output
    for G3 in g3_params:
        append value for G3 to output
for Z in atomic numbers in increasing order:
   for Z' in atomic numbers in increasing order:
       if and Z' >= Z:
           for G4 in g4_params:
               append value for G4 to output
           for G5 in g5_params:
               append value for G5 to output

Setup

Instantiating an ACSF descriptor can be done as follows:

from dscribe.descriptors import ACSF

# Setting up the ACSF descriptor
acsf = ACSF(
    species=["H", "O"],
    r_cut=6.0,
    g2_params=[[1, 1], [1, 2], [1, 3]],
    g4_params=[[1, 1, 1], [1, 2, 1], [1, 1, -1], [1, 2, -1]],
)

The arguments have the following effect:

ACSF.__init__(r_cut, g2_params=None, g3_params=None, g4_params=None, g5_params=None, species=None, periodic=False, sparse=False, dtype='float64')[source]
Parameters:
  • r_cut (float) – The smooth cutoff value in angstroms. This cutoff value is used throughout the calculations for all symmetry functions.

  • g2_params (n*2 np.ndarray) – A list of pairs of \(\eta\) and \(R_s\) parameters for \(G^2\) functions.

  • g3_params (n*1 np.ndarray) – A list of \(\kappa\) parameters for \(G^3\) functions.

  • g4_params (n*3 np.ndarray) – A list of triplets of \(\eta\), \(\zeta\) and \(\lambda\) parameters for \(G^4\) functions.

  • g5_params (n*3 np.ndarray) – A list of triplets of \(\eta\), \(\zeta\) and \(\lambda\) parameters for \(G^5\) functions.

  • species (iterable) – The chemical species as a list of atomic numbers or as a list of chemical symbols. Notice that this is not the atomic numbers that are present for an individual system, but should contain all the elements that are ever going to be encountered when creating the descriptors for a set of systems. Keeping the number of chemical species as low as possible is preferable.

  • periodic (bool) – Set to true if you want the descriptor output to respect the periodicity of the atomic systems (see the pbc-parameter in the constructor of ase.Atoms).

  • sparse (bool) – Whether the output should be a sparse matrix or a dense numpy array.

Creation

After ACSF has been set up, it may be used on atomic structures with the create()-method.

from ase.build import molecule

water = molecule("H2O")

# Create MBTR output for the hydrogen atom at index 1
acsf_water = acsf.create(water, centers=[1])

print(acsf_water)
print(acsf_water.shape)

The call syntax for the create-function is as follows:

ACSF.create(system, centers=None, n_jobs=1, only_physical_cores=False, verbose=False)[source]

Return the ACSF output for the given systems and given centers.

Parameters:
  • system (ase.Atoms or list of ase.Atoms) – One or many atomic structures.

  • centers (list) – Indices of the atoms to use as ACSF centers. If no centers are defined, the output will be created for all atoms in the system. When calculating output for multiple systems, provide the centers as a list for each system.

  • n_jobs (int) – Number of parallel jobs to instantiate. Parallellizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the used cpus will be calculated with, n_cpus + n_jobs, where n_cpus is the amount of CPUs as reported by the OS. With only_physical_cores you can control which types of CPUs are counted in n_cpus.

  • only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are used in calculating the number of jobs. If set to False (default), also virtual CPUs are counted. If set to True, only physical CPUs are counted.

  • verbose (bool) – Controls whether to print the progress of each job into to the console.

Returns:

The ACSF output for the given systems and centers. The return type depends on the ‘sparse’-attribute. The first dimension is determined by the amount of centers and systems and the second dimension is determined by the get_number_of_features()-function. When multiple systems are provided the results are ordered by the input order of systems and their centers.

Return type:

np.ndarray | sparse.COO

The output will in this case be a numpy array with shape [n_positions, n_features]. The number of features may be requested beforehand with the get_number_of_features()-method.

[1]

Jörg Behler. Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J. Chem. Phys., 134(7):074106, 2011.