dscribe.descriptors.descriptor module

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class dscribe.descriptors.descriptor.Descriptor(periodic, flatten, sparse, dtype='float64')

Bases: abc.ABC

An abstract base class for all descriptors.

Parameters: flatten (bool) – Whether the output of create() should be flattened to a 1D array.

check_atomic_numbers(atomic_numbers)

Used to check that the given atomic numbers have been defined for this descriptor.

Parameters

atomic_numbers (iterable) – Atomic numbers to check.

Raises

ValueError – If the atomic numbers in the given system are not included in the species given to this descriptor.
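
The check described above can be pictured as a set difference between the atomic numbers of a system and the species a descriptor was configured with. The following is a minimal sketch under that assumption; the standalone function and its `species` argument are hypothetical stand-ins for illustration and do not reproduce the library's implementation.

```python
def check_atomic_numbers(atomic_numbers, species):
    """Raise ValueError if any atomic number is missing from `species`.

    `species` is a hypothetical stand-in for the atomic numbers that a
    descriptor was configured with.
    """
    unknown = set(atomic_numbers) - set(species)
    if unknown:
        raise ValueError(
            f"Atomic numbers {sorted(unknown)} are not included in the "
            f"species given to this descriptor: {sorted(set(species))}"
        )


# Water (H, H, O) passes for a descriptor set up for H and O.
check_atomic_numbers([1, 1, 8], species=[1, 8])

# Carbon (Z = 6) was not declared, so the check fails.
try:
    check_atomic_numbers([6, 1, 1, 1, 1], species=[1, 8])
except ValueError as err:
    message = str(err)
```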

abstract create(system, *args, **kwargs)

Creates the descriptor for the given system.

Parameters

system (ase.Atoms) – The system for which to create the descriptor.
args – Descriptor specific positional arguments.
kwargs – Descriptor specific keyword arguments.

Returns

A descriptor for the system.

Return type

np.array | scipy.sparse.coo_matrix
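
Because create() is abstract, a concrete descriptor has to override it, along with get_number_of_features(). The sketch below illustrates that pattern with a stand-in base class built on Python's abc module. `SketchDescriptor` and `ElementCounts` are hypothetical names, and a system is represented as a plain list of atomic numbers rather than an ase.Atoms object so that the example needs no third-party imports.

```python
from abc import ABC, abstractmethod


class SketchDescriptor(ABC):
    """Hypothetical stand-in for an abstract descriptor base class."""

    def __init__(self, flatten):
        self.flatten = flatten

    @abstractmethod
    def create(self, system, *args, **kwargs):
        """Return the descriptor for one system."""

    @abstractmethod
    def get_number_of_features(self):
        """Return the length of the feature vector."""


class ElementCounts(SketchDescriptor):
    """Toy descriptor: counts how often each declared species occurs."""

    def __init__(self, species, flatten=True):
        super().__init__(flatten)
        self.species = sorted(set(species))

    def create(self, system, *args, **kwargs):
        # Assumption: `system` is a plain list of atomic numbers.
        return [system.count(z) for z in self.species]

    def get_number_of_features(self):
        return len(self.species)


desc = ElementCounts(species=[1, 8])
features = desc.create([1, 1, 8])  # water: two H, one O
```

Instantiating the base class directly raises a TypeError, which is how abc enforces that both methods are implemented by subclasses.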

create_parallel(inp, func, n_jobs, static_size=None, only_physical_cores=False, verbose=False, prefer='processes')

Used to parallelize the descriptor creation across multiple systems.

Parameters

inp (list) – Contains a tuple of input arguments for each processed system. These arguments are fed to the function specified by “func”.
func (function) – Function that outputs the descriptor when given input arguments from “inp”.
n_jobs (int) – Number of parallel jobs to instantiate. Parallelizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the number of jobs is calculated as n_cpus + n_jobs, where n_cpus is the number of CPUs reported by the OS. Use only_physical_cores to control which types of CPUs are counted in n_cpus.
output_sizes (list of ints) – The size of the output for each job. Makes the creation faster by preallocating the correct amount of memory beforehand. If not specified, a dynamically created list of outputs is used.
only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are counted when calculating the number of jobs. If set to False (default), virtual CPUs are also counted. If set to True, only physical CPUs are counted.
verbose (bool) – Controls whether to print the progress of each job to the console.
prefer (str) –
The parallelization method. Valid options are:
- “processes”: Parallelization based on processes. Uses the “loky” backend in joblib to serialize the jobs and run them in separate processes. Separate processes have a larger memory and initialization overhead than threads, but may scale better if performance is limited by the Global Interpreter Lock (GIL).
- “threads”: Parallelization based on threads. Has very low memory and initialization overhead. Performance is limited by the amount of pure Python code that needs to run. Ideal when most of the calculation time is spent in C/C++ extensions that release the GIL.

Returns

The descriptor output for each given input. The return type depends on the descriptor setup.

Return type

np.ndarray | sparse.COO | list
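
The rule for negative n_jobs values stated in the parameter list, n_cpus + n_jobs, can be sketched as follows. `resolve_n_jobs` is a hypothetical helper, not part of the library. It takes n_cpus as an explicit argument and falls back to os.cpu_count(), which counts logical CPUs. Raising an error for a non-positive result is an assumption.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Turn a possibly negative n_jobs into a positive job count."""
    if n_cpus is None:
        # os.cpu_count() reports logical CPUs and may return None.
        n_cpus = os.cpu_count() or 1
    if n_jobs < 0:
        n_jobs = n_cpus + n_jobs
    if n_jobs <= 0:
        # Assumption: a non-positive result is treated as an error.
        raise ValueError("Invalid number of jobs specified.")
    return n_jobs


serial = resolve_n_jobs(1, n_cpus=8)          # 1
all_but_one = resolve_n_jobs(-1, n_cpus=8)    # 8 + (-1) = 7
all_but_two = resolve_n_jobs(-2, n_cpus=8)    # 8 + (-2) = 6
```

Note that under this rule n_jobs=-1 leaves one CPU free rather than using all of them.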

derivatives_parallel(inp, func, n_jobs, derivatives_shape, descriptor_shape, return_descriptor, only_physical_cores=False, verbose=False, prefer='processes')

Used to parallelize the calculation of descriptor derivatives across multiple systems.

Parameters

inp (list) – Contains a tuple of input arguments for each processed system. These arguments are fed to the function specified by “func”.
func (function) – Function that outputs the descriptor when given input arguments from “inp”.
n_jobs (int) – Number of parallel jobs to instantiate. Parallelizes the calculation across samples. Defaults to serial calculation with n_jobs=1. If a negative number is given, the number of jobs is calculated as n_cpus + n_jobs, where n_cpus is the number of CPUs reported by the OS. Use only_physical_cores to control which types of CPUs are counted in n_cpus.
derivatives_shape (list or None) – If a fixed-size derivatives output is produced from each job, this contains its shape. For variable-size output this parameter is set to None.
descriptor_shape (list or None) – If a fixed-size descriptor output is produced from each job, this contains its shape. For variable-size output this parameter is set to None.
only_physical_cores (bool) – If a negative n_jobs is given, determines which types of CPUs are counted when calculating the number of jobs. If set to False (default), virtual CPUs are also counted. If set to True, only physical CPUs are counted.
verbose (bool) – Controls whether to print the progress of each job to the console.
prefer (str) –
The parallelization method. Valid options are:
- “processes”: Parallelization based on processes. Uses the “loky” backend in joblib to serialize the jobs and run them in separate processes. Separate processes have a larger memory and initialization overhead than threads, but may scale better if performance is limited by the Global Interpreter Lock (GIL).
- “threads”: Parallelization based on threads. Has very low memory and initialization overhead. Performance is limited by the amount of pure Python code that needs to run. Ideal when most of the calculation time is spent in C/C++ extensions that release the GIL.

Returns

The descriptor output for each given input. The return type depends on the descriptor setup.

Return type

np.ndarray | sparse.COO | list
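
The relationship between inp and func described in the parameter list, one tuple of arguments per system and each tuple unpacked into a call to func, can be sketched with the standard library. This example uses concurrent.futures threads in place of joblib, so it only illustrates the calling convention, not the library's backend. `run_jobs` and `toy_descriptor` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_jobs(inp, func, n_jobs):
    """Call func(*args) for every argument tuple in inp, preserving order."""
    if n_jobs == 1:
        # Serial path: no pool overhead.
        return [func(*args) for args in inp]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(func, *args) for args in inp]
        # Collect results in submission order, not completion order.
        return [future.result() for future in futures]


def toy_descriptor(atomic_numbers, scale):
    # Hypothetical per-system function returning a one-element "descriptor".
    return [scale * sum(atomic_numbers)]


# One tuple of arguments per system.
inp = [([1, 1, 8], 1.0), ([6, 8], 0.5)]
results = run_jobs(inp, toy_descriptor, n_jobs=2)
```

Collecting the futures in submission order keeps each output aligned with its input, whichever job finishes first.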

property flatten

abstract get_number_of_features()

Used to inquire the final number of features that this descriptor will have.

Returns: Number of features for this descriptor.
Return type: int

get_system(system)

Used to convert the given atomic system into a custom System-object that is used internally. The System class inherits from ase.Atoms, but includes built-in caching for geometric quantities that may be re-used by the descriptors.

Parameters

system (ase.Atoms | System) – Input system.

Returns

The given system transformed into a corresponding System object.

Return type

System
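
The idea of caching geometric quantities so that they can be reused can be sketched with functools.cached_property. `CachedSystem` and `to_cached_system` are hypothetical names and do not reproduce the internal System class. Positions are plain coordinate tuples, and returning an already wrapped system unchanged is an assumption.

```python
import math
from functools import cached_property


class CachedSystem:
    """Hypothetical wrapper that computes pairwise distances only once."""

    def __init__(self, positions):
        self.positions = [tuple(p) for p in positions]
        self.n_computations = 0  # counts real computations, for illustration

    @cached_property
    def distance_matrix(self):
        self.n_computations += 1
        return [
            [math.dist(a, b) for b in self.positions]
            for a in self.positions
        ]


def to_cached_system(system):
    """Return `system` unchanged if it is already wrapped, else wrap it."""
    if isinstance(system, CachedSystem):
        return system
    return CachedSystem(system)


system = to_cached_system([(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
first = system.distance_matrix
second = system.distance_matrix  # served from the cache, not recomputed
```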

property periodic

property sparse