Many-body Tensor Representation
===============================
The many-body tensor representation (MBTR) :cite:`mbtr` encodes a structure by
using a distribution of different structural motifs. It can be used directly
for both finite and periodic systems. MBTR is especially suitable for
applications where interpretability of the input is important because the
features can be easily visualized and they correspond to specific structural
properties of the system.

In MBTR a geometry function :math:`g_k` is used to transform a chain of
:math:`k` atoms, into a single scalar value. The distribution of these scalar
values is then constructed with kernel density estimation to represent the
structure.

.. figure:: /_static/img/mbtr.jpg
   :width: 1144px
   :height: 772px
   :scale: 40 %
   :alt: MBTR stratification
   :align: center

   Illustration of the MBTR output for a water molecule.

Setup
-----
Instantiating an MBTR descriptor can be done as follows:

.. literalinclude:: ../../../../examples/mbtr.py
   :language: python
   :lines: 2-12

The arguments have the following effect:

.. automethod:: dscribe.descriptors.mbtr.MBTR.__init__

Most noteworthy is that the geometry function determines the degree :math:`k` of
MBTR. This degree directly determines how many atoms need to be taken into
account when calculating the MBTR values (e.g. :math:`k=2` means that each pair
of atoms contributes to MBTR). Here is a brief summary of the currently
supported geometry functions and their weighting functions for each degree
:math:`k`:

.. list-table:: The geometry and weighting functions
   :widths: 10 45 45
   :header-rows: 1

   * -
     - Geometry functions
     - Weighting functions
   * - :math:`k=1`
     - ``"atomic_number"``: The atomic numbers.
     - ``"unity"``: No weighting.
   * - :math:`k=2`
     - ``"distance"``: Pairwise distance in angstroms.

       ``"inverse_distance"``: Pairwise inverse distance in 1/angstrom.
     - ``"unity"``: No weighting.

       ``"exp"``: Weighting of the form :math:`e^{-sx}`,
       where `x` is the distance between the two atoms.

       ``"inverse_square"``: Weighting of the form :math:`1/(x^2)`,
       where `x` is the distance between the two atoms.
   * - :math:`k=3`
     - ``"angle"``: Angle in degrees.

       ``"cosine"``: Cosine of the angle.
     - ``"unity"``: No weighting.

       ``"exp"``: Weighting of the form :math:`e^{-sx}`,
       where `x` is the perimeter of the triangle formed by the tree atoms.

       ``"smooth_cutoff"``: Weighting of the form :math:`f_{ij}f_{ik}` defined by
       :math:`f = 1+y(x/r_{cutoff})^{y+1}-(y+1)(x/r_{cutoff})^{y}`, where `x`
       is the distance between two atoms, :math:`r_{cutoff}` is the radial
       cutoff distance and y is a sharpness parameter which defaults to `2`.

Creation
--------
After MBTR has been set up, it may be used on atomic structures with the
:meth:`~.MBTR.create`-method.

.. literalinclude:: ../../../../examples/mbtr.py
   :language: python
   :start-after: Create
   :lines: 1-9

The call syntax for the create-function is as follows:

.. automethod:: dscribe.descriptors.mbtr.MBTR.create
   :noindex:

The output will in this case be a numpy array with shape :code:`[n_positions, n_features]`.
The number of features may be requested beforehand with the
:meth:`~.MBTR.get_number_of_features`-method.


Examples
--------
The following examples demonstrate common use cases for the descriptor. These
examples are also available in dscribe/examples/mbtr.py.

Locating information
~~~~~~~~~~~~~~~~~~~~
If the MBTR setup has been specified with :code:`flatten=True`, the output is
flattened into a single vector and it can become difficult to identify which
parts correspond to which element combinations. To overcome this, the MBTR class
provides the :meth:`~.MBTR.get_location`-method. This method can be used to
query for the slice that contains a specific element combination. The following
example demonstrates its usage.

.. literalinclude:: ../../../../examples/mbtr.py
   :start-after: Locations
   :language: python
   :lines: 1-6

Visualization
~~~~~~~~~~~~~
The MBTR output vector can be visualized easily. The following snippet
demonstrates how the output for :math:`k=2` can be visualized with matplotlib.

.. literalinclude:: ../../../../examples/mbtr.py
   :start-after: Visualization
   :language: python
   :lines: 1-32

.. figure:: /_static/img/mbtr_k2.png
   :width: 1144px
   :height: 772px
   :scale: 60 %
   :alt: MBTR k=2
   :align: center

   The MBTR output for k=2. The graphs for Na-Na and Cl-Cl overlap due to their
   identical arrangement in the crystal.

Finite systems
~~~~~~~~~~~~~~
For finite systems we have to specify :code:`periodic=False` in the constructor. Finite
systems can not have Valle-Oganov normalization, and an exception is raised if that is
given. The need to apply weighting depends on the size of the system: for small 
systems, such as small molecules, the benefits are small. However for larger systems,
such as clusters and bigger molecules, adding weighting will help in removing
"noise" coming from atom combinations that are physically very far apart and do
not have any meaningful direct interaction in the system. The following code
demonstrates both approaches.

.. literalinclude:: ../../../../examples/mbtr.py
   :start-after: Finite
   :language: python
   :lines: 1-25

.. figure:: /_static/img/mbtr_weighting.png
   :width: 1144px
   :height: 772px
   :scale: 60 %
   :alt: MBTR weighting
   :align: center

   The MBTR output for C60 without weighting and with exponential weighting for
   :math:`k=2`. Without the weighting the carbon pairs that are far away will
   have the highest intensity and dominate the output.

.. _norm-label:

Normalization
~~~~~~~~~~~~~
Depending on the application, the preprocessing of the MBTR vector before
feeding it into a machine learing model may be beneficial. *Normalization* is
one of the methods that can be used to preprocess the output. Here we go
through some examples of what kind of normalization can be used together with
MBTR, but ultimately the need for preprocessing depends on both the predicted
property and the machine learning algorithm.

The different normalization options provided in the MBTR constructor are:

 * **"none"**: No normalization is applied. If the predicted quantity is
   *extensive* - scales with the system size - and there is no need to weight
   the importance of the different :math:`k`-terms, then no normalization
   should be performed.
 * **"l2"**: Output is scaled to unit Euclidean norm. This option can be used if
   the predicted quantity is *intensive* - does not scale with the system size,
   and if the learning method uses the Euclidean norm to determine the
   similarity of inputs.
 * **"n_atoms"**: The whole output is divided by the number of atoms in the
   system. If the system is periodic, the number of atoms is determined from
   the given system cell. This form of normalization does also make the output
   for different crystal supercells equal, but does not equalize the norm of
   different k-terms.
 * **"valle_oganov"**: Normalization of Valle-Oganov descriptor is applied. 
   Each pair in :math:`k=2` is multiplied by :math:`V/4\pi N_{A}N_{B}`, where `V`
   is the volume of the current system and :math:`N_{A}` and :math:`N_{B}` are the
   numbers of each of the pair's elements in the cell. This normalization chosen
   so that the descriptor output of each pair converges at 1 (when created with
   reasonable values of :math:`sigma`). In a similar fashion, each triplet in
   :math:`k=3` is multiplied by :math:`V/N_{A}N_{B}N_{C}`.

Periodic systems
~~~~~~~~~~~~~~~~
When applying MBTR to periodic systems, use :code:`periodic=True` in the constructor.
For periodic systems a weighting function needs to be defined, and an exception
is raised if it is not given. The weighting essentially determines how many
periodic copies of the cell we need to include in the calculation, and without
it we would not know when to stop the periodic repetition. The following code
demonstrates how to apply MBTR on a periodic crystal:

.. literalinclude:: ../../../../examples/mbtr.py
   :start-after: Periodic
   :language: python
   :lines: 1-11

A problem with periodic crystals that is not directly solved by the MBTR
formalism is the fact that multiple different cells shapes and sizes can be
used for the same crystal. This means that the descriptor is not unique: the
same crystal can have multiple representations in the descriptor space. This
can be an issue when predicting properties that do not scale with system size,
e.g. band gap or formation energy per atom. The following code and plot
demonstrates this for different cells representing the same crystal:

.. literalinclude:: ../../../../examples/mbtr.py
   :start-after: Supercells
   :language: python
   :lines: 1-17

.. figure:: /_static/img/mbtr_periodic.png
   :width: 1144px
   :height: 772px
   :scale: 60 %
   :alt: MBTR unnormalized
   :align: center

   The raw MBTR output for different cells representing the same crystal. The
   shapes in the output are the same, but the intensities are different
   (intensities are integer multiples of the primitive system output)

However, if the output is normalized (see the example about different
normalization methods) we can recover truly identical output for the different
cells representing the same material:

.. literalinclude:: ../../../../examples/mbtr.py
   :start-after: Supercells
   :language: python
   :lines: 19-

.. figure:: /_static/img/mbtr_periodic_normalized.png
   :width: 1144px
   :height: 772px
   :scale: 60 %
   :alt: MBTR normalized
   :align: center

   The normalized MBTR output for different cells representing the same
   crystal. After normalising the output with :code:`normalization='l2_each'`,
   the outputs become identical.

DScribe achieves the completely identical MBTR output for different periodic
supercells by taking into account the translational multiplicity of the atomic
pairs and triples in the system. This is done in practice by weighting the
contribution of the pair and triples by how many different periodic repetitions
of the original cell are involved in forming the pair or triple.

.. bibliography:: ../../references.bib
   :style: unsrt
   :filter: docname in docnames