Tsne

From Eigenvector Research Documentation Wiki

Purpose

Create t-distributed Stochastic Neighbor Embeddings for visualization.

Synopsis

model = tsne(x,options); %identifies model
tsne %Launches Analysis window with TSNE selected

Please note that the recommended way to build a TSNE model from the command line is to use the Model Object. Please see this wiki page on building and applying models using the Model Object.

Description

TSNE is one of many tools for visualizing high-dimensional data. Our software uses the Scikit-Learn implementation of the TSNE method; its documentation can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html. The method converts similarities between data samples in the original space to joint probabilities, then tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the joint probabilities of the data in the original space. The model returns embeddings with n_components dimensions. For example, for an M by N matrix, if the dimension of the embedded space (n_components) is K, the embeddings will be of shape M by K. This method cannot be applied to new data.
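The quantity being minimized can be illustrated with a small, self-contained Python sketch. This is not the Scikit-Learn or PLS_Toolbox code: the function name kl_divergence and the toy probability values are assumptions made purely for illustration. It evaluates KL(P||Q) = sum of p * log(p / q) for two sets of joint probabilities and shows that the divergence is zero when the embedding reproduces the original similarities and positive when it distorts them.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two flattened joint-probability lists that each sum to 1."""
    # Terms with p == 0 contribute nothing to the sum and are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical joint probabilities for three sample pairs.
p_original = [0.5, 0.3, 0.2]    # similarities in the original space
q_matching = [0.5, 0.3, 0.2]    # an embedding that preserves them exactly
q_distorted = [0.2, 0.3, 0.5]   # an embedding that distorts them

print(kl_divergence(p_original, q_matching))   # zero: nothing is lost
print(kl_divergence(p_original, q_distorted))  # positive: the mismatch is penalized
```

Because the divergence is weighted by the original-space probabilities, placing similar samples far apart in the embedding is penalized more heavily than placing dissimilar samples close together.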

Note: The PLS_Toolbox Python virtual environment must be configured in order to use this method. Find out more here: Python configuration. At this time, a Python method that is building a model cannot be terminated with the conventional CTRL+C. Please take this into account and be mindful of your workspace when using this method.

Inputs

  • x = X-block (2-way array class "double" or "dataset").

Optional Inputs

  • options = discussed below.

Outputs

The output of TSNE is a model structure with the following fields (see Standard Model Structure for additional information):

  • modeltype: 'TSNE',
  • datasource: structure array with information about input data,
  • date: date of creation,
  • time: time of creation,
  • info: additional model information,
  • description: cell array with text description of model, and
  • detail: sub-structure with additional model details and results.

Note: The embeddings of the TSNE model can be found under detail.tsne.embeddings.

Options

options = a structure array with the following fields:

  • display: [ 'off' | {'on'} ], Governs level of display to command window,
  • plots: [ 'none' | {'final'} ], Governs level of plotting,
  • warnings: [ {'off'} | 'on' ], Silence or display any potential Python warnings. Only visible in the MATLAB command window.
  • preprocessing: {[]}, Cell array containing a preprocessing structure (see PREPROCESS) defining preprocessing to use on the data (discussed below),
  • n_components: [ {'2'} ], Dimension of the low dimensional embedded space.
  • perplexity: [ {'30'} ], Related to the number of nearest neighbors TSNE considers when calculating conditional probabilities. Larger datasets usually call for a larger perplexity.
  • learning_rate: [ {'200'} ], The learning rate for TSNE, usually in the range [10.0, 1000.0] (as recommended by Scikit-Learn).
  • early_exaggeration: [ {'12'} ], Controls how tight natural clusters from the original space are in the embedded space and how much space lies between them.
  • n_iter: [ {'1000'} ], Maximum number of iterations for optimization.
  • n_iter_without_progress: [ {'300'} ], Maximum number of iterations without progress before the optimization is aborted.
  • min_grad_norm: [ {'1e-7'} ], If the gradient norm falls below this threshold, the optimization is stopped.
  • metric: [ {'euclidean'} | 'manhattan' | 'cosine' | 'mahalanobis' ], The metric used to calculate distance between data samples.
  • init: [ {'random'} | 'pca' ], Initialization method for the embeddings.
  • random_state: [ {'1'} ], Random seed number. Set this to a number for reproducibility.
  • method: [ {'barnes_hut'} | 'exact' ], Gradient calculation algorithm.
  • angle: [ {'0.5'} ], Angular size of a distant node as measured from a point. Only used when method is 'barnes_hut', where it sets the trade-off between speed and accuracy.
  • compression: [ {'none'} | 'pca' ], Type of data compression to perform on the x-block prior to calculating the TSNE model. 'pca' uses a simple PCA model to compress the information.
  • compressncomp: [ {'2'} ], Number of latent variables (or principal components) to include in the compression model.
  • compressmd: [ {'yes'} | 'no' ], Use Mahalanobis Distance corrected scores from the compression model.

The default options can be retrieved using: options = tsne('options');.
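As background for choosing the perplexity option, the sketch below shows why perplexity is often read as an effective number of neighbors. It is an illustration in plain Python, not the toolbox or Scikit-Learn code, and the function name perplexity and the two example distributions are assumptions. In t-SNE the perplexity of a sample's conditional distribution over its neighbors is 2 raised to the Shannon entropy (in bits) of that distribution.

```python
import math

def perplexity(probs):
    """Perplexity 2**H of a discrete distribution, with H the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

# Hypothetical conditional probabilities of one sample over four neighbors.
uniform_over_4 = [0.25, 0.25, 0.25, 0.25]  # all four neighbors matter equally
peaked = [0.97, 0.01, 0.01, 0.01]          # essentially one neighbor dominates

print(perplexity(uniform_over_4))  # 4.0
print(perplexity(peaked))          # between 1 and 2
```

A higher perplexity setting therefore asks each sample to spread its attention over more neighbors, which tends to emphasize broader structure over very local structure.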

PREPROCESSING

The preprocessing field can be empty [] (indicating that no preprocessing of the data should be used), or it can contain a preprocessing structure output from the PREPROCESS function. For example, options.preprocessing = {preprocess('default', 'autoscale')}. This information is echoed in the output model in the model.detail.preprocessing field.
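For readers unfamiliar with autoscaling, the following plain-Python sketch shows the general idea: each column is mean-centered and scaled to unit standard deviation, so that variables measured on different scales contribute comparably to the distances TSNE works with. It is not the PREPROCESS implementation; the function name autoscale, the toy data, and the use of the sample (n-1) standard deviation are assumptions for illustration.

```python
import statistics

def autoscale(rows):
    """Mean-center each column and scale it to unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Assumption: sample standard deviation (n-1 denominator).
    # A constant column would have zero deviation and is not handled here.
    stds = [statistics.stdev(col) for col in columns]
    return [[(value - m) / s for value, m, s in zip(row, means, stds)]
            for row in rows]

# Hypothetical 3-by-2 X-block whose columns differ in scale by a factor of ten.
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = autoscale(x)
print(scaled)  # both columns end up on the same scale
```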

See Also

umap, pca, python