# Registerspec

## Contents

### Purpose

Shift spectra based on expected peak locations.

### Synopsis

- [data_i,axaxis,foundat,shiftfns] = registerspec(data,xaxis,peaks,options)
- peaks = registerspec(data,xaxis,options) % "peak-find" mode

### Description

REGISTERSPEC is used to correct spectra for shifts in x-axis (e.g. wavelength or frequency) registration. The alignment is based on either a polynomial or constrained-spline fit of reference peaks' observed position to their expected position. In contrast to other alignment methods (e.g. piecewise direct standardization or dynamic time warping), REGISTERSPEC may be more useful when

- a) x-axis shifts are small and potentially non-linear,
- b) only a few consistent reference peaks exist, and/or
- c) when some of the spectral bands are expected to undergo significant shape changes in the normal range of observations.

There are two modes used to call REGISTERSPEC:

- 1) The first mode is used to align new spectra given a set of reference peaks.
- 2) The second mode is used to help identify peaks in a calibration set that might be useful as reference peaks:

#### Usage Mode 1: Spectral Alignment:

A typical function call for spectral alignment is the following:

- [data_i,axaxis,foundat,shiftfns] = registerspec(data,xaxis,peaks,options)

When aligning new spectra to known reference peak positions, REGISTERSPEC takes as input:

- a matrix or DataSet object containing spectra to be aligned,
`data`, - an x-axis reference for those spectra,
`xaxis`, and - a vector containing the expected positions of previously-identified reference peaks,
`peaks`

**Note:** If the input `xaxis` is omitted and `data` is a DataSet object containing axisscale information for the variables (data.axisscale{2}), this axis will be used as `xaxis`. Otherwise, a lack of input for `xaxis` will cause REGISTERSPEC to assume that the spectral channels are evenly spaced starting from a value of 1.

Resulting outputs are

- the spectra aligned to the reference peaks
`data_i`, - the x-axis scale for those spectra
`axaxis`(generally the same as`xaxis`, except as discussed below) and - an array,
`foundat`, of the observed shifts for each reference peak (columns) and each spectrum in`data`(rows). - an array,
`shiftfns`, containing the corrections made to axisscale to arrive at the shifted data (each row of shiftfns corresponds to a row of data_i). Use with interp1 function,`xaxis`and`axaxis`to adjust other data in a similar manner:

`data_i = interp1(xaxis,data,axaxis+shiftfns,'spline')`

In addition to correcting for peak shifts, the sampling rate of the output spectra can be increased through cubic-spline interpolation. The options.interpolate setting (see below) controls the sampling rate of the output spectra. Generally, the output `axaxis` is the same as the input `xaxis`. However, when interpolation is performed, the output `axaxis` will contain the x-axis values that correspond to the interpolated spectra in the input data.

Various options can be set through the optional input structure `options`. These are described in detail below. It is recommended that three options fields- options.order, options.maxshift, and options.window- be reviewed prior to use. Note that options.maxshift and options.window are input in absolute x-axis units and the desired input values will vary depending on the original x-axis interval (i.e. data-point spacing) and expected peak widths. In addition, the order of polynomial used to correct for shifts should be reviewed (options.order). It is generally best to keep the order as low as possible (<3 is preferable) to avoid over-fitting and unusual shifting at the ends of the spectrum.

Often, this usage of REGISTERSPEC is for "preprocessing" of spectral data before developing a calibration model. In these cases, REGISTERSPEC should be run both on the original calibration data (first to locate reference bands, then a second time to subject the calibration data to the shift algorithm), as well as on any future data that is to be applied to the calibration.

#### Usage Mode 2: Reference Peak Identification:

A typical function call for reference peak identification is the following:

- peaks = registerspec(data,xaxis,options)

When using REGISTERSPEC to identify reference peaks, the spectral data and x-axis information is supplied alone without a list of reference peaks. In this mode, a set of spectra (often those used for a multivariate calibration model) are searched for peaks which show relatively consistent maxima. The algorithm first locates peaks on the mean spectrum by automatically identifying positions that show a clear inflection point as a peak maximum. Peaks located in the first step are then tested on the individual spectra, and must meet the following criteria:

- (1) For all obesrved spectra, the peak must contain a maximum value (i.e. the peak cannot be a shoulder without an inflection point).
- (2) For all observed spectra, the peak must not shift more than the value set by options.maxshift (default is 4 x-axis units) from the peak's position in the mean spectrum.

The output is a list of potential reference peaks. These should be examined carefully. There is no constraint that a peak have a signal to noise or signal to background level above that which permits the maximum to be found. Thus, very low-signal peaks could be returned as stable but not be observable in future spectra. Additionally, it may be useful to take the list of reference peaks and execute REGISTERSPEC on the calibration data itself to examine the extent and nature of shifting on the calibration data itself.

#### Inputs

**data**= matrix or DataSet containing spectral data**xaxis**= optional frequencies or energies associated with each 'variable in data {optional; default = use DataSet values, otherwise use 1:n}**peaks**= expected locations of peaks to use for shifting. If omitted, 'findpeaks' mode will be invoked and stable peaks will be found in the data (see below).

**Notes:** If input (peaks) is omitted, the algorithm identifies peaks in the mean spectrum by setting peaks at every variable and allowing these to drift to the nearest maximum. It then locates the same peaks in each of the individual spectra and keeps only those peaks which could be located in all spectra with less shift than specified in options.maxshift.

#### Outputs

**data_i**= shifted, interpolated data**axaxis**= interpolated xaxis (will be equal to`xaxis`if no interpolation is requested)**foundat**= matrix of peak shifts found for each peak (columns) in each spectrum (rows)**shiftfns**= matrix of corrections made to axisscale to arrive at the shifted data (each row of shiftfns corresponds to a row of data_i). Use with interp1 to adjust other data in a similar manner: data_i = interp1(xaxis,data,axaxis+shiftfns,'spline')**peaks**= (only for 'findpeaks' mode) Locations of found peaks in`xaxis`units.

### Examples

To locate stable peaks in (unshifted) calibration data

- peaks = registerspec(calibrationdata);

To correct x-axis shift in new data using previously identified peaks

- newdata_unshifted = registerspec(newdata,peaks);

### Options

**options** = a structure array with the following fields:

**display**: [ {'on'} | 'off' ] governs command-line output**plots**: [ {'none'} | 'fit' | 'final' ] governs plotting options**waitbar**: [ 'off' | 'on' | {'auto'} ] governs use of waitbar. If set to 'auto', waitbar will display if excessive time to process is expected.**nopeaks**: [ 'none' | {'warning'} | 'error' ] governs behavior when none of the reference peaks can be located.**shiftby**: [{-0.1}] minimum shifting interval. A positive value is interpreted as being in absolute`xaxis`units and a negative value as relative to the smallest`xaxis`interval.**interpolate**: [{[]}] interpolation interval for output spectra. Empty [] does no interpolation. A positive value is interpreted as being in absolute`xaxis`units and a negative value as relative to the smallest`xaxis`interval.**maxshift**: [in`xaxis`units, {4}] maximum allowed peak shift (peaks which require more shift than this will NOT be used for`xaxis`correction).**window**: [in`xaxis`units, {[]}] size of window to search for each peak. Empty [] uses automatic window based on maxshift.**order**: order of polynomial (only used for polynomial algorithms)**algorithm**:`xaxis`correction algorithm. One of:**pchip**: constrained piecewise spline (well behaved. The function is constrained not to extend beyond any peak's observed difference)**poly**: {default} Standard polynomial fit to found peaks. The`order`option defines the order of polynomial to use.**iterativepoly**: Iterative polynomial fitting. Peaks are located and shifts corrected over several iterations (cycles). The polynomial order is increased by 1 (one) after each iteration until it reaches the order defined by the`order`option. This algorithm works better for badly shifted spectra since the first low-order polynomial corrections bring the spectrum into approximate alignment where more peaks can be identified. Note that the`shiftfns`output will contain the equation used at each cycle for each spectrum rather than a single polynomial function.**findpeaks**: Locate non-moving peaks in whole dataset. Triggered by omission of the**peaks**input.

**smoothing**: [ 'off' | {'on'} ] governs use of smoothing algorithm during peak location. If 'on' each sub-window is smoothed prior to locating maximum in window.**smoothinfo**: [width order] smoothing parameters to be passed to smoothing function savgol if enabled by smoothing option above. width is width of window in number of variables, order is order of polynomial. Default is width of 5 and order 2: [5 2].

### Algorithm

**Spectral Shift Correction**

Correction of x-axis shift is achieved by these steps:

- . A window of points around each of the expected peak positions (including variables up to options.maxshift axis units away) is extracted from the spectrum-at-large and interpolated using spline interpolation.
- . The observed peak location is determined as the maximum of each interpolated window.
- . The observed maxima are compared to the expected peak locations, and the differences are fit with one of several possible functions (see options.algorithm).
- . The difference indicated by the fitting function is removed from the spectrum using interpolation.

**Reference Peak Identification**

Reference peak identification is achieved by assuming peaks are located continuously at every variable across the spectrum. Using the spectral shift algorithm described above, the maximum of each peak window is located for the average of all spectra. This is repeated for all individual spectra, and the list of located maxima is searched to identify peaks which were present and within options.maxshift points in all spectra.