Standardize

Standardization techniques are preprocessing steps in spectral analysis that bring all variables to a common scale without distorting differences in the ranges of values or losing information. They can improve the comparability and interpretability of spectral data, as well as the accuracy and robustness of subsequent analysis.

Mean Centering

Mean centering is a simple technique that shifts the data so that the mean (average) value of each variable is zero. It does this by subtracting the mean from every data point. This centering helps make the data easier to interpret, especially when the variables are measured on arbitrary scales with no meaningful zero point.

$$x^{centered} = x - \mu$$

Where:
$x$ is the original spectrum
$\mu$ is the mean of the spectra (one mean per variable)
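
The formula above can be sketched in NumPy as follows. This is a minimal illustration, not the implementation of any particular tool; it assumes the spectra are stored as a 2-D array with one spectrum per row and one variable per column, and the function name `mean_center` is hypothetical.

```python
import numpy as np

def mean_center(spectra):
    """Subtract the mean of each variable (column) from every spectrum (row).

    Assumption: rows are spectra, columns are variables.
    """
    spectra = np.asarray(spectra, dtype=float)
    mu = spectra.mean(axis=0)   # one mean per variable
    return spectra - mu         # broadcasting subtracts mu from every row

# Two toy spectra with three variables each (made-up numbers)
spectra = np.array([[1.0, 2.0, 3.0],
                    [3.0, 4.0, 9.0]])
centered = mean_center(spectra)
```

After centering, every column of `centered` has a mean of zero, while the differences between spectra are unchanged.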

Researchers are studying the growth rates of different plant species. Since the plants vary in their initial heights, the growth rates are measured on different scales. By mean centering the growth rates, the researchers can compare how much each species deviates from its average growth, making the results more interpretable.

Standard Normal Variate (SNV)

SNV is a scaling technique that centers the data around zero (like mean centering) and also scales the data to have a standard deviation of one. This makes the data dimensionless and allows for fair comparisons between variables measured on different scales or units.

$$x^{snv} = \frac{x - \mu}{\sigma}$$

Where:
$x$ is the original data point
$\mu$ is the mean of the data
$\sigma$ is the standard deviation of the data
$x^{snv}$ is the SNV-transformed data point
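
A possible NumPy sketch of this transformation is shown below. The text does not say over which axis the mean and standard deviation are taken; the sketch assumes the convention usual in spectroscopy, where each spectrum (row) is scaled by its own mean and standard deviation. The use of the sample standard deviation (`ddof=1`) and the function name `snv` are also assumptions.

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate, applied to each spectrum (row) separately.

    Assumptions: rows are spectra; the sample standard deviation (ddof=1)
    is used. A perfectly flat spectrum has sigma = 0 and is not handled here.
    """
    spectra = np.asarray(spectra, dtype=float)
    mu = spectra.mean(axis=1, keepdims=True)            # mean of each spectrum
    sigma = spectra.std(axis=1, ddof=1, keepdims=True)  # spread of each spectrum
    return (spectra - mu) / sigma

# The second toy spectrum is the first one multiplied by 10
spectra = np.array([[1.0, 2.0, 3.0],
                    [10.0, 20.0, 30.0]])
transformed = snv(spectra)
```

Because the two toy spectra differ only by a multiplicative factor, they become identical after the transformation, which illustrates how the scaling removes differences in overall intensity.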

In a chemistry lab, researchers analyze the concentrations of different elements in a set of samples using spectroscopy. Since the elements have different natural abundances, their concentrations vary widely across orders of magnitude. By applying SNV, the researchers can compare the relative concentrations of each element across samples, regardless of their absolute values or units.

Robust Scaling

Robust scaling is similar to standard scaling (dividing by the standard deviation), but it uses more robust measures of central tendency and spread: the median and the interquartile range (IQR). This makes robust scaling less sensitive to outliers in the data.

$$x^{robust} = \frac{x - \tilde{x}}{IQR}$$

Where:
$x$ is the original data point
$\tilde{x}$ is the median of the data
$IQR$ is the interquartile range of the data, calculated as $IQR = Q_3 - Q_1$, where $Q_1$ and $Q_3$ are the first and third quartiles, respectively
$x^{robust}$ is the robust-scaled data point
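
The following NumPy sketch illustrates the formula on a small 1-D array. It is only one way to compute it: quartile conventions differ between tools, and this version assumes NumPy's default linear interpolation in `np.percentile`. The function name `robust_scale` and the sample values are made up for illustration.

```python
import numpy as np

def robust_scale(x):
    """Center on the median and divide by the interquartile range.

    Assumption: quartiles use NumPy's default (linear) interpolation.
    Data with IQR = 0 is not handled here.
    """
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    q1, q3 = np.percentile(x, [25, 75])  # first and third quartiles
    return (x - median) / (q3 - q1)

# Four ordinary values and one outlier
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
scaled = robust_scale(values)
```

Here the median is 3 and the IQR is 2, so the ordinary values map to the range -1 to 0.5. The outlier stays far away from them, but it has no influence on the center or the scale that was applied.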

A team of environmental scientists is studying air pollution levels in a city. Some neighborhoods have unusually high pollution levels due to nearby industrial facilities or other localized sources. By using robust scaling, the scientists can analyze the pollution data without being unduly influenced by these outliers, giving a more representative picture of the overall pollution levels across the city.

Linear Detrending

Linear detrending is a technique used to remove any linear trends (increasing or decreasing patterns) from the data. This is useful when the data contains an underlying trend that obscures the patterns of interest.

$$y_i = x_i - (a t_i + b)$$

Where:
$x_i$ is the original data point
$t_i$ is the position of that data point along the measurement axis (for example its index, wavelength, or time)
$a$ and $b$ are the coefficients of the linear trend line, obtained by fitting a straight line to the data using least-squares regression
$y_i$ is the detrended data point, with the linear trend removed
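
As a rough sketch, the fit and subtraction could look like this in NumPy. It assumes the straight line is fitted against the position of each point along the measurement axis (called `t` here, defaulting to the sample index), and it uses `np.polyfit` for the least-squares fit. The function name `linear_detrend` and the toy signal are hypothetical.

```python
import numpy as np

def linear_detrend(x, t=None):
    """Remove a least-squares straight line from a 1-D signal.

    Assumption: the trend is fitted against t, the position of each point
    (index, wavelength, time, ...). If t is omitted, the index is used.
    """
    x = np.asarray(x, dtype=float)
    if t is None:
        t = np.arange(x.size, dtype=float)
    a, b = np.polyfit(t, x, deg=1)  # slope a and intercept b
    return x - (a * t + b)

# Toy signal: a rising line plus a small wiggle
t = np.arange(5, dtype=float)
signal = 2.0 * t + 1.0 + np.array([0.0, 0.5, 0.0, -0.5, 0.0])
detrended = linear_detrend(signal, t)
```

The result has zero mean and no remaining linear relationship with `t`, so only the deviations from the fitted line are left.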

A geologist is studying seismic activity in a region to understand the potential for earthquakes. The seismic data contains a linear trend due to the gradual movement of tectonic plates, which can mask the patterns of interest related to stress buildup and release. By applying linear detrending, the geologist can remove this underlying trend and focus on the deviations from it, which may reveal valuable insights into the seismic activity patterns that precede earthquakes.