Statistical Analysis

While analyzing randomly rough surfaces we often need a statistical approach to determine some set of representative quantities. Within Gwyddion, there are several ways of doing this. In this section we will explain the various statistical tools and modules offered in Gwyddion, and also present the basic equations which were used to develop the algorithms they utilize.

Scanning probe microscopy data are usually represented as a two-dimensional data field of size N × M, where N and M are the numbers of rows and columns of the data field, respectively. The real size of the field is denoted Lx × Ly, where Lx and Ly are the sizes along the respective axes. The sampling interval (the distance between two adjacent points within the scan) is denoted Δ. We assume that the sampling interval is the same in both the x and y directions, and that the surface height at a given point (x, y) can be described by a random function ξ(x, y) that has given statistical properties.

Note that AFM data are usually collected as line scans along the x axis that are concatenated to form the two-dimensional image. Therefore, the scanning speed in the x direction is considerably higher than the scanning speed in the y direction. As a result, the statistical properties of AFM data are usually evaluated along the x profiles, as these are less affected by low-frequency noise and thermal drift of the sample.

Statistical Quantities Tool

Statistical quantities comprise basic properties of the height value distribution, such as its variance, skewness and kurtosis. The quantities accessible within Gwyddion by means of the Statistical Quantities tool are as follows:

  1. Mean value, minimum, maximum and median.
  2. RMS value of the height irregularities: this quantity is computed from data variance.
  3. Ra value of the height irregularities: this quantity is similar to the RMS value and differs only in the exponent (power) used in the data variance sum. Whereas the RMS uses exponent q = 2, the Ra value uses exponent q = 1 and absolute values of the (zero-mean) data.
  4. Height distribution skewness: computed from 3rd central moment of data values.
  5. Height distribution kurtosis: computed from 4th central moment of data values.
  6. Projected surface area and surface area: computed by simple triangulation.
  7. Mean inclination of facets in area: computed by averaging normalized facet direction vectors.

Tip

By default, the Statistical Quantities tool displays figures based on the entire image. If you would like to analyze a certain region within the image, simply click and drag a rectangle around it. The tool window will update with new numbers based on this region. If you want to see the statistics for the entire image again, just click once within the data window and the tool will reset.

More precisely, RMS (σ), skewness (γ1), and kurtosis (γ2) are computed from central moments of i-th order μi according to the following formulas:
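As a minimal sketch in Python, assume the usual definitions μi = (1/P) Σ (zn − z̄)^i over the P values of the evaluated area, σ = μ2^(1/2), γ1 = μ3/μ2^(3/2) and γ2 = μ4/μ2² − 3. The subtraction of 3 (excess kurtosis, zero for a Gaussian distribution) is an assumed convention, and the function name height_statistics and the symbol P are illustrative only, not part of Gwyddion.

```python
import math

def height_statistics(field):
    """Basic statistics of a data field given as a list of rows of heights."""
    values = sorted(v for row in field for v in row)
    p = len(values)
    mean = sum(values) / p

    def central_moment(order):
        # mu_i = (1/P) * sum over all values of (z_n - mean)^i
        return sum((v - mean) ** order for v in values) / p

    mu2, mu3, mu4 = (central_moment(i) for i in (2, 3, 4))
    flat = mu2 == 0.0  # constant data: skewness and kurtosis are undefined
    mid = p // 2
    return {
        "mean": mean,
        "min": values[0],
        "max": values[-1],
        "median": values[mid] if p % 2 else 0.5 * (values[mid - 1] + values[mid]),
        "rms": math.sqrt(mu2),                          # exponent q = 2
        "ra": sum(abs(v - mean) for v in values) / p,   # exponent q = 1
        "skewness": float("nan") if flat else mu3 / mu2 ** 1.5,
        # Assumed convention: excess kurtosis (Gaussian data give 0).
        "kurtosis": float("nan") if flat else mu4 / mu2 ** 2 - 3.0,
    }
```

To mimic a selected sub-region, pass only the rows and columns of interest, for example `height_statistics([row[2:10] for row in field[5:20]])`.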

The surface area is estimated by the following method. Let zi for i = 1, 2, 3, 4 be values in four neighbour points (pixel centres), and hx and hy pixel dimensions along corresponding axes. If an additional point is placed in the centre of the rectangle which corresponds to the common corner of the four pixels (using the mean value of the pixels), four triangles are formed and the surface area can be approximated by summing their areas. This leads to the following formulas for the area of one triangle (top) and the surface area of one pixel (bottom):
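In LaTeX notation, the construction described above leads to expressions of the following form. The numbering is an assumption: points 1 and 2 are taken as neighbours along x, 2 and 3 along y, 3 and 4 along x, and 4 and 1 along y, with z̄ the value at the added centre point. For the triangles whose base runs along y (A23 and A41), hx and hy exchange roles.

```latex
A_{12} = \frac{h_x h_y}{4}
  \sqrt{1 + \left(\frac{z_1 - z_2}{h_x}\right)^2
          + \left(\frac{z_1 + z_2 - 2\bar{z}}{h_y}\right)^2},
\qquad
\bar{z} = \frac{z_1 + z_2 + z_3 + z_4}{4}

A = A_{12} + A_{23} + A_{34} + A_{41}
```

The first expression follows from half the magnitude of the cross product of two triangle edges; for a flat surface it reduces to hx hy / 4, so the four triangles together cover exactly one pixel-sized rectangle.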

The method is now well-defined for inner pixels of the region. Each value participates in eight triangles, two with each of the four neighbour values. Half of each of these triangles lies in one pixel, the other half in the other pixel. By counting the area that lies inside each pixel, the total area is also defined for grains and masked areas. It remains to define it for boundary pixels of the whole data field. We do this by virtually extending the data field with a copy of the border row of pixels on each side for the purpose of surface area calculation, thus making all pixels of interest inner.
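The whole procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not Gwyddion's implementation: the field is a list of rows, the triangle area is half the cross product of two edges (hx hy / 4 for flat data), and the names triangle_area and surface_area are hypothetical.

```python
import math

def triangle_area(za, zb, zc, base, height):
    """Area of a triangle whose base joins two pixel centres (heights za, zb,
    `base` apart) and whose apex is the common corner (height zc), which lies
    `height`/2 away from the base in projection."""
    return 0.25 * base * height * math.sqrt(
        1.0 + ((za - zb) / base) ** 2 + ((za + zb - 2.0 * zc) / height) ** 2)

def surface_area(z, hx, hy):
    """Surface area of the whole field z (list of rows) by triangulation."""
    rows, cols = len(z), len(z[0])
    # Virtually extend the field by copying the border pixels on each side.
    ext = [[z[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]
            for j in range(-1, cols + 1)] for i in range(-1, rows + 1)]

    def inside(i, j):
        # True for pixels of the original field (indices refer to ext).
        return 1 <= i <= rows and 1 <= j <= cols

    total = 0.0
    for i in range(rows + 1):
        for j in range(cols + 1):
            # Four pixel centres surrounding one common corner.
            corners = [(i, j), (i, j + 1), (i + 1, j + 1), (i + 1, j)]
            zc = sum(ext[a][b] for a, b in corners) / 4.0
            for k in range(4):
                (a1, b1), (a2, b2) = corners[k], corners[(k + 1) % 4]
                along_x = a1 == a2  # base parallel to the x axis
                base, height = (hx, hy) if along_x else (hy, hx)
                area = triangle_area(ext[a1][b1], ext[a2][b2], zc, base, height)
                # Half of each triangle lies in each of its two base pixels;
                # only halves inside the original field are counted.
                total += 0.5 * area * (inside(a1, b1) + inside(a2, b2))
    return total
```

For flat data the result equals the projected area rows × cols × hx × hy, which is a convenient sanity check. Restricting the `inside` test to a mask would give the area of grains or masked regions in the same way.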

Figure 4.14.  Surface area calculation triangulation scheme.


Statistical Functions Tool

One-dimensional statistical functions can be accessed by using the Statistical Functions tool. Within the tool window, you can select which function to evaluate using the selection box on the left labeled Output Type. The graph preview will update automatically. You can select in which direction to evaluate (x or y), but as stated above, we recommend using the fast scanning axis direction. You can also select which interpolation method to use. When you are finished, click Apply to close the tool window and output a new graph window containing the statistical data.

Tip

Similar to the Statistical Quantities tool, this tool evaluates for the entire image by default, but you can select a sub-region to analyze if you wish.

Height and Angle Distribution Functions

The simplest statistical functions are the height and slope distribution functions. These can be computed as non-cumulative (i.e. densities) or cumulative. They are computed as normalized histograms of the height or slope values, where slopes are obtained as derivatives in the selected direction (horizontal or vertical). In other words, the quantity on the abscissa in “angle distribution” is the tangent of the angle, not the angle itself.

The normalization of the densities ρ(x) (where x is the corresponding quantity, height or slope) is such that
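In LaTeX notation, the standard normalization condition for a probability density, which is what the statement above amounts to, is

```latex
\int_{-\infty}^{\infty} \rho(x)\, \mathrm{d}x = 1
```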

Evidently, the scale of the values is then independent of the number of data points and the number of histogram buckets. The cumulative distributions are integrals of the densities and take values in the interval [0, 1].
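A minimal sketch of such a normalized histogram, assuming equal-width buckets spanning the data range and at least two distinct values; the function name distribution is illustrative only. The density is scaled so that the sum of bucket values times the bucket width equals 1, and the cumulative distribution is the running integral.

```python
def distribution(values, nbins):
    """Return bucket centres, density and cumulative distribution of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins          # requires hi > lo
    counts = [0] * nbins
    for v in values:
        # The maximum value falls into the last bucket.
        k = min(int((v - lo) / width), nbins - 1)
        counts[k] += 1
    n = len(values)
    density = [c / (n * width) for c in counts]
    cumulative = []
    running = 0.0
    for d in density:
        running += d * width           # integral of the density so far
        cumulative.append(running)
    centres = [lo + (k + 0.5) * width for k in range(nbins)]
    return centres, density, cumulative
```

For a height distribution, pass all height values; for a slope distribution along rows, pass finite differences such as `(row[k + 1] - row[k]) / delta` with `delta` the sampling interval.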

First-Order vs. Second-Order Quantities

The height and slope distribution quantities belong to the first-order statistical quantities, which describe only the statistical properties of individual points. However, a complete description of the surface properties requires higher-order functions. Usually, second-order statistical quantities, which describe the mutual relationship of two points on the surface, are employed. These are the autocorrelation function, the height-height correlation function, and the power spectral density function. A description of each follows:

Autocorrelation Function

The autocorrelation function is given by
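In LaTeX notation, and using only the symbols explained in the next paragraph, the definition takes the form of an expectation of the product of the two heights:

```latex
G(\tau_x, \tau_y) = \iint_{-\infty}^{\infty}
  z_1 z_2\, w(z_1, z_2, \tau_x, \tau_y)\, \mathrm{d}z_1\, \mathrm{d}z_2
```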

where z1 and z2 are the values of heights at points [x1, y1], [x2, y2], τx = x1 − x2 and τy = y1 − y2. The function w(z1, z2, τx, τy) denotes the two-dimensional probability density of the random function ξ(x, y) corresponding to points [x1, y1], [x2, y2] and the distance between these points τ.

From the discrete AFM data one can evaluate this function as
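In LaTeX notation, the usual discrete estimate reads as follows. The index convention is an assumption: z_{k,l} is the height in column k and row l of a field with N rows and M columns, and the heights are taken relative to the mean.

```latex
G(m, n) = \frac{1}{(N - n)(M - m)}
  \sum_{l=1}^{N-n} \sum_{k=1}^{M-m} z_{k+m,\,l+n}\, z_{k,\,l}
```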

where m = τx/Δx, n = τy/Δy. The function can thus be evaluated in a discrete set of values of τx and τy separated by the sampling intervals Δx and Δy, respectively. The two-dimensional autocorrelation function can be calculated with Data Process → Statistics → 2D Autocorrelation.

For AFM measurements, we usually evaluate the one-dimensional autocorrelation function based only on profiles along the fast scanning axis. It can therefore be evaluated from the discrete AFM data values as
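A common form of this estimate is Gx(τx) = G(m, 0) = 1/(N(M − m)) Σl Σk z(k+m, l) z(k, l), with l running over the N rows and k over the first M − m columns. The sketch below assumes that form, that rows are the fast scanning axis, and that the mean has already been subtracted from the data; the name acf_rows is illustrative only.

```python
def acf_rows(field, max_lag):
    """One-dimensional ACF along rows for lags m = 0..max_lag (max_lag < M).

    field is a list of N rows with M zero-mean height values each.
    """
    nrows, ncols = len(field), len(field[0])
    result = []
    for m in range(max_lag + 1):
        # Sum of products of values m columns apart, over all rows.
        total = sum(row[k + m] * row[k]
                    for row in field for k in range(ncols - m))
        result.append(total / (nrows * (ncols - m)))
    return result
```

The lag in physical units is obtained as τx = m Δ. For zero-mean data the value at m = 0 is the mean square height.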

The one-dimensional autocorrelation function is often assumed to have the form of a Gaussian, i.e. it can be given by the following relation

where σ denotes the root mean square deviation of the heights and T denotes the autocorrelation length.

For the exponential autocorrelation function we have the following relation
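In LaTeX notation, with σ and T as defined above, the Gaussian form (first line) and the exponential form (second line) are commonly written as follows. The scaling of T, with no factor of 2 in the Gaussian exponent, is an assumed convention.

```latex
G_x(\tau_x) = \sigma^2 \exp\!\left(-\frac{\tau_x^2}{T^2}\right)

G_x(\tau_x) = \sigma^2 \exp\!\left(-\frac{\tau_x}{T}\right)
```

In both cases the value at zero lag is σ², and T is the lag at which the function drops to σ²/e.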

Note

For optical measurements (e.g. spectroscopic reflectometry, ellipsometry) the Gaussian autocorrelation function is usually expected to be in good agreement with the surface properties. However, articles on surface growth and oxidation often assume that the exponential form is closer to reality.

Height-Height Correlation Function

The height-height correlation function differs only slightly from the autocorrelation function. Both are sums over pairs of points separated by a given distance. For the autocorrelation function, the summed terms are the products of the two height values. For the height-height correlation function, we instead sum the squared differences between the two values.

For AFM measurements, we usually evaluate the one-dimensional height-height correlation function based only on profiles along the fast scanning axis. It can therefore be evaluated from the discrete AFM data values as

where m = τx/Δ. The function thus can be evaluated in a discrete set of values of τ separated by the sampling interval Δ.
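A common form of this estimate is Hx(τx) = 1/(N(M − m)) Σl Σn (z(n+m, l) − z(n, l))², with l running over the N rows and n over the first M − m columns. The sketch below assumes that form and that rows are the fast scanning axis; the name hhcf_rows is illustrative only.

```python
def hhcf_rows(field, max_lag):
    """One-dimensional HHCF along rows for lags m = 0..max_lag (max_lag < M).

    field is a list of N rows with M height values each.
    """
    nrows, ncols = len(field), len(field[0])
    result = []
    for m in range(max_lag + 1):
        # Sum of squared differences of values m columns apart, over all rows.
        total = sum((row[k + m] - row[k]) ** 2
                    for row in field for k in range(ncols - m))
        result.append(total / (nrows * (ncols - m)))
    return result
```

Unlike the autocorrelation function, this quantity does not change when a constant offset is added to the data, because only height differences enter.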

The one-dimensional height-height correlation function is often assumed to be Gaussian, i.e. given by the following relation

where σ denotes the root mean square deviation of the heights and T denotes the autocorrelation length.

For the exponential height-height correlation function we have the following relation
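In LaTeX notation, with σ and T as defined above, the Gaussian form (first line) and the exponential form (second line) are commonly written as follows, assuming the same scaling of T as for the autocorrelation function:

```latex
H_x(\tau_x) = 2\sigma^2 \left[ 1 - \exp\!\left(-\frac{\tau_x^2}{T^2}\right) \right]

H_x(\tau_x) = 2\sigma^2 \left[ 1 - \exp\!\left(-\frac{\tau_x}{T}\right) \right]
```

Both forms start at zero and saturate at 2σ² for lags much larger than T.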

In the following figure, the height-height correlation function and the autocorrelation function obtained for a simulated surface with a Gaussian autocorrelation function are plotted. These functions are fitted by the least-squares method using the formulae shown above. The resulting values of σ and T were practically the same for both approaches.

Figure 4.15.  Height-height correlation function and autocorrelation function obtained for simulated surface having Gaussian autocorrelation function.


Power Spectral Density Function

The two-dimensional power spectral density function can be written in terms of the Fourier transform of the autocorrelation function

Similarly to the autocorrelation function, we also usually evaluate the one-dimensional power spectral density function which is given by the equation
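In LaTeX notation, the two-dimensional definition (first line) and the one-dimensional function (second line) can be written as follows. The prefactor is an assumed convention, chosen so that the integral of the PSDF over all wavevectors equals σ²; other references distribute the factors of 2π differently.

```latex
W(K_x, K_y) = \frac{1}{4\pi^2} \iint_{-\infty}^{\infty}
  G(\tau_x, \tau_y)\, e^{-i (K_x \tau_x + K_y \tau_y)}\,
  \mathrm{d}\tau_x\, \mathrm{d}\tau_y

W_1(K_x) = \int_{-\infty}^{\infty} W(K_x, K_y)\, \mathrm{d}K_y
```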

This function can be evaluated by means of the Fast Fourier Transform as follows:

where Pj(Kx) is the Fourier coefficient of the j-th row, i.e.
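One common discretization is W1(Kx) = 2π/(N M Δ) Σj |Pj(Kx)|² with Pj(Kx) = Δ/(2π) Σk z(k, j) exp(−i Kx k Δ), where j runs over the N rows and k over the M columns. The sketch below assumes this normalization and uses a plain discrete Fourier transform from the Python standard library in place of an FFT, which is adequate only for small fields; the name psdf_rows is illustrative only.

```python
import cmath
import math

def psdf_rows(field, delta):
    """One-dimensional PSDF along rows; returns (kx, w1).

    field is a list of N rows with M height values, delta the sampling step.
    """
    nrows, ncols = len(field), len(field[0])
    # Discrete wavevectors; entries past M/2 correspond to negative Kx.
    kx = [2.0 * math.pi * q / (ncols * delta) for q in range(ncols)]
    w1 = [0.0] * ncols
    for row in field:
        for q, k_val in enumerate(kx):
            # Fourier coefficient P_j(Kx) of this row (direct DFT, O(M^2)).
            p = delta / (2.0 * math.pi) * sum(
                z * cmath.exp(-1j * k_val * k * delta)
                for k, z in enumerate(row))
            w1[q] += abs(p) ** 2
    scale = 2.0 * math.pi / (nrows * ncols * delta)
    return kx, [scale * v for v in w1]
```

With this normalization, the sum of w1 multiplied by the wavevector step 2π/(MΔ) equals the mean square of the data, which is a quick check of the scaling.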

If we have chosen the Gaussian ACF, the corresponding Gaussian relation for the PSDF is

For the surface with exponential ACF we have
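In LaTeX notation, the two model forms can be written as follows: the Gaussian case on the first line and the exponential case on the second. They are obtained by Fourier-transforming the model ACFs σ² exp(−τx²/T²) and σ² exp(−τx/T), assuming a PSDF normalization in which the integral of W1 over Kx equals σ².

```latex
W_1(K_x) = \frac{\sigma^2 T}{2\sqrt{\pi}}
  \exp\!\left(-\frac{K_x^2 T^2}{4}\right)

W_1(K_x) = \frac{\sigma^2 T}{\pi}\, \frac{1}{1 + K_x^2 T^2}
```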

In the following figure, the resulting PSDF and its fit for the same surface as used in the previous figure are plotted. We can see that the function can again be fitted by the Gaussian PSDF. The resulting values of σ and T were practically the same as those from the HHCF and ACF fits.

Figure 4.16.  PSDF obtained for data simulated with Gaussian autocorrelation function. Points represent computed data, line represents its fit.


We can also introduce radial PSDF Wr(K), which of course contains the same information as the one-dimensional PSDF for isotropic rough surfaces:

For a surface with Gaussian ACF this function is expressed as

while for exponential ACF surface as
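In LaTeX notation, the radial PSDF (first line) and its Gaussian and exponential forms (second and third lines) can be written as follows. They assume an isotropic surface, a two-dimensional PSDF W(Kx, Ky) normalized so that its integral over all wavevectors equals σ², and model ACFs of the forms σ² exp(−τ²/T²) and σ² exp(−τ/T).

```latex
W_r(K) = \int_0^{2\pi} W(K \cos\varphi, K \sin\varphi)\, K\, \mathrm{d}\varphi

W_r(K) = \frac{\sigma^2 T}{2}\, K T \exp\!\left(-\frac{K^2 T^2}{4}\right)

W_r(K) = \sigma^2 T\, \frac{K T}{\left(1 + K^2 T^2\right)^{3/2}}
```

Under these assumptions, both model forms integrate to σ² over K from 0 to infinity.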

Tip

Within Gwyddion you can fit all statistical functions presented here by their Gaussian and exponential forms. To do this, first click Apply within the Statistical Functions tool window. This will create a new graph window. With this new window selected, click on Graph → Fit Graph.

Minkowski Functionals

The Minkowski functionals are used to describe global geometric characteristics of structures. Two-dimensional discrete variants of volume V, surface S, and connectivity (Euler-Poincaré characteristic) χ are calculated according to the following formulas:
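In LaTeX notation, using the pixel counts defined in the next paragraph, the usual discrete forms are:

```latex
V = \frac{N_\mathrm{white}}{N}, \qquad
S = \frac{N_\mathrm{bound}}{N}, \qquad
\chi = \frac{C_\mathrm{white} - C_\mathrm{black}}{N}
```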

Here N denotes the total number of pixels, Nwhite denotes the number of “white” pixels, that is pixels above threshold (pixels below threshold are referred to as “black”). The symbol Nbound denotes the number of white-black pixel boundaries. Finally, Cwhite and Cblack denote the number of continuous sets of white and black pixels respectively.

For an image with a continuous set of values, the functionals are parametrized by the height threshold value ϑ that divides white pixels from black; that is, they can be viewed as functions of this parameter. These functions V(ϑ), S(ϑ), and χ(ϑ) are what is actually plotted.
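A minimal sketch of evaluating the three functionals at one threshold, taking V = Nwhite/N, S = Nbound/N and χ = (Cwhite − Cblack)/N. It assumes that boundaries are counted between horizontally and vertically adjacent pixels and that continuous sets are 4-connected for both colours; the connectivity rule and the name minkowski_functionals are assumptions of this example.

```python
def minkowski_functionals(field, threshold):
    """Return (V, S, chi) of a field (list of rows) at the given threshold."""
    rows, cols = len(field), len(field[0])
    white = [[v > threshold for v in row] for row in field]
    n = rows * cols
    n_white = sum(sum(row) for row in white)

    # White-black boundaries between horizontally or vertically adjacent pixels.
    n_bound = 0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols and white[i][j] != white[i][j + 1]:
                n_bound += 1
            if i + 1 < rows and white[i][j] != white[i + 1][j]:
                n_bound += 1

    def count_components(colour):
        # Count 4-connected sets of pixels of one colour by flood fill.
        seen = [[False] * cols for _ in range(rows)]
        count = 0
        for i in range(rows):
            for j in range(cols):
                if white[i][j] != colour or seen[i][j]:
                    continue
                count += 1
                seen[i][j] = True
                stack = [(i, j)]
                while stack:
                    a, b = stack.pop()
                    for c, d in ((a - 1, b), (a + 1, b), (a, b - 1), (a, b + 1)):
                        if (0 <= c < rows and 0 <= d < cols
                                and not seen[c][d] and white[c][d] == colour):
                            seen[c][d] = True
                            stack.append((c, d))
        return count

    c_white = count_components(True)
    c_black = count_components(False)
    return n_white / n, n_bound / n, (c_white - c_black) / n
```

Calling the function for a sequence of thresholds between the minimum and maximum height yields sampled versions of V(ϑ), S(ϑ) and χ(ϑ).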

Row/Column Statistics Tool

This tool calculates numeric characteristics of each row or column and plots them as a function of its position. This makes it complementary to the Statistical Functions tool. Available quantities include:

  1. Mean value, minimum, maximum and median.
  2. RMS value of the height irregularities computed from data variance.
  3. Surface line length. It is estimated as the total length of the straight segments joining data values in the row (column).
  4. Overall slope, i.e. the tangent of the mean line fitted through the row (column).
  5. Tangent of β0. This is a characteristic of the steepness of local slopes, closely related to the behaviour of the autocorrelation and height-height correlation functions at zero. For discrete values it is calculated as follows:
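In LaTeX notation, a form consistent with this description is the mean squared slope between adjacent points. The symbols are assumptions of this example: z_0, …, z_{N−1} are the N values of one row (column) and h is the sampling step.

```latex
\tan^2 \beta_0 = \frac{1}{(N - 1)\, h^2}
  \sum_{i=1}^{N-1} \left( z_i - z_{i-1} \right)^2
```

Multiplied by h², the right-hand side is the height-height correlation function of that row evaluated at a lag of one sampling step, which is the sense in which the quantity relates to the behaviour of that function at zero.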

Two-Dimensional Slope Statistics

Several functions in Data Process → Statistics operate on two-dimensional slope (derivative) statistics.

Slope Distribution calculates a plain two-dimensional distribution of derivatives, that is, the horizontal and vertical coordinates on the resulting data field are the horizontal and vertical derivatives, respectively. The slopes can be calculated as central derivatives (one-sided on the borders of the image) or, if Use local plane fitting is enabled, by fitting a local plane through the neighbourhood of each point and using its gradient. Slope Distribution also has another mode of operation called Per-angle graph, in which it plots the distribution of r² over φ, where (r, φ) are polar coordinates in the plane of derivatives. The relation between the Cartesian derivative coordinates of the two-dimensional slope distribution and the facet inclination angles is given by the following formula:

Angle Distribution function is a visualization tool that does not calculate a distribution in the strict sense. For each derivative v the circle of points satisfying

is drawn. The number of points on the circle is given by Number of steps.

Facet Analysis

Data Process → Statistics → Facet Analysis

Facet analysis lets you interactively study the orientations of facets occurring in the data and mark facets of specific orientations on the image. The left view displays the data with a preview of marked facets. The smaller right view, called the facet view below, displays the two-dimensional slope distribution.

The centre of the facet view always corresponds to zero inclination (horizontal facets); the slope in the x-direction increases towards the left and right borders and the slope in the y-direction increases towards the top and bottom borders. The exact coordinate system is a bit complex and adapts to the range of slopes in the particular data displayed.

Facet plane size controls the size (radius) of the plane locally fitted at each point to determine the local inclination. The special value 0 stands for no plane fitting; in that case the local inclination is determined from symmetric x and y derivatives at each point. The choice of neighbourhood size is crucial for meaningful results: it must be smaller than the features of interest to avoid smoothing them, yet large enough to suppress noise present in the image.

Figure 4.17.  Illustration of the influence of fitted plane size on the distribution of a scan of a delaminated DLC surface with considerable fine noise. One can see the distribution is completely obscured by the noise at small plane sizes. The neighbourhood sizes are: (a) 0, (b) 2, (c) 4, (d) 7. The angle and false color mappings are full-scale for each particular image, i.e. they vary among them.


Both the facet view and the data view allow you to select a point with the mouse and read the corresponding facet normal inclination value ϑ and direction φ under Normal. When you select a point on the data view, the facet view selection is updated to show the inclination at this point.

The Find Maximum button sets the facet view selection to the slope distribution maximum (the initial selection position).

The Mark button updates the mask of areas with slope similar to the selected slope, more precisely, of areas with slope within Tolerance from the selected slope. The facet view then displays the set of slopes corresponding to the marked points (note that the set of selected slopes may not look circular on the facet view, but this is only due to the selected projection). The average inclination of all points in the selected range of slopes is displayed under Mean Normal.

One-Dimensional Roughness Parameters

Standardized one-dimensional roughness parameters can be evaluated with the roughness tool.

In the following formulas we assume the mean value of rj is zero, i.e.
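In LaTeX notation, one way to write this condition is shown below. The symbols z_j for the raw profile values and N for the number of profile points are assumptions of this example.

```latex
r_j = z_j - \bar{z}, \qquad
\bar{z} = \frac{1}{N} \sum_{j=1}^{N} z_j
```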

Roughness Amplitude Parameters

Roughness Average Ra

Standards: ASME B46.1-1995, ASME B46.1-1985, ISO 4287-1997, ISO 4287/1-1997.

Arithmetical mean deviation. The average deviation of all points of the roughness profile from the mean line over the evaluation length

An older means of specifying a range for Ra is RHR. This is a symbol on a drawing specifying a minimum and maximum value for Ra.

Root Mean Square Roughness Rq

Standards: ASME B46.1-1995, ISO 4287-1997, ISO 4287/1-1997.

The root mean square average of the measured height deviations taken within the evaluation length and measured from the mean line

Maximum Height of the Profile Rt

Standards: ASME B46.1-1995, ISO 4287-1997.

Maximum peak-to-valley height. The absolute vertical distance between the highest peak and the lowest valley

Maximum Profile Valley Depth Rv, Rm

Standards: ASME B46.1-1995, ASME B46.1-1985, ISO 4287-1997, ISO 4287/1-1997.

Lowest valley. This is the depth of the deepest valley in the roughness profile over the evaluation length

Maximum Profile Peak Height Rp

Standards: ASME B46.1-1995, ASME B46.1-1985, ISO 4287-1997, ISO 4287/1-1997.

Highest peak. This is the height of the highest peak in the roughness profile over the evaluation length
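The amplitude parameters introduced so far can be sketched in Python using the commonly stated forms Ra = (1/N) Σ |rj|, Rq = ((1/N) Σ rj²)^(1/2), Rp = |max rj|, Rv = |min rj| and Rt = Rp + Rv. The sketch assumes the profile passed in is already the roughness profile and only subtracts its mean; any separation of roughness from waviness is outside its scope. The name amplitude_parameters is illustrative only.

```python
import math

def amplitude_parameters(profile):
    """Ra, Rq, Rp, Rv and Rt of a roughness profile (list of heights)."""
    n = len(profile)
    mean = sum(profile) / n
    r = [z - mean for z in profile]          # deviations from the mean line
    ra = sum(abs(v) for v in r) / n          # arithmetical mean deviation
    rq = math.sqrt(sum(v * v for v in r) / n)  # root mean square roughness
    rp = abs(max(r))                         # highest peak
    rv = abs(min(r))                         # deepest valley
    return {"Ra": ra, "Rq": rq, "Rp": rp, "Rv": rv, "Rt": rp + rv}
```

Because the deviations have zero mean, the maximum is never negative and the minimum never positive, so Rt equals the full vertical range of the profile.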

Average Maximum Height of the Profile Rtm

Standards: ASME B46.1-1995, ISO 4287-1997.

Mean peak-to-valley roughness. It is determined by the difference between the highest peak and the lowest valley within multiple samples in the evaluation length

where Rvm and Rpm are defined below.

For profile data it is based on five sample lengths (m = 5). The number of samples corresponds to the ISO standard.

Average Maximum Profile Valley Depth Rvm

Standards: ISO 4287-1997.

The mean valley depth based on one valley per sampling length. The single deepest valley is found in each of five sampling lengths (m = 5) and the depths are then averaged

where

Average Maximum Profile Peak Height Rpm

Standards: ISO 4287-1997.

The mean peak height based on one peak per sampling length. The single highest peak is found in each of five sampling lengths (m = 5) and the heights are then averaged

where
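The three averaged parameters can be sketched together, taking Rpm and Rvm as the averages of the per-segment peak heights and valley depths and Rtm = Rvm + Rpm. The sketch assumes the evaluation length is split into m consecutive sampling lengths of (nearly) equal numbers of points and that heights are measured from the mean line of the whole profile; how segment boundaries are chosen in practice is an assumption, and the name mean_peak_valley is illustrative only.

```python
def mean_peak_valley(profile, m=5):
    """Return (Rpm, Rvm, Rtm) of a profile with at least m points."""
    n = len(profile)
    mean = sum(profile) / n
    r = [z - mean for z in profile]
    # Split the evaluation length into m consecutive sampling lengths.
    bounds = [i * n // m for i in range(m + 1)]
    segments = [r[bounds[i]:bounds[i + 1]] for i in range(m)]
    # Highest point above and lowest point below the mean line per segment.
    # A segment lying entirely on one side of the mean line contributes a
    # negative peak height or valley depth in this sketch.
    rpm = sum(max(s) for s in segments) / m
    rvm = sum(-min(s) for s in segments) / m
    return rpm, rvm, rpm + rvm
```

With m = 1 the result reduces to the single highest peak, the single deepest valley and their sum over the whole evaluation length.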

Base roughness depth R3z

Standards: ISO 4287-1997.

The distance between the third highest peak and the third lowest valley. A peak is a portion of the surface above the mean line crossings.

Base roughness profile depth R3zISO

Standards: ISO 4287-1997.

The height of the third highest peak from the third lowest valley per sampling length. The base roughness depth is found in five sampling lengths and then averaged.

Ten-point height Rz

Standards: ISO 4287-1997.

The average absolute value of the five highest peaks and the five lowest valleys over the evaluation length.

Average peak-to-valley profile roughness RzISO

Standards: ISO 4287-1997.

The average peak-to-valley roughness based on one peak and one valley per sampling length. The single largest deviation is found in five sampling lengths and then averaged. It is identical to Rtm.

The Amplitude Distribution Function ADF

Standards: ISO 4287-1997.

The amplitude distribution function is a probability function that gives the probability that a profile of the surface has a certain height z at any position x.

The Bearing Ratio Curve BRC

Standards: ISO 4287-1997.

The Bearing Ratio Curve is related to the ADF: it is the corresponding cumulative probability distribution and sees much greater use in surface finish. The bearing ratio curve is the integral (from the top down) of the ADF.

Skewness Rsk

Standards: ISO 4287-1997.

Skewness is a parameter that describes the shape of the ADF. Skewness is a simple measure of the asymmetry of the ADF, or, equivalently, it measures the symmetry of the variation of a profile about its mean line

Kurtosis Rku

Standards: ISO 4287-1997.

Kurtosis is another parameter describing the shape of the ADF. It relates to the uniformity of the ADF or, equivalently, to the spikiness of the profile.
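The two shape parameters can be sketched using the commonly stated forms Rsk = 1/(N Rq³) Σ rj³ and Rku = 1/(N Rq⁴) Σ rj⁴. Here Rku is written without subtracting 3, so a Gaussian profile gives a value of 3; whether a particular tool reports it this way is an assumption. The name shape_parameters is illustrative only.

```python
import math

def shape_parameters(profile):
    """Return (Rsk, Rku) of a non-constant roughness profile."""
    n = len(profile)
    mean = sum(profile) / n
    r = [z - mean for z in profile]          # deviations from the mean line
    rq = math.sqrt(sum(v * v for v in r) / n)
    rsk = sum(v ** 3 for v in r) / (n * rq ** 3)
    # Assumed convention: plain fourth-moment ratio, no subtraction of 3.
    rku = sum(v ** 4 for v in r) / (n * rq ** 4)
    return rsk, rku
```

A profile with a few high peaks on an otherwise low background gives positive Rsk, while one with a few deep valleys gives negative Rsk.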