Chemometric methods application in pharmaceutical products and processes analysis and control
This chapter provides a basic theoretical background on chemometrics and chemometric methods for the analysis of multivariate data. Multivariate data analysis is essential for both product and process development and optimization. Depending on the problem studied, classification and/or regression multivariate methods are applied for data analysis. Different supervised and unsupervised methods for classification and regression are presented, followed by examples of their application in pharmaceutical technology. Some of the methods described include principal component analysis, various supervised classification methods, multiple linear regression, principal component regression, partial least squares regression, support vector machines, etc.
Chemometrics is a scientific discipline where chemistry and pharmaceutical science meet statistics and software (Massart and Buydens, 1988). The term ‘chemometrics’ was coined several decades ago to describe a new way of analyzing chemical data, in which elements of both statistical and chemical thinking are combined (Martens and Naes, 1996). In chemometric techniques, multivariate empirical modeling methods are applied to chemical data (Miller, 1999). Many definitions of chemometrics and chemometric methods are available.
Chemometric techniques, both multivariate data analysis and design of experiments (DoE), have a central role in the process analytical technology (PAT) initiative (Rajalahti and Kvalheim, 2011). The power of chemometrics is that it can be used to model systems that are both largely unknown and complex (Miller, 2005).
Development and availability of modern, computationally powerful software tools has led to a significant increase in chemometrics application in pharmaceutical sciences and industry. Multivariate data analysis has proven to be a proficient tool when combined with advanced characterization techniques (Rajalahti and Kvalheim, 2011).
Chemometric tools are methods designed to establish relationships between different measurements from a chemical system or process and the state of the system, through the application of mathematical or statistical methods (Lopes et al., 2004). Chemometrics is, in the field of pharmaceutical technology, usually associated with vibrational spectroscopy techniques, such as infrared (IR), near infrared (NIR), and Raman imaging techniques, etc. Vibrational spectroscopy techniques are suitable for the analysis of solid, liquid, and biotechnological pharmaceutical dosage forms. They can be implemented during pharmaceutical development, in production for process monitoring, or in quality control laboratories (Roggo et al., 2007). These techniques produce data with high dimensionality, since each sample is described with hundreds or even thousands of variables. Multivariate analysis provides tools for effective process monitoring and control, enabling detection of multivariate relationships between different variables such as raw materials, process conditions, and end products (Rajalahti and Kvalheim, 2011).
A typical chemometric problem is the definition of a relationship between a property of interest (which is sometimes difficult to measure or estimate) and other, easily obtained properties that affect it (Otto, 1998). In order to obtain this kind of relationship, sets of experiments are usually designed to cover the space of the property/process being analyzed. The next step is building and validation of a model using multivariate regression or multivariate classification methods, depending on the purpose of the model (Lopes et al., 2004). As in most empirical modeling techniques, chemometric models need to be fed with large amounts of good data. The use of multiple response variables to build models can result in the temptation to overfit models and thus obtain artificially optimistic results (Miller, 2005). Chemometrics includes several topics, such as DoE and information extraction methods (modeling, classification, and testing of assumptions) (Roggo et al., 2007). There are many reviews and textbooks on chemometrics available (Lavine, 2000; Otto, 1998; Brereton, 2003; Massart et al., 2003).
Conventional regression methods include multiple linear regression (MLR), principal component regression (PCR), and partial least squares (PLS) (Martens and Naes, 1996; Martens and Martens, 2001). Classification methods include linear discriminant analysis, principal component analysis (PCA), factor analysis (FA), and cluster analysis (CA) (Jolliffe, 1986). Non-linear techniques, such as neural networks and other artificial intelligence methods, are also used for this purpose.
Classification methods are usually connected with qualitative analysis (e.g. classification of samples according to their spectra). Classification can be unsupervised or supervised. Unsupervised classification of the data is performed with no a priori knowledge of their properties. Data are classified in clusters, which then need to be explained. In supervised classification, a model is first developed using the set of data with known categories and then validated by comparison of classification predictions to true categories of the data subset that was previously omitted.
One of the basic unsupervised multivariate data treatment methods is PCA. It is a feature reduction method that is especially useful due to its data visualization ability. PCA reduces the number of variables in an analyzed data set and represents it in a visually comprehensible lower-dimensional space. The reduction of the number of original variables results from the generation of their linear combinations – new latent variables (LVs); the coefficients of these combinations are referred to as loadings.
In the case of a data set of M samples and N variables (Figure 4.1, a data set of 4 samples and 2 variables), each sample i is represented by a row vector xi, and each variable j is described by its values for all samples with a column vector xj. If sample vectors are plotted in variable space, then the number of axes is equal to the number of variables N. This way, all the information in X regarding the relationships (similarities or differences) between samples can be displayed. The same holds for the representation of variable vectors in sample space, where the number of axes is equal to the number of samples and the relationships (correlations or covariances) between variables are displayed (Rajalahti and Kvalheim, 2011). This common variation (i.e. correlation) between variables is used to determine new LVs. An increase in the number of variables studied complicates the determination of LVs, so dimensionality reduction is needed; this task is usually achieved by projecting the data onto the LVs. One of the most common algorithms used for computation of LVs is the non-linear iterative partial least squares (NIPALS) algorithm (Geladi and Kowalski, 1986). The idea behind the NIPALS algorithm is successive orthogonal projection of LVs. Prior to LV projection, a weight vector wa is defined, on a different basis for different multivariate methods. The weight vector describes LVs in both variable and sample space. Therefore, the score vector ta and loading vector pa are different presentations of the same LV, carrying information about samples in variable space and variables in sample space, respectively (Rajalahti and Kvalheim, 2011).
Therefore, the main goal of PCA is decomposition of a data set into principal components (PCs, i.e. LVs), which carry most of the information. If the set of data points is represented in a two-dimensional coordinate system, then an LV (i.e. PC) is a line oriented in the direction that passes as close to as many points as possible. In this way, most of the data variation is captured and as little information as possible is lost. The remaining variation is explained by the next LV, a line orthogonal to the previous one.
Therefore, PCs maximize the explained variance in the data set, with the constraint that each successive PC must be orthogonal to the previous one. The resulting model is bilinear and represents the product of the scores matrix T and the loadings matrix P, where T and P consist of orthogonal and orthonormal vectors, respectively (Rajalahti and Kvalheim, 2011):

X = TPT + E
If X is an M × N matrix that consists of M samples and N variables (columns), then T is an M × A matrix and PT is an A × N matrix, where A is the number of PCs. E is an M × N matrix containing residuals, that is, variance not explained by the PCs (Rajalahti and Kvalheim, 2011). Matrix X is decomposed into the sum of products of score ta and loading pa vectors, where a = 1, 2, …, A. The constraint is such that the weight vector wa is equal to the loading vector pa. Once the first PC (latent variable) is calculated, it is subtracted from the data matrix, Xa+1 = Xa − tapaT, and the next PC is calculated from the residual. Usually only the first several PCs, which explain most of the data variance, are calculated, and the rest, mostly noise, is left in the residuals. Therefore, information contained in the first PC is more significant than that in the second, the second component is more significant than the third, and so on (Massart and Buydens, 1988). PCA is especially useful for data presentation (visualization), since the score plots reveal patterns, such as clusters, trends, and outliers, in the data. Loading plots reveal covariances among variables and can be used to interpret patterns observed in the score plot. Therefore, score and loading plots should be interpreted simultaneously. For graphical purposes, the optimal number of PCs is two.
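As a sketch of this decomposition, PCA via singular value decomposition can be carried out with NumPy; the data below are synthetic (a rank-one structure plus noise), whereas a real application would use measured spectra:

```python
import numpy as np

# Toy data: 6 samples, 4 strongly correlated variables (hypothetical values)
rng = np.random.default_rng(0)
t = rng.normal(size=(6, 1))
X = np.hstack([t, 2 * t, -t, 0.5 * t]) + 0.01 * rng.normal(size=(6, 4))

# Mean-centre the columns, then decompose X = T P^T + E via SVD
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

A = 2                      # number of retained PCs
T = U[:, :A] * s[:A]       # scores (M x A), orthogonal columns
P = Vt[:A].T               # loadings (N x A), orthonormal columns
E = Xc - T @ P.T           # residual matrix (unexplained variance)

explained = s**2 / np.sum(s**2)
print(explained[:A].sum())  # fraction of variance captured by the 2 PCs
```

With these nearly collinear columns, the first PC alone captures almost all the variance, which is exactly the situation in which PCA is most useful.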
Unsupervised clustering methods can be hierarchical, where successive partitioning of the data set results in a sequence of clusters represented as a tree or dendrogram (Roggo et al., 2007). Non-hierarchical methods include Gaussian mixture models, K-means, density-based spatial clustering of applications with noise (DBSCAN), Kohonen neural networks, etc. (Lopes et al., 2004; Roggo et al., 2007).
The supervised classification methods most often used are correlation based methods, distance based methods, linear discriminant analysis (LDA), soft independent modeling of class analogy (SIMCA), and PLS discriminant analysis (PLS-DA) (Roggo et al., 2007). Some of the methods are more focused on discrimination between samples (LDA), whereas others are concerned with their similarity (SIMCA). Also, besides linear methods, non-linear classification methods such as neural networks can be used.
Correlation and distance based methods cluster data by measuring their (dis)similarity. Similarity of samples can be expressed by the correlation coefficient and/or distances (Euclidean, Mahalanobis) between samples (Massart et al., 2003).
Linear discriminant analysis (LDA) is similar to PCA in terms of feature reduction. It is a parametric method used to find optimal boundaries between classes. Analogous to PCA, a direction is sought that achieves maximum separation among different classes (Sharaf et al., 1986). Unknown samples are classified according to Euclidean distances.
K-nearest neighbors (KNN) is a non-parametric method, where an unknown sample is classified according to a class belonging to the majority of its neighbors. The neighborhood is defined by Euclidean distances between samples.
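A minimal KNN classifier following this majority-vote rule might look as follows; the two-dimensional data and class labels are hypothetical:

```python
import numpy as np

# Hypothetical 2-D feature data: two well-separated classes
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                    [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(x, X, y, k=3):
    """Classify x by majority vote among its k nearest (Euclidean) neighbours."""
    d = np.linalg.norm(X - x, axis=1)      # Euclidean distances to all samples
    nearest = y[np.argsort(d)[:k]]         # labels of the k closest samples
    return np.bincount(nearest).argmax()   # majority class

print(knn_predict(np.array([0.1, 0.0]), X_train, y_train))  # -> 0
print(knn_predict(np.array([3.1, 3.0]), X_train, y_train))  # -> 1
```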
Soft independent modeling of class analogy (SIMCA) is a parametric classification technique that is based on PCA. The data set is first divided into classes of similar samples. PCA is then performed for each class separately, resulting in a PC model (Massart and Buydens, 1988). Cross-validation is used to determine the optimal number of PCs for each class. SIMCA puts more emphasis on similarity within a class than on discrimination between classes (Roggo et al., 2007).
PLS discriminant analysis (PLS-DA) is a parametric and linear method that identifies LVs in the feature space that have maximal covariance with the response (Stahle and Wold, 1987; Roggo et al., 2007). It is a special case of PLS where the response variable is a binary vector of zeros and ones, describing the class membership of each sample in the investigated groups (Rajalahti and Kvalheim, 2011).
Among the nonlinear methods used for classification purposes, ANNs have proven to be one of the most promising. Wang et al. (2004) discussed advantages and disadvantages of multivariate discriminant analysis and neural networks as classifiers. Classification has also been performed with a probabilistic neural network (PNN), which has an exponential activation function (instead of the most commonly used sigmoid function) (Specht, 1990), and with a learning vector quantization (LVQ) neural network, related to the self-organizing map (SOM) (Kohonen, 1990).
It is recommended (Roggo et al., 2007) to use more than one classification method, since the optimal one cannot be known a priori: classification performance depends on the data being analyzed.
Multivariate analysis seeks relationships between a series of independent (explanatory) x-variables and dependent (response) y-variables. This objective can be achieved by means of a model, where the observed result, that is, the response (y), is described as a function of the x-variables (x1, x2, …, xN). The noise is left in the residual ey (Rajalahti and Kvalheim, 2011):

y = f(x1, x2, …, xN) + ey
Two main groups of multivariate regression methods are those based on MLR and the so-called FA methods. MLR methods are more easily understood and applied, since the goal is to directly correlate independent and dependent variables. FA methods, however, first require projection of the original data into a lower-dimensional space (another coordinate system for data representation), which is then followed by the correlation investigation. The main advantage of FA methods is that the factors (usually known as PCs) capture most of the data variation and can establish a more accurate x–y correlation than MLR methods. Typical MLR methods are the classical least squares method and the inverse least squares method. The most prominent FA methods are PCA (although it is, in effect, a classification technique), PCR, and PLS analysis.
MLR is one of the oldest regression methods and is used to establish linear relationships between multiple independent variables and the dependent variable (sample property) that is influenced by them. The developed model can be represented in the following way:

yj = b0 + b1x1 + b2x2 + … + bNxN + ej

where yj is the sample property, bi is the computed coefficient for independent variable xi, and ej is the error. Each independent variable is studied one after another and correlated with the sample property yj. Regression coefficients bi describe the effects of each calculated term.
Eq. 4.7 can also be written in matrix form:

y = Xb + e
If all x-variables are controlled, then discrete levels of each x-variable can be selected so as to enforce orthogonality between them and their derived interactions and squared terms. The matrix XTX then becomes a diagonal matrix and b is easily calculated. When the x-variables are not controlled or the number of x-variables exceeds the number of experiments, co-linearity arises between the x-variables. The reader is advised to compare data analysis techniques described in Chapter 3 on DoE (Section 3.2.4).
Developed models are usually estimated by least squares, whereby the sum of squared differences between the actual and predicted (by the model) values for each sample in the data set is minimized:

SSE = Σ (yi − ŷi)²

where the residual error ei is the difference between the observed and predicted values of y, yi and ŷi, respectively. The regression equation is estimated such that the total sum of squares (SST) can be partitioned into components due to regression (SSR) and residuals (SSE):

SST = SSR + SSE
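The least-squares fit and the SST = SSR + SSE partition can be illustrated with NumPy; the data below are synthetic and the true coefficients are chosen arbitrarily:

```python
import numpy as np

# Hypothetical data: y depends linearly on two x-variables plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.05 * rng.normal(size=20)

# Augment with an intercept column and solve the least-squares problem
Xa = np.hstack([np.ones((20, 1)), X])
b, *_ = np.linalg.lstsq(Xa, y, rcond=None)
y_hat = Xa @ b

SST = np.sum((y - y.mean()) ** 2)        # total sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares
SSE = np.sum((y - y_hat) ** 2)           # residual sum of squares
print(round(SSR / SST, 3))               # coefficient of determination R^2
```

With an intercept in the model, SST = SSR + SSE holds exactly, which is what makes R² = SSR/SST a meaningful fraction of explained variance.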
The inclusion of variables in a model depends on their predictive ability. The three modes of variable selection are forward, backward, and stepwise. When the correlation of a variable reaches a certain value, it is kept in the model (Martens and Naes, 1996). The forward stepwise method adds variables one by one, depending on the maximum reduction of the residual variance, whereas the backward method excludes variables one by one.
Once the MLR model is developed, its accuracy in predicting the dependent variable from multiple independent variables is assessed by the correlation coefficient between the true values and the values predicted by the MLR model. The correlation coefficient R can be calculated with the following formula:

R = Σ (yi − ȳ)(ŷi − ŷm) / √[Σ (yi − ȳ)² · Σ (ŷi − ŷm)²]

where ȳ and ŷm are the mean observed and mean predicted values, respectively.
The correlation coefficient is not reserved for MLR; it is one of the most frequently used statistical parameters for assessing the validity of a developed model, regardless of the model type. Except where x-variables are controlled in designed experimentation, measured data in pharmaceutical applications are typically multivariate and collinear, and MLR cannot be used (Rajalahti and Kvalheim, 2011). Therefore, in these instances, other techniques should be applied.
In order to improve on MLR modeling, latent variable regression (LVR) methods are used, where a new set of (latent, orthogonal) variables is calculated from the original ones, thereby reducing the dimensionality. Collinear variables can be combined and described by fewer so-called factors or LVs, which describe the underlying structure in the data (Rajalahti and Kvalheim, 2011).
PCR is a combination of PCA and MLR. Once the PCs of the analyzed data are identified, MLR is performed on the scores of the independent (predictor) variables. If only the major PCs are used, noise is significantly reduced and the error in predictions of dependent variables is low. When PCR is applied, it is important to note that the derived PCs do not necessarily directly influence the dependent properties: PCs reveal variation in the independent (predictor) data that may or may not influence the dependent (response) data. This problem is resolved by application of the partial least squares (PLS) technique.
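A PCR sketch along these lines — PCA on the predictors, then MLR on the retained scores — could look like this; the data are synthetic and deliberately collinear, and the number of retained PCs is fixed by hand rather than by cross-validation:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=30)   # deliberately collinear column
y = 2.0 * X[:, 0] + 0.05 * rng.normal(size=30)

# Centre the data, then compute PC scores of the predictor block via SVD
Xc, yc = X - X.mean(axis=0), y - y.mean()
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

A = 2                                        # retained PCs (chosen by hand here)
T = U[:, :A] * s[:A]                         # scores used as MLR regressors
q, *_ = np.linalg.lstsq(T, yc, rcond=None)   # MLR on the scores
b_pcr = Vt[:A].T @ q                         # coefficients in original variables

y_hat = Xc @ b_pcr + y.mean()
print(round(float(np.corrcoef(y, y_hat)[0, 1]), 3))
```

Because the collinear pair of columns collapses into a single dominant PC, the regression on scores stays well conditioned where plain MLR on X would be unstable.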
PLS (partial least squares, or projection onto latent structures) is a multivariate technique used to develop models based on latent variables or factors. These variables are calculated to maximize the covariance between the scores of an independent block (X) and the scores of a dependent block (Y) (Lopes et al., 2004). Both the X and Y blocks (data sets) are modeled to find the variables in the X matrix that best describe the Y matrix. In this way, variability and correlation are addressed at the same time. In the PLS method, regressions are calculated with the least squares algorithm. In comparison to other least squares algorithms (i.e. classical MLR), PLS is more robust to noise, collinearity, and high dimensionality in the data (Ronen et al., 2011). PLS is advantageous in comparison to PCR because the LVs are selected according to the covariance between the data and the investigated parameters (Roggo et al., 2007). Therefore, the main difference between PLS and PCA/PCR is that the normalized weight vector wa is calculated from the covariance between the response y and the data matrix X:

wa = XTy / ||XTy||
Scores and loadings are calculated by successive projections of the data matrix, as described for PCA. The part of X explained by a pair of PLS score and loading vectors in each step is removed before the next pair is calculated. In comparison to PCA, the weight vector is no longer equal to pa, and the loading vectors are no longer orthogonal (nor unit vectors). Score vectors are kept orthogonal, which simplifies some of the calculation steps. When applying linear PLS to nonlinear problems, the minor LVs cannot always be discarded, since they may not describe only noise. Nonlinear structures may be modeled using a combination of higher-order and lower-order LVs calculated from linear PLS, but the result of this approach can be an overfitted model that is too sensitive to noise in the modeling data (Ronen et al., 2011).
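The NIPALS-style PLS1 steps described above — weight from the X–y covariance, projection, deflation — can be sketched as follows (synthetic data; in practice the number of LVs A would be chosen by cross-validation):

```python
import numpy as np

def pls1(X, y, A):
    """Minimal NIPALS-style PLS1 on mean-centred X and y with A latent variables."""
    W, P, Q = [], [], []
    for _ in range(A):
        w = X.T @ y                      # weight from covariance of X with y
        w = w / np.linalg.norm(w)        # normalized weight vector wa
        t = X @ w                        # score vector
        p = X.T @ t / (t @ t)            # X-loading (not orthonormal in PLS)
        q = y @ t / (t @ t)              # y-loading
        X = X - np.outer(t, p)           # deflate X before the next LV
        y = y - q * t                    # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)   # regression vector b

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 6))
y = 1.2 * X[:, 0] + 0.8 * X[:, 1] + 0.05 * rng.normal(size=25)

Xc, yc = X - X.mean(axis=0), y - y.mean()    # PLS assumes centred blocks
b = pls1(Xc, yc, A=2)
r = float(np.corrcoef(yc, Xc @ b)[0, 1])
print(round(r, 3))
```

Because each weight vector points toward the direction of X most covariant with y, two LVs suffice here, whereas PCR might need more components to span the same predictive directions.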
PLS regression can also be used as a supervised classification method (Rajalahti and Kvalheim, 2011), as described for the PLS-DA method.
There are many methods derived from PCR and PLS, in order to improve and/or ease interpretation of results. Some of these methods are target projection (TP), orthogonal PLS, etc. (Rajalahti and Kvalheim, 2011).
Support vector machines (SVM) are a group of supervised learning algorithms that can be used for classification or regression purposes. The SVM algorithm is based upon statistical learning theory and the Vapnik-Chervonenkis (VC) dimension (Vapnik and Chervonenkis, 1974). The standard SVM is a binary classifier that separates inputs into two possible outputs (classes). In contrast to the previously described FA methods, where dimensionality reduction enables the finding of LVs, SVM algorithms map the data into a space of higher (even infinite) dimension, in which a separating hyperplane is sought. For classification purposes, good separation is achieved when the distance between samples belonging to different classes is large in this space. Samples that were not separable in the original space may then be distinguished in the newly created space (Roggo et al., 2010). Construction of the higher-dimensional space by SVM is based upon definition of a kernel function K(x, y), which is applied to the data in the original space (Press et al., 2007). Kernel functions normally used are linear, polynomial, radial basis function (RBF), and sigmoidal, where the latter makes the SVM algorithm equivalent to a two-layer perceptron neural network. RBF is the most often used kernel function, since it can handle cases where the relation between the class labels (the target values) and the attributes (the features of the training set) is nonlinear:

K(x, y) = exp(−γ ||x − y||²), γ > 0
SVMs are similar to neural networks, with the main difference being the way in which the weights are adjusted during training. In SVMs, weights are adjusted by solving a quadratic programming problem with linear constraints. Independent (predictor) variables are denoted as attributes, whereas a transformed attribute that is used to define the hyperplane is called a feature. The task of choosing the most suitable representation is known as feature selection. A set of features that describes one sample (i.e. a row of independent, predictor values) is called a vector. Therefore, the goal of the SVM algorithm is to find the optimal hyperplane that separates clusters of vectors in such a way that cases with one category of the target variable are on one side of the plane and cases with the other category are on the other side. The vectors at the boundary, which determine the maximal-margin hyperplane, are the support vectors (Roggo et al., 2010). The kernel function transforms the data into a higher-dimensional space to make the separation possible. Kernels operate in the input space, and the solution of the classification problem is a weighted sum of kernel functions evaluated at the support vectors (Ivanciuc, 2007).
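The RBF kernel evaluation at the heart of such a weighted sum can be illustrated directly; the two-dimensional inputs and the value of γ here are arbitrary:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix with K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    return np.exp(-gamma * d2)

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
K = rbf_kernel(X, X, gamma=0.5)
print(np.round(K, 3))   # unit diagonal; off-diagonal entries shrink with distance
```

Each entry measures similarity between two samples, so the kernel matrix plays the role of the inner product in the (implicit) higher-dimensional feature space.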
The use of ANNs as regression methods is described in more detail in Chapter 5. The following ANNs are normally used for regression analysis: the multilayered perceptron (MLP), the generalized regression neural network (GRNN), the RBF neural network (RBFNN), etc. Some authors claim that ANNs can outperform PCR and PLS methods when used for multivariate data analysis (Long et al., 1990; Gemperline et al., 1991), or that PLS-ANN models can better approximate deviations from linearity in relationships involving spectral data, compared with either PLS or PCR models (Bhandare et al., 1993), whereas others report that different PLS approaches and ANNs can give comparable results (Blanco et al., 2000).
DoE is a multivariate data analysis method, but it deals with a rather limited number of variables and samples organized by experimental design, which is why it is usually treated separately from other multivariate methods, such as PCA or PLS. Multivariate data analysis can be considered a complementary tool to DoE effect and response-surface analysis, providing additional information and confirmation of complex multivariate relationships in pharmaceutical product and process development (Huang et al., 2009). Many variables that are not included in the original experimental design can be added, and their effects analyzed using multivariate methods. Sometimes it is not possible to arrange all product/process variables systematically; a combination of DoE and multivariate methods solves this issue. DoE is described in more detail in Chapter 3.
Data organization for multivariate analysis can be a challenge. Usually, data are organized as two- and three-dimensional matrices, where rows of matrices represent samples (formulation, batch, etc.) and columns of matrices represent variables. A third dimension, which is sometimes included, represents a time point when a specific variable is measured (if time variability is of interest).
Selection of the variables to be included in a regression model is key to making accurate model predictions. Use of state-of-the-art in-process monitoring techniques in the pharmaceutical industry often results in acquisition of huge data sets that are of no use if there is no adequate technique for selection of significant variables. Different approaches to variable selection are used, and the main difference is whether one or multiple significant variables are investigated at the same time. Univariate selection is used when variables are analyzed separately from each other and is usually accompanied by t-statistics and ANOVA tests to compare sample groups. The drawback of this approach is that interactions between variables are not considered, which can prevent useful models from being developed.
Multivariate variable selection is advantageous over univariate selection, as it can capture potential variable correlation. Some of the methods have already been explained, such as determination of PLS weights based on covariances between the response and each variable (Hoskuldsson, 2001). Other methods include examination of the size of regression coefficients (Centner et al., 1996), variable importance in projection (VIP) (Eriksson et al., 2001), interval PLS (Norgaard et al., 2000), genetic algorithms (GA) (Lavine et al., 2004), etc.
Sometimes it is necessary to apply a pretreatment procedure in order to prepare the data for modeling. The purpose of pretreatment is to remove outliers and noise from the data, as well as to ease comparison of different data sets. Data pretreatment usually depends on the technique used for data acquisition. Spectroscopic techniques often require normalization, differentiation, and multiplicative scatter correction (MSC) (Geladi et al., 1985), as well as orthogonal signal correction (OSC), optimized scaling (OS), standard normal variate (SNV), first and second derivatives, de-trend correction, offset correction, etc. (Rajalahti and Kvalheim, 2011). There is no clear consensus or guideline on selection of the pretreatment method; therefore, it is often based upon experience and a trial-and-error approach. In the data pretreatment (preprocessing) stage, sufficient knowledge of the sources of variation in the data is required to ensure elimination of only genuine outliers and noise.
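As an illustration, SNV — one of the pretreatments listed above — amounts to centring and scaling each spectrum individually; a minimal sketch with synthetic spectra:

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, keepdims=True)
    return (spectra - mu) / sd

# Two hypothetical spectra differing only by multiplicative and offset scatter effects
s = np.linspace(0, 1, 50) ** 2
spectra = np.vstack([s, 2.0 * s + 0.3])
corrected = snv(spectra)
print(np.allclose(corrected[0], corrected[1]))  # -> True
```

After SNV, the two rows coincide, showing that purely multiplicative and additive scatter differences have been removed while the spectral shape is preserved.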
In all validation methods, data sets are divided into training and validation sets. A training set is used for model construction, whereas a validation set is used to test the model’s performance. Many different methods are available for testing a model’s predictive performance, but cross-validation is still the preferred one. An overview of cross-validation methods is provided in the relevant literature (Bro et al., 2008). A model for samples that are not related in time can be validated by the leave-one-out (LOO) approach, also referred to as internal validation, where one of the samples is left out to test the developed model. The procedure is repeated for each sample separately, so that the whole data set is used for model testing. If samples are time-related, then entire batches are left out for model validation to avoid overfitting (Lopes et al., 2004).
The prediction error is usually expressed as a root mean square error:

RMSE = √(Σ (yi − ŷi)² / N)

where N is the number of samples, and yi and ŷi are the experimentally obtained and predicted values for calibration samples (in the case of RMSEC) or validation samples (in the case of RMSECV). The samples used for cross-validation are not used in the model construction, therefore providing external testing (i.e. external validation) of the developed model.
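Leave-one-out cross-validation with RMSECV as the error measure can be sketched as follows; the linear data are synthetic, and an ordinary least-squares model stands in for the calibration model:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(15, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=15)

# Leave-one-out: fit on all samples but one, predict the omitted one
preds = np.empty_like(y)
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    Xa = np.hstack([np.ones((keep.sum(), 1)), X[keep]])     # add intercept
    b, *_ = np.linalg.lstsq(Xa, y[keep], rcond=None)
    preds[i] = np.concatenate([[1.0], X[i]]) @ b

rmsecv = float(np.sqrt(np.mean((y - preds) ** 2)))
print(round(rmsecv, 3))   # should sit near the 0.1 noise level
```

Because every prediction is made for a sample the model never saw, RMSECV reflects predictive rather than merely descriptive performance.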
Values predicted by the model are compared to experimentally obtained values, by using correlation coefficient (Eq. 4.16).
Other often-used measures of model suitability are the standard error of prediction (SEP) and the standard error of cross-validation (SECV) (Doherty and Lange, 2006).
The standard deviation of predicted values can be determined by the bootstrapping technique, whereby new sets of data are generated by random sampling from the original data set and the standard deviation of the ensemble of estimates is derived (Wehrens et al., 2000).
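A minimal bootstrap sketch of this idea, estimating the standard deviation of a sample mean (synthetic data; in chemometric practice the resampled statistic would be a model prediction):

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(loc=10.0, scale=2.0, size=100)   # hypothetical measurements

# Bootstrap: resample with replacement, re-estimate, take the spread of estimates
estimates = [rng.choice(data, size=data.size, replace=True).mean()
             for _ in range(500)]
se = float(np.std(estimates))
print(round(se, 2))   # should sit near 2 / sqrt(100) = 0.2
```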
Note that pure application of chemometrics, that is, multivariate analysis tools, does not necessarily improve knowledge of the problem (process, formulation) being studied. Firstly, careful assessment needs to be made to decide upon the optimal technique/method to be used for analysis. Sometimes the simplest method is enough, so it is not necessary to introduce more complex tools. Especially when chemometrics is applied in pharmaceutical manufacturing and/or control, care has to be taken that each step of development and implementation of chemometric tools is analyzed and discussed (Doherty and Lange, 2006). Also, it is often highlighted that a reference model is needed to confirm results obtained by chemometric analysis, which can significantly increase the need for resources.
One of the most often encountered pitfalls in the application of chemometrics (and any other modeling technique) is overfitting of the model. This means that the model with the apparently highest correlation obtained during development is chosen with no independent testing on previously unseen data. Many of the traditional statistical tests assume that the data follow a normal distribution, which is not always the case in real-life applications (Rajalahti and Kvalheim, 2011). Overfitting is especially likely when the number of variables is large compared to the number of samples (Brereton, 2006).
An issue that is often neglected, but that can be a source of serious misunderstanding, is the discrepancy in terminology used for algorithms and methods in different software packages. We have to carefully analyze all the details of the methods before comparing results obtained with the same methodology but different software tools.
Review of pharmaceutical applications, where advanced characterization techniques are used in combination with multivariate data analysis methods, is provided in relevant references (Gendrin et al., 2008; De Beer et al., 2011; Gordon and McGoverin, 2011; Rajalahti and Kvalheim, 2011).
Tablets of identical formulation, but produced at different sites, were analyzed before and after storage using NIR spectroscopy (NIRS). PCA was performed on the NIR spectra; the score plot confirmed statistical differences between the production sites, and the loadings identified the key wavelengths and showed that the excipients were responsible for the differences (Roggo et al., 2004). Similarly, different production sites of various proprietary tablets were compared. The PCA score plots showed that NIR spectra of tablets originating from different sites of manufacture often gave rise to statistically different populations. PCA loadings indicated that the differences were related to moisture content and excipients (Yoon et al., 2004).
NIRS was used to detect and identify changes in uncoated and coated tablets in response to pilot-scale changes in process parameters during melt granulation, compression, and coating (Roggo et al., 2005). It was shown that NIRS and PCA were capable of separating batches produced with different melt granulation parameters and could differentiate between cores compressed with different compression forces. PLS regression was used to predict production sample coating times and dissolution rates from the NIRS data.
The accuracy of linear and quadratic discriminant analysis (LDA and QDA) and of the KNN method was evaluated on tablet and capsule data sets, in order to classify samples for clinical studies (Candolfi et al., 1998).
The cascade correlation neural (CCN) network was used to classify qualified, unqualified, and counterfeit sulfaguanidine pharmaceutical powders (Cui et al., 2004).
PCA was applied to pharmaceutical powder compression (Roopwani and Buckner, 2011). A solid fraction parameter and a mechanical work parameter representing irreversible compression behavior were determined as functions of the applied load. The first principal component (PC1) showed loadings for the solid fraction and work values that agreed with changes in the relative significance of plastic deformation to consolidation at different pressures. The utility of PC1 in understanding deformation was extended to binary mixtures using a subset of the original materials.
Raman spectroscopy was used for the identification of tablets (Roggo et al., 2010). Twenty-five product families of tablets were included in the spectral library, and a non-linear classification method, the SVM, was employed. Two calibrations were developed in a cascade: the first identifies the product family, while the second specifies the formulation. A product family comprises different formulations that contain the same active pharmaceutical ingredient (API) but in different amounts. The correlation with the reference spectra and the control of the API peak positions in the tablet spectra were used as acceptance criteria to confirm the results provided by the SVM supervised classification. The SVM method used for determination of the product family is a hard classifier, meaning that a class will be predicted even if the unknown sample is not present in the calibration library. The strategy was successfully validated using unseen samples.
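A minimal sketch of such a cascaded SVM classification, with entirely synthetic "spectra" and hypothetical family/formulation signatures (scikit-learn's SVC; the variable layout is an assumption for illustration only):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Synthetic "spectra": 2 hypothetical product families, each with
# 2 formulations (same API, different amount), encoded as offsets
# on different variables.
def make_spectra(family, formulation, n=15):
    x = rng.normal(scale=0.1, size=(n, 30))
    x[:, family * 5] += 2.0                      # family signature
    x[:, 20 + formulation] += 1.0 + formulation  # dose signature
    return x

X, fam, form = [], [], []
for f in (0, 1):
    for d in (0, 1):
        X.append(make_spectra(f, d))
        fam += [f] * 15
        form += [d] * 15
X, fam, form = np.vstack(X), np.array(fam), np.array(form)

# First calibration: identify the product family.
family_clf = SVC(kernel="rbf").fit(X, fam)

# Second calibration: one model per family specifies the formulation.
form_clf = {f: SVC(kernel="rbf").fit(X[fam == f], form[fam == f])
            for f in (0, 1)}

def classify(spectrum):
    f = int(family_clf.predict(spectrum[None, :])[0])
    d = int(form_clf[f].predict(spectrum[None, :])[0])
    return f, d

# SVC is a hard classifier: it always returns some class, which is why an
# independent acceptance criterion (e.g. correlation with a reference
# spectrum) is needed for samples absent from the calibration library.
pred = classify(make_spectra(1, 0, n=1)[0])
```

The final comment captures the key caveat from the cited study: a hard classifier must be paired with an acceptance check before its answer is trusted.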
An interesting approach was presented (Buhse et al., 2005) for classification of over-the-counter (OTC) creams and lotions, by performing PCA with measurements for viscosity, specific gravity, loss on drying (LOD), and surface tension.
MLR and PLS were used in conjunction with NIRS to predict the hardness of tablets (Morisseau and Rhodes, 1997).
NIRS was applied to determine physical (tablet hardness) and chemical parameters (active principle and content uniformity) in intact individual pharmaceutical tablets. Quantification was performed using the PLS method (Blanco and Alcalá, 2006).
NIRS was used to measure the percentage of drug dissolved from a series of tablets compacted at different compression forces. Linear, quadratic, cubic, and partial least squares regression techniques were used to determine the relationship between the dissolution profile data and the NIR spectra. Calibration curves using quadratic and cubic regression gave higher correlation coefficients than linear regression (Donoso and Ghaly, 2004).
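The comparison of calibration-curve orders can be sketched with synthetic, hypothetical data (numpy only): when the underlying relationship is curved, a straight-line calibration underfits and the higher-order fits score better.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical calibration data: percentage of drug dissolved vs. a
# single NIR response, with mild curvature in the relationship.
x = np.linspace(0.0, 1.0, 25)
y = 20 + 60 * x - 25 * x**2 + rng.normal(scale=1.0, size=x.size)

def r_squared(degree):
    """Coefficient of determination of a polynomial calibration curve."""
    pred = np.polyval(np.polyfit(x, y, degree), x)
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

r2 = {name: r_squared(deg)
      for name, deg in [("linear", 1), ("quadratic", 2), ("cubic", 3)]}

# With genuine curvature in the data, the quadratic and cubic calibration
# curves fit better than the straight line.
```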
Diffuse reflectance NIRS was used to analyze the particle size of powder samples (Berntsson et al., 1998). PCA was first performed on the data set obtained for each sieve fraction, followed by removal of outliers. For each measured NIR wavelength, an exponential function was fitted to the experimental data by non-linear least squares regression.
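A per-wavelength exponential fit of this kind can be sketched with scipy's non-linear least squares routine; the data, the particle-size range, and the three-parameter exponential form used here are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)

# Hypothetical data for one wavelength: diffuse reflectance response vs.
# median particle size of a sieve fraction, decaying exponentially.
size = np.linspace(50.0, 500.0, 20)        # assumed particle sizes (um)
response = 1.2 * np.exp(-0.004 * size) + 0.3
response += rng.normal(scale=0.005, size=size.size)

def model(d, a, b, c):
    """Exponential function fitted per wavelength."""
    return a * np.exp(-b * d) + c

# Non-linear least squares fit of the exponential parameters.
params, _ = curve_fit(model, size, response, p0=(1.0, 0.01, 0.0))
a_hat, b_hat, c_hat = params
```

In the cited workflow, such a fit is repeated independently for every measured wavelength.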
Numerous examples of various regression techniques applied in combination with NIRS for the quantification of drugs in pharmaceuticals, polymorphism detection, and moisture content determination are provided in the relevant references (Roggo et al., 2007).
Powder blending was monitored and analyzed using NIRS (Sekulic et al., 1998). PCA, dissimilarity calculations, and block standard deviations were calculated in order to assess blend homogeneity. Another approach to monitoring powder blend homogeneity using NIRS relied on PLS (projection to latent structures) regression (Berntsson et al., 2002). In this method, many collinear spectral variables are transformed into a small number of new orthogonal variables called PLS components, which contain the systematic information in the spectra that gives the best regression model.
The influence of critical granulation parameters (flow rate of the granulation liquid and the granulation end-point moisture content) on median particle size was studied using MLR (Rantanen et al., 2000). The regression model for two independent variables was first presented in full second-order polynomial form:

y = a + b·x1 + c·x2 + d·x1^2 + e·x2^2 + f·x1·x2

where a–f are the model coefficients. The model was then simplified with a backward selection technique: terms were removed one by one, so that only the significant terms were included in the final model.
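Backward selection on such a second-order model can be sketched as follows; the data, coefficient values, and the p ≤ 0.05 threshold are assumptions for illustration (the x2^2 and x1·x2 terms are deliberately absent from the simulated relationship, so the procedure should discard them).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Hypothetical granulation data: liquid flow rate (x1) and end-point
# moisture (x2) vs. median particle size.
n = 60
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(0.0, 1.0, n)
y = 100 + 40 * x1 + 25 * x2 + 15 * x1**2 + rng.normal(scale=1.0, size=n)

terms = {"1": np.ones(n), "x1": x1, "x2": x2,
         "x1^2": x1**2, "x2^2": x2**2, "x1*x2": x1 * x2}

def p_values(names):
    """OLS fit on the named terms; two-sided p-value for each coefficient."""
    X = np.column_stack([terms[t] for t in names])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    dof = n - len(names)
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return dict(zip(names, 2 * stats.t.sf(np.abs(beta / se), dof)))

# Backward selection: repeatedly drop the least significant term
# (intercept always kept) until every remaining term has p <= 0.05.
kept = list(terms)
while True:
    p = {k: v for k, v in p_values(kept).items() if k != "1"}
    if not p or p[max(p, key=p.get)] <= 0.05:
        break
    kept.remove(max(p, key=p.get))
```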
The prediction capability of multivariate methods (PLS and ANN) was evaluated for in-line moisture measurement during fluid bed granulation (Rantanen et al., 2001). The back-propagation (BP) neural network approach was found to have greater predictive power on the independent test data.
The PLS method was also used for quantitative analysis of film coating in fluidized bed process by in-line NIRS (Andersson et al., 2000).
The application of NIR for real-time release in tablet manufacturing, on the basis of multivariate analysis, was presented (Skibsted et al., 2007). The authors compared statistical process control with regression models. In the statistical model, new measurements are compared statistically with historical data from batches run under normal operating conditions that yielded good quality products. Regression models were developed for instances where a quality parameter (of an intermediate or the final product) was available. The drying process in a fluid bed was one of the processes studied: NIR spectra were automatically collected every half minute with a process reflectance probe inserted into the reactor. As a reference method, LOD was determined as % weight LOD for samples collected in close proximity to the NIR probe port. The spectrum recorded during the removal of a sample was assigned to the corresponding LOD reference value. A PLS model with 3 LVs was developed using 28 calibration spectra from 12 batches.
Many different pre-processing methods were investigated, and wavelength selection routines were also applied, in order to minimize non-relevant spectral variation and improve the model statistics. A Savitzky-Golay first derivative with a second-order polynomial fit over 17 spectral points was selected as the optimal pre-processing method. The first 3 LVs explained 99.08% of the variation in X and 98.70% of the variation in Y. With 3 LVs, the RMSEC was 0.37 and the RMSECV was 0.53. This example demonstrated how a regression model between in-line NIR spectra and LOD provides monitoring capability for the fluid-bed drying process, and the model could be implemented for real-time control of the drying. Another regression model was developed in the study to correlate process variables with a final quality characteristic of the product, the mean disintegration time of the tablets.
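The selected pre-processing step (Savitzky-Golay first derivative, second-order polynomial, 17-point window) maps directly onto scipy's filter; the "spectra" below are synthetic stand-ins with an additive baseline offset, a typical non-relevant variation that the derivative removes.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(7)

# Hypothetical raw NIR spectra: one common absorption peak plus a varying
# additive baseline offset per sample (a typical nuisance effect).
wl = np.linspace(0.0, 1.0, 200)
peak = np.exp(-((wl - 0.5) ** 2) / 0.01)
spectra = peak[None, :] + rng.uniform(0.0, 0.5, (5, 1))
spectra += rng.normal(scale=0.001, size=spectra.shape)

# Savitzky-Golay first derivative with a 2nd-order polynomial fit over a
# 17-point window, matching the settings selected in the cited study.
d1 = savgol_filter(spectra, window_length=17, polyorder=2, deriv=1, axis=1)

# The derivative removes the additive baseline, so the non-relevant
# sample-to-sample spread collapses compared with the raw spectra.
raw_spread = spectra.std(axis=0).mean()
d1_spread = d1.std(axis=0).mean()
```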
Two PLS models were developed (models I and II) using process variables and NIR spectra as predictors of tablet disintegration time. The NIR spectra consisted of more than 2250 spectral variables; in order to fuse a few process variables with thousands of spectral variables, the NIR spectra were first decomposed using PCA and the mean-centered scores were then fused with the process variables. The scores and process variables were auto-scaled, and a PLS model was established between the predictors and the mean disintegration time.
Process variables used as predictors were mixing time and granulation liquid flow (model I), and drying temperature, drying time, and upper punch force during tableting (model II). Model I used the first three PCs of the PCA model of the average NIR spectrum from mixing, whereas model II used the first three scores from three PCA models of the average NIR spectra of granulation, the end of the drying process, and the glidant mixing step. The root mean squared error of cross-validation (RMSECV) of model II was 35.0 with 1 LV, with 85.4% of the Y variation explained compared with 61.5% for model I. Thus, by adding more process information, the prediction error decreased and a better model was established. The prediction error of model II was also close to the standard deviation of the reference analysis (~30 s), so it might be difficult to improve the model further using the existing data. Both models I and II can be used for process control, for example when adjusting the granulation liquid flow (model I) or the upper punch force during tableting (model II), since their influence on the tablet disintegration time is quantified. Of course, the definition (and adjustment) of optimal processing parameters can never depend on just one quality attribute of an intermediate or final product.
NIR was used for the quantification of the API and excipients of a pharmaceutical formulation, accompanied by PCA and PLS analysis (Sarraguga and Lopes, 2009). The developed method used laboratory-scale samples as calibration samples and pilot-scale samples (powders and tablets) as test samples. It was concluded that using laboratory-scale samples to construct the calibration set is an effective way to ensure concentration variability in the development of calibration models for industrial applications. Furthermore, the optimal validation approach and the number of samples needed for successful validation were studied.
Univariate and multivariate methods for processing Raman chemical images were compared and used for the analysis of pharmaceutical tablets (Šašić et al., 2004; Šašić, 2007). The study showed that the quality of the compositional images was improved by the use of multivariate techniques. Furthermore, it was demonstrated that some of the less important LVs (e.g. the 8th or 12th LV/PC) can carry information about excipients present at low concentration in a pharmaceutical formulation (e.g. magnesium stearate).
Possible sources of variation in a pharmaceutical granulation process were investigated using the PCA method (Sochon et al., 2010).
A dissolution testing system for extended release tablets was validated using multivariate analysis (Gottfries et al., 1994).
An interesting area of chemometrics application is the biotechnology industry. A review of chemometrics in bioprocess engineering and PAT applications has been provided (Lopes et al., 2004). Chemometrics was used for the analysis of pyrolysis mass spectrometry data in α2-interferon production (McGovern et al., 1999). The use of chemometrics in the development of an in-line monitoring system for bioprocessing has been described (Roychoudhury et al., 2006). Multivariate analysis tools for on-line monitoring of bioprocesses have also been reported (Amigo et al., 2008; Schenk et al., 2007). Ways to avoid pitfalls with chemometrics in the pharmaceutical and biotech industries have likewise been elaborated (Doherty and Lange, 2006).
Amigo, J.M., Surribas, A., Coello, J., Montesinos, J.L., Maspoch, S., Valero, F. On-line parallel factor analysis. A step forward in the monitoring of bioprocesses in real time. Chemometr. Intell. Lab.. 2008; 92(1):44–52.
Andersson, M., Folestad, S., Gottfries, J., Johansson, M.O., Josefson, M., Wahlund, K.G. Quantitative analysis of film coating in a fluidized bed process by in-line NIR spectrometry and multivariate batch calibration. Anal. Chem.. 2000; 72:2099–2108.
Bhandare, P., Mendelson, Y., Peura, R.A., Janatsch, G., Kruse-Jarres, J.D., et al. Multivariate determination of glucose in whole blood using partial least-squares and artificial neural networks based on mid-infrared spectroscopy. Appl. Spectrosc.. 1993; 47(8):1214–1221.
Blanco, M., Coello, J., Iturriaga, H., Maspoch, S., Pagès, J. NIR calibration in non-linear systems: different PLS approaches and artificial neural networks. Chemometr. Intell. Lab.. 2000; 50(1):75–82.
Blanco, M., Alcalá, M. Content uniformity and tablet hardness testing of intact pharmaceutical tablets by near infrared spectroscopy: a contribution to process analytical technologies. Anal. Chim. Acta. 2006; 557:353–359.
Brereton, R.G. Consequences of sample size, variable selection, and model validation and optimisation, for predicting classification ability from analytical data. Trend. Anal. Chem.. 2006; 25:1103–1111.
Candolfi, A., De Maesschalck, R., Massart, D.L., Hailey, P.A., Harrington, A.C.E. The influence of data pre-processing in the pattern recognition of excipients near-infrared spectra. J. Pharmaceut. Biomed.. 1999; 21(1):115–132.
Cui, X., Zhang, Z., Ren, Y., Liu, S., Harrington, P.B. Quality control of the powder pharmaceutical samples of sulfaguanidine by using NIR reflectance spectrometry and temperature-constrained cascade correlation networks. Talanta. 2004; 64(4):943–948.
De Beer, T., Burggraeve, A., Fonteyne, M., Saerens, L., Remon, J.P., Vervaet, C. Near-infrared and Raman spectroscopy for the in-process monitoring of pharmaceutical production processes. Int. J. Pharm.. 2011; 417:32–47.
Gottfries, J., Ahlbom, J., Harang, V., Johansson, E., Josefson, M., et al. Validation of an extended release tablet dissolution testing system using design and multivariate analysis. Int. J. Pharm.. 1994; 106:141–148.
Guenette, E., Barrett, A., Kraus, D., Brody, R., Harding, Lj, Magee, G. Understanding the effect of lactose particle size on the properties of DPI formulations using experimental design. Int. J. Pharm.. 2009; 380:80–88.
Huang, J., Kaul, G., Cai, C., Chatlapalli, R., Hernandez-Abad, P., et al. Quality by design case study: An integrated multivariate approach to drug product and process development. Int. J. Pharm.. 2009; 382:23–32.
Lourenço, V., Herdling, T., Reich, G., Menezes, J.C., Lochmann, D. Combining microwave resonance technology to multivariate data analysis as a novel PAT tool to improve process understanding in fluid bed granulation. Eur. J. Pharm. Biopharm.. 2011; 78:513–521.
McGovern, A., Ernill, R., Kara, B.V., Kell, D.B., Goodacre, R. Rapid analysis of the expression of heterologous proteins in Escherichia coli using pyrolysis mass spectrometry and Fourier transform infrared spectroscopy with chemometrics: application to α2-interferon production. J. Biotechnol.. 1999; 72(3):157–168.
Norgaard, L., Saudland, A., Wagner, J., Nielsen, J.P., Munck, L., Engelsen, S.B. Interval partial least squares regression (iPLS): A comparative chemometric study with an example from near infrared spectroscopy. Appl. Spectrosc.. 2000; 54:413–419.
Rantanen, J., Antikainen, O., Mannermaa, J.P., Yliruusi, J.K. Use of the Near-Infrared Reflectance Method for measurement of moisture content during granulation. Pharm. Dev. Technol.. 2000; 5(2):209–217.
Rantanen, J., Räsänen, E., Antikainen, O., Mannermaa, J.P., Yliruusi, J.K. In-line moisture measurement during granulation with a four-wavelength near-infrared sensor: an evaluation of process-related variables and a development of non-linear calibration model. Chemometr. Intell. Lab.. 2001; 56:51–58.
Roggo, Y., Jent, N., Edmond, A., Chalus, P., Ulmschneider, M. Characterizing process effects on pharmaceutical solid forms using near-infrared spectroscopy and infrared imaging. Eur. J. Pharm. Biopharm.. 2005; 61:100–110.
Roggo, Y., Chalus, P., Maurer, L., Lema-Martinez, C., Edmond, A., Jent, N. A review of near infrared spectroscopy and chemometrics in pharmaceutical technologies. J. Pharmaceut. Biomed.. 2007; 44:683–700.
Ronen, D., Sanders, C.F.W., Tan, H.S., Mort, P.R., Doyle, F.J., III. Predictive dynamic modeling of key process variables in granulation processes using partial least squares approach. Ind. Eng. Chem. Res.. 2011; 50:1419–1426.
Šašić, S., Clark, D.A., Mitchell, J.C., Snowden, M.J. A comparison of Raman chemical images produced by univariate and multivariate data processing – a simulation with an example from pharmaceutical practice. Analyst. 2004; 129:1001–1007.
Schenk, J., Marison, I.W., Stockar, U. Simplified Fourier-transform mid-infrared spectroscopy calibration based on a spectra library for the on-line monitoring of bioprocesses. Anal. Chim. Acta. 2007; 591(1):132–140.
Sekulic, S.S., Wakeman, J., Doherty, P., Hailey, P.A. Automated system for the on-line monitoring of powder blending processes using near-infrared spectroscopy. Part II. Qualitative approaches to blend evaluation. J. Pharmaceut. Biomed.. 1998; 17:1285–1309.
Yoon, W.L., Jee, R.D., Charvill, A., Lee, G., Moffat, A.C. Application of near-infrared spectroscopy to the determination of the sites of manufacture of proprietary products. J. Pharmaceut. Biomed.. 2004; 34(5):933–944.