Method Validation

Key Requirements

Intermediate Precision Determination

Estimation of Bias

Limits of Detection (LOD)

Quality Control and Detection Limit Maintenance

Key Requirements


There can be considerable confusion as to what is actually required of a laboratory to demonstrate that it has implemented a method properly so as to obtain reliable results. Much of this confusion arises because of some unfortunate wording in ISO 17025. When section 5.4 of the standard, which deals with methods, refers to validation it is only in the context of validation of non-standard methods. Clause 5.4.5.2 is an example. Many laboratories have been tempted to interpret this to mean that standard methods require no ‘validation’ and, if applied exactly as published and within their intended scope, can simply be assumed to give reliable results. This is blatantly absurd and will certainly not be accepted by any reputable accreditation body. Skills of staff and local instrument performance are obvious factors which might compromise a method even if it is applied exactly as published.

 

The problem really arises because the term ‘validation’ is used with a variety of shades of meaning. As applied in ISO 17025 it is clearly meant to refer to the extensive work needed to be done when a completely new test method is developed. Such a method will need full validation to show its scope and limitations of application, to establish all relevant performance characteristics and to show it is robust in application under differing circumstances. In the case of standard methods, which have already undergone this full validation as part of the exercise to have them accepted as standards, a laboratory can take much of the validation for granted. To take only one example; provided the test method is applied to samples within the scope of the published method it can be assumed that the method is suitable for that type of sample.

 

In effect the laboratory can assume that a standard method is fit for purpose in general terms but will still need to carry out an exercise to confirm that the method is working properly as implemented and to determine those performance characteristics which depend on local circumstances. This much more restricted exercise is also often called validation. Undoubtedly this restricted investigation of a method is not what ISO 17025 means by the term ‘validation’. It is, therefore, quite wrong to interpret the standard’s implication that validation is not required for standard methods as meaning that nothing in the way of confirming their effective implementation is required. It is hoped that the next edition of ISO 17025 will be explicit as to what is required of laboratories in respect of ‘validating’ their implementation of standard methods. In any event, at present accreditation bodies will normally provide guidance as to what they require in this respect.

 

The minimum requirements for establishing satisfactory performance of standard methods by the laboratory will be as follows.

 

  • The intermediate precision of the method covering typical sample types and the range of measurement. Typically a method is investigated at two points in the measurement range, at approximately 20% and 80% of the calibrated range, and for at least two sample types.

  • The bias of the method needs to be investigated by analysis of samples for which the result is known to within a confirmed level of uncertainty.

  • If the method is concerned with establishment of levels of, for example, a toxic residue, then the limit of detection and limit of quantitation will be required.

The vast majority of the methods applied in textile and leather testing are well established, published, standard methods, many of them with international application. Hence the restricted type of validation is all that is required. Often the terms ‘verification’ or ‘confirmation of performance characteristics’ are applied to this exercise to distinguish it from full validation of novel methods.


Intermediate Precision Determination


Sample Requirements

In order to determine intermediate precision a sample which is homogeneous in the property being measured is required. The test can then be repeated with the same, or usually replicate, samples so as to obtain an estimate of the standard deviation of the test results in routine use.

 

In textile and leather testing there are actually two types of sample involved in different tests.

 

  • Samples used in physical testing; these samples are normally in the form of test pieces prepared from a bulk and are destroyed during the test. In this case precision can only be estimated using test pieces prepared to be as consistent as possible in the relevant property. Inevitably the resulting intermediate precision will incorporate any element of inhomogeneity in the bulk from which the test piece is prepared.

  • Samples used to determine contaminants or residues, e.g. chromium(VI), formaldehyde; in this case a homogenised bulk sample can be prepared, e.g. by cryogenic grinding of a leather. If samples with appropriate levels of the target analytes are not available then these can be generated by spiking of samples with low or zero natural levels of the target. Samples of powdered materials generated for precision studies must be carefully checked for homogeneity before use.

The strategy for obtaining test pieces from bulk samples so as to have a maximum possibility of homogeneity between the test pieces needs to be carefully planned. Suitable samples can be generated by preparing an appropriate number of test pieces from a single sheet of leather or fabric.

 

  • Use sample types which reflect the range of normal application of the method and which also reflect two points in the normal calibration range. If there is a wide range of types then more than one sample type may be needed to reflect the spectrum of matrices routinely tested.

  • Choose a sheet or fabric which is as uniform as possible. Tightly woven high quality fabrics are the best choice for textiles and for leathers thinner sheets intended for garment or upholstery rather than footwear are preferable. If possible the size should be sufficient for 50 to 60 test pieces. Condition the sheet for 24 hours as for normal samples.

  • Cut as many test pieces as possible from the centre of the piece. Avoid using the extreme edges of the piece and any selvedge which might be atypical.

  • Select adjacent test pieces from the centre of the piece. The objective is to take test pieces which are most likely to be the same in properties.

  • In the case of yarns take all of the samples from consecutive lengths in the centre of a large reel or cone.

Organisation and Conduct of Precision Estimate


The interest is in the precision of the method when used under the conditions of the routine work of the laboratory, the intermediate precision.

This means that the precision study should reflect the routine working as closely as possible so it should use the full range of operators authorised for the method and be spread over a period of at least two weeks.

The estimate should also attempt to cover the full numerical range of the method. Normally this should be taken as meaning that a minimum of two materials are to be run. One should be at approximately 20% of the calibration range and the other at approximately 80%.

An important point to note is that the separate precision determinations must reflect all of the processes involved in the test. For example, if normal testing requires that an instrument be standardised each day then the precision estimate must be carried out so that each sample is run on a different instrument standardisation. Samples run on the same standardisation are likely to underestimate the real variability since any effect of re-standardisation is omitted. The data from the precision study should be processed to give a mean and standard deviation.

 

Number of Replicates


In a precision determination replicate samples need to be tested so the inevitable question arises as to what is a reasonable number of replicates to work with. The object of the exercise is to determine the mean and standard deviation for the method, or in statistical terminology, the population mean and population standard deviation. In practice we can never do this since we would have to measure an infinite number of samples so we estimate these values by repeat testing of a finite number of samples.

 

The more samples we test, the better the measured values estimate the population mean and standard deviation. However, an examination of the relationship between the number of samples tested and the uncertainty in these estimates shows that a minimum of seven replicates is realistically required and that there is little improvement beyond ten to fifteen replicates.

 

On this basis routine method verification should use at least seven replicates and no more than eleven.
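
The diminishing return from extra replicates can be illustrated with a short Python sketch. It assumes normally distributed results and uses the common approximation that the relative standard error of an estimated standard deviation is 1/√(2(n − 1)); the function names are illustrative only and are not taken from any standard.

```python
import math
import statistics


def summarise_replicates(results):
    """Return (mean, sample standard deviation) of a list of replicate results."""
    return statistics.mean(results), statistics.stdev(results)


def relative_uncertainty_of_sd(n):
    """Approximate relative standard error of a standard deviation
    estimated from n normally distributed replicates."""
    if n < 2:
        raise ValueError("at least two replicates are needed")
    return 1.0 / math.sqrt(2.0 * (n - 1))


# Show how slowly the estimate of s improves as replicates are added.
for n in (3, 5, 7, 11, 15, 30):
    rel = relative_uncertainty_of_sd(n)
    print(f"n = {n:2d}: s is known to roughly +/-{rel:.0%}")
```

Under this approximation the gain from seven to eleven replicates is modest, and the gain from eleven to fifteen smaller still, which is consistent with the recommendation above.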


Estimation of Bias

Determination of Bias


Using Reference Materials

In this section the term reference material is used to mean any material for which we have an established value for the test parameter and with a quoted uncertainty. Wherever possible a formally certified reference material should be used but this is not always a possibility. Indeed certified reference materials in leather and textile testing are not widely available.

 

The exercise of checking for bias of a method in practical terms is similar to the determination of precision. Indeed if precision is determined using a reference with known values of the parameter being measured then the same data set can yield precision and bias.

 

The guidelines given for precision determination should be followed and seven to eleven measurements made on the reference. This will yield a mean value and standard deviation which must be compared with the quoted values for the reference.

 

In making this comparison we need to take into account our own precision of measurement and the quoted uncertainty in the value for the reference material. A method can be considered accurate, i.e. free from bias, if the following criterion is met:-

-2 × √(μref² + s²) ≤ (x̄ - c) ≤ +2 × √(μref² + s²)

Where the terms have the following meanings:-

μref Is the standard uncertainty quoted for the reference material. Most certificates will quote the 95% confidence interval. This should be divided by 2 to get the standard uncertainty.

s Is the intermediate precision which you have determined for your method expressed as a standard deviation.

x̄ Is the mean of the measurements made on the reference material by you.

c Is the certified or agreed value of the parameter in the reference material.

Put in words, if the difference between your mean value and the certified value is less than or equal to twice the combined uncertainty of your measurement and the certified value then there is no evidence for bias at the 95% confidence level.

 

Example


A method for the determination of tensile strength of fibres has been demonstrated to have an intermediate precision of ±3 MPa expressed as a standard deviation. A certified reference sample is available with a certified value of 203 ± 0.5 MPa. The ±0.5 is stated to be a 95% confidence interval.

Repeat testing of the reference material returns a mean value of 199 MPa.

  • The standard uncertainty for the certified value is 0.5/2 = 0.25 MPa.

  • Hence the combined uncertainty is √(0.25² + 3²) = 3.01 MPa and the comparison term is 2 × 3.01 = 6.02 MPa.

  • The difference between the measured value and the certified value is (199 - 203) = -4 MPa.

  • The criterion shows that there is no evidence of significant bias in the method since clearly
    -6.02 ≤ -4 ≤ +6.02
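
The arithmetic of this example can be checked with a minimal Python sketch. The function name `bias_check` and its parameters are illustrative assumptions; the calculation simply combines the two standard uncertainties in quadrature and applies a coverage factor of 2, as in the worked figures above.

```python
import math


def bias_check(mean, certified, s, u_ref, k=2.0):
    """Compare a measured mean with a certified value.

    mean      : mean of the laboratory's measurements on the reference
    certified : certified or agreed value of the reference
    s         : intermediate precision (standard deviation)
    u_ref     : standard uncertainty of the certified value
    k         : coverage factor (2 for approximately 95% confidence)

    Returns (difference, limit, no_evidence_of_bias).
    """
    combined = math.sqrt(u_ref ** 2 + s ** 2)
    limit = k * combined
    difference = mean - certified
    return difference, limit, abs(difference) <= limit


# Figures from the tensile strength example (MPa).
u_ref = 0.5 / 2  # the certificate quotes a 95% interval, so divide by 2
diff, limit, ok = bias_check(mean=199.0, certified=203.0, s=3.0, u_ref=u_ref)
print(f"difference = {diff:+.2f} MPa, limit = +/-{limit:.2f} MPa, "
      f"no evidence of bias: {ok}")
```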

The following points need to be noted:-

 

  • The minimum bias which can be detected is limited to twice the standard uncertainty of the reference material so it is important to choose a reference material with as small an uncertainty as possible.

  • When carrying out the measurements on the reference material you must ensure that the method is delivering the precision that you have determined at verification. This means applying normal quality control procedures and rejecting any measurements of the reference which fail your normal quality criteria.

  • If your demonstrated precision depends on the value of the parameter, remember to use a value for s which is typical of the range of values in which the reference material lies.

Using Inter-Laboratory Comparisons


Where there are no reference materials with universal acceptance, a common situation in textiles and leather testing, an alternative approach for investigating bias is to send test items from a range of typical samples to several other laboratories who make the same measurement on a regular basis and then to compare data.

 

Ideally two samples with different values of the measurand for each matrix type typically tested should be sent for comparison.

 

The number of laboratories used should be as large as possible but in practice at least two will be required. If one laboratory returns a result which does not agree with yours it is impossible to resolve the conflict without data from another laboratory which will, hopefully, be consistent with one of the previous results.

 

The data required from the other laboratories should be their measurement result plus their estimate of the uncertainty of that measurement. These figures can then be used in the equation:-

-2 × √(μlab² + s²) ≤ (x̄ - clab) ≤ +2 × √(μlab² + s²)

This is essentially the same equation as used to determine whether the value obtained for a reference material was unbiased except that the certified parameters have been replaced by values supplied by the collaborating laboratories. The terms have the following meanings:-

μlab Is the standard uncertainty quoted for their measurement by the collaborating laboratory. The laboratory may supply the 95% confidence interval. If so this should be divided by 2 to get the standard uncertainty. See reference A.

s Is the intermediate precision which you have determined for your method expressed as a standard deviation.

x̄ Is the mean of the measurements made by you on the sample submitted.

clab Is the value reported by the collaborating laboratory.

 

You must, of course, carry out a separate calculation for each of the collaborating laboratories and determine whether you agree with them all. If there are one or more laboratories with which you do not agree it is a good idea to carry out the above calculation for these laboratories in comparison with all of the other participants to determine whether the laboratories with which you have a disagreement are generally outwith the consensus.
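
A minimal Python sketch of this laboratory-by-laboratory comparison follows. All figures and laboratory names are hypothetical and are included only to show the pattern of calculation; the helper `agrees` is an illustrative name, and it applies the same factor-of-2 test on the combined standard uncertainty described above.

```python
import math


def agrees(value_a, value_b, u_a, u_b, k=2.0):
    """True if two results agree within k times their combined standard uncertainty."""
    return abs(value_a - value_b) <= k * math.sqrt(u_a ** 2 + u_b ** 2)


# Hypothetical data: your own mean and intermediate precision,
# then (reported value, standard uncertainty) for each collaborating laboratory.
own_mean, own_s = 52.0, 1.5
labs = {"Lab A": (53.1, 1.2), "Lab B": (51.0, 1.0), "Lab C": (58.5, 1.1)}

# Step 1: compare your result with each laboratory in turn.
disagreeing = []
for name, (value, u_lab) in labs.items():
    ok = agrees(own_mean, value, own_s, u_lab)
    print(f"{name}: {'agrees' if ok else 'DISAGREES'} with your result")
    if not ok:
        disagreeing.append(name)

# Step 2: compare any disagreeing laboratory with the other participants
# to see whether it is generally outside the consensus.
for name in disagreeing:
    value, u_lab = labs[name]
    for other, (other_value, other_u) in labs.items():
        if other != name:
            ok = agrees(value, other_value, u_lab, other_u)
            print(f"{name} vs {other}: {'agrees' if ok else 'disagrees'}")
```

In this invented data set the third laboratory disagrees with every other participant, which would suggest that its result, rather than yours, is the anomalous one.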

 

The overall pattern of agreement and disagreement has then to be studied so that you can make a judgement as to whether your data is anomalous.

 

Similar information can be obtained by participation in proficiency testing (PT) but suitable PT might not be available at the time of the validation. Another problem with use of PT for method validation is that proficiency schemes tend to only provide a restricted set of sample types and coverage of measurement range at each round. A typical PT round may well include only one sample.


Limits of Detection (LOD)


Limit of detection is only an issue in textile and leather testing for tests, usually chemical, where levels of residues or contaminants or composition are being measured.

 

As a method, especially an instrumental method, is pushed to lower and lower levels there comes a point where the variability of the measurement is such that it is impossible to distinguish between random fluctuations in the value measured and an actual signal. In short the ‘noise’ on the determination is becoming comparable with the measurement.

 

Note that by ‘noise’ we mean not only instrument noise but any other contributions due, for example, to low levels of interfering signals generated by other substances being detected. For this reason the limit of detection must be determined by following the complete method through so as to take into account not only instrumental factors but also inputs from the preparation of the sample for analysis.

 

The limit of detection for a method can be generally defined as the smallest signal which can be distinguished reliably from the blank. In practice there are two possible error conditions in this process;

 

  • We may decide, wrongly, that a noise signal is due to analyte and so report a target as present when it is, in fact, absent;

  • We may decide that there is no signal distinguishable from the blank when the analyte is, in fact, present and so report the target as absent when it is in fact there.

Clearly this raises the question as to how ‘reliably’ is defined. Unfortunately there is no consensus on this in the world analytical community but we will approach it in terms of the commonest practice.

 

The key information which we need to establish LOD is the variability on the blank. This is what we must determine experimentally. We will then be able to obtain a mean value for the blank and the standard deviation on the blank.

 

The limit of detection is then commonly defined by the equation:-

LOD = yB + 3 × sB

Where yB is the mean value measured from the blank and sB is the standard deviation of the blank measurements. This should be based on a minimum of seven measurements.

In words we are saying that the LOD is equal to the average result for the blank plus three standard deviations. In statistical terms this quantifies ‘reliable’ as it can be shown that the probability of making a wrong decision based on this LOD is 7%.

A minor variation on the equation above is sometimes used as follows:-

LOD = yB + 3.28 × sB

Changing the multiplier from 3 to 3.28 reduces the probability of a wrong decision from 7% to 5%. Put another way this is the point at which we are 95% confident that the decision we make is right. A level of 95% confidence is widely regarded as the target to be set for decisions based on statistical factors.

 

Note that normally the value obtained for the blank, yB, will be very close to zero. This explains the simplified definition of LOD as three times the standard deviation on the blank.
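
As a minimal sketch, the calculation can be written in Python as follows. The blank results are hypothetical figures in arbitrary concentration units, and the function name and the check on the number of measurements are illustrative assumptions rather than part of any standard.

```python
import statistics


def limit_of_detection(blank_results, k=3.0):
    """LOD = mean of blank + k * standard deviation of blank.

    k = 3 is the common definition; k = 3.28 gives the 5% error variant.
    """
    if len(blank_results) < 7:
        raise ValueError("at least seven blank determinations are expected")
    y_b = statistics.mean(blank_results)
    s_b = statistics.stdev(blank_results)  # sample standard deviation (n - 1)
    return y_b + k * s_b


# Hypothetical method-blank results.
blanks = [0.02, 0.05, 0.01, 0.04, 0.03, 0.02, 0.04]

print(f"LOD (k = 3)    = {limit_of_detection(blanks):.3f}")
print(f"LOD (k = 3.28) = {limit_of_detection(blanks, k=3.28):.3f}")
```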

 

Practical Determination of LOD

In principle the determination of LOD is simple enough. We simply carry out seven determinations on the blank and calculate the mean and standard deviation. Note that the ‘blank’ needs to be a method blank not simply a blank calibration solution as it must pass through the entire method including any work-up. The matrix of the blank also needs to be as close as possible to a real sample.

 

One of the difficulties which arises with LOD determinations is that many instrumental methods will give identical zeros for a blank. What is happening here is that the instrument is highly stable but does not have adequate resolution to distinguish small variations in the blank determination. This will result in a measured standard deviation on the blank of zero and therefore a calculated limit of detection of zero, implying that any signal at all could be detected. Clearly this is unrealistic.

 

In these circumstances we need to generate a low signal so that we can actually see some variation. The best way to do this is to prepare (e.g. by spiking) or obtain a sample with a low level of the target. Aim for a level of the order of three times the expected LOD, and certainly no more than ten times. This sample can then be analysed seven times and the mean and standard deviation calculated.

 

The LOD can then be calculated assuming that yB is zero and sB is the standard deviation on the value for the low sample.

 

Limits of Quantitation

It is often argued, with justification, that long before you reach the LOD the method precision will have deteriorated to a point where the uncertainty on the values reported is high. In simple terms there is a range of measurements between a limit of quantitation (LOQ) and the LOD where we can be sure that the target is present but where the reliability of any number is poor. In this region we are able to report that the target is present but at a level of less than the LOQ rather than specifying a value. Below the LOD we would report ‘Not Detected’ but we would need to specify what this actually meant, i.e. less than the LOD.

 

The LOQ is typically set at:-

LOQ = yB + 10 × sB

In other words it is based on ten times the standard deviation on the blank.
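
The reporting rules described in this section can be sketched in Python as below. The function names, the wording of the report strings, the unit, and the treatment of a result exactly equal to a limit are assumptions made for illustration; a laboratory would substitute its own reporting conventions.

```python
def limit_of_quantitation(y_b, s_b):
    """LOQ = mean of blank + 10 * standard deviation of blank."""
    return y_b + 10.0 * s_b


def report_result(result, lod, loq, unit="mg/kg"):
    """Return the text to report for a single measurement.

    Below the LOD: 'Not detected', stating the LOD.
    Between LOD and LOQ: present, but reported only as less than the LOQ.
    At or above the LOQ: the numerical value is reported.
    """
    if result < lod:
        return f"Not detected (< {lod:g} {unit})"
    if result < loq:
        return f"Detected (< {loq:g} {unit})"
    return f"{result:g} {unit}"


# Hypothetical limits: blank mean taken as zero, blank standard deviation 0.5.
lod = 0.0 + 3.0 * 0.5
loq = limit_of_quantitation(0.0, 0.5)

for value in (0.8, 3.2, 12.5):
    print(value, "->", report_result(value, lod, loq))
```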


Quality Control and Detection Limit Maintenance


Clearly it is possible that the detection limit for a test method might deteriorate over time, e.g. due to changes in instrument performance, so that the limit established at validation ceases to be realised in practice. This might go undetected unless specific arrangements are made to control the situation. A spectrometer which has deteriorated in its performance might still calibrate perfectly but the absolute values of its absorption measurements might be reduced.

 

It is critical, therefore, that absolute instrument performance be checked by appropriate system suitability checks. For example, with a colorimeter the actual absorbance of a check standard needs to be monitored, for a chromatograph the peak area for a reference needs to be tracked, and so on.

These instruments are normally set up to report directly in concentration so they will give a result even if response is depressed. However in this case the limit of detection will be worse than that established at validation. Hence the need to monitor the actual output of the instrument, not the derived concentration values.
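
One simple way to implement such a check is sketched below in Python. The 10% tolerance, the baseline absorbance and the function name are purely hypothetical; the acceptance limit in practice would be set by the laboratory from its own validation data.

```python
def response_within_tolerance(raw_response, baseline_response, tolerance=0.10):
    """True if the raw instrument response for the check standard is within
    a fractional tolerance of the response recorded at validation.

    raw_response      : today's raw signal (e.g. absorbance or peak area)
    baseline_response : raw signal for the same standard at validation
    tolerance         : allowed fractional deviation (hypothetical default)
    """
    return abs(raw_response - baseline_response) <= tolerance * baseline_response


# Hypothetical check-standard absorbance recorded at validation.
baseline_absorbance = 0.450

for today in (0.441, 0.412, 0.380):
    ok = response_within_tolerance(today, baseline_absorbance)
    status = "OK" if ok else "INVESTIGATE: response depressed, LOD may have degraded"
    print(f"absorbance {today:.3f}: {status}")
```

The check is applied to the raw signal rather than to the reported concentration, since recalibration would otherwise mask the loss of response.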

 
