In support of marine monitoring measurement programs, the National Institute of Standards and Technology (NIST), in cooperation with the NOAA National Status and Trends Program (NS&T) and the EPA Environmental Monitoring and Assessment Program (EMAP), has conducted yearly interlaboratory comparison exercises. These exercises provide one mechanism for participating laboratories (and monitoring programs) to evaluate the quality and comparability of their performance in measuring selected organic contaminants in environmental samples.

NIST efforts focus on providing mechanisms for assessing the interlaboratory and temporal comparability of data, and on improving measurements for the monitoring of organic contaminants such as polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyl congeners (PCBs), and chlorinated pesticides in bivalve, sediment and fish samples. This program includes the development of improved analytical methods, production of needed NIST Standard Reference Materials (SRMs) and other control materials, conduct of semi-annual interlaboratory comparison exercises, and the coordination of workshops to discuss the results of these exercises and to provide a forum for cooperative problem-solving efforts by participants. Current participants represent multi-laboratory monitoring programs as well as a number of individual programs, and include federal, state/municipal, university/college, private sector and international laboratories. In this performance-based program, each participating laboratory uses the methods currently being used by that laboratory for analysis of similar materials for its program customers. The target analytes are listed in Table 1.

Table 1. Analytes of Interest in NIST Intercomparison Exercise Program for Organic Contaminants in the Marine Environment

Chlorinated Pesticides

hexachlorobenzene                    2,4'-DDE
alpha-HCH (alpha-BHC)                4,4'-DDE
gamma-HCH (gamma-BHC, Lindane)       2,4'-DDD
heptachlor                           4,4'-DDD
heptachlor epoxide                   2,4'-DDT
cis-chlordane (alpha-chlordane)      4,4'-DDT
trans-chlordane (gamma-chlordane)    aldrin
oxychlordane                         dieldrin
cis-nonachlor                        endrin
trans-nonachlor                      endosulfan I
mirex                                endosulfan II

Polychlorinated Biphenyl Congeners

PCB No.    Compound Name
8          2,4'-dichlorobiphenyl
18         2,2',5-trichlorobiphenyl
28         2,4,4'-trichlorobiphenyl
44         2,2',3,5'-tetrachlorobiphenyl
52         2,2',5,5'-tetrachlorobiphenyl
66         2,3',4,4'-tetrachlorobiphenyl
101        2,2',4,5,5'-pentachlorobiphenyl
105        2,3,3',4,4'-pentachlorobiphenyl
118        2,3',4,4',5-pentachlorobiphenyl
128        2,2',3,3',4,4'-hexachlorobiphenyl
138        2,2',3,4,4',5'-hexachlorobiphenyl
153        2,2',4,4',5,5'-hexachlorobiphenyl
170        2,2',3,3',4,4',5-heptachlorobiphenyl
180        2,2',3,4,4',5,5'-heptachlorobiphenyl
187        2,2',3,4',5,5',6-heptachlorobiphenyl
195        2,2',3,3',4,4',5,6-octachlorobiphenyl
206        2,2',3,3',4,4',5,5',6-nonachlorobiphenyl
209        decachlorobiphenyl

Polycyclic Aromatic Hydrocarbons (PAHs)

naphthalene                    fluoranthene
2-methylnaphthalene            pyrene
1-methylnaphthalene            benz[a]anthracene
biphenyl                       chrysene
2,6-dimethylnaphthalene        benzofluoranthenes [b+j+k]
acenaphthylene                 benzo[e]pyrene
acenaphthene                   benzo[a]pyrene
1,6,7-trimethylnaphthalene     perylene
fluorene                       indeno[1,2,3-cd]pyrene
phenanthrene                   dibenz[a,h]anthracene
anthracene                     benzo[ghi]perylene
1-methylphenanthrene

Note that the following are typically reported by exercise participants as the sums of the indicated components:

PAH

chrysene + triphenylene
benzo[b]- + benzo[j]- + benzo[k]fluoranthene
dibenz[a,h]anthracene + dibenz[a,c]anthracene

PCB congeners

*PCB 66 + PCB 95
PCB 101 + PCB 90
PCB 138 + PCB 163 + PCB 164
PCB 187 + PCB 182 + PCB 159
PCB 170 + PCB 190

*Note: Because PCB 66 and PCB 95 can now be separated by a significant number of participants, NIST has changed the table for reporting results so that participants may report these as two separate concentrations or as the sum of PCBs 66 and 95.

B&B Laboratories (the laboratory affiliate of TDI-Brooks) participated in the last four NIST intercalibration exercises for trace organics along with over 30 other submitting laboratories (including the NIST laboratory). For each exercise, NIST prepares and sends blind check samples to each lab, to be analyzed for selected PAHs, chlorinated pesticides, and PCB congeners. Each lab submits the results of these determinations to NIST. Laboratories are assigned numerical identification codes as they submit their results, but are otherwise not identified. Each NIST post-analysis report includes the exercise assigned value for each analyte along with its standard deviation, % relative standard deviation, and calculated 95% confidence interval.

For each annual intercomparison exercise, samples of two natural-matrix-based homogeneous materials derived from the marine environment (not fortified with any of the target analytes) are analyzed by the participating laboratories. The materials, such as mussel or fish homogenates and/or wetted marine sediment, typically have target contaminant levels in the 1 to 15,000 ng/g range. Different materials have been used each year.

The following guidelines were used by the NIST exercise coordinators for the establishment of the exercise "assigned values" for these exercises.  In essence, the laboratory's performance on concurrent reference material analyses was used to determine if that laboratory's results would be included in the calculation of the exercise assigned value for the unknown material for a particular analyte.  The results reported for the unknown materials from laboratories that did not report results for the reference materials were not used in these calculations.  After the exercise assigned values, assigned values standard deviations, and 95% confidence limits were calculated, all reported results for the test materials were evaluated relative to these exercise "assigned values."


Determination of assigned values: For a particular analyte, the performance on the reference material was deemed acceptable for the purpose of this exercise if the laboratory result was within 30% of the confidence interval for analytes listed in the Certificates of Analysis for Certified/Standard Reference Materials. For each analyte of interest not certified in these materials, a "target" concentration and the associated uncertainty were calculated.

Laboratory results within target upper and lower limits, typically 30 to 40%, of these concentrations were deemed acceptable for this exercise. If a laboratory demonstrated acceptable performance on a particular analyte in the reference material, that laboratory's results for that analyte in the corresponding "unknown" exercise material were then used in the calculation of the analyte's exercise assigned value unless they were deemed "outliers." For evaluation of potential outliers, statistical tests and expert analyst judgement were used after viewing both normal and log plots of the data. This judgement utilized knowledge of potential co-eluters based on the laboratory's reported methods. In instances in which the analyte concentration was below the detection limit of most participating laboratories, no exercise assigned value was calculated. In data sets such as this, with a number of laboratories reporting results as "not detected" at various detection limits, there is no consensus as to what numerical value should be assigned to these results in the computation of grand means and similar statistics: "0," ½ Detection Limit (DL), and the DL value itself have all been used, and the choice is influenced by the use of the particular data set.
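The screening logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NIST procedure itself: the function names, the data layout, and the reading of "within 30% of the confidence interval" as a widening of each confidence limit by 30% are all hypothetical, and the outlier review (which relies on expert judgement) and the 95% confidence interval are omitted.

```python
from statistics import mean, stdev

def reference_ok(result, ci_low, ci_high, tolerance=0.30):
    """Assumed reading of the acceptance rule: the reference-material result
    must fall inside the certified interval widened by `tolerance` per side."""
    return ci_low * (1 - tolerance) <= result <= ci_high * (1 + tolerance)

def assigned_value(labs, ci_low, ci_high, tolerance=0.30):
    """labs maps a lab code to (reference_result or None, unknown_result).

    Only labs that reported an acceptable reference-material result
    contribute. Returns (mean, standard deviation, % relative standard
    deviation), or None if fewer than two labs qualify.
    """
    used = [unknown for ref, unknown in labs.values()
            if ref is not None and reference_ok(ref, ci_low, ci_high, tolerance)]
    if len(used) < 2:
        return None
    m = mean(used)
    s = stdev(used)
    return m, s, 100 * s / m

# Hypothetical data in ng/g: lab 3 reported no reference result and
# lab 4 failed the reference check, so neither contributes.
example = {1: (10.2, 48.0), 2: (9.5, 52.0), 3: (None, 70.0), 4: (16.0, 90.0)}
value, sd, rsd = assigned_value(example, ci_low=9.0, ci_high=11.0)
```

In this made-up example only labs 1 and 2 qualify, so the assigned value is the mean of 48.0 and 52.0.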

Determination of laboratory analyte means: The laboratory analyte mean of the replicate (S1, S2, and S3) results was calculated. Non-numerical data were treated as follows: A mean "<value" was used when three "<values" were reported; NA (not analyzed/determined) was used for three reported NAs, etc.; and, if the reported results were of mixed type, e.g., S1 and S2 were numerical values and S3 was reported as "<value", the two similar "types" were used to either determine the mean or to set a non-numerical descriptor.
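As a rough illustration of the replicate rules just described, the following Python sketch takes the three reported results and returns either a numerical mean, a mean "<value", or "NA". The function name and the input encoding (numbers, strings beginning with "<", or the string "NA") are assumptions made for this example. The case where all three replicates are of different types is not covered by the description above, so returning "NA" there is purely a placeholder choice.

```python
def replicate_mean(s1, s2, s3):
    """Combine three replicate results using the majority result type."""
    values = [s1, s2, s3]
    numeric = [float(v) for v in values if isinstance(v, (int, float))]
    censored = [float(v[1:]) for v in values
                if isinstance(v, str) and v.startswith("<")]
    not_analyzed = [v for v in values if v == "NA"]

    if len(numeric) >= 2:
        # Two or three numerical results: average the numerical ones.
        return sum(numeric) / len(numeric)
    if len(censored) >= 2:
        # Two or three "<value" results: report a mean "<value".
        return "<" + format(sum(censored) / len(censored), "g")
    if len(not_analyzed) >= 2:
        return "NA"
    # Assumption: no two results share a type; this case is not specified.
    return "NA"
```

For example, `replicate_mean(1.0, 2.0, "<0.5")` averages the two numerical results, while `replicate_mean("<0.5", "<0.5", "<0.5")` yields the descriptor `"<0.5"`.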

Numerical indices (z- and p-scores) were used to assess and track laboratory performance (for accuracy and precision, respectively) and provide a mechanism for assessing the comparability of data being produced by the participating laboratories for target analytes. IUPAC guidelines describe the use of z-scores and p-scores for assessment of accuracy and precision in intercomparison exercises such as these.  These indices assess the difference between the result of the laboratory and the exercise assigned value and can be used, with caution, to compare performance on different analytes and on different materials.


Accuracy Assessment (z-score)

z = (x - X) / s

where x is the individual laboratory result, X is the "Exercise Assigned Value," and s is the target value for standard deviation.

As described in the IUPAC guidelines, the choice of s depends on the data quality objectives of the particular program. It can be "fixed" and arrived at by perception, prescription, or reference to validated methodology (e.g., s = 0.125 X, where X is the analyte concentration), or it can be an estimate of the actual variation (e.g., the calculated s from the exercise data). The "fixed" performance criterion is more useful in comparing a laboratory's performance on different materials, while the actual variation may be more useful within a given exercise, for example, if the determination of a particular analyte is more problematic than usual.

NIST has calculated and reported z-scores using both approaches for each analyte for each laboratory. At a previous workshop, it was decided to use "25% of the exercise assigned value" as the fixed target value for standard deviation for this program, at least for a few years. We also calculated z-scores based on "one assigned-value standard deviation." The z-scores calculated for these exercises can thus be interpreted as shown in the following examples:

z-score (25% X):
+1 -> laboratory result is 25% higher than the assigned value
-2 -> laboratory result is 50% lower than the assigned value

z-score (s):
+1 -> laboratory result is one "exercise standard deviation" higher than the assigned value
-2 -> laboratory result is "two exercise standard deviations" lower than the assigned value

From a scientific point of view, IUPAC does not recommend the classification of z-scores but allows that it is possible to classify scores, e.g.:

| z | <= 2 Satisfactory
2 < | z | < 3 Questionable
| z | >= 3 Unsatisfactory
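The two z-score variants and the classification above can be sketched in Python as follows. The sketch assumes z is the difference between the laboratory result and the assigned value divided by s, which is consistent with the worked examples above; the function names are hypothetical.

```python
def z_score(x, assigned, s):
    """z-score for laboratory result x against the assigned value,
    using s as the target standard deviation."""
    return (x - assigned) / s

def z_fixed(x, assigned, fraction=0.25):
    """z-score with the fixed target s = 25% of the assigned value."""
    return z_score(x, assigned, fraction * assigned)

def classify(z):
    """Classification of a score by its absolute value."""
    magnitude = abs(z)
    if magnitude <= 2:
        return "Satisfactory"
    if magnitude < 3:
        return "Questionable"
    return "Unsatisfactory"

# A result 25% above an assigned value of 100 gives z = +1,
# and a result 50% below it gives z = -2.
z_high = z_fixed(125.0, 100.0)
z_low = z_fixed(50.0, 100.0)
```

The same `classify` function applies whether s is the fixed 25% target or the exercise standard deviation; only the denominator passed to `z_score` changes.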

The NIST report shows the calculated z-scores for each laboratory for each reported analyte. These tables of the results and performance include a summary of the number of reported analytes that fall within each category for each laboratory. Figures show the distribution of z-scores (25%) by analyte.


Precision Assessment (p-score)

Since 1995, laboratories have been requested to process each replicate in a different sample set for precision assessment. For the calculation of p-scores for this program, the current target CV for the three replicates is 15%. Tables show the calculated p-scores for each laboratory for each reported analyte.
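The text above gives the target CV but not the p-score formula itself. The Python sketch below assumes that the p-score is the ratio of a laboratory's observed coefficient of variation (CV) for its three replicates to the 15% target CV; that ratio and the function name are assumptions for illustration, not a definition taken from the NIST reports.

```python
from statistics import mean, stdev

def p_score(replicates, target_cv=15.0):
    """Assumed form: observed CV (%) of the replicates divided by target CV (%)."""
    observed_cv = 100 * stdev(replicates) / mean(replicates)
    return observed_cv / target_cv

# Hypothetical replicate results (ng/g): mean 100, sample SD 10, so CV = 10%.
p = p_score([90.0, 100.0, 110.0])
```

Under this assumed definition, a laboratory whose replicate CV equals the target would score exactly 1, and tighter replicates would score below 1.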

Laboratories were assigned numerical identification codes in order of receipt of data with the exception of NIST, which is Lab 1 in these exercises. A laboratory was assigned the same code for each material.  In the NIST reports, the triplicate results, as reported by the laboratories for both the exercise materials and the two reference materials, are reported along with reference values for each of the materials and performance scores (numerical indicators of accuracy (bias) and precision (reproducibility)).

B&B Laboratories was assigned the following identifier numbers for the four NIST intercalibration exercises in which we have participated:

1997 Lab 27 (partial participation)
1998 Lab 31
1999 Lab 16
2000 Lab 9
2002 Lab 27

The NIST exercise coordinators recognized that different programs have different data quality needs. The acceptability of the results submitted by a particular Organics laboratory is decided by the individual program(s) for which the particular laboratory provides data. Typically, the program will use these exercise results in conjunction with the laboratory's performance in the analysis of certified reference materials and/or control materials, and of other quality assurance samples. The exercise results are shown in the NIST report in a number of ways to facilitate their use by these programs in their acceptability assessment. B&B Laboratories has consequently developed an objective method of numerically assessing the relative performance of each participating laboratory. This assessment follows:

For each lab in an annual exercise, we sum the ratings from each of the two z-scores and the p-score. An analyte score rated by NIST as Satisfactory (see the discussion of z- and p-scores above) was counted as one point. An analyte rated as Questionable, or one not reported by the lab, was not counted. An analyte rated as Unsatisfactory was counted as one negative point. Using this convention, (1) satisfactory accuracy accounts for 2/3 of the points and precision for 1/3, (2) a lab is neither penalized nor rewarded for not reporting an analyte, and (3) negative overall scores are possible. Using the data from the reports issued by NIST, we calculated these summed points for each participating laboratory and plotted them in order of decreasing total. Plots of the performance rankings for the 1998, 1999, and 2000 exercises are shown in Figure 1. In these plots, the Lab Performance Rank numbers along the X-axis represent the relative order of performance, not the Laboratory number assigned by NIST. As these plots show, our lab has consistently performed well in the NIST intercalibration exercises.
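The point system described above can be sketched in Python as shown here. The names, lab labels, and input layout are hypothetical: each lab supplies a flat list of ratings (two z-score ratings and one p-score rating per analyte), with None standing for an analyte that was not reported.

```python
POINTS = {"Satisfactory": 1, "Questionable": 0, "Unsatisfactory": -1}

def lab_total(ratings):
    """Sum the points for one lab; None (not reported) counts as zero."""
    return sum(POINTS.get(rating, 0) for rating in ratings)

def rank_labs(ratings_by_lab):
    """Return (lab, total) pairs in order of decreasing total."""
    totals = {lab: lab_total(r) for lab, r in ratings_by_lab.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical ratings for three labs.
ranked = rank_labs({
    "A": ["Satisfactory"] * 4 + ["Questionable", "Unsatisfactory"],
    "B": ["Unsatisfactory", "Unsatisfactory", None],
    "C": ["Satisfactory", "Satisfactory", "Satisfactory"],
})
```

In this made-up example labs A and C both total 3 points and lab B totals -2, which shows how a negative overall score can arise.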