The Importance of Considering Reproducibility in the Measurement Process Uncertainty - Intro
Most people in the metrology community will agree that a calibration laboratory's ability to reproduce measurement results belongs in an uncertainty budget. Several accreditation bodies require reproducibility to at least be considered as part of a calibration laboratory's Calibration and Measurement Capability (CMC). The question about reproducibility is this: does it apply only to my equipment, or should it be required for the calibration process as well? If the answer is both, as it should be with force-measuring devices, then we must debate why it is acceptable for labs to have items calibrated using a method that does not test for reproducibility. Reproducibility of equipment is part of two well-recognized force standards: ISO 376, Metallic materials - Calibration of force-proving instruments used for the verification of uniaxial testing machines, and ASTM E74-18, Standard Practices for Calibration and Verification for Force-Measuring Instruments.
The ASTM E74 standard uses a term called the LLF (lower limit factor). The LLF is really a Type A uncertainty calculation. It quantifies the reproducibility of the equipment by calculating a pooled standard deviation from a range of 10 to 11 force points. The deviations are obtained by applying a series of forces and rotating the instrument to different positions in the deadweight machine or calibration frame, such as 0°, 120°, and 240°, or 0°, 60°, and 300°. Large deviations may occur when the device is rotated if the force-measuring device is sensitive to bending, torsion, or non-parallel surfaces, or if the force machine introduces them.
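The following Python sketch computes the LLF the way we understand ASTM E74 to define it. It takes four steps:

1. Fit a calibration polynomial to all readings, with every rotational position included.
2. Compute the residuals about the fitted curve.
3. Take their standard deviation with n - m - 1 degrees of freedom, where m is the polynomial degree.
4. Multiply by 2.4.

Some details are illustrative assumptions:

- the function names;
- the default degree of 2;
- the pure-Python solver.

The result is in the units of the instrument readings. Check the current edition of the standard before relying on this.

```python
import math


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit using the normal equations.

    Returns [c0, c1, ..., c_degree] with y ~ c0 + c1*x + ... + c_degree*x**degree.
    """
    m = degree + 1
    # Normal equations: (A^T A) c = A^T y, where A[i][j] = x_i ** j.
    ata = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    aty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, m):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, m):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]
    # Back substitution.
    coeffs = [0.0] * m
    for row in range(m - 1, -1, -1):
        known = sum(ata[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = (aty[row] - known) / ata[row][row]
    return coeffs


def lower_limit_factor(forces, readings, degree=2):
    """LLF = 2.4 * standard deviation of the residuals about the fitted curve.

    `forces` and `readings` hold every observation, including those taken at
    different rotational positions, so rotation effects inflate the LLF.
    """
    coeffs = fit_polynomial(forces, readings, degree)
    residuals = [
        y - sum(c * f ** i for i, c in enumerate(coeffs))
        for f, y in zip(forces, readings)
    ]
    dof = len(readings) - degree - 1  # n - m - 1
    s = math.sqrt(sum(d * d for d in residuals) / dof)
    return 2.4 * s
```

Suppose readings taken at 0°, 120°, and 240° are passed in together. Any shift caused by rotation then shows up in the residuals and therefore in the LLF.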
ASTM E74 and ISO 376 both include rotational tests intended to capture the reproducibility of the device when it is calibrated. This is an excellent first step. A second step should be required when calculating CMC: determining the repeatability and reproducibility of the process across different operators, machines, and locations. This blog cites definitions from several publications to help answer the question of what reproducibility is. We then give an example of how we believe short-term repeatability and reproducibility can be calculated.
VIM: International vocabulary of metrology
2.24 (3.7, Note 2) reproducibility condition of measurement, reproducibility condition: condition of measurement, out of a set of conditions that includes different locations, operators, measuring systems, and replicate measurements on the same or similar objects
NOTE 1 The different measuring systems may use different measurement procedures.
NOTE 2 A specification should give the conditions changed and unchanged to the extent practical.
ASTM E691
3.1.10 reproducibility, n: precision under reproducibility conditions. E177
3.1.11 reproducibility conditions, n: conditions where test results are obtained with the same method on identical test items in different laboratories with different operators using different equipment. E177
3.1.12 reproducibility limit (R), n: the value below which the absolute difference between two test results obtained under reproducibility conditions may be expected to occur with a probability of approximately 0.95 (95 %). E177
3.1.13 reproducibility standard deviation (sR), n: the standard deviation of test results obtained under reproducibility conditions.
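A few lines of Python can show how the reproducibility standard deviation relates to the reproducibility limit. We assume the ASTM E177 convention R = 2.8 × sR, where 2.8 is approximately 1.96 × √2 for normally distributed results. The function names are hypothetical.

```python
# Assumed ASTM E177 convention: reproducibility limit R = 2.8 * s_R
# (2.8 is roughly 1.96 * sqrt(2), assuming normally distributed results).
LIMIT_FACTOR = 2.8


def reproducibility_limit(s_r: float) -> float:
    """Reproducibility limit R from the reproducibility standard deviation s_R."""
    return LIMIT_FACTOR * s_r


def within_reproducibility_limit(result_a: float, result_b: float, s_r: float) -> bool:
    """True if two results obtained under reproducibility conditions differ by
    no more than R, a difference expected to be exceeded only about 5 % of
    the time."""
    return abs(result_a - result_b) <= reproducibility_limit(s_r)
```

For example, take sR = 1.0 N. Two laboratories' results on the same item would then be expected to agree within 2.8 N about 95 % of the time.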
NASA-HDBK-8739.19-4
Reproducibility: The closeness of the agreement between the results of measurements of the value of an attribute carried out under different measurement conditions. The differences may include the principle of measurement, method of measurement, observer, measuring instrument(s), reference standard, location, conditions of use, and time.
Under error sources, it then lists:
• Operator Bias (Reproducibility) - Error due to quasi-persistent bias in operator perception and/or technique.
MSA 4th Edition
Reproducibility: This is traditionally referred to as the "between appraisers" variability. Reproducibility is typically defined as the variation in the average of the measurements made by different appraisers using the same measuring instrument when measuring the identical characteristic on the same part. This is often true for manual instruments influenced by the skill of the operator. It is not true, however, for measurement processes (i.e., automated systems) where the operator is not a major source of variation. For this reason, reproducibility is referred to as the average variation between systems or between conditions of measurement.
The ASTM definition goes beyond this to potentially include not only different appraisers but also different: gages, labs, and environments (temperature, humidity) as well as including repeatability in the calculation of reproducibility.
In order to better understand the effect of measurement system error on product decisions, consider the case where all of the variability in multiple readings of a single part is due to the gage repeatability and reproducibility. That is, the measurement process is in statistical control and has zero bias.
Between-appraisers (operators): the average difference between appraisers A, B, C, etc., caused by training, technique, skill, and experience. This is the recommended study for product and process qualification and a manual measuring instrument.
Gage R&R is an estimate of the combined variation of repeatability and reproducibility. Stated another way, GRR is the variance equal to the sum of within-system and between-system variances.
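Because the variances add, the combined GRR is the root sum of squares of the repeatability and reproducibility standard deviations. MSA labels these EV (equipment variation) and AV (appraiser variation). Here is a minimal Python sketch. It assumes the two components are independent and expressed in the same units, and the function name is hypothetical.

```python
import math


def combined_grr(repeatability_sd: float, reproducibility_sd: float) -> float:
    """GRR standard deviation: within-system and between-system variances add,
    so the standard deviations combine as a root sum of squares."""
    return math.sqrt(repeatability_sd ** 2 + reproducibility_sd ** 2)
```

For instance, `combined_grr(3.0, 4.0)` returns 5.0, which shows how the larger component dominates the total.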
Guidelines for Determining Repeatability and Reproducibility (page 41)
The Variable Gage Study can be performed using a number of differing techniques. Three acceptable methods will be discussed in detail in this section.
These are:
· Range method
· Average and Range method (including the Control Chart method)
· ANOVA method
Except for the Range method, the study data design is very similar for each of these methods.
The ANOVA method is preferred because it measures the operator-to-part interaction gauge error, whereas the Range and the Average and Range methods do not include this variation.
Many of the definitions above refer to different operators, different laboratories, and different equipment. If the lab has only one location, then different laboratories can be removed from consideration. Some parameters, such as force, rely on comparing operators to capture the reproducibility of the measurement process, because a lab rarely has two machines of the same size. The ideal solution is to set up SPC procedures that capture long-term reproducibility, and Morehouse offers a training course on SPC several times a year. However, ANOVA and other methods can capture the reproducibility of a process in the short term, which is generally accepted.
ANOVA, or analysis of variance, tests for repeatability as well as reproducibility between operators. A repeatability and reproducibility study between technicians should be performed in any of these cases:

· a change in personnel
· the first time a budget is established
· the purchase of new equipment
· any other change that may alter the measurement process

For example, upgrading a force-measuring system or load cells to ones provided by Morehouse, shown below, may drastically improve repeatability and reproducibility between operators.
The above example uses two technicians recording readings at the same measurement point on the same equipment. Repeatability between technicians can be found by taking the square root of the average of the variances of each technician's readings. Reproducibility between technicians is found by taking the standard deviation of each technician's average reading. The ANOVA tool in Microsoft Excel can do the same calculation with a little manipulation. Below is an example of a single-factor ANOVA, which is found in Excel's Data Analysis tools.
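The direct calculation described above can be sketched in Python with the standard `statistics` module. This is separate from the Excel example. We assume sample (n - 1) statistics, matching Excel's VAR.S and STDEV.S. The readings in the usage line are hypothetical numbers chosen for easy arithmetic, not data from this article.

```python
import statistics


def repeatability_and_reproducibility(readings_by_technician):
    """Return (repeatability, reproducibility) as standard deviations.

    readings_by_technician: one list of readings per technician, all taken at
    the same measurement point on the same equipment.
    - repeatability: square root of the average of the technicians' variances
    - reproducibility: standard deviation of the technicians' averages
    Each technician needs at least two readings, and at least two
    technicians are required.
    """
    variances = [statistics.variance(r) for r in readings_by_technician]
    averages = [statistics.mean(r) for r in readings_by_technician]
    repeatability = statistics.mean(variances) ** 0.5
    reproducibility = statistics.stdev(averages)
    return repeatability, reproducibility


# Hypothetical readings for two technicians (illustration only).
rep, repro = repeatability_and_reproducibility([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```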
The results shown in each of these cases indicate that reproducibility may be insignificant here, because F calculated < F critical. The F value is the ratio of two mean squares (between groups divided by within groups), and comparing it with F critical determines whether the result is statistically significant. A large F value means that the variation among group means is greater than you would expect to see by chance, which indicates a significant difference between operators. In the example above, the P-value, or probability value, is 0.664251. Suppose the operators truly had the same mean. Differences at least as large as those observed would then be expected about 66 % of the time, so there is no evidence of a significant difference between operators. We can use the above ANOVA analysis to obtain reproducibility and repeatability.
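A short Python sketch can reproduce the quantities Excel reports for a single-factor ANOVA: sums of squares, degrees of freedom, mean squares, and F. F critical is not computed here. We assume it is read from Excel's ANOVA output, or its F.INV.RT function, at the chosen significance level. The function names, the readings, and the F critical value in the test are hypothetical.

```python
import statistics


def single_factor_anova(groups):
    """One-way ANOVA summary for a list of groups (one group per technician)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    # Between groups: spread of each technician's average about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: spread of readings about their own technician's average.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    return {
        "SS between": ss_between,
        "SS within": ss_within,
        "df between": df_between,
        "df within": df_within,
        "MS between": ms_between,
        "MS within": ms_within,
        "F": ms_between / ms_within,
    }


def technicians_differ(f_value, f_critical):
    """A significant difference between technicians only if F > F critical."""
    return f_value > f_critical


# Hypothetical readings for two technicians (illustration only).
table = single_factor_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```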
Reproducibility is found by taking the square root of the between-groups mean square (MS) and dividing it by the square root of the count. The count is the number of observations per technician, shown as the Count for Technician 1. Repeatability is found by taking the square root of the within-groups mean square.
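Following the text, both standard deviations can be taken directly from the ANOVA mean squares. Here is a minimal Python sketch. It assumes every technician took the same number of readings (count). The function name and the mean-square values are hypothetical.

```python
import math


def rr_from_anova(ms_between, ms_within, count):
    """Return (repeatability, reproducibility) from single-factor ANOVA output.

    reproducibility = sqrt(MS between) / sqrt(count)
    repeatability   = sqrt(MS within)
    count is the number of readings per technician (equal counts assumed).
    """
    reproducibility = math.sqrt(ms_between) / math.sqrt(count)
    repeatability = math.sqrt(ms_within)
    return repeatability, reproducibility


# Mean squares for two hypothetical technicians with three readings each.
repeatability, reproducibility = rr_from_anova(ms_between=1.5, ms_within=1.0, count=3)
```

With equal counts, MS between equals the count times the sample variance of the technician averages. This route therefore gives the same numbers as the direct method described earlier. A variance-components analysis would instead take the square root of (MS between - MS within) / count, which removes the repeatability contribution. The formula above is never smaller, so it is the more conservative choice.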
Conclusion: This article has presented several definitions of reproducibility. It has also presented a valid method for calculating reproducibility and determining its significance using an F-test. There is a significant issue with force measurements, and in many cases torque measurements. These methods often do not capture the reproducibility of the equipment unless the reference standards are repositioned in the machine, and often they are not.
Therefore, there may be additional error sources for the reproducibility of the reference standards, such as load cells. If the reference load cell is calibrated in accordance with the ASTM E74 or ISO 376 standard, this issue becomes moot, because both standards capture reproducibility conditions at the time of calibration. That holds unless the end user alters the calibration in one of these ways:

· not using the right equation
· using different adapters than those used for calibration
· making physical changes to the load cell

If any of these happen, the system should be calibrated again. Companies not using these calibration standards will have additional error sources that may be very difficult to quantify. This author believes that companies should calibrate their force-measuring devices to recognized published standards such as ASTM E74 or ISO 376. They should not rely on 5- to 10-point calibrations, often called commercial calibrations.
It is recommended that end users then test their equipment for the additional error caused by bending, torsion, and uneven surfaces. They can do this by comparing two force-measuring devices against each other, both of which should have been calibrated by primary standards (deadweights). Comparing two standards that were each calibrated by deadweights will reveal any additional measurement error in the machine. That error comes from the machine not being truly plumb, level, square, rigid, and free from torsion. This error is called dissemination error, and hardly any labs test for it. It is a major problem for calibration laboratories making force measurements, because these errors can be very large.
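The Python sketch below shows one way to quantify such a comparison. Some parts are assumptions:

- the record layout;
- the function names;
- the acceptance check.

At each force point, we compare the difference between the two references with the root sum of squares of their expanded uncertainties. This is the same idea as the En number used in proficiency testing. We treat |En| > 1 as a sign of additional error, such as dissemination error. The readings are hypothetical.

```python
import math
from typing import NamedTuple


class ComparisonPoint(NamedTuple):
    nominal: float   # nominal applied force
    force_a: float   # force indicated by reference A (from its own calibration)
    u_a: float       # expanded uncertainty of reference A, same units
    force_b: float   # force indicated by reference B (from its own calibration)
    u_b: float       # expanded uncertainty of reference B, same units


def compare_references(points):
    """Return (nominal, difference in % of nominal, En) for each force point."""
    results = []
    for p in points:
        diff = p.force_a - p.force_b
        en = diff / math.sqrt(p.u_a ** 2 + p.u_b ** 2)
        results.append((p.nominal, 100.0 * diff / p.nominal, en))
    return results


def suspect_points(results, limit=1.0):
    """Force points where the disagreement exceeds the combined uncertainty."""
    return [nominal for nominal, _, en in results if abs(en) > limit]


# Hypothetical readings, for illustration only.
results = compare_references([
    ComparisonPoint(1000.0, 1000.2, 0.3, 999.8, 0.4),
    ComparisonPoint(5000.0, 5003.0, 1.5, 4999.0, 2.0),
])
```

Reporting the difference as a percentage of the applied force makes it easy to see whether the disagreement grows with load. That pattern would be consistent with bending or misalignment in the machine.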
Morehouse has ILC rental kits that can be used to satisfy ILC/PT requirements per ISO/IEC 17025 and to help you calculate repeatability and reproducibility.
The Importance of Considering Reproducibility in the Measurement Process Uncertainty - Conclusion
If you have additional questions, please contact us at info@mhforce.com. We are here to help you improve your force and torque measurements.
If you enjoyed this article, check out our LinkedIn and YouTube channel for more helpful posts and videos.
In everything we do, we believe in changing how people think about force and torque calibration. Morehouse believes in thinking differently about force and torque calibration and equipment. We challenge the "just calibrate it" mentality by educating our customers on what matters and what causes significant errors, and we focus on reducing those errors.
We make our products simple to use and user-friendly. We also happen to make great force equipment and provide unparalleled calibration services.
Wanna do business with a company that focuses on what matters most? Email us at info@mhforce.com.