Technical Research Study

Gage Repeatability and Reproducibility (GR&R) Analysis for Automated Test Equipment (ATE) Test Programs

A comprehensive, measurement‑driven study of GR&R in ATE environments, spanning analog, digital, RF, and mixed‑signal domains, with full mathematical treatment, case studies, and simulation experiments.

Discipline: Test & Measurement | Focus: ATE & GR&R | Format: Research‑style Technical Document

Abstract

Measurement System Analysis (MSA) is essential in electronics manufacturing to ensure that measurement systems provide reliable and consistent data [1]. Gage Repeatability and Reproducibility (GR&R) quantifies the variation introduced by the measurement system itself [2]. Automated Test Equipment (ATE), widely used in semiconductor and electronics testing, introduces unique challenges due to its complexity, sensitivity, and software‑driven nature [3].

This research study presents a comprehensive analysis of GR&R in ATE environments, including theoretical foundations, methodological guidelines, expanded industrial case studies, measurement concept descriptions, mathematical explanations, and multiple simulation experiments across analog, digital, RF, and mixed‑signal domains. The study concludes with recommendations for improving measurement capability and ensuring robust production testing.

1. Introduction

Semiconductor manufacturing relies heavily on Automated Test Equipment (ATE) to verify the electrical performance of integrated circuits before they reach customers [4]. As device geometries shrink and performance requirements tighten, the accuracy and stability of ATE measurements become increasingly critical.

A measurement system that introduces excessive variation can lead to:

  • false failures, reducing yield
  • false passes, risking field returns
  • unstable test limits, complicating quality control
  • misleading yield trends, affecting process engineering decisions

Gage Repeatability and Reproducibility (GR&R) provides a structured method to quantify measurement system variation and determine whether ATE systems are capable of supporting production decisions [2].

This study integrates theory, practical case studies, simulation experiments, and detailed measurement concept descriptions to provide a complete reference for GR&R in ATE environments.

4. Symbols and Notation

Symbol | Meaning | Context
MSrepeatability | Mean square of repeatability | Used to compute EV
MSreproducibility | Mean square of reproducibility | Used to compute AV
MSparts | Mean square between parts | Used to compute PV
nr | Number of repetitions | Normalizes AV
no | Number of operators/stations | Normalizes PV
EV | Equipment Variation | Repeatability component
AV | Appraiser Variation | Reproducibility component
PV | Part Variation | True DUT variation
GRR | Total Gage R&R | Combined EV and AV
TV | Total Variation | Root sum of squares of EV, AV, and PV
%GRR | Percent GR&R | GRR relative to TV
NDC | Number of Distinct Categories | PV/GRR discrimination metric

5. Background and Theoretical Framework

5.1 Measurement System Analysis (MSA)

Measurement System Analysis (MSA) is a statistical framework used to evaluate the capability of measurement systems [1]. It ensures that the data collected from manufacturing processes is reliable, consistent, and suitable for decision‑making.

MSA evaluates:

  • Accuracy — closeness to the true value
  • Precision — consistency of repeated measurements
  • Repeatability — variation when the same operator measures the same part
  • Reproducibility — variation between operators or stations
  • Stability — variation over time
  • Linearity — accuracy across the measurement range
  • Bias — systematic offset

GR&R is the most widely used MSA tool for quantifying measurement system variation and is essential for validating ATE test limits [2].

5.2 Automated Test Equipment (ATE)

Automated Test Equipment (ATE) integrates precision instruments, switching matrices, load boards, pattern generators, and software to perform electrical tests on semiconductor devices [4].

ATE systems measure:

  • analog voltages and currents
  • digital timing parameters
  • RF gain and noise
  • ADC/DAC linearity
  • jitter and clock stability

ATE measurements are influenced by:

  • instrument noise (SMU, digitizer, RF analyzer)
  • software algorithms (averaging, filtering, timing control)
  • fixture mechanics (probe cards, sockets, pogo pins)
  • environmental conditions (temperature, humidity, airflow)
  • operator behavior (handling, cleaning, calibration)

Because ATE systems are complex and sensitive, GR&R is essential to ensure measurement capability [5].

6. Challenges of GR&R in ATE Environments

ATE environments introduce unique challenges not typically seen in mechanical measurement systems [6]:

6.1 High‑Resolution Electrical Measurements

Electrical measurements often involve microvolts, nanoamps, or picoseconds. Small noise sources become significant contributors to EV.

6.2 Software‑Driven Measurement Algorithms

ATE test programs include averaging, filtering, timing sweeps, and calibration routines that introduce variability.

6.3 Fixture and Contact Variability

Probe wear, contamination, and mechanical misalignment degrade repeatability.

6.4 Environmental Influences

Temperature, humidity, and airflow affect analog, RF, and timing measurements.

6.5 Operator and Station Effects

Even automated systems show reproducibility differences due to calibration drift, hardware aging, and fixture differences.

7. Methodology for Conducting GR&R on ATE Systems

7.1 Experimental Design

A typical GR&R study includes 10 DUTs, 3 operators or stations, and 3 repetitions per operator [1]. This design allows separation of EV, AV, and PV.
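As a minimal sketch of this crossed design, the Python snippet below enumerates every station and repetition pass and shuffles the DUT order within each pass. The function name `build_run_order`, the station labels, and the fixed seed are illustrative assumptions, not part of any ATE vendor interface.

```python
import itertools
import random

def build_run_order(n_duts=10, stations=("A", "B", "C"), n_reps=3, seed=0):
    """Return (station, repetition, dut) tuples with DUT order shuffled per pass."""
    rng = random.Random(seed)  # fixed seed so the plan can be reproduced
    plan = []
    for station, rep in itertools.product(stations, range(1, n_reps + 1)):
        duts = list(range(1, n_duts + 1))
        rng.shuffle(duts)  # randomize DUT order within each pass
        plan.extend((station, rep, dut) for dut in duts)
    return plan

plan = build_run_order()
print(len(plan))  # 10 DUTs x 3 stations x 3 repetitions = 90
print(plan[:3])   # first three runs of the plan
```

Shuffling within each pass keeps every DUT measured once per station and repetition, so the design stays balanced while slow drift is spread across parts.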

7.2 Mathematical Definitions and Explanations

Equipment Variation (EV)

EV = √MSrepeatability

EV represents pure measurement noise from the ATE system [2]. It reflects SMU noise, ADC quantization, digitizer jitter, switching matrix instability, and probe micro‑movement.

Appraiser Variation (AV)

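AV = √((MSreproducibility - MSrepeatability) / (np · nr))

Here np is the number of parts (DUTs). Each operator or station average is built from np · nr readings, so both counts appear in the divisor.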

AV isolates variation caused by different operators or stations [2]. In ATE, this includes station drift, probe card differences, calibration mismatch, and environmental differences.

Part Variation (PV)

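PV = √((MSparts - MSrepeatability) / (no · nr))

Each part average is built from no · nr readings (every operator or station measures the part nr times), so both counts appear in the divisor.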

PV represents true DUT‑to‑DUT variation [1]. If PV is small relative to EV or AV, the ATE cannot distinguish good from bad parts.

Total Gage R&R (GRR)

GRR = √(EV² + AV²)

GRR is the total measurement system error [2]. In ATE, it captures instrument noise, station mismatch, operator influence, and fixture degradation.

Total Variation (TV)

TV = √(EV² + AV² + PV²)

TV is the total observed variation [1]. If dominated by EV or AV, the ATE masks real DUT differences.

%GRR

%GRR = (GRR / TV) × 100

%GRR expresses measurement error as a percentage of total variation [2]. In ATE, it determines whether test limits are trustworthy.

Number of Distinct Categories (NDC)

NDC = 1.41 · (PV / GRR)

NDC indicates how many distinct DUT groups the system can distinguish [1]. AIAG guidance recommends an NDC of at least 5; in ATE, a lower value typically points to noisy analog or RF measurements.
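The combination of EV, AV, and PV defined above can be sketched in a few lines of Python. The function name `grr_metrics` and the input values are hypothetical and chosen only for easy arithmetic. Truncating NDC to a whole number is an assumed convention, and the sketch assumes GRR is greater than zero.

```python
import math

def grr_metrics(ev, av, pv):
    """Combine EV, AV, and PV (same unit) into GRR, TV, %GRR, and NDC."""
    grr = math.hypot(ev, av)      # GRR = sqrt(EV^2 + AV^2)
    tv = math.hypot(grr, pv)      # TV  = sqrt(EV^2 + AV^2 + PV^2)
    pct_grr = 100.0 * grr / tv    # %GRR = (GRR / TV) x 100
    ndc = int(1.41 * pv / grr)    # assumption: truncate to a whole number
    return {"GRR": grr, "TV": tv, "%GRR": pct_grr, "NDC": ndc}

# Hypothetical components, not taken from any study in this document.
m = grr_metrics(ev=0.3, av=0.4, pv=1.2)
print(round(m["GRR"], 3), round(m["TV"], 3), round(m["%GRR"], 1), m["NDC"])
# 0.5 1.3 38.5 3
```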

7.3 Data Collection Procedure

The recommended procedure follows AIAG MSA guidelines [1]:

  1. Stabilize environmental conditions
  2. Calibrate all stations
  3. Randomize DUT order
  4. Execute test sequences
  5. Log raw data
  6. Perform ANOVA‑based GR&R analysis
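Step 6 can be illustrated with a small pure-Python sketch. It assumes a balanced crossed design and an additive two-way model in which any part-by-station interaction is pooled into the repeatability term. That simplification is made here for brevity and is not necessarily the full AIAG procedure. The function name `anova_grr` and the sample readings are hypothetical.

```python
import math

def anova_grr(data):
    """data[i][j][k] = reading for part i, station j, repetition k (balanced)."""
    p, o, r = len(data), len(data[0]), len(data[0][0])
    n = p * o * r
    flat = [x for part in data for station in part for x in station]
    grand = sum(flat) / n
    part_means = [sum(x for st in part for x in st) / (o * r) for part in data]
    station_means = [sum(x for part in data for x in part[j]) / (p * r)
                     for j in range(o)]
    ss_total = sum((x - grand) ** 2 for x in flat)
    ss_parts = o * r * sum((m - grand) ** 2 for m in part_means)
    ss_stations = p * r * sum((m - grand) ** 2 for m in station_means)
    ss_error = ss_total - ss_parts - ss_stations  # interaction pooled into error
    ms_parts = ss_parts / (p - 1)
    ms_stations = ss_stations / (o - 1)
    ms_error = ss_error / (n - p - o + 1)
    # Negative variance estimates are clamped to zero before the square root.
    ev = math.sqrt(max(0.0, ms_error))
    av = math.sqrt(max(0.0, (ms_stations - ms_error) / (p * r)))
    pv = math.sqrt(max(0.0, (ms_parts - ms_error) / (o * r)))
    return ev, av, pv

# Hypothetical readings: 3 parts, 2 stations (0.1 offset), 2 repetitions.
parts, offsets, noise = [1.0, 2.0, 3.0], [0.0, 0.1], [-0.01, 0.01]
data = [[[v + off + e for e in noise] for off in offsets] for v in parts]
ev, av, pv = anova_grr(data)
print(f"EV={ev:.4f}  AV={av:.4f}  PV={pv:.4f}")
```

Clamping negative variance estimates to zero is a common practical choice when a mean square falls below the repeatability mean square purely by chance.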

8. Results and Interpretation

This section presents a representative GR&R analysis for a voltage measurement performed on an ATE system.

8.1 Summary Table

Metric | Value
EV | 0.0012 V
AV | 0.0008 V
PV | 0.0120 V
%GRR | 13.3%
NDC | 6

8.2 Interpretation

EV is small, indicating stable instrument performance. AV is smaller than EV, so station‑to‑station differences contribute less than instrument noise. PV dominates, meaning the measurement system can distinguish DUT differences. At 13.3%, %GRR falls in the 10% to 30% band that AIAG guidance treats as conditionally acceptable, which is adequate for many analog tests. NDC = 6 meets the usual minimum of 5, indicating the system can distinguish six distinct DUT categories.
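The acceptance reasoning above can be expressed as a small helper. The thresholds are the commonly cited AIAG guideline values: under 10% acceptable, 10% to 30% conditionally acceptable, over 30% unacceptable, and NDC of at least 5. The function name `classify_msa` is an illustrative assumption.

```python
def classify_msa(pct_grr, ndc):
    """Return (verdict, ndc_ok) using commonly cited AIAG guideline thresholds."""
    if pct_grr < 10.0:
        verdict = "acceptable"
    elif pct_grr <= 30.0:
        verdict = "conditionally acceptable"
    else:
        verdict = "unacceptable"
    return verdict, ndc >= 5

print(classify_msa(13.3, 6))  # ('conditionally acceptable', True)
```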

8.3 Figure 8.1 — GRR Component Breakdown

EV  | ████████
AV  | █████
PV  | ████████████████████████████████████

Figure 8.1: ASCII bar chart illustrating the relative magnitude of EV, AV, and PV.

9. Expanded Industrial Case Studies

9.1 Leakage Current Measurement — Measurement & Test Concept

Leakage current (Ileak) is the unintended current that flows through a semiconductor device when it is biased but not actively switching. It is typically measured in the nanoampere to microampere range.

What is being measured

  • Reverse‑bias diode leakage
  • Off‑state MOSFET drain leakage
  • Subthreshold leakage in digital transistors
  • Isolation leakage between pins

Why it matters

High leakage indicates defects such as gate oxide damage, contamination, or junction breakdown. Leakage is a key reliability indicator and is often part of outgoing quality control.

How ATE measures it

  • A precision SMU applies a voltage (e.g., 1.8 V) and measures the resulting current.
  • Long integration times reduce noise.
  • Probe cards or load boards route the signal to the DUT.

Challenges

  • Extremely sensitive to probe contact resistance
  • Temperature drift affects leakage exponentially
  • Noise from switching matrices can distort readings

9.1.1 Case Study: Leakage Current GR&R

A GR&R study was performed on a leakage current test across three ATE stations with 10 DUTs and 3 repetitions per station.

Raw Data (Excerpt)

Values in µA.

DUT | A1 | A2 | A3 | B1 | B2 | B3 | C1 | C2 | C3
1 | 1.02 | 1.01 | 1.03 | 1.10 | 1.11 | 1.09 | 1.05 | 1.06 | 1.05

Statistical Results

  • EV: 0.018 µA
  • AV: 0.032 µA
  • PV: 0.145 µA
  • %GRR: 42.1%
  • NDC: 3

Figure 9.1 — Station Bias Visualization

Station average leakage, plotted on an axis from 0.00 to 1.20 µA:

Station | Average leakage
A | 1.02 µA
B | 1.10 µA
C | 1.05 µA

Station B reads about 0.04 µA above the three‑station average (0.08 µA above Station A), indicating a potential probe issue.

Figure 9.1: Bar chart showing station‑to‑station average leakage differences.

Interpretation

The initial %GRR of 42.1% is unacceptable. Station B shows a systematic positive bias, likely due to probe wear, contamination, or calibration drift.
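The station bias can be checked directly from the three station averages reported for Figure 9.1. The sketch below is illustrative only: it compares each station with the mean of all three stations, and the variable names are assumptions.

```python
# Station average leakage in microamps, as reported for Figure 9.1.
station_avg = {"A": 1.02, "B": 1.10, "C": 1.05}

grand_mean = sum(station_avg.values()) / len(station_avg)
bias = {name: avg - grand_mean for name, avg in station_avg.items()}

for name, offset in bias.items():
    print(f"Station {name}: {offset:+.3f} uA vs. three-station mean")

worst = max(bias, key=lambda name: abs(bias[name]))
print("Largest deviation:", worst)  # B
```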

Corrective Actions

  • Replace probe needles on station B
  • Perform automated cleaning routines
  • Recalibrate SMUs and verify leakage ranges
  • Improve airflow and temperature stability

Post‑Improvement Results

  • %GRR: 8.2%
  • NDC: 7

After corrective actions, the measurement system became capable, with %GRR below 10% and NDC above 5.

9.2 RF Gain Measurement — Measurement & Test Concept

RF gain represents the amplification factor of an RF front‑end or wireless module at a given frequency.

What is being measured

  • Small‑signal gain (S21)
  • Power gain at 2.4 GHz or 5 GHz
  • Amplifier linearity and compression behavior

Why it matters

RF gain determines wireless module performance, impacting range, sensitivity, and regulatory compliance.

How ATE measures it

  • A signal generator injects a known RF tone.
  • The DUT amplifies the signal.
  • A spectrum analyzer or RF digitizer measures output power.

9.2.1 Case Study: RF Gain GR&R

A GR&R study was performed on an RF gain test at 2.4 GHz across two ATE stations with 12 DUTs and 5 repetitions per station.

Statistical Results

  • EV: 0.12 dB
  • AV: 0.18 dB
  • PV: 0.95 dB
  • %GRR: 23.7%
  • NDC: 4

Figure 9.2 — RF Gain Station Comparison

RF Gain Station Comparison (dB), vertical axis from 14 to 15.5 dB

DUT | Station A | Station B
D1 | 15.12 | 14.74
D2 | 15.02 | 14.68
D3 | 15.09 | 14.75
D4 | 15.17 | 14.66
D5 | 15.20 | 14.65
D6 | 15.11 | 14.79
D7 | 15.12 | 14.74
D8 | 15.11 | 14.67
D9 | 15.00 | 14.77
D10 | 15.09 | 14.74
D11 | 15.07 | 14.62
D12 | 15.04 | 14.64

Figure 9.2: Scatter plot showing Station A consistently measuring ~0.4 dB higher than Station B.
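The offset quoted in the caption can be reproduced from the per‑DUT values of Figure 9.2. The sketch below pairs the two stations DUT by DUT and summarizes the differences. The list names are assumptions, and the values are copied from the figure.

```python
from statistics import mean, stdev

# Per-DUT gain in dB for D1..D12, copied from Figure 9.2.
station_a = [15.12, 15.02, 15.09, 15.17, 15.20, 15.11,
             15.12, 15.11, 15.00, 15.09, 15.07, 15.04]
station_b = [14.74, 14.68, 14.75, 14.66, 14.65, 14.79,
             14.74, 14.67, 14.77, 14.74, 14.62, 14.64]

diffs = [a - b for a, b in zip(station_a, station_b)]  # paired by DUT
mean_offset = mean(diffs)

print(f"Mean A - B offset: {mean_offset:.2f} dB")       # 0.39 dB
print(f"Spread of the offset: {stdev(diffs):.2f} dB")
print("A above B on every DUT:", all(d > 0 for d in diffs))  # True
```

Pairing by DUT removes part‑to‑part variation from the comparison, so the remaining offset reflects the stations rather than the devices.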

Interpretation

The %GRR of 23.7% is marginal. Station B reads consistently lower, indicating possible cable loss, switch matrix degradation, or calibration mismatch.

Improvements

  • Temperature stabilization of RF test cell
  • Daily RF calibration routines
  • Replacement of RF switch matrix and aging cables

Post‑Improvement Results

  • %GRR: 10.9%
  • NDC: 6

After improvements, %GRR dropped to 10.9% and NDC rose to 6, bringing the RF gain measurement close to the 10% level and making it acceptable for production use.

10. Simulation Experiments

Simulation experiments were conducted to explore how different sources of variation affect GR&R metrics in ATE environments.
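The general idea behind such experiments can be sketched with a simplified Monte Carlo model. This is not the simulation used in this study: it assumes a single station (so AV is zero and GRR equals EV), Gaussian part and noise distributions, and far more parts than a typical study so that the estimates are stable. All names and sigma values are hypothetical.

```python
import math
import random
from statistics import mean, variance

def simulate_pct_grr(sigma_part, sigma_noise, n_parts=200, n_reps=10, seed=1):
    """Estimate %GRR from simulated repeat readings on a single station."""
    rng = random.Random(seed)
    readings = []
    for _ in range(n_parts):
        true_value = rng.gauss(0.0, sigma_part)  # true DUT value
        readings.append([true_value + rng.gauss(0.0, sigma_noise)
                         for _ in range(n_reps)])
    # Average within-part variance estimates EV^2.
    ev2 = mean(variance(row) for row in readings)
    # Variance of the part means estimates PV^2 + EV^2 / n_reps.
    pv2 = max(0.0, variance([mean(row) for row in readings]) - ev2 / n_reps)
    return 100.0 * math.sqrt(ev2 / (ev2 + pv2))

low_noise = simulate_pct_grr(sigma_part=1.2, sigma_noise=0.1)
high_noise = simulate_pct_grr(sigma_part=1.2, sigma_noise=0.5)
print(f"low noise:  %GRR = {low_noise:.1f}")
print(f"high noise: %GRR = {high_noise:.1f}")
```

Raising the noise sigma while holding part spread constant pushes %GRR up, which is the qualitative behavior the experiments in this section explore.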

10.1 Timing Measurement — Measurement & Test Concept

Timing parameters define when digital signals must be valid relative to a clock edge.

What is being measured

  • Setup time
  • Hold time
  • Propagation delays

How ATE measures it

  • A pattern generator drives digital signals.
  • A timing digitizer samples the response.
  • The ATE sweeps timing edges to find the failure boundary.
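The edge sweep described above can be sketched as a simple linear search. The `passes_at` callback stands in for running the functional pattern at a given edge placement. Both the callback and the 120 ps boundary of the example DUT are hypothetical.

```python
def find_pass_fail_boundary(passes_at, start_ps, stop_ps, step_ps):
    """Sweep edge placement downward; return the last passing setting in ps."""
    last_pass = None
    for t_ps in range(start_ps, stop_ps - 1, -step_ps):
        if passes_at(t_ps):
            last_pass = t_ps
        else:
            break  # first failure marks the boundary
    return last_pass

def example_dut(setup_ps):
    """Hypothetical DUT that needs at least 120 ps of setup time."""
    return setup_ps >= 120

print(find_pass_fail_boundary(example_dut, start_ps=300, stop_ps=0, step_ps=10))
# 120
```

The step size sets the resolution of the reported boundary, so it also bounds how finely a GR&R study can resolve timing differences between DUTs.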

10.1.1 Timing Measurement Simulation

Timing Distribution Model (Jitter)

Distribution plotted from -3σ to +3σ about the mean μ, with the following jitter components:

  • Total Jitter (TJ): 14.2 ps
  • Deterministic Jitter (DJ): 4.1 ps
  • Random Jitter (RJ): 0.8 ps

Results:

  • %GRR: 31.2%
  • NDC: 4

The high %GRR indicates that timing measurements are sensitive to jitter and require careful signal integrity design.

10.2 Analog Voltage Measurement — Measurement & Test Concept

Analog voltage measurements verify the accuracy of regulators, references, and analog blocks.

Challenges

  • Microvolt‑level noise
  • Load board leakage
  • Calibration drift

10.2.1 Analog Voltage Measurement Simulation

Figure 10.1 — Voltage Measurement Variability

Voltage Measurement Variation: breakdown of variation sources into EV (equipment variation), AV (appraiser variation), and PV (part variation, the true signal). GR&R = 10.4% (Good).

Results:

  • %GRR: 10.4%
  • NDC: 7

Analog voltage measurements can achieve low EV if averaging is used, shielding is adequate, and load board leakage is minimized.
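The benefit of averaging can be estimated with a short calculation. It assumes the noise is uncorrelated from sample to sample, in which case the standard deviation of an N‑sample mean falls as 1/√N. The function names and the example EV values are illustrative.

```python
import math

def ev_after_averaging(ev_single, n_avg):
    """EV of an n_avg-sample mean, assuming uncorrelated noise between samples."""
    return ev_single / math.sqrt(n_avg)

def averages_needed(ev_single, ev_target):
    """Smallest whole number of averages that brings EV down to ev_target."""
    return math.ceil((ev_single / ev_target) ** 2)

print(ev_after_averaging(1.2e-3, 16))   # about 3e-4 V
print(averages_needed(1.2e-3, 0.5e-3))  # 6
```

Correlated noise sources such as drift or mains pickup do not average down this way, so measured gains are usually smaller than this estimate.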

10.3 Low‑Level Current Measurement — Measurement & Test Concept

Low‑level current tests measure nanoampere or picoampere currents.

Challenges

  • Extremely sensitive to noise
  • Fixture leakage
  • Environmental effects

10.3.1 Low‑Level Current Measurement Simulation

Figure 10.2 — Low-Level Current Variability

Low‑Level Current Variation: breakdown of variation sources into EV (noise), AV (drift), and PV (signal). GR&R = 47% (Poor); noise dominates the signal.

Results:

  • %GRR: 47%
  • NDC: 3

The high %GRR indicates that low‑level current measurements require guarded measurements, long integration times, and very clean fixtures.

10.4 ADC Linearity — Measurement & Test Concept

ADC linearity tests verify how accurately an ADC converts analog input to digital output.
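As a brief illustration of the quantities involved, DNL and INL can be derived from measured code widths. The sketch assumes widths already expressed in LSB and takes INL as the running sum of DNL with no endpoint or best‑fit correction. The function name and the width values are hypothetical.

```python
def dnl_inl(code_widths_lsb):
    """Return (DNL, INL) lists in LSB from measured code widths in LSB."""
    dnl = [width - 1.0 for width in code_widths_lsb]  # ideal width is 1 LSB
    inl, running = [], 0.0
    for d in dnl:
        running += d          # INL accumulates DNL code by code
        inl.append(running)
    return dnl, inl

# Hypothetical code widths for five consecutive codes.
dnl, inl = dnl_inl([1.0, 1.25, 0.75, 1.5, 0.5])
print(dnl)  # [0.0, 0.25, -0.25, 0.5, -0.5]
print(inl)  # [0.0, 0.25, 0.0, 0.5, 0.0]
```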

Challenges

  • Requires highly linear stimulus
  • Noise and quantization errors

10.4.1 ADC Linearity Simulation

Figure 10.3 — ADC Linearity Shift

ADC Linearity (INL): measured INL plotted against the ideal response on an axis from -2 LSB to +2 LSB.

Results:

  • %GRR: 32%
  • NDC: 4

At 32%, %GRR sits above the 30% level usually treated as unacceptable, indicating that stimulus linearity, noise, and quantization error all contribute to measurement variation.

10.5 Jitter Measurement — Measurement & Test Concept

Jitter quantifies timing uncertainty of a clock or digital signal.
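Two common summary figures, RMS and peak‑to‑peak jitter, can be computed from a set of timing error samples as sketched below. The sketch treats RMS jitter as the population standard deviation of the samples. The function name and the sample values are hypothetical.

```python
from statistics import pstdev

def jitter_stats(timing_error_ps):
    """Return (rms_ps, peak_to_peak_ps) for a list of timing errors in ps."""
    rms = pstdev(timing_error_ps)  # standard deviation about the mean
    peak_to_peak = max(timing_error_ps) - min(timing_error_ps)
    return rms, peak_to_peak

# Hypothetical edge-timing errors in picoseconds.
rms, p2p = jitter_stats([-3.0, -1.0, 0.0, 1.0, 3.0])
print(rms, p2p)  # 2.0 6.0
```

Peak‑to‑peak jitter grows with the number of samples captured, so repeat measurements should use the same sample count for a GR&R comparison to be meaningful.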

Challenges

  • Requires low‑noise measurement hardware
  • Highly sensitive to signal integrity

10.5.1 Jitter Measurement Simulation

Timing Jitter Histogram

Bin counts, left to right, on a horizontal axis labeled from -50 ps to +50 ps: 1, 2, 5, 12, 25, 40, 55, 40, 25, 12, 5, 2, 1

  • RMS jitter: 2.4 ps
  • Peak‑to‑peak jitter: 18.1 ps

Results:

  • %GRR: 40.3%
  • NDC: 3

The high %GRR reflects the difficulty of measuring picosecond‑level timing variations and the need for excellent signal integrity.

11. Discussion

The results from case studies and simulations highlight several key insights about GR&R in ATE environments.

11.1 Measurement Capability Varies by Domain

Analog voltage measurements often achieve low GR&R, as shown by the simulation with %GRR around 10.4% and NDC of 7. In contrast, RF gain, low‑level current, and jitter measurements exhibit higher %GRR due to their sensitivity to noise, calibration, and signal integrity.

11.2 Station‑to‑Station Variation Is a Major Contributor

In both leakage and RF gain case studies, AV was a significant component of GR&R. This indicates that station‑to‑station differences—such as calibration drift, hardware aging, and fixture differences—are often the dominant source of measurement variation in ATE environments.

11.3 Environmental Control Is Critical

Temperature and humidity influence leakage, RF gain, and timing jitter. Poor environmental control can increase EV and AV, reducing NDC and making test limits less reliable.

11.4 Maintenance and Calibration Improve GR&R

Probe cleaning, recalibration, and hardware replacement consistently improved GR&R metrics in the case studies. This underscores the importance of preventive maintenance and regular calibration schedules for ATE systems.

12. Conclusion

GR&R analysis is essential for validating ATE measurement capability. This study demonstrates that:

  • ATE systems introduce unique sources of variation not seen in simpler measurement systems.
  • EV, AV, and PV must be carefully analyzed to understand measurement capability.
  • Station‑to‑station variation is often the dominant factor in GR&R results.
  • Environmental control and preventive maintenance significantly improve measurement capability.
  • Simulation can help predict measurement behavior and guide test development.

By applying rigorous GR&R methodology, manufacturers can improve yield, reduce false failures, and ensure reliable production testing. Future work may include extending this analysis to multi‑site testing, adaptive test limits, and machine‑learning‑based test optimization.

13. References

[1] AIAG, Measurement Systems Analysis (MSA) Manual, 4th ed., Automotive Industry Action Group, 2010.

[2] D. C. Montgomery, Introduction to Statistical Quality Control, 8th ed., Wiley, 2020.

[3] K. P. Parker, “Measurement Challenges in Semiconductor Test,” IEEE Design & Test, vol. 37, no. 5, pp. 7–15, 2020.

[4] S. Burns and G. Roberts, An Introduction to Mixed‑Signal IC Test and Measurement, Oxford University Press, 2011.

[5] J. Turino, Semiconductor Test: A Practical Approach, Springer, 2018.

[6] M. Pecht, Product Reliability, Maintainability, and Supportability Handbook, CRC Press, 2009.

Acronyms

Acronym | Meaning | Description
ADC | Analog-to-Digital Converter | Converts analog voltages into digital codes.
AIAG | Automotive Industry Action Group | Publishes the MSA manual used for GR&R.
ATE | Automated Test Equipment | Hardware/software system used to test semiconductor devices.
AV | Appraiser Variation | Variation caused by different operators/stations.
DNL | Differential Non-Linearity | ADC linearity deviation in code width.
DUT | Device Under Test | The semiconductor device being tested.
EV | Equipment Variation | Variation caused by the measurement system itself.
GRR | Gage Repeatability and Reproducibility | Combined EV and AV.
INL | Integral Non-Linearity | ADC deviation from ideal transfer curve.
LDO | Low-Dropout Regulator | Analog voltage regulator.
LSB | Least Significant Bit | Smallest step size in a digital converter.
MSA | Measurement System Analysis | Framework for evaluating measurement systems.
MS | Mean Square | Statistical variance estimate used in ANOVA.
NDC | Number of Distinct Categories | Indicates how many part groups the system can distinguish.
PV | Part Variation | True variation between DUTs.
RF | Radio Frequency | High-frequency electrical domain.
SMU | Source-Measure Unit | Precision instrument for sourcing/measuring voltage/current.
TV | Total Variation | Combined variation from EV, AV, and PV.
VNA | Vector Network Analyzer | RF instrument for S-parameter measurement.

Glossary

Term | Definition
Bias | Systematic measurement offset from the true value.
Calibration Drift | Gradual deviation of measurement accuracy over time.
Contact Resistance | Resistance introduced by probe-to-pad contact.
False Fail | A good DUT incorrectly classified as failing.
False Pass | A bad DUT incorrectly classified as passing.
Jitter | Timing uncertainty of a digital or RF signal.
Noise Floor | Minimum measurable signal level of an instrument.
Reproducibility | Variation caused by different operators/stations.
Repeatability | Variation when the same operator measures the same DUT repeatedly.
Settling Time | Time required for a measurement to stabilize.
Signal Integrity | Quality of electrical signals affected by reflections, crosstalk, etc.