Model Accuracy | DimensionsPot

Validation results for DimensionsPot API v1.4.0 — MAE per dimension, 95% prediction interval coverage, civilian holdout results, and anchor tier impact on precision.

What accuracy means for probabilistic predictions

DimensionsPot returns a statistical estimate for each dimension, not a measurement. Accuracy has two components:

Point accuracy: how close the predicted value is to the true measurement, on average. Expressed as Mean Absolute Error (MAE).
Interval calibration: whether the 95% prediction interval (range_95) actually contains the true measurement about 95% of the time. Observed coverage well below 95% means the stated intervals are too narrow (overconfident); coverage well above 95% means they are wider than necessary.

Both are validated separately. Point accuracy tells you how far off a typical prediction is. Interval calibration tells you whether the API is honest about its uncertainty.
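
The two metrics above can be sketched in a few lines. This is a minimal illustration, not the validation script itself: the function names, the (low, high) tuple layout for range_95, and the sample values are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def mean_absolute_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Average absolute difference between predictions and ground truth (same units)."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


def interval_coverage(
    intervals: Sequence[Tuple[float, float]], measured: Sequence[float]
) -> float:
    """Fraction of measurements that fall inside their prediction interval."""
    if not intervals or len(intervals) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for (low, high), m in zip(intervals, measured) if low <= m <= high)
    return hits / len(measured)


# Hypothetical values in mm, invented for illustration only.
predicted = [190.0, 185.0, 200.0, 178.0]
intervals = [(176.0, 204.0), (171.0, 199.0), (186.0, 214.0), (164.0, 192.0)]
measured = [193.0, 181.0, 216.0, 178.0]

mae = mean_absolute_error(predicted, measured)   # (3 + 4 + 16 + 0) / 4
coverage = interval_coverage(intervals, measured)  # 3 of 4 intervals contain the truth
print(f"MAE = {mae:.2f} mm, coverage = {coverage:.0%}")
```

Keeping the two functions separate mirrors the point made above: a model can have a low MAE and still report intervals that are too narrow, or the reverse.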

Validation framework

API v1.4.0 (model adult_ridge_v4.0) was validated before deployment across 7 test tracks totalling 5,200+ API calls against ground-truth measurements from three independent public datasets. No cherry-picking: precision thresholds were set before the runs, and every track is included in the report.

Track	Population	Input	N	Reference
T1	ANSUR II athletic adults	Height + mass	300	ANSUR II 2012
T2	ANSUR II athletic adults	1–3 random anchors	200	ANSUR II 2012
T3	NHANES civilian adults	Height + mass	500	NHANES 2001–2018
T4	NHANES civilian adults	2–3 random anchors	300	NHANES 2001–2018
T5	Pediatric 0–18 y	Age-stratified	400	NHANES pediatric
T6	Multi body-build	Height + mass + circumferences	2,100	ANSUR II 2012
T7	Weak-anchor audit	Single anchor only	1,400	ANSUR II 2012

Server errors across all 5,200+ calls: 0.

Selected dimension performance — height + mass input

Under the most common production condition (height and mass only, no additional anchors), MAE and 95% PI coverage for selected BONE dimensions (Track 1, n=300):

Dimension	MAE	95% PI coverage
`bimalleolar_breadth`	2.8 mm	84.0%
`hand_breadth`	2.9 mm	88.3%
`lateral_malleolus_height`	3.9 mm	92.3%
`head_breadth`	4.3 mm	91.3%
`menton_sellion_length`	4.8 mm	94.3%
`wrist_circumference`	5.4 mm	86.7%
`head_length`	5.7 mm	89.0%
`hand_length`	6.6 mm	89.0%
`forearm_length`	10.0 mm	90.7%
`ankle_circumference`	10.8 mm	78.7%
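
When reading the coverage column, it can help to keep in mind that a coverage figure estimated from n=300 subjects carries sampling uncertainty of its own. The sketch below uses the standard Wilson score interval for a binomial proportion; it is not part of the published validation, and the hit count of 252 is inferred here from 84.0% of 300 rather than taken from the report.

```python
import math
from typing import Tuple


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% confidence)."""
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumption: 84.0% coverage on n=300 corresponds to 252 covered subjects.
low, high = wilson_interval(hits=252, n=300)
print(f"observed 84.0%, plausible range {low:.1%} to {high:.1%}")
```

For this row the range works out to roughly 79% to 88%, which stays below the nominal 95%, so the gap is unlikely to be a sampling artifact alone.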

FLESH dimensions (circumferences) carry higher MAE because soft-tissue volume is not fully determined by height and mass alone. Supplying circumference anchors reduces FLESH MAE; see Anchor Strategy for the precision impact of additional inputs.

NHANES civilian holdout (Tracks 3–4)

ANSUR II is a military dataset of athletic adults. To assess generalization to a civilian population, a separate holdout of 500 NHANES subjects (Track 3) and 300 with random anchors (Track 4) was run.

Metric	Result	Target
Average MAE across dimensions	14.1 mm	≤ 16 mm ✓
95% PI coverage	77–79%	> 75% ✓

Note on NHANES landmark differences: NHANES measures upper_arm_length from a different anatomical landmark than ANSUR II, producing a systematic ~50 mm offset. The engine produces values consistent with ANSUR II / ISO 7250-1 landmark definitions. If your application requires NHANES-convention measurements, apply a fixed 50 mm correction to upper_arm_length outputs.
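
A fixed-offset correction of this kind could look like the sketch below. The text above gives the magnitude (about 50 mm) but not the direction, so the sign used here is a placeholder assumption that should be checked against your own NHANES reference data. The function name and the sample value are hypothetical.

```python
def apply_landmark_offset(upper_arm_length_mm: float, offset_mm: float) -> float:
    """Shift an ANSUR II / ISO 7250-1 style value by a fixed, signed offset in mm."""
    return upper_arm_length_mm + offset_mm


# Assumption: magnitude from the note above; the SIGN is a placeholder to verify.
NHANES_OFFSET_MM = 50.0

api_value_mm = 340.0  # hypothetical upper_arm_length output, for illustration only
nhanes_style_mm = apply_landmark_offset(api_value_mm, NHANES_OFFSET_MM)
print(f"{api_value_mm} mm -> {nhanes_style_mm} mm")
```

Keeping the offset as an explicit signed parameter makes it easy to flip the direction once it has been confirmed against reference measurements.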

Effect of circumference anchors — Track 6

When clients supply circumference measurements alongside height and mass (PRIMARY_RICH tier), the engine constrains soft-tissue predictions directly rather than estimating them from body mass.

Configuration	Dimensions improved	Neutral	Degraded
Height + mass + hip	15	35	5
Height + mass + waist	11	40	4
Height + mass + chest	15	37	3
Height + mass + neck + wrist	7	45	2
Height + mass + all 5 circumferences	34	17	0
Height + mass + all 5 (civilian build)	34	17	0
Height + mass + all 5 (overweight)	29	19	3

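With all 5 circumference anchors, 34 of 56 tested dimensions improve and none degrade for the athletic and civilian builds; for the overweight build, 29 improve and 3 degrade. Each row sums to 56 minus the number of supplied circumferences, which suggests that the anchored dimensions themselves are excluded from the counts. Configurations with only one or two circumference anchors still degrade 2 to 5 dimensions, so the conclusion that additional input signal propagates through the prediction pipeline without introducing regression artifacts holds for the full five-anchor configuration on athletic and civilian builds.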
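
The arithmetic behind the Track 6 table can be checked with a short tally. The counts are copied from the table; the interpretation that uncounted dimensions are the supplied anchors is an inference from the row totals, not something the report states.

```python
# (anchors supplied, improved, neutral, degraded), copied from the Track 6 table
TRACK6 = {
    "hip": (1, 15, 35, 5),
    "waist": (1, 11, 40, 4),
    "chest": (1, 15, 37, 3),
    "neck + wrist": (2, 7, 45, 2),
    "all 5": (5, 34, 17, 0),
    "all 5 (civilian build)": (5, 34, 17, 0),
    "all 5 (overweight)": (5, 29, 19, 3),
}
TESTED_DIMENSIONS = 56  # figure quoted in the text


def summarize(rows):
    """Per configuration: dimensions counted, dimensions left out, improved share."""
    summary = {}
    for name, (anchors, improved, neutral, degraded) in rows.items():
        counted = improved + neutral + degraded
        summary[name] = {
            "anchors": anchors,
            "counted": counted,
            "uncounted": TESTED_DIMENSIONS - counted,
            "improved_share": improved / counted,
        }
    return summary


summary = summarize(TRACK6)
for name, row in summary.items():
    print(f"{name:24s} counted={row['counted']} improved={row['improved_share']:.0%}")
```

In every row the number of uncounted dimensions equals the number of supplied circumferences, which is what motivates reading the improved share against the counted dimensions rather than against all 56.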

Reproducibility

All results above were generated by tests/precision_validation_v4.py (RANDOM_SEED=42). Three independent runs — local development, local post-cleanup, and production Google Cloud Run EU — produced numerically identical output across all MAE, bias, and coverage metrics (0.0 mm delta). The validation is not environment-specific; it is a property of the API itself.
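
A cross-environment comparison like the one described could be expressed as a maximum absolute delta over matching metrics. This is a sketch under assumptions: the dictionary layout, the metric keys, and the helper name are hypothetical and do not reflect the internals of tests/precision_validation_v4.py. The sample numbers reuse two rows from the Track 1 table.

```python
from typing import Dict


def max_abs_delta(run_a: Dict[str, float], run_b: Dict[str, float]) -> float:
    """Largest absolute difference between two runs that report the same metrics."""
    if run_a.keys() != run_b.keys():
        raise ValueError("runs must report the same set of metrics")
    return max((abs(run_a[key] - run_b[key]) for key in run_a), default=0.0)


# Hypothetical per-run summaries; keys are invented for the example.
local_dev = {"hand_breadth.mae_mm": 2.9, "hand_breadth.coverage": 0.883}
post_cleanup = dict(local_dev)
cloud_run_eu = dict(local_dev)

worst = max(
    max_abs_delta(local_dev, post_cleanup),
    max_abs_delta(local_dev, cloud_run_eu),
)
print(f"largest delta across runs: {worst}")
```

Raising on mismatched keys is a deliberate choice: a metric missing from one environment is a reproducibility failure in its own right and should not be silently skipped.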

Limitations

Area	Detail
Training population	ANSUR II is US military: athletic adults, narrow BMI distribution. The NHANES civilian holdout (n=800) met its pre-set targets (average MAE 14.1 mm, 95% PI coverage 77–79%), but predictions for extreme-BMI subjects carry higher FLESH error.
MIDDLE_EAST female	Coefficients derived from global baseline. Dedicated regional survey not available.
AFRICA	Standard deviations proxied from ANSUR II with a conservative confidence penalty applied.
INDIA female	ASIA_PACIFIC fallback coefficients.
Soft-tissue circumferences without anchors	`waist_circumference`, `hip_circumference`, `calf_circumference` have higher MAE at `PRIMARY_BOTH` tier because body composition at a given BMI varies substantially across individuals. Supplying one circumference anchor brings these dimensions to `PRIMARY_RICH` accuracy.