Model Accuracy

Validation results for DimensionsPot API v1.4.0 — MAE per dimension, 95% prediction interval coverage, civilian holdout results, and anchor tier impact on precision.

What accuracy means for probabilistic predictions

DimensionsPot returns a statistical estimate for each dimension, not a measurement. Accuracy has two components:

  1. Point accuracy: how close the predicted value is to the true measurement, on average. Expressed as Mean Absolute Error (MAE).
  2. Interval calibration: whether the 95% prediction interval (range_95) actually contains the true measurement 95% of the time. The interval is well calibrated when observed coverage reaches the nominal 95%; coverage below that level means the stated intervals are too narrow.

Both are validated separately. Point accuracy tells you how far off a typical prediction is. Interval calibration tells you whether the API is honest about its uncertainty.
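As a minimal sketch of how the two components can be computed for a single dimension: the function names and sample values below are illustrative assumptions, not part of the DimensionsPot API or its validation script. Only the idea of a 95% interval with a lower and an upper bound (range_95) is taken from the text above.

```python
from typing import Sequence, Tuple


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Point accuracy: average absolute gap between prediction and measurement."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def interval_coverage(intervals: Sequence[Tuple[float, float]], actual: Sequence[float]) -> float:
    """Interval calibration: fraction of measurements that fall inside [low, high]."""
    if len(intervals) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for (low, high), a in zip(intervals, actual) if low <= a <= high)
    return hits / len(actual)


# Hypothetical values in mm for four subjects (illustration only, not validation data).
predicted = [84.0, 90.0, 88.0, 92.0]
intervals = [(78.0, 90.0), (84.0, 96.0), (82.0, 94.0), (86.0, 98.0)]
measured = [86.0, 89.0, 95.0, 92.0]

mae = mean_absolute_error(predicted, measured)
coverage = interval_coverage(intervals, measured)
print(f"MAE = {mae:.2f} mm, coverage = {coverage:.0%}")
```

In this toy sample the third measurement falls outside its interval, so coverage is 75% even though the MAE is small. This is why the two components are reported separately.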


Validation framework

API v1.4.0 (model adult_ridge_v4.0) was validated before deployment across 7 test tracks totalling 5,200+ API calls against ground-truth measurements from three independent public datasets. No cherry-picking: precision thresholds were set before the runs, and every track is included in the report.

| Track | Population | Input | N | Reference |
|---|---|---|---|---|
| T1 | ANSUR II athletic adults | Height + mass | 300 | ANSUR II 2012 |
| T2 | ANSUR II athletic adults | 1–3 random anchors | 200 | ANSUR II 2012 |
| T3 | NHANES civilian adults | Height + mass | 500 | NHANES 2001–2018 |
| T4 | NHANES civilian adults | 2–3 random anchors | 300 | NHANES 2001–2018 |
| T5 | Pediatric 0–18 y | Age-stratified | 400 | NHANES pediatric |
| T6 | Multi body-build | Height + mass + circumferences | 2,100 | ANSUR II 2012 |
| T7 | Weak-anchor audit | Single anchor only | 1,400 | ANSUR II 2012 |

Server errors across all 5,200+ calls: 0.


Selected dimension performance — height + mass input

Under the most common production condition (height and mass only, no additional anchors), MAE and 95% PI coverage for selected BONE dimensions (Track 1, n=300):

| Dimension | MAE | 95% PI coverage |
|---|---|---|
| bimalleolar_breadth | 2.8 mm | 84.0% |
| hand_breadth | 2.9 mm | 88.3% |
| lateral_malleolus_height | 3.9 mm | 92.3% |
| head_breadth | 4.3 mm | 91.3% |
| menton_sellion_length | 4.8 mm | 94.3% |
| wrist_circumference | 5.4 mm | 86.7% |
| head_length | 5.7 mm | 89.0% |
| hand_length | 6.6 mm | 89.0% |
| forearm_length | 10.0 mm | 90.7% |
| ankle_circumference | 10.8 mm | 78.7% |

FLESH dimensions (circumferences) carry higher MAE by design: soft-tissue volume is not fully determined by height and mass alone. Supplying circumference anchors directly reduces FLESH MAE — see Anchor Strategy for the precision impact of additional inputs.


NHANES civilian holdout (Tracks 3–4)

ANSUR II is a military dataset of athletic adults. To assess generalization to a civilian population, a separate NHANES holdout was run: 500 subjects with height and mass only (Track 3) and 300 subjects with 2–3 random anchors (Track 4).

| Metric | Result | Target |
|---|---|---|
| Average MAE across dimensions | 14.1 mm | ≤ 16 mm ✓ |
| 95% PI coverage | 77–79% | > 75% ✓ |

Note on NHANES landmark differences: NHANES measures upper_arm_length from a different anatomical landmark than ANSUR II, producing a systematic ~50 mm offset. The engine produces values consistent with ANSUR II / ISO 7250-1 landmark definitions. If your application requires NHANES-convention measurements, apply a fixed 50 mm correction to upper_arm_length outputs.
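A minimal sketch of applying such a correction on the client side follows. The function name, the plain-dictionary layout of the outputs, and the sample values are assumptions for illustration. The note above gives the magnitude of the offset (about 50 mm) but not its direction, so the positive sign used in the example call is a placeholder that should be confirmed against reference data before use.

```python
from typing import Dict

# Magnitude taken from the note above (approximately 50 mm).
NHANES_UPPER_ARM_OFFSET_MM = 50.0


def to_nhanes_convention(dimensions_mm: Dict[str, float], offset_mm: float) -> Dict[str, float]:
    """Return a copy with upper_arm_length shifted by offset_mm; other keys unchanged."""
    adjusted = dict(dimensions_mm)  # copy so the original outputs are not mutated
    if "upper_arm_length" in adjusted:
        adjusted["upper_arm_length"] += offset_mm
    return adjusted


# Hypothetical outputs in mm under ANSUR II / ISO 7250-1 landmarks (illustration only).
ansur_style = {"upper_arm_length": 330.0, "forearm_length": 260.0}

# Placeholder sign: pass a negative offset if the NHANES landmark yields the shorter value.
nhanes_style = to_nhanes_convention(ansur_style, offset_mm=NHANES_UPPER_ARM_OFFSET_MM)
print(nhanes_style)
```

Keeping the offset as an explicit argument makes the sign decision visible at the call site instead of hiding it inside the helper.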


Effect of circumference anchors — Track 6

When clients supply circumference measurements alongside height and mass (PRIMARY_RICH tier), the engine constrains soft-tissue predictions directly rather than estimating them from body mass.

| Configuration | Dimensions improved | Neutral | Degraded |
|---|---|---|---|
| Height + mass + hip | 15 | 35 | 5 |
| Height + mass + waist | 11 | 40 | 4 |
| Height + mass + chest | 15 | 37 | 3 |
| Height + mass + neck + wrist | 7 | 45 | 2 |
| Height + mass + all 5 circumferences | 34 | 17 | 0 |
| Height + mass + all 5 (civilian build) | 34 | 17 | 0 |
| Height + mass + all 5 (overweight) | 29 | 19 | 3 |

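With all 5 circumference anchors, 34 of 56 tested dimensions improve and none degrade for athletic and civilian body types; for the overweight build, 29 improve and 3 degrade. Configurations with one or two circumference anchors improve fewer dimensions (7 to 15) and degrade 2 to 5. With the full anchor set on athletic and civilian builds, the additional input signal improves predictions without introducing regressions in other dimensions.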
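The improved / neutral / degraded counts can be read as a per-dimension comparison of MAE with and without the extra anchors. The sketch below shows one plausible way to produce such a tally. The 0.5 mm tolerance, the function name, and the sample MAE values are assumptions for illustration; the criterion actually used in the validation is not stated in this report.

```python
from collections import Counter
from typing import Dict


def tally_anchor_effect(
    baseline_mae: Dict[str, float],
    anchored_mae: Dict[str, float],
    tolerance_mm: float = 0.5,  # hypothetical threshold for calling a change "neutral"
) -> Counter:
    """Classify each predicted dimension by how its MAE changed after adding anchors."""
    counts = Counter(improved=0, neutral=0, degraded=0)
    for name, base in baseline_mae.items():
        if name not in anchored_mae:
            continue  # dimension supplied as an anchor, so it is no longer predicted
        delta = anchored_mae[name] - base
        if delta < -tolerance_mm:
            counts["improved"] += 1
        elif delta > tolerance_mm:
            counts["degraded"] += 1
        else:
            counts["neutral"] += 1
    return counts


# Hypothetical MAE values in mm (illustration only, not validation data).
baseline = {"waist_circumference": 40.0, "hip_circumference": 30.0,
            "head_length": 6.0, "hand_length": 7.0}
with_hip_anchor = {"waist_circumference": 22.0, "head_length": 5.9, "hand_length": 7.8}

counts = tally_anchor_effect(baseline, with_hip_anchor)
print(dict(counts))
```

Skipping dimensions that are supplied as anchors means each configuration is scored only on the dimensions it still has to predict.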


Reproducibility

All results above were generated by tests/precision_validation_v4.py (RANDOM_SEED=42). Three independent runs — local development, local post-cleanup, and production Google Cloud Run EU — produced numerically identical output across all MAE, bias, and coverage metrics (0.0 mm delta). The validation is not environment-specific; it is a property of the API itself.
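A cross-environment check of this kind can be sketched as a comparison of metric summaries from two runs. The function name, the "dimension.metric" key layout, and the sample values are hypothetical; the internals of tests/precision_validation_v4.py are not shown in this report.

```python
from typing import Dict


def max_metric_delta(run_a: Dict[str, float], run_b: Dict[str, float]) -> float:
    """Largest absolute difference between two runs that report the same metrics."""
    if run_a.keys() != run_b.keys():
        raise ValueError("runs report different metrics")
    return max((abs(run_a[key] - run_b[key]) for key in run_a), default=0.0)


# Hypothetical metric summaries from two environments (illustration only).
local_run = {"head_length.mae_mm": 5.7, "head_length.coverage": 0.89}
cloud_run = {"head_length.mae_mm": 5.7, "head_length.coverage": 0.89}

delta = max_metric_delta(local_run, cloud_run)
print(f"max delta = {delta}")
```

A delta of exactly 0.0 is a strict criterion. It is only reachable when the sampling seed is fixed and the numerical stack behaves identically in every environment, which is what the three runs described above report.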


Limitations

| Area | Detail |
|---|---|
| Training population | ANSUR II is US military: athletic adults, narrow BMI distribution. NHANES civilian holdout (n=800) confirms generalization, but predictions for extreme BMI subjects carry higher FLESH error. |
| MIDDLE_EAST female | Coefficients derived from global baseline. Dedicated regional survey not available. |
| AFRICA | Standard deviations proxied from ANSUR II with a conservative confidence penalty applied. |
| INDIA female | ASIA_PACIFIC fallback coefficients. |
| Torso circumferences without anchors | waist_circumference, hip_circumference, calf_circumference have higher MAE at PRIMARY_BOTH tier because body composition at a given BMI varies substantially across individuals. Supplying one circumference anchor brings these dimensions to PRIMARY_RICH accuracy. |