How to Store Body Measurements: Database Schema Design

When you start integrating body measurement predictions into an application, one of the first design questions is: what exactly goes in the database?

The instinct is to save everything — input parameters plus the full API response. That instinct is wrong, and understanding why leads to a cleaner, more private, and more maintainable schema.

What not to store

The full API response is 130+ predicted body dimensions. Storing them is tempting because it avoids re-fetching. But consider: if you improve your model or your users’ inputs change, cached predictions go stale and you’ll query them confidently without knowing they’re wrong.

More importantly: every dimension you store is data you’re responsible for under GDPR, CCPA, and increasingly HIPAA-adjacent regulations. A database of predicted waist circumferences and chest sizes is biometric profile data — even if it was never directly measured.

The better pattern is input persistence + on-demand prediction. Store the inputs; compute predictions when needed.

Core schema

Here’s a PostgreSQL schema that separates concerns correctly:

-- Users (or anonymous sessions)
CREATE TABLE measurement_subjects (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    external_user_id TEXT,           -- your app's user ID (nullable for anonymous)
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now()
);

-- Anthropometric input profiles
CREATE TABLE measurement_profiles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    subject_id UUID NOT NULL REFERENCES measurement_subjects(id) ON DELETE CASCADE,
    
    -- Anthropometric inputs
    gender TEXT NOT NULL CHECK (gender IN ('male', 'female')),
    age_category TEXT NOT NULL DEFAULT 'ADULT'
        CHECK (age_category IN ('ADULT', 'INFANT', 'TODDLER', 'CHILD', 'PRE_TEEN', 'TEEN')),
    input_origin_region TEXT NOT NULL DEFAULT 'GLOBAL'
        CHECK (input_origin_region IN ('GLOBAL', 'EUROPE', 'ASIA_PACIFIC', 'AFRICA', 'LATAM', 'INDIA', 'MIDDLE_EAST')),
    body_height_mm INTEGER NOT NULL CHECK (body_height_mm BETWEEN 400 AND 2600),
    body_mass_kg NUMERIC(5,1) NOT NULL CHECK (body_mass_kg BETWEEN 2.0 AND 300.0),
    
    -- Optional refinement anchors
    chest_circumference_mm INTEGER CHECK (chest_circumference_mm BETWEEN 300 AND 1800),
    waist_circumference_mm INTEGER CHECK (waist_circumference_mm BETWEEN 250 AND 2500),
    hip_circumference_mm INTEGER CHECK (hip_circumference_mm BETWEEN 400 AND 2000),
    
    -- Profile lifecycle
    is_primary BOOLEAN NOT NULL DEFAULT true,
    source TEXT DEFAULT 'user_input',  -- 'user_input', 'imported', 'estimated'
    created_at TIMESTAMPTZ DEFAULT now(),
    
    -- At most one primary profile per subject
    CONSTRAINT one_primary_per_subject 
        EXCLUDE USING btree (subject_id WITH =) WHERE (is_primary)
);

-- Sizing decisions (what you actually need long-term)
CREATE TABLE size_assignments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    subject_id UUID NOT NULL REFERENCES measurement_subjects(id),
    profile_id UUID NOT NULL REFERENCES measurement_profiles(id),
    
    product_category TEXT NOT NULL,  -- 'shirt', 'pants', 'helmet', etc.
    brand_code TEXT,                  -- 'BRAND_X'
    size_label TEXT NOT NULL,         -- 'M', 'L', '32x34', etc.
    size_confidence TEXT,             -- 'HIGH', 'MEDIUM', 'LOW'
    
    -- Capture the predicted dimensions used for this decision
    -- Store only the ones relevant to the sizing decision, not all 130
    chest_circumference_predicted_mm INTEGER,
    waist_circumference_predicted_mm INTEGER,
    hip_circumference_predicted_mm INTEGER,
    
    created_at TIMESTAMPTZ DEFAULT now(),
    order_id TEXT  -- link to the actual order if relevant
);

Handling measurement history

People’s bodies change. A measurement profile from two years ago may predict the wrong size today. Your schema needs to support multiple profiles per user while tracking which is current.

The is_primary flag with a partial unique index enforces one primary profile per subject. When a user updates their measurements, demote the old profile and create a new primary:

-- Demote existing primary
UPDATE measurement_profiles 
SET is_primary = false 
WHERE subject_id = $1 AND is_primary = true;

-- Insert new primary
INSERT INTO measurement_profiles (subject_id, gender, age_category, ...)
VALUES ($1, $2, $3, ...)
RETURNING id;

This keeps the full history (useful for understanding whether returns correlate with stale profiles) while making the current profile unambiguous.

Separating PII from measurement data

In regulated environments, separate the profile from the identity:

-- PII: stored in a high-compliance zone, encrypted at rest
CREATE TABLE user_identities (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email_hash TEXT UNIQUE NOT NULL,  -- SHA-256 of email, for lookup
    encrypted_pii BYTEA,              -- AES-256 encrypted name/email/etc.
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Measurements: no PII, linked only by opaque ID
CREATE TABLE measurement_subjects (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    identity_id UUID REFERENCES user_identities(id) ON DELETE SET NULL,
    -- identity_id can be NULL for anonymous measurement sessions
    ...
);

If the identity table is subpoenaed, deleted, or anonymized, the measurement records can exist without any link to a real person.

Should you cache API responses?

For specific high-frequency use cases — like displaying a size recommendation on every product page view — re-calling the prediction API on each request is unnecessary. Cache the response with a TTL tied to how long a measurement profile is valid.

import hashlib
import json
import redis

PREDICTION_TTL_SECONDS = 86400 * 7  # 1 week

def get_cached_prediction(profile: dict, dimensions: list[str]) -> dict | None:
    cache_key = _make_cache_key(profile, dimensions)
    r = redis.Redis()
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    return None

def cache_prediction(profile: dict, dimensions: list[str], result: dict) -> None:
    cache_key = _make_cache_key(profile, dimensions)
    r = redis.Redis()
    r.setex(cache_key, PREDICTION_TTL_SECONDS, json.dumps(result))

def _make_cache_key(profile: dict, dimensions: list[str]) -> str:
    key_data = {
        "gender": profile["gender"],
        "region": profile["region"],
        "height_mm": profile["height_mm"],
        "mass_kg": profile["mass_kg"],
        "dims": sorted(dimensions)
    }
    key_str = json.dumps(key_data, sort_keys=True)
    return f"predict:{hashlib.sha256(key_str.encode()).hexdigest()}"

Invalidate the cache when the user updates their profile. Never cache longer than you’d trust the underlying measurements to be accurate.

What to store from a size assignment

When a user selects a size (or you recommend one), the size_assignments table is where you preserve the outcome. Include:

The predicted dimension values used for the decision (not all 130 — just the ones that drove the recommendation)
The confidence level (derived from the API’s confidence_score or your range_95 overlap calculation)
A link to the order or cart event

This data is what eventually lets you close the feedback loop: correlate size assignments with returns to measure recommendation accuracy in your specific context.

Schema for multi-tenant applications

If you’re building a platform (a sizing SDK for multiple brands), add a tenant layer:

CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    slug TEXT UNIQUE NOT NULL,
    api_key_hash TEXT NOT NULL  -- store hash of the key you issued to them
);

ALTER TABLE measurement_subjects ADD COLUMN tenant_id UUID REFERENCES tenants(id);
ALTER TABLE size_assignments ADD COLUMN tenant_id UUID REFERENCES tenants(id);

-- Row-level security
ALTER TABLE measurement_subjects ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON measurement_subjects
    USING (tenant_id = current_setting('app.tenant_id')::UUID);

PostgreSQL row-level security ensures that even application bugs can’t leak one tenant’s measurement data to another.

The schema design principle is: store inputs and outcomes, compute predictions on demand, and separate PII from anthropometric data at the architectural level. This keeps your schema lean, your data fresh, and your compliance posture defensible.