Sizing applications have a traffic pattern that’s difficult to plan for: perfectly quiet during most of the year, then enormous spikes during sale events, new product launches, and seasonal campaigns. A Black Friday traffic spike that’s 20x your normal volume is not unusual in e-commerce. A sizing feature that falls over under load is worse than not having one — it becomes a friction point at the highest-intent moment in the purchase flow.
Here’s how to architect for this.
The problem structure
Body measurement APIs are typically billed per request, with rate limits expressed as requests-per-second or requests-per-day. During a spike event, two failure modes appear:
- Rate limit exhaustion: You exceed your plan’s rate limit. The API returns 429 Too Many Requests. Your application errors. Users see a broken sizing feature.
- Downstream latency: Under high concurrent load, API response times increase. If your product page waits synchronously for a sizing recommendation, a slow API means a slow page load. Users leave.
The solution to both is the same: don’t make a live API call for every request.
Layer 1: Aggressive caching
A body measurement prediction for a given height, weight, gender, and region combination is deterministic — the same inputs always produce the same output. This makes caching extremely effective.
For sizing applications, the cache hit rate can be very high because:
- Most users have heights and weights that cluster around common values
- You have a bounded set of (height, weight) pairs that actually appear in practice
import hashlib
import json

import redis
import requests

r = redis.Redis(host="localhost", port=6379, db=0)
CACHE_TTL = 86400 * 30  # 30 days: measurement predictions don't expire quickly

def _cache_key(gender: str, height_mm: int, weight_kg: float, region: str, bundle: str) -> str:
    # Serialize the inputs in a stable order so identical inputs always hash to the same key
    key_data = json.dumps({
        "g": gender, "h": height_mm, "w": weight_kg, "r": region, "b": bundle
    }, sort_keys=True)
    return f"bm:{hashlib.sha256(key_data.encode()).hexdigest()[:16]}"

def get_prediction_cached(
    gender: str,
    height_cm: float,
    weight_kg: float,
    region: str = "GLOBAL",
    bundle: str = "TORSO"
) -> dict:
    height_mm = round(height_cm * 10)
    # Send the same rounded weight that goes into the key, so the cached
    # result always corresponds exactly to the inputs the key describes
    weight_kg = round(weight_kg, 1)
    cache_key = _cache_key(gender, height_mm, weight_kg, region, bundle)
    # Check cache first
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    # Cache miss: call the API
    response = requests.post(
        "https://dimensionspot-bodysize-engine.p.rapidapi.com/v1/predict",
        json={
            "input_data": {
                "input_unit_system": "metric",
                "subject": {"gender": gender, "input_origin_region": region},
                "anchors": {"body_height": height_mm, "body_mass": weight_kg}
            },
            "output_settings": {
                "calculation": {"target_region": region, "body_build_type": "CIVILIAN"},
                "requested_dimensions": {"bundle": bundle},
                "output_format": {"include_range_95": True, "confidence_score_threshold": 60}
            }
        },
        headers={
            "X-RapidAPI-Key": "YOUR_API_KEY",
            "X-RapidAPI-Host": "dimensionspot-bodysize-engine.p.rapidapi.com"
        },
        timeout=5
    )
    # Raise on 429/5xx so an error body is never cached for 30 days
    response.raise_for_status()
    result = response.json()
    # Store in cache
    r.setex(cache_key, CACHE_TTL, json.dumps(result))
    return result
With this pattern, only the first user with a given set of inputs incurs an actual API call. All subsequent users with the same inputs get the cached result instantly.
Layer 2: Pre-warming the cache
If you know a traffic spike is coming — a sale event, a product launch — pre-warm the cache before it starts. Generate predictions for the most common input combinations and load them into the cache.
import itertools

def prewarm_cache(api_key: str) -> dict[str, int]:
    """
    Pre-generate predictions for the most common input combinations.
    Call this several hours before a major sale event.
    Note: api_key is not used here; get_prediction_cached sends its own key.
    """
    # Common heights and weights in your user base
    # Adjust these ranges based on your actual traffic patterns
    heights_cm = list(range(155, 200, 2))  # 155-199 cm in 2 cm steps
    weights_kg = list(range(50, 110, 5))   # 50-105 kg in 5 kg steps
    genders = ["male", "female"]
    regions = ["GLOBAL", "EUROPE", "ASIA_PACIFIC"]
    total = 0
    cache_hits = 0
    for gender, height_cm, weight_kg, region in itertools.product(
        genders, heights_cm, weights_kg, regions
    ):
        height_mm = round(height_cm * 10)
        cache_key = _cache_key(gender, height_mm, float(weight_kg), region, "TORSO")
        if r.exists(cache_key):
            cache_hits += 1
        else:
            # Cache miss: this makes one live API call and stores the result
            get_prediction_cached(gender, height_cm, float(weight_kg), region, "TORSO")
        total += 1
    return {"total_combinations": total, "already_cached": cache_hits, "new_calls": total - cache_hits}
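Pre-warming only pays off when live requests resolve to the same cache keys. The key keeps height to the millimetre and weight to 0.1 kg, so round live inputs to the same 2 cm and 5 kg steps before the lookup. With that in place, pre-warming covers the vast majority of actual traffic for a typical e-commerce user base, because most users fall within a predictable range of heights and weights.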
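A minimal sketch of snapping live inputs onto the pre-warm grid, assuming the grid bounds and step sizes used in prewarm_cache. The helper names snap_to_grid and snap_inputs are hypothetical, and whether 5 kg buckets are accurate enough for your garments is a product decision this sketch does not settle.

```python
from typing import Optional

def snap_to_grid(value: float, start: int, step: int, stop: int) -> Optional[int]:
    """Return the nearest point of range(start, stop, step), or None if value is off the grid."""
    last = start + ((stop - 1 - start) // step) * step  # last value range() produces
    if not start <= value <= last:
        return None
    # Round half up explicitly; Python's round() uses banker's rounding
    return start + int((value - start) / step + 0.5) * step

def snap_inputs(height_cm: float, weight_kg: float) -> Optional[tuple[int, int]]:
    # Grid parameters mirror the pre-warm ranges: range(155, 200, 2) and range(50, 110, 5)
    height = snap_to_grid(height_cm, 155, 2, 200)
    weight = snap_to_grid(weight_kg, 50, 5, 110)
    if height is None or weight is None:
        return None  # outside the pre-warmed range: use the exact inputs instead
    return height, weight

print(snap_inputs(171, 72.4))  # (171, 70)
print(snap_inputs(210, 80))    # None
```

Inputs outside the grid return None rather than being clamped, so an unusually tall or heavy user still gets a prediction for their real measurements through the normal cache-miss path.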
Layer 3: Circuit breaker
If the upstream API becomes slow or unavailable during a spike, a circuit breaker prevents your entire application from waiting on failed requests. After a threshold of failures, the circuit “opens” and requests fall through to a degraded path immediately, without waiting for timeout.
import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing, reject immediately
    HALF_OPEN = "half_open"  # Testing recovery

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: int = 60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time: float | None = None
        self.state = CircuitState.CLOSED

    def call(self, fn, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                # Recovery window elapsed: let one trial request through
                self.state = CircuitState.HALF_OPEN
            else:
                raise RuntimeError("Circuit open: API temporarily unavailable")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold:
                self.state = CircuitState.OPEN
            raise
        # Any success closes the circuit and resets the count, so the
        # threshold applies to consecutive failures, not lifetime failures
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        return result

prediction_circuit = CircuitBreaker(failure_threshold=5, recovery_timeout=60)
Layer 4: Graceful degradation
When the circuit is open, or when a user’s inputs aren’t in the cache and the API is unavailable, your application needs a fallback. Options ranked by quality:
Fallback 1 — Show a size guide. Instead of a specific size recommendation, display the brand’s size chart and ask the user to find their size. Not as good as a recommendation, but honest and functional.
Fallback 2 — Use a simpler heuristic. Map BMI ranges to rough size buckets. This is less accurate than a proper prediction but better than an error page.
def fallback_size_estimate(gender: str, height_cm: float, weight_kg: float) -> dict:
    """
    Simple BMI-based size estimate when the prediction API is unavailable.
    This is a coarse approximation: communicate the uncertainty clearly.
    """
    bmi = weight_kg / (height_cm / 100) ** 2
    if gender == "female":
        if bmi < 18.5:
            return {"size": "XS", "confidence": "LOW", "method": "fallback_heuristic"}
        elif bmi < 22:
            return {"size": "S", "confidence": "LOW", "method": "fallback_heuristic"}
        elif bmi < 26:
            return {"size": "M", "confidence": "LOW", "method": "fallback_heuristic"}
        elif bmi < 30:
            return {"size": "L", "confidence": "LOW", "method": "fallback_heuristic"}
        else:
            return {"size": "XL", "confidence": "LOW", "method": "fallback_heuristic"}
    else:
        if bmi < 20:
            return {"size": "S", "confidence": "LOW", "method": "fallback_heuristic"}
        elif bmi < 24:
            return {"size": "M", "confidence": "LOW", "method": "fallback_heuristic"}
        elif bmi < 28:
            return {"size": "L", "confidence": "LOW", "method": "fallback_heuristic"}
        else:
            return {"size": "XL", "confidence": "LOW", "method": "fallback_heuristic"}
When returning a fallback result, include method: "fallback_heuristic" in the response so your frontend can display appropriate messaging: “Estimated based on BMI — for a precise recommendation, try again in a moment.”
Fallback 3 — Request asynchronously. Queue the prediction request and notify the user when it’s ready (email, push notification). Works for non-real-time sizing scenarios like custom order workflows.
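As an illustration of the asynchronous option, here is a minimal in-process sketch using only the standard library. The function name start_prediction_worker, the job shape (a dict with user_id and inputs), and the predict and notify callables are all assumptions made for this example; a production setup would more likely use a durable queue so that jobs survive a restart.

```python
import logging
import queue
import threading
from typing import Callable

def start_prediction_worker(
    jobs: "queue.Queue",
    predict: Callable[[dict], dict],
    notify: Callable[[str, dict], None],
) -> threading.Thread:
    """Process queued prediction jobs in the background until a None sentinel arrives."""
    def run() -> None:
        while True:
            job = jobs.get()
            if job is None:  # sentinel: stop the worker
                jobs.task_done()
                break
            try:
                notify(job["user_id"], predict(job["inputs"]))
            except Exception:
                # Hypothetical policy: log and drop; a real system might retry later
                logging.exception("prediction job failed for %s", job.get("user_id"))
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The request handler then only has to put a job on the queue and tell the user that the recommendation will follow, which keeps the page load independent of the API.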
Monitoring for spikes
Set up monitoring on three metrics:
- Cache hit rate — should be high (>90%) during normal operation. A sudden drop means new input patterns are appearing.
- API call rate — your actual rate against the provider’s limit. Alert at 70% of limit.
- API error rate and latency — 429s and timeout rates. Alert immediately.
import time

# _call_api is assumed to be the uncached POST from get_prediction_cached,
# factored out into its own function that returns the parsed JSON.
def get_prediction_with_monitoring(gender, height_cm, weight_kg, region, bundle):
    cache_key = _cache_key(gender, round(height_cm * 10), round(weight_kg, 1), region, bundle)
    cached = r.get(cache_key)
    if cached:
        # Increment hit counter
        r.incr("metrics:cache_hits")
        return json.loads(cached)
    # Increment miss counter
    r.incr("metrics:cache_misses")
    start = time.time()  # time only the upstream call, not the cache lookup
    try:
        result = prediction_circuit.call(
            _call_api, gender, height_cm, weight_kg, region, bundle
        )
        r.incr("metrics:api_success")
        r.incr("metrics:api_latency_total", int((time.time() - start) * 1000))
        r.setex(cache_key, CACHE_TTL, json.dumps(result))
        return result
    except RuntimeError:
        # Circuit open (CircuitBreaker.call raises RuntimeError when rejecting)
        r.incr("metrics:circuit_open_rejections")
        return fallback_size_estimate(gender, height_cm, weight_kg)
    except Exception:
        r.incr("metrics:api_errors")
        raise
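The counters above are raw totals, so the hit rate, average latency, and alert conditions still have to be derived from them. A small sketch of that arithmetic, taking the counter values as plain integers (for example, read back with r.get and converted with int()). The function names are hypothetical; the 90% and 70% thresholds are the ones from the list above.

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 1.0 when there has been no traffic."""
    total = hits + misses
    return hits / total if total else 1.0

def avg_api_latency_ms(latency_total_ms: int, api_success: int) -> float:
    """Mean latency of successful API calls in milliseconds."""
    return latency_total_ms / api_success if api_success else 0.0

def alerts(hits: int, misses: int, calls_per_sec: float, limit_per_sec: float) -> list[str]:
    """Return the names of the alert conditions that currently hold."""
    triggered = []
    if cache_hit_rate(hits, misses) < 0.90:
        triggered.append("cache_hit_rate_low")
    if calls_per_sec >= 0.70 * limit_per_sec:
        triggered.append("api_rate_near_limit")
    return triggered
```

Because the Redis counters are lifetime totals, a sudden drop in hit rate only shows up if you sample them on an interval and compute the rate from the difference between samples.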
Sizing events: the 72-hour window
For a Black Friday-scale event, the practical approach is:
- 72 hours before: Run cache pre-warming across expected input ranges
- 24 hours before: Verify cache hit rate on staging with simulated traffic
- During event: Monitor cache hit rate, API error rate, circuit state
- After event: Review what inputs caused cache misses, adjust pre-warming ranges for next time
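The pre-warming run is itself a burst of API calls: the grid in prewarm_cache is 23 heights × 12 weights × 2 genders × 3 regions = 1,656 combinations. It is worth pacing that run below your plan’s per-second limit. A minimal sketch of a pacing wrapper, assuming a simple fixed interval between calls; the name paced and the max_per_second value are assumptions to set from your own plan, and the sleep and clock parameters exist only to make the helper testable.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

def paced(
    items: Iterable[T],
    max_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items no faster than max_per_second."""
    interval = 1.0 / max_per_second
    next_at = clock()  # earliest time the next item may be released
    for item in items:
        wait = next_at - clock()
        if wait > 0:
            sleep(wait)
        # Schedule the following item one interval after this one
        next_at = max(clock(), next_at) + interval
        yield item
```

Wrapping the itertools.product(...) loop in prewarm_cache as paced(itertools.product(...), max_per_second=5) would spread the 1,656 combinations over roughly five and a half minutes, slightly over-throttled because cached combinations are paced too.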
The goal is to make API rate limits irrelevant: if 95%+ of requests are served from cache, your actual API call rate stays far below limits regardless of frontend traffic volume.
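The arithmetic behind that claim is simple enough to check against your own numbers. The sketch below uses hypothetical function names and hypothetical traffic figures (a 1,000 requests-per-second spike and a 20 requests-per-second plan limit); substitute your own.

```python
def upstream_calls_per_second(frontend_rps: float, cache_hit_rate: float) -> float:
    """API calls per second that remain after the cache absorbs its share."""
    return frontend_rps * (1.0 - cache_hit_rate)

def required_hit_rate(frontend_rps: float, limit_rps: float) -> float:
    """Minimum cache hit rate needed to keep upstream calls at or below the limit."""
    if frontend_rps <= limit_rps:
        return 0.0  # traffic fits under the limit even with no cache
    return 1.0 - limit_rps / frontend_rps

# Hypothetical spike: 1,000 req/s at a 95% hit rate leaves about 50 upstream calls/s
print(round(upstream_calls_per_second(1000, 0.95), 2))
# With a hypothetical 20 req/s plan limit, the hit rate must reach about 98%
print(round(required_hit_rate(1000, 20), 4))
```

The second function is the more useful one for planning: it turns a plan limit and an expected peak into the hit rate the pre-warming run has to achieve.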
Body measurement APIs are stateless by design, which makes caching architecturally clean. The inputs fully determine the output — there’s no user-specific state to invalidate. This property turns a potential scaling bottleneck into something you can pre-compute entirely, leaving the live API path as a fallback rather than the critical path.