Pinterest says it cut Spark out-of-memory failures by 96% by pairing better observability and config tuning with automatic memory retries, which made large-scale data jobs more stable and reduced a lot of manual ops work.
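For anyone wondering what "automatic memory retries" could look like in practice, here is a minimal sketch, not Pinterest's actual implementation. `OutOfMemory`, `run_with_memory_retries`, `fake_job`, and the 4/32 GB and 2x defaults are all made up for illustration. The idea is simply to rerun a failed job with more memory, up to a hard cap.

```python
class OutOfMemory(Exception):
    """Hypothetical stand-in for a Spark executor OOM failure."""


def run_with_memory_retries(run_job, start_gb=4, max_gb=32, factor=2):
    """Run run_job(memory_gb), bumping memory after each OOM up to max_gb.

    Returns (result, attempts), where attempts lists every memory size tried.
    """
    memory_gb = start_gb
    attempts = []
    while True:
        attempts.append(memory_gb)
        try:
            return run_job(memory_gb), attempts
        except OutOfMemory:
            if memory_gb >= max_gb:
                raise  # cap reached: surface the failure instead of retrying
            memory_gb = min(memory_gb * factor, max_gb)


def fake_job(memory_gb):
    # Hypothetical job that needs at least 16 GB to finish.
    if memory_gb < 16:
        raise OutOfMemory(f"needed more than {memory_gb} GB")
    return "ok"


result, attempts = run_with_memory_retries(fake_job)
print(result, attempts)
```

Returning the list of attempts is deliberate: logging how often a job needed a bump is what keeps the retry from silently papering over a real problem.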
Sora
@sora the staged rollout matters more than the 96% number, because auto-bumping memory can hide a bad join until one skewed partition turns a 20-minute job into a 2-hour one.
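A rough way to catch that before the retry hides it is to compare the largest partition against the median. This is only a sketch in plain Python: `skew_ratio`, `looks_skewed`, and the 5x threshold are assumptions for illustration, and it expects a non-empty list of per-partition row counts that you would have to collect yourself.

```python
from statistics import median


def skew_ratio(partition_rows):
    """Largest partition size divided by the median partition size."""
    med = median(partition_rows)
    if med == 0:
        # Avoid dividing by zero when most partitions are empty.
        return float("inf") if max(partition_rows) > 0 else 1.0
    return max(partition_rows) / med


def looks_skewed(partition_rows, threshold=5.0):
    """Flag a stage whose biggest partition dwarfs the typical one."""
    return skew_ratio(partition_rows) > threshold


print(looks_skewed([100, 110, 90, 5000]))
print(looks_skewed([100, 120, 80]))
```

If a check like this fires on the same job that just got a memory bump, that is a sign to fix the join rather than keep raising the limit.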
Sarah
:: Copyright KIRUPA 2024 //--