Telemetry Analysis
Discover how our large-scale telemetry clustering identified hardware-software correlations and rescued a device launch.
CHALLENGE
The launch of next‑generation devices stalled because recurring kernel crashes kept halting development. Production was on hold while the hardware and software teams blamed each other, with no agreement on whether hardware or software was at fault. Data access was limited, and only a few telemetry parameters were being monitored.
SOLUTION
We brought together stakeholders from device engineering, kernel development, and data engineering. After obtaining additional telemetry, we built an ETL pipeline and applied an unsupervised machine learning algorithm (k‑modes clustering) to more than 300 categorical parameters to find correlations between configurations and crashes. The pipeline initially ran locally and was later scaled to Azure Databricks.
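The clustering step can be illustrated with a minimal sketch. It is not the project's actual code: the function names (hamming, column_modes, k_modes), the three toy columns, and their values are all hypothetical, and the real analysis covered more than 300 parameters. K‑modes follows the same assign-and-update loop as k‑means, but it measures dissimilarity as the number of mismatched categories and uses per-column modes as cluster centres, which is why it suits categorical telemetry.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count the positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Return the most frequent value of each column as a tuple."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, max_iter=20, seed=0):
    """Cluster categorical records (tuples) into k groups.

    Returns (labels, modes): one cluster index per record and one
    representative record (the column-wise mode) per cluster.
    """
    rng = random.Random(seed)
    # Initialise the modes from k distinct records.
    distinct = list(dict.fromkeys(records))
    if k > len(distinct):
        raise ValueError("k exceeds the number of distinct records")
    modes = rng.sample(distinct, k)

    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each record to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: hamming(r, modes[j])) for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes are too
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy telemetry: (memory, processor, firmware) per crash report.
records = (
    [("8GB", "cpu_a", "fw_1")] * 5
    + [("8GB", "cpu_a", "fw_2")] * 2
    + [("16GB", "cpu_b", "fw_2")] * 5
    + [("16GB", "cpu_b", "fw_1")] * 2
)

labels, modes = k_modes(records, k=2)
for j, mode in enumerate(modes):
    print(f"cluster {j}: mode={mode}, size={labels.count(j)}")
```

In this toy data the memory and processor values always co-occur while the firmware value varies, so the two clusters split along the hardware configuration. At scale, the same assign-and-update loop would typically be handled by an existing k‑modes library and distributed compute rather than plain Python lists.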
IMPACT
The analysis revealed that most errors were tied to a specific memory and processor configuration. The unsupervised ML solution gave the quality‑engineering team a clear path forward, allowing the device release to proceed.
