Data Drift for Condition-Based Monitoring
In our previous blog series on data analysis and ML development for sensor data, we discussed data preprocessing and how vibration data can be effectively used for condition monitoring (see Preprocessing Vibration Data for Machine Learning and Detecting Anomalies in Belt-Driven Systems with Machine Learning).
In this post, we focus on how changes in acceleration distribution can reveal shifts in machine behavior. To explore this, we use the same small test rig from the previous blogs to simulate different operating conditions. For transparency, we collect all data using our own enDAQ data logger, which provides high‑quality, high‑sample‑rate acceleration measurements.
We then use this data to track data drift over time and detect changes in machine condition. This approach can help identify early signs of misalignment, wear, looseness, or other abnormal situations — ultimately preventing progressive damage, major failures, and costly downtime.
This blog includes:
- Introduction to Data Drift
- Test design and data acquisition
- Acceleration Distribution
- KDE and Probability Distribution Distances
- Condition Monitoring
- Summary and conclusion
Introduction to Data Drift
In industrial condition monitoring, we often assume that a machine operating under “normal” conditions will continue to produce similar data over time. In reality, this assumption rarely holds. Machines age, loads change, components wear, and operating environments evolve. As a result, the statistical properties of the data collected from sensors, such as vibration, temperature, or pressure, gradually or abruptly change. This phenomenon is known as data drift.
Data drift refers to a change in the underlying distribution of measured data over time. Even when a machine appears to be functioning normally, its sensor signals may slowly shift due to mechanical, structural, or operational changes. These shifts are not necessarily faults, but they often indicate that the system is moving away from its original baseline state.
In the context of condition monitoring, data drift is not a problem; it is a signal. Detecting and quantifying data drift allows engineers to track changes in machine behavior long before traditional alarm thresholds are crossed.
Why Data Drift Matters in Condition Monitoring
Traditional condition monitoring approaches typically rely on fixed thresholds applied to scalar features such as RMS, peak amplitude, or kurtosis. While effective for detecting severe faults, these methods are often insensitive to gradual changes. Data drift, on the other hand, captures subtle distributional changes in sensor signals that may correspond to early-stage degradation, loss of stiffness, misalignment, or changes in load transmission.
By monitoring data drift:
- Early degradation can be detected before catastrophic failure
- Machine behavior can be tracked continuously, rather than classified as simply “healthy” or “faulty”
- Condition monitoring systems can adapt to changes in operating conditions
In essence, data drift provides a continuous measure of deviation from normal behavior, rather than a binary decision.
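This continuous view can be sketched in a few lines of code. The example below uses purely synthetic data (an assumed 1 kHz sample rate and 1-second windows, not the rig measurements discussed later): a signal whose spread grows slowly over 60 seconds is scored window-by-window against a fixed baseline window, producing a drift score that rises as the distribution departs from the reference.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
fs = 1000                                   # assumed sample rate (Hz)
window = fs                                 # 1-second windows
t = np.arange(60 * fs) / fs                 # 60 s of synthetic signal
sigma = 1.0 + 0.01 * t                      # spread grows slowly over time
signal = rng.normal(0.0, 1.0, t.size) * sigma

baseline = signal[:window]                  # first window as the reference state
drift = [
    wasserstein_distance(baseline, signal[i:i + window])
    for i in range(0, signal.size, window)
]
print(f"drift score: first={drift[0]:.3f}, last={drift[-1]:.3f}")
```

The score is zero for the baseline window itself and grows steadily; a fixed RMS threshold on the same signal would trip much later, if at all.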
Data Drift vs. Faults
It is important to distinguish data drift from faults. A fault is typically associated with a discrete mechanical failure, such as a cracked gear tooth or a damaged bearing. Data drift, however, describes a gradual or progressive change in data distributions that may or may not lead to a fault.
For example:
- Increasing belt slack may cause vibration distributions to widen
- Structural loosening may shift dominant vibration amplitudes
- Load redistribution may change how energy is transmitted through the machine
These effects manifest as data drift well before they become faults.
A Distribution-Based View of Machine Health
Instead of monitoring individual signal values, data-drift-based condition monitoring focuses on entire signal distributions. By comparing the probability distributions of sensor data collected at different times, engineers can quantify how much the machine behavior has changed relative to a reference state.
Metrics such as:
- Jensen–Shannon divergence,
- Hellinger distance, and
- Wasserstein (Earth Mover’s) distance
provide mathematically rigorous ways to measure these changes. When applied to vibration signals, these metrics offer a powerful framework for detecting early degradation in a way that is robust to noise, sensor orientation, and machine-specific characteristics.
In the sections that follow, we will explore how data drift can be quantified using distribution-based metrics, how these metrics relate to physical changes in machines, and how they can be applied to real-world condition monitoring problems without relying on machine-specific thresholds.
Test design and data acquisition
A representative test rig was designed, consisting of two pulleys: a drive pulley and a driven pulley. The driven pulley is adjustable and can be moved toward the drive pulley to intentionally loosen the belt, while the drive pulley remains fixed. The test rig is shown in Figure 1.

Figure 1. The test rig, which includes an adjustable driven pulley and a fixed drive pulley.
In this blog, a sampling rate of 4,000 Hz was used. The data logger was mounted on the rigid frame of the test rig, and the Z direction is normal to the ground, as shown in Figure 2. For each test, the rig was first powered on, after which the data logger was activated to record acceleration data for 60 seconds. Once data acquisition was complete, the data logger was switched off, followed by the shutdown of the test rig.

Figure 2. Data logger mounted on the rigid base.
The vibration data were collected under three operating conditions: healthy, where the belt is at its normal tightness (Figure 3a); relatively loose, where the belt tension was reduced to approximately 75% of its healthy value (Figure 3b); and too loose, where the belt tension was reduced to roughly 50% of its healthy value (Figure 3c). The relatively loose condition is more representative of real-world operation and is therefore the primary focus for evaluating whether data drift can be detected under realistic fault conditions. The too loose condition is less likely to occur in practice but is included to illustrate how data drift manifests under severe belt loosening.
Figure 3. Collecting vibration data for (a) healthy, (b) 75% loose, and (c) 50% loose conditions.
import pandas as pd
import plotly.graph_objects as go
import endaq
from plotly.subplots import make_subplots
import plotly.express as px
# Updated: each label maps to a list of file paths
datasets = {
"Healthy": [
r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000603_Healthy.ide',
# Add more Healthy files if needed
],
"75% Loose": [
r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000606_75Loose.ide'
],
"50% Loose": [
r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000604_50Loose.ide'
],
# "Detension": [
# r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000605_detension.ide'
# ],
}
# Column names to plot
columns_plot = ["X (40g)", "Y (40g)", "Z (40g)"]
# Assign consistent colors to each class label
color_palette = px.colors.qualitative.Set1
label_list = list(datasets.keys())
label_colors = {label: color_palette[i % len(color_palette)] for i, label in enumerate(label_list)}
# Plotting function
def plot_axes_combined(datasets, columns_plot, label_colors):
fig = make_subplots(
rows=3,
cols=1,
shared_xaxes=True,
vertical_spacing=0.05,
# subplot_titles=columns_plot
)
# For each label and list of file paths
for label, file_paths in datasets.items():
for idx, path in enumerate(file_paths):
try:
# Load and preprocess
doc = endaq.ide.get_doc(path)
accel = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
accel = accel - accel.median() # Remove DC offset
for i, axis in enumerate(columns_plot):
if axis in accel.columns:
fig.add_trace(
go.Scatter(
x=accel.index,
y=accel[axis],
mode='lines',
name=label if idx == 0 and i == 0 else None, # show legend only once per label
legendgroup=label,
line=dict(color=label_colors[label]),
showlegend=(idx == 0 and i == 0)
),
row=i + 1, col=1
)
else:
print(f"[{label}] Column '{axis}' not found in {path}")
except Exception as e:
print(f"Error loading file {path}: {e}")
# Add axis labels
for i, axis in enumerate(columns_plot):
fig.update_yaxes(title_text=axis, row=i+1, col=1)
fig.update_xaxes(title_text="Time (seconds)", row=3, col=1)
# fig.update_layout(
# height=500,
# title_text="Acceleration Signals from All Datasets (X, Y, Z)",
# showlegend=True
# )
fig.update_layout(
height=400,
width=1100,
title_text="Acceleration Signals from All Datasets (X, Y, Z)",
showlegend=True,
legend=dict(
orientation="h", # Horizontal legend
yanchor="bottom",
y=1.02, # Slightly above the top of the plot
xanchor="center",
x=0.5, # Centered horizontally
traceorder="grouped",
title_text='', # Optional: remove legend title
itemsizing='constant',
valign='top'
),
legend_tracegroupgap=0, # No space between grouped traces
margin=dict(t=100) # Extra top margin to fit the legend
)
fig.show()
# Run the plot function
plot_axes_combined(datasets, columns_plot, label_colors)

Figure 4. Acceleration time series for the X, Y, and Z directions.
The healthy condition shows the widest spread, indicating higher vibration energy. As the belt gets looser, the vibration amplitude visibly decreases across all axes, so the amplitude distributions are expected to become narrower as well. The vibration distributions and KDEs are presented and discussed in the next sections.
Acceleration Distribution
General data preprocessing steps are described in detail in the blog post “Preprocessing Vibration Data for Machine Learning”. In this section, we focus on visualizing and analyzing the statistical distributions of acceleration signals measured along the X, Y, and Z axes.
Rather than examining raw time-domain signals, we analyze their distributions, which provide a compact statistical description of how acceleration values are spread over time. Distribution-based analysis is particularly useful for detecting subtle changes in machine behavior that may not be apparent in individual time traces.
A histogram is a graphical representation of the distribution of a dataset. It shows how frequently different ranges (or bins) of values occur. In the context of vibration analysis, a histogram describes how often specific acceleration amplitudes appear during operation.
Changes in a dynamic system condition can alter the shape, width, or symmetry of these histograms, even when traditional metrics such as RMS remain relatively unchanged.
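This point can be checked with a quick synthetic experiment (illustrative data, not rig measurements): two zero-mean signals scaled to exactly the same RMS, one Gaussian and one heavy-tailed, are indistinguishable by RMS yet have clearly different distribution shapes, which excess kurtosis exposes.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
n = 200_000
gaussian = rng.normal(0.0, 1.0, n)
heavy = rng.laplace(0.0, 1.0, n)            # heavy-tailed (Laplace) signal

# Scale both signals to exactly unit RMS
gaussian /= np.sqrt(np.mean(gaussian**2))
heavy /= np.sqrt(np.mean(heavy**2))

rms_g = np.sqrt(np.mean(gaussian**2))       # identical RMS by construction
rms_h = np.sqrt(np.mean(heavy**2))
kurt_g = kurtosis(gaussian)                 # excess kurtosis, near 0 for Gaussian
kurt_h = kurtosis(heavy)                    # near 3 for Laplace
print(f"RMS: {rms_g:.4f} vs {rms_h:.4f}; kurtosis: {kurt_g:.2f} vs {kurt_h:.2f}")
```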
The script below extracts acceleration data in the X, Y, and Z directions and plots their corresponding histograms (Figure 5). In addition to the individual axes, a resultant acceleration is computed to provide a direction-independent representation of vibration energy:
a_resultant = √(a_x² + a_y² + a_z²)
By including the resultant acceleration, we reduce sensitivity to sensor orientation and mounting direction, which is particularly important when comparing measurements across different machines or sensor locations.
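This orientation-independence is easy to verify numerically. In the sketch below (simulated triaxial data and an arbitrary 30° rotation, chosen purely for illustration), the per-axis values change under rotation while the resultant magnitude does not:

```python
import numpy as np

rng = np.random.default_rng(2)
xyz = rng.normal(size=(1000, 3))            # simulated triaxial acceleration

# Rotate the sensor frame 30 degrees about the Z axis
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
xyz_rot = xyz @ Rz.T

# Per-axis values change, but the resultant magnitude is unchanged
resultant = np.linalg.norm(xyz, axis=1)
resultant_rot = np.linalg.norm(xyz_rot, axis=1)
print("max resultant difference:", np.max(np.abs(resultant - resultant_rot)))
```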
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
import endaq
# =====================================================
# Dataset dictionary
# =====================================================
datasets = {
"Healthy": [
r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000603_Healthy.ide',
# Add more Healthy files if needed
],
"75% Loose": [
r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000606_75Loose.ide'
],
"50% Loose": [
r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000604_50Loose.ide'
],
# "Detension": [
# r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000605_detension.ide'
# ],
}
# =====================================================
# Parameters
# =====================================================
columns = ["X (40g)", "Y (40g)", "Z (40g)", "Resultant"]
requested_start = 5
requested_end = 55
# Colors
label_colors = {
"Healthy": "#d62728",
"75% Loose": "#1f77b4",
"50% Loose": "#2ca02c",
}
# =====================================================
# Step 1: Find common overlapping time window
# =====================================================
time_ranges = {}
for label, file_list in datasets.items():
for file_path in file_list:
doc = endaq.ide.get_doc(file_path)
accel = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
time_ranges[file_path] = (accel.index.min(), accel.index.max())
max_start = max(t[0] for t in time_ranges.values())
min_end = min(t[1] for t in time_ranges.values())
common_start = max(max_start, requested_start)
common_end = min(min_end, requested_end)
if common_start >= common_end:
raise ValueError("❌ No common time window across all datasets.")
print(f"✅ Using common time window: {common_start:.2f}s → {common_end:.2f}s")
# =====================================================
# Step 2: Create subplots
# =====================================================
fig = make_subplots(
rows=1,
cols=len(columns),
subplot_titles=columns,
shared_yaxes=True
)
# =====================================================
# Step 3: Load, trim, and plot histograms
# =====================================================
for label, file_list in datasets.items():
color = label_colors[label]
legend_shown = False
for file_path in file_list:
try:
doc = endaq.ide.get_doc(file_path)
accel = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
# Detrend (median removal)
accel = accel - accel.median()
# Enforce common time window
accel = accel[
(accel.index >= common_start) &
(accel.index <= common_end)
]
# Compute resultant
accel["Resultant"] = np.sqrt(
accel["X (40g)"]**2 +
accel["Y (40g)"]**2 +
accel["Z (40g)"]**2
)
# Plot histograms
for j, col in enumerate(columns):
fig.add_trace(
go.Histogram(
x=accel[col],
nbinsx=50,
opacity=0.5,
marker=dict(color=color),
name=label if not legend_shown else None,
legendgroup=label,
showlegend=not legend_shown,
histnorm="probability density" # comparable PDFs
),
row=1,
col=j + 1
)
legend_shown = True
except Exception as e:
print(f"[{label}] Failed to process {file_path}: {e}")
# =====================================================
# Step 4: Layout
# =====================================================
fig.update_layout(
height=500,
width=1100,
title_text=(
"Overlayed Histograms of Acceleration "
f"(Common {common_start:.1f}–{common_end:.1f}s)"
),
barmode="overlay"
)
fig.update_xaxes(title_text="Acceleration (g)")
fig.update_yaxes(title_text="Probability Density")
fig.show()
# =====================================================
# Step 5: Save figure
# =====================================================
# Replace the path below with the path of your folder (r"...\histograms_common_window.html")
output_path = r"C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\histograms_common_window.html"
fig.write_html(output_path)
print(f"✅ Figure saved to: {output_path}")

Figure 5. Histograms of acceleration for the X, Y, and Z directions as well as the resultant.
Comparing acceleration distributions across different machine conditions allows us to observe how vibration behavior evolves as the system condition changes. These distributional changes form the basis for data-drift metrics such as Jensen–Shannon divergence, Hellinger distance, and Wasserstein distance, which are discussed in the following sections.
Based on the histograms in Figure 5, reducing the belt tightness causes the distributions in the X, Y, and Z directions to become narrower. In the resultant plot, the distribution not only narrows but also shifts to the left.
KDE and Probability Distribution Distances
Kernel Density Estimation (KDE)
Kernel Density Estimation (KDE) is a non-parametric method used to estimate the underlying probability density function of a random variable from finite data samples. Unlike histograms, KDE provides a smooth and continuous representation of the distribution without requiring arbitrary binning choices. In vibration analysis, KDE enables comparison of signal amplitude distributions across operating conditions by preserving subtle shape differences. These estimated probability density functions form the basis for computing distribution-based distance and divergence metrics such as Jensen–Shannon divergence, Hellinger distance, and Wasserstein distance.
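A minimal KDE sketch with SciPy is shown below (synthetic amplitudes standing in for acceleration data; the bandwidth is SciPy's default Scott rule rather than a tuned choice):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
sample = rng.normal(0.0, 0.5, 5_000)        # stand-in for acceleration amplitudes

kde = gaussian_kde(sample)                  # bandwidth: SciPy's default Scott rule
x = np.linspace(sample.min(), sample.max(), 1_000)
pdf = kde(x)                                # smooth, continuous density estimate

area = np.trapz(pdf, x)                     # should be close to 1 over the sample range
print(f"integrated density ≈ {area:.3f}")
```

Evaluating the KDE on a shared grid for two datasets yields directly comparable PDFs, which is exactly what the distance metrics below require.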
Jensen–Shannon Divergence (JSD)
The Jensen–Shannon divergence is an information-theoretic measure that quantifies the similarity between two probability distributions. Unlike the Kullback–Leibler divergence, JSD is symmetric and always finite, making it well suited for practical condition monitoring applications. Given two distributions P and Q, JSD is defined as:

JSD(P ‖ Q) = ½ KL(P ‖ M) + ½ KL(Q ‖ M),  M = ½ (P + Q)

where KL denotes the Kullback–Leibler divergence and M is the mixture of P and Q. JSD increases as the vibration behavior deviates from the reference condition.
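A direct implementation of this definition (a small sketch using hand-picked discrete distributions, separate from the full script later in this post) confirms the symmetry and the upper bound of ln 2 that holds when the natural logarithm is used:

```python
import numpy as np
from scipy.stats import entropy   # entropy(p, q) computes KL(p || q)

def jsd(p, q):
    """Jensen-Shannon divergence of two discrete distributions."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)
    return 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]
print("JSD(P,Q) =", jsd(p, q))
print("symmetric:", np.isclose(jsd(p, q), jsd(q, p)))
print("bounded by ln 2:", jsd([1, 0, 0], [0, 0, 1]) <= np.log(2))
```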
Hellinger Distance
The Hellinger distance is a statistical metric that measures the geometric separation between two probability distributions. It is bounded between 0 and 1, which makes it easy to interpret and compare across different systems. For discrete probability distributions P and Q, the Hellinger distance is defined as:

H(P, Q) = (1/√2) · √( Σᵢ (√pᵢ − √qᵢ)² )
The Hellinger distance is particularly robust to noise and small probability values, making it suitable for vibration-based condition monitoring.
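The bounded [0, 1] behavior can be verified with a few hand-picked discrete distributions (an illustrative sketch; note the inputs are normalized to sum to 1 before the distance is computed):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

identical = hellinger([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])   # fully overlapping -> 0
disjoint = hellinger([1, 0, 0], [0, 0, 1])                # no overlap -> 1
partial = hellinger([0.2, 0.3, 0.5], [0.5, 0.3, 0.2])
print(f"identical={identical}, disjoint={disjoint}, partial={partial:.3f}")
```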
Wasserstein Distance
The Wasserstein distance, also known as the Earth Mover’s Distance, measures the minimum “cost” required to transform one probability distribution into another. Unlike divergence-based metrics, it accounts for the actual distance between values on the amplitude axis. In one dimension, it can be expressed as:
W(P, Q) = ∫ |F_P(x) − F_Q(x)| dx

where F_P and F_Q are the cumulative distribution functions of P and Q. This metric is particularly sensitive to shifts in vibration amplitude caused by changes such as belt looseness.
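The contrast with divergence-based metrics can be demonstrated on synthetic data: once two distributions stop overlapping, JSD saturates at its ln 2 bound, while the Wasserstein distance keeps growing in proportion to the amplitude shift (illustrative sketch; the bin count is an arbitrary choice):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(4)
base = rng.normal(0.0, 0.1, 20_000)         # narrow "healthy" amplitude distribution

for shift in (1.0, 5.0):
    shifted = base + shift                  # same shape, moved along the amplitude axis
    w = wasserstein_distance(base, shifted)
    # Discretize both samples on a shared grid to compute JSD
    bins = np.histogram_bin_edges(np.concatenate([base, shifted]), bins=200)
    p, _ = np.histogram(base, bins=bins)
    q, _ = np.histogram(shifted, bins=bins)
    jsd = jensenshannon(p, q) ** 2          # scipy returns the square root of JSD
    print(f"shift={shift}: Wasserstein={w:.3f}, JSD={jsd:.3f}")
```

Both shifts leave JSD pinned near ln 2 ≈ 0.693, while the Wasserstein distance tracks the shift itself.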
The table below compares the mentioned metrics:
| Metric | Interpretability | Sensitivity to Change | Robustness | Notes for Condition Monitoring |
| --- | --- | --- | --- | --- |
| Jensen–Shannon Divergence (JSD) | Medium | High | High | Detects subtle distribution shape changes; bounded and symmetric; well suited for tracking gradual drift |
| Hellinger Distance | High | Medium–High | Very High | Bounded ([0, 1]); less sensitive to noise and outliers; stable across different machines |
| Wasserstein Distance | High | Very High | Medium | Sensitive to amplitude shifts; physically intuitive; may be influenced by operating load changes |
Practical Interpretation
- JSD is effective when small but meaningful changes occur in the vibration distribution shape.
- Hellinger provides the most stable behavior across sensors, machines, and operating conditions.
- Wasserstein is the most physically intuitive metric but can be affected by global amplitude scaling.
Condition Monitoring
In this section, the probability distribution distance metrics introduced in the previous section are computed using the collected vibration data for three operating conditions created on the test rig: healthy, relatively loose, and too loose. The resulting values are then analyzed to evaluate how they can be used for condition monitoring. The script below computes distribution-based distance metrics for each operating condition, presents the results in tabular form, and visualizes the corresponding kernel density estimation (KDE) curves for the X, Y, and Z acceleration components, as well as the resultant acceleration.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy.stats import gaussian_kde, entropy, wasserstein_distance
import endaq
# =====================================================
# Dataset dictionary
# =====================================================
datasets = {
"100": [
r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000603_Healthy.ide',
# Add more Healthy files if needed
],
"75": [
r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000606_75Loose.ide'
],
"50": [
r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000604_50Loose.ide'
],
# "Detension": [
# r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift\IDE\DAQ16837_000605_detension.ide'
# ],
}
# =====================================================
# Parameters
# =====================================================
columns = ["X (40g)", "Y (40g)", "Z (40g)"]
start_time = 5
end_time = 55
resultant = True
trim_start = 0.05
trim_end = 0.05
# =====================================================
# Helper: Jensen–Shannon Divergence
# =====================================================
def jensen_shannon_divergence(p, q):
p = np.array(p) / np.sum(p)
q = np.array(q) / np.sum(q)
m = 0.5 * (p + q)
return 0.5 * (entropy(p, m) + entropy(q, m))
# =====================================================
# Step 1: Load all data and find common overlap
# =====================================================
accels_raw = {}
time_ranges = {}
for label, files in datasets.items():
for file in files:
doc = endaq.ide.get_doc(file)
accel = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
accel = accel - accel.median()
accels_raw[label] = accel
time_ranges[label] = (accel.index.min(), accel.index.max())
print(f"[{label}] Time range: {accel.index.min():.2f}s to {accel.index.max():.2f}s "
f"({accel.index.max() - accel.index.min():.2f}s)")
# Find the maximum common overlap
max_start = max(r[0] for r in time_ranges.values())
min_end = min(r[1] for r in time_ranges.values())
# Enforce the requested start/end times if possible
common_start = max(max_start, start_time)
common_end = min(min_end, end_time)
print(f"\n✅ Using common time window: {common_start:.2f}s to {common_end:.2f}s")
# Define a shared time grid
N = 10000
common_time = np.linspace(common_start, common_end, N)
# =====================================================
# Step 2: Compute KDEs on the same time grid
# =====================================================
kdes = {}
x_ranges = {}
all_axes = columns.copy()
if resultant:
all_axes.append("Resultant")
for axis in all_axes:
kdes[axis] = {}
all_x = []
for label, accel in accels_raw.items():
# Interpolate to the common grid
accel_interp = accel.reindex(accel.index.union(common_time)).interpolate('index').loc[common_time]
# Compute resultant if needed
if resultant and axis == "Resultant":
if all(col in accel_interp.columns for col in columns):
accel_interp["Resultant"] = np.sqrt(
accel_interp[columns[0]]**2 +
accel_interp[columns[1]]**2 +
accel_interp[columns[2]]**2
)
else:
print(f"[{label}] Missing axes for resultant.")
continue
if axis not in accel_interp.columns:
continue
# Extract and trim
data = accel_interp[axis].dropna().values
n = len(data)
if n == 0:
continue
start_idx = int(trim_start * n)
end_idx = int(n * (1 - trim_end))
data = data[start_idx:end_idx]
if len(data) < 10:
print(f"[{label}] Too little data after trimming for {axis}.")
continue
kde = gaussian_kde(data)
kdes[axis][label] = kde
all_x.extend(data)
if len(all_x) > 0:
x_ranges[axis] = np.linspace(min(all_x), max(all_x), 1000)
else:
x_ranges[axis] = np.linspace(-1, 1, 1000)
# =====================================================
# Step 3: Compute JSD, max amplitude, JSD/MaxAmp, normalized JSD,
# and other distances (KLD, Hellinger, Wasserstein)
# =====================================================
all_results = {}
results_rows = []
for axis in all_axes:
jsd_results = {}
kld_results = {}
hellinger_results = {}
wasserstein_results = {}
max_amp_dict = {}
# Reference KDE: the highest-tension label (healthy) serves as the baseline
ref_label = max(kdes[axis].keys(), key=int)
ref_kde = kdes[axis][ref_label]
x = x_ranges[axis]
ref_pdf = ref_kde(x)
# Compute distances and max amplitude
for label, kde in kdes[axis].items():
pdf = kde(x)
# --- JSD ---
jsd_val = jensen_shannon_divergence(ref_pdf, pdf)
jsd_results[label] = jsd_val
# Guard against zero densities before computing distances
p = np.where(ref_pdf == 0, 1e-12, ref_pdf)
q = np.where(pdf == 0, 1e-12, pdf)
# --- Hellinger (normalize the discretized PDFs to sum to 1 so the distance stays in [0, 1]) ---
p_norm = p / np.sum(p)
q_norm = q / np.sum(q)
hellinger_results[label] = np.sqrt(np.sum((np.sqrt(p_norm) - np.sqrt(q_norm))**2)) / np.sqrt(2)
# --- Wasserstein ---
wasserstein_results[label] = wasserstein_distance(x, x, u_weights=p, v_weights=q)
# --- Max amplitude ---
if axis == "Resultant":
if all(col in accels_raw[label].columns for col in columns):
accel_data = np.sqrt(
accels_raw[label][columns[0]]**2 +
accels_raw[label][columns[1]]**2 +
accels_raw[label][columns[2]]**2
)
else:
accel_data = pd.Series(dtype=float)
else:
accel_data = accels_raw[label].get(axis, pd.Series(dtype=float))
n = len(accel_data)
if n > 0:
start_idx = int(trim_start * n)
end_idx = int(n * (1 - trim_end))
data_trimmed = accel_data.iloc[start_idx:end_idx].values
max_amp = np.max(np.abs(data_trimmed)) if len(data_trimmed) > 0 else 0.0
else:
max_amp = 0.0
max_amp_dict[label] = max_amp
# Sort items by JSD ascending
sorted_items = sorted(jsd_results.items(), key=lambda x: x[1])
labels_sorted = [item[0] for item in sorted_items]
jsd_sorted = [item[1] for item in sorted_items]
# Lowest and highest JSD for normalized JSD
lowest_jsd = jsd_sorted[0]
highest_jsd = jsd_sorted[-1]
# Compute ΔJSD vs lowest and normalized JSD
delta_jsd_from_lowest = [val - lowest_jsd for val in jsd_sorted]
normalized_jsd = [(val - lowest_jsd) / (highest_jsd - lowest_jsd) if highest_jsd != lowest_jsd else 0.0 for val in jsd_sorted]
# --- Print results ---
print(f"\n=== Axis: {axis} ===")
print(f"Reference for PDF: {ref_label}")
print(f"{'Label':>6} | {'%ΔTension':>12} | {'JSD':>10} | {'Hellinger':>10} | {'WassDist':>10}") # {'KLD':>10} |
print("-" * 70)
for i, label in enumerate(labels_sorted):
tension = float(label)
pct_tension = (tension - float(ref_label)) / float(ref_label) * 100
max_amp = max_amp_dict[label]
jsd_over_amp = jsd_sorted[i] / max_amp if max_amp != 0 else 0.0
print(f"{label:>6} | {pct_tension:12.2f}% | {jsd_sorted[i]:10.6f} | "
f"{hellinger_results[label]:10.6f} | {wasserstein_results[label]:10.6f}")
results_rows.append({
"Axis": axis,
"Label": label,
"Tension_%": tension,
"JSD": jsd_sorted[i],
"Hellinger": hellinger_results[label],
"Wasserstein": wasserstein_results[label],
})
# =====================================================
# Step 4: Plot KDE Distributions
# =====================================================
# Assign one consistent color per dataset
import plotly.express as px
labels = sorted(kdes[all_axes[0]].keys(), key=int)
base_colors = px.colors.qualitative.Plotly
color_map = {label: base_colors[i % len(base_colors)] for i, label in enumerate(labels)}
kde_fig = make_subplots(
rows=1,
cols=len(all_axes),
subplot_titles=all_axes,
shared_yaxes=True
)
for i, axis in enumerate(all_axes):
x = x_ranges[axis]
for label, kde in sorted(kdes[axis].items(), key=lambda x: int(x[0])):
y = kde(x)
kde_fig.add_trace(
go.Scatter(
x=x, y=y,
mode='lines',
name=f"{label}",
line=dict(color=color_map[label]),
legendgroup=label,
showlegend=(i == 0) # legend only in first subplot
),
row=1, col=i + 1
)
kde_fig.update_layout(
height=500,
width=250 * len(all_axes),
title_text=(
    f"KDE of Vibration Signals (Common {common_start:.1f}–{common_end:.1f}s) "
    f"(Trim {trim_start*100:.0f}% start, {trim_end*100:.0f}% end)"
)
)
kde_fig.update_xaxes(title_text="Acceleration (g)")
kde_fig.update_yaxes(title_text="Density")
kde_fig.show()
# =====================================================
# Step 5: Save figure
# =====================================================
kde_output_path = (
r"C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift"
r"\kde_vibration_common_window.html"
)
kde_fig.write_html(kde_output_path)
print(f"✅ KDE plot saved to: {kde_output_path}")
# =====================================================
# Step 6: Save results as one csv table
# =====================================================
results_df = pd.DataFrame(results_rows)
csv_output_path = (
r"C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\3.DataDrift"
r"\kde_distance_metrics.csv"
)
results_df.to_csv(csv_output_path, index=False)
print(f"✅ Metrics table saved to: {csv_output_path}")

Figure 6. KDEs of the vibration signals for the X, Y, and Z directions as well as the resultant.
Based on the KDE curves shown in Figure 6, as the belt gets looser the KDE distributions become narrower, whereas a tighter belt corresponds to wider distributions. This effect is most pronounced in the vertical vibration axis (Z). Another key observation from the KDEs is that, as the belt gets looser, the resultant acceleration distribution not only becomes narrower but also shifts to the left, indicating a decrease in acceleration amplitude when the belt is loose.
Table I. JSD, Hellinger, and Wasserstein metrics for X, Y, Z and resultant accelerations.
| Axis | Label | Tension_% | JSD | Hellinger | Wasserstein |
| --- | --- | --- | --- | --- | --- |
| X (40g) | 100 | 100 | 0 | 0 | 0 |
| X (40g) | 75 | 75 | 0.010055 | 7.648881 | 0.003601 |
| X (40g) | 50 | 50 | 0.035609 | 14.69657 | 0.006962 |
| Y (40g) | 100 | 100 | 0 | 0 | 0 |
| Y (40g) | 50 | 50 | 0.008285 | 5.856887 | 0.003937 |
| Y (40g) | 75 | 75 | 0.011525 | 6.960527 | 0.005344 |
| Z (40g) | 100 | 100 | 0 | 0 | 0 |
| Z (40g) | 75 | 75 | 0.052921 | 10.98605 | 0.023472 |
| Z (40g) | 50 | 50 | 0.111831 | 16.3282 | 0.034014 |
| Resultant | 100 | 100 | 0 | 0 | 0 |
| Resultant | 75 | 75 | 0.055266 | 14.30971 | 0.024039 |
| Resultant | 50 | 50 | 0.130089 | 23.21788 | 0.033132 |
Based on the JSD, Hellinger, and Wasserstein metrics for the different belt tensions provided in Table I:
X and Y directions:
In the X direction (perpendicular to the belt length), all metrics increase gradually as the belt becomes looser, indicating moderate sensitivity to looseness. This suggests that reduced belt stiffness introduces additional vibration in this direction, but the overall distribution remains relatively close to the healthy state. In contrast, the Y direction (along the belt length) exhibits the smallest divergence values across all metrics. Vibrations along the belt length are therefore only weakly affected by belt tightness, making this axis less informative for detecting belt looseness.
Z direction:
The Z direction (normal to the ground) shows a pronounced increase in all metrics with looseness, indicating strong sensitivity to belt dynamics and significant changes in vibration amplitude distribution.
Resultant acceleration:
The resultant signal exhibits the largest divergence values and a clear monotonic trend, demonstrating its effectiveness in aggregating multi-axis vibration changes into a robust fault indicator.
Summary and conclusion
Condition monitoring is essential for predictive maintenance, enabling early detection of degradation before unplanned downtime or failures occur. Vibration signals are particularly informative, as changes in stiffness, alignment, or load often appear as subtle shifts long before visible damage.
Using probability distribution distance metrics—such as Jensen–Shannon divergence, Hellinger distance, and Wasserstein distance—offers a powerful, distribution-based view of vibration behavior, capturing gradual degradation like belt looseness. Key takeaways:
- KDE enables comparison of full vibration distributions rather than single features.
- JSD and Hellinger are ideal for stable trend monitoring and early fault detection.
- Wasserstein distance provides intuitive sensitivity to amplitude shifts.
- Z-axis (normal to the ground) and resultant signals are the most reliable indicators for belt condition monitoring.
It should be noted that KDE can be sensitive to factors such as sensor orientation and location. A robust data preprocessing pipeline is therefore critical for accurate and reliable condition monitoring.
Resources, Scripts, and Files


