Detecting Anomalies in Belt-Driven Systems with Machine Learning

How Vibration Data Can be Leveraged for Anomaly Detection and Fault Diagnosis

In my previous blog post, Preprocessing Vibration Data for Machine Learning, we walked through the preprocessing steps required to prepare vibration data for use in a Machine Learning (ML) model. We also covered how to analyze the data by visualizing it, examining key statistical parameters, and understanding the distribution of each variable in the dataset. Finally, we discussed how the Fast Fourier Transform (FFT) can help extract meaningful insights from vibration time series data, including how to identify peaks in the FFT and what those peaks can tell us.

In this post, we’ll explore how vibration data can be leveraged for anomaly detection and fault diagnosis to identify potential issues in vibrating machinery. This approach is crucial for;

Minimizing maintenance time and costs
Preventing progressive damage, and
Reducing unplanned downtime.

Though any high-performance vibration data logger will suffice, in this instance I will be using our in-house enDAQ data logger to acquire vibration data from a small test rig designed to simulate various fault conditions on a rotating belt system with two pulleys.

We will cover the following essential topics in this guide:

Introduction to Anomaly detection and ML-based methods
The test designed for anomaly detection
Data analysis
Anomaly detection
Fault detection
Statistical methods
Summary
Resources, Scripts & Files

Introduction to Anomaly detection and ML-based methods

What is an Anomaly?

In the context of machine condition monitoring, an anomaly refers to any data pattern or behavior that deviates significantly from the norm. These deviations may not always indicate a failure, but they often serve as early warning signs of potential issues. In vibration analysis, anomalies can manifest as unexpected spikes, irregular frequency patterns, or abnormal amplitude changes in the signal.

What are Faults in Vibration Machinery?

Faults are specific types of anomalies that indicate a mechanical or structural problem in a vibrating machine. Common faults include:

Imbalance – Uneven mass distribution in rotating components.
Misalignment – Shaft or component alignment errors.
Bearing defects – Wear or damage in ball or roller bearings.
Looseness – Mechanical parts not firmly fastened.
Gear mesh issues – Irregularities in gear engagement.

Detecting these faults early is critical for preventing damage escalation and avoiding costly downtime.

Common Machine Learning Methods for Anomaly and Fault Detection

ML Method	Description	Use Cases in Vibration Analysis
K-Means Clustering	An unsupervised method that groups similar data points based on distance metrics.	Identifying clusters of normal vs. abnormal behavior.
Support Vector Machines (SVM)	Supervised method that finds the optimal boundary between classes.	Classifying normal and faulty machine states.
Random Forest	Ensemble learning method using multiple decision trees for classification.	Fault classification using labeled vibration data.
Autoencoders	Neural networks that reconstruct input data; high reconstruction error indicates anomaly.	Unsupervised anomaly detection in time series.
Isolation Forest	Tree-based model that isolates anomalies by randomly partitioning data.	Efficient for detecting outliers in high-dimensional datasets.
Statistical Methods	Techniques such as Z-score, IQR, or control charts to detect deviations.	Baseline anomaly detection with low computational cost.
LSTM (Long Short-Term Memory)	Recurrent neural network designed for sequential data with memory retention.	Capturing temporal patterns and detecting sequence anomalies.

What is LSTM?

Long Short-Term Memory (LSTM) is a special type of Recurrent Neural Network (RNN) designed to model sequential data and capture long-range dependencies in time series. Unlike traditional RNNs, which struggle to remember information over long sequences due to the vanishing gradient problem, LSTMs are designed to retain relevant information for extended periods.

What Kind of Neural Networks Are They?

LSTMs are part of the Recurrent Neural Network (RNN) family—neural architectures where connections between nodes form a directed graph along a sequence. This structure allows LSTMs to maintain and update an internal "memory" state, making them ideal for tasks where the order and timing of data points are critical, such as speech recognition, natural language processing, and vibration time series analysis.

How Do LSTMs Work?

LSTMs process input sequences one timestep at a time, updating a cell state that carries information throughout the sequence. What makes LSTMs unique is their ability to decide what to remember, what to forget, and what to output at each step using specially designed components called gates.

LSTM Architecture

Each LSTM unit (or cell) consists of:

Cell state: A memory line that runs through the entire sequence, modified at each timestep.
Hidden state: The output at the current timestep, which also feeds into the next timestep.
Three gates:
- Forget gate: Decides what information to discard from the cell state.
- Input gate: Determines what new information to add to the cell state.
- Output gate: Controls what information from the cell state is output at this timestep.

These gates are controlled by sigmoid activations, which output values between 0 and 1 to act like on/off switches for the flow of information.

Component	Function
Input (xₜ)	The input data at time t.
Hidden state (hₜ₋₁)	Carries output information from the previous timestep.
Cell state (Cₜ)	Maintains long-term memory across timesteps.
Forget gate	Uses sigmoid(Wf · [hₜ₋₁, xₜ] + bf) to remove irrelevant info.
Input gate	Uses sigmoid(Wi · [hₜ₋₁, xₜ] + bi) to add relevant info.
Output gate	Uses sigmoid(Wo · [hₜ₋₁, xₜ] + bo) to decide output.
Weights (W)	Learnable parameters for gates and state transitions.
Biases (b)	Additional parameters to fine-tune gate outputs.

Why Use LSTMs for Vibration Analysis?

LSTMs are particularly effective for vibration-based anomaly detection because:

They model temporal dependencies, which are critical for capturing evolving patterns in machine behavior.
They can handle noisy and irregular data better than simpler models.
Once trained on normal operating conditions, they can detect subtle deviations that traditional methods might miss.

The test designed for anomaly detection

To demonstrate how machine learning—and specifically LSTM (Long Short-Term Memory) networks—can be applied to anomaly detection, a test rig has been developed. As shown in the figure below, the setup includes two pulleys: a drive pulley and a driven pulley. The drive pulley is adjustable, allowing the pulleys to be positioned closer together to simulate a loose belt condition. Additionally, the driven pulley can be shifted to mimic a misalignment fault.

ML test rig used for anomaly detection

Figure 1. Test rig used for anomaly detection and fault detection using LSTM based on vibration data.

To simulate a change in mass, two zip ties have been attached to the belt. As a result, four distinct operating conditions are represented in the dataset:

Healthy
Loose belt
Misaligned pulley
Mass change

We will explore two different scenarios for using LSTM in anomaly detection:

Scenario A: The LSTM model is trained to distinguish between healthy and faulty conditions. In this setup, the healthy data is labeled as one class, while the other three conditions are grouped and labeled as "faulty." This approach allows the model to simply detect whether an anomaly exists.
Scenario B: The model is trained using all four classes separately. This enables the LSTM to not only detect an anomaly but also identify its root cause—whether it is due to belt looseness, pulley misalignment, or added mass.

In summary, the first scenario focuses on binary anomaly detection (healthy vs. faulty), while the second allows for multi-class fault classification.

Data Analysis

Moving forward, we will provide scripts for performing essential data analysis tasks. These include data visualization, statistical analysis, histogram generation, and anomaly detection.

To run these scripts, users will need at least one healthy dataset (containing vibration data along the x, y, and z axes) as well as any faulty dataset they may have—this could be one or more datasets representing different fault conditions.

The scripts are designed to be user-friendly and require minimal modification. In most cases, the only input required from the user is the file path to their datasets.

Importing libraries

First, we need to import the necessary libraries required for data analysis and LSTM-based anomaly detection.

# Import libraries
import decimal
import idelib
import endaq

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import roc_curve, auc, confusion_matrix, accuracy_score
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from sklearn.model_selection import train_test_split
import seaborn as sns

Visualization

Next, we plot the data using the code below. The only modification required from the user is to provide the correct file paths to their datasets.

In this example, we are working with four datasets:

healthy: Normal operating condition
F_Loose: Simulated loose belt
F_Misalignment: One pulley misaligned
F_MassChange: Two zip ties added to simulate a change in mass

If, for instance, you have one healthy dataset and two faulty ones, you’ll need to specify three file paths accordingly.


import pandas as pd
import plotly.graph_objects as go
import endaq
from plotly.subplots import make_subplots
import plotly.express as px

# Updated: each label maps to a list of file paths
datasets = {
    "Healthy": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Healthy\DAQ16837_000146.ide',
        # Add more Healthy files if needed
    ],
    "Looseness": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Loosness\DAQ16837_000158.ide'
    ],
    "Missalignment": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Misalignment\DAQ16837_000155.ide'
    ],
    "Mass imbalance": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Mass_imbalance\aeaa0a24-DAQ16837_000149-1744735366.ide'
    ]
}

# Column names to plot
columns_plot = ["X (40g)", "Y (40g)", "Z (40g)"]

# Assign consistent colors to each class label
color_palette = px.colors.qualitative.Set1
label_list = list(datasets.keys())
label_colors = {label: color_palette[i % len(color_palette)] for i, label in enumerate(label_list)}

# Plotting function
def plot_axes_combined(datasets, columns_plot, label_colors):
    fig = make_subplots(
        rows=3,
        cols=1,
        shared_xaxes=True,
        vertical_spacing=0.05,
        # subplot_titles=columns_plot
    )

    # For each label and list of file paths
    for label, file_paths in datasets.items():
        for idx, path in enumerate(file_paths):
            try:
                # Load and preprocess
                doc = endaq.ide.get_doc(path)
                accel = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
                accel = accel - accel.median()  # Remove DC offset

                for i, axis in enumerate(columns_plot):
                    if axis in accel.columns:
                        fig.add_trace(
                            go.Scatter(
                                x=accel.index,
                                y=accel[axis],
                                mode='lines',
                                name=label if idx == 0 and i == 0 else None,  # show legend only once per label
                                legendgroup=label,
                                line=dict(color=label_colors[label]),
                                showlegend=(idx == 0 and i == 0)
                            ),
                            row=i + 1, col=1
                        )
                    else:
                        print(f"[{label}] Column '{axis}' not found in {path}")

            except Exception as e:
                print(f"Error loading file {path}: {e}")

    # Add axis labels
    for i, axis in enumerate(columns_plot):
        fig.update_yaxes(title_text=axis, row=i+1, col=1)

    fig.update_xaxes(title_text="Time (seconds)", row=3, col=1)
    # fig.update_layout(
    #     height=500,
    #     title_text="Acceleration Signals from All Datasets (X, Y, Z)",
    #     showlegend=True
    # )

    fig.update_layout(
    height=400,
    width=1100,
    title_text="Acceleration Signals from All Datasets (X, Y, Z)",
    showlegend=True,
    legend=dict(
        orientation="h",         # Horizontal legend
        yanchor="bottom",
        y=1.02,                  # Slightly above the top of the plot
        xanchor="center",
        x=0.5,                   # Centered horizontally
        traceorder="grouped",
        title_text='',           # Optional: remove legend title
        itemsizing='constant',
        valign='top'
        ),
        legend_tracegroupgap=0,     # No space between grouped traces
        margin=dict(t=100)          # Extra top margin to fit the legend
    )


    fig.show()

# Run the plot function
plot_axes_combined(datasets, columns_plot, label_colors)

To use the code below, make sure to change the file paths to point to your own data files. Replace the example paths with the actual locations of your CSV files. Each entry in the file_paths dictionary should correspond to one of your datasets. If you only have, for example, a healthy file and two faulty ones, just remove the unused line(s) from the dictionary.


# Example file paths – update these with your actual file locations
datasets = {
    "Healthy": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Healthy\DAQ16837_000146.ide',
        # Add more Healthy files if needed
    ],
    "Looseness": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Loosness\DAQ16837_000158.ide'
    ],
    "Missalignment": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Misalignment\DAQ16837_000155.ide'
    ],
    "Mass imbalance": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Mass_imbalance\aeaa0a24-DAQ16837_000149-1744735366.ide'
    ]
}

You can also choose which vibration axes to plot by specifying the desired columns in the line below:

# Specify the column names you want to plot
columns_plot = ["X (40g)", "Y (40g)", "Z (40g)"]

The vibration signals along the X, Y, and Z axes will be visualized, as shown in the example plot below:

Figure 2. Vibration data along X, Y, and Z axis.

To gain deeper insights into the vibration patterns, the plots are zoomed in for closer examination (You can zoom in on Plotly-generated plots by drawing a selection box over the area of interest).

Figure 3. Vibration data along X, Y, and Z axis (zoomed-in).

Here's what we can observe from the detailed view:

X-axis Vibration: The vibrations in the Healthy and F_MassChange datasets fluctuate more than in the other two conditions. In contrast, the F_Loose and F_Misalignment datasets show less fluctuation. This observation aligns with expectations—when the belt is loose, vibrations tend to decrease. Based on this behavior, it's possible to begin distinguishing Healthy and Mass Change conditions from Loose and Misaligned states.
Y-axis Vibration: No clear or consistent patterns emerge from the Y-axis data, making it difficult to extract meaningful insights for classification or anomaly detection in this direction.
Z-axis Vibration: The F_MassChange dataset exhibits frequent spikes that rise noticeably above the other datasets. These jumps occur repeatedly throughout the data collection period, suggesting that mass changes have a significant and detectable impact along the Z-axis.

Statistics

Next, we perform a statistical analysis of the vibration datasets. The code below takes the file paths of the datasets and outputs a summary statistics table for each one:

import pandas as pd
import endaq

# Specify the start and end times for the analysis (in seconds)
start_time = 10
end_time = 50

# Updated: dictionary maps labels to lists of file paths
datasets = {
    "Healthy": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Healthy\DAQ16837_000146.ide',
        # Add more Healthy files if needed
    ],
    "Looseness": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Loosness\DAQ16837_000158.ide'
    ],
    "Missalignment": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Misalignment\DAQ16837_000155.ide'
    ],
    "Mass imbalance": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Mass_imbalance\aeaa0a24-DAQ16837_000149-1744735366.ide'
    ]
}

# Columns to analyze
columns = ["X (40g)", "Y (40g)", "Z (40g)"]

def get_statistics_table(datasets, columns, start_time=None, end_time=None):
    stats_data = []

    for label, file_list in datasets.items():
        label_stats = []

        for file_path in file_list:
            try:
                doc = endaq.ide.get_doc(file_path)
                accel = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
                accel = accel - accel.median()

                if start_time is not None:
                    accel = accel[accel.index >= start_time]
                if end_time is not None:
                    accel = accel[accel.index <= end_time]

                stats = {}
                for col in columns:
                    if col in accel.columns:
                        stats[f"{col} Min"] = accel[col].min()
                        stats[f"{col} Mean"] = accel[col].mean()
                        stats[f"{col} Max"] = accel[col].max()
                        stats[f"{col} Std"] = accel[col].std()
                        stats[f"{col} Median"] = accel[col].median()
                    else:
                        print(f"[{label}] Column '{col}' not found in {file_path}.")
                label_stats.append(stats)

            except Exception as e:
                print(f"Failed to process {file_path}: {e}")

        if label_stats:
            label_df = pd.DataFrame(label_stats)
            aggregated_stats = label_df.mean().to_dict()
            aggregated_stats = {"Dataset": label, **aggregated_stats}  # Ensure Dataset is first
            stats_data.append(aggregated_stats)

    df = pd.DataFrame(stats_data)
    
    # Reorder columns to put 'Dataset' first (just in case)
    cols = ['Dataset'] + [col for col in df.columns if col != 'Dataset']
    return df[cols]

# Generate the statistics table
statistics_table = get_statistics_table(datasets, columns, start_time, end_time)

# Display the result
display(statistics_table)

This table provides key statistical metrics—such as minimum, maximum, mean, and standard deviation—for each vibration axis.

From this basic analysis, several insights emerge:

X-axis (40g): The Mass Change condition shows a significantly different minimum and maximum value compared to the other datasets. This indicates more extreme vibration fluctuations along this axis in response to added mass.
Y-axis (40g): Similar behavior is observed here. The Mass Change dataset again stands out, with noticeably different minimum and maximum values compared to the others. This suggests that even though visual plots may not reveal clear differences on the Y-axis, statistical variations still exist.
Z-axis (40g): The Loose Belt condition is clearly distinguishable based on its minimum and maximum values. These statistics reflect the dampened vibration levels typically associated with a slack belt.

These insights demonstrate that even basic statistical analysis can provide meaningful differentiation between operating conditions, serving as a valuable first step in any anomaly detection workflow.

Histograms

Next, we visualize the distribution of vibration data using histograms. We'll generate three plots—one for each axis (X, Y, and Z)—to better understand how the data varies across different conditions. The code below creates interactive histograms using Plotly:

import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
import endaq

# Dataset dictionary: label -> list of .ide file paths
datasets = {
    "Healthy": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Healthy\DAQ16837_000146.ide',
    ],
    "Looseness": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Loosness\DAQ16837_000158.ide'
    ],
    "Missalignment": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Misalignment\DAQ16837_000155.ide'
    ],
    "Mass imbalance": [
        r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Mass_imbalance\aeaa0a24-DAQ16837_000149-1744735366.ide'
    ]

    ### Big test rig
    # "Healthy": [
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000020_Healthy_0.ide',
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000021_Healthy_50.ide'
    # ],
    
    # "Loose": [
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000018_Loose_0.ide',
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000019_Loose_50.ide'
    # ],
    
    # "Grease": [
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000022_Grease_0.ide', 
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000023_Grease_50.ide'
    # ],
    
    # "Cut": [
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000026_Cut_0.ide',
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000027_Cut_50.ide'
    # ],
}

# Columns (axes) to plot
columns = ["X (40g)", "Y (40g)", "Z (40g)"]

# Time range in seconds
start_time = 10
end_time = 50

# Fixed color palette: one unique color per class
color_palette = px.colors.qualitative.Set1  # or Set2, Bold, etc.
label_list = list(datasets.keys())
label_colors = {
    "Healthy": "#d62728",       # Blue
    "Looseness": "#1f77b4",     # Orange
    "Missalignment": "#2ca02c", # Green
    "Mass imbalance": "#9467bd",# Red
    # "Loose": "#9467bd",         # Purple
    # "Grease": "#8c564b",        # Brown
    # "Cut": "#e377c2"            # Pink
}
# label_colors = {label: color_palette[i % len(color_palette)] for i, label in enumerate(label_list)}

# Create 3 subplots: one for X, Y, Z
fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=columns,
    shared_yaxes=True
)

# Loop over each class label
for label, file_list in datasets.items():
    color = label_colors[label]
    legend_shown = False  

    for file_path in file_list:
        try:
            doc = endaq.ide.get_doc(file_path)
            accel = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
            accel = accel - accel.median()

            # Filter by time
            if start_time is not None:
                accel = accel[accel.index >= start_time]
            if end_time is not None:
                accel = accel[accel.index <= end_time]

            # Add histograms for each axis
            for j, col in enumerate(columns):
                if col in accel.columns:
                    fig.add_trace(
                        go.Histogram(
                            x=accel[col],
                            nbinsx=50,
                            name=label if not legend_shown else None,
                            legendgroup=label,
                            opacity=0.5,
                            marker=dict(color=color),
                            showlegend=not legend_shown  # Only show once
                        ),
                        row=1, col=j + 1
                    )
                    legend_shown = True  # <--- Flip the flag after first trace
                else:
                    print(f"[{label}] Column '{col}' not found in {file_path}")

        except Exception as e:
            print(f"[{label}] Failed to process {file_path}: {e}")

# Final layout setup
fig.update_layout(
    height=500,
    width=1100,
    title_text="Overlayed Histograms of Acceleration (X, Y, Z) per Class",
    barmode='overlay'
)

fig.update_xaxes(title_text="Acceleration")
fig.update_yaxes(title_text="Frequency")

fig.show()

Tip: These histograms are interactive—clicking on a legend item will toggle its visibility. This makes it easy to compare specific conditions side by side.

Figure 4. Histogram of vibration data along each axis for heathy, loose, misaligned, and mass changed conditions.

Key Observations from the Histograms are:

Healthy vs. Loose: In both the X (40g) and Z (40g) axes, the Loose condition exhibits a noticeably taller and narrower histogram compared to Healthy. This indicates a tighter range of lower-amplitude vibrations, which aligns with expectations for a loose belt condition:

Figure 5. Comparing histograms of healthy and loose conditions.

Healthy vs. Misalignment: In the X-axis, the Misalignment condition has a taller histogram compared to Healthy, indicating a higher concentration of vibration values within a narrower range. This suggests more consistent vibrations in the X direction under misalignment. The Y and Z axes show similar distributions between Healthy and Misalignment, with no significant visual difference.

Figure 6. Comparing histograms of healthy and misaligned conditions.

Healthy vs. Mass Change: The X-axis histogram for Mass Change is both taller and wider than that of Healthy, reflecting more frequent and extreme variations. However, the Y and Z axes do not show significant changes in distribution.

Figure 7. Comparing histograms of healthy and mass changed conditions.

These histogram comparisons reinforce the insights gathered from earlier visualizations and statistical summaries, and help further differentiate the types of faults based on their vibration signatures.

Anomaly Detection

So far, we’ve explored the vibration data in its raw form—visualizing it, analyzing statistical features, and comparing distributions across different operating conditions. These initial steps helped us extract meaningful insights and identify basic differences between healthy and faulty states.

In this section, we’ll go a step further and apply an LSTM (Long Short-Term Memory) model—a powerful and commonly used machine learning technique for time-series data—to detect anomalies in machinery vibration.

The goal is to train the LSTM model to distinguish between healthy and unhealthy operating conditions based on vibration patterns.

Important:

The healthy dataset(s) should be kept separate.
All faulty conditions (e.g., loose belt, misalignment, mass change) should be combined into a single dataset labeled as faulty (which will be done with the provided code).
This binary classification approach will allow the model to learn the typical vibration behavior of a healthy machine and identify deviations from it.

Below is the code to build and train an LSTM model for anomaly detection:

import pandas as pd
import numpy as np
import endaq
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc, confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Input
from keras.utils import to_categorical

# -------- Load and Label Data -------- #
healthy_datasets = [
                    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Healthy\DAQ16837_000146.ide',
                   ] #["healthy_1.csv", "healthy_2.csv"]


#===================================== Small test rig:===============
faulty_datasets = {
    # 1: [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\F_Loose\DAQ16837_000158.ide'],
    # 2: [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\F_missAlignment\DAQ16837_000155.ide'],
    # 3: [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Faulty\aeaa0a24-DAQ16837_000149-1744735366.ide']
}


def load_and_label_data_from_endaq(file_path, channel_index, label):
    doc = endaq.ide.get_doc(file_path)
    accel = endaq.ide.to_pandas(doc.channels[channel_index], time_mode='seconds')
    accel = accel - accel.median()  # Remove DC offset
    accel['label'] = label
    return accel

def load_multiple_datasets_from_endaq(healthy_datasets, faulty_datasets):
    combined_data = pd.DataFrame()
    for file_path in healthy_datasets:
        healthy_data = load_and_label_data_from_endaq(file_path, channel_index=80, label=0)
        combined_data = pd.concat([combined_data, healthy_data], ignore_index=True)
    for fault_type, fault_files in faulty_datasets.items():
        for file_path in fault_files:
            faulty_data = load_and_label_data_from_endaq(file_path, channel_index=80, label=fault_type)
            combined_data = pd.concat([combined_data, faulty_data], ignore_index=True)
    return combined_data

# -------- Data Preparation -------- #
data = load_multiple_datasets_from_endaq(healthy_datasets, faulty_datasets)

# Select features and labels
X = data.drop(columns=['label']).values
y = data['label'].values

# Convert multi-fault labels to binary: 0=healthy, 1=faulty
y_binary = np.where(y != 0, 1, 0)

# Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# -------- Sequence Creation -------- #
def create_sequences(X, y, time_steps=20):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        Xs.append(X[i:(i + time_steps)])
        ys.append(y[i + time_steps])  # Label is from the last step in sequence
    return np.array(Xs), np.array(ys)

time_steps = 20
X_seq, y_seq = create_sequences(X_scaled, y_binary, time_steps)
y_encoded = to_categorical(y_seq)

# -------- Train-Test Split -------- #
X_train, X_test, y_train, y_test = train_test_split(X_seq, y_encoded, test_size=0.2, random_state=42)

# -------- LSTM Model -------- #
def build_lstm_model(input_shape, lstm_units=50):
    model = Sequential()
    model.add(Input(shape=input_shape))
    model.add(LSTM(lstm_units, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(lstm_units))
    model.add(Dropout(0.2))
    model.add(Dense(y_encoded.shape[1], activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

model = build_lstm_model(X_train.shape[1:], lstm_units=50)
history = model.fit(X_train, y_train, epochs=5, batch_size=32, validation_data=(X_test, y_test))

# -------- Evaluation -------- #
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_test_classes = np.argmax(y_test, axis=1)

accuracy = accuracy_score(y_test_classes, y_pred_classes)
print(f'Accuracy: {accuracy:.4f}')

# -------- ROC Curve -------- #
fpr, tpr, _ = roc_curve(y_test_classes, y_pred[:, 1])
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()

# -------- Confusion Matrix with Percentages -------- #
cm = confusion_matrix(y_test_classes, y_pred_classes)
cm_percent = cm.astype('float') / cm.sum(axis=1, keepdims=True) * 100

plt.figure(figsize=(6, 5))
plt.imshow(cm_percent, interpolation='nearest', cmap=plt.cm.Blues)
plt.title("Confusion Matrix (%)")
plt.colorbar()

class_names = ["Healthy", "Faulty"]
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)

thresh = cm_percent.max() / 2.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        plt.text(j, i, f'{cm[i, j]}\n({cm_percent[i, j]:.1f}%)',
                 ha="center", va="center",
                 color="white" if cm_percent[i, j] > thresh else "black", fontsize=12)

plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.tight_layout()
plt.show()

To better understand how this works, let’s break the code into sections and explore what each one does.

import pandas as pd
import numpy as np
import endaq

pandas and numpy: Essential for data manipulation and numerical processing.

endaq: Used to read .ide files from the Mide/Endaq sensors, converting them to usable pandas DataFrames.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc, confusion_matrix, accuracy_score

Scikit-learn utilities for scaling data, splitting datasets, and evaluating model performance using metrics like ROC AUC and confusion matrix.

import matplotlib.pyplot as plt

For visualizing model performance (ROC curve, confusion matrix).

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Input
from keras.utils import to_categorical

Keras deep learning library (TensorFlow backend).

Sequential: Easy model-building API.

LSTM: Core layer for learning time-dependent patterns.

Dense, Dropout, Input: Standard layers for model architecture.

to_categorical: Converts integer labels into one-hot encoded format (required for softmax output in multi-class classification).

healthy_datasets = [...]
faulty_datasets = {...}

Paths to healthy and faulty datasets.
Faulty datasets are organized by fault type (e.g., loose, grease, cut), with integer labels.

def load_and_label_data_from_endaq(file_path, channel_index, label):
    doc = endaq.ide.get_doc(file_path)
    accel = endaq.ide.to_pandas(doc.channels[channel_index], time_mode='seconds')
    accel = accel - accel.median()  # Remove DC offset
    accel['label'] = label
    return accel

Loads vibration data from an Endaq .ide file.

Extracts the signal from a specific sensor channel (e.g., channel 80 = Z-axis).

Removes DC offset to center the data.

Adds a label column to classify the dataset (0 for healthy, 1–6 for different faults).

def load_multiple_datasets_from_endaq(healthy_datasets, faulty_datasets):
    ...

Loads and combines all healthy and faulty datasets into a single DataFrame.

Each record is labeled accordingly.

data = load_multiple_datasets_from_endaq(...)
X = data.drop(columns=['label']).values
y = data['label'].values
y_binary = np.where(y != 0, 1, 0)

Separates features (X) from labels (y).

Converts the multi-fault classification into binary classification:

0 = healthy
1 = any fault (loosely grouping all faults as "unhealthy")

def create_sequences(X, y, time_steps=20):
    ...

Splits the continuous vibration signal into overlapping time sequences for LSTM input.
Each sequence is time_steps long (e.g., 20 samples), and the label corresponds to the last point in the sequence.

X_seq, y_seq = create_sequences(X_scaled, y_binary, time_steps)
y_encoded = to_categorical(y_seq)

X_seq: 3D array for LSTM: [samples, time_steps, features].

y_encoded: One-hot encoded labels for categorical output.

X_seq: 3D array for LSTM: [samples, time_steps, features].

y_encoded: One-hot encoded labels for categorical output.

X_train, X_test, y_train, y_test = train_test_split(...)

Splits data into training and testing sets (80/20).

def build_lstm_model(input_shape, lstm_units=50):
    ...

time_steps = 20:

Defines the number of time steps in each sequence for the LSTM, meaning the model will learn from sequences of 20 time steps.

Input Layer (Input(shape=input_shape)):

Accepts input data of shape (time_steps, features).

LSTM Layers:

LSTM(lstm_units, return_sequences=True): First LSTM layer with 50 units, returning the full sequence for further LSTM processing.
LSTM(lstm_units): Second LSTM layer that processes the sequence into a single output (last time step).

Dropout Layers:

Dropout(0.2): Prevents overfitting by randomly dropping 20% of neurons during training.

Dense Layer:

Dense(y_encoded.shape[1], activation='softmax'): Final layer with softmax activation, outputting class probabilities.

Model Compilation:

Optimizer: adam, adjusts the learning rate for efficient training.
Loss Function: categorical_crossentropy, for multi-class classification.
Metrics: Accuracy is used to track model performance.

model = build_lstm_model(...)
history = model.fit(...)

Compiles and trains the model using categorical_crossentropy (since labels are one-hot encoded).

Trains for 5 epochs.

y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_test_classes = np.argmax(y_test, axis=1)
accuracy = accuracy_score(...)

Predicts class probabilities.

Converts predictions and labels to integer class labels.

Calculates accuracy.

fpr, tpr, _ = roc_curve(...)
roc_auc = auc(fpr, tpr)
...

Plots the ROC curve to visualize trade-offs between true positive and false positive rates.

Computes AUC (Area Under the Curve), a metric of overall classification performance.

cm = confusion_matrix(...)
...
plt.imshow(...)
...

Shows actual vs. predicted labels in a matrix format.
Converts raw counts to percentages for easier interpretation.
Annotates each cell with count and percentage.

After running the code, you will obtain three key outputs that evaluate the model's performance:

Accuracy Score – a single number indicating the overall classification accuracy.
ROC Curve – a visual representation of the model's ability to distinguish between healthy and faulty conditions.
Confusion Matrix – a detailed breakdown of true vs. predicted labels, showing how well the model performs on each class.

These results are shown below:

fig-8

Figure 8. ROC curve for anomaly detection.

The ROC (Receiver Operating Characteristic) curve is a graphical representation that shows the performance of a binary classification model. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) across different classification thresholds.

True Positive Rate (TPR), also known as sensitivity or recall, is the proportion of actual positives correctly identified by the model.
False Positive Rate (FPR), on the other hand, is the proportion of actual negatives that are incorrectly classified as positives.

The area under the curve (AUC) measures the overall ability of the model to discriminate between the two classes. A higher AUC (closer to 1) indicates a better model.

In our case, the ROC curve is positioned towards the top-left, which indicates a high accuracy. A curve that is close to the top-left corner means that the model has a high True Positive Rate (TPR) while maintaining a low False Positive Rate (FPR). This suggests that our model is excellent at identifying faulty conditions while minimizing misclassifications, resulting in high reliability.

The AUC value can also help quantify this—if it’s near 1.0, the model's performance is outstanding.

fig-9

Figure 9. Confusion matrix for anomaly detection.

The model achieved an impressive accuracy of 0.99, indicating a highly reliable performance in distinguishing between healthy and faulty conditions.

From the confusion matrix, we observe the following:

99.5% of healthy samples were correctly identified as healthy.
Only 0.5% of healthy samples (189 instances) were mistakenly classified as faulty.
On the other hand, almost all faulty samples were correctly identified, with only 232 instances incorrectly labeled as healthy.

These results confirm that the model is highly effective at detecting anomalies, with a very low rate of false positives and false negatives.

Here are a few practical tips to improve model performance if your LSTM model's accuracy is not as high as expected:

Increase lstm_units (e.g., from 50 to 100 or more) to give the model more learning capacity.

model = build_lstm_model(X_train.shape[1:], lstm_units=100)

Try more epochs (e.g., 20–50) to allow the model more time to learn.

history = model.fit(X_train, y_train, epochs=30, batch_size=32, validation_data=(X_test, y_test))

Adjust batch_size (e.g., try 16, 64) to find a better training dynamic.

history = model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))

Try longer sequences (e.g., time_steps = 40 or 60) if your vibration patterns have longer-term dependencies the model isn’t capturing.

time_steps = 40  # instead of 20
X_seq, y_seq = create_sequences(X_scaled, y_binary, time_steps)

A deeper model (e.g., 3 LSTM layers) may help capture more complex patterns in the vibration data.

def build_lstm_model(input_shape, lstm_units=50):
    model = Sequential()
    model.add(Input(shape=input_shape))
    model.add(LSTM(lstm_units, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(lstm_units, return_sequences=True))  # <--- 2nd LSTM Layer
    model.add(Dropout(0.2))
    model.add(LSTM(lstm_units))  # <--- 3rd LSTM Layer (final one)
    model.add(Dropout(0.2))
    model.add(Dense(y_encoded.shape[1], activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

Fault detection

The LSTM model used in the previous section was designed to detect whether a machine is faulty or not—essentially a binary anomaly detection model. It was trained only on healthy machine data, so any deviation from that baseline was classified as a fault.

However, a common and more practical question users ask is:

"If a fault is detected, can we also determine what the root cause is?"

The answer is yes, but it depends heavily on the availability of labeled data for each specific fault type.

To enable the model to detect the type of fault, it must be trained not just on healthy data, but also on data corresponding to different known faults—each labeled accordingly. For example:

Data from normal machine operation should be labeled as "Healthy".
If you have a known issue such as belt looseness, you must collect data during that condition and label it as "Looseness".
Similarly, for other conditions like misalignment, mass change, or any other fault type, labeled data must be collected for each.

As discussed in Section 3, we applied specific faults to our test rig (e.g., loose belt, pulley misalignment, added mass) and collected vibration data for each case. Using this multi-labeled dataset, we can now train an LSTM model to not only detect an anomaly, but also classify which specific fault it is.

The Code Below Performs:

Training the LSTM model on multiple fault classes
Predicting the specific fault type if a fault is detected

# ================== SETUP ==================
import pandas as pd
import numpy as np
import os
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve, auc
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.utils import to_categorical
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.utils.multiclass import unique_labels
import endaq.ide
import matplotlib.cm as mpl_cm

# ================== PARAMETERS ==================
start_time = 0
end_time = 50
time_steps_u = 60
test_size = 0.2
epochs_u = 2
batch_size_u = 32
lstm_units = 50
selected_columns = ['X (40g)', 'Y (40g)', 'Z (40g)']

# ================== DYNAMIC DATASET DICTIONARY ==================

datasets = {
    #### Small test rig
    "Healthy": [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Healthy\DAQ16837_000146.ide'],
 
    "Looseness": [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Loosness\DAQ16837_000158.ide'],
    "Misalignment": [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Misalignment\DAQ16837_000155.ide'],
    "Mass imbalance": [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Mass_imbalance\aeaa0a24-DAQ16837_000149-1744735366.ide']

    
    ### Big test rig
    # "Healthy": [
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000020_Healthy_0.ide',
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000021_Healthy_50.ide'
    # ],
    
    # "Loose": [
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000018_Loose_0.ide',
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000019_Loose_50.ide'
    # ],
    
    # "Grease": [
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000022_Grease_0.ide', 
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000023_Grease_50.ide'
    # ],
    
    # "Cut": [
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000026_Cut_0.ide',
    # r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\3.SmartBelt\2025.05.12.Anomaly_ML\DAQ15719_000027_Cut_50.ide'
    # ],
}

# ================== LABEL ENCODING ==================
class_names = list(datasets.keys())
class_to_id = {name: idx for idx, name in enumerate(class_names)}
id_to_class = {idx: name for name, idx in class_to_id.items()}
num_classes_u = len(class_to_id)

# ================== DATA LOADING FUNCTIONS ==================
def load_and_label_data(filepath, label, start_time=None, end_time=None):
    ext = os.path.splitext(filepath)[1].lower()
    if ext == '.csv':
        data = pd.read_csv(filepath)
        if start_time is not None:
            data = data[data['time'] >= start_time]
        if end_time is not None:
            data = data[data['time'] <= end_time]
    elif ext == '.ide':
        doc = endaq.ide.get_doc(filepath)
        data = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
        if start_time is not None:
            data = data[data.index >= start_time]
        if end_time is not None:
            data = data[data.index <= end_time]
        data['time'] = data.index
    else:
        raise ValueError(f"Unsupported file type: {ext}")
    data['label'] = label
    return data

def load_all_labeled_data(datasets_dict, start_time, end_time):
    all_data = pd.DataFrame()
    for class_name, file_list in datasets_dict.items():
        label = class_to_id[class_name]
        for filepath in file_list:
            labeled_data = load_and_label_data(filepath, label, start_time, end_time)
            all_data = pd.concat([all_data, labeled_data], ignore_index=True)
    return all_data

# ================== MODEL PREPARATION FUNCTIONS ==================
def preprocess_data(data, selected_columns):
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(data[selected_columns])
    return scaled_data, scaler

def prepare_data_for_lstm(scaled_data, labels, time_steps=time_steps_u, num_classes=num_classes_u):
    X, y = [], []
    for i in range(time_steps, len(scaled_data)):
        X.append(scaled_data[i-time_steps:i, :])
        y.append(labels[i])
    X = np.array(X)
    y = to_categorical(y, num_classes=num_classes)
    return X, y

def build_lstm_model(input_shape, lstm_units=50, num_classes=num_classes_u):
    model = Sequential()
    model.add(LSTM(lstm_units, return_sequences=True, input_shape=input_shape))
    model.add(Dropout(0.2))
    model.add(LSTM(lstm_units, return_sequences=False))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model


# ================== EXECUTION ==================
combined_data = load_all_labeled_data(datasets, start_time, end_time)
scaled_data, scaler = preprocess_data(combined_data, selected_columns)
labels = combined_data['label'].values
X, y = prepare_data_for_lstm(scaled_data, labels, time_steps_u, num_classes_u)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)

model = build_lstm_model(X_train.shape[1:], lstm_units, num_classes_u)
model.fit(X_train, y_train, epochs=epochs_u, batch_size=batch_size_u, validation_data=(X_test, y_test))
history = model.fit(X_train, y_train, epochs=epochs_u, batch_size=batch_size_u, validation_data=(X_test, y_test))

# ================== EVALUATION & PLOTTING ==================
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_test_classes = np.argmax(y_test, axis=1)

accuracy = accuracy_score(y_test_classes, y_pred_classes)
print(f"\n✅ Accuracy: {accuracy:.4f}\n")

class_indices = unique_labels(y_test_classes, y_pred_classes)
labels_to_show = [id_to_class[i] for i in class_indices]

cm = confusion_matrix(y_test_classes, y_pred_classes, labels=class_indices)
cm_percent = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis] * 100

annot = np.empty_like(cm).astype(str)
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        count = cm[i, j]
        percent = cm_percent[i, j]
        annot[i, j] = f'{count}\n({percent:.1f}%)'

plt.figure(figsize=(5.5, 5.5))
sns.heatmap(cm, annot=annot, fmt='', cmap='Blues',
            xticklabels=labels_to_show, yticklabels=labels_to_show,
            cbar=False, square=True, linewidths=0.5,
           annot_kws={"size": 10})
plt.title('Confusion Matrix with Counts and Percentages')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.tight_layout()
plt.show()

# ROC Curves
fpr = dict()
tpr = dict()
roc_auc = dict()

for i in range(num_classes_u):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_pred[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

plt.figure(figsize=(8, 6))
color_map = mpl_cm.get_cmap('tab10', num_classes_u)

for i in range(num_classes_u):
    plt.plot(
        fpr[i], tpr[i], lw=2,
        color=color_map(i),
        label=f"ROC curve for {id_to_class[i]} (AUC = {roc_auc[i]:.2f})"
    )

plt.plot([0, 1], [0, 1], 'k--', lw=2)
# plt.xlim([0.0, 1.0])
# plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic (ROC) Curve")
plt.legend(loc="lower right")
plt.grid(True)
plt.tight_layout()
plt.show()

The code above is similar to the one used in the previous section for anomaly detection.
The key difference is that instead of using only healthy vs. faulty data, this version is trained using multiple labeled datasets, each corresponding to a specific fault condition (e.g., Loose Belt, Misalignment, Mass Change).

By doing this, the model learns not just to detect whether a fault exists, but also to classify the exact type of fault when one is detected.
This approach enables multi-class fault detection, making the model more informative and practical for real-world diagnostics.

The most critical parameters that control the model's behavior and performance are found in this section of the code:

# Import necessary libraries
import os  # For file path handling
import numpy as np  # Numerical operations
import pandas as pd  # Data handling with DataFrames
import matplotlib.pyplot as plt  # Plotting and visualization
import matplotlib.patches as mpatches  # Custom legend patches for plots

# Machine learning and signal processing tools
from sklearn.decomposition import PCA  # Principal Component Analysis
from sklearn.cluster import KMeans  # KMeans clustering
from sklearn.preprocessing import StandardScaler  # Standardize data
from scipy.stats import kurtosis, skew  # Statistical features
from scipy.fft import rfft, rfftfreq  # Fast Fourier Transform for frequency features

# Endaq IDE tool for loading .ide files
import endaq.ide

# ------------------ Load and Label Data ------------------

# This function loads sensor data and attaches a label indicating the condition (e.g., healthy, faulty)
def load_and_label_data(dataset_name, label, start_time=None, end_time=None):
    ext = os.path.splitext(dataset_name)[1].lower()  # Get the file extension

    # If the file is a CSV
    if ext == '.csv':
        data = pd.read_csv(dataset_name)
        # Apply time filtering if specified
        if start_time is not None:
            data = data[data['time'] >= start_time]
        if end_time is not None:
            data = data[data['time'] <= end_time]

    # If the file is a .ide format (Endaq format)
    elif ext == '.ide':
        doc = endaq.ide.get_doc(dataset_name)
        # Convert a specific channel (channel 80) into a pandas DataFrame
        data = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
        # Filter data based on start and end time
        if start_time is not None:
            data = data[data.index >= start_time]
        if end_time is not None:
            data = data[data.index <= end_time]
        data['time'] = data.index  # Add time column if not present

    else:
        raise ValueError(f"Unsupported file type: {ext}")

    data['label'] = label  # Assign label to the dataset (e.g., 0 = healthy)
    return data

# Loads and combines multiple datasets (healthy and faulty)
def load_multiple_datasets(healthy_datasets, faulty_datasets, start_time=None, end_time=None):
    combined_data = pd.DataFrame()

    # Load and label each healthy dataset
    for dataset in healthy_datasets:
        healthy_data = load_and_label_data(dataset, label=0, start_time=start_time, end_time=end_time)
        combined_data = pd.concat([combined_data, healthy_data], ignore_index=True)

    # Load and label each faulty dataset based on its fault type
    for fault_type, fault_datasets in faulty_datasets.items():
        for dataset in fault_datasets:
            faulty_data = load_and_label_data(dataset, label=fault_type, start_time=start_time, end_time=end_time)
            combined_data = pd.concat([combined_data, faulty_data], ignore_index=True)

    return combined_data

# ------------------ Feature Extraction ------------------

# Extract statistical and frequency features from a vibration signal segment
def extract_features(signal, sampling_rate=4052):
    N = len(signal)  # Number of samples
    freqs = rfftfreq(N, d=1/sampling_rate)  # Frequency bins
    fft_vals = np.abs(rfft(signal))  # FFT values (magnitude)

    # Return a dictionary of time-domain and frequency-domain features
    return {
        'mean': np.mean(signal),
        'std': np.std(signal),
        'rms': np.sqrt(np.mean(signal**2)),
        'kurtosis': kurtosis(signal),
        'skewness': skew(signal),
        'crest_factor': np.max(np.abs(signal)) / np.sqrt(np.mean(signal**2)),
        'peak_to_peak': np.ptp(signal),
        'freq_mean': np.mean(fft_vals),
        'freq_std': np.std(fft_vals),
        'freq_peak': freqs[np.argmax(fft_vals)]  # Frequency with max amplitude
    }

# ------------------ Define File Paths ------------------

# Paths to healthy condition datasets
healthy_datasets = [
    r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Healthy\DAQ16837_000146.ide',
]

# Dictionary mapping fault types to datasets
faulty_datasets = {
    1: [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\F_Loose\DAQ16837_000158.ide'],
    2: [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\F_missAlignment\DAQ16837_000155.ide'],
    3: [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Faulty\aeaa0a24-DAQ16837_000149-1744735366.ide']
}

# ------------------ Load and Process Data ------------------

combined_data = load_multiple_datasets(healthy_datasets, faulty_datasets)

# Define a fixed window size for segmentation (e.g., 1-second if sampling rate is 1 kHz)
window_size = 1000
features_list = []
labels_list = []

# Loop through each labeled group (e.g., all healthy, all loose, etc.)
for label, group in combined_data.groupby("label"):
    values = group.iloc[:, 1]  # Assuming vibration data is in the second column
    for i in range(0, len(values) - window_size, window_size):
        segment = values.iloc[i:i+window_size].values
        if len(segment) == window_size:
            features = extract_features(segment)  # Extract features from segment
            features_list.append(features)
            labels_list.append(label)

# ------------------ PCA and Clustering ------------------

# Convert list of features into a DataFrame
features_df = pd.DataFrame(features_list)
true_labels = np.array(labels_list)

# Standardize features to have zero mean and unit variance
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features_df)

# Apply PCA to reduce to 2 dimensions for visualization
pca = PCA(n_components=2)
pca_result = pca.fit_transform(features_scaled)

# Apply KMeans clustering to the feature space
kmeans = KMeans(n_clusters=4, random_state=42)
clusters = kmeans.fit_predict(features_scaled)

# ------------------ Visualization: PCA by True Labels ------------------

# Map numeric labels to human-readable names
label_names = {
    0: "Healthy",
    1: "Looseness",
    2: "Misalignment",
    3: "Mass change"
}
colors = ['green', 'red', 'blue', 'orange']  # Colors for plotting

# Create scatter plot of PCA results colored by true condition label
plt.figure(figsize=(10, 6))
for label in np.unique(true_labels):
    idx = true_labels == label
    plt.scatter(
        pca_result[idx, 0], pca_result[idx, 1],
        c=colors[label],
        label=label_names[label],
        s=50
    )

plt.title("PCA Projection (Colored by True Labels)")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.grid(True)

# Add custom legend
handles = [mpatches.Patch(color=colors[i], label=label_names[i]) for i in label_names]
plt.legend(handles=handles, title="Data Type")
plt.show()

# ------------------ Visualization: PCA by Clusters ------------------

# Plot PCA result with colors based on KMeans clustering
plt.figure(figsize=(10, 6))
plt.scatter(pca_result[:, 0], pca_result[:, 1], c=clusters, cmap='viridis', s=50)
plt.title("PCA Projection (Colored by KMeans Clusters)")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.grid(True)
plt.colorbar(label="Cluster ID")
plt.show()

# ------------------ PCA Components (Feature Influence) ------------------

# Display the weights of each original feature in the PCA components
pca_components = pd.DataFrame(
    pca.components_,
    columns=features_df.columns,
    index=["PCA 1", "PCA 2"]
)

# Print PCA component contributions for interpretation
print(pca_components.T)

✅ start_time = 0

What it does:
Sets the starting point (in seconds) for the data segment to be used from each file.
Tip:
If the initial data contains startup noise or irrelevant behavior, consider increasing this value (e.g., start_time = 10).

✅ end_time = 50

What it does:
Defines the end point (in seconds) for the data segment.
Tip:
Use a smaller window (like 30–60 seconds) to keep the dataset manageable during experimentation. For final models, increase if you want the model to learn long-term behavior.

✅ time_steps_u = 60

What it does:
Sets the number of time steps (sequence length) for the LSTM input. Each sample fed into the LSTM includes 60 time steps of sensor data.
Tip:
If your vibration patterns have longer temporal dependencies, try increasing this to 80 or 100. For faster training, you can reduce it to 30–40.

✅ test_size = 0.2

What it does:
Reserves 20% of the dataset for testing, while the remaining 80% is used for training.
Tip:
For smaller datasets, use a slightly higher test size like 0.3. For large datasets, 0.2 or even 0.1 may be sufficient.

✅ epochs_u = 2

What it does:
Sets the number of complete training cycles (epochs) the model will go through.
Tip:
This is low for real training—start with 10–30 epochs to properly train your model. Use 2–5 epochs for quick debugging only.

✅ batch_size_u = 32

What it does:
Specifies the number of samples the model processes before updating weights.
Tip:
Common choices are 16, 32, 64. Lower values (e.g., 16) may help with noisy data, while higher values (e.g., 64) can speed up training but might require more memory.

✅ lstm_units = 50

What it does:
Sets the number of LSTM memory cells (neurons) in each LSTM layer.
Tip:
More units (e.g., 100 or 128) give the model more learning capacity. Use higher values if your data has complex patterns and you have enough training samples.

✅ selected_columns = ['X (40g)', 'Y (40g)', 'Z (40g)']

What it does:
Chooses the input features from the vibration sensor—here, acceleration along the X, Y, and Z axes.
Tip:
If you know one axis captures most of the signal, you can use just ['X (40g)'] to reduce dimensionality. Otherwise, keeping all three gives a fuller picture.

Once you provide the correct file paths for your datasets and run the code, the model will:

✅ Train using the labeled vibration data,
📈 Return the accuracy of the model on the test set,
📊 Display a confusion matrix showing how well each fault type was classified,
📉 Show ROC curves for each class, giving a visual indication of how well the model separates different conditions.

These outputs help evaluate both the overall performance and the model’s ability to distinguish between specific fault types.

Figure 10. ROC curve for detecting fault type.

Figure 11. ROC curve for detecting fault type.

Based on the confusion matrix, we can conclude that the ML model accurately detects and classifies fault types with high reliability:

Healthy: 99.2% of healthy samples are correctly classified as healthy. Only 0.8% are misclassified, mostly as "MassChange."
Loose: 99.9% of loose condition data are accurately detected. A very small number are misclassified as healthy or another fault type.
Misalignment: Nearly all misaligned samples are correctly labeled, indicating a distinct and consistent vibration signature that sets it apart from both healthy and other faulty conditions.
MassChange: 99.8% of mass-changed cases are correctly identified, with very few misclassifications.

These results show that the model is not only able to detect faults but also to identify the root cause with very high precision across all fault categories.

Statistical Methods

Apart from Machine Learning (ML) approaches, statistical tools can also be very effective for detecting significant changes or anomalies in databases. Unlike ML, which often requires training on labeled data, statistical methods analyze the data directly—looking for shifts in distribution, central tendency, or spread.

These methods are especially useful when:

You have limited labeled data.
You want to perform quick and interpretable checks.
You're dealing with simple or small-scale systems.

Here's a more advanced statistical and unsupervised learning methods table that includes powerful tools like PCA, t-SNE, UMAP, ICA, Autoencoders, and Factor Analysis—commonly used for anomaly detection, dimensionality reduction, or feature understanding:

Method	Linear?	Good for clustering?	Notes
PCA (Principal Component Analysis)	✅ Yes	⚠️ Limited	Great for dimensionality reduction; highlights variance and outliers in a compact space.
t-SNE (t-distributed Stochastic Neighbor Embedding)	❌ No	✅ Excellent	Preserves local structure well; best for visualizing clusters and anomalies.
UMAP (Uniform Manifold Approximation and Projection)	❌ No	✅ Excellent	Faster than t-SNE; preserves both local and some global structure; great for large data.
ICA (Independent Component Analysis)	✅ Yes	⚠️ Limited	Useful for separating mixed signals; often used in signal processing and vibration data.
Autoencoders	❌ No	✅ Excellent	Deep learning-based; detects anomalies based on reconstruction error.
Factor Analysis	✅ Yes	⚠️ Limited	Identifies latent variables behind observed data; good for understanding data structure.
Parallel Coordinates Plot (PCP)	✅ Yes	⚠️ Visual Aid Only	A visualization method to detect patterns or separability across features.

In this section, we'll perform PCA on the databases. Principal Component Analysis (PCA) is a statistical technique used to reduce the dimensionality of large datasets while preserving as much variability (information) as possible. It transforms the original variables into a new set of uncorrelated variables called principal components. These components are ordered by the amount of variance they capture from the data, allowing us to identify patterns, highlight similarities or differences, and simplify complex datasets for visualization or further analysis.

In the context of condition monitoring, PCA helps uncover hidden structure in sensor or measurement data by projecting it into a lower-dimensional space—making it easier to distinguish between different operating states or faults.

# ================== SETUP ==================
import pandas as pd
import numpy as np
import os
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve, auc
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.utils import to_categorical
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.utils.multiclass import unique_labels
import endaq.ide
import matplotlib.cm as mpl_cm

# ================== PARAMETERS ==================
start_time = 0
end_time = 50
time_steps_u = 60
test_size = 0.2
epochs_u = 2
batch_size_u = 32
lstm_units = 50
selected_columns = ['X (40g)', 'Y (40g)', 'Z (40g)']

# ================== DYNAMIC DATASET DICTIONARY ==================

datasets = {
    #### Small test rig
    "Healthy": [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Healthy\DAQ16837_000146.ide'],
 
    "Loose": [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\F_Loose\DAQ16837_000158.ide'],
    "Misalignment": [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\F_missAlignment\DAQ16837_000155.ide'],
    "MassChange": [r'C:\Users\hdabiri\OneDrive - Hutchinson Industries Inc\HAMED\Marketing\Faulty\aeaa0a24-DAQ16837_000149-1744735366.ide']
}

# ================== LABEL ENCODING ==================
class_names = list(datasets.keys())
class_to_id = {name: idx for idx, name in enumerate(class_names)}
id_to_class = {idx: name for name, idx in class_to_id.items()}
num_classes_u = len(class_to_id)

# ================== DATA LOADING FUNCTIONS ==================
def load_and_label_data(filepath, label, start_time=None, end_time=None):
    ext = os.path.splitext(filepath)[1].lower()
    if ext == '.csv':
        data = pd.read_csv(filepath)
        if start_time is not None:
            data = data[data['time'] >= start_time]
        if end_time is not None:
            data = data[data['time'] <= end_time]
    elif ext == '.ide':
        doc = endaq.ide.get_doc(filepath)
        data = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
        if start_time is not None:
            data = data[data.index >= start_time]
        if end_time is not None:
            data = data[data.index <= end_time]
        data['time'] = data.index
    else:
        raise ValueError(f"Unsupported file type: {ext}")
    data['label'] = label
    return data

def load_all_labeled_data(datasets_dict, start_time, end_time):
    all_data = pd.DataFrame()
    for class_name, file_list in datasets_dict.items():
        label = class_to_id[class_name]
        for filepath in file_list:
            labeled_data = load_and_label_data(filepath, label, start_time, end_time)
            all_data = pd.concat([all_data, labeled_data], ignore_index=True)
    return all_data

# ================== MODEL PREPARATION FUNCTIONS ==================
def preprocess_data(data, selected_columns):
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(data[selected_columns])
    return scaled_data, scaler

def prepare_data_for_lstm(scaled_data, labels, time_steps=time_steps_u, num_classes=num_classes_u):
    X, y = [], []
    for i in range(time_steps, len(scaled_data)):
        X.append(scaled_data[i-time_steps:i, :])
        y.append(labels[i])
    X = np.array(X)
    y = to_categorical(y, num_classes=num_classes)
    return X, y

def build_lstm_model(input_shape, lstm_units=50, num_classes=num_classes_u):
    model = Sequential()
    model.add(LSTM(lstm_units, return_sequences=True, input_shape=input_shape))
    model.add(Dropout(0.2))
    model.add(LSTM(lstm_units, return_sequences=False))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# ================== EXECUTION ==================
combined_data = load_all_labeled_data(datasets, start_time, end_time)
scaled_data, scaler = preprocess_data(combined_data, selected_columns)
labels = combined_data['label'].values
X, y = prepare_data_for_lstm(scaled_data, labels, time_steps_u, num_classes_u)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)

model = build_lstm_model(X_train.shape[1:], lstm_units, num_classes_u)
model.fit(X_train, y_train, epochs=epochs_u, batch_size=batch_size_u, validation_data=(X_test, y_test))

# ================== EVALUATION & PLOTTING ==================
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_test_classes = np.argmax(y_test, axis=1)

accuracy = accuracy_score(y_test_classes, y_pred_classes)
print(f"\n✅ Accuracy: {accuracy:.4f}\n")

class_indices = unique_labels(y_test_classes, y_pred_classes)
labels_to_show = [id_to_class[i] for i in class_indices]

cm = confusion_matrix(y_test_classes, y_pred_classes, labels=class_indices)
cm_percent = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis] * 100

annot = np.empty_like(cm).astype(str)
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        count = cm[i, j]
        percent = cm_percent[i, j]
        annot[i, j] = f'{count}\n({percent:.1f}%)'

plt.figure(figsize=(6, 6))
sns.heatmap(cm, annot=annot, fmt='', cmap='Blues',
            xticklabels=labels_to_show, yticklabels=labels_to_show,
            cbar=False, square=True, linewidths=0.5)
plt.title('Confusion Matrix with Counts and Percentages')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.tight_layout()
plt.show()

# ROC Curves
fpr = dict()
tpr = dict()
roc_auc = dict()

for i in range(num_classes_u):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_pred[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

plt.figure(figsize=(8, 6))
color_map = mpl_cm.get_cmap('tab10', num_classes_u)

for i in range(num_classes_u):
    plt.plot(
        fpr[i], tpr[i], lw=2,
        color=color_map(i),
        label=f"ROC curve for {id_to_class[i]} (AUC = {roc_auc[i]:.2f})"
    )

plt.plot([0, 1], [0, 1], 'k--', lw=2)
# plt.xlim([0.0, 1.0])
# plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic (ROC) Curve")
plt.legend(loc="lower right")
plt.grid(True)
plt.tight_layout()
plt.show()

And the results will be:

fig-12

Figure 12. PCA for healthy and faulty databases.

The PCA (Principal Component Analysis) plot reveals several insightful patterns in the data related to different belt conditions.

Loose Condition: Data associated with the loose condition forms a narrow cluster along the PCA1 axis and spans a wider range along PCA2. This cluster is also positioned far from other conditions in terms of both PCA1 and PCA2, suggesting that PCA could potentially be effective in identifying loose belt conditions. However, more data from additional "loose" condition datasets is necessary to validate this finding.
Mass-Changed Condition: Data for the mass-changed condition is spread out across both PCA1 and PCA2 axes and appears on the right side of the plot. This distribution may offer useful insights for clustering mass-changed conditions separately from other operational states.
Misaligned and Healthy Conditions: These two conditions largely overlap in the PCA space, implying that PCA produces similar results for both. Consequently, PCA may not be a reliable method for distinguishing between healthy and misaligned belt states.

It is important to note that these observations are specific to the current test dataset and should not be generalized to all belt-driven systems in real-world applications. Nevertheless, the results suggest that PCA can be a valuable preliminary tool for distinguishing between different operational conditions—without relying on machine learning-based methods.

The components of PCA1 and PCA2 are provided in the table below, offering a deeper understanding of the features that contribute most to the variance in each principal component. This information can be particularly useful when interpreting the positioning of different condition data in the PCA plot.

Table 1. Components of PCA1 and PCA2.

Parameter	PCA1	PCA2
Mean	0.077504	-0.530687
Std	0.386295	0.000617
RMS	0.385350	0.050540
Kurtosis	0.265002	0.395965
Skewness	-0.108547	0.678720
Crest_factor	0.295110	-0.150217
peak_to_peak	0.378750	0.115777
freq_mean	0.379757	0.028138
freq_std	0.370553	0.128894
freq_peak	0.319687	-0.212113

Conclusions

Vibration data plays a crucial role in condition monitoring and predictive maintenance of mechanical systems. By analyzing this data, we can detect anomalies early, helping to prevent costly damage and unexpected downtime. In this blog, we demonstrated how vibration data can be leveraged for fault detection and diagnosis.

We began by visualizing vibration signals along the X, Y, and Z axes, followed by statistical analysis and histogram representations to gain a clearer understanding of the signal characteristics. These visual and statistical insights provide a solid foundation for identifying abnormal patterns in system behavior.

To take this further, we employed Long Short-Term Memory (LSTM) networks—first for detecting anomalies in the system, and then for classifying specific fault types such as misalignment, looseness, and mass imbalance. LSTM's ability to capture temporal dependencies makes it well-suited for understanding the sequential nature of vibration signals.

One important takeaway is that detecting unhealthy conditions is relatively straightforward when we have access to healthy (baseline) vibration data, which can typically be collected during normal operation. However, identifying the exact fault type is more challenging, as it requires labeled vibration data for each specific fault condition—something that may not always be readily available in real-world scenarios.

In cases where labeled data is scarce, statistical and unsupervised methods such as clustering can be explored to group similar conditions. While this approach holds promise, it demands deeper analysis and careful consideration to ensure meaningful and accurate results.

Resources, Scripts & Files

Here are the files and scripts associated with this blog post.

Download this blog post as a PDF

White Paper: Vibration-Based Fault Detection in Belt Drive Systems Using FFT and LSTM Models: Presented at IWSHM Proceedings 2025

Machine Learning Blog Posts

Find Out More: Machine Learning Applications Page

Detecting Anomalies in Belt-Driven Systems with Machine Learning

How Vibration Data Can be Leveraged for Anomaly Detection and Fault Diagnosis

Introduction to Anomaly detection and ML-based methods

What is an Anomaly?

What are Faults in Vibration Machinery?

Common Machine Learning Methods for Anomaly and Fault Detection

What is LSTM?

What Kind of Neural Networks Are They?

How Do LSTMs Work?

LSTM Architecture

Why Use LSTMs for Vibration Analysis?

The test designed for anomaly detection

Data Analysis

Importing libraries

Visualization

Statistics

Histograms

Anomaly Detection

Fault detection

Statistical Methods

Conclusions

Resources, Scripts & Files

Download Python Script

Machine Learning Blog Posts

Hamed Dabiri

Detecting Anomalies in Belt-Driven Systems with Machine Learning

How Vibration Data Can be Leveraged for Anomaly Detection and Fault Diagnosis

Introduction to Anomaly detection and ML-based methods

What is an Anomaly?

What are Faults in Vibration Machinery?

Common Machine Learning Methods for Anomaly and Fault Detection

What is LSTM?

What Kind of Neural Networks Are They?

How Do LSTMs Work?

LSTM Architecture

Why Use LSTMs for Vibration Analysis?

The test designed for anomaly detection

Data Analysis

Importing libraries

Visualization

Statistics

Histograms

Anomaly Detection

Fault detection

Statistical Methods

Conclusions

Resources, Scripts & Files

Download Python Script

Machine Learning Blog Posts

Hamed Dabiri

Sign Up for Sensing Tips!