Qeexo has been invited by Arm to integrate the Arm Virtual Hardware (AVH) Platform ‘devices’ into Qeexo AutoML. The Arm Virtual Hardware Platform is a cloud-based hardware simulator of the Arm Cortex-M55 MCU and the Ethos-U55 microNPU Machine Learning Processor designs, available from Arm through the AWS Marketplace. Qeexo users can apply Machine Learning functions with these new virtual ‘devices’.

In this article:


Working with Projects

A. What is Project?

B. Creating and Managing Project

Data Management

A. Collecting data

B. Uploading dataset

C. Data check

D. Training data, Test data and Data operation

E. Viewing and managing project data

F. Visualizing data

Building Machine Learning Models

A. Getting started

B. Training settings

C. Training process

D. Training result

E. Test model performance on Test Data

F. Live Replay

Note: Arm Virtual Hardware is virtual hardware, so sensor installation is NOT needed.


Working with Projects

A. What is Project?

"Project" is the basic unit of organization in the Qeexo AutoML system. A Project represents a collection of work to solve a specific machine learning problem on a particular target hardware (in this article, we are using virtual hardware).

You DO NOT need to install any hardware (sense module) to be able to create a project as AVH is virtual.

An example of a Project might be something like "Turbine-Predictive-Maintenance", where the data, models, and tests are compiled with the end goal of using machine learning for predictive maintenance on Arduino Nano 33 BLE devices attached to turbines.

B. Creating and Managing Project

To help you follow this guide, we present a demo project, “Demo-AVH”, that demonstrates every step. Look for the IN DEMO CASE notes throughout the article.

1. As a new user, you will be taken to the “Create Project” page after logging in, where you can specify a “Project Name”, “Classification Type”, and the “Target Hardware”.

a. Project Name: Enter a name that is reflective of the purpose of your project.
IN DEMO CASE we name it as “Demo-AVH”.

b. Classification Type: Choose “Multi-Class Classification”.

In the current AutoML version, Arm Virtual Hardware supports ONLY “Multi-Class Classification”.
Click the link for more information about the different types of classification.

IN DEMO CASE we select “Multi-Class Classification”.

c. Target Hardware: Select the hardware that will be used in your project. For this article, that is “ARM Virtual Hardware”.

IN DEMO CASE we select “ARM Virtual Hardware”.

2. After making your selections, click the CREATE button to create your project.

Now you have successfully created your AVH project!

Data Management

Please refer to this page for Arm Virtual Hardware Project Best Practices.

A. Collecting data

Navigate to the Data Collection page to collect data using the Qeexo AutoML web app, either by clicking the COLLECT TRAINING DATA button or by selecting the DATA COLLECTION tab.

Step 1: Build Environment

*What is Environment?

An Environment is a physical setting with a given set of properties (e.g. acoustics, temperature, lighting). The range of this set of properties should match the range of the environment where the final machine learning model will eventually run. For example, training the machine learning models with data in your office will likely not work very well once you test the trained models on the factory floor. Environments also contain information about the given sensor configuration settings. All data collected for a given Environment will have the same sensor configuration.

You can either BUILD NEW ENVIRONMENT by entering a unique "Environment Name", or SELECT AN ENVIRONMENT to add more data to a previously recorded Environment. If selecting an existing Environment, the Sensor Configuration (in Step 2: Configure Sensors) will automatically populate with the Environment's previous settings. You should name your Environment something easily recognizable to you, with details about the specific location. For example, "OfficeCoffeetable" or "VestasTurbineSolano".

IN DEMO CASE we name the environment as “Office1” as the model training is done in an office environment. Once you input the name, click SAVE to proceed to the next step.

Step 2: Configure Sensors

Click EDIT in Step 2: Configure Sensors to view the list of the supported sensors on the Target Hardware. After selecting the sensors, you will need to configure the corresponding sampling rate, or ODR (Output Data Rate), for each sensor and the Full Scale Range (FSR) when available.

Currently, AVH supports only the Microphone sensor, with a sampling rate (Output Data Rate, or ODR) of 16000 Hz.

After selecting the Microphone sensor, click on USE SENSOR CONFIG to save the sensor configurations for your AVH project.

Step 3: Collect Data

Qeexo AutoML currently supports a variety of supervised classification algorithms for machine learning. For each of these algorithms, all data used for training must have an associated Class Label (aka COLLECTION NAME).

*For multi-class, at least two unique classes must be defined. For most problems, we recommend that at least one of the classes be a "baseline" class that represents the typical environmental noise or behavior.

Whether or not baseline data is necessary depends on the use case and data selected. In general, the classes collected for multi-class classification should represent the full set of possible states for the given Environment. For example, if you want to build a multi-class model that can recognize various keywords or speech (e.g. Yes, No), you should also collect data that represents a baseline class (e.g. Silence).

  • Baseline Data:

    • Baseline data can be collected by setting the data type to Continuous and leaving data collection application to run while the environment is in a steady state of rest or typical operating behavior.

    • Some machine learning problems require collecting baseline data to differentiate events of interest from normal environmental conditions.

    • Baseline data is usually associated with each Environment (since different Environments will often have different baseline data characteristics).

    • For example, baseline data might be "NoGesture" in gesture recognition, "None" in kitchen appliance detection, "AtRest" in logistics monitoring, or “Silence” in speech recognition.

  • Class Label / Collection Label:

    • A Class Label is a machine learning concept, normally a word or phrase to label the event or condition of interest. For example, "Yes", "No", and "Silence" can be classes in our Speech Recognition Project.

    • For Continuous data, the Class Label applies to all of the data collected.

    • You must define one Class Label at a time when collecting data by entering a text string in the given field.

    • Note that only letters, numbers, and underscores are allowed when naming class labels in MLC projects.

  • Number of Seconds

    • This sets the duration of the data collection.

    • More data generally leads to higher performance. Depending on the complexity of the use case, the number of classes, the quality of the data, and many other factors, the optimal and minimum number of seconds to collect can vary greatly. We recommend starting with at least 30 seconds for each Class Label, but much more data may be required if the classes are highly variable or if the problem is sufficiently complex.
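To get a feel for how much raw audio the recommended starting duration produces, the short sketch below multiplies the 16000 Hz microphone rate stated earlier by the number of seconds. The function name `samples_for` is illustrative only and is not part of Qeexo AutoML.

```python
SAMPLE_RATE_HZ = 16_000  # microphone ODR stated for AVH in this article


def samples_for(seconds: float, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples produced by a recording of the given length."""
    return int(seconds * rate_hz)


per_label = samples_for(30)   # recommended starting point per Class Label
demo_total = 3 * per_label    # three demo labels: "Yes", "No", "Silence"
print(per_label, demo_total)
```

Doubling the number of seconds doubles the number of samples, which is one reason longer collections take longer to upload and train on.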

IN DEMO CASE we are going to create 3 class labels: “Yes”, “No” and “Silence”. “Silence” is our baseline class, meaning data collected when the sensor doesn't sense any words. Move on to the next section to see how to record data for each of the class labels.
*Note that you need to repeat Step 3 for each label’s data recording.
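The naming rule mentioned above (letters, numbers, and underscores only, stated for MLC projects) can be checked before you type a label into the web app. This is a minimal sketch, assuming "letters" means ASCII letters; the helper `is_valid_label` is hypothetical and is not a Qeexo API.

```python
import re

# Assumption: ASCII letters, digits, and underscores only.
LABEL_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_label(label: str) -> bool:
    """True if the whole label consists of allowed characters."""
    return LABEL_PATTERN.fullmatch(label) is not None


demo_labels = ["Yes", "No", "Silence"]
print([is_valid_label(label) for label in demo_labels])
print(is_valid_label("No Gesture"))  # contains a space
```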

3-1 Recording data

After completing the previous steps, the RECORD button should now become clickable (shown in green in the picture above). If it is not, check the previous steps.

After clicking RECORD, you will be directed to the Data Recording page, where you will see a prompt for “Attention”. This prompt is to remind you to position yourself directly in front of your computer microphone at a convenient distance when performing data collection and Live Replay. This helps in improving the model’s performance. Once ready, please go ahead and click CONFIRM.

You will now see the following screen, which means you are all set to record data. When you are ready to start data collection, click START. The text in the center circle will change from “READY” to “INITIALIZE” while the data collection software is starting up.

After a few seconds, the circle turns green and displays "GO", which means data is now being collected. As you say “Yes” into the microphone, the center circle will have a ““ sign.

Once the specified number of seconds has been collected, the labeled data will be uploaded to the database, and you will be redirected to the Data Collection page.

You can collect more data for the same or a different Class Label from the Data Collection page.

*Note that for a multi-class classification project, you will need at least 2 distinct classes (aka 2 different Class Labels) to be able to train a machine learning model.

IN DEMO CASE we will be creating 3 Class Labels in total: “Yes”, “No”, and “Silence”. Having recorded “Yes”, you need to go through the Collect Data and Recording data process at least two more times for the two remaining classes (“No” and “Silence”).

The final result should look like below:

IN DEMO CASE Note that the SILENCE data will yield a WARNING for DATA CHECK. It is okay to proceed.

3-2 Re-recording data

If you believe a mistake has been made when recording data, and the data has been contaminated, you can re-record the data from the bottom of the Data Collection page. You can click "Re-Record" to overwrite the existing data. Alternatively, you can click on the Trash icon to delete the Dataset and start over.

B. Uploading dataset

From the Data page, you may also upload previously-collected datasets to AutoML directly. These uploaded datasets can be used to train machine learning models.
*Note: Data with the same Class Label must be of the same data type (Event or Continuous).

Click UPLOAD TRAINING DATASET to upload .csv file(s). Each .csv should contain one or more data collections. All data contained in a .csv file must come from the same sensor configuration, which you will enter after uploading the file. If you have more than 70 MB of data, you will need to split it into multiple .csv files. Please refer to this link for the Qeexo AutoML-defined data format.

Select Build an environment, then enter an ENVIRONMENT NAME that is relevant to your dataset.

Then click CHOOSE FILE(S) to select the dataset that you would like to upload. Click NEXT to proceed.

*Note: Qeexo AutoML allows you to upload up to 10 files at a time, with a maximum size of 70 MB each. If you have more than 10 files, upload them in multiple batches. The first time, choose “Build an environment”. To add more files to your existing environment, click “Select an environment” and select the existing environment.
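If you are preparing many files, it can help to plan the batches beforehand. The sketch below groups files into uploads of at most 10 and flags any file above 70 MB that would need splitting first. The file names and sizes are made up, and `plan_uploads` is a hypothetical helper, not something AutoML provides.

```python
MAX_FILES_PER_UPLOAD = 10  # limit stated in the note above
MAX_FILE_SIZE_MB = 70      # limit stated in the note above


def plan_uploads(files):
    """files: list of (name, size_mb). Returns (batches, oversized)."""
    oversized = [name for name, size in files if size > MAX_FILE_SIZE_MB]
    ok = [name for name, size in files if size <= MAX_FILE_SIZE_MB]
    batches = [
        ok[i:i + MAX_FILES_PER_UPLOAD]
        for i in range(0, len(ok), MAX_FILES_PER_UPLOAD)
    ]
    return batches, oversized


# Hypothetical inventory: 19 small clips plus one file that is too large.
files = [(f"clip_{i:02d}.csv", 12.5) for i in range(19)]
files.append(("long_session.csv", 95.0))

batches, oversized = plan_uploads(files)
print([len(batch) for batch in batches])  # [10, 9]
print(oversized)                          # ['long_session.csv']
```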

AutoML will then verify your data. Click SAVE to start the uploading process.
*Note: the sample (audio) dataset is large and may take some time to finish uploading. Please be patient.

Once the process completes, you will be taken to the Data page, where you can view and manage your uploaded data.

C. Data check

Data check verifies the quality of the data, whether uploaded or collected. A failure in data check will not prevent you from using the data to train machine learning models. However, poor data quality may result in poor model performance.

Qeexo AutoML currently looks for the following data issues:

  • collected data does not match the selected sensors in the Sensor Configuration step

  • collected data does not match the selected sampling rate in the Sensor Configuration step

  • collected data contains duplicate or missing timestamps

  • collected data has duplicate values or constant values

  • collected data contains invalid values including NaN or inf

  • collected data is saturated
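To make some of these checks concrete, here is a minimal sketch of how several of them could be expressed for a single sensor channel. It is not Qeexo's implementation: the function `check_signal`, the 1.5x gap tolerance, and the `full_scale` parameter are all assumptions chosen for illustration, and the sensor and sampling-rate matching checks are left out.

```python
import math


def check_signal(timestamps, values, step, full_scale):
    """Return a list of warning strings for one channel of data."""
    warnings = []
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if any(gap == 0 for gap in gaps):
        warnings.append("duplicate timestamps")
    # Assumption: a gap well above the expected step means samples are missing.
    if any(gap > 1.5 * step for gap in gaps):
        warnings.append("missing timestamps")
    if any(math.isnan(v) or math.isinf(v) for v in values):
        warnings.append("invalid values (NaN or inf)")
    finite = [v for v in values if math.isfinite(v)]
    if finite and len(set(finite)) == 1:
        warnings.append("constant values")
    # Assumption: readings at or beyond full scale count as saturated.
    if any(abs(v) >= full_scale for v in finite):
        warnings.append("saturated")
    return warnings


noisy = check_signal([0, 1, 2, 2, 5], [0.1, float("nan"), 1.0, 0.2, 0.3],
                     step=1, full_scale=1.0)
flat = check_signal([0, 1, 2, 3], [0.0, 0.0, 0.0, 0.0], step=1, full_scale=1.0)
print(noisy)
print(flat)
```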

Here is an example of a data check with warnings:


A green PASS icon indicates that the data check has passed.

A yellow WARNING icon indicates that the data contains one or more issues from the list above.

A red ERROR icon indicates that something went wrong during data collection or during the data check (connection error or device error). The data may not be usable if the icon remains ERROR after a refresh.


D. Training data, Test data and Data operation

Click the link for more information about Training data, Test data and Data operations.

E. Viewing and managing project data

All of the Datasets associated with the current Project can be viewed and managed from the Data page. You can review the Dataset Information including its Sensor Configurations and Data Check results, as well as visualize and delete them.

F. Visualizing data

AutoML provides users with the ability to plot and view sensor data directly from the platform using the onboard data visualization tool. To visualize training or test data, click the data visualization icon.

Navigate data using scroll, scale, and zoom options and view data in either Time Domain or Frequency Domain.

  • Time Domain visualization is a visual representation of the signal’s amplitude and how it changes over time. With time domain visualization, the x-axis represents time, whereas the y-axis represents the signal’s amplitude.

  • Frequency Domain visualization, also known as spectrogram frequency visualization, is a visual representation of the spectrum of frequencies of a signal as it varies with time. With spectrogram frequency visualization, the x-axis represents time, whereas the y-axis represents the signal’s frequency.
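The relationship between the two views can be illustrated with a small, self-contained sketch. It builds one short frame of a 1 kHz tone sampled at 16000 Hz (the time domain view) and applies a naive discrete Fourier transform to find which frequency dominates (one column of a frequency domain view). The frame length of 64 samples and the test tone are arbitrary choices, and this is not how AutoML computes its plots.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16_000
N = 64  # frame length; frequency bins are 16000 / 64 = 250 Hz apart

# Time domain: amplitude of a 1 kHz tone at each sample instant.
frame = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE_HZ) for n in range(N)]


def dft_magnitudes(x):
    """Magnitude of each frequency bin from 0 Hz up to the Nyquist frequency."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


# Frequency domain: find the bin with the most energy.
mags = dft_magnitudes(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE_HZ / N
print(peak_hz)  # 1000.0
```

A spectrogram repeats this computation over successive frames and stacks the results along the time axis.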

Building Machine Learning Models

A. Getting started

  1. Navigate to the Data page to build machine learning models with uploaded training data.

  2. Select the training datasets that you want to use for building machine learning models by clicking the checkbox to the left of each Dataset.
    *Note that the selected Datasets should ideally be from the same Environment, but Qeexo AutoML will allow you to train Datasets from different Environments as long as the selected sensors and Sensor Configuration are identical.
    IN DEMO CASE we select all 19 datasets that we uploaded from the ‘train’ folder.

  3. Once the desired Datasets are selected, click START NEW TRAINING button to configure Training Settings.
    *Note that the START NEW TRAINING button is only clickable when Datasets containing 2 or more Class Labels are selected in Multi-class classification. However, for One-class classification, the button becomes clickable as soon as the one Class Label has been selected.

B. Training settings

Step 1: Group labels

This optional step lets you group multiple Class Labels into one Class Label before training the model. You can bypass it by pressing the SKIP button.

For example, for a single-class classification project applied to anomaly detection, you may have machinery data that is labelled based on two different types of motion: vertical rotation (UPDOWN) and horizontal rotation (LEFTRIGHT). Since both of these classes are expected behavior, it is convenient to group these labels as a "Normal" group to feed into single-class classification.
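Conceptually, grouping is a relabeling of the data before training. The following sketch shows the idea with a plain dictionary; `GROUPS` and `group_labels` are illustrative names and do not describe how AutoML stores groups internally.

```python
# Both motion types are expected behavior, so they map to one group.
GROUPS = {"UPDOWN": "Normal", "LEFTRIGHT": "Normal"}


def group_labels(labels, groups=GROUPS):
    """Map each original Class Label to its group; ungrouped labels pass through."""
    return [groups.get(label, label) for label in labels]


print(group_labels(["UPDOWN", "LEFTRIGHT", "UPDOWN"]))
```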

IN DEMO CASE we will skip Group Labels step as we don’t need to.

Step 2: Model Selection & Settings

This page lets you select which model type(s) are trained. We discuss the steps for the Multi-Class Classification type ONLY, because Arm Virtual Hardware projects are supported only by Multi-Class Classification.

Please click the link for the Model Selection & Settings procedure for all other “Classification Types”.

(1) Algorithm Selection

For Multi-Class Classification, Qeexo AutoML supports the machine learning algorithms listed below.

*Selecting more than one type of algorithm is recommended, so that results can be compared.

Support for additional algorithms will be added in the future.

Supported Machine Learning Algorithms for Multi-Class Classification:

  • Ensemble Methods:

    • Gradient Boosting Machine (GBM)

    • Random Forest (RF)

    • XGBoost (XGB)

  • Neural Networks:

    • Artificial Neural Network (ANN)

    • Convolutional Neural Network (CNN)

    • Convolutional Recurrent Neural Network (CRNN)

    • Recurrent Neural Network (RNN)

  • Support Vector Machines:

    • Polynomial Support Vector Machine (POLYSVM)

    • RBF Support Vector Machine (RBFSVM)

    • Support Vector Machine (SVM)

  • Other:

    • Decision Tree (DT)

    • Gaussian Naive Bayes (GNB)

    • Logistic Regression (LR)

*Note: Neural Network models may take longer to train, due to the significant computation required for the training process.

*Additional - the CONFIGURE button

Note: many of these parameters interact with each other in unique and non-intuitive ways. Unless you have significant experience tuning deep learning models, you may want to consider using the automatic hyperparameter optimization tool.

Pressing CONFIGURE (available for some models) will yield the following configuration screen:

Quantization is an option to conduct quantization-aware training in order to reduce model size.

There are additional configurable options to fine-tune the ANN model.

Configurable options for the ANN model:

Configurable Option - Description

  • Learning rate - Scaling parameter which sets the step size at each iteration in optimization of the cost function
  • Layer 1 units - Number of nodes in layer 1
  • Layer 2 units - Number of nodes in layer 2
  • Layer 3 units - Number of nodes in layer 3
  • Epochs - Number of passes through the complete training dataset; one epoch means the network will use each training instance exactly once
  • Dropout rate - Fraction of units to drop during each training round, applied to all network layers
  • Batch size - Number of training examples in one training round; higher batch sizes may have faster runtimes, but are more likely to get stuck in local optima
  • Batch normalization - If true, apply normalization process to the output of each layer, typically helpful for improving the convergence and stability of the training process
  • Activation function - Function applied to the outputs of the neurons

Similarly, there are configurable options to fine-tune the CNN model.

Configurable options for the CNN model:

Configurable Option - Description

  • Tensor Length limit - Threshold length that determines whether to stop adding convolution layers (reducing the length limit will lead to more convolution layers)
  • Learning rate - Scaling parameter which sets the step size at each iteration in optimization of the cost function
  • Batch size - Number of training examples in one training round; higher batch sizes may have faster runtimes, but are more likely to get stuck in local optima
  • Dense layer units - Number of nodes in the final network layer
  • Dropout rates - Fraction of units to drop during each training round, applied to all network layers
  • Epochs - Number of passes through the complete training dataset; one epoch means the network will use each training instance exactly once
  • Input layer filters - Number of filters in the first convolution layer
  • Intermediate layers filters - Number of filters in all the intermediate convolution layers
  • Input layer strides - Number of samples to move at each step along one direction for the first convolution layer
  • Intermediate layers strides - Number of samples to move at each step along one direction for all intermediate convolution layers
  • Data augmentation - If true, apply data augmentation technique to prevent overfitting; will lead to higher training time due to larger amount of data
  • Batch normalization - If true, apply normalization process to the output of each layer, typically helpful for improving the convergence and stability of the training process
  • Activation function - Function applied to the outputs of the neurons
  • Input layer kernel size - Filter kernel size in the first convolution layer
  • Intermediate kernel size - Filter kernel size in all the intermediate convolution layers

Configuration sub-menus for other algorithms will be added in the future.

Select the algorithm(s) you want to train by clicking the corresponding switch. You can choose one or more algorithms.

Then click NEXT to proceed to Model Settings.

IN DEMO CASE We are going to select 4 algorithms - GBM, ANN, SVM and DT, then click NEXT to proceed to Model Settings page.

(2) Model Settings

The Model Settings page has two parts: Generate Learning Curve(s) and Hyperparameter Tuning.

- Generate Learning Curve(s)

If enabled, this option will produce learning curves for the given data set. Learning curves visualize how your model is improving as more data is added. These curves can be extrapolated, which can be useful for determining if the model may benefit from additional data collection.

As shown in the example below, the "Circle" and "Punch" gestures are still improving with additional data. It is likely that they would continue to improve if more data is collected.

*Note: If the dataset that is used for training is very small, the learning curves may not be accurate. The model may be very good at classifying the limited data it's seen, but might not generalize to new cases. In that case, even if the learning curve does not show it, it is safe to assume that final model performance will improve with additional data collection.

- Hyperparameter Tuning

Hyperparameters are a set of adjustable parameters of machine learning models. These parameters affect the accuracy, runtime, and size of machine learning models. Different models have different parameters depending on the model architecture. AutoML provides a built-in option for tuning these hyperparameters: users simply flip a switch if hyperparameter optimization is desired. If this option is enabled, AutoML tunes the hyperparameters using a collection of optimization techniques tailored to TinyML applications. It maximizes accuracy while ensuring that all resource usage stays within constraints (e.g., firmware binary size and memory usage). This option will often improve final model accuracy at the expense of additional runtime for model-building.

There are three settings that affect the duration of the hyperparameter tuning stage:

- Optimizer Time Limit:

- Optimizer Number of Trials:

- Optimizer Error Threshold:

Once you are ready, click START TRAINING to proceed to Training Process.

IN DEMO CASE For Model Settings, we will leave everything as default, simply click START TRAINING to proceed.

C. Training process

Once you click START TRAINING with one or more machine learning algorithms selected, the training process will begin.

Real-Time Training Progress pops up after training begins. The top row shows the progress of common tasks (e.g. featurization, data cropping, etc.) shared between different algorithms, followed by the build progress of each of the selected models.

At the end of the training process, Qeexo AutoML will flash, in sequence, each of the built models to the hardware device to test and measure the average latency for performing classifications.

IN DEMO CASE Note that the demo may take up to several hours to train the models. Please be patient.

D. Training result

Click TRAINING RESULT to navigate to the Models page (also reachable from the top navigation bar), where all of the previous trainings will be listed, with the most recent one on top.

The current training will be expanded to show relevant information about model performance, including ML MODEL (the type of machine learning model), CROSS VALIDATION accuracy, LATENCY, SIZE, and additional PERFORMANCE SUMMARY. It also allows you to SAVE each model to your computer, PUSH TO HARDWARE (push a selected model to Target Hardware for LIVE TEST), LIVE CLASSIFICATION ANALYSIS and DELETE the model.

Explanation and details for each item:
  • ML MODEL - Each entry is differentiated by the algorithm with which each model had been built. We also call these machine learning "packages" because they include supporting code such as sensor drivers in addition to the machine learning models built by Qeexo AutoML.

  • CROSS VALIDATION - This is the average classification accuracy for 8 different models, each trained and tested on different, mutually-exclusive subsets of the given data.
    *This is always a value in the range [0, 1], with 0 being the worst accuracy and 1 being perfect accuracy.

  • LATENCY - Latency is the average time (in milliseconds) required for the machine learning model to compute the prediction of a single instance. It includes time spent on featurization of sensor data and running inference with the model. We calculate this average empirically by first flashing each model to the Target Hardware, running 10 inferences, then taking the average.

    *Note that the concept of latency is not applicable to MLC projects.
    *The shorter the latency, the better.

  • SIZE - This is the memory size of the model parameters and the model interpreter. The model interpreter executes the model parameters in combination with the sensor readings to provide the model results. This measure gives an idea of the impact of this model to on-device memory usage in comparison to other models trained on the same data.
    Consider the following notes to understand the ML Model Size measurement:
    - None of the model size measurements include the sensor data processing and featurization code, which can add 10KB – 20KB to the size of the final library for all models except raw-data-based models (CNN, CRNN, RNN).
    - The static library output from AutoML may be larger than the size reported, due to the featurization code as well as other necessary interface utilities.
    - The binary used for flashing to the device for Live Testing from AutoML will be significantly larger, as it must also include other system libraries for the target platform.
    - The concept of memory size does not apply to MLC projects, as the decision tree is implemented in hardware.

  • PERFORMANCE SUMMARY - Press "PERFORMANCE SUMMARY " to bring up a pop-up window with additional information about each of the machine learning models.
    *Note that the amount of model details depends on whether the project is for single-class or multi-class classification.

    • Multi-class classification

      • UMAP and PCA Plots: we are showing the dimensionality reduction UMAP and PCA plots as visual indications of how the training datasets are "clustered" in the given model.

      • Confusion Matrix: it represents True Labels and Predicted Labels. Diagonal (upper left to lower right) elements indicate instances correctly classified. Off-diagonal elements indicate instances misclassified. Summing the instances over each row should give the total number of instances for the respective class. For Multi-Class Anomaly Classification, there will be an extra unknown class label.

      • Cross Validation: By-fold Accuracies vs Classes: it represents the spread of classification accuracies across the CV folds. This representation is done by-class. If the by-fold points are all shown close to the mean line, this shows that the average by-class accuracy is a precise measurement of how well the model should perform for the given class. More variance in the by-fold points suggests that the model may perform much better or much worse than expected.

      • Learning Curve: it illustrates the performance for each class at different numbers of instances of data collected/uploaded. Each point on the Learning Curve is the cross-validation accuracy at the respective data size. This gives an understanding of whether adding more data will help to improve the classification performance for each class and whether similar performance can be achieved with fewer instances of data.

      • ROC Curve: ROC curves plot the False Positive Rate (FPR, x-axis) vs. True Positive Rate (TPR, y-axis) for each class in the classification problem. The dotted line indicates flip-of-a-coin performance, where the model has no discriminative capacity to distinguish between 2 classes. The greater the area under the curve (AUC), the better the model.

      • Matthews Correlation Coefficient (MCC): it is a measure of discriminative power for binary classifiers. In the multi-class classification case, it can help show you which combinations of classes are the least well understood by your model.
        * The values can range between -1 and 1, although most often in AutoML the values will be between 0 and 1. A value of 0 means that your model is not able to distinguish between the given pair of classes at all, and a value of 1 means that your model can perfectly make this distinction. For Multi-Class Anomaly Classification, there will be an extra unknown class label.
        There will be one MCC value for every pair of class labels in the datasets (order does not matter). For example, there will be 3 coefficients for 3 class labels, and 6 coefficients for 4 class labels.

      • F1-Score: The F1-score factors in false positives, false negatives, and true positives, which makes it an important model performance metric. Accuracy alone can obscure important aspects of model performance if a large proportion of a dataset belongs to one class. In contrast, the F1-score is more tolerant of this type of class imbalance. Operating the model at the peak of the F1-score means the rates of True Positives and False Positives are optimized. To either side of this point, either True Positives or False Positives dominate.
        *The F1-score lies in the interval [0, 1]; the best score is 1, and it approaches 0 as performance gets worse.

    • Single-class classification

      • Confusion Matrix: For the single-class classification case, we only have data from the one given class. A perfect confusion matrix for single-class models has all of the cases concentrated in the top-left corner, meaning that none of the given class data was classified as not coming from that class.

      • Cross Validation: By-fold Accuracies vs Classes: Similar to the confusion matrix case, the most important information in the single-class classification by-fold results is the left-most case. This shows how varied the single-class accuracies were for each fold of the cross validation.

      • Matthews Correlation Coefficient: For single-class classification, there is only one Matthews Correlation Coefficient, which measures the quality of the classification between the given class and things that do not belong to the given class.
        *The values can range between -1 and 1, although most often in AutoML the values will be between 0 and 1. A value of 0 means that your model is not able to recognize the given class at all, and a value of 1 means that your model can perfectly make this distinction.

  • SAVE
    For non-MLC projects, there are 2 options

    • "Save .bin" - download the model as a binary image to your machine.

    • "Save .zip" - download a compressed archive of header file and static library whereby users can build custom application on top of the machine learning model.

    For MLC projects, users can ONLY save the MLC configuration of the model as json file.

  • PUSH TO HARDWARE - Flashes the model to the Target Hardware.
    *Target Hardware must be connected.

  • LIVE TEST - Once the model has been pushed to the hardware with PUSH TO HARDWARE, "LIVE TEST" becomes clickable and will take you to the Live Testing page.

  • DELETE - When a model is no longer required, you can delete it. A confirmation dialog box will be presented.


    • Sensitivity Analysis
      *Note that there is no sensitivity analysis for single-class projects.
      For multi-class classification, the Sensitivity Analysis tool allows you to trade off accuracy between classes, depending on your specific use-case. You can re-weight the classes in your model and see how the cross-validation accuracies and confusion matrix are affected.

      The selected sensitivities are normalized and are used to scale the model output probabilities. Higher values for a given class will make the model more likely to ultimately make a classification of that type.

      The easiest way to understand how the Sensitivity Analysis page works is to train a multi-class model and then try a few different values. The accuracy plots and confusion matrix will update in real-time along with your changes to the sensitivities. Notice how the plots change when the sensitivity for the "Punch" class is increased from 1 to 100:

      Once you find sensitivity values that seem best for your use-case, press "SAVE" on the new sensitivity values. This will generate a new binary with your selected values. Click "Select" on the newly-compiled binary, and this updated binary will be the one that is flashed to your device when you go to test live classification.
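The re-weighting described above can be sketched in a few lines. The example normalizes the sensitivities, scales a set of class probabilities by them, and renormalizes the result so that it sums to 1. The renormalization step, the probability values, and the function name `apply_sensitivities` are assumptions for illustration; the article does not specify AutoML's exact computation.

```python
def apply_sensitivities(probabilities, sensitivities):
    """Scale class probabilities by normalized sensitivities, then renormalize."""
    total = sum(sensitivities.values())
    weights = {cls: s / total for cls, s in sensitivities.items()}
    scaled = {cls: p * weights[cls] for cls, p in probabilities.items()}
    norm = sum(scaled.values())
    return {cls: value / norm for cls, value in scaled.items()}


# Hypothetical model output for one instance.
probs = {"Circle": 0.55, "Punch": 0.45}

equal = apply_sensitivities(probs, {"Circle": 1, "Punch": 1})
boosted = apply_sensitivities(probs, {"Circle": 1, "Punch": 100})

print(max(equal, key=equal.get))      # Circle
print(max(boosted, key=boosted.get))  # Punch
```

With equal sensitivities the prediction is unchanged; raising the "Punch" sensitivity to 100 tips a borderline instance toward "Punch".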

    • Live-Data Collection and Analysis
      Live-data collection allows users to collect live data for a specific duration, class by class. The subsequent analysis section then shows a confusion matrix, ROC curves, Matthews correlation coefficient, and F1 score. Moreover, AutoML estimates the distribution of prediction scores using kernel density estimation (KDE).

      KDE plots provide detailed insights into error analysis. The example below is a KDE plot for a gesture recognition problem. All instances are collected as "Gesture2". Ideally, the scores for the other classes should be distributed around zero. However, the mode of the "Gesture1" distribution (the blue line) is 0.3, and its tail extends beyond 0.5. These signify potential issues with the live data or the model used for the analysis. We want to see the distributions of "Gesture1" and "Gesture2" as separate as possible, with little to no overlap between the two classes. By contrast, the "Stationary" class is well separated from "Gesture1" and "Gesture2": its distribution peaks around zero and is much narrower than theirs, which isolates it well.
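For readers unfamiliar with KDE, the sketch below estimates a density from a handful of prediction scores using a Gaussian kernel and then locates its mode on a grid. The scores are invented, the bandwidth of 0.05 is an arbitrary choice, and `gaussian_kde` here is a hand-written helper rather than AutoML's implementation.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical prediction scores for two classes on live data.
scores = {
    "Stationary": [0.01, 0.02, 0.00, 0.03, 0.01],
    "Gesture1": [0.25, 0.30, 0.35, 0.55, 0.30],
}

grid = [i / 100 for i in range(101)]  # score axis from 0.00 to 1.00
modes = {}
for name, values in scores.items():
    kde = gaussian_kde(values, bandwidth=0.05)
    modes[name] = max(grid, key=kde)  # score at which the density peaks

print(modes)
```

A class whose mode sits near zero with a narrow spread is well isolated; a mode well above zero, as for the made-up "Gesture1" scores, is the kind of pattern the paragraph above flags as a potential issue.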

E. Test model performance on Test Data

You can test all your ML models' performance by using the uploaded test data (if you have one).

Click the button under EDIT TEST DATA, and a Model Information window will pop up.

Select the test data that you need, then click SAVE.

*Note: you can click the button under Training Data to select labels for each Test Data as shown in the screenshot below.

The Test data Evaluation then starts.

Once the evaluation is completed, you can find the result of Model performance on Test Data by clicking the buttons under PERFORMANCE SUMMARY of each model.
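To relate these results to raw predictions, the sketch below computes a confusion matrix and per-class F1-scores from a short list of true and predicted labels, and counts the class pairs for which a pairwise MCC value would be reported. The labels and predictions are invented for illustration, and the helper functions are not part of AutoML.

```python
from itertools import combinations


def confusion_matrix(true, pred, classes):
    """Rows are true labels, columns are predicted labels."""
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true, pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def f1_per_class(matrix, classes):
    """F1 = 2*TP / (2*TP + FP + FN), computed one class against the rest."""
    scores = {}
    for i, cls in enumerate(classes):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(row[i] for row in matrix) - tp
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


classes = ["Yes", "No", "Silence"]
true = ["Yes", "Yes", "Yes", "No", "No", "No", "Silence", "Silence"]
pred = ["Yes", "Yes", "No", "No", "No", "Yes", "Silence", "Silence"]

cm = confusion_matrix(true, pred, classes)
f1 = f1_per_class(cm, classes)
pair_count = len(list(combinations(classes, 2)))

print(cm)          # [[2, 1, 0], [1, 2, 0], [0, 0, 2]]
print(f1)
print(pair_count)  # 3
```

Each row of the matrix sums to the number of test instances for that class, and the off-diagonal entries show which classes are confused with each other.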

F. Live Replay

From the Test Result page, select a model of interest and click the LIVE REPLAY button to test it live. This takes you to the Live Testing page.

On the Live Testing page, you can record up to 5 seconds of audio by clicking START. You don’t need to wait for the full 5 seconds to elapse; click STOP whenever you have finished recording.

Then click ANALYZE to analyze the data you just recorded.

IN DEMO CASE Click START, say the word "Yes", then click STOP -> ANALYZE to proceed.

You can check the recorded data on the Data → Test Data page.