This is a summary of some of the information presented in "Deep Learning in Shallow Water: 3D-FLS CNN-based target detection" by Heath Henley, Austin Berard, Evan Lapisky, and Matthew Zimmerman, a paper presented at the OCEANS 2018 conference in Charleston, SC.

Introduction

Convolutional neural network (CNN) models have grown in popularity in recent years by demonstrating impressive performance on complex problems in many fields. Given the added performance and flexibility offered by these models, we had to ask: “Can we incorporate a CNN-based model into our current processing algorithm?”

*Figure 1: Raw sonar data and the resulting detections near a pier in Rhode Island.*

For more than a decade, FarSounder has developed algorithms based on traditional image and signal processing to perform the real-time mapping from raw sonar data (see Figure 1) to ‘detections’ of both the seafloor and any potential navigational hazards. Recently, detection and classification algorithms based on traditional image processing techniques have been outperformed and replaced by machine learning algorithms.

Models built on CNNs, and neural network models in general, have recently demonstrated excellent performance in classification, segmentation, and object recognition applications, and are becoming the state of the art in many fields. Recently, they have been successfully applied to 3D medical imaging data. For example, Çiçek et al. (2016) presented a 3D version of U-Net capable of segmenting volumetric confocal microscopy data. Further, Milletari et al. (2016) presented the V-Net architecture, which is able to accurately segment 3D MRI images with minimal training data. Inspired by the success of these models, we decided to look into applying them to sonar data generated by our 3D Forward Looking Sonar (3D-FLS) system.

Model Development

Design Constraints

In most cases, FarSounder systems ship with a processing computer as part of the package. Ideally, any major processing change should be performant enough to run on the majority of the production systems currently in the field. In practice, this means any new model running on a production system must keep pace with the hardware’s 1.6 second ping rate and fit in the memory of the NVIDIA GTX 1050 GPU that currently ships in production systems.
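The timing constraint above can be checked with a simple benchmark. The following is a minimal sketch, not FarSounder's actual test harness: `fits_ping_budget` and `dummy_predict` are hypothetical names, and the dummy function stands in for a real model's forward pass. Only the 1.6 second figure comes from the text.

```python
import time

PING_PERIOD_S = 1.6  # ping period stated in the text


def fits_ping_budget(predict, volume, n_trials=5, budget_s=PING_PERIOD_S):
    """Time n_trials calls of predict(volume).

    Returns (ok, worst), where worst is the slowest observed latency in
    seconds and ok is True when that latency is below budget_s.
    """
    worst = 0.0
    for _ in range(n_trials):
        start = time.perf_counter()
        predict(volume)
        worst = max(worst, time.perf_counter() - start)
    return worst < budget_s, worst


def dummy_predict(volume):
    # Hypothetical stand-in for a real model's forward pass.
    return [value > 0.5 for value in volume]


ok, worst = fits_ping_budget(dummy_predict, [0.1, 0.7, 0.9])
print(f"within budget: {ok}, worst latency: {worst:.6f} s")
```

Using the worst case rather than the mean is a conservative choice for a real-time system, since a single slow prediction would cause the processing to fall behind the sonar.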

Model Selection

A huge number of possible CNN architectures have been published in the literature, especially with the availability of high-level libraries like TensorFlow and Keras. After reviewing the published results of some of the models that have been applied to volumetric data, two were chosen to evaluate as candidate models for this task (3D U-Net and V-Net). However, based on the design constraints highlighted above and the size of the raw sonar input data, the models were modified from their published form to fit within GPU memory on a production computer. The problem was formulated as a segmentation problem; that is, every voxel in the input volume was classified individually as either background, seafloor, or in-water target. Two-dimensional versions of the architectures, which take horizontal or vertical slices through the 3D data as input, were also considered.
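To make the segmentation formulation concrete, here is a minimal NumPy sketch of per-voxel classification and of slicing a volume for a 2D model. The volume shape, the axis order (range, horizontal angle, vertical angle), and the random scores are assumptions for illustration only; they are not the real 3D-FLS data layout or a real model's output.

```python
import numpy as np

CLASSES = ("background", "seafloor", "in-water target")


def voxel_labels(scores):
    """Turn per-voxel class scores (..., n_classes) into integer labels."""
    return np.argmax(scores, axis=-1)


rng = np.random.default_rng(0)

# Hypothetical small volume; the axis order is an assumption.
volume = rng.random((8, 6, 4))

# Stand-in for a segmentation model's output: one score per class per voxel.
scores = rng.random(volume.shape + (len(CLASSES),))
labels = voxel_labels(scores)  # same shape as volume, values in {0, 1, 2}

# Inputs for a 2D model: slices through the volume along each angular axis.
horizontal_slices = [volume[:, :, k] for k in range(volume.shape[2])]
vertical_slices = [volume[:, j, :] for j in range(volume.shape[1])]

print(labels.shape, horizontal_slices[0].shape, vertical_slices[0].shape)
```

A 2D model sees one slice at a time, which reduces memory use but discards context from neighboring slices; a 3D model keeps that context at a higher memory cost.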

Baseline

Training a CNN-based model requires labeled training data. Manually labeling volumetric sonar data, while possible, is time consuming, and generating enough training data this way is likely cost prohibitive. To avoid this issue, we looked for methods that could automatically generate labeled data for training our CNN-based models. The first and simplest approach: use the current processing algorithm to generate the labeled data. This labeled data can then be split into separate sets for training and testing CNN-based model candidates. Of course, a model trained only on this data cannot be expected to outperform the current processing. However, this approach allowed us to evaluate how well the CNN models can fit the 3D-FLS data. Further, the key performance metrics (prediction speed, accuracy, precision/recall, memory footprint) could be compared for each model under consideration.
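The baseline approach described above can be sketched as follows. This is an illustration under stated assumptions: `baseline_process` is a hypothetical threshold-based stand-in for the existing processing algorithm, pings are reduced to short lists of numbers, and the 80/20 split is an arbitrary choice rather than the split used in the paper.

```python
import random


def make_baseline_dataset(pings, baseline_process):
    """Pair each raw ping with the labels produced by the existing processing."""
    return [(ping, baseline_process(ping)) for ping in pings]


def split_dataset(pairs, test_fraction=0.2, seed=0):
    """Shuffle reproducibly, then split into (train, test) lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def baseline_process(ping):
    # Hypothetical stand-in: label a sample 1 if it exceeds a threshold, else 0.
    return [1 if sample > 0.5 else 0 for sample in ping]


# Ten tiny fake "pings" of three samples each.
pings = [[(i * 7 + j * 3) % 10 / 10 for j in range(3)] for i in range(10)]
dataset = make_baseline_dataset(pings, baseline_process)
train, test = split_dataset(dataset)
print(len(train), len(test))
```

Because the labels come from the existing algorithm, any systematic error in that algorithm is also present in the training targets, which is why a model trained this way can at best match the baseline.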

Current Results

The four architectures introduced above, along with a number of parameter adjustments, optimizers, and loss functions, were trained and tested using the baseline dataset. Based on the results, the top performing model was based on the 3D U-Net architecture. There is a strong class imbalance in the 3D-FLS data considered here: an average ping contains significantly more background than seafloor or in-water targets. A categorical cross-entropy loss was applied using a few different methods of compensating for the class imbalance (e.g., computing appropriate class weights); however, a Dice coefficient based loss function performed significantly better. In the end, the final model chosen uses the 3D U-Net architecture and was trained using the Dice coefficient based loss function. The animated figures below (Figures 2 and 3) show the 3D U-Net CNN predictions side-by-side with predictions made using the baseline model for a number of pings in real time (note that the animation speed is increased to 5x normal speed).
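To illustrate why a Dice-based loss copes with class imbalance, here is a minimal NumPy sketch of one common soft Dice formulation, averaged over classes. The exact loss used in the paper is not given in this summary, so the formulation, the `eps` smoothing term, and the class counts below are assumptions for illustration.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Integer labels (n_voxels,) -> one-hot array (n_voxels, n_classes)."""
    return np.eye(n_classes)[labels]


def soft_dice_loss(probs, target, eps=1e-6):
    """1 - mean per-class Dice coefficient.

    probs, target: arrays of shape (n_voxels, n_classes).
    """
    intersection = np.sum(probs * target, axis=0)
    denominator = np.sum(probs, axis=0) + np.sum(target, axis=0)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return float(1.0 - dice.mean())


# Hypothetical imbalanced ping: 980 background, 15 seafloor, 5 in-water target.
truth = np.array([0] * 980 + [1] * 15 + [2] * 5)
target = one_hot(truth, 3)

# A degenerate prediction that labels every voxel as background.
all_background = np.zeros_like(truth)

accuracy = float(np.mean(all_background == truth))
loss_all_background = soft_dice_loss(one_hot(all_background, 3), target)
loss_perfect = soft_dice_loss(target, target)

print(f"accuracy={accuracy:.2f}, dice loss={loss_all_background:.2f}")
```

In this toy case, predicting background everywhere is 98% accurate, yet the Dice loss is roughly 0.67 because the two rare classes each contribute a Dice score near zero. Each class carries equal weight in the average regardless of how many voxels it occupies.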

*Figure 2: Approaching a pier in Narragansett Bay, Rhode Island.*

*Figure 3: Navigating a river in Germany.*

In the animations above, the CNN model reproduces the baseline data quite well, especially the detections of the in-water target class. However, the current version of the CNN model does not detect the seafloor as well, which suggests some modifications may be required to produce a model capable of performing as well as, and eventually better than, the current baseline.
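A per-class difference like the one described above can be quantified with per-class precision and recall, two of the metrics named earlier. The following is a minimal sketch with made-up labels, assuming classes are encoded as 0 (background), 1 (seafloor), and 2 (in-water target); it does not reproduce results from the paper.

```python
import numpy as np


def per_class_precision_recall(pred, truth, n_classes=3):
    """Return {class_index: (precision, recall)} for integer label arrays."""
    results = {}
    for c in range(n_classes):
        tp = int(np.sum((pred == c) & (truth == c)))
        fp = int(np.sum((pred == c) & (truth != c)))
        fn = int(np.sum((pred != c) & (truth == c)))
        # NaN marks a class that was never predicted (or never present).
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        results[c] = (precision, recall)
    return results


# Made-up example: two seafloor voxels (class 1) are missed as background.
truth = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
pred = np.array([0, 0, 0, 0, 1, 0, 0, 2, 2, 2])

metrics = per_class_precision_recall(pred, truth)
for c, (precision, recall) in metrics.items():
    print(f"class {c}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy example the seafloor class has perfect precision but low recall, which is the pattern one would expect when a model misses part of the seafloor while detecting in-water targets reliably.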

We are excited that the model, while subject to our design constraints, is able to nearly reproduce the data generated using the baseline processing and generalize to data not included in the training or validation set (Figure 3).

Improving Training Data

Reproducing the baseline processing is a great proof of concept for the CNN model. However, to improve performance beyond the baseline model, improved training data is required. Work has already begun on this topic (see the conference paper), using NOAA MBES survey data to generate training data for the CNN model. We are also working on additional innovative ways to incorporate data from other sources into the model.

At FarSounder, we utilize, build on, and develop the latest technologies to continuously improve our software and hardware products.
