Name: VTUAD: Vessel Type Underwater Acoustic Data
Creator: Lucas Domingos
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Artificial Intelligence, Machine Learning

Abstract

A curated dataset of underwater acoustic signals categorized into five classes based on vessel type: Cargo, Tanker, Tug, Passengership, and Background. Subsets of the original data were generated according to the distance between the vessel and the hydrophone that recorded its sound. These subsets, or scenarios, are defined by an inclusion radius and an exclusion radius: 2 km and 4 km for the first scenario, 3 km and 5 km for the second, and 4 km and 6 km for the third. Environmental information obtained from the CTD recorder is also available as five signals: temperature, in degrees Celsius; conductivity, in siemens per meter; pressure, in decibars; salinity, in psu; and sound speed, in meters per second. For each instance, the average of these measurements was used.
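
As a rough illustration of how the scenarios relate to distance, the sketch below takes the radii from the archive names listed under Dataset Files and follows the creator's reply in the comments, which describes each scenario as a band of distances from the hydrophone. The function name and the inclusive handling of the band edges are assumptions; this page does not say how boundary distances are treated.

```python
# Radii in metres, read off the archive names (inclusion_<m>_exclusion_<m>).
SCENARIOS = {
    "inclusion_2000_exclusion_4000": (2000, 4000),
    "inclusion_3000_exclusion_5000": (3000, 5000),
    "inclusion_4000_exclusion_6000": (4000, 6000),
}


def scenarios_for_distance(distance_m):
    """Return every scenario whose distance band contains distance_m.

    Assumption: both ends of each band are inclusive. The bands overlap,
    so a single distance can belong to more than one scenario.
    """
    return [
        name
        for name, (inner, outer) in SCENARIOS.items()
        if inner <= distance_m <= outer
    ]
```

Because the bands overlap, a vessel 3.5 km from the hydrophone would fall in both the 2-4 km and the 3-5 km scenarios under this reading.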

Instructions

The files are divided into folders according to their scenarios. Each scenario folder has a metadata file containing the needed information. The README.md file provides complete information about the data and how to use it.
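
To check an extracted scenario folder against its metadata file, a minimal sketch along these lines may help. It assumes the metadata file is named metadata.csv and sits at the top of the scenario folder next to the class subfolders, as reported in the comments; the function name is hypothetical, and README.md remains the reference for the actual layout.

```python
import csv
from pathlib import Path


def summarize_scenario(root):
    """Count files in each subfolder of root and data rows in metadata.csv.

    Returns (files_per_folder, n_metadata_rows); n_metadata_rows is None
    when no metadata.csv is found directly under root.
    """
    root = Path(root)
    files_per_folder = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count every regular file below the subfolder, whatever its extension.
        files_per_folder[sub.name] = sum(1 for f in sub.rglob("*") if f.is_file())

    n_rows = None
    metadata = root / "metadata.csv"  # assumed file name and location
    if metadata.is_file():
        with metadata.open(newline="") as fh:
            n_rows = sum(1 for _ in csv.DictReader(fh))
    return files_per_folder, n_rows
```

Calling `summarize_scenario("inclusion_2000_exclusion_4000")` on an extracted archive returns the per-folder file counts and the number of metadata rows, which makes a mismatch between the two easy to spot.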

Comments

Has anyone successfully unzipped the 3-5 km and 4-6 km files?

Submitted by Johnny Chen on Sat, 02/18/2023 - 15:37

I managed to unzip all the packages; only a few files showed up as damaged.

Submitted by Anqi Jin on Wed, 04/26/2023 - 23:31

Thanks for posting this data! The 1 sec time snippet is just long enough to sample a few cycles of engine and propeller rotation. Sonar operators focus a lot of attention on the engine and propeller cues using much longer time series -- perhaps 60 second time series, usually at lower sampling rate. The shorter time series data for machine learning has led me to look at higher-frequency classification cues differently with interesting results.

Submitted by Ronald Kessel on Fri, 09/08/2023 - 09:15

Can someone please share the dataset folder structure? Just want to verify if something is missing at my end.

Submitted by Momin Ali on Tue, 01/02/2024 - 08:06

I am having the following issues:
1. Unlike the folder structure given in README.md, each zip file contains the following when unzipped:
- metadata.csv
- cargo folder
- passenger folder
- tanker folder
- tug folder

2. The data distribution in metadata.csv does not match the data in the folders. The path given in metadata.csv refers to a vessel folder, which is not present in the zip files. Even if the vessel folder is meant to be the combination of the cargo, tanker, tug, and passenger folders, the metadata still does not match.

3. Lastly, I am trying to replicate the pipeline and results for the following paper:
An Investigation of Preprocessing Filters and Deep Learning Methods for Vessel Type Classification With Underwater Acoustic Data

using the following repository:
https://github.com/lucascesarfd/underwater_snd

however, because of the inconsistent data I am unable to replicate the results.

Any help would be greatly appreciated.

Submitted by Momin Ali on Wed, 01/10/2024 - 08:00

Hi, Momin Ali

I updated the README.md file to reflect the current data structure. Also, for inclusion_3000_exclusion_5000 and inclusion_4000_exclusion_6000, I'm updating the zip files to remove corrupted audio files.
You can ignore the "path" column (I'm currently removing it from the metadata). The name of each audio file is related to the "file_index" column.
The repository is also being refactored to improve the documentation and fix bugs in the dataset handling.

Submitted by Lucas Domingos on Mon, 04/01/2024 - 14:00

I would like to ask if inclusion_3000_exclusion_5000 means the ship is 3 km away from the hydrophone, and inclusion_4000_exclusion_6000 means the ship is 4 km away from the hydrophone.

Submitted by Anqi Jin on Fri, 06/28/2024 - 05:13

Hi Anqi Jin,

Yes, inclusion_3000_exclusion_5000 means that the ship is anywhere between 3 km and 5 km from the hydrophone.

Likewise, inclusion_4000_exclusion_6000 means between 4 km and 6 km, and inclusion_2000_exclusion_4000 means between 2 km and 4 km.

For more information you can look into the paper: "An investigation of preprocessing filters and deep learning methods for vessel type classification with underwater acoustic data"

Submitted by Lucas Domingos on Tue, 07/02/2024 - 00:29

Dataset Files

inclusion_2000_exclusion_4000.zip (3.31 GB)
inclusion_2000_exclusion_4000 SHA256 checksum: inclusion_2000_exclusion_4000_sha256.txt (64 bytes)
inclusion_3000_exclusion_5000.zip (5.83 GB)
inclusion_3000_exclusion_5000 SHA256 checksum: inclusion_3000_exclusion_5000_sha256.txt (64 bytes)
inclusion_4000_exclusion_6000.zip (4.40 GB)
inclusion_4000_exclusion_6000 SHA256 checksum: inclusion_4000_exclusion_6000_sha256.txt (64 bytes)
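
Since several comments mention damaged files after extraction, it may be worth verifying a download before unzipping it. The sketch below uses only the Python standard library. It assumes each checksum file holds the hex SHA-256 digest of the matching archive as its first token (consistent with the 64-byte size listed above, though the exact file format is not documented here); the function names are hypothetical.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(archive_path, checksum_path):
    """Compare an archive's digest with the one stored in a checksum file.

    Assumption: the first whitespace-separated token of the checksum
    file is the expected hex digest.
    """
    expected = Path(checksum_path).read_text().split()[0].lower()
    return sha256_of_file(archive_path) == expected


def first_damaged_member(archive_path):
    """Return the name of the first zip member that fails its CRC check, or None."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.testzip()
```

A download that passes `matches_checksum` but still reports a member from `first_damaged_member` would suggest the archive was already damaged when it was uploaded, rather than corrupted in transit.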

Documentation

Attachment	Size
README.md	3.59 KB
