Abstract

We present two synthetic datasets for classifying Morse code symbols, intended for supervised machine learning and neural networks in particular. The linked GitHub page has algorithms for generating a family of such datasets of varying difficulty. The datasets are spatially one-dimensional and have a small number of input features, so each input carries a high density of information. This makes them particularly challenging when implementing network complexity reduction methods. The linked research paper explores how network performance is affected by deliberately adding various forms of noise and by expanding the feature set and dataset size.

Instructions:

First, unzip the given file 'morse_datasets.zip' to get two datasets: 'baseline.npz' and 'difficult.npz'. These are two of a family of synthetic datasets that can be generated using the given script 'generate_morse_dataset.py'. For instructions on using the script, see its docstring and/or the linked GitHub page.

To load data from a dataset, first download 'load_data.txt' and change its extension to '.py'.

Then call the function 'load_data' with the argument 'filename' set to the path of the given dataset, for example './baseline.npz'.

This will output 6 variables - xtr, ytr, xva, yva, xte, yte. These are the data (x) and labels (y) for the training (tr), validation (va) and test (te) splits. The y data is in one-hot format.
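
Before training, it can help to check what a dataset file contains and to convert the one-hot labels to integer class indices, which some libraries expect. The following is a minimal sketch using only NumPy. The helper names 'summarize_npz' and 'one_hot_to_indices' are hypothetical and are not part of 'load_data.py'. The names of the arrays stored inside the '.npz' files are not stated here, so the first helper simply lists whatever it finds.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive.

    Hypothetical helper: it makes no assumption about the array names.
    """
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


def one_hot_to_indices(y_onehot):
    """Convert one-hot label rows to integer class indices."""
    return np.argmax(y_onehot, axis=1)
```

For example, summarize_npz('./baseline.npz') lists the stored arrays and their shapes, and one_hot_to_indices(ytr) gives one integer label per training sample.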

Then you can run your favorite machine learning / classification algorithm on the data.
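
As a quick sanity check before trying a neural network, a very simple baseline can confirm that the data and labels line up. The sketch below is a nearest-centroid classifier written with NumPy only. It is an illustration chosen for brevity, not the method used in the linked paper, and the function names are hypothetical. It assumes x is a 2-D array (samples by features) and y is one-hot, as described above.

```python
import numpy as np


def fit_centroids(x, y_onehot):
    """Return one mean feature vector per class (rows follow the one-hot column order)."""
    counts = y_onehot.sum(axis=0)            # number of samples in each class
    sums = y_onehot.T @ x                    # per-class sums of feature vectors
    # Classes with no samples get an all-zero centroid instead of a division by zero.
    return sums / np.maximum(counts, 1)[:, None]


def predict(centroids, x):
    """Assign each sample to the class whose centroid is nearest (Euclidean distance)."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2; the ||x||^2 term does not affect the argmin.
    scores = -2.0 * (x @ centroids.T) + (centroids ** 2).sum(axis=1)
    return scores.argmin(axis=1)


def accuracy(centroids, x, y_onehot):
    """Fraction of samples whose predicted class matches the one-hot label."""
    return float((predict(centroids, x) == y_onehot.argmax(axis=1)).mean())
```

With the variables from 'load_data', this would be used as centroids = fit_centroids(xtr, ytr) followed by accuracy(centroids, xva, yva). A score well above chance suggests the data loaded correctly; a neural network should be expected to do considerably better.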

Dataset Files

morse_datasets.zip (62.71 MB): two sample datasets from the family

generate_morse_dataset.py (7.14 kB): script to generate a family of datasets of varying difficulty
