We present two synthetic datasets for classifying Morse code symbols, intended for supervised machine learning problems and neural networks in particular. The linked GitHub page provides algorithms for generating a family of such datasets of varying difficulty. The datasets are spatially one-dimensional and have a small number of input features, so the input information content is densely packed. This makes them particularly challenging for methods that reduce network complexity.

Instructions: 

First, unzip the given file 'morse_datasets.zip' to get two datasets: 'baseline.npz' and 'difficult.npz'. These are two of a family of synthetic datasets that can be generated using the given script 'generate_morse_dataset.py'. For instructions on using the script, see its docstring and/or the linked GitHub page.

To load data from a dataset, first download 'load_data.txt' and change its extension to '.py'.

Then call 'load_data' with the argument 'filename' set to the path of the chosen dataset, for example './baseline.npz'.

This will output six variables: xtr, ytr, xva, yva, xte, yte. These are the data (x) and labels (y) for the training (tr), validation (va) and test (te) splits. The labels (y) are one-hot encoded.
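As a minimal sketch of working with the six arrays, the snippet below checks that each split has as many labels as inputs and converts one-hot labels to integer class indices. The helper names 'one_hot_to_indices' and 'check_splits' are hypothetical and not part of the provided files, and small random arrays with made-up shapes stand in for the real data so the sketch runs on its own. The commented-out call shows how the real data would be loaded, assuming 'load_data.txt' has been renamed to 'load_data.py'.

```python
import numpy as np


def one_hot_to_indices(y):
    """Convert an (n_samples, n_classes) one-hot array to integer class indices."""
    y = np.asarray(y)
    if y.ndim != 2:
        raise ValueError("expected a 2-D one-hot array")
    return np.argmax(y, axis=1)


def check_splits(splits):
    """Return {split_name: n_samples} after checking each x/y pair has equal length."""
    sizes = {}
    for name, (x, y) in splits.items():
        if len(x) != len(y):
            raise ValueError(f"{name}: {len(x)} inputs but {len(y)} labels")
        sizes[name] = len(x)
    return sizes


# With the real files (assumes load_data.txt was renamed to load_data.py):
# from load_data import load_data
# xtr, ytr, xva, yva, xte, yte = load_data(filename='./baseline.npz')

# Stand-in data so the sketch is self-contained; these shapes are made up
# and do not reflect the actual Morse datasets.
rng = np.random.default_rng(0)
n_features, n_classes = 8, 4
xtr = rng.random((10, n_features))
ytr = np.eye(n_classes)[rng.integers(0, n_classes, size=10)]
xva = rng.random((3, n_features))
yva = np.eye(n_classes)[rng.integers(0, n_classes, size=3)]
xte = rng.random((3, n_features))
yte = np.eye(n_classes)[rng.integers(0, n_classes, size=3)]

sizes = check_splits({"tr": (xtr, ytr), "va": (xva, yva), "te": (xte, yte)})
ytr_idx = one_hot_to_indices(ytr)
print(sizes)
print(ytr_idx)
```

Integer class indices are convenient for libraries that expect them instead of one-hot labels.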

Then you can run your favorite machine learning / classification algorithm on the data.
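For illustration only, here is one possible baseline: a softmax regression classifier trained by plain gradient descent in NumPy. This is a generic technique chosen for brevity, not the authors' method. The function names and hyperparameters are assumptions, and the toy three-class data stands in for xtr and ytr.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_softmax_regression(x, y_onehot, lr=0.1, epochs=500):
    """Fit weights and biases by gradient descent on the mean cross-entropy loss."""
    n, d = x.shape
    k = y_onehot.shape[1]
    w = np.zeros((d, k))
    b = np.zeros(k)
    for _ in range(epochs):
        p = softmax(x @ w + b)
        grad = (p - y_onehot) / n  # gradient of the loss w.r.t. the logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def accuracy(x, y_onehot, w, b):
    """Fraction of samples whose predicted class matches the one-hot label."""
    pred = np.argmax(x @ w + b, axis=1)
    return float(np.mean(pred == np.argmax(y_onehot, axis=1)))


# Toy stand-in for (xtr, ytr): three well-separated 2-D clusters.
rng = np.random.default_rng(0)
centers = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 6.0]])
labels = np.repeat(np.arange(3), 30)
x_toy = centers[labels] + 0.5 * rng.standard_normal((90, 2))
y_toy = np.eye(3)[labels]

w, b = train_softmax_regression(x_toy, y_toy)
acc = accuracy(x_toy, y_toy, w, b)
print(f"toy training accuracy: {acc:.2f}")
```

With the real data, the same calls would take xtr and ytr for training and xva, yva or xte, yte for evaluation.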
