Abstract

Each document is stored in text format as a 128-dimensional vector, with each dimension represented as a single-precision floating-point number.

Instructions:

The training dataset was randomly generated for accelerated machine learning algorithms in which the computing-intensive tasks are offloaded to FPGA accelerators. Each document is stored in text format as a 128-dimensional vector, with each dimension represented as a single-precision floating-point number, so the dataset can easily be scaled to hundreds of GB or more. Cosine distance is used to measure vector similarity.
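
The description above can be illustrated with a minimal Python sketch that generates a random document vector, writes it as text, parses it back, and computes the cosine distance between two documents. Several details are assumptions and are not confirmed by the dataset description: the space-separated one-document-per-line layout, the uniform range [-1, 1] for the random values, the definition of cosine distance as 1 minus cosine similarity, and all function names, which are hypothetical.

```python
import math
import random
import struct

DIM = 128  # dimensions per document vector, as stated in the description


def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def random_vector_line(rng):
    """Generate one document as a text line.

    Assumption: values are space-separated and drawn uniformly from [-1, 1];
    the real files may use a different delimiter or distribution.
    """
    return " ".join(repr(to_float32(rng.uniform(-1.0, 1.0))) for _ in range(DIM))


def parse_vector_line(line):
    """Parse one text line into a list of DIM floats."""
    values = [float(tok) for tok in line.split()]
    if len(values) != DIM:
        raise ValueError(f"expected {DIM} values, got {len(values)}")
    return values


def cosine_distance(a, b):
    """Cosine distance, taken here (as an assumption) to be 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
doc_a = parse_vector_line(random_vector_line(rng))
doc_b = parse_vector_line(random_vector_line(rng))
print(cosine_distance(doc_a, doc_b))
```

Because each document is an independent line of text, a larger dataset can be produced simply by writing more lines, which matches the stated goal of growing the data to hundreds of GB.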

Comments

can not get

Submitted by Liu Wei on Mon, 04/26/2021 - 09:28

Dataset Files

TrainData_4M.7z (2.70 MB)
TrainData-10M.7z (6.60 MB)
TrainData_50M.7z (32.89 MB)
800M_ByteArrayWritable.7z (526.10 MB)
800M_FloatArrayWritable.7z (525.90 MB)


The training dataset for accelerated machine learning algorithms
