This is the dataset I used for creating recommendations. I collected it from Bukalapak, one of the marketplaces in Indonesia, using the specific keyword "Gegep Tekiro". The dataset contains only 240 records.


This document describes the details of the BON Egocentric vision dataset. BON denotes the initials of the locations where the dataset was collected: Barcelona (Spain), Oxford (UK), and Nairobi (Kenya). BON comprises first-person video recorded while subjects were conducting common office activities. The preceding version of this dataset, FPV-O, has fewer subjects and covers only a single location (Barcelona). To develop a location-agnostic framework, data from multiple locations and/or office settings is essential.


Instructions are available in the attached document.


Several experimental measurement campaigns have been carried out to characterize Power Line Communication (PLC) noise and channel transfer functions (CTFs). This dataset contains a subset of the PLC CTFs, impedances, and noise traces measured in an in-building scenario.

The 2x2 MIMO CTF matrices are acquired in the frequency domain, with a resolution of 74.769 kHz, in the frequency range 1-100 MHz. Noise traces, in the time domain with a duration of about 16 ms, have been acquired concurrently from the two multi-conductor ports.


The dataset is available in the MATLAB format *.mat. The instructions and basic examples to display data are available in "script_load_dataset.m".


The Android Malware Detection Dataset consists of a diverse collection of malware APK files that can be used for malware detection using machine learning. It is my research work; if you use this dataset, please cite my work in your research papers.


No instructions


Motivated by the lack of good data sources covering all diseases (from generic to chronic) and their treatment courses, a new dataset was synthesized by exploring several medical websites and resources. It provides the precaution list corresponding to over 1000 diagnoses. prec\_t.csv: (did, diagnose, pid) = (disease identifier, disease name, treatment course). This dataset can be used in many machine learning or deep learning based healthcare applications.
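As a rough illustration of how the three columns might be consumed, the sketch below groups treatment courses by disease name. The column order (did, diagnose, pid) comes from the description above; the absence of a header row, the comma delimiter, and the sample rows are assumptions made only for this example, and `load_treatments` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Column order taken from the description; everything else is an assumption.
FIELDS = ["did", "diagnose", "pid"]


def load_treatments(handle, has_header=False):
    """Group treatment-course values (pid) by disease name (diagnose)."""
    reader = csv.DictReader(handle, fieldnames=FIELDS)
    if has_header:
        # Skip the first row when the file starts with column names.
        next(reader, None)
    courses = defaultdict(list)
    for row in reader:
        courses[row["diagnose"]].append(row["pid"])
    return dict(courses)


# Placeholder rows invented for illustration; they are not taken from the dataset.
SAMPLE = (
    "1,example disease A,course 1\n"
    "1,example disease A,course 2\n"
    "2,example disease B,course 3\n"
)

treatments = load_treatments(io.StringIO(SAMPLE))
print(treatments)
```

For the real file, the same function could be called on `open("prec_t.csv", newline="", encoding="utf-8")`, setting `has_header=True` if the first row turns out to hold column names.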


Depressive and non-depressive tweets posted between December 2019 and December 2020, originating largely from India and other parts of the Indian subcontinent. Sentiment scores were allotted using TextBlob. Tweets were extracted using the 250 most frequently used negative and positive lexicons, obtained from SentiWord and various research publications.

Number of tweets: 1.4 lakh (140,000)





Dataset associated with a paper at the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems:

"Talk the talk and walk the walk: Dialogue-driven navigation in unknown indoor environments"

If you use this code or data, please cite the above paper.



See the docs directory.


This is a CSI dataset for 5G NR high-precision positioning. It is fine-grained, general-purpose, and compliant with the 3GPP R16 standards.


5G NR is commonly regarded as a paradigm shift toward integrated sensing and communication (ISAC).

With its wide coverage and indoor-outdoor integration, 5G NR is a promising approach to high-precision positioning in indoor and urban-canyon environments.



Each dataset_[SNR]_[date]_[time].mat file contains:

1) a 4-D matrix, features, representing the feature data, and

2) a structure array, labels, labeling the ground truth of UE positions.

[SNR] is the noise level of the features; [date] and [time] indicate when the dataset was generated.

labels is a structure array; labels.position records the three-dimensional coordinates of each UE (in meters).

features is an Ns-by-Nc-by-Ng-by-Nu matrix, where Ns is the number of samples, Nc is the number of MIMO channels, Ng is the number of gNBs, and Nu is the number of UEs.

The value of Nu corresponds to the number of UEs in labels.
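The naming convention and the axis order described above can be sketched in a few lines of Python. This is only an illustration: the helper names, the example file name, and the example shape are hypothetical, and the code assumes that none of the [SNR], [date], and [time] fields contains an underscore. The fields are kept as raw strings because their exact formats are not specified here.

```python
from pathlib import PurePath
from typing import NamedTuple


class DatasetName(NamedTuple):
    snr: str
    date: str
    time: str


def parse_dataset_name(filename: str) -> DatasetName:
    """Split 'dataset_[SNR]_[date]_[time].mat' into its three raw string fields."""
    path = PurePath(filename)
    if path.suffix != ".mat":
        raise ValueError(f"expected a .mat file, got {filename!r}")
    parts = path.stem.split("_")
    # Assumption: none of the three fields contains an underscore itself.
    if len(parts) != 4 or parts[0] != "dataset":
        raise ValueError(f"unexpected file name: {filename!r}")
    return DatasetName(*parts[1:])


def name_feature_axes(shape):
    """Map a 4-D features shape onto the axis names used in the description."""
    if len(shape) != 4:
        raise ValueError("features must be 4-D: Ns-by-Nc-by-Ng-by-Nu")
    return dict(zip(("Ns", "Nc", "Ng", "Nu"), shape))


# Hypothetical file name and shape, for illustration only.
info = parse_dataset_name("dataset_SNR20_20210718_120000.mat")
axes = name_feature_axes((100, 4, 3, 50))
print(info)
print(axes)
```

Once a file has been loaded with a MAT-file reader of your choice, passing the shape of features to `name_feature_axes` gives a quick check that the array has the expected four axes.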


The closed beta test is running.

In the first phase, we plan to provide three researchers (or groups) with a full version of dataset generation and 864 core-hours of computing resources. You can use CAD software to make custom map files and save them in '.stl' format. Supported scenarios include, but are not limited to, typical 5G positioning scenarios such as enclosed indoor spaces and urban canyons, which should not exceed 1,000 square meters in area.


In addition, you can customize the location, number, and other specific parameters of the base stations and UEs in the map, such as carrier frequency, number of antennas, and bandwidth. If you don't know the specific parameters, you can just submit the map file, and we'll generate your custom dataset based on the default parameters.


Customized datasets with fine-grained CSI for each point and their detailed documentation will be returned after they are generated.

To get your dataset for 5G NR positioning, please contact us by email. We will start generating your dataset after confirming your identity and requirements.


Release notes

2021-07-23 :

1) Recruiting participants for the closed beta test.

2021-07-22 :

1) Expanded our dataset with more CSI data at low SNR levels.

2) Set up an open system for researchers to upload their own scene maps and obtain customized datasets.

The closed beta test will start after suggestions have been collected.

2021-07-18 :

1) Expanded our dataset with more CSI data at different SNR levels.

2) Published map files for Scenario 1 (indoor office).




This article offers an empirical exploration of the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.


The University of Turin (UniTO) released the open-access dataset Stoke collected for the homonymous Use Case 3 in the DeepHealth project. UniToBrain is a dataset of Computed Tomography (CT) perfusion images (CTP).


Full companion code, in which a U-Net model is trained on the dataset, is also available.