The Bluetooth 5.1 Core Specification brought Angle of Arrival (AoA) based Indoor Localization to the Bluetooth Standard. This dataset is the result of one of the first comprehensive studies of static Bluetooth AoA-based Indoor Localization in a real-world testbed using commercial off-the-shelf Bluetooth chipsets.

The positioning experiments were carried out on a 100 m² test area using four stationary Bluetooth sensor devices each equipped with eight antennas. With this setup, a median localization accuracy of up to 18 cm was achieved.


This is a CSI dataset for 5G NR high-precision positioning. It is fine-grained, general-purpose, and compliant with the 3GPP R16 standards.



The corresponding paper is published here (

5G NR is commonly regarded as a paradigm shift toward integrated sensing and communication (ISAC).



The dataset_[SNR]_[Scenario]_[date_time].mat contains: 

1) a 4-D matrix, features, representing the feature data, and

2) a structure array, labels, labeling the ground truth of UE positions.

[SNR] is the noise level of the features; [date] and [time] indicate when the dataset was generated.

labels is a structure array. labels.position records the three-dimensional coordinates of each UE (in meters).

features is an Ns-by-Nc-by-Ng-by-Nu matrix, where Ns is the number of samples, Nc is the number of MIMO channels, Ng is the number of gNBs, and Nu is the number of UEs.

The value of Nu corresponds to the number of UEs in labels.


Closed beta test is running.

In the first phase, we plan to provide three researchers (or groups) with a full version of dataset generation and 864 core-hours of computing resources. You can use CAD software to make custom map files and save them in '.stl' format. Supported scenarios include, but are not limited to, typical 5G positioning scenarios such as enclosed indoor spaces and city canyons, and should not exceed 1,000 square meters in area.


In addition, you can customize the location, number, and other specific parameters of the base stations and UEs in the map, such as carrier frequency, number of antennas, and bandwidth. If you don't know the specific parameters, you can just submit the map file, and we'll generate your custom dataset based on the default parameters.


Customized datasets with fine-grained CSI for each point and their detailed documentation will be returned after they are generated.

To get your dataset for 5G NR positioning, please contact us by email. We will start generating your dataset after confirming your identity and requirements.


 Release note 

2021-07-23 :

1) Recruit participants for the closed beta test.

2021-07-22 :

1) Expand our dataset with more CSI data at low SNR levels.

2) Set up an open system for researchers to upload their own scene maps and obtain customized datasets.

The closed beta test will start after suggestions have been collected.

2021-07-18 :

1) Expand our dataset with more CSI data at different SNR levels.

2) Publish map files for Scenario 1 (indoor office).




Data are collected on a 5 m × 10 m test bed set up at Kadir Has University, Istanbul. Wireless access points are located around the corners of the test bed, and markers are placed every 45 cm. RSSI measurements taken on the grid shown in Figure 2 are stored via the NetSurveyor program running on a Lenovo Ideapad FLEX 4 laptop, which has an Intel Dual Band Wireless-AC 8260 Wi-Fi adapter. At each measurement point, RSSI data are collected for 1 min with a sampling interval of 250 ms.


The XML file is read with MATLAB to obtain the data for the full area and for the applied trajectory.
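As a quick sanity check on the sampling setup described above, the expected number of RSSI readings per measurement point follows from the collection time and the sampling interval. The sketch below assumes ideal, gap-free sampling; the function name is illustrative and is not part of the dataset tooling.

```python
def expected_samples(duration_s: float, interval_s: float) -> int:
    """Number of readings taken once every interval_s seconds over duration_s seconds."""
    return round(duration_s / interval_s)


# 1 min of collection at a 250 ms sampling interval (values from the description).
samples_per_point = expected_samples(60.0, 0.250)
print(samples_per_point)  # 240
```

Real captures may contain slightly fewer or more readings per point if the logging tool drops or delays samples.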


This dataset contains thousands of Channel State Information (CSI) samples collected using the 64-antenna KU Leuven Massive MIMO testbed. The measurements cover four different antenna array topologies: URA LoS, URA NLoS, ULA LoS, and DIS LoS. The users' channels are collected using CNC tables, so every sample comes with a very accurate spatial label. The user position is swept across a 9 square meter area, halting every 5 millimeters, resulting in a dataset size of 252,004 samples for each measured topology.


The dataset contains Channel State Information (CSI) samples, recorded with the KU Leuven Massive MIMO testbed. This was done for many user positions lying on a grid. The Base Station (BS) is equipped with 64 antennas, each receiving a predefined pilot signal from each position. Using these pilot signals, the CSI is estimated for 100 subcarriers, evenly spaced in frequency over a 20 MHz bandwidth. As a result, the complex-valued matrix H represents the measured CSI for one location. This matrix spans N rows and K columns, with N being the number of BS antennas and K the number of subcarriers. For further details about the system, the National Instruments Massive MIMO Application Framework documentation can be consulted.

To collect the CSI from many different user locations, four single-antenna User Equipments were positioned in an office. Their antennas were moved along a predefined route using CNC XY-tables. This route zigzagged along a grid in steps of 5 mm. The total grid spans 1.25 m by 1.25 m. Thanks to the XY-tables, the error on the positional label is less than 1 mm. This resulted in a dataset containing 252004 CSI samples, each spatially labelled with an error of less than 1 mm.
Furthermore, the testbed's BS is designed to be very flexible in the deployment of the antenna array. This allowed for the creation of three different datasets, each with a unique antenna deployment. First, a Uniform Rectangular Array (URA) of 8 by 8 antennas was deployed, both in LoS and in NLoS, the latter created using a metal blocker. Second, a Uniform Linear Array (ULA) of 64 antennas on one line was deployed. Finally, the antennas were distributed over the room in groups of eight, making up the distributed (DIS) scenario.
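The sample count quoted above can be reproduced from the grid geometry. The short sketch below assumes that both end points of each 1.25 m axis are sampled and that each of the four XY-tables covers its full grid; the function and variable names are illustrative only.

```python
def points_per_axis(span_mm: int, step_mm: int) -> int:
    """Grid points along one axis when both end points of the span are sampled."""
    return span_mm // step_mm + 1


per_axis = points_per_axis(1250, 5)   # 1.25 m span in 5 mm steps
per_table = per_axis ** 2             # square grid covered by one XY-table
total_samples = 4 * per_table         # four user antennas, one per XY-table

print(per_axis, per_table, total_samples)  # 251 63001 252004
```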

The different deployments can be seen in the attached picture. In all cases, the antennas are placed 1 m from the XY-tables. The yellow rectangles in the figure depict the 1.25 m by 1.25 m areas within which the XY-tables can move the users. The spacing between the XY-tables is dictated by the space needed for the motors powering the movements and the cables connecting them to the controllers. These tables were synchronised over Ethernet with the BS to ensure that each sampled H has a correct spatial label, enabling a highly accurate dataset.

During the measurements, the BS was configured to use a centre frequency of 2.61 GHz, giving a wavelength λ of approximately 114.9 mm. The system used a bandwidth of 20 MHz. The origin of the space was defined as the middle of the URA. From this point in space, the x- and y-positions of the users and antennas were measured. These locations are provided in 3D in the dataset.



Dataset used for "A Machine Learning Approach for Wi-Fi RTT Ranging" paper (ION ITM 2019). The dataset includes almost 30,000 Wi-Fi RTT (FTM) raw channel measurements from real-life client and access points, from an office environment. This data can be used for Time of Arrival (ToA), ranging, positioning, navigation and other types of research in Wi-Fi indoor location. The zip file includes a README file, a CSV file with the dataset and several Matlab functions to help the user plot the data and demonstrate how to estimate the range.



Copyright (C) 2018 Intel Corporation

SPDX-License-Identifier: BSD-3-Clause



Welcome to the Intel WiFi RTT (FTM) 40MHz dataset.


The paper and the dataset can be downloaded from:


To cite the dataset and code, or for further details, please use:

Nir Dvorecki, Ofer Bar-Shalom, Leor Banin, and Yuval Amizur, "A Machine Learning Approach for Wi-Fi RTT Ranging," ION Technical Meeting ITM/PTTI 2019


For questions/comments contact:


The zip file contains the following files:

1) This README.txt file.

2) LICENSE.txt file.

3) RTT_data.csv - the dataset of FTM transactions

4) Helper Matlab files:

O mainFtmDatasetExample.m - main function to run in order to execute the Matlab example.

O PlotFTMchannel.m - plots the channels of a single FTM transaction.

O PlotFTMpositions.m - plots user and Access Point (AP) positions.

O ReadFtmMeasFile.m - reads the RTT_data.csv file to numeric Matlab matrix.

O SimpleFTMrangeEstimation.m - execute a simple range estimation on the entire dataset.

O Office1_40MHz_VenueFile.mat - contains a map of the office from which the dataset was gathered.



Running the Matlab example:


In order to run the Matlab simulation, extract the contents of the zip file and call the mainFtmDatasetExample() function from Matlab.



Contents of the dataset:


The RTT_data.csv file contains a header row, followed by 29581 rows of FTM transactions.

The first column of the header row includes an extra "%" at the beginning, so that the entire csv file can be easily loaded into Matlab using the command: load('RTT_data.csv')

Indexing the csv columns from 1 (leftmost column) to 467 (rightmost column):

O column 1 - Timestamp of each measurement (sec)

O columns 2 to 4 - Ground truth (GT) position of the client at the time the measurement was taken (meters, in local frame)

O column 5 - Range, as estimated by the devices in real time (meters)

O columns 6 to 8 - Access Point (AP) position (meters, in local frame)

O column 9 - AP index/number, according to the convention of the ION ITM 2019 paper

O column 10 - Ground truth range between the AP and client (meters)

O column 11 - Time of Departure (ToD) factor in meters, such that: TrueRange = (ToA_client + ToA_AP)*3e8/2 + ToD_factor (eq. 7 in the ION ITM paper, with "ToA" being tau_0 and the "ToD_factor" lumping together both nu initiator and nu responder)

O columns 12 to 467 - Complex channel estimates. Each channel contains 114 complex numbers denoting the frequency response of the channel at each WiFi tone:

O columns 12 to 125  - Complex channel estimates for first antenna from the client device

O columns 126 to 239 - Complex channel estimates for second antenna from the client device

O columns 240 to 353 - Complex channel estimates for first antenna from the AP device

O columns 354 to 467 - Complex channel estimates for second antenna from the AP device
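The range relation given for column 11 can be written as a small helper. This is only a sketch of the formula as stated in this README, not the code shipped in the zip file: the function and argument names are hypothetical, and the ToA values are assumed to be in seconds.

```python
SPEED_OF_LIGHT_M_S = 3e8  # value used in the README formula


def true_range_m(toa_client_s: float, toa_ap_s: float, tod_factor_m: float) -> float:
    """TrueRange = (ToA_client + ToA_AP) * 3e8 / 2 + ToD_factor, in meters."""
    return (toa_client_s + toa_ap_s) * SPEED_OF_LIGHT_M_S / 2 + tod_factor_m


# Made-up numbers for illustration: 10 ns ToA on each side and a 0.5 m ToD factor.
example_range = true_range_m(10e-9, 10e-9, 0.5)
print(round(example_range, 6))  # 3.5
```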

The tone frequencies are given by: 312.5E3*[-58:-2, 2:58] Hz (e.g. column 12 of the csv contains the channel response at frequency fc-18.125MHz, where fc is the carrier wave frequency).

Note that the 3 tones around the baseband DC (i.e. around the frequency of the carrier wave), as well as the guard tones, are not included.
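The column layout and tone frequencies described above can be cross-checked with a few lines of Python. This is a sketch for orientation only, separate from the Matlab helpers in the zip file; the function names are hypothetical, and channels are indexed 0 to 3 in the order listed above (client antenna 1, client antenna 2, AP antenna 1, AP antenna 2).

```python
TONE_SPACING_HZ = 312.5e3
TONES_PER_CHANNEL = 114
FIRST_CHANNEL_COLUMN = 12   # 1-based csv column, as in the README
LAST_CHANNEL_COLUMN = 467

# Tone indices -58..-2 and 2..58 (tones around DC and guard tones are excluded).
tone_indices = list(range(-58, -1)) + list(range(2, 59))
tone_offsets_hz = [k * TONE_SPACING_HZ for k in tone_indices]


def channel_columns(channel_idx: int) -> tuple:
    """Return the 1-based (first, last) csv columns of channel 0..3."""
    first = FIRST_CHANNEL_COLUMN + channel_idx * TONES_PER_CHANNEL
    return first, first + TONES_PER_CHANNEL - 1


def column_to_tone_offset_hz(column: int) -> float:
    """Frequency offset from the carrier (Hz) for a 1-based channel column."""
    if not FIRST_CHANNEL_COLUMN <= column <= LAST_CHANNEL_COLUMN:
        raise ValueError("column must be in 12..467")
    return tone_offsets_hz[(column - FIRST_CHANNEL_COLUMN) % TONES_PER_CHANNEL]


print(len(tone_indices))               # 114
print(channel_columns(3))              # (354, 467)
print(column_to_tone_offset_hz(12))    # -18125000.0
```

The last line reproduces the example in the text: column 12 holds the channel response at fc-18.125MHz.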



This RSSI Dataset is a comprehensive set of Received Signal Strength Indicator (RSSI) readings gathered in three different scenarios. Three wireless technologies were used:

  • Zigbee (IEEE 802.15.4),
  • Bluetooth Low Energy (BLE), and
  • WiFi (IEEE 802.11n 2.4GHz band).

The scenarios took place in three rooms with different sizes and interference levels. The equipment used for the experiments consisted of Raspberry Pi 3 Model Bs, Gimbal Series 10 Beacons, and Series 2 Xbees with Arduino Uno microcontrollers.



A set of tests was conducted to compare the accuracy of several system designs: Trilateration, Fingerprinting with K-Nearest Neighbor (KNN) processing, and Naive Bayes processing, each used with a running average filter. All tests were done on tables, so that the devices sat at roughly the height at which a user would carry a device in their pocket. Devices were also kept in the same orientation throughout all the tests in order to reduce the error in the measured RSSI values.
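For readers unfamiliar with the filtering step mentioned above, a running average over RSSI readings can be sketched as follows. The window length of 5 is an assumption chosen for illustration (the description does not state the value used in the experiments), and the readings are made up.

```python
from collections import deque


def running_average(readings, window=5):
    """Return the mean of the most recent `window` readings after each new reading."""
    recent = deque(maxlen=window)  # oldest value is dropped automatically
    smoothed = []
    for value in readings:
        recent.append(value)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Made-up RSSI readings (dBm) with one outlier at -90.
smoothed = running_average([-60, -62, -58, -90, -61], window=3)
print(smoothed[:4])  # [-60.0, -61.0, -60.0, -70.0]
```

A longer window suppresses outliers more strongly but reacts more slowly when the true signal level changes.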


Three different experimental scenarios were utilized with varying conditions in order to determine how the proposed system will function according to the environmental parameters.

Scenario 1 was a 6.0 x 5.5 m wide meeting room. The environmental area was cleared of all transmitting devices to create a clear testing medium where all the devices can transmit without interference. Transmitters were placed 4 m apart from one another in the shape of a triangle. Fingerprint points were taken with a 0.5 m spacing in the center between the transmitters. This created 49 fingerprints that would comprise the database. For testing, 10 points were randomly selected.

Scenario 2 was a 5.8 x 5.3 m meeting room. This area was a high noise environment as additional transmitting devices were placed around the environment in order to create interference in the signals. There were 16 fingerprints gathered with a larger distance selected between the points. In this Scenario, 6 testing points were randomly selected to be used for comparing the algorithms.

Scenario 3 was a 10.8 x 7.3 m computer lab. This lab was a large area with a typical amount of noise caused by the WiFi and BLE devices transmitting in the area. The large space also allowed for signals to experience obstructions, reflections, and interference. Transmitters were placed so that Line-of-Sight (LoS) was available between the transmitters and the receiver. In total, 40 fingerprints were gathered with an alternating pattern between the points. Points were taken 1.2 m apart in one direction and 0.6 m apart in the other. For testing, 16 randomly selected points were taken.


In the testing environment, fingerprints were gathered to be used in the creation of a database, while test points were selected to be used against the database for the comparison. The figures of each topology can be found inside the dataset folder. In the figures, the black dots represent the location of the transmitters and the red dots represent the locations where fingerprints and test points were gathered where appropriate. 
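The database-versus-test-point comparison described above can be illustrated with a generic K-Nearest Neighbor lookup in signal space. This is a minimal sketch of the general technique, not the implementation from the related publication: the data layout, the value of k, and the toy fingerprints below are all assumptions.

```python
import math


def knn_locate(database, sample, k=3):
    """Estimate (x, y) as the centroid of the k fingerprints closest in RSSI space.

    database: list of ((x, y), (rssi_a, rssi_b, rssi_c)) entries.
    sample:   (rssi_a, rssi_b, rssi_c) measured at the unknown position.
    """
    ranked = sorted(database, key=lambda entry: math.dist(entry[1], sample))
    nearest = ranked[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return x, y


# Toy fingerprint database (positions in meters, RSSI in dBm); values are made up.
toy_database = [
    ((0.0, 0.0), (-40, -60, -60)),
    ((1.0, 0.0), (-60, -40, -60)),
    ((0.0, 1.0), (-60, -60, -40)),
    ((1.0, 1.0), (-55, -55, -55)),
]
estimate = knn_locate(toy_database, (-41, -59, -61), k=1)
print(estimate)  # (0.0, 0.0)
```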

Related Publication

S. Sadowski, P. Spachos, K. Plataniotis, "Memoryless Techniques and Wireless Technologies for Indoor Localization with the Internet of Things", IEEE Internet of Things Journal.


The RSSI dataset contains a folder for each experimental scenario and, within it, a folder for each wireless technology (i.e. Zigbee, BLE, and WiFi). Each of these folders contains three additional folders holding the gathered data (Pathloss, Database, and Tests). Pathloss contains 18 files measuring the RSSI at varying distances from the devices. The number of files located in Database and Tests varies based on the scenario.

For each technology, the file name corresponds to the point where the data was gathered. For specific locations, the (x,y) coordinates can be seen in the appropriate .xlsx file.

Each file in the Database and Tests folders contains approximately 300 readings. Each file in the Pathloss folder contains approximately 50 readings, all from a single node. Readings appear in the format "Node LetterValue" where:

Letter corresponds to the transmitter that the signal was sent from, represented by 'A', 'B', or 'C'.

Value is the RSSI reading.
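A reading in the "Node LetterValue" format can be split into its transmitter and RSSI value with a small parser. The sketch below is an assumption about the exact text layout: it tolerates an optional separator (space or colon) between the letter and the value, since the precise spacing is not specified here, and the sample lines are made up.

```python
import re

# "Node", a transmitter letter A/B/C, optional separator characters, then a signed number.
READING_PATTERN = re.compile(r"Node\s+([ABC])\W*?(-?\d+(?:\.\d+)?)")


def parse_readings(lines):
    """Group RSSI values by transmitter letter; lines that do not match are skipped."""
    by_node = {}
    for line in lines:
        match = READING_PATTERN.search(line)
        if match:
            by_node.setdefault(match.group(1), []).append(float(match.group(2)))
    return by_node


parsed = parse_readings(["Node A-45", "Node B: -60", "no reading here", "Node A -47"])
print(parsed)  # {'A': [-45.0, -47.0], 'B': [-60.0]}
```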


This dataset includes UWB range measurements performed with Pozyx devices. The measurements were collected between two tags placed at several distances and in two different conditions: with Line of Sight (LOS) and Non-Line of Sight (NLOS). The measurements include the range estimated by the Pozyx tag, the actual distance between devices, the timestamp of each measurement and the values corresponding to the samples of the Channel Impulse Response (CIR) after each transmission.


The dataset contains two zip files. One contains the raw rosbag records, and the other includes two matlab files (one for the LOS scenario, one for the NLOS scenario) that hold the final data once the actual distance is added and the CIR measurements are processed.

Rosbag files contain messages of type PozyxRangingWithCir. This type of message can be found in the following repository:

Each of the matlab files contains an array of structs. Each struct has these fields:

  • range: Pozyx estimation of distance.
  • distance: Actual distance between devices.
  • rss: Estimation of received power.
  • seq: A ranging sequence number, between 0 and 255.
  • timestamp: Timestamp of the measurement.
  • cirPower: CIR power calculated using this formula: 10*log10(abs(cirRealPart.^2)).
  • cir: Samples of the CIR. 1016 complex values.
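The cirPower formula listed above can be reproduced outside Matlab. The sketch below follows the formula exactly as written, which uses only the real part of each CIR sample; the function name is hypothetical, the sample values are made up, and mapping a zero real part to negative infinity is an assumption made here to avoid a math error.

```python
import math


def cir_power_db(cir_samples):
    """Per-sample power, 10*log10(abs(real_part^2)), for a list of complex CIR samples."""
    powers = []
    for sample in cir_samples:
        squared = abs(sample.real ** 2)
        # log10(0) is undefined; report -inf for samples with a zero real part.
        powers.append(10 * math.log10(squared) if squared > 0 else float("-inf"))
    return powers


example_power = cir_power_db([complex(10, 3), complex(0, 1)])
print(example_power)  # [20.0, -inf]
```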

The Geomagnetic field can be used for classifying different landmark locations inside a big building.


The file contains raw data collected from 9 pedestrians. Three of them walked Track 1, another three walked Track 2, and the last three walked Track 3. All the pedestrians ended their walks at the starting point. Track 1 and Track 3 each cover a distance of 150.3 m, while Track 2 covers a distance of 111.4 m.