Example datasets for AtyImo (Spark version)

Citation Author(s):
Robespierre Pita, Clicia Pinto, Marcos Barreto, Spiros Denaxas
Submitted by:
Marcos Barreto
Last updated:
Tue, 05/17/2022 - 22:17
DOI:
10.21227/H2K92G
Data Format:
Research Article Link:
Links:
License:
214 Views
Categories:
Keywords:
0
0 ratings - Please login to submit your rating.

Abstract 

## FEDERAL UNIVERSITY OF BAHIA (UFBA)
## ATYIMOLAB (www.atyimolab.ufba.br)
## University College London (UCL)
## Denaxas Lab (www.denaxaslab.org)
## Robespierre Pita and Clicia Pinto and Marcos Barreto and Spiros Denaxas
 
/*
@(#)File:           $atyimo_dataset_info.txt$
@(#)Version:        $v1$
@(#)Last changed:   $Date: 2017/12/04 12:00:00 $
@(#)Purpose:        Example data sets for the AtyImo data linkage tool
@(#)Author:         Robespierre Pita and Clicia Pinto and Marcos Barreto and Spiros Denaxas
 
@(#)Usage:
 
@(#)Comments:
 
 (*) These are synthetic data sets generated using a random routine for names and dates of birth.
 (*) They are very similar in structure to those real (identifiable) data sets linked by AtyImo.
 
*/
 
## Linkage attributes (existent in most Brazilian government databases)
 
code : Unique key for each record
municipality_residence : IBGE (Brazilian Institute of Gepgraphy and Statistics) code for a specific municipality.
This code was chosen randomly considering Brazilian municipalities.
name : Person name
mother_name : Mother's name
birth_date : Date of birth (yyyy-mm-dd)
gender : Gender ("1" and "M" - MALE, "3" and "F" - FEMALE)
 
## Files (semicolon ";" separated values)
##  AtyImo - Spark
 
small/
- DATASET_1_5K_records.csv: 500,000 records
- DATASET_2_1M_records.csv: 1,000,000 records
 
large/
- DATASET_1_5M_records.csv: 5,000,000 records
- DATASET_2_5M_records.csv: 5,000,000 records
 
## Files (Bloom coded)
##  AtyImo - Hybrid (OpenMP, CUDA)
 
bloom_hybrid/
- input_1000.bloom, input_10000.bloom, input_500000.bloom
 

Instructions: 

## FEDERAL UNIVERSITY OF BAHIA (UFBA)
## ATYIMOLAB (www.atyimolab.ufba.br)
## University College London (UCL)
## Denaxas Lab (www.denaxaslab.org)
## Robespierre Pita and Clicia Pinto and Marcos Barreto and Spiros Denaxas
 
/*
@(#)File:           $atyimo_dataset_info.txt$
@(#)Version:        $v1$
@(#)Last changed:   $Date: 2017/12/04 12:00:00 $
@(#)Purpose:        Example data sets for the AtyImo data linkage tool
@(#)Author:         Robespierre Pita and Clicia Pinto and Marcos Barreto and Spiros Denaxas
 
@(#)Usage:
 
@(#)Comments:
 
 (*) These are synthetic data sets generated using a random routine for names and dates of birth.
 (*) They are very similar in structure to those real (identifiable) data sets linked by AtyImo.
 
*/
 
## Linkage attributes (existent in most Brazilian government databases)
 
code : Unique key for each record
municipality_residence : IBGE (Brazilian Institute of Gepgraphy and Statistics) code for a specific municipality.
This code was chosen randomly considering Brazilian municipalities.
name : Person name
mother_name : Mother's name
birth_date : Date of birth (yyyy-mm-dd)
gender : Gender ("1" and "M" - MALE, "3" and "F" - FEMALE)
 
## Files (semicolon ";" separated values)
##  AtyImo - Spark
 
small/
- DATASET_1_5K_records.csv: 500,000 records
- DATASET_2_1M_records.csv: 1,000,000 records
 
large/
- DATASET_1_5M_records.csv: 5,000,000 records
- DATASET_2_5M_records.csv: 5,000,000 records
 
## Files (Bloom coded)
##  AtyImo - Hybrid (OpenMP, CUDA)
 
bloom_hybrid/
- input_1000.bloom, input_10000.bloom, input_500000.bloom