This paper investigates the task of generating multiple questions for a given context paragraph. Existing question generation (QG) models take no account of intra-group similarity and question-type diversity when forming a question group, yet these attributes are critical for employing QG techniques in educational applications. This paper proposes a two-stage framework that combines neural language models with a genetic algorithm for the question group generation task.


We constructed a rich AttackDB that consists of cyber threat intelligence (CTI) from the MITRE ATT\&CK Enterprise knowledge base, the AlienVault Open Threat Exchange, the IBM X-Force Exchange, and VirusTotal.


Bitcoin block format files obtained with Bitcoin-ETL (blk00000000-blk00159999)


This dataset is supplementary material for the paper "A Comprehensive and Reproducible Comparison of Cryptographic Primitives Execution on Android Devices". It contains the measurements collected from 17 mobile devices and the code needed to reproduce them.


The primary data are located in the Measurement folder, where each device has a corresponding subfolder with its measurement file. The dataset consists of JSON files, each containing execution-time measurements of the security primitives available on a device. The data were gathered over multiple runs of 250 iterations, and each measurement was taken at an interval of 50 repetitions for every primitive. The main components of the dataset are defined as follows:


1)    context[] – provides details about the device and OS, including the device name, model, battery-related information, Software Development Kit (SDK) version, and basic technical specifications.

2)    benchmarks[] – provides entries per primitive, such as:

i)      name – the overall identification title of the primitive, including padding and other optional fields;

ii)     params – additional parameters utilized for the execution, if any;

iii)   totalRunTimeNs – the total execution time of the primitive;

iv)   metrics[] – provides entries per execution, such as:

(a)   timeNs[] – the collected/processed timing data, including entries per execution in runs[] and the statistical parameters maximum, minimum, and median.

(b)  warmupIterations – the number of warmup iterations performed before measurements started;

(c)   repeatIterations – the number of iterations;

(d)  thermalThrottleSleepSeconds – the duration of sleep due to thermal throttling.


An example of the dataset entry:



    {
        "context": {
            "build": {
                "device": "mooneye",
                "fingerprint": "mobvoi/mooneye/mooneye:8.0.0/OWDR.180307.020/5000261:user/release-keys",
                "model": "Ticwatch E",
                "version": {
                    "sdk": 26
                }
            },
            "cpuCoreCount": 2,
            "cpuLocked": true,
            "cpuMaxFreqHz": -1,
            "batteryCapacity, mAh": 300,
            "memTotalBytes": 514560000,
            "sustainedPerformanceModeEnabled": false
        },
        "benchmarks": [
            {
                "name": "benchmarkRsa4096EcbOaepSHA1AndMgf1Padding",
                "params": {},
                "className": "cz.vutbr.benchmark.AsymmetricDecryptionBenchmark",
                "totalRunTimeNs": 20873248463,
                "metrics": {
                    "timeNs": {
                        "minimum": 242466000,
                        "maximum": 284698307,
                        "median": 245293231,
                        "runs": [
                            ...
                        ]
                    }
                },
                "warmupIterations": 33,
                "repeatIterations": 1,
                "thermalThrottleSleepSeconds": 0
            }
        ]
    }
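Entries of this form can be traversed programmatically. Below is a minimal Python sketch, assuming the field names listed above; `summarize_benchmarks` is a hypothetical helper, not part of the dataset's reproducibility code:

```python
import statistics

def summarize_benchmarks(measurement):
    """Return {benchmark name: median run time in ns} for one parsed JSON file."""
    summary = {}
    for bench in measurement["benchmarks"]:
        time_ns = bench["metrics"]["timeNs"]
        # Prefer the precomputed median; otherwise derive it from runs[].
        if "median" in time_ns:
            summary[bench["name"]] = time_ns["median"]
        else:
            summary[bench["name"]] = statistics.median(time_ns["runs"])
    return summary

# Minimal input mirroring the example entry above.
sample = {
    "benchmarks": [{
        "name": "benchmarkRsa4096EcbOaepSHA1AndMgf1Padding",
        "metrics": {"timeNs": {"minimum": 242466000,
                               "maximum": 284698307,
                               "median": 245293231}},
    }]
}
print(summarize_benchmarks(sample))
```

In practice, `measurement` would be obtained with `json.load()` from a file in the corresponding device subfolder.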








Note: The project group was supported by the Graduate School of Business, National Research University Higher School of Economics.


We study the ability of neural networks to steer or control trajectories of dynamical systems on graphs, which we represent with neural ordinary differential equations (neural ODEs). To do so, we introduce a neural-ODE control (NODEC) framework and find that it can learn control signals that drive graph dynamical systems into desired target states. While we use loss functions that do not constrain the control energy, our results show that NODEC produces control signals that are highly correlated with optimal (or minimum energy) control signals.


An overview of a real-world Chinese mathematics dataset from which duplicated and overly simple questions were removed.


We present GeoCoV19, a large-scale Twitter dataset related to the ongoing COVID-19 pandemic. The dataset was collected over a period of 90 days, from February 1 to May 1, 2020, and consists of more than 524 million multilingual tweets. As geolocation information is essential for many tasks such as disease tracking and surveillance, we employed a gazetteer-based approach to extract toponyms from user locations and tweet content, deriving geolocation information from the Nominatim (OpenStreetMap) data at different geolocation granularity levels. In terms of geographical coverage, the dataset spans 218 countries and 47K cities. The tweets in the dataset come from more than 43 million Twitter users, including around 209K verified accounts, and were posted in 62 different languages.


GeoCoV19 Dataset Description 

The GeoCoV19 Dataset comprises several TAR files, which contain zip files representing daily data. Each zip file contains a JSON file with the following format:

    {
        "tweet_id": "122365517305623353",
        "created_at": "Sat Feb 01 17:11:42 +0000 2020",
        "user_id": "335247240",
        "geo_source": "user_location",
        "user_location": { "country_code": "br" },
        "geo": {},
        "place": {},
        "tweet_locations": [
            { "country_code": "it", "state": "Trentino-Alto", "county": "Pustertal - Val Pusteria" },
            { "country_code": "us" },
            { "country_code": "ru", "state": "Voronezh Oblast", "county": "Petropavlovsky District" },
            { "country_code": "at", "state": "Upper Austria", "county": "Braunau am Inn" },
            { "country_code": "it", "state": "Trentino-Alto", "county": "Pustertal - Val Pusteria" },
            { "country_code": "cn" },
            { "country_code": "in", "state": "Himachal Pradesh", "county": "Jubbal" }
        ]
    }

Description of all the fields in the above JSON 

Each JSON in the Geo file has the following eight keys:

1. tweet_id: the Twitter-provided ID of the tweet.

2. created_at: the Twitter-provided "created_at" date and time in UTC.

3. user_id: the Twitter-provided user ID.

4. geo_source: takes one of four values: (i) coordinates, (ii) place, (iii) user_location, or (iv) tweet_text, depending on which of these fields are available. Priority is given to the most accurate field available, in the order coordinates, place, user_location, tweet_text. For instance, when a tweet has GPS coordinates, the value will be "coordinates" even if all other location fields are present. If a tweet has no GPS coordinates, place, or user_location information, the value will be "tweet_text", provided the tweet text mentions at least one location.

The remaining keys can have the following location_json inside them. Sample location_json: {"country_code":"us","state":"California","county":"San Francisco","city":"San Francisco"}. Depending on the available granularity, the country_code, state, county, or city keys may be missing from the location_json.

5. user_location: it can contain a "location_json" as described above or an empty JSON {}. This field uses the "location" profile metadata of a Twitter user, which gives the user-declared location as free text. We resolve this text to a location.

6. geo: represents the "geo" field provided by Twitter. We resolve the provided latitude and longitude values to a location. It can contain a "location_json" as described above or an empty JSON {}.

7. tweet_locations: this field can contain an array of "location_json" objects as described above, [location_json1, location_json2], or an empty array []. It uses the tweet content (i.e., the actual tweet message) to find toponyms. A tweet message can mention several different locations (i.e., toponyms), which is why an array is used. For instance, the tweet "The UK has over 65,000 #COVID19 deaths. More than Qatar, Pakistan, and Norway." contains four location mentions, and the tweet_locations array represents each of them separately.

8. place: it can contain a "location_json" as described above or an empty JSON {}. It represents the Twitter-provided "place" field.
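The geo_source priority described in field 4 can be re-derived from a record's other fields. Below is a minimal Python sketch, assuming the keys shown in the sample JSON above; `best_location` is a hypothetical helper, not part of the GeoCoV19 tooling:

```python
import json

# Priority order stated above: coordinates (the "geo" field) > place >
# user_location > tweet_text (the "tweet_locations" field).
PRIORITY = [("geo", "coordinates"), ("place", "place"),
            ("user_location", "user_location"), ("tweet_locations", "tweet_text")]

def best_location(record):
    """Return (geo_source, location) for one GeoCoV19 JSON record,
    picking the first non-empty location field in priority order."""
    for field, source in PRIORITY:
        value = record.get(field)
        if value:  # skips empty JSON {} and empty arrays []
            return source, value
    return None, None

record = json.loads('''{"tweet_id": "122365517305623353",
 "geo_source": "user_location", "user_location": {"country_code": "br"},
 "geo": {}, "place": {},
 "tweet_locations": [{"country_code": "it", "state": "Trentino-Alto"}]}''')

print(best_location(record))
```

For this record the helper returns the user_location entry, matching the record's own geo_source value, because the geo and place fields are empty.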


Tweet hydrators:

CrisisNLP (Java):

Twarc (Python):

Docnow (Desktop application):

If you have doubts or questions, feel free to contact us at: and