This dataset gives a cursory glimpse of the overall sentiment trend of the public discourse regarding the COVID-19 pandemic on Twitter. A live scatter plot of this dataset is available as the Overall Trend block at https://live.rlamsal.com.np. The trend graph reveals multiple peaks and drops that warrant further analysis, and the n-grams for those periods can prove beneficial for better understanding the discourse.
The TXT files in this dataset can be used to generate the trend graph. The peaks and drops in the trend graph can be made more meaningful by computing n-grams for those periods. To compute the n-grams, the tweet IDs in the Coronavirus (COVID-19) Tweets Dataset should be hydrated to form a tweets corpus.
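As a minimal hydration sketch, the twarc library can be used (one common choice, not necessarily the tool used by the authors); the API credentials and file names below are placeholders:

from twarc import Twarc

#placeholder Twitter API credentials
t = Twarc("consumer_key", "consumer_secret", "access_token", "access_token_secret")
#hydrate the tweet IDs and write the tweet texts out as one corpus file
with open("tweets_corpus.txt", "w", encoding="utf-8") as out:
    for tweet in t.hydrate(open("tweet_ids.txt")):
        out.write(tweet["full_text"].replace("\n", " ") + "\n")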
Pseudo-code for generating a similar trend dataset
import time
import pandas as pd
import sqlite3
conn = sqlite3.connect("trend.db") #placeholder database; the table names below are assumptions
current = int(time.time()*1000) #we receive the timestamp in ms from twitter
off = 600*1000 #we're looking for 10-minute (600 seconds) average data (offset)
past = current - off #timestamp of 10 minutes before the current time
#even at 100 tweets per second, the number of tweets does not cross 60,000 in a 10-minute interval
df = pd.read_sql("SELECT * FROM tweets ORDER BY unix DESC LIMIT 60000", conn)
new_df = df[df.unix > past] #here "unix" is the timestamp column name in the primary tweets dataset
avg_sentiment = new_df["sentiment"].mean() #calculate mean
conn.execute("INSERT INTO trend VALUES (?, ?)", (current, avg_sentiment)) #store into the database
conn.commit()
Pseudo-code for extracting the top 100 unigrams and bigrams from a tweets corpus
import re
import nltk #requires nltk.download('stopwords') on first use
from collections import Counter

#loading a tweets corpus
with open("/path/to/the/tweets/corpus", "r", encoding="UTF-8") as myfile:
    data = myfile.read().replace('\n', ' ')

#preprocess the data (regular expression find-and-replace; this pattern is only an example)
data = re.sub(r"http\S+|@\w+|[^a-z\s]", " ", data.lower())
data = data.split()

#removing stopwords from the corpus
stopwords = set(nltk.corpus.stopwords.words('english'))
clean_data = [w for w in data if w not in stopwords]

#extracting top 100 n-grams
unigram = Counter(clean_data)
unigram_top = unigram.most_common(100)
bigram = Counter(zip(clean_data, clean_data[1:]))
bigram_top = bigram.most_common(100)
Data were collected before and after percutaneous transluminal angioplasty (PTA) for dialysis patients.
Each sample is named a-b-before.wav or a-b-after.wav, with an associated .txt file, where a is the patient ID and b is the location ID.
The first position was the arteriovenous junction, and the second position was 3 cm from the first position along the vein.
The distances between adjacent positions were also about 3 cm.
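A minimal sketch for parsing this naming scheme (the dataset path and the form of the IDs are assumptions):

import re
from pathlib import Path

#hypothetical parser for the a-b-before.wav / a-b-after.wav naming scheme
pattern = re.compile(r"(?P<patient>\w+)-(?P<location>\w+)-(?P<stage>before|after)\.wav")
for wav in Path("/path/to/dataset").glob("*.wav"):
    m = pattern.fullmatch(wav.name)
    if m:
        print(m["patient"], m["location"], m["stage"])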
Considering the ongoing work in Natural Language Processing (NLP) with the Nepali language, it is evident that the use of Artificial Intelligence and NLP on this Devanagari script still has a long way to go. The Nepali language is complex in itself and requires multi-dimensional approaches for pre-processing the unstructured text and training machines to comprehend the language competently. There was a need for a comprehensive Nepali language text corpus containing texts from domains such as News, Finance, Sports, Entertainment, Health, Literature, and Technology.
Here's a quick way to load the .txt file in your favourite IDE:
filename = 'compiled.txt'
with open(filename, encoding="utf-8") as file:
    text = file.read()
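As a minimal follow-up sketch (an assumption, not an official pipeline for this corpus), the loaded text can be split into rough sentences on the Devanagari danda "।" and into whitespace-separated tokens:

import re
#split on the Devanagari danda "।" (plus ? and !) for rough sentence boundaries
sentences = [s.strip() for s in re.split(r"[।?!]", text) if s.strip()]
tokens = text.split() #plain whitespace tokenisation; Nepali needs richer pre-processing in practice
print(len(sentences), len(tokens))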
The data are used to identify the kinematic parameter deviations of a Cartesian robot, to train a Gaussian Process Regression (GPR) model, and to record the compensation results of four calibration methods under different loading conditions.
Compensation results file: It records the compensation results at the 8 test points when using the four calibration methods under different loading conditions; see Figure 16 of the paper.
HCT+BD+GPR_training file: These data record 320 groups of end-effector position points after compensation with the HCT+BD model. Subtracting the designated positions from these data gives 320 groups of residual errors, which are used to train the GPR model (see the sketch after these file descriptions). The 10-fold cross-validation results of the GPR model for the x and z errors, obtained from these data, are shown in Figure 14 and Figure 15 of the paper.
HCT+GPR_training file: These data record 320 groups of end-effector position points after compensation with the HCT model. Subtracting the designated positions from these data likewise gives 320 groups of residual errors, which are used to train the GPR model.
Identify_kinematic_parameter_deviation file: The nonlinear least-squares method is used to minimize the difference between the amended position and the actual position, which gives the deviations of the kinematic parameters. The identification procedure is shown in Figure 4, and the resulting deviations are listed in Table 2 of the paper.
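A minimal sketch of the residual computation and GPR training described above (the file names, column layout, and kernel choice are assumptions, and scikit-learn stands in for the authors' implementation):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import cross_val_score

designated = np.loadtxt("designated_positions.txt") #320 x 3 designated points (assumed layout)
compensated = np.loadtxt("HCT_BD_compensated.txt")  #320 x 3 points after HCT+BD compensation
residual = compensated - designated                 #320 groups of residual errors

#one GPR model per axis; here the x-axis residual, validated by 10-fold cross validation
gpr = GaussianProcessRegressor(kernel=RBF())
scores = cross_val_score(gpr, designated, residual[:, 0], cv=10)
gpr.fit(designated, residual[:, 0])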
Four groups of wind speed series
Long-term 3DC Dataset, as described in "Virtual Reality to Study the Gap Between Offline and Real-Time EMG-based Gesture Recognition".
See the instructions at: https://github.com/UlysseCoteAllard/LongTermEMG
This data is related to the article "On the Spectral Quality of Time-Resolved CMOS SPAD-Based Raman Spectroscopy with High Fluorescence Backgrounds", which has been submitted to the IEEE Sensors Journal. The folder named "Fluorescence_to_Raman_ratio_(post-it_notes)" contains the data collected in the measurements where the effects of the fluorescence-to-Raman ratio on the spectral quality were studied; see the measurement procedures and results in sections III.B and IV.A of the article, respectively. The folder named "Recording_time_and_excitation_intensity_(oils)" contains the data collected in the measurements where the effects of the recording time and excitation intensity on the spectral quality were studied; see the measurement procedures and results in sections III.C and IV.B of the article, respectively.
The measurement data are stored in text files named "Data.txt". The data files have 8 columns and 256 rows: the columns represent the 8 time bins of the sensor, and the rows represent the 256 spectral columns of the line sensor. The number in each cell is the photon count at a specific time bin and spectral column, i.e. at a specific wavenumber. The text files named "Wavenumber_axis.txt" under the two main data folders contain the wavenumber values for each of the spectral columns of the sensor for the different measurements. The files named "DCR_correction_data.txt" contain the dark count rate (DCR) correction data for the different measurements.
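A minimal loading sketch based on the layout described above (how the DCR correction is applied is an assumption; the element-wise subtraction below is only illustrative):

import numpy as np

data = np.loadtxt("Data.txt")                   #shape (256, 8): spectral columns x time bins
wavenumbers = np.loadtxt("Wavenumber_axis.txt") #wavenumber of each spectral column
dcr = np.loadtxt("DCR_correction_data.txt")     #dark count rate correction data

corrected = data - dcr           #assumption: dark counts are subtracted element-wise
spectrum = corrected.sum(axis=1) #total photon counts per wavenumber across the 8 time bins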
This dataset was used in the article "Dias-Audibert FL, Navarro LC, de Oliveira DN, Delafiori J, Melo CFOR, Guerreiro TM, Rosa FT, Petenuci DL, Watanabe MAE, Velloso LA, Rocha AR and Catharino RR (2020) Combining Machine Learning and Metabolomics to Identify Weight Gain Biomarkers. Front. Bioeng. Biotechnol. 8:6. doi: 10.3389/fbioe.2020.00006", open access available at: https://doi.org/10.3389/fbioe.2020.00006.
The WGMSML-Data folder contains the mass spectra input data for the MATLAB scripts, which are in the WGMSML-MATLAB-SourceCode folder. WGMSML-ExecutionLogsAndPlots contains the logs and plots generated by executing the MATLAB code over the input data. The main scripts are enumerated in their order of execution.