An ontology for the multidisciplinary phenomenon of creating web applications over encrypted data

Instructions: 

The ontology OWL file was generated using the Protégé tool and can be imported for future modifications.


 

 

Instructions: 

The .zip file contains 6 folders when unzipped. We provide the details of each folder below.

 

“Proteins” folder: Contains 20 protein targets organized into two folders (Benchmark and CASP) depending on the family each target belongs to. Data for each protein is provided in a subfolder named with its id. Each such subfolder contains the following 4 files.

  1. A .fasta file containing the amino-acid sequence of the protein.

  2. A .pdb file containing the native tertiary structure coordinates. The detailed format of a .pdb file is documented at http://www.wwpdb.org/documentation/file-format

  3. A .frag3 file containing the fragments of length 3 for the protein sequence generated from http://old.robetta.org/

  4. A .frag9 file containing the fragments of length 9 for the protein sequence generated from http://old.robetta.org/

 

“Generation” folder: Contains the generated ensembles for the protein targets in 20 subfolders, one for each target, named with their ids. Each subfolder contains 5 files, each containing the generated ensemble for one run. Each such file contains 14 columns, and each row represents one generated structure. The first column provides the Rosetta score4 energy, the second column provides the lRMSD to the native structure, and each of the remaining 12 columns provides one USR feature for the structure.
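As a minimal sketch, an ensemble file with this 14-column layout can be loaded with NumPy; the whitespace-delimited format and the sample values below are assumptions for illustration only.

```python
import io
import numpy as np

# One illustrative row in the 14-column layout described above
# (energy, lRMSD, then 12 USR features); values are made up.
sample = "-120.5 4.32 " + " ".join(["0.1"] * 12) + "\n"

# For a real file, pass its path instead of the StringIO object.
data = np.loadtxt(io.StringIO(sample), ndmin=2)

energies = data[:, 0]   # column 1: Rosetta score4 energy
lrmsds = data[:, 1]     # column 2: lRMSD to the native structure
usr = data[:, 2:]       # columns 3-14: USR features
```

The same call with `data[:, 0]` and `data[:, 1]` also covers the 2-column files in the “Reduced” and “Truncation” folders.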

 

“Reduced” folder: Contains the reduced ensembles for each clustering technique in separate folders. Each such folder contains 20 subfolders, one for each target, named with their ids. Each such subfolder contains 5 files, each containing the reduced ensemble for one run. Each such file contains 2 columns and each row represents one structure in the reduced ensemble. The first column provides the Rosetta score4 energy and the second column provides the lRMSD to the native structure.

 

“Truncation” folder: Contains the reduced ensembles via truncation for the protein targets in 20 subfolders, one for each target, named with their ids. Each such subfolder contains 5 files, each containing the reduced ensemble for one run. Each such file contains 2 columns and each row represents one structure in the reduced ensemble. The first column provides the Rosetta score4 energy and the second column provides the lRMSD to the native structure.

 

“Ks” folder: Contains 4 separate files, one for each clustering technique, containing the number of clusters for each run of each protein target. These files can be used to plot the distributions of the number of clusters.

 

“Bars” folder: Contains 3 separate subfolders with the information needed to plot the bar charts for the minimum, average, and standard deviation of lRMSDs to the native structure for the CASP targets. Each subfolder contains 10 files, one for each target. Each file contains 6 rows that provide the lRMSD value for the original ensemble and the reduced ensembles from hierarchical clustering, k-means clustering, GMM clustering, gmx-cluster clustering, and truncation, respectively.
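A sketch of plotting one such 6-row file as a bar chart with matplotlib; the lRMSD values below are illustrative stand-ins, not data from the archive.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Row order in each Bars file, as described above.
labels = ["original", "hierarchical", "k-means", "GMM",
          "gmx-cluster", "truncation"]
# Illustrative lRMSD values; replace with the 6 rows of a real file.
lrmsd = [3.1, 2.8, 2.9, 2.7, 3.0, 3.3]

fig, ax = plt.subplots()
ax.bar(labels, lrmsd)
ax.set_ylabel("lRMSD to native structure")
plt.xticks(rotation=45, ha="right")
fig.tight_layout()
fig.savefig("bars.png")
```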


Nodes represent personality facets (a description of each facet is provided in Table 3), green lines represent positive connections and red lines represent negative connections. Thicker lines represent stronger connections and thinner lines represent weaker connections. The node placement of all graphs is based on the adaptive LASSO network to facilitate comparison. The width and color are scaled to the strongest edge and are not comparable between graphs; edge strengths in the correlation network are generally stronger than edge strengths in the partial correlation network.


We develop a general group-based continuous-time Markov epidemic model (GgroupEM) framework for any compartmental epidemic model (e.g., susceptible-infected-susceptible, susceptible-infected-recovered, susceptible-exposed-infected-recovered). Here, a group consists of a collection of individual nodes of a network. This model can be used to understand the critical dynamic characteristics of a stochastic epidemic spreading over large complex networks while being informative about the state of groups.


This file contains the code for the IEEE paper.


Evidence-Based Medicine (EBM) aims to apply the best available evidence gained from scientific methods to clinical decision making. A generally accepted criterion to formulate evidence is to use the PICO framework, where PICO stands for Problem/Population, Intervention, Comparison, and Outcome. Automatic extraction of PICO-related sentences from medical literature is crucial to the success of many EBM applications. In this work, we present our Aceso system, which automatically generates PICO-based evidence summaries from medical literature.


The age of Artificial Intelligence (AI) is coming. Since Natural Language Processing (NLP) is a core AI technology for communication between humans and devices, it is vital to understand its technological trends. Early research on NLP focused on syntactic processing, such as information extraction and subject modeling, but later developed into semantic-oriented analysis. To analyze technological trends in NLP, especially semantic analysis, patent data, which contains objective and extensive information, is analyzed.


The dataset consists of system activity captured by Procmon on Windows, including traces of the running malware WannaPeace and Infostealer.Dexter.


Considering the ongoing work in Natural Language Processing (NLP) with the Nepali language, it is evident that the use of Artificial Intelligence and NLP on this Devanagari script still has a long way to go. The Nepali language is complex in itself and requires multi-dimensional approaches for pre-processing the unstructured text and training machines to comprehend the language competently. There was a need for a comprehensive Nepali-language text corpus containing texts from domains such as News, Finance, Sports, Entertainment, Health, Literature, and Technology.

Instructions: 

Here's a quick way to load the .txt file in your favourite IDE.

filename = 'compiled.txt'

# Use a context manager so the file is closed automatically.
with open(filename, encoding="utf-8") as file:
    text = file.read()


Dataset used in the article entitled 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. These datasets can be used to test several characteristics in machine learning and data processing algorithms.

Instructions: 

The dataset has two dimensions, one for the class and one for the features. The variations are specified on top of a default dataset, which has the following characteristics:

  • 1,000 entries
  • No outliers
  • No missing values
  • Two dimensions (one relevant feature and one class, no bad features)
  • 80% class separation
  • Two classes
  • No class imbalance

 

Thus, six types of datasets were generated, one for each of the six characteristics in the default dataset. In each type of dataset, the system generated four datasets with slight differences in the associated characteristic. For instance, to vary the effect of the number of outliers, the system created datasets with 10%, 20%, 30%, and 40% of outliers, without changing the other characteristics. The variations of the characteristics are the following:

 

  • Amount of outliers: [10%, 20%, 30%, 40%, 50%]
  • Class separation: [100%, 90%, 80%, 70%, 60%]
  • Amount of missing values: [10%, 20%, 30%, 40%, 50%]
  • Class imbalance: [50%-50%, 40%-60%, 30%-70%, 20%-80%, 10%-90%]
  • Bad features: [1-1, 1-3, 1-5, 1-7, 1-9]
  • Amount of classes: [2, 12, 22, 32, 42]
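A rough analogue of the default dataset can be produced with scikit-learn's `make_classification`; this is not the authors' generator, only a sketch of the same characteristics (1,000 entries, two balanced classes, one relevant feature, 80% class separation mapped to `class_sep=0.8` as an assumption).

```python
from sklearn.datasets import make_classification

# Rough analogue of the default dataset (not the authors' generator):
# 1,000 entries, 2 balanced classes, one informative feature, no
# redundant/bad features, moderate class separation.
X, y = make_classification(
    n_samples=1000,
    n_features=1,
    n_informative=1,
    n_redundant=0,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=1,
    weights=[0.5, 0.5],
    class_sep=0.8,
    random_state=0,
)
```

Outliers, missing values, and the other listed variations would then be injected on top of this base, mirroring how the generator varies one characteristic at a time.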
