Standard Dataset

Beta Lactamase Sequences

Citation Author(s):
Muhammad Adeel Ashraf (University of Management and Technology)
Submitted by:
Muhammad Ashraf
DOI:
10.21227/jjwj-py59

Abstract

The well-known, publicly available UniProt database was the main source for collecting beta-lactamase and non-beta-lactamase protein sequences. To obtain relevant positive sequences, ‘beta-lactamase’ was used as the search keyword. The dataset was curated by excluding ambiguous sequences: only sequences not annotated with uncertain terms such as ‘potential’, ‘by similarity’, or ‘probable’ were selected. In addition, each sequence had to be complete, so sequences annotated with terms such as ‘fragment’ were excluded. After selecting representative sequences with CD-HIT, a total of 2172 β-lactamase sequences were obtained. Non-beta-lactamase sequences were also collected from UniProt, giving a set of 3463 non-beta-lactamase sequences.
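The annotation-based exclusion described above can be sketched as a simple keyword filter. This is a minimal illustration, not the authors' code: it assumes each record has been reduced to a hypothetical accession-to-annotation mapping (the `seq1`…`seq4` entries are made up) and uses case-insensitive substring matching, which the page does not specify.

```python
# Terms the dataset description lists as marking an uncertain annotation.
AMBIGUOUS_TERMS = ("potential", "by similarity", "probable")
# Term marking an incomplete (non-full-length) sequence.
INCOMPLETE_TERMS = ("fragment",)


def is_kept(annotation: str) -> bool:
    """Return True if the annotation contains none of the excluded terms.

    Assumption: case-insensitive substring matching, so 'Probable' and
    '(Fragment)' are both caught.
    """
    text = annotation.lower()
    return not any(term in text for term in AMBIGUOUS_TERMS + INCOMPLETE_TERMS)


def filter_records(records: dict) -> dict:
    """Keep only records whose annotation passes is_kept()."""
    return {acc: ann for acc, ann in records.items() if is_kept(ann)}


# Hypothetical example records (not taken from UniProt).
examples = {
    "seq1": "Beta-lactamase TEM",
    "seq2": "Probable beta-lactamase",
    "seq3": "Beta-lactamase (Fragment)",
    "seq4": "Beta-lactamase, inferred by similarity",
}

print(sorted(filter_records(examples)))  # ['seq1']
```

Only `seq1` survives: `seq2` is flagged as probable, `seq3` as a fragment, and `seq4` as annotated by similarity.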

Instructions:

1. The well-known, publicly available UniProt database (https://www.uniprot.org/) was the main source for collecting beta-lactamase and non-beta-lactamase protein sequences.

2. To obtain relevant positive sequences, ‘beta-lactamase’ was used as the search keyword.

3. The dataset was curated by excluding ambiguous sequences: only sequences not annotated with uncertain terms such as ‘potential’, ‘by similarity’, or ‘probable’ were selected.

4. In addition, each sequence had to be complete, so sequences annotated with terms such as ‘fragment’ were excluded.

5. After selecting representative sequences with CD-HIT, a total of 2172 β-lactamase sequences were obtained.

6. Non-beta-lactamase sequences were also collected from UniProt, giving a set of 3463 non-beta-lactamase sequences.
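CD-HIT is typically run from the command line to select representative sequences, e.g. `cd-hit -i positives.fasta -o positives_nr.fasta -c 0.9`. The file names and the 0.9 identity cutoff are assumptions, since the page does not report the parameters used. The Python sketch below then combines the two resulting FASTA files into one labeled set. The labeling scheme (1 = beta-lactamase, 0 = non-beta-lactamase) and the function names are hypothetical, not part of the published dataset.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Multi-line sequences are joined; blank lines are ignored.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_dataset(positive_fasta: str,
                  negative_fasta: str) -> List[Tuple[str, str, int]]:
    """Label beta-lactamases 1 and non-beta-lactamases 0 (assumed scheme)."""
    data = [(h, s, 1) for h, s in parse_fasta(positive_fasta)]
    data += [(h, s, 0) for h, s in parse_fasta(negative_fasta)]
    return data
```

With the released files read into strings, `sum(label for _, _, label in data)` would be expected to equal 2172 and `len(data)` to equal 5635 (2172 + 3463), if the files match the counts stated above.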