Standard Dataset
Beta Lactamase Sequences
- Citation Author(s):
- Submitted by: Muhammad Ashraf
- Last updated: Mon, 10/21/2019 - 09:55
- DOI: 10.21227/jjwj-py59
- License:
- Categories:
Abstract
UniProt, a well-known publicly available database, was the main source for collecting beta-lactamase and non-beta-lactamase protein sequences. To obtain relevant positive sequences, ‘beta-lactamase’ was used as a search keyword. Ambiguous sequences were excluded: only sequences not annotated with dubious words such as ‘potential’, ‘by similarity’, or ‘probable’ were selected. Each sequence also had to be complete, and therefore could not be annotated with words such as ‘fragment’. A total of 2172 β-lactamase sequences were obtained after selecting representative sequences using CD-HIT. Similarly, a set of 3463 non-beta-lactamase sequences was collected from UniProt.
1. UniProt (https://www.uniprot.org/), a well-known publicly available database, was the main source for collecting beta-lactamase and non-beta-lactamase protein sequences.
2. To obtain relevant positive sequences, ‘beta-lactamase’ was used as a search keyword.
3. Ambiguous sequences were excluded: only sequences not annotated with dubious words such as ‘potential’, ‘by similarity’, or ‘probable’ were selected.
4. Each sequence also had to be complete, and therefore could not be annotated with words such as ‘fragment’.
5. A total of 2172 β-lactamase sequences were obtained after selecting representative sequences using CD-HIT.
6. Similarly, a set of 3463 non-beta-lactamase sequences was collected from UniProt.
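The annotation-based exclusion described in the steps above could be sketched roughly as follows. This is only an illustration, not the script used to build the dataset: the record layout (a dict with `id` and `annotation` keys), the function names, and the example records are all hypothetical, and the matching rule (case-insensitive whole-word search) is an assumption. The CD-HIT step is not covered, since the description does not state the settings that were used.

```python
import re

# Qualifiers named in the description as marking uncertain annotations.
AMBIGUOUS_TERMS = ("potential", "by similarity", "probable")


def is_clean(annotation: str) -> bool:
    """Return True if the annotation text carries none of the excluded words."""
    text = annotation.lower()
    for term in AMBIGUOUS_TERMS:
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            return False
    # Incomplete sequences are flagged as "fragment" or "fragments".
    return re.search(r"\bfragments?\b", text) is None


def filter_records(records):
    """Keep records whose (hypothetical) 'annotation' field passes is_clean."""
    return [rec for rec in records if is_clean(rec["annotation"])]


# Made-up example records, for illustration only.
example = [
    {"id": "X1", "annotation": "Beta-lactamase"},
    {"id": "X2", "annotation": "Probable beta-lactamase"},
    {"id": "X3", "annotation": "Beta-lactamase (Fragment)"},
    {"id": "X4", "annotation": "Beta-lactamase, by similarity"},
]
kept = filter_records(example)
print([rec["id"] for rec in kept])  # prints ['X1']
```

Only the first record survives, because the other three carry one of the excluded qualifiers.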
Dataset Files
- Beta Lactamase Sequences.csv (2.74 MB)
- S1 File.csv (2.74 MB)
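Since the column layout of the CSV files is not documented here, a cautious first step after downloading is to look at the first row and count the remaining rows before assuming any structure. The sketch below does that with Python's standard `csv` module; the function name `summarize_csv` and the stand-in data (including its column names) are invented for illustration.

```python
import csv
import io


def summarize_csv(handle):
    """Return (first_row, number_of_remaining_rows) for an open CSV file."""
    reader = csv.reader(handle)
    first_row = next(reader, None)  # None if the file is empty
    remaining = sum(1 for _ in reader)
    return first_row, remaining


# Stand-in data so the sketch runs without the download; columns are invented.
demo = io.StringIO("sequence,label\nMKT,1\nMSA,0\n")
first_row, remaining = summarize_csv(demo)
print(first_row, remaining)  # prints ['sequence', 'label'] 2
```

For a downloaded file, the same function can be called on a handle opened with `open("Beta Lactamase Sequences.csv", newline="")`. If a file holds both classes, the row count might be compared against 2172 + 3463 = 5635 sequences, though whether each file is organized that way is an assumption to check.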