The Python Package Index (PyPI) is an invaluable resource that developers use to improve their projects; however, glaring issues in its implementation will hinder development until they are resolved. PyPI is the official third-party software repository for Python, where the majority of open-source Python packages are published. Each package's distribution files (such as wheels, source distributions, and legacy eggs) can be installed with the Python package manager pip, and each package also has queryable metadata containing important package information.

PyPI metadata records key information, such as a package's license, source-code location, and dependencies. Developers working with a given open-source package need this information to assess its security and legal risks. However, because PyPI does not currently provide an effective means of tracking some of this information, developers who wish to use open-source code are forced to accept uncertain risk. In its current state, Python open-source code speeds up project development at the cost of unknown vulnerabilities, yet developers should not have to trade security for efficiency.

This dataset provides some insight into the PyPI ecosystem. It was produced by using the LowEndInsight analyzer to process all packages, trace each one from its module to its source code, and identify potential risks. The dataset is the result of upstream high-performance computing (HPC) processing of raw data generated from PyPI.
This dataset is a product of LowEndInsight; details are available at https://github.com/gtri/lowendinsight
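The license, source-code, and dependency metadata described above can be inspected directly through PyPI's JSON API (`https://pypi.org/pypi/<project>/json`). The sketch below illustrates that lookup. It is not part of LowEndInsight, and its helper names (`fetch_pypi_metadata`, `summarize_risk_fields`) are hypothetical. It reads the `info` fields `license`, `project_urls`, `home_page`, and `requires_dist`, then records whether a license or a source link is absent. Those fields are author-supplied and free-form. The heuristics used here are therefore assumptions for illustration only: they match a "Source"-like label in `project_urls`, fall back to `home_page`, and treat `UNKNOWN` as a missing license.

```python
import json
import urllib.request
from typing import Any, Optional

# PyPI's JSON API endpoint for a project's latest release metadata.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"

# Assumed labels that authors commonly use for a source-code link (heuristic).
SOURCE_LABELS = ("source", "source code", "repository", "code")


def fetch_pypi_metadata(name: str) -> dict[str, Any]:
    """Download the JSON metadata document PyPI publishes for a project."""
    with urllib.request.urlopen(PYPI_JSON_URL.format(name=name)) as resp:
        return json.load(resp)


def _clean_license(value: Optional[str]) -> Optional[str]:
    # Older build tools wrote "UNKNOWN" when no license was declared.
    if not value or value.strip().upper() == "UNKNOWN":
        return None
    return value


def summarize_risk_fields(metadata: dict[str, Any]) -> dict[str, Any]:
    """Extract license, source link, and dependencies, listing missing fields."""
    info = metadata.get("info") or {}
    project_urls = info.get("project_urls") or {}
    # Prefer an explicit source-code label; otherwise fall back to home_page.
    source = next(
        (url for label, url in project_urls.items()
         if label.strip().lower() in SOURCE_LABELS),
        info.get("home_page") or None,
    )
    summary: dict[str, Any] = {
        "license": _clean_license(info.get("license")),
        "source": source,
        # A package with no dependencies is valid, so this is not flagged as missing.
        "dependencies": info.get("requires_dist") or [],
    }
    summary["missing"] = [key for key in ("license", "source") if not summary[key]]
    return summary
```

With network access, `summarize_risk_fields(fetch_pypi_metadata("requests"))` would summarize a live project. Keeping the parsing separate from the download lets the same logic run over metadata that has already been collected in bulk.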