Dataset for An evaluation of X.509 certificate revocation and related privacy issues in the Web PKI ecosystem
Analyzing the status of web domain certificates in the Web PKI requires a relevant set of X.509 certificates. This dataset was built from two lists of the most visited websites: the Alexa Top 1 Million (Top1M) list (as available on 26 August 2021) and the Majestic Top 1 Million list (as available on 21 November 2022). We collected the X.509 certificates for the web domains in the Alexa Top1M list (file "ListAlexaFinal.txt") with two dedicated scripts included in the dataset, "ScriptCollectCertificates.sh" and "StartScript_CollectCertificates.sh". The downloaded certificates are stored in the directory "CollectedCertificatesAlexaList2021". Python scripts for analyzing the certificates' content are in the directory "AnalysisScripts". Website support for the OCSP stapling mechanism can be checked with the scripts "StartScript_OCSPstaplingcheck.sh" and "ScriptOCSPstaplingcheck.sh" in the dataset. The certificates of the domains in the Majestic Top1M list (file "majestic_million.txt" in the "MajesticTop1MList2022_Script" directory) were analyzed for OCSP stapling support and the SCT extension with the script "tls_client.py".
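For readers who want to reproduce the two basic steps on a single domain, the following is a minimal Python sketch, not the code of the dataset's own scripts (which may work differently). It assumes that the `openssl` command-line tool is installed, that certificates should be collected even when they fail validation, and that one PEM file per domain is a suitable storage layout. The function names and the output file naming are hypothetical.

```python
import socket
import ssl
import subprocess
from pathlib import Path


def certificate_path(out_dir: str, domain: str) -> Path:
    """Hypothetical layout: one PEM file per domain inside out_dir."""
    return Path(out_dir) / f"{domain}.pem"


def fetch_certificate_pem(domain: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the server's leaf certificate as PEM text.

    Assumption: validation is disabled so that expired, self-signed, or
    otherwise invalid certificates are collected as well.
    """
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((domain, port), timeout=timeout) as sock:
        # server_hostname sends SNI so the server picks the right certificate
        with context.wrap_socket(sock, server_hostname=domain) as tls:
            der = tls.getpeercert(binary_form=True)
    if der is None:
        raise ValueError(f"{domain} did not present a certificate")
    return ssl.DER_cert_to_PEM_cert(der)


def has_stapled_ocsp_response(s_client_output: str) -> bool:
    """Interpret the text printed by `openssl s_client -status`.

    A stapled response is reported with an "OCSP Response Status" line;
    otherwise openssl prints "OCSP response: no response sent".
    """
    return "OCSP Response Status: successful" in s_client_output


def query_ocsp_stapling(domain: str, port: int = 443, timeout: float = 15.0) -> bool:
    """Run openssl against the domain and report whether a response was stapled."""
    result = subprocess.run(
        ["openssl", "s_client", "-connect", f"{domain}:{port}",
         "-servername", domain, "-status"],
        input="",  # empty stdin makes s_client exit after the handshake
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return has_stapled_ocsp_response(result.stdout)
```

A typical use would be to call `fetch_certificate_pem("example.com")`, write the result to `certificate_path("certs", "example.com")`, and then call `query_ocsp_stapling("example.com")`; both network calls can raise exceptions (timeouts, refused connections) that a bulk collection run would need to catch per domain.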
See the short description in the Abstract and the Attachment.