To give machine learning and data science practitioners a larger dataset for model training, the well-known Palmer Penguins dataset has been expanded from its original 344 rows to 100,000 rows. The additional rows are synthetic, generated with an adversarial random forest technique intended to preserve the key patterns and features of the original data. The method achieved a reported accuracy of 88%, which suggests the expanded dataset remains realistic enough for classification tasks.
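One simple way to check that synthetic rows preserve a key pattern of the original data is to compare class proportions between the two files. The following is a minimal sketch, not the method used to build or validate this dataset. It assumes both files are CSVs with a label column named `species`; that column name and the file names in the usage comment are hypothetical.

```python
import csv
from collections import Counter


def class_proportions(csv_file, label_column="species"):
    """Return the share of rows per class label in an open CSV file object.

    `label_column` is an assumed column name; rows with an empty label are skipped.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter(row[label_column] for row in reader if row.get(label_column))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def max_proportion_gap(original, synthetic):
    """Largest absolute difference in class share between two proportion dicts."""
    labels = set(original) | set(synthetic)
    return max(
        (abs(original.get(label, 0.0) - synthetic.get(label, 0.0)) for label in labels),
        default=0.0,
    )


# Usage (file names are placeholders, not the actual dataset files):
# with open("penguins_original.csv", newline="") as f_orig, \
#      open("penguins_expanded.csv", newline="") as f_syn:
#     gap = max_proportion_gap(class_proportions(f_orig), class_proportions(f_syn))
#     print(f"largest class-share difference: {gap:.3f}")
```

A small gap only shows that the label distribution is similar; it says nothing about whether the numeric features are realistic, so it works best as a first sanity check before training.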
This dataset is a network representation of authors linked to the publications they have authored or co-authored, collected from OpenAlex.org using the free, open-source tool available at https://openalex4nodexl.netlify.app/. It is provided as a CSV flat file, formatted for use with NodeXL, a popular tool for social network analysis.
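Because the file links authors to publications, a co-authorship network can be derived by pairing authors who share a publication. The sketch below illustrates that projection using only the Python standard library. It assumes an edge-list layout with one author-publication pair per row, and the default column names `Vertex 1` (author) and `Vertex 2` (publication) are assumptions; check the actual CSV header and adjust them as needed.

```python
import csv
from collections import defaultdict
from itertools import combinations


def coauthor_edges(csv_file, author_col="Vertex 1", work_col="Vertex 2"):
    """Project an author-publication edge list onto weighted co-author pairs.

    Returns a dict mapping (author_a, author_b), sorted alphabetically,
    to the number of publications the two authors share.
    Column names are assumed, not taken from the dataset documentation.
    """
    # Group the distinct authors of each publication.
    authors_by_work = defaultdict(set)
    for row in csv.DictReader(csv_file):
        authors_by_work[row[work_col]].add(row[author_col])

    # Every pair of authors on the same publication gets one unit of weight.
    weights = defaultdict(int)
    for authors in authors_by_work.values():
        for a, b in combinations(sorted(authors), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Usage (the file name is a placeholder):
# with open("openalex_authors.csv", newline="") as f:
#     for (a, b), shared in coauthor_edges(f).items():
#         print(a, b, shared)
```

Sorting each author pair keeps (A, B) and (B, A) from being counted as separate edges, which matches an undirected co-authorship graph.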