
IEEE DataPort Datasets: Free Access Only for a Limited Time

IEEE DataPort is an online data repository developed by IEEE. This post is a preliminary review of this relatively new offering and highlights potential issues with accessing and depositing datasets.

Overview

As the “About” page describes, IEEE DataPort offers free uploads of any dataset up to 2TB for those who need to retain and manage their valuable research data. The platform was first introduced in 2016. In late 2019, IEEE began to promote the platform heavily and to encourage researchers to deposit their research data through events such as its annual Dataset Upload Contests and various Data Competitions. Since March 2020, over 105 IEEE journals and magazines have been integrated with IEEE DataPort, which gives authors direct access to IEEE DataPort and the opportunity to store data as part of the article submission process. As a result, the platform has grown to over 1,500 datasets and over 670,000 global users.

Access Datasets with Subscription

There are three types of datasets in IEEE DataPort, each with a different access model:

  • Standard (Non-Open Access) Dataset: Must be IEEE Paid DataPort subscriber to access files
  • Open Access Dataset: All IEEE account users may access files
  • Data Competition Dataset: Subject to the Access Policy set by the Competition Organizer
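The three access rules above can be summarized in a few lines of Python. This is only an illustrative sketch of the policy as described in this post, not IEEE's actual implementation; the names `DatasetType`, `can_access_files`, and the `organizer_allows` flag are hypothetical.

```python
from enum import Enum


class DatasetType(Enum):
    """The three dataset types described above (names are hypothetical)."""
    STANDARD = "standard"
    OPEN_ACCESS = "open_access"
    COMPETITION = "competition"


def can_access_files(dataset_type, has_ieee_account, has_paid_subscription,
                     organizer_allows=False):
    """Return True if a user may access a dataset's files.

    Assumption: this mirrors only the rules listed in this post.
    `organizer_allows` stands in for whatever access policy a
    competition organizer has set.
    """
    if dataset_type is DatasetType.OPEN_ACCESS:
        # Open Access: any IEEE account holder may access the files.
        return has_ieee_account
    if dataset_type is DatasetType.STANDARD:
        # Standard: a paid DataPort subscription is required.
        return has_paid_subscription
    # Data Competition: decided by the competition organizer.
    return organizer_allows
```

Under these assumptions, a user with a free IEEE account but no subscription can open an Open Access dataset but not a Standard one.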

As the word “paid” implies, DataPort is designed as a fee-based service. Currently, IEEE is providing free access to all IEEE Society members through at least December 2021. For non-IEEE members, the subscription fee, normally US$40 per month, can be waived by using a coupon code at checkout.

To claim my free subscription to access standard IEEE datasets, I applied the discount code to my order. However, it turned out that billing information (i.e., a credit card) was still required to complete the transaction:


Billing info is required for activating the subscription

Since the website does not mention how long the promotion will last, I was concerned that I would be charged the monthly fee if paid subscriptions suddenly resumed, and therefore I exited the payment system. I guess I’m not the only person who was intimidated by the purchase system and gave up on accessing the datasets.

Submit Datasets

IEEE’s publicity materials emphasize that users can upload datasets to IEEE DataPort for free. Yet we should point out that only Standard Datasets can be uploaded at no charge. The processing fee for publishing an Open Access Dataset is US$1,950. Although users can currently use a coupon code to publish an OA dataset for free, this discount is limited to one per user.

As of 27 May 2021, we can find 507 IEEE DataPort datasets linked to IEEE publications in IEEE Xplore. These datasets could be the underlying data used to support the publications, or they could be the output of the research projects. I browsed some of the datasets at random and have already identified two records with issues. The errors could have been introduced by the dataset authors when they uploaded the files to the platform.

Standard (Non-OA) Dataset for an OA Paper

A recent paper, “Density Sensitive Random Walk for Local Community Detection”, was published with a dataset in IEEE’s flagship OA journal, IEEE Access. Following the dataset’s link, we find that it is a Standard Dataset, meaning that it will not be accessible to users behind the subscription paywall (once it starts), or to users like me who hesitate to give IEEE their credit card information. However, the full text of the article says: “All the datasets and codes used in this paper will be open source later.” It is likely that the author accidentally or unknowingly picked the wrong dataset type when uploading the files to IEEE DataPort, but shouldn’t there be a quality-control mechanism to prevent this from happening?

Dataset Mismatched with Paper

I was quite surprised to see a 1993 paper, “Database Mining: a Performance Perspective”, supplemented with a dataset, “Population Sample Artificially Generated”, as I wouldn’t expect a paper published back then to be accompanied by a dataset.


A recent dataset was associated with a paper published in 1993.

As it turned out, the dataset was generated using the WEKA Agrawal classification generator, which is based on this paper. The author of the dataset may have wished to cite the original paper by using the “Link to Xplore articles” option when filling in the dataset’s metadata. However, IEEE Xplore ended up presenting it as a dataset used in the original paper, which could be very confusing to users.

CC-BY License for Both Standard and OA Datasets

It also came to my attention that when submitting datasets, authors do not have the flexibility to customize or define the terms of use of their data. Rather, they must agree that the content (whether or not the dataset is OA) will be made available subject to the terms of the Creative Commons Attribution (CC-BY) License (see 2.g in Terms of Use). This is another point that researchers should take into account if they are considering IEEE DataPort for data sharing.

Summary & Additional Support

IEEE DataPort does have many benefits. For example, a dataset can be published immediately, publishing is integrated with IEEE’s article submission workflow, each dataset can be up to 2TB in size, and a digital object identifier (DOI) can be generated automatically for each dataset. However, we should be aware that many datasets in IEEE DataPort are underlying data provided by researchers as supplemental information for their IEEE articles, yet they may be set as Standard Datasets (non-OA). For a limited time, access to this collection is free, but eventually the product may be commercialized; as we have seen, the platform already has a functional subscription system. Researchers should therefore be mindful of the potential commercialization of their Standard Datasets.

In addition, researchers need to be very careful in providing dataset metadata, giving detailed documentation and instructions, and choosing an appropriate license type. If you are an HKUST community member considering sharing your research data via a data repository, please consider DataSpace@HKUST, our institutional data repository. Our colleagues in the DataSpace Team (lbds@ust.hk) are happy to create the metadata following standards, tidy up any formatting issues, and publish the dataset for you.

– By Jennifer Gu, Library


published May 27, 2021
last modified March 11, 2022