Domain Adaptation Benchmark Datasets

Citation Author(s): Kaizhong Jin
Submitted by: Kaizhong Jin
Last updated: Wed, 02/21/2024 - 05:04
DOI: 10.21227/fkye-zp36

Categories:

Machine Learning

Keywords:

differential privacy

domain adaptation

Abstract

We evaluate our approach on three popular domain adaptation benchmark datasets. The first is the Office-Caltech10 dataset, which contains images of 10 object categories from an office environment (e.g., keyboard, laptop) drawn from 4 sources: Amazon, Caltech256, DSLR, and Webcam. We encode the images from each source as 4096-dimensional feature vectors. Treating each source as a domain gives four domains and therefore 12 domain adaptation tasks, one per ordered source-target pair. The second is the Office-Home dataset, which contains images of 65 object categories typically found in office and home settings. The dataset includes 4 domains: Art, Clipart, Realworld, and Product. We again encode the images from each domain as 4096-dimensional feature vectors. Similarly, twelve domain adaptation tasks are formed by taking one domain as the source and another as the target. The third is the Amazon review dataset, which is used for sentiment analysis of text. It contains Amazon reviews in 4 domains: Book, DVD, Kitchen, and Electronics, yielding 12 source-target domain adaptation tasks.
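The count of 12 tasks per benchmark follows from taking every ordered (source, target) pair of distinct domains: 4 × 3 = 12. The sketch below only enumerates those pairs from the domain names given in the abstract. The dictionary `BENCHMARKS` and the helper `adaptation_tasks` are illustrative names, not part of the dataset or of any released code.

```python
from itertools import permutations

# Domain names as listed in the abstract; the dictionary itself is illustrative.
BENCHMARKS = {
    "Office-Caltech10": ["Amazon", "Caltech256", "DSLR", "Webcam"],
    "Office-Home": ["Art", "Clipart", "Realworld", "Product"],
    "Amazon review": ["Book", "DVD", "Kitchen", "Electronics"],
}


def adaptation_tasks(domains):
    """Return every ordered (source, target) pair with source != target."""
    return list(permutations(domains, 2))


if __name__ == "__main__":
    for name, domains in BENCHMARKS.items():
        tasks = adaptation_tasks(domains)
        print(f"{name}: {len(tasks)} tasks")
        for source, target in tasks:
            print(f"  {source} -> {target}")
```

Order matters here: Amazon -> Webcam and Webcam -> Amazon are counted as separate tasks, which is why four domains give 12 tasks rather than 6.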

Instructions:

The Office-Caltech10 dataset contains 4 domains: Amazon, Caltech256, DSLR, and Webcam, with 958, 1,123, 157, and 295 image samples, respectively. Amazon contains online retail images from Amazon.com. Caltech256 is a collection of object images from the California Institute of Technology. DSLR contains images captured with a digital SLR camera. Webcam contains low-resolution images taken with a web camera.

The Office-Home dataset contains 4 domains: Art, Clipart, Realworld, and Product, with 4000 samples per domain. Art contains images of artistic creations, such as paintings, sketches, and artistic depictions. Clipart consists of clipart images. Realworld and Product consist of regular images captured with a camera, with and without background, respectively.

The Amazon review dataset contains Amazon reviews in 4 domains: Book, DVD, Kitchen, and Electronics. Each domain has 2000 positive and 2000 negative reviews. Book contains opinions on the content, writing, and reader satisfaction of books. DVD covers movie and TV show reviews. Kitchen includes reviews of kitchen appliances and tools. Electronics comprises reviews of devices such as smartphones and laptops.