ARCHIVE of the intangible heritage of NAVARRE

  • Publication year:
    2022
  • Authors:
    -   Tao, F.
    -   Hao, W.
    -   Yueyan, L.
    -   Sanhong, D.
  • Journal:
    Data Analysis and Knowledge Discovery
  • Volume:
    6
  • Issue:
    2-3
  • Pages:
    329–337
  • ISSN:
    2096-3467
  • Keywords:
    Digital Humanities; Image Classification; Multimodal Classification
  • Abstract:
    [Objective] This paper proposes a new method combining images and textual descriptions, aiming to improve the classification of Intangible Cultural Heritage (ICH) images. [Methods] We built a new model with multimodal fusion, which includes a fine-tuned deep pre-trained model for extracting visual semantic features, a BERT model for extracting textual features, a fusion layer for concatenating the visual and textual features, and an output layer for predicting labels. [Results] We evaluated the proposed model on the national ICH project of New Year Prints, classifying Mianzhu, Taohuawu, Yangjiabu, and Yangliuqing prints. We found that fine-tuning the convolutional layers strengthened the visual semantic features of the ICH images, and the F1 value for classification reached 72.028%. Compared with the baseline models, our method yielded the best results, with an F1 value of 77.574%. [Limitations] The proposed model was only tested on New Year Prints and needs to be extended to more ICH projects in the future. [Conclusions] Adding textual description features can improve the performance of ICH image classification, and fine-tuning the convolutional layers of the deep pre-trained image model can improve the extraction of visual semantic features.
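The [Methods] passage describes a late-fusion architecture: visual features from a fine-tuned pre-trained image model and textual features from BERT are concatenated in a fusion layer, then passed through an output layer that predicts the print class. A minimal sketch of that fusion-and-classification step, assuming hypothetical feature dimensions (2048-d visual, 768-d textual) and randomly initialized weights standing in for the trained model:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_and_classify(visual_feat, text_feat, W, b):
    """Concatenate visual and textual feature vectors (the fusion layer),
    then apply a linear output layer with softmax to get class probabilities."""
    fused = np.concatenate([visual_feat, text_feat])
    return softmax(W @ fused + b)

# Hypothetical dimensions: 2048-d CNN visual features, 768-d BERT text features.
rng = np.random.default_rng(0)
visual = rng.standard_normal(2048)   # stands in for the fine-tuned CNN output
textual = rng.standard_normal(768)   # stands in for the BERT [CLS] embedding
W = rng.standard_normal((4, 2048 + 768)) * 0.01  # untrained output-layer weights
b = np.zeros(4)

classes = ["Mianzhu", "Taohuawu", "Yangjiabu", "Yangliuqing"]
probs = fuse_and_classify(visual, textual, W, b)
print(classes[probs.argmax()], probs)
```

In the paper's actual pipeline both encoders are trained, but the fusion step itself is this simple concatenation followed by a linear classifier over the four New Year Print classes.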