Grand Challenge on Detecting Cheapfakes

1University of Bergen, 2SimulaMet, 3KU Leuven, 4NICT, 5University of Science - VNUHCM (HCMUS)

Deepfakes (left): These are falsified media created using sophisticated AI-based media manipulation tools and techniques. Cheapfakes (right): These are falsified media created with or without contemporary, easily accessible non-AI editing tools. Photoshopping tools can be used to tamper with images. Videos can be sped up or slowed down to change their intent or to misrepresent the person in the video. Re-contextualizing involves associating falsified or unrelated claims with a genuine image to misrepresent events or persons. This challenge focuses on detecting re-contextualized cheapfakes.


Cheapfake is a recently coined term that encompasses non-AI ("cheap") manipulations of multimedia content. Cheapfakes are known to be more prevalent than deepfakes. Cheapfake media can be created using editing software for image/video manipulation, or even without any software, by simply altering the context of an image/video and sharing the media alongside misleading claims. This alteration of context is referred to as out-of-context (OOC) misuse of media. OOC media is much harder to detect than fake media, since the images and videos themselves are not tampered with. In this challenge, we focus on detecting OOC images, and more specifically the misuse of real photographs with conflicting image captions in news items. The aim of this challenge is to develop and benchmark models that can detect whether given samples (a news image and its associated captions) are OOC, based on the recently compiled COSMOS dataset.

Challenge Tasks

An image serves as evidence of the event described by a news caption. If two captions associated with an image are valid, they should describe the same event. If they align with the same object(s) in the image, they should broadly convey the same information. Based on these patterns, we define out-of-context (OOC) use of an image as presenting the image as evidence of untrue and/or unrelated event(s).

Task 1

Every image in the dataset is accompanied by two related captions. If the two captions refer to the same object(s) in the image but are semantically different, i.e., associate the same subject with different events, this indicates out-of-context (OOC) use of the image. However, if the captions correspond to the same event, irrespective of the object(s) they describe, this is defined as not-out-of-context (NOOC) use of the image.

In this task, participants are asked to devise methods for detecting conflicting image-caption triplets, which indicate miscontextualization. More specifically, given (Image, Caption1, Caption2) triplets as input, the proposed model should predict the corresponding class label (OOC or NOOC). The end goal of this task is not to identify which of the two captions is true or false, but to detect the existence of miscontextualization. This kind of setup is considered particularly useful for assisting fact checkers, as highlighting conflicting image-caption triplets allows them to narrow down their search space.
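To make the Task 1 interface concrete, the sketch below labels a triplet with a trivial caption-overlap heuristic. This is an illustrative baseline only, not the intended method: the helper names (`jaccard_similarity`, `predict_ooc`) and the threshold are our own assumptions, and a competitive model would also ground both captions against the image (the image is ignored here).

```python
# Illustrative Task 1 baseline: flag a triplet as OOC when the two captions
# share the same image but diverge in wording. All names and the threshold
# are hypothetical; real submissions use image-text grounding models.

def jaccard_similarity(caption_a: str, caption_b: str) -> float:
    """Word-level Jaccard similarity between two captions."""
    tokens_a = set(caption_a.lower().split())
    tokens_b = set(caption_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

SIM_THRESHOLD = 0.25  # in practice, tuned on a validation split

def predict_ooc(caption1: str, caption2: str) -> str:
    """Label a triplet OOC if the captions diverge; NOOC otherwise.

    Note: a real model would additionally verify that both captions
    refer to the same object(s) in the accompanying image.
    """
    similar = jaccard_similarity(caption1, caption2) >= SIM_THRESHOLD
    return "NOOC" if similar else "OOC"
```

A caption pair such as ("Floods hit the city center", "Concert draws record crowd") would fall below the threshold and be labeled OOC under this heuristic.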

Task 2

A NOOC outcome in Task 1 draws no conclusions about the veracity of the individual statements. Moreover, in a practical scenario, multiple captions might not be available for a given image. The task then boils down to deciding whether a single caption linked to the image is genuine. We argue that this is challenging even for human moderators without prior knowledge of the image's origin. Luo et al. verified this claim in a study where human evaluators were instructed not to use search engines; the average human accuracy was around 65%.

In this task, participants are asked to devise methods for determining whether a given (Image, Caption) pair is genuine (real) or falsely generated (fake). Since our dataset contains only real, non-photoshopped images, it reflects a practical use case while remaining challenging.

Evaluation Criteria

The submitted models will be evaluated for both Effectiveness and Efficiency.

  • Effectiveness: The effectiveness of the submitted models will be measured using three classification metrics: (1) Accuracy, (2) Average Precision, and (3) F1-Score. We ask challenge participants to calculate these three metrics for their model(s) and include the scores in their submission document.
  • Efficiency: In some situations, lightweight models capable of running in real time with minimal resources can be more crucial than detection performance alone. We take this aspect into consideration by introducing an additional evaluation criterion: low latency and low model complexity, which we refer to as efficiency. Submitted models will also be evaluated for efficiency using three metrics: (1) Number of Trainable Parameters, (2) GFLOPs (calculated over 1000 test samples from the public test set), and (3) Model Size (storage size in MB). We ask challenge participants to calculate these three metrics for their model(s) and include the numbers in their submission document.
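The parameter-count and model-size bookkeeping above can be tallied as in the framework-free sketch below. The helper names and the toy layer shapes are our own assumptions; with a real framework (e.g., PyTorch) one would instead sum element counts over trainable parameters, and GFLOPs would come from a profiling tool.

```python
# Hedged sketch of the efficiency metrics (parameter count, model size).
# Layer shapes and helper names are illustrative, not part of the challenge.
from math import prod

def count_parameters(layer_shapes):
    """Total number of trainable parameters, given per-layer weight shapes."""
    return sum(prod(shape) for shape in layer_shapes)

def model_size_mb(num_params, bytes_per_param=4):
    """Approximate storage size in MB, assuming FP32 (4-byte) weights."""
    return num_params * bytes_per_param / (1024 ** 2)

# Example: a toy two-layer MLP with 512x256 and 256x2 weights plus biases.
shapes = [(512, 256), (256,), (256, 2), (2,)]
n_params = count_parameters(shapes)   # 131,842 trainable parameters
size_mb = model_size_mb(n_params)     # about 0.50 MB at FP32
```

Reporting both numbers alongside measured GFLOPs gives the organizers the three efficiency figures requested above.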

Important Dates

Date Activity
January 02, 2024 Dataset release (public training set).
January 22, 2024 Dataset release (public test set).
March 22, 2024 (extended from March 15) Paper and model submission deadline.
March 25, 2024 Model evaluation results announcement.
March 31, 2024 Paper acceptance notification.
April 25, 2024 Camera-ready paper due.

Paper Submission Guidelines

  • Paper Length: Papers must be no longer than 6 pages, including all text, figures, and references.
  • Format: Grand Challenge papers have the same format as regular papers. See the example paper under the General Information section below. However, the review process is single-blind.
  • Submission: Submit the written component via CMT under the appropriate Grand Challenge track. Submit the data component, if any, directly to the Grand Challenge organizers as specified on the appropriate Grand Challenge site.
  • Review: Submissions of both written and data components will be reviewed directly by the Grand Challenge organizers. Accepted submissions (written component only) will be included in the USB Proceedings and the authors will be given the opportunity to present their work at ICMR. “Winning” submissions will be announced by the Grand Challenge organizers at the conference. Submissions may be accompanied by up to 20 MB of supplemental material following the same guidelines as regular and special session papers.
  • Presentation guarantee: As with accepted Regular and Special Session papers, accepted Grand Challenge papers must be registered by the author deadline and presented at the conference; otherwise they will not be included in IEEE Xplore. A Grand Challenge paper is covered by a full-conference registration only.

More details regarding submissions can be found here.

Code Submission Guidelines

  • Participants can submit their solution as a notebook (e.g., Google Colab or Jupyter Notebook) or as a Python executable (.py file).
  • Please make sure that the notebook or the .py file is directly executable by the challenge organizers without significant modifications.
  • Data injection should be possible by changing a single line to update the parameter INPUT_FOLDER. This value should be assumed to be the path to the folder containing the hidden test split file test.json.
  • Participants can upload their project (codes, models) to Google Drive, Microsoft Drive or any other similar online storage. The link must be shared with challenge organizers and also mentioned in the papers.
  • If the participants would like to share their solutions as Docker containers, please follow instructions here.
  • Participants can email the links to their Colab notebooks, code repositories, or projects on Google Drive etc. to the challenge organizers. Please also make sure that the solution is accompanied by a report (Word document or PDF) explaining the proposed solution, the achieved results, and any other details that can help the organizers run and evaluate the solution.


Table 1 below shows the official results for the ACM ICMR 2024 Grand Challenge on Detecting Cheapfakes. The submitted models were evaluated on a private test set specifically collected for this year's challenge.

Table 1 - Performance (Acc) of submitted models on private test set collected for ACM ICMR 2024 Grand Challenge on Detecting Cheapfakes.
ID  | Team            | Task 1 - Public | Task 2 - Public | Task 1 - Private | Task 2 - Private
496 | Vo-Hoang et al. | 86.00%          | -               | 62.50%           | -
497 | Pham et al.     | 88.90%          | -               | 72.20%           | -
498 | Le et al.       | 79.40%          | -               | 52.82%           | 54.84%
500 | Nguyen et al.   | 95.60%          | 93.00%          | 61.69%           | 45.16%
501 | Vu et al.       | 82.90%          | -               | 64.52%           | -
502 | Seo et al.      | 71.90%          | 55.70%          | 64.11%           | 50.81%

Table 2 - Additional performance details of submitted models on both 2023 and 2024 private test sets.
Team            | Task 1 - Public | Task 2 - Public | Task 1 - Private 2023 | Task 2 - Private 2023 | Task 1 - Private 2024 | Task 2 - Private 2024
Vo-Hoang et al. | 86.00%          | -               | 80.90%                | -                     | 62.50%                | -
Pham et al.     | 88.90%          | -               | 87.70%                | -                     | 72.20%                | -
Le et al.       | 79.40%          | -               | 71.36%                | 63.00%                | 52.82%                | 54.84%
Nguyen et al.   | 95.60%          | 93.00%          | 88.64%                | 65.50%                | 61.69%                | 45.16%
Vu et al.       | 82.90%          | -               | 72.27%                | -                     | 64.52%                | -
Seo et al.      | 71.90%          | 55.70%          | 75.90%                | 61.50%                | 64.11%                | 50.81%