Duplicate Detection

Duplicate articles are common when screening the scientific literature. Identifying them manually is time-consuming, and becomes more burdensome as more data sources are added.

MLM-AI natively performs duplicate detection using different strategies, simplifying screening workflows and removing manual effort.

This page describes how MLM-AI performs duplicate detection, and how users can enable automated screening of duplicates in their results.

Design Principles

The MLM-AI duplicate engine is designed for high precision: to minimize the risk of an erroneous duplicate match, the engine only marks articles as duplicates with high confidence, employing multiple verification approaches.

Because the engine emphasizes safe duplicate detections, certain duplicate articles may not be flagged in all cases, although such instances are expected to be infrequent.

The methods and verification approaches used in MLM-AI are described in the following sections.

How It Works: Duplicate Detection in the MLM-AI Database

MLM-AI identifies duplicate articles according to these criteria:

  • Duplicate articles are only detected for results from the same Monitor.

  • To be marked as a duplicate, the article must:

    • Have the same ID and the same source database as another article, or

    • Have the same Digital Object Identifier (DOI), or

    • Have similar abstract content (with high confidence)

  • Candidate duplicate articles must also pass an article title similarity check
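The criteria above can be read as a small decision rule. The following Python sketch is illustrative only and is not the MLM-AI implementation: the `Article` fields, the `duplicate_reason` function and the reason labels are hypothetical names, and the two similarity checks are passed in as plain booleans so the rule itself stays visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Article:
    # Hypothetical fields, chosen for illustration only.
    monitor_id: str
    source_db: str
    source_id: str
    doi: Optional[str]


def duplicate_reason(
    a: Article,
    b: Article,
    abstracts_similar: bool,
    titles_similar: bool,
) -> Optional[str]:
    """Return why `a` and `b` count as duplicates, or None if they do not."""
    # Duplicates are only detected within the same Monitor.
    if a.monitor_id != b.monitor_id:
        return None

    # At least one of the three matching criteria must hold.
    if a.source_db == b.source_db and a.source_id == b.source_id:
        reason = "same_id"
    elif a.doi is not None and a.doi == b.doi:
        reason = "same_doi"
    elif abstracts_similar:
        reason = "similar_content"
    else:
        return None

    # Every candidate must also pass the title similarity check.
    return reason if titles_similar else None
```

Keeping the title check as a final gate mirrors the high-precision design: a candidate that matches on ID, DOI or content is still rejected if its title does not agree.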

Same ID and Source Database

In MLM-AI, each article receives a unique identifier generated at the source database (in PubMed, for example, this is the article PMID). Articles with the same ID from the same source database are marked as duplicates.

This method is straightforward and useful when Reviews return results for overlapping date ranges.
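As a rough illustration of the overlapping date range case, the sketch below scans results in retrieval order and links each repeated (source database, ID) pair back to its first occurrence. The function name and the tuple layout are assumptions made for this example, not part of MLM-AI.

```python
from typing import Dict, Iterable, Tuple


def find_same_id_duplicates(results: Iterable[Tuple[str, str]]) -> Dict[int, int]:
    """Map the index of each repeated result to the index of its first occurrence.

    `results` holds (source_db, source_id) pairs in retrieval order.
    """
    first_seen: Dict[Tuple[str, str], int] = {}
    duplicates: Dict[int, int] = {}
    for index, key in enumerate(results):
        if key in first_seen:
            duplicates[index] = first_seen[key]
        else:
            first_seen[key] = index
    return duplicates


# Two Reviews with overlapping date ranges both return PubMed article 111.
results = [("pubmed", "111"), ("pubmed", "222"), ("pubmed", "111"), ("embase", "111")]
duplicates = find_same_id_duplicates(results)
```

The key includes the source database because the same numeric ID in two different databases does not necessarily refer to the same article.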

Same Digital Object Identifier

This approach relies on the Digital Object Identifier (DOI) of the article. The DOI is a unique ID assigned to each publication. Hence, if the same article appears in different databases, it keeps the same DOI.

When an article DOI is available, MLM-AI will present it in the Details tab, linking the article directly to its authoritative source.

If the source journal has published valid DOI information, MLM-AI can use it to detect duplicates.
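Different databases may write the same DOI in different forms (as a URL, with a `doi:` prefix, or in different letter case), so comparing DOIs usually requires normalizing them first. The sketch below shows one way to do that; it is an assumption about how such a comparison could work, not a description of MLM-AI internals. The `normalize_doi` name and the validity pattern are chosen for this example. DOIs are case-insensitive, which is why lowercasing is safe.

```python
import re
from typing import Optional

# Common ways a DOI is prefixed in bibliographic records.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)

# Assumed shape of a valid DOI: "10.", a numeric registrant code, "/", a suffix.
_DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Return a lowercase, prefix-free DOI, or None if the value is not usable."""
    if raw is None:
        return None
    doi = _DOI_PREFIX.sub("", raw.strip()).lower()
    return doi if _DOI_SHAPE.match(doi) else None


def same_doi(a: Optional[str], b: Optional[str]) -> bool:
    """True only when both values normalize to the same valid DOI."""
    na, nb = normalize_doi(a), normalize_doi(b)
    return na is not None and na == nb
```

Returning `None` for missing or malformed values means that two articles without valid DOI information are never matched by this rule.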

Similar Content

Finally, MLM-AI can also detect duplicates using content similarity. This happens in two stages. The first stage selects candidates; the second is the title check described under Title Match.

  • Select articles with similar content from the MLM-AI database, based on title and abstract

    • Only articles with a valid abstract are eligible for comparison

    • Duplicate abstracts do not need to match exactly, but must be highly similar

This option is only available for results obtained from the MLM-AI database; results uploaded from external sources are excluded.
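This page does not state which similarity measure MLM-AI uses. Purely to illustrate the idea of "highly similar but not identical" abstracts, the sketch below uses Jaccard similarity over word trigrams, a common near-duplicate technique swapped in here as a stand-in. The function names and the 0.9 threshold are hypothetical.

```python
import re
from typing import Optional, Set, Tuple

SIMILARITY_THRESHOLD = 0.9  # hypothetical cut-off for "high confidence"


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Split text into overlapping runs of `size` lowercase words."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def abstract_similarity(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """Jaccard similarity of word trigrams, or None if either abstract is missing."""
    if not a or not b:
        return None
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return None  # an abstract without any words is not eligible
    return len(sa & sb) / len(sa | sb)


def abstracts_match(a: Optional[str], b: Optional[str]) -> bool:
    """True only when both abstracts exist and are highly similar."""
    score = abstract_similarity(a, b)
    return score is not None and score >= SIMILARITY_THRESHOLD
```

Because a missing abstract yields `None` instead of a score, articles without a valid abstract are never compared, matching the eligibility rule above.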

Title Match

  • For all types of duplicate, perform a second verification step on the article titles, to prevent detection of false positives
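A title check of this kind could be sketched as follows. This is an assumption for illustration, not the MLM-AI algorithm: titles are normalized (case, punctuation and spacing removed) and compared with Python's standard `difflib.SequenceMatcher`. The `titles_match` name and the 0.9 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher

TITLE_THRESHOLD = 0.9  # hypothetical cut-off


def normalize_title(title: str) -> str:
    """Lowercase the title and drop punctuation and extra whitespace."""
    return " ".join(re.findall(r"\w+", title.lower()))


def titles_match(a: str, b: str, threshold: float = TITLE_THRESHOLD) -> bool:
    """True when the normalized titles are near-identical."""
    na, nb = normalize_title(a), normalize_title(b)
    if not na or not nb:
        return False  # an empty title can never confirm a duplicate
    return SequenceMatcher(None, na, nb).ratio() >= threshold
```

Normalizing first means that differences such as a trailing period or a hyphen versus a space do not cause a genuine duplicate to fail the check, while unrelated titles still fall well below the threshold.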

Enabling Automated Duplicate Detection

MLM-AI can perform automated screening of duplicate articles. This can be enabled when configuring Monitors.

This option is enabled by default. When enabled, duplicate articles are pre-screened and appear in the "Duplicates" tab of the Review results.

Inspecting Duplicate Articles

Any article where a duplicate has been detected will also contain a "Duplicates" tab, displaying the duplicate reason and linking to the duplicate article. This tab is always available for convenient inspection of duplicates, irrespective of whether pre-screening of duplicates is enabled for your Monitor.

Limitations

  • Duplicate detection works only on results from the same Monitor. This is by design, to prevent results from different Monitors from interfering with one another.

  • Duplicate detection by content similarity is only available for results generated from the MLM-AI database. However, duplicate detection by ID and DOI is available for uploaded results.
