Duplicate Detection
Duplicate articles are often found when screening the scientific literature. Identifying them manually is time-consuming, and the burden grows as more data sources are added.
MLM-AI natively performs duplicate detection using several strategies, simplifying screening workflows and reducing manual effort.
This page describes how MLM-AI performs duplicate detection, and how users can enable automated screening of duplicates in their results.
The MLM-AI duplicate engine is designed for high precision: to minimize the risk of an erroneous match, it only marks articles as duplicates when it has high confidence, and it applies multiple verification approaches.
Because the engine favors safe detections, some duplicate articles may not be flagged, although such cases are expected to be infrequent.
The methods and verification approaches used in MLM-AI are described in the following sections.
MLM-AI identifies duplicate articles according to these criteria:
Duplicate articles are only detected among results from the same Monitor.
To be marked as a duplicate, an article must:
Have the same ID and the same source database, or
Have the same Digital Object Identifier (DOI), or
Have similar abstract content (with high confidence)
Candidate duplicate articles must also pass an article title similarity check
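The criteria above combine as "any one match criterion, and then the title check". The sketch below shows only that control flow. The record shape, the function names, the order in which criteria are tried, and the simplified stand-in checks are all hypothetical; MLM-AI's actual checks are not published.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record shape, e.g. {"db": ..., "id": ..., "doi": ..., "title": ...}
Article = Dict[str, str]
Check = Callable[[Article, Article], bool]


def duplicate_reason(
    a: Article,
    b: Article,
    criteria: List[Tuple[str, Check]],
    title_check: Check,
) -> Optional[str]:
    """Return the first matching criterion, provided the titles also agree.

    Any single criterion makes (a, b) a candidate pair, but every candidate
    must additionally pass title_check, whatever its reason.
    """
    for reason, check in criteria:
        if check(a, b):
            return reason if title_check(a, b) else None
    return None


# Simplified stand-in checks, for illustration only (order is an assumption).
CRITERIA: List[Tuple[str, Check]] = [
    ("same ID", lambda a, b: bool(a.get("id"))
        and (a.get("db"), a.get("id")) == (b.get("db"), b.get("id"))),
    ("same DOI", lambda a, b: bool(a.get("doi")) and a.get("doi") == b.get("doi")),
    # A content-similarity check on abstracts would be appended here.
]


def exact_title(a: Article, b: Article) -> bool:
    """Stand-in title check: case- and whitespace-insensitive equality."""
    return a.get("title", "").strip().lower() == b.get("title", "").strip().lower()
```

Passing the checks in as functions keeps the "at least one criterion, then always the title check" rule in one place, independent of how each individual check is implemented.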
In MLM-AI, each article carries the unique identifier assigned by its source database (in PubMed, for example, this is the article PMID). Articles with the same ID from the same source database are marked as duplicates.
This method is straightforward and useful when Reviews return results for overlapping date ranges.
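Matching by ID amounts to grouping results on the pair (source database, source ID). The sketch below assumes each result is a tuple of a hypothetical row key, the source database name, and the source ID; the PMIDs in the usage example are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_id_duplicates(
    results: List[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], List[str]]:
    """Group result rows by (source database, source ID).

    Each row is (row_key, source_db, source_id), where row_key is a
    hypothetical identifier for the result row. Only groups containing
    two or more rows, i.e. actual duplicates, are returned.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for row_key, source_db, source_id in results:
        groups[(source_db, source_id)].append(row_key)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Two runs over overlapping date ranges return the same PubMed article twice.
results = [
    ("run1-a", "PubMed", "31234567"),
    ("run1-b", "PubMed", "31234568"),
    ("run2-a", "PubMed", "31234567"),
]
print(find_id_duplicates(results))
```

Keying on the database as well as the ID avoids treating two unrelated articles from different databases as duplicates just because their local IDs happen to coincide.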
When an article DOI is available, MLM-AI presents it in the Details tab, linking the article directly to its authoritative source.
If the source journal has published valid DOI information, MLM-AI can use it to detect duplicates.
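The same DOI is often written in different forms in metadata (as a resolver URL, with a "doi:" prefix, or in different letter case), so a DOI comparison usually normalizes first. The sketch below is an illustrative normalization, not MLM-AI's implementation: the prefix list and the simplified syntax pattern are assumptions. It relies on the fact that DOI names are case-insensitive.

```python
import re
from typing import Optional

# Common ways a DOI is written in metadata; this list is an assumption, not exhaustive.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)
# Simplified DOI syntax: "10.<registrant>/<suffix>".
_DOI_SYNTAX = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Return a canonical lowercase DOI, or None if raw does not look like one."""
    if not raw:
        return None
    doi = _DOI_PREFIX.sub("", raw.strip())
    if not _DOI_SYNTAX.match(doi):
        return None
    # DOI names are case-insensitive, so compare them in lowercase.
    return doi.lower()


def same_doi(a: Optional[str], b: Optional[str]) -> bool:
    """True only when both values are valid DOIs that normalize to the same name."""
    na, nb = normalize_doi(a), normalize_doi(b)
    return na is not None and na == nb
```

Treating a missing or malformed DOI as "no match" rather than as a wildcard keeps this check on the safe side, in line with the engine's emphasis on precision.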
Finally, MLM-AI can also detect duplicates using content similarity. This happens in two stages:
Select articles with similar content from the MLM-AI database, based on title and abstract
Only articles with a valid abstract are eligible for comparison
Duplicate abstracts do not need to match exactly, but must be highly similar
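To illustrate the pairwise comparison in this stage, the sketch below scores two abstracts with Jaccard similarity over word 3-grams (shingles). This is a swapped-in, commonly used near-duplicate technique, not necessarily what MLM-AI uses, and both the minimum length for a "valid" abstract and the threshold are hypothetical values.

```python
import re
from typing import Optional, Set, Tuple

MIN_ABSTRACT_WORDS = 30     # hypothetical cut-off for a "valid" abstract
SIMILARITY_THRESHOLD = 0.8  # hypothetical; not MLM-AI's published value


def is_valid_abstract(text: Optional[str]) -> bool:
    """Only non-empty abstracts of a minimum length are eligible."""
    return bool(text) and len(text.split()) >= MIN_ABSTRACT_WORDS


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Set of consecutive k-word sequences, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def abstract_similarity(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """Jaccard similarity of shingle sets, or None if either abstract is ineligible."""
    if not is_valid_abstract(a) or not is_valid_abstract(b):
        return None
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    if not union:
        return None
    return len(sa & sb) / len(union)


def similar_abstracts(a: Optional[str], b: Optional[str]) -> bool:
    """Highly similar, not necessarily identical, abstracts count as a match."""
    score = abstract_similarity(a, b)
    return score is not None and score >= SIMILARITY_THRESHOLD
```

Shingle overlap tolerates small edits such as a changed word or added punctuation, which matches the requirement that abstracts be highly similar rather than identical.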
Title Match
For all types of duplicate, a second verification step is performed on the article titles to prevent false positives.
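A title verification step of this kind could look like the sketch below: titles are normalized so that case and punctuation differences are ignored, then compared with `difflib.SequenceMatcher` from the Python standard library. The normalization, the 0.9 threshold, and the helper names are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

TITLE_THRESHOLD = 0.9  # hypothetical value for illustration


def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.findall(r"\w+", title.lower()))


def titles_match(a: str, b: str, threshold: float = TITLE_THRESHOLD) -> bool:
    """True when the normalized titles are nearly identical."""
    na, nb = normalize_title(a), normalize_title(b)
    if not na or not nb:
        return False
    return SequenceMatcher(None, na, nb).ratio() >= threshold


def confirm_candidates(
    pairs: List[Tuple[str, str]], titles: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Keep only candidate pairs (from any detection method) whose titles match."""
    return [(x, y) for x, y in pairs if titles_match(titles[x], titles[y])]
```

Applying the same filter to every candidate pair, whether it came from an ID, DOI, or content match, is what makes this a second, independent verification step.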
Pre-screening of duplicates is enabled by default. When enabled, duplicate articles are pre-screened and appear in the "Duplicates" tab of the Review results.
Any article for which a duplicate has been detected also has a "Duplicates" tab that shows the duplicate reason and links to the duplicate article. This tab is always available for inspecting duplicates, whether or not pre-screening of duplicates is enabled for your Monitor.
Duplicate detection works only on results from the same Monitor. This is by design, to prevent results from different Monitors from interfering with one another.
Duplicate detection by content similarity is only available for results generated from the MLM-AI database. However, duplicate detection by ID and DOI also works on uploaded results.
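The two scoping rules above (compare only within one Monitor, and use content similarity only for MLM-AI database results) can be expressed as a small eligibility function. In this sketch the field names and origin labels are hypothetical, and it assumes that content similarity requires both results in a pair to come from the MLM-AI database.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Result:
    monitor_id: str  # hypothetical field name
    origin: str      # hypothetical labels: "database" or "upload"


def pair_methods(a: Result, b: Result) -> Set[str]:
    """Detection methods that may be applied to the pair (a, b)."""
    if a.monitor_id != b.monitor_id:
        return set()  # results from different Monitors are never compared
    methods = {"id", "doi"}  # available for uploaded results too
    if a.origin == "database" and b.origin == "database":
        methods.add("content")  # assumption: both must be database results
    return methods


def partition_by_monitor(results: List[Result]) -> Dict[str, List[Result]]:
    """Group results so each Monitor's results are checked in isolation."""
    groups: Dict[str, List[Result]] = defaultdict(list)
    for r in results:
        groups[r.monitor_id].append(r)
    return dict(groups)
```

Partitioning by Monitor before any comparison also keeps the number of pairs to check proportional to the size of each Monitor's results rather than to all results combined.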
This approach relies on the Digital Object Identifier (DOI) of the article. The DOI is a unique ID assigned to each publication, so if the same article appears in different databases, it keeps the same DOI.
This option is only available for results obtained from the MLM-AI database. This excludes uploaded results.
MLM-AI can perform automated screening of duplicate articles. This can be enabled when :
Learn more about the techniques used in automated duplicate detection: