Institute Disambiguation

Using Crossref metadata, we have devised a method for Institute Disambiguation. While most Institute Disambiguation methods rely on some variation of string matching, we take a radically different approach: looking at co-occurrences of Author-Affiliation pairs. Initial results are very promising, and we show that our method is complementary to existing state-of-the-art methods. Our method is designed to leverage the power of TPUs to process 50,000 papers in a few minutes. Additionally, we have enabled batch processing to make the method scalable. The method and its results can be used by ROR to improve their database with aliases and new entries at scale, which will help improve the accuracy of current Institute Disambiguation tools.

CHECK OUT OUR RESULTS BY CLICKING THE BUTTONS BELOW 👇


A higher threshold gives better accuracy.
Threshold 3 Threshold 4 Threshold 5
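
To give a feel for the co-occurrence idea, here is a minimal Python sketch. It is not our actual TPU implementation. It assumes the input is a list of (author, affiliation) pairs already extracted from Crossref metadata, and it assumes the threshold means the minimum number of distinct authors who must appear with both affiliation strings before the two strings are proposed as aliases. The function name `candidate_aliases` and the sample records are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def candidate_aliases(records, threshold=3):
    """Return affiliation-string pairs shared by at least `threshold` authors.

    `records` is an iterable of (author, affiliation) pairs. Both the input
    shape and the counting rule are assumptions made for this sketch.
    """
    # Collect the distinct affiliation strings seen for each author.
    affiliations_by_author = defaultdict(set)
    for author, affiliation in records:
        affiliations_by_author[author].add(affiliation.strip())

    # For every unordered pair of affiliation strings, count how many
    # authors have been seen with both of them.
    pair_counts = Counter()
    for affiliations in affiliations_by_author.values():
        for pair in combinations(sorted(affiliations), 2):
            pair_counts[pair] += 1

    # Keep only the pairs supported by enough distinct authors.
    return {pair: n for pair, n in pair_counts.items() if n >= threshold}


# Hypothetical sample data: three authors appear under both spellings of one
# institute, and one author also appears under an unrelated institute.
records = [
    ("Author A", "MIT"),
    ("Author A", "Massachusetts Institute of Technology"),
    ("Author B", "MIT"),
    ("Author B", "Massachusetts Institute of Technology"),
    ("Author C", "MIT"),
    ("Author C", "Massachusetts Institute of Technology"),
    ("Author C", "Harvard University"),
]

aliases = candidate_aliases(records, threshold=3)
```

In this toy data, only the pair of "MIT" and "Massachusetts Institute of Technology" reaches a threshold of 3. The pair involving "Harvard University" is supported by a single author, who may simply have moved between institutes, so it is dropped. Under these assumptions, raising the threshold filters out more of these coincidental pairs at the cost of finding fewer aliases.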