[Heb-NACO] Fwd: Identifying duplicate records using machine learning—non-Latin data labeling for WorldCat

Shinohara, Jasmin jshino at upenn.edu
Mon Oct 23 11:14:59 EDT 2023


Dear colleagues, please see below for an opportunity to contribute to OCLC deduplication efforts specific to records for Hebraica. I hope you'll consider participating!

Besorot tovot, Jasmin



From: Whitacre,Cynthia
Sent: Monday, October 23, 2023 9:53 AM
To: OCLC-CJK <OCLC-CJK at oclclists.org>; OCLC-CAT <OCLC-CAT at oclclists.org>
Subject: Identifying duplicate records using machine learning—non-Latin data labeling for WorldCat



As part of continuous quality improvement efforts, earlier this year we began implementation of our machine learning model to identify duplicate records in WorldCat. The initial phase focused on records for print books and e-books published in Latin-script languages; the results of these efforts have improved the cataloging, discovery, and interlibrary loan experiences for library staff and end users across the world.

In our next phase, we’re expanding the focus to records containing non-Latin script that describe print books, e-books, audiobooks, journals, and videos published in Chinese, Japanese, Korean, Thai, Arabic, Hebrew, and Russian. We invite metadata experts in those languages to validate our machine learning model’s understanding of duplicates by participating in a data labeling exercise using a simple, intuitive online interface. Are the two records functionally equivalent? Do they represent the same manifestation? Or are they truly different manifestations that only seem the same, apart from important differences that only you can spot?

Put my skills to the test! <https://labelduplicates.worldcat.org/>

The interface will remain open through 15 December 2023, at which time we will begin analyzing the collected data. For more information, check out the participation instructions <https://www.oclc.org/content/dam/oclc/worldcat/data-labeling-participation-instructions.pdf> and read the FAQs <https://www.oclc.org/content/dam/oclc/worldcat/data-labeling-faq.pdf>.

With your help, we can better scale the resolution of duplicate records in WorldCat, saving countless hours and improving the experience for the global library community. Thanks for all you do to advance the mission of libraries worldwide. We appreciate your ongoing collaboration!





Cynthia M. Whitacre  (she/her)

OCLC · Senior Metadata Operations Manager, Membership & Research Division

6565 Kilgour Place, Dublin, Ohio, 43017  United States

T +1-800-848-5878, ext. 6183

Direct: +1-614-764-6183








