Quicksilver – A Natural Language Processing System that Writes Wikipedia Entries
- Quicksilver is a machine-learning-powered tool that drafts and helps update articles on Wikipedia
- It has so far generated 40,000 summaries for scientists who are missing from Wikipedia
- Quicksilver was tested in New York during an edit-a-thon and the results were pretty encouraging
Whenever we search for a famous personality on Google, their Wikipedia page is usually the first thing that pops up. The free encyclopedia has become the go-to tool for people of all ages, from students looking for homework material to journalists looking to confirm their research. But a disturbing trend has emerged of late.
Quite a few people have pointed out that Wikipedia suffers from gender bias: many prominent women do not have a dedicated page. Take the example of Miriam Adelson. She is a well-known physician and has published many research papers in her career. Until recently, she did not even have a Wikipedia entry!
Thousands of such names were flagged by Quicksilver, a software tool introduced by Primer, a startup in San Francisco. Focusing particularly on women in science, Quicksilver found that only 18% of biographies on Wikipedia were of women. Further digging also revealed that approximately 84-90% of Wikipedia editors are male.
Wikipedia has pages on innumerable topics, and finding the blind spots by hand would understandably be an impossible task for the editors. Quicksilver has been designed to help them with it. As stated by the developers, Quicksilver uses machine-learning algorithms to scour news articles and scientific citations to find notable scientists missing from Wikipedia. It can also write fully sourced draft entries for them.
How Quicksilver works
The algorithm was trained on 30,000 Wikipedia articles about scientists. It detects signals in those articles that correlate with a researcher having an entry on the site, and Quicksilver uses these signals to find notable missing names. It does this by cross-referencing existing Wikipedia entries with a list of 200,000 scientific authors drawn from Semantic Scholar, an academic search engine. The software then sources the facts needed to write the missing entries from a collection of 500 million news articles and feeds them into a system trained to generate biographical entries from past examples.
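To make the cross-referencing step concrete, here is a minimal Python sketch. It is not Primer's implementation: the function names, the sample names and the mention counts are all made up for illustration, and a simple count of news mentions stands in for the learned notability signals described above.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so names compare reliably."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def find_missing(authors, wikipedia_titles, news_mentions, min_mentions=1):
    """Return (author, mentions) pairs for authors with no Wikipedia entry.

    Results are sorted with the most-mentioned author first. The mention
    count is a hypothetical stand-in for a trained notability model.
    """
    existing = {normalize(title) for title in wikipedia_titles}
    candidates = []
    for author in authors:
        key = normalize(author)
        mentions = news_mentions.get(key, 0)
        if key not in existing and mentions >= min_mentions:
            candidates.append((author, mentions))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Illustrative data only: none of these names or counts come from Quicksilver.
authors = ["Ada Example", "José Sample", "Grace Placeholder"]
wikipedia_titles = ["Jose Sample"]
news_mentions = {"ada example": 12, "jose sample": 40, "grace placeholder": 3}

missing = find_missing(authors, wikipedia_titles, news_mentions)
print(missing)
```

In this toy run, "José Sample" is dropped because a page already exists under an unaccented spelling, and the remaining two authors are ranked by how often they appear in the news. A real system would need far more careful name disambiguation than this string comparison.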
Below is a draft created for Miriam Adelson:
Miriam Adelson is a doctor and chairman of The Dr. Miriam & Sheldon G. Adelson Clinic for Drug Abuse Treatment and Research. With her husband, Sheldon Adelson, she owns the Las Vegas Review-Journal and Israel Hayom. She was listed by Forbes in June 2015 as having a fortune of $28 billion, making him[sic] the 18th richest person in the world. She has frequently been cited in media reports as the newspaper’s owner, including by JTA.
Quicksilver has already prepared 40,000 summaries for both male and female scientists who were missing from Wikipedia. You can see a sample of a hundred entries here. These summaries are not published directly on the site but are meant to be a starting point for Wikipedia's editors.
Quicksilver can also help editors keep existing articles up to date. It was tested in New York during an edit-a-thon, conducted by the American Museum of Natural History, with the aim of improving the existing entries on women scientists. Quicksilver created sparse Wikipedia bios for women scientists by scraping facts from the web and providing source links along with them. This helped 35 first-time editors update the pages of 70 women scientists within 2 hours! Pretty impressive, isn't it?
Our take on this
Given how popular Wikipedia is and the wide-ranging implications any bias in it might have, it's good to see machine learning helping to bridge that gap. This research not only helps bring under-represented scientists into the spotlight, it is also a shining example of how useful ML can be.
I personally feel that we are just scratching the surface as far as the uses of NLP are concerned. The idea behind Quicksilver is fairly easy to understand (though of course far more complex to design). Do check out Primer.ai and see the different domains they are tackling using NLP.
There is so much text out there, both existing and being generated every day. I'm looking forward to seeing more research like this soon!