Hello and welcome!
It’s Thursday, the fourth day of the conference. We are beginning to look forward to waking up to the cheerful morning e-mails from Hideo Joho of the Organizing Committee, with key information on events for the day and practical matters. For early birds, there are daily 8:30-9:00AM announcements by an organizing committee member in one of the conference halls. The main keynotes are over, but there are still plenty of activities to choose from, maybe even too many. As always, there are talks on research: full papers, short papers and posters, plus demos.
A noteworthy short paper resource session features work on valuable new resources to research communities, including IR. Our feelings of appreciation are expressed well by the title of the short paper, "Finally, a Downloadable Test Collection of Tweets," by Royal Sequiera and Jimmy Lin of the University of Waterloo. They conducted in-depth studies to show that captures of a tweet collection by the non-profit Internet Archive can indeed serve as "a drop in replacement … (for) the Tweets2013 collection used in the TREC 2013 and 2014 Microblog Tracks." The authors would like to "share with the community all code and data associated with analyses in this paper as well as instructions for replicating reported data conditions to serve as the basis of future work" at: https://github.com/castorini/Tweets2013-IA .
Another catchy title in the resource session is "Cookpad Image Dataset: An Image Collection as Infrastructure for Food Research," by Jun Harashima, Yuichiro Someya, and Yohei Kikuta of Cookpad Inc. According to the authors, "In food-related services, image information is as important as text information for users." The authors reported on their work constructing the Cookpad Image Dataset, "a novel collection of food images taken from Cookpad, the largest recipe search service in the world. The dataset includes more than 1.64 million images after cooking, and it is the largest among existing datasets. Additionally, it includes more than 3.10 million images taken during the cooking process."
In all, there were 15 short resource papers, all of which are very interesting and offer opportunities for collaborations and/or further research. Many of the authors are willing to share parts of their work, i.e., data or code. The other presentations in the session were:
- "Experiments with Convolutional Neural Network Models for Answer Selection" by Jinfeng Rao, Hua He, and Jimmy Lin;
- "Luandri: A Clean Lua Interface to the Indri Search Engine" by Bhaskar Mitra, Fernando Diaz, and Nick Craswell;
- "SogouT-16: A New Corpus to Embrace IR Research" by Cheng Luo, Yukun Zheng, Yiqun Liu, Xiaochuan Wang, Jingfang Xu, Min Zhang, and Shaoping Ma;
- "A Test Collection for Evaluating Retrieval of Studies for Inclusion in Systematic Reviews" by Harrisen Scells, Guido Zuccon, Bevan Koopman, Anthony Deacon, Leif Azzopardi, and Shlomo Geva;
- "One Million Posts: A Data Set of German Online Discussions" by Dietmar Schabus, Marcin Skowron, and Martin Trapp;
- "KASANDR: A Large-Scale Dataset with Implicit Feedback for Recommendation" by Sumit Sidana, Charlotte Laclau, Massih R. Amini, Gilles Vandelle, and André Bois-Crettez;
- "A Collection for Detecting Triggers of Sentiment Spikes" by Anastasia Giachanou, Ida Mele, and Fabio Crestani;
- "Anserini: Enabling the Use of Lucene for Information Retrieval Research" by Peilin Yang, Hui Fang, and Jimmy Lin;
- "A Stream-based Resource for Multi-Dimensional Evaluation of Recommender Algorithms" by Benjamin Kille, Andreas Lommatzsch, Frank Hopfgartner, Martha Larson, and Arjen P. de Vries;
- "A Large-Scale Query Spelling Correction Corpus" by Matthias Hagen, Martin Potthast, Marcel Gohsen, Anja Rathgeber, and Benno Stein;
- "DBpedia-Entity v2: A Test Collection for Entity Search" by Faegheh Hasibi, Fedor Nikolaev, Chenyan Xiong, Krisztian Balog, Svein Erik Bratsberg, Alexander Kotov, and Jamie Callan;
- "A Cross-Platform Collection for Contextual Suggestion" by Mohammad Aliannejadi, Ida Mele, and Fabio Crestani; and
- "RELink: A Research Framework and Test Collection for Entity-Relationship Retrieval" by Pedro Saleiro, Natasa Milic-Frayling, Eduarda Mendes Rodrigues, and Carlos Soares.
I highly recommend the short paper resource section of the proceedings to practitioners in the field who are looking for a public data set for quick prototyping without the hassles of data extraction, or data for implementation studies for patenting and publishing.
Another interesting session for scientists in industry was The SIGIR Symposium on IR in Practice (SIRIP), organized by Sumio Fujita of Yahoo! Japan and Vanessa Murdock of Microsoft. SIRIP spanned a full day and featured three sessions: Start-Ups and Beyond; Start-up Research and Academic Collaboration; and Research at Large-scale Search Engines. Unfortunately, SIRIP papers are only 1-page documents in the proceedings, so it is difficult to understand the work in depth. Engineers at start-ups are pushed for time (even for sleep) and most details are proprietary, so requiring speakers to submit longer documents would be unrealistic.
SIRIP included work from two very famous companies in Asia that are not as well known in the West: Sogou and Naver. Currently, Sogou is the second-largest search engine in the People’s Republic of China. Sogou recognized that Chinese search engine users are interested in "accessing the large amount of foreign language information (to) understand what is happening all over the world." In "Cross-Lingual Information Retrieve in Sogou Search," JingFang Xu, Feifei Zhai, and Zhengshan Xue of Sogou, Beijing describe Sogou English as a cross-lingual information retrieval system that enables users to conduct internet searches in English, using Chinese as the language for input and display of results. Sogou English automatically: (1) translates Chinese queries into English, (2) conducts searches over the Internet in English, (3) retrieves information in English, (4) translates retrieved results into Chinese, and (5) displays results to the user in Chinese.
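For readers who like to see a pipeline in code, the five steps can be pictured with a toy sketch. This is only an illustration under loud assumptions, not the Sogou system: the tiny word tables, the three-document "index," the whitespace tokenization of Chinese queries, and every function name below are hypothetical stand-ins for real machine translation and web search components.

```python
from typing import Dict, List

# Hypothetical toy data: real systems use machine translation and a web index.
ZH_TO_EN: Dict[str, str] = {"天气": "weather", "新闻": "news"}
EN_TO_ZH: Dict[str, str] = {en: zh for zh, en in ZH_TO_EN.items()}
ENGLISH_DOCS: List[str] = ["weather news", "sports news", "weather"]


def translate(text: str, table: Dict[str, str]) -> str:
    """Word-by-word lookup; tokens missing from the table pass through unchanged."""
    return " ".join(table.get(token, token) for token in text.split())


def search_english(query: str, docs: List[str]) -> List[str]:
    """Rank documents by how many query terms they contain; drop non-matches."""
    terms = set(query.split())
    scored = [(len(terms & set(doc.split())), doc) for doc in docs]
    ranked = sorted(scored, key=lambda pair: -pair[0])  # stable sort keeps ties in order
    return [doc for score, doc in ranked if score > 0]


def cross_lingual_search(zh_query: str) -> List[str]:
    en_query = translate(zh_query, ZH_TO_EN)               # step 1: query to English
    en_results = search_english(en_query, ENGLISH_DOCS)    # steps 2-3: search and retrieve
    return [translate(doc, EN_TO_ZH) for doc in en_results]  # steps 4-5: results to Chinese


if __name__ == "__main__":
    for result in cross_lingual_search("天气"):
        print(result)
```

The point of the sketch is the shape of the flow: translation happens twice, once on the way in and once on the way out, with an ordinary English-language search in the middle. Words missing from the toy tables, such as "sports," simply stay in English.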
Naver is the most popular search engine in South Korea. In his talk "Naver Search - Deep Learning Powered Search Portal for Intelligent Information Provision," Inho Kang of Naver described the search engine’s history and its activities in South Korea. In Japan, Naver is famous for its subsidiary Line, a messaging tool that surged in popularity among teens and college age youth shortly after its introduction a few years ago. I came to know of Line’s business for cute stickers with cartoon characters that users can buy (typically for 50 to 100 yen, or roughly 50 cents to one US dollar) to place at the end of Line messages. Within a year, the business expanded to include licensing of Line characters for t-shirts, stuffed toys, etc. Anyone in Tokyo who steps into a store frequented by young teenage girls will notice signs saying to register to join the store’s e-mail club with a friend on Line. Club members and friends from Line will periodically receive discount coupons.
In addition to their participation in the conference sessions, Sogou, Naver, and other corporate sponsors, e.g., Yahoo! Japan, Baidu, Alibaba, Amazon, Rakuten, Wider Planet, Microsoft, Yahoo! Research, Google, IBM, eBay, Huawei, Yandex, Hitachi, JASIST, and Facebook, had booths in one of the conference ballrooms. Young employees in the booths were happy to answer questions about employment opportunities. The zaniest fun giveaway was Yahoo! Research’s purple rubber duckie – the purple counterpart of the classic yellow bathtub toy. The classiest was Alibaba’s miniature statue of the Monkey King from the Beijing Opera with a little curl on top of the head (Alibaba’s trademark). The most practical was Naver’s t-shirt (and maybe eBay’s hoodie for male nerds).
SIGIR featured many social events, the highlight of which was the main banquet at Chinzanso, a luxury hotel built in a green oasis in central Tokyo, dating back over 700 years. A stroll through the Kaiyu-style Japanese garden, with smiling cherub-like statues and a three-story pagoda, brings a sense of serenity and escape from the harried lifestyle of modern Japan. Basho Matsuo, one of Japan’s famous poets, lived by the gardens in the early Edo Era; Basho’s haikus are well known for evoking quiet contemplation and appreciation of nature.
The hot and humid weather on the night of the banquet made the stroll through the gardens tough, but there were many other exciting activities inside. The party began with "kagami biraki, a traditional Japanese ceremony that is believed to foster harmony and good fortune." We had a practice run calling out, "Yoisho! Yoisho! Yoisho!" to ensure our spirited cheers would enable the PC Chairs to open the sake barrel. Our hosts had warned us, "No Yoisho, No Sake!" There were so many types of gourmet foods and drinks that I can’t remember them all: cocktails with dozens of appetizers, then sushi, beef, vegetarian salads, Halal foods, … An artisan delighted guests as he made colorful, 3-D animal-shaped lollipops. Of course, audience members could take home their favorite miniature animal as a gift to remember the occasion. We were really full by the end of the evening and were ready to go to sleep.
On Thursday morning we were jolted awake. At the conference venue, Hideo Joho sighed as he began his announcements: "So far, we have survived a typhoon and a minor earthquake" (magnitude ~5.0, with epicenter just north of Tokyo). We were asked to minimize use of Wi-Fi as much as possible (no video or massive file downloads, please); the unexpectedly huge number of registrants was straining capacity. He smiled to reassure us that all was fine. We were coming down the home stretch. We were going to pass the 900 mark for registrants, and all conference events were proceeding swimmingly. In the end, we not only survived, we prevailed. We learned a lot of new things, made friends, and had a great time.
This is the fifth and final blog on ACM SIGIR 2017 held in Shinjuku-ku, Tokyo, Japan. I would like to express my thanks to the conference organizers, participants, attendees, and the readers of this blog. To your good health, happiness and work on IR – Cheers!
I look forward to meeting you again. … !
My previous blogs on SIGIR are:
Blog #4 Keynotes at SIGIR 2017
Blog #3 SIGIR2017: Diversity and Inclusion
Blog #2 Neural Networks in IR: full-day tutorial
Mei Kobayashi is manager, Data Science/Text Analysis at NTT Communications.