Imagine a travel agent who knows in detail all the popular tourists spots in major cities around the world, how long it takes to see each site, and the required travel time from one to the next. Imagine she also knows your travel preferences — that you like art museums and churches, but not shopping — and she knows your travel history; for example, that you always walk the Left Bank when you are in Paris. Not only that, she can update your itinerary in seconds if you tell her you will not be going to the Left Bank on your upcoming trip.
Researchers at the Federal University of Ceará (UFC) in Brazil and the Institute of Information Science and Technologies (ISTI) of the Italian National Research Council (CNR) in Pisa are prototyping a digital travel agent with those abilities, which they are calling TripBuilder (TB).
The prototype employs a number of computational techniques – including machine learning and optimization — to combine and balance the interests and time budgets of individual users with the "wisdom of crowds" mined from Wikipedia, Flickr, and Google Maps. The creators of TB, in technical papers on the theory and the application, say TB can offer substantially better results offered by simpler existing methods.
Using a multi-step unsupervised learning process (with no labeled training data provided), TB builds a knowledge base for a city or region of interest. Using Wikipedia, it defines points of interest (PoIs) in this city or region, each coded by category (such as museum) and geographical co-ordinates. TB then mines photographs for each PoI in Flicker to learn its relative popularity based on the total number of photos of it on the site, and the average time required to see it, based on the earliest and latest time stamp for each user album of the site. Finally, TB computes the time required to move from site to site based on the calculations of Google Maps.
TB can then suggest to a user various site "trajectories" that fit within the user's time budget. It can do this based on a graded profile of likes and dislikes entered by the user, by deriving a profile of interests from the user's own travel history, or through a combination of these methods. In addition, users can vary a parameter that assigns relative weights to their own interests vs. the public popularity of each site.
At the heart of TB is the TripBuilder Engine, which has two parts. TripCover chooses the set of trajectories (an itinerary) that best fits the tourist's interests while meeting the user's time budget. It does this using an approximation algorithm for solving the Generalized Maximum Cover problem, a class of problems in which one is presented with an (often large) number of sets, some of which have elements in common, and a number k, and one must then select a number of sets not to exceed k such that the maximum number of elements are included. In the TB version of the problem, each element has a weight or cost (the visit and travel times) associated with it, and the limit is not some number of sets k, but the total budget associated with the selection. At the completion of this step, the TrajSP module combines the various trajectories computed by the TB Engine into an overall itinerary, using a variation on the Traveling Salesman Problem.
TB is built from optimization algorithms and code provided by other developers and modified by the TB project team, and from public system tools such as Apache Storm, the open-source distributed real-time stream processing system, and the Apache Hadoop Distributed File System for data storage.
In limited tests so far, the researchers have found combining a user's personal interests with public popularity gives significantly better results than simpler baseline methods that consider just one of those factors. "The big improvement comes from using the trajectories of previous tourists," says Franco Maria Nardini, a researcher at ISTI-CNR. "Earlier methods only allowed you to define solutions by using the [public] information about PoIs."
The researchers say measuring the merit of a TB-recommended itinerary is tricky, but one way to make the system produce better recommendations over time is to ask users to report their satisfaction with trial itineraries and with their actual travel. "It's like Netflix; when you select a movie you can rate it, then that improves its subsequent recommendations," says Igo Brihante, a developer at UFC. He says that kind of learning might also enable the system to make "serendipitous" recommendations of places en route from one major site to another that otherwise would be overlooked by the user.
Jose Antonio Macedo at UFC says he and his colleagues are forming a company to introduce TB as a web service and smartphone app. In the meantime, they will work to expand the knowledge base of cities covered and will refine and standardize the "noisy and weak ontology" that Wikipedia uses to classify sites. "We are using natural language processing approaches to do this," he says.
Gary Anthes is a technology writer and editor based in Arlington, VA.