Software organizes email by task
March 9/16, 2005
By Kimberly Patch, Technology Research News
There’s a lot of structure to a person’s email. Rather than random isolated documents, individual email messages are often portions of a larger activity. Despite the inherent structure, and despite organizational tools such as folders, much of the world’s email remains relatively unorganized.
Researchers from University College Dublin in Ireland and IBM Research have developed a way to use the inherent structure of related email messages to automatically organize the messages by task.
“We realized that quite a lot of emails are not just random isolated messages, but rather relate in some specific way to earlier messages [by way of] some underlying activity or process” such as travel, meetings, or asset management, said Nicholas Kushmerick, a lecturer in computer science at the University College Dublin in Ireland and visiting scientist at IBM’s Dublin Software Laboratory. “Our email activity management technique enables the email client to automatically recognize the structure of these activities and group messages,” he said.
The method could eventually be used in tools that automatically organize email and allow a user to query the system based on the underlying organization. It could also lead to related tools like task schedulers, according to Kushmerick.
The researchers’ prototype uses a three-phase process, said Kushmerick.
First, the system groups messages according to the activity or task they relate to. For a user who participates in multiple eBay auctions simultaneously, the system would partition eBay messages into messages pertaining to the different auctions — for example, a desk auction, a bed auction, and a dollhouse auction, said Kushmerick.
Second, the system detects the process steps that recur across those activities and regroups the messages by step. Steps in an eBay auction include email acknowledgments of bids and notifications of outbids. For example, the eBay messages would be regrouped into the ‘thanks for the bid’ messages for the desk, bed, and dollhouse, and the ‘you’ve been outbid’ messages for the desk, bed, and dollhouse, said Kushmerick.
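The first two phases can be pictured as two views over the same set of messages. The following is a minimal sketch in Python, assuming each message has already been tagged with an activity and a step. The tags, step names, and sample subjects are hypothetical illustrations, not the researchers' data or code.

```python
from collections import defaultdict

# Hypothetical, hand-labeled messages: (activity, step, subject).
# The real system infers these labels; here they are simply given.
messages = [
    ("desk", "bid_ack", "Thanks for your bid on the desk"),
    ("bed", "bid_ack", "Thanks for your bid on the bed"),
    ("desk", "outbid", "You've been outbid on the desk"),
    ("dollhouse", "bid_ack", "Thanks for your bid on the dollhouse"),
    ("bed", "outbid", "You've been outbid on the bed"),
]


def group_by(msgs, key_index):
    """Group message subjects by the tuple field at key_index."""
    groups = defaultdict(list)
    for msg in msgs:
        groups[msg[key_index]].append(msg[2])
    return dict(groups)


by_activity = group_by(messages, 0)  # phase one: one group per auction
by_step = group_by(messages, 1)      # phase two: one group per process step
```

Here by_activity holds one list per auction, while by_step gathers the same subjects into the bid acknowledgments and the outbid notices.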
Third, the system organizes activities and steps into a single representation, or process model, that stipulates the order in which the process steps occur, said Kushmerick. “The complication is that many real-world processes can contain loops: a single eBay auction might contain many pairs of bid-outbid messages,” he said.
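To illustrate what a process model with loops looks like, the sketch below builds a simple transition map from observed step sequences. It is a deliberately simplified stand-in, recording only which step was seen to follow which, and is not the automata induction algorithm the researchers used. The step names, including "won", are hypothetical.

```python
from collections import defaultdict

START, END = "START", "END"


def induce_transitions(traces):
    """Build a first-order transition map: step -> set of steps seen next."""
    transitions = defaultdict(set)
    for trace in traces:
        path = [START, *trace, END]
        for current, following in zip(path, path[1:]):
            transitions[current].add(following)
    return dict(transitions)


def accepts(transitions, trace):
    """Return True if every consecutive pair in the trace is a known transition."""
    path = [START, *trace, END]
    return all(b in transitions.get(a, set()) for a, b in zip(path, path[1:]))


# Hypothetical step sequences, one per auction.
traces = [
    ["bid_ack", "outbid", "bid_ack", "won"],
    ["bid_ack", "outbid"],
]
model = induce_transitions(traces)

# The bid_ack -> outbid -> bid_ack cycle lets the model accept an auction
# with more bid-outbid pairs than any observed trace contained.
longer_auction = ["bid_ack", "outbid", "bid_ack", "outbid", "bid_ack", "won"]
loop_is_accepted = accepts(model, longer_auction)
```

Because the map contains a cycle between the bid acknowledgment and the outbid notice, one small model covers auctions with any number of bid-outbid pairs.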
The researchers used text classification, text clustering and automata induction algorithms to carry out the process, said Kushmerick. Text classification algorithms assign documents to predefined categories. Text clustering algorithms group documents into related sets. Automata induction algorithms generate process flow models. Each of these pieces has been developed independently for decades, but no one had previously thought to apply them to this particular problem or integrate them in this manner, he said.
Compared to artificial intelligence approaches to data representation, the three-phase process is shallow, but appropriate, said Kushmerick. “We can get away with shallow techniques because the messages and processes we’re dealing with are generally quite structured,” he said. Every message from eBay, for example, contains a unique identifier such as 9188139a; information like this can be exploited to organize messages.
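A shallow technique of this kind can be as simple as a pattern match. The sketch below groups message subjects by an embedded item identifier. The identifier pattern is an assumption inferred only from the single example 9188139a, and the sample subjects are hypothetical.

```python
import re
from collections import defaultdict

# Assumed identifier shape, inferred only from the example "9188139a":
# seven digits followed by one lowercase letter.
ITEM_ID = re.compile(r"\b\d{7}[a-z]\b")


def group_by_item_id(subjects):
    """Group subjects by the first item identifier found; None if absent."""
    groups = defaultdict(list)
    for subject in subjects:
        match = ITEM_ID.search(subject)
        key = match.group(0) if match else None
        groups[key].append(subject)
    return dict(groups)


# Hypothetical subjects for illustration.
subjects = [
    "Bid confirmed for item 9188139a",
    "You have been outbid on item 9188139a",
    "Bid confirmed for item 4471020b",
    "Lunch on Friday?",
]
groups = group_by_item_id(subjects)
```

Messages with no identifier land under the None key, which is where person-to-person mail would tend to end up in this sketch.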
From the user’s point of view, the system is entirely automated, said Kushmerick. The user “gives the system a set of messages and says ‘please organize them’,” he said. “There is no need for the user to provide background such as ‘I have pending reports for two trips — to Singapore, and Berlin’.”
The researchers’ next step is to begin real-world testing of the technique with large collections of messages. They are also working on completing the user interface. They’re aiming to make the system easy to visualize and easy to correct, said Kushmerick.
The interface will enable users to correct system mistakes and to provide hints to help the system generalize correctly, said Kushmerick. “No matter how carefully we tune our algorithms, fully automated techniques will probably never be 100 percent accurate, so we need to make sure that occasional mistakes do not harm the benefit that accrues from our more sophisticated activity-centric presentation,” he said.
The current prototype is 91 percent accurate at classifying and grouping messages, according to Kushmerick.
The researchers are also working on ways to allow the user and machine to cooperate to discover the appropriate task structure when the computer cannot do it on its own. When the algorithm is stumped, “we don’t want the computer to throw up its hands and say ‘I’ve no idea’,” said Kushmerick. “Instead, the computer should ask a series of pointed questions that will disambiguate the situation, such as… ‘it would appear that eBay occasionally sends bid acknowledgments twice. Is that correct?’”
They are also working on enabling the user to carry out high-level queries that have to do with the underlying activities, said Kushmerick. For example, “Calculate the average amount I spent on each online grocery order last year,” or “Check the travel reimbursement transactions to estimate the total number of days I spent away from home for business purposes in the last six months.”
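Once messages are organized into activities, a query like the grocery example reduces to filtering and averaging. The sketch below assumes each order has already been extracted as a record. The record fields and amounts are hypothetical, and the article does not describe how the researchers' system would represent or answer such queries.

```python
from statistics import mean

# Hypothetical activity records, one per completed order or auction.
activities = [
    {"kind": "grocery", "year": 2004, "total": 82.40},
    {"kind": "grocery", "year": 2004, "total": 61.10},
    {"kind": "auction", "year": 2004, "total": 150.00},
    {"kind": "grocery", "year": 2003, "total": 45.00},
]


def average_spend(records, kind, year):
    """Average total for records of one kind in one year; None if no match."""
    totals = [
        r["total"] for r in records if r["kind"] == kind and r["year"] == year
    ]
    return mean(totals) if totals else None


grocery_average = average_spend(activities, "grocery", 2004)
```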
One drawback of the prototype is that it is overly dependent on computer-generated messages, said Kushmerick. “Extending our techniques from machine-person messages to person-person messages will be very challenging, but we have already started to make some progress,” he said.
The technology could be ready for commercial use in two to three years, said Kushmerick. “A patent application covering our technology has already been filed, and IBM is currently exploring potential avenues to commercialization,” he said.
Further down the line, the researchers’ system could be used to automate other tools like schedulers and email analysis tools. An analysis tool could, for instance, automatically notice when you send a message to someone requesting that they send you a document, add this request to a “pending” list, and automatically mark the request “satisfied” when the document arrives, said Kushmerick.
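The pending-request idea amounts to a small piece of bookkeeping. The sketch below assumes that detecting a request and matching an arriving document to it have already been done by some other component. The class, method names, and addresses are hypothetical.

```python
class PendingRequests:
    """Track document requests until a matching document arrives."""

    def __init__(self):
        # (person, document) -> "pending" or "satisfied"
        self.status = {}

    def sent_request(self, to, document):
        """Record that a document was requested from a person."""
        self.status[(to, document)] = "pending"

    def received_document(self, sender, document):
        """Mark a request satisfied if it was pending; ignore anything else."""
        key = (sender, document)
        if self.status.get(key) == "pending":
            self.status[key] = "satisfied"

    def pending(self):
        """Return the requests that are still waiting for a document."""
        return [key for key, state in self.status.items() if state == "pending"]


tracker = PendingRequests()
tracker.sent_request("alice@example.com", "Q1 budget")
tracker.sent_request("bob@example.com", "trip report")
tracker.received_document("alice@example.com", "Q1 budget")
still_waiting = tracker.pending()
```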
The ultimate aim is to enable ordinary end-users, as opposed to specialized technical support personnel, to personalize and customize their computing environments, said Kushmerick. “Each of us has a distinctive suite of activities that we engage in, preferences for the way information is presented, notions of what constitutes high-priority, constraints about divulging confidential information, et cetera,” he said.
In the last several years machine learning technologies have improved enough that it is possible to envision practical self-customizing software and high-level tools that allow ordinary users to customize applications, said Kushmerick.
Kushmerick’s research colleague was Tessa Lau. They presented the work at the Intelligent User Interfaces Conference (IUI 2005), held January 9 to 12, 2005, in San Diego. The research was funded by IBM’s T. J. Watson Research Center.