I remember exactly when I knew I didn't want to be a pure mathematician. It was the spring of my sophomore year, and I was knee-deep in a math major at Stanford. It was 2003 and everyone was a computer science (CS) major, either prepping to be shipped off to Google or doing their own startups. So of course, being myself, I took not being a CS major as a badge of honor; I wasn't part of the code monkey zombie bandwagon. I had taken a programming course, but felt that the ratio of fighting the languages (C/C++) to actually expressing ideas made CS nowhere near as stimulating as math. As a tangent, I do think that impression would've been pretty different if my first exposure had been to a functional programming language (Scheme or Clojure, for instance). That said, I was even more impatient at 19, so that might not be true.
What made me want to go into CS was actually a really good math department seminar. It was about algebraic topology, and for the first forty minutes or so I was riveted by the standard math cycle: concepts were defined, abstractions erected, machinery churned into theorems. Then, towards the end, the speaker started talking about applications to digit and handwriting recognition. At the time, I didn't know anything about machine learning (ML), but I left skeptical that if you really wanted to tackle a problem like digit recognition, you would end up doing anything with Betti numbers or algebraic topology at all. This suspicion was confirmed that night during a Googling session on state-of-the-art techniques in digit recognition: if you started from the perspective of actually wanting to solve the problem, there were better, simpler, and more direct ways to do so.
This was my first brush with machine learning research, and there was something specific about it that appealed to me. The math geek in me liked the technical machinery and how you adapt the same ideas to fit different setups. But what I really found fascinating was the process of looking at data and thinking about how to turn intuitions about a problem into actual code. I didn't appreciate this at the time, but that process of taking qualitative ideas and struggling to represent them computationally is the core of artificial intelligence (AI).
The little I learned about machine learning that night caused me to do a pretty big about-face. Luckily, I was already at Stanford, which had a fantastic suite of AI-related courses (machine learning, graphical models, etc.), which I proceeded to take. I then went to UC Berkeley to do a Ph.D. in CS, specializing in statistical natural language processing (NLP) and machine learning. I learned more from my awesome Ph.D. advisor, Dan Klein, than anyone else academically or professionally. Under his guidance and mentorship, I became a solid NLP researcher, winning multiple best-paper awards for our work. By the end of my time in grad school, I had a tenure-track faculty job offer from UMass Amherst and planned to do a post-doc at MIT before becoming a professor at UMass. I was on the path to a pretty promising academic career, if I do say so myself.
At some point while at MIT, I decided to leave and do a startup because I felt my work as an academic wasn't going to have the impact I wanted it to have. I went into academic CS to design NLP models that would become the basis of mainstream consumer products. I left because that path from research to product rarely works, and when it does, it's because a company was built with research at its core (think Google). This wasn't a sudden realization, but one I had stewed on after observing academia and industry for years.
During grad school, I did a lot of consulting for 'data startups' (before 'big data' was a thing) and consistently ran into the same story: smart founders, usually not technical, had some idea involving NLP or ML and came to me to just 'hammer out a model' for them as a contractor. I would spend a few hours trying to get concrete about the problem they wanted to solve and then explain why the NLP they wanted was incredibly hard and, charitably, years away from being feasible; even then, they'd need a team of good NLP people to make it happen, not me explaining ML to their engineers at a whiteboard a few hours a week. Usable fine-grained sentiment analysis is not going to be solved as a side project.
Often the founders of these companies were indeed finding real pain points, but they viewed ML as a kind of 'magic sauce' they could sprinkle on an idea to make a product. None of their thinking was constrained by what was feasible or even likely to work. They also couldn't recognize which data problems they had, or which ones they could solve, because they weren't used to viewing the world that way. Machine learning, if it's a key part of a product, can't just be grown later and attached to a company's core; it has to be there from the start, baked into the foundation.
On the academia side, I had become increasingly frustrated by the kinds of problems being worked on in my field, statistical natural language processing. Like any academic community, the work within NLP had become largely an internal dialogue about approaches to problems the community had itself reified into importance. Take, for example, syntactic parsing of natural language (essentially, automatically diagramming a sentence). It's a problem with a history rooted in linguistics that goes back the better part of a century. The motivation for NLP work in the area has been that if we ever want real semantic understanding of sentences, we need to nail syntax first. Of course, there have been more concrete uses of syntactic parses in machine translation and other areas, but the problem has the status it does because of this historical roadmap. It's a completely valid problem, but it's frustratingly removed from other, more direct problems out there now: How can I summarize all the news or email I get into something digestible? How can I map customer questions directly to database queries? How can I find interesting discourse about a topic?
People in NLP do good work in these areas and on end-user problems in general, but by and large the community is oriented around developing tools to induce traditional linguistic structure, which in principle will facilitate downstream applications. Obviously, if we solved syntactic parsing, doing these and other real-world tasks might be easier; but on the other hand, if we worked more directly on these kinds of problems, we might find that linguistic analysis isn't as essential as we thought, or we might get a better idea of which higher-order linguistic abstractions are actually worth inducing because they are roadblocks for specific applications.
My response to this concern was to focus my own research on setting up and tackling problems that I thought were closer to these kinds of real-world applications. It was while I was at MIT, working on one such project, that I began to doubt even this strategy. I was working on extracting snippets from social media reviews covering various aspects of a business; for instance, given the restaurant review "I loved the chicken parm, but the waiter was incredibly snooty," you would register "I loved the chicken parm" as a positive sentiment regarding the food, and "the waiter was incredibly snooty" as a negative sentiment regarding service, while ignoring the rest of the review, which might not have any concrete or usable snippets.
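To make the task concrete, here is a toy sketch of that kind of aspect-based snippet extraction. The real system used learned statistical models; the hand-written keyword lexicons and clause-splitting below are purely illustrative assumptions, not how the actual project worked.

```python
# Toy aspect-based snippet extraction from a restaurant review.
# The lexicons here are illustrative stand-ins for learned models.

ASPECT_WORDS = {
    "food": {"chicken", "parm", "pasta", "dish", "flavor"},
    "service": {"waiter", "waitress", "server", "staff"},
}
POSITIVE = {"loved", "great", "delicious"}
NEGATIVE = {"snooty", "rude", "bland"}

def extract_snippets(review: str):
    """Split a review into rough clauses; tag each with (aspect, sentiment)."""
    snippets = []
    for clause in review.replace(", but ", ". ").split(". "):
        words = {w.strip(".,!").lower() for w in clause.split()}
        aspect = next((a for a, kw in ASPECT_WORDS.items() if words & kw), None)
        if aspect is None:
            continue  # no concrete, usable snippet in this clause
        if words & POSITIVE:
            snippets.append((clause.strip(), aspect, "positive"))
        elif words & NEGATIVE:
            snippets.append((clause.strip(), aspect, "negative"))
    return snippets

review = "I loved the chicken parm, but the waiter was incredibly snooty."
for snippet, aspect, sentiment in extract_snippets(review):
    print(f"{aspect}/{sentiment}: {snippet}")
```

Even this toy version shows why the problem is hard: real reviews don't split neatly into clauses, and sentiment words rarely sit in the same clause as the aspect they describe.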
Something like this would indeed be valuable for a number of applications, but merely as an add-on to something like Yelp; the NLP wouldn't be at the core of the product. In fact, when I thought about most of the uses of NLP research I had seen in products, most were peripheral to the core experience: increase ad clicks by 2%, increase session lengths by a minute or two, increase 'likes' by 1%. I left because the only way NLP was going to be a core piece of a product was if someone like me was part of its formation. So I moved back from MIT to the Bay Area and co-founded Prismatic with Bradford Cross.
Nearly two years later, after a lot of learning about industry and making real products, I can confidently say that I'm happy I left academia. Prismatic is a pretty tight realization of how I would've wanted NLP and ML to work in a startup and manifest in a product. The relationship is symbiotic: the machine learning and technology are informing possibilities for the product, and conversely, product needs are yielding interesting research. Various pieces of the machine learning (like the topics in a topic model) are first-class product elements. Many of the more ambitious NLP ideas I thought about during grad school will become first-class aspects of the product over the next few years.
Getting here wasn't easy.
My co-founder and I spent six unpaid months figuring out what would make a high-engagement experience and why other 'smart news' entrants never really stuck. Once we got seed funding, we didn't just rush to a fast MVP (minimum viable product); we took the better part of a year tackling a tough research problem in a startup and thoughtfully converging on a high-engagement product through lots of trial and error. We also couldn't have asked for a better early team than our first two hires, whom I knew from my time in Ph.D.-land: Jason Wolfe (from Berkeley) and Jenny Finkel (Stanford). All in all, I think we've carved out an interesting niche of strong computer science and artificial intelligence tightly focused on making smart, usable products a reality. What I like most about our approach is that we're always motivated by direct and real problems, while the solutions are free to delve into deep abstractions and the technical trenches.
Aria Haghighi is a co-founder of Prismatic. He is a multiple-award-winning statistical natural language processing researcher and a former Microsoft Research Fellow. He holds a Ph.D. in Computer Science from UC Berkeley and a BS in Mathematics with distinction from Stanford.