A conversation with Ciro Greco and Mattia Pavoni, founders of the startup Tooso, on why, when it comes to online business, you should really, really care about semantics.
Why is it that if I search for “sleeveless t-shirts” in a search bar, all I get is … t-shirts with sleeves?
To answer this question, we have to go back to the notion of meaning. We ourselves are not really sure what ‘meaning’ is, or what mental process our brain goes through to make sense of a sentence. It goes without saying that not having a clear idea of how meaning “happens” makes it quite hard to have machines understand us as naturally as we understand each other. That said, there are two main perspectives on meaning that govern how search engines work. The first can be labeled ‘full-text search’: it’s about counting strings of characters in documents and records, and ranking the fetched items in an order that is hopefully relevant. The second can be labeled ‘semantic search’: here, meaning is treated as a function that comes from putting together different pieces of language and making sense of the whole.
Now, traditional search engines are mostly based on the full-text perspective. To make sense of our inputs, machines translate them using mathematical models that are not as sophisticated as natural languages are. In simple terms, if you search for “t-shirt without sleeves” on a traditional search engine, it will take every single word individually and look for it in the indexed e-commerce catalog. Incidentally, the most important word of your search (“without”) is a so-called ‘function word’ and will be left out: since its meaning is built only in relation to the sentence it is in, it doesn’t fit the statistical model of the engine and is therefore discarded. Finally, out of the items it finds tagged with “t-shirt” and “sleeves,” the engine will show you those with the most occurrences. The final outcome is exactly the opposite of what you were looking for. Old-fashioned engines are based on this logic. In the best-case scenario, they have some AI components, possibly based on neural networks, that learn to optimize some patterns. The problem is that most of these approaches cannot really reason in a symbolic way, so you lose the ability to treat some facts about natural languages in a principled way.
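To make the failure concrete, here is a minimal Python sketch of the word-by-word logic described above. It is not the code of any real search engine: the stop-word list, the toy catalog, and the names `tokenize` and `full_text_search` are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list: function words the engine throws away.
STOP_WORDS = {"without", "with", "a", "the", "of", "for", "in"}

# Toy catalog invented for this example.
CATALOG = {
    "tee-01": "t-shirt with long sleeves, striped sleeves",
    "tee-02": "sleeveless t-shirt in cotton",
    "tee-03": "t-shirt with short sleeves",
}

def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9-]+", text.lower())

def full_text_search(query, catalog):
    """Rank items by how many times the non-stop query words occur in them."""
    terms = [t for t in tokenize(query) if t not in STOP_WORDS]
    scores = {}
    for item_id, description in catalog.items():
        counts = Counter(tokenize(description))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[item_id] = score
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))

print(full_text_search("t-shirt without sleeves", CATALOG))
```

Because “without” is dropped and the remaining words are simply counted, the only sleeveless item in this toy catalog ends up at the bottom of the ranking, below both t-shirts with sleeves.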
Is semantics the reason why, say, Siri correctly processes complex requests such as “Is there an open pizzeria near my current location,” while most e-commerce websites can’t give relevant results for the “sleeveless t-shirt” query?
Possibly. I don’t know exactly what Siri does behind the scenes. But there’s also another very important factor that can act as a game changer, and that is the amount of data one can process. Big Data is powerful. No doubt about that. The world is basically split in two: on the one hand, there are companies, usually tech giants, that have enough data (and will probably always have more) to fuel Big Data AI, and Deep Learning applications are a great example of this; on the other, there are those who don’t, and probably never will. If it’s backed by Big Data, we can find a way to brute-force the optimization of a traditional search engine. The problem comes when businesses don’t have or don’t generate enough data: in this case, traditional search engines are hard to optimize without an enormous amount of manual, non-scalable work. And the truth is that most companies are in this category. A quick note: there are also many businesses that do have a lot of data, but whose search engines are not as good as one might expect. So it’s a very widespread problem.
How can search engines return relevant results for businesses that don’t have enough data?
Going back to the beginning, they might want to try a different approach, based not on ‘full-text’ search but on the idea that we should model meaning somewhere. In other words, since for most companies Big Data is really not an immediately viable option, we need to find a way to have search engines understand and mimic the way human beings process meaning. And this is where formal semantics, the discipline that studies the rules by which the building blocks of our language can be put together, comes in handy.
Let’s go back to the “sleeveless t-shirt” example and see why and how humans make sense of it. To put it simply, our mind has some sort of “internal map of the world” from which we retrieve the meaning of “t-shirt” and “sleeves,” and it uses a grammar that tells us that the word “without” or the suffix “-less” switches the polarity of the word it attaches to: so “sleeveless” means something like ‘not-sleeves’. This process is impossible for a traditional engine: we can make it understand that six words are different from five, but if it doesn’t process the sentence as we do, making sense of the combination of words rather than operating on each word individually, it will never understand us.
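The polarity switch can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the tiny lexicon, the marker lists, and the function name `parse_query` are hypothetical, and a real grammar would cover far more than one negation word and one suffix.

```python
import re

# Hypothetical markers: words that flip or confirm the polarity of what follows.
NEGATORS = {"without", "no"}
AFFIRMERS = {"with"}

# Hypothetical lexicon mapping surface forms to attribute names.
ATTRIBUTE_FORMS = {
    "sleeve": "sleeves", "sleeves": "sleeves",
    "pocket": "pockets", "pockets": "pockets",
}

def parse_query(query):
    """Split a query into concepts, wanted attributes and excluded attributes."""
    concepts, wanted, excluded = [], set(), set()
    negate_next = False
    for token in re.findall(r"[a-z-]+", query.lower()):
        if token in NEGATORS:
            negate_next = True
            continue
        if token in AFFIRMERS:
            negate_next = False
            continue
        # The suffix "-less" negates the attribute it attaches to.
        stem = token[:-4] if token.endswith("less") else None
        if stem in ATTRIBUTE_FORMS:
            excluded.add(ATTRIBUTE_FORMS[stem])
        elif token in ATTRIBUTE_FORMS:
            target = excluded if negate_next else wanted
            target.add(ATTRIBUTE_FORMS[token])
            negate_next = False
        else:
            concepts.append(token)
    return {"concepts": concepts, "with": wanted, "without": excluded}

print(parse_query("sleeveless t-shirt"))
print(parse_query("t-shirt without sleeves"))
```

In this sketch both phrasings produce the same structure, a “t-shirt” concept with “sleeves” excluded, which is exactly the piece of information a word-counting engine throws away.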
How can you make a search engine understand this, and other complex queries, then?
You tap into formal and computational semantics. The engine we have built at Tooso is rooted in this concept: it is basically a model of formal semantics that has joined forces with Machine Learning. What we do is take a piece of semi-structured data, like the product catalog of a retailer or a brand, and turn it into an ontology, that is, a representation of a set of concepts within a domain together with all the relationships between those concepts. Then, on top of this ontology, we use a Natural Language Processing (NLP) engine, so that the end user can make a query and expect the engine to grasp part of the meaning. At that point we can apply more traditional Machine Learning techniques like neural networks: we use them to personalize search results and improve the customer experience, but in our case they have little to do with figuring out the meaning of words.
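As a rough illustration of the idea of querying an ontology rather than raw text, here is a minimal Python sketch. It is not Tooso’s implementation, which the interview does not describe in detail: the is-a hierarchy, the catalog records, and the functions `ancestors` and `search` are all assumptions invented for this example.

```python
# Hypothetical is-a hierarchy: each concept points to its parent concept.
IS_A = {
    "t-shirt": "top",
    "tank-top": "top",
    "top": "clothing",
    "jeans": "trousers",
    "trousers": "clothing",
}

# Toy catalog turned into structured records: a type plus the parts it has.
CATALOG = [
    {"id": "p1", "type": "t-shirt", "parts": {"sleeves", "collar"}},
    {"id": "p2", "type": "t-shirt", "parts": {"collar"}},
    {"id": "p3", "type": "tank-top", "parts": set()},
    {"id": "p4", "type": "jeans", "parts": {"pockets"}},
]

def ancestors(concept):
    """Return the concept followed by all its parents in the hierarchy."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain

def search(concept, with_parts=frozenset(), without_parts=frozenset()):
    """Return ids of products that are a kind of `concept` and satisfy
    the required and excluded parts."""
    hits = []
    for product in CATALOG:
        if concept not in ancestors(product["type"]):
            continue  # wrong kind of thing
        if not with_parts <= product["parts"]:
            continue  # a required part is missing
        if without_parts & product["parts"]:
            continue  # an excluded part is present
        hits.append(product["id"])
    return hits

print(search("t-shirt", without_parts={"sleeves"}))
print(search("top", without_parts={"sleeves"}))
```

Because the catalog is represented as concepts and relationships, a structured query for a t-shirt without sleeves can exclude sleeved items outright, and a broader query for any “top” also picks up the tank top through the hierarchy.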
What will be the next big AI leap?
AI is amazing, but as it works today it is de facto limited, because most people think AI is only about prediction. The reason behind this is that the biggest commercialization of AI began with the Deep Learning revolution. Don’t get me wrong: that was phenomenal. AI became so much more efficient at so many tasks. As a matter of fact, that’s what still works best at the moment, especially when it comes to prediction problems. But there’s much more to AI than that: its potential when it comes to representing and modeling concepts is still largely unexplored, and actually quite neglected. This is where AI’s next big leap can be. And I personally follow very closely what is happening around Cambridge and Boston right now.
The Harvard Business Review recently stated that the future is going to be about less data, rather than more. Do you agree with that?
When it comes to Big Data, there are two trends. To a certain extent, of course, things won’t change: the more data the better, and tech giants have an unfair advantage there. But not all data can be collected from the world at large, the way it can in B2C (business to consumer) scenarios. I see great opportunities in the B2B (business to business) space. Let’s say that an enterprise company wants to automate some of its internal processes: for instance, we want to automate the process by which an insurance company reimburses its clients. Or we want to automate the helpdesk of a company that has thousands of employees.
The company data that we have access to for solving this kind of problem might not be enough, in terms of either quality or quantity, to apply techniques like Deep Learning, for example. And external third-party data wouldn’t help. In these cases, the added value can be provided by learning algorithms that can make the best of a smaller pool of data points. In this sense, yes: the future will (also) be about less data. To put it another way, data can be like knowledge, which we build on exponentially, or like rubbish, which we just keep accumulating. The question is: how many levels of abstraction, and of what kind, do we need to make use of it? There are bottom-up techniques and top-down ones. At Tooso we strongly believe that we should do our best to get the best of both worlds.