One of the issues that modern machine learning approaches face is their dependence on large, well-annotated datasets. Since the beginning of this decade, we have known that the most effective way to approach image recognition is to use a neural network architecture called a Convolutional Neural Network (CNN). Several CNNs, starting from the seminal work by a group of Geoffrey Hinton’s students at the University of Toronto in 2012, have obtained human-parity results in scene classification and object recognition tasks, as measured by the ImageNet challenge.
It is well known that CNNs are very capable of generalizing from examples, but even these architectures run into trouble when shown images that are completely outside the scope of their training or when familiar objects appear in unfamiliar ways. Many limitations that arise when using trained networks in real contexts depend on the characteristics of the datasets used for training.
The Inclusive Images Competition, launched by Google last September, is an effort to expand what we may call the “cultural fluency” of image-recognition software. The problem is that the most popular datasets used to train image recognition networks, such as ImageNet and Open Images, are US- and Western-centric, because Western images dominated the Internet when the datasets were compiled. As a result, even the best systems trained on these datasets often fail to correctly classify scenes from other cultures and locales.
Take wedding photos: a standard image-recognition system, trained on open-source datasets, will fail to recognize a bride dressed in a sari at an Indian ceremony, even though it can recognize a bride in a white dress, as per the classic Western tradition.
There is, of course, a straightforward way to solve the problem: create more diverse datasets that better represent the world. Google and others are taking this approach. In its challenge, however, Google asked the community to do something different: reduce the bias in a computer vision system trained on a culturally biased image dataset just by tweaking the machine-learning algorithms themselves, without changing the dataset.
The results were presented at the 2018 Conference on Neural Information Processing Systems (NeurIPS) in Montreal, one of the AI capitals of the world. The contest showed that, at least for now, models trained on “Western images” do not perform well on “different images”.
As Google put it in the challenge motivation: “Good solutions will help ensure that even when some data sources aren’t fully inclusive, the models developed with them can be.” We can add that reducing bias without adding new data also helps ensure that the model learns something about the real phenomenon, beyond pattern detection. This, in turn, can help it generalize to future instances in which the context has changed but the phenomenon has remained the same at its core. Given the results, we can say that we do not yet have a solution capable of being more “inclusive” when learning from imperfect data.
The story above is an excellent example of the challenges that neural-network-based computer vision still faces, even today, at the peak of its success. In fact, the problem of understanding images has not really been solved, if you take the standard meaning of “understanding” seriously.