The human side of robots

Scientists, programmers and researchers have been working hard to develop machines which possess distinctively human features, bringing us deeper into the “uncanny valley”.


AI 23 November 2017

Robots have always been perceived as artificial beings and have faced many limitations in the real world. They have had inexpressive faces, moved too rigidly, and been more alien than human in nature. But between 2015 and 2016 something happened: a slew of gynoids was presented to the world, namely Erica, Jia Jia and the Audrey Hepburn lookalike Sophia. These robots forced us to redefine what we knew about humanoids. All are human-like AIs, programmed to interact in real time with real people, and capable of sustaining conversations and changing facial expressions according to the tone of the talk. They can all be improved, and undoubtedly will be.

But the most critical, revolutionary and unprecedented goal that science is pursuing is a different one: to create robots that can not only talk, move and behave like human beings but also have feelings and emotions. Science is trying to diminish the main dichotomy between humans and machines, that of emotion, and it is devoting a great deal of energy and resources to doing so.

Affective computing, also known as Emotion AI, is a multidisciplinary field which combines engineering and computer science with psychology, neurobiology, cognitive science, neuroscience, sociology, psychophysiology and many others. Professor Rosalind Picard, founder and director of the Affective Computing Research Group at MIT Media Lab, describes it as: “Computing that relates to, arises from or deliberately influences emotions or other affective phenomena”.

Already by the mid-nineties, programmers and researchers were trying to teach robots how to recognize human emotions and to interact with people accordingly. Professor Cynthia Breazeal and her team conducted experiments with Kismet, a kind of robotic head with moveable eyes, eyelids and lips. Breazeal, who directs the Personal Robots Group at the MIT Media Lab and is considered a pioneer of social robotics, designed a robot that could interact with people and express a limited range of feelings, like sadness, fear, happiness and the need for attention. An erratic movement, or one made too close to it, could scare the little electronic head, just as a smile could make it smile too.

At the same university, two computer scientists, Irfan Essa and Alex Pentland, developed computer vision systems that used measurements taken from video for detailed facial modelling and the tracking of facial expressions. The approach worked well, since each feeling results in a facial expression that requires a unique combination of muscle movements. The program was able to tell when a volunteer was cheating by simulating an emotion 98 times out of 100.

Of course, recognizing a facial expression or mimicking one doesn’t necessarily equate to sentience. Another robot, Pepper, was unveiled in 2015. This small bot was designed and manufactured by Aldebaran Robotics SA, a French company which, three years earlier, had been bought by the Japanese company SoftBank. The four-foot-tall Pepper was universally hailed as the first robot to have emotions. It had cameras and sensors that helped it to read body language and understand an individual’s mood. The source of its emotional states was displayed on a tablet-sized screen on its chest.

Human emotions are, in and of themselves, a mystery. Emotions are generated by external as well as internal stimuli and are then revealed by physical signs. Respiratory rate, pulse, gesture and facial expressions change along with our mood.

Another revealing sign, however, is our voice and how we use it. Last year, a group of researchers at the Hong Kong University of Science and Technology announced “the first known system that can recognize a dozen human emotions from tone of speech and in real time”. The “real time” element eliminates the processing delay caused by “feature engineering”. As explained by Professor Pascale Fung, “emotions in speech are represented by not just the pitch, but the chroma, the tempo, and the speed of that voice. Machine learning needs to first perform feature engineering to extract these characteristics. For tone of voice, feature engineering typically extracts 1000-2500 characteristics from the input audio.” This process takes time and causes a delay in the communication between the human and the robot. Now, however, with advances in the understanding of deep learning methods such as convolutional neural networks, researchers have solved this problem.
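To make the idea of feature engineering a little more concrete, here is a minimal Python sketch of what hand-extracting characteristics from audio can look like. It is only an illustration under stated assumptions, not the Hong Kong team's system: the synthetic tones stand in for recorded speech, the three features (loudness, zero-crossing rate and a rough pitch estimate) are simple textbook measures chosen for this example, and every function name is hypothetical. A real pipeline would compute a thousand or more such values per utterance, which is where the delay comes from.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an assumed value for this sketch


def sine_wave(freq_hz, duration_s=0.5, amplitude=1.0):
    """Synthetic tone standing in for a short frame of recorded speech."""
    n = int(duration_s * SAMPLE_RATE)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def rms_energy(samples):
    """Root-mean-square amplitude, a rough proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs where the signal changes sign."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)


def rough_pitch_hz(samples):
    """Pitch estimate for a clean tone: two zero crossings per cycle."""
    return zero_crossing_rate(samples) * SAMPLE_RATE / 2


def extract_features(samples):
    """Bundle the hand-crafted features into one record per frame."""
    return {
        "rms_energy": rms_energy(samples),
        "zero_crossing_rate": zero_crossing_rate(samples),
        "rough_pitch_hz": rough_pitch_hz(samples),
    }


# Two made-up inputs: a quiet, low tone and a loud, high tone.
quiet_low = extract_features(sine_wave(200, amplitude=0.3))
loud_high = extract_features(sine_wave(400, amplitude=0.9))
print(quiet_low)
print(loud_high)
```

The two records differ in both loudness and pitch, which is the kind of contrast a classifier would then be trained on. A convolutional network skips this hand-written step and learns its own features from the raw signal.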

Last month, Google announced a new labeled data set of human actions taking place in videos. The data set, Atomic Visual Actions (AVA for short), aims to help machines understand “what humans are doing, what might they do next, and what they are trying to achieve”. AVA will teach machines how humans move and operate in the world. Each of its 57,000 three-second clips is bundled with a file that identifies the person the machine-learning system should be watching, along with a description of that person’s pose and whether or not they are interacting with anyone or anything else.

Once the secrets of our facial expressions and of our voices are known, robots can be programmed to replicate them. The result is a technology which is not at all human but is able to detect our feelings and respond appropriately. This is particularly valuable for home and hospital care robots: Pepper, for example, was also engineered to work with patients affected by dementia.

Affective computing is a fast-growing sector. According to a report published last March by the research and consultancy firm MarketsandMarkets, affective computing’s global market is expected to skyrocket in the next few years. In 2016, its value was believed to be around 12.2 billion dollars; by 2021 it is expected to reach some 53.98 billion dollars.
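To put those two figures in perspective, the short Python sketch below works out the yearly growth rate they imply. It assumes steady compound growth over the five years from 2016 to 2021, which is a simplification made here for illustration and is not a figure taken from the report itself; the function name is likewise just for this example.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


market_2016 = 12.2   # billion dollars, as quoted above
market_2021 = 53.98  # billion dollars, as quoted above

cagr = compound_annual_growth_rate(market_2016, market_2021, years=2021 - 2016)
print(f"Implied annual growth: {cagr:.1%}")
```

Under that assumption the market would have to grow by a little under 35 percent every year, more than quadrupling over the period.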

Last February, global IoT solutions provider Hubble presented Hugo, “the first truly intelligent smart camera with personality, featuring Emotion AI video analytics”, powered by technology from the Boston-based company Affectiva. This device can identify family members and ascertain their emotional states, and it could become a staple household item in the near future.

Affectiva, an emotion recognition software and analysis company, is in a particularly favorable position. This start-up has built a huge database of emotions, possibly the biggest in the world. It comprises 24,000 ads and over 5 million faces (recorded and analyzed) across 75 countries. Because it is built on the analysis of so many samples, the emotion-catching software can be incredibly precise and unbiased.

The technology Affectiva masters so well, however, has a potential which goes well beyond the home security systems market or that of the SDKs and APIs sold to developers. Affectiva can effectively detect emotions: “leading market research firms like Millward Brown, LRW, Added Value and Unruly, and 1/3 of the Fortune Global 100, including 1,400 brands like Mars, Kellogg’s and CBS, use Affectiva’s Affdex for Market research to optimize content and media spend”. These areas will benefit greatly from these technological developments.

We are still a long way from having truly human humanoids, and it is possible that it may never happen. But affective computing is making big gains in driving us into this new realm of technological possibility. Along the way, this AI is going to uplift the capabilities of sectors such as advertising and market research, and, as it improves, more sectors will begin to capitalise on this technology in ways we cannot yet imagine. Most importantly, however, this technology has the potential to bridge the gap between technology and nature. Whether or not we should burden these entities with emotions is yet to be seen, but right now developments are beginning to put a bigger emphasis on the human part of humanoid than ever before.