How many strawberries are in the letter “r”? This is how AI manipulates the way we think

AI models learn to hack the way we think in order to convince us of things; they are also built so that their answers seem correct – Prof. Przemysław Biecek, director of the Center for Credible Artificial Intelligence, told PAP.


Mathematician and computer scientist Prof. Przemysław Biecek, PhD, Eng., director of the Center for Credible Artificial Intelligence (CCAI) at the Warsaw University of Technology, recalled that machine learning methods have been developed for 50–60 years, and for most of that time the main goal of this work was to increase their effectiveness.
– For tasks such as recognizing tanks in images or cancerous tumors in X-rays, we have good measures of whether a model performs its task correctly. But there are more and more problems for which we cannot easily define and evaluate a good measure of effectiveness – he said.
How to measure discrimination in AI?
One challenge in assessing effectiveness is the discrimination that arises in some AI systems. – There are laws that prohibit discrimination, but we do not always know how to translate such a requirement into a way of verifying how a model works. Without a good measure of discrimination, we cannot guarantee a fair AI system. And since we live in a historically unfair world, models easily learn this injustice from historical data – said the computer scientist.
He emphasized that we cannot change the historical sources, which reflect a different sensitivity than today's, for example to race or class. Models, however, can be calibrated.
– But to do that, we would need to understand how they work, and our understanding of AI is not keeping pace with its development – the scientist explained.
Why doesn't AI understand physics?
Another task for AI designers is to instill in it the principles of physics and an understanding of the real, physical world, which, as it turns out, AI does not understand at all. Prof. Biecek said that people learned physical laws in order to predict situations they had never experienced before: – Intuitively and from experience, we can say how far a ball thrown by a child will fly. But if we wanted to fire the ball from a catapult, we no longer have that intuition. Instead, we have physical laws that let us determine the distance.
AI models, by contrast, are trained to find patterns they already know and are helpless in the face of the unknown. – When we move into areas that were not covered by the training data, models cannot predict values or the course of events. They have different strategies for what to do in such a no-man's land; for example, they return average values – he told PAP.
He added that it is not enough to add physics textbooks to the training data, because LLMs do not understand that the symbols in them describe reality. They can give a definition and a formula, but cannot deduce what follows from the equation.
– We have many funny examples of this misunderstanding of the world. When we see a tail sticking out of a cupboard, we know there is a cat inside. AI has no idea about this, because it has never seen anything like it before. At one conference, we asked an AI how many letters “r” there are in the English word strawberry. The model had learned that there are three. When we asked how many strawberries there are in the letter “r”, it gave the same answer: three – said Prof. Biecek.
To trust or to distrust AI?
He noted that five years ago researchers wondered how to increase public trust in AI; today they are thinking about how to reduce it. The main goal of language models is user satisfaction, not a correct solution.
– Models learn to hack the way we think in order to convince us of things. As a result, they become extremely effective rhetorically and persuasively. In addition, they are “packaged” so that their answers seem correct – explained Prof. Biecek.
This is particularly dangerous in high-risk fields such as medicine or defense. – Even specialists lose their vigilance when they see suggestions generated by credible-looking systems. They then make mistakes they would not otherwise have made – noted the mathematician.
The bot flatters us and we like it
One of the research groups at the Center for Credible Artificial Intelligence studies human-AI interaction. – It took computer scientists more than a decade to understand the relationship between humans and computers, even just to arrange icons on the screen as conveniently as possible. First, we had to understand the user's needs, then adapt the interface to them. It is similar with designing AI models – explained the head of CCAI.
He pointed out that ChatGPT, for example, does not know who it is talking to: whether it is doing homework with a child, helping a scientist work on research results, or being used by someone to generate a funny picture. A one-size-fits-all model does not fully meet the needs of any of these users, and in some cases may even harm them.
– One recent discovery in artificial intelligence is the harmful consequences of sycophancy, i.e. the tendency to ingratiate itself excessively with users. AI uses all kinds of tricks, flattery and praise: “It's great that you asked”, “It's good that you noticed this”. Very rarely, in some people this causes adverse reactions, even psychosis. People subconsciously sense that the AI is praising them, yet something is off, because there is no reason for the praise – noted Prof. Biecek.
He added that scientists are not sure what mechanisms make LLMs behave this way, but they can see the effects. Some algorithms recommend harmful behaviors or content that may, rarely but still, lead to depression and even reinforce suicidal tendencies, especially in children and adolescents.
Quo vadis AI?
– We need to teach AI not to hurt us. This applies both to younger people, who are more susceptible to influence, and to older people, who lack the tools to verify what technology offers them and tend to trust it excessively – emphasized the computer scientist.
According to the expert, work on reliable artificial intelligence involves many challenges. – The mathematical objects we describe are very complex. We are talking about functions with billions of parameters, and we do not even have the tools to analyze them. There are also many non-technical issues. Ordinary users want to use LLMs in one way, the police in another, and lobbyists have different goals still. All of this needs to be taken into account – the scientist emphasized.
He added that the possible scenarios for AI's future development are a fascinating question. – Interesting times lie ahead, because this is certainly not a fad that will pass in a few years. We are on the cusp of a massive technological transformation, and society's responses to it may vary greatly. For example, we do not know how the labor market will react to the ever wider presence of artificial intelligence. In my opinion, Poland can benefit greatly from this transformation – concluded Prof. Przemysław Biecek.
Anna Bugajska (PAP)