A few weeks ago, the European Parliament voted on the AI Act (523 votes in favor, 46 against, and 49 abstentions). Thierry Berthier, a cybersecurity researcher and leader of the ‘Artificial Intelligence Security’ group at the France IA Hub, agreed to answer our questions and analyze the key principles of this new legislation: a constructive critique that helps to understand the strengths and pitfalls of the text.
The European Scientist: The text of the AI Act voted by the European Parliament aims to “protect fundamental rights, democracy, the rule of law, and environmental sustainability against risks related to artificial intelligence (AI).” What is your analysis? The EU stands out as the first to implement this kind of legislation. Can we be satisfied with it?
Thierry Berthier: Indeed, true to its reputation, Europe has maintained its leading position in its ability to produce regulations and standards. This is more of an observation than a criticism. The recent advances in generative AI are such that they impact, and will continue to impact, every segment of human activity (economic, industrial, research and development, engineering, medical, etc.). The power and impact of large generative models (GPT-4, Llama 2, Midjourney, DALL-E 2) on our societies are so significant that it is essential to establish a regulatory framework, at the very least on data usage and privacy protection. This is what the overwhelming majority vote on the AI Act accomplishes. It can be predicted that a lighter version of the AI Act will likely emerge across the Atlantic, because the primary producers of large generative AI models (and thus the primary stakeholders in their potential misuses) are American. The AI Act contains many case-specific clauses, which is not unusual given the complexity of the domain being regulated.
At the heart of this regulation, one point seems fundamental: the origin of the data used to train (and still being used to train) LLMs. What are these data? Were they purchased or subject to financial transactions? Were they “retrieved” from freely available public data? Was there any form of consent from the data producers, which we all are? The answer is obviously not clear at all, as exemplified by the remarkable interview given by Mira Murati, CTO of OpenAI!
When a journalist asks how the Sora model was trained and what video data were used, panic sets in; she grasps at straws, realizing that this question was predictable and that she had not anticipated the “trap.” The underlying question was: did OpenAI use the vast reservoir of YouTube videos to train Sora? If so, under what commercial agreement? This case illustrates the need for regulation of the capture and use of data used to train LLMs. The AI Act applies throughout the entire LLM production chain, up to deployment, with a classification by risk level: unacceptable-risk AI, high-risk AI, limited-risk AI, and minimal-risk AI.
Prohibited AI specifically includes social scoring systems and manipulative AI. It is understood that Europe does not wish to replicate the Chinese model, but the definition of manipulative AI remains quite vague. Cynics might argue that AI is by nature manipulative, since its outputs influence the user’s perception and potentially their future actions if they rely on it to arbitrate their decisions. The concept of manipulative AI therefore seems particularly vague and questionable.
TES: To summarize the three main axes of this law, there are prohibited applications (such as facial recognition), applications deemed high-risk (due to their potential for significant harm to health, safety, fundamental rights, the environment, democracy, and the rule of law), and transparency requirements (especially regarding copyright). What do you think of this framework?
TB: The question of copyright is indeed central. There is a real risk that human-origin content will be diluted in a swamp of data whose origin becomes increasingly untraceable and non-human over time. The overall volume of synthetic data produced by generative AIs will quickly surpass the volume of data produced by human users. The ratio of human-origin data volume to total data volume will decrease, and it is on this point that traceability of data origin would make perfect sense. Cryptographic techniques for watermarking or digitally marking an image or video could provide a starting point. Combined with a traceability blockchain, they could guarantee the intellectual property of a digital artwork, for example.
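To make the idea of cryptographically marking a digital work concrete, here is a minimal sketch using Python’s standard `hmac` module. It is only an illustration of the principle: the creator key, the artwork bytes, and the function names are hypothetical, and a real deployment would rely on proper key management (or anchoring in a traceability blockchain, as suggested above) rather than a shared secret.

```python
import hashlib
import hmac

# Hypothetical creator key for illustration only; a real system would use
# managed keys (e.g. a PKI) or anchor tags in a traceability blockchain.
CREATOR_KEY = b"artist-secret-key"

def provenance_tag(content: bytes) -> str:
    """Return an HMAC-SHA256 tag binding the content to the creator's key."""
    return hmac.new(CREATOR_KEY, content, hashlib.sha256).hexdigest()

def verify_provenance(content: bytes, tag: str) -> bool:
    """Check that the tag was produced over this exact content with CREATOR_KEY."""
    expected = provenance_tag(content)
    return hmac.compare_digest(expected, tag)

artwork = b"...raw bytes of a digital artwork..."
tag = provenance_tag(artwork)

assert verify_provenance(artwork, tag)            # untouched content verifies
assert not verify_provenance(artwork + b"x", tag)  # any alteration is detected
```

Note that this tags a file rather than embedding a watermark inside the pixels; robust image watermarking, which survives re-encoding and cropping, is a harder problem that such a tag does not solve on its own.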
In general, it is difficult to precisely map the risk associated with a platform incorporating machine learning components. Beyond risk alone, it is also necessary to measure impact and rely on a dual “Risk-Impact” measurement.
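The dual “Risk-Impact” measurement can be sketched as a simple scoring function. The combination rule (likelihood times impact) and the tier thresholds below are illustrative assumptions of mine, not values from the AI Act; only the tier names echo the regulation’s classification.

```python
def risk_impact_tier(likelihood: float, impact: float) -> str:
    """Map a (likelihood, impact) pair, each in [0, 1], to an indicative tier.

    The product rule and thresholds are illustrative assumptions,
    not figures taken from the AI Act.
    """
    score = likelihood * impact
    if score >= 0.6:
        return "unacceptable"
    if score >= 0.3:
        return "high"
    if score >= 0.1:
        return "limited"
    return "minimal"

# A likely failure mode with severe consequences lands in the top tier;
# a rare, low-consequence one stays minimal.
print(risk_impact_tier(0.9, 0.9))  # → unacceptable
print(risk_impact_tier(0.2, 0.2))  # → minimal
```

The point of the dual measurement is visible here: a high-likelihood but low-impact system and a low-likelihood but high-impact system can end up in the same tier, which a single-axis “risk” score would hide.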
The unilateral ban on facial recognition is questionable if it indirectly impacts our ability to combat crime and terrorism. A total ban would have a very negative impact on the security tools industry. It would jeopardize an entire segment of the “Computer Vision” industry in Europe, leaving this sensitive specialty vulnerable to American and Asian competitors and giants. Therefore, we must exercise moderation in our desire to implement the principle of digital precaution at all levels of the European house.
It is important to remember that what is prohibited by decree or regulation will exist elsewhere, will be developed clandestinely by others, and will be systematically used if this application of AI brings new skills and performances. “Prohibition always reinforces what it prohibits.” We must consider this common-sense proverb.
TES: We know that EU member states have shortcomings in terms of AI, and their digital sovereignty is threatened, despite the existence of a few gems. What do you think of the innovation encouragement aspect of this law? Will it be sufficient?
TB: Any encouragement for innovation is welcome. Europe has fallen far behind in generative AI. American giants are far ahead of European tech. It is often forgotten that Google is the inventor of the Transformer architecture underlying LLMs, which was later exploited and capitalized on by OpenAI through the global successes of GPT-3.5, GPT-4, and derivative products. Recent large fundraising rounds by French teams relied on American funds and agreements with Microsoft. It seems difficult to speak of European digital sovereignty when hundreds of millions of dollars come from American investors.

The reality on the ground shows that European investment is not up to the challenges of the AI and robotics revolution. A European startup embarking on an AI project may find initial financing in Europe but will struggle much more to organize subsequent fundraising rounds (B, C, D) with European financial sources. It will necessarily have to look to the United States or Asia to continue its development and enter the big leagues. As in any human endeavor, individual talent and creativity alone are not enough to ensure success; a bit of luck is also needed to meet the right contacts, persuasion to convince new investors, and agility to enter the right markets. It is easier to align these factors in Silicon Valley than in France or Europe, unfortunately…

Furthermore, we sometimes struggle to adopt the right metrics to detect, in advance, the most innovative projects and technologies, those most likely to succeed. Such metrics exist. They are based solely on performance measurement and reject any influence of connivance. Connivance in arbitration is the greatest poison. Europe must start by adopting effective metrics that allow us to “see the future” of a startup and a technological project two years and then five years ahead, with an acceptable probability.
This proactive approach would enable us to better select the true European gems and avoid industrial accidents that still exist in France in 2024.
To end on a positive note, I suggest the reader keep an eye on the news from the French robotics sector, which boasts several international champions who have received no special assistance to reach the global summit. These are Darwinian champions who relied solely on their excellence to survive in a hostile competitive environment. French roboticists are not lagging behind global leaders; they are among them!
Image by ThankYouFantasyPictures from Pixabay