Start the console of your choice and install it
Posted: Sat Jan 25, 2025 6:41 am
The engine tries all possible paths until a match is found or all paths have been tried and failed ("backtracking"). This is problematic because, for an input of length n, the number of paths can grow exponentially, resulting in a worst-case running time of O(2^n).
Alternatively, the engine tries to convert the nondeterministic finite automaton (NFA) into a deterministic finite automaton (DFA). This is problematic because the conversion can take exponential time: the resulting DFA can have exponentially many states.
A regex denial of service (ReDoS) can therefore occur when either of these two algorithms is applied to a vulnerable regular expression. A malicious user can craft input that triggers one of these two conditions and forces the regex engine into its worst-case runtime complexity.
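To make the NFA-to-DFA blow-up concrete, here is a minimal sketch of the subset construction. It assumes a textbook NFA that is not part of the original post: the language "the n-th symbol from the end is a" over the alphabet {a, b}. The function name countDfaStates and the state numbering are hypothetical, chosen only for this illustration.

```javascript
// Hypothetical textbook NFA: accepts strings over {a, b} whose n-th symbol
// from the end is 'a'. States are numbered 0..n; state n is accepting.
function countDfaStates(n) {
  // NFA transition function: returns the list of successor states.
  const step = (state, symbol) => {
    if (state === 0) return symbol === 'a' ? [0, 1] : [0];
    return state < n ? [state + 1] : [];
  };

  // Canonical string key for a set of NFA states (one DFA state).
  const key = (set) => [...set].sort((x, y) => x - y).join(',');

  const start = new Set([0]);
  const seen = new Set([key(start)]);
  const pending = [start];

  // Subset construction: explore every reachable set of NFA states.
  while (pending.length > 0) {
    const current = pending.pop();
    for (const symbol of ['a', 'b']) {
      const next = new Set();
      for (const state of current) {
        for (const target of step(state, symbol)) next.add(target);
      }
      const nextKey = key(next);
      if (!seen.has(nextKey)) {
        seen.add(nextKey);
        pending.push(next);
      }
    }
  }
  return seen.size;
}

for (const n of [2, 4, 8, 10]) {
  console.log(n, countDfaStates(n)); // the count doubles with every extra NFA state
}
```

The NFA has only n + 1 states, but the number of reachable DFA states doubles each time n grows by one, which is the exponential cost described above.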
Which types of regular expressions are vulnerable to DoS attacks?
Let's consider an example of a regular expression that is vulnerable to a DoS attack. To analyze the runtime of a command, we use the CLI tool gnomon.
Suppose we have a pattern, /^(\w+\s?)*$/, which matches a group of words with an optional space after each word. The anchors ^ and $ match the beginning and the end of the input line.
Now let's try a group of words without special characters:
node -p "/^(\w+\s?)*$/.test('Nur valide Character')" | gnomon
We see that the words match and that it took 0.0058 seconds to execute this regular expression on the terminal.
Now let's try to form a sentence with a special character at the end of the last word:
node -p "/^(\w+\s?)*$/.test('Invalide Character!')" | gnomon
As expected, false was returned and the regular expression took about 0.0061 seconds to execute.
Perfect, everything works. The problem is that it can take a long time for the regex engine to execute the regular expression for a much longer sentence with special characters.
Let's see this in action. Run the following in a terminal:
node -p "/^(\w+\s?)*$/.test('Ein langer Satz mit invaliden Zeichen, dessen Abgleich so viel Zeit in Anspruch nimmt, dass die CPU-Auslastung moeglicherweise drastisch ansteigt!!!')" | gnomon
Do not expect this command to return promptly. If we open our Task Manager, we can see that the process in question is using a very large share of the CPU to execute this regular expression. In fact, we should see a large increase in total CPU usage.
As you can see, an attacker can exploit a seemingly simple regex pattern to make our system consume more resources than expected, so longer inputs can cause our system to hang or crash.
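The growth can also be observed inside a single Node.js process instead of piping through gnomon. The following is a minimal sketch, assuming the same pattern and a synthetic input (a run of the letter a followed by one !) that is not from the original post; the helper name timeMatch is hypothetical. The input lengths are kept small so that the script finishes quickly, and the measured times will differ from machine to machine.

```javascript
const { performance } = require('perf_hooks');

// The vulnerable pattern from the example above.
const pattern = /^(\w+\s?)*$/;

// Hypothetical helper: runs the pattern once and reports the elapsed time.
function timeMatch(input) {
  const start = performance.now();
  const matched = pattern.test(input);
  return { length: input.length, matched, ms: performance.now() - start };
}

// Synthetic inputs: n word characters followed by one character
// that can never match, so every attempt ends in failure.
const results = [];
for (let n = 14; n <= 22; n += 2) {
  results.push(timeMatch('a'.repeat(n) + '!'));
}

console.table(results);
```

Each row should report matched as false, and the ms column should grow much faster than the length column. Raising the upper bound of the loop by a few characters is usually enough to make the delay clearly noticeable.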
Let’s take a closer look at why this is the case:
The main cause of this problem is a feature available in regex engines called backtracking. The engine first goes through the input and tries to match the contents of the parentheses, \w+\s?.
Since the quantifier + is greedy, it consumes as many word characters as possible, and the group repeats word by word until the input no longer fits. For the input above, this covers: Ein langer Satz mit invaliden Zeichen.
The star quantifier in (\w+\s?)* could then apply the group again, but the next character does not start a valid word, so that attempt matches nothing.
Because of the $ anchor in our pattern, the regex engine now expects the end of the input. Instead it finds a character that neither \w nor \s can match (here the comma after Zeichen; the trailing !!! in ansteigt!!! would fail in the same way), so there is no match.
The engine goes back one step to the previous position and tries another path in the hope of finding a match. So the + quantifier reduces its number of repetitions, gives up the last character it matched, and the engine tries to match the rest of the input again from there. Every word can be split across group iterations in many different ways, and the engine tries each of them before it finally gives up, which is where the exponential cost comes from.
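The backtracking described above can be imitated with a small hand-written matcher that counts its own attempts. This is only a sketch under stated assumptions: countAttempts is a hypothetical function written for this illustration, it mimics the pattern /^(\w+\s?)*$/ with a naive recursive search, and it does not reflect how any particular regex engine is implemented internally.

```javascript
// Naive backtracking matcher for ^(\w+\s?)*$ that counts how many
// positions it tries. Illustration only, not a real engine.
function countAttempts(input) {
  let attempts = 0;
  const isWord = (ch) => /^\w$/.test(ch);
  const isSpace = (ch) => /^\s$/.test(ch);

  function matchFrom(pos) {
    attempts++;

    // Greedy \w+: find the longest run of word characters starting at pos.
    let end = pos;
    while (end < input.length && isWord(input[end])) end++;

    // Backtrack over the length of \w+, longest first.
    for (let stop = end; stop > pos; stop--) {
      // Greedy \s?: first try to consume one whitespace character...
      if (stop < input.length && isSpace(input[stop])) {
        if (matchFrom(stop + 1)) return true;
      }
      // ...then try again without consuming it.
      if (matchFrom(stop)) return true;
    }

    // No further group iteration worked: the $ anchor must match here.
    return pos === input.length;
  }

  const matched = matchFrom(0);
  return { matched, attempts };
}

for (const n of [4, 8, 12, 16]) {
  console.log(n, countAttempts('a'.repeat(n) + '!').attempts);
}
// prints:
// 4 16
// 8 256
// 12 4096
// 16 65536
```

For an input of n word characters followed by one invalid character, this matcher makes 2^n attempts before it reports a failure, so every additional character doubles the work.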