Hi!, I am trying to use the REGEX formula, but I cannot use it propertly. I need to extract names from a string. To make it easier the names are written after a hashtag #. This are examples
#TOM
write the essay of a client
Complete the packing list for
#Peter
From an
old post
I managed to use this formula REGEX_EXTRACT({Transaction Name}, “#[^\s]+”))
written by
Justin Barret
This formula kind of works, but is giving me the answer with the hasthag in front
#Peter
I needed it without the hashtag. Any clue about how to do it?
I am getting lots of trouble to understand the sintaxis of the REGEX formulas.
Thanks!!
Hey
@Tristan-ARG
!
The regex behavior in Airtable’s formulas is disappointing, and if you’re trying to write more in-depth regex patterns, you will quickly run up against issues.
If you want to skip straight to the fixed formula and pattern, here’s a shortcut lol.
If you want a bit more of a walkthrough on how your pattern is behaving, there’s a bit below kinda explaining it.
How Your Regex Pattern Is Behaving
The first thing to note, is that your regex formula is actually working as intended.
So,
#[^\s]+
is your pattern.
Let’s break it down.
Let’s say we have this string that we want to extract from:
Hello! I want to circulate a hashtag such as
#Tom
!
:star: This will literally match the actual character #.
This is why your extract is pulling in the hashtag.
What We’ve Extracted So Far:
“#”
[^\\s]
:star: This will match any character
only if it is not
a whitespace character.
A quick side note here. Airtable’s regex syntax requires two
\\
.
When you have something contained in square brackets, it will generally mean to look for a collection of characters that you have inside of the brackets.
e.g. [a-z] will look for lowercase characters in the range of the entire English alphabet.
Similarly, [bsw] will look for any character as long as it is either b, s, or w.
Now, in regex, the carrot (
^
) denotes when you don’t want something.
e.g. [^atn] will match any character that is anything
except
for a, t, and n.
This will take the token you put it next to, and repeat its behavior.
(More context below).
In this case, it will repeat the behavior of the
[^\\s]
.
This is called a quantifier. There are a few types, but this one will only return a match if it returns
one or more
times. This is distinct from
?
which indicates
zero or one
, and
*
which indicates
zero or more
times.
e.g. The pattern
\w?
will look for a single letter, assuming it even exists.
The pattern
\w*
will look for a single letter, and if it finds one, it will match everything next to it until it runs into something that is not a letter. (We call this a greedy quantifier, as it will match as many things as it can consecutively).
Final Solution
You’ll want to transform the string that the formula returns to you to remove the hashtag.
This formula should do it:
{Transaction Name},
SUBSTITUTE(
REGEX_EXTRACT(
{Transaction Name},
“#[^\s]+”
This formula is built with this control flow in mind:
If the
{Transaction Name}
field has a value:
Return the regex extract from the field,
then
…
Take the “#” character, and remove it from the extracted string.
If the
{Transaction Name}
field has no value:
Do nothing. (Will return blank)
Hey
@Tristan-ARG
!
The regex behavior in Airtable’s formulas is disappointing, and if you’re trying to write more in-depth regex patterns, you will quickly run up against issues.
If you want to skip straight to the fixed formula and pattern, here’s a shortcut lol.
If you want a bit more of a walkthrough on how your pattern is behaving, there’s a bit below kinda explaining it.
How Your Regex Pattern Is Behaving
The first thing to note, is that your regex formula is actually working as intended.
So,
#[^\s]+
is your pattern.
Let’s break it down.
Let’s say we have this string that we want to extract from:
Hello! I want to circulate a hashtag such as
#Tom
!
:star: This will literally match the actual character #.
This is why your extract is pulling in the hashtag.
What We’ve Extracted So Far:
“#”
[^\\s]
:star: This will match any character
only if it is not
a whitespace character.
A quick side note here. Airtable’s regex syntax requires two
\\
.
When you have something contained in square brackets, it will generally mean to look for a collection of characters that you have inside of the brackets.
e.g. [a-z] will look for lowercase characters in the range of the entire English alphabet.
Similarly, [bsw] will look for any character as long as it is either b, s, or w.
Now, in regex, the carrot (
^
) denotes when you don’t want something.
e.g. [^atn] will match any character that is anything
except
for a, t, and n.
This will take the token you put it next to, and repeat its behavior.
(More context below).
In this case, it will repeat the behavior of the
[^\\s]
.
This is called a quantifier. There are a few types, but this one will only return a match if it returns
one or more
times. This is distinct from
?
which indicates
zero or one
, and
*
which indicates
zero or more
times.
e.g. The pattern
\w?
will look for a single letter, assuming it even exists.
The pattern
\w*
will look for a single letter, and if it finds one, it will match everything next to it until it runs into something that is not a letter. (We call this a greedy quantifier, as it will match as many things as it can consecutively).
Final Solution
You’ll want to transform the string that the formula returns to you to remove the hashtag.
This formula should do it:
{Transaction Name},
SUBSTITUTE(
REGEX_EXTRACT(
{Transaction Name},
“#[^\s]+”
This formula is built with this control flow in mind:
If the
{Transaction Name}
field has a value:
Return the regex extract from the field,
then
…
Take the “#” character, and remove it from the extracted string.
If the
{Transaction Name}
field has no value:
Do nothing. (Will return blank)