Regexp: any character but ignore a word

Question

I'm trying to capture server names from a string.

A server name can be either:

letters + digits
letters + digits + letters (but not 'root')

The problem is that, in some cases, the data source appends the word 'root' to the server name, for example:

ab-vol-bapp000123-use-dev
ab-vol-bapp000123sql-use-dev

ab-vol-bapp000123root-use-dev
ab-vol-bapp000123sqlroot-use-dev

In the above cases, I need to get either

app000123

Or

app000123sql

However, I'm struggling to capture the characters after the digits while excluding 'root'.

This is my best attempt:

(^ab-vol-)   # literal
([a-z]{2,4}) # 2-4 alphas
([0-9]{4,6}) # 4-6 numerics
(
  (?!root)   # ignore 'root'
  [a-z]{0,4} # 0-4 alphas
)?

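Obviously my "ignore 'root'" part doesn't do what I described (the last example above fails), and I can see why: the lookahead is only checked once, at the first letter after the digits, so for 'sqlroot' it passes and [a-z]{0,4} then captures 'sqlr'. I just don't know what the alternative is 😭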
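A minimal sketch that reproduces the failure. It uses Python's `re` module purely as a convenient stand-in for testing; whether Redshift's engine treats these constructs the same way (lookaheads in particular) is an assumption worth checking against its REGEXP_SUBSTR documentation. The helper name `trailing_letters` is hypothetical.

```python
import re

# Stand-in check in Python's re (assumed to behave like the target engine here).
# The question's attempt, compiled with re.VERBOSE so the inline comments are ignored.
ATTEMPT = re.compile(r"""
    (^ab-vol-)      # literal prefix
    ([a-z]{2,4})    # 2-4 letters
    ([0-9]{4,6})    # 4-6 digits
    (
      (?!root)      # tested only once, at the first letter after the digits
      [a-z]{0,4}    # then up to 4 letters, which may run into 'root'
    )?
""", re.VERBOSE)

SAMPLES = [
    "ab-vol-bapp000123-use-dev",
    "ab-vol-bapp000123sql-use-dev",
    "ab-vol-bapp000123root-use-dev",
    "ab-vol-bapp000123sqlroot-use-dev",
]


def trailing_letters(s):
    """Hypothetical helper: the letters captured after the digits ('' if none)."""
    m = ATTEMPT.match(s)
    if m is None:
        return None
    return m.group(4) or ""  # group 4 is None when the optional group is skipped


for s in SAMPLES:
    print(f"{s:35} -> {trailing_letters(s)!r}")
```

The last sample yields 'sqlr': the lookahead passes at "s", and nothing stops the {0,4} from continuing into "root".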

Appreciate any guidance! Thanks

(Note: I'm working in AWS Redshift.)

What's the context for this? What language? Surely it would be better to use your simple regex and then search the matched string for literal "root". — Tim Roberts, Commented Aug 10 at 22:18
How about ^(?:ab-vol-)([a-z]{2,4})([0-9]{4,6})(.*?)(?:(?:root)?-use-dev)$? — 123, Commented Aug 10 at 22:32
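The first comment's suggestion (match with a simple pattern, then handle a literal "root" separately) could be sketched like this, again with Python's `re` as a stand-in. Two assumptions of this sketch: the trailing letters are matched without the 0-4 limit (so that "sqlroot" fits), and "root" is only removed when it ends the matched name. `server_name` is a hypothetical helper.

```python
import re

# Hypothetical simple pattern: prefix, then 2-4 letters, 4-6 digits and ANY number
# of trailing letters (no 0-4 limit, so that e.g. 'sqlroot' is matched whole).
SIMPLE = re.compile(r"^ab-vol-([a-z]{2,4}[0-9]{4,6}[a-z]*)")


def server_name(s):
    """Match the simple pattern, then drop a literal 'root' suffix if present."""
    m = SIMPLE.match(s)
    if m is None:
        return None
    return m.group(1).removesuffix("root")  # str.removesuffix needs Python 3.9+


for s in [
    "ab-vol-bapp000123-use-dev",
    "ab-vol-bapp000123sql-use-dev",
    "ab-vol-bapp000123root-use-dev",
    "ab-vol-bapp000123sqlroot-use-dev",
]:
    print(s, "->", server_name(s))
```

The regex stays trivial; the "not root" rule lives in ordinary string code, which is easier to read and to change.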

The fourth bird · Accepted Answer · 2024-08-11 11:19:42Z

What you might do is match as few as possible of 0-4 letters, and assert that what follows is either the word "root", a hyphen, or the end of the string.

^(ab-vol-)([a-z]{2,4})([0-9]{4,6})([a-z]{0,4}?)(?=root\b|-|$)

The pattern matches

^ Start of string
(ab-vol-) Capture the literal text
([a-z]{2,4}) Capture 2-4 chars a-z
([0-9]{4,6}) Capture 4-6 digits
([a-z]{0,4}?) Capture 0-4 chars a-z, as few as possible
(?= Positive lookahead, assert that to the right of the current position is
  root\b|-|$ Either the word root, a hyphen, or the end of the string
) Close the lookahead
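A quick check of this pattern on the four sample strings, using Python's `re` as a stand-in engine (an assumption; Redshift itself is not exercised here):

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"^(ab-vol-)([a-z]{2,4})([0-9]{4,6})([a-z]{0,4}?)(?=root\b|-|$)")

SAMPLES = [
    "ab-vol-bapp000123-use-dev",
    "ab-vol-bapp000123sql-use-dev",
    "ab-vol-bapp000123root-use-dev",
    "ab-vol-bapp000123sqlroot-use-dev",
]

for s in SAMPLES:
    m = PATTERN.match(s)
    # Group 4 grows one letter at a time until the lookahead succeeds,
    # so it stops right before 'root', a hyphen, or the end of the string.
    print(s, "->", repr(m.group(4)))
```

For "sqlroot" the lazy group tries "", "s", "sq" and stops at "sql", the first length at which "root" followed by a word boundary comes next.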


If you just want to match all letters that do not start the word "root", you could match every char a-z except r, and only match an r when it is not directly followed by oot and a word boundary. Note that, unlike the pattern above, this one has no 0-4 length limit.

(^ab-vol-)([a-z]{2,4})([0-9]{4,6})([a-qs-z]*(?:r(?!oot\b)[a-qs-z]*)*)
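The same samples run through this pattern with Python's `re` as a stand-in engine. The last two inputs are hypothetical, added only to show that an "r" that does not start "root" is still accepted.

```python
import re

# Every letter except 'r' is always allowed; an 'r' is only allowed when it is
# not followed by 'oot' and a word boundary.
NO_ROOT = re.compile(r"(^ab-vol-)([a-z]{2,4})([0-9]{4,6})([a-qs-z]*(?:r(?!oot\b)[a-qs-z]*)*)")

SAMPLES = [
    "ab-vol-bapp000123-use-dev",
    "ab-vol-bapp000123sql-use-dev",
    "ab-vol-bapp000123root-use-dev",
    "ab-vol-bapp000123sqlroot-use-dev",
    # Hypothetical extra inputs with an 'r' that is not part of 'root':
    "ab-vol-bapp000123srv-use-dev",
    "ab-vol-bapp000123srvroot-use-dev",
]

for s in SAMPLES:
    print(s, "->", repr(NO_ROOT.match(s).group(4)))
```

In "srvroot" the first r (followed by "vroo") is consumed, while the second (followed by "oot-") is refused, so the group ends as "srv".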


tbold · Accepted Answer · 2024-08-10 22:27:40Z

I added a negative lookahead to match 0-4 letters, none of which may be the start of "root".

(^ab-vol-)([a-z]{2,4})([0-9]{4,6})(?:(?!root)[a-z]){0,4}?

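Or without the final ? (a lazy {0,4}? at the very end of the pattern is satisfied by zero letters, so only this greedy version includes trailing letters such as sql in the match):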

(^ab-vol-)([a-z]{2,4})([0-9]{4,6})(?:(?!root)[a-z]){0,4}
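A side-by-side check of both variants, using Python's `re` as a stand-in engine (an assumption). The trailing letters sit in a non-capturing group, so the sketch looks at the whole match, group(0).

```python
import re

# Both variants from this answer; a letter is only consumed if 'root' does not start there.
LAZY = re.compile(r"(^ab-vol-)([a-z]{2,4})([0-9]{4,6})(?:(?!root)[a-z]){0,4}?")
GREEDY = re.compile(r"(^ab-vol-)([a-z]{2,4})([0-9]{4,6})(?:(?!root)[a-z]){0,4}")

SAMPLES = [
    "ab-vol-bapp000123-use-dev",
    "ab-vol-bapp000123sql-use-dev",
    "ab-vol-bapp000123root-use-dev",
    "ab-vol-bapp000123sqlroot-use-dev",
]

for s in SAMPLES:
    print(s, "| lazy:", LAZY.match(s).group(0), "| greedy:", GREEDY.match(s).group(0))
```

The lazy variant stops right after the digits for every sample; the greedy one picks up "sql" and still stops before "root".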

blhsing · Accepted Answer · 2024-08-11 00:40:34Z

You can lazily match letters after the digits up to the point where "root" may occur before a word boundary:

\b[a-z]+[0-9]+[a-z]*?(?=(?:root)?\b)

Demo: https://regex101.com/r/Evq49E/2
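Because this pattern is not anchored, it is used with a search rather than an anchored match. A sketch with Python's `re` as a stand-in engine (an assumption), where `re.search` returns the leftmost match:

```python
import re

# blhsing's pattern: letters, digits, then as few letters as possible, stopping where
# an optional 'root' is followed by a word boundary.
PATTERN = re.compile(r"\b[a-z]+[0-9]+[a-z]*?(?=(?:root)?\b)")

SAMPLES = [
    "ab-vol-bapp000123-use-dev",
    "ab-vol-bapp000123sql-use-dev",
    "ab-vol-bapp000123root-use-dev",
    "ab-vol-bapp000123sqlroot-use-dev",
]

for s in SAMPLES:
    print(s, "->", PATTERN.search(s).group(0))
```

"ab" and "vol" are tried first but fail because no digits follow them, so the first match starts at "bapp".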

123 · Accepted Answer · 2024-08-10 22:31:35Z

Another way is to make use of the common suffix and write a much simpler regex:

^(?:ab-vol-)([a-z]{2,4})([0-9]{4,6})(.*?)(?:(?:root)?-use-dev)$

Note that you probably don't want to capture the known prefix and suffix:
(?:ab-vol-) is a non-capturing group matching the prefix.
(?:(?:root)?-use-dev) is a non-capturing group matching the suffix, with an optional "root" in front of it.
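A sketch of this approach with Python's `re` as a stand-in engine (an assumption). The "-use-prod" input is hypothetical and only illustrates the trade-off: the pattern depends on the fixed "-use-dev" suffix.

```python
import re

# 123's pattern: the lazy (.*?) stops as soon as an optional 'root' plus '-use-dev'
# can finish the string.
PATTERN = re.compile(r"^(?:ab-vol-)([a-z]{2,4})([0-9]{4,6})(.*?)(?:(?:root)?-use-dev)$")

SAMPLES = [
    "ab-vol-bapp000123-use-dev",
    "ab-vol-bapp000123sql-use-dev",
    "ab-vol-bapp000123root-use-dev",
    "ab-vol-bapp000123sqlroot-use-dev",
]

for s in SAMPLES:
    print(s, "->", repr(PATTERN.match(s).group(3)))

# Hypothetical input with a different suffix: no match at all.
print(PATTERN.match("ab-vol-bapp000123sql-use-prod"))
```

This keeps the regex short, but any variation in the suffix makes the whole match fail rather than degrade gracefully.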
