4

I'm trying to capture server names from a string.

A server name can be

  • letters + digits
  • letters + digits + letters (but not 'root')

The problem is that in some circumstances the data source appends the word 'root' to the end of the server name.

ab-vol-bapp000123-use-dev
ab-vol-bapp000123sql-use-dev

ab-vol-bapp000123root-use-dev
ab-vol-bapp000123sqlroot-use-dev

In the above cases, I need to get either

  • app000123

Or

  • app000123sql

However, I'm struggling to capture the characters after the digits while excluding 'root'.

This is my best attempt:

(^ab-vol-)   # literal
([a-z]{2,4}) # 2-4 alphas
([0-9]{4,6}) # 4-6 numerics
(
  (?!root)   # ignore 'root'
  [a-z]{0,4} # 0-4 alphas
)?

Obviously my "ignore 'root'" is not doing as described (last test line below fails), and I can see why - I just don't know what the alternative answer is 😭

Appreciate any guidance! Thanks

(Note: working in AWS Redshift)


2
  • 2
    What's the context for this? What language? Surely it would be better to use your simple regex and then search the matched string for literal "root". Commented Aug 10 at 22:18
  • How about ^(?:ab-vol-)([a-z]{2,4})([0-9]{4,6})(.*?)(?:(?:root)?-use-dev)$?
    – 123
    Commented Aug 10 at 22:32

4 Answers

2

What you might do is match 0-4 chars, as few as possible, and assert that to the right is either the word "root", a hyphen, or the end of the string.

^(ab-vol-)([a-z]{2,4})([0-9]{4,6})([a-z]{0,4}?)(?=root\b|-|$)

The pattern matches

  • ^ Start of string
  • (ab-vol-) Capture the literal text
  • ([a-z]{2,4}) Capture 2-4 chars a-z
  • ([0-9]{4,6}) Capture 4-6 digits
  • ([a-z]{0,4}?) Capture 0-4 times a char a-z, as few as possible
  • (?= Positive lookahead, assert that to the right of the current position is
    • root\b|-|$ Match either the word root or a hyphen or assert the end of the string
  • ) Close the lookahead

See a regex demo.
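
As a quick sanity check, here is a minimal sketch that runs the pattern above over the four sample strings from the question. It assumes Python's `re` module as a stand-in engine (not Redshift), and `server_name` is a hypothetical helper that joins groups 2 to 4, so the result starts with whatever letters group 2 matched.

```python
import re

# Pattern from the answer: lazy 0-4 letters, then a lookahead for
# "root" + word boundary, a hyphen, or the end of the string.
PATTERN = re.compile(
    r"^(ab-vol-)([a-z]{2,4})([0-9]{4,6})([a-z]{0,4}?)(?=root\b|-|$)"
)

SAMPLES = [
    "ab-vol-bapp000123-use-dev",
    "ab-vol-bapp000123sql-use-dev",
    "ab-vol-bapp000123root-use-dev",
    "ab-vol-bapp000123sqlroot-use-dev",
]


def server_name(s):
    """Hypothetical helper: letters + digits + optional suffix, or None."""
    m = PATTERN.match(s)
    return m.group(2) + m.group(3) + m.group(4) if m else None


for s in SAMPLES:
    print(s, "->", server_name(s))
```

With these samples the suffix group comes back empty for the first and third strings and as `sql` for the second and fourth.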


If you just want to match all chars that are not followed by the word "root", you could match all chars a-z except for r, and then only match r when not directly followed by oot and a word boundary.

(^ab-vol-)([a-z]{2,4})([0-9]{4,6})([a-qs-z]*(?:r(?!oot\b)[a-qs-z]*)*)

See a regex demo.
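
A small sketch of how this second pattern behaves, again assuming Python's `re` as the engine. The `srv` sample is made up for illustration, to show that an `r` is still accepted when it does not start the word "root". Note that this variant puts no upper limit on the number of trailing letters.

```python
import re

# Letters other than "r" are always allowed; "r" is allowed only when it
# is not directly followed by "oot" and a word boundary.
NO_ROOT = re.compile(
    r"(^ab-vol-)([a-z]{2,4})([0-9]{4,6})([a-qs-z]*(?:r(?!oot\b)[a-qs-z]*)*)"
)


def suffix(s):
    """Hypothetical helper: return the trailing letters (group 4), or None."""
    m = NO_ROOT.match(s)
    return m.group(4) if m else None


print(suffix("ab-vol-bapp000123sqlroot-use-dev"))  # sql
print(suffix("ab-vol-bapp000123root-use-dev"))     # empty string
print(suffix("ab-vol-bapp000123srv-use-dev"))      # srv (made-up sample)
```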

1

I added a negative lookahead so that each of the 0-4 letters is matched only if "root" does not start at that position.

(^ab-vol-)([a-z]{2,4})([0-9]{4,6})(?:(?!root)[a-z]){0,4}?

Result:

regex1

Or without the last ?:

(^ab-vol-)([a-z]{2,4})([0-9]{4,6})(?:(?!root)[a-z]){0,4}

Result: regex2
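
The two variants can be compared side by side with a minimal sketch, assuming Python's `re` as the engine (other engines may differ in details). One thing worth checking: since nothing follows the quantifier, the lazy form is already satisfied with zero letters, so in this sketch only the greedy form picks up the trailing letters.

```python
import re

PREFIX = r"(^ab-vol-)([a-z]{2,4})([0-9]{4,6})"

# Each letter is consumed only if "root" does not start at that position.
LAZY = re.compile(PREFIX + r"(?:(?!root)[a-z]){0,4}?")
GREEDY = re.compile(PREFIX + r"(?:(?!root)[a-z]){0,4}")

sample = "ab-vol-bapp000123sqlroot-use-dev"

lazy_match = LAZY.match(sample).group(0)
greedy_match = GREEDY.match(sample).group(0)

print(lazy_match)    # ab-vol-bapp000123
print(greedy_match)  # ab-vol-bapp000123sql
```

The trailing letters are inside a non-capturing group here, so they show up in the overall match rather than in a numbered group.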

1

You can lazily match letters after digits up to where root possibly occurs before a word boundary:

\b[a-z]+[0-9]+[a-z]*?(?=(?:root)?\b)

Demo: https://regex101.com/r/Evq49E/2
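
For a check outside regex101, a minimal sketch assuming Python's `re` module. Because the pattern is not anchored, it is used with `search`, and `find_name` is a hypothetical helper name.

```python
import re

# Lazily take letters after the digits, stopping where an optional
# "root" is followed by a word boundary.
NAME = re.compile(r"\b[a-z]+[0-9]+[a-z]*?(?=(?:root)?\b)")


def find_name(s):
    """Hypothetical helper: first letters+digits(+letters) token, or None."""
    m = NAME.search(s)
    return m.group(0) if m else None


for s in (
    "ab-vol-bapp000123-use-dev",
    "ab-vol-bapp000123sql-use-dev",
    "ab-vol-bapp000123root-use-dev",
    "ab-vol-bapp000123sqlroot-use-dev",
):
    print(s, "->", find_name(s))
```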

0

Another way is to make use of the common suffix and write a much simpler regex:

^(?:ab-vol-)([a-z]{2,4})([0-9]{4,6})(.*?)(?:(?:root)?-use-dev)$
  • Note that you may not want to capture the known prefix and suffix:

  • (?:ab-vol-): is a non-capture group including the prefix.

  • (?:(?:root)?-use-dev): is a non-capture group including the suffix.
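
To see which pieces end up in the capture groups, here is a minimal sketch assuming Python's `re` as the engine. `parts` is a hypothetical helper, and the approach relies on every input ending in `-use-dev`, as the samples in the question do.

```python
import re

# Prefix and suffix sit in non-capturing groups, so only the letters,
# the digits, and the optional trailing part are captured.
SUFFIXED = re.compile(
    r"^(?:ab-vol-)([a-z]{2,4})([0-9]{4,6})(.*?)(?:(?:root)?-use-dev)$"
)


def parts(s):
    """Hypothetical helper: (letters, digits, trailing) or None."""
    m = SUFFIXED.match(s)
    return m.groups() if m else None


print(parts("ab-vol-bapp000123sqlroot-use-dev"))  # ('bapp', '000123', 'sql')
print(parts("ab-vol-bapp000123root-use-dev"))     # ('bapp', '000123', '')
```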
