Devs Fixed

Resolved: How to extract a substring between parentheses that starts with a digit, when there are multiple sets of parentheses

By Isaac Tonny on 16/06/2022 Issue

Question:

My goal is to extract the substring between a set of parentheses, but only if it starts with a digit. Several of the strings have multiple sets of parentheses, but only one set will contain a substring that starts with a digit.
Currently, my code extracts everything between the first opening parenthesis and the last closing one, rather than treating them as two separate sets.
As for using only the parentheses whose contents start with a digit, I am lost as to how to even approach this.
Any help is appreciated.
import pandas as pd

cols = ['a', 'b']
data = [
    ['xyz - (4 inch), (four inch)', 'abc'],
    ['def', 'ghi'],
    ['xyz - ( 5.5 inch), (five inch)', 'abc'],
]
df = pd.DataFrame(data=data, columns=cols)
df['c'] = df['a'].str.extract(r"\((.*)\)")  # greedy: .* runs to the last ')' in the string
Desired output:
                                a    b       c
0     xyz - (4 inch), (four inch)  abc  4 inch
1                             def  ghi     NaN
2  xyz - ( 5.5 inch), (five inch)  abc     NaN
Current output:
                                a    b                       c
0     xyz - (4 inch), (four inch)  abc     4 inch), (four inch
1                             def  ghi                     NaN
2  xyz - ( 5.5 inch), (five inch)  abc   5.5 inch), (five inch
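The greedy behaviour described above can be reproduced with Python's standard re module. The sketch below is illustrative only (it is not the asker's code): .* runs to the last ‘)’ in the string, while the lazy .*? and the negated class [^)]* both stop at the first ‘)’, so each set of parentheses is matched on its own.

```python
import re

text = "xyz - (4 inch), (four inch)"

# Greedy: .* consumes as much as possible, so the match ends at the LAST ')'.
greedy = re.findall(r"\((.*)\)", text)

# Lazy: .*? consumes as little as possible, so the match ends at the FIRST ')'.
lazy = re.findall(r"\((.*?)\)", text)

# Negated class: [^)]* can never cross a ')', so each set is matched separately.
negated = re.findall(r"\(([^)]*)\)", text)

print(greedy)   # ['4 inch), (four inch']
print(lazy)     # ['4 inch', 'four inch']
print(negated)  # ['4 inch', 'four inch']
```

Either of the last two forms fixes the "two sets merged into one" problem; the digit requirement still has to be added on top.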

Answer:

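The following pattern should do the job: \((\d[^.)]+)\)
How it works:
  • Matches the character ‘(’.
  • Starts capturing: exactly one digit, followed by one or more characters that are neither ‘)’ nor ‘.’.
  • Ends capturing.
  • Matches the character ‘)’.

Because ‘)’ can never be part of the captured text, a match cannot run across two sets of parentheses. Because the capture must begin with a digit immediately after ‘(’, ‘(four inch)’ is skipped, and so is ‘( 5.5 inch)’, where a space rather than a digit follows ‘(’.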
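To check what the pattern accepts, it can be run over a few sample strings with re.search. This is a minimal sketch: the first three samples come from the question, while the last two are hypothetical, added only to probe the edges of the pattern. Because \d matches exactly one digit and [^.)]+ then needs at least one more character that is neither ‘.’ nor ‘)’, a bare (4) and a decimal such as (4.5 inch) are not captured.

```python
import re

# The answer's pattern: '(' + one digit + one or more chars that are not '.' or ')' + ')'
PATTERN = re.compile(r"\((\d[^.)]+)\)")

samples = [
    "xyz - (4 inch), (four inch)",     # from the question
    "def",                             # from the question
    "xyz - ( 5.5 inch), (five inch)",  # from the question
    "xyz - (4.5 inch)",                # hypothetical: decimal right after '('
    "xyz - (4)",                       # hypothetical: a single digit only
]

def first_digit_group(s):
    """Illustrative helper: captured text of the first match, or None if no match."""
    m = PATTERN.search(s)
    return m.group(1) if m else None

results = [first_digit_group(s) for s in samples]
for s, r in zip(samples, results):
    print(f"{s!r:35} -> {r!r}")
```

On these inputs only the first sample produces a capture ('4 inch'); the other four give None.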

A detailed, step-by-step breakdown of the pattern is available on regex101.
Final code:
import pandas as pd

cols = ['a', 'b']
data = [
    ['xyz - (4 inch), (four inch)', 'abc'],
    ['def', 'ghi'],
    ['xyz - ( 5.5 inch), (five inch)', 'abc'],
]
df = pd.DataFrame(data=data, columns=cols)
df['c'] = df['a'].str.extract(r"\((\d[^.)]+)\)")  # raw string avoids Python's invalid-escape warning for \( and \d

print(df)
Output:
                                a    b       c
0     xyz - (4 inch), (four inch)  abc  4 inch
1                             def  ghi     NaN
2  xyz - ( 5.5 inch), (five inch)  abc     NaN
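If values such as (4.5 inch) should be captured too, one possible variation (an assumption, not the pattern from the answer above) is \((\d[^)]*)\). It still requires a digit right after ‘(’ but allows anything up to the next ‘)’, including ‘.’, and using * instead of + also accepts a bare (4). On the question's three rows it gives the same result as the answer; the fourth row below is hypothetical.

```python
import pandas as pd

cols = ['a', 'b']
data = [
    ['xyz - (4 inch), (four inch)', 'abc'],
    ['def', 'ghi'],
    ['xyz - ( 5.5 inch), (five inch)', 'abc'],
    ['xyz - (4.5 inch), (four and a half inch)', 'abc'],  # hypothetical extra row
]
df = pd.DataFrame(data=data, columns=cols)

# '(' + a digit + anything up to (but not including) the next ')' + ')'.
# expand=False returns a Series, which is convenient for a single capture group.
df['c'] = df['a'].str.extract(r"\((\d[^)]*)\)", expand=False)

print(df)
```

Rows 0 to 2 match the answer's output (4 inch, NaN, NaN), and the hypothetical row 3 yields 4.5 inch.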




© 2023 DEVSFIX.COM
