split multiple columns in pandas dataframe by delimiter
·
Answer a question
I have survey data which annoying has returned multiple choice questions in the following way. It's in an excel sheet There is about 60 columns with responses from single to multiple that are split by /. This is what I have so far, is there any way to do this quicker without having to do this for each individual column
data = {'q1': ['one', 'two', 'three'],
'q2' : ['one/two/three', 'a/b/c', 'd/e/f'],
'q3' : ['a/b/c', 'd/e/f','g/h/i']}
df = pd.DataFrame(data)
df[['q2a', 'q2b', 'q2c']]= df['q2'].str.split('/', expand = True, n=0)
df[['q3a', 'q3b', 'q3c']]= df['q2'].str.split('/', expand = True, n=0)
clean_df = df.drop(df[['q2', 'q3']], axis=1)
Answers
We can use list comprehension with add_prefix, then we use pd.concat to concatenate everything to your final df:
splits = [df[col].str.split(pat='/', expand=True).add_prefix(col) for col in df.columns]
clean_df = pd.concat(splits, axis=1)
q10 q20 q21 q22 q30 q31 q32
0 one one two three a b c
1 two a b c d e f
2 three d e f g h i
If you actually want your column names to be suffixed by a letter, you can do the following with string.ascii_lowercase:
from string import ascii_lowercase
dfs = []
for col in df.columns:
d = df[col].str.split('/', expand=True)
c = d.shape[1]
d.columns = [col + l for l in ascii_lowercase[:c]]
dfs.append(d)
clean_df = pd.concat(dfs, axis=1)
q1a q2a q2b q2c q3a q3b q3c
0 one one two three a b c
1 two a b c d e f
2 three d e f g h i
更多推荐

所有评论(0)