How do I unnest a column in a pandas DataFrame?

Score: 11

I have the following DataFrame where one of the columns is an object (list type cell):

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[1,2]]})
df
Out[458]: 
   A       B
0  1  [1, 2]
1  2  [1, 2]

My expected output is:

What should I do to achieve this?

asked Nov 9 at 2:19 by W-B

edited Nov 9 at 16:21 by Boann

This question has an open bounty worth +100
reputation from W-B ending in 6 days.

This question has not received enough attention.

This question needs more attention: since posting it, I still see many questions related to the same issue.

Related, unnesting strings: stackoverflow.com/q/48197234/4909087
– coldspeed
Nov 12 at 12:00

python pandas dataframe

4 Answers

Score: 11 (accepted)

As a user of both R and Python who has spent a year on this site, I have seen this type of question a couple of times.

R has a built-in function for this, unnest from the tidyr package, but Python (pandas) has no built-in function for this type of question as of this writing.

Object-dtype columns always make the data hard to convert with pandas functions. When I receive data like this, the first thing that comes to mind is to 'flatten', or unnest, the column.

Method 1: apply + pd.Series (easy to understand, but not recommended in terms of performance)

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})
Out[463]: 
   A  B
0  1  1
1  1  2
0  2  1
1  2  2

Method 2: repeat with the DataFrame constructor, re-creating the DataFrame (good performance, but awkward with multiple columns)

pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})
Out[465]: 
   A  B
0  1  1
0  1  2
1  2  1
1  2  2
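For what it's worth, here is a minimal, self-contained sketch of the same repeat-plus-concatenate idea on lists of unequal length. The sample data and the names lengths and flat are made up for illustration, and map(len) is used in place of .str.len() only as an equivalent way to get the list lengths.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with lists of different lengths
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4, 5]]})

# How many rows each original row expands to
lengths = df['B'].map(len)

# Repeat each A value once per list element, and flatten all lists into one array
flat = pd.DataFrame({
    'A': df['A'].repeat(lengths),
    'B': np.concatenate(df['B'].values),
})
print(flat)
```

The two pieces line up because both walk the rows in the same order: the first row contributes two A values and two B values, the second row three of each.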

Method 2.1: suppose that besides A we also have columns A.1, ..., A.n. With Method 2 we would have to re-create those columns one by one, which quickly becomes tedious.

Solution: unnest the single column, then join or merge back on the index

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
s.join(df.drop(columns='B'),how='left')
Out[477]: 
   B  A
0  1  1
0  2  1
1  1  2
1  2  2

If you need the column order to be exactly the same as before, add reindex at the end

s.join(df.drop(columns='B'),how='left').reindex(columns=df.columns)
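A small sketch of the join variant with an extra column carried along, assuming made-up data (the column X and the variable names are only for illustration). It shows that the other columns never have to be listed by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an extra column X next to A
df = pd.DataFrame({'A': [1, 2], 'X': ['p', 'q'], 'B': [[1, 2], [3, 4, 5]]})

# Flatten B and give each element the index label of the row it came from
lengths = df['B'].map(len)
s = pd.DataFrame({'B': np.concatenate(df['B'].values)},
                 index=df.index.repeat(lengths))

# Join every remaining column back on the index, then restore the column order
out = s.join(df.drop(columns='B'), how='left').reindex(columns=df.columns)
print(out)
```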

Method 3: re-create the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
Out[488]: 
   A  B
0  1  1
1  1  2
2  2  1
3  2  2

If there are more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
s.merge(df,left_on=0,right_index=True)
Out[491]: 
   0  1  A       B
0  0  1  1  [1, 2]
1  0  2  1  [1, 2]
2  1  1  2  [1, 2]
3  1  2  2  [1, 2]

Method 4: using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
Out[554]: 
   A  B
0  1  1
0  1  2
1  2  1
1  2  2

#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5: when the lists contain only unique values

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})
from collections import ChainMap
d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
pd.DataFrame(list(d.items()),columns=df.columns[::-1])
Out[574]: 
   B  A
0  1  1
1  2  1
2  3  2
3  4  2
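A small sketch of why the unique-values condition matters, using made-up data in which the value 2 appears in both lists. Because the intermediate result is a dict keyed by the B values, a repeated value can only be stored once, so one (A, B) pair is lost.

```python
from collections import ChainMap

import pandas as pd

# Hypothetical data: the value 2 occurs in both lists
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [2, 3]]})

# Same construction as in Method 5: one dict per row, mapping each list element to A
d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

# Four (A, B) pairs went in, but only three distinct keys can come out
print(len(d))

# The key 2 keeps the value from the first row; the pair (A=2, B=2) is gone
print(d[2])
```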

Method 6: using numpy for high performance

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
pd.DataFrame(data=newvalues[0],columns=df.columns)
   A  B
0  1  1
1  1  2
2  2  1
3  2  2

Method 7: using itertools cycle and chain; a pure Python solution, just for fun

from itertools import cycle,chain
l=df.values.tolist()
l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
   A  B
0  1  1
1  1  2
2  2  1
3  2  2

Special case: two columns of object type

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})
df
Out[592]: 
   A       B       C
0  1  [1, 2]  [1, 2]
1  2  [3, 4]  [3, 4]

Self-defined function

def unnesting(df, explode):
    # Repeat each index label once per list element (lengths taken from the first column)
    idx=df.index.repeat(df[explode[0]].str.len())
    # Flatten every column to unnest and put them side by side
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    # Join the remaining columns back on the index
    return df1.join(df.drop(columns=explode),how='left')

unnesting(df,['B','C'])
Out[609]: 
   B  C  A
0  1  1  1
0  2  2  1
1  3  3  2
1  4  4  2
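One caveat worth making explicit: unnesting builds the repeated index from the first column in explode only, so it assumes that, within each row, all exploded columns hold lists of the same length. A rough sketch of a guard for that assumption follows; same_lengths is a hypothetical helper name, not part of pandas, and the two sample frames are made up.

```python
import pandas as pd

def same_lengths(df, explode):
    """Return True if, in every row, all columns in `explode` hold lists of equal length."""
    # One column of list lengths per exploded column
    lengths = df[explode].apply(lambda col: col.map(len))
    # A row is consistent when it contains exactly one distinct length
    return bool(lengths.nunique(axis=1).eq(1).all())

ok = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})
bad = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3]], 'C': [[1], [2, 3]]})

print(same_lengths(ok, ['B', 'C']))   # True
print(same_lengths(bad, ['B', 'C']))  # False
```

In the bad frame the total number of elements in B and C happens to match, so the flattened columns would have the same length and values from different rows would be paired without any error. Checking first seems safer than relying on an exception.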

Summary:

I am using pandas and plain Python functions for this type of question. If you are worried about the speed of the solutions above, check user3483203's answer: he uses numpy, and most of the time numpy is faster. If speed really matters in your case, I recommend Cython and numba.
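A note for readers on newer versions: pandas 0.25 added DataFrame.explode, which covers the basic case directly. The sketch below assumes pandas 0.25 or later; as far as I know, passing a list of columns to explode needs pandas 1.3 or later.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# Each list element gets its own row; the original index labels are repeated
exploded = df.explode('B')
print(exploded)

# Optional: renumber the rows, and convert B from object dtype back to integers
tidy = exploded.reset_index(drop=True).astype({'B': 'int64'})
print(tidy)
```

The exploded column comes back with object dtype, which is why the astype step is included; whether you need it depends on what you do with the result.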

answered Nov 9 at 2:20 by W-B (edited 3 hours ago)

Good one! I like the answers here. Perhaps you could elaborate on some situations where multiple columns need unnesting, and on how a solution like this would generalise to N arbitrary columns with even (or uneven) length lists.
– coldspeed
Nov 9 at 3:32

Score: 4

Option 1

If all of the sublists in the other column are the same length, numpy can be an efficient option here:

vals = np.array(df.B.values.tolist())
a = np.repeat(df.A, vals.shape[1])

pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

Option 2

If the sublists have different lengths, you need an additional step:

vals = df.B.values.tolist()
rs = [len(r) for r in vals]
a = np.repeat(df.A, rs)

pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)
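One thing to watch with np.column_stack: it returns a single array with one common dtype, so columns of different types get upcast. The sketch below uses made-up data (integer A, lists of floats in B) to show A coming back as float, together with one possible workaround that builds the frame column by column instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: integer column A, lists of floats in B
df = pd.DataFrame({'A': [1, 2], 'B': [[0.5, 1.5], [2.5]]})

vals = df['B'].tolist()
rs = [len(r) for r in vals]
a = np.repeat(df['A'].to_numpy(), rs)
b = np.concatenate(vals)

# column_stack forces one dtype for the whole 2-D array, so A becomes float
stacked = pd.DataFrame(np.column_stack((a, b)), columns=df.columns)
print(stacked.dtypes)

# Passing the columns separately lets each keep its own dtype
separate = pd.DataFrame({'A': a, 'B': b})
print(separate.dtypes)
```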

Option 3

I took a shot at generalizing this to flatten N columns and tile M columns; I'll work later on making it more efficient:

df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],
                   'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})

   A          B          C  D
0  1     [1, 2]  [1, 2, 3]  A
1  2  [1, 2, 3]     [1, 2]  B
2  3        [1]     [1, 2]  C

def unnest(df, tile, explode):
    # Concatenate the lists of the exploded columns row by row
    vals = df[explode].sum(axis=1)
    # Number of output rows per input row
    rs = [len(r) for r in vals]
    # Repeat the tiled columns to match
    a = np.repeat(df[tile].values, rs, axis=0)
    b = np.concatenate(vals.values)
    d = np.column_stack((a, b))
    return pd.DataFrame(d, columns = tile +  ['_'.join(explode)])

unnest(df, ['A', 'D'], ['B', 'C'])

    A  D B_C
0   1  A   1
1   1  A   2
2   1  A   1
3   1  A   2
4   1  A   3
5   2  B   1
6   2  B   2
7   2  B   3
8   2  B   1
9   2  B   2
10  3  C   1
11  3  C   1
12  3  C   2

Functions

def wen1(df):
    return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})

def wen2(df):
    return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

def wen3(df):
    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))
    return s.join(df.drop(columns='B'), how='left')

def wen4(df):
    return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

def chris1(df):
    vals = np.array(df.B.values.tolist())
    a = np.repeat(df.A, vals.shape[1])
    return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

def chris2(df):
    vals = df.B.values.tolist()
    rs = [len(r) for r in vals]
    a = np.repeat(df.A.values, rs)
    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Timings

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from timeit import timeit

res = pd.DataFrame(
       index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],
       columns=[10, 50, 100, 500, 1000, 5000, 10000],
       dtype=float
)

for f in res.index:
    for c in res.columns:
        df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})
        df = pd.concat([df]*c)
        stmt = '{}(df)'.format(f)
        setp = 'from __main__ import df, {}'.format(f)
        res.at[f, c] = timeit(stmt, setp, number=50)

ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N")
ax.set_ylabel("time (relative)")

Performance

[Figure: log-log plot of relative time against N for the six functions above; image not available]

answered Nov 9 at 2:35 by user3483203 (edited Nov 9 at 4:15)

Score: 2

One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:

import numpy as np
import pandas as pd


def unnest(frame, explode):
    def mesh(values):
        return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))

    # np.vstack needs a sequence (not a generator) in recent NumPy versions
    data = np.vstack([mesh(row) for row in frame[explode].values])
    return pd.DataFrame(data=data, columns=explode)


df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})
print(unnest(df, ['A', 'B']))  # base
print()

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})
print(unnest(df, ['A', 'B', 'C']))  # multiple columns
print()

df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],
                   'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})

print(unnest(df, ['A', 'B']))  # uneven length lists
print()
print(unnest(df, ['D', 'B']))  # different types
print()

Output
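As a rough illustration of what each row contributes to that output, here is a small sketch of the mesh helper applied to hand-written rows. It uses the same expression as the answer; the sample rows are made up. Note that with more than one list in a row it yields every combination of their elements (a Cartesian product), which is different from the element-by-element pairing in the unnesting function from the accepted answer.

```python
import numpy as np

def mesh(values):
    # Same expression as in the answer: all combinations of the row's entries
    return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))

# A scalar and one list: the scalar is repeated for each list element
single = mesh((1, [1, 2]))
print(single)

# A scalar and two lists: 2 x 2 = 4 rows, one per combination
double = mesh((1, [1, 2], [3, 4]))
print(double)
```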

answered 6 hours ago by Daniel Mesejo

Nice one :-) I like those numpy solutions
– W-B
6 hours ago

Score: 1

Something I would not really recommend (but it at least works in this case):

df=pd.concat([df]*2).sort_index()
it=iter(df['B'].tolist()[0]+df['B'].tolist()[0])
df['B']=df['B'].apply(lambda x:next(it))

concat + sort_index + iter + apply + next.

Now:

print(df)

Is:

If you care about the index:

df=df.reset_index(drop=True)

Now:

print(df)

Is:

answered Nov 9 at 2:40 by U9-Forward

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

up vote
11
down vote

accepted

As an user with both R and python and spent one year in this site, I have seen this type of question couple times.

Since in R they have the build-in function from package tidyr so called unnest, But in Python(pandas) there is no build-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in term of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n, if we still using the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly same as before , adding reindex at the end

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contain unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance :

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case have two columns type object

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited 3 hours ago

answered Nov 9 at 2:20

W-B

94.8k72860

3

Good one! I like the answers here. Perhaps you could enumerate on some situations where multiple columns need unnesting, so how would a solution like this generalise to N arbitrary columns with even (or uneven) length lists.
– coldspeed
Nov 9 at 3:32

add a comment |

up vote
11
down vote

accepted

As an user with both R and python and spent one year in this site, I have seen this type of question couple times.

Since in R they have the build-in function from package tidyr so called unnest, But in Python(pandas) there is no build-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in term of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n, if we still using the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly same as before , adding reindex at the end

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contain unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance :

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case have two columns type object

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited 3 hours ago

answered Nov 9 at 2:20

W-B

94.8k72860

3

Good one! I like the answers here. Perhaps you could enumerate on some situations where multiple columns need unnesting, so how would a solution like this generalise to N arbitrary columns with even (or uneven) length lists.
– coldspeed
Nov 9 at 3:32

add a comment |

up vote
11
down vote

accepted

As an user with both R and python and spent one year in this site, I have seen this type of question couple times.

Since in R they have the build-in function from package tidyr so called unnest, But in Python(pandas) there is no build-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in term of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n, if we still using the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly same as before , adding reindex at the end

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contain unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance :

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case have two columns type object

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited 3 hours ago

answered Nov 9 at 2:20

W-B

94.8k72860

As an user with both R and python and spent one year in this site, I have seen this type of question couple times.

Since in R they have the build-in function from package tidyr so called unnest, But in Python(pandas) there is no build-in function for this type of question.

Method 1
apply + pd.Series (easy to understand but in term of performance not recommended . )

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Out[463]: 

   A  B

0  1  1

1  1  2

0  2  1

1  2  2

Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )

df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

df

Out[465]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2

Method 2.1 for example besides A we have A.1 .....A.n, if we still using the method(Method 2) above it is hard for us to re-create the columns one by one .

Solution : join or merge with the index after 'unnest' the single columns

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))

s.join(df.drop('B',1),how='left')

Out[477]: 

   B  A

0  1  1

0  2  1

1  1  2

1  2  2

If you need the column order exactly same as before , adding reindex at the end

s.join(df.drop('B',1),how='left').reindex(columns=df.columns)

Method 3 recreate the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Out[488]: 

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

If more than two columns

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])

s.merge(df,left_on=0,right_index=True)

Out[491]: 

   0  1  A       B

0  0  1  1  [1, 2]

1  0  2  1  [1, 2]

2  1  1  2  [1, 2]

3  1  2  2  [1, 2]

Method 4 using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))

Out[554]: 

   A  B

0  1  1

0  1  2

1  2  1

1  2  2



#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

Method 5 when the list only contain unique values:

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})

from collections import ChainMap

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

pd.DataFrame(list(d.items()),columns=df.columns[::-1])

Out[574]: 

   B  A

0  1  1

1  2  1

2  3  2

3  4  2

Method 6 using numpy for high performance :

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))

pd.DataFrame(data=newvalues[0],columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Method 7 : using base function itertools cycle and chain: Pure python solution just for fun

from itertools import cycle,chain

l=df.values.tolist()

l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]

pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)

   A  B

0  1  1

1  1  2

2  2  1

3  2  2

Special case have two columns type object

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})

df

Out[592]: 

   A       B       C

0  1  [1, 2]  [1, 2]

1  2  [3, 4]  [3, 4]

Self-def function

def unnesting(df, explode):

    idx=df.index.repeat(df[explode[0]].str.len())

    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)

    df1.index=idx

    return df1.join(df.drop(explode,1),how='left')



unnesting(df,['B','C'])

Out[609]: 

   B  C  A

0  1  1  1

0  2  2  1

1  3  3  2

1  4  4  2

Summary :

edited 3 hours ago

answered Nov 9 at 2:20

W-B

94.8k72860

edited 3 hours ago

answered Nov 9 at 2:20

W-B

94.8k72860

answered Nov 9 at 2:20

W-B

94.8k72860

answered Nov 9 at 2:20

W-B

94.8k72860

3

Good one! I like the answers here. Perhaps you could enumerate on some situations where multiple columns need unnesting, so how would a solution like this generalise to N arbitrary columns with even (or uneven) length lists.
– coldspeed
Nov 9 at 3:32

add a comment |

3

Good one! I like the answers here. Perhaps you could enumerate on some situations where multiple columns need unnesting, so how would a solution like this generalise to N arbitrary columns with even (or uneven) length lists.
– coldspeed
Nov 9 at 3:32

Good one! I like the answers here. Perhaps you could enumerate on some situations where multiple columns need unnesting, so how would a solution like this generalise to N arbitrary columns with even (or uneven) length lists.
– coldspeed
Nov 9 at 3:32

add a comment |

up vote
4
down vote

Option 1

If all of the sublists in the other column are the same length, numpy can be an efficient option here:

vals = np.array(df.B.values.tolist())    

a = np.repeat(df.A, vals.shape[1])



pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

Option 2

If the sublists have different length, you need an additional step:

vals = df.B.values.tolist()

rs = [len(r) for r in vals]    

a = np.repeat(df.A, rs)



pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)

Option 3

I took a shot at generalizing this to work to flatten N columns and tile M columns, I'll work later on making it more efficient:

df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],

                   'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})

   A          B          C  D

0  1     [1, 2]  [1, 2, 3]  A

1  2  [1, 2, 3]     [1, 2]  B

2  3        [1]     [1, 2]  C

def unnest(df, tile, explode):

    vals = df[explode].sum(1)

    rs = [len(r) for r in vals]

    a = np.repeat(df[tile].values, rs, axis=0)

    b = np.concatenate(vals.values)

    d = np.column_stack((a, b))

    return pd.DataFrame(d, columns = tile +  ['_'.join(explode)])



unnest(df, ['A', 'D'], ['B', 'C'])

    A  D B_C

0   1  A   1

1   1  A   2

2   1  A   1

3   1  A   2

4   1  A   3

5   2  B   1

6   2  B   2

7   2  B   3

8   2  B   1

9   2  B   2

10  3  C   1

11  3  C   1

12  3  C   2

Functions

def wen1(df):

    return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})



def wen2(df):

    return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})



def wen3(df):

    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))

    return s.join(df.drop('B', 1), how='left')



def wen4(df):

    return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)



def chris1(df):

    vals = np.array(df.B.values.tolist())

    a = np.repeat(df.A, vals.shape[1])

    return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)



def chris2(df):

    vals = df.B.values.tolist()

    rs = [len(r) for r in vals]

    a = np.repeat(df.A.values, rs)

    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)
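
Before timing, it can be worth checking that the candidates agree. The sketch below copies wen2 and chris2 from above and compares their values on an uneven-length frame; the test frame and the comparison via np.array_equal are assumptions added here, and the two results differ in their index (wen2 keeps repeated labels), which the comparison deliberately ignores.

```python
import numpy as np
import pandas as pd

def wen2(df):
    return pd.DataFrame({'A': df.A.repeat(df.B.str.len()),
                         'B': np.concatenate(df.B.values)})

def chris2(df):
    vals = df.B.values.tolist()
    rs = [len(r) for r in vals]
    a = np.repeat(df.A.values, rs)
    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))),
                        columns=df.columns)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]]})

# Compare raw values only; the index labels are not part of the check.
same = np.array_equal(wen2(df).to_numpy(), chris2(df).to_numpy())
print(same)
```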

Timings

import pandas as pd

import matplotlib.pyplot as plt

import numpy as np

from timeit import timeit



res = pd.DataFrame(

       index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],

       columns=[10, 50, 100, 500, 1000, 5000, 10000],

       dtype=float

)



for f in res.index:

    for c in res.columns:

        df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

        df = pd.concat([df]*c)

        stmt = '{}(df)'.format(f)

        setp = 'from __main__ import df, {}'.format(f)

        res.at[f, c] = timeit(stmt, setp, number=50)



ax = res.div(res.min()).T.plot(loglog=True)

ax.set_xlabel("N")

ax.set_ylabel("time (relative)")
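
The loop above relies on `from __main__ import ...`, which only resolves when the code runs as the main script. A hedged variant, assuming Python 3.5 or newer, passes the objects through timeit's globals argument instead. The sizes and repeat count below are arbitrary small values chosen for illustration, and only chris2 is timed.

```python
from timeit import timeit

import numpy as np
import pandas as pd

def chris2(df):
    vals = df.B.values.tolist()
    rs = [len(r) for r in vals]
    a = np.repeat(df.A.values, rs)
    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))),
                        columns=df.columns)

base = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

timings = {}
for n in [10, 100]:
    df = pd.concat([base] * n, ignore_index=True)
    # globals= hands the function and frame to timeit directly,
    # so no import from __main__ is needed.
    timings[n] = timeit('f(df)', globals={'f': chris2, 'df': df}, number=5)

print(pd.Series(timings))
```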

Performance

[Image: log-log plot of time (relative) against N for wen1 to wen4, chris1 and chris2]

edited Nov 9 at 4:15

answered Nov 9 at 2:35

user3483203

29.2k72351


up vote
2
down vote

One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:

import numpy as np

import pandas as pd





def unnest(frame, explode):

    def mesh(values):

        return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))



    data = np.vstack([mesh(row) for row in frame[explode].values])

    return pd.DataFrame(data=data, columns=explode)





df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

print(unnest(df, ['A', 'B']))  # base

print()



df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})

print(unnest(df, ['A', 'B', 'C']))  # multiple columns

print()



df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],

                   'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})



print(unnest(df, ['A', 'B']))  # uneven length lists

print()

print(unnest(df, ['D', 'B']))  # different types

print()
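
To see what the inner mesh helper does, here is a sketch that applies it to single rows. The sample rows are assumptions chosen for illustration. A scalar paired with a list is tiled against each list element, while two lists in the same row yield their Cartesian product rather than an element-by-element pairing.

```python
import numpy as np

def mesh(values):
    return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))

# A scalar and a list: the scalar is repeated for every list element.
scalar_and_list = mesh((2, [1, 2, 3]))
print(scalar_and_list)
# [[2 1]
#  [2 2]
#  [2 3]]

# Two lists: every combination appears, so 2 x 2 = 4 rows.
two_lists = mesh(([1, 2], [3, 4]))
print(two_lists)
# [[1 3]
#  [1 4]
#  [2 3]
#  [2 4]]
```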

Output

answered 6 hours ago

Daniel Mesejo

9,0331923

Nice one :-) I like those numpy solutions
– W-B
6 hours ago


up vote
1
down vote

Something I would not really recommend (but it at least works in this case):

df=pd.concat([df]*2).sort_index()

it=iter(df['B'].tolist()[0]+df['B'].tolist()[0])

df['B']=df['B'].apply(lambda x:next(it))

concat + sort_index + iter + apply + next.

Now:

print(df)

Is:

If you care about the index:

df=df.reset_index(drop=True)

Now:

print(df)

Is:
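
Put together as one runnable sketch on the question's frame, the steps above look like this. It only works here because every list in B is the same two-element list [1, 2]; the hard-coded factor 2 and the iterator built from the first row are tied to that assumption.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# Duplicate every row once (lists have length 2) and bring the copies together.
df = pd.concat([df] * 2).sort_index()

# Iterator over the flattened values: [1, 2] + [1, 2] -> 1, 2, 1, 2.
it = iter(df['B'].tolist()[0] + df['B'].tolist()[0])

# Replace each list cell with the next flattened value, in row order.
df['B'] = df['B'].apply(lambda x: next(it))

df = df.reset_index(drop=True)
print(df)
#    A  B
# 0  1  1
# 1  1  2
# 2  2  1
# 3  2  2
```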

answered Nov 9 at 2:40

U9-Forward

10.2k2834
