How do I unnest a column in a pandas DataFrame?























I have the following DataFrame where one of the columns is an object (list type cell):



df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})
df
Out[458]:
   A       B
0  1  [1, 2]
1  2  [1, 2]


My expected output is:



   A  B
0  1  1
1  1  2
3  2  1
4  2  2


What should I do to achieve this?



























This question has an open bounty worth +100
reputation from W-B ending in 6 days.


This question has not received enough attention.


This question needs more attention; since posting it I still see many questions about the same issue.












  • 2




    Related, unnesting strings: stackoverflow.com/q/48197234/4909087
    – coldspeed
    Nov 12 at 12:00






















python pandas dataframe














edited Nov 9 at 16:21 by Boann
asked Nov 9 at 2:19 by W-B


















4 Answers

















Accepted answer (11 votes)












As a user of both R and Python who has spent a year on this site, I have seen this type of question a couple of times.

In R, the tidyr package provides a built-in function called unnest, but pandas had no built-in function for this when this answer was written.

Columns of object dtype always make data hard to convert with pandas functions. When I receive data like this, the first thing that comes to mind is to 'flatten', or unnest, the columns.
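For readers on a newer pandas: version 0.25 added a built-in `DataFrame.explode`, which postdates this answer. A minimal sketch, assuming pandas 0.25 or later is installed:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# explode repeats the other columns once per list element and keeps
# the original index labels, so reset_index gives a fresh 0..n-1 index.
out = df.explode('B').reset_index(drop=True)

# Exploded values come back as object dtype; cast if integers are needed.
out['B'] = out['B'].astype(int)
print(out)
```

The methods below remain useful on older versions, or when you want finer control over how the result is built.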





Method 1
apply + pd.Series (easy to understand, but not recommended in terms of performance)

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})
Out[463]:
   A  B
0  1  1
1  1  2
0  2  1
1  2  2




Method 2: using repeat with the DataFrame constructor to re-create the DataFrame (good performance, but awkward with multiple columns)

pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})
Out[465]:
   A  B
0  1  1
0  1  2
1  2  1
1  2  2


Method 2.1: if, besides A, we also have A.1 ... A.n, Method 2 becomes tedious because we would have to re-create the columns one by one.

Solution: unnest the single column, then join or merge back on the index.

s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
s.join(df.drop(columns='B'),how='left')
Out[477]:
   B  A
0  1  1
0  2  1
1  1  2
1  2  2

If you need the column order to be exactly the same as before, add reindex at the end:

s.join(df.drop(columns='B'),how='left').reindex(columns=df.columns)
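To make the join step concrete, here is a small sketch with two extra columns. The column names A1 and A2 and their values are made up for illustration; `drop(columns=...)` is used because recent pandas versions no longer accept the axis as a positional argument.

```python
import numpy as np
import pandas as pd

# A1 and A2 are hypothetical extra columns that should simply be repeated.
df = pd.DataFrame({'A': [1, 2], 'A1': ['x', 'y'], 'A2': [0.5, 1.5],
                   'B': [[1, 2], [3, 4, 5]]})

# Unnest only B, labelling each element with the index of its source row.
s = pd.DataFrame({'B': np.concatenate(df['B'].values)},
                 index=df.index.repeat(df['B'].str.len()))

# Join the remaining columns back on the index and restore the column order.
out = s.join(df.drop(columns='B'), how='left').reindex(columns=df.columns)
print(out)
```

However many extra columns there are, only the list column has to be rebuilt by hand.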




Method 3: re-create the list

pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
Out[488]:
   A  B
0  1  1
1  1  2
2  2  1
3  2  2

If there are more than two columns:

s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
s.merge(df,left_on=0,right_index=True)
Out[491]:
   0  1  A       B
0  0  1  1  [1, 2]
1  0  2  1  [1, 2]
2  1  1  2  [1, 2]
3  1  2  2  [1, 2]




Method 4: using reindex or loc

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
Out[554]:
   A  B
0  1  1
0  1  2
1  2  1
1  2  2

#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))
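Both variants above select rows by index label, so they rely on the index being unique. A hedged sketch of a positional variant using `iloc`, which should also cope with a duplicated index (the example frame and its labels are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels repeat.
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3]]}, index=['r', 'r'])

lens = df['B'].str.len().to_numpy()
# Row positions, each repeated once per list element: [0, 0, 1]
pos = np.repeat(np.arange(len(df)), lens)

# Select rows by position, then overwrite B with the flattened values.
out = df.iloc[pos].assign(B=np.concatenate(df['B'].values))
print(out)
```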


Method 5: when the list values are unique across all rows (the row order may differ depending on the Python version):

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})
from collections import ChainMap
d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
pd.DataFrame(list(d.items()),columns=df.columns[::-1])
Out[574]:
   B  A
0  1  1
1  2  1
2  3  2
3  4  2
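The uniqueness requirement is easy to see with a counterexample. In this sketch (the data is made up) the value 2 appears in both rows, so the dictionary built by Method 5 keeps only one entry for it and a row is lost:

```python
from collections import ChainMap
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [2, 3]]})

# Each list becomes a dict mapping its elements to the row's A value.
maps = list(map(dict.fromkeys, df['B'], df['A']))
d = dict(ChainMap(*maps))

# Four elements went in, but the shared key 2 collapses to one entry.
n_elements = sum(len(b) for b in df['B'])
print(n_elements, len(d))  # prints: 4 3
```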


Method 6: using numpy for high performance

newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
pd.DataFrame(data=newvalues[0],columns=df.columns)
   A  B
0  1  1
1  1  2
2  2  1
3  2  2




Method 7: using the itertools functions cycle and chain; a pure-Python solution, just for fun

from itertools import cycle,chain
l=df.values.tolist()
l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
   A  B
0  1  1
1  1  2
2  2  1
3  2  2




Special case: two columns of object (list) type

df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})
df
Out[592]:
   A       B       C
0  1  [1, 2]  [1, 2]
1  2  [3, 4]  [3, 4]




Self-defined function

def unnesting(df, explode):
    # Repeat each index label once per element of the first exploded column
    idx=df.index.repeat(df[explode[0]].str.len())
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    return df1.join(df.drop(columns=explode),how='left')

unnesting(df,['B','C'])
Out[609]:
   B  C  A
0  1  1  1
0  2  2  1
1  3  3  2
1  4  4  2
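The function above takes the repeated index from the first exploded column only, so it assumes that the lists in every exploded column have the same length within a row. A hedged sketch of a variant that checks this first (the name `unnesting_checked` is made up here):

```python
import numpy as np
import pandas as pd

def unnesting_checked(df, explode):
    """Unnest several list columns after verifying per-row lengths agree."""
    lengths = df[explode[0]].str.len()
    for col in explode[1:]:
        if (df[col].str.len() != lengths).any():
            raise ValueError(f"list lengths in {col!r} differ from {explode[0]!r}")
    idx = df.index.repeat(lengths)
    flat = pd.DataFrame({col: np.concatenate(df[col].values) for col in explode},
                        index=idx)
    return flat.join(df.drop(columns=explode), how='left')

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})
out = unnesting_checked(df, ['B', 'C'])
print(out)
```

Failing early with a clear message is usually easier to debug than a length mismatch raised later from inside pandas.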




Summary:

I am using pandas and Python functions for this type of question. If you are worried about the speed of the solutions above, check user3483203's answer, since it uses numpy, and numpy is faster most of the time. If speed really matters in your case, I would recommend Cython and numba.

























  • 3




    Good one! I like the answers here. Perhaps you could enumerate on some situations where multiple columns need unnesting, so how would a solution like this generalise to N arbitrary columns with even (or uneven) length lists.
    – coldspeed
    Nov 9 at 3:32


















Answer (4 votes)













Option 1



If all of the sublists in the other column are the same length, numpy can be an efficient option here:



vals = np.array(df.B.values.tolist())    
a = np.repeat(df.A, vals.shape[1])

pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)




   A  B
0  1  1
1  1  2
2  2  1
3  2  2




Option 2



If the sublists have different lengths, you need an additional step:



vals = df.B.values.tolist()
rs = [len(r) for r in vals]
a = np.repeat(df.A, rs)

pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)




   A  B
0  1  1
1  1  2
2  2  1
3  2  2
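One thing to watch with `np.column_stack` is that it builds a single array with one shared dtype, so the resulting DataFrame columns can lose their individual dtypes. A minimal sketch, assuming a made-up string column D alongside the list column, that builds the frame from a dict instead so each column keeps its own dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'D': ['x', 'y'], 'B': [[1, 2, 3], [4]]})

vals = df['B'].tolist()
rs = [len(r) for r in vals]  # number of repeats per row

# Building from a dict keeps each column's dtype separate,
# whereas stacking both into one array forces a single shared dtype.
out = pd.DataFrame({'D': np.repeat(df['D'].values, rs),
                    'B': np.concatenate(vals)})
print(out.dtypes)
```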




Option 3



I took a shot at generalizing this to flatten N columns and tile M columns; I'll work on making it more efficient later:

df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],
                   'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})

   A          B          C  D
0  1     [1, 2]  [1, 2, 3]  A
1  2  [1, 2, 3]     [1, 2]  B
2  3        [1]     [1, 2]  C

def unnest(df, tile, explode):
    # Summing lists across columns concatenates them row by row
    vals = df[explode].sum(axis=1)
    rs = [len(r) for r in vals]
    a = np.repeat(df[tile].values, rs, axis=0)
    b = np.concatenate(vals.values)
    d = np.column_stack((a, b))
    return pd.DataFrame(d, columns = tile + ['_'.join(explode)])

unnest(df, ['A', 'D'], ['B', 'C'])

    A  D B_C
0   1  A   1
1   1  A   2
2   1  A   1
3   1  A   2
4   1  A   3
5   2  B   1
6   2  B   2
7   2  B   3
8   2  B   1
9   2  B   2
10  3  C   1
11  3  C   1
12  3  C   2



Functions



def wen1(df):
    return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})

def wen2(df):
    return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

def wen3(df):
    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))
    return s.join(df.drop(columns='B'), how='left')

def wen4(df):
    return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

def chris1(df):
    vals = np.array(df.B.values.tolist())
    a = np.repeat(df.A, vals.shape[1])
    return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

def chris2(df):
    vals = df.B.values.tolist()
    rs = [len(r) for r in vals]
    a = np.repeat(df.A.values, rs)
    return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)


Timings



import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from timeit import timeit

res = pd.DataFrame(
    index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],
    columns=[10, 50, 100, 500, 1000, 5000, 10000],
    dtype=float
)

for f in res.index:
    for c in res.columns:
        df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})
        df = pd.concat([df]*c)
        stmt = '{}(df)'.format(f)
        setp = 'from __main__ import df, {}'.format(f)
        res.at[f, c] = timeit(stmt, setp, number=50)

ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N")
ax.set_ylabel("time (relative)")
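As a smaller, self-contained variant of the timing loop, `timeit` also accepts a callable, which avoids the `from __main__ import` setup string. This sketch times two of the approaches on one input size; the helper names `repeat_concat` and `list_comp` are made up for illustration, and absolute timings will vary by machine:

```python
from timeit import timeit
import numpy as np
import pandas as pd

def repeat_concat(df):
    # Repeat A once per list element and flatten B with NumPy.
    lens = df['B'].str.len().to_numpy()
    return pd.DataFrame({'A': np.repeat(df['A'].to_numpy(), lens),
                         'B': np.concatenate(df['B'].values)})

def list_comp(df):
    # Pure-Python flattening with a nested comprehension.
    return pd.DataFrame([[a, b] for a, bs in zip(df['A'], df['B']) for b in bs],
                        columns=['A', 'B'])

base = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})
big = pd.concat([base] * 1000, ignore_index=True)

# Time each function on the same input; number=5 keeps the run short.
timings = {f.__name__: timeit(lambda f=f: f(big), number=5)
           for f in (repeat_concat, list_comp)}
print(timings)
```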


Performance



[Figure: log-log plot of relative time against N for each function]




































    Answer (2 votes)













    One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:



    import numpy as np
    import pandas as pd


    def unnest(frame, explode):
        def mesh(values):
            return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))

        # np.vstack needs a sequence, so build a list rather than a generator
        data = np.vstack([mesh(row) for row in frame[explode].values])
        return pd.DataFrame(data=data, columns=explode)


    df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})
    print(unnest(df, ['A', 'B'])) # base
    print()

    df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})
    print(unnest(df, ['A', 'B', 'C'])) # multiple columns
    print()

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],
    'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})

    print(unnest(df, ['A', 'B'])) # uneven length lists
    print()
    print(unnest(df, ['D', 'B'])) # different types
    print()


    Output



       A  B
    0  1  1
    1  1  2
    2  2  1
    3  2  2

       A  B  C
    0  1  1  1
    1  1  2  1
    2  1  1  2
    3  1  2  2
    4  2  3  3
    5  2  4  3
    6  2  3  4
    7  2  4  4

       A  B
    0  1  1
    1  1  2
    2  2  1
    3  2  2
    4  2  3
    5  3  1

       D  B
    0  A  1
    1  A  2
    2  B  1
    3  B  2
    4  B  3
    5  C  1
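The same per-row Cartesian product can also be sketched with `itertools.product` from the standard library, which avoids putting mixed types into one NumPy array. This is an alternative illustration, not the answer's method; `unnest_product` is a made-up name, scalar cells are wrapped in one-element lists so they behave like the list cells, and the row order within a group may differ from the meshgrid version when several list columns are combined:

```python
from itertools import product
import pandas as pd

def unnest_product(frame, explode):
    rows = []
    for row in frame[explode].itertuples(index=False):
        # Treat a scalar cell as a one-element list, then take the product.
        cells = [c if isinstance(c, list) else [c] for c in row]
        rows.extend(product(*cells))
    return pd.DataFrame(rows, columns=explode)

df = pd.DataFrame({'D': ['A', 'B', 'C'], 'B': [[1, 2], [1, 2, 3], [1]]})
out = unnest_product(df, ['D', 'B'])
print(out)
```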


























    • Nice one :-) I like those numpy solution
      – W-B
      6 hours ago


















    Answer (1 vote)













    Something I would not really recommend (it at least works in this case, because every row holds the same two-element list):

    df=pd.concat([df]*2).sort_index()
    it=iter(df['B'].tolist()[0]+df['B'].tolist()[0])
    df['B']=df['B'].apply(lambda x:next(it))

    This uses concat + sort_index + iter + apply + next.

    Now:

    print(df)

    gives:

       A  B
    0  1  1
    0  1  2
    1  2  1
    1  2  2

    If you care about the index:

    df=df.reset_index(drop=True)

    Now:

    print(df)

    gives:

       A  B
    0  1  1
    1  1  2
    2  2  1
    3  2  2



















































      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      11
      down vote



      accepted












      As an user with both R and python and spent one year in this site, I have seen this type of question couple times.





      Since in R they have the build-in function from package tidyr so called unnest, But in Python(pandas) there is no build-in function for this type of question.





      I know object columns type always make the data hard to convert by pandas' function. When I received the data like this , the first thing come into my mind is to 'flatten' or unnesting the columns .





      Method 1
      apply + pd.Series (easy to understand but in term of performance not recommended . )



      df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})
      Out[463]:
      A B
      0 1 1
      1 1 2
      0 2 1
      1 2 2




      Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )



      df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})
      df
      Out[465]:
      A B
      0 1 1
      0 1 2
      1 2 1
      1 2 2


      Method 2.1 for example besides A we have A.1 .....A.n, if we still using the method(Method 2) above it is hard for us to re-create the columns one by one .



      Solution : join or merge with the index after 'unnest' the single columns



      s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
      s.join(df.drop('B',1),how='left')
      Out[477]:
      B A
      0 1 1
      0 2 1
      1 1 2
      1 2 2


      If you need the column order exactly same as before , adding reindex at the end



      s.join(df.drop('B',1),how='left').reindex(columns=df.columns)




      Method 3 recreate the list



      pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
      Out[488]:
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2


      If more than two columns



      s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
      s.merge(df,left_on=0,right_index=True)
      Out[491]:
      0 1 A B
      0 0 1 1 [1, 2]
      1 0 2 1 [1, 2]
      2 1 1 2 [1, 2]
      3 1 2 2 [1, 2]




      Method 4 using reindex or loc



      df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
      Out[554]:
      A B
      0 1 1
      0 1 2
      1 2 1
      1 2 2

      #df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))


      Method 5 when the list only contain unique values:



      df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})
      from collections import ChainMap
      d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
      pd.DataFrame(list(d.items()),columns=df.columns[::-1])
      Out[574]:
      B A
      0 1 1
      1 2 1
      2 3 2
      3 4 2


      Method 6 using numpy for high performance :



      newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
      pd.DataFrame(data=newvalues[0],columns=df.columns)
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2




      Method 7 : using base function itertools cycle and chain: Pure python solution just for fun



      from itertools import cycle,chain
      l=df.values.tolist()
      l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
      pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2




      Special case have two columns type object



      df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})
      df
      Out[592]:
      A B C
      0 1 [1, 2] [1, 2]
      1 2 [3, 4] [3, 4]




      Self-def function



      def unnesting(df, explode):
      idx=df.index.repeat(df[explode[0]].str.len())
      df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
      df1.index=idx
      return df1.join(df.drop(explode,1),how='left')

      unnesting(df,['B','C'])
      Out[609]:
      B C A
      0 1 1 1
      0 2 2 1
      1 3 3 2
      1 4 4 2




      Summary :



      I am using pandas and python function for this type of question , if you worry about the speed of above solutions I provided , you can check user3483203's answer , since he is using numpy and most of the time numpy is faster . Just a suggestion if the speed is do matter for your case , I will recommend Cpython and numba






      share|improve this answer



















      • 3




        Good one! I like the answers here. Perhaps you could enumerate on some situations where multiple columns need unnesting, so how would a solution like this generalise to N arbitrary columns with even (or uneven) length lists.
        – coldspeed
        Nov 9 at 3:32















      up vote
      11
      down vote



      accepted












      As an user with both R and python and spent one year in this site, I have seen this type of question couple times.





      Since in R they have the build-in function from package tidyr so called unnest, But in Python(pandas) there is no build-in function for this type of question.





      I know object columns type always make the data hard to convert by pandas' function. When I received the data like this , the first thing come into my mind is to 'flatten' or unnesting the columns .





      Method 1
      apply + pd.Series (easy to understand but in term of performance not recommended . )



      df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})
      Out[463]:
      A B
      0 1 1
      1 1 2
      0 2 1
      1 2 2




      Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )



      df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})
      df
      Out[465]:
      A B
      0 1 1
      0 1 2
      1 2 1
      1 2 2


      Method 2.1 for example besides A we have A.1 .....A.n, if we still using the method(Method 2) above it is hard for us to re-create the columns one by one .



      Solution : join or merge with the index after 'unnest' the single columns



      s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
      s.join(df.drop('B',1),how='left')
      Out[477]:
      B A
      0 1 1
      0 2 1
      1 1 2
      1 2 2


      If you need the column order exactly same as before , adding reindex at the end



      s.join(df.drop('B',1),how='left').reindex(columns=df.columns)




      Method 3 recreate the list



      pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
      Out[488]:
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2


      If more than two columns



      s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
      s.merge(df,left_on=0,right_index=True)
      Out[491]:
      0 1 A B
      0 0 1 1 [1, 2]
      1 0 2 1 [1, 2]
      2 1 1 2 [1, 2]
      3 1 2 2 [1, 2]




      Method 4 using reindex or loc



      df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
      Out[554]:
      A B
      0 1 1
      0 1 2
      1 2 1
      1 2 2

      #df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))


      Method 5 when the list only contain unique values:



      df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})
      from collections import ChainMap
      d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
      pd.DataFrame(list(d.items()),columns=df.columns[::-1])
      Out[574]:
      B A
      0 1 1
      1 2 1
      2 3 2
      3 4 2


      Method 6 using numpy for high performance :



      newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
      pd.DataFrame(data=newvalues[0],columns=df.columns)
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2




      Method 7 : using base function itertools cycle and chain: Pure python solution just for fun



      from itertools import cycle,chain
      l=df.values.tolist()
      l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
      pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2




      Special case have two columns type object



      df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})
      df
      Out[592]:
      A B C
      0 1 [1, 2] [1, 2]
      1 2 [3, 4] [3, 4]




      Self-def function



      def unnesting(df, explode):
      idx=df.index.repeat(df[explode[0]].str.len())
      df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
      df1.index=idx
      return df1.join(df.drop(explode,1),how='left')

      unnesting(df,['B','C'])
      Out[609]:
      B C A
      0 1 1 1
      0 2 2 1
      1 3 3 2
      1 4 4 2




      Summary :



      I am using pandas and python function for this type of question , if you worry about the speed of above solutions I provided , you can check user3483203's answer , since he is using numpy and most of the time numpy is faster . Just a suggestion if the speed is do matter for your case , I will recommend Cpython and numba






      share|improve this answer



















      • 3




        Good one! I like the answers here. Perhaps you could enumerate on some situations where multiple columns need unnesting, so how would a solution like this generalise to N arbitrary columns with even (or uneven) length lists.
        – coldspeed
        Nov 9 at 3:32













      up vote
      11
      down vote



      accepted







      up vote
      11
      down vote



      accepted








      As an user with both R and python and spent one year in this site, I have seen this type of question couple times.





      Since in R they have the build-in function from package tidyr so called unnest, But in Python(pandas) there is no build-in function for this type of question.





      I know object columns type always make the data hard to convert by pandas' function. When I received the data like this , the first thing come into my mind is to 'flatten' or unnesting the columns .





      Method 1
      apply + pd.Series (easy to understand but in term of performance not recommended . )



      df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})
      Out[463]:
      A B
      0 1 1
      1 1 2
      0 2 1
      1 2 2




      Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )



      df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})
      df
      Out[465]:
      A B
      0 1 1
      0 1 2
      1 2 1
      1 2 2


      Method 2.1 for example besides A we have A.1 .....A.n, if we still using the method(Method 2) above it is hard for us to re-create the columns one by one .



      Solution : join or merge with the index after 'unnest' the single columns



      s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
      s.join(df.drop('B',1),how='left')
      Out[477]:
      B A
      0 1 1
      0 2 1
      1 1 2
      1 2 2


      If you need the column order exactly same as before , adding reindex at the end



      s.join(df.drop('B',1),how='left').reindex(columns=df.columns)




      Method 3 recreate the list



      pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
      Out[488]:
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2


      If more than two columns



      s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
      s.merge(df,left_on=0,right_index=True)
      Out[491]:
      0 1 A B
      0 0 1 1 [1, 2]
      1 0 2 1 [1, 2]
      2 1 1 2 [1, 2]
      3 1 2 2 [1, 2]




      Method 4 using reindex or loc



      df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
      Out[554]:
      A B
      0 1 1
      0 1 2
      1 2 1
      1 2 2

      #df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))


      Method 5 when the list only contain unique values:



      df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})
      from collections import ChainMap
      d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
      pd.DataFrame(list(d.items()),columns=df.columns[::-1])
      Out[574]:
      B A
      0 1 1
      1 2 1
      2 3 2
      3 4 2


      Method 6 using numpy for high performance :



      newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
      pd.DataFrame(data=newvalues[0],columns=df.columns)
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2




      Method 7 : using base function itertools cycle and chain: Pure python solution just for fun



      from itertools import cycle,chain
      l=df.values.tolist()
      l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
      pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2




      Special case have two columns type object



      df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})
      df
      Out[592]:
      A B C
      0 1 [1, 2] [1, 2]
      1 2 [3, 4] [3, 4]




      Self-def function



      def unnesting(df, explode):
      idx=df.index.repeat(df[explode[0]].str.len())
      df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
      df1.index=idx
      return df1.join(df.drop(explode,1),how='left')

      unnesting(df,['B','C'])
      Out[609]:
      B C A
      0 1 1 1
      0 2 2 1
      1 3 3 2
      1 4 4 2




      Summary :



      I am using pandas and python function for this type of question , if you worry about the speed of above solutions I provided , you can check user3483203's answer , since he is using numpy and most of the time numpy is faster . Just a suggestion if the speed is do matter for your case , I will recommend Cpython and numba






      share|improve this answer
















      As an user with both R and python and spent one year in this site, I have seen this type of question couple times.





      Since in R they have the build-in function from package tidyr so called unnest, But in Python(pandas) there is no build-in function for this type of question.





      I know object columns type always make the data hard to convert by pandas' function. When I received the data like this , the first thing come into my mind is to 'flatten' or unnesting the columns .





      Method 1
      apply + pd.Series (easy to understand but in term of performance not recommended . )



      df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})
      Out[463]:
      A B
      0 1 1
      1 1 2
      0 2 1
      1 2 2




      Method 2 using repeat with DataFrame constructor , re-create your dataframe (good at performance, not good at multiple columns )



      df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})
      df
      Out[465]:
      A B
      0 1 1
      0 1 2
      1 2 1
      1 2 2


      Method 2.1 for example besides A we have A.1 .....A.n, if we still using the method(Method 2) above it is hard for us to re-create the columns one by one .



      Solution : join or merge with the index after 'unnest' the single columns



      s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
      s.join(df.drop('B',1),how='left')
      Out[477]:
      B A
      0 1 1
      0 2 1
      1 1 2
      1 2 2


      If you need the column order exactly same as before , adding reindex at the end



      s.join(df.drop('B',1),how='left').reindex(columns=df.columns)




      Method 3 recreate the list



      pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
      Out[488]:
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2


      If more than two columns



      s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
      s.merge(df,left_on=0,right_index=True)
      Out[491]:
      0 1 A B
      0 0 1 1 [1, 2]
      1 0 2 1 [1, 2]
      2 1 1 2 [1, 2]
      3 1 2 2 [1, 2]




      Method 4 using reindex or loc



      df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
      Out[554]:
      A B
      0 1 1
      0 1 2
      1 2 1
      1 2 2

      #df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))


      Method 5: when the lists contain only unique values



      df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})
      from collections import ChainMap
      d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
      pd.DataFrame(list(d.items()),columns=df.columns[::-1])
      Out[574]:
      B A
      0 1 1
      1 2 1
      2 3 2
      3 4 2
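      The uniqueness requirement matters because every list element becomes a dictionary key. A small sketch, using the question's original data (where both lists hold the same values), suggests what goes wrong otherwise:

```python
from collections import ChainMap
import pandas as pd

# The question's original data: both lists contain the same values.
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# Each list element becomes a dictionary key, so values repeated across rows collapse.
d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
out = pd.DataFrame(list(d.items()), columns=df.columns[::-1])

print(len(out))  # fewer rows than the four that unnesting should produce
```

      With repeated values the result silently loses rows, so this method is only safe when no value appears in more than one list.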


      Method 6 using numpy for high performance :



      newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
      pd.DataFrame(data=newvalues[0],columns=df.columns)
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2




      Method 7: using cycle and chain from the standard-library itertools module, a pure-Python solution just for fun



      from itertools import cycle,chain
      l=df.values.tolist()
      l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
      pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
      A B
      0 1 1
      1 1 2
      2 2 1
      3 2 2




      Special case: two columns of object type



      df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})
      df
      Out[592]:
      A B C
      0 1 [1, 2] [1, 2]
      1 2 [3, 4] [3, 4]




      User-defined function



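      def unnesting(df, explode):
          # repeat each index label once per element of the first exploded column
          idx=df.index.repeat(df[explode[0]].str.len())
          # flatten every exploded column and place the results side by side
          df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
          df1.index=idx
          # join the remaining columns back on the repeated index
          return df1.join(df.drop(columns=explode),how='left')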

      unnesting(df,['B','C'])
      Out[609]:
      B C A
      0 1 1 1
      0 2 2 1
      1 3 3 2
      1 4 4 2
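      The function above takes its repeat counts from the first exploded column only, so it appears to assume that, within each row, all exploded columns hold lists of the same length. The sketch below is a hypothetical variant (the name `unnesting_checked` and the error message are my own) that makes that assumption explicit:

```python
import numpy as np
import pandas as pd

def unnesting_checked(df, explode):
    # Hypothetical variant: verify that every exploded column has the same
    # per-row list lengths as the first one before flattening.
    lens = df[explode[0]].str.len()
    for col in explode[1:]:
        if (df[col].str.len().to_numpy() != lens.to_numpy()).any():
            raise ValueError(f"list lengths in {col!r} differ from {explode[0]!r}")
    idx = df.index.repeat(lens)
    flat = pd.concat(
        [pd.DataFrame({col: np.concatenate(df[col].values)}) for col in explode],
        axis=1,
    )
    flat.index = idx
    return flat.join(df.drop(columns=explode), how='left')

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[5, 6], [7, 8]]})
out = unnesting_checked(df, ['B', 'C'])
print(out)
```

      Raising early is one possible design choice; padding the shorter lists would be another, but that changes the data and is not attempted here.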




      Summary:



      I use pandas and plain Python functions for this type of question. If you are worried about the speed of the solutions above, check user3483203's answer: it uses numpy, and most of the time numpy is faster. If speed really matters in your case, I recommend Cython or numba.
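      On pandas 0.25 or newer, the built-in `DataFrame.explode` covers the single-column case directly. A minimal sketch, assuming such a version is installed and using the question's example frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# explode() repeats the other columns once per list element
# and keeps the original index labels.
out = df.explode('B')
print(out)

# reset_index(drop=True) gives a fresh 0..n-1 index if the repeated labels are unwanted.
print(out.reset_index(drop=True))
```

      The exploded column comes back with object dtype, so a cast such as `astype(int)` may be needed afterwards.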















      edited 3 hours ago

























      answered Nov 9 at 2:20









      W-B

      94.8k72860












      • 3




        Good one! I like the answers here. Perhaps you could enumerate on some situations where multiple columns need unnesting, so how would a solution like this generalise to N arbitrary columns with even (or uneven) length lists.
        – coldspeed
        Nov 9 at 3:32


























      up vote
      4
      down vote













      Option 1



      If all of the sublists in the other column are the same length, numpy can be an efficient option here:



      vals = np.array(df.B.values.tolist())    
      a = np.repeat(df.A, vals.shape[1])

      pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)




         A  B
      0 1 1
      1 1 2
      2 2 1
      3 2 2




      Option 2



      If the sublists have different lengths, you need an additional step:



      vals = df.B.values.tolist()
      rs = [len(r) for r in vals]
      a = np.repeat(df.A, rs)

      pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)




         A  B
      0 1 1
      1 1 2
      2 2 1
      3 2 2
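      The output above uses the equal-length example, so it does not show the ragged case that Option 2 is meant for. A short sketch with a hypothetical frame whose lists differ in length:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose lists have different lengths.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]]})

vals = df['B'].tolist()
rs = [len(r) for r in vals]        # per-row repeat counts
a = np.repeat(df['A'].values, rs)  # A repeated to line up with the flattened lists

out = pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)
print(out)
```

      Option 1 cannot handle this input, because ragged lists do not form a rectangular 2-D array with a usable second dimension.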




      Option 3



      I took a shot at generalizing this to flatten N columns and tile M columns; I'll work on making it more efficient later:



      df = pd.DataFrame({'A': [1,2,3], 'B': [[1,2], [1,2,3], [1]],
      'C': [[1,2,3], [1,2], [1,2]], 'D': ['A', 'B', 'C']})




         A          B          C  D
      0 1 [1, 2] [1, 2, 3] A
      1 2 [1, 2, 3] [1, 2] B
      2 3 [1] [1, 2] C




      def unnest(df, tile, explode):
          vals = df[explode].sum(1)
          rs = [len(r) for r in vals]
          a = np.repeat(df[tile].values, rs, axis=0)
          b = np.concatenate(vals.values)
          d = np.column_stack((a, b))
          return pd.DataFrame(d, columns = tile + ['_'.join(explode)])

      unnest(df, ['A', 'D'], ['B', 'C'])




          A  D B_C
      0 1 A 1
      1 1 A 2
      2 1 A 1
      3 1 A 2
      4 1 A 3
      5 2 B 1
      6 2 B 2
      7 2 B 3
      8 2 B 1
      9 2 B 2
      10 3 C 1
      11 3 C 1
      12 3 C 2
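      Judging from the output, the row-wise sum over the list columns concatenates B and C within each row, so the result has a single combined `B_C` column rather than paired B and C values. The sketch below reproduces that step in plain Python; treating it as equivalent to the sum is my assumption, based on the 13 rows shown above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],
                   'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})

# Adding two lists concatenates them, so each row's B and C end up in one list.
combined = [b + c for b, c in zip(df['B'], df['C'])]

# These lengths are the repeat counts applied to the tiled columns A and D.
lengths = [len(r) for r in combined]
print(combined)
print(lengths, sum(lengths))
```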




      Functions



      def wen1(df):
          return df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0: 'B'})

      def wen2(df):
          return pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})

      def wen3(df):
          s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))
          return s.join(df.drop(columns='B'), how='left')

      def wen4(df):
          return pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

      def chris1(df):
          vals = np.array(df.B.values.tolist())
          a = np.repeat(df.A, vals.shape[1])
          return pd.DataFrame(np.column_stack((a, vals.ravel())), columns=df.columns)

      def chris2(df):
          vals = df.B.values.tolist()
          rs = [len(r) for r in vals]
          a = np.repeat(df.A.values, rs)
          return pd.DataFrame(np.column_stack((a, np.concatenate(vals))), columns=df.columns)


      Timings



      import pandas as pd
      import matplotlib.pyplot as plt
      import numpy as np
      from timeit import timeit

      res = pd.DataFrame(
      index=['wen1', 'wen2', 'wen3', 'wen4', 'chris1', 'chris2'],
      columns=[10, 50, 100, 500, 1000, 5000, 10000],
      dtype=float
      )

      for f in res.index:
          for c in res.columns:
              df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})
              df = pd.concat([df]*c)
              stmt = '{}(df)'.format(f)
              setp = 'from __main__ import df, {}'.format(f)
              res.at[f, c] = timeit(stmt, setp, number=50)

      ax = res.div(res.min()).T.plot(loglog=True)
      ax.set_xlabel("N")
      ax.set_ylabel("time (relative)")


      Performance



      [Image: log-log plot of relative time against N for each of the functions above]














          edited Nov 9 at 4:15

























          answered Nov 9 at 2:35









          user3483203

          29.2k72351


























              up vote
              2
              down vote













              One alternative is to apply the meshgrid recipe over the rows of the columns to unnest:



              import numpy as np
              import pandas as pd


              def unnest(frame, explode):
                  def mesh(values):
                      return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))

                  # np.vstack needs a sequence, so build a list rather than passing a generator
                  data = np.vstack([mesh(row) for row in frame[explode].values])
                  return pd.DataFrame(data=data, columns=explode)


              df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})
              print(unnest(df, ['A', 'B'])) # base
              print()

              df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})
              print(unnest(df, ['A', 'B', 'C'])) # multiple columns
              print()

              df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [1, 2, 3], [1]],
              'C': [[1, 2, 3], [1, 2], [1, 2]], 'D': ['A', 'B', 'C']})

              print(unnest(df, ['A', 'B'])) # uneven length lists
              print()
              print(unnest(df, ['D', 'B'])) # different types
              print()


              Output



                 A  B
              0 1 1
              1 1 2
              2 2 1
              3 2 2

              A B C
              0 1 1 1
              1 1 2 1
              2 1 1 2
              3 1 2 2
              4 2 3 3
              5 2 4 3
              6 2 3 4
              7 2 4 4

              A B
              0 1 1
              1 1 2
              2 2 1
              3 2 2
              4 2 3
              5 3 1

              D B
              0 A 1
              1 A 2
              2 B 1
              3 B 2
              4 B 3
              5 C 1
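              To make the recipe less opaque, the sketch below applies the same `mesh` helper to a single row at a time. The two inputs mirror one row of the A/B example and one row of the A/B/C example above.

```python
import numpy as np

def mesh(values):
    # Cartesian product of the entries in `values`;
    # a scalar behaves like a one-element list.
    return np.array(np.meshgrid(*values)).T.reshape(-1, len(values))

pair = mesh([1, [1, 2]])            # one row of columns A and B
triple = mesh([2, [3, 4], [3, 4]])  # one row of columns A, B and C

print(pair)
print(triple)
```

              Because the result is a Cartesian product, two list columns yield every combination of their elements (four rows per input row in the A/B/C output above), not a position-by-position pairing.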















              answered 6 hours ago









              Daniel Mesejo

              9,0331923
















              • Nice one :-) I like those numpy solution
                – W-B
                6 hours ago




























              up vote
              1
              down vote













              Something that is not really recommended (but it at least works in this case):



              df=pd.concat([df]*2).sort_index()
              it=iter(df['B'].tolist()[0]+df['B'].tolist()[0])
              df['B']=df['B'].apply(lambda x:next(it))


              concat + sort_index + iter + apply + next.



              Now:



              print(df)


              Is:



                 A  B
              0 1 1
              0 1 2
              1 2 1
              1 2 2


              If you care about the index:



              df=df.reset_index(drop=True)


              Now:



              print(df)


              Is:



                 A  B
              0 1 1
              1 1 2
              2 2 1
              3 2 2
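              The snippet above relies on every row holding the same two-element list. A hedged sketch of how the same iterator idea might be generalized to lists of any length follows; the frame and the use of `itertools.chain` are my own additions, not part of the original answer.

```python
from itertools import chain
import pandas as pd

# Hypothetical frame with lists of different lengths.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [3], [4, 5, 6]]})

lens = df['B'].str.len()
it = chain.from_iterable(df['B'])            # every list element, in row order

out = df.loc[df.index.repeat(lens)].copy()   # repeat each row once per list element
out['B'] = [next(it) for _ in range(len(out))]
out = out.reset_index(drop=True)
print(out)
```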















                  answered Nov 9 at 2:40









                  U9-Forward

                  10.2k2834

































