Compare multiple columns in two dataframes and select rows with differing values












I'm trying to compare two columns in one DataFrame (df1) with two columns in another DataFrame (df2). After the comparison, I want to select the rows where the first two columns do not match. My attempts are below, and this is what the dataframes look like [1]



import pandas as pd

fd1= 'Q37.xlsx'
fd2= 'Q43.xlsx'
df1 = pd.read_excel( fd1, sheetname='prio 1')
df2 = pd.read_excel( fd2, sheetname='prio 1')


closed_items= {} #items in fd1 but not in fd2
new_items={} #items in fd2 but not in fd1


In order to get closed_items, I've tried the following 3 things



closed_items.where(df1[df1['Code'].values!=df2[df2['Code'].values and 
df1['Owner'].values != key in df1['Owner'].values)


and gotten



ValueError: Can only compare identically-labeled Series objects


I've also tried



Closed_items = df2.loc[(df2['Code'] != df1['Code']) and 
df2.loc[(df2['Owner'] != df1['Owner'])]


And lastly I tried



for key in df1['Code'].values:
if key in df1['Code'].values != key in df1['Code'].values or key in
df1['Owner'].values != key in df1['Owner'].values:

closed_items.append()
else:
pass


Which gave this error:



 The truth value of an array with more than one element is ambiguous. 
Use a.any() or a.all()


...



AFP= pd.ExcelWriter("AFP.xlsx", engine='xlsxwriter')

closed_items.to_excel(AFP, sheet_name='Closed', index=False)









  • I think it would be better to have equally shaped Series and use something like not s1.intersection(s2) or difference: stackoverflow.com/questions/18079563/… pandas.pydata.org/pandas-docs/stable/generated/…

    – Sergii
    Nov 27 '18 at 12:18
















python excel pandas dataframe conditional-formatting






edited Nov 27 '18 at 13:07 by user3471881

asked Nov 27 '18 at 12:02 by gabriella













1 Answer
The problem is that df1 and df2 have different shapes, so comparing their columns element-wise inside loc will not work.
You first need to merge df1 and df2, for example:



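# 'common_key' is a placeholder for the column that identifies the same item in both frames
df3 = df1.merge(df2, on='common_key', how='left', suffixes=('_df1', '_df2'))
df3['select'] = 0
# Flag rows where both Code and Owner agree: the mask selects rows, 'select' is the column
df3.loc[(df3['Code_df1'] == df3['Code_df2']) &
        (df3['Owner_df1'] == df3['Owner_df2']), 'select'] = 1

# Rows still at 0 are the ones that do not match
df3.loc[df3['select'] == 0, :]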


This returns the rows where Code and Owner do not both match.
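If no separate key column exists and an item is identified by its (Code, Owner) pair, one possible variation on the merge idea is an anti-join using the indicator option of merge. The sketch below is only an illustration under that assumption: the sample data stands in for the two Excel sheets, and the helper name rows_only_in is made up for this example.

```python
import pandas as pd

# Hypothetical stand-ins for the 'prio 1' sheets of Q37.xlsx and Q43.xlsx.
df1 = pd.DataFrame({"Code": ["A1", "B2", "C3"], "Owner": ["ann", "bob", "cat"]})
df2 = pd.DataFrame({"Code": ["A1", "B2", "D4"], "Owner": ["ann", "eve", "dan"]})

keys = ["Code", "Owner"]  # assumption: this pair identifies an item


def rows_only_in(left, right, on):
    """Return the rows of `left` whose `on` values have no match in `right`."""
    merged = left.merge(
        right[on].drop_duplicates(),  # only the key columns, no duplicate matches
        on=on,
        how="left",
        indicator=True,  # adds a '_merge' column: 'left_only' or 'both'
    )
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


closed_items = rows_only_in(df1, df2, keys)  # in df1 but not in df2
new_items = rows_only_in(df2, df1, keys)     # in df2 but not in df1

print(closed_items)
print(new_items)
```

Because both results are ordinary DataFrames, they can be passed to to_excel as in the question. Whether (Code, Owner) is the right identity for an item depends on the data, so the keys list may need adjusting.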






  • It returns Invalid syntax on df3.loc[(df3['Code_df1'] == df3['Code_df2']) & df3.loc[(df3['Owner_df1'] == df3['Owner_df2']),'select'] = 1

    – gabriella
    Nov 29 '18 at 8:50













  • Use this I missed a closing parenthesis while copying the code. df3.loc[(df3['Code_df1'] == df3['Code_df2']) & (df3.loc[(df3['Owner_df1'] == df3['Owner_df2']),'select'])] = 1

    – Abhishek Sharma
    Nov 30 '18 at 9:10











edited Nov 30 '18 at 9:13

answered Nov 27 '18 at 12:11 by Abhishek Sharma












