Pandas Empty Series Dataframe Constructor vs CSV

-1

I'm trying to aggregate data in a dataframe by specific columns. When I use a dataframe constructor it works:

import pandas as pd

df = pd.DataFrame([
        ["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1025,"outbound","allowed","",2],
        ["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1026,"outbound","allowed","",2],
        ["Firewall-1","outside","tcp","4.4.4.4",22,"1.1.1.1",1028,"outbound","allowed","",2],
        ["Firewall-1","outside","tcp","3.3.3.3",22,"2.2.2.2",2200,"outbound", "allowed","",2]
    ],
    columns=["dvc","src_interface","transport","src_ip","src_port","dest_ip","dest_port","direction", "action", "cause", "count"])

index_cols = df.columns.tolist()
index_cols.remove("dest_port")
df = df.groupby(index_cols)["dest_port"].apply(list)
df = df.reset_index()

DATAFRAME

          dvc src_interface transport   src_ip  src_port  dest_ip  dest_port direction   action cause  count
0  Firewall-1       outside       tcp  4.4.4.4        53  1.1.1.1       1025  outbound  allowed            2
1  Firewall-1       outside       tcp  4.4.4.4        53  1.1.1.1       1026  outbound  allowed            2
2  Firewall-1       outside       tcp  4.4.4.4        22  1.1.1.1       1028  outbound  allowed            2
3  Firewall-1       outside       tcp  4.4.4.4        22  1.1.1.1       1029  outbound  allowed            2
4  Firewall-1       outside       tcp  3.3.3.3        22  2.2.2.2       2200  outbound  allowed            2

OUTPUT

   dvc         src_interface  transport  src_ip   src_port  dest_ip  direction  action   cause  count
    Firewall-1  outside        tcp        3.3.3.3  22        2.2.2.2  outbound   allowed         2              [2200]
                                          4.4.4.4  22        1.1.1.1  outbound   allowed         2        [1028, 1029]
                                                   53        1.1.1.1  outbound   allowed         2        [1025, 1026]

The issue is when I attempt to import the data from CSV:

import glob
import pandas as pd

fwdata = pd.concat([pd.read_csv(f) for f in glob.glob('*.csv')], ignore_index=True)
df = pd.DataFrame(fwdata)

index_cols = df.columns.tolist()
index_cols.remove("dest_port")
df = df.groupby(index_cols)["dest_port"].apply(list)
df.reset_index()
print(df.head(10))

DATAFRAME
Same as above

OUTPUT

Series([], Name: dest_port, dtype: float64)

The CSV file has the exact same data as the constructor above but it appears to be treated differently. Any help would be appreciated. Thanks in advance!

CSV

dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2

asked Nov 26 '18 at 18:31

depark

Have you verified that you get data from pd.read_csv(f) if you run it on just one file?

– G. Anderson
Nov 26 '18 at 21:31
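
One way to run the check suggested in the comment, as a rough sketch: parse a single file on its own and look at its shape, dtypes and missing-value counts before concatenating or grouping anything. The names `csv_text` and `one_file` are made up for illustration, and `io.StringIO` stands in for a real file on disk so the snippet is self-contained.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for one of the CSV files from the question.
csv_text = '''dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
'''

# Parse just this one "file", with the same default options as in the question.
one_file = pd.read_csv(io.StringIO(csv_text))

print(one_file.shape)         # number of rows and columns actually parsed
print(one_file.dtypes)        # compare with the dtypes of the constructor-built frame
print(one_file.isna().sum())  # per-column count of missing (NaN) values
```

If the row count looks right but one of the grouping columns is entirely NaN, the parsing step rather than the groupby is the place to look.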

python pandas dataframe aggregate series

1 Answer
1


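The issue was the empty data in the 'cause' column. With the constructor, "" stays an empty string, but pd.read_csv parses an empty field as NaN by default. groupby drops any row whose group key contains NaN, and since 'cause' is one of the group keys and is NaN in every row, every row is dropped and the result is an empty Series. You can solve this with either of the solutions below, applied before the groupby.

Dropping the column (do this before building index_cols, so that 'cause' is not used as a group key):

df = df.drop(columns=['cause'])

Filling the missing values with an empty string:

df['cause'] = df['cause'].fillna('')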
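
A minimal reproduction of the cause and of two possible fixes, offered as a sketch rather than a definitive recipe. It assumes the CSV contents shown in the question; `csv_text`, `broken`, `fixed` and `alt` are names invented here, and `io.StringIO` replaces the real files. The second fix, passing `keep_default_na=False` to `pd.read_csv`, is an alternative not mentioned above: it keeps empty fields as empty strings instead of converting them to NaN.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the CSV from the question.
csv_text = '''dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
'''

df = pd.read_csv(io.StringIO(csv_text))
index_cols = [c for c in df.columns if c != "dest_port"]

# As parsed, every 'cause' value is NaN, so groupby discards every row.
broken = df.groupby(index_cols)["dest_port"].apply(list)
print(len(broken))

# Fix 1: replace the missing values with empty strings before grouping.
filled = df.assign(cause=df["cause"].fillna(""))
fixed = filled.groupby(index_cols)["dest_port"].apply(list).reset_index()
print(fixed[["src_ip", "src_port", "dest_port"]])

# Fix 2: tell read_csv not to turn empty fields into NaN in the first place.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
alt = raw.groupby(index_cols)["dest_port"].apply(list).reset_index()
```

Note that `keep_default_na=False` applies to every column, so it only suits files where empty fields in numeric columns are not expected.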

answered Nov 27 '18 at 23:10

depark
