Pandas Empty Series Dataframe Constructor vs CSV
I'm trying to aggregate data in a dataframe by specific columns. When I use a dataframe constructor it works:
import pandas as pd
df = pd.DataFrame([
["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1025,"outbound","allowed","",2],
["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1026,"outbound","allowed","",2],
["Firewall-1","outside","tcp","4.4.4.4",22,"1.1.1.1",1028,"outbound","allowed","",2],
["Firewall-1","outside","tcp","3.3.3.3",22,"2.2.2.2",2200,"outbound", "allowed","",2]
],
columns=["dvc","src_interface","transport","src_ip","src_port","dest_ip","dest_port","direction", "action", "cause", "count"])
index_cols = df.columns.tolist()
index_cols.remove("dest_port")
df = df.groupby(index_cols)["dest_port"].apply(list)
df = df.reset_index()
DATAFRAME
   dvc         src_interface  transport  src_ip   src_port  dest_ip  dest_port  direction  action   cause  count
0  Firewall-1  outside        tcp        4.4.4.4  53        1.1.1.1  1025       outbound   allowed         2
1  Firewall-1  outside        tcp        4.4.4.4  53        1.1.1.1  1026       outbound   allowed         2
2  Firewall-1  outside        tcp        4.4.4.4  22        1.1.1.1  1028       outbound   allowed         2
3  Firewall-1  outside        tcp        4.4.4.4  22        1.1.1.1  1029       outbound   allowed         2
4  Firewall-1  outside        tcp        3.3.3.3  22        2.2.2.2  2200       outbound   allowed         2
OUTPUT
dvc         src_interface  transport  src_ip   src_port  dest_ip  direction  action   cause  count
Firewall-1  outside        tcp        3.3.3.3  22        2.2.2.2  outbound   allowed         2      [2200]
                                      4.4.4.4  22        1.1.1.1  outbound   allowed         2      [1028, 1029]
                                               53        1.1.1.1  outbound   allowed         2      [1025, 1026]
The issue is when I attempt to import the data from CSV:
import glob
fwdata = pd.concat([pd.read_csv(f) for f in glob.glob('*.csv')], ignore_index=True)
df = pd.DataFrame(fwdata)
index_cols = df.columns.tolist()
index_cols.remove("dest_port")
df = df.groupby(index_cols)["dest_port"].apply(list)
df.reset_index()
print(df.head(10))
DATAFRAME
Same as above
OUTPUT
Series([], Name: dest_port, dtype: float64)
The CSV file has the exact same data as the constructor above but it appears to be treated differently. Any help would be appreciated. Thanks in advance!
CSV
dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
python pandas dataframe aggregate series
Have you verified that you get data from pd.read_csv(f) if you run it on just one file?
– G. Anderson
Nov 26 '18 at 21:31
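Following up on the comment above, here is a minimal sketch of that check. It assumes the file contents match the CSV shown in the question; `csv_text` and `single` are illustrative names, and `io.StringIO` stands in for a real file only so the snippet runs on its own.

```python
import io
import pandas as pd

# Stand-in for one of the CSV files (two rows copied from the question).
csv_text = '''dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
'''

single = pd.read_csv(io.StringIO(csv_text))

print(single.shape)         # rows and columns actually parsed
print(single.dtypes)        # how each column was typed on import
print(single.isna().sum())  # count of missing values per column
```

If the per-column missing counts are non-zero for any column that is later used as a group key, that is worth looking at before blaming the concat step.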
asked Nov 26 '18 at 18:31
depark
1 Answer
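The issue was the empty data in the 'cause' column. pd.read_csv parses empty fields as NaN, and groupby by default drops every row whose group key contains NaN, so all rows were discarded and the result was an empty Series. The DataFrame constructor keeps "" as an ordinary empty string, which is why that version worked. You can solve this with either of the solutions below, applied before the groupby.
Dropping the column:
df.drop(columns=['column_name'], inplace=True)
Padding out the column with data:
df['column_name'] = df['column_name'].fillna('')
(For these examples column_name = 'cause')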
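A minimal sketch of the padding fix, assuming the CSV contents from the question. `io.StringIO` replaces the `glob` loop only so the snippet is self-contained, and the names `csv_text`, `broken` and `fixed` are illustrative.

```python
import io
import pandas as pd

# CSV text copied from the question; the 'cause' field is empty in every row.
csv_text = '''dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
'''

df = pd.read_csv(io.StringIO(csv_text))
index_cols = [c for c in df.columns if c != "dest_port"]

# Without the fix: 'cause' is NaN in every row, so every group key contains
# NaN and groupby discards all rows.
broken = df.groupby(index_cols)["dest_port"].apply(list)
print(len(broken))

# With the fix: replace NaN with an empty string before grouping.
df["cause"] = df["cause"].fillna("")
fixed = df.groupby(index_cols)["dest_port"].apply(list).reset_index()
print(fixed)
```

Two alternatives that may suit better, depending on the data: passing `keep_default_na=False` to `pd.read_csv` keeps empty fields as empty strings at import time, and on pandas 1.1 or later `groupby(..., dropna=False)` keeps NaN keys as their own group.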
answered Nov 27 '18 at 23:10
depark