Pandas Empty Series Dataframe Constructor vs CSV












I'm trying to aggregate data in a dataframe by specific columns. When I use a dataframe constructor it works:



import pandas as pd

df = pd.DataFrame([
["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1025,"outbound","allowed","",2],
["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1026,"outbound","allowed","",2],
["Firewall-1","outside","tcp","4.4.4.4",22,"1.1.1.1",1028,"outbound","allowed","",2],
["Firewall-1","outside","tcp","3.3.3.3",22,"2.2.2.2",2200,"outbound", "allowed","",2]
],
columns=["dvc","src_interface","transport","src_ip","src_port","dest_ip","dest_port","direction", "action", "cause", "count"])

# Group on every column except dest_port and collect the ports into a list
index_cols = df.columns.tolist()
index_cols.remove("dest_port")
df = df.groupby(index_cols)["dest_port"].apply(list)
df = df.reset_index()


DATAFRAME



          dvc src_interface transport   src_ip  src_port  dest_ip  dest_port direction   action cause  count
0  Firewall-1       outside       tcp  4.4.4.4        53  1.1.1.1       1025  outbound  allowed            2
1  Firewall-1       outside       tcp  4.4.4.4        53  1.1.1.1       1026  outbound  allowed            2
2  Firewall-1       outside       tcp  4.4.4.4        22  1.1.1.1       1028  outbound  allowed            2
3  Firewall-1       outside       tcp  4.4.4.4        22  1.1.1.1       1029  outbound  allowed            2
4  Firewall-1       outside       tcp  3.3.3.3        22  2.2.2.2       2200  outbound  allowed            2


OUTPUT



dvc         src_interface  transport  src_ip   src_port  dest_ip  direction  action   cause  count
Firewall-1  outside        tcp        3.3.3.3  22        2.2.2.2  outbound   allowed         2      [2200]
                                      4.4.4.4  22        1.1.1.1  outbound   allowed         2      [1028, 1029]
                                               53        1.1.1.1  outbound   allowed         2      [1025, 1026]


The issue is when I attempt to import the data from CSV:



import glob
import pandas as pd

fwdata = pd.concat([pd.read_csv(f) for f in glob.glob('*.csv')], ignore_index=True)
df = pd.DataFrame(fwdata)

index_cols = df.columns.tolist()
index_cols.remove("dest_port")
df = df.groupby(index_cols)["dest_port"].apply(list)
df.reset_index()
print(df.head(10))


DATAFRAME
Same as above



OUTPUT



Series([], Name: dest_port, dtype: float64)


The CSV file has the exact same data as the constructor above but it appears to be treated differently. Any help would be appreciated. Thanks in advance!



CSV



dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2









python pandas dataframe aggregate series






asked Nov 26 '18 at 18:31

depark













  • Have you verified that you get data from pd.read_csv(f) if you run it on just one file?

    – G. Anderson
    Nov 26 '18 at 21:31
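A quick way to run that check, as a rough sketch: read one source at a time and look at the row count and the per-column NaN counts. The helper name `summarize_csv` and the shortened sample data are made up for illustration; in practice you would pass each path returned by `glob.glob('*.csv')`.

```python
import io

import pandas as pd


def summarize_csv(source):
    """Return (row count, per-column NaN counts) for one CSV source."""
    frame = pd.read_csv(source)
    return len(frame), frame.isna().sum()


# Stand-in for a single file on disk (hypothetical, shortened data).
sample = io.StringIO(
    'dvc,dest_port,cause\n'
    '"Firewall-1",1025,""\n'
    '"Firewall-1",1026,""\n'
)

rows, nan_counts = summarize_csv(sample)
print(rows)                          # number of data rows that were read
print(nan_counts[nan_counts > 0])    # columns that contain missing values
```

If the row count is non-zero but one of the grouping columns shows up in the second printout, the data is being read and the problem lies in the grouping step.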































1 Answer














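The issue was the empty data in the 'cause' column. By default pd.read_csv parses empty fields as NaN, and groupby drops every row whose group key contains NaN. Because 'cause' is one of the grouping columns and is empty in every row, no rows survive and the result is an empty Series. In the constructor version the same field is an empty string (""), which is a valid group key. You can solve this with either of the solutions below, applied before index_cols is built and the groupby runs.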



Dropping the column:



df.drop(columns=['column_name'], inplace=True)


Filling the missing values with an empty string:



df['column_name'] = df['column_name'].fillna('')


(For these examples column_name = 'cause')
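A minimal sketch of the behaviour and of both fixes, using the CSV rows from the question fed through `io.StringIO` instead of files on disk. The helper name `collect_ports` is made up for illustration. The last variant, `keep_default_na=False`, is an alternative that is not part of the answer above; it assumes you would rather keep empty fields as empty strings at read time.

```python
import io

import pandas as pd

# Rows copied from the question; the 'cause' field is empty in every row.
CSV_TEXT = '''dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
'''


def collect_ports(df):
    """Group on every column except dest_port and list the ports per group."""
    index_cols = [c for c in df.columns if c != "dest_port"]
    return df.groupby(index_cols)["dest_port"].apply(list).reset_index()


raw = pd.read_csv(io.StringIO(CSV_TEXT))
print(raw["cause"].isna().all())  # True: the empty fields were read as NaN

# Grouping on a key column that is all NaN drops every row.
broken = collect_ports(raw)

# Fix 1 from the answer: drop the empty column before grouping.
dropped = collect_ports(raw.drop(columns=["cause"]))

# Fix 2 from the answer: replace NaN with an empty string before grouping.
filled = collect_ports(raw.assign(cause=raw["cause"].fillna("")))

# Alternative (not from the answer): do not convert empty fields to NaN.
kept = collect_ports(pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False))

print(len(broken), len(dropped), len(filled), len(kept))  # 0 3 3 3
```

Dropping the column loses it from the result, while filling it keeps 'cause' as a (blank) column in the grouped output, so the choice depends on whether the column is needed downstream.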






answered Nov 27 '18 at 23:10

depark































