How to read lines from a mmapped file?



























It seems that the mmap interface only supports readline().
If I try to iterate over the object, I get single characters instead of complete lines.



What would be the "pythonic" method of reading a mmap'ed file line by line?



import sys
import mmap
import os


if (len(sys.argv) > 1):
    STAT_FILE=sys.argv[1]
    print STAT_FILE
else:
    print "Need to know <statistics file name path>"
    sys.exit(1)


with open(STAT_FILE, "r") as f:
    map = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    for line in map:
        print line # RETURNS single characters instead of whole line





























  • Out of interest, what's the motivation for using a memory-mapped file for this, as opposed to a normal file?

    – NPE
    Nov 16 '11 at 13:25






  • @aix: I could possibly have GBs of raw data, and I would like to access it as efficiently as possible. But the real reason is: it's cooler :)

    – Maxim Veksler
    Nov 16 '11 at 15:33











  • I don't know whether it's cooler, but you shouldn't simply assume that it's faster (if you really care, you ought to profile).

    – NPE
    Nov 16 '11 at 15:36











  • I added some timings to my post below.

    – hochl
    Nov 16 '11 at 16:14
















python file text mmap






edited Nov 26 '18 at 23:43 by martineau
asked Nov 16 '11 at 12:27 by Maxim Veksler













4 Answers
































The most concise way to iterate over the lines of an mmap is



with open(STAT_FILE, "r+b") as f:
    map_file = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    for line in iter(map_file.readline, b""):
        pass  # whatever you need to do with each line


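Note that in Python 3 the sentinel argument of iter() must be of type bytes, because mmap.readline() returns bytes, while in Python 2 it needs to be a str (i.e. "" instead of b""). In Python 2.6 and later the literal b"" is itself a str, so b"" works there as well.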
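A minimal Python 3 sketch of the same iter(readline, b"") idiom, wrapped in a hypothetical generator function iter_mmap_lines() (it is not part of the mmap module). It assumes read-only access: it uses access=mmap.ACCESS_READ, which, unlike prot=, also works on Windows, and opens the file in "rb" mode, which is sufficient for a read-only mapping. Empty files are special-cased because mmap refuses to map a zero-length file.

```python
import mmap
import os
import tempfile


def iter_mmap_lines(path):
    """Yield each line of the file at `path` as bytes, newline included.

    Hypothetical helper: a Python 3 variant of the iter(readline, b"") idiom.
    """
    # mmap raises ValueError for a zero-length file, so handle that case first.
    if os.path.getsize(path) == 0:
        return
    with open(path, "rb") as f:
        # ACCESS_READ is portable (Unix and Windows); prot= is Unix-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # readline() returns b"" only at end of file, so it is a safe sentinel.
            for line in iter(mm.readline, b""):
                yield line


if __name__ == "__main__":
    # Build a small demo file in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.txt")
        with open(demo, "wb") as f:
            f.write(b"first\n\nthird")
        print(list(iter_mmap_lines(demo)))  # prints [b'first\n', b'\n', b'third']
```

Because it is a generator, the mapping stays open only while the lines are being consumed and is closed once iteration finishes.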






answered Nov 16 '11 at 13:01 by Sven Marnach, edited Feb 4 at 19:17





















  • I didn't know iter took this callable/sentinel argument pair. +1 and removed my answer in favor of this one.

    – Fred Foo
    Nov 16 '11 at 13:37













  • And please change the open mode to r+b instead of r (as mentioned in my post below).

    – hochl
    Nov 16 '11 at 13:59













  • @hochl: Thanks, done.

    – Sven Marnach
    Nov 16 '11 at 14:04

































I modified your example like this:



with open(STAT_FILE, "r+b") as f:
    m=mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
        line=m.readline()
        if line == '': break
        print line.rstrip()


Suggestions:




  • Do not name a variable map; that shadows the built-in function map().

  • Open the file in r+b mode, as in the Python example on the mmap documentation page, which states: "In either case you must provide a file descriptor for a file opened for update." See http://docs.python.org/library/mmap.html#mmap.mmap.

  • Avoid UPPER_CASE_WITH_UNDERSCORES names for ordinary global variables, as described under Global Variable Names in PEP 8 (https://www.python.org/dev/peps/pep-0008/#global-variable-names). In Python, as in other languages such as C, all-uppercase names conventionally denote constants.
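A hedged Python 3 sketch of the question's script with these three suggestions applied. print_stat_file() is a hypothetical name; the UTF-8 decoding and the empty-file check are assumptions added for Python 3, where readline() on an mmap returns bytes and mmap cannot map a zero-length file. Note that "r+b" needs write permission on the file even though the mapping itself is read-only.

```python
import mmap
import os


def print_stat_file(stat_file, out=None):
    """Print each line of stat_file with trailing whitespace removed.

    `out` is passed to print(); None means sys.stdout, exactly as with print().
    """
    if os.path.getsize(stat_file) == 0:
        return  # mmap refuses to map an empty file (raises ValueError)
    # Suggestion 2: open the file for update ("r+b"), as the mmap docs' example does.
    with open(stat_file, "r+b") as f:
        # Suggestion 1: call the mapping `mm`, not `map`, so the built-in stays usable.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            while True:
                line = mm.readline()
                if line == b"":  # Python 3: readline() returns bytes; b"" means EOF
                    break
                # Assumes UTF-8 text; adjust the encoding to match your data.
                print(line.rstrip().decode("utf-8"), file=out)
```

From the question's script this could be called as print_stat_file(sys.argv[1]) after the existing argument check; the lowercase stat_file parameter follows suggestion 3.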


Hope this helps.



Edit: I did some timing tests on Linux because the comment made me curious. Here is a comparison of timings made on 5 sequential runs on a 137MB text file.



Normal file access (seconds, 5 runs):

real  2.410  2.414  2.428  2.478  2.490
sys   0.052  0.052  0.064  0.080  0.152
user  2.232  2.276  2.292  2.304  2.320


mmap file access (seconds, 5 runs):

real  1.885  1.899  1.925  1.940  1.954
sys   0.088  0.108  0.108  0.116  0.120
user  1.696  1.732  1.736  1.744  1.752


These timings exclude the print statement (I commented it out). Judging by these numbers, memory-mapped file access is quite a bit faster: the median real time drops from 2.428 s to 1.925 s, roughly 20% less.
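For anyone who wants to reproduce a comparison like this, here is a hedged Python 3 sketch. It is not hochl's actual test (that was the question's program with the print commented out, run under time), and count_lines_file(), count_lines_mmap() and compare() are hypothetical names. Python 3's io-based file objects differ from the Python 2 file type measured above, and results depend on OS and page-cache state, so profile on your own data.

```python
import mmap
import os
import time


def count_lines_file(path):
    """Count lines by calling readline() on a regular binary file object."""
    n = 0
    with open(path, "rb") as f:
        while f.readline():  # b"" (falsy) only at end of file
            n += 1
    return n


def count_lines_mmap(path):
    """Count lines by calling readline() on a read-only memory map."""
    if os.path.getsize(path) == 0:
        return 0  # mmap cannot map a zero-length file
    n = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            while mm.readline():
                n += 1
    return n


def compare(path, repeats=5):
    """Return the best wall-clock time (seconds) of each approach over `repeats` runs."""
    results = {}
    for name, func in (("file", count_lines_file), ("mmap", count_lines_mmap)):
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            func(path)
            timings.append(time.perf_counter() - start)
        results[name] = min(timings)
    return results
```

Usage: print(compare("/path/to/large_file.txt")), where the path is a placeholder for your own test file. Taking the minimum of several runs reduces noise from other processes, but the first run may still pay for a cold page cache.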



Edit 2: Using python -m cProfile test.py I got the following results:



ncalls   tottime  percall  cumtime  percall  filename:lineno(function)
5432833  2.273    0.000    2.273    0.000    {method 'readline' of 'file' objects}
5432833  1.451    0.000    1.451    0.000    {method 'readline' of 'mmap.mmap' objects}


If I read this correctly, mmap's readline() is quite a bit faster: 1.451 s versus 2.273 s for the same 5,432,833 calls.



Additionally, not len(line) seems to perform worse than line == ''; at least, that's how I interpret the profiler output.
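To check the end-of-file test comparison yourself, a small timeit sketch (Python 3, so the sentinel is b''). The third variant, not line, is an addition here: PEP 8 recommends relying on the fact that empty sequences are false. time_eof_checks() is a hypothetical name, and micro-timings like these vary between interpreters; the difference is likely small compared with the cost of readline() itself.

```python
import timeit


def time_eof_checks(number=1_000_000):
    """Time three ways of testing whether a readline() result signals EOF."""
    # The setup creates a typical non-empty line, as bytes (Python 3 mmap/file).
    setup = 'line = b"some line of text\\n"'
    checks = {
        "line == b''": "line == b''",
        "not len(line)": "not len(line)",
        "not line": "not line",
    }
    return {label: timeit.timeit(stmt, setup=setup, number=number)
            for label, stmt in checks.items()}


if __name__ == "__main__":
    for label, seconds in time_eof_checks().items():
        print(f"{label:15} {seconds:.3f} s")
```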






answered Nov 16 '11 at 12:32 by hochl, edited Nov 27 '18 at 0:56 by martineau


























  • AttributeError: 'mmap.mmap' object has no attribute 'readlines'

    – Fred Foo
    Nov 16 '11 at 12:33






  • hochl: Thank you. The benchmarks are great. Could you attach a script to reproduce the test and confirm the analysis?

    – Maxim Veksler
    Nov 16 '11 at 16:33






  • I simply commented out the print in your program, ran time test.py about 10 times, and took the 5 middle values. It would be interesting to check the results of python -m cProfile test.py.

    – hochl
    Nov 16 '11 at 16:51

































The following is reasonably concise:



with open(STAT_FILE, "r") as f:
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
        line = m.readline()
        if line == "": break
        print line
    m.close()


Note that line retains the trailing newline, so you may want to strip it. The newline is also the reason why if line == "" does the right thing: an empty line is returned as "\n", so only the end of the file yields "".
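A small Python 3 check of this behavior, using a hypothetical temporary file: readline() keeps the newline, so an empty line comes back as b"\n", while end of file (and every call after it) yields b"". In Python 3 the results are bytes, so the comparison would be against b"" rather than "".

```python
import mmap
import os
import tempfile

# Hypothetical demo file: a text line, an empty line, and a last line without "\n".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"text\n\nlast")

    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Call readline() more times than there are lines to see the EOF value.
        results = [mm.readline() for _ in range(5)]

print(results)  # prints [b'text\n', b'\n', b'last', b'', b'']
```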



The reason the original iteration behaves the way it does is that mmap tries to look like both a file and a string. For the purposes of iteration it looks like a string, so you get one character at a time.



I have no idea why it can't (or chooses not to) provide readlines()/xreadlines().
































  • The readlines() method of file objects returns a list of all lines of the file. Doing this on an mmapped file would completely defeat the purpose of the mmap.

    – Sven Marnach
    Nov 16 '11 at 13:04











  • @SvenMarnach: It could be a generator. In any case, to be totally honest I fail to see the need for memory-mapped files in this entire question.

    – NPE
    Nov 16 '11 at 13:28













  • You are completely right. So maybe the reason for the non-existence of such a generator is that it would be pointless. :)

    – Sven Marnach
    Nov 16 '11 at 13:32

































Python 2.7 (32-bit) on Windows is more than twice as fast on an mmapped file:



On a 27 MB, 509k-line text file (my parse function is not interesting; it mostly just calls readline() very rapidly):



with open(someFile,"r") as f:
    if usemmap:
        m=mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    else:
        m=f
    e.parse(m)  # e.parse() is the poster's own parser (not shown)


With MMAP:



read in 0.308000087738


Without MMAP:



read in 0.680999994278
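The toggle above works because the parser only calls readline(), which both file objects and mmap objects provide. A Python 3 sketch of that pattern, where parse() and load() are hypothetical stand-ins for the poster's parser. It opens the file in binary mode ("rb") so that both readers return bytes; in Python 3 a text-mode file would return str, and the b"" sentinel would never match, so the loop would not terminate.

```python
import mmap
import os
import tempfile


def parse(reader):
    """Hypothetical parser: collect non-blank, stripped lines.

    It only calls reader.readline(), so a file object or an mmap both work.
    """
    records = []
    for line in iter(reader.readline, b""):
        line = line.strip()
        if line:
            records.append(line)
    return records


def load(path, use_mmap):
    """Parse the file at `path`, reading either through mmap or the file object."""
    with open(path, "rb") as f:
        if use_mmap:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                return parse(m)
        return parse(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        with open(path, "wb") as f:
            f.write(b"a 1\n\nb 2\n")
        print(load(path, use_mmap=True) == load(path, use_mmap=False))  # True
```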





share|improve this answer

























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f8151684%2fhow-to-read-lines-from-a-mmapped-file%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    25














    The most concise way to iterate over the lines of an mmap is



    with open(STAT_FILE, "r+b") as f:
    map_file = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    for line in iter(map_file.readline, b""):
    # whatever


    Note that in Python 3 the sentinel parameter of iter() must be of type bytes, while in Python 2 it needs to be a str (i.e. "" instead of b"").






    share|improve this answer





















    • 3





      I didn't know iter took this callable/sentinel argument pair. +1 and removed my answer in favor of this one.

      – Fred Foo
      Nov 16 '11 at 13:37













    • And please change the open mode to r+b instead of r (as mentioned in my post below).

      – hochl
      Nov 16 '11 at 13:59













    • @hochl: Thanks, done.

      – Sven Marnach
      Nov 16 '11 at 14:04
















    25














    The most concise way to iterate over the lines of an mmap is



    with open(STAT_FILE, "r+b") as f:
    map_file = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    for line in iter(map_file.readline, b""):
    # whatever


    Note that in Python 3 the sentinel parameter of iter() must be of type bytes, while in Python 2 it needs to be a str (i.e. "" instead of b"").






    share|improve this answer





















    • 3





      I didn't know iter took this callable/sentinel argument pair. +1 and removed my answer in favor of this one.

      – Fred Foo
      Nov 16 '11 at 13:37













    • And please change the open mode to r+b instead of r (as mentioned in my post below).

      – hochl
      Nov 16 '11 at 13:59













    • @hochl: Thanks, done.

      – Sven Marnach
      Nov 16 '11 at 14:04














    25












    25








    25







    The most concise way to iterate over the lines of an mmap is



    with open(STAT_FILE, "r+b") as f:
    map_file = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    for line in iter(map_file.readline, b""):
    # whatever


    Note that in Python 3 the sentinel parameter of iter() must be of type bytes, while in Python 2 it needs to be a str (i.e. "" instead of b"").






    share|improve this answer















    The most concise way to iterate over the lines of an mmap is



    with open(STAT_FILE, "r+b") as f:
    map_file = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    for line in iter(map_file.readline, b""):
    # whatever


    Note that in Python 3 the sentinel parameter of iter() must be of type bytes, while in Python 2 it needs to be a str (i.e. "" instead of b"").







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Feb 4 at 19:17

























    answered Nov 16 '11 at 13:01









    Sven MarnachSven Marnach

    352k79748695




    352k79748695








    • 3





      I didn't know iter took this callable/sentinel argument pair. +1 and removed my answer in favor of this one.

      – Fred Foo
      Nov 16 '11 at 13:37













    • And please change the open mode to r+b instead of r (as mentioned in my post below).

      – hochl
      Nov 16 '11 at 13:59













    • @hochl: Thanks, done.

      – Sven Marnach
      Nov 16 '11 at 14:04














    • 3





      I didn't know iter took this callable/sentinel argument pair. +1 and removed my answer in favor of this one.

      – Fred Foo
      Nov 16 '11 at 13:37













    • And please change the open mode to r+b instead of r (as mentioned in my post below).

      – hochl
      Nov 16 '11 at 13:59













    • @hochl: Thanks, done.

      – Sven Marnach
      Nov 16 '11 at 14:04








    3




    3





    I didn't know iter took this callable/sentinel argument pair. +1 and removed my answer in favor of this one.

    – Fred Foo
    Nov 16 '11 at 13:37







    I didn't know iter took this callable/sentinel argument pair. +1 and removed my answer in favor of this one.

    – Fred Foo
    Nov 16 '11 at 13:37















    And please change the open mode to r+b instead of r (as mentioned in my post below).

    – hochl
    Nov 16 '11 at 13:59







    And please change the open mode to r+b instead of r (as mentioned in my post below).

    – hochl
    Nov 16 '11 at 13:59















    @hochl: Thanks, done.

    – Sven Marnach
    Nov 16 '11 at 14:04





    @hochl: Thanks, done.

    – Sven Marnach
    Nov 16 '11 at 14:04













    14














    I modified your example like this:



    with open(STAT_FILE, "r+b") as f:
    m=mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
    line=m.readline()
    if line == '': break
    print line.rstrip()


    Suggestions:




    • Do not call a variable map, this is a built-in function.

    • Open the file in r+b mode, as in the Python example on the mmap help page. It states: In either case you must provide a file descriptor for a file opened for update. See http://docs.python.org/library/mmap.html#mmap.mmap.

    • It's better to not use UPPER_CASE_WITH_UNDERSCORES global variable names, as mentioned in Global Variable Names at https://www.python.org/dev/peps/pep-0008/#global-variable-names. In other programming languages (like C), constants are often written all uppercase.


    Hope this helps.



    Edit: I did some timing tests on Linux because the comment made me curious. Here is a comparison of timings made on 5 sequential runs on a 137MB text file.



    Normal file access:



    real    2.410 2.414 2.428 2.478 2.490
    sys 0.052 0.052 0.064 0.080 0.152
    user 2.232 2.276 2.292 2.304 2.320


    mmap file access:



    real    1.885 1.899 1.925 1.940 1.954
    sys 0.088 0.108 0.108 0.116 0.120
    user 1.696 1.732 1.736 1.744 1.752


    Those timings do not include the print statement (I excluded it). Following these numbers I'd say memory mapped file access is quite a bit faster.



    Edit 2: Using python -m cProfile test.py I got the following results:



    5432833    2.273    0.000    2.273    0.000 {method 'readline' of 'file' objects}
    5432833 1.451 0.000 1.451 0.000 {method 'readline' of 'mmap.mmap' objects}


    If I'm not mistaken then mmap is quite a bit faster.



    Additionally, it seems not len(line) performs worse than line == '', at least that's how I interpret the profiler output.






    share|improve this answer


























    • AttributeError: 'mmap.mmap' object has no attribute 'readlines'

      – Fred Foo
      Nov 16 '11 at 12:33






    • 1





      hochl: Thank you. The benchmarks are great. Could you attach a script to reproduce the test and confirm the analysis?

      – Maxim Veksler
      Nov 16 '11 at 16:33






    • 1





      I simply commented out the print in your program and then did time test.py like 10 times, then took the 5 middle values. It would be interesting to check the results of python -m cProfile test.py.

      – hochl
      Nov 16 '11 at 16:51
















    14














    I modified your example like this:



    with open(STAT_FILE, "r+b") as f:
    m=mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
    line=m.readline()
    if line == '': break
    print line.rstrip()


    Suggestions:




    • Do not call a variable map, this is a built-in function.

    • Open the file in r+b mode, as in the Python example on the mmap help page. It states: In either case you must provide a file descriptor for a file opened for update. See http://docs.python.org/library/mmap.html#mmap.mmap.

    • It's better to not use UPPER_CASE_WITH_UNDERSCORES global variable names, as mentioned in Global Variable Names at https://www.python.org/dev/peps/pep-0008/#global-variable-names. In other programming languages (like C), constants are often written all uppercase.


    Hope this helps.



    Edit: I did some timing tests on Linux because the comment made me curious. Here is a comparison of timings made on 5 sequential runs on a 137MB text file.



    Normal file access:



    real    2.410 2.414 2.428 2.478 2.490
    sys 0.052 0.052 0.064 0.080 0.152
    user 2.232 2.276 2.292 2.304 2.320


    mmap file access:



    real    1.885 1.899 1.925 1.940 1.954
    sys 0.088 0.108 0.108 0.116 0.120
    user 1.696 1.732 1.736 1.744 1.752


    Those timings do not include the print statement (I excluded it). Following these numbers I'd say memory mapped file access is quite a bit faster.



    Edit 2: Using python -m cProfile test.py I got the following results:



    5432833    2.273    0.000    2.273    0.000 {method 'readline' of 'file' objects}
    5432833 1.451 0.000 1.451 0.000 {method 'readline' of 'mmap.mmap' objects}


    If I'm not mistaken then mmap is quite a bit faster.



    Additionally, it seems not len(line) performs worse than line == '', at least that's how I interpret the profiler output.






    share|improve this answer


























    • AttributeError: 'mmap.mmap' object has no attribute 'readlines'

      – Fred Foo
      Nov 16 '11 at 12:33






    • 1





      hochl: Thank you. The benchmarks are great. Could you attach a script to reproduce the test and confirm the analysis?

      – Maxim Veksler
      Nov 16 '11 at 16:33






    • 1





      I simply commented out the print in your program and then did time test.py like 10 times, then took the 5 middle values. It would be interesting to check the results of python -m cProfile test.py.

      – hochl
      Nov 16 '11 at 16:51














    14












    14








    14







    I modified your example like this:



    with open(STAT_FILE, "r+b") as f:
    m=mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
    line=m.readline()
    if line == '': break
    print line.rstrip()


    Suggestions:




    • Do not call a variable map, this is a built-in function.

    • Open the file in r+b mode, as in the Python example on the mmap help page. It states: In either case you must provide a file descriptor for a file opened for update. See http://docs.python.org/library/mmap.html#mmap.mmap.

    • It's better to not use UPPER_CASE_WITH_UNDERSCORES global variable names, as mentioned in Global Variable Names at https://www.python.org/dev/peps/pep-0008/#global-variable-names. In other programming languages (like C), constants are often written all uppercase.


    Hope this helps.



    Edit: I did some timing tests on Linux because the comment made me curious. Here is a comparison of timings made on 5 sequential runs on a 137MB text file.



    Normal file access:



    real    2.410 2.414 2.428 2.478 2.490
    sys 0.052 0.052 0.064 0.080 0.152
    user 2.232 2.276 2.292 2.304 2.320


    mmap file access:



    real    1.885 1.899 1.925 1.940 1.954
    sys 0.088 0.108 0.108 0.116 0.120
    user 1.696 1.732 1.736 1.744 1.752


    Those timings do not include the print statement (I excluded it). Following these numbers I'd say memory mapped file access is quite a bit faster.



    Edit 2: Using python -m cProfile test.py I got the following results:



    5432833    2.273    0.000    2.273    0.000 {method 'readline' of 'file' objects}
    5432833 1.451 0.000 1.451 0.000 {method 'readline' of 'mmap.mmap' objects}


    If I'm not mistaken then mmap is quite a bit faster.



    Additionally, it seems not len(line) performs worse than line == '', at least that's how I interpret the profiler output.






    share|improve this answer















    I modified your example like this:



    with open(STAT_FILE, "r+b") as f:
    m=mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
    line=m.readline()
    if line == '': break
    print line.rstrip()


    Suggestions:




    • Do not call a variable map, this is a built-in function.

    • Open the file in r+b mode, as in the Python example on the mmap help page. It states: In either case you must provide a file descriptor for a file opened for update. See http://docs.python.org/library/mmap.html#mmap.mmap.

    • It's better to not use UPPER_CASE_WITH_UNDERSCORES global variable names, as mentioned in Global Variable Names at https://www.python.org/dev/peps/pep-0008/#global-variable-names. In other programming languages (like C), constants are often written all uppercase.


    Hope this helps.



    Edit: I did some timing tests on Linux because the comment made me curious. Here is a comparison of timings made on 5 sequential runs on a 137MB text file.



    Normal file access:



    real    2.410 2.414 2.428 2.478 2.490
    sys 0.052 0.052 0.064 0.080 0.152
    user 2.232 2.276 2.292 2.304 2.320


    mmap file access:



    real    1.885 1.899 1.925 1.940 1.954
    sys 0.088 0.108 0.108 0.116 0.120
    user 1.696 1.732 1.736 1.744 1.752


    Those timings do not include the print statement (I excluded it). Following these numbers I'd say memory mapped file access is quite a bit faster.



    Edit 2: Using python -m cProfile test.py I got the following results:



    5432833    2.273    0.000    2.273    0.000 {method 'readline' of 'file' objects}
    5432833 1.451 0.000 1.451 0.000 {method 'readline' of 'mmap.mmap' objects}


    If I'm not mistaken then mmap is quite a bit faster.



    Additionally, it seems not len(line) performs worse than line == '', at least that's how I interpret the profiler output.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 27 '18 at 0:56









    martineau

    67.8k1089182




    67.8k1089182










    answered Nov 16 '11 at 12:32









    hochlhochl

    9,04973568




    9,04973568













    • AttributeError: 'mmap.mmap' object has no attribute 'readlines'

      – Fred Foo
      Nov 16 '11 at 12:33






    • 1





      hochl: Thank you. The benchmarks are great. Could you attach a script to reproduce the test and confirm the analysis?

      – Maxim Veksler
      Nov 16 '11 at 16:33






    • 1





      I simply commented out the print in your program and then did time test.py like 10 times, then took the 5 middle values. It would be interesting to check the results of python -m cProfile test.py.

      – hochl
      Nov 16 '11 at 16:51



















    • AttributeError: 'mmap.mmap' object has no attribute 'readlines'

      – Fred Foo
      Nov 16 '11 at 12:33






    • 1





      hochl: Thank you. The benchmarks are great. Could you attach a script to reproduce the test and confirm the analysis?

      – Maxim Veksler
      Nov 16 '11 at 16:33






    • 1





      I simply commented out the print in your program and then did time test.py like 10 times, then took the 5 middle values. It would be interesting to check the results of python -m cProfile test.py.

      – hochl
      Nov 16 '11 at 16:51

















    AttributeError: 'mmap.mmap' object has no attribute 'readlines'

    – Fred Foo
    Nov 16 '11 at 12:33





    AttributeError: 'mmap.mmap' object has no attribute 'readlines'

    – Fred Foo
    Nov 16 '11 at 12:33




    1




    1





    hochl: Thank you. The benchmarks are great. Could you attach a script to reproduce the test and confirm the analysis?

    – Maxim Veksler
    Nov 16 '11 at 16:33





    hochl: Thank you. The benchmarks are great. Could you attach a script to reproduce the test and confirm the analysis?

    – Maxim Veksler
    Nov 16 '11 at 16:33




    1




    1





    I simply commented out the print in your program and then did time test.py like 10 times, then took the 5 middle values. It would be interesting to check the results of python -m cProfile test.py.

    – hochl
    Nov 16 '11 at 16:51





    I simply commented out the print in your program and then did time test.py like 10 times, then took the 5 middle values. It would be interesting to check the results of python -m cProfile test.py.

    – hochl
    Nov 16 '11 at 16:51











    1














    The following is reasonably concise:



    with open(STAT_FILE, "r") as f:
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    while True:
    line = m.readline()
    if line == "": break
    print line
    m.close()


    Note that line retains the newline, so you might like to remove it. It is also the reason why if line == "" does the right thing (an empty line is returned as "n").



    The reason the original iteration works the way it does is that mmap tries to look like both a file and a string. It looks like a string for the purposes of iteration.



    I have no idea why it can't (or chooses not to) provide readlines()/xreadlines().






    share|improve this answer


























    • The readlines() method of file objects returns a list of all lines of the file. doing this on an mmapped file would completely defeat the purpose of the mmap.

      – Sven Marnach
      Nov 16 '11 at 13:04











    • @SvenMarnach: It could be a generator. In any case, to be totally honest I fail to see the need for memory-mapped files in this entire question.

      – NPE
      Nov 16 '11 at 13:28













    • You are completely right. So maybe the reason for the non-existence of such a generator is that it would be pointless. :)

      – Sven Marnach
      Nov 16 '11 at 13:32























    edited Nov 16 '11 at 13:24

























    answered Nov 16 '11 at 12:35









    NPE







































    In my test, Python 2.7 (32-bit) on Windows was more than twice as fast reading an mmapped file:



    On a 27 MB, 509k-line text file (my 'parse' function is not interesting; it mostly just calls readline() in a tight loop):



    import mmap

    with open(someFile, "r") as f:
        if usemmap:
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        else:
            m = f
        e.parse(m)  # e is my parser object (not shown); parse() mostly calls m.readline()


    With mmap:    read in 0.308000087738
    Without mmap: read in 0.680999994278
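    Numbers like these depend on the OS, the Python version and whether the file is already in the page cache, so it is worth measuring on your own data. A rough harness, assuming Python 3 and a synthetic file (count_lines and the sample data are hypothetical):

```python
import mmap
import os
import tempfile
import time


def count_lines(path, use_mmap):
    """Read *path* with readline(), through an mmap or a plain file; return the line count."""
    with open(path, "rb") as f:
        src = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) if use_mmap else f
        try:
            n = 0
            for _ in iter(src.readline, b""):  # stop at end of file
                n += 1
            return n
        finally:
            if use_mmap:
                src.close()


# Hypothetical stand-in data: 200,000 short lines. Point this at your real file instead.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"".join(b"%d,some,statistics\n" % i for i in range(200000)))

timings, counts = {}, {}
for use_mmap in (True, False):
    t0 = time.perf_counter()
    counts[use_mmap] = count_lines(path, use_mmap)
    timings[use_mmap] = time.perf_counter() - t0
os.remove(path)

print("With mmap:    %.3f s" % timings[True])
print("Without mmap: %.3f s" % timings[False])
```

    Both paths open the file in binary mode so that both return bytes; a text-mode file would also pay for decoding, which would skew the comparison. Run it a few times, since whichever variant touches the file first may be penalised by a cold cache.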













        edited Dec 29 '18 at 3:48









        Michael











        answered Dec 29 '18 at 1:18









        Richard Aplin






























