Big Idea
- Index the csv / tsv up front, memory map it, and then use Altrep to delay field parsing until a value is actually needed.
~10x faster initial import than data.table, ~15x faster than readr, ~65x faster than read.delim
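| package    | time (sec) | speedup | throughput |
| :--------- | ---------: | ------: | :--------- |
| vroom      |       1.72 |   65.88 | 971.70 MB  |
| data.table |      19.37 |    5.83 | 86.03 MB   |
| readr      |      25.71 |    4.40 | 64.84 MB   |
| read.delim |     113.02 |    1.00 | 14.75 MB   |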
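The "index first, parse later" idea can be sketched in a few lines of base R. This is a toy illustration and not vroom's implementation: it uses an in-memory string instead of a memory-mapped file and a closure with a cache instead of Altrep, and the names `get_field` and `lazy_column` are hypothetical helpers invented for the example.

```r
# Toy data: a header row plus three data rows, tab-delimited.
text <- "id\tfare\ttip\n1\t460\t0.5\n2\t250\t0\n3\t260.06\t0.5\n"

# Step 1: index. Record where each separator sits; no field is parsed yet.
chars  <- strsplit(text, "")[[1]]
breaks <- which(chars == "\t" | chars == "\n")
first_newline <- which(chars == "\n")[1]
n_col  <- sum(chars[seq_len(first_newline)] == "\t") + 1L
starts <- c(1L, head(breaks, -1L) + 1L)  # first character of each field
ends   <- breaks - 1L                    # last character of each field

# Step 2: fetch a single field by position (row 0 is the header).
get_field <- function(row, col) {
  i <- row * n_col + col
  substring(text, starts[i], ends[i])
}

# Step 3: a column that is only parsed the first time it is requested.
lazy_column <- function(col) {
  cache <- NULL
  function() {
    if (is.null(cache)) {
      n_row <- length(breaks) / n_col - 1L
      fields <- vapply(seq_len(n_row), get_field, character(1), col = col)
      cache <<- as.numeric(fields)
    }
    cache
  }
}

fare <- lazy_column(2L)  # cheap: nothing parsed so far
fare()                   # the fare column is parsed here, then cached
```

Building the index touches every byte once but converts nothing, which is why the initial import is cheap; the parsing cost is paid later, and only for the columns that are used.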
.Internal(inspect(high_fares, 1, 11))
@7f9b0e08eac8 19 VECSXP g0c5 [OBJ,MARK,NAM(3),ATT] (len=11, tl=0)
@7f9b0561c600 16 STRSXP g0c7 [MARK,NAM(3)] (len=317, tl=0)
@7f9b0a211c00 16 STRSXP g0c7 [MARK,NAM(3)] (len=317, tl=0)
@7f9b0614de00 16 STRSXP g0c7 [MARK,NAM(3)] (len=317, tl=0)
@7f9b0a0e8400 16 STRSXP g0c7 [MARK,NAM(3)] (len=317, tl=0)
@7f9b0d07a800 16 STRSXP g0c7 [MARK,NAM(3)] (len=317, tl=0)
@7f9b0ca46c00 14 REALSXP g0c7 [MARK,NAM(3)] (len=317, tl=0) 460,250,260,242,280,275,209,260.06,323,200,250,...
@7f9b09f4ce00 14 REALSXP g0c7 [MARK,NAM(3)] (len=317, tl=0) 0,0,0,0,0,0,0,0,0,0,0,...
@7f9b0a218e00 14 REALSXP g0c7 [MARK,NAM(3)] (len=317, tl=0) 0.5,0,0,0,0,0,0.5,0,0,0.5,0.5,...
@7f9b0a21e000 14 REALSXP g0c7 [MARK,NAM(3)] (len=317, tl=0) 0,50,0,45,0,25,41.8,0,0,50,0,...
@7f9b08048c00 14 REALSXP g0c7 [MARK,NAM(3)] (len=317, tl=0) 0,0,6.55,0,0,0,4.8,0,4.8,0,0,...
@7f9b0c3bb000 14 REALSXP g0c7 [MARK,NAM(3)] (len=317, tl=0) 460.5,300,266.55,287,280,300,256.1,260.06,327.8,250.5,250.5,...
ATTRIB:
@7f9b0c338da0 02 LISTSXP g0c0 [MARK]
TAG: @7f9b0501bb00 01 SYMSXP g1c0 [MARK,NAM(3),LCK,gp=0x6000] "names" (has value)
@7f9b0c3b54a8 16 STRSXP g1c5 [MARK,NAM(3)] (len=11, tl=0)
TAG: @7f9b0501b8d0 01 SYMSXP g1c0 [MARK,NAM(3),LCK,gp=0x4000] "row.names" (has value)
@7f9b0e0ea918 13 INTSXP g0c1 [MARK,NAM(3)] (len=2, tl=0) -2147483648,-317
TAG: @7f9b0501bfd0 01 SYMSXP g1c0 [MARK,NAM(3),LCK,gp=0x4000] "class" (has value)
@7f9b0d1097b8 16 STRSXP g1c3 [MARK,NAM(3)] (len=3, tl=0)
Reading and then fully materializing all vectors takes about as long as data.table.
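To see the difference between a lazy Altrep vector and a materialized one without installing anything, base R's own compact integer sequences can stand in for vroom's columns. The sketch below assumes R >= 3.5.0, where `1:n` is stored as a compact Altrep sequence; `inspect_lines` is a hypothetical helper, and the exact `inspect` output format is an internal detail that may vary between R versions.

```r
# Capture the text that .Internal(inspect()) prints for an object.
inspect_lines <- function(x) capture.output(.Internal(inspect(x)))

lazy  <- 1:1000000   # assumed to be a compact Altrep sequence (R >= 3.5.0)
eager <- lazy + 0L   # arithmetic allocates an ordinary integer vector

# A compact sequence is labelled "compact" in the inspect output;
# an ordinary vector lists its first few values instead.
lazy_is_compact  <- any(grepl("compact", inspect_lines(lazy),  fixed = TRUE))
eager_is_compact <- any(grepl("compact", inspect_lines(eager), fixed = TRUE))
```

Both vectors hold the same values; only the storage differs. The `high_fares` columns in the output above are printed with their values, the same way the ordinary vector is here.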