Hit a Linux kernel bug: read() returns wrong data when it crosses a hugepage boundary
Scenario
When I read a file on hugetlbfs using `std::ifstream`, the data I get back does not match the file's contents:
```cpp
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::string file = "/mnt/huge/foo";
    std::ifstream fin;
    fin.open(file, std::ifstream::binary);
    char c;
    while (fin.get(c)) {
        std::cout << c;
    }
    fin.close();
}
```
However, if I use `fread()`, I get the correct data:
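```cpp
#include <cstdio>
#include <iostream>
#include <string>

int main() {
    std::string file = "/mnt/huge/foo";
    std::FILE* fin = std::fopen(file.c_str(), "rb");
    if (fin == nullptr) return 1;  // could not open the file
    char c;
    while (std::fread(&c, 1, 1, fin) == 1) {
        std::cout << c;
    }
    std::fclose(fin);
}
```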
To find out why, I ran both programs under `strace`. The `std::ifstream` version:
```
open("/mnt/huge/foo", O_RDONLY) = 3
read(3, "\220N\210\344\36\227\276\303\305\301\334\346\246\245\3717tmg/\25\235C\365k\7\273T2\266\220\327"..., 8191) = 8191
read(3, "%\361!\253lek&\30\306\370\333f\304\357L6@z\224W<ef\335\206\225\246\342!\327\6"..., 8191) = 8191
read(3, "B\222\327-\17`'\250E[]\327mi\37\3308u\250\231F\200\250\35-\v\276\245>H\321R"..., 8191) = 8191
read(3, "u\311w\336\10h\374\f\214\301\376-\0258'\263;Iu1\273\267\345\313\246\22O\320\335\254'\7"..., 8191) = 8191
read(3, "\342\265\263\314\222\265rr\265*A\27\34<\342\344F\244|\371\f\231\345\331\343=\321SZx\273\240"..., 8191) = 8191
read(3, "=?\241\337\20\235\367\233\10\234;^\234\337\274\322\237\242\346\32\32\233gb\231\236DZ\336t\364]"..., 8191) = 8191
read(3, "1\233\21z\345\355?\243\342\361e\335\334\246\363\316A\267\361Nv\304\250\225\240Q\267\31\r\265\314'"..., 8191) = 8191
read(3, "$\24\277\\\213\320jGj\nb4\317\370p\216>5V\331\1\2561\275\24\233\326d+\1UM"..., 8191) = 8191
read(3, "\262\355\327!\2h\303\332\373\16\257\3\32y!O\303]5\331\256?Q\277t\27\262\223\316\357j("..., 8191) = 8191
read(3, "pd\2043\261\350C\313\356\200\366}\17\25\335\240?\357\225Fs\226qKW\241r\227b\2424\347"..., 8191) = 8191
...
```
The `fread()` version:

```
open("/mnt/huge/foo", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5242880, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9328b03000
read(3, "\220N\210\344\36\227\276\303\305\301\334\346\246\245\3717tmg/\25\235C\365k\7\273T2\266\220\327"..., 4096) = 4096
read(3, "\v\327\265M\276\253\357\325m\244\253\351\237\350\273\21E<\326\356\03003\210\5\277\210h\200V5\376"..., 4096) = 4096
read(3, "\361!\253lek&\30\306\370\333f\304\357L6@z\224W<ef\335\206\225\246\342!\327\6\t"..., 4096) = 4096
read(3, "\343\300\202/\343\t\300\340\332\215\2148\226\342\251\377\fq_\21n\370\212\273tn\305\210#\320@`"..., 4096) = 4096
read(3, "\327-\17`'\250E[]\327mi\37\3308u\250\231F\200\250\35-\v\276\245>H\321R\277\36"..., 4096) = 4096
read(3, "t\346p0\204OeD\211\256\233g\242\3513\3X\367\0323\332\235\330\215\375\261G\234\217\17\34\375"..., 4096) = 4096
read(3, "\336\10h\374\f\214\301\376-\0258'\263;Iu1\273\267\345\313\246\22O\320\335\254'\7\205\r\325"..., 4096) = 4096
read(3, "xK\373\233\300\n\354\350>s\243\270\365D\276\263\226/\276\27S\225\"yL\4V\352\272\26b\261"..., 4096) = 4096
read(3, "\222\265rr\265*A\27\34<\342\344F\244|\371\f\231\345\331\343=\321SZx\273\240)\245h\224"..., 4096) = 4096
read(3, "P\332o>+\355\17\372\251\275n\266\n\310aB\210\235\30u{\365\34\255\367\36\375\365\v\27\331"..., 4096) = 4096
read(3, "\235\367\233\10\234;^\234\337\274\322\237\242\346\32\32\233gb\231\236DZ\336t\364]\225\216.=C"..., 4096) = 4096
read(3, "p$\350r\31\215\"\225\331&\354\200\361\344\333L\201\37e\r\"\353\255\244\250?\253O\252A3\371"..., 4096) = 4096
...
```
According to `strace`, `std::ifstream` reads 8191 bytes at a time, while `fread()` reads 4096 bytes at a time. To check whether the read size matters, I change the `std::ifstream` program so that it also reads 4096 bytes at a time:
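```cpp
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::string file = "/mnt/huge/foo";
    // With a user-provided buffer, libstdc++ reads n-1 bytes at a time,
    // so a 4097-byte buffer yields 4096-byte read() calls.
    // Declared before fin so the buffer outlives the stream.
    char buf[4096 + 1];
    std::ifstream fin;
    // libstdc++ only honors pubsetbuf() if it is called before open().
    fin.rdbuf()->pubsetbuf(buf, sizeof(buf));
    fin.open(file, std::ifstream::binary);
    char c;
    while (fin.get(c)) {
        std::cout << c;
    }
    fin.close();
}
```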
With `std::ifstream` also reading 4096 bytes at a time, I get the correct data, so the read size matters. Since the hugepage size is a multiple of 4096 bytes, sequential 4096-byte reads never straddle a hugepage boundary, whereas 8191-byte reads eventually do. `read()` is a system call that should return correct data for any read size, so this points to a bug in the kernel. Looking through the kernel commit log, something interesting shows up:
```
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Fri Apr 3 11:31:35 2015 -0400

    switch hugetlbfs to ->read_iter()

    ... and fix the case when the area we are asked to read crosses
    a hugepage boundary

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
So I actually hit a kernel bug.
Solution
Either give `std::ifstream` a user-provided buffer sized so that no `read()` crosses a hugepage boundary (for example, `4096 + 1` bytes, which yields 4096-byte reads), or upgrade to a Linux kernel version that includes the fix.
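If neither option fits, the same idea can be applied to plain `read()` calls. The sketch below is a hypothetical `read_within_pages()` helper, not from the original post: it clamps every request so that it ends at or before the next multiple of `page_size`. The hugepage size is passed in; one assumption is to take it from the `Hugepagesize` line of `/proc/meminfo`.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: reads the whole file at `path` into `out` using read(2),
// clamping each request so that it never crosses a multiple of `page_size`.
// Returns false if the file cannot be opened or read.
bool read_within_pages(const std::string& path, std::size_t page_size,
                       std::vector<char>& out) {
    assert(page_size > 0);
    // Cap the buffer so a large page_size (e.g. 1 GiB) doesn't allocate a huge buffer.
    const std::size_t kMaxChunk = 64 * 1024;
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;

    out.clear();
    std::vector<char> buf(std::min(page_size, kMaxChunk));
    std::size_t offset = 0;  // current file offset == bytes read so far
    for (;;) {
        // Bytes left before the next page boundary, capped by the buffer size.
        std::size_t room = page_size - offset % page_size;
        std::size_t want = std::min(room, buf.size());
        ssize_t n = ::read(fd, buf.data(), want);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            ::close(fd);
            return false;
        }
        if (n == 0) break;  // end of file
        out.insert(out.end(), buf.begin(), buf.begin() + n);
        offset += static_cast<std::size_t>(n);
    }
    ::close(fd);
    return true;
}
```

Because every request stops at a page boundary, a short read never pushes the next request across one either: the next request is recomputed from the actual offset.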