Fix GH-8561, GH-8562, GH-8563, GH-8564: SplFileObject iterator desync#21679
Fix GH-8561, GH-8562, GH-8563, GH-8564: SplFileObject iterator desync#21679iliaal wants to merge 4 commits intophp:masterfrom
Conversation
…xt() SplFileObject::next() without READ_AHEAD cleared the cached line and incremented current_line_num but didn't advance the stream. When called without a preceding current() (e.g. rewind() then next()), the stream position stayed put, so the subsequent current() read stale data. Read a line to advance the stream when next() is called with no cached line. Closes phpGH-8562
|
Have you had a look at #8644 ? There are multiple related SplFileObject bugs. |
Yeah I saw, trying to tackle them one at a time, or would you prefer all in one fix? |
|
I would prefer all in one to be sure that the design of the fixes makes sense together. :) |
And here I was hoping for an easy win 🤣, ok I'll see what I can merge in. |
fgets() read a line into the cache and incremented the line counter, but left the cached line in place. The subsequent current() returned the stale cached value instead of reading the next line from the stream's actual position. Clear the cached line after fgets() copies it to the return value. This forces current() to re-read from the stream, which has already advanced past the fgets'd line. Closes phpGH-8561
… next() spl_filesystem_file_read_ex() treated a NULL buffer from php_stream_get_line as a successful read of an empty line, creating a phantom cached line at EOF. This caused seek() to give inconsistent results between SplFileObject and SplTempFileObject (phpGH-8563), and next() to increment the line counter indefinitely past EOF (phpGH-8564). Return FAILURE when php_stream_get_line returns NULL, matching the behavior of the existing php_stream_eof check. In seek(), break out of the loop on EOF instead of returning, so the post-loop cleanup runs consistently. In next(), return early at EOF without incrementing. Make __toString() return empty string at EOF instead of throwing. Closes phpGH-8563 Closes phpGH-8564
| --EXPECT-- | ||
| After rewind+fgets: key=1 current=line 1 | ||
| After rewind+fgets+fgets: key=2 current=line 2 | ||
| After current+fgets: key=1 current=line 2 |
There was a problem hiding this comment.
This seems to indicate the key is getting out of sync?
There was a problem hiding this comment.
You're right, that assertion locked in a desync. Traced it: after rewind → current() → fgets(), the physical stream was past "line 1" (fgets re-read the stream instead of reusing the cached "line 0"), so the second current() read "line 2" while line_num was still at 1. The latest commit makes fgets() reuse the cached line when one is present, otherwise read fresh. Case 3 now gives key=1 current=line 1, consistent with the other two cases.
| var_dump($s->valid()); | ||
| ?> | ||
| --EXPECT-- | ||
| int(14) |
There was a problem hiding this comment.
The .phpt temp file is exactly 12 physical lines (verified via run-tests.php --keep-all). seek(13) reads all 12, the 13th iteration hits EOF and breaks, and the post-loop line_num++ lands at 12. Your own draft in #8644 sets this test to int(12) for the same reason.
| $s->seek(120); | ||
| $s->next(); | ||
| var_dump($s->key()); | ||
| var_dump($s->valid()); | ||
| ?> | ||
| --EXPECT-- | ||
| int(13) | ||
| int(12) | ||
| bool(false) |
There was a problem hiding this comment.
I'm confused why do we have such an advanced key? Surely a seek failing should not advance it?
There was a problem hiding this comment.
Same file, 12 physical lines. seek(120) reads all 12, iteration 12 hits EOF and breaks, post-loop line_num++ lands at 12, same as seek(13). The old expected value 13 came from the pre-fix read_ex treating NULL buffers as empty-line successes, which pumped line_num past EOF. #8644 also sets this test to int(12).
| " | ||
| string(13) ""D", "E", "F"" | ||
| Cannot read from file php://temp | ||
| string(0) "" |
There was a problem hiding this comment.
I think part of the motivation for CSV functions is to ignore the final line if it just an EOL before EOF.
Although throwing an error is not amazing. I wonder if it shouldn't be returning the last line?
There was a problem hiding this comment.
Returning the last read line would mean holding a separate "last line" field on the object across free_line() calls, adding state for one boundary case. The failure mode this test guards against is a NULL pointer crash, and returning an empty string from __toString() prevents that. Happy to revisit in a follow-up if you want stronger semantics here.
When current() reads a line into the cache without advancing line_num, a subsequent fgets() would re-read the stream and return the next line, skipping the cached one and leaving key() out of sync with current() for the rest of the iteration. Use the cached line if present; otherwise read a fresh line. Either way, advance line_num by one.
ca6af4a to
e3c8189
Compare
Girgias
left a comment
There was a problem hiding this comment.
This looks to fix the issues, let's hope that people don't rely on this behaviour for some reason...
Fixes #8562
Fixes #8561
Fixes #8563
Fixes #8564
Fixes #8851
Five related SplFileObject bugs where the iterator state, fgets(), and seek() disagreed on the current position:
GH-8562:
next()without a precedingcurrent()didn't advance the stream, socurrent()read stale data. Fix: read a line to advance the stream whennext()is called with no cached line.GH-8561:
fgets()cached the read line, so the subsequentcurrent()returned the fgets'd value instead of re-reading from the stream's new position. Fix: clear the cached line afterfgets()copies it to the return value.GH-8563:
seek()past EOF gave differentkey()values forSplFileObjectvsSplTempFileObject. Root cause:spl_filesystem_file_read_ex()treated a NULL buffer fromphp_stream_get_lineas a successful empty-line read, and the two stream types set the EOF flag at different times. Fix: return FAILURE on NULL buffer.GH-8564:
next()past EOF kept incrementingkey()indefinitely. Fix: return early fromnext()at EOF without incrementing. Also fixseek()tobreakon EOF instead ofreturn, so the post-loop cleanup runs and both methods give consistent results.GH-8851:
fgets()afternext()returned the wrong line. Already resolved by the GH-8562 and GH-8561 fixes above.