Refactor test/dune and add a failing test #2
Loading…
Reference in a new issue
No description provided.
Delete branch "batch-test"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
The failing test shows how writes during a change_and_push are not readable.
A
dune runtest
run produces the following output / test failure:The question now is which behaviour we strive for.... I guess the "read returns the newest (batched)" is reasonable from a perspective of caldav and refactorings. So, from the semantics I'd be keen to get that behaviour (since it relaxes my brain).
From my point of view, with caldav in mind: we have some changes to various files, and want to ensure that either all of the changes are committed or none of them.
I tried the following which I hoped would work as I thought: that is, if you're doing reads inside a
change_and_push´ (that is, inside
fin
change_and_push fs f`), then you get the "staged" changes; otherwise you get the old changes. But it doesn't seem to do what I though.Yes, you must update the
set
function and add at.head <- new_head
:But some questions are raised, specifically about cooperation between multiple
change_and_push
...I pushed a commit that implements what I had in mind for get, get_partial and list at least. For digest and last_modified I'm unsure it's correct. For remove it's incorrect and will have to look into that. Finally,
noppp
is a terrible function name and should be renamed!This is what @reynir and myself developed. The semantics have changed, but we believe that this is in good shape now.
We also added some alcotest-lwt tests to show the intended semantics and whether this works nicely.
@ -684,3 +690,2 @@
tree_root_hash_of_store t >>= fun tree_root_hash ->
t.committed <- Some (tree_root_hash, th) ;
let t' = { t with in_closure= true } in
t.change_and_push_waiter <- Some th ;
t.change_and_push
is already set 3 lines before.indeed, removed in
750ec11
@ -683,0 +685,4 @@
t.change_and_push_waiter <- Some th;
( match th' with
| None -> Lwt.return_unit
| Some th -> th ) >>= fun () ->
the reason for this change is: we may have one change_and_push that is active, and when there are then multiple other change_and_push that should be executed, each needs to wait for the next one.
previously, the code had all other change_and_push wait for the same task -- and thus there may have been multiple waiting change_and_push waken up at the same time, leading to races.
I'm not sure about the serialization of
change_and_push
and I think, after that our oldth
continue, we should ensure the exclusivity of a task to sett.change_and_push_waiter
and recheck this value. ALwt_condition.t
/Lwt_mutex.t
is probably needed in my opinion to really ensure the serialization and avoid that 2 pending tasks continue thechange_and_push
process.@ -692,2 +697,4 @@
if Digestif.SHA1.equal new_tree_root_hash tree_root_hash
then Lwt.return_ok res (* XXX(dinosaure): nothing to send! *)
else if not (Option.equal Digestif.SHA1.equal t.head t'.head) then
Lwt.return (Error (`Msg "store was modified outside of change_and_push, please retry"))
this change requires that the head didn't mutate since we started the change_and_push. we could as well do a rebase or merge, but we weren't able to find merge/rebase code.
this is illustrated by the test
set_outside_change_and_push
.@ -482,0 +481,4 @@
| _ ->
Ok (Option.fold
~none:Ptime.epoch
~some:Fun.id (Ptime.of_float_s (Int64.to_float (now ())))))
we don't have a commit in this case (we're in a change_and_push), so let's use the current timestamp. this is good enough for our definition of last_modified (which is the last commit)
So our idea was:
Task A runs change_and_push, and has set the change_and_push_waiter.
Now, task B enters the change_and_push. t,committed is None (since we're not inside another change_and_push). Then t.change_and_push_waiter is set to a fresh task, and the old value is waited for.
Once task A finishes change_and_push, it wakes up task B.
If there's a task C coming along, doing change_and_push, it will set t.change_and_push_waiter to a new task, and wait for the task B to wake up.
So, the t.change_and_push_waiter is only ever used by a single task, since on entry (without any bind / yield point) of change_and_push, the value in (the main) t is set to a fresh task, and then the old value is waited for.
WDYT, where are the potential races?
I added a comment in the code to clarify this. It may not be evident from the code that there must not be any yield point there.
An observation I made is we can drop the
option
from thechange_and_push_waiter
and initialize it aLwt.return_unit
. The code may become a bit simpler (we drop a match andSome _
constructors), but I fear it may be less easy to understand. WDYT?I tried to write a test with the three tasks in mind, and there's something wonky going on:
output:
where line 198 is in task_c the first alcotest check.
I think that if a task B and a task C are both waiting for A to finish, both will become unblocked. Even if cooperation ensures that B runs before C or that C runs before B, both will still change t.change_and_push. My idea relates more to something like:
Hmm. Would you mind pushing that test?
I pushed the test, and found a fix... we were in the teardown of the change_and_push unconditionally setting t.change_and_push_waiter to None -- I now use (physical!) equality of th == t.change_and_push_waiter to guard that assignment.
Et voila, I think this is fine now.
But that is fine, no? There's no yield point (as far as I understand, but then I barely understand Lwt) between function entry and the setting of change_and_push_waiter!?
Actually, it seems we can remove those three lines!
No, we can't. Now, when change_and_push finishes, and at some other time some other change_and_push is entered, there's still the old th in change_and_push_waiter.
EDIT: sorry, force-pushed with a comment. If you're convinced that these three lines can be removed, I'd be happy to first see a test...
51e30538f6
to1327cc4f94
@ -13,0 +11,4 @@
; mutable committed : Digestif.SHA1.t option
; mutable change_and_push_running : bool
; condition : unit Lwt_condition.t
; mutex : Lwt_mutex.t
Could this be done with the mutex only instead of a bool, mutex and a condition?
Indeed, we can acquire the mutex at the begin of change_and_push -- which will wait (block) until the mutex is unused.
yes, basically protect the whole
change_and_push
with a mutex will solve all of our issues 😄LGTM