revise startup, address urls pointing to same sha256 and support mirrors (upstream and in opam file) #24
Loading…
Reference in a new issue
No description provided.
Delete branch "startup"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
startup as proposed in #18:
also address #5 #13 #21 (url stuff, deduplicating based on sha256, adding support for mirrors and upstream-caches)
I added a second commit which makes the contents available as soon as they're checked. And once the check has been completed, the (mutable) field (
checked
) is set to None.revise startup (as proposed in #18):to revise startup, address urls pointing to same sha256 and support mirrors (upstream and in opam file)@ -134,1 +142,4 @@
let sha256s = Hashtbl.create 13
let empty () = Hashtbl.clear sha256s
this code, using a global hashtable, is ugly and should be revised.
@ -152,2 +193,3 @@
then
Some (HM.union (fun _h v _v' -> Some v) csums csums')
Some (HM.union (fun _h v _v' -> Some v) csums csums',
SSet.union mirrors mirrors')
not doing
upstream csums
here since we did that for the previous entry already. downside is if there's an artifact that has some hashes in some file, and more hashes in another file, we won't get the other hashes.This looks interesting. I will do a closer review later.
I like that you use sha256 for comparing duplicate URLs. We could possibly also check the sha512 values (but for simplicity maybe let's not). For md5 I think the risk of collision is too high.
I think it is interesting with
skip-download
. With #25 an an option to set "archive-mirrors:" we could split the opam repository functionality from the cache functionality. I think that could be interesting for some deployments while for others it's possibly nicer to only have to deploy a single unikernel.@ -44,0 +50,4 @@
| Some { pelem = Variable (_, { pelem = String url ; _ }) ; _ } -> [ url ]
| Some { pelem = Variable (_, { pelem = List { pelem = urls ; _ } ; _ }) } ->
List.fold_left (fun acc -> function
| { pelem = String url ; _ } -> url :: acc
I don't think the order is significant, but it's worth noting we reverse the order here.
@ -80,3 +99,3 @@
in
(match url, csum with
| Ok url, Ok csum -> Ok (url, csum)
| Ok url, Ok csum -> Ok (url, csum, mirrors)
The opam source code does
url :: mirrors
FWIW. Just an observation; not a request for change.We keep the url in the map as key, and just carry around a set of other urls for the same artifact.
Not sure whether it is worth to have the url as key in the map anymore, though we discovered some opam packages that used the same archive with different checksums...
@ -13,0 +14,4 @@
Mirage_runtime.register_arg Arg.(value & flag doc)
let upstream_caches =
let doc = Arg.info ~doc:"Upstream caches (e.g. https://opam.ocaml.org/cache)" ["upstream-cache"] in
"Upstream caches to use internally (e.g. https://opam.ocaml.org/cache). This makes opam-mirror try the cache(s) before going to the source and mirrors. This does not change the published "archive-mirrors:" value in the /repo endpoint."
actually, if I'm not misguided, we first go to the source, and only thereafter to mirror(s). Is this a good semantics?
And please feel free to push the documentation updates directly to the branch.
Hm it turns out the exact semantics is a bit complicated to explain. You are right that we first to go the source! Then we go to the mirrors or the cache depending on how the URLs sort as they both go into a string set.
I will give this some thought and suggest a change.
Latest commit splits mirror URLs from upstream cache URLs so the mirror URLs are tried first before the upstream caches, and adds a longer description of the option.
@ -139,2 +151,3 @@
List.iter (fun (`Msg msg) -> add_parse_error path msg) errs;
List.fold_left (fun acc (url, csums) ->
let upstream hm =
List.fold_left (fun set (hash, hash_value) ->
Maybe worth using
HM.fold
Indeed, currently it is a List.fold_left of "HM.bindings hm". So yes, this could be nicer code. :)
mirror
into account inurl
#13