bstr/lib/bstr.mli

(** A small library for manipulating bigstrings.

    To clarify the use of bigstrings in OCaml, we advise you to read the
    overview of bigstrings and the difference with bytes. After this theoretical
    reading, this module offers a whole host of useful functions for
    manipulating bigstrings.

    {1:overview Overview.}

    A bigstring is a special kind of memory area in the OCaml world. Unlike
    bytes, bigstrings are allocated via [malloc()] or are available via
    [Unix.map_file].

    They therefore exist outside the space normally allocated for OCaml with
    regard to all its values. So there are some particularities to the use of
    bigstrings.

    The first thing to understand about bigstrings is that allocating them can
    take time. Since a bigstring is obtained either by [malloc()] or by
    [Unix.map_file], the former is a performance hit on the [malloc()] used
    (which also depends on the fragmentation of the C heap) and the latter is a
    system call that can interact with your file system.

    By way of comparison, a byte of less than 2048 bytes requires only 3
    processor instructions to exist and be available — beyond that, the bytes is
    allocated in the major heap.

    It is therefore advisable to allocate just a few bigstrings and reuse them
    throughout your application. It's even advisable to allocate large
    bigstrings.

    A particularity of bigstrings is that they cannot be moved by the Garbage
    Collector. Existing in a space other than that of OCaml (the C heap), they
    don't move. With this advantage in mind, we can imagine several situations
    where we'd like a memory zone that doesn't move:
    - a bigstring can be manipulated by several threads/domains. Of course,
      parallel accesses must be protected, but you can be sure that the
      bigstring will not move throughout the process. Thus, its location in
      memory can be shared by several threads/domains.

    One example is to "release" the GC lock when performing a calculation such
    as a hash or checksum on a bigarray. Since the latter will not be moved by
    the GC, if the elements required for the calculation are pre-allocated on
    the C stack, it is possible to perform such a calculation on a Thread other
    than the main OCaml thread.

    - it may be necessary, in system programming, to write to a particular
      address in order to interact with a device. In this case, the bigstring
      can be found as an OCaml value bridging a special memory area (such as the
      framebuffer).

    This is somewhat equivalent to [Unix.map_file]. The latter uses [mmap(3P)],
    which asks the kernel for a special memory address. This address can be
    related (via the kernel) to an area on your hard disk that corresponds to a
    file. In the case of unikernels or embedded systems, it's quite common to
    prepare bigstrings according to the devices available.

    A final feature of bigstring is that it can be seen as a slice. You can have
    another view of a bigstring that would be equally smaller. For example, the
    {!val:sub} operation in particular {b doesn't copy} your bigstring, but
    offers you a "proxy" accessing the same memory area as the original
    bigstring.

    This can be useful for decoding packets, extracting information such as
    integers, without copying parts or all of the bigstring. For example, for a
    TCP/IP packet, we'd like to decode certain information but also give a
    slice of the bigstring that corresponds to the packet's payload (so that we
    can process this payload without having to copy).

    Finally, it may be interesting in an encoder of some kind to give
    bigstrings that the user can write to, and check that these bigstrings are
    part of a larger bigstring (in other words, these bigstrings come from a
    {!val:sub} of a larger bigstring) that has been allocated beforehand.

    Bigstrings therefore have certain advantages over bytes, but also some
    disadvantages. Considering the former as elements you should use
    systematically is not a good choice. However, we are sometimes forced to
    use them (especially when communicating with embedded devices) and they can
    be interesting for certain types of applications. This overview presents a
    few cases, but examples exist in the OCaml community where the use of
    bigstrings is justified.

    In short, this library attempts to summarize everything that can be done
    with bigstrings.

    {2: Performance.}

    {1:pkt Encode & Decode packets.}

    In order to encode or decode packets (such as ARP or DNS packets), Bstr
    offers a small API for converting a slice of bytes from a {!val:Bstr.t} to a
    user-defined variant or record. *)

type t = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

val memcpy : t -> src_off:int -> t -> dst_off:int -> len:int -> unit
(** [memcpy src ~src_off dst ~dst_off ~len] copies [len] bytes from [src] to
    [dst]. [src] {b must not} overlap [dst]. Use {!val:memmove} if [src] & [dst]
    do overlap. *)

val memmove : t -> src_off:int -> t -> dst_off:int -> len:int -> unit
(** [memmove src ~src_off dst ~dst_off ~len] copies [len] bytes from [src] to
    [dst]. [src] and [dst] may overlap: copying takes place as though the bytes
    in [src] are first copied into a temporary array that does not overlap [src]
    or [dst], and the bytes are then copied from the temporary array to [dst].
*)

val memcmp : t -> src_off:int -> t -> dst_off:int -> len:int -> int
val memchr : t -> off:int -> len:int -> char -> int
val memset : t -> off:int -> len:int -> char -> unit

val empty : t
(** [empty] is an empty bigstring. *)

val length : t -> int
(** [length bstr] is the number of bytes in [bstr]. *)

val get : t -> int -> char
(** [get bstr i] is the byte of [bstr]' at index [i]. This is equivalent to the
    [bstr.{i}] notation.

    @raise Invalid_argument if [i] is not an index of [bstr]. *)

val set : t -> int -> char -> unit
(** [set t i chr] modifies [t] in place, replacing the byte at index [i] with
    [chr].

    @raise Invalid_argument if [i] is not a valid index in [t]. *)

val unsafe_get : t -> int -> char
(** [unsafe_get t idx] is like {!val:get} except no bounds checking is
    performed. *)

val unsafe_set : t -> int -> char -> unit
(** [unsafe_set t idx chr] is like {!val:set} except no bounds checking is
    performed. *)

val chop : ?rev:bool -> t -> char option

val create : int -> t
(** [create len] returns a new byte sequence of length [len]. The sequence
    {b is unitialized} and contains arbitrary bytes. *)

val make : int -> char -> t
(** [make len chr] is {!type:t} of length [len] with each index holding the
    character [chr]. *)

val of_string : string -> t
(** [of_string str] returns a new {!type:t} that contains the contents of the
    given string [str]. *)

val string : ?off:int -> ?len:int -> string -> t
(** [string ~off ~len str] is the sub-buffer of [str] that starts at position
    [off] (defaults to [0]) and stops at position [off + len] (defaults to
    [String.length str]). [str] is fully-replaced by a fresh allocated
    {!type:t}. *)

val fill : t -> off:int -> len:int -> char -> unit
(** [fill t off len chr] modifies [t] in place, replacing [len] characters with
    [chr], starting at [off].

    @raise Invalid_argument
      if [off] and [len] do not designate a valid range of [t]. *)

val init : int -> (int -> char) -> t
(** [init len fn] returns a fresh byte sequence of length [len], with character
    [idx] initialized to the result of [fn idx] (in increasing index order). *)

val copy : t -> t
(** [copy t] returns a new byte sequence that contains the same bytes as the
    argument. *)

(** {2 Copy operation from one byte sequence to another.} *)

val blit : t -> src_off:int -> t -> dst_off:int -> len:int -> unit
(** [blit src ~src_off dst ~dst_off ~len] copies [len] bytes from byte sequence
    [src], starting at index [src_off], to byte sequence [dst], starting at
    index [dst_off]. It works correctly even if [src] and [dst] are (physically)
    the same byte sequence, and the source and destination intervals overlap.

    @raise Invalid_argument
      if [src_pos] and [len] do not designate a valid range of [src], or if
      [dst_off] and [len] do not designate a valid range of [dst]. *)

val blit_from_string :
  string -> src_off:int -> t -> dst_off:int -> len:int -> unit
(** Just like {!val:blit}, but with a string as source one.

    {b Note}: since it is impossible for [src] to overlap [dst], {!val:memcpy}
    is used to make the copy.

    @raise Invalid_argument
      if [src_pos] and [len] do not designate a valid range of [src], or if
      [dst_off] and [len] do not designate a valid range of [dst]. *)

val blit_from_bytes :
  bytes -> src_off:int -> t -> dst_off:int -> len:int -> unit
(** Just like {!val:blit}, but with a bytes as source one.

    {b Note}: since it is impossible for [src] to overlap [dst], {!val:memcpy}
    is used to make the copy.

    @raise Invalid_argument
      if [src_pos] and [len] do not designate a valid range of [src], or if
      [dst_off] and [len] do not designate a valid range of [dst]. *)

val blit_to_bytes : t -> src_off:int -> bytes -> dst_off:int -> len:int -> unit
(** [blit_to_bytes src ~src_off dst ~dst_off ~len] copies [len] bytes from
    [src], starting at index [src_off], to byte sequence [dst], starting at
    index [dst_off].

    {b Note}: since it is impossible for [src] to overlap [dst], {!val:memcpy}
    is used to make the copy.

    @raise Invalid_argument
      if [src_off] and [len] do not designate a valid range of [src], or if
      [dst_off] and [len] do not designate a valid range of [dst]. *)

(*
val extend : t -> int -> int -> t
val concat : t -> t list -> t
val cat : t -> t -> t
val iter : (char -> unit) -> t -> unit
val iteri : (int -> char -> unit) -> t -> unit
val map : (char -> char) -> t -> t
val mapi : (int -> char -> char) -> t -> t
val fold_left : ('acc -> char -> 'acc) -> 'acc -> t -> 'acc
val fold_right : (char -> 'acc -> 'acc) -> t -> 'acc -> 'acc
val index : t -> ?rev:bool -> ?from:int -> char -> int
val contains : t -> ?rev:bool -> ?from:int -> char -> bool
val compare : t -> t -> int
val starts_with : prefix:string -> t -> bool
val ends_with : suffix:string -> t -> bool
*)

(** {2 Decode integers from a byte sequence.} *)

val get_int8 : t -> int -> int
(** [get_int8 bstr i] is [bstr]'s signed 8-bit integer starting at byte index
    [i]. *)

val get_uint8 : t -> int -> int
(** [get_uint8 bstr i] is [bstr]'s unsigned 8-bit integer starting at byte index
    [i]. *)

val get_uint16_ne : t -> int -> int
(** [get_int16_ne bstr i] is [bstr]'s native-endian unsigned 16-bit integer
    starting at byte index [i]. *)

val get_uint16_le : t -> int -> int
(** [get_int16_le bstr i] is [bstr]'s little-endian unsigned 16-bit integer
    starting at byte index [i]. *)

val get_uint16_be : t -> int -> int
(** [get_int16_be bstr i] is [bstr]'s big-endian unsigned 16-bit integer
    starting at byte index [i]. *)

val get_int16_ne : t -> int -> int
(** [get_int16_ne bstr i] is [bstr]'s native-endian signed 16-bit integer
    starting at byte index [i]. *)

val get_int16_le : t -> int -> int
(** [get_int16_le bstr i] is [bstr]'s little-endian signed 16-bit integer
    starting at byte index [i]. *)

val get_int16_be : t -> int -> int
(** [get_int16_be bstr i] is [bstr]'s big-endian signed 16-bit integer starting
    at byte index [i]. *)

val get_int32_ne : t -> int -> int32
(** [get_int32_ne bstr i] is [bstr]'s native-endian 32-bit integer starting at
    byte index [i]. *)

val get_int32_le : t -> int -> int32
(** [get_int32_le bstr i] is [bstr]'s little-endian 32-bit integer starting at
    byte index [i]. *)

val get_int32_be : t -> int -> int32
(** [get_int32_be bstr i] is [bstr]'s big-endian 32-bit integer starting at byte
    index [i]. *)

val get_int64_ne : t -> int -> int64
(** [get_int64_ne bstr i] is [bstr]'s native-endian 64-bit integer starting at
    byte index [i]. *)

val get_int64_le : t -> int -> int64
(** [get_int64_le bstr i] is [bstr]'s little-endian 64-bit integer starting at
    byte index [i]. *)

val get_int64_be : t -> int -> int64
(** [get_int64_be bstr i] is [bstr]'s big-endian 64-bit integer starting at byte
    index [i]. *)

val set_int8 : t -> int -> int -> unit
(** [set_int8 t i v] sets [t]'s signed 8-bit integer starting at byte index [i]
    to [v]. *)

val set_uint8 : t -> int -> int -> unit
(** [set_uint8 t i v] sets [t]'s unsigned 8-bit integer starting at byte index
    [i] to [v]. *)

val set_uint16_ne : t -> int -> int -> unit
(** [set_uint16_ne t i v] sets [t]'s native-endian unsigned 16-bit integer
    starting at byte index [i] to [v]. *)

val set_uint16_le : t -> int -> int -> unit
(** [set_uint16_le t i v] sets [t]'s little-endian unsigned 16-bit integer
    starting at byte index [i] to [v]. *)

val set_uint16_be : t -> int -> int -> unit
(** [set_uint16_le t i v] sets [t]'s big-endian unsigned 16-bit integer starting
    at byte index [i] to [v]. *)

val set_int16_ne : t -> int -> int -> unit
(** [set_uint16_ne t i v] sets [t]'s native-endian signed 16-bit integer
    starting at byte index [i] to [v]. *)

val set_int16_le : t -> int -> int -> unit
(** [set_uint16_le t i v] sets [t]'s little-endian signed 16-bit integer
    starting at byte index [i] to [v]. *)

val set_int16_be : t -> int -> int -> unit
(** [set_uint16_le t i v] sets [t]'s big-endian signed 16-bit integer starting
    at byte index [i] to [v]. *)

val set_int32_ne : t -> int -> int32 -> unit
(** [set_int32_ne t i v] sets [t]'s native-endian 32-bit integer starting at
    byte index [i] to [v]. *)

val set_int32_le : t -> int -> int32 -> unit
(** [set_int32_ne t i v] sets [t]'s little-endian 32-bit integer starting at
    byte index [i] to [v]. *)

val set_int32_be : t -> int -> int32 -> unit
(** [set_int32_ne t i v] sets [t]'s big-endian 32-bit integer starting at byte
    index [i] to [v]. *)

val set_int64_ne : t -> int -> int64 -> unit
(** [set_int32_ne t i v] sets [t]'s native-endian 64-bit integer starting at
    byte index [i] to [v]. *)

val set_int64_le : t -> int -> int64 -> unit
(** [set_int32_ne t i v] sets [t]'s little-endian 64-bit integer starting at
    byte index [i] to [v]. *)

val set_int64_be : t -> int -> int64 -> unit
(** [set_int32_ne t i v] sets [t]'s big-endian 64-bit integer starting at byte
    index [i] to [v]. *)

val sub : t -> off:int -> len:int -> t
(** [sub bstr ~off ~len] does not allocate a bigstring, but instead returns a
    new view into [bstr] starting at [off], and with length [len].

    {b Note} that this does not allocate a new buffer, but instead shares the
    buffer of [bstr] with the newly-returned bigstring.

    {b Note} [sub] is more expensive than a [Slice.sub] (about 8 times slower).
    If you want to focus on performance while avoiding copying, it's best to use
    a [Slice]. *)

val shift : t -> int -> t
(** [shift bstr n] is [sub bstr n (length bstr - n)] (see {!val:sub} for more
    details). *)

val overlap : t -> t -> (int * int * int) option
(** [overlap x y] returns the size (in bytes) of what is physically common
    between [x] and [y], as well as the position of [y] in [x] and the position
    of [x] in [y]. *)

val sub_string : t -> off:int -> len:int -> string
(** [sub_string bstr ~off ~len] returns a string of length [len] containing the
    bytes of [t] starting at [off]. *)

val to_string : t -> string
(** [to_string bstr] is equivalent to
    [sub_string bstr ~off:0 ~len:(length bstr)]. *)

val is_empty : t -> bool
(** [is_empty bstr] is [length bstr = 0]. *)

val is_prefix : affix:string -> t -> bool
(** [is_prefix ~affix bstr] is [true] iff [affix.[idx] = bstr.{idx}] for all
    indices [idx] of [affix]. *)

val starts_with : prefix:t -> t -> bool
(** [starts_with ~prefix t] is like {!val:is_prefix} but the prefix is a
    {!type:t} (instead of a [string]). *)

val is_infix : affix:string -> t -> bool
(** [is_infix ~affix bstr] is [true] iff there exists an index [j] in [bstr]
    such that for all indices [i] of [affix] we have [affix.[i] = bstr.{j + i}].
*)

val is_suffix : affix:string -> t -> bool
(** [is_suffix ~affix bstr] is [true] iff [affix.[n - idx] = bstr.{m - idx}] for
    all indices [idx] of [affix] with [n = String.length affix - 1] and
    [m = length bstr - 1]. *)

val ends_with : suffix:t -> t -> bool
(** [ends_with ~suffix t] is like {!val:is_suffix} but the suffix is a {!type:t}
    (instead of a [string]. *)

val for_all : (char -> bool) -> t -> bool
(** [for_all p bstr] is [true] iff for all indices [idx] of [bstr],
    [p bstr.{idx} = true]. *)

val exists : (char -> bool) -> t -> bool
(** [exists p bstr] is [true] iff there exists an index [idx] of [bstr] with
    [p bstr.{idx} = true]. *)

val equal : t -> t -> bool
(** [equal a b] is [a = b]. *)

val with_range : ?first:int -> ?len:int -> t -> t
(** [with_range ~first ~len bstr] are the consecutive bytes of [bstr] whose
    indices exist in the range \[[first];[first + len - 1]\].

    [first] defaults to [0] and [len] to [max_int]. Note that [first] can be any
    integer and [len] any positive integer. *)

val with_index_range : ?first:int -> ?last:int -> t -> t
(** [with_index_range ~first ~last bstr] are the consecutive bytes of [bstr]
    whose indices exists in the range \[[first];[last]\].

    [first] defaults to [0] and [last] to [length bstr - 1].

    Note that both [first] and [last] can be any integer. If [first > last] the
    interval is empty and the empty bigstring is returned. *)

val trim : ?drop:(char -> bool) -> t -> t
(** [trim ~drop bstr] is [bstr] with prefix and suffix bytes satisfying [drop]
    in [bstr] removed. [drop] defaults to [fun chr -> chr = ' ']. *)

val span :
  ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> t -> t * t
(** [span ~rev ~min ~max ~sat bstr] is [(l, r)] where:
    - if [rev] is [false] (default), [l] is at least [min] and at most [max]
      consecutive [sat] satisfying initial bytes of [bstr] or {!empty} if there
      are no such bytes. [r] are the remaining bytes of [bstr].
    - if [rev] is [true], [r] is at least [min] and at most [max] consecutive
      [sat] satisfying final bytes of [bstr] or {!empty} if there are no such
      bytes. [l] are the remaining bytes of [bstr].

    If [max] is unspecified the span is unlimited. If [min] is unspecified it
    defaults to [0]. If [min > max] the condition can't be satisfied and the
    left or right span, depending on [rev], is always empty. [sat] defaults to
    [Fun.const true].

    @raise Invalid_argument if [max] or [min] is negative. *)

val take : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> t -> t
(** [take ~rev ~min ~max ~sat bstr] is the matching span of {!span} without the
    remaining one. In other words:

    {[
      (if rev then snd else fst) (span ~rev ~min ~max ~sat bstr)
    ]} *)

val drop : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> t -> t
(** [drop ~rev ~min ~max ~sat bstr] is the remaining span of {!span} without the
    matching span. In other words:

    {[
      (if rev then fst else snd) (span ~rev ~min ~max ~sat bstr)
    ]} *)

val split_on_char : char -> t -> t list
val to_seq : t -> char Seq.t
val to_seqi : t -> (int * char) Seq.t
val of_seq : char Seq.t -> t