(** A small library for manipulating bigstrings.

    To clarify the use of bigstrings in OCaml, we advise you to read the
    overview of bigstrings and how they differ from bytes. Beyond that
    theoretical reading, this module offers a whole host of useful functions
    for manipulating bigstrings.

    {1:overview Overview.}

    A bigstring is a special kind of memory area in the OCaml world. Unlike
    bytes, bigstrings are allocated via [malloc()] or are available via
    [Unix.map_file]. They therefore exist outside the heap that OCaml normally
    manages for its values. So there are some particularities to the use of
    bigstrings.

    The first thing to understand about bigstrings is that allocating them
    can take time. A bigstring is obtained either by [malloc()] or by
    [Unix.map_file]: the former costs a call to [malloc()] (whose performance
    also depends on the fragmentation of the C heap) and the latter is a
    system call that can interact with your file system. By way of
    comparison, a [bytes] value of less than 2048 bytes requires only 3
    processor instructions to exist and be available; beyond that size, the
    [bytes] value is allocated in the major heap.

    It is therefore advisable to allocate just a few bigstrings and reuse
    them throughout your application. It's even advisable to allocate large
    bigstrings.

    A particularity of bigstrings is that they cannot be moved by the Garbage
    Collector. Existing in a space other than that of OCaml (the C heap),
    they don't move. With this advantage in mind, we can imagine several
    situations where we'd like a memory zone that doesn't move:
    - a bigstring can be manipulated by several threads/domains. Of course,
      parallel accesses must be protected, but you can be sure that the
      bigstring will not move throughout the process. Thus, its location in
      memory can be shared by several threads/domains. One example is to
      "release" the GC lock when performing a calculation such as a hash or
      checksum on a bigstring.
      Since the bigstring will not be moved by the GC, if the elements
      required for the calculation are pre-allocated on the C stack, it is
      possible to perform such a calculation on a thread other than the main
      OCaml thread.
    - it may be necessary, in system programming, to write to a particular
      address in order to interact with a device. In this case, the bigstring
      can serve as an OCaml value bridging a special memory area (such as
      the framebuffer). This is somewhat equivalent to [Unix.map_file]. The
      latter uses [mmap(3P)], which asks the kernel for a special memory
      address. This address can be related (via the kernel) to an area on
      your hard disk that corresponds to a file. In the case of unikernels or
      embedded systems, it's quite common to prepare bigstrings according to
      the devices available.

    A final feature of a bigstring is that it can be seen as a slice. You can
    have another, smaller view of the same bigstring. For example, the
    {!val:sub} operation {b doesn't copy} your bigstring, but offers you a
    "proxy" accessing the same memory area as the original bigstring. This
    can be useful for decoding packets and extracting information such as
    integers without copying part or all of the bigstring. For example, for a
    TCP/IP packet, we'd like to decode certain information but also give a
    slice of the bigstring that corresponds to the packet's payload (so that
    we can process this payload without having to copy it). Finally, it may
    be interesting in an encoder of some kind to give bigstrings that the
    user can write to, and check that these bigstrings are part of a larger
    bigstring (in other words, these bigstrings come from a {!val:sub} of a
    larger bigstring) that has been allocated beforehand.

    Bigstrings therefore have certain advantages over bytes, but also some
    disadvantages. Treating them as something you should use systematically
    is not a good choice.
    However, we are sometimes forced to use them (especially when
    communicating with embedded devices) and they can be interesting for
    certain types of applications. This overview presents a few cases, but
    other examples exist in the OCaml community where the use of bigstrings
    is justified.

    In short, this library attempts to summarize everything that can be done
    with bigstrings.

    {2: Performance.}

    {1:pkt Encode & Decode packets.}

    In order to encode or decode packets (such as ARP or DNS packets), Bstr
    offers a small API for converting a slice of bytes from a
    {!type:Bstr.t} to a user-defined variant or record. *)

type t = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

val memcpy : t -> src_off:int -> t -> dst_off:int -> len:int -> unit
(** [memcpy src ~src_off dst ~dst_off ~len] copies [len] bytes from [src],
    starting at [src_off], to [dst], starting at [dst_off]. [src]
    {b must not} overlap [dst]. Use {!val:memmove} if [src] and [dst] do
    overlap. *)

val memmove : t -> src_off:int -> t -> dst_off:int -> len:int -> unit
(** [memmove src ~src_off dst ~dst_off ~len] copies [len] bytes from [src],
    starting at [src_off], to [dst], starting at [dst_off]. [src] and [dst]
    may overlap: copying takes place as though the bytes in [src] are first
    copied into a temporary array that does not overlap [src] or [dst], and
    the bytes are then copied from the temporary array to [dst]. *)

val memcmp : t -> src_off:int -> t -> dst_off:int -> len:int -> int
val memchr : t -> off:int -> len:int -> char -> int
val memset : t -> off:int -> len:int -> char -> unit

val empty : t
(** [empty] is an empty bigstring. *)

val length : t -> int
(** [length bstr] is the number of bytes in [bstr]. *)

val get : t -> int -> char
(** [get bstr i] is the byte of [bstr] at index [i]. This is equivalent to
    the [bstr.{i}] notation.

    @raise Invalid_argument if [i] is not an index of [bstr]. *)

val set : t -> int -> char -> unit
(** [set t i chr] modifies [t] in place, replacing the byte at index [i] with
    [chr].

    @raise Invalid_argument if [i] is not a valid index in [t].
*)

val unsafe_get : t -> int -> char
(** [unsafe_get t idx] is like {!val:get} except no bounds checking is
    performed. *)

val unsafe_set : t -> int -> char -> unit
(** [unsafe_set t idx chr] is like {!val:set} except no bounds checking is
    performed. *)

val chop : ?rev:bool -> t -> char option

val create : int -> t
(** [create len] returns a new byte sequence of length [len]. The sequence
    {b is uninitialized} and contains arbitrary bytes. *)

val make : int -> char -> t
(** [make len chr] is a {!type:t} of length [len] with each index holding the
    character [chr]. *)

val of_string : string -> t
(** [of_string str] returns a new {!type:t} that contains the contents of the
    given string [str]. *)

val string : ?off:int -> ?len:int -> string -> t
(** [string ~off ~len str] is the sub-buffer of [str] that starts at position
    [off] (defaults to [0]) and stops at position [off + len] (defaults to
    [String.length str]). The selected bytes are copied into a freshly
    allocated {!type:t}. *)

val fill : t -> off:int -> len:int -> char -> unit
(** [fill t ~off ~len chr] modifies [t] in place, replacing [len] characters
    with [chr], starting at [off].

    @raise Invalid_argument
      if [off] and [len] do not designate a valid range of [t]. *)

val init : int -> (int -> char) -> t
(** [init len fn] returns a fresh byte sequence of length [len], with
    character [idx] initialized to the result of [fn idx] (in increasing
    index order). *)

val copy : t -> t
(** [copy t] returns a new byte sequence that contains the same bytes as the
    argument. *)

(** {2 Copy operation from one byte sequence to another.} *)

val blit : t -> src_off:int -> t -> dst_off:int -> len:int -> unit
(** [blit src ~src_off dst ~dst_off ~len] copies [len] bytes from byte
    sequence [src], starting at index [src_off], to byte sequence [dst],
    starting at index [dst_off]. It works correctly even if [src] and [dst]
    are (physically) the same byte sequence, and the source and destination
    intervals overlap.
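
    As a minimal sketch of the overlapping case (it only relies on
    {!val:of_string}, {!val:blit} and {!val:to_string} as documented in this
    interface, and the input string is an arbitrary example), the following
    copies the first four bytes of a bigstring two positions to the right
    inside the same bigstring:

    {[
      let () =
        let bstr = Bstr.of_string "abcdef" in
        (* Source bytes 0-3 and destination bytes 2-5 overlap. *)
        Bstr.blit bstr ~src_off:0 bstr ~dst_off:2 ~len:4;
        assert (Bstr.to_string bstr = "ababcd")
    ]}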

    @raise Invalid_argument
      if [src_off] and [len] do not designate a valid range of [src], or if
      [dst_off] and [len] do not designate a valid range of [dst]. *)

val blit_from_string :
  string -> src_off:int -> t -> dst_off:int -> len:int -> unit
(** Just like {!val:blit}, but with a string as the source.

    {b Note}: since it is impossible for [src] to overlap [dst],
    {!val:memcpy} is used to make the copy.

    @raise Invalid_argument
      if [src_off] and [len] do not designate a valid range of [src], or if
      [dst_off] and [len] do not designate a valid range of [dst]. *)

val blit_from_bytes :
  bytes -> src_off:int -> t -> dst_off:int -> len:int -> unit
(** Just like {!val:blit}, but with a [bytes] value as the source.

    {b Note}: since it is impossible for [src] to overlap [dst],
    {!val:memcpy} is used to make the copy.

    @raise Invalid_argument
      if [src_off] and [len] do not designate a valid range of [src], or if
      [dst_off] and [len] do not designate a valid range of [dst]. *)

val blit_to_bytes :
  t -> src_off:int -> bytes -> dst_off:int -> len:int -> unit
(** [blit_to_bytes src ~src_off dst ~dst_off ~len] copies [len] bytes from
    [src], starting at index [src_off], to byte sequence [dst], starting at
    index [dst_off].

    {b Note}: since it is impossible for [src] to overlap [dst],
    {!val:memcpy} is used to make the copy.

    @raise Invalid_argument
      if [src_off] and [len] do not designate a valid range of [src], or if
      [dst_off] and [len] do not designate a valid range of [dst].
*)

(* val extend : t -> int -> int -> t
   val concat : t -> t list -> t
   val cat : t -> t -> t
   val iter : (char -> unit) -> t -> unit
   val iteri : (int -> char -> unit) -> t -> unit
   val map : (char -> char) -> t -> t
   val mapi : (int -> char -> char) -> t -> t
   val fold_left : ('acc -> char -> 'acc) -> 'acc -> t -> 'acc
   val fold_right : (char -> 'acc -> 'acc) -> t -> 'acc -> 'acc
   val index : t -> ?rev:bool -> ?from:int -> char -> int
   val contains : t -> ?rev:bool -> ?from:int -> char -> bool
   val compare : t -> t -> int
   val starts_with : prefix:string -> t -> bool
   val ends_with : suffix:string -> t -> bool *)

(** {2 Decode integers from a byte sequence.} *)

val get_int8 : t -> int -> int
(** [get_int8 bstr i] is [bstr]'s signed 8-bit integer starting at byte index
    [i]. *)

val get_uint8 : t -> int -> int
(** [get_uint8 bstr i] is [bstr]'s unsigned 8-bit integer starting at byte
    index [i]. *)

val get_uint16_ne : t -> int -> int
(** [get_uint16_ne bstr i] is [bstr]'s native-endian unsigned 16-bit integer
    starting at byte index [i]. *)

val get_uint16_le : t -> int -> int
(** [get_uint16_le bstr i] is [bstr]'s little-endian unsigned 16-bit integer
    starting at byte index [i]. *)

val get_uint16_be : t -> int -> int
(** [get_uint16_be bstr i] is [bstr]'s big-endian unsigned 16-bit integer
    starting at byte index [i]. *)

val get_int16_ne : t -> int -> int
(** [get_int16_ne bstr i] is [bstr]'s native-endian signed 16-bit integer
    starting at byte index [i]. *)

val get_int16_le : t -> int -> int
(** [get_int16_le bstr i] is [bstr]'s little-endian signed 16-bit integer
    starting at byte index [i]. *)

val get_int16_be : t -> int -> int
(** [get_int16_be bstr i] is [bstr]'s big-endian signed 16-bit integer
    starting at byte index [i]. *)

val get_int32_ne : t -> int -> int32
(** [get_int32_ne bstr i] is [bstr]'s native-endian 32-bit integer starting at
    byte index [i].
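
    As a hedged illustration of how the [_be] and [_le] variants of this
    section read the same bytes, here is a small sketch that assumes nothing
    beyond the functions documented in this interface (the input string is an
    arbitrary example):

    {[
      let () =
        let bstr = Bstr.of_string "\x01\x02\x03\x04" in
        assert (Bstr.get_uint8 bstr 0 = 0x01);
        (* Big-endian: the first byte is the most significant one. *)
        assert (Bstr.get_uint16_be bstr 0 = 0x0102);
        (* Little-endian: the first byte is the least significant one. *)
        assert (Bstr.get_uint16_le bstr 0 = 0x0201);
        assert (Bstr.get_int32_be bstr 0 = 0x01020304l);
        assert (Bstr.get_int32_le bstr 0 = 0x04030201l)
    ]}

    The [_ne] variants return one of the two results, depending on the
    endianness of the host.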
*)

val get_int32_le : t -> int -> int32
(** [get_int32_le bstr i] is [bstr]'s little-endian 32-bit integer starting at
    byte index [i]. *)

val get_int32_be : t -> int -> int32
(** [get_int32_be bstr i] is [bstr]'s big-endian 32-bit integer starting at
    byte index [i]. *)

val get_int64_ne : t -> int -> int64
(** [get_int64_ne bstr i] is [bstr]'s native-endian 64-bit integer starting at
    byte index [i]. *)

val get_int64_le : t -> int -> int64
(** [get_int64_le bstr i] is [bstr]'s little-endian 64-bit integer starting at
    byte index [i]. *)

val get_int64_be : t -> int -> int64
(** [get_int64_be bstr i] is [bstr]'s big-endian 64-bit integer starting at
    byte index [i]. *)

val set_int8 : t -> int -> int -> unit
(** [set_int8 t i v] sets [t]'s signed 8-bit integer starting at byte index
    [i] to [v]. *)

val set_uint8 : t -> int -> int -> unit
(** [set_uint8 t i v] sets [t]'s unsigned 8-bit integer starting at byte index
    [i] to [v]. *)

val set_uint16_ne : t -> int -> int -> unit
(** [set_uint16_ne t i v] sets [t]'s native-endian unsigned 16-bit integer
    starting at byte index [i] to [v]. *)

val set_uint16_le : t -> int -> int -> unit
(** [set_uint16_le t i v] sets [t]'s little-endian unsigned 16-bit integer
    starting at byte index [i] to [v]. *)

val set_uint16_be : t -> int -> int -> unit
(** [set_uint16_be t i v] sets [t]'s big-endian unsigned 16-bit integer
    starting at byte index [i] to [v]. *)

val set_int16_ne : t -> int -> int -> unit
(** [set_int16_ne t i v] sets [t]'s native-endian signed 16-bit integer
    starting at byte index [i] to [v]. *)

val set_int16_le : t -> int -> int -> unit
(** [set_int16_le t i v] sets [t]'s little-endian signed 16-bit integer
    starting at byte index [i] to [v]. *)

val set_int16_be : t -> int -> int -> unit
(** [set_int16_be t i v] sets [t]'s big-endian signed 16-bit integer starting
    at byte index [i] to [v].
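
    A minimal round-trip sketch, relying only on {!val:make},
    {!val:set_uint16_be}, {!val:get_uint8} and {!val:get_uint16_le} as
    documented in this interface (the value [0xcafe] is arbitrary):

    {[
      let () =
        let bstr = Bstr.make 2 '\000' in
        Bstr.set_uint16_be bstr 0 0xcafe;
        (* Big-endian: the most significant byte is written first. *)
        assert (Bstr.get_uint8 bstr 0 = 0xca);
        assert (Bstr.get_uint8 bstr 1 = 0xfe);
        (* Reading the same two bytes as little-endian swaps them. *)
        assert (Bstr.get_uint16_le bstr 0 = 0xfeca)
    ]}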
*)

val set_int32_ne : t -> int -> int32 -> unit
(** [set_int32_ne t i v] sets [t]'s native-endian 32-bit integer starting at
    byte index [i] to [v]. *)

val set_int32_le : t -> int -> int32 -> unit
(** [set_int32_le t i v] sets [t]'s little-endian 32-bit integer starting at
    byte index [i] to [v]. *)

val set_int32_be : t -> int -> int32 -> unit
(** [set_int32_be t i v] sets [t]'s big-endian 32-bit integer starting at byte
    index [i] to [v]. *)

val set_int64_ne : t -> int -> int64 -> unit
(** [set_int64_ne t i v] sets [t]'s native-endian 64-bit integer starting at
    byte index [i] to [v]. *)

val set_int64_le : t -> int -> int64 -> unit
(** [set_int64_le t i v] sets [t]'s little-endian 64-bit integer starting at
    byte index [i] to [v]. *)

val set_int64_be : t -> int -> int64 -> unit
(** [set_int64_be t i v] sets [t]'s big-endian 64-bit integer starting at byte
    index [i] to [v]. *)

val sub : t -> off:int -> len:int -> t
(** [sub bstr ~off ~len] returns a new view into [bstr] starting at [off],
    and with length [len].

    {b Note} that this does not allocate a new buffer, but instead shares the
    buffer of [bstr] with the newly-returned bigstring.

    {b Note} [sub] is more expensive than a [Slice.sub] (about 8 times
    slower). If you want to focus on performance while avoiding copying, it's
    best to use a [Slice]. *)

val shift : t -> int -> t
(** [shift bstr n] is [sub bstr ~off:n ~len:(length bstr - n)] (see
    {!val:sub} for more details). *)

val overlap : t -> t -> (int * int * int) option
(** [overlap x y] returns the size (in bytes) of what is physically common
    between [x] and [y], as well as the position of [y] in [x] and the
    position of [x] in [y]. *)

val sub_string : t -> off:int -> len:int -> string
(** [sub_string bstr ~off ~len] returns a string of length [len] containing
    the bytes of [bstr] starting at [off]. *)

val to_string : t -> string
(** [to_string bstr] is equivalent to
    [sub_string bstr ~off:0 ~len:(length bstr)].
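
    The sketch below contrasts the copying behaviour of [to_string] with the
    sharing behaviour of {!val:sub}. It only uses functions documented in
    this interface, and the input string is an arbitrary example:

    {[
      let () =
        let bstr = Bstr.of_string "hello" in
        let view = Bstr.sub bstr ~off:1 ~len:3 in
        (* The string is a copy of the bytes seen through the view. *)
        let str = Bstr.to_string view in
        assert (str = "ell");
        (* Writing through the view modifies the original bigstring. *)
        Bstr.set view 0 'a';
        assert (Bstr.to_string bstr = "hallo");
        (* The string copied earlier is left untouched. *)
        assert (str = "ell")
    ]}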
*)

val is_empty : t -> bool
(** [is_empty bstr] is [length bstr = 0]. *)

val is_prefix : affix:string -> t -> bool
(** [is_prefix ~affix bstr] is [true] iff [affix.[idx] = bstr.{idx}] for all
    indices [idx] of [affix]. *)

val starts_with : prefix:t -> t -> bool
(** [starts_with ~prefix t] is like {!val:is_prefix} but the prefix is a
    {!type:t} (instead of a [string]). *)

val is_infix : affix:string -> t -> bool
(** [is_infix ~affix bstr] is [true] iff there exists an index [j] in [bstr]
    such that for all indices [i] of [affix] we have
    [affix.[i] = bstr.{j + i}]. *)

val is_suffix : affix:string -> t -> bool
(** [is_suffix ~affix bstr] is [true] iff [affix.[n - idx] = bstr.{m - idx}]
    for all indices [idx] of [affix] with [n = String.length affix - 1] and
    [m = length bstr - 1]. *)

val ends_with : suffix:t -> t -> bool
(** [ends_with ~suffix t] is like {!val:is_suffix} but the suffix is a
    {!type:t} (instead of a [string]). *)

val for_all : (char -> bool) -> t -> bool
(** [for_all p bstr] is [true] iff for all indices [idx] of [bstr],
    [p bstr.{idx} = true]. *)

val exists : (char -> bool) -> t -> bool
(** [exists p bstr] is [true] iff there exists an index [idx] of [bstr] with
    [p bstr.{idx} = true]. *)

val equal : t -> t -> bool
(** [equal a b] is [a = b]. *)

val with_range : ?first:int -> ?len:int -> t -> t
(** [with_range ~first ~len bstr] are the consecutive bytes of [bstr] whose
    indices exist in the range \[[first];[first + len - 1]\]. [first]
    defaults to [0] and [len] to [max_int]. Note that [first] can be any
    integer and [len] any positive integer. *)

val with_index_range : ?first:int -> ?last:int -> t -> t
(** [with_index_range ~first ~last bstr] are the consecutive bytes of [bstr]
    whose indices exist in the range \[[first];[last]\]. [first] defaults to
    [0] and [last] to [length bstr - 1]. Note that both [first] and [last]
    can be any integer. If [first > last] the interval is empty and the empty
    bigstring is returned.
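
    For instance, assuming only the behaviour documented above for
    {!val:with_range} and [with_index_range] (the input string is an
    arbitrary example):

    {[
      let () =
        let bstr = Bstr.of_string "abcdef" in
        let mid = Bstr.with_index_range ~first:1 ~last:3 bstr in
        assert (Bstr.to_string mid = "bcd");
        (* The same bytes, selected by length instead of by last index. *)
        assert (Bstr.to_string (Bstr.with_range ~first:1 ~len:3 bstr) = "bcd");
        (* A first index greater than the last one is an empty interval. *)
        assert (Bstr.is_empty (Bstr.with_index_range ~first:3 ~last:1 bstr))
    ]}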
*)

val trim : ?drop:(char -> bool) -> t -> t
(** [trim ~drop bstr] is [bstr] with prefix and suffix bytes satisfying [drop]
    in [bstr] removed. [drop] defaults to [fun chr -> chr = ' ']. *)

val span :
  ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> t -> t * t
(** [span ~rev ~min ~max ~sat bstr] is [(l, r)] where:
    - if [rev] is [false] (default), [l] is at least [min] and at most [max]
      consecutive [sat] satisfying initial bytes of [bstr] or {!empty} if
      there are no such bytes. [r] are the remaining bytes of [bstr].
    - if [rev] is [true], [r] is at least [min] and at most [max] consecutive
      [sat] satisfying final bytes of [bstr] or {!empty} if there are no such
      bytes. [l] are the remaining bytes of [bstr].

    If [max] is unspecified the span is unlimited. If [min] is unspecified it
    defaults to [0]. If [min > max] the condition can't be satisfied and the
    left or right span, depending on [rev], is always empty. [sat] defaults
    to [Fun.const true].

    @raise Invalid_argument if [max] or [min] is negative. *)

val take : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> t -> t
(** [take ~rev ~min ~max ~sat bstr] is the matching span of {!span} without
    the remaining one. In other words:

    {[
      (if rev then snd else fst) (span ~rev ~min ~max ~sat bstr)
    ]} *)

val drop : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> t -> t
(** [drop ~rev ~min ~max ~sat bstr] is the remaining span of {!span} without
    the matching span. In other words:

    {[
      (if rev then fst else snd) (span ~rev ~min ~max ~sat bstr)
    ]} *)

val split_on_char : char -> t -> t list
val to_seq : t -> char Seq.t
val to_seqi : t -> (int * char) Seq.t
val of_seq : char Seq.t -> t