forked from robur/blog.robur.coop
Compare commits
No commits in common. "teeest" and "main" have entirely different histories.
55 changed files with 4994 additions and 3621 deletions
.gitignore (vendored, normal file, +3 lines)
@@ -0,0 +1,3 @@
_build/
_site/
_cache
LICENSE (normal file, +674 lines)
@@ -0,0 +1,674 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The GNU General Public License is a free, copyleft license for
software and other kinds of works.

The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.

For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.

Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.

For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.

Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.

Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.

The precise terms and conditions for copying, distribution and
modification follow.

TERMS AND CONDITIONS

0. Definitions.

"This License" refers to version 3 of the GNU General Public License.

"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.

To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.

A "covered work" means either the unmodified Program or a work based
on the Program.

To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.

To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.

An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.

1. Source Code.

The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.

A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.

The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.

The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.

The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.

The Corresponding Source for a work in source code form is that
same work.

2. Basic Permissions.

All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.

You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.

Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.

3. Protecting Users' Legal Rights From Anti-Circumvention Law.

No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.

When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.

4. Conveying Verbatim Copies.

You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.

You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.

5. Conveying Modified Source Versions.

You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:

a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.

b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".

c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.

d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.

A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.

6. Conveying Non-Source Forms.

You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:

a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.

b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.

c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.

d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.

e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.

A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.

A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.

"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.

If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).

The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.

Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.

7. Additional Terms.

"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.

When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.

Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:

a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or

b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or

c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or

d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or

e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or

f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.

All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.

If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.

Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.

8. Termination.

You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).

However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.

9. Acceptance Not Required for Having Copies.

You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.

10. Automatic Licensing of Downstream Recipients.

Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.

An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.

You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.

11. Patents.

A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".

A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.

Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.

In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.

If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.

If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.

A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.

Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.

12. No Surrender of Others' Freedom.

If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.

13. Use with the GNU Affero General Public License.

Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.

14. Revised Versions of this License.

The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.
|
||||
|
||||
If the Program specifies that a proxy can decide which future
|
||||
versions of the GNU General Public License can be used, that proxy's
|
||||
public statement of acceptance of a version permanently authorizes you
|
||||
to choose that version for the Program.
|
||||
|
||||
Later license versions may give you additional or different
|
||||
permissions. However, no additional obligations are imposed on any
|
||||
author or copyright holder as a result of your choosing to follow a
|
||||
later version.
|
||||
|
||||
15. Disclaimer of Warranty.
|
||||
|
||||
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
|
||||
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
|
||||
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
|
||||
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
|
||||
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
|
||||
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
|
||||
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
|
||||
|
||||
16. Limitation of Liability.
|
||||
|
||||
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
|
||||
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
|
||||
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
|
||||
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
|
||||
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
|
||||
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
|
||||
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
|
||||
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
|
||||
SUCH DAMAGES.
|
||||
|
||||
17. Interpretation of Sections 15 and 16.
|
||||
|
||||
If the disclaimer of warranty and limitation of liability provided
|
||||
above cannot be given local legal effect according to their terms,
|
||||
reviewing courts shall apply local law that most closely approximates
|
||||
an absolute waiver of all civil liability in connection with the
|
||||
Program, unless a warranty or assumption of liability accompanies a
|
||||
copy of the Program in return for a fee.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
How to Apply These Terms to Your New Programs
|
||||
|
||||
If you develop a new program, and you want it to be of the greatest
|
||||
possible use to the public, the best way to achieve this is to make it
|
||||
free software which everyone can redistribute and change under these terms.
|
||||
|
||||
To do so, attach the following notices to the program. It is safest
|
||||
to attach them to the start of each source file to most effectively
|
||||
state the exclusion of warranty; and each file should have at least
|
||||
the "copyright" line and a pointer to where the full notice is found.
|
||||
|
||||
<one line to give the program's name and a brief idea of what it does.>
|
||||
Copyright (C) <year> <name of author>
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation, either version 3 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||
|
||||
Also add information on how to contact you by electronic and paper mail.
|
||||
|
||||
If the program does terminal interaction, make it output a short
|
||||
notice like this when it starts in an interactive mode:
|
||||
|
||||
<program> Copyright (C) <year> <name of author>
|
||||
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
|
||||
This is free software, and you are welcome to redistribute it
|
||||
under certain conditions; type `show c' for details.
|
||||
|
||||
The hypothetical commands `show w' and `show c' should show the appropriate
|
||||
parts of the General Public License. Of course, your program's commands
|
||||
might be different; for a GUI interface, you would use an "about box".
|
||||
|
||||
You should also get your employer (if you work as a programmer) or school,
|
||||
if any, to sign a "copyright disclaimer" for the program, if necessary.
|
||||
For more information on this, and how to apply and follow the GNU GPL, see
|
||||
<https://www.gnu.org/licenses/>.
|
||||
|
||||
The GNU General Public License does not permit incorporating your program
|
||||
into proprietary programs. If your program is a subroutine library, you
|
||||
may consider it more useful to permit linking proprietary applications with
|
||||
the library. If this is what you want to do, use the GNU Lesser General
|
||||
Public License instead of this License. But first, please read
|
||||
<https://www.gnu.org/licenses/why-not-lgpl.html>.
|
46
README.md
Normal file
### How to add an article?

The Git repository contains 2 branches:
- the `main` branch, which contains the blog engine
- the `gh-pages` branch (named as on GitHub), which contains the generated website

To get an overview of the website locally:
```shell-session
$ git clone git@git.robur.coop:robur/blog.robur.coop
$ cd blog.robur.coop/
$ opam pin add -yn .
$ opam install --deps-only blogger
$ dune exec bin/watch.exe --
```

A small server runs on `http://localhost:8000`.

You can add an article to the `articles/` directory. The format is simple:
a header that starts with a `---` line and ends with a `---` line. Inside it,
a YAML description of the article is given, where the following fields are required:
- `date`
- `title`
- `description`
- `tags`

You can optionally specify an `author` (with its `name`, `email` and `link`). By
default, `team@robur.coop` is used. If everything looks good, you can generate
the website with the `blogger.exe` tool and push it via:
```shell-session
$ dune exec bin/push.exe -- push \
    -r git@git.robur.coop:robur/blog.robur.coop.git#gh-pages \
    --host https://blog.robur.coop \
    [--name "The Robur team"] \
    [--email team@robur.coop]
```

An SSH session will start. If you have already registered your private key
with `ssh-agent` and your `.ssh/config` is configured to use that key when you
communicate with `git@git.robur.coop`, everything will be smooth! Et voilà!
At the end, an HTTP request will be sent to `https://blog.robur.coop` (via
Forgejo) to update the unikernel with the latest version of the blog.

You can also use the `update.sh` script to update the blog with the builder user
on the server machine.

**NOTE**: don't forget `#gh-pages`! Also, you probably should do a `git pull`.
@ -1,283 +0,0 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>
Robur's blogPython's `str.__repr__()`
</title>
<meta name="description" content="Reimplementing Python string escaping in OCaml">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
<script src="https://blog.robur.coop/js/hl.js"></script>
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
</head>
<body>
<header>
<h1>blog.robur.coop</h1>
<blockquote>
The <strong>Robur</strong> cooperative blog.
</blockquote>
</header>
<main><a href="https://blog.robur.coop/index.html">Back to index</a>

<article>
<h1>Python's `str.__repr__()`</h1>
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-Python">Python</a></li><li><a href="https://blog.robur.coop/tags.html#tag-unicode">unicode</a></li></ul><p>Sometimes software is written using whatever built-ins you find in your programming language of choice.
This is usually great!
However, it can happen that you depend on the precise semantics of those built-ins.
This can be a problem if those semantics become important to your software and you need to port it to another programming language.
This story is about Python and its <code>str.__repr__()</code> function.</p>
<p>The piece of software I was helping port to <a href="https://ocaml.org/">OCaml</a> was constructing a hash from the string representation of a tuple.
The gist of it was basically this:</p>
<pre><code class="language-python">def get_id(x):
    id = (x.get_unique_string(), x.path, x.name)
    return myhash(str(id))
</code></pre>
<p>In other words, it's a Python tuple consisting mostly of strings, but also containing a <code>PosixPath</code> object.
<code>str(x)</code> calls the <code>__str__()</code> method of <code>x</code>, falling back to <code>repr(x)</code> when the type does not define its own <code>__str__()</code>.
For Python tuples, <code>str()</code> seems to produce the result of <code>repr()</code> on each element, separated by a comma and a space and surrounded by parentheses.
So far so good.
If we can precisely emulate <code>repr()</code> on strings and on <code>PosixPath</code>, it's easy.
In the case of <code>PosixPath</code> it's really just <code>'PosixPath('+repr(str(path))+')'</code>,
so it all comes down to <code>repr()</code> on strings, which is <code>str.__repr__()</code>.</p>
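<p>To make the target concrete, here is a small OCaml sketch of how the string being hashed is assembled. It assumes some function <code>repr</code> that emulates <code>str.__repr__()</code> (the hard part, which the rest of this post is about); <code>tuple_str</code> and <code>naive_repr</code> are hypothetical names used only for illustration.</p>
<pre><code class="language-OCaml">(* Hypothetical sketch: rebuild the str() of the Python tuple
   (unique_string, PosixPath(path), name), given a function [repr]
   that emulates str.__repr__() on strings. *)
let tuple_str ~repr (unique, path, name) =
  (* the tuple repr joins the element reprs with a comma and a space,
     inside parentheses *)
  "(" ^ String.concat ", "
          [ repr unique;
            (* the repr of a PosixPath wraps the repr of the path string *)
            "PosixPath(" ^ repr path ^ ")";
            repr name ]
  ^ ")"

(* A deliberately naive stand-in for repr: only correct for strings
   without quotes, backslashes or special characters. *)
let naive_repr s = "'" ^ s ^ "'"

let () =
  (* prints: ('abc', PosixPath('/tmp/x'), 'x') *)
  print_endline (tuple_str ~repr:naive_repr ("abc", "/tmp/x", "x"))
</code></pre>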
<p>There had been a previous attempt at this that used OCaml's string escape functions and surrounded the string with single quotes (<code>'</code>).
This works for some cases, but not if the string contains a double quote (<code>"</code>).
In that case OCaml escapes the double quote with a backslash (<code>\"</code>) while Python does not escape it.
So a regular expression substitution was added to replace that escape sequence with just a double quote.
This pattern of finding small differences between Python and OCaml escaping had been repeated,
and eventually I decided to take a more rigorous approach to it.</p>
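<p>A minimal sketch of that earlier approach (the function name is hypothetical) shows where it breaks down. For the three inputs below, Python's <code>repr()</code> gives <code>'"'</code>, <code>"it's"</code> and <code>'\x7f'</code> respectively, and the naive version disagrees every time.</p>
<pre><code class="language-OCaml">(* Sketch of the naive approach: reuse OCaml escaping and wrap the
   result in single quotes. *)
let naive_python_repr s = "'" ^ String.escaped s ^ "'"

let () =
  (* OCaml escapes the double quote, Python does not *)
  assert (naive_python_repr "\"" = "'\\\"'");
  (* the single quote is left alone, so this is not even a valid
     Python literal; Python would switch to double quotes instead *)
  assert (naive_python_repr "it's" = "'it's'");
  (* OCaml uses a decimal escape where Python uses hexadecimal *)
  assert (naive_python_repr "\x7f" = "'\\127'")
</code></pre>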
<h2 id="what-is-a-string"><a class="anchor" aria-hidden="true" href="#what-is-a-string"></a>What is a string?</h2>
<p>First of all, what is a string? In Python? And in OCaml?
In OCaml a string is just a sequence of bytes.
Any bytes, even <code>NUL</code> bytes.
There is no concept of Unicode in OCaml strings.<br>
In Python there is the <code>str</code> type, which is a sequence of Unicode code points<sup><a href="#fn-python-bytes" id="ref-1-fn-python-bytes" role="doc-noteref" class="fn-label">[1]</a></sup>.
I can recommend reading Daniel Bünzli's <a href="https://ocaml.org/p/uucp/13.0.0/doc/unicode.html#minimal">minimal introduction to Unicode</a>.
Already here there is a significant gap in semantics between Python and OCaml.
For many practical purposes we can get away with using the OCaml <code>string</code> type and treating it as a UTF-8 encoded Unicode string.
This is what I will do, as in both the Python code and the OCaml code the data being read is a UTF-8 encoded string (often only the US ASCII subset).</p>
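<p>As an illustration of the gap, the following sketch (assuming OCaml 4.14 or later, for <code>String.get_utf_8_uchar</code>) decodes a UTF-8 encoded OCaml string into code points: the same value is three bytes to OCaml but two code points to Python. The name <code>code_points</code> is hypothetical.</p>
<pre><code class="language-OCaml">(* Decode a UTF-8 encoded OCaml string into a list of code points.
   Requires OCaml 4.14 or later. *)
let code_points s =
  let rec loop i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      (* invalid sequences decode to U+FFFD; we simply keep that value *)
      let u = Uchar.to_int (Uchar.utf_decode_uchar d) in
      loop (i + Uchar.utf_decode_length d) (u :: acc)
  in
  loop 0 []

let () =
  (* GREEK SMALL LETTER LAMDA followed by x *)
  let s = "\u{3bb}x" in
  assert (String.length s = 3);
  assert (code_points s = [0x3bb; 0x78])
</code></pre>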
<h2 id="what-does-a-string-literal-look-like"><a class="anchor" aria-hidden="true" href="#what-does-a-string-literal-look-like"></a>What does a string literal look like?</h2>
<h3 id="ocaml"><a class="anchor" aria-hidden="true" href="#ocaml"></a>OCaml</h3>
<p>I will not dive too deep into the details of OCaml string literals, and will focus mostly on how they are escaped by the language built-ins (<code>String.escaped</code>, <code>Printf.printf "%S"</code>).
Normal printable ASCII is printed as-is.
That is, letters, digits and other symbols, except for backslash and double quote.
There are the usual escape sequences <code>\n</code>, <code>\t</code>, <code>\r</code>, <code>\"</code> and <code>\\</code>.
Any byte value can be represented in decimal notation <code>\032</code>, octal notation <code>\o040</code> or hexadecimal notation <code>\x20</code>.
The escape functions in OCaml prefer the decimal notation over the hexadecimal notation.
Finally, I also want to mention the Unicode code point escape sequence <code>\u{3bb}</code>, which represents the UTF-8 encoding of U+3BB.
While the escape functions do not use it, it will come in handy later on.
Illegal escape sequences (escape sequences that are not recognized) emit a warning but otherwise result in the escape sequence as-is.
It is common to compile OCaml programs with warnings-as-errors, however.</p>
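<p>These claims can be checked directly against the standard library. This is a small sanity check, not an exhaustive description of <code>String.escaped</code>.</p>
<pre><code class="language-OCaml">let () =
  (* the same byte (a space) in decimal, octal and hexadecimal notation *)
  assert ("\032" = "\o040" &amp;&amp; "\o040" = "\x20");
  (* short escapes for tab, newline and double quote *)
  assert (String.escaped "a\tb\n\"" = "a\\tb\\n\\\"");
  (* other non-printable bytes are escaped in decimal, never in hex *)
  assert (String.escaped "\x01\x7f" = "\\001\\127");
  (* %S is String.escaped surrounded by double quotes *)
  assert (Printf.sprintf "%S" "\x01" = "\"\\001\"");
  (* \u{3bb} is just the two UTF-8 bytes of U+3BB *)
  assert ("\u{3bb}" = "\xce\xbb")
</code></pre>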
<h3 id="python"><a class="anchor" aria-hidden="true" href="#python"></a>Python</h3>
<p>Python has a number of different string literals and string-like literals.
They all use single quotes or double quotes to delimit the string (or string-like) literals.
There is a preference towards single quotes in <code>str.__repr__()</code>.
You can also triple the quotes if you want to write a string that uses a lot of both quote characters.
This format is not used by <code>str.__repr__()</code>, so I will not cover it further, but you can read about it in the <a href="https://docs.python.org/3/reference/lexical_analysis.html#strings">Python reference manual</a>.
The string literal can optionally have a prefix character that modifies what type the string literal is and how its content is interpreted.</p>
<p>The <code>r</code>-prefixed strings are called <em>raw strings</em>.
That means backslash escape sequences are not interpreted.
In my experiments they seem to be quasi-interpreted, however!
The string <code>r"\"</code> is considered unterminated!
But <code>r"\""</code> is fine and is interpreted as <code>'\\"'</code><sup><a href="#fn-raw-escape-example" id="ref-1-fn-raw-escape-example" role="doc-noteref" class="fn-label">[2]</a></sup>.
Why this is the case I have not found a good explanation for.</p>
<p>The <code>b</code>-prefixed strings are <code>bytes</code> literals.
These are close to OCaml strings.</p>
<p>Finally, there are the unprefixed strings, which are <code>str</code> literals.
These are the ones we are most interested in.
They use the usual escape sequences <code>\n</code>, <code>\t</code>, <code>\r</code>, <code>\"</code> and <code>\\</code> we know from OCaml, as well as <code>\'</code>.
<code>\032</code> is <strong>octal</strong> notation and <code>\x20</code> is hexadecimal notation.
There is, as far as I know, <strong>no</strong> decimal notation.
The output of <code>str.__repr__()</code> uses the hexadecimal notation rather than the octal notation.
As Python strings are Unicode code point sequences, we need more than two hexadecimal digits to be able to represent all valid "characters".
Thus there are the longer <code>\u0032</code> and the longest <code>\U00000032</code>.</p>
<h2 id="intermezzo"><a class="anchor" aria-hidden="true" href="#intermezzo"></a>Intermezzo</h2>
<p>While studying Python string literals I discovered several odd corners of the syntax and semantics besides the raw string quasi-escape sequence mentioned earlier.
One is that Python doesn't have a separate character or Unicode code point type.
Instead, a character is a one-element string.
This leads to some interesting indexing shenanigans: <code>"a"[0][0][0] == "a"</code>.
Furthermore, string literals separated only by whitespace are treated as one single concatenated string: <code>"a" "b" "c" == "abc"</code>.
Combined, these two make it possible to write this unusual snippet: <code>"a" "b" "c"[0] == "a"</code>!
For byte sequences, or <code>b</code>-prefixed strings, things are different.
Indexing a bytes object returns the integer value of that byte (or character):</p>
<pre><code class="language-python">>>> b"a"[0]
97
>>> b"a"[0][0]
<stdin>:1: SyntaxWarning: 'int' object is not subscriptable; perhaps you missed a comma?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not subscriptable
</code></pre>
<p>For strings, <code>"\x32"</code> can be said to be shorthand for <code>"\u0032"</code> (or <code>"\U00000032"</code>).
But for bytes, <code>b"\x32" != b"\u0032"</code>!
Why is this?!
Well, bytes is a byte sequence, and <code>b"\u0032"</code> is not interpreted as an escape sequence; it is instead <strong>silently</strong> treated as <code>b"\\u0032"</code>!
Similarly, writing <code>"\xff".encode()</code>, which encodes the string <code>"\xff"</code> to UTF-8, is <strong>not</strong> the same as <code>b"\xff"</code>.
The bytes <code>b"\xff"</code> consist of a single byte with decimal value 255,
and the Unicode wizards reading will know that the Unicode code point 255 (U+00FF) is encoded as two bytes in UTF-8.</p>
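<p>The same distinction shows up in OCaml, where a string is a byte sequence. This sketch uses the standard library's <code>Buffer.add_utf_8_uchar</code> (OCaml 4.06 or later); <code>utf_8_of_code_point</code> is a hypothetical helper name.</p>
<pre><code class="language-OCaml">(* Encode a single code point as UTF-8. *)
let utf_8_of_code_point cp =
  let b = Buffer.create 4 in
  Buffer.add_utf_8_uchar b (Uchar.of_int cp);
  Buffer.contents b

let () =
  (* like "\xff".encode() in Python: two bytes *)
  assert (utf_8_of_code_point 0xff = "\xc3\xbf");
  (* like b"\xff" in Python: one byte *)
  assert (String.length "\xff" = 1)
</code></pre>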
<h2 id="where-is-the-python-code"><a class="anchor" aria-hidden="true" href="#where-is-the-python-code"></a>Where is the Python code?</h2>
<p>Finding the implementation of <code>str.__repr__()</code> turned out not to be so easy.
In the end I asked on the Internet and got a link to <a href="https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Objects/unicodeobject.c#L12245-L12405">cpython's <code>Objects/unicodeobject.c</code></a>.
And holy cow!
That's some 160 lines of C code with two loops, a switch statement and I don't know how many chained and nested if statements!
Meanwhile, the OCaml implementation is a much less daunting 52 lines, of which about a fifth is a long comment.
It also has two loops, each of which contains one much tamer match expression (roughly a C switch statement).
In both cases they first loop over the string to compute the size of the output string.
The Python implementation also counts the number of double quotes and single quotes as well as the highest code point value.
I'm not sure why they do the latter, but my guess is that it's so they can choose an efficient internal representation.
Then the Python code decides what quote character to use with the following algorithm:<br>
Does the string contain single quotes but no double quotes? Then use double quotes. Otherwise use single quotes.
Then the output size estimate is adjusted with the number of backslashes needed to escape the chosen quote character, plus the two quotes surrounding the string.</p>
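<p>The quote selection rule can be sketched in OCaml as follows. This is my own transcription of the rule as described, not code taken from cpython, and <code>choose_quote</code> is a hypothetical name.</p>
<pre><code class="language-OCaml">(* Use double quotes only if the string has single quotes but no
   double quotes; otherwise use single quotes. *)
let choose_quote s =
  if String.contains s '\'' &amp;&amp; not (String.contains s '"') then '"'
  else '\''

let () =
  assert (choose_quote "abc" = '\'');
  assert (choose_quote "it's" = '"');
  (* both kinds present: single quotes win, and then ' must be escaped *)
  assert (choose_quote "it's \"quoted\"" = '\'')
</code></pre>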
<p>Already here it's clear that a regular expression substitution is not enough by itself to turn OCaml escaping into Python escaping.
My first step then was to implement the algorithm only for US ASCII.
This is simpler as we don't have to worry much about Unicode, and I could implement it relatively quickly.
The first 32 characters and the last US ASCII character (DEL or <code>\x7f</code>) are considered non-printable and must be escaped.
I then wrote some simple tests by hand.
Then I discovered the OCaml <a href="https://github.com/zshipko/ocaml-py">py</a> library, which provides bindings to Python from OCaml.
Great! This I can use to test my implementation against Python!</p>
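<p>An ASCII-only version, as described above, might look like the following sketch. It is not the code from the resulting library: it assumes the input is pure US ASCII and rejects anything else with <code>Invalid_argument</code>, and like Python it uses lowercase hexadecimal digits in <code>\xHH</code> escapes. The name <code>ascii_repr</code> is hypothetical.</p>
<pre><code class="language-OCaml">(* Sketch of str.__repr__() for US ASCII input only. *)
let ascii_repr s =
  (* double quotes only if there are single quotes but no double quotes *)
  let quote =
    if String.contains s '\'' &amp;&amp; not (String.contains s '"') then '"'
    else '\''
  in
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b quote;
  String.iter
    (fun c ->
      match c with
      | '\\' -> Buffer.add_string b "\\\\"
      | '\t' -> Buffer.add_string b "\\t"
      | '\n' -> Buffer.add_string b "\\n"
      | '\r' -> Buffer.add_string b "\\r"
      (* only the chosen quote character is escaped *)
      | c when c = quote ->
          Buffer.add_char b '\\';
          Buffer.add_char b c
      (* the remaining first 32 characters and DEL become \xHH *)
      | '\x00' .. '\x1f' | '\x7f' ->
          Buffer.add_string b (Printf.sprintf "\\x%02x" (Char.code c))
      | '\x20' .. '\x7e' -> Buffer.add_char b c
      | _ -> invalid_arg "ascii_repr: non-ASCII input")
    s;
  Buffer.add_char b quote;
  Buffer.contents b
</code></pre>
<p>For example, <code>ascii_repr "it's"</code> produces the string <code>"it's"</code> including its surrounding double quotes, as Python's <code>repr("it's")</code> does.</p>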
<h2 id="how-about-unicode"><a class="anchor" aria-hidden="true" href="#how-about-unicode"></a>How about Unicode?</h2>
<p>Non-ASCII characters (or rather, code points) are considered either <em>printable</em> or <em>non-printable</em>.
For now let's look at what that means for the output.
A printable character is copied as-is.
That is, no escaping is done.
Non-printable characters must be escaped, and Python will use <code>\xHH</code>, <code>\uHHHH</code> or <code>\UHHHHHHHH</code> depending on how many hexadecimal digits are necessary to represent the code point.
That is, the Latin-1 code points beyond ASCII (<code>0x80</code>-<code>0xff</code>) are represented using <code>\xHH</code>, and neither <code>\u00HH</code> nor <code>\U000000HH</code> will be used, etc.</p>
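<p>The choice of escape width can be sketched as a small OCaml function. It is a hypothetical helper that only formats a code point already known to be non-printable; deciding <em>whether</em> a code point is printable is the subject of the next section.</p>
<pre><code class="language-OCaml">(* Shortest of \xHH, \uHHHH and \UHHHHHHHH, with lowercase hex digits. *)
let escape_code_point cp =
  if cp < 0x100 then Printf.sprintf "\\x%02x" cp
  else if cp < 0x10000 then Printf.sprintf "\\u%04x" cp
  else Printf.sprintf "\\U%08x" cp

let () =
  assert (escape_code_point 0x85 = "\\x85");
  assert (escape_code_point 0x2028 = "\\u2028");
  assert (escape_code_point 0xe0001 = "\\U000e0001")
</code></pre>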
<h3 id="what-is-a-printable-unicode-character"><a class="anchor" aria-hidden="true" href="#what-is-a-printable-unicode-character"></a>What is a printable Unicode character?</h3>
<p>In the cpython <a href="https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Objects/unicodeobject.c#L12245-L12405">function</a> mentioned earlier they use <code>Py_UNICODE_ISPRINTABLE</code>.
I had a local clone of the cpython git repository where I ran <code>git grep Py_UNICODE_ISPRINTABLE</code> to find information about it.
In <a href="https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Doc/c-api/unicode.rst?plain=1#L257-L265">unicode.rst</a> I found documentation stating that it returns false if the character is nonprintable, where nonprintable means the code point is in the categories "Other" or "Separator" in the Unicode character database, <strong>with the exception of ASCII space</strong> (U+20 or <code> </code>).</p>
<p>What are those "Other" and "Separator" categories?
Searching further, we find the definition in <a href="https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Include/cpython/unicodeobject.h#L683"><code>Include/cpython/unicodeobject.h</code></a>.
Well, we find <code>#define Py_UNICODE_ISPRINTABLE(ch) _PyUnicode_IsPrintable(ch)</code>.
On to <code>git grep _PyUnicode_IsPrintable</code> then.
That function is defined in <a href="https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Objects/unicodectype.c#L158-L163"><code>Objects/unicodectype.c</code></a>.</p>
<pre><code class="language-C">/* Returns 1 for Unicode characters to be hex-escaped when repr()ed,
   0 otherwise.
   All characters except those characters defined in the Unicode character
   database as following categories are considered printable.
      * Cc (Other, Control)
      * Cf (Other, Format)
      * Cs (Other, Surrogate)
      * Co (Other, Private Use)
      * Cn (Other, Not Assigned)
      * Zl Separator, Line ('\u2028', LINE SEPARATOR)
      * Zp Separator, Paragraph ('\u2029', PARAGRAPH SEPARATOR)
      * Zs (Separator, Space) other than ASCII space('\x20').
*/
int _PyUnicode_IsPrintable(Py_UCS4 ch)
{
    const _PyUnicode_TypeRecord *ctype = gettyperecord(ch);

    return (ctype->flags & PRINTABLE_MASK) != 0;
}
</code></pre>
<p>OK, now we're getting close to something.
Searching for <code>PRINTABLE_MASK</code>, we find the following line of code in <a href="https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Tools/unicode/makeunicodedata.py#L450-L451"><code>Tools/unicode/makeunicodedata.py</code></a>:</p>
<pre><code class="language-Python">if char == ord(" ") or category[0] not in ("C", "Z"):
    flags |= PRINTABLE_MASK
</code></pre>
<p>So the algorithm really is: a character is printable if it is the ASCII space character or if its Unicode general category doesn't start with <code>C</code> or <code>Z</code>.
This can be implemented in OCaml using the uucp library as follows:</p>
<pre><code class="language-OCaml">let py_unicode_isprintable uchar =
  (* {[if char == ord(" ") or category[0] not in ("C", "Z"):
        flags |= PRINTABLE_MASK]} *)
  Uchar.equal uchar (Uchar.of_char ' ')
  ||
  let gc = Uucp.Gc.general_category uchar in
  (* Not those categories starting with 'C' or 'Z' *)
  match gc with
  | `Cc | `Cf | `Cn | `Co | `Cs | `Zl | `Zp | `Zs -> false
  | `Ll | `Lm | `Lo | `Lt | `Lu | `Mc | `Me | `Mn | `Nd | `Nl | `No | `Pc | `Pd
  | `Pe | `Pf | `Pi | `Po | `Ps | `Sc | `Sk | `Sm | `So ->
      true
</code></pre>
<p>After implementing Unicode support, I expanded the tests to generate arbitrary OCaml strings and compare the results of calling my function and Python's <code>str.__repr__()</code> on the string.
Well, that didn't go so well.
OCaml strings are arbitrary byte sequences, while ocaml-py expects a UTF-8 encoded string and fails on invalid UTF-8.
In qcheck, however, you can "assume" a predicate, which means that if the predicate doesn't hold for the generated value, the test is skipped for that input.
So I implement a simple UTF-8 validity check.
This is far from optimal because qcheck will generate a lot of invalid UTF-8 strings.</p>
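<p>One way to avoid discarding most generated inputs is to generate valid UTF-8 directly, by drawing random code points and encoding them. This is a sketch using only the standard library (OCaml 4.14 or later, for <code>String.is_valid_utf_8</code>), not what the test suite actually does; <code>random_utf_8</code> is a hypothetical name.</p>
<pre><code class="language-OCaml">(* Build a random valid UTF-8 string of [len] code points. *)
let random_utf_8 st len =
  let b = Buffer.create (4 * len) in
  for _i = 1 to len do
    (* redraw until we get a Unicode scalar value, i.e. not a surrogate *)
    let rec pick () =
      let cp = Random.State.int st 0x110000 in
      if Uchar.is_valid cp then Uchar.of_int cp else pick ()
    in
    Buffer.add_utf_8_uchar b (pick ())
  done;
  Buffer.contents b

let () =
  let st = Random.State.make [| 42 |] in
  for _i = 1 to 1000 do
    assert (String.is_valid_utf_8 (random_utf_8 st 8))
  done
</code></pre>
<p>Drawing code points uniformly mostly yields unassigned or private-use code points, so a real generator would likely want a distribution biased towards ASCII and commonly assigned ranges.</p>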
<p>The next test failure is some unassigned code point.
So I add to <code>py_unicode_isprintable</code> a check that the code point is assigned, using <code>Uucp.Age.age uchar <> `Unassigned</code>.</p>
<p>Still, qcheck found a case I hadn't considered: U+61D.
My Python version (Python 3.9.2 (default, Feb 28 2021, 17:03:44)) renders this as <code>'\u061d'</code> while my OCaml function prints it as-is.
In other words, my implementation considers it printable while Python does not.
I try to enter this Unicode character in my terminal, but nothing shows up.
Then I look it up and its name is <code>ARABIC END OF TEXT MARKER</code>.
The general category according to uucp is <code>`Po</code>.
So this <strong>should</strong> be a printable character‽</p>
<p>After being stumped by this for a while, I get the suspicion that it may depend on the Python version.
I am still on Debian 11 and my Python version is far from being the latest and greatest.
I ask someone with a newer Python version to write <code>'\u061d'</code> in a Python session.
And lo! It prints something that looks like <code>''</code>!
Online I figure out how to get the Unicode version compiled into Python:</p>
<pre><code class="language-Python">>>> import unicodedata
>>> unicodedata.unidata_version
'13.0.0'
</code></pre>
<p>Aha! And with uucp we find that U+61D was introduced in Unicode 14.0:</p>
<pre><code class="language-OCaml"># Uucp.Age.age (Uchar.of_int 0x61D);;
- : Uucp.Age.t = `Version (14, 0)
</code></pre>
<p>My reaction: this is a seriously ungodly mess we are in.
Not only is the code that instigated this journey highly dependent on Python specifics, it also depends on the specific version of Unicode and thus on the version of Python!</p>
<p>I modify our <code>py_unicode_isprintable</code> function to take an optional <code>?unicode_version</code> argument and replace the "is this unassigned?" check with the following snippet:</p>
<pre><code class="language-OCaml">let age = Uucp.Age.age uchar in
(match (age, unicode_version) with
 | `Unassigned, _ -> false
 | `Version _, None -> true
 | `Version (major, minor), Some (major', minor') ->
     major < major' || (major = major' && minor <= minor'))
</code></pre>
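<p>The version comparison can be tried out without uucp. In this standalone sketch, the polymorphic variant type <code>age</code> is defined locally to mirror the two kinds of values used above (<code>`Unassigned</code> and <code>`Version (major, minor)</code>), and <code>assigned_in</code> is a hypothetical wrapper around the snippet.</p>
<pre><code class="language-OCaml">type age = [ `Unassigned | `Version of int * int ]

(* Is a code point of the given age assigned in [unicode_version]?
   Without a version, any assigned code point counts. *)
let assigned_in ?unicode_version (age : age) =
  match (age, unicode_version) with
  | `Unassigned, _ -> false
  | `Version _, None -> true
  | `Version (major, minor), Some (major', minor') ->
      major < major' || (major = major' &amp;&amp; minor <= minor')

let () =
  (* U+61D was introduced in Unicode 14.0, Python 3.9.2 ships 13.0.0 *)
  assert (not (assigned_in ~unicode_version:(13, 0) (`Version (14, 0))));
  assert (assigned_in ~unicode_version:(14, 0) (`Version (14, 0)));
  assert (assigned_in (`Version (14, 0)));
  assert (not (assigned_in `Unassigned))
</code></pre>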
<p>Great! I modify the test suite to first detect the Unicode version Python uses and then pass that version to the OCaml function.
Now I can't find any more failing test cases!</p>
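<p>Detecting the version amounts to reading <code>unicodedata.unidata_version</code> from Python and turning it into a pair. Here is a sketch of just the parsing step, with a hypothetical function name; how the string is fetched from Python (for instance through the py bindings) is left out.</p>
<pre><code class="language-OCaml">(* Turn a version string such as "13.0.0" into (major, minor);
   the trailing patch component is ignored. *)
let parse_unidata_version v =
  Scanf.sscanf v "%d.%d" (fun major minor -> (major, minor))

let () = assert (parse_unidata_version "13.0.0" = (13, 0))
</code></pre>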
|
||||
<h2 id="epilogue"><a class="anchor" aria-hidden="true" href="#epilogue"></a>Epilogue</h2>
|
||||
<p>What can we learn from this?
|
||||
It is easy to say in hindsight that a different representation should have been chosen.
|
||||
However, arriving at this insight takes time.
|
||||
The exact behavior of <code>str.__repr__()</code> is poorly documented.
|
||||
Reaching my understanding of <code>str.__repr__()</code> took hours of research and reading the C implementation.
|
||||
It often doesn't seem to be worth it to spend so much time on research for a small function.
|
||||
Technical debt is a real thing and often hard to predict.
|
||||
Below is the output of <code>help(str.__repr__)</code>:</p>
|
||||
<pre><code class="language-Python">__repr__(self, /)
|
||||
Return repr(self)
|
||||
</code></pre>
|
||||
<p>Language and (standard) library designers could consider whether the slightly nicer looking strings are worth the added complexity users eventually are going to rely on - inadvertently or not.
|
||||
I do think strings and bytes in Python are a bit too complex.
|
||||
It is not easy to get a language lawyer<sup><a href="#fn-language-lawyer" id="ref-1-fn-language-lawyer" role="doc-noteref" class="fn-label">[3]</a></sup> level understanding.
|
||||
In my opinion it is a mistake to not at least print a warning if there are illegal escape sequences - especially considering there are escape sequences that are valid in one string literal but not another.</p>
|
||||
<p>Unfortunately it is often the case that to get a precise specification it is necessary to look at the implementation.
|
||||
For testing your implementation hand-written tests are good.
|
||||
Testing against the original implementation is great, and if combined with property-based testing or fuzzing you may find failing test cases you couldn't dream up!
|
||||
I certainly didn't see it coming that the output depends on the Unicode version.
|
||||
As the saying goes, testing can only show the presence of bugs, but for a function with a domain as limited (in a sense) as this one, you can get pretty close to showing their absence.</p>
|
||||
<p>I enjoyed working on this.
|
||||
Sure, it was frustrating and at times I discovered some ungodly properties, but it's a great feeling to study and understand something at a deeper level.
|
||||
It may be the last time I need to understand Python's <code>str.__repr__()</code> this well, but if I do I now have the OCaml code and this blog post to reread.</p>
|
||||
<p>If you are curious to read the resulting code, you can find it on GitHub at <a href="https://github.com/reynir/python-str-repr">github.com/reynir/python-str-repr</a>.
|
||||
I have documented the code to make it more approachable and maintainable by others.
|
||||
Hopefully it is not something that you need, but in case it is useful to you it is licensed under a permissive license.</p>
|
||||
<p>If you have a project in OCaml or want to port something to OCaml and would like help from me and my colleagues at <a href="https://robur.coop/">Robur</a> please <a href="https://robur.coop/Contact">get in touch</a> with us and we will figure something out.</p>
|
||||
<section role="doc-endnotes"><ol>
|
||||
<li id="fn-python-bytes">
|
||||
<p>There is as well the <code>bytes</code> type which is a byte sequence like OCaml's <code>string</code>.
|
||||
The Python code in question is using <code>str</code> however.</p>
|
||||
<span><a href="#ref-1-fn-python-bytes" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-raw-escape-example">
|
||||
<p>Note I use single quotes for the output. This is what Python would do. It would be equivalent to <code>"\\\""</code>.</p>
|
||||
<span><a href="#ref-1-fn-raw-escape-example" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-language-lawyer">
|
||||
<p><a href="http://catb.org/jargon/html/L/language-lawyer.html">A person, usually an experienced or senior software engineer, who is intimately familiar with many or most of the numerous restrictions and features (both useful and esoteric) applicable to one or more computer programming languages. A language lawyer is distinguished by the ability to show you the five sentences scattered through a 200-plus-page manual that together imply the answer to your question “if only you had thought to look there”.</a></p>
|
||||
<span><a href="#ref-1-fn-language-lawyer" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>
|
||||
|
||||
</article>
|
||||
|
||||
</main>
|
||||
<footer>
|
||||
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
|
||||
<br />
|
||||
</footer>
|
||||
<script>hljs.highlightAll();</script>
|
||||
</body>
|
||||
</html>
|
333
articles/2024-02-03-python-str-repr.md
Normal file
|
@ -0,0 +1,333 @@
|
|||
---
|
||||
date: 2024-02-03
|
||||
title: Python's `str.__repr__()`
|
||||
description: Reimplementing Python string escaping in OCaml
|
||||
tags:
|
||||
- OCaml
|
||||
- Python
|
||||
- unicode
|
||||
author:
|
||||
name: Reynir Björnsson
|
||||
email: reynir@reynir.dk
|
||||
link: https://reyn.ir/
|
||||
---
|
||||
Sometimes software is written using whatever built-ins you find in your programming language of choice.
|
||||
This is usually great!
|
||||
However, it can happen that you depend on the precise semantics of those built-ins.
|
||||
This can be a problem if those semantics become important to your software and you need to port it to another programming language.
|
||||
This story is about Python and its `str.__repr__()` function.
|
||||
|
||||
The piece of software I was helping port to [OCaml][ocaml] was constructing a hash from the string representation of a tuple.
|
||||
The gist of it was basically this:
|
||||
|
||||
```python
def get_id(x):
    id = (x.get_unique_string(), x.path, x.name)
    return myhash(str(id))
```
|
||||
|
||||
In other words it's a Python tuple consisting of mostly strings but also a `PosixPath` object.
|
||||
The way `str()` works is that it calls the `__str__()` method of its argument (or otherwise falls back to `repr(x)`).
|
||||
For Python tuples the `__str__()` method seems to print the result of `repr()` on each element, separated by a comma and a space and surrounded by parentheses.
|
||||
So far, so good.
|
||||
If we can precisely emulate `repr()` on strings and on `PosixPath`, the rest is easy.
|
||||
In the case of `PosixPath` it's really just `'PosixPath('+repr(str(path))+')'`;
|
||||
so in that case it's down to `repr()` on strings, which is `str.__repr__()`.
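
A quick way to check this decomposition is to rebuild `str()` of such a tuple by hand and compare. This is a minimal sketch assuming a POSIX system (so `pathlib.PosixPath` can be instantiated); `tuple_str` and `posix_path_repr` are hypothetical helpers written for this post, not part of the code being ported:

```python
from pathlib import PosixPath


def tuple_str(items):
    # Rebuild str(tuple) by hand: repr() of each element, joined by ", ",
    # wrapped in parentheses (a one-element tuple gets a trailing comma).
    parts = [repr(x) for x in items]
    if len(parts) == 1:
        return "(" + parts[0] + ",)"
    return "(" + ", ".join(parts) + ")"


def posix_path_repr(path):
    # PosixPath's repr is the class name wrapped around repr(str(path)).
    return "PosixPath(" + repr(str(path)) + ")"


t = ("abc", PosixPath("/tmp/it's"), "x")
assert str(t) == tuple_str(t)
assert repr(t[1]) == posix_path_repr(t[1])
print(str(t))  # ('abc', PosixPath("/tmp/it's"), 'x')
```

Note how the `'` in the path makes `repr()` switch to double quotes; that detail is entirely down to `str.__repr__()`.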
|
||||
|
||||
There had been a previous attempt at this that used OCaml's string escape functions and surrounded the string with single quotes (`'`).
|
||||
This works for some cases, but not if the string has a double quote (`"`).
|
||||
In that case OCaml would escape the double quote with a backslash (`\"`) while Python would not escape it.
|
||||
So a regular expression substitution was added to replace the escape sequence with just a double quote.
|
||||
This pattern of finding small differences between Python and OCaml escaping had been repeated,
|
||||
and eventually I decided to take a more rigorous approach to it.
|
||||
|
||||
## What is a string?
|
||||
|
||||
First of all, what is a string? In Python? And in OCaml?
|
||||
In OCaml a string is just a sequence of bytes.
|
||||
Any bytes, even `NUL` bytes.
|
||||
There is no concept of Unicode in OCaml strings.
|
||||
In Python there is the `str` type which is a sequence of Unicode code points[^python-bytes].
|
||||
I can recommend reading Daniel Bünzli's [minimal introduction to Unicode][unicode-minimal].
|
||||
Already here there is a significant gap in semantics between Python and OCaml.
|
||||
For many practical purposes we can get away with using the OCaml `string` type and treating it as a UTF-8 encoded Unicode string.
|
||||
This is what I will do, as in both the Python and the OCaml code the data being read is a UTF-8 encoded string (often only the US ASCII subset).
|
||||
|
||||
## What does a string literal look like?
|
||||
|
||||
### OCaml
|
||||
|
||||
I will not dive too deep into the details of OCaml string literals, and focus mostly on how they are escaped by the language built-ins (`String.escaped`, `Printf.printf "%S"`).
|
||||
Normal printable ASCII is printed as-is.
|
||||
That is, letters, numbers and other symbols except for backslash and double quote.
|
||||
There are the usual escape sequences `\n`, `\t`, `\r`, `\"` and `\\`.
|
||||
Any byte value can be represented in decimal notation `\032`, octal notation `\o040` or hexadecimal notation `\x20`.
|
||||
The escape functions in OCaml have a preference for the decimal notation over the hexadecimal notation.
|
||||
Finally I also want to mention the Unicode code point escape sequence `\u{3bb}` which represents the UTF-8 encoding of U+3BB.
|
||||
While the escape functions do not use it, it will come in handy later on.
|
||||
Illegal escape sequences (escape sequences that are not recognized) will emit a warning but otherwise result in the escape sequence as-is.
|
||||
It is common to compile OCaml programs with warnings-as-errors, however.
|
||||
|
||||
### Python
|
||||
|
||||
Python has a number of different string literals and string-like literals.
|
||||
They all use single quote or double quote to delimit the string (or string-like) literals.
|
||||
There is a preference towards single quotes in `str.__repr__()`.
|
||||
You can also triple the quotes if you like to write a string that uses a lot of both quote characters.
|
||||
This format is not used by `str.__repr__()` so I will not cover it further, but you can read about it in the [Python reference manual](https://docs.python.org/3/reference/lexical_analysis.html#strings).
|
||||
The string literal can optionally have a prefix character that modifies what type the string literal is and how its content is interpreted.
|
||||
|
||||
The `r`-prefixed strings are called *raw strings*.
|
||||
That means backslash escape sequences are not interpreted.
|
||||
In my experiments they seem to be quasi-interpreted, however!
|
||||
The string `r"\"` is considered unterminated!
|
||||
But `r"\""` is fine and is interpreted as `'\\"'`[^raw-escape-example].
|
||||
Why this is the case I have not found a good explanation for.
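
The behavior is easy to observe. The Python reference manual's section on string literals does describe it (a raw literal cannot end in a single backslash, since the backslash would escape the following quote), though it says little about why. A small sketch; `compile()` is used so the invalid literal can be probed without breaking the script itself:

```python
# In a raw string the backslash still prevents the following quote from
# terminating the literal, but the backslash itself stays in the value.
s = r"\""
assert s == '\\"'              # backslash followed by a double quote
assert len(s) == 2
assert repr(s) == "'\\\\\"'"   # printed as '\\"', as in the footnote

# r"\" never terminates: its closing quote counts as "escaped".
# compile() lets us observe the resulting SyntaxError at runtime.
raised = False
try:
    compile('r"\\"', "<example>", "eval")
except SyntaxError:
    raised = True
assert raised
```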
|
||||
|
||||
The `b`-prefixed strings are `bytes` literals.
|
||||
This is close to OCaml strings.
|
||||
|
||||
Finally there are the unprefixed strings which are `str` literals.
|
||||
These are the ones we are most interested in.
|
||||
They use the usual escape sequences `\[ntr"]` we know from OCaml as well as `\'`.
|
||||
`\032` is **octal** notation and `\x20` is hexadecimal notation.
|
||||
There is as far as I know **no** decimal notation.
|
||||
The output of `str.__repr__()` prefers the hexadecimal notation over the octal notation.
|
||||
As Python strings are Unicode code point sequences we need more than two hexadecimal digits to be able to represent all valid "characters".
|
||||
Thus there are the longer `\u0032` and the longest `\U00000032`.
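
A few assertions make these notations concrete (note that `\032` means something different here than it does in OCaml):

```python
# Octal: "\032" is octal 32, i.e. code point 26 (0x1a), not decimal 32 as in OCaml.
assert "\032" == "\x1a" == chr(26)
# Hexadecimal escapes of increasing width can name the same code point.
assert "\x32" == "\u0032" == "\U00000032" == "2"
# \u and \U are needed beyond two hex digits, e.g. U+3BB and U+1F600.
assert "\u03bb" == chr(0x3BB)
assert "\U0001f600" == chr(0x1F600)
# str.__repr__() writes non-printable characters in hexadecimal, not octal.
assert repr("\032") == "'\\x1a'"
```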
|
||||
|
||||
## Intermezzo
|
||||
|
||||
While studying Python string literals I discovered several odd corners of the syntax and semantics besides the raw string quasi-escape sequence mentioned earlier.
|
||||
One fact is that Python doesn't have a separate character or Unicode code point type.
|
||||
Instead, a character is a one element string.
|
||||
This leads to some interesting indexing shenanigans: `"a"[0][0][0] == "a"`.
|
||||
Furthermore, strings separated by spaces only are treated as one single concatenated string: `"a" "b" "c" == "abc"`.
|
||||
These two combined make it possible to write this unusual snippet: `"a" "b" "c"[0] == "a"`!
|
||||
For byte sequences, or `b`-prefixed strings, things are different.
|
||||
Indexing a bytes object returns the integer value of that byte (or character):
|
||||
|
||||
```python
|
||||
>>> b"a"[0]
|
||||
97
|
||||
>>> b"a"[0][0]
|
||||
<stdin>:1: SyntaxWarning: 'int' object is not subscriptable; perhaps you missed a comma?
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
TypeError: 'int' object is not subscriptable
|
||||
```
|
||||
|
||||
For strings, `"\x32"` can be said to be shorthand for `"\u0032"` (or `"\U00000032"`).
|
||||
But for bytes, `b"\x32" != b"\u0032"`!
|
||||
Why is this?!
|
||||
Well, bytes is a byte sequence and `b"\u0032"` is not interpreted as an escape sequence and is instead **silently** treated as `b"\\u0032"`!
|
||||
Writing `"\xff".encode()` which encodes the string `"\xff"` to UTF-8 is **not** the same as `b"\xff"`.
|
||||
The bytes literal `b"\xff"` consists of a single byte with decimal value 255,
|
||||
and the Unicode wizards reading will know that the Unicode code point 255 (or U+FF) is encoded in two bytes in UTF-8.
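
These oddities are easy to check in a Python 3 session. A few assertions, written for this post rather than taken from the original test suite (the `b"\u0032"` case is left out because newer Python versions warn about the invalid escape):

```python
# A one-character string indexes to itself, again and again.
assert "a"[0][0][0] == "a"
# Adjacent literals are concatenated first, then indexing applies.
assert "a" "b" "c" == "abc"
assert "a" "b" "c"[0] == "a"
# Indexing bytes yields an int, not a one-byte bytes object.
assert b"a"[0] == 97
# For str, \x32 and \u0032 denote the same code point.
assert "\x32" == "\u0032"
# U+FF encodes to two bytes in UTF-8, while b"\xff" is a single byte.
assert "\xff".encode() == b"\xc3\xbf"
assert len(b"\xff") == 1
```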
|
||||
|
||||
## Where is the Python code?
|
||||
|
||||
Finding the implementation of `str.__repr__()` turned out to not be so easy.
|
||||
In the end I asked on the Internet and got a link to [cpython's `Objects/unicodeobject.c`][unicodeobject.c].
|
||||
And holy cow!
|
||||
That's some 160 lines of C code with two loops, a switch statement and I don't know how many chained and nested if statements!
|
||||
Meanwhile the OCaml implementation is a much less daunting 52 lines of which about a fifth is a long comment.
|
||||
It also has two loops, each containing a much tamer match expression (roughly a C switch statement).
|
||||
In both cases they first loop over the string to compute the size of the output string.
|
||||
The Python implementation also counts the number of double quotes and single quotes as well as the highest code point value.
|
||||
I'm not sure why they compute the latter, but my guess is that it's so they can choose an efficient internal representation.
|
||||
Then the Python code decides what quote character to use with the following algorithm:
|
||||
Does the string contain single quotes but no double quotes? Then use double quotes. Otherwise use single quotes.
|
||||
Then the output size estimate is adjusted for the backslashes needed to escape the chosen quote character and for the two quotes surrounding the string.
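
The quote-selection rule fits in a few lines of Python. `choose_quote` is a hypothetical helper name; CPython does this inside its C function while counting quotes:

```python
def choose_quote(s: str) -> str:
    # Double quotes only if the string has single quotes but no double
    # quotes; otherwise single quotes (escaping any single quotes in s).
    if "'" in s and '"' not in s:
        return '"'
    return "'"


for s in ["plain", "it's", 'say "hi"', "both ' and \""]:
    # The first character of repr(s) is the quote Python chose.
    assert repr(s)[0] == choose_quote(s)
```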
|
||||
|
||||
Already here it's clear that a regular expression substitution is not enough by itself to fix OCaml escaping to be Python escaping.
|
||||
My first step then was to implement the algorithm only for US ASCII.
|
||||
This is simpler as we don't have to worry much about Unicode, and I could implement it relatively quickly.
|
||||
The first 32 characters and the last US ASCII character (DEL or `\x7f`) are considered non-printable and must be escaped.
|
||||
I then wrote some simple tests by hand.
|
||||
Then I discovered the OCaml [py][ocaml-py] library which provides bindings to Python from OCaml.
|
||||
Great! This I can use to test my implementation against Python!
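
The US-ASCII-only first step could look roughly like this in Python. This is a sketch written for this post (the real first version was OCaml); `escape_ascii_char` and `ascii_repr` are hypothetical names. Comparing it against the built-in `repr()` mirrors the idea of testing against Python itself:

```python
def escape_ascii_char(ch: str, quote: str) -> str:
    # Escaping rules for a single US-ASCII character (ord(ch) < 0x80).
    cp = ord(ch)
    if ch == quote or ch == "\\":
        return "\\" + ch          # escape the chosen quote and backslash
    if ch == "\t":
        return "\\t"
    if ch == "\n":
        return "\\n"
    if ch == "\r":
        return "\\r"
    if cp < 0x20 or cp == 0x7F:
        return "\\x%02x" % cp     # the first 32 characters and DEL
    return ch                     # everything else is copied as-is


def ascii_repr(s: str) -> str:
    if any(ord(c) >= 0x80 for c in s):
        raise ValueError("this sketch only handles US-ASCII")
    quote = '"' if ("'" in s and '"' not in s) else "'"
    return quote + "".join(escape_ascii_char(c, quote) for c in s) + quote


# Compare against Python itself for every single ASCII character.
for cp in range(0x80):
    assert ascii_repr(chr(cp)) == repr(chr(cp)), hex(cp)
```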
|
||||
|
||||
## How about Unicode?
|
||||
|
||||
Non-ASCII characters (or rather code points) are considered either *printable* or *non-printable*.
|
||||
For now let's look at what that means for the output.
|
||||
A printable character is copied as-is.
|
||||
That is, there is no escaping done.
|
||||
Non-printable characters must be escaped, and Python will use `\xHH`, `\uHHHH` or `\UHHHHHHHH` depending on how many hexadecimal digits are necessary to represent the code point.
|
||||
That is, code points in the Latin-1 range above US ASCII (`0x80`-`0xff`) are written as `\xHH`, and neither `\u00HH` nor `\U000000HH` is used for them; likewise `\uHHHH` is used for everything up to `0xffff`.
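
The width choice for a non-printable, non-ASCII code point can be sketched like this. `hex_escape` is a hypothetical name, and deciding *whether* to escape is a separate question. CPython writes the hex digits in lowercase:

```python
def hex_escape(cp: int) -> str:
    # Pick the shortest escape form whose digits can hold the code point.
    if cp <= 0xFF:
        return "\\x%02x" % cp     # \xHH
    if cp <= 0xFFFF:
        return "\\u%04x" % cp     # \uHHHH
    return "\\U%08x" % cp         # \UHHHHHHHH


# Non-printable code points, one per escape width:
# U+0085 (NEXT LINE, Cc), U+2028 (LINE SEPARATOR, Zl), U+E0001 (LANGUAGE TAG, Cf)
for cp in (0x85, 0x2028, 0xE0001):
    assert repr(chr(cp)) == "'" + hex_escape(cp) + "'"
```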
|
||||
|
||||
### What is a printable Unicode character?
|
||||
|
||||
In the CPython [function][unicodeobject.c] mentioned earlier, the function `Py_UNICODE_ISPRINTABLE` is used.
|
||||
I had a local clone of the cpython git repository where I ran `git grep Py_UNICODE_ISPRINTABLE` to find information about it.
|
||||
In [unicode.rst][unicode.rst-isprintable] I found a documentation string for the function. It says the function returns false if the character is nonprintable, defining nonprintable as a code point in the categories "Other" or "Separator" in the Unicode character database, **with the exception of ASCII space** (U+20 or ` `).
|
||||
|
||||
What are those "Other" and "Separator" categories?
|
||||
Searching further for the function definition, we find it in [`Include/cpython/unicodeobject.h`][unicodeobject.h-isprintable].
|
||||
Well, we find `#define Py_UNICODE_ISPRINTABLE(ch) _PyUnicode_IsPrintable(ch)`.
|
||||
On to `git grep _PyUnicode_IsPrintable` then.
|
||||
That function is defined in [`Objects/unicodectype.c`][unicodectype.c-isprintable].
|
||||
|
||||
```C
|
||||
/* Returns 1 for Unicode characters to be hex-escaped when repr()ed,
|
||||
0 otherwise.
|
||||
All characters except those characters defined in the Unicode character
|
||||
database as following categories are considered printable.
|
||||
* Cc (Other, Control)
|
||||
* Cf (Other, Format)
|
||||
* Cs (Other, Surrogate)
|
||||
* Co (Other, Private Use)
|
||||
* Cn (Other, Not Assigned)
|
||||
* Zl Separator, Line ('\u2028', LINE SEPARATOR)
|
||||
* Zp Separator, Paragraph ('\u2029', PARAGRAPH SEPARATOR)
|
||||
* Zs (Separator, Space) other than ASCII space('\x20').
|
||||
*/
|
||||
int _PyUnicode_IsPrintable(Py_UCS4 ch)
|
||||
{
|
||||
const _PyUnicode_TypeRecord *ctype = gettyperecord(ch);
|
||||
|
||||
return (ctype->flags & PRINTABLE_MASK) != 0;
|
||||
}
|
||||
```
|
||||
|
||||
Ok, now we're getting close to something.
|
||||
Searching for `PRINTABLE_MASK` we find in [`Tools/unicode/makeunicodedata.py`][makeunicodedata.py-printable-mask] the following line of code:
|
||||
|
||||
```Python
if char == ord(" ") or category[0] not in ("C", "Z"):
    flags |= PRINTABLE_MASK
```
|
||||
|
||||
So the rule is really: a character is printable if it is the ASCII space character, or if its Unicode general category doesn't start with a `C` or `Z`.
|
||||
This can be implemented in OCaml using the uucp library as follows:
|
||||
|
||||
```OCaml
|
||||
let py_unicode_isprintable uchar =
|
||||
(* {[if char == ord(" ") or category[0] not in ("C", "Z"):
|
||||
flags |= PRINTABLE_MASK]} *)
|
||||
Uchar.equal uchar (Uchar.of_char ' ')
|
||||
||
|
||||
let gc = Uucp.Gc.general_category uchar in
|
||||
(* Not those categories starting with 'C' or 'Z' *)
|
||||
match gc with
|
||||
| `Cc | `Cf | `Cn | `Co | `Cs | `Zl | `Zp | `Zs -> false
|
||||
| `Ll | `Lm | `Lo | `Lt | `Lu | `Mc | `Me | `Mn | `Nd | `Nl | `No | `Pc | `Pd
|
||||
| `Pe | `Pf | `Pi | `Po | `Ps | `Sc | `Sk | `Sm | `So ->
|
||||
true
|
||||
```
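
To cross-check the rule without OCaml, here is a minimal Python sketch that puts the pieces together: the quote choice, the ASCII escapes, printability via `unicodedata.category`, and the shortest hex escape. The names `is_printable` and `py_str_repr` are made up for this sketch. Since both it and the built-in `repr()` read the Unicode database of the running interpreter, agreement only says something about that interpreter's Unicode version:

```python
import unicodedata


def is_printable(ch: str) -> bool:
    # Same rule as makeunicodedata.py: ASCII space, or a general category
    # that does not start with "C" (Other) or "Z" (Separator).
    return ch == " " or unicodedata.category(ch)[0] not in ("C", "Z")


def py_str_repr(s: str) -> str:
    quote = '"' if ("'" in s and '"' not in s) else "'"
    out = [quote]
    for ch in s:
        cp = ord(ch)
        if ch == quote or ch == "\\":
            out.append("\\" + ch)
        elif ch == "\t":
            out.append("\\t")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif cp < 0x20 or cp == 0x7F:
            out.append("\\x%02x" % cp)   # non-printable US ASCII
        elif cp < 0x7F or is_printable(ch):
            out.append(ch)               # printable: copied as-is
        elif cp <= 0xFF:
            out.append("\\x%02x" % cp)
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)
        else:
            out.append("\\U%08x" % cp)
    out.append(quote)
    return "".join(out)
```

A test in the spirit of the qcheck comparison is then a loop over code points, asserting `py_str_repr(chr(cp)) == repr(chr(cp))`.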
|
||||
|
||||
After implementing Unicode support, I expanded the tests to generate arbitrary OCaml strings and compare the results of calling my function and Python's `str.__repr__()` on the string.
|
||||
Well, that didn't go so well.
|
||||
OCaml strings are just any byte sequence, and ocaml-py expects it to be a UTF-8 encoded string and fails on invalid UTF-8.
|
||||
In qcheck, however, you can "assume" a predicate: if the predicate doesn't hold on the generated value, the test is skipped for that input.
|
||||
So I implement a simple verification of UTF-8.
|
||||
This is far from optimal, because qcheck will generate a lot of invalid UTF-8 strings.
|
||||
|
||||
The next test failure involves an unassigned code point.
|
||||
So I add to `py_unicode_isprintable` a check that the code point is assigned using ``Uucp.Age.age uchar <> `Unassigned``.
|
||||
|
||||
Still, qcheck found a case I hadn't considered: U+61D.
|
||||
My Python version (Python 3.9.2 (default, Feb 28 2021, 17:03:44)) renders this as `'\u061d'` while my OCaml function prints it as-is.
|
||||
In other words, my implementation considers it printable while Python does not.
|
||||
I try to enter this Unicode character in my terminal, but nothing shows up.
|
||||
Then I look it up and its name is `ARABIC END OF TEXT MARKER`.
|
||||
The general category according to uucp is `` `Po ``.
|
||||
So this **should** be a printable character‽
|
||||
|
||||
After being stumped by this for a while I get the suspicion it may be dependent on the Python version.
|
||||
I am still on Debian 11 and my Python version is far from being the latest and greatest.
|
||||
I ask someone with a newer Python version to write `'\u061d'` in a Python session.
|
||||
And 'lo! It prints something that looks like `''`!
|
||||
Searching online, I figure out how to get the Unicode version compiled into Python:
|
||||
|
||||
```Python
|
||||
>>> import unicodedata
|
||||
>>> unicodedata.unidata_version
|
||||
'13.0.0'
|
||||
```
|
||||
|
||||
Aha! And with uucp we find that U+61D was introduced in Unicode 14.0:
|
||||
|
||||
```OCaml
|
||||
# Uucp.Age.age (Uchar.of_int 0x61D);;
|
||||
- : Uucp.Age.t = `Version (14, 0)
|
||||
```
|
||||
|
||||
My reaction: this is seriously some ungodly mess we are in.
|
||||
Not only is the code that instigated this journey highly dependent on Python specifics, it's also dependent on the specific version of Unicode and thus the version of Python!
|
||||
|
||||
I modify our `py_unicode_isprintable` function to take an optional `?unicode_version` argument and replace the "is this unassigned?" check with the following snippet:
|
||||
|
||||
```OCaml
|
||||
let age = Uucp.Age.age uchar in
|
||||
(match (age, unicode_version) with
|
||||
| `Unassigned, _ -> false
|
||||
| `Version _, None -> true
|
||||
| `Version (major, minor), Some (major', minor') ->
|
||||
major < major' || (major = major' && minor <= minor'))
|
||||
```
|
||||
|
||||
Great! I modify the test suite to first detect the Unicode version Python uses and then pass that version to the OCaml function.
|
||||
Now I can't find any more failing test cases!
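
The Python side of the version detection, and the version comparison from the OCaml snippet, can be sketched in Python as follows. `python_unicode_version` and `assigned_in` are hypothetical names; the original test suite's detection code may look different:

```python
import unicodedata


def python_unicode_version() -> tuple:
    # e.g. "13.0.0" -> (13, 0); only major and minor matter here
    major, minor = unicodedata.unidata_version.split(".")[:2]
    return (int(major), int(minor))


def assigned_in(age, version) -> bool:
    # age: (major, minor) of the Unicode version that introduced the code
    # point, or None if unassigned; version: the version to check against,
    # or None for "no restriction". Mirrors the OCaml match above.
    if age is None:
        return False
    if version is None:
        return True
    # Tuple comparison is lexicographic:
    # major < major' || (major = major' && minor <= minor')
    return age <= version
```

For U+61D (introduced in 14.0), `assigned_in((14, 0), (13, 0))` is `False`, matching Python 3.9's Unicode 13.0 database treating it as unassigned.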
|
||||
|
||||
## Epilogue
|
||||
|
||||
What can we learn from this?
|
||||
It is easy to say in hindsight that a different representation should have been chosen.
|
||||
However, arriving at this insight takes time.
|
||||
The exact behavior of `str.__repr__()` is poorly documented.
|
||||
Reaching my understanding of `str.__repr__()` took hours of research and reading the C implementation.
|
||||
It often doesn't seem worth spending so much time researching such a small function.
|
||||
Technical debt is a real thing and often hard to predict.
|
||||
Below is the output of `help(str.__repr__)`:
|
||||
|
||||
```Python
|
||||
__repr__(self, /)
|
||||
Return repr(self)
|
||||
```
|
||||
|
||||
Language and (standard) library designers could consider whether slightly nicer-looking strings are worth the added complexity that users will eventually rely on, inadvertently or not.
|
||||
I do think strings and bytes in Python are a bit too complex.
|
||||
It is not easy to get a language lawyer[^language-lawyer] level understanding.
|
||||
In my opinion it is a mistake not to at least print a warning for illegal escape sequences, especially considering that some escape sequences are valid in one kind of string literal but not in another.
|
||||
|
||||
Unfortunately it is often the case that to get a precise specification it is necessary to look at the implementation.
|
||||
For testing your implementation hand-written tests are good.
|
||||
Testing against the original implementation is great, and if combined with property-based testing or fuzzing you may find failing test cases you couldn't dream up!
|
||||
I certainly didn't see it coming that the output depends on the Unicode version.
|
||||
As the saying goes, testing can only show the presence of bugs, but for a function with a domain as limited (in a sense) as this one, you can get pretty close to showing their absence.
|
||||
|
||||
I enjoyed working on this.
|
||||
Sure, it was frustrating and at times I discovered some ungodly properties, but it's a great feeling to study and understand something at a deeper level.
|
||||
It may be the last time I need to understand Python's `str.__repr__()` this well, but if I do I now have the OCaml code and this blog post to reread.
|
||||
|
||||
If you are curious to read the resulting code, you can find it on GitHub at [github.com/reynir/python-str-repr](https://github.com/reynir/python-str-repr).
|
||||
I have documented the code to make it more approachable and maintainable by others.
|
||||
Hopefully it is not something that you need, but in case it is useful to you it is licensed under a permissive license.
|
||||
|
||||
If you have a project in OCaml or want to port something to OCaml and would like help from me and my colleagues at [Robur](https://robur.coop/) please [get in touch](https://robur.coop/Contact) with us and we will figure something out.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
[ocaml]: https://ocaml.org/
|
||||
[unicode-minimal]: https://ocaml.org/p/uucp/13.0.0/doc/unicode.html#minimal
|
||||
[unicodeobject.c]: https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Objects/unicodeobject.c#L12245-L12405
|
||||
[escaped]: https://github.com/ocaml/ocaml/blob/a51089215d5ae1187688a5b130e9f62bf50adfeb/stdlib/bytes.ml#L170-L222
|
||||
[ocaml-py]: https://github.com/zshipko/ocaml-py
|
||||
[unicode.rst-isprintable]: https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Doc/c-api/unicode.rst?plain=1#L257-L265
|
||||
[makeunicodedata.py-printable-mask]: https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Tools/unicode/makeunicodedata.py#L450-L451
|
||||
[unicodectype.c-isprintable]: https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Objects/unicodectype.c#L158-L163
|
||||
[unicodeobject.h-isprintable]: https://github.com/python/cpython/blob/963904335e579bfe39101adf3fd6a0cf705975ff/Include/cpython/unicodeobject.h#L683
|
||||
|
||||
[^python-bytes]: There is as well the `bytes` type which is a byte sequence like OCaml's `string`.
|
||||
The Python code in question is using `str` however.
|
||||
|
||||
[^raw-escape-example]: Note I use single quotes for the output. This is what Python would do. It would be equivalent to `"\\\""`.
|
||||
|
||||
[^language-lawyer]: [A person, usually an experienced or senior software engineer, who is intimately familiar with many or most of the numerous restrictions and features (both useful and esoteric) applicable to one or more computer programming languages. A language lawyer is distinguished by the ability to show you the five sentences scattered through a 200-plus-page manual that together imply the answer to your question “if only you had thought to look there”.](http://catb.org/jargon/html/L/language-lawyer.html)
|
|
@ -1,237 +0,0 @@
|
|||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="x-ua-compatible" content="ie=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<title>
|
||||
Robur's blogMirageVPN and OpenVPN
|
||||
</title>
|
||||
<meta name="description" content="Discoveries made implementing MirageVPN, an OpenVPN-compatible VPN library">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
|
||||
<script src="https://blog.robur.coop/js/hl.js"></script>
|
||||
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
|
||||
</head>
|
||||
<body>
|
||||
<header>
|
||||
<h1>blog.robur.coop</h1>
|
||||
<blockquote>
|
||||
The <strong>Robur</strong> cooperative blog.
|
||||
</blockquote>
|
||||
</header>
|
||||
<main><a href="https://blog.robur.coop/index.html">Back to index</a>
|
||||
|
||||
<article>
|
||||
<h1>MirageVPN and OpenVPN</h1>
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-MirageVPN">MirageVPN</a></li><li><a href="https://blog.robur.coop/tags.html#tag-OpenVPN">OpenVPN</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li></ul><p>At <a href="https://robur.coop/">Robur</a> we have been busy at work implementing our OpenVPN™-compatible MirageVPN software.
|
||||
Recently we have implemented the <a href="https://blog.robur.coop/articles/miragevpn-server.html">server side</a>.
|
||||
In order to implement this side of the protocol I studied parts of the OpenVPN™ source code and performed experiments to understand what the implementation does at the protocol level.
|
||||
Studying the OpenVPN™ implementation has led me to discover two security issues: CVE-2024-28882 and CVE-2024-5594.
|
||||
In this article I will talk about the relevant parts of the protocol, and describe the security issues in detail.</p>
|
||||
<p>A VPN establishes a secure tunnel in which (usually) IP packets are sent.
|
||||
The OpenVPN protocol establishes a TLS tunnel<sup><a href="#fn-openvpn-tls" id="ref-1-fn-openvpn-tls" role="doc-noteref" class="fn-label">[1]</a></sup> with which key material and configuration options are negotiated.
|
||||
Once established the TLS tunnel is used to exchange so-called control channel messages.
|
||||
They are NUL-terminated (well, more on that later) text messages sent in a single TLS record frame (mostly, more on that later).</p>
|
||||
<p>I will describe two groups of control channel messages (and a bonus control channel message):</p>
|
||||
<ul>
|
||||
<li><code>EXIT</code>, <code>RESTART</code>, and <code>HALT</code></li>
|
||||
<li><code>PUSH_REQUEST</code> / <code>PUSH_REPLY</code></li>
|
||||
<li>(<code>AUTH_FAILED</code>)</li>
|
||||
</ul>
|
||||
<p>The <code>EXIT</code>, <code>RESTART</code>, and <code>HALT</code> messages share similarity.
|
||||
They are all three used to signal to the client that it should disconnect<sup><a href="#fn-disconnect" id="ref-1-fn-disconnect" role="doc-noteref" class="fn-label">[2]</a></sup> from the server.
|
||||
<code>HALT</code> tells the client to disconnect and suggests the client should terminate.
|
||||
<code>RESTART</code> also tells the client to disconnect and suggests the client can reconnect, either to the same server or, depending on flags in the message, to the next server if multiple are configured.
|
||||
<code>EXIT</code> tells the <em>peer</em> that it is exiting and the <em>peer</em> should disconnect.
|
||||
The last one can be sent by either the server or the client and is useful when the underlying transport is UDP.
|
||||
It informs the peer that the sender is exiting and will (soon) not be receiving and ACK'ing messages; for UDP the peer would otherwise (re)send messages until a timeout.</p>
|
||||
<p>Because the underlying transport can either be TCP or UDP the sender may have no guarantees that the message arrives.
|
||||
OpenVPN's control channel implements a reliable layer with ACKs and retransmissions to work around that.
|
||||
To accommodate this, OpenVPN™ will wait five seconds before disconnecting to allow for retransmission of the exit message.</p>
|
||||
<h3 id="the-bug"><a class="anchor" aria-hidden="true" href="#the-bug"></a>The bug</h3>
|
||||
<p>While I was working on implementing more control channel message types I modified a client application that connects to a server and sends pings over the tunnel - instead of ICMPv4 echo requests I modified it to send the <code>EXIT</code> control channel message once a second.
|
||||
In the server logs I saw that the server successfully received the <code>EXIT</code> message!
|
||||
But nothing else happened.
|
||||
The server just kept receiving <code>EXIT</code> messages but for some reason it never disconnected the client.</p>
|
||||
<p>Curious about this behavior I dived into the OpenVPN™ source code and found that on each <code>EXIT</code> message it (re)schedules an exit (disconnect) timer! That is, every time the server receives an <code>EXIT</code> message it'll go "OK! I'll shut down this connection in five seconds" forgetting it promised to do so earlier, too.</p>
|
||||
<h3 id="implications"><a class="anchor" aria-hidden="true" href="#implications"></a>Implications</h3>
|
||||
<p>At first this seemed like a relatively benign bug.
|
||||
What's the worst that could happen if a client says "let's stop in five seconds! No, five seconds from now! No, five seconds from now!" etc?
|
||||
Well, it turns out the same timer is used when the server sends an exit message.
|
||||
Ok, so what?
|
||||
The client can hold open a resource it <em>was</em> authorized to use <em>longer</em>.
|
||||
So we have a somewhat boring potential denial of service attack.</p>
|
||||
<p>Then I learned more about the management interface.
|
||||
The management interface is a text protocol to communicate with the OpenVPN server (or client) and query for information or send commands.
|
||||
One command is the <code>client-kill</code> command.
|
||||
The documentation says to use this command to "[i]mmediately kill a client instance[...]".
|
||||
In practice it sends an exit message to the client (either a custom one or the default <code>RESTART</code>).
|
||||
I learnt that it shares code paths with the exit control messages to schedule an exit (disconnect)<sup><a href="#fn-kill-immediately" id="ref-1-fn-kill-immediately" role="doc-noteref" class="fn-label">[3]</a></sup>.
|
||||
That is, <code>client-kill</code> schedules the same five second timer.</p>
|
||||
<p>Thus a malicious client can, instead of exiting on receiving an exit or <code>RESTART</code> message, send back repeatedly <code>EXIT</code> to the server to reset the five second timer.
|
||||
This way the client can indefinitely delay the exit/disconnect, assuming a sufficiently stable and responsive network.
|
||||
This is suddenly not so good.
|
||||
The application using the management interface might be enforcing a security policy which we can now circumvent!
|
||||
The client might belong to a former employee of a company: the security team might want to revoke the former employee's access to the internal network and, as part of that process, use <code>client-kill</code> to kick off all of their connected clients.
|
||||
The former employee, if prepared, can circumvent this by sending back <code>EXIT</code> messages repeatedly and thus keep unauthorized access.
|
||||
Or a commercial VPN service may try to enforce a data transfer limit with the same mechanism, which is then rather easily circumvented by sending <code>EXIT</code> messages.</p>
|
||||
<p>Does anyone use the management interface in this way?
|
||||
I don't know.
|
||||
If you do, or are aware of software that enforces policies this way, please do reach out to <a href="https://reyn.ir/contact.html">me</a>.
|
||||
It would be interesting to hear and discuss.
|
||||
The OpenVPN security@ mailing list took it seriously enough to assign it CVE-2024-28882.</p>
|
||||
<h2 id="openvpn-configuration-language"><a class="anchor" aria-hidden="true" href="#openvpn-configuration-language"></a>OpenVPN configuration language</h2>
|
||||
<p>Next up we have <code>PUSH_REQUEST</code> / <code>PUSH_REPLY</code>.
|
||||
As the names suggest it's a request/response protocol.
|
||||
It is used to communicate configuration options from the server to the client.
|
||||
These options include routes, IP address configuration and negotiated cryptographic algorithms.
|
||||
The client signals it would like to receive configuration options from the server by sending the <code>PUSH_REQUEST</code> control channel message<sup><a href="#fn-proto-push-request" id="ref-1-fn-proto-push-request" role="doc-noteref" class="fn-label">[4]</a></sup>.
|
||||
The server then sends a <code>PUSH_REPLY</code> message.</p>
|
||||
<p>The format of a <code>PUSH_REPLY</code> message is <code>PUSH_REPLY,</code> followed by a comma-separated list of OpenVPN configuration directives, terminated by a NUL byte like other control channel messages.
|
||||
Note that this means pushed configuration directives cannot contain commas.</p>
|
||||
<p>When implementing the <code>push</code> server configuration directive, which tells the server to send the parameter of <code>push</code> as a configuration option to the client in the <code>PUSH_REPLY</code>, I studied how exactly OpenVPN™ parses configuration options.
|
||||
I learned some quirks of the configuration language which I find surprising and somewhat hard to implement.
|
||||
I will not cover all corners of the configuration language.</p>
|
||||
<p>In some sense you could say the configuration language of OpenVPN™ is line based.
|
||||
At least, the first step to parsing configuration directives as OpenVPN 2.X does is to read one line at a time and parse it as one configuration directive<sup><a href="#fn-inline-files" id="ref-1-fn-inline-files" role="doc-noteref" class="fn-label">[5]</a></sup>.
|
||||
A line is whatever <code>fgets()</code> says it is: this includes the trailing newline, except possibly on the last line of the file<sup><a href="#fn-configuration-newlines" id="ref-1-fn-configuration-newlines" role="doc-noteref" class="fn-label">[6]</a></sup>.
|
||||
This is how it is for configuration files.
|
||||
However, for a <code>PUSH_REPLY</code> a <em>"line"</em> is the text up to a comma or the end of the message (or, importantly, a NUL byte).
|
||||
This "line" tokenization is done by repeatedly calling OpenVPN™'s <code>buf_parse(buf, ',', line, sizeof(line))</code> function.</p>
|
||||
<pre><code class="language-C">/* file: src/openvpn/buffer.c */
|
||||
bool
|
||||
buf_parse(struct buffer *buf, const int delim, char *line, const int size)
|
||||
{
|
||||
bool eol = false;
|
||||
int n = 0;
|
||||
int c;
|
||||
|
||||
ASSERT(size > 0);
|
||||
|
||||
do
|
||||
{
|
||||
c = buf_read_u8(buf);
|
||||
if (c < 0)
|
||||
{
|
||||
eol = true;
|
||||
}
|
||||
if (c <= 0 || c == delim)
|
||||
{
|
||||
c = 0;
|
||||
}
|
||||
if (n >= size)
|
||||
{
|
||||
break;
|
||||
}
|
||||
line[n++] = c;
|
||||
}
|
||||
while (c);
|
||||
|
||||
line[size-1] = '\0';
|
||||
return !(eol && !strlen(line));
|
||||
}
|
||||
</code></pre>
|
||||
<p><code>buf_parse()</code> takes a <code>struct buffer*</code> (a pointer to a structure holding a byte array together with an offset and a length), a delimiter character (in our case <code>','</code>), a destination buffer <code>line</code> and its length <code>size</code>.
|
||||
It calls <code>buf_read_u8()</code>, which returns the first byte in the buffer (advancing the offset and decrementing the length), or <code>-1</code> if the buffer is empty.
|
||||
In essence, <code>buf_parse()</code> "reads" from the buffer and copies over to <code>line</code> until it encounters <code>delim</code>, a NUL byte or the end of the buffer.
|
||||
In that case a NUL byte is written to <code>line</code>.</p>
|
||||
<p>What is interesting is that a NUL byte is effectively considered a delimiter, too, and that it is consumed by <code>buf_parse()</code>.
|
||||
Next, let's look at how incoming control channel messages are handled (modified for brevity):</p>
|
||||
<pre><code class="language-C">/* file: src/openvpn/forward.c (before fix) */
|
||||
/*
|
||||
* Handle incoming configuration
|
||||
* messages on the control channel.
|
||||
*/
|
||||
static void
|
||||
check_incoming_control_channel(struct context *c, struct buffer buf)
|
||||
{
|
||||
/* force null termination of message */
|
||||
buf_null_terminate(&buf);
|
||||
|
||||
/* enforce character class restrictions */
|
||||
string_mod(BSTR(&buf), CC_PRINT, CC_CRLF, 0);
|
||||
|
||||
if (buf_string_match_head_str(&buf, "AUTH_FAILED"))
|
||||
{
|
||||
receive_auth_failed(c, &buf);
|
||||
}
|
||||
else if (buf_string_match_head_str(&buf, "PUSH_"))
|
||||
{
|
||||
incoming_push_message(c, &buf);
|
||||
}
|
||||
/* SNIP */
|
||||
}
|
||||
</code></pre>
|
||||
<p>First, the buffer is ensured to be NUL terminated by replacing the last byte with a NUL byte.
|
||||
This is already somewhat questionable as it could make an otherwise invalid message valid.
|
||||
Next, character class restrictions are "enforced".
|
||||
Roughly, this removes non-printable characters, carriage returns and line feeds from the C string.
|
||||
The macro <code>BSTR()</code> returns the underlying buffer behind the <code>struct buffer</code> with the offset added.
|
||||
Notably, <code>string_mod()</code> works on (NUL terminated) C strings and not <code>struct buffer</code>s.
|
||||
As an example, the string (with the usual C escape sequences):</p>
|
||||
<pre><code>"PUSH_REPLY,line \nfeeds\n,are\n,removed\n\000"
|
||||
</code></pre>
|
||||
<p>becomes</p>
|
||||
<pre><code>"PUSH_REPLY,line feeds,are,removed\000ed\n\000"
|
||||
</code></pre>
|
||||
<p>As you can see, if interpreted as a C string we have removed the line feeds.
|
||||
But what is this at the end?
|
||||
It is the last 4 bytes of the original string.
|
||||
More generally, it is the last N bytes of the original string if the original string has N line feeds (or other disallowed characters).</p>
|
||||
<p>The whole buffer is still passed to the push reply parsing.
|
||||
Remember that the "line" parser will not only consume commas as the line delimiter, but also NUL bytes!
|
||||
This means the configuration directives are parsed as lines:</p>
|
||||
<pre><code class="language-C">"line feeds"
|
||||
"are"
|
||||
"removed"
|
||||
"ed\n"
|
||||
</code></pre>
|
||||
<p>With this technique we can now inject (almost; the exception is NUL) arbitrary bytes as configuration directive lines.
|
||||
This is bad because the configuration directive is printed to the console if it doesn't parse.
|
||||
As a proof of concept I sent a <code>PUSH_REPLY</code> with an embedded BEL character, and the OpenVPN™ client prints to console (abbreviated):</p>
|
||||
<pre><code>Unrecognized option or missing or extra parameter(s): ^G
|
||||
</code></pre>
|
||||
<p>The <code>^G</code> is how the BEL character is printed in my terminal.
|
||||
I was also able to hear an audible bell.</p>
|
||||
<p>A more thorough explanation of how terminal escape sequences can be exploited can be found on <a href="https://www.gresearch.com/news/g-research-the-terminal-escapes/">G-Research's blog</a>.</p>
|
||||
<h3 id="the-fix"><a class="anchor" aria-hidden="true" href="#the-fix"></a>The fix</h3>
|
||||
<p>The fix is also a first step towards decoupling the control channel messaging from the TLS record frames.
|
||||
First, the data is split on NUL bytes in order to get the control channel message(s), and then messages are rejected if they contain illegal characters.
|
||||
This solves the vulnerability described previously.</p>
|
||||
<p>Unfortunately, it turns out that especially for the <code>AUTH_FAILED</code> control channel message it is easy to create invalid messages:
|
||||
If 2FA is implemented using the script mechanism to send custom messages, those messages easily end with a newline, for example when asking the user to enter the verification code.
|
||||
I believe that as of 2.6.12 the client tolerates trailing newline characters.</p>
|
||||
<h2 id="conclusion"><a class="anchor" aria-hidden="true" href="#conclusion"></a>Conclusion</h2>
|
||||
<p>The first bug, the timer rescheduling bug, is at least 20 years old!
|
||||
It hasn't always been exploitable, but the bug itself goes back as far as the git history does.
|
||||
I haven't attempted further software archeology to find the exact time of introduction.
|
||||
Either way, it's old and has gone unnoticed for quite a while.</p>
|
||||
<p>I think this shows that diversity in implementations is a great way to exercise corner cases, push forward (protocol) documentation efforts and get thorough code review by motivated peers.
|
||||
This work was funded by <a href="https://nlnet.nl/project/MirageVPN/">the EU NGI Assure Fund through NLnet</a>.
|
||||
In my opinion, this shows that funding one open source project can have a positive impact on other open source projects, too.</p>
|
||||
<section role="doc-endnotes"><ol>
|
||||
<li id="fn-openvpn-tls">
|
||||
<p>This is not always the case. It is possible to use static shared secret keys, but it is mostly considered deprecated.</p>
|
||||
<span><a href="#ref-1-fn-openvpn-tls" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-disconnect">
|
||||
<p>I say "disconnect" even when the underlying transport is the connection-less UDP.</p>
|
||||
<span><a href="#ref-1-fn-disconnect" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-kill-immediately">
|
||||
<p>As the alert reader might have realized this is inaccurate. It does not kill the client "immediately" as it will wait five seconds after the exit message is sent before exiting. At best this will kill a cooperating client once it's received the kill message.</p>
|
||||
<span><a href="#ref-1-fn-kill-immediately" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-proto-push-request">
|
||||
<p>There is another mechanism to request a <code>PUSH_REPLY</code> earlier with fewer round trips, but let's ignore that for now. The exact message is <code>PUSH_REQUEST<NUL-BYTE></code> as messages need to be NUL-terminated.</p>
|
||||
<span><a href="#ref-1-fn-proto-push-request" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-inline-files">
|
||||
<p>An exception is inline files, which can span multiple lines. They vaguely resemble XML tags, with an opening <code><tag></code> and a closing <code></tag></code> each on their own line and the data in between. I doubt these are sent in <code>PUSH_REPLY</code>s, but without diving into the source code I can't rule out that inline files can be sent this way.</p>
|
||||
<span><a href="#ref-1-fn-inline-files" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-configuration-newlines">
|
||||
<p>This results in the quirk that it is possible to sort-of escape a newline in a configuration directive. But since the line splitting is done <em>first</em> it's not possible to continue the directive on the next line! I believe this is mostly useless, but it is a way to inject line feeds in configuration options without modifying the OpenVPN source code.</p>
|
||||
<span><a href="#ref-1-fn-configuration-newlines" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>
|
||||
|
||||
</article>
|
||||
|
||||
</main>
|
||||
<footer>
|
||||
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
|
||||
<br />
|
||||
</footer>
|
||||
<script>hljs.highlightAll();</script>
|
||||
</body>
|
||||
</html>
|
247
articles/2024-08-21-OpenVPN-and-MirageVPN.md
Normal file
|
@ -0,0 +1,247 @@
|
|||
---
|
||||
date: 2024-08-21
|
||||
title: MirageVPN and OpenVPN
|
||||
description: Discoveries made implementing MirageVPN, an OpenVPN-compatible VPN library
|
||||
tags:
|
||||
- MirageVPN
|
||||
- OpenVPN
|
||||
- security
|
||||
author:
|
||||
name: Reynir Björnsson
|
||||
email: reynir@reynir.dk
|
||||
link: https://reyn.ir/
|
||||
---
|
||||
At [Robur][robur] we have been busy at work implementing our OpenVPN™-compatible MirageVPN software.
|
||||
Recently we have implemented the [server side][miragevpn-server].
|
||||
In order to implement this side of the protocol I studied parts of the OpenVPN™ source code and performed experiments to understand what the implementation does at the protocol level.
|
||||
Studying the OpenVPN™ implementation has led me to discover two security issues: CVE-2024-28882 and CVE-2024-5594.
|
||||
In this article I will talk about the relevant parts of the protocol, and describe the security issues in detail.
|
||||
|
||||
A VPN establishes a secure tunnel in which (usually) IP packets are sent.
|
||||
The OpenVPN protocol establishes a TLS tunnel[^openvpn-tls] with which key material and configuration options are negotiated.
|
||||
Once established the TLS tunnel is used to exchange so-called control channel messages.
|
||||
They are NUL-terminated (well, more on that later) text messages sent in a single TLS record frame (mostly, more on that later).
|
||||
|
||||
I will describe two groups of control channel messages (and a bonus control channel message):
|
||||
|
||||
* `EXIT`, `RESTART`, and `HALT`
|
||||
* `PUSH_REQUEST` / `PUSH_REPLY`
|
||||
* (`AUTH_FAILED`)
|
||||
|
||||
The `EXIT`, `RESTART`, and `HALT` messages are similar.
|
||||
All three are used to signal to the client that it should disconnect[^disconnect] from the server.
|
||||
`HALT` tells the client to disconnect and suggests the client should terminate.
|
||||
`RESTART` also tells the client to disconnect, and suggests that the client can reconnect, either to the same server or, if multiple servers are configured, to the next one, depending on flags in the message.
|
||||
`EXIT` tells the *peer* that it is exiting and the *peer* should disconnect.
|
||||
The last one can be sent by either the server or the client and is useful when the underlying transport is UDP.
|
||||
It informs the peer that the sender is exiting and will (soon) not be receiving and ACK'ing messages; for UDP the peer would otherwise (re)send messages until a timeout.
|
||||
|
||||
Because the underlying transport can be either TCP or UDP, the sender may have no guarantee that the message arrives.
|
||||
OpenVPN's control channel implements a reliable layer with ACKs and retransmissions to work around that.
|
||||
To accommodate this, OpenVPN™ waits five seconds before disconnecting to allow for retransmission of the exit message.
|
||||
|
||||
### The bug
|
||||
|
||||
While I was working on implementing more control channel message types, I modified a client application that connects to a server and sends pings over the tunnel: instead of ICMPv4 echo requests, I made it send the `EXIT` control channel message once a second.
|
||||
In the server logs I saw that the server successfully received the `EXIT` message!
|
||||
But nothing else happened.
|
||||
The server just kept receiving `EXIT` messages but for some reason it never disconnected the client.
|
||||
|
||||
Curious about this behavior I dived into the OpenVPN™ source code and found that on each `EXIT` message it (re)schedules an exit (disconnect) timer! That is, every time the server receives an `EXIT` message it'll go "OK! I'll shut down this connection in five seconds" forgetting it promised to do so earlier, too.
|
||||
|
||||
### Implications
|
||||
|
||||
At first this seemed like a relatively benign bug.
|
||||
What's the worst that could happen if a client says "let's stop in five seconds! No, five seconds from now! No, five seconds from now!" and so on?
|
||||
Well, it turns out the same timer is used when the server sends an exit message.
|
||||
Ok, so what?
|
||||
The client can hold open a resource it *was* authorized to use *longer*.
|
||||
So we have a somewhat boring potential denial of service attack.
|
||||
|
||||
Then I learned more about the management interface.
|
||||
The management interface is a text protocol to communicate with the OpenVPN server (or client) and query for information or send commands.
|
||||
One command is the `client-kill` command.
|
||||
The documentation says to use this command to "[i]mmediately kill a client instance[...]".
|
||||
In practice it sends an exit message to the client (either a custom one or the default `RESTART`).
|
||||
I learnt that it shares code paths with the exit control messages to schedule an exit (disconnect)[^kill-immediately].
|
||||
That is, `client-kill` schedules the same five second timer.
|
||||
|
||||
Thus a malicious client can, instead of exiting on receiving an exit or `RESTART` message, repeatedly send `EXIT` back to the server to reset the five second timer.
|
||||
This way the client can delay the exit/disconnect indefinitely, assuming a sufficiently stable and responsive network.
|
||||
This is suddenly not so good.
|
||||
The application using the management interface might be enforcing a security policy which we can now circumvent!
|
||||
The client might belong to a former employee of a company: the security team might want to revoke the former employee's access to the internal network and, as part of that process, use `client-kill` to kick off all of their connected clients.
|
||||
The former employee, if prepared, can circumvent this by sending back `EXIT` messages repeatedly and thus keep unauthorized access.
|
||||
Or a commercial VPN service may try to enforce a data transfer limit with the same mechanism, which is then rather easily circumvented by sending `EXIT` messages.
|
||||
|
||||
Does anyone use the management interface in this way?
|
||||
I don't know.
|
||||
If you do, or are aware of software that enforces policies this way, please do reach out to [me][contact].
|
||||
It would be interesting to hear and discuss.
|
||||
The OpenVPN security@ mailing list took it seriously enough to assign it CVE-2024-28882.
|
||||
|
||||
## OpenVPN configuration language
|
||||
|
||||
Next up we have `PUSH_REQUEST` / `PUSH_REPLY`.
|
||||
As the names suggest it's a request/response protocol.
|
||||
It is used to communicate configuration options from the server to the client.
|
||||
These options include routes, IP address configuration and negotiated cryptographic algorithms.
|
||||
The client signals it would like to receive configuration options from the server by sending the `PUSH_REQUEST` control channel message[^proto-push-request].
|
||||
The server then sends a `PUSH_REPLY` message.
|
||||
|
||||
The format of a `PUSH_REPLY` message is `PUSH_REPLY,` followed by a comma-separated list of OpenVPN configuration directives, terminated by a NUL byte like other control channel messages.
|
||||
Note that this means pushed configuration directives cannot contain commas.
|
||||
|
||||
When implementing the `push` server configuration directive, which tells the server to send the parameter of `push` as a configuration option to the client in the `PUSH_REPLY`, I studied how exactly OpenVPN™ parses configuration options.
|
||||
I learned some quirks of the configuration language which I find surprising and somewhat hard to implement.
|
||||
I will not cover all corners of the configuration language.
|
||||
|
||||
In some sense you could say the configuration language of OpenVPN™ is line based.
|
||||
At least, the first step to parsing configuration directives as OpenVPN 2.X does is to read one line at a time and parse it as one configuration directive[^inline-files].
|
||||
A line is whatever `fgets()` says it is: this includes the trailing newline, except possibly on the last line of the file[^configuration-newlines].
|
||||
This is how it is for configuration files.
|
||||
However, for a `PUSH_REPLY` a *"line"* is the text up to a comma or the end of the message (or, importantly, a NUL byte).
|
||||
This "line" tokenization is done by repeatedly calling OpenVPN™'s `buf_parse(buf, ',', line, sizeof(line))` function.
|
||||
|
||||
```C
|
||||
/* file: src/openvpn/buffer.c */
|
||||
bool
|
||||
buf_parse(struct buffer *buf, const int delim, char *line, const int size)
|
||||
{
|
||||
bool eol = false;
|
||||
int n = 0;
|
||||
int c;
|
||||
|
||||
ASSERT(size > 0);
|
||||
|
||||
do
|
||||
{
|
||||
c = buf_read_u8(buf);
|
||||
if (c < 0)
|
||||
{
|
||||
eol = true;
|
||||
}
|
||||
if (c <= 0 || c == delim)
|
||||
{
|
||||
c = 0;
|
||||
}
|
||||
if (n >= size)
|
||||
{
|
||||
break;
|
||||
}
|
||||
line[n++] = c;
|
||||
}
|
||||
while (c);
|
||||
|
||||
line[size-1] = '\0';
|
||||
return !(eol && !strlen(line));
|
||||
}
|
||||
```
|
||||
|
||||
`buf_parse()` takes a `struct buffer*` (a pointer to a structure holding a byte array together with an offset and a length), a delimiter character (in our case `','`), a destination buffer `line` and its length `size`.
|
||||
It calls `buf_read_u8()`, which returns the first byte in the buffer (advancing the offset and decrementing the length), or `-1` if the buffer is empty.
|
||||
In essence, `buf_parse()` "reads" from the buffer and copies over to `line` until it encounters `delim`, a NUL byte or the end of the buffer.
|
||||
In that case a NUL byte is written to `line`.
|
||||
|
||||
What is interesting is that a NUL byte is effectively considered a delimiter, too, and that it is consumed by `buf_parse()`.
|
||||
Next, let's look at how incoming control channel messages are handled (modified for brevity):
|
||||
|
||||
```C
|
||||
/* file: src/openvpn/forward.c (before fix) */
|
||||
/*
|
||||
* Handle incoming configuration
|
||||
* messages on the control channel.
|
||||
*/
|
||||
static void
|
||||
check_incoming_control_channel(struct context *c, struct buffer buf)
|
||||
{
|
||||
/* force null termination of message */
|
||||
buf_null_terminate(&buf);
|
||||
|
||||
/* enforce character class restrictions */
|
||||
string_mod(BSTR(&buf), CC_PRINT, CC_CRLF, 0);
|
||||
|
||||
if (buf_string_match_head_str(&buf, "AUTH_FAILED"))
|
||||
{
|
||||
receive_auth_failed(c, &buf);
|
||||
}
|
||||
else if (buf_string_match_head_str(&buf, "PUSH_"))
|
||||
{
|
||||
incoming_push_message(c, &buf);
|
||||
}
|
||||
/* SNIP */
|
||||
}
|
||||
```
|
||||
|
||||
First, the buffer is ensured to be NUL terminated by replacing the last byte with a NUL byte.
|
||||
This is already somewhat questionable as it could make an otherwise invalid message valid.
|
||||
Next, character class restrictions are "enforced".
|
||||
Roughly, this removes non-printable characters, carriage returns and line feeds from the C string.
|
||||
The macro `BSTR()` returns the underlying buffer behind the `struct buffer` with the offset added.
|
||||
Notably, `string_mod()` works on (NUL terminated) C strings and not `struct buffer`s.
|
||||
As an example, the string (with the usual C escape sequences):
|
||||
|
||||
    "PUSH_REPLY,line \nfeeds\n,are\n,removed\n\000"
|
||||
|
||||
becomes
|
||||
|
||||
    "PUSH_REPLY,line feeds,are,removed\000ed\n\000"
|
||||
|
||||
As you can see, if interpreted as a C string we have removed the line feeds.
|
||||
But what is this at the end?
|
||||
It is the last 4 bytes of the original string.
|
||||
More generally, it is the last N bytes of the original string if the original string has N line feeds (or other disallowed characters).
|
||||
|
||||
The whole buffer is still passed to the push reply parsing.
|
||||
Remember that the "line" parser will not only consume commas as the line delimiter, but also NUL bytes!
|
||||
This means the configuration directives are parsed as lines:
|
||||
|
||||
```C
|
||||
"line feeds"
|
||||
"are"
|
||||
"removed"
|
||||
"ed\n"
|
||||
```
|
||||
|
||||
With this technique we can now inject (almost; the exception is NUL) arbitrary bytes as configuration directive lines.
|
||||
This is bad because the configuration directive is printed to the console if it doesn't parse.
|
||||
As a proof of concept I sent a `PUSH_REPLY` with an embedded BEL character, and the OpenVPN™ client prints to console (abbreviated):
|
||||
|
||||
    Unrecognized option or missing or extra parameter(s): ^G
|
||||
|
||||
The `^G` is how the BEL character is printed in my terminal.
|
||||
I was also able to hear an audible bell.
|
||||
|
||||
A more thorough explanation of how terminal escape sequences can be exploited can be found on [G-Research's blog](https://www.gresearch.com/news/g-research-the-terminal-escapes/).
|
||||
|
||||
### The fix
|
||||
|
||||
The fix is also a first step towards decoupling the control channel messaging from the TLS record frames.
|
||||
First, the data is split on NUL bytes in order to get the control channel message(s), and then messages are rejected if they contain illegal characters.
|
||||
This solves the vulnerability described previously.
|
||||
|
||||
Unfortunately, it turns out that especially for the `AUTH_FAILED` control channel message it is easy to create invalid messages:
|
||||
If 2FA is implemented using the script mechanism to send custom messages, those messages easily end with a newline, for example when asking the user to enter the verification code.
|
||||
I believe that as of 2.6.12 the client tolerates trailing newline characters.
|
||||
|
||||
## Conclusion
|
||||
|
||||
The first bug, the timer rescheduling bug, is at least 20 years old!
|
||||
It hasn't always been exploitable, but the bug itself goes back as far as the git history does.
|
||||
I haven't attempted further software archeology to find the exact time of introduction.
|
||||
Either way, it's old and has gone unnoticed for quite a while.
|
||||
|
||||
I think this shows that diversity in implementations is a great way to exercise corner cases, push forward (protocol) documentation efforts and get thorough code review by motivated peers.
|
||||
This work was funded by [the EU NGI Assure Fund through NLnet](https://nlnet.nl/project/MirageVPN/).
|
||||
In my opinion, this shows that funding one open source project can have a positive impact on other open source projects, too.
|
||||
|
||||
[robur]: https://robur.coop/
|
||||
[miragevpn-server]: miragevpn-server.html
|
||||
[contact]: https://reyn.ir/contact.html
|
||||
|
||||
[^openvpn-tls]: This is not always the case. It is possible to use static shared secret keys, but it is mostly considered deprecated.
|
||||
[^disconnect]: I say "disconnect" even when the underlying transport is the connection-less UDP.
|
||||
[^kill-immediately]: As the alert reader might have realized this is inaccurate. It does not kill the client "immediately" as it will wait five seconds after the exit message is sent before exiting. At best this will kill a cooperating client once it's received the kill message.
|
||||
[^proto-push-request]: There is another mechanism to request a `PUSH_REPLY` earlier with fewer round trips, but let's ignore that for now. The exact message is `PUSH_REQUEST<NUL-BYTE>` as messages need to be NUL-terminated.
|
||||
[^inline-files]: An exception is inline files, which can span multiple lines. They vaguely resemble XML tags, with an opening `<tag>` and a closing `</tag>` each on their own line and the data in between. I doubt these are sent in `PUSH_REPLY`s, but without diving into the source code I can't rule out that inline files can be sent this way.
|
||||
[^configuration-newlines]: This results in the quirk that it is possible to sort-of escape a newline in a configuration directive. But since the line splitting is done *first* it's not possible to continue the directive on the next line! I believe this is mostly useless, but it is a way to inject line feeds in configuration options without modifying the OpenVPN source code.
|
221
articles/2024-10-29-ptt.md
Normal file
|
@ -0,0 +1,221 @@
|
|||
---
|
||||
date: 2024-10-29
|
||||
title: Postes, télégraphes et téléphones, next steps
|
||||
description: An update of our email stack
|
||||
tags:
|
||||
- SMTP
|
||||
- emails
|
||||
- mailing-lists
|
||||
author:
|
||||
name: Romain Calascibetta
|
||||
email: romain.calascibetta@gmail.com
|
||||
link: https://blog.osau.re/
|
||||
breaks: false
|
||||
---
|
||||
|
||||
As you know from [our article on Robur's
|
||||
finances](https://blog.robur.coop/articles/finances.html), we've just received
|
||||
[funding for our email project](https://nlnet.nl/project/PTT). This project
|
||||
started when I was doing my internship in Cambridge and it's great to see that
|
||||
it's been able to evolve over time and remain functional. This article will
|
||||
introduce you to the latest changes to [our PTT
|
||||
project](https://github.com/mirage/ptt) and how far we've got towards providing
|
||||
an OCaml mailing list service.
|
||||
|
||||
## A Git repository or a simple block device as a database?
|
||||
|
||||
One issue that came up quickly in our latest experiments with our SMTP stack was
|
||||
the database of users with an email address. Since we had decided to ‘break
|
||||
down’ the various stages of an email submission to offer simple unikernels, we
|
||||
ended up having to deploy 4 unikernels to have a service that worked.
|
||||
- a unikernel for authentication
|
||||
- a unikernel DKIM-signing the incoming email
|
||||
- one unikernel as primary DNS server
|
||||
- one unikernel sending the signed email to its real destination
|
||||
|
||||
And we're only talking here about the submission of an email; the reception
|
||||
concerns another ‘pipe’.
|
||||
|
||||
The problem with such an architecture is that some unikernels need to have the
|
||||
same data: the users. In this case, the first unikernel needs to know the user's
|
||||
password in order to verify authentication. The final unikernel needs to know
|
||||
the real destinations of the users.
|
||||
|
||||
Let's take the example of two users: foo@robur.coop and bar@robur.coop. The
|
||||
first points to hannes@foo.org and the second to reynir@example.com.
|
||||
|
||||
If Hannes wants to send a message to bar@robur.coop under the identity of
|
||||
foo@robur.coop, he will need to authenticate himself to our first unikernel.
|
||||
This first unikernel must therefore:
|
||||
1) check that the user `foo` exists
|
||||
2) check that the hashed password used by Hannes is the same as the one in the database
|
||||
|
||||
Next, the email will be signed by our second unikernel. It will then forward the
|
||||
email to the last unikernel, which will do the actual translation of the
|
||||
recipients and DNS resolution. In other words:
|
||||
1) it will see that one (the only) recipient is bar@robur.coop
|
||||
2) check that bar@robur.coop exists and obtain its real address
|
||||
3) it will obtain reynir@example.com and perform DNS resolution on
|
||||
`example.com` to find out the email server for this domain
|
||||
4) finally send the email signed by foo@robur.coop to reynir@example.com!

So the first and last unikernels need the same information about our
users: the first for the passwords, the last for the real email addresses.
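
Both unikernels consult the same table, just for different fields. As a minimal sketch of those two lookups, here is a hypothetical in-memory version in OCaml. The `user` record only mirrors the fields of the JSON database used with Elit (`name`, `password`, `mailboxes`), the passwords are invented, and the standard library's `Digest` (MD5) stands in for a real password hash purely to keep the example dependency-free. None of this is Elit's or PTT's actual code.

```ocaml
(* Hypothetical model of the user database shared by the submission
   (authentication) and delivery (resolution) steps. Not Elit/PTT code. *)
type user = {
  name : string;            (* local part: "foo" for foo@robur.coop *)
  password_hash : string;   (* hex digest of the password *)
  mailboxes : string list;  (* real destinations of the user *)
}

(* MD5 via Digest only keeps the sketch dependency-free;
   a real deployment must use a proper password hashing scheme. *)
let hash password = Digest.to_hex (Digest.string password)

let users =
  [ { name = "foo"; password_hash = hash "secret-foo";
      mailboxes = [ "hannes@foo.org" ] };
    { name = "bar"; password_hash = hash "secret-bar";
      mailboxes = [ "reynir@example.com" ] } ]

let find name = List.find_opt (fun u -> String.equal u.name name) users

(* First unikernel: 1) the user exists, 2) the password hash matches. *)
let authenticate ~name ~password =
  match find name with
  | Some u -> String.equal u.password_hash (hash password)
  | None -> false

(* Last unikernel: translate a local recipient into its real addresses;
   [None] means the recipient does not exist. *)
let resolve local = Option.map (fun u -> u.mailboxes) (find local)
```

With these definitions, `authenticate ~name:"foo" ~password:"secret-foo"` succeeds, and `resolve "bar"` yields `Some ["reynir@example.com"]`, whose domain then goes through DNS resolution.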

But as you know, we're talking about unikernels that exist independently of each
other. What's more, they can't share files, and whether they could share
block-devices remains an open question (and a complex one where parallel access
may be involved). In short, the only way to ‘synchronise’ these unikernels
around common data is a Git repository.

[Git][git-kv] has the advantage of being widely used in our unikernels
([primary-git][primary-git], [pasteur][pasteur], [unipi][unipi] and
[contruno][contruno]). It lets you track changes, modify
files and notify the unikernel to update itself (using nsupdate, a simple ping
or an HTTP request to the unikernel).

The problem is that this requires certain skills. Even if it's ‘simple’ to set
up a Git server and then deploy our unikernels, we can restructure our
architecture to simplify the deployment of an SMTP stack further!

## Elit and OneFFS

We have therefore decided to merge the email exchange service and email
submission into a single unikernel, so that it is the only one that needs
access to user information.

We then decided to use [OneFFS][oneffs] as the file system for our database,
which is a plain JSON file. This is perhaps one of the advantages of
MirageOS: you can choose exactly what you need to implement
specific objectives.

In this case, those with experience of Postfix, LDAP or MariaDB could confirm
that configuring an email service should be ‘simpler’ than implementing a
multitude of pipes between different applications and authentication methods.

The JSON file is therefore very simple, and so is the creation of a OneFFS
image:
```sh
$ cat >database.json<<EOF
> [ { "name": "din"
>   , "password": "xxxxxx"
>   , "mailboxes": [ "romain.calascibetta@gmail.com" ] } ]
> EOF
$ opam install oneffs
$ oneffs create -i database.json -o database.img
```

All you have to do is register this image as a block device with [albatross][albatross] and launch
our Elit unikernel with this block device.
```sh
$ albatross-client create-block --data=database.img database 1024
$ albatross-client create --net=service:br0 --block=database:database \
    elit elit.hvt \
    --arg=...
```

At this stage, and if we add our unikernel signing incoming emails, we have more
or less the same setup as the one I described in [my previous articles][smtp_1] on
[deploying][smtp_2] an [email service][smtp_3].

## Multiplex receiving & sending emails

The PTT project is a toolkit for implementing SMTP servers. It gives developers
the choice of implementing their logic as they see fit:
* sign an email
* resolve destinations according to a database
* check SPF information
* annotate the email as spam or not
* etc.

Previously, PTT was split into 2 parts:
1) management of incoming clients/emails
2) the logic to be applied to incoming emails and their delivery

The second point was becoming increasingly complex, however, and errors in
sending emails are legion (DMARC non-alignment, the email is too big for the
destination, the destination doesn't exist, etc.). All the more so since, until
now, PTT could only report these errors via the logs...

Hannes immediately suggested separating the logic of the
unikernel from the delivery. This also allows us to deal with temporary failures
(greylisting). So a fundamental change was made:
- improve the [sendmail][sendmail] and `sendmail-lwt` packages (as well as proposing
  `sendmail-miou`!) for sending or submitting an email
- improve PTT so that there are now 3 distinct jobs: receiving, deciding what to
  do with incoming emails, and sending emails

![SMTP](../images/smtp.jpg)

This finally allows us to define a clearer error-management policy that is
independent of what we want to do with incoming emails. At this stage, we can
look for the `Return-Path` in emails that we haven't managed to send and notify
the senders!
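
To make the separation concrete, here is a toy, single-threaded model of the three jobs in OCaml. Everything in it (the `email` record, the queues, the `send_result` variants, the `deliver` callback) is hypothetical and is not PTT's API. It only shows how delivery errors can follow one policy, independent of the processing logic: temporary failures such as greylisting go to a retry queue, and permanent ones produce a notification for the `Return-Path`.

```ocaml
(* Toy model of PTT's three jobs; all names are hypothetical. *)
type email = {
  return_path : string;      (* where failure notifications go *)
  recipients : string list;
  body : string;
}

type send_result =
  | Delivered
  | Temporary of string      (* e.g. greylisting: try again later *)
  | Permanent of string      (* e.g. unknown destination, email too big *)

let incoming : email Queue.t = Queue.create ()
let outgoing : email Queue.t = Queue.create ()
let retry : email Queue.t = Queue.create ()
(* (address to notify, reason) pairs *)
let bounces : (string * string) Queue.t = Queue.create ()

(* Job 1: receiving only accepts the email and queues it. *)
let receive email = Queue.push email incoming

(* Job 2: the logic (signing, resolving destinations, ...) is a function
   applied to every incoming email. *)
let process (logic : email -> email) =
  Queue.iter (fun e -> Queue.push (logic e) outgoing) incoming;
  Queue.clear incoming

(* Job 3: delivery, with an error policy that ignores the logic above. *)
let send (deliver : email -> send_result) =
  while not (Queue.is_empty outgoing) do
    let e = Queue.pop outgoing in
    match deliver e with
    | Delivered -> ()
    | Temporary _ -> Queue.push e retry
    | Permanent reason -> Queue.push (e.return_path, reason) bounces
  done
```

Because `send` never looks at what `process` did, changing the logic (adding SPF checks, spam annotation, ...) does not change how failures are reported.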

All this is still experimental, and we need practical cases to
observe how we should handle errors and how others do it.

## Insights & Next goals

We're already starting to have a bit of fun with email, and we can send
and receive emails right away.

We're also already seeing hacking attempts on our unikernel:
- people trying to authenticate themselves without `STARTTLS` (or with it,
  depending on how clever the bot is)
- people trying to send emails as users that don't exist in our database
- content that has nothing to do with SMTP

Above all, this shows that, very early on, bots try to usurp the identity linked
to your server (in our case, osau.re) in order to send spam, authenticate
themselves or simply send ‘stuff’ and observe what happens. In all the cases
mentioned, Elit (and PTT) react well: they simply cut off the connection.

We were also able to observe how services such as Gmail work. In particular, in
the case of a mailing list, email forwarding breaks DMARC verification
(specifically, its SPF part). The case is very simple:

foo@gmail.com replies to robur@osau.re, a mailing list that forwards
to several addresses (one of them is bar@gmail.com). The unikernel receives
the email and sends it to bar@gmail.com. The problem is the alignment between
the `From` field (which corresponds to foo@gmail.com) and our osau.re server.
From gmail.com's point of view, these two pieces of information are misaligned,
so it refuses to accept the email.
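
Roughly, the SPF half of DMARC only passes if the domain that SPF authenticated (the envelope sender, i.e. the `MAIL FROM`) is aligned with the domain of the `From` header. Below is a minimal OCaml sketch of the relaxed alignment check, under a loud simplification: the organizational domain is approximated by the last two labels, whereas real implementations consult the Public Suffix List (RFC 7489). The function names are ours and do not come from an existing library.

```ocaml
(* Simplified DMARC relaxed alignment for SPF (see RFC 7489).
   ASSUMPTION: organizational domain = last two labels; real code
   must use the Public Suffix List (e.g. for "example.co.uk"). *)
let organizational_domain domain =
  let domain = String.lowercase_ascii domain in
  match List.rev (String.split_on_char '.' domain) with
  | tld :: name :: _ -> name ^ "." ^ tld
  | _ -> domain

(* Domain part of an address such as "foo@gmail.com". *)
let domain_of address =
  match String.rindex_opt address '@' with
  | Some i -> String.sub address (i + 1) (String.length address - i - 1)
  | None -> address

(* [from]: the From header; [mail_from]: the SPF-authenticated envelope
   sender. Relaxed mode only compares organizational domains. *)
let spf_aligned ~from ~mail_from =
  String.equal
    (organizational_domain (domain_of from))
    (organizational_domain (domain_of mail_from))
```

Assuming the forwarded email's envelope sender is on our domain, `spf_aligned ~from:"foo@gmail.com" ~mail_from:"robur@osau.re"` is `false`, which is exactly the misalignment described above.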

This is where our next objectives come in:
- finish our DMARC implementation
- implement ARC so that our server can attest that, on our side, the DMARC
  check went well and that gmail.com should trust us on this.

There is another, perhaps more problematic, way of solving the problem:
modifying the incoming email, in particular its `From` field. Although this
could be done quite simply with [mrmime][mrmime], it's better to concentrate on
DMARC and ARC so that we can send our emails as they are and never alter them
(especially as altering them would invalidate previous DKIM signatures!).
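
Why rewriting `From` is so harmful can be seen with a toy model of a header signature. Real DKIM (RFC 6376) canonicalizes the signed header fields and the body, hashes them and signs the result with the domain's private key; in the OCaml sketch below, an MD5 digest of the selected headers (via the standard library's `Digest`) stands in for that whole process. The only point is that `from` is among the signed headers (RFC 6376 requires it to be signed), so changing it makes the original signature fail to verify.

```ocaml
(* Toy stand-in for a DKIM signature: NOT real DKIM. Real DKIM
   canonicalizes headers and body, hashes them and signs with a private
   key; here an MD5 digest of the signed headers plays that role. *)
let signed_headers = [ "from"; "to"; "subject" ]

let toy_signature headers =
  signed_headers
  |> List.map (fun name ->
         let value = Option.value ~default:"" (List.assoc_opt name headers) in
         name ^ ":" ^ String.trim value ^ "\r\n")
  |> String.concat ""
  |> Digest.string
  |> Digest.to_hex

let verify headers signature = String.equal (toy_signature headers) signature
```

Adding a header that is not signed (as a mailing list might) leaves the toy signature valid, while rewriting `from` does not.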

## Conclusion

It's always satisfying to see your projects working ‘more or less’ correctly.
This article will surely be the start of a series on the intricacies of email
and the difficulty of deploying such a service at home.

We hope that this NLnet-funded work will enable us to replace our current email
system with unikernels. We've already reached the stage where we can, more or less
(without DMARC checking), send emails to each other, which is a big step!

So follow our work on our blog, and if you like what we're producing (which
involves a whole bunch of protocols and formats, much more than just SMTP), you
can make [a donation here](https://robur.coop/Donate)!

[mrmime]: https://github.com/mirage/mrmime
[smtp_1]: https://blog.osau.re/articles/smtp_1.html
[smtp_2]: https://blog.osau.re/articles/smtp_2.html
[smtp_3]: https://blog.osau.re/articles/smtp_3.html
[oneffs]: https://github.com/robur-coop/oneffs
[albatross]: https://github.com/robur-coop/albatross
[git-kv]: https://github.com/robur-coop/git-kv
[primary-git]: https://github.com/robur-coop/dns-primary-git/
[contruno]: https://github.com/dinosaure/contruno
[pasteur]: https://github.com/dinosaure/pasteur
[unipi]: https://github.com/robur-coop/unipi
[sendmail]: https://github.com/mirage/colombe

@@ -1,480 +0,0 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>
Robur's blogRuntime arguments in MirageOS
</title>
<meta name="description" content="The history of runtime arguments to a MirageOS unikernel">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
<script src="https://blog.robur.coop/js/hl.js"></script>
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
</head>
<body>
<header>
<h1>blog.robur.coop</h1>
<blockquote>
The <strong>Robur</strong> cooperative blog.
</blockquote>
</header>
<main><a href="https://blog.robur.coop/index.html">Back to index</a>

<article>
<h1>Runtime arguments in MirageOS</h1>
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li></ul>
<p>TL;DR: Passing runtime arguments around is tricky, and prone to change every other month.</p>
<h2 id="motivation"><a class="anchor" aria-hidden="true" href="#motivation"></a>Motivation</h2>
<p>Sometimes, as a unikernel developer and also as an operator, it's nice to have
some runtime arguments passed to a unikernel. Now, if you're into OCaml,
command-line parsing (together with error messages, man page generation, ...)
can be done by the amazing <a href="https://erratique.ch/software/cmdliner">cmdliner</a>
package from Daniel Bünzli.</p>
<p>MirageOS uses cmdliner for command-line argument parsing. This has also given
us nice man pages for unikernels since the early days (see
<code>my-unikernel-binary --help</code>). There are two kinds
of arguments: those at configuration time (<code>mirage configure</code>), such as the
target to compile for, and those at runtime, when the unikernel is executed.</p>
<p>In Mirage 4.8.0 and 4.8.1 (released October 2024) there have been some changes
to command-line arguments, motivated by 4.5.0 (released April 2024)
and user feedback.</p>
<p>First of all, here is our current way to pass a custom runtime argument to a unikernel
(<code>unikernel.ml</code>):</p>
<pre><code class="language-OCaml">open Lwt.Infix
open Cmdliner

let hello =
  let doc = Arg.info ~doc:"How to say hello." [ "hello" ] in
  let term = Arg.(value & opt string "Hello World!" doc) in
  Mirage_runtime.register_arg term

module Hello (Time : Mirage_time.S) = struct
  let start _time =
    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
          Logs.info (fun f -> f "%s" (hello ()));
          Time.sleep_ns (Duration.of_sec 1) >>= fun () -> loop (n - 1)
    in
    loop 4
end
</code></pre>
<p>We define the <a href="https://erratique.ch/software/cmdliner/doc/Cmdliner/Term/index.html#type-t">Cmdliner.Term.t</a>
in line 6 (<code>let term = ..</code>), which provides the documentation ("How to say hello."), the option
name (<code>["hello"]</code>, which is translated to <code>--hello=</code>), and states that the argument is optional
and of type <code>string</code> (cmdliner allows you to convert the incoming strings to more
complex (or narrower) data types, with decent error handling).</p>
<p>The defined argument is passed directly to <a href="https://ocaml.org/p/mirage-runtime/4.8.1/doc/Mirage_runtime/index.html#val-register_arg"><code>Mirage_runtime.register_arg</code></a>
(in line 7), so our binding <code>hello</code> is of type <code>unit -> string</code>.
In line 14, the value of the runtime argument is used (<code>hello ()</code>) for printing
a log message.</p>
<p>The nice property is that it is all local in <code>unikernel.ml</code>: there are no other
parts involved. It is just a bunch of API calls. The downside is that <code>hello ()</code>
must only be evaluated after the function <code>start</code> has been called, since
<code>Mirage_runtime</code> first needs to parse and fill in the command-line arguments. If you
call <code>hello ()</code> earlier, you'll get an exception "Called too early. Please delay
this call to after the start function of the unikernel.". Also, since
Mirage_runtime needs to collect and evaluate the command-line arguments,
<code>Mirage_runtime.register_arg</code> may only be called at top-level, otherwise you'll
get another exception "The function register_arg was called to late. Please call
register_arg before the start function is executed (e.g. in a top-level binding).".</p>
<p>Another advantage of having it all in unikernel.ml is that adding and removing
arguments doesn't require another execution of <code>mirage configure</code>. Also, any
type from the unikernel's dependencies can be used, whereas config.ml is compiled only
with a small set of dependencies (mirage itself), and we don't want to impose a
large dependency cone on mirage just because someone may like to use
X509.Key_type.t as argument type.</p>
<p>Earlier, before mirage 4.5.0, we had runtime and configure arguments mixed
together, and code was generated when <code>mirage configure</code> was executed to
deal with these arguments. The downsides included: we needed serialization for
all command-line arguments (at configure time you could fill in the argument, which
was then serialized, deserialized at runtime, and used unless the argument
was provided explicitly); they had to appear in <code>config.ml</code> (which also meant
that changing any of them required an execution of <code>mirage configure</code>); and since they generated code,
potential errors were in code that the developer didn't write (though we had
some <code>__POS__</code> arguments to provide error locations in the developer's code).</p>
<p>Related recent changes are:</p>
<ul>
<li>in mirage 4.8.1, the runtime arguments to configure the OCaml runtime system
(such as GC settings, randomization of hashtables, recording of backtraces)
are now provided using the <a href="https://ocaml.org/p/cmdliner-stdlib">cmdliner-stdlib</a>
package.</li>
<li>in mirage 4.8.0, for git, dns-client, and happy-eyeballs devices the optional
arguments are generated by default, so they are always available and don't
need to be defined manually by the unikernel developer.</li>
</ul>
<p>Let's dive a bit deeper into the history.</p>
<h2 id="history"><a class="anchor" aria-hidden="true" href="#history"></a>History</h2>
<p>Since its early stages (I'll go back to 2.7.0 (February 2016), where
functoria was introduced), MirageOS used an embedded fork of <code>cmdliner</code> to handle command-line
arguments.</p>
<p><a href="https://asciinema.org/a/ruHoadi2oZGOzgzMKk5ZYoFgf"><img src="https://asciinema.org/a/ruHoadi2oZGOzgzMKk5ZYoFgf.svg" alt="Animated changes to the hello world unikernel" ></a></p>
<h3 id="february-2016-mirage-270"><a class="anchor" aria-hidden="true" href="#february-2016-mirage-270"></a>February 2016 (Mirage 2.7.0)</h3>
<p>When looking into the MirageOS 2.x series, here's the code for our hello world
unikernel:</p>
<p><code>config.ml</code></p>
<pre><code class="language-OCaml">open Mirage

let hello =
  let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
  Key.(create "hello" Arg.(opt string "Hello World!" doc))

let main =
  foreign
    ~keys:[Key.abstract hello]
    "Unikernel.Hello" (console @-> job)

let () = register "hello-key" [main $ default_console]
</code></pre>
<p>and <code>unikernel.ml</code></p>
<pre><code class="language-OCaml">open Lwt.Infix

module Hello (C: V1_LWT.CONSOLE) = struct
  let start c =
    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
          C.log c (Key_gen.hello ());
          OS.Time.sleep 1.0 >>= fun () ->
          loop (n-1)
    in
    loop 4
end
</code></pre>
<p>As you can see, the cmdliner term was provided in <code>config.ml</code>, and in
<code>unikernel.ml</code> the expression <code>Key_gen.hello ()</code> was used; <code>Key_gen</code> was
a module generated by the <code>mirage configure</code> invocation.</p>
<p>You can also see that the term was wrapped in <code>Key.create "hello"</code>, where
this string was used as the identifier for the code generation.</p>
<p>As mentioned above, any change needed to be made in <code>config.ml</code>, followed by
<code>mirage configure</code>, to take effect.</p>
<h3 id="july-2016-mirage-291"><a class="anchor" aria-hidden="true" href="#july-2016-mirage-291"></a>July 2016 (Mirage 2.9.1)</h3>
<p><code>OS.Time</code> was functorized: the unikernel now takes a <code>Time</code> functor argument:</p>
<p><code>config.ml</code></p>
<pre><code class="language-OCaml">open Mirage

let hello =
  let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
  Key.(create "hello" Arg.(opt string "Hello World!" doc))

let main =
  foreign
    ~keys:[Key.abstract hello]
    "Unikernel.Hello" (console @-> time @-> job)

let () = register "hello-key" [main $ default_console $ default_time]
</code></pre>
<p>and <code>unikernel.ml</code></p>
<pre><code class="language-OCaml">open Lwt.Infix

module Hello (C: V1_LWT.CONSOLE) (Time : V1_LWT.TIME) = struct
  let start c _time =
    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
          C.log c (Key_gen.hello ());
          Time.sleep 1.0 >>= fun () ->
          loop (n-1)
    in
    loop 4
end
</code></pre>
<h3 id="february-2017-mirage-pre3"><a class="anchor" aria-hidden="true" href="#february-2017-mirage-pre3"></a>February 2017 (Mirage pre3)</h3>
<p>The <code>Time</code> signature changed: the new <code>sleep_ns</code> function takes its duration in nanoseconds.
This avoids floating point numbers at the core of MirageOS. The helper package
<code>duration</code> is used to avoid manual conversions.</p>
<p>Also, the console signature changed, and <code>log</code> now returns an Lwt promise.</p>
<p><code>config.ml</code></p>
<pre><code class="language-OCaml">open Mirage

let hello =
  let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
  Key.(create "hello" Arg.(opt string "Hello World!" doc))

let main =
  foreign
    ~keys:[Key.abstract hello]
    ~packages:[package "duration"]
    "Unikernel.Hello" (console @-> time @-> job)

let () = register "hello-key" [main $ default_console $ default_time]
</code></pre>
<p>and <code>unikernel.ml</code></p>
<pre><code class="language-OCaml">open Lwt.Infix

module Hello (C: V1_LWT.CONSOLE) (Time : V1_LWT.TIME) = struct
  let start c _time =
    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
          C.log c (Key_gen.hello ()) >>= fun () ->
          Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
          loop (n-1)
    in
    loop 4
end
</code></pre>
<h3 id="february-2017-mirage-3"><a class="anchor" aria-hidden="true" href="#february-2017-mirage-3"></a>February 2017 (Mirage 3)</h3>
<p>Another big change is that the console is not used anymore; instead we use
<a href="https://erratique.ch/software/logs">logs</a>.</p>
<p><code>config.ml</code></p>
<pre><code class="language-OCaml">open Mirage

let hello =
  let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
  Key.(create "hello" Arg.(opt string "Hello World!" doc))

let main =
  foreign
    ~keys:[Key.abstract hello]
    ~packages:[package "duration"]
    "Unikernel.Hello" (time @-> job)

let () = register "hello-key" [main $ default_time]
</code></pre>
<p>and <code>unikernel.ml</code></p>
<pre><code class="language-OCaml">open Lwt.Infix

module Hello (Time : Mirage_time_lwt.S) = struct
  let start _time =
    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
          Logs.info (fun f -> f "%s" (Key_gen.hello ()));
          Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
          loop (n-1)
    in
    loop 4
end
</code></pre>
<h3 id="january-2020-mirage-370"><a class="anchor" aria-hidden="true" href="#january-2020-mirage-370"></a>January 2020 (Mirage 3.7.0)</h3>
<p>The <code>_lwt</code> suffix is dropped from the interfaces. We used to have Mirage_time and
Mirage_time_lwt, where the latter instantiated the former with concrete
types (<code>type 'a io = Lwt.t</code> and <code>type buffer = Cstruct.t</code>). In a cleanup
session we dropped the <code>_lwt</code> interfaces and opam packages. The reasoning was
that when we get around to moving to another IO system, we'll move everything
at once anyway. There is no need to have <code>lwt</code> and something else (<code>async</code>, or nowadays
<code>miou</code> or <code>eio</code>) in a single unikernel.</p>
<p><code>config.ml</code></p>
<pre><code class="language-OCaml">open Mirage

let hello =
  let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
  Key.(create "hello" Arg.(opt string "Hello World!" doc))

let main =
  foreign
    ~keys:[Key.abstract hello]
    ~packages:[package "duration"]
    "Unikernel.Hello" (time @-> job)

let () = register "hello-key" [main $ default_time]
</code></pre>
<p>and <code>unikernel.ml</code></p>
<pre><code class="language-OCaml">open Lwt.Infix

module Hello (Time : Mirage_time.S) = struct
  let start _time =
    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
          Logs.info (fun f -> f "%s" (Key_gen.hello ()));
          Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
          loop (n-1)
    in
    loop 4
end
</code></pre>
<h3 id="october-2021-mirage-310"><a class="anchor" aria-hidden="true" href="#october-2021-mirage-310"></a>October 2021 (Mirage 3.10)</h3>
<p>Some renamings to fix warnings. Only <code>config.ml</code> changed.</p>
<p><code>config.ml</code></p>
<pre><code class="language-OCaml">open Mirage

let hello =
  let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
  Key.(create "hello" Arg.(opt string "Hello World!" doc))

let main =
  main
    ~keys:[key hello]
    ~packages:[package "duration"]
    "Unikernel.Hello" (time @-> job)

let () = register "hello-key" [main $ default_time]
</code></pre>
<p>and <code>unikernel.ml</code></p>
<pre><code class="language-OCaml">open Lwt.Infix

module Hello (Time : Mirage_time.S) = struct
  let start _time =
    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
          Logs.info (fun f -> f "%s" (Key_gen.hello ()));
          Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
          loop (n-1)
    in
    loop 4
end
</code></pre>
<h3 id="june-2023-mirage-44"><a class="anchor" aria-hidden="true" href="#june-2023-mirage-44"></a>June 2023 (Mirage 4.4)</h3>
<p>The argument was moved to runtime.</p>
<p><code>config.ml</code></p>
<pre><code class="language-OCaml">open Mirage

let hello =
  let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
  Key.(create "hello" Arg.(opt ~stage:`Run string "Hello World!" doc))

let main =
  main
    ~keys:[key hello]
    ~packages:[package "duration"]
    "Unikernel.Hello" (time @-> job)

let () = register "hello-key" [main $ default_time]
</code></pre>
<p>and <code>unikernel.ml</code></p>
<pre><code class="language-OCaml">open Lwt.Infix

module Hello (Time : Mirage_time.S) = struct
  let start _time =
    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
          Logs.info (fun f -> f "%s" (Key_gen.hello ()));
          Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
          loop (n-1)
    in
    loop 4
end
</code></pre>
<h3 id="march-2024-mirage-45"><a class="anchor" aria-hidden="true" href="#march-2024-mirage-45"></a>March 2024 (Mirage 4.5)</h3>
<p>The runtime argument is declared in <code>config.ml</code>, referring to the argument by name as a string
("Unikernel.hello"), and its value is passed to the <code>start</code> function as an argument.</p>
<p><code>config.ml</code></p>
<pre><code class="language-OCaml">open Mirage

let runtime_args = [ runtime_arg ~pos:__POS__ "Unikernel.hello" ]

let main =
  main
    ~runtime_args
    ~packages:[package "duration"]
    "Unikernel.Hello" (time @-> job)

let () = register "hello-key" [main $ default_time]
</code></pre>
<p>and <code>unikernel.ml</code></p>
<pre><code class="language-OCaml">open Lwt.Infix
open Cmdliner

let hello =
  let doc = Arg.info ~doc:"How to say hello." [ "hello" ] in
  Arg.(value & opt string "Hello World!" doc)

module Hello (Time : Mirage_time.S) = struct
  let start _time hello =
    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
          Logs.info (fun f -> f "%s" hello);
          Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
          loop (n-1)
    in
    loop 4
end
</code></pre>
<h3 id="october-2024-mirage-48"><a class="anchor" aria-hidden="true" href="#october-2024-mirage-48"></a>October 2024 (Mirage 4.8)</h3>
<p>The argument moved out of <code>config.ml</code> again.</p>
<p><code>config.ml</code></p>
<pre><code class="language-OCaml">open Mirage

let main =
  main
    ~packages:[package "duration"]
    "Unikernel.Hello" (time @-> job)

let () = register "hello-key" [main $ default_time]
</code></pre>
<p>and <code>unikernel.ml</code></p>
<pre><code class="language-OCaml">open Lwt.Infix
open Cmdliner

let hello =
  let doc = Arg.info ~doc:"How to say hello." [ "hello" ] in
  Mirage_runtime.register_arg Arg.(value & opt string "Hello World!" doc)

module Hello (Time : Mirage_time.S) = struct
  let start _time =
    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
          Logs.info (fun f -> f "%s" (hello ()));
          Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
          loop (n-1)
    in
    loop 4
end
</code></pre>
<h3 id="2024-not-yet-released"><a class="anchor" aria-hidden="true" href="#2024-not-yet-released"></a>2024 (Not yet released)</h3>
<p>This is the future, with time defunctorized. Read more in the <a href="https://github.com/mirage/mirage/issues/1513">discussion</a>.
To delay the start function, a <code>dep</code> on <code>noop</code> is introduced.</p>
<p><code>config.ml</code></p>
<pre><code class="language-OCaml">open Mirage

let main =
  main
    ~packages:[package "duration"]
    ~dep:[dep noop]
    "Unikernel" job

let () = register "hello-key" [main]
</code></pre>
<p>and <code>unikernel.ml</code></p>
<pre><code class="language-OCaml">open Lwt.Infix
open Cmdliner

let hello =
  let doc = Arg.info ~doc:"How to say hello." [ "hello" ] in
  Mirage_runtime.register_arg Arg.(value & opt string "Hello World!" doc)

let start () =
  let rec loop = function
    | 0 -> Lwt.return_unit
    | n ->
        Logs.info (fun f -> f "%s" (hello ()));
        Mirage_timer.sleep_ns (Duration.of_sec 1) >>= fun () ->
        loop (n-1)
  in
  loop 4
</code></pre>
<h2 id="conclusion"><a class="anchor" aria-hidden="true" href="#conclusion"></a>Conclusion</h2>
<p>The history of hello world shows that over time we slowly improve the developer
experience and remove the boilerplate needed to get MirageOS unikernels up
and running. This is the result of work over a decade, including lots of other (here invisible)
improvements to the mirage utility.</p>
<p>Our current goal is to minimize the code generated by mirage, since code
generation has lots of issues (e.g. error locations, naming, binary size). It
is a long journey. At the same time, we are working on improving the performance
of MirageOS unikernels, developing unikernels that are useful in the real
world (<a href="https://github.com/robur-coop/miragevpn">VPN endpoint</a>, <a href="https://github.com/robur-coop/dnsvizor">DNSmasq replacement</a>, ...), and also <a href="https://github.com/robur-coop/mollymawk">simplifying the
deployment of MirageOS unikernels</a>.</p>
<p>If you're interested in MirageOS and using it in your domain, don't hesitate
to reach out to us (via eMail: team@robur.coop): we're keen to deploy MirageOS
and find more domains where it is useful. If you can spare a dime, we're a
registered non-profit in Germany, and can provide tax-deductible receipts for
donations (<a href="https://robur.coop/Donate">more information</a>).</p>

</article>

</main>
<footer>
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
<br />
</footer>
<script>hljs.highlightAll();</script>
</body>
</html>
538
articles/arguments.md
Normal file
@@ -0,0 +1,538 @@
---
date: 2024-10-22
title: Runtime arguments in MirageOS
description:
  The history of runtime arguments to a MirageOS unikernel
tags:
  - OCaml
  - MirageOS
author:
  name: Hannes Mehnert
  email: hannes@mehnert.org
  link: https://hannes.robur.coop
---

TL;DR: Passing runtime arguments around is tricky, and prone to change every other month.
|
||||
|
||||
## Motivation
|
||||
|
||||
Sometimes, as an unikernel developer and also as operator, it's nice to have
|
||||
some runtime arguments passed to an unikernel. Now, if you're into OCaml,
|
||||
command-line parsing - together with error messages, man page generation, ... -
|
||||
can be done by the amazing [cmdliner](https://erratique.ch/software/cmdliner)
|
||||
package from Daniel Bünzli.
|
||||
|
||||
MirageOS uses cmdliner for command-line argument passing. This has enabled
|
||||
us, since the early days, to have nice man pages for unikernels (see
|
||||
`my-unikernel-binary --help`). There are two kinds
|
||||
of arguments: those at configuration time (`mirage configure`), such as the
|
||||
target to compile for, and those at runtime - when the unikernel is executed.
|
||||
|
||||
In Mirage 4.8.0 and 4.8.1 (released October 2024), there have been some changes
|
||||
to command-line arguments, which were motivated by 4.5.0 (released April 2024)
|
||||
and user feedback.
|
||||
|
||||
First of all, here is the current way to pass a custom runtime argument to a unikernel
|
||||
(`unikernel.ml`):
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
open Cmdliner
|
||||
|
||||
let hello =
|
||||
let doc = Arg.info ~doc:"How to say hello." [ "hello" ] in
|
||||
let term = Arg.(value & opt string "Hello World!" doc) in
|
||||
Mirage_runtime.register_arg term
|
||||
|
||||
module Hello (Time : Mirage_time.S) = struct
|
||||
let start _time =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
Logs.info (fun f -> f "%s" (hello ()));
|
||||
Time.sleep_ns (Duration.of_sec 1) >>= fun () -> loop (n - 1)
|
||||
in
|
||||
loop 4
|
||||
end
|
||||
```
|
||||
|
||||
We define the [Cmdliner.Term.t](https://erratique.ch/software/cmdliner/doc/Cmdliner/Term/index.html#type-t)
|
||||
in line 6 (`let term = ..`) - which provides documentation ("How to say hello."), the option to
|
||||
use (`["hello"]` - which is then translated to `--hello=`), that it is optional,
|
||||
of type `string` (cmdliner allows you to convert the incoming strings to more
|
||||
complex (or more narrow) data types, with decent error handling).
|
||||
|
||||
The defined argument is passed directly to [`Mirage_runtime.register_arg`](https://ocaml.org/p/mirage-runtime/4.8.1/doc/Mirage_runtime/index.html#val-register_arg)
|
||||
(in line 7), so our binding `hello` is of type `unit -> string`.
|
||||
In line 14, the value of the runtime argument is used (`hello ()`) for printing
|
||||
a log message.
|
||||
|
||||
The nice property is that it is all local in `unikernel.ml`; there are no other
|
||||
parts involved. It is just a bunch of API calls. The downside is that `hello ()`
|
||||
should only be evaluated after the function `start` has been called, since the
|
||||
`Mirage_runtime` needs to parse and fill in the command line arguments. If you
|
||||
call `hello ()` earlier, you'll get an exception "Called too early. Please delay
|
||||
this call to after the start function of the unikernel.". Also, since
|
||||
`Mirage_runtime` needs to collect and evaluate the command-line arguments, the
|
||||
function `Mirage_runtime.register_arg` may only be called at the top level; otherwise you'll
|
||||
get another exception "The function register_arg was called to late. Please call
|
||||
register_arg before the start function is executed (e.g. in a top-level binding).".
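To illustrate why both restrictions exist, here is a toy model of the two phases that uses only the standard library. It is not how `Mirage_runtime` is implemented. The `Registry` module, its functions and its error messages are made up for illustration. Arguments are registered at the top level, the command line is parsed once before `start`, and values can only be read afterwards.

```ocaml
(* Toy model, NOT the Mirage_runtime implementation: registration happens at
   module initialisation (top level), parsing happens once before [start],
   and the returned thunks may only be forced after parsing. *)
module Registry : sig
  val register : name:string -> default:string -> unit -> string
  val parse : string array -> unit
end = struct
  let parsed = ref false
  let table : (string, string ref) Hashtbl.t = Hashtbl.create 8

  let register ~name ~default =
    (* Registering after parsing would mean the value is never filled in. *)
    if !parsed then failwith "register called too late";
    let cell = ref default in
    Hashtbl.replace table name cell;
    fun () ->
      (* Before parsing, only the default is known: refuse to answer. *)
      if not !parsed then failwith "Called too early";
      !cell

  (* Only "--name=value" is understood here, to keep the model small. *)
  let parse argv =
    Array.iter
      (fun arg ->
        match String.index_opt arg '=' with
        | Some i when String.length arg > 2 && String.sub arg 0 2 = "--" ->
            let name = String.sub arg 2 (i - 2) in
            let value = String.sub arg (i + 1) (String.length arg - i - 1) in
            (match Hashtbl.find_opt table name with
             | Some cell -> cell := value
             | None -> ())
        | _ -> ())
      argv;
    parsed := true
end

(* Top-level registration, like the [hello] binding in unikernel.ml. *)
let hello = Registry.register ~name:"hello" ~default:"Hello World!"
```

In this model, top-level registration works because OCaml evaluates all top-level bindings of a module when it is initialised. That happens before any code calls `Registry.parse`.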
|
||||
|
||||
Another advantage of having it all in `unikernel.ml` is that adding and removing
|
||||
arguments doesn't require another execution of `mirage configure`. Also, any
|
||||
type from the unikernel's dependencies can be used - `config.ml` is compiled only
|
||||
with a small set of dependencies (mirage itself) - and we don't want to impose a
|
||||
large dependency cone on mirage just because someone may want to use
|
||||
`X509.Key_type.t` as an argument type.
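Here is a sketch of such a narrower argument type, assuming cmdliner 1.1 or later. Instead of a free-form `string`, the greeting is a small variant type, and `Arg.enum` rejects any other value with an error message. The `greeting` type, the `--greeting` option and its values are hypothetical. In `unikernel.ml`, the term would be passed to `Mirage_runtime.register_arg`, as in the first example of this article.

```ocaml
open Cmdliner

(* Hypothetical narrower type for the greeting, instead of a free-form string. *)
type greeting = Hello | Moin | Servus

let to_string = function
  | Hello -> "Hello World!"
  | Moin -> "Moin!"
  | Servus -> "Servus!"

(* [Arg.enum] only accepts the listed strings; anything else is a parse
   error reported by cmdliner. *)
let greeting_term : greeting Term.t =
  let doc = Arg.info ~doc:"Which greeting to use." [ "greeting" ] in
  let conv = Arg.enum [ ("hello", Hello); ("moin", Moin); ("servus", Servus) ] in
  Arg.value (Arg.opt conv Hello doc)

(* In unikernel.ml this would become (hypothetical binding):
   let greeting = Mirage_runtime.register_arg greeting_term *)

(* Stand-alone check of the term, outside of any unikernel. *)
let parse argv =
  match Cmd.eval_value ~argv (Cmd.v (Cmd.info "hello") greeting_term) with
  | Ok (`Ok g) -> Some g
  | Ok (`Help | `Version) | Error _ -> None
```

The unikernel code never sees an invalid value: cmdliner rejects it at startup and reports the error.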
|
||||
|
||||
Earlier, before mirage 4.5.0, we had runtime and configure arguments mixed
|
||||
together, and code was generated when `mirage configure` was executed to
|
||||
deal with these arguments. The downsides included: we needed serialization for
|
||||
all command-line arguments (at configure time you could fill the argument, which
|
||||
was then serialized, and deserialized at runtime and used unless the argument
|
||||
was provided explicitly), they had to appear in `config.ml` (which also means
|
||||
changing any of them required re-running `mirage configure`), and since code was generated,
|
||||
potential errors were in code that the developer didn't write (though we had
|
||||
some `__POS__` arguments to provide error locations in the developer code).
|
||||
|
||||
Related recent changes are:
|
||||
- in mirage 4.8.1, the runtime arguments to configure the OCaml runtime system
|
||||
(such as GC settings, randomization of hashtables, recording of backtraces)
|
||||
are now provided using the [cmdliner-stdlib](https://ocaml.org/p/cmdliner-stdlib)
|
||||
package.
|
||||
- in mirage 4.8.0, for git, dns-client, and happy-eyeballs devices the optional
|
||||
arguments are generated by default - so they are always available and don't
|
||||
need to be added manually by the unikernel developer.
|
||||
|
||||
Let's dive a bit deeper into the history.
|
||||
|
||||
## History
|
||||
|
||||
MirageOS has, since the early stages (I'll go back to 2.7.0 (February 2016), where
|
||||
functoria was introduced), used an embedded fork of `cmdliner` to handle command
|
||||
line arguments.
|
||||
|
||||
[![Animated changes to the hello world unikernel](https://asciinema.org/a/ruHoadi2oZGOzgzMKk5ZYoFgf.svg)](https://asciinema.org/a/ruHoadi2oZGOzgzMKk5ZYoFgf)
|
||||
|
||||
### February 2016 (Mirage 2.7.0)
|
||||
|
||||
When looking into the MirageOS 2.x series, here's the code for our hello world
|
||||
unikernel:
|
||||
|
||||
`config.ml`
|
||||
```OCaml
|
||||
open Mirage
|
||||
|
||||
let hello =
|
||||
let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
|
||||
Key.(create "hello" Arg.(opt string "Hello World!" doc))
|
||||
|
||||
let main =
|
||||
foreign
|
||||
~keys:[Key.abstract hello]
|
||||
"Unikernel.Hello" (console @-> job)
|
||||
|
||||
let () = register "hello-key" [main $ default_console]
|
||||
```
|
||||
|
||||
and `unikernel.ml`
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
|
||||
module Hello (C: V1_LWT.CONSOLE) = struct
|
||||
let start c =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
C.log c (Key_gen.hello ());
|
||||
OS.Time.sleep 1.0 >>= fun () ->
|
||||
loop (n-1)
|
||||
in
|
||||
loop 4
|
||||
end
|
||||
```
|
||||
|
||||
As you can see, the cmdliner term was provided in `config.ml`, and in
|
||||
`unikernel.ml` the expression `Key_gen.hello ()` was used - `Key_gen` was
|
||||
a module generated by the `mirage configure` invocation.
|
||||
|
||||
You can also see that the term was wrapped in `Key.create "hello"` - where
|
||||
this string was used as the identifier for the code generation.
|
||||
|
||||
As mentioned above, any change had to be made in `config.ml`, followed by a
|
||||
`mirage configure` run, to take effect.
|
||||
|
||||
### July 2016 (Mirage 2.9.1)
|
||||
|
||||
The `OS.Time` module was replaced by a `Time` functor argument:
|
||||
|
||||
`config.ml`
|
||||
```OCaml
|
||||
open Mirage
|
||||
|
||||
let hello =
|
||||
let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
|
||||
Key.(create "hello" Arg.(opt string "Hello World!" doc))
|
||||
|
||||
let main =
|
||||
foreign
|
||||
~keys:[Key.abstract hello]
|
||||
"Unikernel.Hello" (console @-> time @-> job)
|
||||
|
||||
let () = register "hello-key" [main $ default_console $ default_time]
|
||||
```
|
||||
|
||||
and `unikernel.ml`
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
|
||||
module Hello (C: V1_LWT.CONSOLE) (Time : V1_LWT.TIME) = struct
|
||||
let start c _time =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
C.log c (Key_gen.hello ());
|
||||
Time.sleep 1.0 >>= fun () ->
|
||||
loop (n-1)
|
||||
in
|
||||
loop 4
|
||||
end
|
||||
```
|
||||
|
||||
### February 2017 (Mirage pre3)
|
||||
|
||||
The `Time` signature changed: the `sleep_ns` function now takes a duration in nanoseconds.
|
||||
This avoids floating point numbers at the core of MirageOS. The helper package
|
||||
`duration` is used to avoid manual conversions.
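Here is a minimal illustration of the idea using only the standard library. Durations are plain `int64` nanosecond counts, so sums stay exact. The `of_ms`/`of_sec` helpers below are hand-written stand-ins shaped like the `Duration.of_sec` used in the examples (an `int` in, an `int64` of nanoseconds out). They are not the `duration` package itself.

```ocaml
(* Hand-written stand-ins, not the duration package: a duration is an
   int64 number of nanoseconds. *)
let ns_per_ms = 1_000_000L
let ns_per_sec = 1_000_000_000L

let of_ms ms = Int64.mul (Int64.of_int ms) ns_per_ms
let of_sec s = Int64.mul (Int64.of_int s) ns_per_sec

(* Integer arithmetic stays exact: 1.5 s is written as 1 s + 500 ms,
   with no rounding as there could be with floating-point seconds. *)
let one_and_a_half_sec = Int64.add (of_sec 1) (of_ms 500)
```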
|
||||
|
||||
Also, the console signature changed - and `log` is now inside the Lwt monad.
|
||||
|
||||
`config.ml`
|
||||
```OCaml
|
||||
open Mirage
|
||||
|
||||
let hello =
|
||||
let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
|
||||
Key.(create "hello" Arg.(opt string "Hello World!" doc))
|
||||
|
||||
let main =
|
||||
foreign
|
||||
~keys:[Key.abstract hello]
|
||||
~packages:[package "duration"]
|
||||
"Unikernel.Hello" (console @-> time @-> job)
|
||||
|
||||
let () = register "hello-key" [main $ default_console $ default_time]
|
||||
```
|
||||
|
||||
and `unikernel.ml`
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
|
||||
module Hello (C: V1_LWT.CONSOLE) (Time : V1_LWT.TIME) = struct
|
||||
let start c _time =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
C.log c (Key_gen.hello ()) >>= fun () ->
|
||||
Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
|
||||
loop (n-1)
|
||||
in
|
||||
loop 4
|
||||
end
|
||||
```
|
||||
|
||||
### February 2017 (Mirage 3)
|
||||
|
||||
Another big change is that the console is no longer used for logging; instead we use
|
||||
[logs](https://erratique.ch/software/logs).
|
||||
|
||||
`config.ml`
|
||||
```OCaml
|
||||
open Mirage
|
||||
|
||||
let hello =
|
||||
let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
|
||||
Key.(create "hello" Arg.(opt string "Hello World!" doc))
|
||||
|
||||
let main =
|
||||
foreign
|
||||
~keys:[Key.abstract hello]
|
||||
~packages:[package "duration"]
|
||||
"Unikernel.Hello" (time @-> job)
|
||||
|
||||
let () = register "hello-key" [main $ default_time]
|
||||
```
|
||||
|
||||
and `unikernel.ml`
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
|
||||
module Hello (Time : Mirage_time_lwt.S) = struct
|
||||
let start _time =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
Logs.info (fun f -> f "%s" (Key_gen.hello ()));
|
||||
Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
|
||||
loop (n-1)
|
||||
in
|
||||
loop 4
|
||||
end
|
||||
```
|
||||
|
||||
### January 2020 (Mirage 3.7.0)
|
||||
|
||||
The `_lwt` suffix is dropped from the interfaces. We used to have `Mirage_time` and
|
||||
`Mirage_time_lwt`, where the latter instantiated the former with concrete
|
||||
types: `type 'a io = 'a Lwt.t` and `type buffer = Cstruct.t`. In a cleanup
|
||||
session, we dropped the `_lwt` interfaces and opam packages. The reasoning was
|
||||
that when we get around to moving to another IO system, we'll move everything
|
||||
at once anyway. There is no need to have `lwt` and something else (`async`, or nowadays
|
||||
`miou` or `eio`) in a single unikernel.
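The relationship between the two kinds of interfaces can be sketched with a `with type` constraint. The signature below is simplified and hypothetical (only `sleep_ns`). A trivial synchronous type stands in for `Lwt.t` so that the sketch runs with the standard library alone. It shows the kind of instantiation described above, not the actual `Mirage_time_lwt` source.

```ocaml
(* Simplified, hypothetical generic signature: the IO type is abstract. *)
module type TIME = sig
  type 'a io
  val sleep_ns : int64 -> unit io
end

(* Stand-in for Lwt.t: a "synchronous IO" that is just the value itself. *)
type 'a sync = 'a

(* The "_lwt"-style variant only fixes the abstract type to a concrete one. *)
module type TIME_SYNC = TIME with type 'a io = 'a sync

(* Records the requested sleeps instead of sleeping, for demonstration. *)
let total_ns = ref 0L

module Fake_time : TIME_SYNC = struct
  type 'a io = 'a sync
  let sleep_ns ns = total_ns := Int64.add !total_ns ns
end

(* Because TIME_SYNC exposes the IO type, callers know the result is unit;
   under the abstract TIME signature this binding would not type-check. *)
let () = Fake_time.sleep_ns 1_000_000_000L
```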
|
||||
|
||||
`config.ml`
|
||||
```OCaml
|
||||
open Mirage
|
||||
|
||||
let hello =
|
||||
let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
|
||||
Key.(create "hello" Arg.(opt string "Hello World!" doc))
|
||||
|
||||
let main =
|
||||
foreign
|
||||
~keys:[Key.abstract hello]
|
||||
~packages:[package "duration"]
|
||||
"Unikernel.Hello" (time @-> job)
|
||||
|
||||
let () = register "hello-key" [main $ default_time]
|
||||
```
|
||||
|
||||
and `unikernel.ml`
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
|
||||
module Hello (Time : Mirage_time.S) = struct
|
||||
let start _time =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
Logs.info (fun f -> f "%s" (Key_gen.hello ()));
|
||||
Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
|
||||
loop (n-1)
|
||||
in
|
||||
loop 4
|
||||
end
|
||||
```
|
||||
|
||||
### October 2021 (Mirage 3.10)
|
||||
|
||||
Some renamings to fix warnings. Only `config.ml` changed.
|
||||
|
||||
`config.ml`
|
||||
```OCaml
|
||||
open Mirage
|
||||
|
||||
let hello =
|
||||
let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
|
||||
Key.(create "hello" Arg.(opt string "Hello World!" doc))
|
||||
|
||||
let main =
|
||||
main
|
||||
~keys:[key hello]
|
||||
~packages:[package "duration"]
|
||||
"Unikernel.Hello" (time @-> job)
|
||||
|
||||
let () = register "hello-key" [main $ default_time]
|
||||
```
|
||||
|
||||
and `unikernel.ml`
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
|
||||
module Hello (Time : Mirage_time.S) = struct
|
||||
let start _time =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
Logs.info (fun f -> f "%s" (Key_gen.hello ()));
|
||||
Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
|
||||
loop (n-1)
|
||||
in
|
||||
loop 4
|
||||
end
|
||||
```
|
||||
|
||||
### June 2023 (Mirage 4.4)
|
||||
|
||||
The argument was moved to the runtime stage.
|
||||
|
||||
`config.ml`
|
||||
```OCaml
|
||||
open Mirage
|
||||
|
||||
let hello =
|
||||
let doc = Key.Arg.info ~doc:"How to say hello." ["hello"] in
|
||||
Key.(create "hello" Arg.(opt ~stage:`Run string "Hello World!" doc))
|
||||
|
||||
let main =
|
||||
main
|
||||
~keys:[key hello]
|
||||
~packages:[package "duration"]
|
||||
"Unikernel.Hello" (time @-> job)
|
||||
|
||||
let () = register "hello-key" [main $ default_time]
|
||||
```
|
||||
|
||||
and `unikernel.ml`
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
|
||||
module Hello (Time : Mirage_time.S) = struct
|
||||
let start _time =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
Logs.info (fun f -> f "%s" (Key_gen.hello ()));
|
||||
Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
|
||||
loop (n-1)
|
||||
in
|
||||
loop 4
|
||||
end
|
||||
```
|
||||
|
||||
### March 2024 (Mirage 4.5)
|
||||
|
||||
The runtime argument is declared in `config.ml`, referring to the argument as a string
|
||||
(`"Unikernel.hello"`), and is passed to the `start` function as an argument.
|
||||
|
||||
`config.ml`
|
||||
```OCaml
|
||||
open Mirage
|
||||
|
||||
let runtime_args = [ runtime_arg ~pos:__POS__ "Unikernel.hello" ]
|
||||
|
||||
let main =
|
||||
main
|
||||
~runtime_args
|
||||
~packages:[package "duration"]
|
||||
"Unikernel.Hello" (time @-> job)
|
||||
|
||||
let () = register "hello-key" [main $ default_time]
|
||||
```
|
||||
|
||||
and `unikernel.ml`
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
open Cmdliner
|
||||
|
||||
let hello =
|
||||
let doc = Arg.info ~doc:"How to say hello." [ "hello" ] in
|
||||
Arg.(value & opt string "Hello World!" doc)
|
||||
|
||||
module Hello (Time : Mirage_time.S) = struct
|
||||
let start _time hello =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
Logs.info (fun f -> f "%s" hello);
|
||||
Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
|
||||
loop (n-1)
|
||||
in
|
||||
loop 4
|
||||
end
|
||||
```
|
||||
|
||||
### October 2024 (Mirage 4.8)
|
||||
|
||||
Again, the argument moved out of `config.ml`, this time into `unikernel.ml`.
|
||||
|
||||
`config.ml`
|
||||
```OCaml
|
||||
open Mirage
|
||||
|
||||
let main =
|
||||
main
|
||||
~packages:[package "duration"]
|
||||
"Unikernel.Hello" (time @-> job)
|
||||
|
||||
let () = register "hello-key" [main $ default_time]
|
||||
```
|
||||
|
||||
and `unikernel.ml`
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
open Cmdliner
|
||||
|
||||
let hello =
|
||||
let doc = Arg.info ~doc:"How to say hello." [ "hello" ] in
|
||||
Mirage_runtime.register_arg Arg.(value & opt string "Hello World!" doc)
|
||||
|
||||
module Hello (Time : Mirage_time.S) = struct
|
||||
let start _time =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
Logs.info (fun f -> f "%s" (hello ()));
|
||||
Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
|
||||
loop (n-1)
|
||||
in
|
||||
loop 4
|
||||
end
|
||||
```
|
||||
|
||||
### 2024 (Not yet released)
|
||||
|
||||
This is the future with time defunctorized. Read more in the [discussion](https://github.com/mirage/mirage/issues/1513).
|
||||
To delay the start function, a `dep` of `noop` is introduced.
|
||||
|
||||
`config.ml`
|
||||
```OCaml
|
||||
open Mirage
|
||||
|
||||
let main =
|
||||
main
|
||||
~packages:[package "duration"]
|
||||
~deps:[dep noop]
|
||||
"Unikernel" job
|
||||
|
||||
let () = register "hello-key" [main]
|
||||
```
|
||||
|
||||
and `unikernel.ml`
|
||||
```OCaml
|
||||
open Lwt.Infix
|
||||
open Cmdliner
|
||||
|
||||
let hello =
|
||||
let doc = Arg.info ~doc:"How to say hello." [ "hello" ] in
|
||||
Mirage_runtime.register_arg Arg.(value & opt string "Hello World!" doc)
|
||||
|
||||
let start () =
|
||||
let rec loop = function
|
||||
| 0 -> Lwt.return_unit
|
||||
| n ->
|
||||
Logs.info (fun f -> f "%s" (hello ()));
|
||||
Mirage_timer.sleep_ns (Duration.of_sec 1) >>= fun () ->
|
||||
loop (n-1)
|
||||
in
|
||||
loop 4
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
|
||||
The history of hello world shows that, over time, we have slowly improved the developer
|
||||
experience and removed the boilerplate needed to get MirageOS unikernels up
|
||||
and running. This is work spanning a decade, including lots of other (here invisible)
|
||||
improvements to the mirage utility.
|
||||
|
||||
Our current goal is to minimize the code generated by mirage, since code
|
||||
generation has lots of issues (e.g. error locations, naming, binary size). It
|
||||
is a long journey. At the same time, we are working on improving the performance
|
||||
of MirageOS unikernels, developing unikernels that are useful in the real
|
||||
world ([VPN endpoint](https://github.com/robur-coop/miragevpn), [DNSmasq replacement](https://github.com/robur-coop/dnsvizor), ...), and also [simplifying the
|
||||
deployment of MirageOS unikernels](https://github.com/robur-coop/mollymawk).
|
||||
|
||||
If you're interested in MirageOS and using it in your domain, don't hesitate
|
||||
to reach out to us (via eMail: team@robur.coop) - we're keen to deploy MirageOS
|
||||
and find more domains where it is useful. If you can spare a dime, we're a
|
||||
registered non-profit in Germany - and can provide tax-deductible receipts for
|
||||
donations ([more information](https://robur.coop/Donate)).
|
|
@ -1,116 +0,0 @@
|
|||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="x-ua-compatible" content="ie=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<title>
|
||||
Robur's blogMeet DNSvizor: run your own DHCP and DNS MirageOS unikernel
|
||||
</title>
|
||||
<meta name="description" content="The NGI-funded DNSvizor provides core network services on your network; DNS resolution and DHCP.">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
|
||||
<script src="https://blog.robur.coop/js/hl.js"></script>
|
||||
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
|
||||
</head>
|
||||
<body>
|
||||
<header>
|
||||
<h1>blog.robur.coop</h1>
|
||||
<blockquote>
|
||||
The <strong>Robur</strong> cooperative blog.
|
||||
</blockquote>
|
||||
</header>
|
||||
<main><a href="https://blog.robur.coop/index.html">Back to index</a>
|
||||
|
||||
<article>
|
||||
<h1>Meet DNSvizor: run your own DHCP and DNS MirageOS unikernel</h1>
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-DNSvizor">DNSvizor</a></li></ul><p>TL;DR: We got <a href="https://nlnet.nl/entrust/">NGI0 Entrust (via NLnet)</a> funding for developing
|
||||
<a href="https://nlnet.nl/project/DNSvizor/">DNSvizor</a> - a DNS resolver and
|
||||
DHCP server. Please help us by <a href="https://github.com/robur-coop/dnsvizor/issues/new">sharing with us your dnsmasq
|
||||
configuration</a>, so we can
|
||||
prioritize the configuration options to support.</p>
|
||||
<h2 id="introduction"><a class="anchor" aria-hidden="true" href="#introduction"></a>Introduction</h2>
|
||||
<p>The <a href="https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol">dynamic host configuration protocol (DHCP)</a>
|
||||
is fundamental in today's Internet and local networks. It usually runs on your
|
||||
router (or as a dedicated independent service) and automatically configures
|
||||
computers that join your network (for example wireless laptops, smartphones)
|
||||
with an IP address, routing information, a DNS resolver, etc. No manual
|
||||
configuration is needed once your friend's smartphone has the password of your
|
||||
wireless network \o/</p>
|
||||
<p>The <a href="https://en.wikipedia.org/wiki/Domain_Name_System">domain name system (DNS)</a>
|
||||
is responsible for translating domain names (such as "robur.coop", "nlnet.nl")
|
||||
to IP addresses (such as 193.30.40.138 or 2a0f:7cc7:7cc7:7c40::138) - used by
|
||||
computers to talk to each other. Humans can remember domain names instead of
|
||||
memorizing IP addresses. Computers then use DNS to translate these domain names
|
||||
to IP addresses to communicate with. DNS is a hierarchic, distributed,
|
||||
fault-tolerant service.</p>
|
||||
<p>These two protocols are fundamental to today's Internet: without them it would
|
||||
be much harder for humans to use it.</p>
|
||||
<h2 id="dnsvizor"><a class="anchor" aria-hidden="true" href="#dnsvizor"></a>DNSvizor</h2>
|
||||
<p>We at <a href="https://robur.coop">robur</a> got funding (from
|
||||
<a href="https://nlnet.nl/project/DNSvizor/">NGI0 Entrust via NLnet</a>) to continue our work on
|
||||
<a href="https://github.com/robur-coop/dnsvizor">DNSvizor</a> - a
|
||||
<a href="https://mirageos.org">MirageOS unikernel</a> that provides DNS resolution and
|
||||
DHCP service for a network. This is fully implemented in
|
||||
<a href="https://ocaml.org">OCaml</a>.</p>
|
||||
<p>Already at our <a href="https://retreat.mirageos.org">MirageOS retreats</a> we deployed
|
||||
such a unikernel to test our <a href="https://github.com/mirage/charrua">DHCP implementation</a>
|
||||
and our <a href="https://github.com/mirage/ocaml-dns">DNS resolver</a> - and found and
|
||||
fixed issues on-site. At the retreats we have a very limited Internet uplink,
|
||||
thus caching DNS queries and answers is great for reducing the load on the
|
||||
uplink.</p>
|
||||
<p>Thanks to the funding we received, we'll be able to work on improving the
|
||||
performance, but also to finish our DNSSEC implementation, provide DNS-over-TLS
|
||||
and DNS-over-HTTPS services, and also a web interface. DNSvizor will use the
|
||||
existing <a href="https://thekelleys.org.uk/dnsmasq/doc.html">dnsmasq</a> configuration
|
||||
syntax, and provide lots of features from dnsmasq, and also provide features
|
||||
such as block lists from <a href="https://pi-hole.net/">pi-hole</a>.</p>
|
||||
<p>We are at a point where the <a href="https://github.com/robur-coop/dnsvizor">basic unikernel (our MVP)</a>
|
||||
(providing DNS and DHCP services) is ready, and we provide
|
||||
<a href="https://builds.robur.coop/job/dnsvizor">reproducible binary builds</a>. Phew. This
|
||||
means that the first step is done. The <code>--dhcp-range</code> from dnsmasq is already
|
||||
being parsed.</p>
|
||||
<p>We are now curious about concrete usages of dnsmasq and the configurations you use.
|
||||
If you're interested in dnsvizor, please <a href="https://github.com/robur-coop/dnsvizor/issues/new">open an issue at our repository</a>
|
||||
with your dnsmasq configuration. This will help us to guide which parts of the configuration to prioritize.</p>
|
||||
<h2 id="usages-of-dnsvizor"><a class="anchor" aria-hidden="true" href="#usages-of-dnsvizor"></a>Usages of DNSvizor</h2>
|
||||
<p>We have several use cases for DNSvizor:</p>
|
||||
<ul>
|
||||
<li>at your home router to provide DNS resolution and DHCP service, filtering ads,</li>
|
||||
<li>in the datacenter auto-configuring your machine park,</li>
|
||||
<li>when running your unikernel swarm to auto-configure them.</li>
|
||||
</ul>
|
||||
<p>The first one is where pi-hole also fits in, and where dnsmasq is used quite
|
||||
a lot. The second one is also a domain where dnsmasq is used. The third one stems
|
||||
from our experience that lots of people struggle with deploying MirageOS
|
||||
unikernels since they have to manually do IP configuration etc. We ourselves
|
||||
also pass additional information to the unikernels, such as syslog host,
|
||||
monitoring sink, X.509 certificates or host names, do some DNS provisioning, ...</p>
|
||||
<p>With DNSvizor we will leverage the common configuration options of all
|
||||
unikernels (reducing the need for boot arguments), and also go a bit further
|
||||
and make deployment seamless (including adding hostnames to DNS, forwarding
|
||||
from our reverse TLS proxy, etc.).</p>
|
||||
<h2 id="conclusion"><a class="anchor" aria-hidden="true" href="#conclusion"></a>Conclusion</h2>
|
||||
<p><a href="https://github.com/robur-coop/dnsvizor">DNSvizor</a> provides DNS resolution and
|
||||
DHCP service for your network, and <a href="https://builds.robur.coop/job/dnsvizor">already exists</a> :).
|
||||
Please <a href="https://github.com/robur-coop/dnsvizor/issues/">report issues</a> you
|
||||
encounter and questions you may have. Also, if you use dnsmasq, please
|
||||
<a href="https://github.com/robur-coop/dnsvizor/issues/new">show us your configuration</a>.</p>
|
||||
<p>If you're interested in MirageOS and using it in your domain, don't hesitate
|
||||
to reach out to us (via eMail: team@robur.coop) - we're keen to deploy MirageOS
|
||||
and find more domains where it is useful. If you can
|
||||
<a href="https://robur.coop/Donate">spare a dime</a>, we're a registered non-profit in
|
||||
Germany - and can provide tax-deductible receipts in Europe.</p>
|
||||
|
||||
</article>
|
||||
|
||||
</main>
|
||||
<footer>
|
||||
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
|
||||
<br />
|
||||
</footer>
|
||||
<script>hljs.highlightAll();</script>
|
||||
</body>
|
||||
</html>
|
107
articles/dnsvizor01.md
Normal file
|
@ -0,0 +1,107 @@
|
|||
---
|
||||
date: 2024-10-25
|
||||
title: "Meet DNSvizor: run your own DHCP and DNS MirageOS unikernel"
|
||||
description:
|
||||
The NGI-funded DNSvizor provides core network services on your network; DNS resolution and DHCP.
|
||||
tags:
|
||||
- OCaml
|
||||
- MirageOS
|
||||
- DNSvizor
|
||||
author:
|
||||
name: Hannes Mehnert
|
||||
email: hannes@mehnert.org
|
||||
link: https://hannes.robur.coop
|
||||
---
|
||||
|
||||
TL;DR: We got [NGI0 Entrust (via NLnet)](https://nlnet.nl/entrust/) funding for developing
|
||||
[DNSvizor](https://nlnet.nl/project/DNSvizor/) - a DNS resolver and
|
||||
DHCP server. Please help us by [sharing with us your dnsmasq
|
||||
configuration](https://github.com/robur-coop/dnsvizor/issues/new), so we can
|
||||
prioritize the configuration options to support.
|
||||
|
||||
## Introduction
|
||||
|
||||
The [dynamic host configuration protocol (DHCP)](https://en.wikipedia.org/wiki/Dynamic_Host_Configuration_Protocol)
|
||||
is fundamental in today's Internet and local networks. It usually runs on your
|
||||
router (or as a dedicated independent service) and automatically configures
|
||||
computers that join your network (for example wireless laptops, smartphones)
|
||||
with an IP address, routing information, a DNS resolver, etc. No manual
|
||||
configuration is needed once your friend's smartphone has the password of your
|
||||
wireless network \o/
|
||||
|
||||
The [domain name system (DNS)](https://en.wikipedia.org/wiki/Domain_Name_System)
|
||||
is responsible for translating domain names (such as "robur.coop", "nlnet.nl")
|
||||
to IP addresses (such as 193.30.40.138 or 2a0f:7cc7:7cc7:7c40::138) - used by
|
||||
computers to talk to each other. Humans can remember domain names instead of
|
||||
memorizing IP addresses. Computers then use DNS to translate these domain names
|
||||
to IP addresses to communicate with. DNS is a hierarchic, distributed,
|
||||
fault-tolerant service.
|
||||
|
||||
These two protocols are fundamental to today's Internet: without them it would
|
||||
be much harder for humans to use it.
|
||||
|
||||
## DNSvizor
|
||||
|
||||
We at [robur](https://robur.coop) got funding (from
|
||||
[NGI0 Entrust via NLnet](https://nlnet.nl/project/DNSvizor/)) to continue our work on
|
||||
[DNSvizor](https://github.com/robur-coop/dnsvizor) - a
|
||||
[MirageOS unikernel](https://mirageos.org) that provides DNS resolution and
|
||||
DHCP service for a network. This is fully implemented in
|
||||
[OCaml](https://ocaml.org).
|
||||
|
||||
Already at our [MirageOS retreats](https://retreat.mirageos.org) we deployed
|
||||
such a unikernel to test our [DHCP implementation](https://github.com/mirage/charrua)
|
||||
and our [DNS resolver](https://github.com/mirage/ocaml-dns) - and found and
|
||||
fixed issues on-site. At the retreats we have a very limited Internet uplink,
|
||||
thus caching DNS queries and answers is great for reducing the load on the
|
||||
uplink.
|
||||
|
||||
Thanks to the funding we received, we'll be able to work on improving the
|
||||
performance, but also to finish our DNSSEC implementation, provide DNS-over-TLS
|
||||
and DNS-over-HTTPS services, and also a web interface. DNSvizor will use the
|
||||
existing [dnsmasq](https://thekelleys.org.uk/dnsmasq/doc.html) configuration
|
||||
syntax, and provide lots of features from dnsmasq, and also provide features
|
||||
such as block lists from [pi-hole](https://pi-hole.net/).
|
||||
|
||||
We are at a point where the [basic unikernel (our MVP)](https://github.com/robur-coop/dnsvizor)
|
||||
(providing DNS and DHCP services) is ready, and we provide
|
||||
[reproducible binary builds](https://builds.robur.coop/job/dnsvizor). Phew. This
|
||||
means that the first step is done. The `--dhcp-range` from dnsmasq is already
|
||||
being parsed.
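For readers unfamiliar with the option: a dnsmasq `dhcp-range` such as `192.168.1.50,192.168.1.150,12h` names the first and last address to hand out, plus the lease time. The sketch below parses only that simple IPv4 form and is not DNSvizor's parser. The type and function names are made up for illustration.

```ocaml
(* Hypothetical, simplified model of a dnsmasq dhcp-range value:
   <start-addr>,<end-addr>[,<lease-time>]. IPv4 only; tags, netmask,
   mode keywords and "infinite" leases are not handled, and start <= end
   is not checked. *)
type ipv4 = int * int * int * int

type range = { start_ip : ipv4; end_ip : ipv4; lease_s : int option }

let ( let* ) = Result.bind

let parse_ipv4 s =
  match List.map int_of_string_opt (String.split_on_char '.' s) with
  | [ Some a; Some b; Some c; Some d ]
    when List.for_all (fun x -> x >= 0 && x <= 255) [ a; b; c; d ] ->
      Ok (a, b, c, d)
  | _ -> Error ("invalid IPv4 address: " ^ s)

(* Lease time: plain seconds, or a number followed by m (minutes) or h (hours). *)
let parse_lease s =
  let n = String.length s in
  if n = 0 then Error "empty lease time"
  else
    let digits, factor =
      match s.[n - 1] with
      | 'm' -> (String.sub s 0 (n - 1), 60)
      | 'h' -> (String.sub s 0 (n - 1), 3600)
      | _ -> (s, 1)
    in
    match int_of_string_opt digits with
    | Some v when v > 0 -> Ok (v * factor)
    | _ -> Error ("invalid lease time: " ^ s)

let parse_dhcp_range s =
  match String.split_on_char ',' s with
  | [ a; b ] ->
      let* start_ip = parse_ipv4 a in
      let* end_ip = parse_ipv4 b in
      Ok { start_ip; end_ip; lease_s = None }
  | [ a; b; l ] ->
      let* start_ip = parse_ipv4 a in
      let* end_ip = parse_ipv4 b in
      let* lease = parse_lease l in
      Ok { start_ip; end_ip; lease_s = Some lease }
  | _ -> Error ("unsupported dhcp-range: " ^ s)
```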
|
||||
|
||||
We are now curious about concrete usages of dnsmasq and the configurations you use.
|
||||
If you're interested in dnsvizor, please [open an issue at our repository](https://github.com/robur-coop/dnsvizor/issues/new)
|
||||
with your dnsmasq configuration. This will help us to guide which parts of the configuration to prioritize.
|
||||
|
||||
## Usages of DNSvizor
|
||||
|
||||
We have several use cases for DNSvizor:
|
||||
- at your home router to provide DNS resolution and DHCP service, filtering ads,
|
||||
- in the datacenter auto-configuring your machine park,
|
||||
- when running your unikernel swarm to auto-configure them.
|
||||
|
||||
The first one is where pi-hole also fits in, and where dnsmasq is used quite
|
||||
a lot. The second one is also a domain where dnsmasq is used. The third one stems
|
||||
from our experience that lots of people struggle with deploying MirageOS
|
||||
unikernels since they have to manually do IP configuration etc. We ourselves
|
||||
also pass additional information to the unikernels, such as syslog host,
|
||||
monitoring sink, X.509 certificates or host names, do some DNS provisioning, ...
|
||||
|
||||
With DNSvizor we will leverage the common configuration options of all
|
||||
unikernels (reducing the need for boot arguments), and also go a bit further
|
||||
and make deployment seamless (including adding hostnames to DNS, forwarding
|
||||
from our reverse TLS proxy, etc.).
|
||||
|
||||
## Conclusion
|
||||
|
||||
[DNSvizor](https://github.com/robur-coop/dnsvizor) provides DNS resolution and
|
||||
DHCP service for your network, and [already exists](https://builds.robur.coop/job/dnsvizor) :).
|
||||
Please [report issues](https://github.com/robur-coop/dnsvizor/issues/) you
|
||||
encounter and questions you may have. Also, if you use dnsmasq, please
|
||||
[show us your configuration](https://github.com/robur-coop/dnsvizor/issues/new).
|
||||
|
||||
If you're interested in MirageOS and using it in your domain, don't hesitate
|
||||
to reach out to us (via eMail: team@robur.coop) - we're keen to deploy MirageOS
|
||||
and find more domains where it is useful. If you can
|
||||
[spare a dime](https://robur.coop/Donate), we're a registered non-profit in
|
||||
Germany - and can provide tax-deductible receipts in Europe.
|
|
@ -1,410 +0,0 @@
|
|||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="x-ua-compatible" content="ie=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<title>
|
||||
Robur's blogHow has robur financially been doing since 2018?
|
||||
</title>
|
||||
<meta name="description" content="How we organise as a collective, and why we're doing that.">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
|
||||
<script src="https://blog.robur.coop/js/hl.js"></script>
|
||||
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
|
||||
</head>
|
||||
<body>
|
||||
<header>
|
||||
<h1>blog.robur.coop</h1>
|
||||
<blockquote>
|
||||
The <strong>Robur</strong> cooperative blog.
|
||||
</blockquote>
|
||||
</header>
|
||||
<main><a href="https://blog.robur.coop/index.html">Back to index</a>
|
||||
|
||||
<article>
|
||||
<h1>How has robur financially been doing since 2018?</h1>
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-finances">finances</a></li><li><a href="https://blog.robur.coop/tags.html#tag-cooperative">cooperative</a></li></ul><p>Since the beginning, robur has been working on MirageOS unikernels and getting
|
||||
them deployed. Due to our experience in hierarchical companies, we wanted to
|
||||
create something different - a workplace without bosses and management. Instead,
|
||||
we are a collective where everybody has a say on what we do, and who gets how
|
||||
much money at the end of the month. This means nobody has to write reports or
|
||||
meet any goals - there are no KPIs involved. We strive to be a bunch of people
|
||||
working together nicely on projects that we own and want to bring forward. If
|
||||
we notice a lack of funding, we reach out to (potential) customers to fill our
|
||||
cash register, or ask people to donate money.</p>
|
||||
<p>Since our mission is fulfilling and already complex - organising ourselves in a
|
||||
hierarchy-free environment, including payment, and working on software in a
|
||||
niche market - we decided from the early days that bookkeeping and invoicing
|
||||
should not be part of our collective. Especially since we want to be free in
|
||||
what kind of funding we accept - donations, commercial contracts, public
|
||||
funding. In the books, robur is part of the non-profit company
|
||||
<a href="https://aenderwerk.de">Änderwerk</a> in Germany - and friends of ours run that
|
||||
company. They get a cut of all income we generate.</p>
|
||||
<p>To be inclusive and enable everyone to participate in decisions, we are 100%
|
||||
transparent in our books - every collective member has access to the financial
|
||||
spreadsheets, contracts, etc. We use a needs-based payment model, so we talk
|
||||
about the needs everyone has on a regular basis and adjust the salaries, with everyone
|
||||
agreeing to all the numbers.</p>
|
||||
<h2 id="2018"><a class="anchor" aria-hidden="true" href="#2018"></a>2018</h2>
|
||||
<p>We started operations in 2018. In late 2017, we got donations (in the form of
|
||||
bitcoins) from friends who were convinced of our mission. This was 54,194.91 €.
|
||||
So, in 2018 we started with that money, and tried to find a mission, and
|
||||
generate income to sustain our salaries.</p>
|
||||
<p>Also, already in 2017, we applied for funding from
|
||||
<a href="https://prototypefund.de">Prototypefund</a> on a <a href="https://prototypefund.de/project/robur-io/">CalDAV server</a>,
|
||||
and we received the grant in early 2018. This was another 48,500 €, paid to
|
||||
individuals (for various reasons, Prototypefund can't pay out to the non-profit -
|
||||
this caused us some struggle, since we needed some double bookkeeping and
|
||||
individuals had to sort out their health care etc.).</p>
|
||||
<p>In the second half of 2018, we also did a security audit for
|
||||
<a href="https://leastauthority.com/blog/audits/five-security-audits-for-the-tezos-foundation/">Least Authority</a>
|
||||
(invoicing 19,600 €).</p>
|
||||
<p>And later in 2018 we started on what is now called NetHSM with an initial
|
||||
design workshop (5,000 €).</p>
|
||||
<p>And lastly, we started to work on a grant to implement <a href="https://datatracker.ietf.org/doc/html/rfc8446">TLS 1.3</a>,
|
||||
funded by Jane Street (via OCaml Labs Consulting). In 2018, we received 12,741.71 €.</p>
|
||||
<p>We applied to NLnet for improving the QubesOS firewall developed in MirageOS
|
||||
(without success), tried to get the IT security prize in Germany (without
|
||||
success), and applied to DIAL OSC (without success).</p>
|
||||
<div role="region"><table>
|
||||
<tr>
|
||||
<th>Project</th>
|
||||
<th class="right">Amount</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Donation</td>
|
||||
<td class="right">54,194.91</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Prototypefund</td>
|
||||
<td class="right">48,500.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Least Authority</td>
|
||||
<td class="right">19,600.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>TLS 1.3</td>
|
||||
<td class="right">12,741.71</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Nitrokey</td>
|
||||
<td class="right">5,000.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Total</strong></td>
|
||||
<td class="right"><strong>140,036.62</strong></td>
|
||||
</tr>
|
||||
</table></div><h2 id="2019"><a class="anchor" aria-hidden="true" href="#2019"></a>2019</h2>
|
||||
<p>We were keen to finish the CalDAV implementation (and start a CardDAV
|
||||
implementation), and received some financial support from Tarides for it
|
||||
(15,000 €).</p>
|
||||
<p>The TLS 1.3 work continued, and we received 68,887.53 € in total.</p>
|
||||
<p>We also applied to (and got funding from) Prototypefund, once with an <a href="https://prototypefund.de/en/project/robust-openvpn-client-with-low-use-of-resources/">OpenVPN-compatible
|
||||
MirageOS unikernel</a>,
|
||||
and once with <a href="https://prototypefund.de/project/portable-firewall-fuer-qubesos/">improving the QubesOS firewall developed as MirageOS unikernel</a>.
|
||||
This meant two more grants of 48,500 € each.</p>
|
||||
<p>We also started the implementation work of NetHSM - which still included a lot
|
||||
of design work - the contract was over 82,500 € in total. In 2019, we invoiced
|
||||
Nitrokey 40,500 € in total.</p>
|
||||
<p>We also received a total of 516.48 € as donations from sources unknown to us.</p>
|
||||
<p>We also applied to NLnet with <a href="https://nlnet.nl/project/Robur/">DNSvizor</a>, and
|
||||
got a grant, but due to bureaucratic reasons they couldn't transfer the money to
|
||||
our non-profit (which was involved with NLnet in some EU grants), and we didn't
|
||||
get any money in the end.</p>
|
||||
<div role="region"><table>
|
||||
<tr>
|
||||
<th>Project</th>
|
||||
<th class="right">Amount</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>CardDAV</td>
|
||||
<td class="right">15,000.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>TLS 1.3</td>
|
||||
<td class="right">68,887.53</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>OpenVPN</td>
|
||||
<td class="right">48,500.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>QubesOS</td>
|
||||
<td class="right">48,500.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Donation</td>
|
||||
<td class="right">516.48</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Nitrokey</td>
|
||||
<td class="right">40,500.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Total</strong></td>
|
||||
<td class="right"><strong>221,904.01</strong></td>
|
||||
</tr>
|
||||
</table></div><h2 id="2020"><a class="anchor" aria-hidden="true" href="#2020"></a>2020</h2>
|
||||
<p>In 2020, we agreed with OCaml Labs Consulting to work on maintenance of OCaml
|
||||
packages in the MirageOS ecosystem. This was a contract where at the end of the
|
||||
month, we reported how much time we spent on which PRs and issues. For us, it
|
||||
was great to have the freedom to choose which OCaml packages we were keen to
|
||||
get up to speed. In 2020, we received 45,000 € for this maintenance.</p>
|
||||
<p>We finished the TLS 1.3 work (18,659.01 €).</p>
|
||||
<p>We continued to work on the NetHSM project, and invoiced 55,500 €.</p>
|
||||
<p>We received a total of 255 € in donations from sources unknown to us.</p>
|
||||
<p>We applied to reset.tech with DNSvizor again, unfortunately without success.</p>
|
||||
<p>We also applied to <a href="https://pointer.ngi.eu">NGI Pointer</a> to work on reproducible
|
||||
builds for MirageOS, and a web frontend. We received a grant of 200,000 €,
|
||||
which we worked on in 2021 and 2022.</p>
|
||||
<div role="region"><table>
|
||||
<tr>
|
||||
<th>Project</th>
|
||||
<th class="right">Amount</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>OCLC</td>
|
||||
<td class="right">45,000.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>TLS 1.3</td>
|
||||
<td class="right">18,659.01</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Nitrokey</td>
|
||||
<td class="right">55,500.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Donations</td>
|
||||
<td class="right">255.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Total</strong></td>
|
||||
<td class="right"><strong>119,414.01</strong></td>
|
||||
</tr>
|
||||
</table></div><h2 id="2021"><a class="anchor" aria-hidden="true" href="#2021"></a>2021</h2>
|
||||
<p>As outlined, we worked on reproducible builds of unikernels - rethinking how
|
||||
a unikernel is configured: no more compiled-in secrets, but instead using
|
||||
boot parameters. We set up the infrastructure for daily reproducible
|
||||
builds, serving system packages via a package repository, and a
|
||||
<a href="https://builds.robur.coop">web frontend</a> hosting the reproducible builds.
|
||||
We received in total 120,000 € from NGI Pointer in 2021.</p>
|
||||
<p>Our work on NetHSM continued, including the introduction of elliptic curves
|
||||
in mirage-crypto (using <a href="https://github.com/mit-plv/fiat-crypto/">fiat</a>). The
|
||||
invoices to Nitrokey summed up to 26,000 € in 2021.</p>
|
||||
<p>In a short timeframe, we developed two packages, <a href="https://github.com/robur-coop/u2f">u2f</a>
|
||||
and later <a href="https://git.robur.coop/robur/webauthn">webauthn</a> for Skolem Labs based
|
||||
on <a href="https://en.wikipedia.org/wiki/Gift_economy">gift economy</a>. This resulted in
|
||||
donations of 18,976 €.</p>
|
||||
<p>We agreed with <a href="https://ocaml-sf.org/">OCSF</a> to work on
|
||||
<a href="https://github.com/hannesm/conex">conex</a>, which we have not delivered yet
|
||||
(lots of other things had to be cleared first: we did a security review of opam
|
||||
(leading to <a href="https://opam.ocaml.org/blog/opam-2-1-5-local-cache/">a security advisory</a>),
|
||||
we got rid of <a href="https://discuss.ocaml.org/t/ann-opam-repository-policy-change-checksums-no-md5-and-no-extra-files"><code>extra-files</code></a>
|
||||
in the opam-repository, and we <a href="https://discuss.ocaml.org/t/ann-opam-repository-policy-change-checksums-no-md5-and-no-extra-files">removed the weak hash md5</a>
|
||||
from the opam-repository).</p>
|
||||
<div role="region"><table>
|
||||
<tr>
|
||||
<th>Customer</th>
|
||||
<th class="right">Amount</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>NGI Pointer</td>
|
||||
<td class="right">120,000.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Nitrokey</td>
|
||||
<td class="right">26,000.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Skolem</td>
|
||||
<td class="right">18,976.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Total</strong></td>
|
||||
<td class="right"><strong>164,976.00</strong></td>
|
||||
</tr>
|
||||
</table></div><h2 id="2022"><a class="anchor" aria-hidden="true" href="#2022"></a>2022</h2>
|
||||
<p>We finished our NGI Pointer project, and received another 80,000 €.</p>
|
||||
<p>We also did some minor maintenance for Nitrokey, and invoiced 4,500 €.</p>
|
||||
<p>For Tarides, we started another contract maintaining MirageOS packages (and continuing
|
||||
<a href="https://github.com/robur-coop/utcp">our TCP/IP stack</a>), and invoiced in
|
||||
total 22,500 €.</p>
|
||||
<p>A grant application for <a href="https://github.com/dinosaure/bob/">bob</a> was rejected,
|
||||
but a grant application for <a href="https://github.com/robur-coop/miragevpn">MirageVPN</a>
|
||||
got accepted. Both were submitted to NLnet within the EU NGI initiative.</p>
|
||||
<div role="region"><table>
|
||||
<tr>
|
||||
<th>Project</th>
|
||||
<th class="right">Amount</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>NGI Pointer</td>
|
||||
<td class="right">80,000.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Nitrokey</td>
|
||||
<td class="right">4,500.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Tarides</td>
|
||||
<td class="right">22,500.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Total</strong></td>
|
||||
<td class="right"><strong>107,000.00</strong></td>
|
||||
</tr>
|
||||
</table></div><h2 id="2023"><a class="anchor" aria-hidden="true" href="#2023"></a>2023</h2>
|
||||
<p>We finished the NetHSM project, with a final invoice of 2,500 €.</p>
|
||||
<p>We started a collaboration with <a href="https://semgrep.dev">semgrep</a>, porting some of
|
||||
their Python code to OCaml. We received in total 37,500 €.</p>
|
||||
<p>We continued the MirageOS opam package maintenance and invoiced in total
|
||||
89,250 € to Tarides.</p>
|
||||
<p>A grant application on <a href="https://nlnet.nl/project/MirageVPN/">MirageVPN</a> got
|
||||
accepted (NGI Assure), and we received in total 12,000 € for our work on it.
|
||||
This is a continuation of our 2019 work funded by Prototypefund.</p>
|
||||
<p>We also wrote various funding applications, including one for
|
||||
<a href="https://github.com/robur-coop/dnsvizor">DNSvizor</a> that was
|
||||
<a href="https://nlnet.nl/project/DNSvizor/">accepted</a> (NGI0 Entrust).</p>
|
||||
<div role="region"><table>
|
||||
<tr>
|
||||
<th>Customer</th>
|
||||
<th class="right">Amount</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Nitrokey</td>
|
||||
<td class="right">2,500.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>semgrep</td>
|
||||
<td class="right">37,500.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Tarides</td>
|
||||
<td class="right">89,250.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>MirageVPN</td>
|
||||
<td class="right">12,000.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Total</strong></td>
|
||||
<td class="right"><strong>141,250.00</strong></td>
|
||||
</tr>
|
||||
</table></div><h2 id="2024"><a class="anchor" aria-hidden="true" href="#2024"></a>2024</h2>
|
||||
<p>We're still in the middle of it, but so far we continued the Tarides maintenance
|
||||
contract (54,937.50 €).</p>
|
||||
<p>We also finished the MirageVPN work, and received another 45,000 €.</p>
|
||||
<p>We again had a contract with Semgrep on porting Python code to OCaml, and received 18,559.40 €.</p>
|
||||
<p>We again worked on several successful funding applications, one on
|
||||
<a href="https://nlnet.nl/project/PTT/">PTT</a> (NGI Zero Core), a continuation of the
|
||||
<a href="https://www.ngi.eu/funded_solution/ngi-dapsiproject-24/">NGI DAPSI</a> project -
|
||||
now realizing mailing lists with our SMTP stack.</p>
|
||||
<p>We also got <a href="https://nlnet.nl/project/MTE/">MTE</a> (NGI Taler) accepted.</p>
|
||||
<p>The table below covers income until the end of September 2024.</p>
|
||||
<div role="region"><table>
|
||||
<tr>
|
||||
<th>Project</th>
|
||||
<th class="right">Amount</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Semgrep</td>
|
||||
<td class="right">18,559.40</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Tarides</td>
|
||||
<td class="right">62,812.50</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>MirageVPN</td>
|
||||
<td class="right">45,000.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Total</strong></td>
|
||||
<td class="right"><strong>126,371.90</strong></td>
|
||||
</tr>
|
||||
</table></div><h2 id="total"><a class="anchor" aria-hidden="true" href="#total"></a>Total</h2>
|
||||
<p>In a single table, here's our income since robur started.</p>
|
||||
<div role="region"><table>
|
||||
<tr>
|
||||
<th>Year</th>
|
||||
<th class="right">Amount</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>2018</td>
|
||||
<td class="right">140,036.62</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>2019</td>
|
||||
<td class="right">221,904.01</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>2020</td>
|
||||
<td class="right">119,414.01</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>2021</td>
|
||||
<td class="right">164,976.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>2022</td>
|
||||
<td class="right">107,000.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>2023</td>
|
||||
<td class="right">141,250.00</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>2024</td>
|
||||
<td class="right">126,371.90</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Total</strong></td>
|
||||
<td class="right"><strong>1,020,952.54</strong></td>
|
||||
</tr>
|
||||
</table></div><p><img src="../images/finances.png" alt="Plot of above income table" ></p>
|
||||
<p>As you can see, it varies quite a bit. In some years we have less money
|
||||
available than in other years.</p>
|
||||
<h2 id="expenses"><a class="anchor" aria-hidden="true" href="#expenses"></a>Expenses</h2>
|
||||
<p>As mentioned, the non-profit company <a href="https://aenderwerk.de">Änderwerk</a> running
|
||||
the bookkeeping and legal stuff (invoices, tax statements, contracts, etc.) gets
|
||||
a cut of all income we produce. They are doing amazing work and are very
|
||||
quick to respond to our queries.</p>
|
||||
<p>We spend most of our income on salaries. Some money we spend on travel. We also
|
||||
pay monthly for our server (plus some extra for hardware, and in June 2024 a
|
||||
huge amount for trying to recover data from failed SSDs).</p>
|
||||
<h2 id="conclusion"><a class="anchor" aria-hidden="true" href="#conclusion"></a>Conclusion</h2>
|
||||
<p>We have provided an overview of our income; we were three to five people working
|
||||
at robur over the entire time. As written at the beginning, we use needs-based
|
||||
payment. Our experience with this is great! It builds a lot of trust in each
|
||||
other.</p>
|
||||
<p>Our funding is diverse, coming from multiple sources - donations, commercial work,
|
||||
public funding. This was our initial goal, and we're very happy that it has worked
|
||||
out well since 2018.</p>
|
||||
<p>Taking the numbers into account, we are not paying ourselves "industry standard"
|
||||
rates - but we really love what we do - and sometimes we just take some time off.
|
||||
We do work on various projects that we really really enjoy - but for which (at the
|
||||
moment) no funding is available.</p>
|
||||
<p>We are always happy to discuss how our collective operates. If you're
|
||||
interested, please drop us a message.</p>
|
||||
<p>Of course, if we receive donations, we use them wisely - mainly for working on
|
||||
the currently unfunded projects (bob, albatross, miou, mollymawk - to name a few). If you
|
||||
can spare a dime or two, don't hesitate to <a href="https://robur.coop/Donate">donate</a>.
|
||||
Donations are tax-deductible in Germany (and should be in Europe) since we're a
|
||||
registered non-profit.</p>
|
||||
<p>If you're interested in MirageOS and using it in your domain, don't hesitate
|
||||
to reach out to us (via eMail: team@robur.coop) so we can start to chat - we're keen to deploy MirageOS
|
||||
and find more domains where it is useful.</p>
|
||||
|
||||
</article>
|
||||
|
||||
</main>
|
||||
<footer>
|
||||
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
|
||||
<br />
|
||||
</footer>
|
||||
<script>hljs.highlightAll();</script>
|
||||
</body>
|
||||
</html>
|
302
articles/finances.md
Normal file
|
@ -0,0 +1,302 @@
|
|||
---
|
||||
date: 2024-10-21
|
||||
title: How has robur financially been doing since 2018?
|
||||
description: How we organise as a collective, and why we're doing that.
|
||||
tags:
|
||||
- finances
|
||||
- cooperative
|
||||
author:
|
||||
name: Hannes Mehnert
|
||||
email: hannes@mehnert.org
|
||||
link: https://hannes.robur.coop
|
||||
---
|
||||
|
||||
Since the beginning, robur has been working on MirageOS unikernels and getting
|
||||
them deployed. Due to our experience in hierarchical companies, we wanted to
|
||||
create something different - a workplace without bosses and management. Instead,
|
||||
we are a collective where everybody has a say on what we do, and who gets how
|
||||
much money at the end of the month. This means nobody has to write reports or
|
||||
meet any goals - there are no KPIs involved. We strive to be a bunch of people
|
||||
working together nicely on projects that we own and want to bring forward. If
|
||||
we notice a lack of funding, we reach out to (potential) customers to fill our
|
||||
cash register, or ask people to donate money.
|
||||
|
||||
Since our mission is fulfilling and already complex - organising ourselves in a
|
||||
hierarchy-free environment, including payment, and working on software in a
|
||||
niche market - we decided from the early days that bookkeeping and invoicing
|
||||
should not be part of our collective. Especially since we want to be free in
|
||||
what kind of funding we accept - donations, commercial contracts, public
|
||||
funding. In the books, robur is part of the non-profit company
|
||||
[Änderwerk](https://aenderwerk.de) in Germany - and friends of ours run that
|
||||
company. They get a cut of all income we generate.
|
||||
|
||||
To be inclusive and enable everyone to participate in decisions, we are 100%
|
||||
transparent in our books - every collective member has access to the financial
|
||||
spreadsheets, contracts, etc. We use a needs-based payment model, so we talk
|
||||
about the needs everyone has on a regular basis and adjust the salaries, with everyone
|
||||
agreeing to all the numbers.
|
||||
|
||||
## 2018
|
||||
|
||||
We started operations in 2018. In late 2017, we got donations (in the form of
|
||||
bitcoins) from friends who were convinced of our mission. This was 54,194.91 €.
|
||||
So, in 2018 we started with that money, and tried to find a mission, and
|
||||
generate income to sustain our salaries.
|
||||
|
||||
Also, already in 2017, we applied for funding from
|
||||
[Prototypefund](https://prototypefund.de) on a [CalDAV server](https://prototypefund.de/project/robur-io/),
|
||||
and we received the grant in early 2018. This was another 48,500 €, paid to
|
||||
individuals (for various reasons, Prototypefund can't pay out to the non-profit -
|
||||
this caused us some struggle, since we needed some double bookkeeping and
|
||||
individuals had to sort out their health care etc.).
|
||||
|
||||
In the second half of 2018, we also did a security audit for
|
||||
[Least Authority](https://leastauthority.com/blog/audits/five-security-audits-for-the-tezos-foundation/)
|
||||
(invoicing 19,600 €).
|
||||
|
||||
And later in 2018 we started on what is now called NetHSM with an initial
|
||||
design workshop (5,000 €).
|
||||
|
||||
And lastly, we started to work on a grant to implement [TLS 1.3](https://datatracker.ietf.org/doc/html/rfc8446),
|
||||
funded by Jane Street (via OCaml Labs Consulting). In 2018, we received 12,741.71 €.
|
||||
|
||||
We applied to NLnet for improving the QubesOS firewall developed in MirageOS
|
||||
(without success), tried to get the IT security prize in Germany (without
|
||||
success), and applied to DIAL OSC (without success).
|
||||
|
||||
| Project | Amount |
|
||||
|-----------------|----------:|
|
||||
| Donation | 54,194.91 |
|
||||
| Prototypefund | 48,500.00 |
|
||||
| Least Authority | 19,600.00 |
|
||||
| TLS 1.3 | 12,741.71 |
|
||||
| Nitrokey | 5,000.00 |
|
||||
| __Total__ | __140,036.62__ |
|
||||
|
||||
|
||||
## 2019
|
||||
|
||||
We were keen to finish the CalDAV implementation (and start a CardDAV
|
||||
implementation), and received some financial support from Tarides for it
|
||||
(15,000 €).
|
||||
|
||||
The TLS 1.3 work continued, and we received 68,887.53 € in total.
|
||||
|
||||
We also applied to (and got funding from) Prototypefund, once with an [OpenVPN-compatible
|
||||
MirageOS unikernel](https://prototypefund.de/en/project/robust-openvpn-client-with-low-use-of-resources/),
|
||||
and once with [improving the QubesOS firewall developed as MirageOS unikernel](https://prototypefund.de/project/portable-firewall-fuer-qubesos/).
|
||||
This meant two more grants of 48,500 € each.
|
||||
|
||||
We also started the implementation work of NetHSM - which still included a lot
|
||||
of design work - the contract was over 82,500 € in total. In 2019, we invoiced
|
||||
Nitrokey 40,500 € in total.
|
||||
|
||||
We also received a total of 516.48 € as donations from sources unknown to us.
|
||||
|
||||
We also applied to NLnet with [DNSvizor](https://nlnet.nl/project/Robur/), and
|
||||
got a grant, but due to bureaucratic reasons they couldn't transfer the money to
|
||||
our non-profit (which was involved with NLnet in some EU grants), and we didn't
|
||||
get any money in the end.
|
||||
|
||||
| Project | Amount |
|
||||
|----------|----------:|
|
||||
| CardDAV | 15,000.00 |
|
||||
| TLS 1.3 | 68,887.53 |
|
||||
| OpenVPN | 48,500.00 |
|
||||
| QubesOS | 48,500.00 |
|
||||
| Donation | 516.48 |
|
||||
| Nitrokey | 40,500.00 |
|
||||
| __Total__ | __221,904.01__ |
|
||||
|
||||
## 2020
|
||||
|
||||
In 2020, we agreed with OCaml Labs Consulting to work on maintenance of OCaml
|
||||
packages in the MirageOS ecosystem. This was a contract where at the end of the
|
||||
month, we reported how much time we spent on which PRs and issues. For us, it
|
||||
was great to have the freedom to choose which OCaml packages we were keen to
|
||||
get up to speed. In 2020, we received 45,000 € for this maintenance.
|
||||
|
||||
We finished the TLS 1.3 work (18,659.01 €).
|
||||
|
||||
We continued to work on the NetHSM project, and invoiced 55,500 €.
|
||||
|
||||
We received a total of 255 € in donations from sources unknown to us.
|
||||
|
||||
We applied to reset.tech with DNSvizor again, unfortunately without success.
|
||||
|
||||
We also applied to [NGI Pointer](https://pointer.ngi.eu) to work on reproducible
|
||||
builds for MirageOS, and a web frontend. We received a grant of 200,000 €,
|
||||
which we worked on in 2021 and 2022.
|
||||
|
||||
| Project | Amount |
|
||||
|-----------|----------:|
|
||||
| OCLC | 45,000.00 |
|
||||
| TLS 1.3 | 18,659.01 |
|
||||
| Nitrokey | 55,500.00 |
|
||||
| Donations | 255.00 |
|
||||
| __Total__ | __119,414.01__ |
|
||||
|
||||
## 2021
|
||||
|
||||
As outlined, we worked on reproducible builds of unikernels - rethinking how
|
||||
a unikernel is configured: no more compiled-in secrets, but instead using
|
||||
boot parameters. We set up the infrastructure for daily reproducible
|
||||
builds, serving system packages via a package repository, and a
|
||||
[web frontend](https://builds.robur.coop) hosting the reproducible builds.
|
||||
We received in total 120,000 € from NGI Pointer in 2021.
|
||||
|
||||
Our work on NetHSM continued, including the introduction of elliptic curves
|
||||
in mirage-crypto (using [fiat](https://github.com/mit-plv/fiat-crypto/)). The
|
||||
invoices to Nitrokey summed up to 26,000 € in 2021.
|
||||
|
||||
In a short timeframe, we developed two packages, [u2f](https://github.com/robur-coop/u2f)
|
||||
and later [webauthn](https://git.robur.coop/robur/webauthn) for Skolem Labs based
|
||||
on [gift economy](https://en.wikipedia.org/wiki/Gift_economy). This resulted in
|
||||
donations of 18,976 €.
|
||||
|
||||
We agreed with [OCSF](https://ocaml-sf.org/) to work on
|
||||
[conex](https://github.com/hannesm/conex), which we have not delivered yet
|
||||
(lots of other things had to be cleared first: we did a security review of opam
|
||||
(leading to [a security advisory](https://opam.ocaml.org/blog/opam-2-1-5-local-cache/)),
|
||||
we got rid of [`extra-files`](https://discuss.ocaml.org/t/ann-opam-repository-policy-change-checksums-no-md5-and-no-extra-files)
|
||||
in the opam-repository, and we [removed the weak hash md5](https://discuss.ocaml.org/t/ann-opam-repository-policy-change-checksums-no-md5-and-no-extra-files)
|
||||
from the opam-repository).
|
||||
|
||||
| Customer | Amount |
|
||||
|-------------|----------:|
|
||||
| NGI Pointer | 120,000.00 |
|
||||
| Nitrokey | 26,000.00 |
|
||||
| Skolem | 18,976.00 |
|
||||
| __Total__ | __164,976.00__ |
|
||||
|
||||
## 2022
|
||||
|
||||
We finished our NGI Pointer project, and received another 80,000 €.
|
||||
|
||||
We also did some minor maintenance for Nitrokey, and invoiced 4,500 €.
|
||||
|
||||
For Tarides, we started another contract maintaining MirageOS packages (and continuing
|
||||
[our TCP/IP stack](https://github.com/robur-coop/utcp)), and invoiced in
|
||||
total 22,500 €.
|
||||
|
||||
A grant application for [bob](https://github.com/dinosaure/bob/) was rejected,
|
||||
but a grant application for [MirageVPN](https://github.com/robur-coop/miragevpn)
|
||||
got accepted. Both were submitted to NLnet within the EU NGI initiative.
|
||||
|
||||
| Project | Amount |
|
||||
|-------------|---------:|
|
||||
| NGI Pointer | 80,000.00 |
|
||||
| Nitrokey | 4,500.00 |
|
||||
| Tarides | 22,500.00 |
|
||||
| __Total__ | __107,000.00__ |
|
||||
|
||||
## 2023
|
||||
|
||||
We finished the NetHSM project, with a final invoice of 2,500 €.
|
||||
|
||||
We started a collaboration with [semgrep](https://semgrep.dev), porting some of
|
||||
their Python code to OCaml. We received in total 37,500 €.
|
||||
|
||||
We continued the MirageOS opam package maintenance and invoiced in total
|
||||
89,250 € to Tarides.
|
||||
|
||||
A grant application on [MirageVPN](https://nlnet.nl/project/MirageVPN/) got
|
||||
accepted (NGI Assure), and we received in total 12,000 € for our work on it.
|
||||
This is a continuation of our 2019 work funded by Prototypefund.
|
||||
|
||||
We also wrote various funding applications, including one for
|
||||
[DNSvizor](https://github.com/robur-coop/dnsvizor) that was
|
||||
[accepted](https://nlnet.nl/project/DNSvizor/) (NGI0 Entrust).
|
||||
|
||||
| Customer | Amount |
|
||||
|-----------|---------:|
|
||||
| Nitrokey | 2,500.00 |
|
||||
| semgrep | 37,500.00 |
|
||||
| Tarides | 89,250.00 |
|
||||
| MirageVPN | 12,000.00 |
|
||||
| __Total__ | __141,250.00__ |
|
||||
|
||||
## 2024
|
||||
|
||||
We're still in the middle of it, but so far we continued the Tarides maintenance
|
||||
contract (54,937.50 €).
|
||||
|
||||
We also finished the MirageVPN work, and received another 45,000 €.
|
||||
|
||||
We again had a contract with Semgrep on porting Python code to OCaml, and received 18,559.40 €.
|
||||
|
||||
We again worked on several successful funding applications, one on
|
||||
[PTT](https://nlnet.nl/project/PTT/) (NGI Zero Core), a continuation of the
|
||||
[NGI DAPSI](https://www.ngi.eu/funded_solution/ngi-dapsiproject-24/) project -
|
||||
now realizing mailing lists with our SMTP stack.
|
||||
|
||||
We also got [MTE](https://nlnet.nl/project/MTE/) (NGI Taler) accepted.
|
||||
|
||||
The table below covers income until the end of September 2024.
|
||||
|
||||
| Project | Amount |
|
||||
|-----------|----------:|
|
||||
| Semgrep | 18,559.40 |
|
||||
| Tarides | 62,812.50 |
|
||||
| MirageVPN | 45,000.00 |
|
||||
| __Total__ | __126,371.90__ |
|
||||
|
||||
## Total
|
||||
|
||||
In a single table, here's our income since robur started.
|
||||
|
||||
| Year | Amount |
|
||||
|-------|-----------:|
|
||||
| 2018 | 140,036.62 |
|
||||
| 2019 | 221,904.01 |
|
||||
| 2020 | 119,414.01 |
|
||||
| 2021 | 164,976.00 |
|
||||
| 2022 | 107,000.00 |
|
||||
| 2023 | 141,250.00 |
|
||||
| 2024 | 126,371.90 |
|
||||
| __Total__ | __1,020,952.54__ |
|
||||
|
||||
![Plot of above income table](../images/finances.png)
|
||||
|
||||
As you can see, it varies quite a bit. In some years we have less money
|
||||
available than in other years.
|
||||
|
||||
## Expenses
|
||||
|
||||
As mentioned, the non-profit company [Änderwerk](https://aenderwerk.de) running
|
||||
the bookkeeping and legal stuff (invoices, tax statements, contracts, etc.) gets
|
||||
a cut of every payment we receive. They are doing amazing work and respond very
quickly to our queries.
|
||||
|
||||
We spend most of our income on salaries, and some on travel. We also
|
||||
pay monthly for our server (plus some extra for hardware, and in June 2024 a
|
||||
huge amount for trying to recover data from failed SSDs).
|
||||
|
||||
## Conclusion
|
||||
|
||||
We have provided an overview of our income. Three to five people have worked
at robur over the entire time. As written at the beginning, we use needs-based
payment. Our experience with this is great! It builds a lot of mutual trust.
|
||||
|
||||
Our funding is diverse, coming from multiple sources - donations, commercial work and
public funding. This was our initial goal, and we're very happy that it has worked
well over the last five years.
|
||||
|
||||
Taking the numbers into account, we are not paying ourselves "industry standard"
|
||||
rates - but we really love what we do - and sometimes we just take some time off.
|
||||
We do work on various projects that we really, really enjoy - but for which
no funding is available at the moment.
|
||||
|
||||
We are always happy to discuss how our collective operates. If you're
|
||||
interested, please drop us a message.
|
||||
|
||||
Of course, if we receive donations, we use them wisely - mainly for working on
|
||||
the currently unfunded projects (bob, albatross, miou, mollymawk - to name a few). If you
|
||||
can spare a dime or two, don't hesitate to [donate](https://robur.coop/Donate).
|
||||
Donations are tax-deductible in Germany (and should be in Europe) since we're a
|
||||
registered non-profit.
|
||||
|
||||
If you're interested in MirageOS and using it in your domain, don't hesitate
|
||||
to reach out to us (via email: team@robur.coop) so we can start to chat - we're keen to deploy MirageOS
|
||||
and find more domains where it is useful.
|
110
articles/gptar-update.md
Normal file
|
@ -0,0 +1,110 @@
|
|||
---
|
||||
title: GPTar (update)
|
||||
date: 2024-10-28
|
||||
description: libarchive vs hybrid GUID partition table and GNU tar volume header
|
||||
tags:
|
||||
- OCaml
|
||||
- gpt
|
||||
- tar
|
||||
- mbr
|
||||
- persistent storage
|
||||
author:
|
||||
name: Reynir Björnsson
|
||||
email: reynir@reynir.dk
|
||||
link: https://reyn.ir/
|
||||
---
|
||||
|
||||
In a [previous post][gptar-post] I described how I crafted a hybrid GUID partition table (GPT) and tar archive by exploiting the fact that tar headers and the *protective* master boot records used by GPT rely on disjoint areas of a 512 byte *block*.
|
||||
If you haven't already, I recommend reading it first for context.
|
||||
|
||||
After writing the above post I read an excellent and fun *and totally normal* article by Emily on how [she created **executable** tar archives][tar-executable].
|
||||
Therein I learned a clever hack:
|
||||
GNU tar has an extension for *volume headers*.
|
||||
These are essentially labels for your tape archives when you're forced to split an archive across multiple tapes.
|
||||
They can (seemingly) hold any text as a label, including shell scripts.
|
||||
What's more, GNU tar and bsdtar **do not** extract these as files!
|
||||
This is excellent, because I don't actually want to extract or list the GPT header when using GNU tar or bsdtar.
|
||||
This prompted me to [use a different link indicator](https://github.com/reynir/gptar/pull/1).
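
To make the trick concrete, here is a minimal C sketch of such a header block. It is not the gptar code (which is OCaml); `make_volume_header` is a hypothetical helper, we assume the old GNU magic `"ustar  "`, and fields we don't need are left zeroed. The point is that the size field announces payload that follows the header, which is where the GPT header and partition table end up:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLOCK 512

/* Field offsets in a 512-byte ustar/GNU tar header. */
enum {
  TAR_NAME = 0, TAR_MODE = 100, TAR_UID = 108, TAR_GID = 116,
  TAR_SIZE = 124, TAR_MTIME = 136, TAR_CHKSUM = 148,
  TAR_TYPEFLAG = 156, TAR_MAGIC = 257
};

/* Header checksum: sum of all 512 bytes, with the 8-byte checksum
   field itself counted as ASCII spaces. */
static unsigned tar_checksum(const unsigned char *h) {
  unsigned sum = 0;
  for (int i = 0; i < BLOCK; i++)
    sum += (i >= TAR_CHKSUM && i < TAR_CHKSUM + 8) ? ' ' : h[i];
  return sum;
}

/* Hypothetical helper: build a GNU volume header (type flag 'V')
   labelled `label`, announcing `payload_size` bytes of data after it. */
static void make_volume_header(unsigned char h[BLOCK], const char *label,
                               unsigned long payload_size) {
  assert(strlen(label) < 100);            /* must fit the name field */
  memset(h, 0, BLOCK);
  memcpy(h + TAR_NAME, label, strlen(label));
  snprintf((char *)h + TAR_MODE, 8, "%07o", 0400u);   /* r-------- */
  snprintf((char *)h + TAR_UID, 8, "%07o", 0u);
  snprintf((char *)h + TAR_GID, 8, "%07o", 0u);
  snprintf((char *)h + TAR_SIZE, 12, "%011lo", payload_size);
  snprintf((char *)h + TAR_MTIME, 12, "%011o", 0u);
  h[TAR_TYPEFLAG] = 'V';
  memcpy(h + TAR_MAGIC, "ustar  ", 8);    /* old GNU magic + version */
  /* Checksum is stored as six octal digits, a NUL and a space. */
  snprintf((char *)h + TAR_CHKSUM, 8, "%06o", tar_checksum(h));
  h[TAR_CHKSUM + 7] = ' ';
}
```

With `make_volume_header(h, "GPTAR", 16896)` the size field reads `00000041000`, matching the 16896 bytes GNU tar reports for the `GPTAR` entry in the listing below.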
|
||||
|
||||
This worked pretty great.
|
||||
Listing the archive with GNU tar, I still get `GPTAR`, but in the verbose listing it's displayed as a `--Volume Header--`:
|
||||
|
||||
```shell
|
||||
$ tar -tvf disk.img
|
||||
Vr-------- 0/0 16896 1970-01-01 01:00 GPTAR--Volume Header--
|
||||
-rw-r--r-- 0/0 14 1970-01-01 01:00 test.txt
|
||||
```
|
||||
|
||||
And more importantly the `GPTAR` entry is ignored when extracting:
|
||||
|
||||
```shell
|
||||
$ mkdir tmp
|
||||
$ cd tmp/
|
||||
$ tar -xf ../disk.img
|
||||
$ ls
|
||||
test.txt
|
||||
```
|
||||
|
||||
## BSD tar / libarchive
|
||||
|
||||
Unfortunately, this broke bsdtar!
|
||||
|
||||
```shell
|
||||
$ bsdtar -tf disk.img
|
||||
bsdtar: Damaged tar archive
|
||||
bsdtar: Error exit delayed from previous errors.
|
||||
```
|
||||
|
||||
This is annoying because we run FreeBSD on the host for [opam.robur.coop](https://opam.robur.coop), our instance of [opam-mirror][opam-mirror].
|
||||
This autumn we updated [opam-mirror][opam-mirror] to use the hybrid GPT+tar GPTar *tartition table*[^tartition] instead of disk offsets for the different partitions that were hard coded or passed as boot parameters - which was extremely brittle!
|
||||
So we were no longer able to inspect the contents of the tar partition from the host!
|
||||
Unacceptable!
|
||||
So I started to dig into libarchive, where bsdtar comes from.
|
||||
To my surprise, bsdtar built from a git clone of the source code ran perfectly fine!
|
||||
|
||||
```shell
|
||||
$ ./bsdtar -tf ../gptar/disk.img
|
||||
test.txt
|
||||
```
|
||||
|
||||
I eventually figured out that [this change][libarchive-pr] fixed it for me.
|
||||
I got in touch with Emily to let her know that bsdtar recently fixed this (ab)use of GNU volume headers.
|
||||
Her reply was basically "as of when I wrote the article, I was pretty sure bsdtar ignored it."
|
||||
And indeed it did.
|
||||
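Examining the diff further revealed that bsdtar did ignore the GNU volume header - just not "correctly" when the header is abused to carry content, as I did: the old code skipped only the 512 byte header block itself and then tried to parse the payload (in my case the GPT header and partition table) as the next tar header, while the new code also skips the payload, rounded up to a whole number of 512 byte blocks: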
|
||||
|
||||
```diff
|
||||
/*
|
||||
* Interpret 'V' GNU tar volume header.
|
||||
*/
|
||||
static int
|
||||
header_volume(struct archive_read *a, struct tar *tar,
|
||||
struct archive_entry *entry, const void *h, size_t *unconsumed)
|
||||
{
|
||||
- (void)h;
|
||||
+ const struct archive_entry_header_ustar *header;
|
||||
+ int64_t size, to_consume;
|
||||
+
|
||||
+ (void)a; /* UNUSED */
|
||||
+ (void)tar; /* UNUSED */
|
||||
+ (void)entry; /* UNUSED */
|
||||
|
||||
- /* Just skip this and read the next header. */
|
||||
- return (tar_read_header(a, tar, entry, unconsumed));
|
||||
+ header = (const struct archive_entry_header_ustar *)h;
|
||||
+ size = tar_atol(header->size, sizeof(header->size));
|
||||
+ to_consume = ((size + 511) & ~511);
|
||||
+ *unconsumed += to_consume;
|
||||
+ return (ARCHIVE_OK);
|
||||
}
|
||||
```
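
The essence of the fix is the arithmetic in the last lines: read the octal `size` field of the volume header and skip that many bytes, rounded up to the next multiple of 512. A standalone C sketch of that arithmetic (`parse_octal` and `padded_size` are our own illustrative names, not libarchive functions, and the sketch ignores the base-256 encoding GNU tar uses for very large sizes):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse an octal tar numeric field terminated by NUL or space,
   roughly what tar_atol() does in the common (non-base-256) case. */
static int64_t parse_octal(const char *p, size_t len) {
  int64_t v = 0;
  size_t i = 0;
  while (i < len && p[i] == ' ')          /* skip leading spaces */
    i++;
  for (; i < len && p[i] >= '0' && p[i] <= '7'; i++)
    v = v * 8 + (p[i] - '0');
  return v;
}

/* Bytes to skip after a header whose size field says `size`: the
   payload is padded with zeroes to a whole number of 512-byte blocks. */
static int64_t padded_size(int64_t size) {
  assert(size >= 0);
  return (size + 511) & ~(int64_t)511;
}
```

For the `GPTAR` volume header listed earlier the size is 16896 bytes, exactly 33 blocks, so a reader skips 33 blocks and lands on the next header; the 14 byte `test.txt` occupies one padded 512 byte block.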
|
||||
|
||||
So thanks to the above change we can expect a release of libarchive supporting further flavors of abuse of GNU volume headers!
|
||||
🥳
|
||||
|
||||
[gptar-post]: gptar.html
|
||||
[tar-executable]: https://uni.horse/executable-tarballs.html
|
||||
[opam-mirror]: https://git.robur.coop/robur/opam-mirror/
|
||||
[libarchive-pr]: https://github.com/libarchive/libarchive/pull/2127
|
||||
|
||||
[^tartition]: Emily came up with the term "tartition table", which is much better than what I had come up with - "GPTar".
|
|
@ -1,148 +0,0 @@
|
|||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="x-ua-compatible" content="ie=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<title>
|
||||
Robur's blogGPTar
|
||||
</title>
|
||||
<meta name="description" content="Hybrid GUID partition table and tar archive">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
|
||||
<script src="https://blog.robur.coop/js/hl.js"></script>
|
||||
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
|
||||
</head>
|
||||
<body>
|
||||
<header>
|
||||
<h1>blog.robur.coop</h1>
|
||||
<blockquote>
|
||||
The <strong>Robur</strong> cooperative blog.
|
||||
</blockquote>
|
||||
</header>
|
||||
<main><a href="https://blog.robur.coop/index.html">Back to index</a>
|
||||
|
||||
<article>
|
||||
<h1>GPTar</h1>
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-gpt">gpt</a></li><li><a href="https://blog.robur.coop/tags.html#tag-tar">tar</a></li><li><a href="https://blog.robur.coop/tags.html#tag-mbr">mbr</a></li><li><a href="https://blog.robur.coop/tags.html#tag-persistent storage">persistent storage</a></li></ul><p>At <a href="https://robur.coop/">Robur</a> we developed a piece of software for mirroring or exposing an <a href="https://opam.ocaml.org/">opam</a> repository.
|
||||
We have it deployed at <a href="https://opam.robur.coop/">opam.robur.coop</a>, and you can use it as an alternative to opam.ocaml.org.
|
||||
It is usually more up-to-date with the git <a href="https://github.com/ocaml/opam-repository">opam-repository</a> than opam.ocaml.org although in the past it suffered from <a href="https://blog.osau.re/articles/lwt_pause.html">occasional availability issues</a>.
|
||||
I can recommend reading Hannes' post about <a href="https://hannes.robur.coop/Posts/OpamMirror">opam-mirror</a>.
|
||||
This article is about adding a partition table to the disk <a href="https://hannes.robur.coop/Posts/OpamMirror#code-development-and-improvements">as used by opam-mirror</a>.
|
||||
For background I can recommend reading the previously linked subsection of the opam-mirror article.</p>
|
||||
<h2 id="the-opam-mirror-persistent-storage-scheme"><a class="anchor" aria-hidden="true" href="#the-opam-mirror-persistent-storage-scheme"></a>The opam-mirror persistent storage scheme</h2>
|
||||
<p>Opam-mirror uses a single block device for its persistent storage.
|
||||
On the block device it stores cached source code archives from the opam repository.
|
||||
These are stored in a <a href="https://en.wikipedia.org/wiki/Tar_(computing)">tar archive </a> consisting of files whose file name is the sha256 checksum of the file contents.
|
||||
Furthermore, at the end of the block device some space is allocated for dumping the cloned git state of the upstream (git) opam repository as well as caches storing maps from md5 and sha512 checksums respectively to the sha256 checksums.
|
||||
The partitioning scheme is entirely decided by command line arguments.
|
||||
In other words, there is no partition table on the disk image.</p>
|
||||
<p>This scheme has the nice property that the contents of the tar archive can be inspected by regular tar utilities in the host system.
|
||||
Due to the append-only nature of tar and in the presence of concurrent downloads a file written to the archive may be partial or corrupt.
|
||||
Opam-mirror handles this by prepending a <code>pending/</code> directory to partial downloads and <code>to-delete/</code> directory for corrupt downloads.
|
||||
If there are no files after the failed download in the tar archive the file can be removed without any issues.
|
||||
Otherwise a delete would involve moving all subsequent files further back in the archive - which is too error prone to do robustly.
|
||||
So using the tar utilities in the host we can inspect how much garbage has accumulated in the tar file system.</p>
|
||||
<p>The big downside to this scheme is that since the disk partitioning is not stored on the disk the contents can easily become corrupt if the wrong offsets are passed on the command line.
|
||||
Therefore I have for a long time been wanting to use an on-disk partition table.
|
||||
The problem is both <a href="https://en.wikipedia.org/wiki/MBR_partition_table">MBR</a> and <a href="https://en.m.wikipedia.org/wiki/GUID_Partition_Table">GPT</a> (GUID Partition Table) store the table at the beginning of the disk.
|
||||
If we write a partition table at the beginning it is suddenly not a valid tar archive anymore.
|
||||
Of course, in Mirage we can just write and read the table at the end if we please, but then we lose the ability to inspect the partition table in the host system.</p>
|
||||
<h2 id="gpt-header-as-tar-file-name"><a class="anchor" aria-hidden="true" href="#gpt-header-as-tar-file-name"></a>GPT header as tar file name</h2>
|
||||
<p>My first approach, which turned out to be a dead end, was when I realized that a GPT header consists of 92 bytes at the beginning followed by reserved space for the remainder of the LBA.
|
||||
The reserved space should be all zeroes, but it seems no one bothers to enforce this.
|
||||
What's more is that a tar header starts with the file name in the first 100 bytes.
|
||||
This got me thinking we could embed a GPT header inside a tar header by using the GPT header as the tar header file name!</p>
|
||||
<p>I started working on implementing this, but I quickly realized that 1) the tar header has a checksum, and 2) the gpt header has a checksum as well.
|
||||
Having two checksums that cover each other is tricky.
|
||||
Updating one checksum affects the other checksum.
|
||||
So I started reading a paper written by Martin Stigge et al. about <a href="https://sar.informatik.hu-berlin.de/research/publications/SAR-PR-2006-05/SAR-PR-2006-05_.pdf">reversing CRC</a> as the GPT header uses a CRC32 checksum.
|
||||
I ended up writing <a href="https://github.com/reynir/gptar/commit/e8269c4959e98aa0cf339cf250ff4e6671fd344c">something</a> that I knew was incorrect.</p>
|
||||
<p>Next, I realized the GPT header's checksum only covers the first 92 bytes - that is, the reserved space is not checksummed!
|
||||
I find this odd about GPT, as well as the fact that the reserved space should be all zeroes while no one checks it.
|
||||
This simplified things a lot as we don't have to reverse any checksums!
|
||||
Then I implemented a test binary that produces a half megabyte disk image with a hybrid GPT and tar header followed by a tar archive with a file <code>test.txt</code> whose content is <code>Hello, World!</code>.
|
||||
I had used the byte <code>G</code> as the link indicator.
|
||||
In POSIX.1-1988 the link indicators <code>A</code>-<code>Z</code> are reserved for vendor specific extensions, and it seemed <code>G</code> was unused.
|
||||
A mistake I made was to not update the tar header checksum - the <a href="https://github.com/mirage/ocaml-tar">ocaml-tar</a> library doesn't support this link indicator value so I had manually updated the byte value in the serialized header but forgot to update the checksum.
|
||||
This was easily remediated as the checksum is a simple sum of the bytes in the header.
|
||||
The changes made are viewable on <a href="https://github.com/reynir/gptar/compare/e8269c4959e98aa0cf339cf250ff4e6671fd344c...03a57a1cef17eeb66b0d883dbe441c9c4b9093bd">GitHub</a>.
|
||||
I also had to work around a <a href="https://github.com/mirage/ocaml-tar/issues/144">bug in ocaml-tar</a>.
|
||||
GNU tar was successfully able to list the archive.
|
||||
A quirk is that the archive will start with a dummy file <code>GPTAR</code> which consists of any remaining space in the first LBA if the sector size is greater than 512 bytes followed by the partition table.</p>
|
||||
<h2 id="protective-mbr"><a class="anchor" aria-hidden="true" href="#protective-mbr"></a>Protective MBR</h2>
|
||||
<p>Unfortunately, neither fdisk nor parted recognized the GPT partition table.
|
||||
I was able to successfully read the partition table using <a href="https://github.com/mirage/ocaml-gpt">ocaml-gpt</a> however.
|
||||
This puzzled me.
|
||||
Then I got a hunch: I had read about <a href="https://en.m.wikipedia.org/wiki/GUID_Partition_Table#Protective_MBR_(LBA_0)">protective MBRs</a> on the Wikipedia page on GPT.
|
||||
I had always thought it was optional and not needed in a new system such as Mirage that doesn't have to care too much about legacy code and operating systems.</p>
|
||||
<p>So I started comparing the layout of MBR and tar.
|
||||
The V7 tar format only uses the first 257 bytes of the 512 byte block.
|
||||
The V7 format is differentiated from the UStar, POSIX/pax and old GNU tar formats by not having the string <code>ustar</code> at byte offset 257<sup><a href="#fn-tar-ustar" id="ref-1-fn-tar-ustar" role="doc-noteref" class="fn-label">[1]</a></sup>.
|
||||
The master boot record format starts with the bootstrap code area.
|
||||
In the classic format it is the first 446 bytes.
|
||||
In the modern standard MBR format the first 446 bytes are mostly bootstrap code too, with the exception of six bytes at offset 218 which are used for a disk timestamp (and an optional disk signature at offsets 440-445).
|
||||
The timestamp bytes overlap with the V7 tar link name field.
|
||||
In both formats these bytes can be zero without any issues, thankfully.</p>
|
||||
<p>This is great!
|
||||
This means we can put a tar header in the bootstrap code area of the MBR and have it be a valid tar header <em>and</em> MBR record at the same time.
|
||||
The protective MBR has one partition of type <code>0xEE</code> whose LBA starts at sector 1 and the number of LBAs should cover the whole disk, or be <code>0xFFFFFFFF</code> (maximum representable number in unsigned 32 bit).
|
||||
In practice this means we can get away with only touching byte offsets 446-461 (the first 16 byte partition entry) and 510-511 (the boot signature) for the protective MBR.
|
||||
The MBR does not have a checksum which also makes things easier.
|
||||
Using this I could create a disk image that parted and fdisk recognized as a GPT partitioned disk!
|
||||
With the caveat that they both reported that the backup GPT header was corrupt.
|
||||
I had just copied the primary GPT header to the end of the disk.
|
||||
It turns out that the alternate, or backup, GPT header should have the current LBA and backup LBA fields swapped (and the header crc32 recomputed).
|
||||
I updated the ocaml-gpt code so that it can <a href="https://github.com/mirage/ocaml-gpt/pull/14">marshal alternate GPT headers</a>.</p>
|
||||
<p>Finally we can produce GPT partitioned disks that can be inspected with tar utilities!</p>
|
||||
<pre><code>$ /usr/sbin/parted disk.img print
|
||||
WARNING: You are not superuser. Watch out for permissions.
|
||||
Model: (file)
|
||||
Disk /home/reynir/workspace/gptar/disk.img: 524kB
|
||||
Sector size (logical/physical): 512B/512B
|
||||
Partition Table: gpt
|
||||
Disk Flags: pmbr_boot
|
||||
|
||||
Number Start End Size File system Name Flags
|
||||
1 17.4kB 19.5kB 2048B Real tar archive hidden
|
||||
|
||||
</code></pre>
|
||||
<pre><code>$ tar -tvf disk.img
|
||||
?r-------- 0/0 16896 1970-01-01 01:00 GPTAR unknown file type ‘G’
|
||||
-r-------- 0/0 14 1970-01-01 01:00 test.txt
|
||||
</code></pre>
|
||||
<p>The <a href="https://github.com/reynir/gptar">code</a> is freely available on GitHub.</p>
|
||||
<h2 id="future-work"><a class="anchor" aria-hidden="true" href="#future-work"></a>Future work</h2>
|
||||
<p>One thing that bothers me a bit is the dummy file <code>GPTAR</code>.
|
||||
By using the <code>G</code> link indicator GNU tar will print a warning about the unknown file type <code>G</code>,
|
||||
but it will still extract the dummy file when extracting the archive.
|
||||
I have been thinking about what tar header I could put in the MBR so tar utilities will skip the partition table but not try to extract the dummy file.
|
||||
Ideas I've had are to:</p>
|
||||
<ul>
|
||||
<li>Pretend it is a directory with a non-zero file size which is nonsense.
|
||||
I'm unsure what tar utilities would do in that case.
|
||||
I fear not all implementations will skip to the next header correctly as a non-zero directory is nonsense.
|
||||
I may give it a try and check how GNU tar, FreeBSD tar and ocaml-tar react.</li>
|
||||
<li>Say it is a PAX extended header and use a nonsense tag or attribute whose value covers the GPT header and partition table.
|
||||
The problem is the PAX extended header content format is <code>&lt;length&gt; &lt;tag&gt;=&lt;value&gt;\n</code> where <code>&lt;length&gt;</code> is the decimal length in bytes of the whole record, including the length digits themselves, the space and the trailing newline.
|
||||
In other words it must start with the correct length.
|
||||
For sector size 512 this is a problem because the PAX extended header content would start with the GPT header which starts with the string <code>EFI PART</code>.
|
||||
If the sector size is greater than 512 we can use the remaining space in LBA 0 to write a length, dummy tag and some padding.
|
||||
I may try this for a sector size of 4096, but I'm not happy that it doesn't work with sector size 512 which solo5 will default to.</li>
|
||||
</ul>
|
||||
<p>If you have other ideas what I can do please reach out!</p>
|
||||
<section role="doc-endnotes"><ol>
|
||||
<li id="fn-tar-ustar">
|
||||
<p>This is somewhat simplified. There are some more nuances between the different formats, but for this purpose they don't matter much.</p>
|
||||
<span><a href="#ref-1-fn-tar-ustar" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>
|
||||
|
||||
</article>
|
||||
|
||||
</main>
|
||||
<footer>
|
||||
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
|
||||
<br />
|
||||
</footer>
|
||||
<script>hljs.highlightAll();</script>
|
||||
</body>
|
||||
</html>
|
159
articles/gptar.md
Normal file
|
@ -0,0 +1,159 @@
|
|||
---
|
||||
date: 2024-02-21
|
||||
title: GPTar
|
||||
description: Hybrid GUID partition table and tar archive
|
||||
tags:
|
||||
- OCaml
|
||||
- gpt
|
||||
- tar
|
||||
- mbr
|
||||
- persistent storage
|
||||
author:
|
||||
name: Reynir Björnsson
|
||||
email: reynir@reynir.dk
|
||||
link: https://reyn.ir/
|
||||
---
|
||||
|
||||
At [Robur][robur] we developed a piece of software for mirroring or exposing an [opam][opam] repository.
|
||||
We have it deployed at [opam.robur.coop](https://opam.robur.coop/), and you can use it as an alternative to opam.ocaml.org.
|
||||
It is usually more up-to-date with the git [opam-repository][opam-repository] than opam.ocaml.org although in the past it suffered from [occasional availability issues][lwtpause].
|
||||
I can recommend reading Hannes' post about [opam-mirror][opam-mirror].
|
||||
This article is about adding a partition table to the disk [as used by opam-mirror][opam-mirror-tar].
|
||||
For background I can recommend reading the previously linked subsection of the opam-mirror article.
|
||||
|
||||
## The opam-mirror persistent storage scheme
|
||||
|
||||
Opam-mirror uses a single block device for its persistent storage.
|
||||
On the block device it stores cached source code archives from the opam repository.
|
||||
These are stored in a [tar archive][wiki-tar] consisting of files whose file name is the sha256 checksum of the file contents.
|
||||
Furthermore, at the end of the block device some space is allocated for dumping the cloned git state of the upstream (git) opam repository as well as caches storing maps from md5 and sha512 checksums respectively to the sha256 checksums.
|
||||
The partitioning scheme is entirely decided by command line arguments.
|
||||
In other words, there is no partition table on the disk image.
|
||||
|
||||
This scheme has the nice property that the contents of the tar archive can be inspected by regular tar utilities in the host system.
|
||||
Due to the append-only nature of tar and in the presence of concurrent downloads a file written to the archive may be partial or corrupt.
|
||||
Opam-mirror handles this by prepending a `pending/` directory to partial downloads and `to-delete/` directory for corrupt downloads.
|
||||
If there are no files after the failed download in the tar archive the file can be removed without any issues.
|
||||
Otherwise a delete would involve moving all subsequent files further back in the archive - which is too error prone to do robustly.
|
||||
So using the tar utilities in the host we can inspect how much garbage has accumulated in the tar file system.
|
||||
|
||||
The big downside to this scheme is that since the disk partitioning is not stored on the disk the contents can easily become corrupt if the wrong offsets are passed on the command line.
|
||||
Therefore I have long wanted to use an on-disk partition table.
|
||||
The problem is both [MBR][wiki-mbr] and [GPT][wiki-gpt] (GUID Partition Table) store the table at the beginning of the disk.
|
||||
If we write a partition table at the beginning it is suddenly not a valid tar archive anymore.
|
||||
Of course, in Mirage we can just write and read the table at the end if we please, but then we lose the ability to inspect the partition table in the host system.
|
||||
|
||||
## GPT header as tar file name
|
||||
|
||||
My first approach, which turned out to be a dead end, came from realizing that a GPT header consists of 92 bytes followed by reserved space for the remainder of the LBA.
|
||||
The reserved space should be all zeroes, but it seems no one bothers to enforce this.
|
||||
What's more is that a tar header starts with the file name in the first 100 bytes.
|
||||
This got me thinking we could embed a GPT header inside a tar header by using the GPT header as the tar header file name!
|
||||
|
||||
I started implementing this, but I quickly realized that 1) the tar header has a checksum, and 2) the GPT header has a checksum as well.
|
||||
Having two checksums that cover each other is tricky.
|
||||
Updating one checksum affects the other checksum.
|
||||
So I started reading a paper written by Martin Stigge et al. about [reversing CRC][reversing-crc] as the GPT header uses a CRC32 checksum.
|
||||
I ended up writing [something](https://github.com/reynir/gptar/commit/e8269c4959e98aa0cf339cf250ff4e6671fd344c) that I knew was incorrect.
|
||||
|
||||
Next, I realized the GPT header's checksum only covers the first 92 bytes - that is, the reserved space is not checksummed!
|
||||
I find this odd about GPT, as well as the fact that the reserved space should be all zeroes while no one checks it.
|
||||
This simplified things a lot as we don't have to reverse any checksums!
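
Why the reserved space is "free" shows up directly in how the header CRC is computed. A minimal C sketch, assuming a 512 byte sector and the UEFI header layout (little-endian `HeaderSize` at offset 12, `HeaderCRC32` at offset 16); `crc32` and `gpt_header_crc` are illustrative names, not ocaml-gpt functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the variant used
   by GPT. Slow but table-free. */
static uint32_t crc32(const unsigned char *buf, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) {
    crc ^= buf[i];
    for (int k = 0; k < 8; k++)
      crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
  }
  return ~crc;
}

/* CRC of a GPT header sector: only the first HeaderSize bytes (92 in
   practice) are covered, with the CRC field itself treated as zero.
   The reserved bytes after HeaderSize never influence the result. */
static uint32_t gpt_header_crc(const unsigned char sector[512]) {
  uint32_t size = (uint32_t)sector[12] | (uint32_t)sector[13] << 8 |
                  (uint32_t)sector[14] << 16 | (uint32_t)sector[15] << 24;
  unsigned char copy[512];
  assert(size >= 92 && size <= 512);
  memcpy(copy, sector, size);
  memset(copy + 16, 0, 4);                /* zero HeaderCRC32 */
  return crc32(copy, size);
}
```

Scribbling anything (for example a tar header) into bytes 92-511 leaves the header CRC unchanged, which is exactly the property the hybrid relies on.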
|
||||
Then I implemented a test binary that produces a half megabyte disk image with a hybrid GPT and tar header followed by a tar archive with a file `test.txt` whose content is `Hello, World!`.
|
||||
I had used the byte `G` as the link indicator.
|
||||
In POSIX.1-1988 the link indicators `A`-`Z` are reserved for vendor specific extensions, and it seemed `G` was unused.
|
||||
A mistake I made was not updating the tar header checksum: the [ocaml-tar][ocaml-tar] library doesn't support this link indicator value, so I had manually changed the byte in the serialized header but forgot to update the checksum.
|
||||
This was easily remediated as the checksum is a simple sum of the bytes in the header (with the checksum field itself counted as spaces).
|
||||
The changes made are viewable on [GitHub](https://github.com/reynir/gptar/compare/e8269c4959e98aa0cf339cf250ff4e6671fd344c...03a57a1cef17eeb66b0d883dbe441c9c4b9093bd).
|
||||
I also had to work around a [bug in ocaml-tar](https://github.com/mirage/ocaml-tar/issues/144).
|
||||
GNU tar was successfully able to list the archive.
|
||||
A quirk is that the archive starts with a dummy file `GPTAR` whose contents are any remaining space in the first LBA (if the sector size is greater than 512 bytes) followed by the partition table.
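
For reference, the checksum repair in C: a sketch assuming a plain 512 byte tar header, where `set_typeflag` is a hypothetical helper (not part of ocaml-tar) that changes the link indicator and rewrites the checksum in the usual "six octal digits, NUL, space" form:

```c
#include <assert.h>
#include <stdio.h>

#define TAR_CHKSUM 148     /* 8-byte checksum field */
#define TAR_TYPEFLAG 156   /* link indicator */

/* Unsigned sum of all 512 header bytes, with the checksum field
   counted as eight ASCII spaces (0x20). */
static unsigned tar_checksum(const unsigned char h[512]) {
  unsigned sum = 0;
  for (int i = 0; i < 512; i++)
    sum += (i >= TAR_CHKSUM && i < TAR_CHKSUM + 8) ? ' ' : h[i];
  return sum;
}

/* Hypothetical helper: set the link indicator of a serialized header
   and recompute the checksum, the step that was initially forgotten. */
static void set_typeflag(unsigned char h[512], char flag) {
  h[TAR_TYPEFLAG] = (unsigned char)flag;
  unsigned sum = tar_checksum(h);
  assert(sum <= 0777777u);   /* 512 * 255 always fits in six octal digits */
  snprintf((char *)h + TAR_CHKSUM, 8, "%06o", sum);
  h[TAR_CHKSUM + 7] = ' ';
}
```

Because the checksum field is counted as spaces, the sum does not depend on the old checksum value, so it can be recomputed after any byte edit.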
|
||||
|
||||
## Protective MBR
|
||||
|
||||
Unfortunately, neither fdisk nor parted recognized the GPT partition table.
|
||||
I was able to successfully read the partition table using [ocaml-gpt][ocaml-gpt] however.
|
||||
This puzzled me.
|
||||
Then I got a hunch: I had read about [protective MBRs][wiki-protective-mbr] on the Wikipedia page on GPT.
|
||||
I had always thought it was optional and not needed in a new system such as Mirage that doesn't have to care too much about legacy code and operating systems.
|
||||
|
||||
So I started comparing the layout of MBR and tar.
|
||||
The V7 tar format only uses the first 257 bytes of the 512 byte block.
|
||||
The V7 format is differentiated from the UStar, POSIX/pax and old GNU tar formats by not having the string `ustar` at byte offset 257[^tar-ustar].
|
||||
The master boot record format starts with the bootstrap code area.
|
||||
In the classic format it is the first 446 bytes.
|
||||
In the modern standard MBR format the first 446 bytes are mostly bootstrap code too, with the exception of six bytes at offset 218 which are used for a disk timestamp (and an optional disk signature at offsets 440-445).
|
||||
The timestamp bytes overlap with the V7 tar link name field.
|
||||
In both formats these bytes can be zero without any issues, thankfully.
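
The comparison can be written down as byte ranges. A small C sketch using the standard V7 tar field boundaries and the non-bootstrap fields of the modern standard MBR; the `range` tables and helpers are purely illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct range { const char *what; int start, end; };   /* [start, end) */

/* V7 tar header: everything lives in the first 257 bytes. */
static const struct range v7_tar[] = {
  {"name", 0, 100},     {"mode", 100, 108},     {"uid", 108, 116},
  {"gid", 116, 124},    {"size", 124, 136},     {"mtime", 136, 148},
  {"chksum", 148, 156}, {"typeflag", 156, 157}, {"linkname", 157, 257},
};
#define V7_FIELDS (sizeof v7_tar / sizeof v7_tar[0])

/* Modern standard MBR fields that are not bootstrap code. */
static const struct range mbr[] = {
  {"disk timestamp", 218, 224},   /* optional */
  {"disk signature", 440, 446},   /* optional */
  {"partition table", 446, 510},
  {"boot signature", 510, 512},
};

static bool overlaps(struct range a, struct range b) {
  assert(a.start <= a.end && b.start <= b.end);
  return a.start < b.end && b.start < a.end;
}

/* How many V7 tar fields share bytes with the given MBR field? */
static int count_tar_overlaps(struct range m) {
  int n = 0;
  for (size_t i = 0; i < V7_FIELDS; i++)
    n += overlaps(v7_tar[i], m);
  return n;
}
```

Only the timestamp collides, and only with `linkname`; everything the protective MBR needs (partition table and boot signature) sits above byte 257.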
|
||||
|
||||
This is great!
|
||||
This means we can put a tar header in the bootstrap code area of the MBR and have it be a valid tar header *and* MBR at the same time.
|
||||
The protective MBR has one partition of type `0xEE` that starts at LBA 1 and whose size in LBAs should cover the whole disk, or be `0xFFFFFFFF` (the maximum unsigned 32 bit value).
|
||||
In practice this means we can get away with only touching byte offsets 446-461 (the first 16 byte partition entry) and 510-511 (the boot signature) for the protective MBR.
|
||||
The MBR does not have a checksum which also makes things easier.
|
||||
Using this I could create a disk image that parted and fdisk recognized as a GPT partitioned disk!
|
||||
With the caveat that they both reported that the backup GPT header was corrupt.
|
||||
I had just copied the primary GPT header to the end of the disk.
|
||||
It turns out that the alternate, or backup, GPT header should have the current LBA and backup LBA fields swapped (and the header crc32 recomputed).
|
||||
I updated the ocaml-gpt code so that it can [marshal alternate GPT headers](https://github.com/mirage/ocaml-gpt/pull/14).
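
Those few bytes, as a minimal C sketch following the protective MBR partition record described in the UEFI specification. `write_protective_mbr` and `put_le32` are hypothetical names, not ocaml-gpt functions, and as a simplification the ending CHS is always set to `0xFFFFFF`:

```c
#include <assert.h>
#include <stdint.h>

/* Store v as a little-endian 32-bit value at p. */
static void put_le32(unsigned char *p, uint32_t v) {
  for (int i = 0; i < 4; i++)
    p[i] = (unsigned char)(v >> (8 * i));
}

/* Hypothetical helper: turn the 512-byte block at LBA 0 into a
   protective MBR in place, touching only bytes 446-461 and 510-511,
   so a V7 tar header in bytes 0-256 survives. */
static void write_protective_mbr(unsigned char lba0[512], uint64_t disk_sectors) {
  assert(disk_sectors >= 2);
  unsigned char *entry = lba0 + 446;        /* first partition entry */
  uint64_t size = disk_sectors - 1;         /* everything after LBA 0 */
  entry[0] = 0x00;                          /* not bootable */
  entry[1] = 0x00; entry[2] = 0x02; entry[3] = 0x00;  /* start CHS */
  entry[4] = 0xEE;                          /* GPT protective type */
  entry[5] = entry[6] = entry[7] = 0xFF;    /* end CHS (simplified) */
  put_le32(entry + 8, 1);                   /* starting LBA */
  put_le32(entry + 12, size > 0xFFFFFFFFu ? 0xFFFFFFFFu : (uint32_t)size);
  /* Entries 2-4 (bytes 462-509) are left alone; a real disk wants them zeroed. */
  lba0[510] = 0x55;                         /* boot signature */
  lba0[511] = 0xAA;
}
```

For the half megabyte test image (1024 sectors of 512 bytes) the size field becomes 1023.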
|
||||
|
||||
Finally we can produce GPT partitioned disks that can be inspected with tar utilities!
|
||||
|
||||
```
|
||||
$ /usr/sbin/parted disk.img print
|
||||
WARNING: You are not superuser. Watch out for permissions.
|
||||
Model: (file)
|
||||
Disk /home/reynir/workspace/gptar/disk.img: 524kB
|
||||
Sector size (logical/physical): 512B/512B
|
||||
Partition Table: gpt
|
||||
Disk Flags: pmbr_boot
|
||||
|
||||
Number Start End Size File system Name Flags
|
||||
1 17.4kB 19.5kB 2048B Real tar archive hidden
|
||||
|
||||
```
|
||||
|
||||
```
|
||||
$ tar -tvf disk.img
|
||||
?r-------- 0/0 16896 1970-01-01 01:00 GPTAR unknown file type ‘G’
|
||||
-r-------- 0/0 14 1970-01-01 01:00 test.txt
|
||||
```
|
||||
The [code](https://github.com/reynir/gptar) is freely available on GitHub.
|
||||
|
||||
## Future work
|
||||
|
||||
One thing that bothers me a bit is the dummy file `GPTAR`.
|
||||
By using the `G` link indicator GNU tar will print a warning about the unknown file type `G`,
|
||||
but it will still extract the dummy file when extracting the archive.
|
||||
I have been thinking about what tar header I could put in the MBR so tar utilities will skip the partition table but not try to extract the dummy file.
|
||||
Ideas I've had are to:
|
||||
|
||||
- Pretend it is a directory with a non-zero file size which is nonsense.
|
||||
I'm unsure what tar utilities would do in that case.
|
||||
I fear not all implementations will skip to the next header correctly as a non-zero directory is nonsense.
|
||||
I may give it a try and check how GNU tar, FreeBSD tar and ocaml-tar react.
|
||||
- Say it is a PAX extended header and use a nonsense tag or attribute whose value covers the GPT header and partition table.
|
||||
The problem is the PAX extended header content format is `<length> <tag>=<value>\n` where `<length>` is the decimal length in bytes of the whole record, including the length digits themselves, the space and the trailing newline.
|
||||
In other words it must start with the correct length.
|
||||
For sector size 512 this is a problem because the PAX extended header content would start with the GPT header which starts with the string `EFI PART`.
|
||||
If the sector size is greater than 512 we can use the remaining space in LBA 0 to write a length, dummy tag and some padding.
|
||||
I may try this for a sector size of 4096, but I'm not happy that it doesn't work with sector size 512 which solo5 will default to.
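
Because `<length>` counts its own digits, computing it is a small fixed-point problem.
The following is a minimal sketch of building such a record. `pax_record` is a hypothetical helper, not part of ocaml-tar.

```ocaml
(* Build a PAX extended header record "<length> <key>=<value>\n", where
   <length> is the decimal length of the whole record, its own digits
   included. *)
let pax_record key value =
  let rest = " " ^ key ^ "=" ^ value ^ "\n" in
  let base = String.length rest in
  (* Iterate until n = (number of digits of n) + base. *)
  let rec fix n =
    let total = String.length (string_of_int n) + base in
    if total = n then n else fix total
  in
  string_of_int (fix base) ^ rest

let () = print_string (pax_record "mtime" "1350244992.023960108")
```

This prints `30 mtime=1350244992.023960108` followed by a newline.
Everything after the length takes 28 bytes, and the two digits of `30` bring the total to 30.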

If you have other ideas for what I could do, please reach out!

[robur]: https://robur.coop/
[opam]: https://opam.ocaml.org/
[opam-mirror]: https://hannes.robur.coop/Posts/OpamMirror
[opam-repository]: https://github.com/ocaml/opam-repository
[lwtpause]: https://blog.osau.re/articles/lwt_pause.html
[opam-mirror-tar]: https://hannes.robur.coop/Posts/OpamMirror#code-development-and-improvements
[wiki-mbr]: https://en.wikipedia.org/wiki/MBR_partition_table
[wiki-gpt]: https://en.m.wikipedia.org/wiki/GUID_Partition_Table
[wiki-tar]: https://en.wikipedia.org/wiki/Tar_(computing)
[reversing-crc]: https://sar.informatik.hu-berlin.de/research/publications/SAR-PR-2006-05/SAR-PR-2006-05_.pdf
[ocaml-tar]: https://github.com/mirage/ocaml-tar
[ocaml-gpt]: https://github.com/mirage/ocaml-gpt
[wiki-protective-mbr]: https://en.m.wikipedia.org/wiki/GUID_Partition_Table#Protective_MBR_(LBA_0)

[^tar-ustar]: This is somewhat simplified. There are some more nuances between the different formats, but for this purpose they don't matter much.
310
articles/lwt_pause.md
Normal file
---
date: 2024-02-11
title: Cooperation and Lwt.pause
description:
  A digression about Lwt and Miou
tags:
  - OCaml
  - Scheduler
  - Community
  - Unikernel
  - Git
author:
  name: Romain Calascibetta
  email: romain.calascibetta@gmail.com
  link: https://blog.osau.re/
breaks: false
---

Here's a concrete example of the notion of availability and of the scheduler
used (in this case Lwt). As you may know, at Robur we have developed a
unikernel: [opam-mirror][opam-mirror]. It launches an HTTP service that can be
used as an opam overlay available from a Git repository (with
`opam repository add <name> <url>`).

The purpose of this unikernel was to respond to an outage of the official
repository (which fortunately did not last long) and to decentralise such a
service. You can use https://opam.robur.coop!

It was also useful at the Mirage retreat, where we don't usually have a great
Internet connection. Caching packages on the local network reduced our
Internet bill: OCaml users could fetch opam packages over the local network
instead of over the shared, metered 4G Internet connection.

Finally, I also use this unikernel on my server for my software
[reproducibility service][reproducibility], in order to have an overlay for
software such as [Bob][bob].

In short, I advise you to use it; you can see how to install it
[here][installation]. (I think that, within a company, it can be worthwhile to
have such a unikernel available internally.)

However, this unikernel had a long-standing problem. We were already
discussing it at the Mirleft retreat: while the unikernel fetched the
repository from Git, our HTTP server was unavailable for a fairly long time.
Basically, we had to wait ~10 minutes before the service offered by the
unikernel became available.

## Availability

If you follow my [articles][miou-articles] about Miou, you know that from the
outset I have insisted on availability as a requirement for yet another new
scheduler for OCaml 5. We emphasised this notion because we had run into quite
a few problems with it when using Lwt.

Here, availability requires the scheduler to be able to observe system events
as often as possible. The problem is that Lwt doesn't really work this way.

Indeed, Lwt offers a way of giving the scheduler a chance to observe system
events (`Lwt.pause`), but it does not do so systematically. The only time you
really give the scheduler the opportunity to see whether you can read or write
is when you want to... read or write...

More generally, it is said that Lwt's **bind** does not _yield_. In other
words, you can chain any number of functions together (via the `>>=`
operator), but from Lwt's point of view, there is no opportunity to see if an
event has occurred. Lwt always tries to go as far down your chain as possible,
until it:
- finishes your promise,
- or comes across an operation that requires a system event (read or write),
- or comes across an `Lwt.pause` (as a _yield_ point).

Lwt is rather sparse in adding cooperation points besides `Lwt.pause` and
read/write operations, in contrast with Async, where the bind operator is a
cooperation point.
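
To make this concrete, here is a minimal sketch. It is not taken from Lwt or
from our code, and it assumes the `lwt` and `lwt.unix` libraries. It starts two
hypothetical tasks, `a` and `b`, each logging three steps. In `no_yield` the
steps are chained with `>>=` on already-resolved promises, while in
`with_pause` each step goes through `Lwt.pause`.

```ocaml
let log = ref []

(* Record a step; the log is kept in reverse order. *)
let trace s = log := s :: !log

(* Chain [n] steps with bind only: no cooperation point. *)
let rec no_yield name n =
  let open Lwt.Infix in
  if n = 0 then Lwt.return_unit
  else begin
    trace (Printf.sprintf "%s%d" name n);
    Lwt.return_unit >>= fun () -> no_yield name (n - 1)
  end

(* The same chain, but each step goes through Lwt.pause. *)
let rec with_pause name n =
  let open Lwt.Infix in
  if n = 0 then Lwt.return_unit
  else begin
    trace (Printf.sprintf "%s%d" name n);
    Lwt.pause () >>= fun () -> with_pause name (n - 1)
  end

(* Start task "a", then task "b", and return the order of their steps. *)
let run f =
  log := [];
  let a = f "a" 3 in
  let b = f "b" 3 in
  let (), () = Lwt_main.run (Lwt.both a b) in
  List.rev !log

let () =
  print_endline (String.concat " " (run no_yield));
  print_endline (String.concat " " (run with_pause))
```

The first line is `a3 a2 a1 b3 b2 b1`: task `a` runs to completion before `b`
even starts. On the second line, `b3` is logged right after `a3`, because each
`Lwt.pause` hands control back to the scheduler.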

### If there is no I/O, do not wrap in Lwt

This is (bad<sup>[1](#fn1)</sup>) advice I was given: if a function doesn't do
I/O, there's no point in putting it in Lwt. At first glance, the idea may seem
like a good one. If a function doesn't do I/O, whether it's in the Lwt monad or
not makes no difference to the way Lwt executes it: once again, Lwt goes as far
as possible. So Lwt evaluates both of the following functions in the same way:

```ocaml
(* Merge two sorted arrays into a new sorted array. *)
let merge a b =
  let la = Array.length a and lb = Array.length b in
  let res = Array.make (la + lb) 0 in
  let i = ref 0 and j = ref 0 in
  for k = 0 to la + lb - 1 do
    if !j >= lb || (!i < la && a.(!i) <= b.(!j)) then begin
      res.(k) <- a.(!i);
      incr i
    end else begin
      res.(k) <- b.(!j);
      incr j
    end
  done;
  res

(* Merge sort without Lwt. *)
let rec sort0 arr =
  if Array.length arr <= 1 then arr
  else
    let m = Array.length arr / 2 in
    let arr0 = sort0 (Array.sub arr 0 m) in
    let arr1 = sort0 (Array.sub arr m (Array.length arr - m)) in
    merge arr0 arr1

(* The same merge sort in the Lwt monad. The right half is passed first:
   OCaml evaluates arguments right to left in practice (the order is
   unspecified), so the left half is still started first, as in sort0. *)
let rec sort1 arr =
  let open Lwt.Infix in
  if Array.length arr <= 1 then Lwt.return arr
  else
    let m = Array.length arr / 2 in
    Lwt.both
      (sort1 (Array.sub arr m (Array.length arr - m)))
      (sort1 (Array.sub arr 0 m))
    >|= fun (arr0, arr1) ->
    merge arr0 arr1
```

If we trace the execution of the two functions (for example, by printing `arr`
each time), we see the same behaviour whether Lwt is used or not. However, what
is interesting in the Lwt code is the use of `both`, which suggests that the
processes are running _at the same time_.

"At the same time" does not necessarily mean using several cores or running
"in parallel". It means that the right-hand side may get the opportunity to
execute even if the left-hand side has not finished. In other words, the two
processes can run **concurrently**.

In practice, however, this is not the case. Even though the `>|=` operator
could have served as a cooperation point, Lwt goes as far as possible and
finishes the left half before launching the right half:

```shell
$ ./a.out
sort0: [|3; 4; 2; 1; 7; 5; 8; 9; 0; 6|]
sort0: [|3; 4; 2; 1; 7|]
sort0: [|3; 4|]
sort0: [|2; 1; 7|]
sort0: [|1; 7|]
sort0: [|5; 8; 9; 0; 6|]
sort0: [|5; 8|]
sort0: [|9; 0; 6|]
sort0: [|0; 6|]

sort1: [|3; 4; 2; 1; 7; 5; 8; 9; 0; 6|]
sort1: [|3; 4; 2; 1; 7|]
sort1: [|3; 4|]
sort1: [|2; 1; 7|]
sort1: [|1; 7|]
sort1: [|5; 8; 9; 0; 6|]
sort1: [|5; 8|]
sort1: [|9; 0; 6|]
sort1: [|0; 6|]
```
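
For comparison, here is a hypothetical `sort2` that is not part of the original
code. It adds an explicit cooperation point with `Lwt.pause` before each
recursive step. The sketch assumes the `lwt` and `lwt.unix` libraries, and it
repeats `merge` so that it compiles on its own.

```ocaml
(* Merge two sorted arrays into a new sorted array. *)
let merge a b =
  let la = Array.length a and lb = Array.length b in
  let res = Array.make (la + lb) 0 in
  let i = ref 0 and j = ref 0 in
  for k = 0 to la + lb - 1 do
    if !j >= lb || (!i < la && a.(!i) <= b.(!j)) then begin
      res.(k) <- a.(!i);
      incr i
    end else begin
      res.(k) <- b.(!j);
      incr j
    end
  done;
  res

(* Arrays seen by sort2, most recent first. *)
let seen = ref []

let rec sort2 arr =
  let open Lwt.Infix in
  if Array.length arr <= 1 then Lwt.return arr
  else begin
    seen := arr :: !seen;
    (* Hand control back to the scheduler before recursing. *)
    Lwt.pause () >>= fun () ->
    let m = Array.length arr / 2 in
    (* Name both promises so that the left half is started first. *)
    let left = sort2 (Array.sub arr 0 m) in
    let right = sort2 (Array.sub arr m (Array.length arr - m)) in
    Lwt.both left right >|= fun (arr0, arr1) -> merge arr0 arr1
  end

let () =
  let sorted = Lwt_main.run (sort2 [| 3; 4; 2; 1; 7; 5; 8; 9; 0; 6 |]) in
  assert (sorted = [| 0; 1; 2; 3; 4; 5; 6; 7; 8; 9 |]);
  List.iter
    (fun a ->
      let items = Array.to_list (Array.map string_of_int a) in
      Printf.printf "sort2: [|%s|]\n" (String.concat "; " items))
    (List.rev !seen)
```

With the pause, `[|5; 8; 9; 0; 6|]` is printed before `[|3; 4|]`. The right
half now starts before the left half is finished, at the cost of a trip through
the scheduler at each step.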

<hr>

**<tag id="fn1">1</tag>**: However, if you are not interested in availability
and would like the scheduler to resolve your promises as quickly as possible,
this advice is perfectly valid.

#### Performance

It should be noted, however, that Lwt has a cost. Even if the behaviour is the
same, the Lwt layer is not free. A quick benchmark shows the overhead:

```ocaml
(* The array used in the traces above. *)
let arr = [| 3; 4; 2; 1; 7; 5; 8; 9; 0; 6 |]

let () =
  let t0 = Unix.gettimeofday () in
  for _i = 1 to 1000 do ignore (sort0 arr) done;
  let t1 = Unix.gettimeofday () in
  Fmt.pr "sort0 %fs\n%!" (t1 -. t0)

let () =
  let t0 = Unix.gettimeofday () in
  Lwt_main.run @@ begin
    let open Lwt.Infix in
    let rec go idx =
      if idx = 1000 then Lwt.return_unit
      else sort1 arr >>= fun _ -> go (succ idx) in
    go 0 end;
  let t1 = Unix.gettimeofday () in
  Fmt.pr "sort1 %fs\n%!" (t1 -. t0)
```

```sh
$ ./a.out
sort0 0.000264s
sort1 0.000676s
```

This is the fairly obvious argument for not using Lwt when there's no I/O.
Then, if the Lwt monad is really needed, a simple `Lwt.return` at the very end
is sufficient (or, better, `Lwt.map` / `>|=`).
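
Here is a minimal sketch of that advice, assuming the `lwt` and `lwt.unix`
libraries. The names `sorted`, `sorted_lwt` and `smallest` are hypothetical,
and `Array.sort` stands in for the merge sort above. The computation stays
pure, and Lwt only appears at the boundary.

```ocaml
(* Pure computation: no Lwt involved. *)
let sorted arr =
  let a = Array.copy arr in
  Array.sort compare a;
  a

(* Wrap the result in Lwt only at the very end. *)
let sorted_lwt arr = Lwt.return (sorted arr)

(* Post-process a promise with Lwt.map instead of bind + return. *)
let smallest p = Lwt.map (fun a -> a.(0)) p

let () =
  let p = sorted_lwt [| 3; 4; 2; 1; 7 |] in
  Printf.printf "smallest: %d\n" (Lwt_main.run (smallest p))
```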

#### Cooperation and concrete example

So `Lwt.both` is the function to use when we want to run two processes "at the
same time". For example, [ocaml-git][ocaml-git] attempts _both_ to retrieve a
repository and to analyse it, as can be seen in this snippet of
[code][ocaml-git-both].

In our example with ocaml-git, the problem "shouldn't" appear because, in this
case, both the left and the right side do I/O (the left side reads from a
socket, while the right side saves Git objects in your file system). So, in our
tests with `Git_unix`, we were able to see that the analysis (right-hand side)
was indeed executed and 'interleaved' with the reception of objects from the
network.

### Composability

However, let's go back to our initial problem: the opam-mirror unikernel. As
you might expect, there is no standalone MirageOS file system (and many of our
unikernels don't need one). So, in the case of opam-mirror, we use the
ocaml-git in-memory implementation: `Git_mem`.

`Git_mem` is different in that Git objects are simply stored in a `Hashtbl`.
There is no cooperation point when obtaining Git objects from this `Hashtbl`.
So let's return to our original advice:

> don't wrap code in Lwt if it doesn't do I/O.

And, of course, `Git_mem` doesn't do I/O. It does, however, have to expose an
Lwt interface. In this case, `Git_mem` wraps the results in Lwt **as late as
possible** (as explained above, so as not to slow down our processes
unnecessarily). This choice inevitably means that the right-hand side can no
longer offer cooperation points. And this is where our problem begins:
composition.

In fact, we had something like:

```ocaml
let clone socket git =
  let open Lwt.Infix in
  Lwt.both (receive_pack socket) (analyse_pack git) >>= fun ((), ()) ->
  Lwt.return_unit
```

However, our `analyse_pack` function comes from a functor argument
representing the Git backend, in other words `Git_unix` or `Git_mem`:

```ocaml
module Make (Git : Git.S) = struct
  let clone socket git =
    let open Lwt.Infix in
    Lwt.both (receive_pack socket) (Git.analyse_pack git) >>= fun ((), ()) ->
    Lwt.return_unit
end
```

Composability poses a problem here. Even though `Git_unix` and `Git_mem` offer
the same function (so either module can be used), they do not behave alike.
One always leaves other services (such as an HTTP service) some availability.
The other offers an Lwt function that tries to go as far as possible, to the
point of making other services unavailable.

Composing with one or the other therefore does not produce the same behaviour.
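
The effect of the backend choice can be reproduced in miniature. All names in
this sketch are hypothetical stand-ins, not ocaml-git's API, and it assumes the
`lwt` and `lwt.unix` libraries. `Eager` wraps its result in Lwt as late as
possible, like `Git_mem` before the fix, while `Cooperative` calls `Lwt.pause`
after each write. The same functor `Make` is applied to both. A dummy service
counts how often it gets to run while the analysis stores 100 objects.

```ocaml
(* Hypothetical store signature shared by both backends. *)
module type STORE = sig
  val write : (string, string) Hashtbl.t -> string -> string -> unit Lwt.t
end

(* Wraps the result in Lwt as late as possible: never yields. *)
module Eager : STORE = struct
  let write tbl key value =
    Hashtbl.replace tbl key value;
    Lwt.return_unit
end

(* Same behaviour, plus a cooperation point after each write. *)
module Cooperative : STORE = struct
  let write tbl key value =
    Hashtbl.replace tbl key value;
    Lwt.pause ()
end

(* Stand-in for the analysis: store [n] objects through the backend. *)
module Make (S : STORE) = struct
  let analyse tbl n =
    let open Lwt.Infix in
    let rec go i =
      if i = n then Lwt.return_unit
      else S.write tbl (string_of_int i) "object" >>= fun () -> go (i + 1)
    in
    go 0
end

(* Stand-in for the HTTP service: count how many times it gets to run
   until the analysis is finished. *)
let ticks (module S : STORE) =
  let module M = Make (S) in
  let tbl = Hashtbl.create 16 in
  let finished = ref false and count = ref 0 in
  let rec service () =
    if !finished then Lwt.return_unit
    else begin
      incr count;
      Lwt.bind (Lwt.pause ()) service
    end
  in
  (* Start the service first, as it would already be running. *)
  let s = service () in
  let a = Lwt.map (fun () -> finished := true) (M.analyse tbl 100) in
  let (), () = Lwt_main.run (Lwt.both a s) in
  !count

let () =
  Printf.printf "Eager: %d, Cooperative: %d\n"
    (ticks (module Eager)) (ticks (module Cooperative))
```

With `Eager`, the service runs only once, before the analysis starts. With
`Cooperative`, it runs roughly once per write. The signatures are identical,
yet the availability of the service differs.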
|
||||
|
||||
#### Where to put `Lwt.pause`?
|
||||
|
||||
In this case, our `analyse_pack` does read/write on the Git store. As far as
|
||||
`Git_mem` is concerned, we said that these read/write accesses were just
|
||||
accesses to a `Hashtbl`.
|
||||
|
||||
Thanks to [Hannes][hannes]' help, it took us an afternoon to work out where we
|
||||
needed to add cooperation points in `Git_mem` so that `analyse_pack` could give
|
||||
another service such as HTTP the opportunity to work. Basically, this series of
|
||||
[commits][commits] shows where we needed to add `Lwt.pause`.
|
||||
|
||||
However, this points to a number of problems:
|
||||
1) it is not necessarily true that on the basis of composability alone (by
|
||||
_functor_ or by value), Lwt reacts in the same way
|
||||
2) Subtly, you have to dig into the code to find the right opportunities where
|
||||
to put, by hand, `Lwt.pause`.
|
||||
3) In the end, Lwt has no mechanisms for ensuring the availability of a service
|
||||
(this is something that must be taken into account by the implementer).
|
||||
|
||||
### In-depth knowledge of Lwt
|
||||
|
||||
I haven't mentioned another problem we encountered with [Armael][armael] when
|
||||
implementing [multipart_form][multipart_form] where the use of stream meant that
|
||||
Lwt didn't interleave the two processes and the use of a _bounded stream_ was
|
||||
required. Again, even when it comes to I/O, Lwt always tries to go as far as
|
||||
possible in one of two branches of a `Lwt.both`.
|
||||
|
||||
This allows us to conclude that beyond the monad, Lwt has subtleties in its
|
||||
behaviour which may be different from another scheduler such as Async (hence the
|
||||
incompatibility between the two, which is not just of the `'a t` type).
|
||||
|
||||
### Digression on Miou

That's why we put so much emphasis on the notion of availability in Miou: to
avoid repeating the mistakes of the past. The choices made regarding this
notion have a major impact, and can be unsatisfactory for the user in certain
cases (for example, so-called pure computations may take longer with Miou than
with another scheduler).

In this spirit, we deliberately constrained ourselves when developing Miou by
using `Effect.Shallow`, which requires us to re-attach our handler (our
scheduler) every time an effect is performed, unlike `Effect.Deep`, which
keeps the same handler in place across several effects. In other words, and as
we've described here, **an effect yields**!
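
To make the `Effect.Shallow` constraint concrete, here is a hypothetical
sketch of a round-robin scheduler built on it. It is not Miou's code, and
`Yield`, `run` and `task` are our own names. A shallow handler only covers a
computation up to its next effect, so every resumption has to go through
`continue_with k () handler` again: the scheduler necessarily regains control
at each effect and decides what runs next.

```ocaml
(* Hypothetical sketch, not Miou's implementation: a scheduler on top of
   Effect.Shallow. The handler must be passed again on every resumption. *)
open Effect
open Effect.Shallow

type _ Effect.t += Yield : unit Effect.t

let yield () = perform Yield

(* Run the tasks; return how many times control came back to the scheduler
   because of an effect. *)
let run (tasks : (unit -> unit) list) : int =
  let queue : (unit, unit) continuation Queue.t = Queue.create () in
  let switches = ref 0 in
  let handler =
    { retc = (fun () -> ());
      exnc = raise;
      effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Yield ->
            Some (fun (k : (a, unit) continuation) ->
              incr switches;
              (* The suspended fiber goes back into the run queue. *)
              Queue.push k queue)
        | _ -> None) }
  in
  List.iter (fun f -> Queue.push (fiber f) queue) tasks;
  while not (Queue.is_empty queue) do
    (* Re-attach the handler explicitly for this step of the fiber. *)
    continue_with (Queue.pop queue) () handler
  done;
  !switches

let trace = Buffer.create 16

(* A task that logs its progress and yields after each step. *)
let task name n () =
  for i = 1 to n do
    Buffer.add_string trace (Printf.sprintf "%s%d " name i);
    yield ()
  done
```

Running `run [ task "a" 2; task "b" 2 ]` returns `4` (one scheduler decision
per effect) and leaves `"a1 b1 a2 b2 "` in `trace`.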

## Conclusion

As far as opam-mirror is concerned, we now have a unikernel that stays
available even while it clones a Git repository and stores Git objects in
memory. At least an HTTP service can now co-exist with ocaml-git!

I hope we'll be able to use it at [the next retreat][retreat], which I invite
you to attend to talk more about Lwt, schedulers, Git and unikernels!

[opam-mirror]: https://git.robur.coop/robur/opam-mirror
[reproducibility]: https://blog.osau.re/articles/reproducible.html
[bob]: https://bob.osau.re/
[installation]: https://blog.osau.re/articles/reproducible.html
[ocaml-git]: https://github.com/mirage/ocaml-git
[ocaml-git-both]: https://github.com/mirage/ocaml-git/blob/a36c90404b149ab85f429439af8785bb1dde1bee/src/not-so-smart/smart_git.ml#L476-L481
[hannes]: https://hannes.robur.coop/
[armael]: https://cambium.inria.fr/~agueneau/
[multipart_form]: https://discuss.ocaml.org/t/ann-release-of-multipart-form-0-2-0/7704#memory-bound-implementation
[retreat]: https://retreat.mirage.io/
[commits]: https://github.com/mirage/ocaml-git/pull/631/files
[miou-articles]: https://blog.osau.re/tags/scheduler.html
@ -1,58 +0,0 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>
Robur's blogMirageVPN updated (AEAD, NCP)
</title>
<meta name="description" content="How we resurrected MirageVPN from its bitrot state">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
<script src="https://blog.robur.coop/js/hl.js"></script>
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
</head>
<body>
<header>
<h1>blog.robur.coop</h1>
<blockquote>
The <strong>Robur</strong> cooperative blog.
</blockquote>
</header>
<main><a href="https://blog.robur.coop/index.html">Back to index</a>

<article>
<h1>MirageVPN updated (AEAD, NCP)</h1>
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-VPN">VPN</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li></ul><h2 id="updating-miragevpn"><a class="anchor" aria-hidden="true" href="#updating-miragevpn"></a>Updating MirageVPN</h2>
<p>As announced <a href="https://blog.robur.coop/articles/miragevpn.html">earlier this month</a>, we've been working hard over the last months on MirageVPN (initially developed in 2019 targeting OpenVPN™ 2.4.7, now targeting 2.6.6). We received funding from the <a href="https://www.assure.ngi.eu/">NGI Assure</a> call (via <a href="https://nlnet.nl">NLnet</a>). We've made over 250 commits, adding more than 10k lines and removing 18k. We closed nearly all old issues and opened 100 fresh ones, more than half of which we have already closed. :D</p>
<h3 id="actual-bugs-fixed-that-were-leading-to-non-working-miragevpn-applications"><a class="anchor" aria-hidden="true" href="#actual-bugs-fixed-that-were-leading-to-non-working-miragevpn-applications"></a>Actual bugs fixed (that were leading to non-working MirageVPN applications)</h3>
<p>In more detail, over all these years we had been running one specific configuration, namely UDP mode with static keys (no TLS handshake, etc.). Along the way we encountered and solved several issues caused by bitrot, amongst others:</p>
<ul>
<li>one related to <a href="https://github.com/robur-coop/miragevpn/pull/111">static-key mode and TCP/IP</a>,</li>
<li>the <a href="https://github.com/robur-coop/miragevpn/pull/98">order of ACKs between the client and the server</a>,</li>
<li><a href="https://github.com/robur-coop/miragevpn/pull/110">outgoing TLS packets</a>.</li>
</ul>
<p>To avoid future breakage while revising the code (cleaning it up, extending it), we now build several unikernels as part of our CI system. We also set up OpenVPN™ servers with various configurations, against which we periodically test our new code (we'll also work on automating this further).</p>
<h3 id="new-features-aead-ciphers-supporting-more-configuration-primitives"><a class="anchor" aria-hidden="true" href="#new-features-aead-ciphers-supporting-more-configuration-primitives"></a>New features: AEAD ciphers, supporting more configuration primitives</h3>
<p>We added various configuration primitives, amongst them configurable TLS ciphersuites, the minimal and maximal TLS version to use, <a href="https://blog.robur.coop/articles/miragevpn.html">tls-crypt-v2</a>, verify-x509-name, cipher, remote-random, ...</p>
<p>From a cryptographic point of view, we now support more <a href="https://github.com/robur-coop/miragevpn/pull/108">authentication hashes</a> via the <code>auth</code> configuration directive, namely the SHA2 family (previously only SHA1 was supported), and <a href="https://github.com/robur-coop/miragevpn/pull/125">AEAD ciphers</a> (AES-128-GCM, AES-256-GCM, CHACHA20-POLY1305; previously only AES-256-CBC was supported).</p>
<h3 id="ncp---negotiation-of-cryptographic-parameters"><a class="anchor" aria-hidden="true" href="#ncp---negotiation-of-cryptographic-parameters"></a>NCP - Negotiation of cryptographic parameters</h3>
<p>OpenVPN™ has a way to negotiate cryptographic parameters instead of hardcoding them in the configuration. Once the TLS handshake has been completed, the client can propose its supported ciphers and other features (MTU, directly requesting a push message for the IP configuration, using the TLS exporter secret instead of the hand-crafted key derivation based on the TLS 1.0 PRF, ...).</p>
<p>We now support this negotiation protocol, and have been working on the extensions that are useful to us, namely transmitting the <a href="https://github.com/robur-coop/miragevpn/pull/121">supported ciphers</a>, <a href="https://github.com/robur-coop/miragevpn/pull/129">request push</a> (which saves an entire round-trip), and the <a href="https://github.com/robur-coop/miragevpn/pull/163">TLS exporter</a>. This will also be part of the <a href="https://git.robur.coop/robur/openvpn-spec">protocol specification</a> that we're writing while finishing our implementation.</p>
<h3 id="cleanups-and-refactorings"><a class="anchor" aria-hidden="true" href="#cleanups-and-refactorings"></a>Cleanups and refactorings</h3>
<p>We also took some time to clean up our code base: removing <code>Lwt.fail</code> (which doesn't produce proper backtraces), using lzo from the decompress package (since that code was upstreamed a while ago), removing unneeded dependencies (rresult, astring), avoiding <code>assert false</code> in pattern matches by improving types, and improving the log output (including a timestamp, showing the log source, using colors).</p>
<h3 id="future"><a class="anchor" aria-hidden="true" href="#future"></a>Future</h3>
<p>There is still some work we want to do, namely a QubesOS client implementation, an operator's manual, extending our specification, resurrecting and adapting the server implementation, supporting more NCP features (if appropriate), etc. So stay tuned; we'll also provide reproducible binaries once we're ready.</p>
<p>Don't hesitate to reach out to us on <a href="https://github.com/robur-coop/miragevpn/issues">GitHub</a>, <a href="https://robur.coop/Contact">by mail</a> or to me personally <a href="https://mastodon.social/@hannesm">on Mastodon</a> if you're stuck.</p>

</article>

</main>
<footer>
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
<br />
</footer>
<script>hljs.highlightAll();</script>
</body>
</html>
50
articles/miragevpn-ncp.md
Normal file
@ -0,0 +1,50 @@
---
date: 2023-11-20
title: MirageVPN updated (AEAD, NCP)
description:
  How we resurrected MirageVPN from its bitrot state
tags:
  - OCaml
  - MirageOS
  - VPN
  - security
author:
  name: Hannes Mehnert
  email: hannes@mehnert.org
  link: https://hannes.robur.coop
---

## Updating MirageVPN

As announced [earlier this month](miragevpn.html), we've been working hard over the last months on MirageVPN (initially developed in 2019 targeting OpenVPN™ 2.4.7, now targeting 2.6.6). We received funding from the [NGI Assure](https://www.assure.ngi.eu/) call (via [NLnet](https://nlnet.nl)). We've made over 250 commits, adding more than 10k lines and removing 18k. We closed nearly all old issues and opened 100 fresh ones, more than half of which we have already closed. :D

### Actual bugs fixed (that were leading to non-working MirageVPN applications)

In more detail, over all these years we had been running one specific configuration, namely UDP mode with static keys (no TLS handshake, etc.). Along the way we encountered and solved several issues caused by bitrot, amongst others:
- one related to [static-key mode and TCP/IP](https://github.com/robur-coop/miragevpn/pull/111),
- the [order of ACKs between the client and the server](https://github.com/robur-coop/miragevpn/pull/98),
- [outgoing TLS packets](https://github.com/robur-coop/miragevpn/pull/110).

To avoid future breakage while revising the code (cleaning it up, extending it), we now build several unikernels as part of our CI system. We also set up OpenVPN™ servers with various configurations, against which we periodically test our new code (we'll also work on automating this further).

### New features: AEAD ciphers, supporting more configuration primitives

We added various configuration primitives, amongst them configurable TLS ciphersuites, the minimal and maximal TLS version to use, [tls-crypt-v2](miragevpn.html), verify-x509-name, cipher, remote-random, ...

From a cryptographic point of view, we now support more [authentication hashes](https://github.com/robur-coop/miragevpn/pull/108) via the `auth` configuration directive, namely the SHA2 family (previously only SHA1 was supported), and [AEAD ciphers](https://github.com/robur-coop/miragevpn/pull/125) (AES-128-GCM, AES-256-GCM, CHACHA20-POLY1305; previously only AES-256-CBC was supported).

### NCP - Negotiation of cryptographic parameters

OpenVPN™ has a way to negotiate cryptographic parameters instead of hardcoding them in the configuration. Once the TLS handshake has been completed, the client can propose its supported ciphers and other features (MTU, directly requesting a push message for the IP configuration, using the TLS exporter secret instead of the hand-crafted key derivation based on the TLS 1.0 PRF, ...).

We now support this negotiation protocol, and have been working on the extensions that are useful to us, namely transmitting the [supported ciphers](https://github.com/robur-coop/miragevpn/pull/121), [request push](https://github.com/robur-coop/miragevpn/pull/129) (which saves an entire round-trip), and the [TLS exporter](https://github.com/robur-coop/miragevpn/pull/163). This will also be part of the [protocol specification](https://git.robur.coop/robur/openvpn-spec) that we're writing while finishing our implementation.
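
As an illustration of the cipher part of this negotiation, here is a minimal sketch, not MirageVPN's code. It assumes that the client advertises its data ciphers in a peer-info line of the form `IV_CIPHERS=AES-256-GCM:CHACHA20-POLY1305` and that the server picks the first cipher of its own preference list that the client also supports; the function names are hypothetical.

```ocaml
(* Hypothetical sketch of NCP-style cipher selection, not MirageVPN's code.
   Assumption: peer-info is newline-separated KEY=VALUE entries, and the
   client's ciphers are a colon-separated list under IV_CIPHERS. *)

(* Extract the cipher list from a single "IV_CIPHERS=..." line, if it is one. *)
let parse_iv_ciphers (line : string) : string list option =
  let prefix = "IV_CIPHERS=" in
  let plen = String.length prefix in
  if String.length line >= plen && String.sub line 0 plen = prefix then
    let value = String.sub line plen (String.length line - plen) in
    Some (List.filter (fun c -> c <> "") (String.split_on_char ':' value))
  else None

(* Ciphers advertised by the client; [] if none were advertised, in which
   case a caller could fall back to the statically configured cipher. *)
let client_ciphers (peer_info : string) : string list =
  match List.find_map parse_iv_ciphers (String.split_on_char '\n' peer_info) with
  | Some ciphers -> ciphers
  | None -> []

(* The server's preference order wins; None means there is no common cipher. *)
let negotiate ~(server_prefs : string list) ~(client_ciphers : string list) :
    string option =
  List.find_opt (fun c -> List.mem c client_ciphers) server_prefs
```

With a client advertising `AES-256-GCM:CHACHA20-POLY1305` and a server preferring `CHACHA20-POLY1305`, `negotiate` returns `Some "CHACHA20-POLY1305"`; letting the server's order win keeps the choice under the operator's control.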

### Cleanups and refactorings

We also took some time to clean up our code base: removing `Lwt.fail` (which doesn't produce proper backtraces), using lzo from the decompress package (since that code was upstreamed a while ago), removing unneeded dependencies (rresult, astring), avoiding `assert false` in pattern matches by improving types, and improving the log output (including a timestamp, showing the log source, using colors).

### Future

There is still some work we want to do, namely a QubesOS client implementation, an operator's manual, extending our specification, resurrecting and adapting the server implementation, supporting more NCP features (if appropriate), etc. So stay tuned; we'll also provide reproducible binaries once we're ready.

Don't hesitate to reach out to us on [GitHub](https://github.com/robur-coop/miragevpn/issues), [by mail](https://robur.coop/Contact) or to me personally [on Mastodon](https://mastodon.social/@hannesm) if you're stuck.
@ -1,60 +0,0 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>
Robur's blogSpeeding up MirageVPN and use it in the wild
</title>
<meta name="description" content="Performance engineering of MirageVPN, speeding it up by a factor of 25.">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
<script src="https://blog.robur.coop/js/hl.js"></script>
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
</head>
<body>
<header>
<h1>blog.robur.coop</h1>
<blockquote>
The <strong>Robur</strong> cooperative blog.
</blockquote>
</header>
<main><a href="https://blog.robur.coop/index.html">Back to index</a>

<article>
<h1>Speeding up MirageVPN and use it in the wild</h1>
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-cryptography">cryptography</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li><li><a href="https://blog.robur.coop/tags.html#tag-VPN">VPN</a></li><li><a href="https://blog.robur.coop/tags.html#tag-performance">performance</a></li></ul><p>While we continued working on <a href="https://github.com/robur-coop/miragevpn">MirageVPN</a>, we got in touch with <a href="https://eduvpn.org">eduVPN</a>, who are interested in deploying MirageVPN. They provided us with example configurations, and we <a href="https://github.com/robur-coop/miragevpn/pull/201">fixed</a> <a href="https://github.com/robur-coop/miragevpn/pull/168">some</a> <a href="https://github.com/robur-coop/miragevpn/pull/202">issues</a> and also implemented <a href="https://github.com/robur-coop/miragevpn/pull/169">tls-crypt</a>, which was straightforward since we had earlier spent time implementing <a href="https://blog.robur.coop/articles/miragevpn.html">tls-crypt-v2</a>.</p>
<p>In January, they gave MirageVPN another try and <a href="https://github.com/robur-coop/miragevpn/issues/206">measured the performance</a>, which was very poor: MirageVPN (run as a Unix binary) provided a bandwidth of 9.3 Mb/s, while OpenVPN provided 360 Mb/s (using a VPN tunnel over TCP).</p>
<p>We aim to spend fewer computing resources, so this result was not satisfying for us. We re-read and refactored a lot of code, and are now at ~250 Mb/s.</p>
<h2 id="tooling-for-performance-engineering-of-ocaml"><a class="anchor" aria-hidden="true" href="#tooling-for-performance-engineering-of-ocaml"></a>Tooling for performance engineering of OCaml</h2>
<p>As a first approach, we connected the MirageVPN Unix client and the OpenVPN client to an eduVPN server and ran speed tests using <a href="https://fast.com">fast.com</a>. This was sufficient to show the initial huge gap in download speed between MirageVPN and OpenVPN. However, there is <em>a lot</em> of noise in this approach, as many computers and routers are involved in this setup, and the results are hard to reproduce.</p>
<p>To get more reproducible results, we set up a local VM with OpenVPN and iperf3 installed. From the host machine, we can then connect to the VM's OpenVPN server and run iperf3 against the <em>VPN</em> IP address. This worked more reliably, but it was still too noisy to measure single-digit percentage changes in performance.
To better guide the performance engineering, we also developed <a href="https://github.com/robur-coop/miragevpn/pull/230">a microbenchmark</a> using OCaml tooling. It sets up a client and a server without any input and output, and transfers data in memory.</p>
<p>We also re-read our code and used the Linux utility <a href="https://perf.wiki.kernel.org/index.php/Main_Page"><code>perf</code></a> together with <a href="https://github.com/brendangregg/FlameGraph">Flamegraph</a> to visualise its output. This works nicely with OCaml programs (we're using the 4.14.1 compiler and runtime system). We did the performance engineering on Unix binaries, i.e. not on MirageOS unikernels; but since the same MirageVPN protocol implementation is used in both scenarios, the performance improvements described here also apply to the MirageVPN unikernels.</p>
<h2 id="takeaway-of-performance-engineering"><a class="anchor" aria-hidden="true" href="#takeaway-of-performance-engineering"></a>Takeaway of performance engineering</h2>
<p>What we learned falls into three areas:</p>
<ul>
<li>Formatting strings is computationally expensive: if a hexdump of a packet is produced for an error case, its construction must be delayed until the error case is actually executed (see <a href="https://github.com/robur-coop/miragevpn/pull/220">this PR</a> and <a href="https://github.com/robur-coop/miragevpn/pull/209">that PR</a>). Alain Frisch wrote a nice <a href="https://www.lexifi.com/blog/ocaml/note-about-performance-printf-and-format/#">blog post</a> at LexiFi about the performance of <code>Printf</code> and <code>Format</code><sup><a href="#fn-lexifi-date" id="ref-1-fn-lexifi-date" role="doc-noteref" class="fn-label">[1]</a></sup>.</li>
<li>Rethink allocations: fundamentally, only a single big buffer (to be sent out) should be allocated for each incoming packet, not a series of buffers that are then concatenated (see <a href="https://github.com/robur-coop/miragevpn/pull/217">this PR</a> and <a href="https://github.com/robur-coop/miragevpn/pull/219">that PR</a>). Additionally, not zeroing out a freshly allocated buffer (if it is going to be filled with data anyway) saves further instructions (see <a href="https://github.com/robur-coop/miragevpn/pull/218">this PR</a>). And we found that appending to an empty buffer still allocated and copied in OCaml, so we worked on <a href="https://github.com/robur-coop/miragevpn/pull/214">this PR</a>.</li>
<li>An open topic remains: OCaml is memory-safe, and we sometimes extract data from a buffer (or set data in a buffer). Each such operation leads to a bounds check (ensuring that we do not touch memory that is not allocated or not ours). However, if we have just checked that the buffer is long enough (either by checking its length or by allocating a specific amount of data), these bounds checks are superfluous. So far we don't have an automated solution for this issue, but we are <a href="https://discuss.ocaml.org/t/bounds-checks-for-string-and-bytes-when-retrieving-or-setting-subparts-thereof/">discussing it in the OCaml community</a>, and are eager to find a way to avoid these unneeded computations.</li>
</ul>
<h2 id="conclusion"><a class="anchor" aria-hidden="true" href="#conclusion"></a>Conclusion</h2>
<p>To conclude: we have already improved performance by a factor of 25 by adapting the code in various ways. We have ideas to improve it even further in the future: we are also working on using OCaml string and bytes instead of bigarrays allocated outside the OCaml heap (see <a href="https://blog.robur.coop/articles/speeding-ec-string.html">our previous article</a>, where this provided some speedups).</p>
<p>Don't hesitate to reach out to us on <a href="https://github.com/robur-coop/miragevpn/issues">GitHub</a>, or <a href="https://robur.coop/Contact">by mail</a> if you're stuck.</p>
<p>We want to thank <a href="https://nlnet.nl">NLnet</a> for their funding (via <a href="https://www.assure.ngi.eu/">NGI Assure</a>), and <a href="https://eduvpn.org">eduVPN</a> for their interest.</p>
<section role="doc-endnotes"><ol>
<li id="fn-lexifi-date">
<p>It has come to our attention that the blog post is rather old (2012) and that the implementation has since been completely rewritten.</p>
<span><a href="#ref-1-fn-lexifi-date" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>

</article>

</main>
<footer>
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
<br />
</footer>
<script>hljs.highlightAll();</script>
</body>
</html>
54
articles/miragevpn-performance.md
Normal file
@ -0,0 +1,54 @@
---
date: 2024-04-16
title: Speeding up MirageVPN and use it in the wild
description:
  Performance engineering of MirageVPN, speeding it up by a factor of 25.
tags:
  - OCaml
  - MirageOS
  - cryptography
  - security
  - VPN
  - performance
author:
  name: Hannes Mehnert
  email: hannes@mehnert.org
  link: https://hannes.robur.coop
coauthors:
  - author:
      name: Reynir Björnsson
      email: reynir@reynir.dk
      link: https://reyn.ir/
---

While we continued working on [MirageVPN](https://github.com/robur-coop/miragevpn), we got in touch with [eduVPN](https://eduvpn.org), who are interested in deploying MirageVPN. They provided us with example configurations, and we [fixed](https://github.com/robur-coop/miragevpn/pull/201) [some](https://github.com/robur-coop/miragevpn/pull/168) [issues](https://github.com/robur-coop/miragevpn/pull/202) and also implemented [tls-crypt](https://github.com/robur-coop/miragevpn/pull/169), which was straightforward since we had earlier spent time implementing [tls-crypt-v2](miragevpn.html).

In January, they gave MirageVPN another try and [measured the performance](https://github.com/robur-coop/miragevpn/issues/206), which was very poor: MirageVPN (run as a Unix binary) provided a bandwidth of 9.3 Mb/s, while OpenVPN provided 360 Mb/s (using a VPN tunnel over TCP).

We aim to spend fewer computing resources, so this result was not satisfying for us. We re-read and refactored a lot of code, and are now at ~250 Mb/s.

## Tooling for performance engineering of OCaml

As a first approach, we connected the MirageVPN Unix client and the OpenVPN client to an eduVPN server and ran speed tests using [fast.com](https://fast.com). This was sufficient to show the initial huge gap in download speed between MirageVPN and OpenVPN. However, there is *a lot* of noise in this approach, as many computers and routers are involved in this setup, and the results are hard to reproduce.

To get more reproducible results, we set up a local VM with OpenVPN and iperf3 installed. From the host machine, we can then connect to the VM's OpenVPN server and run iperf3 against the *VPN* IP address. This worked more reliably, but it was still too noisy to measure single-digit percentage changes in performance.
To better guide the performance engineering, we also developed [a microbenchmark](https://github.com/robur-coop/miragevpn/pull/230) using OCaml tooling. It sets up a client and a server without any input and output, and transfers data in memory.

We also re-read our code and used the Linux utility [`perf`](https://perf.wiki.kernel.org/index.php/Main_Page) together with [Flamegraph](https://github.com/brendangregg/FlameGraph) to visualise its output. This works nicely with OCaml programs (we're using the 4.14.1 compiler and runtime system). We did the performance engineering on Unix binaries, i.e. not on MirageOS unikernels; but since the same MirageVPN protocol implementation is used in both scenarios, the performance improvements described here also apply to the MirageVPN unikernels.

## Takeaway of performance engineering

What we learned falls into three areas:
- Formatting strings is computationally expensive: if a hexdump of a packet is produced for an error case, its construction must be delayed until the error case is actually executed (see [this PR](https://github.com/robur-coop/miragevpn/pull/220) and [that PR](https://github.com/robur-coop/miragevpn/pull/209)). Alain Frisch wrote a nice [blog post](https://www.lexifi.com/blog/ocaml/note-about-performance-printf-and-format/#) at LexiFi about the performance of `Printf` and `Format`[^lexifi-date].
- Rethink allocations: fundamentally, only a single big buffer (to be sent out) should be allocated for each incoming packet, not a series of buffers that are then concatenated (see [this PR](https://github.com/robur-coop/miragevpn/pull/217) and [that PR](https://github.com/robur-coop/miragevpn/pull/219)). Additionally, not zeroing out a freshly allocated buffer (if it is going to be filled with data anyway) saves further instructions (see [this PR](https://github.com/robur-coop/miragevpn/pull/218)). And we found that appending to an empty buffer still allocated and copied in OCaml, so we worked on [this PR](https://github.com/robur-coop/miragevpn/pull/214).
- An open topic remains: OCaml is memory-safe, and we sometimes extract data from a buffer (or set data in a buffer). Each such operation leads to a bounds check (ensuring that we do not touch memory that is not allocated or not ours). However, if we have just checked that the buffer is long enough (either by checking its length or by allocating a specific amount of data), these bounds checks are superfluous. So far we don't have an automated solution for this issue, but we are [discussing it in the OCaml community](https://discuss.ocaml.org/t/bounds-checks-for-string-and-bytes-when-retrieving-or-setting-subparts-thereof/), and are eager to find a way to avoid these unneeded computations.
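
The three takeaways above can be illustrated with a few stand-alone functions. This is a hypothetical sketch using only the OCaml standard library; `hexdump`, `log_error`, `build_packet` and `read_uint16_be` are our own illustrative names, not MirageVPN's code (which has its own logging and buffer handling). The last function removes the redundant checks by hand with `String.unsafe_get` after one explicit length check: a manual workaround that is only sound because of that check, not the automated solution under discussion.

```ocaml
(* Hypothetical helpers illustrating the three takeaways; these names are
   ours, not MirageVPN's. Standard library only. *)

(* 1. Delay expensive formatting. The logger takes a thunk, so the hexdump is
   only computed if the message is actually emitted. *)
let hexdump_calls = ref 0

let hexdump (s : string) : string =
  incr hexdump_calls;
  String.concat " "
    (List.init (String.length s) (fun i -> Printf.sprintf "%02x" (Char.code s.[i])))

let log_enabled = ref false

let log_error (msg : unit -> string) : unit =
  if !log_enabled then prerr_endline (msg ())

(* 2. Allocate the outgoing packet once: compute the final length, create an
   uninitialised buffer (Bytes.create does not zero it) and blit each part
   into place, instead of concatenating intermediate buffers. *)
let build_packet ~(header : string) ~(payload : string) ~(tag : string) : string =
  let hl = String.length header
  and pl = String.length payload
  and tl = String.length tag in
  let buf = Bytes.create (hl + pl + tl) in
  Bytes.blit_string header 0 buf 0 hl;
  Bytes.blit_string payload 0 buf hl pl;
  Bytes.blit_string tag 0 buf (hl + pl) tl;
  (* Safe: buf is fully written and never mutated afterwards. *)
  Bytes.unsafe_to_string buf

(* 3. One explicit length check, then reads without per-access bounds checks.
   String.unsafe_get is only sound here because of the guard above it. *)
let read_uint16_be (s : string) (off : int) : int =
  if off < 0 || off + 2 > String.length s then invalid_arg "read_uint16_be";
  (Char.code (String.unsafe_get s off) lsl 8)
  lor Char.code (String.unsafe_get s (off + 1))
```

A call site then looks like `log_error (fun () -> "bad packet: " ^ hexdump pkt)`: while error logging is disabled, no hexdump is ever built.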

## Conclusion

To conclude: we have already improved performance by a factor of 25 by adapting the code in various ways. We have ideas to improve it even further in the future: we are also working on using OCaml `string` and `bytes` instead of bigarrays allocated outside the OCaml heap (see [our previous article](speeding-ec-string.html), where this provided some speedups).

Don't hesitate to reach out to us on [GitHub](https://github.com/robur-coop/miragevpn/issues), or [by mail](https://robur.coop/Contact) if you're stuck.

We want to thank [NLnet](https://nlnet.nl) for their funding (via [NGI Assure](https://www.assure.ngi.eu/)), and [eduVPN](https://eduvpn.org) for their interest.

[^lexifi-date]: It has come to our attention that the blog post is rather old (2012) and that the implementation has since been completely rewritten.
@ -1,46 +0,0 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>
Robur's blogMirageVPN server
</title>
<meta name="description" content="Announcement of our MirageVPN server.">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
<script src="https://blog.robur.coop/js/hl.js"></script>
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
</head>
<body>
<header>
<h1>blog.robur.coop</h1>
<blockquote>
The <strong>Robur</strong> cooperative blog.
</blockquote>
</header>
<main><a href="https://blog.robur.coop/index.html">Back to index</a>

<article>
<h1>MirageVPN server</h1>
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-cryptography">cryptography</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li><li><a href="https://blog.robur.coop/tags.html#tag-VPN">VPN</a></li></ul><p>It is a great pleasure to finally announce that we have finished a server implementation for MirageVPN (OpenVPN™-compatible). This allows setting up a very robust VPN network on both the client and the server side.</p>
<p>As announced last year, <a href="https://blog.robur.coop/articles/miragevpn.html">MirageVPN</a> is a reimplementation of OpenVPN™ in OCaml, with <a href="https://mirage.io">MirageOS</a> unikernels.</p>
<h2 id="why-a-miragevpn-server"><a class="anchor" aria-hidden="true" href="#why-a-miragevpn-server"></a>Why a MirageVPN server?</h2>
<p>Providing Internet services in programming languages that offer little safety requires a lot of discipline from developers to avoid issues that may lead to exploitable services being attacked (thus circumventing any security goals). Especially for services that are critical for security and privacy, it is crucial to avoid common memory safety pitfalls.</p>
<p>Some years back, when we worked on the client implementation, we also drafted a server implementation. The reasoning was that a lot of the code was already there, and only a few things needed to be developed to allow clients to connect to it.</p>
<p>Now we have spent several months pushing our server implementation into a usable state, and we welcome everyone who wants to test it. We also adapted the modern ciphers we recently implemented for the client, as well as tls-crypt and tls-crypt-v2, for the server implementation.</p>
<p>The overall progress was tracked in <a href="https://github.com/robur-coop/miragevpn/issues/15">this issue</a>. Besides the MirageOS unikernel, we also developed a test server that doesn't use any tun interface.</p>
<p>Please head over to our handbook and its <a href="https://robur-coop.github.io/miragevpn-handbook/miragevpn_server.html">chapter on the MirageVPN server</a>.</p>
<p>If you encounter any problems, please open an issue in <a href="https://github.com/robur-coop/miragevpn">the repository</a>.</p>
|
||||
|
||||
</article>

</main>
<footer>
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
<br />
</footer>
<script>hljs.highlightAll();</script>
</body>
</html>
39
articles/miragevpn-server.md
Normal file
@ -0,0 +1,39 @@
---
date: 2024-06-17
title: MirageVPN server
description:
  Announcement of our MirageVPN server.
tags:
  - OCaml
  - MirageOS
  - cryptography
  - security
  - VPN
author:
  name: Hannes Mehnert
  email: hannes@mehnert.org
  link: https://hannes.robur.coop
coauthors:
  - author:
      name: Reynir Björnsson
      email: reynir@reynir.dk
      link: https://reyn.ir/
---

It is a great pleasure to finally announce that we have finished a server implementation for MirageVPN (OpenVPN™-compatible). This allows setting up a very robust VPN network on both the client and the server side.

As announced last year, [MirageVPN](miragevpn.html) is a reimplementation of OpenVPN™ in OCaml, with [MirageOS](https://mirage.io) unikernels.

## Why a MirageVPN server?

Providing Internet services in programming languages that offer little memory safety requires a lot of discipline from developers to avoid bugs that make a service exploitable (and thus circumvent any security goals). Especially for services that are critical for security and privacy, it is crucial to avoid common memory safety pitfalls.

Some years back, when we worked on the client implementation, we also drafted a server implementation. The reasoning was that a lot of the code was already there, and only a few things needed to be developed to allow clients to connect to it.

Now, we have spent several months pushing our server implementation into a state where it is usable, and we welcome everyone who wants to test it. We also adapted the modern ciphers we recently implemented for the client, as well as tls-crypt and tls-crypt-v2, for the server implementation.

The overall progress was tracked in [this issue](https://github.com/robur-coop/miragevpn/issues/15). Next to the MirageOS unikernel, we also developed a test server that doesn't use any tun interface.

Please move along to our handbook with the [chapter on MirageVPN server](https://robur-coop.github.io/miragevpn-handbook/miragevpn_server.html).

If you encounter any issues, please open an issue at [the repository](https://github.com/robur-coop/miragevpn).
54
articles/miragevpn-testing.md
Normal file
@ -0,0 +1,54 @@
---
date: 2024-06-26
title: Testing MirageVPN against OpenVPN™
description: Some notes about how we test MirageVPN against OpenVPN™
tags:
  - OCaml
  - MirageOS
  - cryptography
  - security
  - testing
  - vpn
author:
  name: Reynir Björnsson
  email: reynir@reynir.dk
  link: https://reyn.ir/
---

As the last milestone (for now) of the [EU NGI Assure](https://www.assure.ngi.eu/)-funded MirageVPN project, we have been working on testing MirageVPN, our OpenVPN™-compatible VPN implementation, against the upstream OpenVPN™.
During development we conducted many manual tests.
However, this scales poorly and it is easy to forget to test certain cases.
Therefore, we designed and implemented interoperability testing, driving the C implementation on one side and our OCaml implementation on the other. The input for such a test is a configuration file that both implementations can use.
Thus we test the establishment of the tunnel as well as the tunnel itself.

While conducting the tests, our instrumented binaries expose code coverage information. We use it to decide which other configurations are worth testing. Our goal is to achieve a high code coverage rate with a small number of different configurations. These interoperability tests run fast enough to be executed by CI on each commit.

A nice property of this test setup is that it runs with an unmodified OpenVPN binary.
This means we can use an off-the-shelf OpenVPN binary from the package repository, and we do not have to maintain an OpenVPN fork.
Testing against a future version of OpenVPN becomes trivial.
We do not just test a single part of our implementation, but achieve an end-to-end test.
The same configuration files are used for both our implementation and the C implementation, and each configuration is used twice: once with our implementation acting as the client, and once with it acting as the server.
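
The pairing logic of this test matrix can be sketched as follows. This is a hypothetical illustration, not the MirageVPN test harness: the configuration file names are made up, and only the rule that every configuration runs twice with client and server roles swapped comes from the text above.

```ocaml
(* Hypothetical sketch of the interoperability test matrix: every
   configuration runs twice, with client and server roles swapped.
   The configuration names below are made up for illustration. *)

type implementation = MirageVPN | OpenVPN

type run = {
  config : string;            (* a config file both implementations accept *)
  client : implementation;
  server : implementation;
}

let name = function MirageVPN -> "miragevpn" | OpenVPN -> "openvpn"

(* Two runs per configuration: MirageVPN as client, then as server. *)
let runs_for config =
  [ { config; client = MirageVPN; server = OpenVPN };
    { config; client = OpenVPN; server = MirageVPN } ]

let test_matrix configs = List.concat_map runs_for configs

let describe r =
  Printf.sprintf "%s: %s client <-> %s server" r.config (name r.client) (name r.server)

let () =
  let configs = [ "tls-auth.conf"; "tls-crypt.conf"; "tls-crypt-v2.conf"; "other.conf" ] in
  List.iter (fun r -> print_endline (describe r)) (test_matrix configs)
```

Each `run` could then be executed by starting both sides with the given configuration and counting a successful exit of the MirageVPN side in `--test` mode as a pass.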

We added a flag, `--test`, to our client and our [recently finished server](miragevpn-server) applications, which makes them exit once a tunnel is established and an ICMP echo request from the client has been answered by the server.
Our client and server can run without a tun device, which would otherwise require elevated privileges.
Unfortunately, OpenVPN requires privileges to at least configure a tun device.
Our MirageVPN implementation does IP packet parsing in userspace.
We test our protocol implementation, not the entire unikernel, but the unikernel code is only a tiny layer on top of the purely functional protocol implementation.
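
Because no tun device is involved, the test has to build and recognise the ICMP packets itself. As a rough, hypothetical illustration (not MirageVPN's code), an ICMPv4 echo request and its RFC 1071 Internet checksum could be constructed with the OCaml standard library like this; the surrounding IPv4 header is omitted.

```ocaml
(* Sketch: build an ICMPv4 echo request (type 8, code 0) and verify echo
   replies, using the RFC 1071 Internet checksum. IPv4 header omitted. *)

(* One's-complement sum of 16-bit big-endian words, folded to 16 bits. *)
let internet_checksum (b : Bytes.t) =
  let len = Bytes.length b in
  let sum = ref 0 in
  let i = ref 0 in
  while !i + 1 < len do
    sum := !sum + Bytes.get_uint16_be b !i;
    i := !i + 2
  done;
  (* An odd trailing byte is treated as if padded with a zero byte. *)
  if !i < len then sum := !sum + (Bytes.get_uint8 b !i lsl 8);
  while !sum lsr 16 <> 0 do
    sum := (!sum land 0xffff) + (!sum lsr 16)
  done;
  lnot !sum land 0xffff

let echo_request ~id ~seq payload =
  let b = Bytes.make (8 + String.length payload) '\000' in
  Bytes.set_uint8 b 0 8;            (* type: echo request *)
  Bytes.set_uint8 b 1 0;            (* code *)
  Bytes.set_uint16_be b 4 id;       (* identifier *)
  Bytes.set_uint16_be b 6 seq;      (* sequence number *)
  Bytes.blit_string payload 0 b 8 (String.length payload);
  (* The checksum field (bytes 2-3) is still zero while computing. *)
  Bytes.set_uint16_be b 2 (internet_checksum b);
  b

(* An echo reply (type 0) with matching id and seq answers our request;
   a correct checksum makes the checksum over the whole packet zero. *)
let is_reply_to ~id ~seq (b : Bytes.t) =
  Bytes.length b >= 8
  && Bytes.get_uint8 b 0 = 0
  && Bytes.get_uint16_be b 4 = id
  && Bytes.get_uint16_be b 6 = seq
  && internet_checksum b = 0
```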

We explored unit testing the packet decoding and decryption with both our implementation and the C implementation.
Specifically, we encountered a packet whose message authentication code (MAC) was deemed invalid by the C implementation.
It helped us discover that the MAC computation was correct, but the packet encoding was truncated: both implementations agreed that the MAC was bad.
The test was very tedious to write and would not easily scale to cover a large portion of the code.
If you are interested, take a look at our [modifications to OpenVPN](https://github.com/reynir/openvpn/tree/badmac-test) and [modifications to MirageVPN](https://github.com/robur-coop/miragevpn/tree/badmac-test).

The end-to-end tests complement our unit tests, our fuzz testing, and our [benchmarking](miragevpn-performance.html) binary.

With 4 configurations we achieve more than 75% code coverage in MirageVPN.
While investigating the code coverage results, we found various pieces of code that were never executed, and we were able to remove them.
Code that does not exist is bug-free :D
With these tests in place, future maintenance is less daunting, as they help guard us against breaking the code.

At the moment, the tests do not exercise the error paths in the code very well.
This is much less straightforward to test in this manner, and is important future work.
We plan to develop a client and server that inject faults at various stages of the protocol to test these error paths.
OpenVPN built with debugging enabled also comes with a `--gremlin` mode that injects faults, which would be interesting to investigate.
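
To give a rough idea of what such fault injection could look like, here is a hypothetical sketch; it is neither the planned MirageVPN tooling nor OpenVPN's `--gremlin` implementation. Outgoing packets are passed through, dropped, truncated, or get a single bit flipped, driven by a seeded PRNG so that a failing run can be reproduced.

```ocaml
(* Hypothetical fault injector for outgoing packets. A fixed seed makes
   every injected fault reproducible. *)

type fault = Pass | Drop | Flip_bit of int | Truncate of int

(* Pick a fault for a packet of [len] bytes (about 30% of packets are hit). *)
let choose_fault rng len =
  if len = 0 then Pass
  else
    match Random.State.int rng 10 with
    | 0 -> Drop
    | 1 -> Flip_bit (Random.State.int rng (len * 8))
    | 2 -> Truncate (Random.State.int rng len)
    | _ -> Pass

(* Apply a fault; [None] means the packet is dropped.
   Assumes [Truncate n] has n <= length and [Flip_bit k] has k < 8 * length. *)
let apply fault packet =
  match fault with
  | Pass -> Some packet
  | Drop -> None
  | Truncate n -> Some (String.sub packet 0 n)
  | Flip_bit bit ->
    let b = Bytes.of_string packet in
    let i = bit / 8 in
    Bytes.set_uint8 b i (Bytes.get_uint8 b i lxor (1 lsl (bit mod 8)));
    Some (Bytes.to_string b)

let inject rng packet = apply (choose_fault rng (String.length packet)) packet
```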
@ -1,124 +0,0 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>
Robur's blogMirageVPN & tls-crypt-v2
</title>
<meta name="description" content="How we implemented tls-crypt-v2 for MirageVPN">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
<script src="https://blog.robur.coop/js/hl.js"></script>
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
</head>
<body>
<header>
<h1>blog.robur.coop</h1>
<blockquote>
The <strong>Robur</strong> cooperative blog.
</blockquote>
</header>
<main><a href="https://blog.robur.coop/index.html">Back to index</a>

<article>
<h1>MirageVPN & tls-crypt-v2</h1>
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-VPN">VPN</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li></ul><p>In 2019 <a href="https://robur.coop/">Robur</a> started working on an <a href="https://github.com/robur-coop/miragevpn/">OpenVPN™-compatible implementation in OCaml</a>.
The project was funded for 6 months in 2019 by <a href="https://prototypefund.de">prototypefund</a>.
In late 2022 we applied for funding again, this time to the <a href="https://www.assure.ngi.eu/">NGI Assure</a> open call, and our application was eventually accepted.
In this blog post I will explain why reimplementing the OpenVPN™ protocol in OCaml is a worthwhile effort, and describe the MirageVPN implementation, in particular the <code>tls-crypt-v2</code> mechanism.</p>
<h2 id="what-even-is-openvpn™"><a class="anchor" aria-hidden="true" href="#what-even-is-openvpn™"></a>What even is OpenVPN™?</h2>
<p><a href="https://openvpn.net/">OpenVPN™</a> is a protocol and software implementation to provide <a href="https://en.wikipedia.org/wiki/Virtual_private_network">virtual private networks</a>: computer networks that do not exist in hardware and are encrypted and tunnelled through existing networks.
Common use cases are providing access to internal networks for remote workers, and routing internet traffic through another machine for various reasons, e.g. when using untrusted Wi-Fi, for privacy from a snooping ISP, or for circumventing geoblocking.</p>
<p>It is a protocol that has been worked on and has evolved over decades.
OpenVPN™ has a number of modes of operation as well as options numbering in the hundreds.
The modes can be categorized into two main categories: static mode and TLS mode.
The former uses static symmetric keys, and will be removed in the upcoming OpenVPN™ 2.7 (community edition).
I will not focus on static mode in this post.
The latter uses separate data and control channels, where the control channel uses TLS (more on that later).</p>
<h3 id="why-reimplement-it-and-why-in-ocaml"><a class="anchor" aria-hidden="true" href="#why-reimplement-it-and-why-in-ocaml"></a>Why reimplement it? And why in OCaml?</h3>
<p>Before diving into TLS mode and eventually tls-crypt-v2, it's worth briefly discussing why we spend time reimplementing the OpenVPN™ protocol.
You may ask yourself: why not just use the existing tried and tested implementation?</p>
<p>OpenVPN™ community edition is implemented in the C programming language.
It heavily uses the OpenSSL library<sup><a href="#fn-mbedtls" id="ref-1-fn-mbedtls" role="doc-noteref" class="fn-label">[1]</a></sup>, which is also written in C and has had some notable security vulnerabilities in the past.
Many vulnerabilities and bugs in C can easily be avoided in other languages thanks to bounds checking and stricter, more expressive type systems.
The state machine of the protocol can be expressed more easily in OCaml, and some properties of the protocol can be encoded in the type system.</p>
<p>Another reason is <a href="https://mirage.io/">Mirage OS</a>, a library operating system implemented in OCaml.
We work on the Mirage project and write applications (unikernels) using Mirage.
In many cases it would be desirable to be able to connect to an existing VPN network<sup><a href="#fn-vpn-network" id="ref-1-fn-vpn-network" role="doc-noteref" class="fn-label">[2]</a></sup>,
or to offer a VPN network to clients using OpenVPN™.</p>
<p>Consider a VPN provider:
The VPN provider runs many machines, each running an operating system in order to run the user-space OpenVPN™ service.
There are no <em>real</em> users on the system, and a lot of unrelated processes and legacy layers are around that are not needed.
With a Mirage OS unikernel, which is basically a statically linked binary and operating system in one, such a setup becomes simpler, with fewer layers.
With <a href="https://robur.coop/Projects/Reproducible_builds">reproducible builds</a>, deployment and updates will be straightforward.</p>
<p>Another very interesting example is a unikernel for <a href="https://www.qubes-os.org/">Qubes OS</a> that we have planned.
Qubes OS is an operating system with a strong focus on security.
It offers an almost seamless experience of running applications in different virtual machines on the same machine.
The networking provided to an application (virtual machine) can be restricted to only go through the VPN.
It is possible to use OpenVPN™ for such a setup, but that requires running OpenVPN™ in a full Linux virtual machine.
With Mirage OS the resource footprint is typically much smaller than that of an equivalent application running in a Linux virtual machine; often the memory footprint is smaller by an order of magnitude.</p>
<p>Finally, while it's not an explicit goal of ours, reimplementing a protocol without an explicit specification can help uncover bugs and things that need better documentation in the original implementation.</p>
<h3 id="tls-mode"><a class="anchor" aria-hidden="true" href="#tls-mode"></a>TLS mode</h3>
<p>There are different variants of TLS mode, but what they share is a separate "control" channel and "data" channel.
The control channel is used to perform a TLS handshake, and over the established TLS session the data channel encryption keys, username/password authentication, etc. are negotiated.
Once this dance has been performed and data channel encryption keys have been negotiated, the peers can exchange IP packets over the data channel.</p>
<p>Over the years a number of mechanisms have been implemented to protect the TLS stack from being exposed to third parties, to protect against denial of service attacks, and to hide information exchanged during a TLS handshake such as certificates (which was an issue before TLS 1.3).
These are known as <code>tls-auth</code>, <code>tls-crypt</code> and <code>tls-crypt-v2</code>.
The <code>tls-auth</code> mechanism adds a pre-shared key for HMAC authentication on the control channel.
This makes it possible for an OpenVPN™ server to reject clients that don't know the shared key early, before any TLS handshake is performed.
In <code>tls-crypt</code> the control channel is encrypted as well as HMAC authenticated using a pre-shared key.
Common to both is that the pre-shared key is shared between the server and all clients.
For large deployments this significantly reduces their usefulness: the greater the number of clients sharing the key, the more likely it is to be leaked.</p>
<h3 id="tls-crypt-v2"><a class="anchor" aria-hidden="true" href="#tls-crypt-v2"></a>tls-crypt-v2</h3>
<p>To improve on <code>tls-crypt</code>, <code>tls-crypt-v2</code> uses one pre-shared key per client.
This could be a lot of keys for the server to keep track of, so instead of storing all the client keys, the server has a special tls-crypt-v2 server key that is used to <em><a href="https://en.wikipedia.org/wiki/Key_wrap">wrap</a></em> the client keys.
That is, each client has its own client key as well as the client key wrapped using the server key.
The protocol is then extended so that the client appends the wrapped key <em>unencrypted</em> to its first message.
The server can then decrypt and verify the client key and decrypt the rest of the packet.
Then the client and server use the client key just as in <code>tls-crypt</code>.</p>
<p>This is great!
Each client can have its own key, and the server doesn't need to keep a potentially large database of client keys.
What if a client's key is leaked?
A detail I didn't mention is that the wrapped key contains metadata.
By default this is the current timestamp, but on creation it is possible to put any (relatively short) binary data in there as the metadata.
The server can then be configured to check the metadata by calling a script.</p>
<p>One issue is that an initial packet takes up resources on the server, because the server needs to</p>
<ol>
<li>decrypt the wrapped key, and</li>
<li>keep the unwrapped key and other data in memory while waiting for the handshake to complete.</li>
</ol>
<p>This can be abused in an attack very similar to a TCP <a href="https://en.wikipedia.org/wiki/SYN_flood">SYN flood</a>.
Without <code>tls-crypt-v2</code>, OpenVPN uses a specially crafted session ID (a 64-bit identifier) to avoid this issue, similar to <a href="https://en.wikipedia.org/wiki/SYN_cookies">SYN cookies</a>.
To address this, in OpenVPN 2.6 the protocol for <code>tls-crypt-v2</code> was extended yet further with an 'HMAC cookie' mechanism.
The client sends the same packet as before, but uses the sequence number <code>0x0f000001</code> instead of <code>1</code> to signal support for this mechanism.
The server responds in a similar manner with the sequence number <code>0x0f000001</code>, and appends a tag-length-value encoded list of flags to the packet.
At the moment only one tag and one value are defined, which signify that the server supports HMAC cookies; this seems unnecessarily complex, but is done to allow for future extensibility.
Finally, if the server supports HMAC cookies, the client sends a packet where the wrapped key is appended in cleartext.
The server is now able to decrypt the third packet without having to keep the key from the first packet around, and can verify the session ID.</p>
<h2 id="cool-lets-deploy-it"><a class="anchor" aria-hidden="true" href="#cool-lets-deploy-it"></a>Cool! Let's deploy it!</h2>
<p>Great!
We build unikernels on a daily basis in our <a href="https://builds.robur.coop/">reproducible builds setup</a>.
At the time of writing we have published a <a href="https://builds.robur.coop/job/miragevpn-router">MirageVPN router unikernel</a> acting as a client.
For general instructions on running Mirage unikernels see our <a href="https://robur.coop/Projects/Reproducible_builds">reproducible builds</a> blog post.
The unikernel will need a block device containing the OpenVPN™ configuration and a network device.
More detailed instructions Will Follow Soon™!
Don't hesitate to reach out to us on <a href="https://github.com/robur-coop/miragevpn/issues">GitHub</a>, <a href="https://robur.coop/Contact">by mail</a> or me personally <a href="https://bsd.network/@reynir">on Mastodon</a> if you're stuck.</p>
<section role="doc-endnotes"><ol>
<li id="fn-mbedtls">
<p>It is possible to compile OpenVPN™ community edition with Mbed TLS instead of OpenSSL; Mbed TLS is written in C as well.</p>
<span><a href="#ref-1-fn-mbedtls" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-vpn-network">
<p>I use the term "VPN network" to mean the virtual private network itself. It is a bit odd because the 'N' in 'VPN' is 'Network', but without disambiguation 'VPN' could refer to the network itself, the software or the service.</p>
<span><a href="#ref-1-fn-vpn-network" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>
</article>

</main>
<footer>
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
<br />
</footer>
<script>hljs.highlightAll();</script>
</body>
</html>
134
articles/miragevpn.md
Normal file
@ -0,0 +1,134 @@
---
date: 2023-11-14
title: MirageVPN & tls-crypt-v2
description:
  How we implemented tls-crypt-v2 for MirageVPN
tags:
  - OCaml
  - MirageOS
  - VPN
  - security
author:
  name: Reynir Björnsson
  email: reynir@reynir.dk
  link: https://reyn.ir/
---

In 2019 [Robur][robur.coop] started working on an [OpenVPN™-compatible implementation in OCaml][miragevpn].
The project was funded for 6 months in 2019 by [prototypefund](https://prototypefund.de).
In late 2022 we applied for funding again, this time to the [NGI Assure][ngi-assure] open call, and our application was eventually accepted.
In this blog post I will explain why reimplementing the OpenVPN™ protocol in OCaml is a worthwhile effort, and describe the MirageVPN implementation, in particular the `tls-crypt-v2` mechanism.

## What even is OpenVPN™?

[OpenVPN™][openvpn] is a protocol and software implementation to provide [virtual private networks][vpn-wiki]: computer networks that do not exist in hardware and are encrypted and tunnelled through existing networks.
Common use cases are providing access to internal networks for remote workers, and routing internet traffic through another machine for various reasons, e.g. when using untrusted Wi-Fi, for privacy from a snooping ISP, or for circumventing geoblocking.

It is a protocol that has been worked on and has evolved over decades.
OpenVPN™ has a number of modes of operation as well as options numbering in the hundreds.
The modes can be categorized into two main categories: static mode and TLS mode.
The former uses static symmetric keys, and will be removed in the upcoming OpenVPN™ 2.7 (community edition).
I will not focus on static mode in this post.
The latter uses separate data and control channels, where the control channel uses TLS (more on that later).

### Why reimplement it? And why in OCaml?

Before diving into TLS mode and eventually tls-crypt-v2, it's worth briefly discussing why we spend time reimplementing the OpenVPN™ protocol.
You may ask yourself: why not just use the existing tried and tested implementation?

OpenVPN™ community edition is implemented in the C programming language.
It heavily uses the OpenSSL library[^mbedtls], which is also written in C and has had some notable security vulnerabilities in the past.
Many vulnerabilities and bugs in C can easily be avoided in other languages thanks to bounds checking and stricter, more expressive type systems.
The state machine of the protocol can be expressed more easily in OCaml, and some properties of the protocol can be encoded in the type system.
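
To illustrate what encoding protocol properties in the type system can look like, here is a deliberately simplified and hypothetical session state machine; it is not MirageVPN's actual state type, and the states and events are invented for the example. Each state carries only the data that is valid in it, so a data channel key can only be obtained from the `Established` state, and the transition function rejects events that arrive out of order.

```ocaml
(* Hypothetical, simplified session state machine; not MirageVPN's types. *)

type state =
  | Expecting_reset                                   (* waiting for a hard reset *)
  | Tls_handshake of { session_id : int64 }
  | Key_exchange of { session_id : int64 }
  | Established of { session_id : int64; data_key : string }

type event =
  | Hard_reset of int64           (* the peer's session id *)
  | Tls_finished
  | Keys_negotiated of string     (* data channel key material *)
  | Data_packet of string

let step state event =
  match state, event with
  | Expecting_reset, Hard_reset session_id -> Ok (Tls_handshake { session_id })
  | Tls_handshake { session_id }, Tls_finished -> Ok (Key_exchange { session_id })
  | Key_exchange { session_id }, Keys_negotiated data_key ->
    Ok (Established { session_id; data_key })
  | Established _, Data_packet _ -> Ok state
  | _, _ -> Error "event not allowed in current state"

(* The data key is only reachable from the Established state. *)
let data_key = function
  | Established { data_key; _ } -> Some data_key
  | Expecting_reset | Tls_handshake _ | Key_exchange _ -> None

(* Feed a sequence of events, stopping at the first invalid transition. *)
let run events =
  List.fold_left
    (fun acc ev -> Result.bind acc (fun s -> step s ev))
    (Ok Expecting_reset) events
```

Replacing the final catch-all case in `step` with an explicit list of invalid combinations would make the compiler point out every place that needs attention when a new state or event is added.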

Another reason is [Mirage OS][mirage], a library operating system implemented in OCaml.
We work on the Mirage project and write applications (unikernels) using Mirage.
In many cases it would be desirable to be able to connect to an existing VPN network[^vpn-network],
or to offer a VPN network to clients using OpenVPN™.

Consider a VPN provider:
The VPN provider runs many machines, each running an operating system in order to run the user-space OpenVPN™ service.
There are no *real* users on the system, and a lot of unrelated processes and legacy layers are around that are not needed.
With a Mirage OS unikernel, which is basically a statically linked binary and operating system in one, such a setup becomes simpler, with fewer layers.
With [reproducible builds][reproducible-builds], deployment and updates will be straightforward.

Another very interesting example is a unikernel for [Qubes OS][qubes] that we have planned.
Qubes OS is an operating system with a strong focus on security.
It offers an almost seamless experience of running applications in different virtual machines on the same machine.
The networking provided to an application (virtual machine) can be restricted to only go through the VPN.
It is possible to use OpenVPN™ for such a setup, but that requires running OpenVPN™ in a full Linux virtual machine.
With Mirage OS the resource footprint is typically much smaller than that of an equivalent application running in a Linux virtual machine; often the memory footprint is smaller by an order of magnitude.

Finally, while it's not an explicit goal of ours, reimplementing a protocol without an explicit specification can help uncover bugs and things that need better documentation in the original implementation.

### TLS mode

There are different variants of TLS mode, but what they share is a separate "control" channel and "data" channel.
The control channel is used to perform a TLS handshake, and over the established TLS session the data channel encryption keys, username/password authentication, etc. are negotiated.
Once this dance has been performed and data channel encryption keys have been negotiated, the peers can exchange IP packets over the data channel.
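
The split into a control and a data channel is visible on the wire. As far as we understand OpenVPN's packet format, the first byte of each packet holds a 5-bit opcode and a 3-bit key ID, and the opcode tells which channel the packet belongs to. The sketch below classifies packets on that basis; the opcode values in the comments are assumptions to be double-checked against OpenVPN's source, and the legacy V1 hard-reset opcodes are deliberately treated as unsupported.

```ocaml
(* Sketch: decide whether a packet belongs to the control or the data
   channel. Assumption: the first byte holds the opcode in its high 5 bits
   and the key id in its low 3 bits (check against OpenVPN's source). *)

type channel = Control | Data

let classify packet =
  if String.length packet = 0 then Error "empty packet"
  else
    let first = Char.code packet.[0] in
    let opcode = first lsr 3 and key_id = first land 0x07 in
    match opcode with
    (* P_DATA_V1 = 6, P_DATA_V2 = 9: encrypted IP packets *)
    | 6 | 9 -> Ok (Data, key_id)
    (* P_CONTROL_SOFT_RESET_V1 = 3, P_CONTROL_V1 = 4, P_ACK_V1 = 5,
       P_CONTROL_HARD_RESET_CLIENT_V2 = 7, P_CONTROL_HARD_RESET_SERVER_V2 = 8,
       P_CONTROL_HARD_RESET_CLIENT_V3 = 10, P_CONTROL_WKC_V1 = 11 *)
    | 3 | 4 | 5 | 7 | 8 | 10 | 11 -> Ok (Control, key_id)
    | _ -> Error ("unsupported opcode " ^ string_of_int opcode)
```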

Over the years a number of mechanisms have been implemented to protect the TLS stack from being exposed to third parties, to protect against denial of service attacks, and to hide information exchanged during a TLS handshake such as certificates (which was an issue before TLS 1.3).
These are known as `tls-auth`, `tls-crypt` and `tls-crypt-v2`.
The `tls-auth` mechanism adds a pre-shared key for HMAC authentication on the control channel.
This makes it possible for an OpenVPN™ server to reject clients that don't know the shared key early, before any TLS handshake is performed.
In `tls-crypt` the control channel is encrypted as well as HMAC authenticated using a pre-shared key.
Common to both is that the pre-shared key is shared between the server and all clients.
For large deployments this significantly reduces their usefulness: the greater the number of clients sharing the key, the more likely it is to be leaked.
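
The core of `tls-auth`, dropping unauthenticated packets before they reach the TLS stack, boils down to an HMAC check. The sketch below builds HMAC (RFC 2104) on OCaml's standard `Digest` module (MD5) purely because it needs no external library; OpenVPN uses the configured digest over its own packet layout, neither of which is modelled here, and `accept_packet` is a hypothetical name.

```ocaml
(* Sketch of the tls-auth idea: drop any packet whose HMAC does not verify
   under the pre-shared key, before it reaches the TLS stack.
   HMAC-MD5 over the stdlib Digest module is a dependency-free stand-in,
   not OpenVPN's digest or packet layout. *)

let block_size = 64 (* MD5 block size in bytes *)

let hmac_md5 ~key msg =
  (* Keys longer than a block are hashed first; shorter ones are zero-padded. *)
  let key = if String.length key > block_size then Digest.string key else key in
  let pad byte =
    String.init block_size (fun i ->
        let k = if i < String.length key then Char.code key.[i] else 0 in
        Char.chr (k lxor byte))
  in
  let inner = Digest.string (pad 0x36 ^ msg) in
  Digest.string (pad 0x5c ^ inner)

(* Compare without exiting early, so timing does not reveal where tags differ. *)
let equal_ct a b =
  String.length a = String.length b
  && (let diff = ref 0 in
      String.iteri (fun i c -> diff := !diff lor (Char.code c lxor Char.code b.[i])) a;
      !diff = 0)

(* Accept a control packet only if its tag matches the HMAC of its payload. *)
let accept_packet ~key ~tag payload = equal_ct (hmac_md5 ~key payload) tag
```

The RFC 2202 test vectors are a quick sanity check for such a helper: key `0x0b` repeated 16 times and message `"Hi There"` give `9294727a3638bb1c13f48ef8158bfc9d`.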

### tls-crypt-v2

To improve on `tls-crypt`, `tls-crypt-v2` uses one pre-shared key per client.
This could be a lot of keys for the server to keep track of, so instead of storing all the client keys, the server has a special tls-crypt-v2 server key that is used to *[wrap][wiki-wrap]* the client keys.
That is, each client has its own client key as well as the client key wrapped using the server key.
The protocol is then extended so that the client appends the wrapped key *unencrypted* to its first message.
The server can then decrypt and verify the client key and decrypt the rest of the packet.
Then the client and server use the client key just as in `tls-crypt`.
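
The first step on the server side, finding the wrapped key at the end of the client's first packet, can be sketched as below. The layout is our assumption, to be checked against OpenVPN's tls-crypt-v2 documentation: a 32-byte HMAC-SHA256 tag, the encrypted client key and metadata, and a trailing 16-bit big-endian length of the whole wrapped key. Verifying the tag and decrypting are left out, and the names are hypothetical.

```ocaml
(* Sketch: locate and split a tls-crypt-v2 wrapped client key (WKc) appended
   to the client's first packet. Assumed layout:
     WKc = tag (32-byte HMAC-SHA256) || encrypted (Kc || metadata) || len
   where [len] is a 16-bit big-endian length of the whole WKc, itself included.
   Tag verification and decryption are not shown. *)

let tag_len = 32

type wrapped_key = { tag : string; ciphertext : string }

let get_uint16_be s off = (Char.code s.[off] lsl 8) lor Char.code s.[off + 1]

(* Returns the packet without the WKc, and the WKc split into its parts. *)
let split_wrapped_key packet =
  let n = String.length packet in
  if n < 2 then Error "packet too short for length trailer"
  else
    let wkc_len = get_uint16_be packet (n - 2) in
    if wkc_len < tag_len + 2 || wkc_len > n then Error "invalid WKc length"
    else
      let wkc = String.sub packet (n - wkc_len) wkc_len in
      let body = String.sub packet 0 (n - wkc_len) in
      Ok (body, { tag = String.sub wkc 0 tag_len;
                  ciphertext = String.sub wkc tag_len (wkc_len - tag_len - 2) })
```

Reading the length from the end of the packet means the server can find the wrapped key without first parsing the encrypted part of the packet.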

This is great!
Each client can have its own key, and the server doesn't need to keep a potentially large database of client keys.
What if a client's key is leaked?
A detail I didn't mention is that the wrapped key contains metadata.
By default this is the current timestamp, but on creation it is possible to put any (relatively short) binary data in there as the metadata.
The server can then be configured to check the metadata by calling a script.
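
As an illustration of such a metadata check, the hypothetical function below rejects client keys whose creation timestamp is older than a maximum age, which bounds how long a leaked key stays usable. We assume a metadata layout of one type byte (0x00 for user data, 0x01 for a timestamp) followed by the payload, with the timestamp as a 64-bit big-endian Unix time; please verify this against the tls-crypt-v2 documentation. OpenVPN itself delegates this decision to an external script.

```ocaml
(* Hypothetical metadata check for tls-crypt-v2 client keys.
   Assumed layout: 1-byte type (0x00 user data, 0x01 timestamp) followed by
   the payload; a timestamp is a 64-bit big-endian Unix time. *)

type metadata = User of string | Timestamp of int64

let parse_metadata s =
  if String.length s < 1 then Error "empty metadata"
  else
    match s.[0] with
    | '\x00' -> Ok (User (String.sub s 1 (String.length s - 1)))
    | '\x01' when String.length s = 9 ->
      Ok (Timestamp (Bytes.get_int64_be (Bytes.of_string s) 1))
    | '\x01' -> Error "malformed timestamp"
    | _ -> Error "unknown metadata type"

(* Accept keys created at most [max_age] seconds ago (and not in the future). *)
let accept_key ~now ~max_age metadata =
  match parse_metadata metadata with
  | Ok (Timestamp created) ->
    let age = Int64.sub now created in
    age >= 0L && age <= max_age
  | Ok (User _) -> true (* the policy for user-defined metadata is up to the operator *)
  | Error _ -> false
```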

One issue is that an initial packet takes up resources on the server, because the server needs to

1) decrypt the wrapped key, and
2) keep the unwrapped key and other data in memory while waiting for the handshake to complete.

This can be abused in an attack very similar to a TCP [SYN flood][syn-flood].
Without `tls-crypt-v2`, OpenVPN uses a specially crafted session ID (a 64-bit identifier) to avoid this issue, similar to [SYN cookies][syn-cookie].
To address this, in OpenVPN 2.6 the protocol for `tls-crypt-v2` was extended yet further with an 'HMAC cookie' mechanism.
The client sends the same packet as before, but uses the sequence number `0x0f000001` instead of `1` to signal support for this mechanism.
The server responds in a similar manner with the sequence number `0x0f000001`, and appends a tag-length-value encoded list of flags to the packet.
At the moment only one tag and one value are defined, which signify that the server supports HMAC cookies; this seems unnecessarily complex, but is done to allow for future extensibility.
Finally, if the server supports HMAC cookies, the client sends a packet where the wrapped key is appended in cleartext.
The server is now able to decrypt the third packet without having to keep the key from the first packet around, and can verify the session ID.
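
The stateless idea behind the HMAC cookie can be sketched as follows. Instead of remembering anything about the first packet, the server derives its 64-bit session ID from a server secret, the client's address, the client's session ID and a coarse time slot, and later only recomputes and compares it. This is a simplified illustration in the spirit of SYN cookies, not OpenVPN's exact construction; the function names are hypothetical, and HMAC-MD5 from the standard `Digest` module stands in for a stronger MAC.

```ocaml
(* Simplified, hypothetical stateless "HMAC cookie" session id in the
   spirit of SYN cookies; not OpenVPN's exact construction. HMAC-MD5 over
   the stdlib Digest module stands in for a stronger MAC. *)

(* Packet id a client uses to signal support for the mechanism. *)
let early_neg_packet_id = 0x0f000001

let block_size = 64

let hmac_md5 ~key msg =
  let key = if String.length key > block_size then Digest.string key else key in
  let pad byte =
    String.init block_size (fun i ->
        let k = if i < String.length key then Char.code key.[i] else 0 in
        Char.chr (k lxor byte))
  in
  Digest.string (pad 0x5c ^ Digest.string (pad 0x36 ^ msg))

(* Length-prefix each field so that different inputs never encode identically. *)
let encode fields =
  String.concat "" (List.map (fun f -> string_of_int (String.length f) ^ ":" ^ f) fields)

(* Server session id: a MAC truncated to 8 bytes; nothing is stored per client. *)
let cookie_session_id ~secret ~peer ~client_session_id ~time_slot =
  let msg = encode [ peer; client_session_id; string_of_int time_slot ] in
  String.sub (hmac_md5 ~key:secret msg) 0 8

let equal_ct a b =
  String.length a = String.length b
  && (let diff = ref 0 in
      String.iteri (fun i c -> diff := !diff lor (Char.code c lxor Char.code b.[i])) a;
      !diff = 0)

(* Recompute for the current and previous slot, so a handshake crossing a
   slot boundary is still accepted. *)
let check_cookie ~secret ~peer ~client_session_id ~now_slot candidate =
  List.exists
    (fun time_slot ->
      equal_ct candidate (cookie_session_id ~secret ~peer ~client_session_id ~time_slot))
    [ now_slot; now_slot - 1 ]
```

When the third packet arrives, a server following this scheme would call `check_cookie` on the echoed session ID before doing any expensive work such as unwrapping the client key.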

## Cool! Let's deploy it!

Great!
We build unikernels on a daily basis in our [reproducible builds setup][builder-web].
At the time of writing we have published a [MirageVPN router unikernel][miragevpn-router] acting as a client.
For general instructions on running Mirage unikernels see our [reproducible builds][reproducible-builds] blog post.
The unikernel will need a block device containing the OpenVPN™ configuration and a network device.
More detailed instructions Will Follow Soon™!
Don't hesitate to reach out to us on [GitHub](https://github.com/robur-coop/miragevpn/issues), [by mail](https://robur.coop/Contact) or me personally [on Mastodon](https://bsd.network/@reynir) if you're stuck.

[robur.coop]: https://robur.coop/
[miragevpn]: https://github.com/robur-coop/miragevpn/
[ngi-assure]: https://www.assure.ngi.eu/
[openvpn]: https://openvpn.net/
[vpn-wiki]: https://en.wikipedia.org/wiki/Virtual_private_network
[mirage]: https://mirage.io/
[reproducible-builds]: https://robur.coop/Projects/Reproducible_builds
[qubes]: https://www.qubes-os.org/
[wiki-wrap]: https://en.wikipedia.org/wiki/Key_wrap
[syn-flood]: https://en.wikipedia.org/wiki/SYN_flood
[syn-cookie]: https://en.wikipedia.org/wiki/SYN_cookies
[builder-web]: https://builds.robur.coop/
[miragevpn-router]: https://builds.robur.coop/job/miragevpn-router

[^mbedtls]: It is possible to compile OpenVPN™ community edition with Mbed TLS instead of OpenSSL; Mbed TLS is written in C as well.

[^vpn-network]: I use the term "VPN network" to mean the virtual private network itself. It is a bit odd because the 'N' in 'VPN' is 'Network', but without disambiguation 'VPN' could refer to the network itself, the software or the service.
@ -1,83 +0,0 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>
Robur's blogqubes-miragevpn, a MirageVPN client for QubesOS
</title>
<meta name="description" content="A new OpenVPN client for QubesOS">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
<script src="https://blog.robur.coop/js/hl.js"></script>
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
</head>
<body>
<header>
<h1>blog.robur.coop</h1>
<blockquote>
The <strong>Robur</strong> cooperative blog.
</blockquote>
</header>
<main><a href="https://blog.robur.coop/index.html">Back to index</a>

<article>
<h1>qubes-miragevpn, a MirageVPN client for QubesOS</h1>
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-vpn">vpn</a></li><li><a href="https://blog.robur.coop/tags.html#tag-unikernel">unikernel</a></li><li><a href="https://blog.robur.coop/tags.html#tag-QubesOS">QubesOS</a></li></ul><p>We are pleased to announce the arrival of a new unikernel:
|
||||
<a href="https://github.com/robur-coop/qubes-miragevpn">qubes-miragevpn</a>. It is the result of work begun
|
||||
several months ago on <a href="https://github.com/robur-coop/miragevpn">miragevpn</a>.</p>
|
||||
<p>With the ambition of completing our unikernel suite, and encouraged by the success of
|
||||
<a href="https://github.com/mirage/qubes-mirage-firewall">qubes-mirage-firewall</a> - as well as the general aims of
|
||||
QubesOS - we thought it would be a good idea to offer this community a unikernel
|
||||
capable of acting as an OpenVPN client, from which other virtual machines (app
|
||||
qubes) can connect so that all their connections pass through the OpenVPN
|
||||
tunnel.</p>
|
||||
<h2 id="qubesos--mirageos"><a class="anchor" aria-hidden="true" href="#qubesos--mirageos"></a>QubesOS & MirageOS</h2>
|
||||
<p>Combining unikernels and QubesOS has always been a tempting idea, in the sense
|
||||
that a network application (such as a firewall or VPN client) could be smaller
|
||||
than a Linux kernel: no keyboard, mouse, wifi management, etc. Just network
|
||||
management via virtual interfaces should suffice.</p>
|
||||
<p>In this case, the unikernel matches this ideal. We start from a base
|
||||
(<a href="https://github.com/Solo5/solo5">Solo5</a>) that only allows what is strictly necessary (reading from and
|
||||
writing to a virtual interface or block device) and build on top of it only
|
||||
the application logic required for the objective we wish to achieve. This
|
||||
drastically reduces:</p>
|
||||
<ol>
|
||||
<li>the unikernel's attack surface</li>
|
||||
<li>its weight</li>
|
||||
<li>its memory usage</li>
|
||||
</ol>
|
||||
<p>We won't go into all the work that's been done to maintain and improve
|
||||
<a href="https://github.com/mirage/qubes-mirage-firewall">qubes-mirage-firewall</a> over the last 10
|
||||
years<sup><a href="#fn1">1</a></sup>, but it's clear that this particular unikernel has
|
||||
found its audience, who aren't necessarily OCaml and MirageOS aficionados.</p>
|
||||
<p>In other words, <a href="https://github.com/mirage/qubes-mirage-firewall">qubes-mirage-firewall</a> may well be a
|
||||
fine example of what can actually be done with MirageOS, and one of real utility.</p>
|
||||
<hr>
|
||||
<p><tag id="fn1"><strong>1</strong></tag>: <a href="https://github.com/marmarek">marmarek</a>, <a href="https://github.com/yomimono">Mindy</a> and
|
||||
<a href="https://github.com/mato">mato</a> were (and still are) heavily involved in the work bridging QubesOS
|
||||
and MirageOS. We'd like to thank them as well: if we are able to continue
|
||||
this adventure, it's also thanks to them.</p>
|
||||
<h2 id="qubesos--miragevpn"><a class="anchor" aria-hidden="true" href="#qubesos--miragevpn"></a>QubesOS & MirageVPN</h2>
|
||||
<p>So, after a lengthy development phase for MirageVPN, we set about developing a
|
||||
unikernel for QubesOS to offer an OpenVPN client as an operating system. We'd
|
||||
like to give special thanks to <a href="https://github.com/palainp">Pierre Alain</a>, who helped us to better
|
||||
understand QubesOS and its possibilities.</p>
|
||||
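<p>The unikernel is available here: <a href="https://github.com/robur-coop/qubes-miragevpn">https://github.com/robur-coop/qubes-miragevpn</a>.
|
||||
A tutorial has just been created to help QubesOS users install and configure
|
||||
such a unikernel: <a href="https://robur-coop.github.io/miragevpn-handbook/">https://robur-coop.github.io/miragevpn-handbook/</a>.</p>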
|
||||
<p>In the same way as <a href="https://github.com/mirage/qubes-mirage-firewall">qubes-mirage-firewall</a>, we hope to
|
||||
offer a solution that works, and to expand the circle of MirageOS and unikernel
|
||||
users!</p>
|
||||
|
||||
</article>
|
||||
|
||||
</main>
|
||||
<footer>
|
||||
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
|
||||
<br />
|
||||
</footer>
|
||||
<script>hljs.highlightAll();</script>
|
||||
</body>
|
||||
</html>
|
82
articles/qubes-miragevpn.md
Normal file
|
@ -0,0 +1,82 @@
|
|||
---
|
||||
date: 2024-06-24
|
||||
title: qubes-miragevpn, a MirageVPN client for QubesOS
|
||||
description: A new OpenVPN client for QubesOS
|
||||
tags:
|
||||
- OCaml
|
||||
- vpn
|
||||
- unikernel
|
||||
- QubesOS
|
||||
author:
|
||||
name: Romain Calascibetta
|
||||
email: romain.calascibetta@gmail.com
|
||||
link: https://blog.osau.re/
|
||||
---
|
||||
|
||||
We are pleased to announce the arrival of a new unikernel:
|
||||
[qubes-miragevpn][qubes-miragevpn]. It is the result of work begun
|
||||
several months ago on [miragevpn][miragevpn].
|
||||
|
||||
With the ambition of completing our unikernel suite, and encouraged by the success of
|
||||
[qubes-mirage-firewall][qubes-mirage-firewall] - as well as the general aims of
|
||||
QubesOS - we thought it would be a good idea to offer this community a unikernel
|
||||
capable of acting as an OpenVPN client, from which other virtual machines (app
|
||||
qubes) can connect so that all their connections pass through the OpenVPN
|
||||
tunnel.
|
||||
|
||||
## QubesOS & MirageOS
|
||||
|
||||
Combining unikernels and QubesOS has always been a tempting idea, in the sense
|
||||
that a network application (such as a firewall or VPN client) could be smaller
|
||||
than a Linux kernel: no keyboard, mouse, wifi management, etc. Just network
|
||||
management via virtual interfaces should suffice.
|
||||
|
||||
In this case, the unikernel matches this ideal. We start from a base
|
||||
([Solo5][solo5]) that only allows what is strictly necessary (reading from and
|
||||
writing to a virtual interface or block device) and build on top of it only
|
||||
the application logic required for the objective we wish to achieve. This
|
||||
drastically reduces:
|
||||
1) the unikernel's attack surface
|
||||
2) its weight
|
||||
3) its memory usage
|
||||
|
||||
|
||||
We won't go into all the work that's been done to maintain and improve
|
||||
[qubes-mirage-firewall][qubes-mirage-firewall] over the last 10
|
||||
years<sup>[1](#fn1)</sup>, but it's clear that this particular unikernel has
|
||||
found its audience, who aren't necessarily OCaml and MirageOS aficionados.
|
||||
|
||||
In other words, [qubes-mirage-firewall][qubes-mirage-firewall] may well be a
|
||||
fine example of what can actually be done with MirageOS, and one of real utility.
|
||||
|
||||
<hr>
|
||||
|
||||
<tag id="fn1">**1**</tag>: [marmarek][marmarek], [Mindy][yomimono] and
|
||||
[mato][mato] were (and still are) heavily involved in the work bridging QubesOS
|
||||
and MirageOS. We'd like to thank them as well: if we are able to continue
|
||||
this adventure, it's also thanks to them.
|
||||
|
||||
## QubesOS & MirageVPN
|
||||
|
||||
So, after a lengthy development phase for MirageVPN, we set about developing a
|
||||
unikernel for QubesOS to offer an OpenVPN client as an operating system. We'd
|
||||
like to give special thanks to [Pierre Alain][palainp], who helped us to better
|
||||
understand QubesOS and its possibilities.
|
||||
|
||||
The unikernel is available here: <https://github.com/robur-coop/qubes-miragevpn>.
|
||||
A tutorial has just been created to help QubesOS users install and configure
|
||||
such a unikernel: <https://robur-coop.github.io/miragevpn-handbook/>.
|
||||
|
||||
In the same way as [qubes-mirage-firewall][qubes-mirage-firewall], we hope to
|
||||
offer a solution that works, and to expand the circle of MirageOS and unikernel
|
||||
users!
|
||||
|
||||
[qubes-miragevpn]: https://github.com/robur-coop/qubes-miragevpn
|
||||
[miragevpn]: https://github.com/robur-coop/miragevpn
|
||||
[qubes-mirage-firewall]: https://github.com/mirage/qubes-mirage-firewall
|
||||
[glossary]: https://www.qubes-os.org/doc/glossary/
|
||||
[solo5]: https://github.com/Solo5/solo5
|
||||
[palainp]: https://github.com/palainp
|
||||
[marmarek]: https://github.com/marmarek
|
||||
[yomimono]: https://github.com/yomimono
|
||||
[mato]: https://github.com/mato
|
|
@ -1,138 +0,0 @@
|
|||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="x-ua-compatible" content="ie=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<title>
|
||||
Robur's blog - Speeding elliptic curve cryptography
|
||||
</title>
|
||||
<meta name="description" content="How we improved the performance of elliptic curves by only modifying the underlying byte array">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
|
||||
<script src="https://blog.robur.coop/js/hl.js"></script>
|
||||
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
|
||||
</head>
|
||||
<body>
|
||||
<header>
|
||||
<h1>blog.robur.coop</h1>
|
||||
<blockquote>
|
||||
The <strong>Robur</strong> cooperative blog.
|
||||
</blockquote>
|
||||
</header>
|
||||
<main><a href="https://blog.robur.coop/index.html">Back to index</a>
|
||||
|
||||
<article>
|
||||
<h1>Speeding elliptic curve cryptography</h1>
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-cryptography">cryptography</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li></ul><p>TL;DR: by replacing cstruct with string, we gain up to a factor of 2.5 in performance.</p>
|
||||
<h2 id="mirage-crypto-ec"><a class="anchor" aria-hidden="true" href="#mirage-crypto-ec"></a>Mirage-crypto-ec</h2>
|
||||
<p>In April 2021 we published our implementation of <a href="https://hannes.robur.coop/Posts/EC">elliptic curve cryptography</a> (as <code>mirage-crypto-ec</code> opam package) - this provides ECDSA and ECDH for the NIST curves P224, P256, P384, and P521, as well as Ed25519 (EdDSA) and X25519 (ECDH). We use <a href="https://github.com/mit-plv/fiat-crypto/">fiat-crypto</a> for the cryptographic primitives, which emits C code that is correct by construction (note: earlier we stated "free of timing side-channels", but this is a huge challenge, and as <a href="https://discuss.systems/@edwintorok/111925959867297453">reported by Edwin Török</a> likely impossible on current x86 hardware). More C code (such as <code>point_add</code>, <code>point_double</code>, and further 25519 computations including tables) has been taken from the BoringSSL code base. A lot of OCaml code originates from our TLS 1.3 work in 2018, where Etienne Millon, Nathan Rebours, and Clément Pascutto interfaced <a href="https://github.com/mirage/fiat/">elliptic curves for OCaml</a> (with the goal of being usable with MirageOS).</p>
|
||||
<p>The goal of mirage-crypto-ec was to develop elliptic curve support for OCaml & MirageOS quickly - which didn't leave much time to focus on performance. As time goes by, our priorities shift: we're keen to use fewer resources, so less CPU time and a smaller memory footprint are preferable.</p>
|
||||
<h2 id="memory-allocation-and-calls-to-c"><a class="anchor" aria-hidden="true" href="#memory-allocation-and-calls-to-c"></a>Memory allocation and calls to C</h2>
|
||||
<p>OCaml uses managed memory with a generational garbage collector, which moves values around (e.g. when promoting them from the minor to the major heap). To safely call a C function at any point in time when the arguments are OCaml values (memory allocated on the OCaml heap), it is crucial that these arguments stay at the same memory location, and are not moved by the GC, while the C function executes. Otherwise the C code may read wrong data or access unmapped memory.</p>
|
||||
<p>There are several strategies to achieve this, ranging from "let's use another memory area that the GC doesn't mess around with", "do not run any GC while executing the C code" (read further in the OCaml <a href="https://v2.ocaml.org/releases/4.14/htmlman/intfc.html#ss:c-direct-call">cheaper C calls</a> manual), "deeply copy the arguments to a non-moving memory area before executing C code", and likely others.</p>
|
||||
<p>For our elliptic curve operations, the C code is pretty simple: it does not allocate memory, nor does it raise exceptions. Also, its execution time is constant and small.</p>
|
||||
<h2 id="ocaml-cstruct"><a class="anchor" aria-hidden="true" href="#ocaml-cstruct"></a>ocaml-cstruct</h2>
|
||||
<p>In the <a href="https://mirage.io">MirageOS</a> ecosystem, a core library is <a href="https://github.com/mirage/ocaml-cstruct">cstruct</a>, whose purpose is manifold: it provides ppx rewriters to define C structure layouts in OCaml (getter/setter functions are generated), as well as enums; a further fundamental idea is to use OCaml bigarrays, which are non-moving memory not allocated on the OCaml heap but directly by calling <code>malloc</code>. The memory can even be page-aligned, as required by some C software, such as Xen. Convenient functionality, such as "retrieve a big-endian unsigned 32 bit integer from offset X in this buffer", is provided as well.</p>
|
||||
<p>But there's a downside to it: as time moves along, Xen is no longer the only target for MirageOS, and other virtualization mechanisms (such as KVM / virtio) do not require page-aligned memory ranges that are retained at a given memory address. It also turns out that cstruct spends a lot of time in bounds checks. Another huge downside is that OCaml tooling (such as statmemprof) was for a long time (and maybe still is) unaware of memory allocated outside the OCaml GC (cstruct uses a bigarray as its underlying buffer). Freeing the memory requires finalizers to be executed, which is tedious (expensive) and against the OCaml runtime philosophy.</p>
|
||||
<p>Meanwhile, the OCaml standard library gained support for (a) immutable strings as byte vectors distinct from mutable bytes (since 4.06, released in 2017; cstruct also has an interface for mutable/immutable buffers, but as far as I can tell it is not used), and (b) reading a given number of octets from a string or bytes as an (unsigned) integer (since 4.08, released in 2019, while some additional functionality is only available since 4.13).</p>
|
||||
<p>Still, bigarrays are necessary in certain situations - if you need to have a non-moving (shared) area of memory, as in the Xen interface, but also e.g. when you compute in parallel in different processes, or when you need mmap()ed files.</p>
|
||||
<h2 id="putting-it-together"><a class="anchor" aria-hidden="true" href="#putting-it-together"></a>Putting it together</h2>
|
||||
<p>Back in October 2021, Romain <a href="https://github.com/mirage/mirage-crypto/pull/146">proposed</a> using bytes instead of cstruct in mirage-crypto-ec. The PR sat idle since benchmarks were missing and developer time was scarce. Recently, however, Virgile Robles <a href="https://github.com/mirage/mirage-crypto/pull/191">proposed</a> another line of work: using pre-computed tables for NIST curves to speed up the elliptic curve cryptography. A performance evaluation showed that "use bytes instead of cstruct" combined with pre-computed tables made a huge difference (a factor of 6) compared to the latest release.</p>
|
||||
<p>To ease reviewing changes, we decided to focus on landing "use bytes instead of cstruct" first, and fortunately Pierre Alain had already rebased the existing patch onto the latest release of mirage-crypto-ec. We also went further and used string instead of bytes where applicable. For safety reasons we also introduced an API layer which (a) allocates a byte vector for the result, (b) calls the primitive, and (c) transforms the byte vector into an immutable string. This API is more in line with functional programming (immutable values), and since allocations and deallocations of values are cheap, there's no measurable performance decrease.</p>
|
||||
<p>All the changes are internal; no external API needs to be adjusted. There is, however, one conversion from cstruct to string at the API boundary (and back for the return value).</p>
|
||||
<p>We used <code>perf</code> to construct some flame graphs (of the ECDSA P256 sign), shown below.</p>
|
||||
<p><img src="../images/trace-cstruct-440.svg" alt="Flamegraph of ECDSA sign with cstruct" ></p>
|
||||
<p>The flame graph of P256 ECDSA sign using the mirage-crypto release 0.11.2. The majority of time is spent in "do_sign", which calls <code>inv</code> (inversion), <code>scalar_mult</code> (majority of time), and <code>x_of_finite_point_mod_n</code>. The scalar multiplication spends time in <code>add</code>, <code>double</code> and <code>select</code>. Several towers starting at <code>Cstruct.create_919</code> are visible.</p>
|
||||
<p>With PR#146, the flame graph looks different:</p>
|
||||
<p><img src="../images/trace-string-770.svg" alt="Flamegraph of ECDSA sign with string" ></p>
|
||||
<p>Now, the allocation towers do not exist anymore. The time of a sign operation is spent in <code>inv</code>, <code>scalar_mult</code>, and <code>x_of_finite_point_mod_n</code>. There's still room for improvement in these operations.</p>
|
||||
<h2 id="performance-numbers"><a class="anchor" aria-hidden="true" href="#performance-numbers"></a>Performance numbers</h2>
|
||||
<p>All numbers were gathered on a Lenovo X250 laptop with an Intel i7-5600U CPU @ 2.60GHz. We used OCaml 4.14.1 as the compiler. The baseline is OpenSSL 3.0.12. All numbers are in operations per second.</p>
|
||||
<p>NIST P-256</p>
|
||||
<div role="region"><table>
|
||||
<tr>
|
||||
<th>op</th>
|
||||
<th>0.11.2</th>
|
||||
<th>PR#146</th>
|
||||
<th>speedup over 0.11.2</th>
|
||||
<th>OpenSSL</th>
|
||||
<th>OpenSSL / PR#146</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>sign</td>
|
||||
<td>748</td>
|
||||
<td>1806</td>
|
||||
<td>2.41x</td>
|
||||
<td>34392</td>
|
||||
<td>19.04x</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>verify</td>
|
||||
<td>285</td>
|
||||
<td>655</td>
|
||||
<td>2.30x</td>
|
||||
<td>12999</td>
|
||||
<td>19.85x</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ecdh</td>
|
||||
<td>858</td>
|
||||
<td>1785</td>
|
||||
<td>2.08x</td>
|
||||
<td>16514</td>
|
||||
<td>9.25x</td>
|
||||
</tr>
|
||||
</table></div><p>Curve 25519</p>
|
||||
<div role="region"><table>
|
||||
<tr>
|
||||
<th>op</th>
|
||||
<th>0.11.2</th>
|
||||
<th>PR#146</th>
|
||||
<th>speedup over 0.11.2</th>
|
||||
<th>OpenSSL</th>
|
||||
<th>OpenSSL / PR#146</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>sign</td>
|
||||
<td>10713</td>
|
||||
<td>11560</td>
|
||||
<td>1.08x</td>
|
||||
<td>21943</td>
|
||||
<td>1.90x</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>verify</td>
|
||||
<td>7600</td>
|
||||
<td>8314</td>
|
||||
<td>1.09x</td>
|
||||
<td>7081</td>
|
||||
<td>0.85x</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ecdh</td>
|
||||
<td>12144</td>
|
||||
<td>13457</td>
|
||||
<td>1.11x</td>
|
||||
<td>26201</td>
|
||||
<td>1.95x</td>
|
||||
</tr>
|
||||
</table></div><p>Note: to re-create the performance numbers, you can run <code>openssl speed ecdsap256 ecdhp256 ed25519 ecdhx25519</code> - for the OCaml side, use <code>dune bu bench/speed.exe --rel</code> and <code>_build/default/bench/speed.exe ecdsa-sign ecdsa-verify ecdh-share</code>.</p>
|
||||
<p>The performance improvements are up to 2.5 times compared to the latest mirage-crypto-ec release (look at the 4th column). In comparison to OpenSSL, we are still up to a factor of 20 slower for the NIST curves, and up to a factor of 2 slower for 25519 computations (look at the last column).</p>
|
||||
<p>If you have ideas for improvements, let us know via an issue, email, or a pull request :) We started to <a href="https://github.com/mirage/mirage-crypto/issues/193">gather some</a> for 25519 by comparing our code with changes in BoringSSL over the last few years.</p>
|
||||
<p>As a spoiler, for P-256 sign there's another speedup of around 4.5x with <a href="https://github.com/mirage/mirage-crypto/pull/191">Virgile's PR</a>, which brings pre-computed tables to the NIST curves as well.</p>
|
||||
<h2 id="the-road-ahead-for-2024"><a class="anchor" aria-hidden="true" href="#the-road-ahead-for-2024"></a>The road ahead for 2024</h2>
|
||||
<p>Remove all cstruct, everywhere, apart from in mirage-block-xen and mirage-net-xen ;). It was a fine decision in the early MirageOS days, but from a performance point of view, and for making our packages more broadly usable without many dependencies, it is time to remove cstruct. Earlier this year we already <a href="https://github.com/mirage/ocaml-tar/pull/137">removed cstruct from ocaml-tar</a> for similar reasons.</p>
|
||||
<p>Our MirageOS work is only partially funded; we cross-fund our work by commercial contracts and public (EU) funding. We are part of a non-profit company, and you can make a (tax-deductible, at least in the EU) <a href="https://aenderwerk.de/donate/">donation</a> (select "DONATION robur" in the dropdown menu).</p>
|
||||
<p>We're keen to get MirageOS deployed in production - if you would like to do that, don't hesitate to reach out to us via email at team at robur.coop.</p>
|
||||
|
||||
</article>
|
||||
|
||||
</main>
|
||||
<footer>
|
||||
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
|
||||
<br />
|
||||
</footer>
|
||||
<script>hljs.highlightAll();</script>
|
||||
</body>
|
||||
</html>
|
98
articles/speeding-ec-string.md
Normal file
|
@ -0,0 +1,98 @@
|
|||
---
|
||||
date: 2024-02-13
|
||||
title: Speeding elliptic curve cryptography
|
||||
description:
|
||||
How we improved the performance of elliptic curves by only modifying the underlying byte array
|
||||
tags:
|
||||
- OCaml
|
||||
- MirageOS
|
||||
- cryptography
|
||||
- security
|
||||
author:
|
||||
name: Hannes Mehnert
|
||||
email: hannes@mehnert.org
|
||||
link: https://hannes.robur.coop
|
||||
---
|
||||
|
||||
TL;DR: by replacing cstruct with string, we gain up to a factor of 2.5 in performance.
|
||||
|
||||
## Mirage-crypto-ec
|
||||
|
||||
In April 2021 we published our implementation of [elliptic curve cryptography](https://hannes.robur.coop/Posts/EC) (as `mirage-crypto-ec` opam package) - this provides ECDSA and ECDH for the NIST curves P224, P256, P384, and P521, as well as Ed25519 (EdDSA) and X25519 (ECDH). We use [fiat-crypto](https://github.com/mit-plv/fiat-crypto/) for the cryptographic primitives, which emits C code that is correct by construction (note: earlier we stated "free of timing side-channels", but this is a huge challenge, and as [reported by Edwin Török](https://discuss.systems/@edwintorok/111925959867297453) likely impossible on current x86 hardware). More C code (such as `point_add`, `point_double`, and further 25519 computations including tables) has been taken from the BoringSSL code base. A lot of OCaml code originates from our TLS 1.3 work in 2018, where Etienne Millon, Nathan Rebours, and Clément Pascutto interfaced [elliptic curves for OCaml](https://github.com/mirage/fiat/) (with the goal of being usable with MirageOS).
|
||||
|
||||
The goal of mirage-crypto-ec was to develop elliptic curve support for OCaml & MirageOS quickly - which didn't leave much time to focus on performance. As time goes by, our priorities shift: we're keen to use fewer resources, so less CPU time and a smaller memory footprint are preferable.
|
||||
|
||||
## Memory allocation and calls to C
|
||||
|
||||
OCaml uses managed memory with a generational garbage collector, which moves values around (e.g. when promoting them from the minor to the major heap). To safely call a C function at any point in time when the arguments are OCaml values (memory allocated on the OCaml heap), it is crucial that these arguments stay at the same memory location, and are not moved by the GC, while the C function executes. Otherwise the C code may read wrong data or access unmapped memory.
|
||||
|
||||
There are several strategies to achieve this, ranging from "let's use another memory area that the GC doesn't mess around with", "do not run any GC while executing the C code" (read further in the OCaml [cheaper C calls](https://v2.ocaml.org/releases/4.14/htmlman/intfc.html#ss:c-direct-call) manual), "deeply copy the arguments to a non-moving memory area before executing C code", and likely others.
|
||||
|
||||
For our elliptic curve operations, the C code is pretty simple: it does not allocate memory, nor does it raise exceptions. Also, its execution time is constant and small.
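The "do not run any GC while executing the C code" strategy maps to OCaml's `[@@noalloc]` attribute on an `external` declaration. The following is a minimal sketch, not mirage-crypto-ec's binding code: instead of an elliptic curve stub it binds `caml_blit_string`, a C primitive shipped with the OCaml runtime (as far as we know, the standard library binds the same symbol for `Bytes.blit_string`), and the names `blit_string_noalloc` and `copy_into_bytes` are hypothetical. `[@@noalloc]` is only sound because the C function neither allocates nor raises, which is exactly the property described above.

```ocaml
(* Bind a runtime C primitive with [@@noalloc]: the compiler emits a direct
   call and promises nothing will trigger a GC during it, so the string and
   bytes arguments (both on the OCaml heap) cannot move while C runs.
   This is only valid because caml_blit_string neither allocates nor raises. *)
external blit_string_noalloc :
  string -> int -> bytes -> int -> int -> unit = "caml_blit_string" [@@noalloc]

(* Hypothetical wrapper. The C primitive performs no bounds checks, so the
   OCaml side must pass valid offsets and lengths: here [dst] is allocated
   with exactly [String.length src] bytes. *)
let copy_into_bytes (src : string) : bytes =
  let len = String.length src in
  let dst = Bytes.create len in
  blit_string_noalloc src 0 dst 0 len;
  dst

let () = print_endline (Bytes.to_string (copy_into_bytes "point_add"))
```

Running it prints `point_add`. For real cryptographic stubs, the same attribute goes on the `external` declarations of the C functions, provided they keep the no-allocation, no-exception discipline.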
|
||||
|
||||
## ocaml-cstruct
|
||||
|
||||
In the [MirageOS](https://mirage.io) ecosystem, a core library is [cstruct](https://github.com/mirage/ocaml-cstruct), whose purpose is manifold: it provides ppx rewriters to define C structure layouts in OCaml (getter/setter functions are generated), as well as enums; a further fundamental idea is to use OCaml bigarrays, which are non-moving memory not allocated on the OCaml heap but directly by calling `malloc`. The memory can even be page-aligned, as required by some C software, such as Xen. Convenient functionality, such as "retrieve a big-endian unsigned 32 bit integer from offset X in this buffer", is provided as well.
|
||||
|
||||
But there's a downside to it: as time moves along, Xen is no longer the only target for MirageOS, and other virtualization mechanisms (such as KVM / virtio) do not require page-aligned memory ranges that are retained at a given memory address. It also turns out that cstruct spends a lot of time in bounds checks. Another huge downside is that OCaml tooling (such as statmemprof) was for a long time (and maybe still is) unaware of memory allocated outside the OCaml GC (cstruct uses a bigarray as its underlying buffer). Freeing the memory requires finalizers to be executed, which is tedious (expensive) and against the OCaml runtime philosophy.
|
||||
|
||||
Meanwhile, the OCaml standard library gained support for (a) immutable strings as byte vectors distinct from mutable bytes (since 4.06, released in 2017; cstruct also has an interface for mutable/immutable buffers, but as far as I can tell it is not used), and (b) reading a given number of octets from a string or bytes as an (unsigned) integer (since 4.08, released in 2019, while some additional functionality is only available since 4.13).
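As an illustration of (b), the cstruct-style "big-endian unsigned 32 bit integer from offset X" accessor can be written with the standard library alone. This is a minimal sketch assuming a 64-bit platform; `get_uint32_be` and `set_uint32_be` are hypothetical helper names, while `String.get_int32_be` (since 4.13) and `Bytes.set_int32_be` (since 4.08) are standard library functions.

```ocaml
(* Read a big-endian unsigned 32-bit integer at [off] from an immutable
   string. Like other stdlib accessors, String.get_int32_be raises
   Invalid_argument if fewer than 4 bytes are available at [off]. *)
let get_uint32_be (s : string) (off : int) : int =
  (* get_int32_be returns a signed int32; masking the low 32 bits recovers
     the unsigned value. Assumes a 64-bit platform (63-bit OCaml int). *)
  Int32.to_int (String.get_int32_be s off) land 0xFFFF_FFFF

(* Write side, on mutable bytes: Int32.of_int keeps the low 32 bits. *)
let set_uint32_be (b : bytes) (off : int) (v : int) : unit =
  Bytes.set_int32_be b off (Int32.of_int v)

let () =
  let b = Bytes.make 8 '\x00' in
  set_uint32_be b 0 256;
  set_uint32_be b 4 0xFFFF_FFFF;
  let s = Bytes.to_string b in
  (* prints "256 4294967295" *)
  Printf.printf "%d %d\n" (get_uint32_be s 0) (get_uint32_be s 4)
```

No bigarray and no finalizer is involved: both the string and the bytes live on the OCaml heap and are freed by the GC like any other value.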
|
||||
|
||||
Still, bigarrays are necessary in certain situations - if you need to have a non-moving (shared) area of memory, as in the Xen interface, but also e.g. when you compute in parallel in different processes, or when you need mmap()ed files.
|
||||
|
||||
## Putting it together
|
||||
|
||||
Back in October 2021, Romain [proposed](https://github.com/mirage/mirage-crypto/pull/146) using bytes instead of cstruct in mirage-crypto-ec. The PR sat idle since benchmarks were missing and developer time was scarce. Recently, however, Virgile Robles [proposed](https://github.com/mirage/mirage-crypto/pull/191) another line of work: using pre-computed tables for NIST curves to speed up the elliptic curve cryptography. A performance evaluation showed that "use bytes instead of cstruct" combined with pre-computed tables made a huge difference (a factor of 6) compared to the latest release.
|
||||
|
||||
To ease reviewing changes, we decided to focus on landing "use bytes instead of cstruct" first, and fortunately Pierre Alain had already rebased the existing patch onto the latest release of mirage-crypto-ec. We also went further and used string instead of bytes where applicable. For safety reasons we also introduced an API layer which (a) allocates a byte vector for the result, (b) calls the primitive, and \(c) transforms the byte vector into an immutable string. This API is more in line with functional programming (immutable values), and since allocations and deallocations of values are cheap, there's no measurable performance decrease.
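The three-step API layer can be sketched as follows. This is a minimal illustration under assumptions, not the mirage-crypto-ec code: `with_output`, `xor_into` and `xor` are hypothetical names, and `xor_into` is a pure-OCaml stand-in for a C primitive that writes its result into a caller-supplied buffer.

```ocaml
(* Hypothetical helper implementing the allocate / call / freeze pattern. *)
let with_output ~(len : int) (primitive : bytes -> unit) : string =
  let out = Bytes.create len in   (* (a) allocate a byte vector for the result *)
  primitive out;                  (* (b) call the primitive, which fills [out] *)
  (* (c) freeze without copying. Safe only because [out] is never mutated
     again: it does not escape, assuming the primitive keeps no reference. *)
  Bytes.unsafe_to_string out

(* Hypothetical stand-in for a C primitive: XOR two equal-length inputs
   into [dst]. *)
let xor_into (a : string) (b : string) (dst : bytes) : unit =
  for i = 0 to Bytes.length dst - 1 do
    Bytes.set dst i (Char.chr (Char.code a.[i] lxor Char.code b.[i]))
  done

(* Public function: immutable strings in, immutable string out. *)
let xor (a : string) (b : string) : string =
  if String.length a <> String.length b then invalid_arg "xor: length mismatch";
  with_output ~len:(String.length a) (xor_into a b)

let () = assert (xor "\x0f\xf0" "\xff\xff" = "\xf0\x0f")
```

Callers only ever see immutable strings; the mutable buffer exists only between steps (a) and (c), which keeps the unsafe conversion local and easy to audit.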
|
||||
|
||||
All the changes are internal; no external API needs to be adjusted. There is, however, one conversion from cstruct to string at the API boundary (and back for the return value).
|
||||
|
||||
We used `perf` to construct some flame graphs (of the ECDSA P256 sign), shown below.
|
||||
|
||||
![Flamegraph of ECDSA sign with cstruct](../images/trace-cstruct-440.svg)
|
||||
|
||||
The flame graph of P256 ECDSA sign using the mirage-crypto release 0.11.2. The majority of time is spent in "do_sign", which calls `inv` (inversion), `scalar_mult` (majority of time), and `x_of_finite_point_mod_n`. The scalar multiplication spends time in `add`, `double` and `select`. Several towers starting at `Cstruct.create_919` are visible.
|
||||
|
||||
With PR#146, the flame graph looks different:
|
||||
|
||||
![Flamegraph of ECDSA sign with string](../images/trace-string-770.svg)
|
||||
|
||||
Now, the allocation towers do not exist anymore. The time of a sign operation is spent in `inv`, `scalar_mult`, and `x_of_finite_point_mod_n`. There's still room for improvement in these operations.
|
||||
|
||||
## Performance numbers
|
||||
|
||||
All numbers were gathered on a Lenovo X250 laptop with an Intel i7-5600U CPU @ 2.60GHz. We used OCaml 4.14.1 as the compiler. The baseline is OpenSSL 3.0.12. All numbers are in operations per second.
|
||||
|
||||
NIST P-256
|
||||
|
||||
| op | 0.11.2 | PR#146 | speedup over 0.11.2 | OpenSSL | OpenSSL / PR#146 |
|
||||
| - | - | - | - | - | - |
|
||||
| sign | 748 | 1806 | 2.41x | 34392 | 19.04x |
|
||||
| verify | 285 | 655 | 2.30x | 12999 | 19.85x |
|
||||
| ecdh | 858 | 1785 | 2.08x | 16514 | 9.25x |
|
||||
|
||||
Curve 25519
|
||||
|
||||
| op | 0.11.2 | PR#146 | speedup over 0.11.2 | OpenSSL | OpenSSL / PR#146 |
|
||||
| - | - | - | - | - | - |
|
||||
| sign | 10713 | 11560 | 1.08x | 21943 | 1.90x |
|
||||
| verify | 7600 | 8314 | 1.09x | 7081 | 0.85x |
|
||||
| ecdh | 12144 | 13457 | 1.11x | 26201 | 1.95x |
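Both ratio columns can be recomputed from the raw operations-per-second numbers: the first is PR#146 divided by 0.11.2, the second is OpenSSL divided by PR#146, so a value above 1 means OpenSSL is faster and a value below 1 (Ed25519 verify) means our code is faster. A small sketch; `ratios` is a hypothetical helper name.

```ocaml
(* Given operations per second for 0.11.2, PR#146 and OpenSSL, return
   (speedup of PR#146 over 0.11.2, OpenSSL / PR#146). *)
let ratios ~old_ops ~new_ops ~openssl_ops =
  let speedup = float_of_int new_ops /. float_of_int old_ops in
  let gap = float_of_int openssl_ops /. float_of_int new_ops in
  (speedup, gap)

(* The raw numbers from the two tables. *)
let rows =
  [ ("P-256 sign", 748, 1806, 34392);
    ("P-256 verify", 285, 655, 12999);
    ("P-256 ecdh", 858, 1785, 16514);
    ("25519 sign", 10713, 11560, 21943);
    ("25519 verify", 7600, 8314, 7081);
    ("25519 ecdh", 12144, 13457, 26201) ]

let () =
  List.iter
    (fun (name, o, n, s) ->
      let speedup, gap = ratios ~old_ops:o ~new_ops:n ~openssl_ops:s in
      Printf.printf "%-13s %.2fx %.2fx\n" name speedup gap)
    rows
```

Rounded to two decimals, the output reproduces the 4th and 6th columns of both tables.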
|
||||
|
||||
|
||||
Note: to re-create the performance numbers, you can run `openssl speed ecdsap256 ecdhp256 ed25519 ecdhx25519` - for the OCaml side, use `dune bu bench/speed.exe --rel` and `_build/default/bench/speed.exe ecdsa-sign ecdsa-verify ecdh-share`.
|
||||
|
||||
The performance improvements are up to 2.5 times compared to the latest mirage-crypto-ec release (look at the 4th column). In comparison to OpenSSL, we are still up to a factor of 20 slower for the NIST curves, and up to a factor of 2 slower for 25519 computations (look at the last column).
|
||||
|
||||
If you have ideas for improvements, let us know via an issue, email, or a pull request :) We started to [gather some](https://github.com/mirage/mirage-crypto/issues/193) for 25519 by comparing our code with changes in BoringSSL over the last few years.
|
||||
|
||||
As a spoiler, for P-256 sign there's another speedup of around 4.5x with [Virgile's PR](https://github.com/mirage/mirage-crypto/pull/191), which brings pre-computed tables to the NIST curves as well.
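The general idea behind fixed-base pre-computed tables can be illustrated on a toy group. This is a generic sketch of the textbook windowed technique, not the code from Virgile's PR, under loudly stated assumptions: it uses the multiplicative group modulo the prime 1_000_000_007 instead of elliptic curve points (curve code adds points where this multiplies integers), 4-bit windows, exponents below 2^32, and hypothetical names (`precompute`, `pow_fixed`). It is also not constant time: production code must select table entries in constant time (compare the `select` frames in the flame graph above).

```ocaml
(* Toy group: integers modulo a prime p under multiplication. Products of
   two values below p stay below 2^60, so they fit in a 63-bit OCaml int. *)
let p = 1_000_000_007
let mul a b = a * b mod p

(* Plain square-and-multiply: g^e with no precomputation. *)
let pow g e =
  let rec go acc base e =
    if e = 0 then acc
    else
      let acc = if e land 1 = 1 then mul acc base else acc in
      go acc (mul base base) (e lsr 1)
  in
  go 1 g e

let window_bits = 4
let num_windows = 8 (* 8 windows of 4 bits: exponents below 2^32 *)

(* Computed once per fixed base: table.(i).(j) = g^(j * 16^i). *)
let precompute g =
  Array.init num_windows (fun i ->
      let base = pow g (1 lsl (window_bits * i)) in
      let row = Array.make (1 lsl window_bits) 1 in
      for j = 1 to (1 lsl window_bits) - 1 do
        row.(j) <- mul row.(j - 1) base
      done;
      row)

(* g^e = product over i of g^(d_i * 16^i), where d_i is the i-th 4-bit digit
   of e: one lookup and one multiplication per window, no squarings.
   NOT constant time: plain array indexing depends on the secret digit. *)
let pow_fixed table e =
  assert (e >= 0 && e < 1 lsl (window_bits * num_windows));
  let acc = ref 1 in
  for i = 0 to num_windows - 1 do
    let digit = (e lsr (window_bits * i)) land ((1 lsl window_bits) - 1) in
    acc := mul !acc table.(i).(digit)
  done;
  !acc

let () =
  let table = precompute 5 in
  List.iter
    (fun e -> assert (pow_fixed table e = pow 5 e))
    [ 0; 1; 15; 16; 123_456_789; 0xFFFF_FFFF ];
  print_endline "fixed-base table agrees with square-and-multiply"
```

The trade-off is memory for speed: the table is built once per fixed base (for signing, the curve's generator), after which each exponentiation (on a curve: each scalar multiplication) needs only table lookups and combinations, with no doublings.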
|
||||
|
||||
## The road ahead for 2024
|
||||
|
||||
Remove all cstruct, everywhere, apart from in mirage-block-xen and mirage-net-xen ;). It was a fine decision in the early MirageOS days, but from a performance point of view, and for making our packages more broadly usable without many dependencies, it is time to remove cstruct. Earlier this year we already [removed cstruct from ocaml-tar](https://github.com/mirage/ocaml-tar/pull/137) for similar reasons.
|
||||
|
||||
Our MirageOS work is only partially funded; we cross-fund our work by commercial contracts and public (EU) funding. We are part of a non-profit company, and you can make a (tax-deductible, at least in the EU) [donation](https://aenderwerk.de/donate/) (select "DONATION robur" in the dropdown menu).
|
||||
|
||||
We're keen to get MirageOS deployed in production - if you would like to do that, don't hesitate to reach out to us via eMail team at robur.coop
|
|
@ -1,405 +0,0 @@
|
|||
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>
Robur's blogThe new Tar release, a retrospective
</title>
<meta name="description" content="A little retrospective to the new Tar release and changes">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
<script src="https://blog.robur.coop/js/hl.js"></script>
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
</head>
<body>
<header>
<h1>blog.robur.coop</h1>
<blockquote>
The <strong>Robur</strong> cooperative blog.
</blockquote>
</header>
<main><a href="https://blog.robur.coop/index.html">Back to index</a>

<article>
<h1>The new Tar release, a retrospective</h1>
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-Cstruct">Cstruct</a></li><li><a href="https://blog.robur.coop/tags.html#tag-functors">functors</a></li></ul><p>We are delighted to announce the new release of <code>ocaml-tar</code>, a small library for
reading and writing tar archives in OCaml. Since this is a major release, we'll
take the time in this article to explain the work the cooperative has done on
this project.</p>
<p>Tar is an <strong>old</strong> project. Originally written by David Scott as part of Mirage,
this project is particularly interesting for building bridges between the tools
we can offer and what already exists. Tar is, in fact, widely used. So we're
dealing both with a format that's older than I am (but I'm used to that with email)
and with a project that's been around since... 2012 (over 10 years!).</p>
<p>But we intend to maintain and improve it, since we're using it for the
<a href="https://hannes.robur.coop/Posts/OpamMirror">opam-mirror</a> project among other things - this unikernel
provides an opam-repository "tarball" for opam when you run <code>opam update</code>.</p>
<h2 id="cstructt--bytes"><a class="anchor" aria-hidden="true" href="#cstructt--bytes"></a><code>Cstruct.t</code> & bytes</h2>
<p>As some of you may have noticed, over the last few months we've begun a fairly
substantial change to the Mirage ecosystem, replacing the use of <code>Cstruct.t</code> in
key places with bytes/string.</p>
<p>This choice is based on 2 considerations:</p>
<ul>
<li>we came to realize that <code>Cstruct.t</code> could be very costly in terms of
performance</li>
<li><code>Cstruct.t</code> remains a "Mirage" structure; outside the Mirage ecosystem, the
use of <code>Cstruct.t</code> is not so "obvious".</li>
</ul>
<p>The pull-request is available here: https://github.com/mirage/ocaml-tar/pull/137.
The discussion is worth reading for the common bugs it uncovered (uninitialized
buffers, invalid accesses). There's also a small benchmark to support our initial
intuition<sup><a href="#fn1">1</a></sup>.</p>
<p>But this PR is also an opportunity to understand why <code>Cstruct.t</code> exists in
the Mirage ecosystem and the reasons for this historical choice.</p>
<h3 id="cstructt-as-a-non-moveable-data"><a class="anchor" aria-hidden="true" href="#cstructt-as-a-non-moveable-data"></a><code>Cstruct.t</code> as non-moveable data</h3>
<p>I've already <a href="https://discuss.ocaml.org/t/buffered-io-bytes-vs-bigstring/8978/3">made</a> a list of pros/cons when it comes to
bigarrays. Indeed, <code>Cstruct.t</code> is based on a bigarray:</p>
<pre><code class="language-ocaml">type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type t =
  { buffer : buffer
  ; off : int
  ; len : int }
</code></pre>
<p>The experienced reader may rightly wonder why <code>Cstruct.t</code> is a bigarray with <code>off</code>
and <code>len</code>. First, we need to clarify what a bigarray is in OCaml.</p>
<p>A bigarray is a somewhat special value in OCaml. This value is allocated in the
C heap. In other words, its contents are not managed by OCaml's garbage collector,
but exist outside of it. The first (and very important) implication of this feature
is that the contents of a bigarray do not move (even when the GC compacts the
memory). This feature has several advantages:</p>
<ul>
<li>in parallel programming, it can be very interesting to use a bigarray knowing
that, from the point of view of the 2 processes, the position of the bigarray
will never change - this is essentially what <a href="https://github.com/rdicosmo/parmap">parmap</a> does (before
OCaml 5).</li>
<li>for calculations such as checksums or hashes, it can be interesting to use a
bigarray. The calculation would not be interrupted by the GC since the
bigarray does not move. The calculation can therefore be continued at the same
point, which can help the CPU to better predict the next stage of the
calculation. This is what <a href="https://github.com/mirage/digestif">digestif</a> offers and what
<a href="https://github.com/mirage/decompress">decompress</a> requires.</li>
<li>for one reason or another, particularly when interacting with something other
than OCaml, you need to offer a memory zone that cannot move. This is
particularly true for unikernels running as Xen guests (where the <em>net device</em>
corresponds to a fixed memory zone with which we need to interact) or for
<a href="https://ocaml.org/manual/5.2/api/Unix.html#1_Mappingfilesintomemory">mmap</a>.</li>
<li>there are other subtleties more related to the way OCaml compiles. For
example, using bigarray layouts to manipulate "bigger words" can really have
an impact on performance, as <a href="https://github.com/robur-coop/utcp/pull/29">this PR</a> on <a href="https://github.com/robur-coop/utcp">utcp</a> shows.</li>
<li>finally, it may be useful to store sensitive information in a bigarray so as
to have the opportunity to clean up this information as quickly as possible
(ensuring that the GC has not made a copy) in certain situations.</li>
</ul>
<p>All these examples show that bigarrays can be of real interest as long as
<strong>their uses are properly contextualized</strong> - which ultimately remains very
specific. Our experience of using them in Mirage has shown us their advantages,
but also, and above all, their disadvantages:</p>
<ul>
<li>keep in mind that bigarray allocation uses either a system call like <code>mmap</code> or
<code>malloc()</code>. The latter is slow compared with what OCaml can offer. When you
allocate bytes/strings smaller than
<a href="https://github.com/ocaml/ocaml/blob/744006bfbfa045cc1ca442ff7b52c2650d2abe32/runtime/alloc.c#L175"><code>(256 * words)</code></a>, they are allocated in the minor heap,
where allocation is incredibly fast (3 processor instructions which can be
predicted very well). So, allocating a 10-byte bigarray rather
than a 10-byte <code>bytes</code> penalizes you enormously.</li>
<li>since the bigarray exists in the C heap, the GC needs a special mechanism to
know when it can <code>free()</code> the zone once the value is no longer in use.
Reference counting is what allows "small" values allocated in the OCaml heap
to manipulate the bigarray <em>indirectly</em>.</li>
</ul>
<h4 id="ownership-proxy-and-gc"><a class="anchor" aria-hidden="true" href="#ownership-proxy-and-gc"></a>Ownership, proxy and GC</h4>
<p>This last point deserves a little clarification, particularly with regard to the
<code>Bigarray.sub</code> function. This function does not create a new, smaller bigarray
and copy what was in the old one into the new one (as <code>Bytes.sub</code>/<code>String.sub</code>
do). Instead, OCaml allocates a "proxy" of your bigarray that represents a
sub-range of it. This is where <em>reference counting</em> comes in. This proxy value needs
the initial bigarray in order to be manipulated. So, as long as proxies exist, the GC
cannot <code>free()</code> the initial bigarray.</p>
<p>This poses several problems:</p>
<ul>
<li>the first is the allocation of these proxies. They let us manipulate
the initial bigarray in several places without copying it, but over
time, these proxies can become very expensive</li>
<li>the second is GC intervention. The GC still needs to scan the bigarray, in a
particular way, to know whether or not to keep it. Historically, this
particular kind of scan was not all that common.</li>
<li>the third concerns bigarray ownership. Since we're talking about proxies, we
can imagine 2 competing tasks having access to the same bigarray.</li>
</ul>
<p>As far as the first point is concerned, <code>Bigarray.sub</code> used to be "slow" for
small data since its result was, <em>de facto</em>, allocated in the major heap (a
bigarray always has a finalizer - don't forget reference counting!). And, in truth,
this is perhaps the main reason for the existence of Cstruct: to have a "proxy"
to a bigarray that is allocated in the minor heap (and is therefore fast). But since
<a href="https://github.com/ocaml/ocaml/pull/92">Pierre Chambart's PR#92</a>, this is no longer a problem.</p>
<p>The second point, on the other hand, is still relevant, even though
<a href="https://github.com/ocaml/ocaml/pull/1738">considerable efforts</a> have been made. What we see every
day on our unikernels is <a href="https://github.com/ocaml/ocaml/issues/7750">the pressure</a> that bigarrays can put on
the GC. Indeed, bigarrays use memory, and making the C
heap cohabit with the OCaml heap inevitably comes at a cost. Unikernels have
more limited memory than a regular OCaml application, so we reach this limit
rather quickly and end up asking the GC to work specifically on our 10- or
20-byte bigarrays...</p>
<p>Finally, the third point can be the toughest. On several occasions, we've
noticed unwanted concurrent accesses to our bigarrays (for example,
<code>http-lwt-client</code> had <a href="https://github.com/robur-coop/http-lwt-client/pull/16">this problem</a>). In our experience,
it's very difficult to observe and confirm that an unauthorized
concurrent access is indeed changing the contents of our buffer. In this respect, the
question remains open as regards <code>Cstruct.t</code> and the possibility of encoding
ownership of a <code>Cstruct.t</code> in the type to prevent unauthorized access.
<a href="https://github.com/mirage/ocaml-cstruct/pull/237">This PR</a> gives a good overview of the discussions that have taken
place on this subject<sup><a href="#fn2">2</a></sup>.</p>
<p>It should be noted that, with regard to the third point, the problem also
applies to bytes and the use of <code>Bytes.unsafe_to_string</code>!</p>
<h3 id="conclusion-about-cstruct"><a class="anchor" aria-hidden="true" href="#conclusion-about-cstruct"></a>Conclusion about Cstruct</h3>
<p>We hope we've been thorough enough in describing our experience with Cstruct. If we go back
to the initial definition of our <code>Cstruct.t</code> shown above and take all the
history into account, it becomes increasingly difficult to argue for a
<strong>systematic</strong> use of Cstruct in our unikernels. In fact, the question of
<code>Cstruct.t</code> versus bytes/string remains completely open.</p>
<p>It's worth noting that the original reasons for <code>Cstruct.t</code> are no longer really
relevant given how OCaml has evolved. It should also be noted that this
systematic approach of using <code>Cstruct.t</code> rather than bytes/string has cost us.</p>
<p>This is not to say that <code>Cstruct.t</code> is obsolete. The library is very good and
offers an API with which extracting information from, say, a TCP/IP
packet remains more pleasant than using bytes directly (even if, here too,
<a href="https://github.com/ocaml/ocaml/pull/1864">efforts</a> have been made).</p>
<p>As far as <code>ocaml-tar</code> is concerned, what really counts is the possibility for
other projects to use this library without requiring <code>Cstruct.t</code> - thus
facilitating its adoption. In other words, given the advantages/disadvantages of
<code>Cstruct.t</code>, we felt it would be a good idea to remove this dependency.</p>
<hr />
<p><tag id="fn1"><strong>1</strong></tag>: It should be noted that the benchmark also covers
compression. In this case, we use <code>decompress</code>, which uses bigarrays. So there's
some copying involved (from bytes to bigarrays)! But despite this copying, it
seems that the change is worthwhile.</p>
<p><tag id="fn2"><strong>2</strong></tag>: This reminds me that we've been experimenting with
capabilities and using the type system to enforce certain properties. To
date, <code>Cstruct_cap</code> has not been used anywhere, which raises a real question
about its advantages/disadvantages in everyday use.</p>
<h2 id="functors"><a class="anchor" aria-hidden="true" href="#functors"></a>Functors</h2>
<p>This is perhaps the other aspect of the Mirage ecosystem that is the subject
of debate: functors! Before we talk about functors, we need to understand their
relevance in the context of Mirage.</p>
<p>Mirage transforms an application into an operating system. The difference
between a "normal" application and a unikernel lies in the "subsystem" with which
it interacts: a normal application interacts with the host system,
whereas a unikernel interacts with the Solo5 <em>mini-system</em>.</p>
<p>What Mirage tries to offer is the ability for an application to become
either of these without changing a thing! Mirage's aim is to <strong>inject</strong> the
subsystem into your application. In this case:</p>
<ul>
<li>inject <code>unix.cmxa</code> when you want a Mirage application to become a simple
executable</li>
<li>inject <a href="https://github.com/mirage/ocaml-solo5">ocaml-solo5</a> when you want to produce a unikernel</li>
</ul>
<p>We're not going to discuss the pros and cons of this approach here, but simply
consider it as a feature that requires us to use functors.</p>
<p>Indeed, what's the best way in OCaml to inject one implementation into another?
Functors! They have definite advantages here too, but we're going to concentrate
on one in particular: the expressiveness of types at module level (which can be
used as arguments to our functors).</p>
<p>For example, did you know that OCaml has a dependent type system?</p>
<pre><code class="language-ocaml">type 'a nat = Zero : zero nat | Succ : 'a nat -> 'a succ nat
and zero = |
and 'a succ = S

module type T = sig type t val v : t nat end
module type Rec = functor (T:T) -> T
module type Nat = functor (S:Rec) -> functor (Z:T) -> T

module Zero = functor (S:Rec) -> functor (Z:T) -> Z
module Succ = functor (N:Nat) -> functor (S:Rec) -> functor (Z:T) -> S(N(S)(Z))
module Add = functor (X:Nat) -> functor (Y:Nat) -> functor (S:Rec) -> functor (Z:T) -> X(S)(Y(S)(Z))

module One = Succ(Zero)
module Two_a = Add(One)(One)
module Two_b = Succ(One)

module Z : T with type t = zero = struct
  type t = zero
  let v = Zero
end

module S (T:T) : T with type t = T.t succ = struct
  type t = T.t succ
  let v = Succ T.v
end

module A = Two_a(S)(Z)
module B = Two_b(S)(Z)

type ('a, 'b) refl = Refl : ('a, 'a) refl

let _ : (A.t, B.t) refl = Refl (* 1+1 == succ 1 *)
</code></pre>
<p>The code is ... magical, but it shows that two differently constructed modules
(<code>Two_a</code> & <code>Two_b</code>) ultimately produce the same type, and OCaml is able to prove
this equality. Above all, the example shows just how powerful functors can be.
But it also shows just how difficult functors can be to understand and use.</p>
<p>In fact, this is one of Mirage's biggest drawbacks: the overuse of functors
makes the code difficult to read and understand. It can be difficult to work out
in your head the type that results from an application of functors, and the
constraints associated with it... (yes, I don't use <code>merlin</code>).</p>
<p>But back to our initial problem: injection! In truth, in most cases the functor
is a sledgehammer used to swat a fly. There are many other ways of injecting
what the system would be (and how to do a <code>read</code> or <code>write</code>) into an
implementation. The best example, as <a href="https://discuss.ocaml.org/t/best-practices-and-design-patterns-for-supporting-concurrent-io-in-libraries/15001/4?u=dinosaure">@nojb pointed out</a>, is of
course <a href="https://github.com/mirleft/ocaml-tls">ocaml-tls</a> - this answer also shows a contrast between the
functor approach (with <a href="https://github.com/mirage/ocaml-cohttp">CoHTTP</a> for example) and the "pure value-passing
interface" of <code>ocaml-tls</code>.</p>
<p>What's more, we've been trying out other approaches for injecting the system
we want for several years now. We can already list several:</p>
<ul>
<li><code>ocaml-tls</code>' "value-passing" approach, of course, but also <code>decompress</code></li>
<li>there's also the passing of <a href="https://github.com/mirage/colombe/blob/07cd4cf134168ecd841924ee7ddda1a9af8fbd5a/src/sigs.ml#L13-L16">a record</a> (a sort of
mini-module with fewer possibilities with types, but which does the job - a
poor man's functor, in short) which would have the functions to perform the
system's operations</li>
<li><a href="https://github.com/dinosaure/mimic">mimic</a> can be used to inject a module as an implementation of a
flow/stream according to a resolution mechanism (DNS, <code>/etc/services</code>, etc.) -
a little closer to the idea of <em>runtime-resolved implicit implementations</em></li>
<li>there are, of course, the variants (but if we go back to 2010, this solution
wasn't so obvious) popularized by <a href="https://github.com/dbuenzli/ptime">ptime</a>/<a href="https://github.com/dbuenzli/mtime">mtime</a>, <code>digestif</code> &
<a href="https://github.com/ocaml/dune/pull/1207">dune</a></li>
<li>and finally, <a href="https://github.com/mirage/decompress/blob/c8301ba674e037b682338958d6d0bb5c42fd720e/lib/lzo.ml#L164-L175">GADTs</a>, which describe what the process should
do, then let the user implement the <code>run</code> function according to the system.</li>
</ul>
<p>In short, based on this list and the various experiments we've carried out on a
number of projects, we've decided to remove the functors from <code>ocaml-tar</code>! The
crucial question now is: which method to choose?</p>
<h3 id="the-best-answers"><a class="anchor" aria-hidden="true" href="#the-best-answers"></a>The best answers</h3>
<p>There's no definitive answer, and in truth it depends on what level of
abstraction you're at. Ideally, you'd like a fairly simple way of abstracting
from the system at the lowest level, and end up providing, at the top level, a
functor that does all the <em>ceremony</em> (the glue between your
implementation and the system) - that's what <a href="https://github.com/mirage/ocaml-git">ocaml-git</a>
does, for example.</p>
<p>The abstraction you choose also depends on how the process is going to work. As
far as streams/protocols are concerned, the <code>ocaml-tls</code>/<code>decompress</code> approach
still seems the best. But when it comes to introspecting a file/block-device, it
may be preferable to use a GADT that forces the user to implement
arbitrary memory access rather than consume a sequence of bytes. In short, at
this stage, experience speaks for itself and, just as we were wrong about
functors, we won't be advising you to use this or that solution.</p>
<p>But based on our experience of <code>ocaml-tls</code> & <code>decompress</code> with LZO (which
requires arbitrary access to the content) and the way Tar works, we decided to
use a "value-passing" approach (to describe when we need to read/write) and a
GADT to describe calculations such as:</p>
<ul>
<li>iterating over the files/folders contained in a Tar document</li>
<li>producing a Tar file according to a "dispenser" of inputs</li>
</ul>
<pre><code class="language-ocaml">val decode : decode_state -> string ->
  decode_state
  * [ `Read of int
    | `Skip of int
    | `Header of Header.t ] option
  * Header.Extended.t option
(** [decode state] returns a new state and what the user should do next:
    - [`Skip] skip bytes
    - [`Read] read bytes
    - [`Header hdr] do something according to the last header extracted
      (like stream-out the contents of a file). *)

type ('a, 'err) t =
  | Really_read : int -> (string, 'err) t
  | Read : int -> (string, 'err) t
  | Seek : int -> (unit, 'err) t
  | Bind : ('a, 'err) t * ('a -> ('b, 'err) t) -> ('b, 'err) t
  | Return : ('a, 'err) result -> ('a, 'err) t
  | Write : string -> (unit, 'err) t
</code></pre>
<p>However, and this is where we come back to OCaml's limitations and where
functors could help us: higher kinded polymorphism!</p>
<h3 id="higher-kinded-polymorphism"><a class="anchor" aria-hidden="true" href="#higher-kinded-polymorphism"></a>Higher kinded Polymorphism</h3>
<p>If we return to our functor example above, there's one element that may be of
interest: <code>T with type t = T.t succ</code>.</p>
<p>In other words, a constraint added to a module type. A constraint often seen
with Mirage (but now deprecated according to <a href="https://github.com/mirage/mirage/issues/1004#issue-507517315">this issue</a>) is the
type <code>io</code> and its constraint: <code>type 'a io</code>, <code>with type 'a io = 'a Lwt.t</code>.</p>
<p>So we had this type in Tar. The problem is that our GADT can't express that
sometimes it will have to manipulate <em>Lwt</em> values, sometimes <em>Async</em> ones or
sometimes <em>Eio</em> (or <em>Miou</em>!) ones. In other words: how do we compose our <code>Bind</code> with
the <code>Bind</code> of each of these targets? The difficulty lies above all in history:
supporting this library requires us to assume a certain compatibility with
applications over which we have no control. What's more, we need to maintain
support for all of these libraries without imposing one.</p>
<hr />
<p>A small digression at this stage seems important to us, as we've been working
in this way for over 10 years. Despite all the solutions mentioned
above, not depending on a system (and/or a scheduler) is also what has allowed
libraries like Tar to keep existing for more than a decade! The OCaml ecosystem
is changing, and choosing this or that library to facilitate the development of
an application has implications we might regret 10 years down the line (for
example... <code>Cstruct.t</code>!). So, it can be challenging to ensure compatibility with
all systems, but the result is libraries steeped in the experience and know-how
of many developers!</p>
<hr />
<p>So, and this is why we talk about Higher Kinded Polymorphism, how do we abstract
the <code>t</code> from <code>'a t</code> (to replace it with <code>Lwt.t</code> or even with a type such as
<code>type 'a t = 'a</code>)? This is where we use the trick explained in
<a href="https://www.cl.cam.ac.uk/~jdy22/papers/lightweight-higher-kinded-polymorphism.pdf">this paper</a>. The trick is to consider a "new type" that will represent our
monad (Lwt or Async) and to inject/project a value from this monad into something
our GADT understands: <code>High : ('a, 't) io -> ('a, 'err, 't) t</code>.</p>
<pre><code class="language-ocaml">type ('a, 't) io

type ('a, 'err, 't) t =
  | Really_read : int -> (string, 'err, 't) t
  | Read : int -> (string, 'err, 't) t
  | Seek : int -> (unit, 'err, 't) t
  | Bind : ('a, 'err, 't) t * ('a -> ('b, 'err, 't) t) -> ('b, 'err, 't) t
  | Return : ('a, 'err) result -> ('a, 'err, 't) t
  | Write : string -> (unit, 'err, 't) t
  | High : ('a, 't) io -> ('a, 'err, 't) t
</code></pre>
<p>Next, we need to create this new type according to the chosen scheduler. Let's
take <em>Lwt</em> as an example:</p>
<pre><code class="language-ocaml">module Make (X : sig type 'a t end) = struct
  type t (* our new type *)
  type 'a s = 'a X.t

  external inj : 'a s -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a s = "%identity"
end

module L = Make (Lwt)

let rec run
  : type a err. (a, err, L.t) t -> (a, err) result Lwt.t
  = function
  | High v -> Lwt.map (fun x -> Ok x) (L.prj v)
  | Return v -> Lwt.return v
  | Bind (x, f) ->
    (* [run x] yields a result: only continue with [f] on success *)
    Lwt.bind (run x) (function
      | Ok value -> run (f value)
      | Error err -> Lwt.return (Error err))
  | _ ->
    (* Read, Really_read, Seek and Write: system-specific I/O, elided here *)
    assert false
</code></pre>
<p>So, as you can see, it's a real trick, not one to try at home without a
companion. Indeed, the use of <code>%identity</code> amounts to an <code>Obj.magic</code>! So even
if the <code>io</code> type is exposed (to let users derive Tar for their own system),
this trick is not exposed to other packages, and we instead suggest helpers
such as:</p>
<pre><code class="language-ocaml">val lwt : 'a Lwt.t -> ('a, 'err, lwt) t
val miou : 'a -> ('a, 'err, miou) t
</code></pre>
<p>But this way, Tar can always be derived for another system, and the process for
extracting entries from a Tar file is the same for <strong>all</strong> systems!</p>
<h2 id="conclusion"><a class="anchor" aria-hidden="true" href="#conclusion"></a>Conclusion</h2>
<p>This Tar release may not be as impressive as this article, but it does sum up all the
work we've been able to do over the last few months and years. We hope that our
work is appreciated and that this article, which sets out all the thoughts we've
had (and still have), helps you to better understand our work!</p>

</article>

</main>
<footer>
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
<br />
</footer>
<script>hljs.highlightAll();</script>
</body>
</html>
464
articles/tar-release.md
Normal file
@ -0,0 +1,464 @@
---
date: 2024-08-15
title: The new Tar release, a retrospective
description: A little retrospective to the new Tar release and changes
tags:
  - OCaml
  - Cstruct
  - functors
author:
  name: Romain Calascibetta
  email: romain.calascibetta@gmail.com
  link: https://blog.osau.re
---
We are delighted to announce the new release of `ocaml-tar`, a small library for
reading and writing tar archives in OCaml. Since this is a major release, we'll
take the time in this article to explain the work the cooperative has done on
this project.

Tar is an **old** project. Originally written by David Scott as part of Mirage,
this project is particularly interesting for building bridges between the tools
we can offer and what already exists. Tar is, in fact, widely used. So we're
dealing both with a format that's older than I am (but I'm used to that with email)
and with a project that's been around since... 2012 (over 10 years!).

But we intend to maintain and improve it, since we're using it for the
[opam-mirror][opam-mirror] project among other things - this unikernel
provides an opam-repository "tarball" for opam when you run `opam update`.

## `Cstruct.t` & bytes

As some of you may have noticed, over the last few months we've begun a fairly
substantial change to the Mirage ecosystem, replacing the use of `Cstruct.t` in
key places with bytes/string.

This choice is based on 2 considerations:
- we came to realize that `Cstruct.t` could be very costly in terms of
  performance
- `Cstruct.t` remains a "Mirage" structure; outside the Mirage ecosystem, the
  use of `Cstruct.t` is not so "obvious".

The pull-request is available here: https://github.com/mirage/ocaml-tar/pull/137.
The discussion is worth reading for the common bugs it uncovered (uninitialized
buffers, invalid accesses). There's also a small benchmark to support our initial
intuition<sup>[1](#fn1)</sup>.

But this PR is also an opportunity to understand why `Cstruct.t` exists in
the Mirage ecosystem and the reasons for this historical choice.

### `Cstruct.t` as non-moveable data

I've already [made][discuss-cstruct] a list of pros/cons when it comes to
bigarrays. Indeed, `Cstruct.t` is based on a bigarray:
```ocaml
type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type t =
  { buffer : buffer
  ; off : int
  ; len : int }
```

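To make the role of `off` and `len` concrete, here is a minimal, hypothetical sketch of a view built on the same record. It is not Cstruct's actual implementation and only mimics a tiny part of its API. The point it shows is that taking a sub-view only allocates a small OCaml record pointing into the same bigarray: no bytes are copied and no new bigarray is created.

```ocaml
(* Hypothetical sketch of a Cstruct-like view: a bigarray plus an offset
   and a length. This is an illustration, not Cstruct's actual code. *)
type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type t = { buffer : buffer; off : int; len : int }

(* Allocate a fresh, zero-filled bigarray (outside the OCaml heap). *)
let create len =
  let buffer = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len in
  Bigarray.Array1.fill buffer '\000';
  { buffer; off = 0; len }

(* O(1) slicing: only the small record changes, the bytes are neither
   copied nor reallocated. *)
let sub t off len =
  if off < 0 || len < 0 || off + len > t.len then invalid_arg "sub";
  { t with off = t.off + off; len = len }

let get_char t i =
  if i < 0 || i >= t.len then invalid_arg "get_char";
  Bigarray.Array1.get t.buffer (t.off + i)

let set_char t i c =
  if i < 0 || i >= t.len then invalid_arg "set_char";
  Bigarray.Array1.set t.buffer (t.off + i) c

let to_string t = String.init t.len (get_char t)

let () =
  let whole = create 8 in
  let window = sub whole 2 3 in
  set_char window 0 'A';
  (* [window] and [whole] share the same bigarray: prints 'A' *)
  Printf.printf "%C\n" (get_char whole 2)
```

Because both views share `buffer`, a write through one is visible through the other. This sharing is what makes slicing cheap, and it is also what makes ownership of the underlying memory hard to reason about.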
The experienced reader may rightly wonder why Cstruct.t is a bigarray with `off`
|
||||
and `len`. First, we need to clarify what a bigarray is for OCaml.

A bigarray is a somewhat special value in OCaml: its contents are allocated in
the C heap. In other words, they are not managed by OCaml's garbage collector
but live outside it. The first (and very important) implication is that the
contents of a bigarray never move (even when the GC compacts the memory). This
has several advantages:
- in parallel programming, it can be very useful to share a bigarray knowing
  that, from the point of view of both processes, its position will never
  change; this is essentially what [parmap][parmap] does (before OCaml 5).
- for computations such as checksums or hashes, it can be worthwhile to use a
  bigarray. The computation cannot be disturbed by the GC moving the data,
  since the bigarray does not move; it can therefore always resume at the same
  point, which can help the CPU better predict the next stage of the
  computation. This is what [digestif][digestif] offers and what
  [decompress][decompress] requires.
- for one reason or another, particularly when interacting with something other
  than OCaml, you need to provide a memory area that cannot move. This is
  particularly true for unikernels running as Xen guests (where the _net device_
  corresponds to a fixed memory area with which we need to interact) or for
  [mmap][mmap].
- there are other subtleties, more related to the way OCaml compiles. For
  example, using bigarray layouts to manipulate "bigger words" can have a real
  impact on performance, as [this PR][pr-utcp] on [utcp][utcp] shows.
- finally, in certain situations it may be useful to store sensitive
  information in a bigarray so that it can be wiped as quickly as possible
  (ensuring that the GC has not made a copy of it).
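
As a rough illustration of the last point, here is a minimal sketch that keeps a secret in a bigarray and wipes it in place once it is no longer needed. It is our own code and uses only the standard `Bigarray` module. The type `secret` and the helpers `secret_of_string` and `wipe` are hypothetical names made up for this example. The sketch says nothing about copies that existed before the data reached the bigarray (here, the original `string`).

```ocaml
(* A one-dimensional bigarray of bytes, whose contents live outside the
   OCaml heap and are never moved by the GC. *)
type secret =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical helper: copy a string into a freshly allocated bigarray. *)
let secret_of_string (s : string) : secret =
  let ba =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s)
  in
  String.iteri (fun i c -> Bigarray.Array1.set ba i c) s;
  ba

(* Hypothetical helper: overwrite the secret in place. The GC never copies
   a bigarray's contents, so no stale copy of these bytes has been left
   elsewhere by the GC itself. *)
let wipe (ba : secret) = Bigarray.Array1.fill ba '\000'

let () =
  let key = secret_of_string "hunter2" in
  (* ... use [key] ... *)
  wipe key
```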

All these examples show that bigarrays can be of real interest as long as
**their uses are properly contextualized**, which ultimately remains very
specific. Our experience of using them in Mirage has shown us their advantages,
but also, and above all, their disadvantages:
- keep in mind that allocating a bigarray goes through either a system call
  like `mmap` or `malloc()`. The latter is slow compared with what OCaml can
  offer: bytes/strings smaller than [`(256 * words)`][minor-alloc] are
  allocated in the minor heap, which is incredibly fast (3 processor
  instructions that can be predicted very well). So, allocating a 10-byte
  bigarray rather than a 10-byte `bytes` penalizes you enormously.
- since the bigarray lives in the C heap, the GC needs a special mechanism to
  know when to `free()` the area once the value is no longer in use.
  Reference counting is used so that "small" values can be allocated in the
  OCaml heap and used to manipulate the bigarray _indirectly_.
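
To get a feel for the first point on your own machine, here is a rough micro-benchmark sketch. It is our own, not the benchmark from the PR. It allocates many 10-byte `bytes` values and many 10-byte bigarrays, and reports the CPU time measured with `Sys.time`. Absolute numbers depend heavily on the machine, the OCaml version and the GC settings, so treat the output only as an indication.

```ocaml
(* Run [f] [n] times and return the CPU time spent, in seconds.
   [Sys.opaque_identity] keeps the compiler from optimising the
   allocations away. *)
let time name n f =
  let t0 = Sys.time () in
  for _i = 1 to n do
    ignore (Sys.opaque_identity (f ()))
  done;
  let dt = Sys.time () -. t0 in
  Printf.printf "%-8s %d allocations: %.3fs\n" name n dt;
  dt

(* A 10-byte [bytes]: allocated in the OCaml minor heap. *)
let alloc_bytes () = Bytes.create 10

(* A 10-byte bigarray: its data comes from malloc(), and the OCaml value
   is a custom block with a finaliser that will eventually free() it. *)
let alloc_bigarray () =
  Bigarray.Array1.create Bigarray.char Bigarray.c_layout 10

let () =
  let n = 100_000 in
  ignore (time "bytes" n alloc_bytes);
  ignore (time "bigarray" n alloc_bigarray)
```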

#### Ownership, proxy and GC

This last point deserves some clarification, particularly with regard to the
`Bigarray.Array1.sub` function. This function does not create a new, smaller
bigarray and copy into it what was in the old one (as `Bytes.sub`/`String.sub`
do). Instead, OCaml allocates a "proxy" of your bigarray that represents a
sub-region of it. This is where _reference counting_ comes in: the proxy needs
the initial bigarray in order to be manipulated, so as long as proxies exist,
the GC cannot `free()` the initial bigarray.
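
The difference is easy to observe with the standard library alone. In this small sketch (the two function names are ours), writing through the result of `Bigarray.Array1.sub` modifies the original bigarray, because both share the same memory. Writing into the result of `Bytes.sub` leaves the original untouched, because it is a copy.

```ocaml
(* Returns the byte at offset 4 of the original bigarray after writing
   'X' through a sub-view that starts at that offset. *)
let bigarray_sub_shares () =
  let ba = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 16 in
  Bigarray.Array1.fill ba 'a';
  (* [sub] allocates a small proxy onto bytes 4..7 of [ba]: no data is copied. *)
  let view = Bigarray.Array1.sub ba 4 4 in
  Bigarray.Array1.set view 0 'X';
  Bigarray.Array1.get ba 4

(* Same experiment with [Bytes.sub], which returns a fresh copy. *)
let bytes_sub_copies () =
  let b = Bytes.make 16 'a' in
  let copy = Bytes.sub b 4 4 in
  Bytes.set copy 0 'X';
  Bytes.get b 4

let () =
  (* prints "bigarray: X, bytes: a" *)
  Printf.printf "bigarray: %c, bytes: %c\n"
    (bigarray_sub_shares ()) (bytes_sub_copies ())
```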

This poses several problems:
- the first is the allocation of these proxies. They allow us to manipulate
  the initial bigarray in several places without copying it, but,
  historically, these proxies could be very expensive
- the second is GC intervention. The GC still needs to scan the bigarray, in a
  particular way, to know whether or not to keep it. This particular kind of
  scan, once again historically, was not all that common.
- the third concerns bigarray ownership. Since we're talking about proxies, we
  can imagine two competing tasks having access to the same bigarray.

As far as the first point is concerned, `Bigarray.Array1.sub` used to be "slow"
for small data because the proxy was, _de facto_, allocated in the major heap
(a bigarray always has a finalizer; don't forget reference counting!). And, in
truth, this is perhaps the main reason for the existence of Cstruct: to have a
"proxy" to a bigarray that is allocated in the minor heap (and is therefore
fast). But since [Pierre Chambart's PR#92][bigarray-minor], this is no longer a
problem.
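
To make this concrete, here is a deliberately simplified sketch of such a view, built on the record type shown earlier. The function names are borrowed from Cstruct, but the code is our own and is not Cstruct's implementation, which offers a much richer API. It only illustrates that taking a sub-view allocates a small three-field record and never touches the bigarray itself.

```ocaml
type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Same shape as the [Cstruct.t] record shown earlier. *)
type t = { buffer : buffer; off : int; len : int }

let of_bigarray buffer = { buffer; off = 0; len = Bigarray.Array1.dim buffer }

let create len =
  let buffer = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len in
  Bigarray.Array1.fill buffer '\000';
  of_bigarray buffer

(* A sub-view: only [off] and [len] change; the bigarray itself is shared. *)
let sub t off len =
  if off < 0 || len < 0 || off > t.len - len then invalid_arg "sub";
  { t with off = t.off + off; len = len }

let get_char t i =
  if i < 0 || i >= t.len then invalid_arg "get_char";
  Bigarray.Array1.get t.buffer (t.off + i)

let set_char t i c =
  if i < 0 || i >= t.len then invalid_arg "set_char";
  Bigarray.Array1.set t.buffer (t.off + i) c
```

Because the view is an ordinary OCaml record, it can be allocated in the minor heap like any other small value. This is the property that the proxy returned by `Bigarray.Array1.sub` lacked before PR#92.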

The second point, on the other hand, is still relevant, even though
[considerable efforts][better-bigarray-free] have been made. What we see every
day on our unikernels is [the pressure][gc-bigarray-pressure] that bigarrays
can put on the GC. Indeed, bigarrays use memory, and making the C heap cohabit
with the OCaml heap inevitably comes at a cost. Unikernels have more limited
memory than a typical OCaml application, so we reach this limit rather quickly
and end up asking the GC to work specifically on our 10- or 20-byte
bigarrays...

Finally, the third point can be the toughest. On several occasions, we've
noticed unwanted concurrent accesses to our bigarrays (for example,
`http-lwt-client` had [this problem][http-lwt-client-bug]). In our experience,
it's very difficult to observe, let alone know, that an unauthorized concurrent
access is changing the contents of our buffer. In this respect, the question
remains open as regards `Cstruct.t` and the possibility of encoding ownership
of a `Cstruct.t` in its type to prevent unauthorized access.
[This PR][cstruct-cap] is worth reading for all the discussions that have taken
place on this subject<sup>[2](#fn2)</sup>.

It should be noted that, with regard to the third point, the problem also
applies to bytes and the use of `Bytes.unsafe_to_string`!
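
Here is a minimal illustration of that last remark. `Bytes.unsafe_to_string` does not copy, so a later mutation of the `bytes` is visible through the supposedly immutable `string`. The standard library documentation describes an ownership discipline that forbids exactly this. The sketch below deliberately breaks that rule to show what happens; `aliasing_demo` is a hypothetical name.

```ocaml
let aliasing_demo () =
  let buf = Bytes.of_string "hello" in
  (* No copy: [s] and [buf] are the same block in memory. *)
  let s = Bytes.unsafe_to_string buf in
  (* Someone else still holding [buf] mutates it... *)
  Bytes.set buf 0 'j';
  (* ...and the "immutable" string has changed under our feet. *)
  s

let () = print_endline (aliasing_demo ())
```

This is the same class of bug as two tasks sharing one bigarray: nothing in the types records who is allowed to write.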

### Conclusion about Cstruct

We hope we've been thorough enough in describing our experience with Cstruct.
If we go back to the initial definition of `Cstruct.t` shown above and take all
this history into account, it becomes increasingly difficult to argue for a
**systematic** use of Cstruct in our unikernels. In fact, the question of
`Cstruct.t` versus bytes/string remains completely open.

It's worth noting that the original reasons for `Cstruct.t` are no longer really
relevant given how OCaml has evolved. It should also be noted that this
systematic use of `Cstruct.t` rather than bytes/string has come at a cost for
us.

This is not to say that `Cstruct.t` is obsolete. The library is very good and
offers an API with which extracting information, such as the fields of a
TCP/IP packet, remains more pleasant than working with bytes directly (even if,
here too, [efforts][ocaml-getters] have been made).
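
For instance, the standard library can read big-endian integers straight out of a `string`. The `String` getters were added in OCaml 4.13, and the `Bytes` ones in 4.08. The sketch below extracts the source and destination ports, which are the first two 16-bit fields of a TCP header, without any `Cstruct.t`. `tcp_ports` is a name made up for this example.

```ocaml
(* The source and destination ports are the first two big-endian 16-bit
   fields of a TCP header. *)
let tcp_ports (segment : string) =
  if String.length segment < 4 then invalid_arg "tcp_ports: segment too short";
  let src = String.get_uint16_be segment 0 in
  let dst = String.get_uint16_be segment 2 in
  (src, dst)

let () =
  (* 0x0050 = 80 and 0x1f90 = 8080; prints "80 -> 8080" *)
  let src, dst = tcp_ports "\x00\x50\x1f\x90" in
  Printf.printf "%d -> %d\n" src dst
```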

As far as `ocaml-tar` is concerned, what really counts is that other projects
can use this library without requiring `Cstruct.t`, thus facilitating its
adoption. In other words, given the advantages and disadvantages of
`Cstruct.t`, we felt it would be a good idea to remove this dependency.

<hr />

<tag id="fn1">**1**</tag>: It should be noted that the benchmark also covers
compression. In this case, we use `decompress`, which uses bigarrays, so some
copying is involved (from bytes to bigarrays)! But despite this copying, the
change seems worthwhile.

<tag id="fn2">**2**</tag>: This reminds me that we've been experimenting with
capabilities and with using the type system to enforce certain properties. To
date, `Cstruct_cap` has not been used anywhere, which raises a real question
about its advantages and disadvantages in everyday use.

## Functors

This is perhaps the other aspect of the Mirage ecosystem that is also the
subject of debate: functors! Before we talk about functors, we need to
understand their relevance in the context of Mirage.

Mirage transforms an application into an operating system. What's the
difference between a "normal" application and a unikernel? The "subsystem" it
interacts with: a normal application interacts with the host system, whereas a
unikernel has to interact with the Solo5 _mini-system_.

What Mirage tries to offer is the ability for an application to become either
one without changing a thing! Mirage's aim is to **inject** the subsystem into
your application. In this case:
- inject `unix.cmxa` when you want a Mirage application to become a simple
  executable
- inject [ocaml-solo5][ocaml-solo5] when you want to produce a unikernel

We're not going to discuss the pros and cons of this approach here; we simply
consider this feature as one that requires us to use functors.

Indeed, what's the best way in OCaml to inject one implementation into another?
Functors. They have definite advantages here too, but we're going to
concentrate on one in particular: the expressiveness of types at the module
level (the very modules we pass as arguments to our functors).
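
For example, did you know that OCaml's module system can express a form of
type-level computation close to dependent types? The following code encodes
natural numbers as functors and adds them: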
```ocaml
type 'a nat = Zero : zero nat | Succ : 'a nat -> 'a succ nat
and zero = |
and 'a succ = S

module type T = sig type t val v : t nat end
module type Rec = functor (T:T) -> T
module type Nat = functor (S:Rec) -> functor (Z:T) -> T

module Zero = functor (S:Rec) -> functor (Z:T) -> Z
module Succ = functor (N:Nat) -> functor (S:Rec) -> functor (Z:T) -> S(N(S)(Z))
module Add = functor (X:Nat) -> functor (Y:Nat) -> functor (S:Rec) -> functor (Z:T) -> X(S)(Y(S)(Z))

module One = Succ(Zero)
module Two_a = Add(One)(One)
module Two_b = Succ(One)

module Z : T with type t = zero = struct
  type t = zero
  let v = Zero
end

module S (T:T) : T with type t = T.t succ = struct
  type t = T.t succ
  let v = Succ T.v
end

module A = Two_a(S)(Z)
module B = Two_b(S)(Z)

type ('a, 'b) refl = Refl : ('a, 'a) refl

let _ : (A.t, B.t) refl = Refl (* 1+1 == succ 1 *)
```

The code is... magical, but it shows that two differently constructed modules
(`Two_a` and `Two_b`) ultimately produce the same type, and that OCaml is able
to prove this equality. Above all, the example shows just how powerful functors
can be. But it also shows just how difficult functors can be to understand and
use.

In fact, this is one of Mirage's biggest drawbacks: the overuse of functors
makes the code difficult to read and understand. It can be hard to work out in
your head the type that results from a functor application, and the
constraints associated with it... (yes, I don't use `merlin`).

But back to our initial problem: injection! In truth, in most cases a functor
is a sledgehammer to crack a nut. There are many other ways of injecting what
the system is (and how to do a `read` or `write`) into an implementation. The
best example, as [@nojb pointed out][nojb-response], is of course
[ocaml-tls][ocaml-tls]. This answer also shows the contrast between the functor
approach (with [CoHTTP][cohttp], for example) and the "pure value-passing
interface" of `ocaml-tls`.
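
To give an idea of what a value-passing interface looks like, here is a toy decoder of our own, not taken from `ocaml-tls` or `decompress`. It handles messages made of a one-byte length followed by that many bytes. The decoder never reads or writes anything itself. It only tells the caller how many bytes it needs next and consumes the bytes it is given. All names (`state`, `step`, `feed`, `decode_string`) are hypothetical.

```ocaml
(* Hypothetical toy format: a 1-byte length followed by that many bytes. *)
type state = Length | Payload

type step =
  | Read of int * state      (* supply exactly [n] bytes, then call [feed] *)
  | Message of string * step (* a decoded message, then continue with [step] *)

let start = Read (1, Length)

(* Pure: no I/O, only "given these bytes in this state, what comes next?" *)
let feed state data =
  match state with
  | Length -> Read (Char.code data.[0], Payload)
  | Payload -> Message (data, Read (1, Length))

(* One possible driver, reading from an in-memory string. *)
let decode_string src =
  let pos = ref 0 in
  let rec loop acc = function
    | Message (msg, next) -> loop (msg :: acc) next
    | Read (n, st) ->
        if !pos + n > String.length src then
          List.rev acc (* end of input, or a truncated trailing message *)
        else begin
          let chunk = String.sub src !pos n in
          pos := !pos + n;
          loop acc (feed st chunk)
        end
  in
  loop [] start

let () = List.iter print_endline (decode_string "\003abc\002hi")
```

Only `decode_string` knows where the bytes come from. The same `feed` function could be driven just as well by a loop over a Unix file descriptor, an Lwt flow or a Miou task.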

What's more, we've been experimenting with other approaches for injecting the
system we want for several years now. We can already list several:
- the "value-passing" approach of `ocaml-tls`, of course, but also of
  `decompress`
- passing [a record][poor-man-functor] (a sort of mini-module with fewer
  possibilities regarding types, but which does the job; a poor man's functor,
  in short) containing the functions that perform the system's operations
- [mimic][mimic], which can be used to inject a module as the implementation of
  a flow/stream according to a resolution mechanism (DNS, `/etc/services`,
  etc.); this is a little closer to the idea of _runtime-resolved implicit
  implementations_
- variants, of course (although back in 2010 this solution wasn't so obvious),
  popularized by [ptime][ptime]/[mtime][mtime], `digestif` and
  [dune][dune-variants]
- and finally, [GADTs][decompress-lzo], which describe what the process should
  do and then let the user implement the `run` function according to the
  system.
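
As a rough comparison of two of these techniques, the sketch below injects the same `read` operation into a byte-counting function twice. The first version uses a functor and the second a record of functions (the "poor man's functor"). The `FLOW` signature, the `flow` record and the in-memory `mem` flow are all made up for this example.

```ocaml
(* 1. Functor-based injection. *)
module type FLOW = sig
  type t
  val read : t -> int -> string option (* [None] at end of stream *)
end

module Count (F : FLOW) = struct
  let total flow =
    let rec go acc =
      match F.read flow 4 with
      | None -> acc
      | Some s -> go (acc + String.length s)
    in
    go 0
end

(* 2. Record-based injection: a "poor man's functor". *)
type 'flow flow = { read : 'flow -> int -> string option }

let total (ops : 'flow flow) (flow : 'flow) =
  let rec go acc =
    match ops.read flow 4 with
    | None -> acc
    | Some s -> go (acc + String.length s)
  in
  go 0

(* An in-memory flow usable by both approaches. *)
type mem = { data : string; mutable pos : int }

let mem_read m n =
  if m.pos >= String.length m.data then None
  else begin
    let n = min n (String.length m.data - m.pos) in
    let s = String.sub m.data m.pos n in
    m.pos <- m.pos + n;
    Some s
  end

module Mem_count = Count (struct type t = mem let read = mem_read end)

let () =
  let a = Mem_count.total { data = "hello world"; pos = 0 } in
  let b = total { read = mem_read } { data = "hello world"; pos = 0 } in
  Printf.printf "functor: %d, record: %d\n" a b
```

The functor version can carry richer type information (abstract types, type equalities). The record version is an ordinary value that can be built, stored and passed around at runtime.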

In short, based on this list and the various experiments we've carried out on
a number of projects, we've decided to remove the functors from `ocaml-tar`!
The crucial question now is: which method should we choose?

### The best answers

There's no single right answer, and in truth it depends on the level of
abstraction you're at. Ideally, you'd have a fairly simple way of abstracting
over the system at the start, at the lowest level, and end up offering a
functor that does all the _ceremony_ (the glue between your implementation and
the system) at the top level; that's what [ocaml-git][ocaml-git] does, for
example.

The abstraction you choose also depends on how the process is going to work.
As far as streams/protocols are concerned, the `ocaml-tls`/`decompress`
approach still seems the best. But when it comes to introspecting a file or a
block device, it may be preferable to use a GADT that forces the user to
implement arbitrary memory access rather than consume a sequence of bytes. In
short, at this stage, experience speaks for itself and, just as we were wrong
about functors, we won't advise you to use one solution over another.

But based on our experience with `ocaml-tls` and with `decompress` for LZO
(which requires arbitrary access to the content), and on the way Tar works, we
decided to use a "value-passing" approach (to describe when we need to
read/write) and a GADT to describe computations such as:
- iterating over the files/folders contained in a Tar archive
- producing a Tar file from a "dispenser" of inputs

```ocaml
val decode : decode_state -> string ->
  decode_state
  * [ `Read of int
    | `Skip of int
    | `Header of Header.t ] option
  * Header.Extended.t option
(** [decode state data] returns a new state and what the user should do next:
    - [`Skip n]: skip [n] bytes
    - [`Read n]: read [n] bytes
    - [`Header hdr]: do something according to the last header extracted
      (like streaming out the contents of a file). *)

type ('a, 'err) t =
  | Really_read : int -> (string, 'err) t
  | Read : int -> (string, 'err) t
  | Seek : int -> (unit, 'err) t
  | Bind : ('a, 'err) t * ('a -> ('b, 'err) t) -> ('b, 'err) t
  | Return : ('a, 'err) result -> ('a, 'err) t
  | Write : string -> (unit, 'err) t
```
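
Here is a cut-down sketch of the GADT idea. It redeclares a subset of the type above (no `Really_read`, no `Write`) and writes a tiny program with it. It then gives one possible `run` function that interprets the program against an in-memory string, where a `Seek` is simply a change of offset. The program `read_at` and the interpreter `run_string` are hypothetical. A Unix or Lwt user would write their own interpreter over a file descriptor instead.

```ocaml
(* A subset of the GADT above. *)
type ('a, 'err) t =
  | Read : int -> (string, 'err) t
  | Seek : int -> (unit, 'err) t
  | Bind : ('a, 'err) t * ('a -> ('b, 'err) t) -> ('b, 'err) t
  | Return : ('a, 'err) result -> ('a, 'err) t

let ( let* ) x f = Bind (x, f)

(* A program that only *describes* its accesses: read 4 bytes at [off]. *)
let read_at off =
  let* () = Seek off in
  Read 4

(* One possible interpreter: random access into an in-memory string. *)
let run_string (src : string) prog =
  let pos = ref 0 in
  let rec go : type a err. (a, err) t -> (a, err) result = function
    | Return r -> r
    | Seek off ->
        pos := max 0 (min off (String.length src));
        Ok ()
    | Read n ->
        (* Short read at the end of the input. *)
        let n = max 0 (min n (String.length src - !pos)) in
        let s = String.sub src !pos n in
        pos := !pos + n;
        Ok s
    | Bind (x, f) -> (
        match go x with
        | Ok v -> go (f v)
        | Error e -> Error e)
  in
  go prog
```

The same `read_at 3` value could be handed unchanged to an interpreter built on `Unix.lseek` and `Unix.read`. Only the `run` function depends on the system.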

However, this is where we come back to OCaml's limitations, and where functors
could help us: higher-kinded polymorphism!

### Higher-kinded polymorphism

If we return to our functor example above, there's one element that may be of
interest: `T with type t = T.t succ`.

In other words, adding a constraint to a type in a signature. A constraint
often seen in Mirage (but now deprecated, according to
[this issue][mirage-lwt]) is the type `io` and its constraint: `type 'a io`
together with `with type 'a io = 'a Lwt.t`.
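
For comparison, here is roughly what the functor-based style looks like. The code is parameterised by a module that provides an abstract `'a io` together with `return` and `bind`. Each scheduler then instantiates it with its own type (`with type 'a io = 'a Lwt.t` for Lwt, for example). To keep the sketch self-contained, it instantiates the functor with an identity "scheduler" and with `Lazy.t` rather than Lwt. The `IO` signature and the `Make` functor are illustrative only, not Tar's former interface.

```ocaml
module type IO = sig
  type 'a io
  val return : 'a -> 'a io
  val bind : 'a io -> ('a -> 'b io) -> 'b io
end

(* Code written once, against any [IO]: the functor parameter gives us
   abstraction over the type constructor ['a io]. *)
module Make (IO : IO) = struct
  let add_later (x : int IO.io) (y : int IO.io) : int IO.io =
    IO.bind x (fun a -> IO.bind y (fun b -> IO.return (a + b)))
end

(* Identity "scheduler": ['a io = 'a]. *)
module Direct = Make (struct
  type 'a io = 'a
  let return x = x
  let bind x f = f x
end)

(* Deferred "scheduler": ['a io = 'a Lazy.t]. *)
module Deferred = Make (struct
  type 'a io = 'a Lazy.t
  let return x = Lazy.from_val x
  let bind x f = lazy (Lazy.force (f (Lazy.force x)))
end)

let () =
  Printf.printf "%d %d\n"
    (Direct.add_later 1 2)
    (Lazy.force (Deferred.add_later (lazy 1) (lazy 2)))
```

The catch is that everything mentioning `'a io` must then live inside the functor, including any GADT that wants to embed scheduler values.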

So we had this type in Tar. The problem is that our GADT cannot know that it
will sometimes have to manipulate _Lwt_ values, sometimes _Async_ values, and
sometimes _Eio_ (or _Miou_!) values. In other words: how do we compose our
`Bind` with the `Bind` of each of these targets? The difficulty lies above all
in history: supporting this library requires us to assume a certain
compatibility with applications over which we have no control. What's more, we
need to keep supporting all of these libraries without imposing one.

<hr />

A small digression seems important at this stage, as we've been working this
way for over 10 years. Beyond all the solutions mentioned above, not depending
on a system (and/or a scheduler) is also what has allowed libraries like Tar
to last for more than a decade! The OCaml ecosystem is changing, and choosing
this or that library to ease the development of an application has
implications we might regret 10 years down the line (for example...
`Cstruct.t`!). So it can be challenging to ensure compatibility with all
systems, but the result is libraries steeped in the experience and know-how of
many developers!

<hr />

So, and this is why we're talking about higher-kinded polymorphism, how do we
abstract over the `t` in `'a t` (to replace it with `Lwt.t`, or even with a
type such as `type 'a t = 'a`)? This is where we use the trick explained in
[this paper][hkt]. The trick is to introduce a "new type" that represents our
monad (Lwt or Async) and to inject/project values of this monad into something
our GADT can understand: `High : ('a, 't) io -> ('a, 'err, 't) t`.

```ocaml
type ('a, 't) io

type ('a, 'err, 't) t =
  | Really_read : int -> (string, 'err, 't) t
  | Read : int -> (string, 'err, 't) t
  | Seek : int -> (unit, 'err, 't) t
  | Bind : ('a, 'err, 't) t * ('a -> ('b, 'err, 't) t) -> ('b, 'err, 't) t
  | Return : ('a, 'err) result -> ('a, 'err, 't) t
  | Write : string -> (unit, 'err, 't) t
  | High : ('a, 't) io -> ('a, 'err, 't) t
```

Next, we need to create this new type according to the chosen scheduler. Let's
take _Lwt_ as an example:

```ocaml
module Make (X : sig type 'a t end) = struct
  type t (* our new type *)
  type 'a s = 'a X.t

  external inj : 'a s -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a s = "%identity"
end

module L = Make(Lwt)

let rec run
  : type a err. (a, err, L.t) t -> (a, err) result Lwt.t
  = function
  | High v -> Lwt.map Result.ok (L.prj v)
  | Return v -> Lwt.return v
  | Bind (x, f) ->
    Lwt.bind (run x) (function
      | Ok value -> run (f value)
      | Error e -> Lwt.return (Error e))
  | _ -> (* Really_read, Read, Seek and Write are elided here *) assert false
```

So, as you can see, it's a real trick, and not one to try at home without
supervision. Indeed, `%identity` amounts to an `Obj.magic`! So even though the
`io` type is exposed (to let users derive Tar for their own system), this trick
is not exposed to other packages. Instead, we provide helpers such as:

```ocaml
val lwt : 'a Lwt.t -> ('a, 'err, lwt) t
val miou : 'a -> ('a, 'err, miou) t
```

This way, Tar can still be derived for any other system, and the process for
extracting entries from a Tar file is the same for **all** systems!
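
To see the whole trick at work without depending on Lwt, here is a self-contained sketch that follows the same pattern. It has an abstract `('a, 't) io`, a `Make` functor that creates a fresh brand `t` with `inj`/`prj` as `%identity`, and a GADT with a `High` constructor. The same program is then interpreted by two different "schedulers": an identity one and one based on `Lazy.t`. The names `prog`, `answer`, `run_id` and `run_lazy` are ours, and the GADT is reduced to `Return`, `Bind` and `High`.

```ocaml
(* ['t] is a "brand" standing for some type constructor ['a X.t]. *)
type ('a, 't) io

module Make (X : sig type 'a t end) = struct
  type t (* a fresh brand for [X.t] *)

  (* Both are the identity at runtime: this is where the magic lies. *)
  external inj : 'a X.t -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a X.t = "%identity"
end

(* A reduced version of the GADT, parameterised by the brand ['t]. *)
type ('a, 't) prog =
  | Return : 'a -> ('a, 't) prog
  | Bind : ('a, 't) prog * ('a -> ('b, 't) prog) -> ('b, 't) prog
  | High : ('a, 't) io -> ('a, 't) prog

(* A program that does not know which scheduler will run it. *)
let answer (lift : int -> (int, 't) io) : (int, 't) prog =
  Bind (High (lift 20), fun x -> Return (x + 22))

(* Scheduler 1: the identity, ['a X.t = 'a]. *)
module Id = Make (struct type 'a t = 'a end)

let rec run_id : type a. (a, Id.t) prog -> a = function
  | Return v -> v
  | Bind (x, f) -> run_id (f (run_id x))
  | High v -> Id.prj v

(* Scheduler 2: deferred computations, ['a X.t = 'a Lazy.t]. *)
module Lz = Make (struct type 'a t = 'a Lazy.t end)

let rec run_lazy : type a. (a, Lz.t) prog -> a Lazy.t = function
  | Return v -> Lazy.from_val v
  | Bind (x, f) -> lazy (Lazy.force (run_lazy (f (Lazy.force (run_lazy x)))))
  | High v -> Lz.prj v

let () =
  let a = run_id (answer Id.inj) in
  let b = Lazy.force (run_lazy (answer (fun x -> Lz.inj (Lazy.from_val x)))) in
  Printf.printf "%d %d\n" a b
```

Since `inj` and `prj` are only the identity, the safety of the construction rests on each brand `t` being used with a single `X`. This is why such functions are best kept private to the library, as described above.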

## Conclusion

This Tar release isn't as impressive as this article, but it does sum up all
the work we've been able to do over the last few months and years. We hope
that our work is appreciated and that this article, which sets out all the
thoughts we've had (and still have), helps you better understand it!

[opam-mirror]: https://hannes.robur.coop/Posts/OpamMirror
[discuss-cstruct]: https://discuss.ocaml.org/t/buffered-io-bytes-vs-bigstring/8978/3
[parmap]: https://github.com/rdicosmo/parmap
[digestif]: https://github.com/mirage/digestif
[decompress]: https://github.com/mirage/decompress
[pr-utcp]: https://github.com/robur-coop/utcp/pull/29
[utcp]: https://github.com/robur-coop/utcp
[mmap]: https://ocaml.org/manual/5.2/api/Unix.html#1_Mappingfilesintomemory
[minor-alloc]: https://github.com/ocaml/ocaml/blob/744006bfbfa045cc1ca442ff7b52c2650d2abe32/runtime/alloc.c#L175
[bigarray-minor]: https://github.com/ocaml/ocaml/pull/92
[http-lwt-client-bug]: https://github.com/robur-coop/http-lwt-client/pull/16
[cstruct-cap]: https://github.com/mirage/ocaml-cstruct/pull/237
[gc-bigarray-pressure]: https://github.com/ocaml/ocaml/issues/7750
[better-bigarray-free]: https://github.com/ocaml/ocaml/pull/1738
[ocaml-getters]: https://github.com/ocaml/ocaml/pull/1864
[ocaml-solo5]: https://github.com/mirage/ocaml-solo5
[nojb-response]: https://discuss.ocaml.org/t/best-practices-and-design-patterns-for-supporting-concurrent-io-in-libraries/15001/4?u=dinosaure
[ocaml-tls]: https://github.com/mirleft/ocaml-tls
[cohttp]: https://github.com/mirage/ocaml-cohttp
[poor-man-functor]: https://github.com/mirage/colombe/blob/07cd4cf134168ecd841924ee7ddda1a9af8fbd5a/src/sigs.ml#L13-L16
[mimic]: https://github.com/dinosaure/mimic
[ptime]: https://github.com/dbuenzli/ptime
[mtime]: https://github.com/dbuenzli/mtime
[dune-variants]: https://github.com/ocaml/dune/pull/1207
[decompress-lzo]: https://github.com/mirage/decompress/blob/c8301ba674e037b682338958d6d0bb5c42fd720e/lib/lzo.ml#L164-L175
[ocaml-git]: https://github.com/mirage/ocaml-git
[mirage-lwt]: https://github.com/mirage/mirage/issues/1004#issue-507517315
[hkt]: https://www.cl.cam.ac.uk/~jdy22/papers/lightweight-higher-kinded-polymorphism.pdf
167
atom.xml
@@ -1,167 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<id>https://blog.robur.coop/atom.xml</id>
<title type="text">The Robur's blog</title>
<generator uri="https://github.com/xhtmlboi/yocaml" version="2">YOCaml</generator>
<updated>2024-10-25T00:00:00Z</updated>
<author>
<name>The Robur Team</name>
</author>
<link href="https://blog.robur.coop"/>
<link href="https://blog.robur.coop/atom.xml" rel="self"/>
<entry>
<id>https://blog.robur.coop/articles/dnsvizor01.html</id>
<title type="text">Meet DNSvizor: run your own DHCP and DNS MirageOS unikernel</title>
<updated>2024-10-25T00:00:00Z</updated>
<summary type="text">
The NGI-funded DNSvizor provides core network services on your network; DNS resolution and DHCP.
</summary>
<link href="https://blog.robur.coop/articles/dnsvizor01.html" rel="alternate" title="Meet DNSvizor: run your own DHCP and DNS MirageOS unikernel"/>
<category term="OCaml"/>
<category term="MirageOS"/>
<category term="DNSvizor"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/arguments.html</id>
<title type="text">Runtime arguments in MirageOS</title>
<updated>2024-10-22T00:00:00Z</updated>
<summary type="text">The history of runtime arguments to a MirageOS unikernel</summary>
<link href="https://blog.robur.coop/articles/arguments.html" rel="alternate" title="Runtime arguments in MirageOS"/>
<category term="OCaml"/>
<category term="MirageOS"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/finances.html</id>
<title type="text">How has robur financially been doing since 2018?</title>
<updated>2024-10-21T00:00:00Z</updated>
<summary type="text">How we organise as a collective, and why we're doing that.</summary>
<link href="https://blog.robur.coop/articles/finances.html" rel="alternate" title="How has robur financially been doing since 2018?"/>
<category term="finances"/>
<category term="cooperative"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/2024-08-21-OpenVPN-and-MirageVPN.html</id>
<title type="text">MirageVPN and OpenVPN</title>
<updated>2024-08-21T00:00:00Z</updated>
<summary type="text">Discoveries made implementing MirageVPN, an OpenVPN-compatible VPN library</summary>
<link href="https://blog.robur.coop/articles/2024-08-21-OpenVPN-and-MirageVPN.html" rel="alternate" title="MirageVPN and OpenVPN"/>
<category term="MirageVPN"/>
<category term="OpenVPN"/>
<category term="security"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/tar-release.html</id>
<title type="text">The new Tar release, a retrospective</title>
<updated>2024-08-15T00:00:00Z</updated>
<summary type="text">A little retrospective to the new Tar release and changes</summary>
<link href="https://blog.robur.coop/articles/tar-release.html" rel="alternate" title="The new Tar release, a retrospective"/>
<category term="OCaml"/>
<category term="Cstruct"/>
<category term="functors"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/qubes-miragevpn.html</id>
<title type="text">qubes-miragevpn, a MirageVPN client for QubesOS</title>
<updated>2024-06-24T00:00:00Z</updated>
<summary type="text">A new OpenVPN client for QubesOS</summary>
<link href="https://blog.robur.coop/articles/qubes-miragevpn.html" rel="alternate" title="qubes-miragevpn, a MirageVPN client for QubesOS"/>
<category term="OCaml"/>
<category term="vpn"/>
<category term="unikernel"/>
<category term="QubesOS"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/miragevpn-server.html</id>
<title type="text">MirageVPN server</title>
<updated>2024-06-17T00:00:00Z</updated>
<summary type="text">Announcement of our MirageVPN server.</summary>
<link href="https://blog.robur.coop/articles/miragevpn-server.html" rel="alternate" title="MirageVPN server"/>
<category term="OCaml"/>
<category term="MirageOS"/>
<category term="cryptography"/>
<category term="security"/>
<category term="VPN"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/miragevpn-performance.html</id>
<title type="text">Speeding up MirageVPN and use it in the wild</title>
<updated>2024-04-16T00:00:00Z</updated>
<summary type="text">Performance engineering of MirageVPN, speeding it up by a factor of 25.</summary>
<link href="https://blog.robur.coop/articles/miragevpn-performance.html" rel="alternate" title="Speeding up MirageVPN and use it in the wild"/>
<category term="OCaml"/>
<category term="MirageOS"/>
<category term="cryptography"/>
<category term="security"/>
<category term="VPN"/>
<category term="performance"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/gptar.html</id>
<title type="text">GPTar</title>
<updated>2024-02-21T00:00:00Z</updated>
<summary type="text">Hybrid GUID partition table and tar archive</summary>
<link href="https://blog.robur.coop/articles/gptar.html" rel="alternate" title="GPTar"/>
<category term="OCaml"/>
<category term="gpt"/>
<category term="tar"/>
<category term="mbr"/>
<category term="persistent storage"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/speeding-ec-string.html</id>
<title type="text">Speeding elliptic curve cryptography</title>
<updated>2024-02-13T00:00:00Z</updated>
<summary type="text">
How we improved the performance of elliptic curves by only modifying the underlying byte array
</summary>
<link href="https://blog.robur.coop/articles/speeding-ec-string.html" rel="alternate" title="Speeding elliptic curve cryptography"/>
<category term="OCaml"/>
<category term="MirageOS"/>
<category term="cryptography"/>
<category term="security"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/lwt_pause.html</id>
<title type="text">Cooperation and Lwt.pause</title>
<updated>2024-02-11T00:00:00Z</updated>
<summary type="text">A digression about Lwt and Miou</summary>
<link href="https://blog.robur.coop/articles/lwt_pause.html" rel="alternate" title="Cooperation and Lwt.pause"/>
<category term="OCaml"/>
<category term="Scheduler"/>
<category term="Community"/>
<category term="Unikernel"/>
<category term="Git"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/2024-02-03-python-str-repr.html</id>
<title type="text">Python's `str.__repr__()`</title>
<updated>2024-02-03T00:00:00Z</updated>
<summary type="text">Reimplementing Python string escaping in OCaml</summary>
<link href="https://blog.robur.coop/articles/2024-02-03-python-str-repr.html" rel="alternate" title="Python's `str.__repr__()`"/>
<category term="OCaml"/>
<category term="Python"/>
<category term="unicode"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/miragevpn-ncp.html</id>
<title type="text">MirageVPN updated (AEAD, NCP)</title>
<updated>2023-11-20T00:00:00Z</updated>
<summary type="text">How we resurrected MirageVPN from its bitrot state</summary>
<link href="https://blog.robur.coop/articles/miragevpn-ncp.html" rel="alternate" title="MirageVPN updated (AEAD, NCP)"/>
<category term="OCaml"/>
<category term="MirageOS"/>
<category term="VPN"/>
<category term="security"/>
</entry>
<entry>
<id>https://blog.robur.coop/articles/miragevpn.html</id>
<title type="text">MirageVPN &amp; tls-crypt-v2</title>
<updated>2023-11-14T00:00:00Z</updated>
<summary type="text">How we implemented tls-crypt-v2 for miragevpn</summary>
<link href="https://blog.robur.coop/articles/miragevpn.html" rel="alternate" title="MirageVPN &amp; tls-crypt-v2"/>
<category term="OCaml"/>
<category term="MirageOS"/>
<category term="VPN"/>
<category term="security"/>
</entry>
</feed>
681
bin/blog.ml
Normal file
@@ -0,0 +1,681 @@
open Yocaml

module SM = Map.Make(String)

let is_empty_list = function [] -> true | _ -> false

module Date = struct
  type month =
    | Jan
    | Feb
    | Mar
    | Apr
    | May
    | Jun
    | Jul
    | Aug
    | Sep
    | Oct
    | Nov
    | Dec

  type day_of_week = Mon | Tue | Wed | Thu | Fri | Sat | Sun
  type year = int
  type day = int
  type hour = int
  type min = int
  type sec = int

  type t = {
      year : year
    ; month : month
    ; day : day
    ; hour : hour
    ; min : min
    ; sec : sec
  }

  let invalid_int x message =
    Data.Validation.fail_with ~given:(string_of_int x) message

  let month_from_int x =
    if x > 0 && x <= 12 then
      Result.ok
        [| Jan; Feb; Mar; Apr; May; Jun; Jul; Aug; Sep; Oct; Nov; Dec |].(x - 1)
    else invalid_int x "Invalid month value"

  let year_from_int x =
    if x >= 0 then Result.ok x else invalid_int x "Invalid year value"

  let is_leap year =
    if year mod 100 = 0 then year mod 400 = 0 else year mod 4 = 0

  let days_in_month year month =
    match month with
    | Jan | Mar | May | Jul | Aug | Oct | Dec -> 31
    | Feb -> if is_leap year then 29 else 28
    | _ -> 30

  let day_from_int year month x =
    let dim = days_in_month year month in
    if x >= 1 && x <= dim then Result.ok x
    else invalid_int x "Invalid day value"

  let hour_from_int x =
    if x >= 0 && x < 24 then Result.ok x else invalid_int x "Invalid hour value"

  let min_from_int x =
    if x >= 0 && x < 60 then Result.ok x else invalid_int x "Invalid min value"

  let sec_from_int x =
    if x >= 0 && x < 60 then Result.ok x else invalid_int x "Invalid sec value"

  let ( let* ) = Result.bind

  let make ?(time = (0, 0, 0)) ~year ~month ~day () =
    let hour, min, sec = time in
    let* year = year_from_int year in
    let* month = month_from_int month in
    let* day = day_from_int year month day in
    let* hour = hour_from_int hour in
    let* min = min_from_int min in
    let* sec = sec_from_int sec in
    Result.ok { year; month; day; hour; min; sec }

  let validate_from_datetime_str str =
    let str = String.trim str in
    match
      Scanf.sscanf_opt str "%04d%c%02d%c%02d%c%02d%c%02d%c%02d"
        (fun year _ month _ day _ hour _ min _ sec ->
          ((hour, min, sec), year, month, day))
    with
    | None -> Data.Validation.fail_with ~given:str "Invalid date format"
    | Some (time, year, month, day) -> make ~time ~year ~month ~day ()

  let validate_from_date_str str =
    let str = String.trim str in
    match
      Scanf.sscanf_opt str "%04d%c%02d%c%02d" (fun year _ month _ day ->
          (year, month, day))
    with
    | None -> Data.Validation.fail_with ~given:str "Invalid date format"
    | Some (year, month, day) -> make ~year ~month ~day ()

  let validate =
    let open Data.Validation in
|
||||
string & (validate_from_datetime_str / validate_from_date_str)
|
||||
|
||||
let month_to_int = function
|
||||
| Jan -> 1
|
||||
| Feb -> 2
|
||||
| Mar -> 3
|
||||
| Apr -> 4
|
||||
| May -> 5
|
||||
| Jun -> 6
|
||||
| Jul -> 7
|
||||
| Aug -> 8
|
||||
| Sep -> 9
|
||||
| Oct -> 10
|
||||
| Nov -> 11
|
||||
| Dec -> 12
|
||||
|
||||
let dow_to_int = function
|
||||
| Mon -> 0
|
||||
| Tue -> 1
|
||||
| Wed -> 2
|
||||
| Thu -> 3
|
||||
| Fri -> 4
|
||||
| Sat -> 5
|
||||
| Sun -> 6
|
||||
|
||||
let compare_date a b =
|
||||
let cmp = Int.compare a.year b.year in
|
||||
if Int.equal cmp 0 then
|
||||
let cmp = Int.compare (month_to_int a.month) (month_to_int b.month) in
|
||||
if Int.equal cmp 0 then Int.compare a.day b.day else cmp
|
||||
else cmp
|
||||
|
||||
let compare_time a b =
|
||||
let cmp = Int.compare a.hour b.hour in
|
||||
if Int.equal cmp 0 then
|
||||
let cmp = Int.compare a.min b.min in
|
||||
if Int.equal cmp 0 then Int.compare a.sec b.sec else cmp
|
||||
else cmp
|
||||
|
||||
let compare a b =
|
||||
let cmp = compare_date a b in
|
||||
if Int.equal cmp 0 then compare_time a b else cmp
|
||||
|
||||
let pp_date ppf { year; month; day; _ } =
|
||||
Format.fprintf ppf "%04d-%02d-%02d" year (month_to_int month) day
|
||||
|
||||
let month_value = function
|
||||
| Jan -> 0
|
||||
| Feb -> 3
|
||||
| Mar -> 3
|
||||
| Apr -> 6
|
||||
| May -> 1
|
||||
| Jun -> 4
|
||||
| Jul -> 6
|
||||
| Aug -> 2
|
||||
| Sep -> 5
|
||||
| Oct -> 0
|
||||
| Nov -> 3
|
||||
| Dec -> 5
|
||||
|
||||
let day_of_week { year; month; day; _ } =
|
||||
let yy = year mod 100 in
|
||||
let cc = (year - yy) / 100 in
|
||||
let c_code = [| 6; 4; 2; 0 |].(cc mod 4) in
|
||||
let y_code = (yy + (yy / 4)) mod 7 in
|
||||
let m_code =
|
||||
let v = month_value month in
|
||||
if is_leap year && (month = Jan || month = Feb) then v - 1 else v
|
||||
in
|
||||
let index = (c_code + y_code + m_code + day) mod 7 in
|
||||
[| Sun; Mon; Tue; Wed; Thu; Fri; Sat |].(index)
|
||||
|
||||
let normalize ({ year; month; day; hour; min; sec } as dt) =
|
||||
let day_of_week = day_of_week dt in
|
||||
let open Data in
|
||||
record
|
||||
[
|
||||
("year", int year); ("month", int (month_to_int month)); ("day", int day)
|
||||
; ("hour", int hour); ("min", int min); ("sec", int sec)
|
||||
; ("day_of_week", int (dow_to_int day_of_week))
|
||||
; ("human", string (Format.asprintf "%a" pp_date dt))
|
||||
]
|
||||
|
||||
let to_archetype_date_time { year; month; day; hour; min; sec } =
|
||||
let time = (hour, min, sec) in
|
||||
let month = month_to_int month in
|
||||
Result.get_ok (Archetype.Datetime.make ~time ~year ~month ~day ())
|
||||
end
|
||||
|
||||
module Page = struct
|
||||
let entity_name = "Page"
|
||||
|
||||
class type t = object ('self)
|
||||
method title : string option
|
||||
method charset : string option
|
||||
method description : string option
|
||||
method tags : string list
|
||||
method with_host : string -> 'self
|
||||
method get_host : string option
|
||||
end
|
||||
|
||||
class page ?title ?description ?charset ?(tags = []) () =
|
||||
object (_ : #t)
|
||||
method title = title
|
||||
method charset = charset
|
||||
method description = description
|
||||
method tags = tags
|
||||
val host = None
|
||||
method with_host v = {< host = Some v >}
|
||||
method get_host = host
|
||||
end
|
||||
|
||||
let neutral = Result.ok @@ new page ()
|
||||
|
||||
let validate fields =
|
||||
let open Data.Validation in
|
||||
let+ title = optional fields "title" string
|
||||
and+ description = optional fields "description" string
|
||||
and+ charset = optional fields "charset" string
|
||||
and+ tags = optional_or fields ~default:[] "tags" (list_of string) in
|
||||
new page ?title ?description ?charset ~tags ()
|
||||
|
||||
let validate =
|
||||
let open Data.Validation in
|
||||
record validate
|
||||
end
|
||||
|
||||
module Author = struct
|
||||
class type t = object
|
||||
method name : string
|
||||
method link : string
|
||||
method email : string
|
||||
method avatar : string option
|
||||
end
|
||||
|
||||
let gravatar email =
|
||||
let tk = String.(lowercase_ascii (trim email)) in
|
||||
let hs = Digest.(to_hex (string tk)) in
|
||||
"https://www.gravatar.com/avatar/" ^ hs
|
||||
|
||||
class author ~name ~link ~email ?(avatar = gravatar email) () =
|
||||
object (_ : #t)
|
||||
method name = name
|
||||
method link = link
|
||||
method email = email
|
||||
method avatar = Some avatar
|
||||
end
|
||||
|
||||
let validate fields =
|
||||
let open Data.Validation in
|
||||
let+ name = required fields "name" string
|
||||
and+ link = required fields "link" string
|
||||
and+ email = required fields "email" string
|
||||
and+ avatar = optional fields "avatar" string in
|
||||
match avatar with
|
||||
| None -> new author ~name ~link ~email ()
|
||||
| Some avatar -> new author ~name ~link ~email ~avatar ()
|
||||
|
||||
let validate =
|
||||
let open Data.Validation in
|
||||
record validate
|
||||
|
||||
let normalize obj =
|
||||
let open Data in
|
||||
record
|
||||
[
|
||||
("name", string obj#name); ("link", string obj#link)
|
||||
; ("email", string obj#email); ("avatar", option string obj#avatar)
|
||||
]
|
||||
end
|
||||
|
||||
let robur_coop =
|
||||
new Author.author
|
||||
~name:"The Robur Team" ~link:"https://robur.coop/"
|
||||
~email:"team@robur.coop" ()
|
||||
|
||||
module Article = struct
|
||||
let entity_name = "Article"
|
||||
|
||||
class type t = object ('self)
|
||||
method title : string
|
||||
method description : string
|
||||
method charset : string option
|
||||
method tags : string list
|
||||
method date : Date.t
|
||||
method author : Author.t
|
||||
method co_authors : Author.t list
|
||||
method with_host : string -> 'self
|
||||
method get_host : string option
|
||||
end
|
||||
|
||||
class article ~title ~description ?charset ?(tags = []) ~date ~author
|
||||
?(co_authors = []) () =
|
||||
object (_ : #t)
|
||||
method title = title
|
||||
method description = description
|
||||
method charset = charset
|
||||
method tags = tags
|
||||
method date = date
|
||||
method author = author
|
||||
method co_authors = co_authors
|
||||
val host = None
|
||||
method with_host v = {< host = Some v >}
|
||||
method get_host = host
|
||||
end
|
||||
|
||||
let title p = p#title
|
||||
let description p = p#description
|
||||
let date p = p#date
|
||||
|
||||
let neutral =
|
||||
Data.Validation.fail_with ~given:"null" "Cannot be null"
|
||||
|> Result.map_error (fun error ->
|
||||
Required.Validation_error { entity = entity_name; error })
|
||||
|
||||
let validate fields =
|
||||
let open Data.Validation in
|
||||
let+ title = required fields "title" string
|
||||
and+ description = required fields "description" string
|
||||
and+ charset = optional fields "charset" string
|
||||
and+ tags = optional_or fields ~default:[] "tags" (list_of string)
|
||||
and+ date = required fields "date" Date.validate
|
||||
and+ author =
|
||||
optional_or fields ~default:robur_coop "author" Author.validate
|
||||
and+ co_authors =
|
||||
optional_or fields ~default:[] "co-authors" (list_of Author.validate)
|
||||
in
|
||||
new article ~title ~description ?charset ~tags ~date ~author ~co_authors ()
|
||||
|
||||
let validate =
|
||||
let open Data.Validation in
|
||||
record validate
|
||||
|
||||
let normalize obj =
|
||||
Data.
|
||||
[
|
||||
("title", string obj#title); ("description", string obj#description)
|
||||
; ("date", Date.normalize obj#date); ("charset", option string obj#charset)
|
||||
; ("tags", list_of string obj#tags)
|
||||
; ("author", Author.normalize obj#author)
|
||||
; ("co-authors", list_of Author.normalize obj#co_authors)
|
||||
; ("host", option string obj#get_host)
|
||||
]
|
||||
end
|
||||
|
||||
module Articles = struct
|
||||
class type t = object ('self)
|
||||
method title : string option
|
||||
method description : string option
|
||||
method articles : (Path.t * Article.t) list
|
||||
method with_host : string -> 'self
|
||||
method get_host : string option
|
||||
end
|
||||
|
||||
class articles ?title ?description articles =
|
||||
object (_ : #t)
|
||||
method title = title
|
||||
method description = description
|
||||
method articles = articles
|
||||
val host = None
|
||||
method with_host v = {< host = Some v >}
|
||||
method get_host = host
|
||||
end
|
||||
|
||||
let sort_by_date ?(increasing = false) articles =
|
||||
List.sort
|
||||
(fun (_, articleA) (_, articleB) ->
|
||||
let r = Date.compare articleA#date articleB#date in
|
||||
if increasing then r else ~-r)
|
||||
articles
|
||||
|
||||
let fetch (module P : Required.DATA_PROVIDER) ?increasing
|
||||
?(filter = fun x -> x) ?(on = `Source) ~where ~compute_link path =
|
||||
Task.from_effect begin fun () ->
|
||||
let open Eff in
|
||||
let* files = read_directory ~on ~only:`Files ~where path in
|
||||
let+ articles =
|
||||
List.traverse
|
||||
(fun file ->
|
||||
let url = compute_link file in
|
||||
let+ metadata, _content =
|
||||
Eff.read_file_with_metadata (module P) (module Article) ~on file
|
||||
in
|
||||
(url, metadata))
|
||||
files
|
||||
in
|
||||
articles |> sort_by_date ?increasing |> filter end
|
||||
|
||||
let compute_index (module P : Required.DATA_PROVIDER) ?increasing
|
||||
?(filter = fun x -> x) ?(on = `Source) ~where ~compute_link path =
|
||||
let open Task in
|
||||
(fun x -> (x, ()))
|
||||
|>> second
|
||||
(fetch (module P) ?increasing ~filter ~on ~where ~compute_link path)
|
||||
>>> lift (fun (v, articles) ->
|
||||
new articles ?title:v#title ?description:v#description articles)
|
||||
|
||||
let normalize (ident, article) =
|
||||
let open Data in
|
||||
record (("url", string @@ Path.to_string ident) :: Article.normalize article)
|
||||
|
||||
let normalize obj =
|
||||
let open Data in
|
||||
[
|
||||
("articles", list_of normalize obj#articles)
|
||||
; ("has_articles", bool @@ not (is_empty_list obj#articles))
|
||||
; ("title", option string obj#title)
|
||||
; ("description", option string obj#description)
|
||||
; ("host", option string obj#get_host)
|
||||
]
|
||||
end
|
||||
|
||||
module Tag = struct
|
||||
type t = {
|
||||
name : string;
|
||||
articles : (Path.t * Article.t) list;
|
||||
}
|
||||
|
||||
let make ~name ~articles =
|
||||
{ name; articles }
|
||||
|
||||
let normalize_article (ident, article) =
|
||||
let open Data in
|
||||
record (("url", string @@ Path.to_string ident) :: Article.normalize article)
|
||||
|
||||
let normalize { name; articles } =
|
||||
let open Data in
|
||||
[
|
||||
("name", string name);
|
||||
("articles", (list_of normalize_article) articles);
|
||||
]
|
||||
end
|
||||
|
||||
module Tags = struct
|
||||
class type t = object ('self)
|
||||
inherit Articles.t
|
||||
method tags : Tag.t list
|
||||
end
|
||||
|
||||
class tags ?title ?description articles =
|
||||
object
|
||||
inherit Articles.articles ?title ?description articles as super
|
||||
method! title = Some "Tags"
|
||||
method tags =
|
||||
let tags =
|
||||
let update article sm tag =
|
||||
SM.update tag
|
||||
(function
|
||||
| None -> Some [article]
|
||||
| Some urls -> Some (article :: urls))
|
||||
sm
|
||||
in
|
||||
List.fold_left
|
||||
(fun sm (url, article) ->
|
||||
List.fold_left (update (url, article)) sm article#tags)
|
||||
SM.empty
|
||||
super#articles
|
||||
|> SM.bindings
|
||||
in
|
||||
List.map (fun (tag, articles) ->
|
||||
Tag.make ~name:tag ~articles:(List.rev articles)) (* undo the prepending done by SM.update, keeping newest first *)
|
||||
tags
|
||||
end
|
||||
|
||||
let of_articles articles =
|
||||
new tags ?title:articles#title ?description:articles#description articles#articles
|
||||
|
||||
let normalize_tag tag =
|
||||
let open Data in
|
||||
record (Tag.normalize tag)
|
||||
|
||||
let normalize tags =
|
||||
let open Data in
|
||||
("all_tags", (list_of normalize_tag tags#tags)) :: Articles.normalize tags
|
||||
end
|
||||
|
||||
module Make_with_target (S : sig
|
||||
val source : Path.t
|
||||
val target : Path.t
|
||||
end) =
|
||||
struct
|
||||
let source_root = S.source
|
||||
|
||||
module Source = struct
|
||||
let css = Path.(source_root / "css")
|
||||
let js = Path.(source_root / "js")
|
||||
let images = Path.(source_root / "images")
|
||||
let articles = Path.(source_root / "articles")
|
||||
let index = Path.(source_root / "pages" / "index.md")
|
||||
let tags = Path.(source_root / "pages" / "tags.md")
|
||||
let templates = Path.(source_root / "templates")
|
||||
let template file = Path.(templates / file)
|
||||
let binary = Path.rel [ Sys.argv.(0) ]
|
||||
let cache = Path.(source_root / "_cache")
|
||||
end
|
||||
|
||||
module Target = struct
|
||||
let target_root = S.target
|
||||
let pages = target_root
|
||||
let articles = Path.(target_root / "articles")
|
||||
let rss2 = Path.(target_root / "feed.xml")
|
||||
|
||||
let as_html into file =
|
||||
file |> Path.move ~into |> Path.change_extension "html"
|
||||
end
|
||||
|
||||
let target = Target.target_root
|
||||
|
||||
let process_css_files =
|
||||
Action.copy_directory ~into:Target.target_root Source.css
|
||||
|
||||
let process_js_files =
|
||||
Action.copy_directory ~into:Target.target_root Source.js
|
||||
|
||||
let process_images_files =
|
||||
Action.copy_directory ~into:Target.target_root Source.images
|
||||
|
||||
let process_article ~host file =
|
||||
let file_target = Target.(as_html articles file) in
|
||||
let open Task in
|
||||
Action.write_static_file file_target
|
||||
begin
|
||||
Pipeline.track_file Source.binary
|
||||
>>> Yocaml_yaml.Pipeline.read_file_with_metadata (module Article) file
|
||||
>>* (fun (obj, str) -> Eff.return (obj#with_host host, str))
|
||||
>>> Yocaml_cmarkit.content_to_html ~strict:false ()
|
||||
>>> Yocaml_jingoo.Pipeline.as_template
|
||||
(module Article)
|
||||
(Source.template "article.html")
|
||||
>>> Yocaml_jingoo.Pipeline.as_template
|
||||
(module Article)
|
||||
(Source.template "layout.html")
|
||||
>>> drop_first ()
|
||||
end
|
||||
|
||||
let process_articles ~host =
|
||||
Action.batch ~only:`Files ~where:(Path.has_extension "md") Source.articles
|
||||
(process_article ~host)
|
||||
|
||||
let process_index ~host =
|
||||
let file = Source.index in
|
||||
let file_target = Target.(as_html pages file) in
|
||||
|
||||
let open Task in
|
||||
let compute_index =
|
||||
Articles.compute_index
|
||||
(module Yocaml_yaml)
|
||||
~where:(Path.has_extension "md")
|
||||
~compute_link:(Target.as_html @@ Path.abs [ "articles" ])
|
||||
Source.articles
|
||||
in
|
||||
|
||||
Action.write_static_file file_target
|
||||
begin
|
||||
Pipeline.track_files [ Source.binary; Source.articles ]
|
||||
>>> Yocaml_yaml.Pipeline.read_file_with_metadata (module Page) file
|
||||
>>> Yocaml_cmarkit.content_to_html ~strict:false ()
|
||||
>>> first compute_index
|
||||
>>* (fun (obj, str) -> Eff.return (obj#with_host host, str))
|
||||
>>> Yocaml_jingoo.Pipeline.as_template ~strict:true
|
||||
(module Articles)
|
||||
(Source.template "index.html")
|
||||
>>> Yocaml_jingoo.Pipeline.as_template ~strict:true
|
||||
(module Articles)
|
||||
(Source.template "layout.html")
|
||||
>>> drop_first ()
|
||||
end
|
||||
|
||||
let process_tags ~host =
|
||||
let file = Source.tags in
|
||||
let file_target = Target.(as_html pages file) in
|
||||
|
||||
let open Task in
|
||||
let compute_index =
|
||||
Articles.compute_index
|
||||
(module Yocaml_yaml)
|
||||
~where:(Path.has_extension "md")
|
||||
~compute_link:(Target.as_html @@ Path.abs [ "articles" ])
|
||||
Source.articles
|
||||
in
|
||||
|
||||
Action.write_static_file file_target
|
||||
begin
|
||||
Pipeline.track_files [ Source.binary; Source.articles ]
|
||||
>>> Yocaml_yaml.Pipeline.read_file_with_metadata (module Page) file
|
||||
>>> Yocaml_cmarkit.content_to_html ~strict:false ()
|
||||
>>> first compute_index
|
||||
>>* (fun (obj, str) -> Eff.return (Tags.of_articles (obj#with_host host), str))
|
||||
>>> Yocaml_jingoo.Pipeline.as_template ~strict:true
|
||||
(module Tags)
|
||||
(Source.template "tags.html")
|
||||
>>> Yocaml_jingoo.Pipeline.as_template ~strict:true
|
||||
(module Tags)
|
||||
(Source.template "layout.html")
|
||||
>>> drop_first ()
|
||||
end
|
||||
|
||||
let feed_title = "The Robur's blog"
|
||||
let site_url = "https://blog.robur.coop"
|
||||
let feed_description = "The Robur cooperative blog"
|
||||
|
||||
let fetch_articles =
|
||||
let open Task in
|
||||
Pipeline.track_files [ Source.binary; Source.articles ]
|
||||
>>> Articles.fetch
|
||||
(module Yocaml_yaml)
|
||||
~where:(Path.has_extension "md")
|
||||
~compute_link:(Target.as_html @@ Path.abs [ "articles" ])
|
||||
Source.articles
|
||||
|
||||
let rss2 =
|
||||
let open Task in
|
||||
let from_articles ~title ~site_url ~description ~feed_url () =
|
||||
let open Yocaml_syndication in
|
||||
lift
|
||||
begin
|
||||
fun articles ->
|
||||
let last_build_date =
|
||||
List.fold_left
|
||||
begin
|
||||
fun acc (_, elt) ->
|
||||
let v = Date.to_archetype_date_time (Article.date elt) in
|
||||
match acc with
|
||||
| None -> Some v
|
||||
| Some a ->
|
||||
if Archetype.Datetime.compare a v > 0 then Some a
|
||||
else Some v
|
||||
end
|
||||
None articles
|
||||
|> Option.map Datetime.make
|
||||
in
|
||||
let feed =
|
||||
Rss2.feed ?last_build_date ~title ~link:site_url ~url:feed_url
|
||||
~description
|
||||
begin
|
||||
fun (path, article) ->
|
||||
let title = Article.title article in
|
||||
let link = site_url ^ Path.to_string path in
|
||||
let guid = Rss2.guid_from_link in
|
||||
let description = Article.description article in
|
||||
let pub_date =
|
||||
Datetime.make
|
||||
(Date.to_archetype_date_time (Article.date article))
|
||||
in
|
||||
Rss2.item ~title ~link ~guid ~description ~pub_date ()
|
||||
end
|
||||
articles
|
||||
in
|
||||
Xml.to_string feed
|
||||
end
|
||||
in
|
||||
Action.write_static_file Target.rss2
|
||||
begin
|
||||
fetch_articles
|
||||
>>> from_articles ~title:feed_title ~site_url
|
||||
~description:feed_description
|
||||
~feed_url:"https://blog.robur.coop/feed.xml" ()
|
||||
end
|
||||
|
||||
let process_all ~host =
|
||||
let open Eff in
|
||||
Action.restore_cache ~on:`Source Source.cache
|
||||
>>= process_css_files >>= process_js_files >>= process_images_files
|
||||
>>= process_tags ~host
|
||||
>>= process_articles ~host >>= process_index ~host >>= rss2
|
||||
>>= Action.store_cache ~on:`Source Source.cache
|
||||
end
|
||||
|
||||
module Make (S : sig
|
||||
val source : Path.t
|
||||
end) =
|
||||
Make_with_target (struct
|
||||
include S
|
||||
|
||||
let target = Path.(source / "_site")
|
||||
end)
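`Date.day_of_week` in `bin/blog.ml` computes the weekday by hand, using the century/month-code ("key value") method rather than a library call. Below is a standalone sketch of the same arithmetic. To keep it self-contained, it assumes plain integers for months (1 = January) instead of the `Date.month` variant. You can check it against the `pubDate` values in `feed.xml`.

```ocaml
(* Sketch of the arithmetic in Date.day_of_week (bin/blog.ml).
   Assumption: months are plain ints 1..12 instead of the Date.month variant. *)

let is_leap year =
  if year mod 100 = 0 then year mod 400 = 0 else year mod 4 = 0

(* Month codes for Jan..Dec, same table as Date.month_value. *)
let month_codes = [| 0; 3; 3; 6; 1; 4; 6; 2; 5; 0; 3; 5 |]

let day_of_week ~year ~month ~day =
  let yy = year mod 100 in
  let cc = (year - yy) / 100 in
  (* Gregorian century codes repeat every 400 years:
     2000s -> 6, 1700s -> 4, 1800s -> 2, 1900s -> 0. *)
  let c_code = [| 6; 4; 2; 0 |].(cc mod 4) in
  let y_code = (yy + (yy / 4)) mod 7 in
  let m_code =
    let v = month_codes.(month - 1) in
    (* January and February of a leap year are shifted back by one day. *)
    if is_leap year && month <= 2 then v - 1 else v
  in
  (* Index 0 is Sunday, matching the array used by Date.day_of_week. *)
  [| "Sun"; "Mon"; "Tue"; "Wed"; "Thu"; "Fri"; "Sat" |].(
    (c_code + y_code + m_code + day) mod 7)

let () =
  (* The DNSvizor entry in feed.xml is dated "Fri, 25 Oct 2024". *)
  print_endline (day_of_week ~year:2024 ~month:10 ~day:25)
```

The sum cannot go negative. Even with the leap-year adjustment, `day >= 1` compensates for the `-1`, so a plain `mod 7` is enough.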
|
14
bin/blog.mli
Normal file
|
@ -0,0 +1,14 @@
|
|||
module Make_with_target (_ : sig
|
||||
val source : Yocaml.Path.t
|
||||
val target : Yocaml.Path.t
|
||||
end) : sig
|
||||
val target : Yocaml.Path.t
|
||||
val process_all : host:string -> unit Yocaml.Eff.t
|
||||
end
|
||||
|
||||
module Make (_ : sig
|
||||
val source : Yocaml.Path.t
|
||||
end) : sig
|
||||
val target : Yocaml.Path.t
|
||||
val process_all : host:string -> unit Yocaml.Eff.t
|
||||
end
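The `Tags#tags` method in `bin/blog.ml` builds the tag index by folding every article into a `Map.Make(String)` with `SM.update`. Below is a minimal sketch of that grouping step. It assumes articles reduced to hypothetical `(url, tags)` pairs. Because `SM.update` prepends, the lists accumulated in the map are in the reverse of the input order. The sketch returns them as accumulated, and `SM.bindings` yields the tags in alphabetical order.

```ocaml
(* Sketch of the grouping in Tags#tags (bin/blog.ml).
   Assumption: an article is reduced to a (url, tags) pair; names are hypothetical. *)
module SM = Map.Make (String)

let group_by_tag articles =
  let add article sm tag =
    SM.update tag
      (function
        | None -> Some [ article ]
        | Some others -> Some (article :: others)) (* prepend *)
      sm
  in
  List.fold_left
    (fun sm ((_url, tags) as article) -> List.fold_left (add article) sm tags)
    SM.empty articles
  (* SM.bindings lists the tags in increasing key order. *)
  |> SM.bindings
  |> List.map (fun (tag, arts) -> (tag, List.map fst arts))

let () =
  (* Input is newest first, as Articles.sort_by_date produces by default. *)
  let articles =
    [ ("/articles/c.html", [ "OCaml"; "VPN" ])
    ; ("/articles/b.html", [ "VPN" ])
    ; ("/articles/a.html", [ "OCaml" ]) ]
  in
  List.iter
    (fun (tag, urls) -> Printf.printf "%s: %s\n" tag (String.concat ", " urls))
    (group_by_tag articles)
```

If the page should list each tag's articles newest first, like the index, the accumulated lists need a `List.rev` before rendering.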
|
24
bin/dune
Normal file
|
@ -0,0 +1,24 @@
|
|||
(executable
|
||||
(name watch)
|
||||
(libraries
|
||||
yocaml
|
||||
yocaml_syndication
|
||||
yocaml_yaml
|
||||
yocaml_jingoo
|
||||
yocaml_cmarkit
|
||||
yocaml_unix))
|
||||
|
||||
(executable
|
||||
(name push)
|
||||
(libraries
|
||||
fmt.tty
|
||||
logs.fmt
|
||||
git-unix
|
||||
bos
|
||||
yocaml
|
||||
yocaml_git
|
||||
yocaml_syndication
|
||||
yocaml_yaml
|
||||
yocaml_jingoo
|
||||
yocaml_cmarkit
|
||||
yocaml_unix))
|
82
bin/push.ml
Normal file
|
@ -0,0 +1,82 @@
|
|||
let reporter ppf =
|
||||
let report src level ~over k msgf =
|
||||
let k _ =
|
||||
over ();
|
||||
k ()
|
||||
in
|
||||
let with_metadata header _tags k ppf fmt =
|
||||
Format.kfprintf k ppf
|
||||
("%a[%a]: " ^^ fmt ^^ "\n%!")
|
||||
Logs_fmt.pp_header (level, header)
|
||||
Fmt.(styled `Magenta string)
|
||||
(Logs.Src.name src)
|
||||
in
|
||||
msgf @@ fun ?header ?tags fmt -> with_metadata header tags k ppf fmt
|
||||
in
|
||||
{ Logs.report }
|
||||
|
||||
let run_git_rev_parse () =
|
||||
let open Bos in
|
||||
let value = OS.Cmd.run_out
|
||||
Cmd.(v "git" % "describe" % "--always" % "--dirty"
|
||||
% "--exclude=*" % "--abbrev=0")
|
||||
in
|
||||
match OS.Cmd.out_string value with
|
||||
| Ok (value, (_, `Exited 0)) -> Some value
|
||||
| Ok (value, (run_info, _)) ->
|
||||
Logs.warn (fun m -> m "Failed to get commit id: %a: %s"
|
||||
Cmd.pp (OS.Cmd.run_info_cmd run_info)
|
||||
value);
|
||||
None
|
||||
| Error `Msg e ->
|
||||
Logs.warn (fun m -> m "Failed to get commit id: %s" e);
|
||||
None
|
||||
|
||||
let message () =
|
||||
match run_git_rev_parse () with
|
||||
| Some hash -> Fmt.str "Pushed by YOCaml 2 from %s" hash
|
||||
| None -> Fmt.str "Pushed by YOCaml 2"
|
||||
|
||||
let () = Fmt_tty.setup_std_outputs ~style_renderer:`Ansi_tty ~utf_8:true ()
|
||||
let () = Logs.set_reporter (reporter Fmt.stdout)
|
||||
(* let () = Logs.set_level ~all:true (Some Logs.Debug) *)
|
||||
let author = ref "The Robur Team"
|
||||
let email = ref "team@robur.coop"
|
||||
let message = ref (message ())
|
||||
let remote = ref "git@git.robur.coop:robur/blog.robur.coop.git#gh-pages"
|
||||
let host = ref "https://blog.robur.coop"
|
||||
|
||||
module Source = Yocaml_git.From_identity (Yocaml_unix.Runtime)
|
||||
|
||||
let usage =
|
||||
Fmt.str
|
||||
"%s [--message <message>] [--author <author>] [--email <email>] [--host <host>] -r \
|
||||
<repository>#<branch>"
|
||||
Sys.argv.(0)
|
||||
|
||||
let specification =
|
||||
[
|
||||
("--message", Arg.Set_string message, "The commit message")
|
||||
; ("--email", Arg.Set_string email, "The email used to craft the commit")
|
||||
; ("-r", Arg.Set_string remote, "The Git repository including #branch, e.g. " ^ !remote)
|
||||
; ("--author", Arg.Set_string author, "The Git commit author")
|
||||
; ("--host", Arg.Set_string host, "The host where the blog is available")
|
||||
]
|
||||
|
||||
let () =
|
||||
Arg.parse specification ignore usage;
|
||||
let author = !author
|
||||
and email = !email
|
||||
and message = !message
|
||||
and remote = !remote in
|
||||
let module Blog = Blog.Make_with_target (struct
|
||||
let source = Yocaml.Path.rel []
|
||||
let target = Yocaml.Path.rel []
|
||||
end) in
|
||||
Yocaml_git.run
|
||||
(module Source)
|
||||
(module Pclock)
|
||||
~context:`SSH ~author ~email ~message ~remote
|
||||
(fun () -> Blog.process_all ~host:!host)
|
||||
|> Lwt_main.run
|
||||
|> Result.iter_error (fun (`Msg err) -> invalid_arg err)
|
15
bin/watch.ml
Normal file
|
@ -0,0 +1,15 @@
|
|||
let port = ref 8000
|
||||
let usage = Fmt.str "%s [--port <port>]" Sys.argv.(0)
|
||||
|
||||
let specification =
|
||||
[ ("--port", Arg.Set_int port, "The port where we serve the website") ]
|
||||
|
||||
module Dest = Blog.Make (struct
|
||||
let source = Yocaml.Path.rel []
|
||||
end)
|
||||
|
||||
let () =
|
||||
Arg.parse specification ignore usage;
|
||||
let host = Fmt.str "http://localhost:%d" !port in
|
||||
Yocaml_unix.serve ~level:`Info ~target:Dest.target ~port:!port
|
||||
@@ fun () -> Dest.process_all ~host
|
35
blogger.opam
Normal file
|
@ -0,0 +1,35 @@
|
|||
opam-version: "2.0"
|
||||
version: "dev"
|
||||
synopsis: "The source code of the generator and the content of my blog, naively using YOCaml"
|
||||
maintainer: "romain.calascibetta@gmail.com"
|
||||
authors: [ "The XHTMLBoy <xhtmlboi@gmail.com>" ]
|
||||
|
||||
build: [
|
||||
[ "dune" "subst" ] {dev}
|
||||
[ "dune" "build" "-p" name "-j" jobs ]
|
||||
[ "dune" "runtest" "-p" name ] {with-test}
|
||||
[ "dune" "build" "@doc" "-p" name ] {with-doc}
|
||||
]
|
||||
|
||||
license: "GPL-3.0-or-later"
|
||||
tags: [ "angry" "cuisine" "nerd" "ocaml" "preface" ]
|
||||
homepage: "https://github.com/dinosaure/blogger"
|
||||
dev-repo: "git+https://github.com/dinosaure/blogger.git"
|
||||
bug-reports: "https://github.com/dinosaure/blogger/issues"
|
||||
|
||||
depends: [
|
||||
"ocaml" { >= "5.1.0" }
|
||||
"dune" { >= "3.16.0" }
|
||||
"preface" { >= "0.1.0" }
|
||||
"logs" {>= "0.7.0" }
|
||||
"cmdliner" { >= "1.0.0"}
|
||||
"http-lwt-client"
|
||||
"bos"
|
||||
"yocaml" {>= "2.0.1"}
|
||||
"yocaml_unix"
|
||||
"yocaml_yaml"
|
||||
"yocaml_cmarkit"
|
||||
"yocaml_git"
|
||||
"yocaml_jingoo"
|
||||
"yocaml_syndication"
|
||||
]
|
Binary file not shown.
2
dune-project
Normal file
|
@ -0,0 +1,2 @@
|
|||
(lang dune 3.16)
|
||||
(name blogger)
|
118
feed.xml
|
@ -1,118 +0,0 @@
|
|||
<?xml version="1.0" encoding="utf-8"?>
|
||||
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
|
||||
<channel>
|
||||
<title>The Robur's blog</title>
|
||||
<link>https://blog.robur.coop</link>
|
||||
<description><![CDATA[The Robur cooperative blog]]></description>
|
||||
<atom:link href="https://blog.robur.coop/feed.xml" rel="self" type="application/rss+xml"/>
|
||||
<lastBuildDate>Fri, 25 Oct 2024 00:00:00 GMT</lastBuildDate>
|
||||
<docs>https://www.rssboard.org/rss-specification</docs>
|
||||
<generator>YOCaml</generator>
|
||||
<item>
|
||||
<title>Meet DNSvizor: run your own DHCP and DNS MirageOS unikernel</title>
|
||||
<link>https://blog.robur.coop/articles/dnsvizor01.html</link>
|
||||
<description>
|
||||
<![CDATA[The NGI-funded DNSvizor provides core network services on your network; DNS resolution and DHCP.]]>
|
||||
</description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/dnsvizor01.html</guid>
|
||||
<pubDate>Fri, 25 Oct 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>Runtime arguments in MirageOS</title>
|
||||
<link>https://blog.robur.coop/articles/arguments.html</link>
|
||||
<description><![CDATA[The history of runtime arguments to a MirageOS unikernel]]></description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/arguments.html</guid>
|
||||
<pubDate>Tue, 22 Oct 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>How has robur financially been doing since 2018?</title>
|
||||
<link>https://blog.robur.coop/articles/finances.html</link>
|
||||
<description><![CDATA[How we organise as a collective, and why we're doing that.]]></description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/finances.html</guid>
|
||||
<pubDate>Mon, 21 Oct 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>MirageVPN and OpenVPN</title>
|
||||
<link>https://blog.robur.coop/articles/2024-08-21-OpenVPN-and-MirageVPN.html</link>
|
||||
<description>
|
||||
<![CDATA[Discoveries made implementing MirageVPN, an OpenVPN-compatible VPN library]]>
|
||||
</description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/2024-08-21-OpenVPN-and-MirageVPN.html</guid>
|
||||
<pubDate>Wed, 21 Aug 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>The new Tar release, a retrospective</title>
|
||||
<link>https://blog.robur.coop/articles/tar-release.html</link>
|
||||
<description><![CDATA[A little retrospective to the new Tar release and changes]]></description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/tar-release.html</guid>
|
||||
<pubDate>Thu, 15 Aug 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>qubes-miragevpn, a MirageVPN client for QubesOS</title>
|
||||
<link>https://blog.robur.coop/articles/qubes-miragevpn.html</link>
|
||||
<description><![CDATA[A new OpenVPN client for QubesOS]]></description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/qubes-miragevpn.html</guid>
|
||||
<pubDate>Mon, 24 Jun 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>MirageVPN server</title>
|
||||
<link>https://blog.robur.coop/articles/miragevpn-server.html</link>
|
||||
<description><![CDATA[Announcement of our MirageVPN server.]]></description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/miragevpn-server.html</guid>
|
||||
<pubDate>Mon, 17 Jun 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>Speeding up MirageVPN and use it in the wild</title>
|
||||
<link>https://blog.robur.coop/articles/miragevpn-performance.html</link>
|
||||
<description>
|
||||
<![CDATA[Performance engineering of MirageVPN, speeding it up by a factor of 25.]]>
|
||||
</description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/miragevpn-performance.html</guid>
|
||||
<pubDate>Tue, 16 Apr 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>GPTar</title>
|
||||
<link>https://blog.robur.coop/articles/gptar.html</link>
|
||||
<description><![CDATA[Hybrid GUID partition table and tar archive]]></description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/gptar.html</guid>
|
||||
<pubDate>Wed, 21 Feb 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>Speeding elliptic curve cryptography</title>
|
||||
<link>https://blog.robur.coop/articles/speeding-ec-string.html</link>
|
||||
<description>
|
||||
<![CDATA[How we improved the performance of elliptic curves by only modifying the underlying byte array]]>
|
||||
</description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/speeding-ec-string.html</guid>
|
||||
<pubDate>Tue, 13 Feb 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>Cooperation and Lwt.pause</title>
|
||||
<link>https://blog.robur.coop/articles/lwt_pause.html</link>
|
||||
<description><![CDATA[A digression about Lwt and Miou]]></description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/lwt_pause.html</guid>
|
||||
<pubDate>Sun, 11 Feb 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>Python's `str.__repr__()`</title>
|
||||
<link>https://blog.robur.coop/articles/2024-02-03-python-str-repr.html</link>
|
||||
<description><![CDATA[Reimplementing Python string escaping in OCaml]]></description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/2024-02-03-python-str-repr.html</guid>
|
||||
<pubDate>Sat, 03 Feb 2024 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>MirageVPN updated (AEAD, NCP)</title>
|
||||
<link>https://blog.robur.coop/articles/miragevpn-ncp.html</link>
|
||||
<description><![CDATA[How we resurrected MirageVPN from its bitrot state]]></description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/miragevpn-ncp.html</guid>
|
||||
<pubDate>Mon, 20 Nov 2023 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
<item>
|
||||
<title>MirageVPN & tls-crypt-v2</title>
|
||||
<link>https://blog.robur.coop/articles/miragevpn.html</link>
|
||||
<description><![CDATA[How we implemented tls-crypt-v2 for MirageVPN]]></description>
|
||||
<guid isPermaLink="true">https://blog.robur.coop/articles/miragevpn.html</guid>
|
||||
<pubDate>Tue, 14 Nov 2023 00:00:00 GMT</pubDate>
|
||||
</item>
|
||||
</channel>
|
||||
</rss>
|
BIN
images/smtp.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 128 KiB |
218
index.html
|
@ -1,218 +0,0 @@
|
|||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="x-ua-compatible" content="ie=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<title>
|
||||
Robur's blog - Index
|
||||
</title>
|
||||
<meta name="description" content="The famous root of the website">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/hl.css">
|
||||
<link type="text/css" rel="stylesheet" href="https://blog.robur.coop/css/style.css">
|
||||
<script src="https://blog.robur.coop/js/hl.js"></script>
|
||||
<link rel="alternate" type="application/rss+xml" href="https://blog.robur.coop/feed.xml" title="blog.robur.coop">
|
||||
</head>
|
||||
<body>
|
||||
<header>
|
||||
<h1>blog.robur.coop</h1>
|
||||
<blockquote>
|
||||
The <strong>Robur</strong> cooperative blog.
|
||||
</blockquote>
|
||||
</header>
|
||||
<main><a class="small-button rss" href="https://blog.robur.coop/feed.xml">RSS</a><p>The Robur blog.</p>
|
||||
|
||||
<h3>Essays and ramblings</h3>
|
||||
|
||||
<ol reversed class="list-articles"><li>
|
||||
<div class="side">
|
||||
<a href="https://hannes.robur.coop">
|
||||
<img src="https://www.gravatar.com/avatar/25558b4457cf73159f5427fdf2b4a717">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-10-25</span>
|
||||
<a href="https://blog.robur.coop/articles/dnsvizor01.html">Meet DNSvizor: run your own DHCP and DNS MirageOS unikernel</a><br />
|
||||
<p>The NGI-funded DNSvizor provides core network services on your network: DNS resolution and DHCP.</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-DNSvizor">DNSvizor</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://hannes.robur.coop">
|
||||
<img src="https://www.gravatar.com/avatar/25558b4457cf73159f5427fdf2b4a717">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-10-22</span>
|
||||
<a href="https://blog.robur.coop/articles/arguments.html">Runtime arguments in MirageOS</a><br />
|
||||
<p>The history of runtime arguments to a MirageOS unikernel</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://hannes.robur.coop">
|
||||
<img src="https://www.gravatar.com/avatar/25558b4457cf73159f5427fdf2b4a717">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-10-21</span>
|
||||
<a href="https://blog.robur.coop/articles/finances.html">How has robur financially been doing since 2018?</a><br />
|
||||
<p>How we organise as a collective, and why we're doing that.</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-finances">finances</a></li><li><a href="https://blog.robur.coop/tags.html#tag-cooperative">cooperative</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://reyn.ir/">
|
||||
<img src="https://www.gravatar.com/avatar/54a15736b37879bc9708c1618a7cc130">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-08-21</span>
|
||||
<a href="https://blog.robur.coop/articles/2024-08-21-OpenVPN-and-MirageVPN.html">MirageVPN and OpenVPN</a><br />
|
||||
<p>Discoveries made implementing MirageVPN, an OpenVPN-compatible VPN library</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-MirageVPN">MirageVPN</a></li><li><a href="https://blog.robur.coop/tags.html#tag-OpenVPN">OpenVPN</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://blog.osau.re">
|
||||
<img src="https://www.gravatar.com/avatar/e243d18f97471424ca390e85820797ac">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-08-15</span>
|
||||
<a href="https://blog.robur.coop/articles/tar-release.html">The new Tar release, a retrospective</a><br />
|
||||
<p>A short retrospective on the new Tar release and its changes</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-Cstruct">Cstruct</a></li><li><a href="https://blog.robur.coop/tags.html#tag-functors">functors</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://blog.osau.re/">
|
||||
<img src="https://www.gravatar.com/avatar/e243d18f97471424ca390e85820797ac">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-06-24</span>
|
||||
<a href="https://blog.robur.coop/articles/qubes-miragevpn.html">qubes-miragevpn, a MirageVPN client for QubesOS</a><br />
|
||||
<p>A new OpenVPN client for QubesOS</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-vpn">vpn</a></li><li><a href="https://blog.robur.coop/tags.html#tag-unikernel">unikernel</a></li><li><a href="https://blog.robur.coop/tags.html#tag-QubesOS">QubesOS</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://hannes.robur.coop">
|
||||
<img src="https://www.gravatar.com/avatar/25558b4457cf73159f5427fdf2b4a717">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-06-17</span>
|
||||
<a href="https://blog.robur.coop/articles/miragevpn-server.html">MirageVPN server</a><br />
|
||||
<p>Announcement of our MirageVPN server.</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-cryptography">cryptography</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li><li><a href="https://blog.robur.coop/tags.html#tag-VPN">VPN</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://hannes.robur.coop">
|
||||
<img src="https://www.gravatar.com/avatar/25558b4457cf73159f5427fdf2b4a717">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-04-16</span>
|
||||
<a href="https://blog.robur.coop/articles/miragevpn-performance.html">Speeding up MirageVPN and use it in the wild</a><br />
|
||||
<p>Performance engineering of MirageVPN, speeding it up by a factor of 25.</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-cryptography">cryptography</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li><li><a href="https://blog.robur.coop/tags.html#tag-VPN">VPN</a></li><li><a href="https://blog.robur.coop/tags.html#tag-performance">performance</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://reyn.ir/">
|
||||
<img src="https://www.gravatar.com/avatar/54a15736b37879bc9708c1618a7cc130">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-02-21</span>
|
||||
<a href="https://blog.robur.coop/articles/gptar.html">GPTar</a><br />
|
||||
<p>Hybrid GUID partition table and tar archive</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-gpt">gpt</a></li><li><a href="https://blog.robur.coop/tags.html#tag-tar">tar</a></li><li><a href="https://blog.robur.coop/tags.html#tag-mbr">mbr</a></li><li><a href="https://blog.robur.coop/tags.html#tag-persistent storage">persistent storage</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://hannes.robur.coop">
|
||||
<img src="https://www.gravatar.com/avatar/25558b4457cf73159f5427fdf2b4a717">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-02-13</span>
|
||||
<a href="https://blog.robur.coop/articles/speeding-ec-string.html">Speeding elliptic curve cryptography</a><br />
|
||||
<p>How we improved the performance of elliptic curves by only modifying the underlying byte array</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-cryptography">cryptography</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://blog.osau.re/">
|
||||
<img src="https://www.gravatar.com/avatar/e243d18f97471424ca390e85820797ac">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-02-11</span>
|
||||
<a href="https://blog.robur.coop/articles/lwt_pause.html">Cooperation and Lwt.pause</a><br />
|
||||
<p>A digression about Lwt and Miou</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-Scheduler">Scheduler</a></li><li><a href="https://blog.robur.coop/tags.html#tag-Community">Community</a></li><li><a href="https://blog.robur.coop/tags.html#tag-Unikernel">Unikernel</a></li><li><a href="https://blog.robur.coop/tags.html#tag-Git">Git</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://reyn.ir/">
|
||||
<img src="https://www.gravatar.com/avatar/54a15736b37879bc9708c1618a7cc130">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2024-02-03</span>
|
||||
<a href="https://blog.robur.coop/articles/2024-02-03-python-str-repr.html">Python's `str.__repr__()`</a><br />
|
||||
<p>Reimplementing Python string escaping in OCaml</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-Python">Python</a></li><li><a href="https://blog.robur.coop/tags.html#tag-unicode">unicode</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://hannes.robur.coop">
|
||||
<img src="https://www.gravatar.com/avatar/25558b4457cf73159f5427fdf2b4a717">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2023-11-20</span>
|
||||
<a href="https://blog.robur.coop/articles/miragevpn-ncp.html">MirageVPN updated (AEAD, NCP)</a><br />
|
||||
<p>How we resurrected MirageVPN from its bitrot state</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-VPN">VPN</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li><li>
|
||||
<div class="side">
|
||||
<a href="https://reyn.ir/">
|
||||
<img src="https://www.gravatar.com/avatar/54a15736b37879bc9708c1618a7cc130">
|
||||
</a></div>
|
||||
<div class="content">
|
||||
<span class="date">2023-11-14</span>
|
||||
<a href="https://blog.robur.coop/articles/miragevpn.html">MirageVPN & tls-crypt-v2</a><br />
|
||||
<p>How we implemented tls-crypt-v2 for MirageVPN</p>
|
||||
<div class="bottom">
|
||||
<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-MirageOS">MirageOS</a></li><li><a href="https://blog.robur.coop/tags.html#tag-VPN">VPN</a></li><li><a href="https://blog.robur.coop/tags.html#tag-security">security</a></li></ul>
|
||||
</div>
|
||||
</div>
|
||||
</li></ol>
|
||||
|
||||
</main>
|
||||
<footer>
|
||||
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
|
||||
<br />
|
||||
</footer>
|
||||
<script>hljs.highlightAll();</script>
|
||||
</body>
|
||||
</html>
|
6
pages/index.md
Normal file
|
@ -0,0 +1,6 @@
|
|||
---
|
||||
title: Index
|
||||
description: The famous root of the website
|
||||
---
|
||||
|
||||
The Robur blog.
|
0
pages/tags.md
Normal file
104
rss1.xml
|
@ -1,104 +0,0 @@
|
|||
<?xml version="1.0" encoding="utf-8"?>
|
||||
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
|
||||
<channel rdf:about="https://blog.robur.coop/rss1.xml">
|
||||
<title>The Robur's blog</title>
|
||||
<link>https://blog.robur.coop</link>
|
||||
<description><![CDATA[The Robur cooperative blog]]></description>
|
||||
<items>
|
||||
<rdf:Seq>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/dnsvizor01.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/arguments.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/finances.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/2024-08-21-OpenVPN-and-MirageVPN.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/tar-release.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/qubes-miragevpn.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/miragevpn-server.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/miragevpn-performance.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/gptar.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/speeding-ec-string.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/lwt_pause.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/2024-02-03-python-str-repr.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/miragevpn-ncp.html"/>
|
||||
<rdf:li resource="https://blog.robur.coop/articles/miragevpn.html"/>
|
||||
</rdf:Seq>
|
||||
</items>
|
||||
</channel>
|
||||
<item rdf:about="https://blog.robur.coop/articles/dnsvizor01.html">
|
||||
<title>Meet DNSvizor: run your own DHCP and DNS MirageOS unikernel</title>
|
||||
<link>https://blog.robur.coop/articles/dnsvizor01.html</link>
|
||||
<description>
|
||||
<![CDATA[The NGI-funded DNSvizor provides core network services on your network: DNS resolution and DHCP.]]>
|
||||
</description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/arguments.html">
|
||||
<title>Runtime arguments in MirageOS</title>
|
||||
<link>https://blog.robur.coop/articles/arguments.html</link>
|
||||
<description><![CDATA[The history of runtime arguments to a MirageOS unikernel]]></description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/finances.html">
|
||||
<title>How has robur financially been doing since 2018?</title>
|
||||
<link>https://blog.robur.coop/articles/finances.html</link>
|
||||
<description><![CDATA[How we organise as a collective, and why we're doing that.]]></description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/2024-08-21-OpenVPN-and-MirageVPN.html">
|
||||
<title>MirageVPN and OpenVPN</title>
|
||||
<link>https://blog.robur.coop/articles/2024-08-21-OpenVPN-and-MirageVPN.html</link>
|
||||
<description>
|
||||
<![CDATA[Discoveries made implementing MirageVPN, an OpenVPN-compatible VPN library]]>
|
||||
</description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/tar-release.html">
|
||||
<title>The new Tar release, a retrospective</title>
|
||||
<link>https://blog.robur.coop/articles/tar-release.html</link>
|
||||
<description><![CDATA[A short retrospective on the new Tar release and its changes]]></description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/qubes-miragevpn.html">
|
||||
<title>qubes-miragevpn, a MirageVPN client for QubesOS</title>
|
||||
<link>https://blog.robur.coop/articles/qubes-miragevpn.html</link>
|
||||
<description><![CDATA[A new OpenVPN client for QubesOS]]></description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/miragevpn-server.html">
|
||||
<title>MirageVPN server</title>
|
||||
<link>https://blog.robur.coop/articles/miragevpn-server.html</link>
|
||||
<description><![CDATA[Announcement of our MirageVPN server.]]></description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/miragevpn-performance.html">
|
||||
<title>Speeding up MirageVPN and use it in the wild</title>
|
||||
<link>https://blog.robur.coop/articles/miragevpn-performance.html</link>
|
||||
<description>
|
||||
<![CDATA[Performance engineering of MirageVPN, speeding it up by a factor of 25.]]>
|
||||
</description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/gptar.html">
|
||||
<title>GPTar</title>
|
||||
<link>https://blog.robur.coop/articles/gptar.html</link>
|
||||
<description><![CDATA[Hybrid GUID partition table and tar archive]]></description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/speeding-ec-string.html">
|
||||
<title>Speeding elliptic curve cryptography</title>
|
||||
<link>https://blog.robur.coop/articles/speeding-ec-string.html</link>
|
||||
<description>
|
||||
<![CDATA[How we improved the performance of elliptic curves by only modifying the underlying byte array]]>
|
||||
</description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/lwt_pause.html">
|
||||
<title>Cooperation and Lwt.pause</title>
|
||||
<link>https://blog.robur.coop/articles/lwt_pause.html</link>
|
||||
<description><![CDATA[A digression about Lwt and Miou]]></description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/2024-02-03-python-str-repr.html">
|
||||
<title>Python's `str.__repr__()`</title>
|
||||
<link>https://blog.robur.coop/articles/2024-02-03-python-str-repr.html</link>
|
||||
<description><![CDATA[Reimplementing Python string escaping in OCaml]]></description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/miragevpn-ncp.html">
|
||||
<title>MirageVPN updated (AEAD, NCP)</title>
|
||||
<link>https://blog.robur.coop/articles/miragevpn-ncp.html</link>
|
||||
<description><![CDATA[How we resurrected MirageVPN from its bitrot state]]></description>
|
||||
</item>
|
||||
<item rdf:about="https://blog.robur.coop/articles/miragevpn.html">
|
||||
<title>MirageVPN & tls-crypt-v2</title>
|
||||
<link>https://blog.robur.coop/articles/miragevpn.html</link>
|
||||
<description><![CDATA[How we implemented tls-crypt-v2 for MirageVPN]]></description>
|
||||
</item>
|
||||
</rdf:RDF>
|
163
tags.html
|
@ -1,163 +0,0 @@
|
|||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta http-equiv="x-ua-compatible" content="ie=edge">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||
<title>
|
||||
Robur's blog
|
||||
</title>
|
||||
<meta name="description" content="">
|
||||
<link type="text/css" rel="stylesheet" href="/css/hl.css">
|
||||
<link type="text/css" rel="stylesheet" href="/css/style.css">
|
||||
<script src="/js/hl.js"></script>
|
||||
<link rel="alternate" type="application/rss+xml" href="/feed.xml" title="blog.robur.coop">
|
||||
</head>
|
||||
<body>
|
||||
<header>
|
||||
<h1>blog.robur.coop</h1>
|
||||
<blockquote>
|
||||
The <strong>Robur</strong> cooperative blog.
|
||||
</blockquote>
|
||||
</header>
|
||||
<main><a href="/index.html">Back to index</a>
|
||||
|
||||
<ul class="tags-list aeration"><li><a href="#tag-Community">Community</a></li><li><a href="#tag-Cstruct">Cstruct</a></li><li><a href="#tag-DNSvizor">DNSvizor</a></li><li><a href="#tag-Git">Git</a></li><li><a href="#tag-MirageOS">MirageOS</a></li><li><a href="#tag-MirageVPN">MirageVPN</a></li><li><a href="#tag-OCaml">OCaml</a></li><li><a href="#tag-OpenVPN">OpenVPN</a></li><li><a href="#tag-Python">Python</a></li><li><a href="#tag-QubesOS">QubesOS</a></li><li><a href="#tag-Scheduler">Scheduler</a></li><li><a href="#tag-Unikernel">Unikernel</a></li><li><a href="#tag-VPN">VPN</a></li><li><a href="#tag-cooperative">cooperative</a></li><li><a href="#tag-cryptography">cryptography</a></li><li><a href="#tag-finances">finances</a></li><li><a href="#tag-functors">functors</a></li><li><a href="#tag-gpt">gpt</a></li><li><a href="#tag-mbr">mbr</a></li><li><a href="#tag-performance">performance</a></li><li><a href="#tag-persistent storage">persistent storage</a></li><li><a href="#tag-security">security</a></li><li><a href="#tag-tar">tar</a></li><li><a href="#tag-unicode">unicode</a></li><li><a href="#tag-unikernel">unikernel</a></li><li><a href="#tag-vpn">vpn</a></li></ul><div class="tag-box" id="tag-Community">
|
||||
<h3>
|
||||
<span>Community</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/lwt_pause.html">Cooperation and Lwt.pause</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-Cstruct">
|
||||
<h3>
|
||||
<span>Cstruct</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/tar-release.html">The new Tar release, a retrospective</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-DNSvizor">
|
||||
<h3>
|
||||
<span>DNSvizor</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/dnsvizor01.html">Meet DNSvizor: run your own DHCP and DNS MirageOS unikernel</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-Git">
|
||||
<h3>
|
||||
<span>Git</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/lwt_pause.html">Cooperation and Lwt.pause</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-MirageOS">
|
||||
<h3>
|
||||
<span>MirageOS</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/miragevpn.html">MirageVPN & tls-crypt-v2</a></li><li><a href="/articles/miragevpn-ncp.html">MirageVPN updated (AEAD, NCP)</a></li><li><a href="/articles/speeding-ec-string.html">Speeding elliptic curve cryptography</a></li><li><a href="/articles/miragevpn-performance.html">Speeding up MirageVPN and use it in the wild</a></li><li><a href="/articles/miragevpn-server.html">MirageVPN server</a></li><li><a href="/articles/arguments.html">Runtime arguments in MirageOS</a></li><li><a href="/articles/dnsvizor01.html">Meet DNSvizor: run your own DHCP and DNS MirageOS unikernel</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-MirageVPN">
|
||||
<h3>
|
||||
<span>MirageVPN</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/2024-08-21-OpenVPN-and-MirageVPN.html">MirageVPN and OpenVPN</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-OCaml">
|
||||
<h3>
|
||||
<span>OCaml</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/miragevpn.html">MirageVPN & tls-crypt-v2</a></li><li><a href="/articles/miragevpn-ncp.html">MirageVPN updated (AEAD, NCP)</a></li><li><a href="/articles/2024-02-03-python-str-repr.html">Python's `str.__repr__()`</a></li><li><a href="/articles/lwt_pause.html">Cooperation and Lwt.pause</a></li><li><a href="/articles/speeding-ec-string.html">Speeding elliptic curve cryptography</a></li><li><a href="/articles/gptar.html">GPTar</a></li><li><a href="/articles/miragevpn-performance.html">Speeding up MirageVPN and use it in the wild</a></li><li><a href="/articles/miragevpn-server.html">MirageVPN server</a></li><li><a href="/articles/qubes-miragevpn.html">qubes-miragevpn, a MirageVPN client for QubesOS</a></li><li><a href="/articles/tar-release.html">The new Tar release, a retrospective</a></li><li><a href="/articles/arguments.html">Runtime arguments in MirageOS</a></li><li><a href="/articles/dnsvizor01.html">Meet DNSvizor: run your own DHCP and DNS MirageOS unikernel</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-OpenVPN">
|
||||
<h3>
|
||||
<span>OpenVPN</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/2024-08-21-OpenVPN-and-MirageVPN.html">MirageVPN and OpenVPN</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-Python">
|
||||
<h3>
|
||||
<span>Python</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/2024-02-03-python-str-repr.html">Python's `str.__repr__()`</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-QubesOS">
|
||||
<h3>
|
||||
<span>QubesOS</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/qubes-miragevpn.html">qubes-miragevpn, a MirageVPN client for QubesOS</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-Scheduler">
|
||||
<h3>
|
||||
<span>Scheduler</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/lwt_pause.html">Cooperation and Lwt.pause</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-Unikernel">
|
||||
<h3>
|
||||
<span>Unikernel</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/lwt_pause.html">Cooperation and Lwt.pause</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-VPN">
|
||||
<h3>
|
||||
<span>VPN</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/miragevpn.html">MirageVPN & tls-crypt-v2</a></li><li><a href="/articles/miragevpn-ncp.html">MirageVPN updated (AEAD, NCP)</a></li><li><a href="/articles/miragevpn-performance.html">Speeding up MirageVPN and use it in the wild</a></li><li><a href="/articles/miragevpn-server.html">MirageVPN server</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-cooperative">
|
||||
<h3>
|
||||
<span>cooperative</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/finances.html">How has robur financially been doing since 2018?</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-cryptography">
|
||||
<h3>
|
||||
<span>cryptography</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/speeding-ec-string.html">Speeding elliptic curve cryptography</a></li><li><a href="/articles/miragevpn-performance.html">Speeding up MirageVPN and use it in the wild</a></li><li><a href="/articles/miragevpn-server.html">MirageVPN server</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-finances">
|
||||
<h3>
|
||||
<span>finances</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/finances.html">How has robur financially been doing since 2018?</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-functors">
|
||||
<h3>
|
||||
<span>functors</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/tar-release.html">The new Tar release, a retrospective</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-gpt">
|
||||
<h3>
|
||||
<span>gpt</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/gptar.html">GPTar</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-mbr">
|
||||
<h3>
|
||||
<span>mbr</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/gptar.html">GPTar</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-performance">
|
||||
<h3>
|
||||
<span>performance</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/miragevpn-performance.html">Speeding up MirageVPN and use it in the wild</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-persistent storage">
|
||||
<h3>
|
||||
<span>persistent storage</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/gptar.html">GPTar</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-security">
|
||||
<h3>
|
||||
<span>security</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/miragevpn.html">MirageVPN & tls-crypt-v2</a></li><li><a href="/articles/miragevpn-ncp.html">MirageVPN updated (AEAD, NCP)</a></li><li><a href="/articles/speeding-ec-string.html">Speeding elliptic curve cryptography</a></li><li><a href="/articles/miragevpn-performance.html">Speeding up MirageVPN and use it in the wild</a></li><li><a href="/articles/miragevpn-server.html">MirageVPN server</a></li><li><a href="/articles/2024-08-21-OpenVPN-and-MirageVPN.html">MirageVPN and OpenVPN</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-tar">
|
||||
<h3>
|
||||
<span>tar</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/gptar.html">GPTar</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-unicode">
|
||||
<h3>
|
||||
<span>unicode</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/2024-02-03-python-str-repr.html">Python's `str.__repr__()`</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-unikernel">
|
||||
<h3>
|
||||
<span>unikernel</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/qubes-miragevpn.html">qubes-miragevpn, a MirageVPN client for QubesOS</a></li></ul>
|
||||
</div><div class="tag-box" id="tag-vpn">
|
||||
<h3>
|
||||
<span>vpn</span>
|
||||
</h3>
|
||||
<ul><li><a href="/articles/qubes-miragevpn.html">qubes-miragevpn, a MirageVPN client for QubesOS</a></li></ul>
|
||||
</div>
|
||||
</main>
|
||||
<footer>
|
||||
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
|
||||
<br />
|
||||
</footer>
|
||||
<script>hljs.highlightAll();</script>
|
||||
</body>
|
||||
</html>
|
13
templates/article.html
Normal file
|
@ -0,0 +1,13 @@
|
|||
<a href="/index.html">Back to index</a>
|
||||
|
||||
<article>
|
||||
<h1>{{ title }}</h1>
|
||||
<ul class="tags-list">
|
||||
{%- for tag in tags -%}
|
||||
<li><a href="/tags.html#tag-{{ tag }}">{{ tag }}</a></li>
|
||||
{%- endfor -%}
|
||||
</ul>
|
||||
{%- autoescape false -%}
|
||||
{{ yocaml_body }}
|
||||
{% endautoescape -%}
|
||||
</article>
|
36
templates/index.html
Normal file
|
@ -0,0 +1,36 @@
|
|||
<a class="small-button rss" href="/feed.xml">RSS</a>
|
||||
|
||||
{%- autoescape false -%}
|
||||
{{ yocaml_body }}
|
||||
{% endautoescape -%}
|
||||
|
||||
<h3>Essays and ramblings</h3>
|
||||
|
||||
<ol reversed class="list-articles">
|
||||
{%- for article in articles -%}
|
||||
<li>
|
||||
<div class="side">
|
||||
<a href="{{ article.author.link }}">
|
||||
<img src="{{ article.author.avatar }}">
|
||||
</a>
|
||||
{%- for co_author in article.co_authors -%}
|
||||
<a href="{{ co_author.link }}">
|
||||
<img src="{{ co_author.avatar }}">
|
||||
</a>
{%- endfor -%}
</div>
<div class="content">
<span class="date">{{ article.date.human }}</span>
<a href="{{ article.url }}">{{ article.title }}</a><br />
<p>{{ article.description }}</p>
<div class="bottom">
<ul class="tags-list">
{%- for tag in article.tags -%}
<li><a href="/tags.html#tag-{{ tag }}">{{ tag }}</a></li>
{%- endfor -%}
</ul>
</div>
</div>
</li>
{%- endfor -%}
</ol>
34
templates/layout.html
Normal file
@ -0,0 +1,34 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>
Robur's blog - {{ title }}
</title>
<meta name="description" content="{{ description }}">
<link type="text/css" rel="stylesheet" href="/css/hl.css">
<link type="text/css" rel="stylesheet" href="/css/style.css">
<script src="/js/hl.js"></script>
<link rel="alternate" type="application/rss+xml" href="/feed.xml" title="blog.robur.coop">
</head>
<body>
<header>
<h1>blog.robur.coop</h1>
<blockquote>
The <strong>Robur</strong> cooperative blog.
</blockquote>
</header>
<main>
{%- autoescape false -%}
{{ yocaml_body }}
{% endautoescape -%}
</main>
<footer>
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
<br />
</footer>
<script>hljs.highlightAll();</script>
</body>
</html>
20
templates/tags.html
Normal file
@ -0,0 +1,20 @@
<a href="/index.html">Back to index</a>

<ul class="tags-list aeration">
{%- for tag in all_tags -%}
<li><a href="#tag-{{ tag.name }}">{{ tag.name }}</a></li>
{%- endfor -%}
</ul>

{%- for tag in all_tags -%}
<div class="tag-box" id="tag-{{ tag.name }}">
<h3>
<span>{{ tag.name }}</span>
</h3>
<ul>
{%- for article in tag.articles -%}
<li><a href="{{ article.url }}">{{ article.title }}</a></li>
{%- endfor -%}
</ul>
</div>
{%- endfor -%}
7
update.sh
Executable file
@ -0,0 +1,7 @@
#!/bin/sh

opam exec -- dune exec bin/push.exe -- \
-r git@git.robur.coop:robur/blog.robur.coop.git#gh-pages \
--host https://blog.robur.coop \
--name "The Robur team" \
--email team@robur.coop