RFC: docker for CI

hanwenn
Proposal: rather than using the patchy scripts for validating
LilyPond, we use docker images.

General idea
============

There is a script ("driver") that drives docker running on a dedicated
build machine ("host").

There are several images:

* The base dev image.

The base image is based on some stripped-down Linux distribution, with all
the devtools necessary for compiling LilyPond. In addition, it
contains a copy of ccache, and a git clone of the LilyPond source code.

* The base release image for a specific git commit.

The procedure to build it is as follows:

  * take the base dev image
  * fetch the git commit
  * runs (make ; make test-baseline)
  * runs (make dist-clean)

This saves the result as a docker image. The Docker image now contains
a clean lilypond tree, the C++ compilation results (in ccache), and a
test baseline.

The base release image is made at official LilyPond releases, or at
any release that has a new graphical regtest result
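
The procedure for the base release image could be driven by a script
along the following lines. This is a rough sketch, not a tested
implementation: the image names, the container name and the /lilypond
checkout path are invented for illustration, and the tree in the base
dev image is assumed to be configured already. The build steps are
chained with "&&" so that a failed step stops the chain. By default the
script only prints the docker commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: the image names, the container name and the /lilypond path
# are hypothetical; the tree in the base dev image is assumed configured.
BASE_DEV_IMAGE="lilypond-dev-base"
RELEASE_IMAGE="lilypond-release"

# Print commands by default; set DRY_RUN=0 to execute them for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# build_release_image COMMIT: build COMMIT and its test baseline inside a
# container, then save the container's file system as a new image.
build_release_image() {
    commit=$1
    container="release-build-$commit"
    steps="cd /lilypond && git fetch origin && git checkout $commit"
    steps="$steps && make && make test-baseline && make dist-clean"
    run docker run --name "$container" "$BASE_DEV_IMAGE" sh -c "$steps" || return 1
    run docker commit "$container" "$RELEASE_IMAGE:$commit" || return 1
    run docker rm "$container"
}

build_release_image abc1234
```

Because the ccache directory lives inside the container's file system,
"docker commit" captures it together with the clean tree and the test
baseline.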


CI: build binary
================

Given a proposed change (as git commit):

 * take base release image
 * run (make; make doc) >& log-file

On success, the driver saves the result as a docker image, tagged with the
commit sha1.

On failure, the driver uploads the last bit of the log-file to our code
review system.
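
The success/failure handling in the driver might look roughly like
this. The helper name run_logged and the TAIL_LINES variable are made up
for this sketch. In a real driver the command passed in would be the
docker invocation that runs the build, followed on success by a
"docker commit" tagged with the commit sha1. The sketch uses the
portable "> file 2>&1" redirection, since ">&" is not understood by
every /bin/sh.

```shell
#!/bin/sh
# Sketch only: run_logged and TAIL_LINES are names invented for this example.

# run_logged LOGFILE COMMAND...: run COMMAND with stdout and stderr going to
# LOGFILE. On failure, print the last lines of the log and return the
# command's exit status.
run_logged() {
    logfile=$1
    shift
    if "$@" > "$logfile" 2>&1; then
        echo "build succeeded"
    else
        rc=$?
        echo "build failed (exit status $rc); last lines of the log:"
        tail -n "${TAIL_LINES:-20}" "$logfile"
        return "$rc"
    fi
}

# Stand-in for the real build command, which would be a docker invocation.
log=$(mktemp)
run_logged "$log" sh -c 'echo "compiling"; echo "error: no such file" >&2; exit 2' ||
    echo "the driver would now post the lines above to code review"
rm -f "$log"
```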


CI: regtest
===========

Given a proposed change (as git commit)

  * take CI build image
  * run (make check >& log-file)
  * use a headless browser to take an image snapshot of the top of the
    regtest result page.


On success, the driver uploads the image snapshot to code review.

On failure, the driver uploads the last bit of the log-file to code review.
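
For the snapshot step, a headless Chromium can write a screenshot of a
page directly from the command line. The sketch below only prints the
command it would run. The binary name "chromium", the window size and
the location of the regtest page (out/test-results/index.html) are
assumptions that would need checking against the actual image.

```shell
#!/bin/sh
# Sketch only. Assumptions: the browser binary is called "chromium" and the
# regtest result page is out/test-results/index.html below the build tree.
SNAPSHOT_BROWSER="${SNAPSHOT_BROWSER:-chromium}"
RESULT_PAGE="${RESULT_PAGE:-out/test-results/index.html}"

# snapshot_command OUTPUT.png: print the command that renders the top of the
# regtest page into OUTPUT.png. The window size limits how much is captured.
snapshot_command() {
    printf '%s\n' "$SNAPSHOT_BROWSER --headless --screenshot=$1 --window-size=1280,2000 file://$PWD/$RESULT_PAGE"
}

snapshot_command regtest-top.png
```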


Considerations
==============

* Because the build happens inside a container, we can test multiple
  builds. We could build against guile 1.8 and 2.2 at the same time,
  for example

* Because the "build binary" step reuses CCache results, it can
  complete quickly.

* The regtest continues to be expensive to compute. In the future, I
  hope it would not need a human to run it or post results back into
  review, but it should probably still require a manual step in the
  review process to kick it off, e.g. a "Run-Regtest" +1 vote in Gerrit.

* For security, the host should use https://github.com/google/gvisor
  to avoid being hacked by malicious code in proposed changes.
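
With gVisor installed on the host and its "runsc" runtime registered
with the Docker daemon, selecting the sandbox comes down to passing
--runtime=runsc to "docker run". A minimal sketch that prints such an
invocation (the image name is made up):

```shell
#!/bin/sh
# Sketch only: assumes gVisor is installed on the host and its "runsc"
# runtime is registered with the Docker daemon. The image name is made up.

# sandboxed_command IMAGE COMMAND...: print a docker invocation that runs
# COMMAND inside IMAGE under the gVisor runtime instead of the default one.
sandboxed_command() {
    image=$1
    shift
    printf '%s\n' "docker run --runtime=runsc $image $*"
}

sandboxed_command lilypond-ci:abc1234 make check
```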

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen

Re: RFC: docker for CI

Dan Eble
On Feb 7, 2020, at 07:21, Han-Wen Nienhuys <[hidden email]> wrote:
>
> contains a copy of ccache

ccache is interesting.  It speeds up recompiling files after a make clean, which is great if you often have to make clean because your makefile is broken, or if you often reconfigure your build options (e.g. debug v. release), but doesn't improve regular incremental builds.  There are some documented pitfalls that I can't remember off the top of my head.

At work in 2019, I added ccache to a Docker-based build at the request of one of the developers.  It worked fine when hosted on PCs but failed on the CI servers with symptoms that did not make obvious sense and which I couldn't justify spending time to investigate.  

That's my experience with ccache and Docker.  Cool, but watch out.

Dan



Re: RFC: docker for CI

Dan Eble
In reply to this post by hanwenn
On Feb 7, 2020, at 07:21, Han-Wen Nienhuys <[hidden email]> wrote:
>
>  * runs (make ; make test-baseline)

If this says "(make ;" because you think that "make test-baseline" requires a prior make, I think it is incorrect.  (If "make test-baseline" doesn't work on its own, it should be fixed.)

If this says "(make ;" because you want to build everything else that "make test-baseline" might not build, then it's fine, though maybe it should be "make && make test-baseline" or two separate steps, depending on how errors are detected and reported.
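
The difference in error reporting between the two forms can be seen
with stand-in commands, where "false" plays the part of a failing
"make" and "true" the part of "make test-baseline":

```shell
#!/bin/sh
# "false" stands in for a failing "make"; "true" for "make test-baseline".

# With ';' the second command runs anyway and its status hides the failure.
if ( false ; true ); then
    echo "';'  reports success even though the first command failed"
fi

# With '&&' the chain stops at the first failure and reports it.
if ( false && true ); then
    echo "'&&' reports success"
else
    echo "'&&' reports failure as soon as the first command fails"
fi
```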

Dan



Re: RFC: docker for CI

Werner LEMBERG
In reply to this post by hanwenn

> * Because the build happens inside a container, we can test multiple
>   builds. We could build against guile 1.8 and 2.2 at the same time,
>   for example.

I have zero experience with docker, but your suggestions sound quite
interesting!

Regarding image comparison: The Chromium team uses FreeType HEAD for
building; they also have collected a large set of graphical images
(mostly browser rendering snapshots) for comparison purposes.  If
something changes on the FreeType side, that is, if rendering
differences occur, their build process emits an alert, and manual
inspection takes place.

To me, this seems very similar to what LilyPond needs.  Since you are
at Google it probably makes sense to check what they have done to
automate everything.  My main contact person is Dominik Röttsches
<[hidden email]>; I guess you could talk to him.


    Werner

Re: RFC: docker for CI

Dan Eble
In reply to this post by hanwenn
On Feb 7, 2020, at 07:21, Han-Wen Nienhuys <[hidden email]> wrote:
>  * use a headless browser to take a image snapshot of the top of regtest
> result page.

Sounds convoluted.  Why not attach the difference images directly?

> On success, the driver uploads the image snapshot to code review.
>
> On failure, the driver uploads the last bit of the log-file to code review.

Are full logs and test results retained, or does a developer need to reproduce the test locally to get them?

Dan



Re: RFC: docker for CI

hanwenn
On Fri, Feb 7, 2020 at 9:12 PM Dan Eble <[hidden email]> wrote:

> On Feb 7, 2020, at 07:21, Han-Wen Nienhuys <[hidden email]> wrote:
> >  * use a headless browser to take a image snapshot of the top of regtest
> > result page.
>
> Sounds convoluted.  Why not attach the difference images directly?
>

Those are potentially 1372 images to attach if you made a change with
global impact.

> > On success, the driver uploads the image snapshot to code review.
> >
> > On failure, the driver uploads the last bit of the log-file to code review.
>
> Are full logs and test results retained, or does a developer need to
> reproduce the test locally to get them?
>

You'd retain the full logs and results as part of the docker image.
Currently my checkout is about 1.8G of data, and a lilypond docker image
itself would be close to that too. Artifacts that large will be painful to
upload/download/store (not to mention network egress, should one host them
on AWS or GCP), so that is why I suggest posting only small bits of it.



> —
> Dan
>
>

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen

Re: RFC: docker for CI

Kevin Barry
In reply to this post by hanwenn
On Fri, Feb 07, 2020 at 01:21:35PM +0100, Han-Wen Nienhuys wrote:
> Proposal: rather than using the patchy scripts for validating
> LilyPond, we use docker images.
Without getting into technical details, I think this is a really good
idea. Automatic building/testing saves lots of time, and having a
containerised build environment means it can be as portable as a single
dockerfile (or one for each version of guile, if that is what you were
thinking).

Kevin

P.S. I think I have seen a dockerfile for creating a build environment
for LilyPond somewhere. I wonder whether an official docker file would
be beneficial in its own right - at least for Linux users, it would
decouple the build environment from the OS packages.


Re: RFC: docker for CI

hanwenn
In reply to this post by Dan Eble
On Fri, Feb 7, 2020 at 8:55 PM Dan Eble <[hidden email]> wrote:

> On Feb 7, 2020, at 07:21, Han-Wen Nienhuys <[hidden email]> wrote:
> >
> >  * runs (make ; make test-baseline)
>
> If this says "(make ;" because you think that "make test-baseline"
> requires a prior make, I think it is incorrect.  (If "make test-baseline"
> doesn't work on its own, it should be fixed.)
>
> If this says "(make ;" because you want to build everything else that
> "make test-baseline" might not build, then it's fine, though maybe it
> should be "make && make test-baseline" or two separate steps, depending on
> how errors are detected and reported.
>

Oh, it was a typo. Yes, make && make test-baseline is what I meant.

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen

Re: RFC: docker for CI

Dan Eble
In reply to this post by hanwenn
On Feb 7, 2020, at 15:21, Han-Wen Nienhuys <[hidden email]> wrote:
>>>   * use a headless browser to take a image snapshot of the top of regtest
>>>  result page.
>>>
>> Sounds convoluted.  Why not attach the difference images directly?
>  
> Those are potentially 1372 images to attach if you made a change with global impact.

Why not attach the N images with the greatest differences directly?

More generally, I'd want a digest of the results (not all of which are visual) that is as useful as possible for the size we are willing to post to the review.  We control output-distance.py, so we could generate something new that fits this case.
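
As a rough illustration of such a digest, picking the N largest
differences from a plain-text list is a one-liner. The input format
(one "<distance> <test-name>" pair per line) and the sample values are
invented here; it is not what output-distance.py emits today.

```shell
#!/bin/sh
# Sketch only. The input format (one "<distance> <test-name>" pair per line)
# and the sample data below are invented for illustration.

# top_differences N: read "<distance> <test-name>" lines on stdin and print
# the N entries with the largest distance, largest first.
top_differences() {
    LC_ALL=C sort -k1,1nr | head -n "$1"
}

printf '%s\n' \
    '0.02 accidental-cautionary' \
    '7.50 beam-quanting' \
    '0.00 clef-octavation' \
    '3.10 slur-scoring' |
    top_differences 2
```

The result is plain text, so it fits into a review comment or a build
log without attaching any images.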

>> Are full logs and test results retained, or does a developer need to reproduce the test locally to get them?
>
> You'd retain the full logs and results as part of the docker image. Currently my checkout is about 1.8G of data, and a lilypond docker image itself would be close  to that too.

This approach is new to me.  I'm used to CI systems that are configured to archive particular files from the workspace (e.g., the regtest output tree, the final docs) and full build log for a limited time (days to weeks).  I think it balances the types of things you can investigate without reproducing the build yourself against retaining a huge amount of data.

Can you expand on the purpose of saving the full Docker image--which is not just the LilyPond workspace but the OS too, correct?  Are you thinking that someone would prefer to download it and debug in a container rather than reproduce the build in their usual development environment?

Dan



Re: RFC: docker for CI

Dan Eble
In reply to this post by Kevin Barry
On Feb 7, 2020, at 15:22, Kevin Barry <[hidden email]> wrote:
> P.S. I think I have seen a dockerfile for creating a build environment
> for LilyPond somewhere.

https://github.com/fedelibre/LilyDev/tree/master/docker
I'm using it.  I'm not sure who else is.

Dan



Re: RFC: docker for CI

hanwenn
In reply to this post by Dan Eble
On Fri, Feb 7, 2020 at 9:48 PM Dan Eble <[hidden email]> wrote:

> On Feb 7, 2020, at 15:21, Han-Wen Nienhuys <[hidden email]> wrote:
> >>>   * use a headless browser to take a image snapshot of the top of regtest
> >>>  result page.
> >>>
> >> Sounds convoluted.  Why not attach the difference images directly?
> >
> > Those are potentially 1372 images to attach if you made a change with global impact.
>
> Why not attach the N images with the greatest differences directly?
>
> More generally, I'd want a digest of the results (not all of which are
> visual) that is as useful as possible for the size we are willing to post
> to the review.  We control output-distance.py, so we could generate
> something new that fits this case.
>
>
More work, and I'm lazy :)

But yes, you are right. We could potentially do something more clever here.


> >> Are full logs and test results retained, or does a developer need to reproduce the test locally to get them?
> >
> > You'd retain the full logs and results as part of the docker image. Currently my checkout is about 1.8G of data, and a lilypond docker image itself would be close to that too.
>
> This approach is new to me.  I'm used to CI systems that are configured to
> archive particular files from the workspace (e.g., the regtest output tree,
> the final docs) and full build log for a limited time (days to weeks).  I
> think it balances the types of things you can investigate without
> reproducing the build yourself against retaining a huge amount of data.
>

IIRC, the regtest output tree is also fairly large.

> Can you expand on the purpose of saving the full Docker image--which is not
> just the LilyPond workspace but the OS too, correct?  Are you thinking that
> someone would prefer to download it and debug in a container rather than
> reproduce the build in their usual development environment?
>

You would save the output of the binary build, because it's the input to
the regtest.

The OS would be in a different layer, so we wouldn't be shipping around OS
images, but my hunch is that it will be a lot of data to ship around. I
haven't measured though.


> —
> Dan
>
>

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen

Re: RFC: docker for CI

Dan Eble
On Feb 7, 2020, at 16:23, Han-Wen Nienhuys <[hidden email]> wrote:
>
> More work , and I'm lazy :)

No problem!  Jonas is probably bored now that there's nothing left to port to Python 3.

Dan
I aim to communicate with empathy.  Have I failed?  ¯\_(ツ)_/¯



Re: RFC: docker for CI

Jonas Hahnfeld via Dev mailing list
In reply to this post by hanwenn
On Friday, 7 Feb 2020, at 13:21 +0100, Han-Wen Nienhuys wrote:

> Proposal: rather than using the patchy scripts for validating
> LilyPond, we use docker images.
>
> General idea
> ============
>
> There is a script ("driver") that drives docker running on a dedicated
> build machine ("host").
>
> There are several images:
>
> * The base dev image.
>
> The base image is based on some stripped Linux distribution, with all
> the devtools necessary for compiling LilyPond. In addition, it
> contains a copy of ccache, and a git clone of the LilyPond sourcecode
>
> * The base release image for a specific git commit.
>
> The procedure to build it is as follows:
>
>   * take the base dev image
>   * fetch the git commit
>   * runs (make ; make test-baseline)
>   * runs (make dist-clean)
>
> This saves the result as a docker image. The Docker image now contains
> a clean lilypond tree, the C++ compilation results (in ccache), and a
> test baseline.
>
> The base release image is made at official LilyPond releases, or at
> any release that has a new graphical regtest result
>
>
> CI: build binary
> ================
>
> Given a proposed change (as git commit):
>
>  * take base release image
>  * run (make; make doc) >& log-file
>
> On success, the driver saves the result as a docker image, tagged with the
> commit sha1.
>
> On failure, the driver uploads the last bit of the log-file to our code
> review system.
>
>
> CI: regtest
> ===========
>
> Given a proposed change (as git commit)
>
>   * take CI build image
>   * run (make check >& log-file)
>   * use a headless browser to take a image snapshot of the top of regtest
> result page.
>
>
> On success, the driver uploads the image snapshot to code review.
>
> On failure, the driver uploads the last bit of the log-file to code review.
>
>
> Considerations
> ==============
>
> * Because the build happens inside a container, we can test multiple
>   builds. We could build against guile 1.8 and 2.2 at the same time,
>   for example
I don't agree that we need containers for this, you can easily set
environment variables to make configure pick up the version you want to
use.

> * Because the "build binary" step reuses CCache results, it can
>   complete quickly.

Maybe I don't fully understand the proposal, but:
 * if we only build the release image for every "official" tag, it will
not provide quicker builds - especially towards the end of a cycle when
many changes have accumulated.
 * if instead we build images for every commit, then incremental
building of a provided patch will be fast(er) (_if_ it doesn't touch
any header file). But what's then the point of using ccache, we can
just trigger a full build?

Jonas


Re: RFC: docker for CI

David Kastrup
Jonas Hahnfeld via Discussions on LilyPond development
<[hidden email]> writes:

> On Friday, 7 Feb 2020, at 13:21 +0100, Han-Wen Nienhuys wrote:
>>
>> Considerations
>> ==============
>>
>> * Because the build happens inside a container, we can test multiple
>>   builds. We could build against guile 1.8 and 2.2 at the same time,
>>   for example
>
> I don't agree that we need containers for this, you can easily set
> environment variables to make configure pick up the version you want to
> use.

I use stuff like

./configure GUILE_CONFIG=/usr/local/tmp/guile-1.8/bin/guile-config GUILE=/usr/bin/guile

all the time.

>> * Because the "build binary" step reuses CCache results, it can
>>   complete quickly.
>
> Maybe I don't fully understand the proposal, but:
>  * if we only build the release image for every "official" tag, it will
> not provide quicker builds - especially towards the end of a cycle when
> many changes have accumulated.
>  * if instead we build images for every commit, then incremental
> building of a provided patch will be fast(er) (_if_ it doesn't touch
> any header file). But what's then the point of using ccache, we can
> just trigger a full build?

Full builds are slower.  But I really don't trust our dependencies all
too much, and for example Clang builds don't get a working set of
dependencies anyway (which is sort of curious since it is the modular
Clang that should be able to parse for them easily).

--
David Kastrup


Re: RFC: docker for CI

Jonas Hahnfeld via Dev mailing list
On Saturday, 8 Feb 2020, at 13:51 +0100, David Kastrup wrote:

> Jonas Hahnfeld via Discussions on LilyPond development
> <[hidden email]> writes:
>
> > On Friday, 7 Feb 2020, at 13:21 +0100, Han-Wen Nienhuys wrote:
> > > Considerations
> > > ==============
> > >
> > > * Because the build happens inside a container, we can test multiple
> > >   builds. We could build against guile 1.8 and 2.2 at the same time,
> > >   for example
> >
> > I don't agree that we need containers for this, you can easily set
> > environment variables to make configure pick up the version you want to
> > use.
>
> I use stuff like
>
> ./configure GUILE_CONFIG=/usr/local/tmp/guile-1.8/bin/guile-config GUILE=/usr/bin/guile
>
> all the time.
Exactly.

> > > * Because the "build binary" step reuses CCache results, it can
> > >   complete quickly.
> >
> > Maybe I don't fully understand the proposal, but:
> >  * if we only build the release image for every "official" tag, it will
> > not provide quicker builds - especially towards the end of a cycle when
> > many changes have accumulated.
> >  * if instead we build images for every commit, then incremental
> > building of a provided patch will be fast(er) (_if_ it doesn't touch
> > any header file). But what's then the point of using ccache, we can
> > just trigger a full build?
>
> Full builds are slower.
True, but my point is that it doesn't matter: You have to do a full
build to populate ccache; or you just build with the changes already
applied, what's the difference?

Jonas

> But I really don't trust our dependencies all
> too much, and for example Clang builds don't get a working set of
> dependencies anyway (which is sort of curious since it is the modular
> Clang that should be able to parse for them easily).


Re: RFC: docker for CI

hanwenn
On Sat, Feb 8, 2020 at 2:05 PM Jonas Hahnfeld <[hidden email]> wrote:

>
> > >  * if instead we build images for every commit, then incremental
> > > building of a provided patch will be fast(er) (_if_ it doesn't touch
> > > any header file). But what's then the point of using ccache, we can
> > > just trigger a full build?
> >
> > Full builds are slower.
>
> True, but my point is that it doesn't matter: You have to do a full
> build to populate ccache; or you just build with the changes already
> applied, what's the difference?
>
>
The point is that you can take a snapshot of the full build at a point in
time.  As long as the C++ code doesn't change dramatically between that
point and the commit to be tested, you'd get cache hits on a "clean" build
at a new commit, making the whole thing faster.

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen

Re: RFC: docker for CI

hanwenn
In reply to this post by Dan Eble
On Fri, Feb 7, 2020 at 9:48 PM Dan Eble <[hidden email]> wrote:

> On Feb 7, 2020, at 15:21, Han-Wen Nienhuys <[hidden email]> wrote:
> >>>   * use a headless browser to take a image snapshot of the top of regtest
> >>>  result page.
> >>>
> >> Sounds convoluted.  Why not attach the difference images directly?
> >
> > Those are potentially 1372 images to attach if you made a change with global impact.
>
> Why not attach the N images with the greatest differences directly?
>
> More generally, I'd want a digest of the results (not all of which are
> visual) that is as useful as possible for the size we are willing to post
> to the review.  We control output-distance.py, so we could generate
> something new that fits this case.
>
>
Come to think of it, most of the automated infrastructure (e.g. Travis,
Google Cloud Build, etc.) operates in terms of builds in containers, and
when something happens, you get the final log file as diagnostic output.
The process in the container doesn't have access to a credential, so it
cannot post anything to GitHub/Gerrit/GitLab/etc.

So it's best if we can make the whole process give diagnostics as ASCII
build logs.

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen

Re: RFC: docker for CI

Jonas Hahnfeld via Dev mailing list
In reply to this post by hanwenn
On Saturday, 8 Feb 2020, at 19:18 +0100, Han-Wen Nienhuys wrote:

>
>
> On Sat, Feb 8, 2020 at 2:05 PM Jonas Hahnfeld <[hidden email]> wrote:
> > > >  * if instead we build images for every commit, then incremental
> > > > building of a provided patch will be fast(er) (_if_ it doesn't touch
> > > > any header file). But what's then the point of using ccache, we can
> > > > just trigger a full build?
> > >
> > > Full builds are slower.
> >
> > True, but my point is that it doesn't matter: You have to do a full
> > build to populate ccache; or you just build with the changes already
> > applied, what's the difference?
> >
>
> the point is that you can take a snapshot of the full build at a point in time.  As long as the C++ code doesn't change dramatically between that point and the commit to be tested, you'd get cache hits on a "clean" build at a new commit, making the whole thing faster.
So you do intend to create a new "base release image" for every commit?
The initial proposal had
> The base release image is made at official LilyPond releases, or at
> any release that has a new graphical regtest result
which means we will have "dramatic" changes of the C++ code later in
the cycle.

Jonas


Re: RFC: docker for CI

hanwenn
On Sat, Feb 8, 2020 at 7:24 PM Jonas Hahnfeld <[hidden email]> wrote:

> > the point is that you can take a snapshot of the full build at a point
> > in time.  As long as the C++ code doesn't change dramatically between that
> > point and the commit to be tested, you'd get cache hits on a "clean" build
> > at a new commit, making the whole thing faster.
>
> So you do intend to create a new "base release image" for every commit?
> The initial proposal had
> > The base release image is made at official LilyPond releases, or at
> > any release that has a new graphical regtest result
> which means we will have "dramatic" changes of the C++ code later in
> the cycle.
>

We'd make them as often as necessary. My thinking is that we don't have to
do this on every commit.



> Jonas
>


--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen

Re: RFC: docker for CI

janek.lilypond
In reply to this post by hanwenn
In principle, thumbs up. However, I think it's essential that we don't try
to do too much at once; I'd suggest focusing on the single most important
aspect first. To do that, I'd like to ask a helper question: what are the 2
most important reasons for using Docker instead of Patchy? In other words,
what are 2 things that we want to be able to do which are impossible/difficult
with Patchy? (Your proposal covers the "what?" and the "how?"; I'd
like to know the "why?")

cheers :-)
Janek



On Fri, 7 Feb 2020 at 13:21, Han-Wen Nienhuys <[hidden email]> wrote:

> Proposal: rather than using the patchy scripts for validating
> LilyPond, we use docker images.
> [...]