Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

classic Classic list List threaded Threaded
23 messages Options
12
dak
Reply | Threaded
Open this post in threaded view
|

Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

dak
Ok, that should probably have been stressed somewhat stronger before:
have you checked the preexisting Guilev2 work in branches that Harm
pointed you to?

The problem you are addressing here seems to be the same as the one
tackled in commit c0a8eec034b94be14c7665e2d4c8a94fae111b78 , namely
commit c0a8eec034b94be14c7665e2d4c8a94fae111b78 (origin/dev/guilev2)
Author: David Kastrup <[hidden email]>
Date:   Sun Sep 21 18:40:06 2014 +0200

    Source_file::init_port: Keep GUILEv2 from redecoding string input

diff --git a/lily/source-file.cc b/lily/source-file.cc
index 5a94927a7f..5ad9c4c6e8 100644
--- a/lily/source-file.cc
+++ b/lily/source-file.cc
@@ -152,7 +152,11 @@ Source_file::init_port ()
   // we do our own utf8 encoding and verification in the parser, so we
   // use the no-conversion equivalent of latin1
   SCM str = scm_from_latin1_string (c_str ());
-  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG,
__FUNCTION__);
+  scm_dynwind_begin ((scm_t_dynwind_flags)0);
+  // Why doesn't scm_set_port_encoding_x work here?
+  scm_dynwind_fluid (ly_lily_module_constant
("%default-port-encoding"), SCM_BOOL_F);
+  str_port_ = scm_open_input_string (str);
+  scm_dynwind_end ();
   scm_set_port_filename_x (str_port_, ly_string2scm (name_));
 }

Now it is a valid question why this isn't, GUILEV2-guarded, already in
the main code base.  Maybe we should integrate all of the already
existing Guilev2 work into master with priority in order to avoid
duplicate work.

https://codereview.appspot.com/557330043/

Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Han-Wen Nienhuys-3
Reviewers: dak,

Message:
On 2020/02/07 15:14:52, dak wrote:
> Ok, that should probably have been stressed somewhat stronger before:
have you
> checked the preexisting Guilev2 work in branches that Harm pointed you
to?

No, I didn't

> Date:   Sun Sep 21 18:40:06 2014 +0200

Is there more 6 year old code that is still relevant but not integrated
to mainline?

>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>
> diff --git a/lily/source-file.cc b/lily/source-file.cc
> index 5a94927a7f..5ad9c4c6e8 100644
> --- a/lily/source-file.cc
> +++ b/lily/source-file.cc
> @@ -152,7 +152,11 @@ Source_file::init_port ()
>    // we do our own utf8 encoding and verification in the parser, so
we
>    // use the no-conversion equivalent of latin1
>    SCM str = scm_from_latin1_string (c_str ());
> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG,
__FUNCTION__);
> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
> +  // Why doesn't scm_set_port_encoding_x work here?
> +  scm_dynwind_fluid (ly_lily_module_constant
("%default-port-encoding"),
> SCM_BOOL_F);
> +  str_port_ = scm_open_input_string (str);
> +  scm_dynwind_end ();
>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>  }
>
> Now it is a valid question why this isn't, GUILEV2-guarded, already in
the main
> code base.  Maybe we should integrate all of the already existing
Guilev2 work
> into master with priority in order to avoid duplicate work.

Yes, that would be great idea. Is there any other pending work that you
know of?

I think the patch you quote here is doing the wrong thing, though. If
you have Scheme code with UTF-8 encoded data embedded into it, it will
get parsed out as Latin1.



Description:
Parse inline scheme using per-expression port

Introduces a throw-away string port, Overlay_string_port, which makes
a port out of a random section of a C string.  It does not own the
string, so there is no overhead in creating and discarding the ports
during parse time.

For GUILE v2, Overlay_string_port uses UTF-8 encoding, so UTF-8
encoded string constants within the Scheme constants are interpreted
as Unicode correctly.

This obviates the string port that Source_file carries along.

This commit fixes a problem with GUILE 2.2.6, where LilyPond
calculates offsets in the source file as bytes, while GUILE interprets
the source file as UTF-8 encoded Unicode. As a result, files with
Unicode before embedded scheme break completely.

Please review this at https://codereview.appspot.com/557330043/

Affected files (+206, -78 lines):
  M lily/include/lily-parser.hh
  A lily/include/overlay-string-port.hh
  M lily/include/source-file.hh
  A + lily/overlay-string-port.cc
  M lily/parse-scm.cc
  M lily/source-file.cc



Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Han-Wen Nienhuys-3
In reply to this post by dak
On 2020/02/07 15:14:52, dak wrote:
> Ok, that should probably have been stressed somewhat stronger before:
have you
> checked the preexisting Guilev2 work in branches that Harm pointed you
to?

No, I didn't

> Date:   Sun Sep 21 18:40:06 2014 +0200

Is there more 6 year old code that is still relevant but not integrated
to mainline?

>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>
> diff --git a/lily/source-file.cc b/lily/source-file.cc
> index 5a94927a7f..5ad9c4c6e8 100644
> --- a/lily/source-file.cc
> +++ b/lily/source-file.cc
> @@ -152,7 +152,11 @@ Source_file::init_port ()
>    // we do our own utf8 encoding and verification in the parser, so
we
>    // use the no-conversion equivalent of latin1
>    SCM str = scm_from_latin1_string (c_str ());
> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG,
__FUNCTION__);
> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
> +  // Why doesn't scm_set_port_encoding_x work here?
> +  scm_dynwind_fluid (ly_lily_module_constant
("%default-port-encoding"),
> SCM_BOOL_F);
> +  str_port_ = scm_open_input_string (str);
> +  scm_dynwind_end ();
>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>  }
>
> Now it is a valid question why this isn't, GUILEV2-guarded, already in
the main
> code base.  Maybe we should integrate all of the already existing
Guilev2 work
> into master with priority in order to avoid duplicate work.

Yes, that would be great idea. Is there any other pending work that you
know of?

I think the patch you quote here is doing the wrong thing, though. If
you have Scheme code with UTF-8 encoded data embedded into it, it will
get parsed out as Latin1.



https://codereview.appspot.com/557330043/

dak
Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

dak
[hidden email] writes:

> On 2020/02/07 15:14:52, dak wrote:
>> Ok, that should probably have been stressed somewhat stronger before: have you
>> checked the preexisting Guilev2 work in branches that Harm pointed you to?
>
> No, I didn't
>
>> Date:   Sun Sep 21 18:40:06 2014 +0200
>
> Is there more 6 year old code that is still relevant but not integrated
> to mainline?

Yes.  That's what the branches are about.

>>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>>
>> diff --git a/lily/source-file.cc b/lily/source-file.cc
>> index 5a94927a7f..5ad9c4c6e8 100644
>> --- a/lily/source-file.cc
>> +++ b/lily/source-file.cc
>> @@ -152,7 +152,11 @@ Source_file::init_port ()
>>    // we do our own utf8 encoding and verification in the parser, so we
>>    // use the no-conversion equivalent of latin1
>>    SCM str = scm_from_latin1_string (c_str ());
>> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
>> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
>> +  // Why doesn't scm_set_port_encoding_x work here?
>> +  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"),
>> SCM_BOOL_F);
>> +  str_port_ = scm_open_input_string (str);
>> +  scm_dynwind_end ();
>>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>>  }
>>
>> Now it is a valid question why this isn't, GUILEV2-guarded, already in the main
>> code base.  Maybe we should integrate all of the already existing Guilev2 work
>> into master with priority in order to avoid duplicate work.
>
> Yes, that would be great idea. Is there any other pending work that you
> know of?
>
> I think the patch you quote here is doing the wrong thing, though. If
> you have Scheme code with UTF-8 encoded data embedded into it, it will
> get parsed out as Latin1.

Oh, definitely.  Just like Guile-1.8 does.  This is a "Get things to
work like previously" patch that keeps abominations like
ly:wide-char->utf-8 operative as well as required like before.  All
strings taken from the input are essentially byte strings and utf-8 has
to be handled at the byte level.

It is not a desirable end point for Guilev2 migration but a migration
strategy designed for getting back something working as before.  Which
is a starting point from which one can then explore the task of getting
to an environment that looks like Unicode rather than bytes from the
Scheme side.

--
David Kastrup

Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Han-Wen Nienhuys-3
On Fri, Feb 7, 2020 at 5:39 PM David Kastrup <[hidden email]> wrote:

>
> Private reply intended?  Feel free to copy this to the review if you
> want to.
>

yes, but if you are fine, then back to the list.


> Han-Wen Nienhuys <[hidden email]> writes:
>
> > Just so you know, I am confused by this conversation.
> >
> > I am trying to ask you to review this code. It's not clear to me whether
> > you prefer the code I wrote here, or the 2014 code.
>
> I think it is overkill and/or counterproductive for the intermediate
> goal of getting stuff to work under Guilev2+ in its original unchanged
> form, and I think it would be easier to evaluate whether your approach
> is desirable and/or necessary once we have achieved that intermediate
> goal.
>
> That intermediate goal seems important to me in order not to have the
> migration effort run out of steam eventually (like it previously did
> after a lot of tiresome back-and-forth with Guile central) but I cannot
> properly estimate or direct the amount of steam you have at your
> disposal.
>

Here is my take on things.  Even though GUILE 2.x is not optimal, because
the GC is more expensive, it is unfeasible and undesirable to stay on an
unmaintained and ancient branch of GUILE. I don't see fundamental problems
with migrating to 2.x, and I think we should do it ASAP. I'd rather get in
the hands of users sooner than later.

I think GUILE 2.2 with its slower BDW garbage collector but faster
evaluation would let us throw away a lot of very finicky code for dealing
with memory management. This comes at the price of a slightly slower
execution. However, since our largest problem is that almost nobody
understands the code base well enough, something that drastically
simplifies the code is at a moderate price is a good tradeoff.

This code would not be my choice since it goes to a lot of effort in
> order to achieve nothing substantially different from what we had before
> and what the referenced patch tries coaxing the Guile port machinery to
> do.  Changing encoding methods in midstream, which is what your patch
> essentially does, is a nightmare for keeping input locations work, like
> they are needed in point-and-click and in error message locators.
>

Why do you say it is a nightmare? All the input locations are char* as
before and they are computed exactly in the same way as before. There is
also no such concept as "mid-stream", as this change creates a separate
mini-stream for each scheme expression we parse.

The only distasteful part is the code for the 1.8 API for ports, which we
could drop once we move to 2.x

But I don't see a clear path for a migration strategy to "UTF-8 like
> Guile intended" either.


Why not? The Guile unicode strings are lazy, so they are free if you don't
use unicode, and we mostly don't.


>   So my first choice was to stick with the "UTF-8
> but don't tell Guile" strategy we had before, get things working in that
> context, and see whether this makes it easier to find a better path from
> there.
>

FWIW, the 2014 commit lacks explanation and comments of what is going on. I
would rather see "production quality" code that we submit to master, so we
have  a precise record of what is going on, and can make sure the code
keeps working.

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen
dak
Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

dak
In reply to this post by dak
On 2020/02/07 17:10:24, hanwenn wrote:
 
> FWIW, the 2014 commit lacks explanation and comments of what is going
on. I
> would rather see "production quality" code that we submit to master,
so we
> have  a precise record of what is going on, and can make sure the code
> keeps working.

The one I now committed was at a later stage of the whack-a-mole game we
were playing with Guile developers changing the semantics of their
string ports around within the "stable" Guile 2 series.  This commit is
better documented.  The commit I had pointed out to you (sorry for that)
would neither have compiled in current LilyPond, nor worked with Guile
versions later than something like 2.0.9 or so.

https://codereview.appspot.com/557330043/

Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Han-Wen Nienhuys-3
In reply to this post by dak
David, please take another look.

https://codereview.appspot.com/557330043/

dak
Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

dak
In reply to this post by dak
On 2020/02/09 17:18:19, hanwenn wrote:
> David, please take another look.

First, you are aware that the rationale for

    This commit fixes a problem with GUILE 2.2.6, where LilyPond
    calculates offsets in the source file as bytes, while GUILE
interprets
    the source file as UTF-8 encoded Unicode. As a result, files with
    Unicode before embedded scheme break completely.

is fixed in current master?  So we are, due to the late merges into
master of our Guilev2 related work, in the somewhat unhappy situation
that both you and I have worked on the problem and produced different
solutions, you mainly by changing approach and writing a different
function that essentially has the same problem but does not get thrown
permanently out of synch by just restarting every time.  My own solution
basically was obtained by several iterations of pestering the Guile
developers to get something fixed.  It currently is in a form that has
gotten their active blessing.

Now my original solution was aimed at staying in binary even during
Scheme times while Antonio Ospite does go all-in on UTF-8 in Scheme.  So
the result of combining my and his work is a chimæra which I am less
confident about than my own code on its own.  Being in binary mode for
Flex means that we can deal robustly with code that is not properly in
UTF-8.  For example, we had input files that were ASCII-only except for
some comments containing latters in the Latin-1 plane.  In binary mode,
our lexer deals gracefully with such abominations while Guile balks at
such files when doing UTF-8 processing.

I'll take a look at some test cases with either version and see whether
I can trigger obvious problems.  That would not really make an
evaluation of code quality, just give some better idea whether one of
the approaches would warrant further work before getting selected.

https://codereview.appspot.com/557330043/

Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Han-Wen Nienhuys-3
On Sun, Feb 9, 2020 at 7:00 PM <[hidden email]> wrote:

>
> Now my original solution was aimed at staying in binary even during
> Scheme times while Antonio Ospite does go all-in on UTF-8 in Scheme.  So
> the result of combining my and his work is a chimæra which I am less
> confident about than my own code on its own.  Being in binary mode for
> Flex means that we can deal robustly with code that is not properly in
> UTF-8.  For example, we had input files that were ASCII-only except for
> some comments containing latters in the Latin-1 plane.  In binary mode,
> our lexer deals gracefully with such abominations while Guile balks at
> such files when doing UTF-8 processing.
>

The reason we've been doing it this way, is that Gary gave us this code in
1999, when we didn't know anything about encodings, ports or Scheme, and it
worked.

However, there is no good reason why we should have port for the entire
file. If you have a .ly file that doesn't have a single # , then there is
no reason to have a port at all.


--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen
Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Han-Wen Nienhuys-3
In reply to this post by dak
On Sun, Feb 9, 2020 at 7:00 PM <[hidden email]> wrote:

> On 2020/02/09 17:18:19, hanwenn wrote:
> > David, please take another look.
>
> First, you are aware that the rationale for
>
>     This commit fixes a problem with GUILE 2.2.6, where LilyPond
>     calculates offsets in the source file as bytes, while GUILE
> interprets
>     the source file as UTF-8 encoded Unicode. As a result, files with
>     Unicode before embedded scheme break completely.
>
> is fixed in current master?


Yes. See also the list of commits that are effectively reverted by this
commit.

Note that there is still an off-by one error in the handling of closures
and parsing #{

which makes the satb template regtests fail, both in GUILE 1.8 and 2.x

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen
Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Han-Wen Nienhuys-3
On Mon, Feb 10, 2020 at 9:41 AM Han-Wen Nienhuys <[hidden email]> wrote:

> Yes. See also the list of commits that are effectively reverted by this
> commit.
>
> Note that there is still an off-by one error in the handling of closures
> and parsing #{
>
>
Short repro:

fail-func =
#(define-music-function (name)
   (string?)
  #{
    \new Staff = #(string-append "bla" name) { c'4 }
  #}
  )

\fail-func "bla"

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen
dak
Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

dak
In reply to this post by Han-Wen Nienhuys-3
Han-Wen Nienhuys <[hidden email]> writes:

> On Sun, Feb 9, 2020 at 7:00 PM <[hidden email]> wrote:
>
>> On 2020/02/09 17:18:19, hanwenn wrote:
>> > David, please take another look.
>>
>> First, you are aware that the rationale for
>>
>>     This commit fixes a problem with GUILE 2.2.6, where LilyPond
>>     calculates offsets in the source file as bytes, while GUILE
>> interprets
>>     the source file as UTF-8 encoded Unicode. As a result, files with
>>     Unicode before embedded scheme break completely.
>>
>> is fixed in current master?
>
>
> Yes. See also the list of commits that are effectively reverted by this
> commit.
>
> Note that there is still an off-by one error in the handling of closures
> and parsing #{
>
> which makes the satb template regtests fail, both in GUILE 1.8 and 2.x

I assume that you mean your patch since I get make test to compile on
either current master or the dev/guile-v2-work branch (once I apply the
fix for display-lily-tests from issue 5743).

You recently introduced another string conversion function for UTF-8.
It is possible that the problem you were trying to fix is related to the
locale fiddling in the remaining two patches from the dev/guile-v2-work
branch, so I cannot rule out that you may have already rendered the last
two remaining awkward-looking patches for the build system unnecessary.
But I don't think we have a test case readily available for that.

--
David Kastrup
My replies have a tendency to cause friction.  To help mitigating
damage, feel free to forward problematic posts to me adding a subject
like "timeout 1d" (for a suggested timeout of 1 day) or "offensive".

dak
Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

dak
In reply to this post by Han-Wen Nienhuys-3
Han-Wen Nienhuys <[hidden email]> writes:

> On Mon, Feb 10, 2020 at 9:41 AM Han-Wen Nienhuys <[hidden email]> wrote:
>
>> Yes. See also the list of commits that are effectively reverted by this
>> commit.
>>
>> Note that there is still an off-by one error in the handling of closures
>> and parsing #{
>>
>>
> Short repro:
>
> fail-func =
> #(define-music-function (name)
>    (string?)
>   #{
>     \new Staff = #(string-append "bla" name) { c'4 }
>   #}
>   )
>
> \fail-func "bla"

Still assuming that this is with your patch.  "Off-by-one" rings a bell
with me: could be related to port "lookahead", namely peek-char.  Though
this particular example does not appear to need it since the occurences
of # are all followed by a delimited thing: #( ... ) or #{ ... #}.  The
peekahead is more for things like #'bla and #7 so I don't really have an
idea how this could apply here.  Just throwing it out in case it could
prove related.

--
David Kastrup
My replies have a tendency to cause friction.  To help mitigating
damage, feel free to forward problematic posts to me adding a subject
like "timeout 1d" (for a suggested timeout of 1 day) or "offensive".

dak
Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

dak
In reply to this post by Han-Wen Nienhuys-3
Han-Wen Nienhuys <[hidden email]> writes:

> On Sun, Feb 9, 2020 at 7:00 PM <[hidden email]> wrote:
>
>>
>> Now my original solution was aimed at staying in binary even during
>> Scheme times while Antonio Ospite does go all-in on UTF-8 in Scheme.  So
>> the result of combining my and his work is a chimæra which I am less
>> confident about than my own code on its own.  Being in binary mode for
>> Flex means that we can deal robustly with code that is not properly in
>> UTF-8.  For example, we had input files that were ASCII-only except for
>> some comments containing latters in the Latin-1 plane.  In binary mode,
>> our lexer deals gracefully with such abominations while Guile balks at
>> such files when doing UTF-8 processing.
>>
>
> The reason we've been doing it this way, is that Gary gave us this code in
> 1999, when we didn't know anything about encodings, ports or Scheme, and it
> worked.
>
> However, there is no good reason why we should have port for the entire
> file.

Error message context possibly?  #{ ... #} is interpreted from a string
which means that the context is missing (I've thought about diverting a
larger string for that purpose but that seems awkward).  Not having the
context for #anything seems like more of a nuisance.  Can this be an
issue?

> If you have a .ly file that doesn't have a single # , then there is no
> reason to have a port at all.

Well, on the other hand few documents don't have even a single # since
it is used for banal things like number entry.  And not having a few
thousand port openings is possibly advantageous.  Usually a port should
be easier to reposition than to open, though our way of repositioning,
if I am not mistaken, is not particularly efficient.

--
David Kastrup

Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Han-Wen Nienhuys-3
In reply to this post by dak
On Mon, Feb 10, 2020 at 2:40 PM David Kastrup <[hidden email]> wrote:

>
> > Short repro:
> >
> > fail-func =
> > #(define-music-function (name)
> >    (string?)
> >   #{
> >     \new Staff = #(string-append "bla" name) { c'4 }
> >   #}
> >   )
> >
> > \fail-func "bla"
>
> Still assuming that this is with your patch.  "Off-by-one" rings a bell
> with me: could be related to port "lookahead", namely peek-char.  Though
> this particular example does not appear to need it since the occurences
> of # are all followed by a delimited thing: #( ... ) or #{ ... #}.  The
> peekahead is more for things like #'bla and #7 so I don't really have an
> idea how this could apply here.  Just throwing it out in case it could
> prove related.
>

It has to do with the newline. If everything is on one line, it works. If
you do

 fail-func =
#(define-music-function (name)
   (string?)
  #{
#(string-append "bla" name)  #}
  )

The Scheme code registers the inner # at offset 2, even though it clearly
is at byte 1 (byte 0 is the \n). Maybe something tries to insert a \r ?

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen
dak
Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

dak
Han-Wen Nienhuys <[hidden email]> writes:

> On Mon, Feb 10, 2020 at 2:40 PM David Kastrup <[hidden email]> wrote:
>
>>
>> > Short repro:
>> >
>> > fail-func =
>> > #(define-music-function (name)
>> >    (string?)
>> >   #{
>> >     \new Staff = #(string-append "bla" name) { c'4 }
>> >   #}
>> >   )
>> >
>> > \fail-func "bla"
>>
>> Still assuming that this is with your patch.  "Off-by-one" rings a bell
>> with me: could be related to port "lookahead", namely peek-char.  Though
>> this particular example does not appear to need it since the occurences
>> of # are all followed by a delimited thing: #( ... ) or #{ ... #}.  The
>> peekahead is more for things like #'bla and #7 so I don't really have an
>> idea how this could apply here.  Just throwing it out in case it could
>> prove related.
>>
>
> It has to do with the newline. If everything is on one line, it works. If
> you do
>
>  fail-func =
> #(define-music-function (name)
>    (string?)
>   #{
> #(string-append "bla" name)  #}
>   )
>
> The Scheme code registers the inner # at offset 2, even though it clearly
> is at byte 1 (byte 0 is the \n). Maybe something tries to insert a \r
> ?

Or maybe the file offset got advanced because Guile peeked ahead in
order to check whether there is an UTF-8 signature?  I forget what they
are called, basically zero-space characters.  Byte order mark?  I am not
saying that it is that, but I remember that Guile could do some pretty
funky things at opening even something like a string port.

--
David Kastrup
My replies have a tendency to cause friction.  To help mitigating
damage, feel free to forward problematic posts to me adding a subject
like "timeout 1d" (for a suggested timeout of 1 day) or "offensive".

Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Han-Wen Nienhuys-3
On Mon, Feb 10, 2020 at 11:21 PM David Kastrup <[hidden email]> wrote:

>
> Or maybe the file offset got advanced because Guile peeked ahead in
> order to check whether there is an UTF-8 signature?  I forget what they
> are called, basically zero-space characters.  Byte order mark?  I am not
> saying that it is that, but I remember that Guile could do some pretty
> funky things at opening even something like a string port.
>


More mundane. Source_file::get_counts returns a byte_offset, but that is
not a global but per-line offset.

PTAL

--
Han-Wen Nienhuys - [hidden email] - http://www.xs4all.nl/~hanwen
Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Han-Wen Nienhuys-3
In reply to this post by dak
note: this goes on top of DAK's recent patch for the .SCM file.

https://codereview.appspot.com/557330043/

Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

Dev mailing list
In reply to this post by dak
minor nit


https://codereview.appspot.com/557330043/diff/565660043/lily/include/overlay-string-port.hh
File lily/include/overlay-string-port.hh (right):

https://codereview.appspot.com/557330043/diff/565660043/lily/include/overlay-string-port.hh#newcode4
lily/include/overlay-string-port.hh:4: Copyright (C) 1997--2020 Han-Wen
Nienhuys <[hidden email]>
This is a new file; shouldn't this be simply

  Copyright (c) 2020 ...

instead?

https://codereview.appspot.com/557330043/

Reply | Threaded
Open this post in threaded view
|

Re: Parse inline scheme using per-expression port (issue 557330043 by hanwenn@gmail.com)

jonas.hahnfeld
In reply to this post by dak

https://codereview.appspot.com/557330043/diff/579310044/configure.ac
File configure.ac (left):

https://codereview.appspot.com/557330043/diff/579310044/configure.ac#oldcode42
configure.ac:42: GUILEv2=no
Is this meant to be part of this patch? I'm all in for requiring /
defaulting to Guile 2.2 once ready, but this should probably be
separate.

https://codereview.appspot.com/557330043/diff/579310044/lily/include/overlay-string-port.hh
File lily/include/overlay-string-port.hh (right):

https://codereview.appspot.com/557330043/diff/579310044/lily/include/overlay-string-port.hh#newcode50
lily/include/overlay-string-port.hh:50: {
Is there an advantage of having the implementation in the header file?
If not, I think this should go to the .cc file.

https://codereview.appspot.com/557330043/

12