First of all, I’d like to apologize for all the errors in this post. I just haven’t got time to properly proof-read it.
A while ago I was trying to fix a problem in Rakudo which, under certain conditions, causes some external symbols to become invisible to the importing code, even if an explicit use statement is used. And, indeed, it is really confusing when:
use L1::L2::L3::Class;
L1::L2::L3::Class.new;
fails with a “Class symbol doesn’t exist in L1::L2::L3” error! It’s OK if use throws when there is no corresponding module. But .new??
Skip This Unless You Know What A Package Is
This section is needed to understand the rest of the post. A package in Raku is a type object which has a symbol table attached. The table is called a stash (which stands for “symbol table hash”) and is represented by an instance of the Stash class, which is basically a hash with minor tweaks. Normally each package instance has its own stash. For example, it is possible to manually create two different packages with the same name:
my $p1a := Metamodel::PackageHOW.new_type(:name<P1>);
my $p1b := Metamodel::PackageHOW.new_type(:name<P1>);
say $p1a.WHICH, " ", $p1a.WHO.WHICH; # P1|U140722834897656 Stash|140723638807008
say $p1b.WHICH, " ", $p1b.WHO.WHICH; # P1|U140722834897800 Stash|140723638818544
Note that they have different stashes as well.
A bare package is rarely used in Raku as is. Usually we deal with packagy things like modules and classes.
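To see a stash at work on something less exotic than a hand-made package, here is a minimal sketch. The module name Demo and its contents are made up for illustration; the only thing the sketch relies on is that our-scoped symbols land in the stash of the enclosing package.

```raku
# Hypothetical module; the name and its symbols are for illustration only.
module Demo {
    our sub greet() { 'hello' }
    our $answer = 42;
}

# The stash is an instance of the Stash class...
say Demo.WHO.^name;        # Stash
# ...and its keys are the symbol names, sigils included.
say Demo.WHO.keys.sort;    # ($answer &greet)
# Values are the symbols themselves, so the sub can be called via the stash.
say Demo.WHO<&greet>();    # hello
```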
Back On The Track
Back then I managed to trace the problem down to the deserialization process within the MoarVM backend. At that point I realized that somehow it pulls in packagy objects which are supposed to be the same thing, but they happen to be different and have different stashes. Because MoarVM doesn’t (and must not) have any idea about the structure of high-level Raku objects, there is no way it could properly handle this situation. Instead it considers one of the conflicting stashes “the winner” and drops the other one. Naturally, symbols unique to the “loser” are then lost.
It took me time to find out what exactly happens. But it was not until a couple of days ago that I realized what the root cause is and how to get around the bug.
Package Tree
What happens when we do something like:
module Foo {
    module Bar {
    }
}
How do we access Bar, speaking of the technical side of things? The Foo::Bar syntax basically maps into Foo.WHO<Bar>. In other words, Bar gets installed as a symbol into the stash of Foo. We can also rewrite it with special syntax sugar, Foo::<Bar>, because Foo:: is a representation of the stash of Foo.
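The equivalence of the three spellings can be checked with a few lines. This is only a sketch: it compares the .WHICH identifiers of whatever each spelling resolves to, nothing more.

```raku
module Foo {
    module Bar {
    }
}

# All three spellings resolve to the same type object,
# so their .WHICH identifiers are equal.
say Foo::Bar.WHICH eq Foo.WHO<Bar>.WHICH;   # True
say Foo::Bar.WHICH eq Foo::<Bar>.WHICH;     # True

# Bar is the only symbol installed into the stash of Foo.
say Foo.WHO.keys;                           # (Bar)
```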
So far, so good; but where do we find Foo itself? In Raku there is a special symbol called GLOBAL which is the root namespace (or a package if you wish) of any code. GLOBAL::, or GLOBAL.WHO, is where one finds all the top-level symbols.
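Here is a small sketch of walking from the root down to a nested package by hand. The module names Top and Inner are hypothetical, and the code is meant to be run as the main script, where the top-level symbols of the script itself are found in GLOBAL.

```raku
# Hypothetical modules, declared in the main script for illustration.
module Top {
    module Inner {
    }
}

# Top is a top-level symbol, so it lives in the stash of GLOBAL.
say GLOBAL.WHO<Top>:exists;                                 # True
say GLOBAL::<Top>.WHICH eq Top.WHICH;                       # True

# Walking the tree by hand: GLOBAL -> Top -> Inner.
say GLOBAL.WHO<Top>.WHO<Inner>.WHICH eq Top::Inner.WHICH;   # True
```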
Say, we have a few packages like L11::L21, L11::L22, L12::L21, L12::L22. Then the namespace structure would be represented by this tree:
GLOBAL
`- L11
   `- L21
   `- L22
`- L12
   `- L21
   `- L22
Normally there is one per-process GLOBAL symbol and it belongs to the compunit which is used to start the program. Normally it’s a .raku file, or a string supplied on the command line with the -e option, etc. But each compunit also gets its own GLOBALish package which acts as the compunit’s GLOBAL until it is fully incorporated into the main code. Say, we declare a module in file Foo.rakumod:
unit module Foo;
sub print-GLOBAL($when) is export {
    say "$when: ", GLOBAL.WHICH, " ", GLOBALish.WHICH;
}
print-GLOBAL 'LOAD';
And use it in a script:
use Foo;
print-GLOBAL 'RUN ';
Then we can get an output like this:
LOAD: GLOBAL|U140694020262024 GLOBAL|U140694020262024
RUN : GLOBAL|U140694284972696 GLOBAL|U140694020262024
Notice that the GLOBALish symbol remains the same object, whereas GLOBAL becomes a different one. If we add a line to the script which also prints GLOBAL.WHICH then we’re going to get something like:
MAIN: GLOBAL|U140694284972696
Let’s be done with this part of the story for a while and move on to another subject.
Compunit Compilation
This is going to be a shorter story. It is no secret that however powerful Raku’s grammars are, they need some core developer’s attention to make them really fast. In the meantime, compilation speed is somewhat suboptimal. It means that if a project consists of many compunits (think of modules, for example), it would make sense to try to compile them in parallel if possible. Unfortunately, the compiler is not thread-safe either. To resolve this complication, the Rakudo implementation parallelizes compilation by spawning an individual process for each compunit.
For example, let’s refer back to the module tree example above and imagine that all modules are used by a script. In this case there is a chance that we would end up with six rakudo processes, each compiling its own L* module.
Naturally, things get slightly more complicated if there are cross-module uses, like L11::L21 could refer to L21, which, in turn, refers to L11::L22, or whatever. In this case we need to use topological sort to determine in what order the modules are to be compiled; but that’s not the point.
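For the curious, the ordering step can be sketched as a depth-first topological sort. This is only an illustration and not Rakudo's actual precompilation logic: the %deps map, the compile-order routine, and the standalone L12::L21 entry are all hypothetical, with the dependencies taken from the sentence above.

```raku
# Hypothetical dependency map: module name => modules it `use`s.
my %deps =
    'L11::L21' => ['L21'],
    'L21'      => ['L11::L22'],
    'L11::L22' => [],
    'L12::L21' => [],
;

# Return module names so that every module comes after its dependencies.
sub compile-order(%deps) {
    my @order;
    my %state;    # name => 'visiting' | 'done'

    sub visit($name) {
        return if (%state{$name} // '') eq 'done';
        die "Circular dependency at $name"
            if (%state{$name} // '') eq 'visiting';
        %state{$name} = 'visiting';
        # Dependencies must be compiled first.
        visit($_) for @(%deps{$name} // []);
        %state{$name} = 'done';
        @order.push: $name;
    }

    # Sorting the keys only makes the result deterministic.
    visit($_) for %deps.keys.sort;
    @order
}

say compile-order(%deps);    # [L11::L22 L21 L11::L21 L12::L21]
```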
The point is that since each process does independent compilation, each compunit needs an independent GLOBAL to manage its symbols. For the time being, what we later know as GLOBALish serves this duty for the compiler.
Later, when all pre-compiled modules are incorporated into the code which uses them, the symbols installed into each individual GLOBAL get merged together to form the final namespace available to our program. There are even methods in the source with merge_global in their names.
TA-TA-TAAA!
(Note the clickable section header; I love the guy!)
Now, you can feel the catch. Somebody might have even guessed what it is. It crossed my mind after I tried to implement legal symbol auto-registration which doesn’t involve using QAST to install a phaser. At some point I got the idea of using GLOBAL to hold a register object which would keep track of specially flagged roles. Naturally, it failed due to the parallelized compilation mentioned above. It doesn’t matter why; but at that point I started building a mental model of what happens when a merge is taking place. And one detail drew my special attention: what happens if a package in a long name is not explicitly declared?
Say, there is a class named Foo::Bar::Baz which one creates as:
unit class Foo::Bar;
class Baz { }
In this case the compiler creates a stub package for Foo. The stub is used to install class Bar. Then it all gets serialized into bytecode.
At the same time there is another module with another class:
unit class Foo::Bar::Fubar;
It is not aware of Foo::Bar::Baz, and the compiler has to create two stubs: Foo and Foo::Bar. And not only are the two versions of Foo different, with different stashes; so are the two versions of Bar, where one is a real class and the other is a stub package.
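The stubs can be observed from the outside by asking each level of a long name for its metaclass. A minimal sketch, declared as a regular class in a single script rather than as a unit class in its own file: on the Rakudo versions I’m aware of, the first two lines should name a package metaclass and the last one a class metaclass, but I’m not quoting the exact output here.

```raku
# Only Fubar is declared explicitly; Foo and Foo::Bar appear implicitly.
class Foo::Bar::Fubar { }

# Print the name of the metaclass behind each level of the long name.
say Foo.HOW.^name;
say Foo::Bar.HOW.^name;
say Foo::Bar::Fubar.HOW.^name;

# Each implicit level holds the next one in its stash.
say Foo.WHO.keys;         # (Bar)
say Foo::Bar.WHO.keys;    # (Fubar)
```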
Most of the time the compiler does a damn good job of merging symbols in such cases. It took me stripping down real-life code to golf it down to a minimal set of modules which reproduces the situation where a require call comes back with a Failure and a symbol goes missing. The remaining part of this post will be dedicated to this example. In particular, this whole text is dedicated to one line.
Before we proceed further, I’d like to state that I might be speculating about some aspects of the cause of the problem because some details are gone from my memory and I don’t have time to re-investigate them. Still, so far my theory is backed by the working workaround presented at the end.
To make it a bit easier to analyze the case, let’s start with the namespace tree:
GLOBAL
`- L1
   `- App
   `- L2
      `- Collection
         `- Driver
         `- FS
The rough purpose is for the application to deal with some kind of collection which stores its items with the help of a driver which is loaded dynamically, depending, say, on user configuration. We have only one driver implemented: File System (FS).
If you check out the repository and try raku -Ilib symbol-merge.raku in the examples/2021-10-05-merge-symbols directory, you will see some output ending with a line like Failure|140208738884744 (certainly true up until Rakudo v2021.09 and likely to be so for at least a couple of versions later).
The key conflict in this example is between the modules Collection and Driver. The full name of Collection is L1::L2::Collection. L1 and L2 are both stubs. Driver is L1::L2::Collection::Driver and, because it imports L1::L2, L2 is a class; but L1 remains a stub. By commenting out the import we’d get the bug resolved and the script would end up with something like:
L1::L2::Collection::FS|U140455893341088
This means that the driver module was successfully loaded and the driver class symbol is available.
Ok, uncomment the import and start the script again. And then once again to get rid of the output produced by compilation-time processes. We should see something like this:
[7329] L1 in L1::L2 : L1|U140360937889112
[7329] L1 in Driver : L1|U140361742786216
[7329] L1 in Collection : L1|U140361742786480
[7329] L1 in App : L1|U140361742786720
[7329] L1 in MAIN : L1|U140361742786720
[7329] L1 in FS : L1|U140361742788136
Failure|140360664014848
We already know that L1 is a stub. Dumping object IDs also reveals that each compunit has its own copy of L1, except for App and the script (marked as MAIN). This is pretty much expected because each L1 symbol is installed at compile-time into the per-compunit GLOBALish. This is where each module finds it. App is different because it is directly imported by the script, was compiled by the same compiler process, and shared its GLOBAL with the script.
Now comes the black magic. Open lib/L1/L2/Collection/FS.rakumod and uncomment the last line in the file. Then give it a try. The output would seem impossible at first; hell with it, even at second glance it is still impossible:
[17579] Runtime Collection syms : (Driver)
Remember, this line belongs to L1::L2::Collection::FS! How come we don’t see FS in the stash of Collection?? No wonder that when the package cannot see itself, others cannot see it either!
Here comes a bit of my speculation based on what I vaguely remember from the times ~2 years ago when I was trying to resolve this bug for the first time.
When Driver imports L1::L2, Collection gets installed into the stash of L2, and Driver is recorded in the stash of Collection. Then it all gets serialized with the Driver compunit.
Now, when FS imports Driver to consume the role, it gets the stash of L2 serialized at the previous stage. But its own L2 is a stub under the L1 stub. So, it gets replaced with the serialized “real thing” which doesn’t have FS under Collection! Bingo and oops…
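To make the “winner takes all” effect tangible, here is a toy model in which nested hashes stand in for package stashes. None of it is Rakudo’s or MoarVM’s actual data structure or API: the variable names and the merge-symbols helper are hypothetical, and the contents of the two versions of L2 simply follow my speculation above.

```raku
# Toy model: nested hashes stand in for package stashes.

# Hypothetical helper: recursively copy symbols that %into lacks from %from.
sub merge-symbols(%into, %from) {
    for %from.kv -> $name, $value {
        if %into{$name}:exists {
            # Both sides know the name: descend if both are "packages".
            merge-symbols(%into{$name}, $value)
                if %into{$name} ~~ Hash and $value ~~ Hash;
        }
        else {
            %into{$name} = $value;
        }
    }
}

# L2 as serialized with the Driver compunit: Collection knows only Driver.
my %l2-from-driver = Collection => { Driver => 'role' };
# L2 as built while compiling FS: Collection has FS too.
my %l2-from-fs     = Collection => { Driver => 'role', FS => 'class' };

# "Winner takes all": the serialized stash replaces the local one.
my %winner = %l2-from-driver;
say %winner<Collection>.keys.sort;    # (Driver)

# A recursive merge would have preserved the symbol unique to the "loser".
my %merged = Collection => { Driver => 'role' };
merge-symbols(%merged, %l2-from-fs);
say %merged<Collection>.keys.sort;    # (Driver FS)
```

The first say mirrors the “Runtime Collection syms” line from the example output, where FS is missing from the stash of Collection.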
A Workaround
Walk through all the example files and uncomment the use L1 statement. That’s it. All compunits will now have a common anchor to which their namespaces will be attached.
The general rule is: if a problem of this kind occurs, make sure there are no stub packages in the chain from GLOBAL down to the “missing” symbol. In particular, commenting out use L1::L2 in Driver will get our error back because it would create a “hole” between L1 and Collection and get us back into the situation where conflicting Collection namespaces are created because they’re bound to different L2 packages.
It doesn’t really matter how exactly the stubs are avoided. For example, we can easily move use L1::L2 into Collection and make sure that use L1 is still part of L2. So, for simplicity, a child package may import its parent; the parent may then import its own parent; and so on.
Sure, this adds to the boilerplate. But I hope the situation is temporary and there will be a fix.
Fix?
The one I was playing with required a compunit to serialize its own GLOBALish stash at the end of the compilation in a location where it would not be at risk of being overwritten. Basically, it means cloning and storing it locally on the compunit (the package stash is part of the low-level VM structures). Then the compunit mainline code would invoke a method on the Stash class which would forcibly merge the recorded symbols back right after deserialization of the compunit’s bytecode. It seemed to work, but looked more like a hack than a real fix. This and a few smaller issues (like a segfault which I failed to track down) caused it to be frozen.
As I was thinking about it lately, a more proper fix must be based upon a common GLOBAL shared by all compunits of a process. In this case there will be no worry about multiple stubs generated for the same package because each stub will be shared by all compunits until, perhaps, the real package is found in one of them.
Unfortunately, the complexity of implementing the ‘single GLOBAL’ approach is such that I’m unsure if anybody with appropriate skill could fit it into their schedule.