This had to be a decent release announcement with a little bit of bragging about
the new features in Test::Async
.
But it wouldn’t be me unless I screw up in a way. Apparently, this time a have
a little story to tell. But first, the announce itself.
v0.1.0 and v0.1.1
Tool Call Stack And Achoring
Versions of Test::Async
prior to
v0.1.0 were using a tool caller concept for reporting problems and setting
context for EVAL
-based tests. The information was stored in two attributes on
a test suite object: one for a CallFrame
, and another for a
Stash
/PseudoStash
. Everything was fine until I realized that if a test tool
invokes another test tool then the caller gets overwritten (what a groundbreaking
discovery, isn’t it? 🤦). For example:
method is-my-structure-correct(...)
is test-tool
{
...
self.is-deeply: ...;
# At this point tool caller is pointing at the line above.
# Therefore, proclaim will not report the
# original location in a .rakutest file.
self.proclaim: False, ...;
}
BTW, I call this kind of test tools compound ones.
The solution was to replace tool caller with tool call stack. Now
is-deeply
and any other correctly implemented tool must push what it considers
as its caller location to the stack and pop it back when done.
But this is only a part of the problem. What if instead of is-deeply
, which is
rather simple one, our tool would use throws-like
, another compound one and
implemented around a subtest? Not only throws-like
would be reporting it’s
invocation site when fails, but it will use incorrect context when tested code
supplied in a string form.
Ok, an example would serve better than a thousand words. Again, here is a compound test tool:
method my-compound-test(Str:D $code, ...)
is test-tool
{
...
self.throws-like: ...;
...
}
And there is a test file with something like:
subtest "Complex one" => {
my $obj = MyClass.new;
my-compound-test q<$obj.must-throw-my-exception>, ...;
}
This will throw. Though not with my exception, but with
X::Undeclared
because when
throws-like
EVALs the code string it would use the closure from
my-compound-test
method body as the context. And the closure doesn’t have any
$obj
declared!
My answer to the challenge is anchoring a tool call stack entry. It means that any nested call to any test tool will consider that entry as if it is its own direct caller:
method my-compound-test(Str:D $code, ...)
is test-tool(:anchored)
{
...
self.throws-like: ...;
...
}
That’s all. throws-like
will now consider itself called in the context of
“Complex one” subtest from the above example. And even if we wrap it into a
nested subtest:
method my-compound-test(Str:D $code, ...)
is test-tool(:anchored)
{
...
self.subtest: "compound", :hidden, :instant, {
...
self.throws-like: ...;
...
}
...
}
Both the subtest and the throw-like
would “stick” to the same context in which
my-compound-test
is called.
Inline test bundles
Previously to declare own test bundle with custom, project-specific test tools,
one had to write and use
a module. Now it can be done in a .rakutest file
if it’s the only place where these test tools are used:
use Test::Async::Decl;
test-bundle LocalBundle {
method my-test(...) is test-tool(:anchored) {
...
}
}
use Test::Async <Base>;
plan 1;
my-test ...;
The advantage of declaring my-test
this way instead of making it a plain sub
is that it gets all the cookies of Test::Async
infrastructure directly. For
example, for my-test
from the above example anchoring will make it easier to
spot failure locations in the test file.
Test Aborting
I always felt like skip-rest
is only partial solution for the problem of
aborting a test suite early. I mean, what would be the common way of using it?
if ok(do-something, "we're ok") {
...; # Run remaining tests
}
else {
skip-rest "can't continue";
}
So far, so good, until we need similar construct among the remaining tests.
If it’s the global context of a test file then we can make our life easier with
exit
:
unless ok(...) {
skip-rest "reason";
done-testing;
exit 1;
}
But when we’re inside of a subtest things get more complicated and most likely
one would end up with nested if {...} then {...} else {...}
constructs each
time skip-rest
is needed.
In Test::Async
I implemented another solution to this. It is called
skip-remaining
and it makes all remaining tests to be kind of replaced with
skip
:
unless ok(...) {
skip-remaining "makes no sense";
}
is ...;
isa-ok ...;
done-testing;
Both is
and isa-ok
will do nothing if ok
fails. Instead they will emit
Event::Skip
with “makes no sense” message. This looks better, but still not
ideal. Consider this:
my $got = may-result-in-a-Failure;
unless is-deeply($got, $expected, "structure ok") {
skip-remaining "invalid result produced";
}
my-compound-test $got, "integrity test";
done-testing;
Apparently, my-compound-test
will explode if $got
contains a Failure
.
While overall the above code would still be a failed test, but for many reasons
a thrown exception might not be an outcome we agree with. Especially in such a
simple case where else
would easily solve… Wait, what, else
again?
My short answer to such a long pre-amble is abort-testing
. It is a test tool
similar in nature to done-testing
with the only difference: it quits the
current test suite. In case of a child suite like subtest
it results in
calling suite’s abort
method. For the top suite (test file global context)
abort-testing
uses plain exit
. Now we can have something like this:
plan ...;
my $got = may-result-in-a-Failure;
unless is-deeply($got, $expected, "structure ok") {
skip-rest "invalid result produced";
abort-testing;
}
my-compound-test $got, "integrity test";
unless my-other-sensitive-test(...) {
skip-rest "all is worse than expected";
abort-testing;
}
test-something-else ...;
done-testing;
I think nobody would disagree that a linear code of the kind is much easier to maintain than a pile of nesting conditions.
The other great thing is that the example can be easily be wrapped in a subtest with no changes needed.
Lessons Learned
This section should’ve been named Things f*ed up and fixed, but then it wouldn’t sound that academic!
Soon after releasing v0.1.0 I decided it is time to get back to my other
projects where I use Test::Async
. Those I mostly develop on a multi-multi-core
server which is fantastically good for testing concurrent code. Apparently, the
server proved its reputation by refusing to install the update! Tests behaving
nicely over multiple runs on my MacBook suddenly collapsed with astonishing
glory! That was the beginning of a new little quest…
Don’t Share Data Across Threads
Not that I didn’t know this rule before or I was ever forgetting about it. But
what I did forget about was a very useful feature of Test::Async
which allows
to bind a number of concurrent threads to a test suite and make sure the suite
doesn’t finish until all threads done. For example:
subtest "Concurrent Case" => {
for ^5 -> $thread-num {
test-suite.start: {
do-in-thread-test: id => $thread-num;
}
}
# Do some more testing which doesn't depend
# on the threads started
...;
}
Method start
of a test suite creates and starts a new job in a dedicated
thread. The subtest (which is our test suite in this case) will never finish
until all five threads are done. The great thing about this feature is that
within do-in-thread-test
one can call any of the Test::Async
provided test
tools, including possibly loaded third-party test bundles:
test-bundle MyAppBundle {
method do-in-thread-test(..., :$id) is test-tool {
...
self.pass: "control: we're in a thread";
...
self.is: $got, $expected, "message";
...
}
}
By this moment someone might have already understood what was going on: the test tool stack happened. The damn thing was implemented as an array attribute on the test suite object and was shared among all threads started! I was lucky enough to somehow evade this race condition on 16 cores, but on 56 it was nearly unavoidable…
For better or for worse (and for many reasons it is for better, as to my view),
in Raku there is no guarantee that code started in a thread
would be ran by the same one forever until done. Whenever something like await
takes a break there is a non-zero chance that the resumption would happen on a
different thread. So, if one prints $*THREAD.id
regularly they may notice the
value changing. What it meant to me is that I don’t have a reliable way to
identify the current stack based on the data available via $*THREAD
.
I needed other way around. Apparently, Raku provides it: the
attribute was replaced with a dynamic variable @*TEST-TOOL-STACK
. The variable
is then set individually per each job created by Test::Async::JobMgr
role; and
there one set in PROCESS::
namespace for the global scope.
There is nothing really complicated about this case. But I used it as a good reason to demo once again how easy it is to find the right solution to a problem in Raku.
Don’t Be Greedy
Now it felt like things are ready for the bugfix release of v0.1.1. Just one
more test and… Needless to say, I got another punch in my face! To make long
story short, the problem was tracked down to the following construct, which is a
part of t/060-subtest.t
:
todo "subtest fails";
subtest "TODO before subtest" => {
flunk "this test fails but the subtest is TODO";
}
For the clarity of it, the construct could fail in the presence of another concurrently running subtest, started earlier:
subtest "Concurrent", :async, {
...
}
This is a rarely happening flapper case. Briefly, what happens here is that
todo
internally sets a counter which tells the core how many of the following
tests must be marked as TODO
. Because subtests can be ran concurrently or even
postponed until the end of execution of the enclosing test suite, they pick
up their TODO
status as early as possible, even before the suite object
they’re based upon, is instantiated.
So far, so good. When a subtest finishes it is using it’s parent suite object to
report the results in order to simulate behavior of other test tools and to
provide correct indentation of TAP output. And it does so by calling parent’s
proclaim
method. proclaim
, in turn, uses send-test
method which is the
central point of emitting Event::Test
filled with all the information to be
reported. One its duties is taking into account the current TODO counter. Oops,
we do it again!
Here is what is the diagnosis: when the concurrent subtest finishes it might
pick up the TODO
status on the parent before the “TODO” subtest is getting
there for it! As a result I was seeing an ok subtest marked with TODO
, and
the flunking one… Well, it was actually flunking.
I needed to somehow explicitly tell send-test
method not to consider TODO
counter when this is not needed. There were two ways to have this done: either
add a parameter to the method itself and to the proclaim
method; or use
another thread-safe way to raise a flag.
The first approach required a slight, but still backward-incompatible, change to the suite object API. The second was only possible with a help of a dynamic variable.
The first approach I didn’t like. The second one was even worse.
And so the choice was clear, I had to bump :api
version of the module. After
all, whereas Test::Async
v0.1.0 was implementing API v0.1.0; v0.1.1 of the
module does API v0.1.1. I don’t like it, it looks like abusing the feature; but
have to admit this cost of insufficient pre-release testing.
Post-Release
Heh, a long period of silence I compensated with a monstrous post which was
initially planned as a few paragraphs introducing just a couple of new features
in Test::Async
. Another proof of the saying “Wanna make the God laugh? Tell
him about your plans!”. Anyway, my plans now is to get back to the work I
postponed. And to finally make the decision as to whether I have time for
having a talk at the upcoming Perl Conference…
Comments