MAR9000 posts feed.Master feed of posts from mar9000.org .Efficient DTOs2020-10-07T22:00:00Z2020-10-07T22:00:00Z<p>When we need to return data to be displayed on the UI as <a href="https://martinfowler.com/eaaCatalog/dataTransferObject.html">DTO</a>s, we are going to read from multiple repositories, except for trivial UIs. For example, forum posts can have an author (a user) and comments, and each comment has an author too.</p>
<pre><code>{
  posts: [{
    title: "Efficient DTO",
    author: {
      name: "Marco"
    },
    comments: [{
      author: {
        ...
      }
</code></pre><p>In this context, no matter which technology we use, the N+1 problem will pop up. For instance with:</p>
<ul>
<li>REST API: we execute one request for the list of posts, then N requests for the User resources.</li>
<li>REST API + hint, like <code>/posts?with=user</code>: now we have only one REST request, but on the server users are probably still loaded one by one from their repository.</li>
<li>GraphQL: without data loaders you still have the same problem: the user of a post is resolved once per post.</li>
</ul><p>The last example is not accidental: when I faced this problem my first thought was to resort to GraphQL, and this is where my idea comes from. However, I did not have dynamic queries but static ones, read "several endpoints that always return the same JSON", hence I did not want to adopt an engine that parses, for a given endpoint, the same query over and over again.</p><p>I found that what I needed were batched loaders and <em>data loaders</em>, something that every GraphQL implementation has as an associated library. In our example, for every post DTO resolution the load of the user is delayed until we have the complete list of users to load (same for comments and for the users of comments), so we can batch-load users.<br/>Other solutions, illustrated below, do not scale, at least in my use cases.<br/>The library that implements the above algorithm, without GraphQL, is <a href="https://github.com/mar9000/graph-loader">graph-loader</a>.</p><h2>Context</h2><p>So we have implemented boundary, repositories, application and domain services, and we have to return data to display the UI. Except for trivial UIs, the plain REST paradigm won't work: we would have to request posts, then users, then probably comments, etc.:</p>
<pre><code>/posts
/posts/1/user
/posts/2/user
...
</code></pre><p>Let's try REST plus some hints like <code>with=users,comments</code> to extend our response content. Now how are we going to implement this request on the backend? We load a list of posts, then for each post we load its user?<br/>This will yield the N+1 problem, right? No problem: we can collect the list of users for all posts and load them in one batch, the same algorithm used by GraphLoader (GL) but implemented manually. So far so good.</p><h2>Problem</h2><p>What about also loading comment users, or each user's country to show a flag icon? Once you have loaded all users you have to collect, manually, all countries of all users to batch the load of countries. So far we have only named three repositories, but for a rich UI it's common to load from ten different repositories; an optimized loading logic quickly becomes cumbersome.</p><p>One of my assumptions is that my queries are static: one endpoint should resolve a small set of DTO shapes, so a query language is not required. Batch loading is one of the possible solutions and the one we are going to explore. But doing it manually would be a nightmare.</p><p>At this point I evaluated GraphQL. I liked its typed API, but AFAIK (I've explored its source code in depth) the only entry point to execute the resolvers and data loaders one has configured is a GraphQL query. Hence one endpoint, bound to a given GraphQL query, would parse and execute the same query over and over again to compose its response.</p><p>Trying to avoid this overhead, while at the same time keeping a typed API, led me to GraphLoader.</p><h2>Solution</h2><p>For the solution space the scheme of our API is as follows:</p>
<ul>
<li>we have a type used as key, indicated generically as <code>K</code>.<br/> Every loader has its own key type, so if needed we can have a set of types <code>K</code>.<br/> I have one and it's <code>Integer</code>.</li>
<li>a certain <code>Repository<V></code> of a type <code>V</code> can batch-load a list of <code>V</code> given a list of keys of type <code>K</code>.<br/> For example a repository that loads a set of <code>Integer</code> into a set of <code>Post</code>.</li>
<li>our response is composed out of a set of types, each one having one or more assemblers, for example an <code>Assembler<V,D></code> that transforms a value <code>V</code> into a DTO <code>D</code>.<br/> One of these assembles for example <code>Post</code> into <code>PostDTO</code>, another into <code>DetailedPostDTO</code>.</li>
<li>assemblers can queue more loads once they receive their value <code>V</code>, using its properties as key values.</li>
</ul><p>For instance:</p>
<pre><code>loadPosts
  - for each post:
    - queue the load of its author
    - queue the load of its comments
while (there are pending loads)
  - execute pending loads
    (this passes each loaded V to its assembler
     -> the assembler may queue other loads)
</code></pre><p><img src="/images/resolution-flow.png" alt="resolution flow"/></p><p>A call to <code>GraphLoader.resolve()</code> looks like:</p>
<pre><code>GlResult<PostResource> result = graphLoader.resolve(1L, "postLoader", new PostResourceAssembler());
</code></pre><p>The first phase, <em>load</em>, is not mandatory: when we already have a value with an associated assembler we can execute only the transformation of <code>V</code> into <code>D</code>. For this reason GL has <em>resolve</em> methods for keys and for values; take a look at <code>resolve()</code> and <code>resolveValue()</code> as a starting point. For example, when we already have a Post:</p>
<pre><code>GlResult<PostResource> result = graphLoader.resolveValue(post, new PostResourceAssembler());
</code></pre><p>There are also two methods to work with lists of keys or lists of values, see <code>resolveMany(List<K> keys, ...)</code> and <code>resolveValues(List<V> values, ...)</code>.</p><h2>Implementation</h2><p>The key point is that the loader's <code>load()</code> method does not return values <code>V</code>; instead it accepts a consumer of <code>V</code> to handle the result once a given key gets loaded and the resulting <code>V</code> is ready.</p><p>Consider for instance the resolution of the post's author present in the <code>PostResourceAssembler</code>:</p>
<pre><code>PostResource resource = new PostResource();
authorLoader.load(post.authorId,
    user -> resource.author =
        authorAssembler.assemble(user, context));
</code></pre><h2>Performance</h2><p>The benchmark is very simple at the moment, but data are promising:</p>
<pre><code>Benchmark                                        Mode  Cnt       Score  Units
GLBenchmark.glAvgTime                            avgt    3       1.212  us/op
GLBenchmark.glAvgTime:·gc.alloc.rate.norm        avgt    3    2728.000  B/op
GraphQLJava.graphqlAvgTime                       avgt    3      94.926  us/op
GraphQLJava.graphqlAvgTime:·gc.alloc.rate.norm   avgt    3  162968.507  B/op
</code></pre><p>That said, I'll benchmark a much more complex graph resolution/query as soon as possible, I mean a query that returns 100 rows and uses 10 repositories.</p><h2>Additional considerations</h2><p>Only in case we have SQL repositories (as the ones I had) it is tempting to try:</p><p><strong>More complex queries</strong>: for instance a specialized repository for Post that also joins the author table. One-to-one associations are relatively simple to load this way; nevertheless, how many tables are we going to join when the UI gets richer? Moreover, loading associations with a <em>many</em> end, for instance comments, quickly becomes complicated. Not to mention loading associations of associations.</p><p><strong>JPA entity graphs</strong>: if they are always an option, read "everything is reachable from the same aggregate", e.g. from <code>Post</code> you can reach the author (<code>User</code>), the <code>Comments</code> and each comment's author, it probably means you have defined too-big aggregates.</p><p>The algorithm used by GL is in my opinion the best trade-off in the case of static <em>DTO queries</em>, which suggest not using GraphQL. Moreover, if one decides to migrate to GraphQL later on, most of the classes defined to work with GL can be reused: repositories and batch loaders can be used almost without modifications.</p><h2>Conclusion</h2><p>GraphQL and <code>java-dataloader</code> are great projects when you don't know the queries your clients are going to send. GitHub for instance moved to GraphQL with version 4 of their API. <code>graphql-java</code> supports CompletableFuture and is highly configurable. 
But if you have only one front-end speaking with your back-end, I think that graph-loader, or at least the idea behind <code>graph-loader</code>, is simpler, smaller, faster, and one can be up and running with little effort.</p>2020-10-07T22:00:00ZA grammar for projectional editor2015-12-13T13:00:00Z2015-12-13T13:00:00Z<p>Describing a language editor can be repetitive, for instance when you have to define expressions: <code>expression '+' expression</code> and <code>expression '*' expression</code> is a typical example. From the grammar file that describes the language structure it's possible to recognize repetitive rule structures and build an editor in a consistent way.</p><p>The <a href="https://github.com/mar9000/pe" title="PE project">PE</a> project defines a grammar and generates ASTs in a way not tied to any projectional editor. The <a href="https://github.com/mar9000/pe4mps" title="PE4MPS project">PE4MPS</a> project imports the generated ASTs into <a href="http://www.jetbrains.com/mps" title="MPS project">MPS</a>, generating for example from this rule:</p>
<pre><code>Graph:
  strict=STRICT? type=GraphType name=string?
  statementList<indentList('{', '}')>=Statement*
;
</code></pre><p>this MPS editor:</p><p><img src="/images/graph-editor-example.png" alt="Graph editor example"/></p><p>The indented statement list defined above can also be used in:</p>
<pre><code>Subgraph:
  (SUBGRAPH label=Id?)?
  statementList<indentList('{', '}')>=Statement*
;
</code></pre><p>giving:</p><p><img src="/images/subgraph-editor-example.png" alt="Subgraph editor example"/></p><p>The last example for this small introduction is an optional group of elements like:</p>
<pre><code>NodeId:
  id=Id (':' first=Id (':' second=Id)?)?
;
</code></pre><p>translated in MPS into two intentions:</p><p><img src="/images/nodeid-editor-example.png" alt="NodeId editor example"/></p><p>The two mentioned projects are still at their very first versions and many features are still missing; probably the most important one is scope handling.</p>2015-12-13T13:00:00ZNew ECMAScript4MPS project2015-02-12T13:00:00Z2015-02-12T13:00:00Z<p>There are several strategies for code generation. The one used by <a href="https://www.jetbrains.com/mps/">MPS</a> could be called the <em>no code generation</em> strategy. In fact the suggested implementation for code generation is first to transform models in your language to models in the target language, or if you prefer, first to transform the AST representation of your program into the AST representation of the target language. Then models of a real language, for example Java, are translated to text. This means that only the constructs, like <em>if-then-else</em>, of a language that one can compile or execute, like Java, have a place where one says how to translate them to text, in MPS called the <em>TextGen</em> aspect.</p><p>MPS gives you for free a language called <em>baseLanguage</em> that transforms seamlessly to Java. However, one of my target languages is JavaScript, so I created this new MPS language. The new project is hosted on <a href="https://github.com/mar9000/ecmascript4mps">github</a>.</p><p>I plan to write some more documentation in the future, but I want to highlight a few points here.</p><p>At first I started from the <em>grammar-like</em> specifications one can find on the <a href="http://www.ecma-international.org/ecma-262/5.1/">www.ecma-international.org</a> site. It will probably be useful in the future to implement all lexer rules, still missing in this very first version. 
But with MPS you implement AST models, while grammars are more focused on parsing strategies.</p><p>Then I came to the <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Parser_API">Mozilla Parser API</a>, which in addition has real parser implementations like <a href="http://esprima.org/">Esprima</a>. This was quite useful; in fact ECMAScript4MPS fully respects the AST documented by the Mozilla documentation, with few exceptions.</p><p>Many editing features are still missing and editing is at the moment more <em>AST editing</em> than <em>text editing</em>. However, my goal is JavaScript generation, so ECMAScript4MPS will be used for template definition; here editing with <em>intentions</em> and less automatic <em>side transforms</em> is, I think, acceptable. That said, lots of intentions and side transforms are still missing, stay tuned for the next release.</p><p>Writing a developer's guide that could help other developers understand MPS is also one of my goals.</p>2015-02-12T13:00:00ZSunset in Antignano2014-10-24T12:00:00Z2014-10-24T12:00:00Z<p><a href="/photos/2014/antignano-sunset-2014-10-24-IMG00798.jpg"
data-lightbox="antignano-sunset-2014-10-24-IMG00798" data-title="Sunset in Antignano"> <img src="/photos-small/2014/antignano-sunset-2014-10-24-IMG00798.jpg" /> </a></p><p><img style="max-width: 100%; height: auto" src="/photos/2014/antignano-sunset-2014-10-24-IMG00798.jpg" /> <br/> On the horizon, the Gorgona island.</p>2014-10-24T12:00:00ZSketching UI with text tools2014-09-29T12:00:00Z2014-09-29T12:00:00Z<p>The idea of sketching UIs, or other types of drawings, using text files is not new. Drawings can be embedded as text into text files, so a whole document can be defined using only text. This is the approach used by <a href="http://sphinx-doc.org/">Sphinx</a>: with it you can use <a href="http://docutils.sf.net/rst.html">reStructuredText</a> to define your HTML or PDF documentation.</p><p>Sphinx is integrated with <a href="http://plantuml.sourceforge.net/">PlantUML</a>, so you are also able to define UIs, UML diagrams and charts using text.</p><p>The subproject used by PlantUML to define UI sketches is called <em>Salt</em>. What I do not like about Salt is that it uses lines, curves and text to draw a UI sketch; this way the result is not realistic.<br/>And OK... I also wanted to exercise with <a href="http://www.antlr.org">ANTLR</a>.</p><p>The task that let me include realistic screenshots in my documents was divided into two subprojects:</p>
<ol>
<li>I wrote an ANTLR parser for the part of the Salt syntax I have needed so far.<br/> This resulted in the <a href="https://github.com/mar9000/salt9000">Salt9000</a> project.</li>
<li>then I integrated Salt9000 into a customized version of PlantUML.<br/> This resulted in the <a href="https://github.com/mar9000/plantuml">Plant UML 9000</a> project.</li>
</ol><p>Don't be confused about my real target: I would prefer to generate my documentation directly from source code artifacts, as you can do with the <a href="http://mbeddr.com/">mbeddr</a> project. However, even with such a powerful tool, it seems to me that a DSL to define UIs is still useful while you are writing the first requirements.</p><h2>An example</h2><p>Let's say you want to design a login form like this:</p><p><img src="/images/salt9000-example.png" alt="Salt9000 example"/></p><p>instead of opening a graphics tool with lines, curves, colors... you can enter this text into a definition file and process it with Salt9000:</p>
<pre><code>{
  {* Help}
  {Username: | " " | Password: | " "}
  {[Ok] | [Cancel]}
}
</code></pre><p>For reference, the image generated by the original Salt would be:</p><p><img src="/images/salt-example.png" alt="Salt example"/></p><p>With Sphinx and PlantUML you don't have to generate this image manually: just include something like this in your source document to have the image rendered into the PDF or HTML that Sphinx generates:</p>
<pre><code>.. uml::

   @startsalt9000
   {
     {* Help}
     {Username: | " " | Password: | " "}
     {[Ok] | [Cancel]}
   }
   @endsalt9000
</code></pre>2014-09-29T12:00:00ZANTLR4 grammar for Markdown2014-08-31T22:00:00Z2014-08-31T22:00:00Z<p><a href="http://daringfireball.net/projects/markdown/">Markdown</a> is used today in several places, including this blog.<br/><a href="http://www.antlr.org">ANTLR</a> is also used almost everywhere you need a parser.<br/>So, to learn ANTLR, I chose to try to teach ANTLR how to parse Markdown syntax. This task has been much harder than I expected.</p><p>The result has been published as the <a href="https://github.com/mar9000/antmark/">ANTMark</a> project.</p><p>The main problem is that everything is context sensitive, including newlines.</p><p>The hardest things to parse were ordered and unordered lists. Here there is another problem: there is no reference syntax definition for Markdown, and different implementations parse nested ordered or unordered lists differently.</p><p>Forcing ANTLR to parse Markdown, the result is not as fast as I expected; I mean it's pretty slow, and I had to break long test cases into smaller ones, otherwise the parsing never ends.</p><h2>State of the art</h2><p>Under <code>tests</code> you can find 143 tests that can be executed with the <code>MarkdownTest</code> class.<br/>There are some homemade tests, almost all the Markdown default tests (version 1.0.2), and all tests of the <a href="https://github.com/karlcow/markdown-testsuite/">markdown-testsuite</a>.<br/>Due to the problems highlighted below, the project is at the moment only an exercise in style.</p><h2>Main problems</h2><p>I don't know about other parsing engines, but with ANTLR4 I spent a lot of time parsing lists. 
The main problem with ANTLR4 is probably that one would like a stopping condition for rules such as <code>(...)*?</code> that forces the parser to stop consuming tokens.<br/>ANTLR4 stops consuming tokens depending on what follows the example rule above, but in my grammar this was not enough.</p><p>So I used semantic predicates, but the parser <strong>is very very slow</strong>: it's not able to parse a file that contains more than 2 lists. I also tried to inspect the generated DFA but it was out of reach for me.<br/>Adding more lists causes the parsing to never end. I hope this will be of some interest to the ANTLR gurus.</p><h2>Use cases</h2><p>After you have cloned the project, import it into eclipse, then you can:</p>
<ul>
<li>compile the <code>MarkdownLexer.g4</code> grammar with <em>compile-lexer</em> run/debug configuration.</li>
<li>compile the <code>MarkdownParser.g4</code> grammar with <em>compile-parser</em> run/debug configuration.</li>
<li>open the class <code>MarkdownTest</code> and <em>Run As -> JUnit Test</em> from the right click menu.</li>
</ul><h2>Future directions</h2><p><strong>Use a <em>scannerless</em> grammar</strong>: unfortunately, almost at the end of my work I realized that the lexer was not doing that much, so a <em>scannerless</em> grammar could probably be adopted easily. In addition, the semantic predicates I wrote all act by inspecting the token stream; in a <em>scannerless</em> scenario it should be easier to inspect a <code>CharStream</code>.</p><p><strong>Parser modularization</strong>: in case no global solution exists to build a real parser for the whole language, one could try to first build a parser for the block structure of Markdown, for example identifying lists, verbatim blocks, headings..., then parse each block's content to parse emph, strong, links and all the other span elements.<br/>I already used this approach for blockquotes: here on each line the starting <code>></code> tokens are removed and the result is passed to a new instance of the parser. Some fixes are probably required in case <em>reference links</em> are present inside the blockquote.</p><h2>Support/discussions</h2><p>Because the project has just started, I think a generic group for discussions and comments is enough: <a href="https://groups.google.com/forum/#!forum/antmark-discussion">https://groups.google.com/forum/#!forum/antmark-discussion</a>.</p>2014-08-31T22:00:00ZThe smallest static site generator2014-03-03T21:00:00Z2014-03-03T21:00:00Z<p>Now that I can write posts in HTML and in Markdown, what is missing to create the smallest static web generator we can imagine?</p><p>In fact this is only the first list of <em>missing features</em>; for example <a href="http://octopress.org">Octopress</a> has beautiful syntax highlighting. Anyway... I asked myself what is the minimum needed to publish and organize content and let people follow my blog, so:</p>
<ul>
<li>my index page is almost empty; I'll publish there the latest posts' abstracts to guide the reader.</li>
<li>with the index page alone I could already write posts and readers would find them there. However, minimal search capabilities can be implemented with tags, even if they will become useful only with several posts.</li>
<li>Atom feed is the minimum to <em>keep in touch</em> with readers.</li>
</ul><h3>Abstract on the index page</h3><p>This will certainly require ordering posts by date, to publish only some of the last published ones. To do this I will collect post metadata during post processing. As you can imagine, this will also be used when we implement the tags pages. The bootstrap construct used is the <em>description list</em>. I already had an index template; I modified it and added a new composite attribute to the StringTemplate instance used.</p><h3>Tags pages</h3><p>While processing posts I collect data about them into an <code>ArrayList</code> used by <code>Blog.createTagsPages()</code>. A main page with all tags, together with a page for each tag, gets created by the above method.</p><h3>Atom feed</h3><p>The Atom feed is created from the same data used for the main index page. When I searched for a library to generate an <code>atom</code> file from Java, the surprise was that there are not many alternatives. Finally I chose <a href="https://github.com/rometools/rome">ROME</a> over the Apache alternative, Abdera, which seemed to me to have more dependencies. The documentation is not that big; nevertheless I found the example I was looking for.</p><p>At this point I also added <a href="http://fortawesome.github.io/Font-Awesome/">Font Awesome</a> to put the feed symbol on the navigation bar.</p><p>As a final note, I chose to implement Atom instead of RSS because it's a newer format.</p>2014-03-03T21:00:00ZAdding a markup language2014-03-01T23:00:00Z2014-03-01T23:00:00Z<p>The previous post introduced a small language to define post files, and in this project you can find some Java classes to transform them into a small blog based on bootstrap. The first improvement I found missing is a markup language to speed up the writing of posts. 
I decided to investigate a bit the <em>static site generators</em>, like for example <a href="http://jekyllrb.com">Jekyll</a>, actually the one used by github.</p><p>I started searching for <em>static site generator</em> or <em>static web framework</em> and I found, among others: <a href="http://jekyllrb.com">Jekyll</a>, <a href="http://blog.getpelican.com">Pelican</a>, <a href="http://octopress.org">Octopress</a>, <a href="http://assemble.io/">Assemble</a>, <a href="http://nanoc.ws">Nanoc</a>, <a href="http://wintersmith.io">Wintersmith</a>, <a href="http://phrozn.info/en/">Phrozn</a>, <a href="https://github.com/koenbok/Cactus/">Cactus</a>, <a href="http://zespia.tw/hexo/">Hexo</a>, <a href="http://lkdjiin.github.io/genit/">Genit</a>.<br/>One based on Java is <a href="http://jbake.org">JBake</a>.<br/>There is a comprehensive list <a href="http://staticgen.com/">here</a>.</p><p>At this point I asked myself what I would like to do in the future: use a fully featured static site generator, or continue to use my own code. I decided for the latter because the blog itself is not my first goal; I also want to experiment with grammars and code generation. So I looked at the above projects more as examples of file organization than as projects to use in practice. I will add features one by one as soon as I need something not yet implemented.</p><p>Writing directly in HTML is time consuming, so the first thing to implement is the possibility to use <a href="http://daringfireball.net/projects/markdown/">Markdown</a> instead of plain HTML; a list of implementations is maintained <a href="http://www.w3.org/community/markdown/wiki/MarkdownImplementations">here</a>.</p><p>Among others, <a href="http://markdown.tautua.org/index.html">MarkdownPapers</a> uses JavaCC, and <a href="https://github.com/sirthias/pegdown">PegDown</a> is derived from a PEG grammar. 
I chose the latter because of its better documentation.</p><p>After adding some jars I was able to write my posts using <em>markdown</em>. Now post files with extension <code>.md</code> are parsed as posts, but <em>abstract</em> and <em>content</em> are transformed with PegDown.<br/>This post is the first of this kind, check out <code>adding-markup-language.md</code>.<br/>This extension also has the side effect that posts written in markdown can be edited with syntax highlighting if opened with a specialized editor.</p><p>The Java part needed very few modifications; for example, to process the <em>content</em> I just needed, see the <a href="https://github.com/mar9000/mar9000.org/blob/master/src/org/mar9000/blog/Blog.java">Blog</a> class:</p>
<pre><code>if (post.getName().endsWith(MARKDOWN)) {
    content = new PegDownProcessor().markdownToHtml(content);
}
</code></pre>2014-03-01T23:00:00ZPost #12014-02-14T23:00:00Z2014-02-14T23:00:00Z<p>If you are a Java developer and you are interested in Domain Specific Languages (DSLs)
and Code Generation, sooner or later you are going to play a bit with <a href="http://www.antlr.org">ANTLR</a>.
In addition, if you are that kind of person you probably know the Martin Fowler
<a href="http://martinfowler.com/bliki/WhatIsaBliki.html">bliki</a>.
Now something personal: in general I dislike working with graphical tools when
I can do the same thing by coding and/or on the command line (who knows, maybe in one of my next posts I will decide to
explain why). I also dislike storing in a database things that are much more comfortable in the
file system. All these reasons drove me to implement my own <i>bliki</i>. </p><p>I have also given WordPress an opportunity; indeed a Spanish blog I translate to Italian is
maintained with WordPress, but let's speak about the static part of this site.</p>
<h2>The first (bad) grammar</h2>
<p>Because I aim to experiment with ANTLR I decided to write a small language to define a blog post. Once you have
such a language and can parse your posts, you can use this data to:
<ul>
<li>create a main page with only last posts.</li>
<li>create a page to show only posts tagged with a specific label.</li>
<li>create your RSS feed.</li>
</ul>
A post will be something like:
<pre>
title: <something>
url: <url to use for the post>
date:
tags: antlr,java, ..
content: the HTML part of the post
</pre>
</p>
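<p>Once parsed, the fields above map naturally onto a plain data holder. A hypothetical sketch in Java (the class name and field types are mine, not necessarily what the blog's code uses):</p>

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical holder for one parsed post file; the fields mirror the
// tags of the post format above (title, url, date, tags, content).
public class Post {
    final String title;
    final String url;
    final LocalDate date;
    final List<String> tags;
    final String content;

    Post(String title, String url, LocalDate date, List<String> tags, String content) {
        this.title = title;
        this.url = url;
        this.date = date;
        this.tags = tags;
        this.content = content;
    }

    public static void main(String[] args) {
        Post p = new Post("Post #1", "post-1.html",
                LocalDate.of(2014, 2, 14), List.of("antlr", "java"),
                "<p>the HTML part of the post</p>");
        System.out.println(p.title + " " + p.tags);
    }
}
```

<p>Whatever the exact shape, the parser's job is to fill such an object from the tagged lines of the post file.</p>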
<p> If you are new to ANTLR the first grammar will be:
<pre>
post: title url date tags content;
title: 'title:' LINE;
url: 'url:' LINE;
date: 'date:' DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT NL;
tags: 'tags:' WORDS? (',' WORDS)? '\n';
content: 'content:' .*;
DIGIT: [0-9];
WORDS: ([a-zA-Z0-9] | ' ')+;
LINE: ~[\r\n]* NL;
NL: '\r'? '\n';
</pre>
If you try this grammar with:
<pre>
$ antlr4 BadBlog.g4
$ javac *.java
$ grun BadBlog post -tokens test1.post
</pre>
You will receive some errors like this: <i>line 1:0 missing 'title:' at 'title: something\n'</i>.
Why does ANTLR say <i>title:</i> is missing if it's actually inside the file?
</p>
<h2>The first good grammar</h2>
<p>This fact is stated at page 15 of
<a href="http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference">The Definitive ANTLR 4 Reference</a>:
<blockquote>
<p>Note that lexers try to match the longest string possible</p>
</blockquote>
Our lexer consumes <i>title:</i> while matching the <code>LINE</code> rule, and this is visible in the output of the preceding command:
<pre>
[@0,0:18='title: something\n',<11>,1:0]
</pre>
Token type 11 is <code>LINE</code>.
</p>
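<p>The "longest match wins" behavior can be mimicked in plain Java to see why <code>LINE</code> beats the <code>'title:'</code> literal. This is a toy sketch of the two competing rules, not the generated lexer:</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunch {
    // Return the name of the rule matching the longest prefix of input,
    // mimicking how the lexer chooses between 'title:' and LINE.
    static String winner(String input) {
        // Candidate 1: the literal 'title:' (6 chars when it matches).
        int keywordLen = input.startsWith("title:") ? "title:".length() : 0;
        // Candidate 2: LINE = ~[\r\n]* NL, the whole line including the newline.
        Matcher m = Pattern.compile("[^\r\n]*\r?\n").matcher(input);
        int lineLen = m.lookingAt() ? m.end() : 0;
        return lineLen >= keywordLen ? "LINE" : "'title:'";
    }

    public static void main(String[] args) {
        // LINE matches 17 chars vs. 6 for 'title:', so LINE wins.
        System.out.println(winner("title: something\n"));
    }
}
```

<p>On <code>"title: something\n"</code> the <code>LINE</code> rule matches the whole 17-character line, so the keyword never gets a chance.</p>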
<p>The solution is to implement everything at the lexer level (I introduced "..." to end the content rule):
<pre>
post: TITLE URL DATE TAGS CONTENT;
TITLE: 'title:' .*? NL;
URL: 'url:' .*? NL;
DATE: 'date:' .*? NL;
TAGS: 'tags:' .*? NL;
CONTENT: 'content:' .*? NL '...' NL;
NL : '\r'? '\n';
</pre>
If you test this you will see that the grammar successfully parses the file, at the
price of also having the starting and ending strings when accessing the AST, e.g. <code>TITLE().getText()</code>
will also contain <i>title:</i>.
</p>
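<p>Because the token text keeps its delimiters, the application code has to strip them. A minimal hypothetical helper, not taken from the blog's sources:</p>

```java
public class TokenText {
    // Strip a leading tag like "title:" and surrounding whitespace from
    // the raw text of a token such as "title: something\n".
    static String strip(String tokenText, String tag) {
        return tokenText.substring(tag.length()).trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("title: something\n", "title:"));
    }
}
```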
<h2>The <i>island</i> grammar</h2>
<p>
With our grammar we want basically to parse:
<ul>
<li>a tag, like title:, associated with one line.</li>
<li>a tag, like content:, associated with more lines.</li>
<li>a list of words, like tags: .</li>
</ul>
This is our <i>meta-model</i>, expressed formally in the grammar we are going to write.
</p>
<p>
The lexer respects rule precedence, but here the problem is that the <code>LINE</code> rule
has no start condition, and once it starts it will always match, for instance, more chars than <code>WORDS</code>.
The solution is <i>lexer modes</i>, but for this you must split your grammar into a lexer and a parser grammar,
see BlogLexer.g4 and BlogParser.g4 . You need a sequence that starts a mode and a sequence that switches back to the
default mode. Inside a mode you have different lexer rules: for instance after <i>title:</i> we match chars until a
new line, while after <i>content:</i> the newline char alone is nothing special and we match a longer sequence,
as you can see by reading the grammar. <br/>
The only remark is how we match a long sequence of chars, the <code>CH</code> rule, in the lexer, which the parser
joins together into a <code>chars</code> object.
</p>
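<p>The effect of lexer modes can be pictured with a toy hand-written tokenizer in plain Java: seeing a tag switches to a mode where the whole rest of the line becomes one token, then we switch back. This is only a sketch of the idea, not the generated ANTLR lexer:</p>

```java
import java.util.ArrayList;
import java.util.List;

public class ModeSketch {
    // Toy illustration of lexer modes: after "title:" we switch to a
    // LINE mode that consumes everything up to the newline as one token,
    // then we switch back to the default mode.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        boolean lineMode = false;
        while (pos < input.length()) {
            if (!lineMode) {
                if (input.startsWith("title:", pos)) {
                    tokens.add("TITLE_TAG");
                    pos += "title:".length();
                    lineMode = true;          // mode switch: start matching a LINE
                } else {
                    pos++;                    // skip anything else in this toy
                }
            } else {
                int nl = input.indexOf('\n', pos);
                if (nl < 0) nl = input.length();
                tokens.add("LINE(" + input.substring(pos, nl).trim() + ")");
                pos = nl + 1;
                lineMode = false;             // back to the default mode
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("title: something\n"));
    }
}
```

<p>In the real grammar the same mode idea lets <i>title:</i> and <i>content:</i> trigger different rule sets.</p>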
<p>I created an eclipse project for this blog, you can play with my grammar:
<ul>
<li>use <code>compile-lexer.launch</code> to compile the lexer, then</li>
<li>use <code>compile-parser.launch</code> to compile the parser, then</li>
<li>refresh the eclipse project</li>
<li>create the html from templates using <code>update-web-gen.launch</code>.</li>
</ul>
There is also a <code>grun.launch</code> to get from eclipse the same output as the <code>grun</code> command, but
while developing a new grammar, at least when it's a small grammar, it's easier from the command line. <br/>
The rest of the code at the moment is simple code that parses post files and outputs HTML files
using <a href="http://www.stringtemplate.org/">StringTemplate</a>.
</p>
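<p>The final generation step can be pictured with a tiny stand-in for StringTemplate's attribute substitution. This is only a hypothetical sketch; the real library does much more (escaping, iteration, sub-templates):</p>

```java
import java.util.Map;

public class MiniTemplate {
    // Toy placeholder substitution standing in for StringTemplate's
    // <name> attributes; good enough to picture the generation step.
    static String render(String template, Map<String, String> attrs) {
        String out = template;
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            out = out.replace("<" + e.getKey() + ">", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(render("<h1><title></h1>", Map.of("title", "Post #1")));
    }
}
```

<p>Each parsed post feeds attributes like <i>title</i> into a page template, producing the final HTML file.</p>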
<h2>Conclusion</h2>
When you develop a grammar you usually look at the final result produced by the parser;
however, you must not forget that the parser receives what the lexer prepares.2014-02-14T23:00:00Z