MAR9000 posts feed.Master feed of posts from mar9000.org .Efficient DTOs2020-10-07T22:00:00Z2020-10-07T22:00:00Z<p>When we need to return data to be displayed on the UI as <a href="https://martinfowler.com/eaaCatalog/dataTransferObject.html">DTO</a>s, we are going to read from multiple repositories, except for trivial UIs. For example, forum posts can have an author (a user) and comments, and each comment has an author too.</p>
<pre><code>{
  posts: [{
    title: "Efficient DTO",
    author: {
      name: "Marco"
    },
    comments: [{
      author: {
        ...
      }
</code></pre><p>In this context, no matter which technology we use, the N+1 problem will pop up. For instance with:</p>
<ul>
<li>REST API: we execute one request for the list of posts, then N requests for the User resources.</li>
<li>REST API + hint, like <code>/posts?with=user</code>: now we have only one REST request, but on the server users are probably still loaded one by one from their repository.</li>
<li>GraphQL: without data loaders you still have the same problem: the user of a post is resolved once per post.</li>
</ul><p>The last example is not accidental: when I faced this problem my first thought was to resort to GraphQL, and this is where my idea comes from. However, I did not have dynamic queries but static ones, read "several endpoints that always return the same JSON", hence I did not want to adopt an engine that parses, for a given endpoint, the same query over and over again.</p><p>I found that what I needed were batched loaders and <em>data loaders</em>, something that every GraphQL implementation has as an associated library. In our example, for every post DTO resolution the load of the user is delayed until we have the complete list of users to load (same for comments and for the users of comments), so we can batch-load users.<br/>Other solutions, illustrated below, do not scale, at least in my use cases.<br/>The library that implements the above algorithm, without GraphQL, is <a href="https://github.com/mar9000/graph-loader">graph-loader</a>.</p><h2>Context</h2><p>So we have implemented boundary, repositories, application and domain services, and we have to return data to display the UI. Except for trivial UIs, the plain REST paradigm won't work: we would have to request posts, then users, then probably comments, etc.:</p>
<pre><code>/posts
/posts/1/user
/posts/2/user
...
</code></pre><p>Let's try REST plus some hints like <code>with=users,comments</code> to extend our response content. Now how are we going to implement this request on the backend? We load a list of posts, then for each post we load its user?<br/>This will yield the N+1 problem, right? No problem: we can collect the list of users for all posts and load them in one batch, the same algorithm used by GraphLoader (GL) but implemented manually. So far so good.</p><h2>Problem</h2><p>What about also loading comment users, or each user's country to show a flag icon? Once you have loaded all users you have to collect, manually, all countries of all users to batch the load of countries. So far we have only named three repositories, but for a rich UI it's common to load from ten different repositories; an optimized loading logic quickly becomes cumbersome.</p><p>One of my assumptions is that my queries are static: one endpoint should resolve a small set of DTO shapes, so a query language is not required. Batch loading is one of the possible solutions and the one we are going to explore. But doing it manually would be a nightmare.</p><p>At this point I evaluated GraphQL. I liked its typed API, but AFAIK (I've explored its source code in depth) the only entry point to execute the resolvers and data loaders one has configured is a GraphQL query. Hence one endpoint, bound to a given GraphQL query, would parse and execute the same query over and over again to compose its response.</p><p>Trying to avoid this overhead, while at the same time keeping a typed API, led me to GraphLoader.</p><h2>Solution</h2><p>For the solution space the scheme of our API is as follows:</p>
<ul>
<li>we have a type used as key, indicated generically as <code>K</code>.<br/> Every loader has its own key type, so if needed we can have a set of types <code>K</code>.<br/> I have one and it's <code>Integer</code>.</li>
<li>a certain <code>Repository<V></code> of a type <code>V</code> can batch-load a list of <code>V</code> given a list of keys of type <code>K</code>.<br/> For example a repository that loads a set of <code>Integer</code> into a set of <code>Post</code>.</li>
<li>our response is composed out of a set of types, each one having one or more assemblers, for example an <code>Assembler<V,D></code> that transforms a value <code>V</code> into a DTO <code>D</code>.<br/> One of these assembles for example <code>Post</code> into <code>PostDTO</code>, another into <code>DetailedPostDTO</code>.</li>
<li>assemblers can queue more loads once they receive their value <code>V</code>, using its properties as key values.</li>
</ul><p>For instance:</p>
<pre><code>loadPosts
  - for each post:
    - queue the load of its author
    - queue the load of its comments
while (there are pending loads)
  - execute pending loads
    (this passes each loaded V to its assembler
     -> the assembler may queue other loads)
</code></pre><p><img src="/images/resolution-flow.png" alt="resolution flow"/></p><p>A call to <code>GraphLoader.resolve()</code> looks like:</p>
<pre><code>GlResult<PostResource> result = graphLoader.resolve(1L, "postLoader", new PostResourceAssembler());
</code></pre><p>The first phase, <em>load</em>, is not mandatory: when we already have a value with an associated assembler we can execute only the transformation of <code>V</code> into <code>D</code>. For this reason GL has <em>resolve</em> methods for keys and for values; take a look at <code>resolve()</code> and <code>resolveValue()</code> as a starting point. For example, when we already have a Post:</p>
<pre><code>GlResult<PostResource> result = graphLoader.resolveValue(post, new PostResourceAssembler());
</code></pre><p>There are also two methods to work with lists of keys or lists of values, see <code>resolveMany(List<K> keys, ...)</code> and <code>resolveValues(List<V> values, ...)</code>.</p><h2>Implementation</h2><p>The key point is that the loader's <code>load()</code> method does not return values <code>V</code>; instead it accepts a consumer of <code>V</code> to handle the result once a given key gets loaded and the resulting <code>V</code> is ready.</p><p>Consider for instance the resolution of the post's author present in the <code>PostResourceAssembler</code>:</p>
<pre><code>PostResource resource = new PostResource();
authorLoader.load(post.authorId,
    user -> resource.author =
        authorAssembler.assemble(user, context));
</code></pre><h2>Performance</h2><p>The benchmark is very simple at the moment, but data are promising:</p>
<pre><code>Benchmark                                        Mode  Cnt       Score  Units
GLBenchmark.glAvgTime                            avgt    3       1.212  us/op
GLBenchmark.glAvgTime:·gc.alloc.rate.norm        avgt    3    2728.000  B/op
GraphQLJava.graphqlAvgTime                       avgt    3      94.926  us/op
GraphQLJava.graphqlAvgTime:·gc.alloc.rate.norm   avgt    3  162968.507  B/op
</code></pre><p>That said, I'll benchmark a much more complex graph resolution/query as soon as possible, I mean a query that returns 100 rows and uses 10 repositories.</p><h2>Additional considerations</h2><p>Only in case we have SQL repositories (as the ones I had) it is tempting to try:</p><p><strong>More complex queries</strong>: for instance a specialized repository for Post that also joins the author table. One-to-one associations are relatively simple to load this way; nevertheless, how many tables are we going to join when the UI gets richer? Moreover, loading associations with a <em>many</em> end, for instance comments, quickly becomes complicated. Not to mention loading associations of associations.</p><p><strong>JPA entity graphs</strong>: if they are always an option, read "everything is reachable from the same aggregate", e.g. from <code>Post</code> you can reach the author (<code>User</code>), the <code>Comments</code> and each comment's author, it probably means you have defined too-big aggregates.</p><p>The algorithm used by GL is in my opinion the best trade-off in the case of static <em>DTO queries</em>, which suggest not using GraphQL. Moreover, if one decides to migrate to GraphQL later on, most of the classes defined to work with GL can be reused: repositories and batch loaders can be used almost without modifications.</p><h2>Conclusion</h2><p>GraphQL and <code>java-dataloader</code> are great projects when you don't know the queries your clients are going to send. GitHub for instance moved to GraphQL with version 4 of their API. <code>graphql-java</code> supports CompletableFuture and is highly configurable. 
But if you have only one front-end speaking with your back-end, I think that graph-loader, or at least the idea behind <code>graph-loader</code>, is simpler, smaller, faster, and one can be up and running with little effort.</p>2020-10-07T22:00:00ZA grammar for projectional editor2015-12-13T13:00:00Z2015-12-13T13:00:00Z<p>Describing a language editor can be repetitive, for instance when you have to define expressions: <code>expression '+' expression</code> and <code>expression '*' expression</code> is a typical example. From the grammar file that describes the language structure it's possible to recognize repetitive rule structures and build an editor in a consistent way.</p><p>The <a href="https://github.com/mar9000/pe" title="PE project">PE</a> project defines a grammar and generates ASTs in a way not tied to any projectional editor. The <a href="https://github.com/mar9000/pe4mps" title="PE4MPS project">PE4MPS</a> project imports the generated ASTs into <a href="http://www.jetbrains.com/mps" title="MPS project">MPS</a>, generating for example from this rule:</p>
<pre><code>Graph:
  strict=STRICT? type=GraphType name=string?
  statementList<indentList('{', '}')>=Statement*
;
</code></pre><p>this MPS editor:</p><p><img src="/images/graph-editor-example.png" alt="Graph editor example"/></p><p>The indented statement list defined above can also be used in:</p>
<pre><code>Subgraph:
  (SUBGRAPH label=Id?)?
  statementList<indentList('{', '}')>=Statement*
;
</code></pre><p>giving:</p><p><img src="/images/subgraph-editor-example.png" alt="Subgraph editor example"/></p><p>The last example for this small introduction is an optional group of elements like:</p>
<pre><code>NodeId:
  id=Id (':' first=Id (':' second=Id)?)?
;
</code></pre><p>translated in MPS into two intentions:</p><p><img src="/images/nodeid-editor-example.png" alt="NodeId editor example"/></p><p>The two mentioned projects are still at their very first versions and many features are still missing; probably the most important one is scope handling.</p>2015-12-13T13:00:00ZNew ECMAScript4MPS project2015-02-12T13:00:00Z2015-02-12T13:00:00Z<p>There are several strategies for code generation. The one used by <a href="https://www.jetbrains.com/mps/">MPS</a> could be called the <em>no code generation</em> strategy. In fact the suggested implementation for code generation is first to transform models in your language to models in the target language, or if you prefer, first to transform the AST representation of your program into the AST representation of the target language. Then models of a real language, for example Java, are translated to text. This means that only the constructs, like <em>if-then-else</em>, of a language that one can compile or execute, like Java, have a place where one says how to translate them to text, in MPS called the <em>TextGen</em> aspect.</p><p>MPS gives you for free a language called <em>baseLanguage</em> that transforms seamlessly to Java. However, one of my target languages is JavaScript, so I created this new MPS language. The new project is hosted on <a href="https://github.com/mar9000/ecmascript4mps">github</a>.</p><p>I plan to write some more documentation in the future, but I want to highlight a few points here.</p><p>At first I started from the <em>grammar-like</em> specifications one can find on the <a href="http://www.ecma-international.org/ecma-262/5.1/">www.ecma-international.org</a> site. It will probably be useful in the future to implement all lexer rules, still missing in this very first version. 
But with MPS you implement AST models, while grammars are more focused on parsing strategies.</p><p>Then I came to the <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Parser_API">Mozilla Parser API</a>, which in addition has real parser implementations like <a href="http://esprima.org/">Esprima</a>. This was quite useful; in fact ECMAScript4MPS fully respects the AST documented by the Mozilla documentation, with few exceptions.</p><p>Many editing features are still missing and editing is at the moment more <em>AST editing</em> than <em>text editing</em>. However, my goal is JavaScript generation, so ECMAScript4MPS will be used for template definition; here editing with <em>intentions</em> and less automatic <em>side transforms</em> is, I think, acceptable. That said, lots of intentions and side transforms are still missing, stay tuned for the next release.</p><p>Writing a developer's guide that could help other developers understand MPS is also one of my goals.</p>2015-02-12T13:00:00ZSunset in Antignano2014-10-24T12:00:00Z2014-10-24T12:00:00Z<p><a href="/photos/2014/antignano-sunset-2014-10-24-IMG00798.jpg"
data-lightbox="antignano-sunset-2014-10-24-IMG00798" data-title="Sunset in Antignano"> <img src="/photos-small/2014/antignano-sunset-2014-10-24-IMG00798.jpg" /> </a></p><p><img style="max-width: 100%; height: auto" src="/photos/2014/antignano-sunset-2014-10-24-IMG00798.jpg" /> <br/> On the horizon, the Gorgona island.</p>2014-10-24T12:00:00ZSketching UI with text tools2014-09-29T12:00:00Z2014-09-29T12:00:00Z<p>The idea of sketching UIs, or other types of drawings, using text files is not new. Drawings can be embedded as text into text files, so a whole document can be defined using only text. This is the approach used by <a href="http://sphinx-doc.org/">Sphinx</a>: with it you can use <a href="http://docutils.sf.net/rst.html">reStructuredText</a> to define your HTML or PDF documentation.</p><p>Sphinx is integrated with <a href="http://plantuml.sourceforge.net/">PlantUML</a>, so you are also able to define UIs, UML diagrams and charts using text.</p><p>The subproject used by PlantUML to define UI sketches is called <em>Salt</em>. What I do not like about Salt is that it uses lines, curves and text to draw a UI sketch; this way the result is not realistic.<br/>And OK... I also wanted to exercise with <a href="http://www.antlr.org">ANTLR</a>.</p><p>The task that let me include realistic screenshots in my documents was divided into two subprojects:</p>
<ol>
<li>I wrote an ANTLR parser for the part of the Salt syntax I have needed so far.<br/> This resulted in the <a href="https://github.com/mar9000/salt9000">Salt9000</a> project.</li>
<li>then I integrated Salt9000 into a customized version of PlantUML.<br/> This resulted in the <a href="https://github.com/mar9000/plantuml">Plant UML 9000</a> project.</li>
</ol><p>Don't be confused about my real target: I would prefer to generate my documentation directly from source code artifacts, as you can do with the <a href="http://mbeddr.com/">mbeddr</a> project. However, even with such a powerful tool, it seems to me that a DSL to define UIs is still useful while you are writing the first requirements.</p><h2>An example</h2><p>Let's say you want to design a login form like this:</p><p><img src="/images/salt9000-example.png" alt="Salt9000 example"/></p><p>instead of opening a graphics tool with lines, curves, colors... you can enter this text into a definition file and process it with Salt9000:</p>
<pre><code>{
  {* Help}
  {Username: | " " | Password: | " "}
  {[Ok] | [Cancel]}
}
</code></pre><p>For reference, the image generated by the original Salt would be:</p><p><img src="/images/salt-example.png" alt="Salt example"/></p><p>With Sphinx and PlantUML you don't have to generate this image manually: just include something like this in your source document to have the image rendered into the PDF or HTML that Sphinx generates:</p>
<pre><code>.. uml::

   @startsalt9000
   {
     {* Help}
     {Username: | " " | Password: | " "}
     {[Ok] | [Cancel]}
   }
   @endsalt9000
</code></pre>2014-09-29T12:00:00ZANTLR4 grammar for Markdown2014-08-31T22:00:00Z2014-08-31T22:00:00Z<p><a href="http://daringfireball.net/projects/markdown/">Markdown</a> is used today in several places, including this blog.<br/><a href="http://www.antlr.org">ANTLR</a> is also used almost everywhere you need a parser.<br/>So, to learn ANTLR, I chose to try to teach ANTLR how to parse Markdown syntax. This task has been much harder than I expected.</p><p>The result has been published as the <a href="https://github.com/mar9000/antmark/">ANTMark</a> project.</p><p>The main problem is that everything is context sensitive, including newlines.</p><p>The hardest things to parse were ordered and unordered lists. Here there is another problem: there is no reference syntax definition for Markdown, and different implementations parse nested ordered or unordered lists differently.</p><p>Forcing ANTLR to parse Markdown, the result is not as fast as I expected; I mean it's pretty slow, and I had to break long test cases into smaller ones, otherwise the parsing never ends.</p><h2>State of the art</h2><p>Under <code>tests</code> you can find 143 tests that can be executed with the <code>MarkdownTest</code> class.<br/>There are some homemade tests, almost all the Markdown default tests (version 1.0.2), and all tests of the <a href="https://github.com/karlcow/markdown-testsuite/">markdown-testsuite</a>.<br/>Due to the problems highlighted below, the project is at the moment only an exercise in style.</p><h2>Main problems</h2><p>I don't know about other parsing engines, but with ANTLR4 I spent a lot of time parsing lists. 
The main problem with ANTLR4 is probably that one would like a stopping condition for rules such as <code>(...)*?</code> that forces the parser to stop consuming tokens.<br/>ANTLR4 stops consuming tokens depending on what follows the example rule above, but in my grammar this was not enough.</p><p>So I used semantic predicates, but the parser <strong>is very very slow</strong>: it's not able to parse a file that contains more than 2 lists. I also tried to inspect the generated DFA but it was out of reach for me.<br/>Adding more lists causes the parsing to never end. I hope this will be of some interest to the ANTLR gurus.</p><h2>Use cases</h2><p>After you have cloned the project, import it into eclipse, then you can:</p>
<ul>
<li>compile the <code>MarkdownLexer.g4</code> grammar with <em>compile-lexer</em> run/debug configuration.</li>
<li>compile the <code>MarkdownParser.g4</code> grammar with <em>compile-parser</em> run/debug configuration.</li>
<li>open the class <code>MarkdownTest</code> and <em>Run As -> JUnit Test</em> from the right click menu.</li>
</ul><h2>Future directions</h2><p><strong>Use a <em>scannerless</em> grammar</strong>: unfortunately, almost at the end of my work I realized that the lexer was not doing that much, so a <em>scannerless</em> grammar could probably be adopted easily. In addition, the semantic predicates I wrote all act by inspecting the token stream; in a <em>scannerless</em> scenario it should be easier to inspect a <code>CharStream</code>.</p><p><strong>Parser modularization</strong>: in case no global solution exists to build a real parser for the whole language, one could try to first build a parser for the block structure of Markdown, for example identifying lists, verbatim blocks, headings..., then parse each block's content to parse emph, strong, links and all the other span elements.<br/>I already used this approach for blockquotes: here on each line the starting <code>></code> tokens are removed and the result is passed to a new instance of the parser. Some fixes are probably required in case <em>reference links</em> are present inside the blockquote.</p><h2>Support/discussions</h2><p>Because the project has just started, I think a generic group for discussions and comments is enough: <a href="https://groups.google.com/forum/#!forum/antmark-discussion">https://groups.google.com/forum/#!forum/antmark-discussion</a>.</p>2014-08-31T22:00:00ZThe smallest static site generator2014-03-03T21:00:00Z2014-03-03T21:00:00Z<p>Now that I can write posts in HTML and in Markdown, what is missing to create the smallest static web generator we can imagine?</p><p>In fact this is only the first list of <em>missing features</em>; for example <a href="http://octopress.org">Octopress</a> has beautiful syntax highlighting. Anyway... I asked myself what is the minimum needed to publish and organize content and let people follow my blog, so:</p>
<ul>
<li>my index page is almost empty; I'll publish there the latest posts' abstracts to guide the reader.</li>
<li>with the index page alone I could already write posts and readers would find them there. However, minimal search capabilities can be implemented with tags, even if they will become useful only with several posts.</li>
<li>Atom feed is the minimum to <em>keep in touch</em> with readers.</li>
</ul><h3>Abstract on the index page</h3><p>This will certainly require ordering posts by date, to publish only some of the last published ones. To do this I will collect post metadata during post processing. As you can imagine, this will also be used when we implement the tags pages. The bootstrap construct used is the <em>description list</em>. I already had an index template; I modified it and added a new composite attribute to the StringTemplate instance used.</p><h3>Tags pages</h3><p>While processing posts I collect data about them into an <code>ArrayList</code> used by <code>Blog.createTagsPages()</code>. A main page with all tags, together with a page for each tag, gets created by the above method.</p><h3>Atom feed</h3><p>The Atom feed is created from the same data used for the main index page. When I searched for a library to generate an <code>atom</code> file from Java, the surprise was that there are not many alternatives. Finally I chose <a href="https://github.com/rometools/rome">ROME</a> over the Apache alternative, Abdera, which seemed to me to have more dependencies. The documentation is not that big; nevertheless I found the example I was looking for.</p><p>At this point I also added <a href="http://fortawesome.github.io/Font-Awesome/">Font Awesome</a> to put the feed symbol on the navigation bar.</p><p>As a final note, I chose to implement Atom instead of RSS because it's a newer format.</p>2014-03-03T21:00:00ZAdding a markup language2014-03-01T23:00:00Z2014-03-01T23:00:00Z<p>The previous post introduced a small language to define post files, and in this project you can find some Java classes to transform them into a small blog based on bootstrap. The first improvement I found missing is a markup language to speed up the writing of posts. 
I decided to investigate a bit the <em>static site generators</em>, like for example <a href="http://jekyllrb.com">Jekyll</a>, actually the one used by github.</p><p>I started searching for <em>static site generator</em> or <em>static web framework</em> and I found, among others: <a href="http://jekyllrb.com">Jekyll</a>, <a href="http://blog.getpelican.com">Pelican</a>, <a href="http://octopress.org">Octopress</a>, <a href="http://assemble.io/">Assemble</a>, <a href="http://nanoc.ws">Nanoc</a>, <a href="http://wintersmith.io">Wintersmith</a>, <a href="http://phrozn.info/en/">Phrozn</a>, <a href="https://github.com/koenbok/Cactus/">Cactus</a>, <a href="http://zespia.tw/hexo/">Hexo</a>, <a href="http://lkdjiin.github.io/genit/">Genit</a>.<br/>One based on Java is <a href="http://jbake.org">JBake</a>.<br/>There is a comprehensive list <a href="http://staticgen.com/">here</a>.</p><p>At this point I asked myself what I would like to do in the future: use a fully featured static site generator, or continue to use my own code. I decided for the latter because the blog itself is not my first goal; I also want to experiment with grammars and code generation. So I looked at the above projects more as examples of file organization than as projects to use in practice. I will add features one by one as soon as I need something not yet implemented.</p><p>Writing directly in HTML is time consuming, so the first thing to implement is the possibility to use <a href="http://daringfireball.net/projects/markdown/">Markdown</a> instead of plain HTML; a list of implementations is maintained <a href="http://www.w3.org/community/markdown/wiki/MarkdownImplementations">here</a>.</p><p>Among others, <a href="http://markdown.tautua.org/index.html">MarkdownPapers</a> uses JavaCC, and <a href="https://github.com/sirthias/pegdown">PegDown</a> is derived from a PEG grammar. 
I chose the latter because of its better documentation.</p><p>After adding some jars I was able to write my posts using <em>markdown</em>. Now post files with extension <code>.md</code> are parsed as posts, but <em>abstract</em> and <em>content</em> are transformed with PegDown.<br/>This post is the first of this kind, check out <code>adding-markup-language.md</code>.<br/>This extension also has the side effect that posts written in markdown can be edited with syntax highlighting if opened with a specialized editor.</p><p>The Java part needed very few modifications; for example, to process the <em>content</em> I just needed, see the <a href="https://github.com/mar9000/mar9000.org/blob/master/src/org/mar9000/blog/Blog.java">Blog</a> class:</p>
<pre><code>if (post.getName().endsWith(MARKDOWN)) {
    content = new PegDownProcessor().markdownToHtml(content);
}
</code></pre>2014-03-01T23:00:00ZPost #12014-02-14T23:00:00Z2014-02-14T23:00:00Z<p>If you are a Java developer and you are interested in Domain Specific Languages (DSLs)
and Code Generation, sooner or later you are going to play a bit with <a href="http://www.antlr.org">ANTLR</a>.
In addition, if you are that kind of person you probably know the Martin Fowler
<a href="http://martinfowler.com/bliki/WhatIsaBliki.html">bliki</a>.
Now something personal: in general I dislike working with graphical tools when
I can do the same thing by coding and/or on the command line (who knows, maybe in one of my next posts I will decide to
explain why). I also dislike storing in a database things that are much more comfortable in the
file system. All these reasons drove me to implement my own <i>bliki</i>. </p><p>I have also given WordPress an opportunity; indeed a Spanish blog I translate to Italian is
maintained with WordPress, but let's speak about the static part of this site.</p>
<h2>The first (bad) grammar</h2>
<p>Because I aim to experiment with ANTLR I decided to write a small language to define a blog post. Once you have
such a language and can parse your posts, you can use this data to:
<ul>
<li>create a main page with only last posts.</li>
<li>create a page to show only posts tagged with a specific label.</li>
<li>create your RSS feed.</li>
</ul>
A post will be something like:
<pre>
title: <something>
url: <url to use for the post>
date:
tags: antlr,java, ..
content: the HTML part of the post
</pre>
</p>
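<p>Once parsed, the fields above map naturally onto a plain data holder. A hypothetical sketch in Java (the class name and field types are mine, not necessarily what the blog's code uses):</p>

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical holder for one parsed post file; the fields mirror the
// tags of the post format above (title, url, date, tags, content).
public class Post {
    final String title;
    final String url;
    final LocalDate date;
    final List<String> tags;
    final String content;

    Post(String title, String url, LocalDate date, List<String> tags, String content) {
        this.title = title;
        this.url = url;
        this.date = date;
        this.tags = tags;
        this.content = content;
    }

    public static void main(String[] args) {
        Post p = new Post("Post #1", "post-1.html",
                LocalDate.of(2014, 2, 14), List.of("antlr", "java"),
                "<p>the HTML part of the post</p>");
        System.out.println(p.title + " " + p.tags);
    }
}
```

<p>Whatever the exact shape, the parser's job is to fill such an object from the tagged lines of the post file.</p>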
<p> If you are new to ANTLR the first grammar will be:
<pre>
post: title url date tags content;
title: 'title:' LINE;
url: 'url:' LINE;
date: 'date:' DIGIT DIGIT DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT NL;
tags: 'tags:' WORDS? (',' WORDS)? '\n';
content: 'content:' .*;
DIGIT: [0-9];
WORDS: ([a-zA-Z0-9] | ' ')+;
LINE: ~[\r\n]* NL;
NL: '\r'? '\n';
</pre>
If you try this grammar with:
<pre>
$ antlr4 BadBlog.g4
$ javac *.java
$ grun BadBlog post -tokens test1.post
</pre>
You will receive some errors like this: <i>line 1:0 missing 'title:' at 'title: something\n'</i>.
Why does ANTLR say <i>title:</i> is missing if it's actually inside the file?
</p>
<h2>The first good grammar</h2>
<p>This fact is stated at page 15 of
<a href="http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference">The Definitive ANTLR 4 Reference</a>:
<blockquote>
<p>Note that lexers try to match the longest string possible</p>
</blockquote>
Our lexer consumes <i>title:</i> while matching the <code>LINE</code> rule, and this is visible in the output of the preceding command:
<pre>
[@0,0:18='title: something\n',<11>,1:0]
</pre>
Token type 11 is <code>LINE</code>.
</p>
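<p>The "longest match wins" behavior can be mimicked in plain Java to see why <code>LINE</code> beats the <code>'title:'</code> literal. This is a toy sketch of the two competing rules, not the generated lexer:</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunch {
    // Return the name of the rule matching the longest prefix of input,
    // mimicking how the lexer chooses between 'title:' and LINE.
    static String winner(String input) {
        // Candidate 1: the literal 'title:' (6 chars when it matches).
        int keywordLen = input.startsWith("title:") ? "title:".length() : 0;
        // Candidate 2: LINE = ~[\r\n]* NL, the whole line including the newline.
        Matcher m = Pattern.compile("[^\r\n]*\r?\n").matcher(input);
        int lineLen = m.lookingAt() ? m.end() : 0;
        return lineLen >= keywordLen ? "LINE" : "'title:'";
    }

    public static void main(String[] args) {
        // LINE matches 17 chars vs. 6 for 'title:', so LINE wins.
        System.out.println(winner("title: something\n"));
    }
}
```

<p>On <code>"title: something\n"</code> the <code>LINE</code> rule matches the whole 17-character line, so the keyword never gets a chance.</p>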
<p>The solution is to implement everything at the lexer level (I introduced "..." to end the content rule):
<pre>
post: TITLE URL DATE TAGS CONTENT;
TITLE: 'title:' .*? NL;
URL: 'url:' .*? NL;
DATE: 'date:' .*? NL;
TAGS: 'tags:' .*? NL;
CONTENT: 'content:' .*? NL '...' NL;
NL : '\r'? '\n';
</pre>
If you test this you will see that the grammar successfully parses the file, at the
price of also having the starting and ending strings when accessing the AST, e.g. <code>TITLE().getText()</code>
will also contain <i>title:</i>.
</p>
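<p>Because the token text keeps its delimiters, the application code has to strip them. A minimal hypothetical helper, not taken from the blog's sources:</p>

```java
public class TokenText {
    // Strip a leading tag like "title:" and surrounding whitespace from
    // the raw text of a token such as "title: something\n".
    static String strip(String tokenText, String tag) {
        return tokenText.substring(tag.length()).trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("title: something\n", "title:"));
    }
}
```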
<h2>The <i>island</i> grammar</h2>
<p>
With our grammar we want basically to parse:
<ul>
<li>a tag, like title:, associated with one line.</li>
<li>a tag, like content:, associated with more lines.</li>
<li>a list of words, like tags: .</li>
</ul>
This is our <i>meta-model</i>, expressed formally in the grammar we are going to write.
</p>
<p>
The lexer respects rule precedence, but here the problem is that the <code>LINE</code> rule
has no start condition, and once it starts it will always match, for instance, more chars than <code>WORDS</code>.
The solution is <i>lexer modes</i>, but for this you must split your grammar into a lexer and a parser grammar,
see BlogLexer.g4 and BlogParser.g4 . You need a sequence that starts a mode and a sequence that switches back to the
default mode. Inside a mode you have different lexer rules: for instance after <i>title:</i> we match chars until a
new line, while after <i>content:</i> the newline char alone is nothing special and we match a longer sequence,
as you can see by reading the grammar. <br/>
The only remark is how we match a long sequence of chars, the <code>CH</code> rule, in the lexer, which the parser
joins together into a <code>chars</code> object.
</p>
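<p>The effect of lexer modes can be pictured with a toy hand-written tokenizer in plain Java: seeing a tag switches to a mode where the whole rest of the line becomes one token, then we switch back. This is only a sketch of the idea, not the generated ANTLR lexer:</p>

```java
import java.util.ArrayList;
import java.util.List;

public class ModeSketch {
    // Toy illustration of lexer modes: after "title:" we switch to a
    // LINE mode that consumes everything up to the newline as one token,
    // then we switch back to the default mode.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        boolean lineMode = false;
        while (pos < input.length()) {
            if (!lineMode) {
                if (input.startsWith("title:", pos)) {
                    tokens.add("TITLE_TAG");
                    pos += "title:".length();
                    lineMode = true;          // mode switch: start matching a LINE
                } else {
                    pos++;                    // skip anything else in this toy
                }
            } else {
                int nl = input.indexOf('\n', pos);
                if (nl < 0) nl = input.length();
                tokens.add("LINE(" + input.substring(pos, nl).trim() + ")");
                pos = nl + 1;
                lineMode = false;             // back to the default mode
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("title: something\n"));
    }
}
```

<p>In the real grammar the same mode idea lets <i>title:</i> and <i>content:</i> trigger different rule sets.</p>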
<p>I created an eclipse project for this blog, you can play with my grammar:
<ul>
<li>use <code>compile-lexer.launch</code> to compile the lexer, then</li>
<li>use <code>compile-parser.launch</code> to compile the parser, then</li>
<li>refresh the eclipse project</li>
<li>create the html from templates using <code>update-web-gen.launch</code>.</li>
</ul>
There is also a <code>grun.launch</code> to get from eclipse the same output as the <code>grun</code> command, but
while developing a new grammar, at least when it's a small grammar, it's easier from the command line. <br/>
The rest of the code at the moment is simple code that parses post files and outputs HTML files
using <a href="http://www.stringtemplate.org/">StringTemplate</a>.
</p>
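<p>The final generation step can be pictured with a tiny stand-in for StringTemplate's attribute substitution. This is only a hypothetical sketch; the real library does much more (escaping, iteration, sub-templates):</p>

```java
import java.util.Map;

public class MiniTemplate {
    // Toy placeholder substitution standing in for StringTemplate's
    // <name> attributes; good enough to picture the generation step.
    static String render(String template, Map<String, String> attrs) {
        String out = template;
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            out = out.replace("<" + e.getKey() + ">", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(render("<h1><title></h1>", Map.of("title", "Post #1")));
    }
}
```

<p>Each parsed post feeds attributes like <i>title</i> into a page template, producing the final HTML file.</p>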
<h2>Conclusion</h2>
When you develop a grammar you usually look at the final result produced by the parser;
however, you must not forget that the parser receives what the lexer prepares.2014-02-14T23:00:00Z