Qafoo GmbH - passion for software quality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Author: Kore Nordmann
:Date: Fri, 16 Sep 2016 11:44:49 +0200
:Revision: 5
:Copyright: All rights reserved

================================================
Grasp - Structural JavaScript Search and Replace
================================================

:Description: Grasp - A structural search and replace toolkit for your
    JavaScript development
:Keywords: JavaScript, grasp, sed, grep, AST, parser, syntax, semantic,
    analysis

About a week ago I came across a new JavaScript-related utility called
grasp__. Grasp is a commandline utility to search and replace content in
JavaScript files. It has a certain similarity to tools like grep__ or sed__.

__ http://graspjs.com
__ http://www.gnu.org/software/grep/
__ http://www.gnu.org/software/sed/

Of course there are quite a lot of tools like this out there. Why is another
one needed? What is the difference between this utility and its counterparts?
The answer lies in its specialization on JavaScript. Even though the tool's
author `George Zahariev`__ plans on supporting languages other than JavaScript
in the future, there is a reason for this limitation: **grasp** does not treat
the source code as a plain text file, but understands its structure, its
building blocks. Therefore, it can be much more powerful than a plain
text-based solution in a lot of situations.

__ https://twitter.com/gkzahariev

Installing grasp
================

.. note:: Qafoo experts can `support your team`__ to learn and use all the
    different varieties of available JavaScript tools and techniques to bring
    more efficiency to your everyday work.

__ /services/consulting.html

*grasp* is written in JavaScript using node as an execution environment. Like
every good nodejs citizen, it is distributed using the `node-package-manager
(npm)`__. Therefore its installation is completely painless once you have a
running `nodejs`__ on your system::

    $ npm install -g grasp

Once *npm* has finished its work, you should be able to simply execute the
``grasp`` command and the tool will greet you with its usage information and
help output.

__ http://npmjs.org
__ http://nodejs.org

The power of grasp
==================

Let's take a first look at what *grasp* can do for us. As it operates on
files, specifically JavaScript files, an example is needed. Let's use this one
for now::

    var isBlogPost = true;

    if (author == "Jakob" && isBlogPost) {
        qafooBlog.publish(post);
    }

    if (author == "Toby"
        && isBlogPost) {
        qafooBlog.publish(post);
    }

Before we take a look at a more real-world example, let's start with this
quite easy one, as it is capable of demonstrating the basic features of
*grasp* very nicely.

Why structure matters
---------------------

The given example shows quite clearly why structural analysis of source code
may matter for such a tool. Even though the shown ``if``-statements are mostly
the same, they are written in a different manner. One of the conditions is
written on one line, while the other one is split into two. A JavaScript
engine does not care about this newline, as it knows about the syntactical
structure of a JavaScript file. However, a simple text-based search algorithm
will be fooled by the newline and might miss what we originally searched for.

Assuming our aim is to find all the occurrences of the ``isBlogPost``
identifier, our goal can be reached quite easily using ``grep`` or the search
function of any IDE::

    $ grep -n 'isBlogPost' blog_if_example.js
    1:var isBlogPost = true;
    3:if (author == "Jakob" && isBlogPost) {
    8:    && isBlogPost) {

That worked like a charm. So let's get a little bit more restrictive. Say we
want to find all ``if`` statements which utilize the ``isBlogPost`` identifier
inside their ``test`` condition. This can be accomplished using a regular
expression that searches for a line where an ``if`` is found, followed by a
pair of parentheses which contain ``isBlogPost`` somewhere between them::

    $ grep -n 'if (.*isBlogPost.*)' blog_if_example.js
    3:if (author == "Jakob" && isBlogPost) {

Oh wait. We missed an occurrence! Unfortunately, we only checked for the
occurrence on one line. The second occurrence was split up across multiple
lines. It is of course possible to modify the regular expression to
accommodate this fact, but that is neither useful nor pragmatic, as there are
a lot of other cases which can't be covered using the regexp approach.

grasp the code
--------------

As *grasp* has a syntactical understanding of the source code, like a
JavaScript engine, it does not care about the multiline ``if`` statement. It
just sees an ``if`` with a ``test`` condition, which consists of a
``LogicalExpression`` (the ``&&``), which in turn consists of a
``BinaryExpression`` (the ``author == ...``) and an ``Identifier`` (the
``isBlogPost``).

What *grasp* sees and therefore works on is the Abstract-Syntax-Tree (AST) of
the given source code. It is a tree-based structure, which contains all the
needed information, but without all the different ways of writing it down::

    if
    └── test
        └── LogicalExpression (&&)
            ├── BinaryExpression (==)
            │   ├── Identifier (author)
            │   └── Literal ("Jakob")
            └── Identifier (isBlogPost)

Okay, we now have an idea about the view of *grasp* on our source code.
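
To make that tree view a bit more tangible, here is a minimal sketch in plain
JavaScript. It is not how *grasp* is implemented: the AST below is written by
hand for the second ``if`` statement of the example, using the node names
mentioned above, and the helpers ``containsIdentifier`` and ``ifTestUses`` are
hypothetical names introduced only for this illustration. The point is that a
search on the tree never sees the newline that fooled ``grep``.

```javascript
// Hand-written, simplified AST for: if (author == "Toby" && isBlogPost) { ... }
// The node type names follow the ones used in the text. The body of the
// statement is left empty because only the `test` condition matters here.
const ifStatement = {
    type: "IfStatement",
    test: {
        type: "LogicalExpression",
        operator: "&&",
        left: {
            type: "BinaryExpression",
            operator: "==",
            left: { type: "Identifier", name: "author" },
            right: { type: "Literal", value: "Toby" }
        },
        right: { type: "Identifier", name: "isBlogPost" }
    },
    consequent: { type: "BlockStatement", body: [] }
};

// Hypothetical helper: depth-first search for an Identifier with a given name.
function containsIdentifier(node, name) {
    if (node === null || typeof node !== "object") {
        return false;
    }
    if (node.type === "Identifier" && node.name === name) {
        return true;
    }
    // Visit every property value; arrays are objects, so they are walked too.
    return Object.values(node).some(function (child) {
        return containsIdentifier(child, name);
    });
}

// Hypothetical helper: does this `if` use the identifier inside its test?
function ifTestUses(node, name) {
    return node.type === "IfStatement" && containsIdentifier(node.test, name);
}

console.log(ifTestUses(ifStatement, "isBlogPost")); // true
console.log(ifTestUses(ifStatement, "qafooBlog"));  // false
```

Whether the condition was written on one line or on two makes no difference
to this check, because line breaks are not part of the tree at all.
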
Let's use this information to complete the task given earlier: selecting all
``if`` statements that contain the ``isBlogPost`` identifier in their ``test``
condition::

    $ grasp 'if.test! #isBlogPost' blog_if_example.js
    3:if (author == "Jakob" && isBlogPost) {
    7-8:(multiline):
    if (author == "Toby"
        && isBlogPost) {

Hey, *grasp* was able to correctly provide us with what we searched for. But
what exactly did we give *grasp* as input to achieve our goal?

CSS for ASTs
============

*grasp* provides us with a CSS-like selector language to specify what we want
to search for inside the AST of the given source file. Once you have developed
a feeling for which nodes are placed where in the created AST of your input
file, it is actually quite easy to write and grasp ;). A full list of all
nodes, their child nodes and possible attributes is available on `grasp's
JavaScript-Syntax page`__.

__ http://graspjs.com/docs/syntax-js/

Let's take a closer look at the selector used to extract the usages of
``isBlogPost`` in ``if`` statements, ``if.test! #isBlogPost``, and especially
at how I came up with it:

- I wanted to select something from within an ``if``. Therefore, I looked up
  the ``IfStatement`` on the linked JavaScript-Syntax page. I found out that
  its short name is ``if``. So I used that. **Current selector:** ``if``

- Further reading of the ``IfStatement`` paragraph revealed that, among
  others, it has an attribute called ``test``, which houses the condition of
  the statement. *grasp* allows access to attributes using a simple dot
  notation. **Current selector:** ``if.test``.

- Everything which classifies as an ``Identifier`` can be represented using a
  hash followed by the identifier's name, much like in CSS. As I am searching
  for the ``isBlogPost`` identifier somewhere below the ``test`` attribute, I
  simply separate the two parts using a whitespace. Exactly the same as with a
  normal CSS selector. **Current selector:** ``if.test #isBlogPost``.

- Like in CSS, the element matched is the last one inside the given chain of
  selectors. In this case the ``isBlogPost`` identifier. We however want to
  extract the whole condition which contains this identifier. Luckily for us,
  *grasp* allows us to mark the part of the given expression we really want to
  extract using an exclamation mark. As we want to extract the ``test``
  condition of every match, we put the exclamation mark right behind it.
  **Current selector:** ``if.test! #isBlogPost``.

As you can see, constructing *grasp* selectors progressively is not that hard.
Just think about what you want to select and build it up, one part at a time.
The full spectrum of the CSS-like selector syntax is available from the `grasp
squery documentation`__.

__ http://graspjs.com/docs/squery/

Taking a look at the AST behind the curtain
-------------------------------------------

Once you start constructing more complex expressions you might wonder what the
AST you just matched looks like. Having information about this might come in
handy to decide on how to narrow down your selection to what you really
searched for. *grasp* provides a *json output mode*, which is triggered using
the ``-j`` argument. Essentially, it outputs a JSON representation of the AST
instead of the usual match. Using a JSON prettifier to look at the generated
structures is highly recommended ;)::

    $ grasp -j 'if.test #isBlogPost' blog_if_example.js | prettyson
    [{
        type: 'Identifier',
        start: 49,
        end: 59,
        loc: {
            start: { line: 3, column: 25 },
            end: { line: 3, column: 35 }
        },
        name: 'isBlogPost'
    }, {
        type: 'Identifier',
        start: 126,
        end: 136,
        loc: {
            start: { line: 8, column: 7 },
            end: { line: 8, column: 17 }
        },
        name: 'isBlogPost'
    }]

A little bit more real world
============================

I promised before to show a little bit more of a real world example later on.
Here we are now, knowing about the ideas of *grasp* as well as the basics of
its selector syntax. It is time.
Assume we have the following file as input::

    require.config({
        baseUrl: "/",
        paths: {
            // Some aliases
            "translation": "my/deeply/nested/translation/module",
            "window": "my/global/window/wrapper",

            // External libs
            jquery: "vendor/some/component/manager/jquery/jquery",
            ui: "vendor/some/component/manager/jquery-ui/jquery-ui",
            underscore: "vendor/some/component/manager/underscore/underscore",

            // AMD Loaders
            text: "base/requirejs/loaders/text"
        },
        shim: {
            jquery: {
                exports: "jQuery"
            },
            ui: {
                deps: ["jquery"]
            },
            underscore: {
                exports: "_"
            }
        }
    });

A quite common `require.js`__ configuration, defining some ``paths``, some
``shims`` and a ``baseUrl``. While working with something similar to this I
realized I had a problem. Even though everything worked like a charm during
development, a certain version of ``r.js`` - the *require.js* optimizer -
didn't want to read the configuration to create a build. I found out that it
was quite particular about the way ``keys`` were defined in the configuration
``objects``. All of the keys needed to be strings, encapsulated in quotation
marks. As the file was quite large, much larger than this example, I decided
to use *grasp* to help me out::

    $ grasp 'prop[key!=String] > @key' require.config.js --replace '""{{}}""'

__ http://requirejs.org

While this looks a little bit scary at first, it did exactly what I wanted:
replacing every non-string key in the configuration object with its
double-quoted counterpart. Let's take apart the selector to see what is really
going on here:

- First, every object property (short: ``prop``) is selected.

- A look into the `syntax documentation`__ or the ``grasp --help prop`` output
  tells us that properties have a ``key`` and a ``value`` attribute. ``key``
  seems to be exactly what we need. Using the square-bracket notation, a
  condition is therefore defined which instructs *grasp* to select only those
  ``prop`` elements that have a ``key`` attribute which is **not** a
  ``String``.

- If we only wanted to extract all properties having a non-string key, we
  would have already succeeded. We however intend to replace all of those
  keys. Therefore we need to select the ``key`` attribute itself and not the
  whole ``prop`` fulfilling the given condition. As with CSS, the greater-than
  ``>`` operator tells the engine to select a direct child, while ``@key``
  tells *grasp* that the requested child is the key attribute of the selected
  element.

- The selector is ready. What is still missing is the replacement. Using the
  ``--replace`` option, a replacement for the matched structures may be given.
  In this example it is a double pair of curly braces encapsulated in double
  quotes. The ``{{}}`` represents the contents of the structure to be
  replaced, in our case the ``key``. Due to an `open bug`__ in *grasp* we need
  to provide two double quotes instead of one around the ``key``, as one pair
  is eaten by *grasp*.

__ http://graspjs.com/docs/syntax-js/
__ https://github.com/gkz/grasp/issues/5

Wrapping it up
==============

As this post demonstrates, there is quite some potential in a
search-and-replace system which incorporates knowledge about the programming
language it operates on. Even though it is quite young, *grasp* already has
good chances to become a new utility in my JavaScript tool belt. It is
definitely worth looking into. I hope you have as much fun playing with it as
I had.

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79