Here are the latest developments on Abstract Syntax Trees (ASTs) based on recent public discussions and research.
Overview
- Abstract syntax trees continue to gain attention as code understanding and transformation tools expand, especially in the context of large language models (LLMs) and program analysis. They are increasingly used not just by compilers, but by linters, formatters, code analyzers, and AI-assisted coding systems.[2][7]
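To make the idea concrete, here is a minimal sketch using Python's standard-library `ast` module (the same `ast.parse` API mentioned in the sources). The sample expression and variable names are illustrative assumptions, and `ast.dump(..., indent=...)` requires Python 3.9 or later.

```python
import ast

# Illustrative input; any valid Python source works here.
source = "total = price * (1 + tax_rate)"

# Parse the source text into an AST rooted at an ast.Module node.
tree = ast.parse(source)

# Pretty-print the tree to see how the expression nests.
print(ast.dump(tree, indent=2))

# Walk every node and record its type name, as an analyzer or linter might.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

The printed tree shows the multiplication as a `BinOp` whose right operand is itself a `BinOp` for the addition: the parentheses disappear and only the structure remains, which is what makes the tree "abstract".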
Recent research highlights
- One study compares the ASTs produced by four parsing frameworks (JDT, Tree-sitter, ANTLR, srcML). JDT produces the smallest trees (by tree size and depth) with the highest abstraction level (by unique types and unique tokens), ANTLR produces the largest trees with the lowest abstraction level, and Tree-sitter and srcML fall in between. The practical takeaway is that the choice of parser can affect downstream tasks such as code summarization or code search, trading richness against learnability.[4][2]
- The underlying arXiv paper is an experimental study of AST representations for programming language understanding. It concludes that parser-dependent differences in AST size and abstraction level affect performance on code-related tasks, and it suggests choosing an AST representation suited to the task (e.g., learning-friendly abstraction vs. fine-grained detail).[4]
- Related findings indicate ASTs can improve expressiveness in code representation for tasks like code search and summarization, but overly rich ASTs may introduce redundancy and higher learning complexity for models. Practical guidance points toward moderate abstractions that capture essential structure without overwhelming the learner.[4]
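The size and abstraction measures discussed above can be approximated for any parser that exposes a tree. The sketch below uses Python's built-in `ast` module as a stand-in (the study itself covers JDT, Tree-sitter, ANTLR, and srcML, not this module), and the helper name `ast_metrics` is hypothetical. It reports tree size, tree depth, and the number of unique node types; the paper's "unique tokens" measure is not reproduced here.

```python
import ast


def ast_metrics(source: str) -> dict:
    """Return rough size/abstraction metrics for the AST of `source`."""
    tree = ast.parse(source)

    def depth(node: ast.AST) -> int:
        # Depth counts nodes on the longest root-to-leaf path.
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(child) for child in children), default=0)

    nodes = list(ast.walk(tree))
    return {
        "size": len(nodes),  # total number of nodes
        "depth": depth(tree),  # longest root-to-leaf path
        "unique_types": len({type(n).__name__ for n in nodes}),
    }


small = ast_metrics("x = 1")
larger = ast_metrics("def f(a, b):\n    return a + b * 2\n")
print(small)
print(larger)
```

Running the same measurement over trees from two different parsers for the same snippet is one way to see, on a small scale, the kind of size and depth differences the study reports.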
Practical implications for tools and AI
- JavaScript/TypeScript tooling continues to leverage ASTs for code transformations, linters, and formatters, with ongoing exploration of how ASTs interact with AI/LLM-based code generation and patching workflows. This aligns with industry interest in using ASTs to steer automated code edits and to integrate static analysis into AI pipelines.[1][7]
- A growing ecosystem of libraries and tools focused on AST manipulation (e.g., patches, transforms) reflects the adoption of ASTs beyond traditional compiler pipelines, including in education, tooling, and research.[10]
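As a rough illustration of an AST-based code edit, the sketch below renames calls to one function using Python's `ast.NodeTransformer`. The class name `RenameCall` and the function names `old_api` and `new_api` are hypothetical, and `ast.unparse` requires Python 3.9 or later. This is not how any particular tool named above implements patching; it only shows the general shape of a tree-level transform.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls to `old_name(...)` into `new_name(...)`."""

    def __init__(self, old_name: str, new_name: str) -> None:
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node: ast.Call) -> ast.Call:
        # Visit children first so nested calls are rewritten too.
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            replacement = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node


source = "result = old_api(old_api(1), 2)"
tree = RenameCall("old_api", "new_api").visit(ast.parse(source))
ast.fix_missing_locations(tree)

# Turn the modified tree back into source text.
patched = ast.unparse(tree)
print(patched)
```

Because the match is on tree structure rather than text, a string literal or comment containing `old_api` would be left alone. One caveat: `ast.unparse` regenerates source from the tree, so comments and original formatting are lost; tools that need to preserve them typically work with a concrete syntax tree or apply targeted text patches instead.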
Examples and resources
- Introductory explanations and practical overviews remain common in video tutorials and blog posts, illustrating how ASTs underlie parsing, code analysis, and code generation workflows in languages like JavaScript and TypeScript. These resources often connect AST structure to real-world tooling like ESLint, Prettier, and Babel.[7]
- For academic readers, the arXiv piece comparing AST parsers provides concrete metrics on tree size, depth, and abstraction, relevant when selecting a parser for a given task.[2][4]
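The connection between AST structure and lint-style tooling can be sketched in a few lines. The example below is a Python analogue of a lint rule, written with `ast.NodeVisitor`; it is not ESLint's API, and the class name `BareExceptChecker` and the sample snippet are assumptions made for illustration.

```python
import ast


class BareExceptChecker(ast.NodeVisitor):
    """Collect (line, message) pairs for every bare `except:` clause."""

    def __init__(self) -> None:
        self.findings: list[tuple[int, str]] = []

    def visit_ExceptHandler(self, node: ast.ExceptHandler) -> None:
        # A handler with no exception type is a bare `except:`.
        if node.type is None:
            self.findings.append(
                (node.lineno, "bare 'except:' catches every exception")
            )
        # Keep descending so nested try blocks are checked as well.
        self.generic_visit(node)


source = """\
try:
    risky()
except:
    pass
"""

checker = BareExceptChecker()
checker.visit(ast.parse(source))
for line, message in checker.findings:
    print(f"line {line}: {message}")
```

The rule inspects a node type and one of its fields rather than matching text, which is the same basic pattern that AST-based linters follow at much larger scale.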
Would you like:
- A quick side-by-side comparison table of AST parsers (JDT, Tree-sitter, ANTLR, srcML) with their typical characteristics (tree size, depth, abstraction)?
- A brief guide on choosing an AST representation for a specific task (e.g., code summarization vs. static analysis)?
- A short list of current libraries and tools in Python/JavaScript for AST manipulation and example usage?
Sources
- news.ycombinator.com (ievans, June 7, 2021): "It supports many more languages (~17 at various stages of development) and being able to do AST patching as in the original is one of the capabilities we're experimenting with: https://semgrep.dev/docs/experiments/overview/#autofix Would love your feedback!"
- arxiv.org: "The ASTs generated by different AST parsing methods differ in size and abstraction level. The size (in terms of tree size and tree depth) and abstraction level (in terms of unique types and unique tokens) of the ASTs generated by JDT are the smallest and highest, respectively. On the contrary, ASTs generated by ANTLR exhibit the largest size and the lowest abstraction level. Tree-sitter and srcML are both intermediate in structure size and abstraction level between JDT and ANTLR. … Among..."
- alan.petitepomme.net: "interpreter, pyre-ast will be able to parse/reject it as well. Furthermore, abstract syntax trees obtained from pyre-ast is guaranteed to 100% match the results obtained by Python's own ast.parse API, down to every AST node and every line/column number."
- www.science.gov: "We apply the approach to gradually migrate the schemas of the AUTOBAYES program synthesis system to concrete syntax. Fit experiences show that this can result in a considerable reduction of the code size and an improved readability of the code. In particular, abstracting out fresh-variable generation and second-order term construction allows the formulation of larger continuous fragments and improves the locality in the schemas. … We used the recent grammar of the Arden Syntax v.2.10, and both..."
- arxiv.org: "Based on the extensive experimental results, we conclude the following findings: The ASTs generated by different AST parsing methods differ in size and abstraction level. The size (in terms of tree size and tree depth) and abstraction level (in terms of unique types and unique tokens) of the ASTs generated by JDT are the smallest and highest, respectively. On … pets require more high-level abstract summaries in code summarization, and code snippets semantically match but contain fewer query..."