1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
|
===============
pycparser v2.22
===============
.. image:: https://github.com/eliben/pycparser/workflows/pycparser-tests/badge.svg
:align: center
:target: https://github.com/eliben/pycparser/actions
----
.. contents::
:backlinks: none
.. sectnum::
Introduction
============
What is pycparser?
------------------
**pycparser** is a parser for the C language, written in pure Python. It is a
module designed to be easily integrated into applications that need to parse
C source code.
What is it good for?
--------------------
Anything that needs C code to be parsed. The following are some uses for
**pycparser**, taken from real user reports:
* C code obfuscator
* Front-end for various specialized C compilers
* Static code checker
* Automatic unit-test discovery
* Adding specialized extensions to the C language
One of the most popular uses of **pycparser** is in the `cffi
<https://cffi.readthedocs.io/en/latest/>`_ library, which uses it to parse the
declarations of C functions and types in order to auto-generate FFIs.
**pycparser** is unique in the sense that it's written in pure Python - a very
high level language that's easy to experiment with and tweak. To people familiar
with Lex and Yacc, **pycparser**'s code will be simple to understand. It also
has no external dependencies (except for a Python interpreter), making it very
simple to install and deploy.
Which version of C does pycparser support?
------------------------------------------
**pycparser** aims to support the full C99 language (according to the standard
ISO/IEC 9899). Some features from C11 are also supported, and patches to support
more are welcome.
**pycparser** supports very few GCC extensions, but it's fairly easy to set
things up so that it parses code with a lot of GCC-isms successfully. See the
`FAQ <https://github.com/eliben/pycparser/wiki/FAQ>`_ for more details.
What grammar does pycparser follow?
-----------------------------------
**pycparser** very closely follows the C grammar provided in Annex A of the C99
standard (ISO/IEC 9899).
How is pycparser licensed?
--------------------------
`BSD license <https://github.com/eliben/pycparser/blob/master/LICENSE>`_.
Contact details
---------------
For reporting problems with **pycparser** or submitting feature requests, please
open an `issue <https://github.com/eliben/pycparser/issues>`_, or submit a
pull request.
Installing
==========
Prerequisites
-------------
* **pycparser** was tested with Python 3.8+ on Linux, macOS and Windows.
* **pycparser** has no external dependencies. The only non-stdlib library it
uses is PLY, which is bundled in ``pycparser/ply``. The current PLY version is
3.10, retrieved from `<http://www.dabeaz.com/ply/>`_
Note that **pycparser** (and PLY) uses docstrings for grammar specifications.
Python installations that strip docstrings (such as when using the Python
``-OO`` option) will fail to instantiate and use **pycparser**. You can try to
work around this problem by making sure the PLY parsing tables are pre-generated
in normal mode; this isn't an officially supported/tested mode of operation,
though.
Installation process
--------------------
The recommended way to install **pycparser** is with ``pip``::
> pip install pycparser
Using
=====
Interaction with the C preprocessor
-----------------------------------
In order to be compilable, C code must be preprocessed by the C preprocessor -
``cpp``. ``cpp`` handles preprocessing directives like ``#include`` and
``#define``, removes comments, and performs other minor tasks that prepare the C
code for compilation.
For all but the most trivial snippets of C code **pycparser**, like a C
compiler, must receive preprocessed C code in order to function correctly. If
you import the top-level ``parse_file`` function from the **pycparser** package,
it will interact with ``cpp`` for you, as long as it's in your PATH, or you
provide a path to it.
Note also that you can use ``gcc -E`` or ``clang -E`` instead of ``cpp``. See
the ``using_gcc_E_libc.py`` example for more details. Windows users can download
and install a binary build of Clang for Windows `from this website
<http://llvm.org/releases/download.html>`_.
What about the standard C library headers?
------------------------------------------
C code almost always ``#include``\s various header files from the standard C
library, like ``stdio.h``. While (with some effort) **pycparser** can be made to
parse the standard headers from any C compiler, it's much simpler to use the
provided "fake" standard includes for C11 in ``utils/fake_libc_include``. These
are standard C header files that contain only the bare necessities to allow
valid parsing of the files that use them. As a bonus, since they're minimal, it
can significantly improve the performance of parsing large C files.
The key point to understand here is that **pycparser** doesn't really care about
the semantics of types. It only needs to know whether some token encountered in
the source is a previously defined type. This is essential in order to be able
to parse C correctly.
See `this blog post
<https://eli.thegreenplace.net/2015/on-parsing-c-type-declarations-and-fake-headers>`_
for more details.
Note that the fake headers are not included in the ``pip`` package nor installed
via ``setup.py`` (`#224 <https://github.com/eliben/pycparser/issues/224>`_).
Basic usage
-----------
Take a look at the |examples|_ directory of the distribution for a few examples
of using **pycparser**. These should be enough to get you started. Please note
that most realistic C code samples would require running the C preprocessor
before passing the code to **pycparser**; see the previous sections for more
details.
.. |examples| replace:: ``examples``
.. _examples: examples
Advanced usage
--------------
The public interface of **pycparser** is well documented with comments in
``pycparser/c_parser.py``. For a detailed overview of the various AST nodes
created by the parser, see ``pycparser/_c_ast.cfg``.
There's also a `FAQ available here <https://github.com/eliben/pycparser/wiki/FAQ>`_.
In any case, you can always drop me an `email <eliben@gmail.com>`_ for help.
Modifying
=========
There are a few points to keep in mind when modifying **pycparser**:
* The code for **pycparser**'s AST nodes is automatically generated from a
configuration file - ``_c_ast.cfg``, by ``_ast_gen.py``. If you modify the AST
configuration, make sure to re-generate the code. This can be done by running
the ``_build_tables.py`` script from the ``pycparser`` directory.
* Make sure you understand the optimized mode of **pycparser** - for that you
must read the docstring in the constructor of the ``CParser`` class. For
development you should create the parser without optimizations, so that it
will regenerate the Yacc and Lex tables when you change the grammar.
Package contents
================
Once you unzip the ``pycparser`` package, you'll see the following files and
directories:
README.rst:
This README file.
LICENSE:
The pycparser license
setup.py:
Installation script
examples/:
A directory with some examples of using **pycparser**
pycparser/:
The **pycparser** module source code.
tests/:
Unit tests.
utils/fake_libc_include:
Minimal standard C library include files that should allow to parse any C code.
Note that these headers now include C11 code, so they may not work when the
preprocessor is configured to an earlier C standard (like ``-std=c99``).
utils/internal/:
Internal utilities for my own use. You probably don't need them.
Contributors
============
Some people have contributed to **pycparser** by opening issues on bugs they've
found and/or submitting patches. The list of contributors is in the CONTRIBUTORS
file in the source distribution. After **pycparser** moved to Github I stopped
updating this list because Github does a much better job at tracking
contributions.
|