 
μpb Design 
---------- 
 
μpb has the following design goals: 
 
- C89 compatible. 
- small code size (both for the core library and generated messages). 
- fast performance (hundreds of MB/s). 
- idiomatic for C programs. 
- easy to wrap in high-level languages (Python, Ruby, Lua, etc.) with
  good performance and all standard protobuf features. 
- hands-off about memory management, allowing for easy integration 
  with existing VMs and/or garbage collectors. 
- offers binary ABI compatibility between apps, generated messages, and 
  the core library (doesn't require re-generating messages or recompiling 
  your application when the core library changes). 
- provides all features that users expect from a protobuf library 
  (generated messages in C, reflection, text format, etc.). 
- layered, so the core is small and doesn't require descriptors. 
- tidy about symbol references, so that any messages or features that 
  aren't used by a C program can have their code GC'd by the linker. 
- possible to use protobuf binary format without leaking message/field 
  names into the binary. 
 
μpb accomplishes these goals by keeping a very small core that does not contain 
descriptors.  We need some way of knowing what fields are in each message and 
where they live, but instead of descriptors, we keep a small/lightweight summary 
of the .proto file.  We call this a `upb_msglayout`.  It contains the bare 
minimum of what we need to know to parse and serialize protobuf binary format 
into our internal representation for messages, `upb_msg`. 
 
The core then contains functions to parse/serialize a message, given a `upb_msg*` 
and a `const upb_msglayout*`. 
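
As a rough illustration of the idea, the sketch below shows how a name-free layout table can drive a decoder. Every identifier in it (`sketch_field`, `sketch_layout`, `sketch_decode`, `example_msg`) is hypothetical and is not part of μpb's actual API; it also assumes only varint fields that fit in 32 bits, which is far less than the real core handles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration only; these are NOT upb's real structs. */
typedef struct {
  unsigned number; /* protobuf field number */
  size_t offset;   /* byte offset of the field inside the message */
} sketch_field;

typedef struct {
  const sketch_field *fields;
  size_t field_count;
  size_t size; /* size in bytes of the in-memory message */
} sketch_layout;

/* Finds a field by number, or returns NULL if the layout does not know it. */
static const sketch_field *sketch_find(const sketch_layout *l,
                                       unsigned number) {
  size_t i;
  for (i = 0; i < l->field_count; i++) {
    if (l->fields[i].number == number) return &l->fields[i];
  }
  return NULL;
}

/* Reads one varint of at most 5 bytes (assumption: 32-bit values only).
 * Returns the new read position, or NULL if the input is truncated or the
 * varint is too long. */
static const unsigned char *sketch_varint(const unsigned char *p,
                                          const unsigned char *end,
                                          unsigned long *out) {
  unsigned long val = 0;
  unsigned shift = 0;
  while (p < end && shift < 32) {
    unsigned char b = *p++;
    val |= (unsigned long)(b & 0x7f) << shift;
    if (!(b & 0x80)) {
      *out = val;
      return p;
    }
    shift += 7;
  }
  return NULL;
}

/* Decodes varint fields (wire type 0) into msg using only the layout.
 * Returns 1 on success and 0 on malformed or unsupported input. */
static int sketch_decode(const unsigned char *buf, size_t len, void *msg,
                         const sketch_layout *l) {
  const unsigned char *p = buf;
  const unsigned char *end = buf + len;
  assert(buf != NULL && msg != NULL && l != NULL);
  while (p < end) {
    unsigned long tag, val;
    const sketch_field *f;
    p = sketch_varint(p, end, &tag);
    if (!p || (tag & 7) != 0) return 0; /* this sketch handles varints only */
    p = sketch_varint(p, end, &val);
    if (!p) return 0;
    f = sketch_find(l, (unsigned)(tag >> 3));
    /* Fields missing from the layout are silently dropped in this sketch. */
    if (f) memcpy((char *)msg + f->offset, &val, sizeof(val));
  }
  return 1;
}

/* What generated code for a two-field message might look like. */
typedef struct {
  unsigned long id;
  unsigned long count;
} example_msg;

static const sketch_field example_fields[] = {
  {1, offsetof(example_msg, id)},
  {2, offsetof(example_msg, count)}
};

static const sketch_layout example_layout = {
  example_fields, 2, sizeof(example_msg)
};
```

Note that the table holds only field numbers and offsets, never names, so a binary built this way carries no message or field names. The caller also supplies the message memory, which keeps the decoder out of allocation decisions.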
 
This approach is similar to [nanopb](https://github.com/nanopb/nanopb), which
also compiles message definitions to a compact, internal representation without
names.  However, nanopb does not aim to be a fully-featured library, and has no
support for text format, JSON, or descriptors.  μpb is unique in that it has a
small core similar to nanopb (though not quite as small), but also offers a
full-featured protobuf library for applications that want reflection, text
format, JSON format, etc.
 
Without descriptors, the core doesn't have access to field names, so it cannot 
parse/serialize to protobuf text format or JSON.  Instead, this functionality
lives in separate modules that depend on the module implementing descriptors. 
With the descriptor module we can parse/serialize binary descriptors and 
validate that they follow all the rules of protobuf schemas. 
 
To provide binary compatibility, we version the structs that generated messages
use to create a `upb_msglayout*`.  The current initializers are
`upb_msglayout_msginit_v1`, `upb_msglayout_fieldinit_v1`, etc.  Currently,
`upb_msglayout*` uses these structs directly as its internal representation.  If
upb changes its internal representation for a `upb_msglayout*`, it will also
include code to convert the old representation to the new representation.  This
conversion will cost some extra memory/CPU at runtime, but apps that statically
link μpb will never need to worry about it.
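
A minimal sketch of what such a conversion could look like follows. The struct names, their members, and the `flags` field are all hypothetical, invented here to illustrate the versioning idea; they do not reflect the real contents of `upb_msglayout_fieldinit_v1` or any planned successor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "v1" record, as emitted by older generated code.
 * Its layout is frozen once it ships, which is what keeps the ABI stable. */
typedef struct {
  unsigned number; /* protobuf field number */
  unsigned offset; /* byte offset of the field inside the message */
} sketch_fieldinit_v1;

/* Hypothetical newer internal representation used by the core library.
 * It adds a member that v1 generated code knows nothing about. */
typedef struct {
  unsigned number;
  unsigned offset;
  unsigned flags; /* invented for this sketch */
} sketch_field_v2;

/* Upgrades n v1 records into a caller-provided array of v2 records.
 * Returns 1 on success, or 0 if the output array is too small.
 * Members that did not exist in v1 get a conservative default. */
static int sketch_upgrade_v1(const sketch_fieldinit_v1 *in, size_t n,
                             sketch_field_v2 *out, size_t out_cap) {
  size_t i;
  assert(in != NULL && out != NULL);
  if (n > out_cap) return 0;
  for (i = 0; i < n; i++) {
    out[i].number = in[i].number;
    out[i].offset = in[i].offset;
    out[i].flags = 0; /* default for data that predates the new member */
  }
  return 1;
}
```

The extra array of converted records is the memory cost mentioned above, and the copy loop is the CPU cost. When the generated code already emits the representation the library uses internally, no such step is needed.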
 
TODO 
---- 
 
1. Revise our generated code until it is in a state where we feel comfortable
   committing to API/ABI stability for it.  In particular, there is an open
   question of whether non-ABI-compatible field accesses should have a
   fast path different from the ABI-compatible field access.
1. Add missing features (maps, extensions, unknown fields). 
1. Flesh out C++ wrappers. 
1. *(lower-priority)*: Revise all of the existing encoders/decoders and
   handlers.  We will probably want to keep handlers, since they let us decouple
   encoders/decoders from `upb_msg`, but we need to simplify all of that a LOT.
   We will likely want to make handlers only per-message instead of per-field,
   except for variable-length fields.