Rewrite the core of the binding generator.

TL;DR: The binding generator is a mess right now. At first it was fun (in a "this is challenging" sense) to improve on it, but this is not sustainable.

The truth is that the current architecture of the binding generator is a huge pile of hacks, so for the last few days I've been working on rewriting it with a few goals:

1) Have the hacks as contained and identified as possible. They're sometimes needed because of how clang exposes the AST, but ideally those hacks are well identified and don't interact randomly with each other. As an example, when the current bindgen scans the parameters of a function that references a struct, it clones all the struct information; then, if the struct name changes (because we mangle it), everything breaks.

2) Support extending the bindgen output without having to deal with clang. The way I'm aiming to do this is by completely separating the parsing stage from the code generation one, and providing a single id for each item the binding generator provides.

3) No more random mutation of the internal representation from anywhere. That means no more Rc<RefCell<T>>, no more random circular references, no more borrow_state... nothing.

4) No more deduplication of declarations before code generation. Current bindgen has a stage, called `tag_dup_decl`[1], that takes care of deduplicating declarations. That's completely buggy, and for C++ it's a complete mess, since we YOLO modify the world. I've managed to get rid of this by using the clang canonical declaration, and the definition, to avoid scanning any type/item twice.

5) Code generation should not modify any internal data structure. It can look things up and traverse whatever it needs, but not modify anything randomly.

6) Each item should have a canonical name and a single source of mangling logic, and that should be computed from the immutable state, at code generation. I've put some canonical_name stuff in the code generation phase, but it's still not complete, and should change if I implement namespaces.

Improvements pending until this can land:

1) Add support for missing core stuff, mainly generating functions (note that we parse the signatures for types correctly though), bitfields, and generating C++ methods.

2) Add support for the necessary features that were added to work around some C++ pitfalls, like opaque types, etc.

3) Add support for the sugar that Manish added recently.

4) Optionally (and I guess this can land without it, because basically nobody uses it since it's so buggy), bring back namespace support.

These are not completely trivial, but I think I can do them quite easily with the current architecture.

I'm putting the current state of affairs here as a request for comments... Any thoughts?

Note that there are still a few smells I want to eventually re-redesign, like the ParseError::Recurse thing, but until that happens I'm way happier with this kind of architecture.

I'm keeping the old `parser.rs` and `gen.rs` in tree just for reference while I code, but they will go away.

[1]: https://github.com/Yamakaky/rust-bindgen/blob/master/src/gen.rs#L448
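To make goals 2 to 6 concrete, here is a minimal sketch of the general shape: items live in one context and refer to each other only by id, repeated sightings of a declaration are deduplicated while parsing, and code generation only reads. Every name in it (`Context`, `intern`, `resolve`, `canonical_name`, `generate`, `demo`, the string keys standing in for clang's canonical declaration, the `Struct_` prefix) is hypothetical and chosen for illustration; it is not the actual `BindgenContext` or `ItemId` API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an opaque item id (not the real `ItemId`).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ItemId(usize);

#[derive(Debug)]
enum ItemKind {
    Int,
    /// Fields refer to other items by id, never by cloning them.
    Struct { fields: Vec<(String, ItemId)> },
}

#[derive(Debug)]
struct Item {
    name: String,
    kind: ItemKind,
}

/// Owns every parsed item. Only the parsing stage gets `&mut Context`.
#[derive(Default)]
struct Context {
    items: Vec<Item>,
    /// Assumption: a string key stands in for clang's canonical declaration.
    seen: HashMap<String, ItemId>,
}

impl Context {
    /// Parsing stage: a declaration that was already seen returns its
    /// existing id, so no separate deduplication pass is needed later.
    fn intern(&mut self, canonical_key: &str, item: Item) -> ItemId {
        if let Some(&id) = self.seen.get(canonical_key) {
            return id;
        }
        let id = ItemId(self.items.len());
        self.items.push(item);
        self.seen.insert(canonical_key.to_owned(), id);
        id
    }

    fn resolve(&self, id: ItemId) -> &Item {
        &self.items[id.0]
    }
}

/// Single source of naming logic, computed from immutable state.
fn canonical_name(ctx: &Context, id: ItemId) -> String {
    let item = ctx.resolve(id);
    match item.kind {
        ItemKind::Int => "i32".to_owned(),
        ItemKind::Struct { .. } => format!("Struct_{}", item.name),
    }
}

/// Code generation: takes `&Context`, so it cannot mutate the IR.
fn generate(ctx: &Context, id: ItemId) -> String {
    match &ctx.resolve(id).kind {
        ItemKind::Int => canonical_name(ctx, id),
        ItemKind::Struct { fields } => {
            let body: Vec<String> = fields
                .iter()
                .map(|(name, ty)| format!("{}: {}", name, canonical_name(ctx, *ty)))
                .collect();
            format!("struct {} {{ {} }}", canonical_name(ctx, id), body.join(", "))
        }
    }
}

/// Usage: parse two sightings of `int`, then a struct that refers to it.
fn demo() -> String {
    let mut ctx = Context::default();
    let int = ctx.intern("int", Item { name: "int".into(), kind: ItemKind::Int });
    let int_again = ctx.intern("int", Item { name: "int".into(), kind: ItemKind::Int });
    assert_eq!(int, int_again); // the second sighting reuses the first id
    let fields = vec![("x".to_owned(), int), ("y".to_owned(), int_again)];
    let point = ctx.intern(
        "Point",
        Item { name: "Point".into(), kind: ItemKind::Struct { fields } },
    );
    generate(&ctx, point)
}
```

Because the struct stores ids rather than copies of its field types, changing how a name is mangled only touches `canonical_name`; nothing stored in the context has to be rewritten.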
author: Emilio Cobos Álvarez <ecoal95@gmail.com> 2016-08-20 22:32:16 -0700
committer: Emilio Cobos Álvarez <ecoal95@gmail.com> 2016-09-16 11:34:07 -0700
commit: cfdf15f5d04d4fbca3e7fcb46a1dd658ade973cd (patch)
tree: f7d2087332f4506bb836dce901bc181e5ffc7fba /src/ir/function.rs
parent: bbd6b2c9919e02642a8874e5ceb2ba3b5c76adec (diff)
1 files changed, 220 insertions, 0 deletions
diff --git a/src/ir/function.rs b/src/ir/function.rs
new file mode 100644
index 00000000..b95ac57b
--- /dev/null
+++ b/src/ir/function.rs
@@ -0,0 +1,220 @@
+use super::item::{Item, ItemId};
+use super::ty::TypeKind;
+use super::context::BindgenContext;
+use syntax::abi;
+use clang;
+use clangll::Enum_CXCallingConv;
+use parse::{ClangItemParser, ClangSubItemParser, ParseError, ParseResult};
+
+/// A function declaration, with a signature, arguments, and argument names.
+///
+/// The argument names vector must be the same length as the ones in the
+/// signature.
+#[derive(Debug)]
+pub struct Function {
+    name: String,
+    /// The mangled name, that is, the symbol.
+    mangled_name: Option<String>,
+    /// The id pointing to the current function signature.
+    signature: ItemId,
+    /// The doc comment on the function, if any.
+    comment: Option<String>,
+}
+
+impl Function {
+    pub fn new(name: String,
+               mangled_name: Option<String>,
+               sig: ItemId,
+               comment: Option<String>) -> Self {
+        Function {
+            name: name,
+            mangled_name: mangled_name,
+            signature: sig,
+            comment: comment,
+        }
+    }
+
+    pub fn name(&self) -> &str {
+        &self.name
+    }
+
+    pub fn mangled_name(&self) -> Option<&str> {
+        self.mangled_name.as_ref().map(|n| &**n)
+    }
+
+    pub fn signature(&self) -> ItemId {
+        self.signature
+    }
+}
+
+/// A function signature.
+#[derive(Debug)]
+pub struct FunctionSig {
+    /// The return type of the function.
+    return_type: ItemId,
+    /// The type of the arguments, optionally with the name of the argument when
+    /// declared.
+    argument_types: Vec<(Option<String>, ItemId)>,
+    /// Whether this function is variadic.
+    is_variadic: bool,
+    /// The abi of this function.
+    abi: abi::Abi,
+}
+
+fn get_abi(cc: Enum_CXCallingConv) -> abi::Abi {
+    use clangll::*;
+    match cc {
+        CXCallingConv_Default => abi::Abi::C,
+        CXCallingConv_C => abi::Abi::C,
+        CXCallingConv_X86StdCall => abi::Abi::Stdcall,
+        CXCallingConv_X86FastCall => abi::Abi::Fastcall,
+        CXCallingConv_AAPCS => abi::Abi::Aapcs,
+        CXCallingConv_X86_64Win64 => abi::Abi::Win64,
+        other => panic!("unsupported calling convention: {}", other),
+    }
+}
+
+pub fn cursor_mangling(cursor: &clang::Cursor) -> Option<String> {
+    let mut mangling = cursor.mangling();
+
+    // Try to undo backend linkage munging (prepended _, generally)
+    if cfg!(target_os = "macos") && mangling.starts_with('_') {
+        mangling.remove(0);
+    }
+
+    if mangling.is_empty() { None } else { Some(mangling) }
+}
+
+impl FunctionSig {
+    pub fn new(return_type: ItemId,
+               arguments: Vec<(Option<String>, ItemId)>,
+               is_variadic: bool,
+               abi: abi::Abi) -> Self {
+        FunctionSig {
+            return_type: return_type,
+            argument_types: arguments,
+            is_variadic: is_variadic,
+            abi: abi,
+        }
+    }
+
+    pub fn from_ty(ty: &clang::Type,
+                   cursor: &clang::Cursor,
+                   ctx: &mut BindgenContext) -> Result<Self, ParseError> {
+        use clangll::*;
+        debug!("FunctionSig::from_ty {:?} {:?}", ty, cursor);
+
+        // Don't parse operatorxx functions in C++
+        let spelling = cursor.spelling();
+        if spelling.starts_with("operator") {
+            return Err(ParseError::Continue);
+        }
+
+        let cursor = if cursor.is_valid() {
+            *cursor
+        } else {
+            ty.declaration()
+        };
+        let mut args: Vec<_> = match cursor.kind() {
+            CXCursor_FunctionDecl |
+            CXCursor_CXXMethod => {
+                // For CXCursor_FunctionDecl, cursor.args() is the reliable way
+                // to get parameter names and types.
+                cursor.args().iter().map(|arg| {
+                    let arg_ty = arg.cur_type();
+                    let name = arg.spelling();
+                    let name = if name.is_empty() { None } else { Some(name) };
+                    let ty = Item::from_ty(&arg_ty, Some(*arg), None, ctx)
+                                    .expect("Argument?");
+                    (name, ty)
+                }).collect()
+            }
+            _ => {
+                // For non-CXCursor_FunctionDecl, visiting the cursor's children
+                // is the only reliable way to get parameter names.
+                let mut args = vec![];
+                cursor.visit(|c, _| {
+                    if c.kind() == CXCursor_ParmDecl {
+                        let ty = Item::from_ty(&c.cur_type(), Some(*c), None, ctx)
+                                    .expect("ParmDecl?");
+                        let name = c.spelling();
+                        let name = if name.is_empty() { None } else { Some(name) };
+                        args.push((name, ty));
+                    }
+                    CXChildVisit_Continue
+                });
+                args
+            }
+        };
+
+        if cursor.kind() == CXCursor_CXXMethod {
+            let is_const = cursor.method_is_const();
+            let is_virtual = cursor.method_is_virtual();
+            let is_static = cursor.method_is_static();
+            if !is_static && !is_virtual {
+                let class = Item::parse(cursor.semantic_parent(), None, ctx)
+                                .expect("Expected to parse the class");
+                let ptr = Item::builtin_type(TypeKind::Pointer(class), is_const, ctx);
+                args.insert(0, (Some("this".into()), ptr));
+            } else if is_virtual {
+                let void = Item::builtin_type(TypeKind::Void, false, ctx);
+                let ptr = Item::builtin_type(TypeKind::Pointer(void), false, ctx);
+                args.insert(0, (Some("this".into()), ptr));
+            }
+        }
+
+        let ret = try!(Item::from_ty(&ty.ret_type(), None, None, ctx));
+        let abi = get_abi(ty.call_conv());
+
+        Ok(Self::new(ret, args, ty.is_variadic(), abi))
+    }
+
+    pub fn return_type(&self) -> ItemId {
+        self.return_type
+    }
+
+    pub fn argument_types(&self) -> &[(Option<String>, ItemId)] {
+        &self.argument_types
+    }
+
+    pub fn abi(&self) -> abi::Abi {
+        self.abi
+    }
+
+    pub fn is_variadic(&self) -> bool {
+        // Clang reports some functions as variadic when they *might* be
+        // variadic. We check the arguments because Rust doesn't generate code
+        // well for variadic functions without an initial argument.
+        self.is_variadic && !self.argument_types.is_empty()
+    }
+}
+
+impl ClangSubItemParser for Function {
+    fn parse(cursor: clang::Cursor,
+             context: &mut BindgenContext) -> Result<ParseResult<Self>, ParseError> {
+        use clangll::*;
+        match cursor.kind() {
+            CXCursor_FunctionDecl |
+            CXCursor_CXXMethod => {},
+            _ => return Err(ParseError::Continue),
+        };
+
+        debug!("Function::parse({:?}, {:?})", cursor, cursor.cur_type());
+
+        // Grab the signature using Item::from_ty.
+        let sig = try!(Item::from_ty(&cursor.cur_type(), Some(cursor), None, context));
+
+        let name = cursor.spelling();
+        assert!(!name.is_empty(), "Empty function name?");
+
+        let mut mangled_name = cursor_mangling(&cursor);
+        if mangled_name.as_ref() == Some(&name) {
+            mangled_name = None;
+        }
+
+        let comment = cursor.raw_comment();
+
+        let function = Self::new(name, mangled_name, sig, comment);
+        Ok(ParseResult::New(function, Some(cursor)))
+    }
+}