Does making a derived C++ class “final” change the ABI?












13















I'm curious if marking an existing derived C++ class as final to allow for de-virtualisation optimisations will change ABI when using C++11. My expectation is that it should have no effect as I see this as primarily a hint to the compiler about how it can optimise virtual functions and as such I can't see any way it would change the size of the struct or the vtable, but perhaps I'm missing something?



I'm aware this changes API here so that code that further derives from this derived class will no longer work, but I'm only concerned about ABI in this particular case.






























  • 2





    Hardly normative, but GCC seems to make full use of that hint. I don't think it affects the ABI. Calls via base class pointers or references must still work.

    – StoryTeller
    Nov 20 '18 at 7:21








  • 4





    The ABI is not part of the C++ standard, so this will be implementation-defined (or implementation un-defined). Still a valid question to ask what compilers do in practice, and I would imagine with so many other things being able to affect it (e.g. changing something from public to private), this would likely affect it too.

    – HostileFork
    Nov 20 '18 at 7:23











  • Provided you have struct A { virtual void f(); }; struct B : A { void f() final; }; struct C : B {}; and a B& b, I suppose the compiler could devirtualize b.f().

    – Asu
    Nov 20 '18 at 7:57











  • For the Itanium C++ ABI, there is a set of virtual tables for each possible most derived object: vtable for A, vtable for A in B, etc. So the compiler always knows what the final overrider is for every vtable it generates, and the final keyword does not change this.

    – Oliv
    Nov 20 '18 at 10:38






  • 1





    @ShafikYaghmour In my mind it is closer to a speculation. I do not have a good grasp of the subject.

    – Oliv
    Nov 23 '18 at 9:48





































c++ c++11 virtual-functions abi vtable














edited Jan 7 at 15:16









curiousguy











asked Nov 20 '18 at 7:10









DanDan





















3 Answers


















2














Marking a function declaration X::f() as final means that the declaration cannot be overridden, so every call that names that declaration can be bound early (but not calls that name a declaration in a base class). If a virtual function is final in the ABI, the vtables produced can be incompatible with those produced for the almost identical class without final: calls to virtual functions that name declarations marked final can be assumed to be direct, so trying to use a vtable entry (which would exist in the final-less ABI) is illegal.



The compiler could use the final guarantee to cut the size of vtables (which can sometimes grow a lot) by not adding a new entry that would usually be added, and that the ABI requires for a non-final declaration.



Entries are added for a declaration that overrides a function not in an (inherently, always) primary base, or that has a non-trivially covariant return type (a return type covariant on a non-primary base).



Inherently primary base class: the simplest case of polymorphic inheritance



The simple case of polymorphic inheritance, a derived class inheriting non virtually from a single polymorphic base class, is the typical case of an always primary base: the polymorphic base subobject is at the beginning, the address of derived object is the same as the address of the base subobject, virtual calls can be made directly with a pointer to either, everything is simple.



These properties are true whether the derived class is a complete object (one that isn't a subobject), a most derived object, or a base class. (They are class invariants guaranteed at the ABI level for pointers of unknown origin.)



Considering the case where the return type isn't covariant; or:



Trivial covariance



An example: the case where it's covariant with the same type as *this; as in:



struct B { virtual B *f(); };
struct D : B { virtual D *f(); }; // trivial covariance


Here B is inherently, invariably the primary in D: in all D (sub)objects ever created, a B resides at the same address: the D* to B* conversion is trivial so the covariance is also trivial: it's a static typing issue.



Whenever this is the case (trivial up-cast), covariance disappears at the code generation level.
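
A small sketch of why this covariance is "trivial". The function bodies and the helpers upcast_is_trivial and same_object_through_base are hypothetical additions for illustration. The address equality reflects how common ABIs (Itanium, MSVC) lay out single non-virtual inheritance; it is not something the C++ standard promises.

```cpp
#include <cassert>

struct B {
    virtual B* f() { return this; }
    virtual ~B() = default;
};

struct D : B {
    D* f() override { return this; }  // covariant return type: D* instead of B*
};

// True when converting a D* to a B* leaves the address unchanged,
// which is what "B is the primary base of D" means in practice.
inline bool upcast_is_trivial(D& d) {
    B* as_base = &d;  // implicit derived-to-base conversion
    return static_cast<void*>(as_base) == static_cast<void*>(&d);
}

// Calls f() through the base interface. The static type of the result is B*,
// but it designates the same object that a call through D would return.
inline bool same_object_through_base(D& d) {
    B& as_base = d;
    return as_base.f() == d.f();  // the D* result converts to B* for the comparison
}
```

When the up-cast is a no-op like this, the same entry point can serve callers that expect a B* and callers that expect a D*, so there is nothing for a second vtable entry to do.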



Conclusion



In these cases the type of the declaration of the overriding function is trivially different from the type of the base:




  • all parameters are almost the same (with only a trivial difference on the type of this)

  • the return type is almost the same (with only a possible difference on the type of a returned pointer(*) type)


(*) since returning a reference is exactly the same as returning a pointer at the ABI level, references aren't discussed specifically



So no vtable entry is added for the derived declaration.



(So making the class final wouldn't allow any vtable simplification here.)



Never primary base



Obviously a class can only have one subobject, containing a specific scalar data member (like the vptr (*)), at offset 0. Other base classes with scalar data members will be at a non trivial offset, requiring non trivial derived to base conversions of pointers. So multiple interesting(**) inheritance will create non primary bases.



(*) The vptr isn't a normal data member at the user level; but in the generated code, it's pretty much a normal scalar data member known to the compiler.
(**) The layout of non polymorphic bases isn't interesting here: for the purpose of vtable ABI, a non polymorphic base is treated like a member subobject, as it doesn't affect the vtables in any way.



The conceptually simplest interesting example of a non primary, and non trivial pointer conversion is:



struct B1 { virtual void f(); };
struct B2 { virtual void f(); };
struct D : B1, B2 { };


Each base has its own vptr scalar member, and these vptr have different purposes:





  • B1::vptr points to a B1_vtable structure


  • B2::vptr points to a B2_vtable structure


and these have identical layout (because the class definitions are superposable, the ABI must generate superposable layouts); and they are strictly incompatible because





  1. The vtables have distinct entries:





    • B1_vtable.f_ptr points to the final overrider for B1::f()


    • B2_vtable.f_ptr points to the final overrider for B2::f()



  2. B1_vtable.f_ptr must be at the same offset as B2_vtable.f_ptr (from their respective vptr data members in B1 and B2)


  3. The final overriders of B1::f() and B2::f() aren't inherently (always, invariably) equivalent(*): they can have distinct final overriders that do different things.(***)


(*) Two callable runtime functions(**) are equivalent if they have same observable behavior at the ABI level. (Equivalent callable functions may not have the same declaration or C++ types.)



(**) A callable runtime function is any entry point: any address that can be called/jumped at; it can be a normal function code, a thunk/trampoline, a particular entry in a multiple entry function. Callable runtime functions often have no possible C++ declarations, like "final overrider called with a base class pointer".



(***) That they sometimes have the same final overrider in a further derived class:



struct DD : D { void f(); };


isn't useful for the purpose of defining the ABI of D.



So we see that D provably needs a non primary polymorphic base; by convention it will be B2; the first nominated polymorphic base (B1) gets to be primary.



So B2 must be at non trivial offset, and D to B2 conversion is non trivial: it requires generated code.
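
To make the non trivial conversion visible, here is a rough sketch that measures where each base lands inside a D. The helper base_offset is hypothetical, and the result (B1 at offset 0, B2 somewhere after it) is what the Itanium and MSVC ABIs do in practice, not a guarantee of the language.

```cpp
#include <cassert>
#include <cstddef>

struct B1 { virtual void f() {} };
struct B2 { virtual void f() {} };
struct D : B1, B2 { };

// Byte distance from the start of the D object to the given base subobject.
template <class Base>
std::ptrdiff_t base_offset(D& d) {
    Base* base = &d;  // compiler-generated derived-to-base conversion
    return reinterpret_cast<char*>(base) - reinterpret_cast<char*>(&d);
}
```

On a typical implementation base_offset<B1>(d) is 0 while base_offset<B2>(d) is the size of a pointer, which is exactly the adjustment that a D to B2 conversion has to apply.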



So the parameters of a member function of D cannot be equivalent with the parameters of a member function of B2, as the implicit this isn't trivially convertible; so:





  • D must have two different vtables: a vtable corresponding with B1_vtable and one with B2_vtable (they are in practice put together in one big vtable for D but conceptually they are two distinct structures).

  • a virtual member function of B2, say B2::g(), that is overridden in D needs two vtable entries: one in the D_B2_vtable (which is just a B2_vtable layout with different values) and one in the D_B1_vtable, which is an enhanced B1_vtable: a B1_vtable plus entries for new runtime features of D.


Because the D_B1_vtable is built from a B1_vtable, a pointer to D_B1_vtable is trivially a pointer to a B1_vtable, and the vptr value is the same.
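
The two-table arrangement can be modelled by hand with plain structs and function pointers. This is only a conceptual sketch: every name in it (B1Model, DB1VTable, D_g_from_B2 and so on) is made up, g() is the hypothetical function from the bullet above, and no compiler is claimed to emit exactly this.

```cpp
#include <cassert>
#include <cstddef>

struct B1Model;
struct B2Model;

struct B1VTable { int (*f)(B1Model* self); };
struct B2VTable { int (*g)(B2Model* self); };

struct B1Model { const B1VTable* vptr; };
struct B2Model { const B2VTable* vptr; };

// D : B1, B2 modelled with plain members: b1 at offset 0 (primary), b2 after it.
struct DModel {
    B1Model b1;
    B2Model b2;
    int value;
};

// D_B1_vtable: the B1 layout first, then the entries that D adds.
struct DB1VTable {
    B1VTable base;           // so a pointer to DB1VTable is also a pointer to a B1VTable
    int (*g)(DModel* self);  // new entry: g() called with a D* as this
};

int B1_f(B1Model*) { return 1; }
int D_g(DModel* self) { return self->value; }

// Entry stored in D's B2 table: adjust this from B2* to D*, then run D::g().
int D_g_from_B2(B2Model* self) {
    char* raw = reinterpret_cast<char*>(self) - offsetof(DModel, b2);
    return D_g(reinterpret_cast<DModel*>(raw));
}

const DB1VTable d_b1_vtable = { { &B1_f }, &D_g };
const B2VTable d_b2_vtable = { &D_g_from_B2 };

DModel make_d(int value) {
    return DModel{ { &d_b1_vtable.base }, { &d_b2_vtable }, value };
}

// "Virtual" calls written out by hand.
int call_g_via_b2(B2Model* b2) { return b2->vptr->g(b2); }
int call_g_via_d(DModel* d) {
    const DB1VTable* vt = reinterpret_cast<const DB1VTable*>(d->b1.vptr);
    return vt->g(d);
}
```

Both call paths reach the same body, D_g, but through different entries: the B2 entry has to adjust the pointer first, while the entry in the extended B1 table takes a D pointer directly.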



Note that in theory it would be possible to omit the entry for D::g() in D_B1_vtable by making all virtual calls of D::g() go through the B2 base, which is also a possibility as long as no non trivial covariance is used(#).



(#) or if non trivial covariance occurs, "virtual covariance" (covariance in a derived to base relation involving virtual inheritance) isn't used



Not inherently primary base



Regular (non virtual) inheritance is simple like membership:




  • a non virtual base subobject is a direct base of exactly one object (which implies that there is always exactly one final overrider of any virtual function when virtual inheritance isn't used);

  • the placement of a non virtual base is fixed;

  • base subobjects that don't have virtual base subobjects, just like data members, are constructed exactly like complete objects (they have exactly one runtime constructor function code for every defined C++ constructor).


A more subtle case of inheritance is virtual inheritance: a virtual base subobject can be the direct base of many base class subobjects. That implies that the layout of virtual bases is only determined at the most derived class level: the offset of a virtual base in a most derived object is well known and a compile time constant; in an arbitrary derived class object (that may or may not be a most derived object) it is a value computed at runtime.



That offset cannot be known at compile time in general, because C++ supports both unifying and duplicating inheritance:




  • virtual inheritance is unifying: all virtual bases of a given type in a most derived object are one and the same subobject;


  • non virtual inheritance is duplicating: all indirect non virtual bases are semantically distinct, as their virtual members don't need to have common final overriders (contrast with Java where this is impossible (AFAIK)):



    struct B { virtual void f(); };
    struct D1 : B { virtual void f(); }; // final overrider
    struct D2 : B { virtual void f(); }; // final overrider
    struct DD : D1, D2 { };




Here DD has two distinct final overriders of B::f():





  • DD::D1::f() is final overrider for DD::D1::B::f()


  • DD::D2::f() is final overrider for DD::D2::B::f()


in two distinct vtable entries.
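
A quick way to observe the two final overriders. In this sketch f() returns int instead of void so the result can be checked, and the helpers b_via_d1 and b_via_d2 are hypothetical.

```cpp
#include <cassert>

struct B { virtual int f() { return 0; } };
struct D1 : B { int f() override { return 1; } };  // final overrider in DD::D1
struct D2 : B { int f() override { return 2; } };  // final overrider in DD::D2
struct DD : D1, D2 { };

// DD -> B is ambiguous, so the path through D1 or D2 has to be spelled out.
inline B& b_via_d1(DD& dd) { return static_cast<D1&>(dd); }
inline B& b_via_d2(DD& dd) { return static_cast<D2&>(dd); }
```

For a DD object dd, b_via_d1(dd).f() yields 1 and b_via_d2(dd).f() yields 2, and the two B references designate different subobjects.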



Duplicating inheritance, where you indirectly derive multiple times from a given class, implies multiple vptrs, vtables and possibly distinct vtable ultimate code (the ultimate aim of using a vtable entry: the high level semantic of calling a virtual function - not the entry point).



Not only does C++ support both, but combinations are allowed: duplicating inheritance of a class that uses unifying inheritance:



struct VB { virtual void f(); };
struct D : virtual VB { virtual void g(); int dummy; };
struct DD1 : D { void g(); };
struct DD2 : D { void g(); };
struct DDD : DD1, DD2 { };


There is only one DDD::VB but there are two observably distinct D subobjects in DDD with different final overriders for D::g(). Whether or not a C++-like language (that supports virtual and non virtual inheritance semantics) guarantees that distinct subobjects have different addresses, the address of DDD::DD1::D cannot be the same as the address of DDD::DD2::D.



So the offset of a VB in a D cannot be fixed (in any language that supports unification and duplication of bases).
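
This can be checked with the hierarchy above. The following is a sketch with added function bodies and hypothetical helpers (vb_offset, d_in_dd1, d_in_dd2); the exact offsets are implementation specific, but they cannot be equal, since two different D subobjects share a single VB.

```cpp
#include <cassert>
#include <cstddef>

struct VB { virtual void f() {} };
struct D : virtual VB { virtual void g() {} int dummy = 0; };
struct DD1 : D { void g() override {} };
struct DD2 : D { void g() override {} };
struct DDD : DD1, DD2 { };

// Byte distance from a D (sub)object to its VB virtual base subobject.
inline std::ptrdiff_t vb_offset(D* d) {
    VB* vb = d;  // conversion to a virtual base: resolved at run time
    return reinterpret_cast<char*>(vb) - reinterpret_cast<char*>(d);
}

// The two distinct D subobjects of a DDD, reached through DD1 and DD2.
inline D* d_in_dd1(DDD& x) { return static_cast<DD1*>(&x); }
inline D* d_in_dd2(DDD& x) { return static_cast<DD2*>(&x); }
```

Both D pointers convert to the same VB pointer, so vb_offset necessarily differs between them; code that only sees a D* cannot hard-code either value.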



In that particular example a real VB object (the object at runtime) has no concrete data member except the vptr, and the vptr is a special scalar member: it is a type "invariant" (not const) shared member, fixed by the constructor (invariant after complete construction), and its semantics are shared between bases and derived classes. Because VB has no scalar member that isn't type invariant, in a DDD the VB subobject can be an overlay over DDD::DD1::D, as long as the vtable of D is a match for the vtable of VB.



This however cannot be the case for virtual bases that have non invariant scalar members, that is, regular data members with an identity, members occupying a distinct range of bytes: these "real" data members cannot be overlaid on anything else. So a virtual base subobject with data members (members with an address guaranteed to be distinct by C++ or by whatever other C++-like language you are implementing) must be put at a distinct location: virtual bases with data members normally(##) have inherently non trivial offsets.



(##) with potentially a very narrow special case with a derived class with no data member with a virtual base with some data members



So we see that "almost empty" classes (classes with no data member but with a vptr) are special cases when used as virtual base classes: these virtual base are candidate for overlaying on derived classes, they are potential primaries but not inherent primaries:




  • the offset at which they reside will only be determined in the most derived class;

  • the offset might or might not be zero;

  • a null (zero) offset implies overlaying of the base, so the vtable of each directly derived class must be a match for the vtable of the base;

  • a non-zero offset implies non trivial conversions, so the entries in the vtables must treat conversion of the pointers to the virtual base as needing a runtime conversion (except when overlaid, obviously, as a conversion would then be neither necessary nor possible).


This means that when overriding a virtual function in a virtual base, an adjustment is always assumed to be potentially needed, but in some cases no adjustment will be needed.



A morally virtual base is a base class relationship that involves a virtual inheritance (possibly plus non virtual inheritance). Performing a derived to base conversion, specifically converting a pointer d to derived D, to base B, a conversion to...





  • ...a non-morally virtual base is inherently reversible in every case:




    • there is a one to one relation between the identity of a subobject B of a D and a D (which might be a subobject itself);

    • the reverse operation can be performed with a static_cast<D*>: static_cast<D*>((B*)d) is d;




  • (in any C++ like language with complete support for unifying and duplicating inheritance) ...a morally virtual base is inherently non reversible in the general case (although it's reversible in common case with simple hierarchies). Note that:





    • static_cast<D*>((B*)d) is ill formed;


    • dynamic_cast<D*>((B*)d) will work for the simple cases.
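
A minimal sketch of the two situations, using hypothetical class names (NB/ND for the non virtual case, VBase/VD for the virtual one) and hypothetical helpers:

```cpp
#include <cassert>

// Non virtual inheritance: the conversion is reversible with static_cast.
struct NB { virtual ~NB() = default; };
struct ND : NB { };

inline ND* down_static(NB* b) { return static_cast<ND*>(b); }

// Virtual inheritance: static_cast is rejected, dynamic_cast is required.
struct VBase { virtual ~VBase() = default; };
struct VD : virtual VBase { };

inline VD* down_dynamic(VBase* b) {
    // static_cast<VD*>(b) would be ill formed: VBase is a virtual base of VD.
    return dynamic_cast<VD*>(b);  // null if *b is not part of a VD
}
```

down_dynamic has to consult runtime type information, and it returns a null pointer when the VBase object is not actually a base of a VD.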




So let's call virtual covariance the case where the covariance of the return type is based on a morally virtual base. When overriding with virtual covariance, the calling convention cannot assume the base will be at a known offset. So a new vtable entry is inherently needed for virtual covariance, whether or not the overridden declaration is in an inherent primary:



struct VB { virtual void f(); }; // almost empty
struct D : virtual VB { }; // VB is potential primary

struct Ba { virtual VB * g(); };
struct Da : Ba { // non virtual base, so Ba is inherent primary
D * g(); // virtually covariant: D->VB is morally virtual
};


Here VB may be at offset zero in D and no adjustment may be needed (for example for a complete object of type D), but it isn't always the case in a D subobject: when dealing with pointers to D, one cannot know whether that is the case.



When Da::g() overrides Ba::g() with virtual covariance, the general case must be assumed so a new vtable entry is strictly needed for Da::g() as there is no possible down pointer conversion from VB to D that reverses the D to VB pointer conversion in the general case.



Ba is an inherent primary in Da so the semantics of Ba::vptr are shared/enhanced:




  • there are additional guarantees/invariants on that scalar member, and the vtable is extended;

  • no new vptr is needed for Da.


So the Da_vtable (inherently compatible with Ba_vtable) needs two distinct entries for virtual calls to g():




  • in the Ba_vtable part of the vtable: Ba::g() vtable entry: calls final overrider of Ba::g() with an implicit this parameter of Ba* and returns a VB* value.

  • in the new members part of the vtable: Da::g() vtable entry: calls final overrider of Da::g() (which is inherently the same as the final overrider of Ba::g() in C++) with an implicit this parameter of Da* and returns a D* value.


Note that there is not really any ABI freedom here: the fundamentals of vptr/vtable design and their intrinsic properties imply the presence of these multiple entries for what is a unique virtual function at the high language level.
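
From the user's side the two entries look like this. It is a sketch with added function bodies; the static objects and the helpers call_as_ba and call_as_da are hypothetical.

```cpp
#include <cassert>

struct VB { virtual void f() {} };  // almost empty
struct D : virtual VB { };

struct Ba {
    virtual VB* g() { static VB vb; return &vb; }
};
struct Da : Ba {
    D* g() override { static D d; return &d; }  // virtually covariant
};

// Same final overrider, reached with two different static result types.
inline VB* call_as_ba(Ba& ba) { return ba.g(); }  // caller expects a VB*
inline D* call_as_da(Da& da) { return da.g(); }   // caller expects a D*
```

Called on the same Da object, both functions designate the same D object, but only the second caller receives a pointer it can use as a D* without a dynamic_cast.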



Note that making the virtual function body inline and visible to the ABI (so that classes with different inline function definitions could be made ABI-incompatible, allowing more information to inform memory layout) couldn't possibly help, as inline code would only define what a call to a non overridden virtual function does: one cannot base ABI decisions on choices that can be overridden in derived classes.



[Example of a virtual covariance that ends up being only trivially covariant as in a complete D the offset for VB is trivial and no adjustment code would have been necessary in that case:



struct Da : Ba { // non virtual base, so inherent primary
D * g() { return new D; } // VB really is primary in complete D
// so conversion to VB* is trivial here
};


Note that in that code an incorrect code generation for a virtual call by a buggy compiler that would use the Ba_vtable entry to call g() would actually work because covariance ends up being trivial, as VB is primary in complete D.



The calling convention is for the general case and such code generation would fail with code that returns an object of a different class.



--end example]



But if Da::g() is final in the ABI, virtual calls can only be made via the VB * g(); declaration: covariance is made purely static, and the derived to base conversion is done at compile time as the last step of the virtual thunk, as if virtual covariance was never used.



Possible extension of final



There are two types of virtual-ness in C++: member functions (matched by function signature) and inheritance (matched by class name). If final stops overriding a virtual function, could it be applied to base classes in a C++-like language?



First we need to define what is overriding a virtual base inheritance:



An "almost direct" subobject relation means that an indirect subobject is controlled almost as a direct subobject:




  • an almost direct subobject can be initialized like a direct subobject;

  • access control is never really an obstacle to access (inaccessible private almost direct subobjects can be made accessible at discretion).


Virtual inheritance provides almost direct access:




  • the constructor of each virtual base must be called by the ctor-init-list of the constructor of the most derived class;

  • when a virtual base class is inaccessible because declared private in a base class, or publicly inherited in a private base class of a base class, the derived class has the discretion to declare the virtual base as a virtual base again, making it accessible.


A way to formalize virtual base overriding is to make an imaginary inheritance declaration in each derived class that overrides base class virtual inheritance declarations:



struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D
// , virtual VB // imaginary overrider of D inheritance of VB
{
// DD () : VB() { } // implicit definition
};


Now C++ variants that support both forms of inheritance don't have to have C++ semantic of almost direct access in all derived classes:



struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D, virtual final VB {
// DD () : VB() { } // implicit definition
};


Here the virtual-ness of the VB base is frozen and cannot be used in further derived classes; the virtual-ness is made invisible and inaccessible to derived classes and the location of VB is fixed.



struct DDD : DD {
DDD () :
VB() // error: not an almost direct subobject
{ }
};
struct DD2 : D, virtual final VB {
// DD2 () : VB() { } // implicit definition
};
struct Diamond : DD, DD2 // error: no unique final overrider
{ // for ": virtual VB"
};


The virtual-ness freeze makes it illegal to unify Diamond::DD::VB and Diamond::DD2::VB but virtual-ness of VB requires unification which makes Diamond a contradictory, illegal class definition: no class can ever derive from both DD and DD2 [analog/example: just like no useful class can directly derive from A1 and A2:



struct A1 {
virtual int f() = 0;
};
struct A2 {
virtual unsigned f() = 0;
};
struct UselessAbstract : A1, A2 {
// no possible declaration of f() here
// none of the inherited virtual functions can be overridden
// in UselessAbstract or any derived class
};


Here UselessAbstract is abstract and so is every class derived from it, making that ABC (abstract base class) extremely silly, as any pointer to UselessAbstract is provably a null pointer.



-- end analog/example]



That would provide a way to freeze virtual inheritance, to provide meaningful private inheritance of classes with virtual base (without it derived classes can usurp the relationship between a class and its private base class).



Such use of final would of course freeze the location of a virtual base in a derived class and its further derived classes, avoiding additional vtable entries that are only needed because the location of virtual base isn't fixed.

























  • 1





    Could you make a tl;dr for this?

    – Mark Ransom
    Nov 27 '18 at 22:49











  • @MarkRansom In theory final allows vtable to be shorter in a few special cases, so ABI incompatibility is conceivable.

    – curiousguy
    Nov 27 '18 at 22:51











  • I didn't even consider the cases of obviously useless virtualness, like declaring a new virtual final function or new virtual final inheritance.

    – curiousguy
    Nov 28 '18 at 2:26



















0














I believe that adding the final keyword should not be ABI breaking, however removing it from an existing class might render some optimizations invalid. For example, consider this:



// in car.h
struct Vehicle { virtual void honk() { } };
struct Car final : Vehicle { void honk() override { } };

// in car.cpp

// Here, the compiler can assume that no derived class of Car can be passed,
// and so `honk()` can be devirtualized. However, if Car is not final
// anymore, this optimization is invalid.
void foo(Car* car) { car->honk(); }


If foo is compiled separately and e.g. shipped in a shared library, removing final (and hence making it possible for users to derive from Car) could render the optimization invalid.
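
For what it's worth, here is a rough check one can run on a given toolchain. CarOpen and CarFinal are hypothetical stand-ins for the "before" and "after" versions of Car, and honk() returns int only so the result can be observed. This compares size, alignment and dispatch only; it says nothing about vtable contents or mangled names, so passing it is not proof of ABI compatibility.

```cpp
#include <cassert>

struct Vehicle {
    virtual ~Vehicle() = default;
    virtual int honk() { return 0; }
    int wheels = 4;
};

// The same class twice, once without and once with final.
struct CarOpen : Vehicle { int honk() override { return 1; } int doors = 4; };
struct CarFinal final : Vehicle { int honk() override { return 1; } int doors = 4; };

// Expected to hold on mainstream compilers; not promised by the standard.
static_assert(sizeof(CarOpen) == sizeof(CarFinal), "final changed the object size");
static_assert(alignof(CarOpen) == alignof(CarFinal), "final changed the alignment");

// A call through the base still dispatches virtually; a call through the
// final type may be devirtualized, but has to produce the same result.
inline int honk_via_base(Vehicle& v) { return v.honk(); }
inline int honk_direct(CarFinal& c) { return c.honk(); }
```

If the static_asserts compile, the object layout seen by old and new clients agrees at least in size and alignment on that compiler.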



I'm not 100% sure about this though, some of it is speculation.

























  • 1





    Wouldn't removing final violate the ODR? At that point you'd get undefined behavior.

    – Mark Ransom
    Nov 27 '18 at 22:47













  • Why would it be a violation of the ODR?

    – Louis Dionne
    Nov 28 '18 at 19:39






  • 1





    You can't have two conflicting definitions of the same object. Your example is exactly why it isn't allowed. You're talking about compiling twice, once with an old definition and once with a new definition.

    – Mark Ransom
    Nov 28 '18 at 19:54











  • Well.. that's almost always what we do, right? We compile a .so against one set of headers and ship it to users. They build applications using those headers and they link to the .so. Then we change the headers (in ways we believe not to be ABI-breaking), we recompile the .so, and we ship it again, and we expect their application to still work. With your interpretation, any change to the definition of the class is an ODR-violation. FWIW, the standard library adds things like private member functions to classes all the time. With your interpretation, the answer to this question is "yes".

    – Louis Dionne
    Dec 3 '18 at 14:40













  • The standard library expects you to recompile everything when you get a new set of headers. Especially if new private members were added you'd get a different object size, and it's critical for the compiler to know that.

    – Mark Ransom
    Dec 3 '18 at 14:47



















0














If you do not introduce new virtual methods in your final class (and only override methods of the parent class), you should be OK: the virtual table is going to have the same layout as the parent's, because the object must remain callable through a pointer to the parent. If you do introduce new virtual methods, the compiler can indeed ignore the virtual specifier and generate only ordinary methods, e.g.:



class A {
    virtual void f();
};

class B final : public A {
    virtual void f(); // <- should be ok
    virtual void g(); // <- not ok
};


The idea is that every time you can invoke the method g() in C++, you have a pointer/reference whose static and dynamic type is B: static because the method does not exist except in B and its children, dynamic because final ensures that B has no children. For this reason you never need virtual dispatch to call the right g() implementation (because there can be only one), and the compiler might (and should) not add it to the virtual table for B, whereas it is forced to do so if the method could be overridden. This is basically the whole reason the final keyword exists, as far as I understand.
































  • please note that even the case where the function exists already does not give you any guarantee that the ABI won't change, although it is reasonable it won't, because how each compiler implements virtual is not specified.

    – pqnet
    Nov 28 '18 at 0:46











  • @curiousguy yeh I do like the father/children metaphor, and wasn't going for a formal and complete answer because there is already one here and it's pretty good

    – pqnet
    Nov 28 '18 at 1:04











  • @curiousguy yes it is. Basically I expect the virtual table for B that express the A interface to be fine, not so much for the B interface if it is different from A

    – pqnet
    Nov 28 '18 at 1:22






  • 1





    OK then I'm deleting my previous comments to reduce the clutter.

    – curiousguy
    Nov 28 '18 at 1:26






  • 1





    @curiousguy That is the point. He may rely on them being present in the virtual method table to invoke them in some other method (e.g., COM) because he wrote virtual but the compiler might outsmart him and optimize that away (because he knows it's final)

    – pqnet
    Jan 7 at 14:47




















2














final on a function declaration X::f() implies that the declaration cannot be overridden, so all calls that name that declaration can be bound early (but not calls that name a declaration in a base class). If a virtual function is final in the ABI, the vtables produced can be incompatible with those produced for almost the same class without final: calls to virtual functions that name declarations marked final can be assumed to be direct, so trying to use a vtable entry (one that would exist in the final-less ABI) is illegal.



The compiler could use the final guarantee to cut down the size of vtables (which can sometimes grow a lot) by not adding a new entry that would usually be added, and that must be added according to the ABI for a non final declaration.



Entries are added for a declaration overriding a function that is not in an (inherently, always) primary base, or for a non trivially covariant return type (a return type covariant on a non primary base).



Inherently primary base class: the simplest case of polymorphic inheritance



The simple case of polymorphic inheritance, a derived class inheriting non virtually from a single polymorphic base class, is the typical case of an always primary base: the polymorphic base subobject is at the beginning, the address of the derived object is the same as the address of the base subobject, virtual calls can be made directly with a pointer to either, and everything is simple.



These properties are true whether the derived class is a complete object (one that isn't a subobject), a most derived object, or a base class. (They are class invariants guaranteed at the ABI level for pointers of unknown origin.)



Considering the case where the return type isn't covariant; or:



Trivial covariance



An example: the case where it's covariant with the same type as *this; as in:



struct B { virtual B *f(); };
struct D : B { virtual D *f(); }; // trivial covariance


Here B is inherently, invariably the primary in D: in all D (sub)objects ever created, a B resides at the same address: the D* to B* conversion is trivial so the covariance is also trivial: it's a static typing issue.



Whenever this is the case (trivial up-cast), covariance disappears at the code generation level.
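To make the "trivial up-cast" concrete, here is a minimal sketch using the same B and D, with bodies added so that it links. The helper base_at_same_address is a made-up name. The address equality it checks is what the usual ABIs do for single non virtual polymorphic inheritance; the C++ standard itself does not promise it.

```cpp
struct B {
    virtual ~B() {}
    virtual B* f() { return this; }
};
struct D : B {
    D* f() override { return this; }  // trivial covariance
};

// Hypothetical helper: true when the B subobject starts where the D object starts,
// i.e. when the D* to B* conversion needs no pointer adjustment.
inline bool base_at_same_address(D& d) {
    return static_cast<void*>(&d) == static_cast<void*>(static_cast<B*>(&d));
}
```

When the helper returns true, the pointer returned by the overrider can be read either as a D* or as a B* without any generated code, which is why a single vtable entry is enough.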



Conclusion



In these cases the type of the declaration of the overriding function is trivially different from the type of the base:




  • all parameters are almost the same (with only a trivial difference on the type of this)

  • the return type is almost the same (with only a possible difference on the type of a returned pointer(*) type)


(*) since returning a reference is exactly the same as returning a pointer at the ABI level, references aren't discussed specifically



So no vtable entry is added for the derived declaration.



(So making the class final wouldn't allow any vtable simplification.)



Never primary base



Obviously a class can only have one subobject, containing a specific scalar data member (like the vptr (*)), at offset 0. Other base classes with scalar data members will be at a non trivial offset, requiring non trivial derived to base conversions of pointers. So multiple interesting(**) inheritance will create non primary bases.



(*) The vptr isn't a normal data member at the user level; but in the generated code, it's pretty much a normal scalar data member known to the compiler.
(**) The layout of non polymorphic bases isn't interesting here: for the purpose of vtable ABI, a non polymorphic base is treated like a member subobject, as it doesn't affect the vtables in any way.



The conceptually simplest interesting example of a non primary base, and of a non trivial pointer conversion, is:



struct B1 { virtual void f(); };
struct B2 { virtual void f(); };
struct D : B1, B2 { };


Each base has its own vptr scalar member, and these vptr have different purposes:





  • B1::vptr points to a B1_vtable structure


  • B2::vptr points to a B2_vtable structure


and these have identical layout (because the class definitions are superposable, the ABI must generate superposable layouts); and they are strictly incompatible because





  1. The vtables have distinct entries:





    • B1_vtable.f_ptr points to the final overrider for B1::f()


    • B2_vtable.f_ptr points to the final overrider for B2::f()



  2. B1_vtable.f_ptr must be at the same offset as B2_vtable.f_ptr (from their respective vptr data members in B1 and B2)


  3. The final overriders of B1::f() and B2::f() aren't inherently (always, invariably) equivalent(*): they can have distinct final overriders that do different things.(***)


(*) Two callable runtime functions(**) are equivalent if they have same observable behavior at the ABI level. (Equivalent callable functions may not have the same declaration or C++ types.)



(**) A callable runtime function is any entry point: any address that can be called or jumped to; it can be normal function code, a thunk/trampoline, or a particular entry in a multiple entry function. Callable runtime functions often have no possible C++ declarations, like "final overrider called with a base class pointer".



(***) That they sometimes have the same final overrider in a further derived class:



struct DD : D { void f(); };


isn't useful for the purpose of defining the ABI of D.



So we see that D provably needs a non primary polymorphic base; by convention it will be B2; the first nominated polymorphic base (B1) gets to be primary.



So B2 must be at non trivial offset, and D to B2 conversion is non trivial: it requires generated code.
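A quick way to see the non trivial offset is to measure it. The sketch below reuses B1, B2 and D, but gives f() a return value and an overrider in D (my additions, so that the calls are unambiguous and observable); b2_offset is a made-up helper. The exact offset is implementation specific; what matters is that it is not zero when B1 is the primary base.

```cpp
#include <cstddef>

struct B1 { virtual ~B1() {} virtual int f() { return 1; } };
struct B2 { virtual ~B2() {} virtual int f() { return 2; } };
struct D : B1, B2 { int f() override { return 3; } };  // overrides both B1::f and B2::f

// Hypothetical helper: byte distance from the start of d to its B2 subobject.
inline std::ptrdiff_t b2_offset(D& d) {
    return reinterpret_cast<char*>(static_cast<B2*>(&d)) -
           reinterpret_cast<char*>(&d);
}
```

A virtual call through a B2* lands in D::f() with this still pointing at the B2 subobject, so the B2 part of the vtable has to go through code that adjusts this by that offset first.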



So the parameters of a member function of D cannot be equivalent with the parameters of a member function of B2, as the implicit this isn't trivially convertible; so:





  • D must have two different vtables: a vtable corresponding with B1_vtable and one with B2_vtable (they are in practice put together in one big vtable for D but conceptually they are two distinct structures).

  • a virtual member function B2::g that is overridden in D needs two vtable entries: one in the D_B2_vtable (which is just a B2_vtable layout with different values) and one in the D_B1_vtable, which is an enhanced B1_vtable: a B1_vtable plus entries for new runtime features of D.


Because the D_B1_vtable is built from a B1_vtable, a pointer to D_B1_vtable is trivially a pointer to a B1_vtable, and the vptr value is the same.



Note that in theory it would be possible to omit the entry for D::g() in D_B1_vtable, at the cost of making all virtual calls of D::g() via the B2 base, which is also a possibility as long as no non trivial covariance is used(#).



(#) or if non trivial covariance occurs, "virtual covariance" (covariance in a derived to base relation involving virtual inheritance) isn't used



Not inherently primary base



Regular (non virtual) inheritance is simple like membership:




  • a non virtual base subobject is a direct base of exactly one object (which implies that there is always exactly one final overrider of any virtual function when virtual inheritance isn't used);

  • the placement of a non virtual base is fixed;

  • base subobjects that don't have virtual base subobjects, just like data members, are constructed exactly like complete objects (they have exactly one runtime constructor function code for every defined C++ constructor).


A more subtle case of inheritance is virtual inheritance: a virtual base subobject can be the direct base of many base class subobjects. That implies that the layout of virtual bases is only determined at the most derived class level: the offset of a virtual base in a most derived object is well known and a compile time constant; in an arbitrary derived class object (that may or may not be a most derived object) it is a value computed at runtime.



That offset cannot be known at compile time for an arbitrary derived class object because C++ supports both unifying and duplicating inheritance:




  • virtual inheritance is unifying: all virtual bases of a given type in a most derived object are one and the same subobject;


  • non virtual inheritance is duplicating: all indirect non virtual bases are semantically distinct, as their virtual members don't need to have common final overriders (contrast with Java where this is impossible (AFAIK)):



    struct B { virtual void f(); };
    struct D1 : B { virtual void f(); }; // final overrider
    struct D2 : B { virtual void f(); }; // final overrider
    struct DD : D1, D2 { };




Here DD has two distinct final overriders of B::f():





  • DD::D1::f() is final overrider for DD::D1::B::f()


  • DD::D2::f() is final overrider for DD::D2::B::f()


in two distinct vtable entries.
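The two final overriders can be observed directly. This is a sketch of the same hierarchy with return values added so the calls can be told apart; b_via_d1 and b_via_d2 are made-up helpers that pick one of the two B subobjects explicitly (a plain DD* to B* conversion would be ambiguous).

```cpp
struct B  { virtual ~B() {} virtual int f() { return 0; } };
struct D1 : B { int f() override { return 1; } };  // final overrider in the D1 branch
struct D2 : B { int f() override { return 2; } };  // final overrider in the D2 branch
struct DD : D1, D2 { };

// Hypothetical helpers: reach each of the two distinct B subobjects of a DD.
inline B* b_via_d1(DD& dd) { return static_cast<D1*>(&dd); }
inline B* b_via_d2(DD& dd) { return static_cast<D2*>(&dd); }
```

The same virtual call, f() through a B*, ends up in different functions depending on which B subobject the pointer designates.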



Duplicating inheritance, where you indirectly derive multiple times from a given class, implies multiple vptrs, vtables and possibly distinct vtable ultimate code (the ultimate aim of using a vtable entry: the high level semantic of calling a virtual function - not the entry point).



Not only does C++ support both, but combinations are allowed: duplicating inheritance of a class that uses unifying inheritance:



struct VB { virtual void f(); };
struct D : virtual VB { virtual void g(); int dummy; };
struct DD1 : D { void g(); };
struct DD2 : D { void g(); };
struct DDD : DD1, DD2 { };


There is only one DDD::VB but there are two observably distinct D subobjects in DDD with different final overriders for D::g(). Whether or not a C++-like language (that supports virtual and non virtual inheritance semantics) guarantees that distinct subobjects have different addresses, the address of DDD::DD1::D cannot be the same as the address of DDD::DD2::D.



So the offset of a VB in a D cannot be fixed (in any language that supports unification and duplication of bases).
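Here is a sketch that measures this on the hierarchy above (bodies and return values are my additions; d_via_dd1, d_via_dd2 and vb_offset are made-up helpers). Both D subobjects convert to the same VB, yet they sit at different addresses, so the distance from a D to its VB cannot be one fixed number.

```cpp
#include <cstddef>

struct VB { virtual ~VB() {} virtual void f() {} };
struct D : virtual VB { virtual int g() { return 0; } int dummy = 0; };
struct DD1 : D { int g() override { return 1; } };
struct DD2 : D { int g() override { return 2; } };
struct DDD : DD1, DD2 { };

// Hypothetical helpers: select one of the two D subobjects of a DDD.
inline D* d_via_dd1(DDD& x) { return static_cast<DD1*>(&x); }
inline D* d_via_dd2(DDD& x) { return static_cast<DD2*>(&x); }

// Byte distance from a D subobject to the single VB subobject it shares.
// The D* to VB* conversion itself looks the offset up at runtime.
inline std::ptrdiff_t vb_offset(D* d) {
    return reinterpret_cast<char*>(static_cast<VB*>(d)) -
           reinterpret_cast<char*>(d);
}
```

Code that only sees a D* therefore has to fetch the offset from the object (through the vptr in the common ABIs) instead of hard-coding it.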



In that particular example a real VB object (the object at runtime) has no concrete data member except the vptr, and the vptr is a special scalar member as it is a type "invariant" (not const) shared member: it is fixed in the constructor (invariant after complete construction) and its semantics are shared between bases and derived classes. Because VB has no scalar member that isn't type invariant, in a DDD the VB subobject can be an overlay over DDD::DD1::D, as long as the vtable of D is a match for the vtable of VB.



This however cannot be the case for virtual bases that have non invariant scalar members, that is, regular data members with an identity, that is, members occupying a distinct range of bytes: these "real" data members cannot be overlaid on anything else. So a virtual base subobject with data members (members with an address guaranteed to be distinct by C++ or by whatever C++-like language you are implementing) must be put at a distinct location: virtual bases with data members normally(##) have inherently non trivial offsets.



(##) with potentially a very narrow special case with a derived class with no data member with a virtual base with some data members



So we see that "almost empty" classes (classes with no data member but with a vptr) are special cases when used as virtual base classes: these virtual bases are candidates for overlaying on derived classes; they are potential primaries but not inherent primaries:




  • the offset at which they reside will only be determined in the most derived class;

  • the offset might or might not be zero;

  • a zero offset implies overlaying of the base, so the vtable of each directly derived class must be a match for the vtable of the base;

  • a non zero offset implies non trivial conversions, so the entries in the vtables must treat conversion of the pointers to the virtual base as needing a runtime conversion (except when overlaid, obviously, as it would be neither necessary nor possible).


This means that when overriding a virtual function in a virtual base, an adjustment is always assumed to be potentially needed, but in some cases no adjustment will be needed.



A morally virtual base is a base class relationship that involves a virtual inheritance (possibly plus non virtual inheritance). Performing a derived to base conversion, specifically converting a pointer d to derived D, to base B, a conversion to...





  • ...a non-morally virtual base is inherently reversible in every case:




    • there is a one to one relation between the identity of a subobject B of a D and a D (which might be a subobject itself);

    • the reverse operation can be performed with a static_cast<D*>: static_cast<D*>((B*)d) is d;




  • (in any C++-like language with complete support for unifying and duplicating inheritance) ...a morally virtual base is inherently non reversible in the general case (although it's reversible in the common case with simple hierarchies). Note that:





    • static_cast<D*>((B*)d) is ill formed;


    • dynamic_cast<D*>((B*)d) will work for the simple cases.
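The two situations side by side, as a minimal sketch (the class E and the helpers back_to_d and back_to_e are made up here so that the virtual and non virtual cases don't share names):

```cpp
struct B { virtual ~B() {} };
struct D : B { };             // B is a non virtual base of D

struct VB { virtual ~VB() {} };
struct E : virtual VB { };    // VB is a (morally) virtual base of E

// Non virtual base: the up-cast can be undone statically.
inline D* back_to_d(B* b) { return static_cast<D*>(b); }

// Virtual base: only a runtime lookup can find the enclosing E.
inline E* back_to_e(VB* vb) { return dynamic_cast<E*>(vb); }

// inline E* bad(VB* vb) { return static_cast<E*>(vb); }  // ill formed: VB is a virtual base
```

Uncommenting the last line should give a compile error along the lines of "cannot convert from a pointer to a virtual base"; the exact wording depends on the compiler.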




So let's call virtual covariance the case where the covariance of the return type is based on a morally virtual base. When overriding with virtual covariance, the calling convention cannot assume the base will be at a known offset. So a new vtable entry is inherently needed for virtual covariance, whether or not the overridden declaration is in an inherent primary:



struct VB { virtual void f(); }; // almost empty
struct D : virtual VB { }; // VB is potential primary

struct Ba { virtual VB * g(); };
struct Da : Ba { // non virtual base, so Ba is inherent primary
D * g(); // virtually covariant: D->VB is morally virtual
};


Here VB may be at offset zero in D and no adjustment may be needed (for example for a complete object of type D), but it isn't always the case in a D subobject: when dealing with pointers to D, one cannot know whether that is the case.



When Da::g() overrides Ba::g() with virtual covariance, the general case must be assumed so a new vtable entry is strictly needed for Da::g() as there is no possible down pointer conversion from VB to D that reverses the D to VB pointer conversion in the general case.



Ba is an inherent primary in Da so the semantics of Ba::vptr are shared/enhanced:




  • there are additional guarantees/invariants on that scalar member, and the vtable is extended;

  • no new vptr is needed for Da.


So the Da_vtable (inherently compatible with Ba_vtable) needs two distinct entries for virtual calls to g():




  • in the Ba_vtable part of the vtable: Ba::g() vtable entry: calls final overrider of Ba::g() with an implicit this parameter of Ba* and returns a VB* value.

  • in the new members part of the vtable: Da::g() vtable entry: calls final overrider of Da::g() (which is inherently the same as the final overrider of Ba::g() in C++) with an implicit this parameter of Da* and returns a D* value.


Note that there is not really any ABI freedom here: the fundamentals of vptr/vtable design and their intrinsic properties imply the presence of these multiple entries for what is a unique virtual function at the high language level.
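At the source level the two entries correspond to the two ways of calling g(). The sketch below fills in bodies for the classes above; shared_d is a made-up helper so that both calls return the same object. It only shows the observable behaviour (same object, two static return types), not the vtable itself.

```cpp
struct VB { virtual ~VB() {} virtual void f() {} };  // almost empty
struct D : virtual VB { };

// Hypothetical shared object, so both overriders return the same thing.
inline D& shared_d() { static D d; return d; }

struct Ba {
    virtual ~Ba() {}
    virtual VB* g() { return &shared_d(); }
};
struct Da : Ba {
    D* g() override { return &shared_d(); }  // virtually covariant: D -> VB is morally virtual
};
```

A call through a Ba* yields a VB*, a call through a Da* yields a D*, and the D* to VB* adjustment that relates them has to be computed from the returned object at runtime.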



Note that making the virtual function body inline and visible to the ABI (so that classes with different inline function definitions could have incompatible ABIs, allowing more information to inform memory layout) couldn't possibly help, as inline code would only define what a call to a non overridden virtual function does: one cannot base ABI decisions on choices that can be overridden in derived classes.



[Example of a virtual covariance that ends up being only trivially covariant as in a complete D the offset for VB is trivial and no adjustment code would have been necessary in that case:



struct Da : Ba { // non virtual base, so inherent primary
D * g() { return new D; } // VB really is primary in complete D
// so conversion to VB* is trivial here
};


Note that in that code an incorrect code generation for a virtual call by a buggy compiler that would use the Ba_vtable entry to call g() would actually work because covariance ends up being trivial, as VB is primary in complete D.



The calling convention is for the general case and such code generation would fail with code that returns an object of a different class.



--end example]



But if Da::g() is final in the ABI, virtual calls can only be made via the VB * g(); declaration: covariance is made purely static, and the derived to base conversion is done at compile time as the last step of the virtual thunk, as if virtual covariance was never used.



Possible extension of final



There are two types of virtual-ness in C++: member functions (matched by function signature) and inheritance (matched by class name). If final stops overriding a virtual function, could it be applied to base classes in a C++-like language?



First we need to define what overriding a virtual base inheritance means:



An "almost direct" subobject relation means that an indirect subobject is controlled almost as a direct subobject:




  • an almost direct subobject can be initialized like a direct subobject;

  • access control is never really an obstacle to access (inaccessible private almost direct subobjects can be made accessible at discretion).


Virtual inheritance provides almost direct access:




  • the constructor for each virtual base must be called by the ctor-init-list of the constructor of the most derived class;

  • when a virtual base class is inaccessible because declared private in a base class, or publicly inherited in a private base class of a base class, the derived class has the discretion to declare the virtual base as a virtual base again, making it accessible.
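The first point is easy to observe in ordinary C++. In this sketch the tag member and the constructor arguments are made up for illustration; they only record which class's initializer actually constructed the virtual base.

```cpp
struct VB {
    int tag;
    explicit VB(int t) : tag(t) {}
    virtual ~VB() {}
};
struct D : virtual VB {
    D() : VB(1) {}            // used only when D is the most derived class
};
struct DD : D {
    DD() : VB(2), D() {}      // DD names its indirect virtual base directly
};
```

In a complete DD object the VB(1) in D's constructor is ignored and VB(2) is used, which is what "initialized like a direct subobject" means here.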


A way to formalize virtual base overriding is to make an imaginary inheritance declaration in each derived class that overrides base class virtual inheritance declarations:



struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D
// , virtual VB // imaginary overrider of D inheritance of VB
{
// DD () : VB() { } // implicit definition
};


Now C++ variants that support both forms of inheritance don't have to give the C++ semantics of almost direct access to all derived classes:



struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D, virtual final VB {
// DD () : VB() { } // implicit definition
};


Here the virtual-ness of the VB base is frozen and cannot be used in further derived classes; the virtual-ness is made invisible and inaccessible to derived classes and the location of VB is fixed.



struct DDD : DD {
    DDD () :
        VB() // error: not an almost direct subobject
    { }
};
struct DD2 : D, virtual final VB {
    // DD2 () : VB() { } // implicit definition
};
struct Diamond : DD, DD2 // error: no unique final overrider
{                        // for ": virtual VB"
};


The virtual-ness freeze makes it illegal to unify Diamond::DD::VB and Diamond::DD2::VB, but the virtual-ness of VB requires unification, which makes Diamond a contradictory, illegal class definition: no class can ever derive from both DD and DD2 [analog/example: just like no useful class can directly derive from A1 and A2:



struct A1 {
virtual int f() = 0;
};
struct A2 {
virtual unsigned f() = 0;
};
struct UselessAbstract : A1, A2 {
// no possible declaration of f() here
// none of the inherited virtual functions can be overridden
// in UselessAbstract or any derived class
};


Here UselessAbstract is abstract and so is every class derived from it, making that ABC (abstract base class) extremely silly, as any pointer to UselessAbstract is provably a null pointer.



-- end analog/example]
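The UselessAbstract claim can be checked with a type trait. This is just the example above plus a static_assert; the two commented-out lines show the overriders one might try, each of which should be rejected because its return type conflicts with the other base's f().

```cpp
#include <type_traits>

struct A1 { virtual int f() = 0; };
struct A2 { virtual unsigned f() = 0; };
struct UselessAbstract : A1, A2 {
    // int f() override;       // error: conflicts with the return type of A2::f
    // unsigned f() override;  // error: conflicts with the return type of A1::f
};

static_assert(std::is_abstract<UselessAbstract>::value,
              "UselessAbstract can never be instantiated");
```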



That would provide a way to freeze virtual inheritance, to provide meaningful private inheritance of classes with a virtual base (without it, derived classes can usurp the relationship between a class and its private base class).



Such use of final would of course freeze the location of a virtual base in a derived class and its further derived classes, avoiding additional vtable entries that are only needed because the location of the virtual base isn't fixed.






share|improve this answer



















  • 1





    Could you make a tl;dr for this?

    – Mark Ransom
    Nov 27 '18 at 22:49











  • @MarkRansom In theory final allows vtable to be shorter in a few special cases, so ABI incompatibility is conceivable.

    – curiousguy
    Nov 27 '18 at 22:51











  • I didn't even consider the cases of obviously useless virtualness, like declaring a new virtual final function or new virtual final inheritance.

    – curiousguy
    Nov 28 '18 at 2:26
















2














Final on a function declaration X::f() implies that the declaration cannot be overridden, so all calls that name that declaration can be bound early (not those calls that name a declaration in a base class): if a virtual function is final in the ABI, the produced vtables can be incompatible with the one produced almost same class without final: calls to virtual functions that name declarations marked final can be assumed to be direct: trying to use a vtable entry (that should exist in the final-less ABI) is illegal.



The compiler could use the final guarantee to cut on the size of vtables (that can sometime grow a lot) by not adding a new entry that would be usually be added and that must be according to the ABI for non final declaration.



Entries are added for a declaration overriding a function not a (inherently, always) primary base or for a non trivially covariant return type (a return type covariant on a non primary base).



Inherently primary base class: the simplest case of polymorphic inheritance



The simple case of polymorphic inheritance, a derived class inheriting non virtually from a single polymorphic base class, is the typical case of an always primary base: the polymorphic base subobject is at the beginning, the address of derived object is the same as the address of the base subobject, virtual calls can be made directly with a pointer to either, everything is simple.



These properties are true whether the derived class is a complete object (one that isn't a subobject), a most derived object, or a base class. (They are class invariants guaranteed at the ABI level for pointers of unknown origin.)



Considering the case where the return type isn't covariant; or:



Trivial covariance



An example: the case where it's covariant with the same type as *this; as in:



struct B { virtual B *f(); };
struct D : B { virtual D *f(); }; // trivial covariance


Here B is inherently, invariably the primary in D: in all D (sub)objects ever created, a B resides at the same address: the D* to B* conversion is trivial so the covariance is also trivial: it's a static typing issue.



Whenever this is the case (trivial up-cast), covariance disappears at the code generation level.



Conclusion



In these cases the type of the declaration of the overriding function is trivially different from the type of the base:




  • all parameters are almost the same (with only a trivial difference on the type of this)

  • the return type is almost the same (with only a possible difference on the type of a returned pointer(*) type)


(*) since returning a reference is exactly the same as returning a pointer at the ABI level, references aren't discussed specifically



So no vtable entry is added for the derived declaration.



(So making the class final wouldn't be vtable simplification.)



Never primary base



Obviously a class can only have one subobject, containing a specific scalar data member (like the vptr (*)), at offset 0. Other base classes with scalar data members will be at a non trivial offset, requiring non trivial derived to base conversions of pointers. So multiple interesting(**) inheritance will create non primary bases.



(*) The vptr isn't a normal data member at the user level; but in the generated code, it's pretty much a normal scalar data member known to the compiler.
(**) The layout of non polymorphic bases isn't interesting here: for the purpose of vtable ABI, a non polymorphic base is treated like a member subobject, as it doesn't affect the vtables in any way.



The conceptually simplest interesting example of a non primary, and non trivial pointer conversion is:



struct B1 { virtual void f(); };
struct B2 { virtual void f(); };
struct D : B1, B2 { };


Each base has its own vptr scalar member, and these vptr have different purposes:





  • B1::vptr points to a B1_vtable structure


  • B2::vptr points to a B2_vtable structure


and these have identical layout (because the class definitions are superposable, the ABI must generate superposable layouts); and they are strictly incompatible because





  1. The vtables have distinct entries:





    • B1_vtable.f_ptr points to the final overrider for B1::f()


    • B2_vtable.f_ptr points to the final overrider for B2::f()



  2. B1_vtable.f_ptr must be at the same offset as B2_vtable.f_ptr (from their respective vptr data members in B1 and B2)


  3. The final overriders of B1::f() and B2::f() aren't inherently (always, invariably) equivalent(*): they can have distinct final overriders that do different things.(***)


(*) Two callable runtime functions(**) are equivalent if they have same observable behavior at the ABI level. (Equivalent callable functions may not have the same declaration or C++ types.)



(**) A callable runtime function is any entry point: any address that can be called/jumped at; it can be a normal function code, a thunk/trampoline, a particular entry in a multiple entry function. Callable runtime functions often have no possible C++ declarations, like "final overrider called with a base class pointer".



(***) That they sometimes have the same final overrider in a further derived class:



struct DD : D { void f(); }


isn't useful for the purpose of defining the ABI of D.



So we see that D provably needs a non primary polymorphic base; by convention it will be D2; the first nominated polymorphic base (B1) gets to be primary.



So B2 must be at non trivial offset, and D to B2 conversion is non trivial: it requires generated code.



So the parameters of a member function of D cannot be equivalent with the parameters of a member function of B2, as the implicit this isn't trivially convertible; so:





  • D must have two different vtables: a vtable corresponding with B1_vtable and one with B2_vtable (they are in practice put together in one big vtable for D but conceptually they are two distinct structures).

  • the vtable entry of a virtual member of B2::g that is overridden in D needs two entries, one in the D_B2_vtable (which is just a B2_vtable layout with different values) and one in the D_B1_vtable which is an enhanced B1_vtable: a B1_vtable plus entries for new runtime features of D.


Because the D_B1_vtable is built from a B1_vtable, a pointer to D_B1_vtable is trivially a pointer to a B1_vtable, and the vptr value is the same.



Note that in theory it would be possible to omit the entry for D::g() in D_B1_vtable, at the cost of making all virtual calls to D::g() via the B2 base; that is a possibility as long as no non trivial covariance is used(#).



(#) or, if non trivial covariance does occur, as long as "virtual covariance" (covariance in a derived to base relation involving virtual inheritance) isn't used



Not inherently primary base



Regular (non virtual) inheritance is simple, like membership:




  • a non virtual base subobject is a direct base of exactly one object (which implies that there is always exactly one final overrider of any virtual function when virtual inheritance isn't used);

  • the placement of a non virtual base is fixed;

  • base subobjects that don't have virtual base subobjects are, just like data members, constructed exactly like complete objects (they have exactly one runtime constructor function code for every defined C++ constructor).


A more subtle case of inheritance is virtual inheritance: a virtual base subobject can be the direct base of many base class subobjects. That implies that the layout of virtual bases is only determined at the most derived class level: the offset of a virtual base in a most derived object is well known and a compile time constant; in an arbitrary derived class object (that may or may not be a most derived object) it is a value computed at runtime.



That offset cannot be known at compile time in the general case because C++ supports both unifying and duplicating inheritance:




  • virtual inheritance is unifying: all virtual bases of a given type in a most derived object are one and the same subobject;


  • non virtual inheritance is duplicating: all indirect non virtual bases are semantically distinct, as their virtual members don't need to have common final overriders (contrast with Java where this is impossible (AFAIK)):



    struct B { virtual void f(); };
    struct D1 : B { virtual void f(); }; // final overrider
    struct D2 : B { virtual void f(); }; // final overrider
    struct DD : D1, D2 { };




Here DD has two distinct final overriders of B::f():





  • DD::D1::f() is final overrider for DD::D1::B::f()


  • DD::D2::f() is final overrider for DD::D2::B::f()


in two distinct vtable entries.
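A runnable variant of that hierarchy makes the two final overriders observable. It is a sketch under one stated change: f() returns an int here (the original returns void) so that the chosen overrider can be seen, and the helper function names are invented for the illustration.

```cpp
#include <cassert>

// Same shape as the hierarchy above; the int return value is an
// assumption added so the selected final overrider is visible.
struct B  { virtual int f() { return 0; } };
struct D1 : B { int f() override { return 1; } };
struct D2 : B { int f() override { return 2; } };
struct DD : D1, D2 { };

// Pick one of the two B subobjects explicitly, then call virtually.
inline int call_through_d1(DD& dd) { B& b = static_cast<D1&>(dd); return b.f(); }
inline int call_through_d2(DD& dd) { B& b = static_cast<D2&>(dd); return b.f(); }

// The two B subobjects are distinct objects of the same type,
// so their addresses differ.
inline bool has_two_b_subobjects(DD& dd) {
    B* first  = static_cast<D1*>(&dd);
    B* second = static_cast<D2*>(&dd);
    return first != second;
}
```

A direct `dd.f()` would be ambiguous, which is the source-level face of the same fact: there is no single final overrider for the whole DD object.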



Duplicating inheritance, where you indirectly derive multiple times from a given class, implies multiple vptrs, vtables and possibly distinct vtable ultimate code (the ultimate aim of using a vtable entry: the high level semantic of calling a virtual function - not the entry point).



Not only does C++ support both, but combinations are allowed, such as duplicating inheritance of a class that uses unifying inheritance:



struct VB { virtual void f(); };
struct D : virtual VB { virtual void g(); int dummy; };
struct DD1 : D { void g(); };
struct DD2 : D { void g(); };
struct DDD : DD1, DD2 { };


There is only one DDD::VB but there are two observably distinct D subobjects in DDD with different final overriders for D::g(). Whether or not a C++-like language (that supports virtual and non virtual inheritance semantic) guarantees that distinct subobjects have different addresses, the address of DDD::DD1::D cannot be the same as the address of DDD::DD2::D.



So the offset of a VB in a D cannot be fixed (in any language that supports unification and duplication of bases).
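The argument can be checked on the hierarchy above. In this sketch `vb_offset_from` is a helper invented for the illustration and the virtual functions get empty bodies so that it links; the offsets themselves are whatever the implementation picks, only their inequality is the point.

```cpp
#include <cassert>
#include <cstddef>

struct VB  { virtual void f() {} };
struct D   : virtual VB { virtual void g() {} int dummy = 0; };
struct DD1 : D { void g() override {} };
struct DD2 : D { void g() override {} };
struct DDD : DD1, DD2 { };

// Hypothetical helper: byte distance from a D subobject to its VB base.
// The D* -> VB* conversion goes through a virtual base, so the
// adjustment is looked up at run time.
inline std::ptrdiff_t vb_offset_from(D* d) {
    VB* vb = d;
    return reinterpret_cast<const char*>(vb) -
           reinterpret_cast<const char*>(d);
}
```

Both D subobjects of a DDD reach the same VB, but they sit at different addresses, so the two offsets cannot both be equal to one fixed constant.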



In that particular example a real VB object (the object at runtime) has no concrete data member except the vptr, and the vptr is a special scalar member as it is a type "invariant" (not const) shared member: it is fixed by the constructor (invariant after complete construction) and its semantic is shared between bases and derived classes. Because VB has no scalar member that isn't type invariant, in a DDD the VB subobject can be an overlay over DDD::DD1::D, as long as the vtable of D is a match for the vtable of VB.



This however cannot be the case for virtual bases that have non invariant scalar members, that is regular data members with an identity, that is members occupying a distinct range of bytes: these "real" data members cannot be overlaid on anything else. So a virtual base subobject with data members (members with an address guaranteed to be distinct by C++ or by whatever other C++-like language you are implementing) must be put at a distinct location: virtual bases with data members normally(##) have inherently non trivial offsets.



(##) with potentially a very narrow special case: a derived class with no data member that has a virtual base with some data members



So we see that "almost empty" classes (classes with no data member but with a vptr) are special cases when used as virtual base classes: these virtual bases are candidates for overlaying on derived classes; they are potential primaries but not inherent primaries:




  • the offset at which they reside will only be determined in the most derived class;

  • the offset might or might not be zero;

  • a zero offset implies overlaying of the base, so the vtable of each directly derived class must be a match for the vtable of the base;

  • a non zero offset implies non trivial conversions, so the entries in the vtables must treat conversion of the pointers to the virtual base as needing a runtime conversion (except, obviously, when overlaid, as it would be neither necessary nor possible).


This means that when overriding a virtual function in a virtual base, an adjustment is always assumed to be potentially needed, but in some cases no adjustment will be needed.
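The difference between an almost empty virtual base and one with data members can be probed as follows. This is a sketch with invented names (`VBData`, `E`, `virtual_base_offset`); the exact numbers are implementation choices. Under the Itanium C++ ABI the almost empty VB is typically reported at offset zero in a complete D, but that is an ABI decision and is deliberately not relied upon here.

```cpp
#include <cassert>
#include <cstddef>

struct VB     { virtual void f() {} };             // almost empty: vptr only
struct D      : virtual VB { };

struct VBData { virtual void f() {} int x = 0; };  // has a real data member
struct E      : virtual VBData { int y = 0; };

// Hypothetical helper: byte distance from a complete object to one
// of its (virtual) base subobjects.
template <class Base, class Derived>
std::ptrdiff_t virtual_base_offset(Derived& obj) {
    Base* b = &obj;
    return reinterpret_cast<const char*>(b) -
           reinterpret_cast<const char*>(&obj);
}
```

`virtual_base_offset<VB>(d)` for a complete `D d` may be zero (overlay) or not, depending on the ABI; `virtual_base_offset<VBData>(e)` for a complete `E e` is expected to be non zero on mainstream implementations, since VBData's data member cannot share storage with E's own members.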



A morally virtual base is a base class relationship that involves a virtual inheritance (possibly plus non virtual inheritance). Performing a derived to base conversion, specifically converting a pointer d to derived D, to base B, a conversion to...





  • ...a non-morally virtual base is inherently reversible in every case:




    • there is a one to one relation between the identity of a subobject B of a D and a D (which might be a subobject itself);

    • the reverse operation can be performed with a static_cast<D*>: static_cast<D*>((B*)d) is d;




  • (in any C++ like language with complete support for unifying and duplicating inheritance) ...a morally virtual base is inherently non reversible in the general case (although it's reversible in the common case with simple hierarchies). Note that:





    • static_cast<D*>((B*)d) is ill formed;


    • dynamic_cast<D*>((B*)d) will work for the simple cases.
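Both halves of that statement can be seen in standard C++. The sketch below reuses the shape of the earlier DDD hierarchy (a virtual destructor is added so that VB is polymorphic, and `down_from_vb` is a name invented here): the down-cast succeeds for a simple complete D and yields a null pointer when two D subobjects share the VB.

```cpp
#include <cassert>

struct VB  { virtual ~VB() = default; };
struct D   : virtual VB { int dummy = 0; };
struct DD1 : D { };
struct DD2 : D { };
struct DDD : DD1, DD2 { };

inline D* down_from_vb(VB* vb) {
    // static_cast<D*>(vb);      // ill-formed: VB is a virtual base of D
    return dynamic_cast<D*>(vb); // run-time search; may fail
}
```

With a complete `D`, `down_from_vb` returns the original pointer. With a `DDD`, the single VB belongs to two D subobjects, so there is no unique answer and the cast returns `nullptr`.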




So let's call virtual covariance the case where the covariance of the return type is based on a morally virtual base. When overriding with virtual covariance, the calling convention cannot assume the base will be at a known offset. So a new vtable entry is inherently needed for virtual covariance, whether or not the overridden declaration is in an inherent primary:



struct VB { virtual void f(); }; // almost empty
struct D : virtual VB { }; // VB is potential primary

struct Ba { virtual VB * g(); };
struct Da : Ba { // non virtual base, so Ba is inherent primary
D * g(); // virtually covariant: D->VB is morally virtual
};


Here VB may be at offset zero in D and no adjustment may be needed (for example for a complete object of type D), but it isn't always the case in a D subobject: when dealing with pointers to D, one cannot know whether that is the case.



When Da::g() overrides Ba::g() with virtual covariance, the general case must be assumed so a new vtable entry is strictly needed for Da::g() as there is no possible down pointer conversion from VB to D that reverses the D to VB pointer conversion in the general case.



Ba is an inherent primary in Da so the semantics of Ba::vptr are shared/enhanced:




  • there are additional guarantees/invariants on that scalar member, and the vtable is extended;

  • no new vptr is needed for Da.


So the Da_vtable (inherently compatible with Ba_vtable) needs two distinct entries for virtual calls to g():




  • in the Ba_vtable part of the vtable: Ba::g() vtable entry: calls final overrider of Ba::g() with an implicit this parameter of Ba* and returns a VB* value.

  • in the new members part of the vtable: Da::g() vtable entry: calls final overrider of Da::g() (which is inherently the same as the final overrider of Ba::g() in C++) with an implicit this parameter of Da* and returns a D* value.
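At the source level the two entries correspond to the two static types through which g() can be called. The following sketch only shows that observable behaviour; it does not inspect the vtable. The `result` member and the helper names are assumptions added so that g() has something to return.

```cpp
#include <cassert>

struct VB { virtual void f() {} };
struct D  : virtual VB { };

struct Ba { virtual VB* g() { return nullptr; } };
struct Da : Ba {
    D result;                            // hypothetical member returned by g()
    D* g() override { return &result; }  // virtually covariant override
};

// Call named through Ba: the result has static type VB*, so the
// D* -> VB* adjustment must already have been applied.
inline VB* call_through_ba(Ba& b) { return b.g(); }

// Call named through Da: the result has static type D*, unadjusted.
inline D* call_through_da(Da& d) { return d.g(); }
```

Both calls run the same function body; they differ only in whether the returned pointer has been converted to VB*, which is the job the separate entry points perform.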


Note that there is not really any ABI freedom here: the fundamentals of vptr/vtable design and their intrinsic properties imply the presence of these multiple entries for what is a unique virtual function at the high language level.



Note that making the virtual function body inline and visible to the ABI (so that classes with different inline function definitions could be made ABI incompatible, allowing more information to inform memory layout) couldn't possibly help, as inline code would only define what a call to a non overridden virtual function does: one cannot base ABI decisions on choices that can be overridden in derived classes.



[Example of a virtual covariance that ends up being only trivially covariant as in a complete D the offset for VB is trivial and no adjustment code would have been necessary in that case:



struct Da : Ba { // non virtual base, so inherent primary
D * g() { return new D; } // VB really is primary in complete D
// so conversion to VB* is trivial here
};


Note that in that code an incorrect code generation for a virtual call by a buggy compiler that would use the Ba_vtable entry to call g() would actually work because covariance ends up being trivial, as VB is primary in complete D.



The calling convention is for the general case and such code generation would fail with code that returns an object of a different class.



--end example]



But if Da::g() is final in the ABI, virtual calls can only be made via the VB * g(); declaration: covariance is made purely static, and the derived to base conversion is done at compile time as the last step of the virtual thunk, as if virtual covariance was never used.
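For reference, this is what the final variant looks like in actual C++11 next to the non final one. The sketch makes no claim about what any compiler puts in the vtable; it only checks what is observable from source: calls through Ba behave the same, and on common implementations the object size is unchanged. `DaOpen`, `DaSealed` and the helpers are names invented here.

```cpp
#include <cassert>
#include <cstddef>

struct VB { virtual void f() {} };
struct D  : virtual VB { };
struct Ba { virtual VB* g() { return nullptr; } };

struct DaOpen   : Ba { D d; D* g() override { return &d; } };
struct DaSealed : Ba { D d; D* g() final    { return &d; } };  // cannot be overridden further

// A virtual call named through the base declaration works either way.
inline VB* via_base(Ba& b) { return b.g(); }

// Expected to hold on mainstream ABIs; not a guarantee of the standard.
inline bool same_object_size() { return sizeof(DaOpen) == sizeof(DaSealed); }
```

Any saving described above would show up only in the vtable contents, which this kind of test cannot see.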



Possible extension of final



There are two types of virtual-ness in C++: member functions (matched by function signature) and inheritance (matched by class name). If final stops overriding a virtual function, could it be applied to base classes in a C++-like language?



First we need to define what overriding a virtual base inheritance means:



An "almost direct" subobject relation means that an indirect subobject is controlled almost as a direct subobject:




  • an almost direct subobject can be initialized like a direct subobject;

  • access control is never really an obstacle to access (inaccessible private almost direct subobjects can be made accessible at discretion).


Virtual inheritance provides almost direct access:




  • the constructor for each virtual base must be called by the ctor-init-list of the constructor of the most derived class;

  • when a virtual base class is inaccessible because declared private in a base class, or publicly inherited in a private base class of a base class, the derived class has the discretion to declare the virtual base as a virtual base again, making it accessible.
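The first point is ordinary C++ and can be demonstrated directly. In this sketch the `tag` member and the helper functions are invented to record which constructor call actually initialized the virtual base.

```cpp
#include <cassert>

struct VB {
    int tag;                         // hypothetical: records who initialized VB
    explicit VB(int t) : tag(t) {}
    virtual void f() {}
};

struct D  : virtual VB { D()  : VB(1) {} };
// When DD is the most derived class, its VB(2) is used and the VB(1)
// in D's constructor is ignored.
struct DD : D          { DD() : VB(2), D() {} };

inline int tag_of_complete_d()  { D d;   return d.tag; }
inline int tag_of_complete_dd() { DD dd; return dd.tag; }
```

A complete D reports 1 and a complete DD reports 2: the most derived class controls the virtual base as if it were a direct base.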


A way to formalize virtual base overriding is to make an imaginary inheritance declaration in each derived class that overrides base class virtual inheritance declarations:



struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D
// , virtual VB // imaginary overrider of D inheritance of VB
{
// DD () : VB() { } // implicit definition
};


Now C++ variants that support both forms of inheritance don't have to have C++ semantic of almost direct access in all derived classes:



struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D, virtual final VB {
// DD () : VB() { } // implicit definition
};


Here the virtual-ness of the VB base is frozen and cannot be used in further derived classes; the virtual-ness is made invisible and inaccessible to derived classes and the location of VB is fixed.



struct DDD : DD {
DDD () :
VB() // error: not an almost direct subobject
{ }
};
struct DD2 : D, virtual final VB {
// DD2 () : VB() { } // implicit definition
};
struct Diamond : DD, DD2 // error: no unique final overrider
{ // for ": virtual VB"
};


The virtual-ness freeze makes it illegal to unify Diamond::DD::VB and Diamond::DD2::VB but virtual-ness of VB requires unification which makes Diamond a contradictory, illegal class definition: no class can ever derive from both DD and DD2 [analog/example: just like no useful class can directly derive from A1 and A2:



struct A1 {
virtual int f() = 0;
};
struct A2 {
virtual unsigned f() = 0;
};
struct UselessAbstract : A1, A2 {
// no possible declaration of f() here
// none of the inherited virtual functions can be overridden
// in UselessAbstract or any derived class
};


Here UselessAbstract is abstract and so is every class derived from it, making that ABC (abstract base class) extremely silly, as any pointer to UselessAbstract is provably a null pointer.



-- end analog/example]
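The analogy is valid C++ as written and can be checked with a type trait. A minimal sketch (the helper name is invented here; the commented-out lines are the declarations a compiler would reject):

```cpp
#include <cassert>
#include <type_traits>

struct A1 { virtual int f() = 0; };
struct A2 { virtual unsigned f() = 0; };

struct UselessAbstract : A1, A2 {
    // int f() override;       // rejected: return type conflicts with A2::f()
    // unsigned f() override;  // rejected: return type conflicts with A1::f()
};

inline bool useless_is_abstract() {
    return std::is_abstract<UselessAbstract>::value;
}
```

Because any f() declared in a derived class overrides both inherited functions at once, no return type satisfies both, so the class can never be made concrete.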



That would provide a way to freeze virtual inheritance, to provide meaningful private inheritance of classes with a virtual base (without it derived classes can usurp the relationship between a class and its private base class).



Such use of final would of course freeze the location of a virtual base in a derived class and its further derived classes, avoiding additional vtable entries that are only needed because the location of virtual base isn't fixed.

























  • 1





    Could you make a tl;dr for this?

    – Mark Ransom
    Nov 27 '18 at 22:49











  • @MarkRansom In theory final allows vtable to be shorter in a few special cases, so ABI incompatibility is conceivable.

    – curiousguy
    Nov 27 '18 at 22:51











  • I didn't even consider the cases of obviously useless virtualness, like declaring a new virtual final function or new virtual final inheritance.

    – curiousguy
    Nov 28 '18 at 2:26




















share|improve this answer













final on a function declaration X::f() implies that the declaration cannot be overridden, so all calls that name that declaration can be bound early (but not calls that name a declaration in a base class). If a virtual function is final in the ABI, the vtables produced can be incompatible with those produced for an almost identical class without final: calls to virtual functions that name declarations marked final can be assumed to be direct, so trying to use a vtable entry (which would exist in the final-less ABI) is illegal.



The compiler could use the final guarantee to cut down on the size of vtables (which can sometimes grow a lot) by not adding a new entry that would usually be added, and that the ABI requires for a non final declaration.



Entries are added for a declaration overriding a function of a base that is not an (inherently, always) primary base, or for a non trivially covariant return type (a return type covariant on a non primary base).



Inherently primary base class: the simplest case of polymorphic inheritance



The simple case of polymorphic inheritance, a derived class inheriting non virtually from a single polymorphic base class, is the typical case of an always primary base: the polymorphic base subobject is at the beginning, the address of the derived object is the same as the address of the base subobject, virtual calls can be made directly with a pointer to either, everything is simple.



These properties are true whether the derived class is a complete object (one that isn't a subobject), a most derived object, or a base class. (They are class invariants guaranteed at the ABI level for pointers of unknown origin.)



Consider the case where the return type isn't covariant, or the following case:



Trivial covariance



An example: the case where it's covariant with the same type as *this; as in:



struct B { virtual B *f(); };
struct D : B { virtual D *f(); }; // trivial covariance


Here B is inherently, invariably the primary in D: in all D (sub)objects ever created, a B resides at the same address: the D* to B* conversion is trivial so the covariance is also trivial: it's a static typing issue.



Whenever this is the case (trivial up-cast), covariance disappears at the code generation level.
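The claim above can be checked with a small sketch. The helper names (`base_at_offset_zero`, `same_object_through_both_types`) are made up for illustration, and the function bodies are added so the code links. That the B subobject sits at offset 0 is what mainstream ABIs do for single non virtual polymorphic inheritance; the C++ standard itself does not promise it.

```cpp
struct B {
    virtual B* f() { return this; }
    virtual ~B() = default;
};
struct D : B {
    D* f() override { return this; }  // trivial covariance: B is the primary base of D
};

// True when the B subobject starts at the same address as the D object,
// i.e. when the D* to B* conversion needs no pointer adjustment.
// (Assumption: holds on common ABIs, not guaranteed by the standard.)
bool base_at_offset_zero(D& d) {
    return static_cast<void*>(&d) == static_cast<void*>(static_cast<B*>(&d));
}

// Calls f() through both static types and checks that both results
// designate the same object.
bool same_object_through_both_types(D& d) {
    B& as_base = d;
    B* from_base = as_base.f();  // virtual call via the B declaration, returns B*
    D* from_derived = d.f();     // call via the D declaration, returns D*
    return from_base == from_derived;
}
```

When the up-cast is trivial like this, the same function entry point can serve both declarations, which is why no extra vtable entry is expected.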



Conclusion



In these cases the type of the declaration of the overriding function is trivially different from the type of the base:




  • all parameters are almost the same (with only a trivial difference on the type of this)

  • the return type is almost the same (with only a possible difference on the type of a returned pointer(*) type)


(*) since returning a reference is exactly the same as returning a pointer at the ABI level, references aren't discussed specifically



So no vtable entry is added for the derived declaration.



(So making the class final wouldn't allow any vtable simplification here.)



Never primary base



Obviously a class can only have one subobject, containing a specific scalar data member (like the vptr (*)), at offset 0. Other base classes with scalar data members will be at a non trivial offset, requiring non trivial derived to base conversions of pointers. So multiple interesting(**) inheritance will create non primary bases.



(*) The vptr isn't a normal data member at the user level; but in the generated code, it's pretty much a normal scalar data member known to the compiler.
(**) The layout of non polymorphic bases isn't interesting here: for the purpose of vtable ABI, a non polymorphic base is treated like a member subobject, as it doesn't affect the vtables in any way.



The conceptually simplest interesting example of a non primary, and non trivial pointer conversion is:



struct B1 { virtual void f(); };
struct B2 { virtual void f(); };
struct D : B1, B2 { };


Each base has its own vptr scalar member, and these vptr have different purposes:





  • B1::vptr points to a B1_vtable structure


  • B2::vptr points to a B2_vtable structure


and these have identical layout (because the class definitions are superposable, the ABI must generate superposable layouts); and they are strictly incompatible because





  1. The vtables have distinct entries:





    • B1_vtable.f_ptr points to the final overrider for B1::f()


    • B2_vtable.f_ptr points to the final overrider for B2::f()



  2. B1_vtable.f_ptr must be at the same offset as B2_vtable.f_ptr (from their respective vptr data members in B1 and B2)


  3. The final overriders of B1::f() and B2::f() aren't inherently (always, invariably) equivalent(*): they can have distinct final overriders that do different things.(***)


(*) Two callable runtime functions(**) are equivalent if they have same observable behavior at the ABI level. (Equivalent callable functions may not have the same declaration or C++ types.)



(**) A callable runtime function is any entry point: any address that can be called/jumped at; it can be a normal function code, a thunk/trampoline, a particular entry in a multiple entry function. Callable runtime functions often have no possible C++ declarations, like "final overrider called with a base class pointer".



(***) That they sometimes have the same final overrider in a further derived class:



struct DD : D { void f(); };


isn't useful for the purpose of defining the ABI of D.



So we see that D provably needs a non primary polymorphic base; by convention it will be B2; the first nominated polymorphic base (B1) gets to be primary.



So B2 must be at non trivial offset, and D to B2 conversion is non trivial: it requires generated code.
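A minimal sketch of that observation, with hypothetical helper names and empty function bodies added so that it links. The exact offsets are an ABI matter; the checks below describe what the usual ABIs (Itanium, MSVC) do, not something the standard mandates.

```cpp
struct B1 { virtual void f() {} };
struct B2 { virtual void f() {} };
struct D : B1, B2 { };

// B1, the first polymorphic base, is expected to be primary: same address as D.
bool b1_is_primary(D& d) {
    return static_cast<void*>(static_cast<B1*>(&d)) == static_cast<void*>(&d);
}

// B2 cannot share offset 0 with B1, so converting D* to B2* changes the
// pointer value: the compiler has to emit an adjustment.
bool b2_needs_adjustment(D& d) {
    return static_cast<void*>(static_cast<B2*>(&d)) != static_cast<void*>(&d);
}

// The conversion is non trivial but still reversible with static_cast,
// because B2 is a non virtual base at a fixed offset.
bool round_trip_ok(D& d) {
    B2* b2 = &d;
    return static_cast<D*>(b2) == &d;
}
```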



So the parameters of a member function of D cannot be equivalent with the parameters of a member function of B2, as the implicit this isn't trivially convertible; so:





  • D must have two different vtables: a vtable corresponding with B1_vtable and one with B2_vtable (they are in practice put together in one big vtable for D but conceptually they are two distinct structures).

  • a virtual member function of B2 (call it B2::g) that is overridden in D needs two entries, one in the D_B2_vtable (which is just a B2_vtable layout with different values) and one in the D_B1_vtable which is an enhanced B1_vtable: a B1_vtable plus entries for new runtime features of D.


Because the D_B1_vtable is built from a B1_vtable, a pointer to D_B1_vtable is trivially a pointer to a B1_vtable, and the vptr value is the same.



Note that in theory it would be possible to omit the entry for D::g() in D_B1_vtable, at the cost of making all virtual calls of D::g() via the B2 base, which is also a possibility as long as no non trivial covariance is used(#).



(#) or if non trivial covariance occurs, "virtual covariance" (covariance in a derived to base relation involving virtual inheritance) isn't used



Not inherently primary base



Regular (non virtual) inheritance is simple like membership:




  • a non virtual base subobject is a direct base of exactly one object (which implies that there is always exactly one final overrider of any virtual function when virtual inheritance isn't used);

  • the placement of a non virtual base is fixed;

  • base subobjects that don't have virtual base subobjects, just like data members, are constructed exactly like complete objects (they have exactly one runtime constructor function code for every defined C++ constructor).


A more subtle case of inheritance is virtual inheritance: a virtual base subobject can be the direct base of many base class subobjects. That implies that the layout of virtual bases is only determined at the most derived class level: the offset of a virtual base in a most derived object is well known and a compile time constant; in an arbitrary derived class object (that may or may not be a most derived object) it is a value computed at runtime.



That offset cannot be fixed once and for all, because C++ supports both unifying and duplicating inheritance:




  • virtual inheritance is unifying: all virtual bases of a given type in a most derived object are one and the same subobject;


  • non virtual inheritance is duplicating: all indirect non virtual bases are semantically distinct, as their virtual members don't need to have common final overriders (contrast with Java where this is impossible (AFAIK)):



    struct B { virtual void f(); };
    struct D1 : B { virtual void f(); }; // final overrider
    struct D2 : B { virtual void f(); }; // final overrider
    struct DD : D1, D2 { };




Here DD has two distinct final overriders of B::f():





  • DD::D1::f() is final overrider for DD::D1::B::f()


  • DD::D2::f() is final overrider for DD::D2::B::f()


in two distinct vtable entries.



Duplicating inheritance, where you indirectly derive multiple times from a given class, implies multiple vptrs, vtables and possibly distinct vtable ultimate code (the ultimate aim of using a vtable entry: the high level semantic of calling a virtual function - not the entry point).



Not only does C++ support both, but combinations are allowed: duplicating inheritance of a class that uses unifying inheritance:



struct VB { virtual void f(); };
struct D : virtual VB { virtual void g(); int dummy; };
struct DD1 : D { void g(); };
struct DD2 : D { void g(); };
struct DDD : DD1, DD2 { };


There is only one DDD::VB but there are two observably distinct D subobjects in DDD with different final overriders for D::g(). Whether or not a C++-like language (that supports virtual and non virtual inheritance semantic) guarantees that distinct subobjects have different addresses, the address of DDD::DD1::D cannot be the same as the address of DDD::DD2::D.



So the offset of a VB in a D cannot be fixed (in any language that supports unification and duplication of bases).
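The hierarchy above can be turned into a small check. The helper names (`d_in_dd1`, `d_in_dd2`, `vb_offset_from`) are invented for this sketch and the function bodies are empty placeholders; the point is only that the distance from a D subobject to its VB is not one fixed number.

```cpp
#include <cstddef>

struct VB { virtual void f() {} };
struct D : virtual VB { virtual void g() {} int dummy = 0; };
struct DD1 : D { void g() override {} };
struct DD2 : D { void g() override {} };
struct DDD : DD1, DD2 { };

// The two distinct D subobjects of a DDD.
D* d_in_dd1(DDD& x) { return static_cast<DD1*>(&x); }
D* d_in_dd2(DDD& x) { return static_cast<DD2*>(&x); }

// Byte distance from a D subobject to its (unique) VB subobject.
std::ptrdiff_t vb_offset_from(D* d) {
    VB* vb = d;  // derived to virtual base conversion, resolved at run time
    return reinterpret_cast<char*>(vb) - reinterpret_cast<char*>(d);
}
```

Both D subobjects convert to the same VB, yet they live at different addresses, so the two offsets necessarily differ.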



In that particular example a real VB object (the object at runtime) has no concrete data member except the vptr, and the vptr is a special scalar member as it is a type "invariant" (not const) shared member: it is fixed by the constructor (invariant after complete construction) and its semantic is shared between bases and derived classes. Because VB has no scalar member that isn't type invariant, in a DDD the VB subobject can be overlaid on DDD::DD1::D, as long as the vtable of D is a match for the vtable of VB.



This however cannot be the case for virtual bases that have non invariant scalar members, that is regular data members with an identity, that is members occupying a distinct range of bytes: these "real" data members cannot be overlaid on anything else. So a virtual base subobject with data members (members with an address guaranteed to be distinct by C++ or by whatever C++-like language you are implementing) must be put at a distinct location: virtual bases with data members normally(##) have inherently non trivial offsets.



(##) with potentially a very narrow special case with a derived class with no data member with a virtual base with some data members



So we see that "almost empty" classes (classes with no data member but with a vptr) are special cases when used as virtual base classes: these virtual bases are candidates for overlaying on derived classes; they are potential primaries but not inherent primaries:




  • the offset at which they reside will only be determined in the most derived class;

  • the offset might or might not be zero;

  • a null offset implies overlaying of the base, so the vtable of each directly derived class must be a match for the vtable of the base;

  • a non null offset implies non trivial conversions, so the entries in the vtables must treat conversion of the pointers to the virtual base as needing a runtime conversion (except when overlaid, obviously, as it would be neither necessary nor possible).


This means that when overriding a virtual function in a virtual base, an adjustment is always assumed to be potentially needed, but in some cases no adjustment will be needed.



A morally virtual base is a base class relationship that involves a virtual inheritance (possibly plus non virtual inheritance). Performing a derived to base conversion, specifically converting a pointer d to derived D, to base B, a conversion to...





  • ...a non-morally virtual base is inherently reversible in every case:




    • there is a one to one relation between the identity of a subobject B of a D and a D (which might be a subobject itself);

    • the reverse operation can be performed with a static_cast<D*>: static_cast<D*>((B*)d) is d;




  • (in any C++ like language with complete support for unifying and duplicating inheritance) ...a morally virtual base is inherently non reversible in the general case (although it's reversible in common case with simple hierarchies). Note that:





    • static_cast<D*>((B*)d) is ill formed;


    • dynamic_cast<D*>((B*)d) will work for the simple cases.
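The difference between the two kinds of base can be sketched as follows. The class names NV and V and the helper names are hypothetical, chosen only to contrast a non virtual base with a virtual one.

```cpp
struct B { virtual ~B() = default; };
struct NV : B { };         // B is a non virtual base of NV
struct V : virtual B { };  // B is a virtual base of V

// Non virtual base: the down-cast is a compile time pointer adjustment.
NV* back_static(B* b) { return static_cast<NV*>(b); }

// Virtual base: only a runtime checked cast is available.
// It yields a null pointer when *b is not part of a V object.
V* back_dynamic(B* b) { return dynamic_cast<V*>(b); }

// V* back_bad(B* b) { return static_cast<V*>(b); }
// ill-formed: cannot static_cast from a virtual base to a derived class
```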




So let's call virtual covariance the case where the covariance of the return type is based on a morally virtual base. When overriding with virtual covariance, the calling convention cannot assume the base will be at a known offset. So a new vtable entry is inherently needed for virtual covariance, whether or not the overridden declaration is in an inherent primary:



struct VB { virtual void f(); }; // almost empty
struct D : virtual VB { }; // VB is potential primary

struct Ba { virtual VB * g(); };
struct Da : Ba { // non virtual base, so Ba is inherent primary
D * g(); // virtually covariant: D->VB is morally virtual
};


Here VB may be at offset zero in D and no adjustment may be needed (for example for a complete object of type D), but it isn't always the case in a D subobject: when dealing with pointers to D, one cannot know whether that is the case.



When Da::g() overrides Ba::g() with virtual covariance, the general case must be assumed so a new vtable entry is strictly needed for Da::g() as there is no possible down pointer conversion from VB to D that reverses the D to VB pointer conversion in the general case.



Ba is an inherent primary in Da so the semantics of Ba::vptr are shared/enhanced:




  • there are additional guarantees/invariants on that scalar member, and the vtable is extended;

  • no new vptr is needed for Da.


So the Da_vtable (inherently compatible with Ba_vtable) needs two distinct entries for virtual calls to g():




  • in the Ba_vtable part of the vtable: Ba::g() vtable entry: calls final overrider of Ba::g() with an implicit this parameter of Ba* and returns a VB* value.

  • in the new members part of the vtable: Da::g() vtable entry: calls final overrider of Da::g() (which is inherently the same as the final overrider of Ba::g() in C++) with an implicit this parameter of Da* and returns a D* value.


Note that there is not really any ABI freedom here: the fundamentals of vptr/vtable design and their intrinsic properties imply the presence of these multiple entries for what is a unique virtual function at the high language level.
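At the source level, the two ways of calling g() look like this. It is only a sketch: the member `held`, the function bodies and the helper names are assumptions added to make the example self-contained, and the vtable entries themselves are not observable from portable code.

```cpp
struct VB { virtual void f() {} };  // almost empty
struct D : virtual VB { };

struct Ba {
    virtual VB* g() { return nullptr; }
    virtual ~Ba() = default;
};
struct Da : Ba {
    D held;                            // hypothetical object to return
    D* g() override { return &held; }  // virtually covariant: D to VB is morally virtual
};

// Call naming Ba::g(): the caller receives a VB*, so the D* to VB*
// conversion has to happen on the callee side of the call.
VB* call_through_base(Ba& b) { return b.g(); }

// Call naming Da::g(): the caller receives the D* unconverted.
D* call_through_derived(Da& d) { return d.g(); }
```

Both calls reach the same final overrider; they differ only in what is handed back, which is the reason given above for the two entries.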



Note that making the virtual function body inline and visible to the ABI (so that classes with different inline function definitions could be made ABI-incompatible, allowing more information to inform memory layout) wouldn't help, as inline code would only define what a call to a non overridden virtual function does: one cannot base ABI decisions on choices that can be overridden in derived classes.



[Example of a virtual covariance that ends up being only trivially covariant as in a complete D the offset for VB is trivial and no adjustment code would have been necessary in that case:



struct Da : Ba { // non virtual base, so inherent primary
D * g() { return new D; } // VB really is primary in complete D
// so conversion to VB* is trivial here
};


Note that in that code an incorrect code generation for a virtual call by a buggy compiler that would use the Ba_vtable entry to call g() would actually work because covariance ends up being trivial, as VB is primary in complete D.



The calling convention is for the general case and such code generation would fail with code that returns an object of a different class.



--end example]



But if Da::g() is final in the ABI, virtual calls can only be made via the VB * g(); declaration: covariance is made purely static, and the derived to base conversion is done at compile time as the last step of the virtual thunk, as if virtual covariance was never used.



Possible extension of final



There are two types of virtual-ness in C++: member functions (matched by function signature) and inheritance (matched by class name). If final stops overriding a virtual function, could it be applied to base classes in a C++-like language?



First we need to define what overriding a virtual base inheritance means:



An "almost direct" subobject relation means that an indirect subobject is controlled almost as a direct subobject:




  • an almost direct subobject can be initialized like a direct subobject;

  • access control is never really an obstacle to access (inaccessible private almost direct subobjects can be made accessible at discretion).


Virtual inheritance provides almost direct access:




  • the constructor of each virtual base must be called by the ctor-init-list of the constructor of the most derived class;

  • when a virtual base class is inaccessible because declared private in a base class, or publicly inherited in a private base class of a base class, the derived class has the discretion to declare the virtual base as a virtual base again, making it accessible.
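Both points can be seen in standard C++. In this sketch the member `tag`, the classes P and Q and the function `read_tag` are made up for illustration.

```cpp
struct VB {
    int tag;
    explicit VB(int t = 0) : tag(t) {}
};

struct D : virtual VB {
    D() : VB(1) {}   // takes effect only when D is the most derived class
};
struct DD : D {
    DD() : VB(2) {}  // the most derived class initializes VB; D's VB(1) is ignored
};

struct P : private virtual VB { };  // VB is inaccessible through P
struct Q : P, virtual VB {          // Q names VB as a virtual base again
    // Accessible through Q's own (public) path to the same VB subobject.
    int read_tag() { return tag; }
};
```

A standalone D ends up with tag 1, while a DD ends up with tag 2, because DD reaches over D to initialize VB directly.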


A way to formalize virtual base overriding is to make an imaginary inheritance declaration in each derived class that overrides base class virtual inheritance declarations:



struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D
// , virtual VB // imaginary overrider of D inheritance of VB
{
// DD () : VB() { } // implicit definition
};


Now C++ variants that support both forms of inheritance don't have to have the C++ semantics of almost direct access in all derived classes:



struct VB { virtual void f(); };
struct D : virtual VB { };
struct DD : D, virtual final VB {
// DD () : VB() { } // implicit definition
};


Here the virtual-ness of the VB base is frozen and cannot be used in further derived classes; the virtual-ness is made invisible and inaccessible to derived classes and the location of VB is fixed.



struct DDD : DD {
DDD () :
VB() // error: not an almost direct subobject
{ }
};
struct DD2 : D, virtual final VB {
// DD2 () : VB() { } // implicit definition
};
struct Diamond : DD, DD2 // error: no unique final overrider
{ // for ": virtual VB"
};


The virtual-ness freeze makes it illegal to unify Diamond::DD::VB and Diamond::DD2::VB but virtual-ness of VB requires unification which makes Diamond a contradictory, illegal class definition: no class can ever derive from both DD and DD2 [analog/example: just like no useful class can directly derive from A1 and A2:



struct A1 {
virtual int f() = 0;
};
struct A2 {
virtual unsigned f() = 0;
};
struct UselessAbstract : A1, A2 {
// no possible declaration of f() here
// none of the inherited virtual functions can be overridden
// in UselessAbstract or any derived class
};


Here UselessAbstract is abstract and every derived class is too, making that ABC (abstract base class) extremely silly, as any pointer to UselessAbstract is provably a null pointer.



-- end analog/example]
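The analogy can be checked with a type trait. This is a sketch: the helper `useless_is_abstract` is a made-up name, and the commented-out class shows the kind of override attempt a compiler is expected to reject.

```cpp
#include <type_traits>

struct A1 { virtual int f() = 0; };
struct A2 { virtual unsigned f() = 0; };
struct UselessAbstract : A1, A2 { };

// Both pure virtual functions are inherited and neither is overridden.
static_assert(std::is_abstract<UselessAbstract>::value,
              "UselessAbstract has unimplemented pure virtual functions");

// struct Attempt : UselessAbstract { int f() override; };
// error: the return type int conflicts with 'unsigned A2::f()'

bool useless_is_abstract() {
    return std::is_abstract<UselessAbstract>::value;
}
```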



That would provide a way to freeze virtual inheritance, and to provide meaningful private inheritance of classes with a virtual base (without it, derived classes can usurp the relationship between a class and its private base class).



Such use of final would of course freeze the location of a virtual base in a derived class and its further derived classes, avoiding additional vtable entries that are only needed because the location of virtual base isn't fixed.

















answered Nov 27 '18 at 22:01

curiousguy








  • 1





    Could you make a tl;dr for this?

    – Mark Ransom
    Nov 27 '18 at 22:49











  • @MarkRansom In theory final allows vtable to be shorter in a few special cases, so ABI incompatibility is conceivable.

    – curiousguy
    Nov 27 '18 at 22:51











  • I didn't even consider the cases of obviously useless virtualness, like declaring a new virtual final function or new virtual final inheritance.

    – curiousguy
    Nov 28 '18 at 2:26



























0














I believe that adding the final keyword should not be ABI breaking; however, removing it from an existing class might render some optimizations invalid. For example, consider this:



// in car.h
struct Vehicle { virtual void honk() { } };
struct Car final : Vehicle { void honk() override { } };

// in car.cpp

// Here, the compiler can assume that no derived class of Car can be passed,
// and so `honk()` can be devirtualized. However, if Car is not final
// anymore, this optimization is invalid.
void foo(Car* car) { car->honk(); }


If foo is compiled separately and e.g. shipped in a shared library, removing final (and hence making it possible for users to derive from Car) could render the optimization invalid.



I'm not 100% sure about this though, some of it is speculation.
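One way to probe the layout side of the question is to compare two otherwise identical classes. This is a sketch under assumptions: CarOpen, CarSealed, the member `wheels` and `honk_via_base` are hypothetical names, and equal size and alignment is what mainstream compilers are expected to produce, not something the C++ standard guarantees.

```cpp
struct Vehicle {
    virtual int honk() { return 0; }
    virtual ~Vehicle() = default;
};

// Same definition twice, differing only in the final specifier.
struct CarOpen : Vehicle {
    int honk() override { return 1; }
    int wheels = 4;
};
struct CarSealed final : Vehicle {
    int honk() override { return 1; }
    int wheels = 4;
};

// Code that only knows the base class still dispatches through the vtable,
// whether or not the dynamic type is final.
int honk_via_base(Vehicle& v) { return v.honk(); }
```

If `sizeof` and `alignof` agree and calls through Vehicle behave the same, that is consistent with final not changing the object layout, although it says nothing about vtable contents.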

























  • 1





    Wouldn't removing final violate the ODR? At that point you'd get undefined behavior.

    – Mark Ransom
    Nov 27 '18 at 22:47













  • Why would it be a violation of the ODR?

    – Louis Dionne
    Nov 28 '18 at 19:39






  • 1





    You can't have two conflicting definitions of the same object. Your example is exactly why it isn't allowed. You're talking about compiling twice, once with an old definition and once with a new definition.

    – Mark Ransom
    Nov 28 '18 at 19:54











  • Well.. that's almost always what we do, right? We compile a .so against one set of headers and ship it to users. They build applications using those headers and they link to the .so. Then we change the headers (in ways we believe not to be ABI-breaking), we recompile the .so, and we ship it again, and we expect their application to still work. With your interpretation, any change to the definition of the class is an ODR-violation. FWIW, the standard library adds things like private member functions to classes all the time. With your interpretation, the answer to this question is "yes".

    – Louis Dionne
    Dec 3 '18 at 14:40













  • The standard library expects you to recompile everything when you get a new set of headers. Especially if new private members were added you'd get a different object size, and it's critical for the compiler to know that.

    – Mark Ransom
    Dec 3 '18 at 14:47


























answered Nov 25 '18 at 4:25

Louis Dionne








  • 1





    Wouldn't removing final violate the ODR? At that point you'd get undefined behavior.

    – Mark Ransom
    Nov 27 '18 at 22:47













  • Why would it be a violation of the ODR?

    – Louis Dionne
    Nov 28 '18 at 19:39






  • 1





    You can't have two conflicting definitions of the same object. Your example is exactly why it isn't allowed. You're talking about compiling twice, once with an old definition and once with a new definition.

    – Mark Ransom
    Nov 28 '18 at 19:54











  • Well.. that's almost always what we do, right? We compile a .so against one set of headers and ship it to users. They build applications using those headers and they link to the .so. Then we change the headers (in ways we believe not to be ABI-breaking), we recompile the .so, and we ship it again, and we expect their application to still work. With your interpretation, any change to the definition of the class is an ODR-violation. FWIW, the standard library adds things like private member functions to classes all the time. With your interpretation, the answer to this question is "yes".

    – Louis Dionne
    Dec 3 '18 at 14:40













  • The standard library expects you to recompile everything when you get a new set of headers. Especially if new private members were added you'd get a different object size, and it's critical for the compiler to know that.

    – Mark Ransom
    Dec 3 '18 at 14:47

























0














If your final class does not introduce new virtual methods (it only overrides methods of the parent class), you should be OK: the part of the virtual table that expresses the parent's interface has to keep the same layout, because the methods must remain callable through a pointer to the parent. If you do introduce new virtual methods, the compiler can indeed ignore the virtual specifier and generate ordinary, non-virtual methods, e.g.:



class A {
public:
    virtual void f();
};

class B final : public A {
public:
    virtual void f(); // <- should be ok: overrides A::f
    virtual void g(); // <- not ok: new virtual method introduced in a final class
};


The idea is that whenever you can invoke g() in C++, you have a pointer or reference whose static and dynamic type are both B: static because the method does not exist except in B and its children, dynamic because final ensures that B has no children. For this reason you never need virtual dispatch to call the right g() implementation (there can be only one), and the compiler might (and should) leave it out of the virtual table for B, whereas it is forced to include it if the method could be overridden. This is basically the whole reason the final keyword exists, as far as I understand.
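A small sketch of the two cases, assuming return values are added so the behaviour can be observed (the helper names `call_f_through_base` and `call_g` are made up for illustration). Whether a given compiler really drops g() from the virtual table is an implementation detail this code cannot show; it only illustrates which calls need dynamic dispatch:

```cpp
#include <string>

struct A {
    virtual ~A() = default;
    virtual std::string f() const { return "A::f"; }
};

struct B final : A {
    // Overrides an existing slot of A's interface.
    std::string f() const override { return "B::f"; }
    // New virtual method declared in a final class: nothing can override it.
    virtual std::string g() const { return "B::g"; }
};

// Must dispatch dynamically: `a` may refer to an A or to a B.
std::string call_f_through_base(const A& a) { return a.f(); }

// Static type B plus final means the dynamic type is B as well, so the
// compiler is free to call B::g() directly.
std::string call_g(const B& b) { return b.g(); }
```

Code that reaches g() by some route other than a normal C++ call, for example by indexing into the virtual table by hand, is relying on exactly the implementation detail discussed here.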
































  • Please note that even in the case where the function already exists, you have no guarantee that the ABI won't change (although it is reasonable to expect it won't), because how each compiler implements virtual is not specified.

    – pqnet
    Nov 28 '18 at 0:46











  • @curiousguy yeah, I do like the father/children metaphor, and I wasn't going for a formal and complete answer because there is already one here and it's pretty good

    – pqnet
    Nov 28 '18 at 1:04











  • @curiousguy yes it is. Basically I expect the virtual table for B that expresses the A interface to be fine, but not so much the one for the B interface if it is different from A

    – pqnet
    Nov 28 '18 at 1:22






  • 1





    OK then I'm deleting my previous comments to reduce the clutter.

    – curiousguy
    Nov 28 '18 at 1:26






  • 1





    @curiousguy That is the point. He may rely on them being present in the virtual method table to invoke them by some other means (e.g., COM) because he wrote virtual, but the compiler might outsmart him and optimize that away (because it knows the class is final)

    – pqnet
    Jan 7 at 14:47
























edited Nov 28 '18 at 1:24

























answered Nov 28 '18 at 0:40









pqnet































