Tuesday, September 27, 2016

Set in my ways

Google's Chrome browser, being built from the open source Chromium project, is written almost entirely in C++. I, being a long time Object Pascal guy, have had to adjust to using C++ exclusively. This isn't too much of an issue since I've also had to use C/C++ while I worked on the RAD Studio development team. Most notably, the Object Pascal/Delphi compiler is written in mostly C with a smattering of C++, the editor kernel (sans display rendering) and debugger engine (process control/symbol table management) were written in C++. All of which I've worked on throughout my 24+ years on that team.


Given all that history of working with a huge Object Pascal codebase, I've begun to miss some language features. Until you have to live without them for a while, it's the small things that you take for granted.

Recently, I've been working on some ideas for the Chromium "views" framework. In the course of this work, I found that a set type would be the best way to represent this bit (heh, get it :)?) of data. C++ doesn't have the notion of a set type at all. While a set can be declared using a sub-range of simple ordinal data types (Byte, Integer, Word, etc..), it's most useful when declared using an enumeration type. By using an enumeration as the underlying type, you can give each element in the set a nice descriptive name. Also, since enumerations are not type-compatible with normal ordinal data-types or even with other enumerations, a decent level of type-safety is ensured.

But wait, you say! Sets are just a collection of bits that you can easily manipulate with bit-wise operators, right? Yes and no. Sets are a little more abstract than being mere bits. Yes, they are implemented using a packed collections of bits but that is just a detail of the implementation. They could just as easily been implemented using an array of Booleans (bool for you C++ folks). Yes it's inefficient and memory intensive, but can be done.

Hold on silly! There's an STL class, std::bitset that implements a nice, efficient variable size bit set. Yes, yes it does. It was, in fact, considered. However, all it does is wrap up a few bit-wise operations and make them a little easier than direct manipulation, especially for bit-sets > sizeof(int). You cannot easily do union or intersection operations or even check if groups of elements are "in" the set. So scratch the std::bitset type.

C++Builder, being Delphi's C++ counterpart, must be able to consume, at the binary level all the libraries which are implemented in Delphi's Object Pascal language. However, since there is no C++ analog for Object Pascal set types, a special template type was introduced that "wrapped" set typed data. This allowed C++ code to easily manipulate the data using C++ without having to directly manipulate the actual underlying data structure.

Taking a cue from C++Builder, I implemented similar such template that may (or may not) show up in Chromium. Unlike the C++Builder version, I only intend to allow enumerations to be the underlying type. There should be no need to support a set with more than 32 elements, so larger sets need not be supported. I also wanted to "modernize" the template by using more recent C++ language/library features. C++11 and C++14 (C++17 isn't out yet and it's features aren't yet supported in Chromium) introduced a lot of nice features that make implementing templates more type-safe and constrained. For instance, std::is_enum<T>, can be used in a static_assert() statement which can trigger a compile-time error if the template isn't instantiated with an enumeration type.

Another really nice feature of C++11 is the std::initializer_list<T>. By using this template, the C++ compiler will, in many instances, convert literal arrays into an appropriately typed instantiation of that template. The template also implements several operators which correspond to Object Pascal's use of operators on sets. The '*' operator does an intersection, '+' is a union, and '-' is difference.

Object Pascal has one operator that is unique to sets, the 'in' operator. The 'in' operator tests whether a given set (right-hand side) "contains" the given element (left-hand side). The array, '[]', operator seemed like a viable analog. It treats the set as an array of 'bool' indexed by the enumeration. To allow direct assignment to the set element, I'll need to implement a proxy.  The '[]' operator is implemented as calling the the Contains() method. Leveraging the std::initializer_list<T> type, I also addded ContainsAll() and ContainsAny(). While Contains() checks that a single element is within the set, the others check for he existence of multiple elements in one operation.

There are also multiple overloads of the "Equals" method that go beyond simply overloading the "==" operator (which certainly is provided). A very common statement when using sets in Object Pascal is:

  ...
  if ASet * [Element1, Element2] = [Element2] then
  ...

This creates a temporary set that is the intersection of ASet and the literal [Element1, Element2] set. It then compares that temporary set to the literal [Element2] set. The expression is ensuring that Element2 and not Element1 is in ASet.

There are a couple of "Equals" overloads that implement that expression using std::initializer_list<T> as the parameters. It allows for the use of literals for the elements.

  ...
  if (ASet.Equals({Element1, Element2}, {Element2}) {
  }
  ...

In Delphi's implementation of Object Pascal, there are also a couple of standard procedures, Include & Exclude that allow for easily adding/removing elements from the set. While the '+' & '-' operators can add any number of elements to the set, Include and Exclude are more descriptive syntactically.

Because C++ allows function results to be a reference, that makes it possible to chain function calls such as:

  ...
  ASet.Include(Element1).Include(Element2).Exclude(Element4);
  ...

This would ensure Element1 & Element2 are in, and Element4 is not in the ASet variable.

I did include the '<<' and '>>' operators as on the C++Builder set template. However, when I consulted with one of my former colleagues on the C++Builder team, it was mentioned that many C++Builder customers found those operators to be a little obscure and non-intuitive. The above statement could also have been written like this:

  ...
  ASet << Element1 << Element2 >> Element4;
  ...

Personally, that looks OK to me. C'mon, the '<<' and '>>' operators are overloaded for I/O streams.. in either case, no actual "bit shifting" is happening. I suppose one could view them as "shifting to or from the stream." The same could read as "shifting 'element' into the set or shifting 'element' from the set." Meh. They're there. (Yes, yes, I know. There are many who find the C++ I/O stream operators '<<' and '>>' obtuse even for that purpose).

So, to use this template, it's as simple as declaring an enumeration (contiguous, unassigned elements, please).

  ...
  enum class Anchor {
    Left,
    Top,
    Right,
    Bottom,
  }
  using Anchors = SetOf<Anchor, Anchor::Bottom>;
  ...

I used a scoped enum, but a normal enum could also have been used. Scoped enums are generally better to use since the elements don't pollute the containing namespace with extra identifiers.

Object Pascal allows a set to be created from a sub-range of an enumeration, so the C++Builder template has a 'minEl' parameter to the template in addition to the 'maxEl' parameter. Didn't need that here since sub-ranges are another language feature for which there is no analog in C++.

Later, I'll write up some more information about what all this means for Chromium and the work on the internal views framework. Also, I'm no real C++ guru here, so I'm sure there may be more elegant techniques. For instance, I may want to implement a move constructor if that allows the compiler to generate some more efficient code in certain situations. Once I understand them more and the semantics, I'll look into it.

UPDATE: Fixed issue pointed out by Christoph Hillefeld in the comments.

// Copyright (c) 2016 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

#ifndef BASE_SET_TEMPLATE_H_
#define BASE_SET_TEMPLATE_H_

#include <initializer_list>
#include <type_traits>

namespace base {

template <class T, unsigned char maxEl>
class SetCore {
 public:
  typedef
      typename std::conditional<((maxEl / 8 + 1) == 1), unsigned char,
      typename std::conditional<((maxEl / 8 + 1) == 2), unsigned short,
      unsigned int>::type>::type _Ty;

 private:
  static_assert(std::is_enum<T>::value, "T must be an enumeration");
  static_assert(maxEl < 32, "Sets can only contain up to 32 elements");

 protected:
  SetCore() : data(0) {}

 public:
  _Ty data;
};

template <class T, T maxEl>
class SetOf : public SetCore<T, static_cast<unsigned char>(maxEl)> {
  static constexpr unsigned char NumElms =
      static_cast<unsigned char>(maxEl) + 1;
  static constexpr unsigned char MaxEl = NumElms - 1;

 public:
  class ElemRef {
    friend class SetOf<T, maxEl>;
   public:
    ~ElemRef() {}

    ElemRef& operator=(bool rhs) {
      if (rhs)
        set_->Include(el_);
      else
        set_->Exclude(el_);
      return *this;
    }

    ElemRef& operator=(const ElemRef& elemref) {
      set_->operator=(bool(elemref));
      return *this;
    }

    operator bool() const {
      return set_->Contains(el_);
    }

   private:
    ElemRef() : set_(nullptr), el_(static_cast<T>(0)) {}
    ElemRef(SetOf<T, maxEl>& set, T el) : set_(&set), el_(el)  {}

    SetOf<T, maxEl>* set_;
    T el_;
  };

  SetOf() : SetCore<T, MaxEl>() {}

  SetOf(const SetOf& src) : SetCore<T, MaxEl>() {
    this->data = src.data;
  }

  explicit SetOf(const int& src) : SetCore<T, MaxEl>() {
    union {
      typename SetCore<T, MaxEl>::_Ty data;
      unsigned int IntData;
    } overlay;
    overlay.IntData = src;
    this->data = overlay.data;
  }

  SetOf(const std::initializer_list<T>& src) : SetCore<T, MaxEl>() {
    Include(src);
  }

  SetOf& operator=(const SetOf& rhs) {
    if (this != &rhs)
      this->data = rhs.data;
    return *this;
  }

  SetOf& operator+=(const SetOf& rhs) {
    this->data |= rhs.data;
    return *this;
  }

  SetOf& operator-=(const SetOf& rhs) {
    this->data ^= rhs.data;
    return *this;
  }

  SetOf& operator*=(const SetOf& rhs) {
    this->data &= rhs.data;
    return *this;
  }

  SetOf operator+(const SetOf& rhs) {
    SetOf<T, maxEl> s;
    s.data = this->data | rhs.data;
    return s;
  }

  SetOf operator-(const SetOf& rhs) {
    SetOf<T, maxEl> s;
    s.data = this->data ^ rhs.data;
    return s;
  }

  SetOf operator*(const SetOf& rhs) {
    SetOf<T, maxEl> s;
    s.data = this->data & rhs.data;
    return s;
  }

  SetOf& operator<<(const T el) {
    return Include(el);
  }

  SetOf& operator>>(const T el) {
    return Exclude(el);
  }

  bool operator==(const SetOf& rhs) const {
    static const typename SetCore<T, MaxEl>::_Ty Mask =
        static_cast<typename SetCore<T, MaxEl>::_Ty>(-1) >>
          (sizeof(this->data) * 8 - NumElms);
    return !(this->data != rhs.data &&
        (Mask & this->data) != (Mask & rhs.data));
  }

  bool operator!=(const SetOf& rhs) const {
    return !operator==(rhs);
  }

  constexpr bool operator[](const T el) const {
    return Contains(el);
  }

  ElemRef operator[](T el) {
    return ElemRef(*this, el);
  }

  SetOf& Clear() {
    this->data = 0;
    return *this;
  }

  bool Contains(const T el) const {
    return static_cast<int>(el) >= 0 && static_cast<int>(el) < NumElms &&
           (this->data & static_cast<int>(1) << static_cast<int>(el)) != 0;
  }

  bool ContainsAll(const std::initializer_list<T>& els) const {
    for (const T& el : els)
      if (!Contains(el))
        return false;
    return true;
  }

  bool ContainsSome(const std::initializer_list<T>& els) const {
    for (const T& el : els)
      if (Contains(el))
        return true;
    return false;
  }

  bool Equals(const std::initializer_list<T>& rhs) const {
    SetOf<T, maxEl> s(rhs);
    return operator==(s);
  }

  bool Equals(const std::initializer_list<T>& mask,
              const std::initializer_list<T>& rhs) const {
    SetOf<T, maxEl> s_mask(mask);
    SetOf<T, maxEl> s_rhs(rhs);
    return (s_mask * *this) == s_rhs;
  }

  bool Equals(const SetOf& mask,
              const std::initializer_list<T>& rhs) const {
    SetOf<T, maxEl> s_rhs(rhs);
    return (mask * *this) == s_rhs;
  }

  bool Equals(const SetOf& rhs) const {
    return *this == rhs;
  }

  SetOf& Exclude(const T el) {
    if (static_cast<int>(el) >= 0 && static_cast<int>(el) < NumElms)
      this->data &= ~(static_cast<int>(1) << static_cast<int>(el));
    return *this;
  }

  SetOf& Exclude(const std::initializer_list<T>& els) {
    for (const T& el : els)
      Exclude(el);
    return *this;
  }

  SetOf& Include(const T el) {
    if (static_cast<int>(el) >= 0 && static_cast<int>(el) < NumElms)
      this->data |= (static_cast<int>(1) << static_cast<int>(el));
    return *this;
  }

  SetOf& Include(const std::initializer_list<T>& els) {
    for (const T& el : els)
      Include(el);
    return *this;
  }

  bool IsEmpty() const { return this->data == 0; }
};

} // namespace base

#endif  // BASE_SET_TEMPLATE_H_

No comments:

Post a Comment

Please keep your comments related to the post on which you are commenting. No spam, personal attacks, or general nastiness. I will be watching and will delete comments I find irrelevant, offensive and unnecessary.