about summary refs log tree commit diff stats
diff options
context:
space:
mode:
-rw-r--r--README.md90
1 files changed, 88 insertions, 2 deletions
diff --git a/README.md b/README.md
index f77852b..1ae9c51 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,88 @@
 # Simple Allocators
 
-If you're curious about simple memory allocation schemes in C, maybe you'll be
-able to learn a thing or two here.
+If you're curious about memory allocation schemes in C, maybe you'll be able to
+learn a thing or two here.
+
+## The Allocators
+
+The terminology around different memory allocation strategies is not exactly
+"settled" so different people will call similar things by different names and
+vice versa. In an effort to clear things up at least for the code here, let me
+try to explain what "my" terms mean.
+
+### Arena
+
+An "arena" allocator starts with a big chunk of memory and carves it up into
+smaller slices for allocation; these allocations can be of any size that "still
+fits" into the arena. Here's a picture:
+
+```
+  ________________ capacity ________________
+ /                                          \
++----+--------+----+------+------------------+
+| a1 |   a2   | a3 |  a4  |       free       |
++----+--------+----+------+------------------+
+                          ^
+                          | used
+```
+
+Allocating a slice of `size` bytes is *very* cheap:
+check `used + size <= capacity` and if it fits
+`used += size` finishes the allocation.
+There is no way to free an individual object, only the arena as a whole can be
+freed.
+An arena allocator is best used when a data structure is built up, processed,
+and torn down *completely* again.
+
+### Cached
+
+The "cached" allocator tries to keep a certain number of objects around for
+reuse; this works best if all objects are the same size. Here's a picture:
+
+```
+  0   1   2   3   4   5   6   7
++---+---+---+---+---+---+---+---+
+| * | * | - | - | - | - | - | - |
++-|-+-|-+---+---+---+---+---+---+
+  |   v
+  | +---+
+  | |   |
+  | +---+
+  v
++---+
+|   |
++---+
+```
+
+If we need to allocate another object, we check the cache, return the object
+from slot 1 and clear that slot; no actual allocation was done.
+If we later need to free an object, we check the cache and put the object into
+slot 1; no actual freeing was done.
+Of course if there are no more objects in the cache, we have to allocate, and
+if there's no more room in the cache, we have to free; but if we have a good
+"guess" for the working set (how many objects are needed "on average") we can
+keep the number of actual allocation (or free) operations low.
+
+### Pool
+
+The "pool" allocator starts with a big chunk of memory and carves it up into
+smaller slices for allocation; these allocations are all the same size. Here's
+a picture:
+
+```
+  ________________ capacity ________________
+ /                                          \
++----+----+----+----+----+----+----+----+----+
+| a1 |    | a2 |    |    |    | a3 |    |    |
++----+----+----+----+----+----+----+----+----+
+```
+
+Allocating an object requires finding an unused slot; we use a bitmap for that,
+but other options, such as a free list, would also be possible.
+Individual objects can be freed, unlike in the arena allocator.
+A pool allocator works well if there's a useful upper limit on the number of
+fixed-size objects and if there's considerable dynamism in terms of object
+life-times.
 
 ## Limitations
 
@@ -16,6 +97,11 @@ fixed maximum capacity that may not be appropriate for all programs; code to
 "grow" the arena when necessary might be a good addition.
 Nothing here is thread-safe either, so you may need to throw in a lock or three.
 
+## Benchmark
+
+The simplistic benchmark puts the arena allocator first, followed by the pool
+and cached allocators.
+
 ## License
 
 I very much doubt that you can use the code "as is" for a real system, but feel
239 240 241 242 243
// Read a single-file C++ program having a very specific structure, and split
// it up into multiple separate compilation units to reduce the work needed to
// rebuild after a small change. Write each compilation unit only if it has
// changed from what was on disk before.
//
// This tool is tightly coupled with the build system for this project. The
// makefile already auto-generates various things; we only do here what
// standard unix tools can't easily do.
//
// Usage:
//  cleave [input C++ file] [existing output directory]
//
// The input C++ file has the following structure:
//   [#includes]
//   [type definitions]
//   // Globals
//   [global variable definitions]
//   // End Globals
//   [function definitions]
//
// Afterwards, the output directory contains:
//   header -- everything before the '// Globals' delimiter
//   global_definitions_list -- everything between '// Globals' and '// End Globals' delimiters
//   [.cc files partitioning function definitions]
//
// Each output function definition file contains:
//   #include "header"
//   #include "global_declarations_list"
//   [function definitions]
//
// To preserve the original layer-based line numbers in error messages and the
// debugger, we'll chunk the files only at boundaries where we encounter a
// '#line ' directive (generated by the previous tangle/ stage) between
// functions.
//
// One exception: the first file emitted #includes "global_definitions_list" instead
// of "global_declarations_list"

// Tune this parameter to balance time for initial vs incremental build.
//
//   decrease value -> faster initial build
//   increase value -> faster incremental build
int Num_compilation_units = 3;

#include<assert.h>
#include<cstdlib>
#include<cstring>

#include<vector>
using std::vector;
#include<list>
using std::list;
#include<utility>
using std::pair;

#include<string>
using std::string;

#include<iostream>
using std::istream;
using std::ostream;
using std::cin;
using std::cout;
using std::cerr;

#include<sstream>
using std::istringstream;
using std::ostringstream;

#include<fstream>
using std::ifstream;
using std::ofstream;

#include <locale>
using std::isspace;  // unicode-aware

string trim(const string& s) {
  string::const_iterator first = s.begin();
  while (first != s.end() && isspace(*first))
    ++first;
  if (first == s.end()) return "";

  string::const_iterator last = --s.end();
  while (last != s.begin() && isspace(*last))
    --last;
  ++last;
  return string(first, last);
}

bool starts_with(const string& s, const string& pat) {
  string::const_iterator a=s.begin(), b=pat.begin();
  for (/*nada*/; a!=s.end() && b!=pat.end(); ++a, ++b)
    if (*a != *b) return false;
  return b == pat.end();
}

bool has_data(istream& in) {
  return in && !in.eof();
}

void slurp(const string& filename, vector<string>& lines) {
  lines.clear();
  ifstream in(filename.c_str());
  while (has_data(in)) {
    string curr_line;
    getline(in, curr_line);
    lines.push_back(curr_line);
  }
}

size_t slurp_some_functions(const vector<string>& in, size_t start, vector<string>& out, bool first) {
  out.clear();
  if (start >= in.size()) return start;
  out.push_back("#include \"header\"");
  if (first)
    out.push_back("#include \"global_definitions_list\"");
  else
    out.push_back("#include \"global_declarations_list\"");
  out.push_back("");
  size_t curr = start;
  while (true) {
    if (curr >= in.size()) break;
    if (out.size() >= in.size()/Num_compilation_units) break;
    while (curr < in.size()) {
      // read functions -- lines until unindented '}'
      while (curr < in.size()) {
        const string& line = in.at(curr);
//?         cerr << curr << ": adding to function: " << line << '\n';
        out.push_back(line);  ++curr;
        if (!line.empty() && line.at(0) == '}') break;
      }
      // now look for a '#line' directive before the next non-comment non-empty
      // line
      while (curr < in.size()) {
        const string& line = in.at(curr);
        if (starts_with(line, "#line ")) goto try_return;
        out.push_back(line);   ++curr;
        if (trim(line).empty()) continue;
        if (starts_with(trim(line), "//")) continue;
        break;
      }
    }
    try_return:;
  }
  return curr;
}

// compare contents of a file with a list of lines, ignoring #line directives
// on both sides
bool no_change(const vector<string>& lines, const string& output_filename) {
  vector<string> old_lines;
  slurp(output_filename, old_lines);
  size_t l=0, o=0;
  while (true) {
    while (l < lines.size() &&
            (lines.at(l).empty() || starts_with(lines.at(l), "#line "))) {
      ++l;
    }
    while (o < old_lines.size() &&
            (old_lines.at(o).empty() || starts_with(old_lines.at(o), "#line "))) {
      ++o;
    }
    if (l >= lines.size() && o >= old_lines.size()) return true;  // no change
    if (l >= lines.size() || o >= old_lines.size()) return false;  // contents changed
//?     cerr << "comparing\n";
//?     cerr << o << ": " << old_lines.at(o) << '\n';
//?     cerr << l << ": " << lines.at(l) << '\n';
    if (lines.at(l) != old_lines.at(o)) return false;  // contents changed
    ++l;  ++o;
  }
  assert(false);
}

string next_output_filename(const string& output_directory) {
  static int file_count = 0;
  ostringstream out;
  out << output_directory << "/mu_" << file_count << ".cc";
  file_count++;
  return out.str();
}

void emit_file(const vector<string>& lines, const string& output_filename) {
  if (no_change(lines, output_filename)) return;
  cerr << "  updating " << output_filename << '\n';
  ofstream out(output_filename.c_str());
  for (size_t i = 0; i < lines.size(); ++i)
    out << lines.at(i) << '\n';
}

void emit_compilation_unit(const vector<string>& lines, const string& output_directory) {
  string output_filename = next_output_filename(output_directory);
  emit_file(lines, output_filename);
}

int main(int argc, const char* argv[]) {
  if (argc != 3) {
    cerr << "usage: cleave [input .cc file] [output directory]\n";
    exit(0);
  }

  // read input
  vector<string> lines;
  slurp(argv[1], lines);

  // write header until but excluding '// Global' delimiter
  string output_directory = argv[2];
  size_t line_num = 0;
  {
    vector<string> out;
    while (line_num < lines.size()) {
      const string& line = lines.at(line_num);
      if (trim(line) == "// Globals") break;  // todo: #line directive for delimiters
      out.push_back(line);
      ++line_num;
    }
    emit_file(out, output_directory+"/header");
  }

  // write global_definitions_list (including delimiters)
  {
    vector<string> out;
    while (line_num < lines.size()) {
      const string& line = lines.at(line_num);
      out.push_back(line);
      ++line_num;
      if (trim(line) == "// End Globals") break;
    }
    emit_file(out, output_directory+"/global_definitions_list");
  }

  // segment functions
  // first one is special
  if (line_num < lines.size()) {
    vector<string> function;
    line_num = slurp_some_functions(lines, line_num, function, true);
    emit_compilation_unit(function, output_directory);
  }
  while (line_num < lines.size()) {
    vector<string> function;
    line_num = slurp_some_functions(lines, line_num, function, false);
    emit_compilation_unit(function, output_directory);
  }
}