Intermediate · Modern C++

Tokenize without copying

Context

The log parser you inherited along with the on-call rotation allocates a std::string per token. Logs run a terabyte a day; the profiler says half the CPU goes to malloc. The CFO is asking why the server bill grows faster than revenue. The answer is string_view: slice the lines without copying a single byte.

Task

Split a string into tokens using std::string_view — each token must be a view into the original buffer, not a freshly allocated std::string. Return the non-empty substrings between delimiters.

Constraints

  • Return std::vector<std::string_view> whose elements point into the input — copy no character data
  • Use std::string_view (no per-token std::string)
  • Return only non-empty tokens; consecutive delimiters and leading/trailing delimiters produce no empty entries
  • An empty input returns an empty vector

Before you code

  • What exactly does a std::string_view own? (Nothing — what does that imply for lifetimes?)
  • Why is returning a string_view into a temporary a dangling-view bug?
  • How is splitting with views cheaper than splitting into std::strings?
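
To make the lifetime question concrete, here is a minimal sketch. The helper first_word and the sample request line are hypothetical and not part of the exercise. A view is only as valid as the buffer it points into, so the safe case keeps the owning std::string alive, while the dangling case is left as a comment because running it would be undefined behavior.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: returns the part of `text` before the first space.
// The result is a view into whatever buffer `text` refers to.
std::string_view first_word(std::string_view text) {
    // find() returns npos when there is no space; substr(0, npos) then
    // yields the whole input.
    return text.substr(0, text.find(' '));
}

// Safe use: the owning std::string outlives the view taken from it.
void safe_use() {
    std::string line = "GET /index.html";
    std::string_view word = first_word(line);  // points into line's buffer
    assert(word == "GET");
    assert(word.data() == line.data());        // no characters were copied
}

// Unsafe use, left commented out on purpose:
//   std::string_view word = first_word(std::string("GET /index.html"));
// The temporary std::string is destroyed at the end of that statement,
// so `word` would refer to freed memory (a dangling view).
```

The same reasoning applies to your tokenizer: whoever calls it has to keep the input buffer alive for as long as the returned views are in use.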

Tests

  • #1 Splits into views
  • #2 Tokens are views into the original buffer
  • #3 Empty tokens are skipped
  • #4 Empty input yields no tokens

Hints

Hint 1

Walk the text tracking the start of the current run; when you hit a delimiter (or the end) and the run is non-empty, emit text.substr(start, i - start).
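
One way the walk described in this hint might look is sketched below. The exercise does not state a signature, so the name tokenize and the single char delimiter are assumptions. Treat it as a sketch to compare against your own attempt, not as the reference solution.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical signature: a single `char` delimiter is an assumption.
std::vector<std::string_view> tokenize(std::string_view text, char delim) {
    std::vector<std::string_view> tokens;
    std::size_t start = 0;  // index where the current run began
    for (std::size_t i = 0; i <= text.size(); ++i) {
        // Treat the end of the input as one final delimiter.
        if (i == text.size() || text[i] == delim) {
            if (i > start) {  // skip empty runs
                tokens.push_back(text.substr(start, i - start));
            }
            start = i + 1;  // the next run starts after the delimiter
        }
    }
    return tokens;
}

// Small self-check; call it from main() to try the sketch.
void tokenize_demo() {
    std::string_view line = "  error  disk full ";
    std::vector<std::string_view> tokens = tokenize(line, ' ');
    assert(tokens.size() == 3);
    assert(tokens[0] == "error");
    assert(tokens[0].data() == line.data() + 2);  // a view, not a copy
}
```

Running the loop one step past the last character means the final token needs no special case after the loop. The vector itself still allocates, but no character data is copied.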

Hint 2

std::string_view::substr returns another view into the same buffer — it allocates nothing.
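
You can check this claim yourself by comparing data() pointers. The snippet below is a small illustration, with substr_demo and the sample text as made-up names: the sub-view starts at an offset inside the original storage rather than in a new allocation.

```cpp
#include <cassert>
#include <string_view>

void substr_demo() {
    std::string_view line = "level=warn";
    std::string_view value = line.substr(6, 4);  // the 4 characters at index 6

    assert(value == "warn");
    assert(value.size() == 4);
    // Same buffer: the sub-view begins 6 characters into line's storage.
    assert(value.data() == line.data() + 6);
}
```

Note that substr still checks its start position and throws std::out_of_range if it is greater than size(), so the start index must stay within the input.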
