MimeKit: Newline missing between headers

Describe the bug HeaderList.WriteTo misses writing a \r\n between two headers.

Platform

  • OS: Windows 10 latest
  • .NET Runtime: CoreCLR
  • .NET Framework: .Net Core 3.1
  • MimeKit Version: 2.14.0

To Reproduce Code to reproduce the behavior:

ContentType contentType = ContentType.Parse("text/plain");
Header.TryParse(contentType.ToString(), out Header contentTypeHeader);
HeaderList headers = new HeaderList();
headers.Add(contentTypeHeader);
headers["Content-Encoding"] = "binary";
using (MemoryStream ms = new MemoryStream())
{
	headers.WriteTo(ms);
	Console.Write(Encoding.ASCII.GetString(ms.GetBuffer()));
}

Output:

Content-Type: text/plainContent-Encoding: binary

Expected behavior There should be a \r\n before the Content-Encoding header starts, but it’s written in one line.

When I set both headers without the HeaderList.Add method, the line break will be written as expected.

Am I missing something?

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 37 (29 by maintainers)

Commits related to this issue

Most upvoted comments

Rewrote the header parser tonight to allow OnHeaderReadAsync()/OnHeadersEnd/Async() and it seems a lot cleaner now, but may be slower 😥

Working on some benchmarks so I can compare. I also want to build a new MimeParser API on top of the MimeReader and compare performance there as well. Would be great to only have 1 actual parser rather than 2 😂

Current benchmarks show MimeParser parsing the JWZ mbox in less than 260ms?? And MimeReader in 178ms? Seems crazy fast!

Oops, I did it again!

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update)
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=5.0.400
  [Host]     : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT  [AttachedDebugger]
  DefaultJob : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT

|                                             Method |      Mean |    Error |   StdDev |
|--------------------------------------------------- |----------:|---------:|---------:|
|                         MimeParser_StarTrekMessage | 232.25 ms | 1.043 ms | 0.925 ms |
|             ExperimentalMimeParser_StarTrekMessage | 223.45 ms | 1.079 ms | 1.009 ms |
|                         MimeReader_StarTrekMessage | 176.73 ms | 0.498 ms | 0.442 ms |

|                                             Method |      Mean |    Error |   StdDev |
|--------------------------------------------------- |----------:|---------:|---------:|
|               MimeParser_StarTrekMessagePersistent | 201.63 ms | 1.930 ms | 1.711 ms |
|   ExperimentalMimeParser_StarTrekMessagePersistent | 194.39 ms | 0.847 ms | 0.792 ms |

|                                             Method |      Mean |    Error |   StdDev |
|--------------------------------------------------- |----------:|---------:|---------:|
|                       MimeParser_ContentLengthMbox |  18.85 ms | 0.138 ms | 0.123 ms |
|           ExperimentalMimeParser_ContentLengthMbox |  18.16 ms | 0.128 ms | 0.120 ms |
|                       MimeReader_ContentLengthMbox |  12.42 ms | 0.041 ms | 0.034 ms |

|                                             Method |      Mean |    Error |   StdDev |
|--------------------------------------------------- |----------:|---------:|---------:|
|             MimeParser_ContentLengthMboxPersistent |  17.10 ms | 0.123 ms | 0.109 ms |
| ExperimentalMimeParser_ContentLengthMboxPersistent |  16.31 ms | 0.115 ms | 0.102 ms |

|                                             Method |      Mean |    Error |   StdDev |
|--------------------------------------------------- |----------:|---------:|---------:|
|                                 MimeParser_JwzMbox | 180.51 ms | 2.242 ms | 2.097 ms |
|                     ExperimentalMimeParser_JwzMbox | 171.97 ms | 0.570 ms | 0.533 ms |
|                                 MimeReader_JwzMbox | 129.92 ms | 0.637 ms | 0.564 ms |

|                                             Method |      Mean |    Error |   StdDev |
|--------------------------------------------------- |----------:|---------:|---------:|
|                       MimeParser_JwzMboxPersistent | 157.71 ms | 1.073 ms | 0.951 ms |
|           ExperimentalMimeParser_JwzMboxPersistent | 150.19 ms | 0.424 ms | 0.376 ms |

|                                             Method |      Mean |    Error |   StdDev |
|--------------------------------------------------- |----------:|---------:|---------:|
|                        MimeParser_HeaderStressTest |  28.27 ms | 0.102 ms | 0.091 ms |
|            ExperimentalMimeParser_HeaderStressTest |  20.35 ms | 0.243 ms | 0.227 ms |
|                        MimeReader_HeaderStressTest |  12.40 ms | 0.127 ms | 0.119 ms |

I remembered some optimizations that I had made to MimeParser and applied those same things to MimeReader and now ExperimentalMimeParser is always faster than MimeParser.

Oh, and I should mention that MimeParser in the mimereader branch is also faster than MimeKit 2.15.0’s MImeParser 😃

@nitinag I just finished implementing a new MimeParser on top of MimeReader…

Are you ready for the results? 😂

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update)
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=5.0.400
  [Host]     : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT  [AttachedDebugger]
  DefaultJob : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT


|                                             Method |       Mean |     Error |    StdDev |
|--------------------------------------------------- |-----------:|----------:|----------:|
|                         MimeParser_StarTrekMessage | 228.581 ms | 3.0492 ms | 2.8522 ms |
|               MimeParser_StarTrekMessagePersistent | 195.614 ms | 1.1403 ms | 0.9522 ms |
|                       MimeParser_ContentLengthMbox |  18.101 ms | 0.1275 ms | 0.1065 ms |
|             MimeParser_ContentLengthMboxPersistent |  16.392 ms | 0.0849 ms | 0.0753 ms |
|                                 MimeParser_JwzMbox | 172.222 ms | 1.1952 ms | 1.1180 ms |
|                       MimeParser_JwzMboxPersistent | 151.150 ms | 0.5734 ms | 0.5083 ms |
|                        MimeParser_HeaderStressTest |  26.184 ms | 0.0770 ms | 0.0683 ms |
|             ExperimentalMimeParser_StarTrekMessage | 212.104 ms | 0.7758 ms | 0.6878 ms |
|   ExperimentalMimeParser_StarTrekMessagePersistent | 183.906 ms | 0.8915 ms | 0.7903 ms |
|           ExperimentalMimeParser_ContentLengthMbox |  22.169 ms | 0.4429 ms | 1.0177 ms |
| ExperimentalMimeParser_ContentLengthMboxPersistent |  21.940 ms | 0.4388 ms | 1.1863 ms |
|                     ExperimentalMimeParser_JwzMbox | 244.491 ms | 3.1079 ms | 2.9071 ms |
|           ExperimentalMimeParser_JwzMboxPersistent | 244.751 ms | 2.3894 ms | 2.2350 ms |
|            ExperimentalMimeParser_HeaderStressTest |   1.694 ms | 0.0034 ms | 0.0028 ms |
|                         MimeReader_StarTrekMessage | 177.045 ms | 1.9746 ms | 1.8470 ms |
|                       MimeReader_ContentLengthMbox |  12.628 ms | 0.0347 ms | 0.0290 ms |
|                                 MimeReader_JwzMbox | 128.712 ms | 0.5835 ms | 0.5172 ms |
|                        MimeReader_HeaderStressTest |  11.689 ms | 0.0396 ms | 0.0351 ms |

@nitinag FWIW, I’ve created a MimeKit 3.0 board to track progress of what I want to accomplish for v3.0.

Okay, here’s our comparison data:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update) Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores .NET SDK=5.0.400 [Host] : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT [AttachedDebugger] DefaultJob : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT

Method Mean Error StdDev
MimeParser_HeaderStressTest 35.35 ms 0.131 ms 0.123 ms
MimeReader_HeaderStressTest1 18.25 ms 0.185 ms 0.173 ms
MimeReader_HeaderStressTest2 16.13 ms 0.085 ms 0.075 ms
MimeReader_HeaderStressTest3 12.29 ms 0.109 ms 0.102 ms
MimeReader_HeaderStressTest4 11.59 ms 0.095 ms 0.089 ms

Notes:

  • MimeReader_HeaderStressTest1 refers to the original MimeReader implementation that copied the MimeParser StepHeaders() implementation with minumal changes (if any).
  • MimeReader_HeaderStressTest2 refers to the first rewrite of StepHeaders() to allow OnHeaderRead() and OnHeaderReadAsync(). This version splots the logic into StepHeaderField() and STepHeaderValue() where each contains a loop (or loops) that iterate over the data looking for a : or \n while simultaneously copying the data into a byte[] (and having to check every loop to validate that the buffer is large enough).
  • MimeReader_HeaderStressTest3 refers to the second rewrite that uses scanning loops and then Buffer.BlockCopy() to minimize array bounds checking.
  • MimeReader_HeaderStressTest4 refers to a modification of previous iterations scanning loops to unroll them.

Looks like I misremembered exactly what test I was remembering rough stats for. My excuse is I was commenting on my phone while watching a movie with my g/f 😂

In any event, here are the raw benchmarks as run this morning.

The following 2 benchmarks were run using the old header parser in the MimeReader tests:

Method Mean Error StdDev
MimeParser_StarTrekMessage 248.000 ms 4.8560 ms 5.3974 ms
MimeParser_StarTrekMessagePersistent 215.199 ms 2.8532 ms 2.6689 ms
MimeParser_ContentLengthMbox 2.613 ms 0.0354 ms 0.0331 ms
MimeParser_ContentLengthMboxPersistent 2.437 ms 0.0440 ms 0.0411 ms
MimeParser_JwzMbox 18.187 ms 0.0874 ms 0.0817 ms
MimeParser_JwzMboxPersistent 16.429 ms 0.0807 ms 0.0674 ms
MimeReader_StarTrekMessage 178.348 ms 0.9655 ms 0.9031 ms
MimeReader_ContentLengthMbox 1.676 ms 0.0112 ms 0.0093 ms
MimeReader_JwzMbox 12.873 ms 0.1357 ms 0.1203 ms
Method Mean Error StdDev
MimeParser_StarTrekMessage 242.943 ms 2.5493 ms 2.1288 ms
MimeParser_StarTrekMessagePersistent 213.239 ms 0.5939 ms 0.5265 ms
MimeParser_ContentLengthMbox 2.523 ms 0.0134 ms 0.0119 ms
MimeParser_ContentLengthMboxPersistent 2.367 ms 0.0088 ms 0.0082 ms
MimeParser_JwzMbox 18.071 ms 0.0856 ms 0.0759 ms
MimeParser_JwzMboxPersistent 16.318 ms 0.0698 ms 0.0583 ms
MimeReader_StarTrekMessage 179.497 ms 1.2597 ms 1.0519 ms
MimeReader_ContentLengthMbox 1.675 ms 0.0123 ms 0.0115 ms
MimeReader_JwzMbox 12.559 ms 0.0670 ms 0.0626 ms

These next 2 benchmark runs were done after applying my patch that rewrote the header parser for MimeReader:

Method Mean Error StdDev
MimeParser_StarTrekMessage 246.242 ms 4.9158 ms 5.4639 ms
MimeParser_StarTrekMessagePersistent 216.666 ms 2.3338 ms 2.1830 ms
MimeParser_ContentLengthMbox 2.528 ms 0.0098 ms 0.0076 ms
MimeParser_ContentLengthMboxPersistent 2.364 ms 0.0135 ms 0.0120 ms
MimeParser_JwzMbox 17.901 ms 0.0623 ms 0.0552 ms
MimeParser_JwzMboxPersistent 16.210 ms 0.0565 ms 0.0472 ms
MimeReader_StarTrekMessage 181.140 ms 2.0074 ms 1.8778 ms
MimeReader_ContentLengthMbox 1.679 ms 0.0098 ms 0.0091 ms
MimeReader_JwzMbox 12.568 ms 0.0544 ms 0.0454 ms
Method Mean Error StdDev
MimeParser_StarTrekMessage 242.428 ms 2.3040 ms 2.1552 ms
MimeParser_StarTrekMessagePersistent 212.466 ms 1.6403 ms 1.5343 ms
MimeParser_ContentLengthMbox 2.520 ms 0.0175 ms 0.0146 ms
MimeParser_ContentLengthMboxPersistent 2.371 ms 0.0096 ms 0.0090 ms
MimeParser_JwzMbox 18.075 ms 0.0987 ms 0.0875 ms
MimeParser_JwzMboxPersistent 16.182 ms 0.0785 ms 0.0734 ms
MimeReader_StarTrekMessage 179.623 ms 0.5818 ms 0.4858 ms
MimeReader_ContentLengthMbox 1.664 ms 0.0086 ms 0.0080 ms
MimeReader_JwzMbox 12.634 ms 0.0415 ms 0.0368 ms

As we can see, the new MimeReader header parser is actually a smidgen slower but I wonder if I can speed things up a bit.

The old header parser uses this type of approach:

byte[] headerBuffer = new byte[512];

fixed (byte* inbuf = input) {
    byte* inptr = inbuf + startIndex;
    byte* inend = inbuf + inputLength;
    byte* start = inptr;

    *inend = '\n';
    while (*inptr != '\n')
        inptr++;

    if (inptr < inend) {
        // we know we have the complete line
        int length = (int) (inptr - start);

        if (length >= headerBuffer.Length)
            Array.Resize (ref headerBuffer, NextAllocSize (length));

        Buffer.BlockCopy (input, startIndex, headerBuffer, 0, length);
    } else {
        // ...
    }
}

whereas the new parser uses this approach:

byte[] headerBuffer = new byte[512];
int bufferIndex = 0;

fixed (byte* inbuf = input) {
    byte* inptr = inbuf + startIndex;
    byte* inend = inbuf + inputLength;
    byte* start = inptr;

    *inend = '\n';
    while (*inptr != '\n') {
        if (bufferIndex >= headerBuffer.Length)
            Array.Resize (headerBuffer, headerBuffer.Length + 64);

        headerBuffer[bufferIndex++] = *inptr++;
    }
}

The downside to the old approach is that if inptr == inend at the end of that loop, we would break out, fill our input buffer, and then restart the loop from the beginning of the line. This is not the case with the new approach (I know the code snippets above don’t really illustrate this, but take my word for it). The new approach consumes everything in the input.

The new approach, even though slower per loop iteration, may actually compensate for that by not having to scan over the same input data more than once if the line of data crosses buffer boundaries.

In fact, the old code needs to loop over the same input data at a minimum of 2x. The first time finds the end of the line, and the second is the Buffer.BlockCopy() call. In the hopefully worst-case scenario, it would have to iterate over the input ~3x (maybe the first pass is only missing the ‘\n’ because it hasn’t been buffered yet).

One thing I want to try is a hybrid approach to see if the overhead of the Buffer.BlockCopy() is actually less than the length comparison in every loop iteration. I’m pretty sure I’ve seen the implementation of BlockCopy() and it blits 8/4/2/1 bytes at a time, making it very fast, so it’s possible that it is faster.

I could also make the input buffer grow by 80 instead of 64 as that may be a more efficient grow size (MIME recommends wrapping lines to fit within 80 columns).

Anyway… I’ve got a few tricks I can try to see if I can eke out a bit more performance.

Happy that the new approach isn’t drastically slower.

I’m hoping that if we don’t use the *Read methods, no memory usage occurs in allocating the byte[] content, etc. for those unused events?

My plan was to pass along the MimeReader’s internal input buffer (see MimeParser.cs for what I mean). The startIndex and count parameters would indicate where in the buffer the content is located. This would avoid allocating a new buffer.

If we were to ever use the *Read methods, one may also write the content out somewhere so aysnc may be useful to allow that use case. Async would also be useful on all the events so one could run database or network calls to check any conditions based on the entity read.

Yea, this was why I was considering having Async variants, but you raise a good point about how maybe they should all have Async variants? And have CancellationToken parameters as well.

[written prior to your most recent post - looks like you may have addressed some of it]

We would still need the HeadersEndOffset since we need to store the start/end of the headers

Yep. Gotcha covered. I’ll be keeping those in the OnXyzEnd() hooks, same as they are currently in the MimeParser event API.

We’d also still need the raw body line count since imap requires us to store that as well at least for messages and text parts.

Yep, I added that to my plan as well - after remembering I had that there in the current MimeParser event API.

Entity type: [Message, Multipart, TextPart, BasicPart] We would need the part type on OnMimeEntityBegin so that we can store it and make choices on what to do based on that. TextPart has slightly different handling thus it is separately identified from BasicPart (MimePart).

I was thinking that in this new “MimeReader”(?) (I think I like MimeReader better than StreamingMimeParser) API, I was not planning to instantiate MimeEntity objects at all, but it should be possible to use the headers to figure this out. That said… I think I’ll need to track the COntentType anyway as I parse the headers so that I know which entity type I’m parsing, so maybe I can pass that into the OnMimePartBegin/MultipartBegin/MessagePartBegin APIs and maybe to the equivalent End APIs as well?

Technically, I think this is only really needed for OnMimePartBegin/End to cover the Basic vs Text differentiation. Maybe I can even add OnTextPartBegin/End instead? Not sure… will have to think on this one.

It’s probably possible to just figure it out based on previously-parsed headers (or lack-thereof, implying text/plain).

The fully parsed headers for messages and parts. For messages we store the full envelope information and for parts the full content headers. On your suggested API, we would build our own header list from OnHeaderParsed events and then take what we need. Though for the envelope information we need, we currently use the MimeMessage object (based on parsing just the headers).

The MimeMessage convenience APIs like To/From/Cc/Subject/Date/etc are all doable w/o needing the overhead of MimeMessage… or, if you really want, you can just instantiate a new MimeMessage (or MimePart or whatever) and add the headers you collect from OnHeaderParsed().

None of that is currently handled by the MimeParser anyway, that’s all handled by MimeMessage and MimeEntity.

Anyway, I appreciate your feedback! I’ll definitely want more feedback if/when I decide to go ahead with this.

Now I just gotta convince the girlfriend to let me spend the necessary time on this 😉

I’ve already been spending a lot of late nights hacking on an NTLM rewrite for MailKit and trying to figure out how to add ChannelBinding support as well (starting with the SCRAM-SHA-X-PLUS mechanisms). Then I need to figure out ChannelBinding for NTLM.

Lots of big new features in the works 😉

Okay, so based on what I think you’re saying, what you really want is a “streaming” MIME parser (or at least a “streaming” multipart parser).

It seems that there is a special need for this kind of parser in the HTTP world where you can apparently POST more-or-less endless multiparts and/or at least multiparts containing very large file streams. This apparently goes for both client-side where an application might use an HTTP GET request which results in a large multipart as well as server-side implementations of a POST request that contains large multiparts.

MimeKit’s MimeParser API is suboptimal for this kind of thing even if I added a way to override which type of Stream the MimeParser creates for storing the content of each MimePart.

That said, even though MimeKit is really targeted for email applications, perhaps I could implement such a streaming parser API for this kind of thing…

In the meantime, MimeKit does provide some pretty handy tooling for what you want to do. You should definitely take a look at the MimeKit.IO.FilteredStream class as well as the MimeKit.IO.Filters.DecoderFilter class (and/or the MimeKit.Encoders namespace for direct usage of the decoders).

Next up, there are some useful techniques to be found in my MailKit project. In particular, the streaming technique I use in both the POP3 and IMAP implementations that wrap a NetworkStream (or SslStream depending on the connection type) that know when to stop reading data based on the current protocol state.

For example, in the Pop3Stream.ReadAsync method, it scans for a line containing “.\r\n” and once it reaches that, declares EOF by returning 0 and will prevent you from reading beyond that point. The ReadAsync method also unmunges (or un-byte-stuffs) the stream, meaning that it takes lines that start with “…” and removes the first dot (this is a POP3ism that is meant to safeguard the termination sequence).

The ImapStream.ReadAsync method is actually a lot simpler since it doesn’t have to unmunge any data, it just knows that if it is in the “IMAP Literal String” state, that it should read until the literal is consumed and then return 0 (signifying EOF). In IMAP, there is a concept called a “literal string” which is how IMAP sends large (or non-ASCII) content. The way it does it is like this: ... TOKEN TOKEN TOKEN {12345}\r\n<12345 bytes of literal content go here> TOKEN TOKEN TOKEN (where a token that consists of { + a numeric value + } + <CR><LF> tells the parser that it should treat the next X number of bytes as a single string token).

What I would probably do is implement a “MultipartContentStream” class that did something similar and knew only to read until the next MIME boundary marker and then pretend that it had reached EOF.

// Get the ContentType of the POST request or GET request, I guess?
var contentType = postRequest.ContentType;

// Construct a new StreamingMultipartParser that uses the ContentType to know about the boundary marker.
var parser = new StreamingMultipartParser (contentType, stream);

// consume the multipart prologue (the garbage that leads up to the very first boundary marker)
parser.ConsumeMultipartPrologue ();

// the ReadNext method would construct a MimePart with the parsed headers, same as
// MimeParser.ParseHeaders() does or MimeParser.ParseMessage/Entity() does now
// except that it wouldn't load the content into a MemoryBlockStream. Instead, it would
// set the MimePart.Content.Stream to a MultipartContentStream that knows to stop reading
// once it reaches the multipart boundary marker.
while (!parser.IsEndOfStream) {
    var part = parser.ReadNext ();
    var fileName = GetFileName (part);

    // Note: up until this point, the the content of the MimePart has never been read from the NetworkStream...
    // so part.Content.Stream has vey little overhead. It doesn't even need an input buffer because it should
    // just make use of the parser's internal input buffer.
    using (var fileStream = File.Create (fileName)) {
        // The DecodeTo method will actually be reading the raw MIME content from the NetworkStream and
        // decoding it right into the file stream, in blocks of 4K at a time - so memory usage stays really low.
        part.Content.DecodeTo (fileStream);
    }
}

This could be so blazingly fast… I love it.

Question: Do these types of multiparts support nested multiparts? (because that would make this more complicated).

I would ask if message/rfc822 is supported, but I think that could easily just be mapped (in this parser) to a MimePart as well instead of a MessagePart like MimeParser uses. The problem with trying to handle a message/rfc822 part by returning a MessagePart is that the MessagePart recursively parses the embedded message instead of returning just uninterpreted content.

What you want to use is:

contentType.ToString (Encoding.UTF8, true);

That will encode the new-line characters.

The normal ToString() method is meant only for display purposes.