MimeKit: Newline missing between headers
Describe the bug
HeaderList.WriteTo
misses writing a \r\n
between two headers.
Platform
- OS: Windows 10 latest
- .NET Runtime: CoreCLR
- .NET Framework: .Net Core 3.1
- MimeKit Version: 2.14.0
To Reproduce Code to reproduce the behavior:
ContentType contentType = ContentType.Parse("text/plain");
Header.TryParse(contentType.ToString(), out Header contentTypeHeader);
HeaderList headers = new HeaderList();
headers.Add(contentTypeHeader);
headers["Content-Encoding"] = "binary";
using (MemoryStream ms = new MemoryStream())
{
headers.WriteTo(ms);
Console.Write(Encoding.ASCII.GetString(ms.GetBuffer()));
}
Output:
Content-Type: text/plainContent-Encoding: binary
Expected behavior
There should be a \r\n
before the Content-Encoding
header starts, but it’s written in one line.
When I set both headers without the HeaderList.Add
method, the line break will be written as expected.
Am I missing something?
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 37 (29 by maintainers)
Commits related to this issue
- Changed public Header.Parse/TryParse APIs to canonicalize values to end with a newline Fixes issue #695 — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
- Added a new MimeReader The MimeReader is an alternative to the MimeParser. Unlike the MimeParser, the MimeReader does not construct a MimeMessage or MimeEntity "DOM". Instead, it works more like a S... — committed to jstedfast/MimeKit by jstedfast 3 years ago
Rewrote the header parser tonight to allow OnHeaderReadAsync()/OnHeadersEnd/Async() and it seems a lot cleaner now, but may be slower 😥
Working on some benchmarks so I can compare. I also want to build a new MimeParser API on top of the MimeReader and compare performance there as well. Would be great to only have 1 actual parser rather than 2 😂
Current benchmarks show MimeParser parsing the JWZ mbox in less than 260ms?? And MimeReader in 178ms? Seems crazy fast!
Oops, I did it again!
I remembered some optimizations that I had made to MimeParser and applied those same things to MimeReader and now ExperimentalMimeParser is always faster than MimeParser.
Oh, and I should mention that MimeParser in the
mimereader
branch is also faster than MimeKit 2.15.0’s MImeParser 😃@nitinag I just finished implementing a new MimeParser on top of MimeReader…
Are you ready for the results? 😂
@nitinag FWIW, I’ve created a MimeKit 3.0 board to track progress of what I want to accomplish for v3.0.
Okay, here’s our comparison data:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update) Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores .NET SDK=5.0.400 [Host] : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT [AttachedDebugger] DefaultJob : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT
Notes:
:
or\n
while simultaneously copying the data into abyte[]
(and having to check every loop to validate that the buffer is large enough).Looks like I misremembered exactly what test I was remembering rough stats for. My excuse is I was commenting on my phone while watching a movie with my g/f 😂
In any event, here are the raw benchmarks as run this morning.
The following 2 benchmarks were run using the old header parser in the MimeReader tests:
These next 2 benchmark runs were done after applying my patch that rewrote the header parser for MimeReader:
As we can see, the new MimeReader header parser is actually a smidgen slower but I wonder if I can speed things up a bit.
The old header parser uses this type of approach:
whereas the new parser uses this approach:
The downside to the old approach is that if
inptr == inend
at the end of that loop, we would break out, fill our input buffer, and then restart the loop from the beginning of the line. This is not the case with the new approach (I know the code snippets above don’t really illustrate this, but take my word for it). The new approach consumes everything in the input.The new approach, even though slower per loop iteration, may actually compensate for that by not having to scan over the same input data more than once if the line of data crosses buffer boundaries.
In fact, the old code needs to loop over the same input data at a minimum of 2x. The first time finds the end of the line, and the second is the Buffer.BlockCopy() call. In the hopefully worst-case scenario, it would have to iterate over the input ~3x (maybe the first pass is only missing the ‘\n’ because it hasn’t been buffered yet).
One thing I want to try is a hybrid approach to see if the overhead of the Buffer.BlockCopy() is actually less than the length comparison in every loop iteration. I’m pretty sure I’ve seen the implementation of BlockCopy() and it blits 8/4/2/1 bytes at a time, making it very fast, so it’s possible that it is faster.
I could also make the input buffer grow by 80 instead of 64 as that may be a more efficient grow size (MIME recommends wrapping lines to fit within 80 columns).
Anyway… I’ve got a few tricks I can try to see if I can eke out a bit more performance.
Happy that the new approach isn’t drastically slower.
My plan was to pass along the MimeReader’s internal
input
buffer (see MimeParser.cs for what I mean). ThestartIndex
andcount
parameters would indicate where in the buffer the content is located. This would avoid allocating a new buffer.Yea, this was why I was considering having Async variants, but you raise a good point about how maybe they should all have Async variants? And have CancellationToken parameters as well.
Yep. Gotcha covered. I’ll be keeping those in the OnXyzEnd() hooks, same as they are currently in the MimeParser event API.
Yep, I added that to my plan as well - after remembering I had that there in the current MimeParser event API.
I was thinking that in this new “MimeReader”(?) (I think I like MimeReader better than StreamingMimeParser) API, I was not planning to instantiate MimeEntity objects at all, but it should be possible to use the headers to figure this out. That said… I think I’ll need to track the COntentType anyway as I parse the headers so that I know which entity type I’m parsing, so maybe I can pass that into the OnMimePartBegin/MultipartBegin/MessagePartBegin APIs and maybe to the equivalent End APIs as well?
Technically, I think this is only really needed for OnMimePartBegin/End to cover the Basic vs Text differentiation. Maybe I can even add OnTextPartBegin/End instead? Not sure… will have to think on this one.
It’s probably possible to just figure it out based on previously-parsed headers (or lack-thereof, implying text/plain).
The MimeMessage convenience APIs like To/From/Cc/Subject/Date/etc are all doable w/o needing the overhead of MimeMessage… or, if you really want, you can just instantiate a new MimeMessage (or MimePart or whatever) and add the headers you collect from OnHeaderParsed().
None of that is currently handled by the MimeParser anyway, that’s all handled by MimeMessage and MimeEntity.
Anyway, I appreciate your feedback! I’ll definitely want more feedback if/when I decide to go ahead with this.
Now I just gotta convince the girlfriend to let me spend the necessary time on this 😉
I’ve already been spending a lot of late nights hacking on an NTLM rewrite for MailKit and trying to figure out how to add ChannelBinding support as well (starting with the SCRAM-SHA-X-PLUS mechanisms). Then I need to figure out ChannelBinding for NTLM.
Lots of big new features in the works 😉
Okay, so based on what I think you’re saying, what you really want is a “streaming” MIME parser (or at least a “streaming” multipart parser).
It seems that there is a special need for this kind of parser in the HTTP world where you can apparently POST more-or-less endless multiparts and/or at least multiparts containing very large file streams. This apparently goes for both client-side where an application might use an HTTP GET request which results in a large multipart as well as server-side implementations of a POST request that contains large multiparts.
MimeKit’s MimeParser API is suboptimal for this kind of thing even if I added a way to override which type of Stream the MimeParser creates for storing the content of each MimePart.
That said, even though MimeKit is really targeted for email applications, perhaps I could implement such a streaming parser API for this kind of thing…
In the meantime, MimeKit does provide some pretty handy tooling for what you want to do. You should definitely take a look at the
MimeKit.IO.FilteredStream
class as well as the MimeKit.IO.Filters.DecoderFilter class (and/or theMimeKit.Encoders
namespace for direct usage of the decoders).Next up, there are some useful techniques to be found in my MailKit project. In particular, the streaming technique I use in both the POP3 and IMAP implementations that wrap a NetworkStream (or SslStream depending on the connection type) that know when to stop reading data based on the current protocol state.
For example, in the Pop3Stream.ReadAsync method, it scans for a line containing “.\r\n” and once it reaches that, declares EOF by returning 0 and will prevent you from reading beyond that point. The ReadAsync method also unmunges (or un-byte-stuffs) the stream, meaning that it takes lines that start with “…” and removes the first dot (this is a POP3ism that is meant to safeguard the termination sequence).
The ImapStream.ReadAsync method is actually a lot simpler since it doesn’t have to unmunge any data, it just knows that if it is in the “IMAP Literal String” state, that it should read until the literal is consumed and then return 0 (signifying EOF). In IMAP, there is a concept called a “literal string” which is how IMAP sends large (or non-ASCII) content. The way it does it is like this:
... TOKEN TOKEN TOKEN {12345}\r\n<12345 bytes of literal content go here> TOKEN TOKEN TOKEN
(where a token that consists of{
+ a numeric value +}
+ <CR><LF> tells the parser that it should treat the next X number of bytes as a single string token).What I would probably do is implement a “MultipartContentStream” class that did something similar and knew only to read until the next MIME boundary marker and then pretend that it had reached EOF.
This could be so blazingly fast… I love it.
Question: Do these types of multiparts support nested multiparts? (because that would make this more complicated).
I would ask if message/rfc822 is supported, but I think that could easily just be mapped (in this parser) to a MimePart as well instead of a MessagePart like MimeParser uses. The problem with trying to handle a message/rfc822 part by returning a MessagePart is that the MessagePart recursively parses the embedded message instead of returning just uninterpreted content.
What you want to use is:
That will encode the new-line characters.
The normal ToString() method is meant only for display purposes.