Recently I came across a rather annoying issue with the XslCompiledTransform class. Namely, it really doesn’t like having a BOM (byte order mark) shoved down its Load method.
I got the following error message and stack trace:
System.Xml.XmlException: Data at the root level is invalid. Line 1, position 1.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
   at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.Xsl.Xslt.XsltInput.ReadNextSiblingHelper()
   at System.Xml.Xsl.Xslt.XsltInput.ReadNextSibling()
   at System.Xml.Xsl.Xslt.XsltInput.MoveToNextSibling()
   at System.Xml.Xsl.Xslt.XsltInput.Start()
   at System.Xml.Xsl.Xslt.XsltLoader.LoadDocument()
   at System.Xml.Xsl.Xslt.XsltLoader.LoadStylesheet(XmlReader reader, Boolean include)
System.Xml.Xsl.XslLoadException: XSLT compile error.
   at System.Xml.Xsl.Xslt.XsltLoader.LoadStylesheet(XmlReader reader, Boolean include)
   at System.Xml.Xsl.Xslt.XsltLoader.Load(Compiler compiler, Object stylesheet, XmlResolver xmlResolver)
   at System.Xml.Xsl.Xslt.Compiler.Compile(Object stylesheet, XmlResolver xmlResolver, ref QilExpression qil)
   at System.Xml.Xsl.XslCompiledTransform.CompileXsltToQil(Object stylesheet, XsltSettings settings, XmlResolver stylesheetResolver)
   at System.Xml.Xsl.XslCompiledTransform.LoadInternal(Object stylesheet, XsltSettings settings, XmlResolver stylesheetResolver)
   at System.Xml.Xsl.XslCompiledTransform.Load(XmlReader stylesheet)
Viewing the input data to the XslCompiledTransform in debug mode showed a perfectly valid and lovely XSLT file, because Visual Studio does not display the BOM in the visual representation of the string. Saving the string to disk and examining it in a hex editor revealed that the first 3 bytes were indeed the BOM for UTF-8: EF BB BF.
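The same check can be done in code instead of a hex editor. The following is only a sketch: the BomInspection class and its method names are made up for illustration, and it assumes the stylesheet is held either as a byte array or as an already decoded string, where a UTF-8 BOM shows up as the single invisible character U+FEFF.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of the .NET framework.
public static class BomInspection
{
    // Returns the first few bytes as dash-separated hex, for example "EF-BB-BF".
    public static string LeadingBytesAsHex(byte[] bytes, int count = 3)
    {
        if (bytes == null || bytes.Length == 0)
        {
            return string.Empty;
        }

        return BitConverter.ToString(bytes, 0, Math.Min(count, bytes.Length));
    }

    // True when a decoded string still carries the BOM as its first character.
    public static bool StartsWithBomCharacter(string text)
    {
        return !string.IsNullOrEmpty(text) && text[0] == '\uFEFF';
    }
}
```

Note that Encoding.UTF8.GetString does not strip the BOM: decoding bytes that start with EF BB BF gives a string whose first character is U+FEFF, which is the character the debugger does not show.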
I wrote a couple of little extension methods for byte arrays to get rid of the UTF-8 BOM if it’s present:
using System;

// Extension methods must live in a static class; the class name here is arbitrary.
public static class ByteArrayExtensions
{
    /// <summary>
    /// Removes the UTF-8 byte order mark.
    /// </summary>
    /// <param name="bytes">The bytes.</param>
    /// <returns>byte array without the BOM.</returns>
    public static byte[] RemoveByteOrderMark(this byte[] bytes)
    {
        // Nothing to strip (this also covers null and very short arrays).
        if (!bytes.StartsWithByteOrderMark())
        {
            return bytes;
        }

        // Copy everything after the 3-byte BOM into a new array.
        byte[] results = new byte[bytes.Length - 3];
        Array.Copy(bytes, 3, results, 0, bytes.Length - 3);
        return results;
    }

    /// <summary>
    /// Determines if the byte array starts with a UTF-8 byte order mark.
    /// </summary>
    /// <param name="bytes">The bytes.</param>
    /// <returns><c>true</c> if the byte array starts with a byte order mark; otherwise <c>false</c>.</returns>
    public static bool StartsWithByteOrderMark(this byte[] bytes)
    {
        if (bytes == null || bytes.Length < 3)
        {
            return false;
        }

        // The UTF-8 BOM is the byte sequence EF BB BF.
        return
            bytes[0] == 0xEF &&
            bytes[1] == 0xBB &&
            bytes[2] == 0xBF;
    }
}
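If the stylesheet has already been decoded into a string, the same idea can be applied to the leading U+FEFF character rather than to the raw bytes. This is a sketch of that variant, not the code from my project: StylesheetLoader, RemoveByteOrderMark(string) and LoadFromString are hypothetical names, and it assumes the XSLT is handed to XslCompiledTransform.Load through an XmlReader created over a StringReader.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper for illustration only.
public static class StylesheetLoader
{
    // Drops a single leading U+FEFF (the decoded form of the BOM), if present.
    public static string RemoveByteOrderMark(string text)
    {
        if (string.IsNullOrEmpty(text) || text[0] != '\uFEFF')
        {
            return text;
        }

        return text.Substring(1);
    }

    // Compiles a stylesheet held in a string, stripping the BOM character first.
    public static XslCompiledTransform LoadFromString(string xslt)
    {
        var transform = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(RemoveByteOrderMark(xslt))))
        {
            transform.Load(reader);
        }

        return transform;
    }
}
```

Which variant fits depends on where the BOM gets in: strip the three bytes if you still have the raw data, or strip the single character if you only ever see the decoded string.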