Showing posts from September, 2013

Flatten HTML Document to List of Tags, Attributes, and Values

I needed to flatten a set of HTML documents into a list of the tags in their head sections, along with each tag's attributes and values. I thought this bit of code might be useful for someone in the future.

This uses the CsQuery library, which is a C# port of jQuery: https://github.com/jamietre/CsQuery

CsQuery also has a NuGet package: https://www.nuget.org/packages/CsQuery

//Note: Get HTML from somewhere...
var html = "";

var cq = CsQuery.CQ.Create(html);
var head = cq["head"];

var nonScriptHeadTagsQuery = from t in head.Children()
                             where t.NodeName != "SCRIPT" && t.NodeName != "LINK"
                             select new { Tag = t, TagId = Guid.NewGuid() };
var nonScriptHeadTags = nonScriptHeadTagsQuery.ToList();

var htmlTags = nonScriptHeadTags
    .SelectMany(tagInfo => tagInfo.Tag.Attributes,
                (tagInfo, attribute) => new { TagInfo = tagInfo, Attribute = attribute })
    .Select(x => new { TagId = x.TagInfo.TagId, TagType = x.TagInfo.Tag.NodeName, A