HTML Content Guide

This guide provides specifications for sending HTML content through Arkipel API messages. Use this reference when implementing message types that include HTML fields (e.g., content, html_content, body).

Overview

Arkipel accepts HTML content as JSON string values within message payloads. The HTML is stored as-is and sanitized when rendered in the web interface.

JSON Format Requirements

HTML must be sent as a JSON string value:

{
  "payload": {
    "type": "publications:upsert",
    "name": "Article Title",
    "content": "<p>Your HTML content here</p>"
  }
}

Escaping Double Quotes

Critical: All double quotes (") inside HTML strings must be escaped as \":

"content": "<div class=\"content\"><p>She said, \"Hello!\"</p></div>"

Let your programming language’s JSON library handle escaping automatically:

Python:

import json

payload = {
    "content": '<div class="content"><p>Hello "World"</p></div>'
}
json_string = json.dumps(payload)  # Escapes quotes automatically

Ruby:

require 'json'

payload = {
  content: '<div class="content"><p>Hello "World"</p></div>'
}
json_string = payload.to_json  # Escapes quotes automatically

JavaScript/Node.js:

const payload = {
  content: '<div class="content"><p>Hello "World"</p></div>',
};
const jsonString = JSON.stringify(payload); // Escapes quotes automatically

Go:

import (
    "encoding/json"
    "log"
)

payload := map[string]string{
    "content": `<div class="content"><p>Hello "World"</p></div>`,
}
jsonBytes, err := json.Marshal(payload)  // Escapes quotes; also writes <, >, & as \u003c, \u003e, \u0026 (still valid JSON)
if err != nil {
    log.Fatal(err)
}

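If you must construct JSON manually, apply these replacements in order:

  1. Replace all \ with \\ (do this first, so the backslashes added by the later steps are not escaped a second time)
  2. Replace all " with \"
  3. Replace newlines with \n, and escape any other control characters (e.g., tabs as \t)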
Size Limits

  • Maximum payload size: 1 MB (including all fields)
  • Recommended HTML content size: Under 50 KB per message
  • For large content, consider splitting into multiple articles/messages
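A pre-send size check might look like the following sketch. It rests on assumptions this guide does not confirm: that the 1 MB limit applies to the serialized UTF-8 JSON body, that 1 MB and 50 KB mean 1,000,000 and 50,000 bytes (the conservative reading), and that the HTML lives under payload.content. The name check_sizes is hypothetical.

```python
import json

MAX_PAYLOAD_BYTES = 1_000_000     # assumption: "1 MB" read as 10^6 bytes
RECOMMENDED_HTML_BYTES = 50_000   # assumption: "50 KB" read as 50,000 bytes

def check_sizes(message: dict, html_field: str = "content") -> list[str]:
    """Return human-readable size problems; an empty list means none found."""
    problems = []

    # Measure the message the way it would travel: UTF-8 encoded JSON.
    body = json.dumps(message, ensure_ascii=False).encode("utf-8")
    if len(body) > MAX_PAYLOAD_BYTES:
        problems.append(f"message is {len(body)} bytes, over the hard limit")

    # Measure the HTML field separately against the recommended size.
    html = message.get("payload", {}).get(html_field, "")
    html_bytes = len(html.encode("utf-8"))
    if html_bytes > RECOMMENDED_HTML_BYTES:
        problems.append(f"{html_field} is {html_bytes} bytes, over the recommended size")

    return problems
```

Sizes are measured in bytes rather than characters because non-ASCII text occupies more than one byte per character in UTF-8.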

HTML Structure Guidelines

Send semantic HTML fragments, not full documents:

<!-- Good -->
<article>
  <h1>Article Title</h1>
  <p>Paragraph with <a href="/link">a link</a>...</p>
  <img src="image.jpg" alt="Description" />
</article>

<!-- Also acceptable -->
<section class="content">
  <h2>Section Title</h2>
  <p>Content here...</p>
</section>

<!-- Bad - Don't send complete HTML pages -->
<!DOCTYPE html>
<html>
  <head>
    <title>...</title>
  </head>
  <body>
    ...
  </body>
</html>
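To catch full documents before they are sent, a rough check such as the one below may help. It is only a heuristic sketch: it looks for document-level markers with a regular expression and can misfire if those markers appear inside comments or sample text. The name looks_like_full_document is hypothetical.

```python
import re

# Matches <!DOCTYPE ...>, <html>, <head>, or <body> (case-insensitive).
# The \b keeps <header>, <thead>, and <tbody> from matching.
_DOCUMENT_MARKERS = re.compile(r"<!doctype\b|<\s*(?:html|head|body)\b", re.IGNORECASE)

def looks_like_full_document(html: str) -> bool:
    """Return True if the markup appears to be a complete page, not a fragment."""
    return _DOCUMENT_MARKERS.search(html) is not None
```

A caller could reject the content outright or extract the body's inner markup with an HTML parser before sending.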

Allowed HTML Elements

The following HTML elements are fully supported and will be preserved:

Text Formatting

  • <h1> through <h6> - Headings
  • <p> - Paragraphs
  • <br> - Line breaks
  • <strong>, <b> - Bold text
  • <em>, <i> - Italic text
  • <u> - Underlined text
  • <s>, <del> - Strikethrough text
  • <mark> - Highlighted text
  • <small> - Small text
  • <sub>, <sup> - Subscript/superscript

Lists

  • <ul> - Unordered lists
  • <ol> - Ordered lists
  • <li> - List items
  • <dl>, <dt>, <dd> - Definition lists

Links and Media

  • <a> - Links (all standard attributes supported)
  • <img> - Images (with src, alt, width, height)
  • <iframe> - Embeds (YouTube, etc.)
  • <video> - Video embeds
  • <audio> - Audio embeds

Structure

  • <div> - Generic containers
  • <span> - Inline containers
  • <article> - Article content
  • <section> - Section containers
  • <header> - Header sections
  • <footer> - Footer sections
  • <aside> - Side content
  • <figure>, <figcaption> - Figures with captions
  • <blockquote> - Block quotations
  • <hr> - Horizontal rules

Tables

  • <table>, <thead>, <tbody>, <tfoot>
  • <tr>, <th>, <td>
  • <caption>, <colgroup>, <col>

Attributes Allowed

  • class - CSS classes
  • id - Element IDs
  • style - Inline styles
  • data-* - Custom data attributes
  • title - Tooltips
  • Standard attributes for each element (e.g., href and target for links; src and alt for images)

Sanitized Content

The following are removed or restricted when content is rendered:

JavaScript (Always Removed)

  • <script> tags
  • Event handlers: onclick, onload, onerror, etc.
  • javascript: URLs

Styling (Partially Restricted)

  • <style> blocks (removed)
  • <link> tags for CSS (removed)
  • Inline style attributes (allowed but sanitized)

Dangerous Protocols

  • javascript: URLs in links
  • data: URLs (except for images in specific contexts)
  • vbscript: URLs

Forms and Interactive Elements

  • <form> tags
  • <input>, <textarea>, <select>, <button>
  • <iframe> with suspicious sources
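The rules above can be approximated client-side to warn authors before content is sent. The sketch below is not Arkipel's sanitizer and makes no claim to match it exactly; it only flags the tags, event-handler attributes, and URL schemes listed in this section, using Python's standard html.parser. It skips data: URLs and iframe sources, since the conditions for those are not specified here. The names SanitizerLint and lint_html are hypothetical.

```python
from html.parser import HTMLParser

# Taken from the lists in this section.
REMOVED_TAGS = {"script", "style", "link", "form", "input", "textarea", "select", "button"}
BLOCKED_SCHEMES = ("javascript:", "vbscript:")
URL_ATTRS = {"href", "src"}

class SanitizerLint(HTMLParser):
    """Collects warnings about markup that is expected to be stripped."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag in REMOVED_TAGS:
            self.warnings.append(f"<{tag}> will be removed")
        for name, value in attrs:
            if name.startswith("on"):
                self.warnings.append(f"{name} handler on <{tag}> will be removed")
            elif name in URL_ATTRS and value:
                if value.strip().lower().startswith(BLOCKED_SCHEMES):
                    self.warnings.append(f"{name} on <{tag}> uses a blocked URL scheme")

def lint_html(html: str) -> list[str]:
    """Return warnings for content likely to be stripped at render time."""
    parser = SanitizerLint()
    parser.feed(html)
    parser.close()
    return parser.warnings
```

Because sanitization happens at render time, a lint like this is advisory only: it helps authors notice content that will silently disappear, but the server-side rules remain authoritative.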

Character Encoding

Required: UTF-8

Always encode your JSON payload as UTF-8:

{
  "content": "<p>International characters: café, 日本語, русский, العربية, עברית</p>"
}
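In Python, one way to produce a UTF-8 body is shown below. By default json.dumps writes non-ASCII characters as \uXXXX escapes, which is also valid JSON; passing ensure_ascii=False keeps the characters readable and relies on the explicit UTF-8 encoding step. The helper name encode_message is hypothetical.

```python
import json

def encode_message(message: dict) -> bytes:
    """Serialize a message to JSON and encode it as UTF-8 bytes."""
    # ensure_ascii=False keeps characters like é and 日本語 as-is in the JSON text.
    text = json.dumps(message, ensure_ascii=False)
    return text.encode("utf-8")

message = {
    "payload": {
        "type": "publications:upsert",
        "name": "Article Title",
        "content": "<p>International characters: café, 日本語, русский</p>",
    }
}
body = encode_message(message)
```

Either escaping style decodes to the same string on the receiving side, so the choice mainly affects payload size and readability of logs.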

HTML Entities

You can use HTML entities for special characters:

Character   Entity    Description
&           &amp;     Ampersand
<           &lt;      Less than
>           &gt;      Greater than
"           &quot;    Double quote
'           &#39;     Single quote
©           &copy;    Copyright
®           &reg;     Registered trademark

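Note: In text content (not attributes), you can usually include raw UTF-8 characters such as © or é instead of entities. The characters &, <, and > are the exception: write them as entities whenever they should appear as literal text rather than markup.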
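When HTML is assembled from plain text (for example, user-supplied titles), the standard library can apply these entities. The sketch below uses Python's html.escape; the helper names paragraph and link are hypothetical. Note that html.escape writes the single quote as &#x27;, which is equivalent to the &#39; form shown in the table.

```python
from html import escape

def paragraph(text: str) -> str:
    """Wrap plain text in a <p>, escaping &, <, > and quotes."""
    return f"<p>{escape(text)}</p>"

def link(url: str, label: str) -> str:
    """Build an <a> element; quote=True also escapes quotes for the attribute."""
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
```

Non-ASCII characters such as é or © pass through unchanged, so only the characters with special meaning in HTML are rewritten.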

Error Handling

Common Errors

JSON Parse Error (Unescaped Quotes):

// Bad - quotes not escaped
"content": "<p>She said, "Hello!"</p>"

// Good - quotes escaped
"content": "<p>She said, \"Hello!\"</p>"

Binary Content:

// Bad - null bytes or binary data
"content": "<p>\x00</p>"

// Good - only valid UTF-8 text
"content": "<p>Valid text content</p>"

Message types that accept HTML content:


Last Updated: 2025-02-18