Crossword XML Schema
introduction
While working on the CrosswordPlayer SVG application, I was unable to find an openly specified file format for crossword puzzles. Thus began development of a crossword XML Schema. It is still rough and comments are welcome.
overview
A schema is a set of rules describing how information is packaged together.
The W3C, which publishes the openly specified XML 1.0 standard, says: "XML (Extensible Markup Language) is a simple, very flexible text format..." and "XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents."
This document discusses the information associated with a crossword puzzle, shows examples of puzzle formats, and explains the crossword XML schema.
information structure
puzzle types and styles
A puzzle is of some type and style. Within the context of the type and style of the puzzle the 'data' content and structure can be defined. There is also associated 'metadata' such as the title of the puzzle, and the name of the creator. The CrosswordPlayer currently handles puzzles of type Crossword, style American, and language English.
This information has been gathered from various sources and is in no way definitive or complete. Helpful web sources are the basic rules and specification links at CRUCIVERB.COM and the article Crossword Puzzles from around the World at dummies.com.
Puzzles can be of type crossword, acrostic, diagramless, etc. A type of puzzle may have different styles: the crossword puzzle has styles American, French, Spanish, UK, etc. A puzzle is also in a certain language, so you can have a Dutch-language American-style crossword puzzle.
Puzzle style | Grid | Clues | Numbered | Other |
---|---|---|---|---|
Crossword (crossword, American-style) | columns x rows | Across & Down | corner of cell: 1, 2, 3, ..; clue is Across 5, Down 5 | has complete letter interlock; usually square, 15x15 |
Mots croisés (crossword, French-style) | columns x rows | Horizontalement & Verticalement | top of columns: 1, 2, ..; left of rows: I, II, .. (Roman numerals); clue is Verticalement 5, Horizontalement V; more than one clue per line when a line holds more than one word | has unchecked letters; usually asymmetrical, horizontal 9x11 |
Crucigramas (crossword, Spanish-style) | columns x rows | Horizontales & Verticales | top of columns: 1, 2, ..; left of rows: I, II, .. (Roman numerals); clue is IV-3 | |
The first puzzle put into XML is an American-style crossword puzzle in English. While the following focuses on this implementation, the same process can be applied to puzzles of other types, styles, and languages.
data
An American-style crossword has a grid and some clues. The clues are grouped by 'Across' and 'Down'. The cells of the grid are numbered in the upper left corner of the cell. The 'data' is the size of the grid (often a 15x15 square), which cells hold letters and which are not used, the answer letter for each cell, and the clues that correspond to each entry in each direction. There are also 'rules' that a puzzle of a given type and style will follow, such as each letter cell being covered by both an Across and a Down clue, or, even more simply, the rule that there are no blanks within a 'word'. With the data and the rules of the type and style of puzzle, a puzzle player should be able to recreate the puzzle.
A basic structure of the data for an American-style crossword puzzle with a 15x15 grid would be:
- 1 grid (columns=15, rows=15)
- 15x15 = 225 cells (a cell is letter or blank)
- some number of clues
- a clue is across or down
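The structure listed above can be sketched as a few small Python types. This is only an illustration of the list, not part of the schema: the class names `Cell`, `Clue`, and `Grid`, and the 1-based `cell(row, column)` helper, are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cell:
    """One square of the grid; letter is None for a blank (unused) cell."""
    letter: Optional[str] = None

    @property
    def is_blank(self) -> bool:
        return self.letter is None


@dataclass
class Clue:
    direction: str  # "across" or "down"
    text: str

    def __post_init__(self) -> None:
        if self.direction not in ("across", "down"):
            raise ValueError("a clue is across or down")


@dataclass
class Grid:
    columns: int
    rows: int
    cells: List[Cell] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start with an all-blank grid when no cells are supplied.
        if not self.cells:
            self.cells = [Cell() for _ in range(self.columns * self.rows)]
        if len(self.cells) != self.columns * self.rows:
            raise ValueError("expected columns x rows cells")

    def cell(self, row: int, column: int) -> Cell:
        """Return the cell at a 1-based (row, column) position."""
        return self.cells[(row - 1) * self.columns + (column - 1)]
```

With these types, `Grid(columns=15, rows=15)` holds 225 cells, stored row by row from the top left.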
metadata
The format of the 'metadata' of the puzzle attempts to follow Dublin Core standards. The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models.
- metadata
- title
- date
- creator
- rights
- publisher
- identifier
- description
XML is a good exchange format for the information (data and metadata) in the crossword itself. For exchange of information about the puzzles between crossword players (both software and human) use of Resource Description Framework (RDF) and RDF Schema (RDFS) following Dublin Core standards seems like a good fit. And that is another cool project, out of the scope of the current document.
vocabulary
- grid
- 2 dimensional 'grid' structure of puzzle
- cell
- one box or square in the grid
- letter cell
- cell containing a letter
- blank cell
- cell that is not active in the puzzle
- letter
- one letter to one cell
- word
- word or words that are the answer to the clue (sometimes referred to as 'entry')
- unchecked letter or unkeyed letter
- letters that appear in only one word, either across or down
- letter interlocks
- letters that appear in both an across and a down word
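The difference between interlocked and unchecked letters can be made concrete with a short Python sketch. It assumes a grid given as a list of equal-length strings with '.' marking a blank cell (the convention visible in the .puz example), and it treats a run of two or more letters as a word. The function name `unchecked_cells` is made up for this example.

```python
def unchecked_cells(rows):
    """Return 1-based (row, column) positions of unchecked letters.

    rows is a list of equal-length strings; '.' marks a blank cell.
    """
    height, width = len(rows), len(rows[0])

    def is_letter(r, c):
        return 0 <= r < height and 0 <= c < width and rows[r][c] != "."

    result = []
    for r in range(height):
        for c in range(width):
            if not is_letter(r, c):
                continue
            # A letter belongs to a word in a direction when it has a
            # letter neighbour in that direction.
            in_across = is_letter(r, c - 1) or is_letter(r, c + 1)
            in_down = is_letter(r - 1, c) or is_letter(r + 1, c)
            if not (in_across and in_down):
                result.append((r + 1, c + 1))
    return result
```

A grid with complete letter interlock, as expected of the American style, yields an empty list.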
file formats
Crossword players expect crossword data to be formatted in a way they can understand. Some formats found on the web are .txt (plain text file), .puz (Across Lite format), .cmo (Crossword Maestro format), and .xwd (Crossdown format). An open format for crossword puzzles allows separation of the puzzle information from the puzzle player: a puzzle could be played on multiple players without the conversion difficulties caused by closed proprietary formats.
The November 23, 2003 - "Sunday Challenge" is used as an example to show the .puz binary format (shown in a hexdump) and the .xml format currently used in CrosswordPlayer.
example puz (hexdump)
Example of a crossword in puz format. Because .puz is a binary format, a hexdump makes it easier to read. The hex values on the left correspond to the characters on the right.
0000: 6A 7A 41 43 52 4F 53 53 26 44 4F 57 4E 00 02 AA jzACROSS&DOWN..© 0010: 4B E9 72 04 EB EF 7E C6 31 2E 32 00 00 00 00 00 K©r.©©~©1.2..... 0020: 00 00 00 00 00 00 00 00 00 00 00 00 0F 0F 46 00 ..............F. 0030: 01 00 00 00 4A 41 4D 45 53 4D 41 53 4F 4E 2E 54 ....JAMESMASON.T 0040: 45 53 53 49 54 41 4C 49 41 4E 49 43 45 2E 49 4C ESSITALIANICE.IL 0050: 57 55 47 4F 4C 44 44 49 47 47 45 52 2E 50 49 45 WUGOLDDIGGER.PIE 0060: 52 53 50 4C 45 45 4E 2E 4E 41 56 59 2E 57 45 45 RSPLEEN.NAVY.WEE 0070: 2E 2E 2E 53 41 52 47 2E 4E 45 57 4D 41 54 48 43 ...SARG.NEWMATHC 0080: 4F 41 54 52 4F 4F 4D 2E 53 43 41 4C 49 41 4F 44 OATROOM.SCALIAOD 0090: 44 2E 4D 41 4C 49 43 2E 41 52 4C 45 4E 4D 45 41 D.MALIC.ARLENMEA 00A0: 54 2E 44 45 4D 4F 4E 2E 4C 41 50 44 50 52 4D 45 T.DEMON.LAPDPRME 00B0: 4E 2E 4D 45 44 4F 43 2E 43 49 45 55 4E 53 45 41 N.MEDOC.CIEUNSEA 00C0: 54 2E 44 45 54 41 43 48 45 44 4C 45 41 4E 54 4F T.DETACHEDLEANTO 00D0: 53 2E 44 41 54 41 2E 2E 2E 53 49 50 2E 4F 4D 4E S.DATA...SIP.OMN 00E0: 49 2E 42 41 52 54 41 42 49 53 50 53 2E 43 41 4E I.BARTABISPS.CAN 00F0: 4E 45 4C 4C 4F 4E 49 4F 53 4C 4F 2E 41 54 47 55 NELLONIOSLO.ATGU 0100: 4E 50 4F 49 4E 54 4E 45 45 44 2E 54 48 45 42 45 NPOINTNEED.THEBE 0110: 41 54 4C 45 53 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E ATLES----------. 
0120: 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D --------------.- 0130: 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D -------------.-- 0140: 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 2E 2D 2D --------.----.-- 0150: 2D 2E 2E 2E 2D 2D 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D -...----.------- 0160: 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D --------.------- 0170: 2D 2D 2E 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D --.-----.------- 0180: 2D 2D 2E 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D --.-----.------- 0190: 2D 2D 2E 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D --.-----.------- 01A0: 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D --.------------- 01B0: 2D 2D 2E 2D 2D 2D 2D 2E 2E 2E 2D 2D 2D 2E 2D 2D --.----...---.-- 01C0: 2D 2D 2E 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D --.----------.-- 01D0: 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D 2D ------------.--- 01E0: 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2E 2D 2D 2D 2D -----------.---- 01F0: 2D 2D 2D 2D 2D 2D 4E 6F 76 65 6D 62 65 72 20 32 ------November 2 0200: 33 2C 20 32 30 30 33 20 2D 20 22 53 75 6E 64 61 3, 2003 - "Sunda 0210: 79 20 43 68 61 6C 6C 65 6E 67 65 22 00 20 20 42 y Challenge". B 0220: 79 20 42 6F 62 20 4B 6C 61 68 6E 20 20 00 A9 20 y Bob Klahn .© 0230: 32 30 30 33 20 42 6F 62 20 4B 6C 61 68 6E 2E 20 2003 Bob Klahn. 0240: 44 69 73 74 72 69 62 75 74 65 64 20 62 79 20 43 Distributed by C 0250: 72 6F 73 53 79 6E 65 72 67 79 28 54 4D 29 20 53 rosSynergy(TM) S 0260: 79 6E 64 69 63 61 74 65 00 22 54 68 65 20 56 65 yndicate."The Ve 0270: 72 64 69 63 74 22 20 61 63 74 6F 72 00 46 65 61 rdict" actor.Fea 0280: 74 68 65 72 65 64 20 66 69 73 68 69 6E 67 20 68 thered fishing h 0290: 6F 6F 6B 73 00 53 74 72 61 64 64 6C 69 6E 67 00 ooks.Straddling. 
02A0: 4F 6E 69 6F 6D 61 6E 69 61 63 27 73 20 6D 65 63 Oniomaniac's mec 02B0: 63 61 00 46 69 72 73 74 20 74 6F 20 61 72 72 69 ca.First to arri 02C0: 76 65 3F 00 50 69 73 74 6F 6C 20 6F 72 20 73 61 ve?.Pistol or sa 02D0: 62 65 72 00 41 72 74 65 72 79 00 45 6D 6D 61 27 ber.Artery.Emma' 02E0: 73 20 22 53 65 6E 73 65 20 61 6E 64 20 53 65 6E s "Sense and Sen 02F0: 73 69 62 69 6C 69 74 79 22 20 64 69 72 65 63 74 sibility" direct 0300: 6F 72 00 52 61 74 69 66 79 00 4B 65 79 20 6C 6F or.Ratify.Key lo 0310: 63 61 74 69 6F 6E 3F 00 42 75 74 74 65 72 66 6C cation?.Butterfl 0320: 69 65 73 00 31 39 37 39 20 4E 61 73 74 61 73 73 ies.1979 Nastass 0330: 6A 61 20 4B 69 6E 73 6B 69 20 72 6F 6C 65 00 42 ja Kinski role.B 0340: 65 67 69 6E 20 74 6F 20 75 70 73 65 74 00 22 42 egin to upset."B 0350: 61 62 79 20 44 6F 6C 6C 22 20 77 61 73 20 68 69 aby Doll" was hi 0360: 73 20 66 69 6C 6D 20 64 65 62 75 74 00 53 75 67 s film debut.Sug 0370: 61 72 00 44 69 73 70 6C 61 79 69 6E 67 20 74 68 ar.Displaying th 0380: 65 20 73 6B 69 6C 6C 20 61 6E 64 20 65 78 70 65 e skill and expe 0390: 72 69 65 6E 63 65 20 6F 66 20 61 6E 20 65 78 70 rience of an exp 03A0: 65 72 74 00 42 6F 61 72 64 77 61 6C 6B 20 62 75 ert.Boardwalk bu 03B0: 79 00 4C 61 62 6F 72 20 6F 72 67 2E 20 62 6F 72 y.Labor org. 
bor 03C0: 6E 20 6F 6E 20 74 68 65 20 50 61 63 69 66 69 63 n on the Pacific 03D0: 20 63 6F 61 73 74 20 69 6E 20 74 68 65 20 6C 61 coast in the la 03E0: 74 65 20 27 33 30 73 00 43 61 74 68 65 72 69 6E te '30s.Catherin 03F0: 65 20 5A 65 74 61 2D 4A 6F 6E 65 73 27 73 20 63 e Zeta-Jones's c 0400: 68 61 72 61 63 74 65 72 20 69 6E 20 22 49 6E 74 haracter in "Int 0410: 6F 6C 65 72 61 62 6C 65 20 43 72 75 65 6C 74 79 olerable Cruelty 0420: 2C 22 20 65 2E 67 2E 00 57 61 6C 6B 20 6F 6E 20 ," e.g..Walk on 0430: 77 61 74 65 72 3F 00 4C 79 6D 70 68 6F 63 79 74 water?.Lymphocyt 0440: 65 20 70 72 6F 64 75 63 65 72 00 42 6C 75 65 20 e producer.Blue 0450: 68 75 65 00 42 6C 75 65 20 54 72 69 61 6E 67 6C hue.Blue Triangl 0460: 65 20 67 70 2E 00 42 61 72 65 6C 79 20 70 65 72 e gp..Barely per 0470: 63 65 70 74 69 62 6C 65 00 4D 61 72 69 6F 6E 65 ceptible.Marione 0480: 74 74 65 20 6D 61 6B 65 72 20 54 6F 6E 79 00 41 tte maker Tony.A 0490: 75 74 6F 6D 61 74 6F 6E 00 49 74 20 77 61 73 20 utomaton.It was 04A0: 62 61 73 65 64 20 6F 6E 20 73 65 74 20 74 68 65 based on set the 04B0: 6F 72 79 00 45 61 72 74 68 79 20 64 65 70 6F 73 ory.Earthy depos 04C0: 69 74 20 6F 66 20 63 6C 61 79 20 61 6E 64 20 63 it of clay and c 04D0: 61 6C 63 69 75 6D 20 63 61 72 62 6F 6E 61 74 65 alcium carbonate 04E0: 00 59 6F 75 20 63 61 6E 20 63 68 65 63 6B 20 79 .You can check y 04F0: 6F 75 72 20 68 61 6E 67 2D 75 70 73 20 68 65 72 our hang-ups her 0500: 65 00 49 74 27 73 20 75 73 75 61 6C 6C 79 20 69 e.It's usually i 0510: 72 72 65 73 69 73 74 69 62 6C 65 00 5F 5F 5F 20 rresistible.___ 0520: 4C 69 6E 65 20 28 70 6F 73 74 2D 57 57 49 49 20 Line (post-WWII 0530: 47 65 72 6D 61 6E 2D 50 6F 6C 69 73 68 20 62 6F German-Polish bo 0540: 72 64 65 72 29 00 49 74 27 73 20 6D 6F 72 65 20 rder).It's more 0550: 70 72 6F 6D 69 6E 65 6E 74 20 69 6E 20 6D 65 6E prominent in men 0560: 20 74 68 61 6E 20 69 6E 20 77 6F 6D 65 6E 00 57 than in women.W 0570: 65 6E 74 20 77 69 74 68 6F 75 74 20 73 61 79 69 ent without sayi 0580: 6E 67 3F 
00 52 65 61 67 61 6E 20 43 6F 75 72 74 ng?.Reagan Court 0590: 20 61 70 70 6F 69 6E 74 65 65 00 4D 61 74 63 68 appointee.Match 05A0: 6C 65 73 73 00 41 67 69 6E 67 20 61 63 69 64 20 less.Aging acid 05B0: 66 6F 75 6E 64 20 69 6E 20 66 72 75 69 74 00 49 found in fruit.I 05C0: 6E 20 43 2C 20 70 65 72 68 61 70 73 00 43 6F 6D n C, perhaps.Com 05D0: 70 6F 73 65 72 20 6F 66 20 22 54 68 65 20 57 69 poser of "The Wi 05E0: 7A 61 72 64 20 6F 66 20 4F 7A 22 00 43 68 61 72 zard of Oz".Char 05F0: 63 75 74 65 72 69 65 20 6F 66 66 65 72 69 6E 67 cuterie offering 0600: 00 4E 61 6E 63 79 20 44 72 65 77 20 6F 72 20 4B .Nancy Drew or K 0610: 69 6E 67 20 54 75 74 00 53 70 65 65 64 79 20 6F ing Tut.Speedy o 0620: 6E 65 3F 00 22 52 65 61 64 20 74 68 69 73 21 22 ne?."Read this!" 0630: 00 22 52 75 73 68 20 48 6F 75 72 22 20 6F 72 67 ."Rush Hour" org 0640: 2E 00 46 6C 61 63 6B 73 00 50 61 72 74 6E 65 72 ..Flacks.Partner 0650: 73 68 69 70 20 66 6F 72 20 50 65 61 63 65 20 67 ship for Peace g 0660: 70 2E 00 52 65 64 20 42 6F 72 64 65 61 75 78 00 p..Red Bordeaux. 0670: 49 6E 64 69 61 6E 20 62 65 61 6E 00 43 61 6E 6E Indian bean.Cann 0680: 65 73 20 63 6F 6E 63 65 72 6E 20 28 61 62 62 72 es concern (abbr 0690: 2E 29 00 54 6F 70 70 6C 65 00 46 2D 31 34 20 66 .).Topple.F-14 f 06A0: 69 67 68 74 65 72 00 49 63 79 00 57 68 65 72 65 ighter.Icy.Where 06B0: 20 74 6F 20 66 69 6E 64 20 62 6F 74 68 20 63 72 to find both cr 06C0: 65 61 6D 20 70 75 66 66 73 20 61 6E 64 20 6C 65 eam puffs and le 06D0: 6D 6F 6E 73 00 53 68 65 64 73 00 53 63 79 74 68 mons.Sheds.Scyth 06E0: 65 20 68 61 6E 64 6C 65 00 41 70 70 6C 65 20 63 e handle.Apple c 06F0: 6F 6F 6B 69 65 2C 20 65 2E 67 2E 00 53 6D 61 6C ookie, e.g..Smal 0700: 6C 20 73 77 61 6C 6C 6F 77 00 41 6C 6C 20 61 74 l swallow.All at 0710: 20 66 69 72 73 74 3F 00 22 57 68 65 72 65 27 73 first?."Where's 0720: 20 44 61 64 64 79 3F 22 20 64 72 61 6D 61 74 69 Daddy?" 
dramati 0730: 73 74 00 52 6F 75 6E 64 73 20 6B 65 65 70 65 72 st.Rounds keeper 0740: 3F 00 44 72 75 64 67 65 20 6F 72 20 74 72 75 64 ?.Drudge or trud 0750: 67 65 00 41 72 63 68 65 72 20 77 69 74 68 6F 75 ge.Archer withou 0760: 74 20 61 20 71 75 69 76 65 72 3F 00 54 68 65 79 t a quiver?.They 0770: 27 72 65 20 73 74 72 61 69 67 68 74 20 66 72 6F 're straight fro 0780: 6D 20 74 68 65 20 68 6F 72 73 65 27 73 20 6D 6F m the horse's mo 0790: 75 74 68 00 47 61 74 65 6B 65 65 70 65 72 73 20 uth.Gatekeepers 07A0: 77 69 74 68 20 63 6F 6E 6E 65 63 74 69 6F 6E 73 with connections 07B0: 20 28 61 62 62 72 2E 29 00 49 6E 74 65 6E 74 69 (abbr.).Intenti 07C0: 6F 6E 61 6C 20 67 72 6F 75 6E 64 69 6E 67 3F 00 onal grounding?. 07D0: 54 75 62 75 6C 61 72 20 69 6E 76 65 6E 74 69 6F Tubular inventio 07E0: 6E 20 6F 66 20 74 68 65 20 6C 61 74 65 20 74 65 n of the late te 07F0: 65 6E 73 20 6F 72 20 65 61 72 6C 79 20 74 77 65 ens or early twe 0800: 6E 74 69 65 73 00 48 65 61 72 74 20 6F 66 20 74 nties.Heart of t 0810: 68 65 20 6D 61 74 74 65 72 00 43 61 70 69 74 61 he matter.Capita 0820: 6C 20 61 74 20 74 68 65 20 63 65 6E 74 65 72 20 l at the center 0830: 6F 66 20 43 7A 65 63 68 6F 73 6C 6F 76 61 6B 69 of Czechoslovaki 0840: 61 3F 00 42 61 64 20 77 61 79 20 74 6F 20 62 65 a?.Bad way to be 0850: 20 6D 61 72 72 69 65 64 00 43 61 6C 6C 20 66 6F married.Call fo 0860: 72 00 27 36 30 73 20 69 6E 76 61 64 65 72 73 00 r.'60s invaders. 0870: 00 .
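The layout visible in the hexdump can be read back with a few lines of Python. The sketch below is based only on what the dump shows: the marker ACROSS&DOWN at offset 0x02, the grid width, height, and clue count at offsets 0x2C to 0x2F, the solution grid starting at 0x34, a second grid of the same size holding the player's state, and then NUL-terminated strings (title, author, rights, clues). It is not taken from an official description of the format, the Latin-1 decoding is an assumption suggested by the A9 byte shown for the copyright sign, and the function name `parse_puz` is made up for this example.

```python
import struct


def parse_puz(data: bytes) -> dict:
    """Extract grid and text from .puz bytes (offsets inferred from the dump)."""
    if data[2:14] != b"ACROSS&DOWN\x00":
        raise ValueError("missing ACROSS&DOWN marker")

    # Two single bytes and one little-endian 16-bit integer at offset 0x2C.
    width, height, clue_count = struct.unpack_from("<BBH", data, 0x2C)
    cell_count = width * height

    # Solution grid, row by row; '.' marks a blank cell.
    solution = data[0x34:0x34 + cell_count].decode("latin-1")
    rows = [solution[i * width:(i + 1) * width] for i in range(height)]

    # Skip the solution and the player-state grid, then split the strings.
    strings = data[0x34 + 2 * cell_count:].split(b"\x00")
    title, author, rights = (s.decode("latin-1") for s in strings[:3])
    clues = [s.decode("latin-1") for s in strings[3:3 + clue_count]]

    return {
        "columns": width,
        "rows": height,
        "grid": rows,
        "title": title,
        "author": author,
        "rights": rights,
        "clues": clues,
    }
```

For the dump above, the bytes 0F 0F 46 00 at offset 0x2C decode as a 15x15 grid with 70 clues.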
example xml
The same example crossword in XML.
<?xml version="1.0" encoding="utf-8"?> <puzzle> <crossword language="en"> <metadata> <title>"Sunday Challenge"</title> <date>November 23, 2003</date> <creator> By Bob Klahn </creator> <rights>©2003 Bob Klahn. Distributed by CrosSynergy(TM) Syndicate</rights> <publisher>Houston Chronical</publisher> <identifier>http://www.example.com/puzzles/crossword.puz</identifier> <description>Puzzle of type Crossword and style American in language English translated from format .puz into format .xml following schema http://www.koonts.com/some/dir/crossword.</description> </metadata> <american> <grid rows="15" columns="15"> <letter id="1,1">J</letter> <letter id="1,2">A</letter> <letter id="1,3">M</letter> <letter id="1,4">E</letter> <letter id="1,5">S</letter> <letter id="1,6">M</letter> <letter id="1,7">A</letter> <letter id="1,8">S</letter> <letter id="1,9">O</letter> <letter id="1,10">N</letter> <blank></blank> <letter id="1,12">T</letter> <letter id="1,13">E</letter> <letter id="1,14">S</letter> <letter id="1,15">S</letter> <letter id="2,1">I</letter> <letter id="2,2">T</letter> <letter id="2,3">A</letter> <letter id="2,4">L</letter> <letter id="2,5">I</letter> <letter id="2,6">A</letter> <letter id="2,7">N</letter> <letter id="2,8">I</letter> <letter id="2,9">C</letter> <letter id="2,10">E</letter> <blank></blank> <letter id="2,12">I</letter> <letter id="2,13">L</letter> <letter id="2,14">W</letter> <letter id="2,15">U</letter> <letter id="3,1">G</letter> <letter id="3,2">O</letter> <letter id="3,3">L</letter> <letter id="3,4">D</letter> <letter id="3,5">D</letter> <letter id="3,6">I</letter> <letter id="3,7">G</letter> <letter id="3,8">G</letter> <letter id="3,9">E</letter> <letter id="3,10">R</letter> <blank></blank> <letter id="3,12">P</letter> <letter id="3,13">I</letter> <letter id="3,14">E</letter> <letter id="3,15">R</letter> <letter id="4,1">S</letter> <letter id="4,2">P</letter> <letter id="4,3">L</letter> <letter id="4,4">E</letter> <letter id="4,5">E</letter> 
<letter id="4,6">N</letter> <blank></blank> <letter id="4,8">N</letter> <letter id="4,9">A</letter> <letter id="4,10">V</letter> <letter id="4,11">Y</letter> <blank></blank> <letter id="4,13">W</letter> <letter id="4,14">E</letter> <letter id="4,15">E</letter> <blank></blank> <blank></blank> <blank></blank> <letter id="5,4">S</letter> <letter id="5,5">A</letter> <letter id="5,6">R</letter> <letter id="5,7">G</letter> <blank></blank> <letter id="5,9">N</letter> <letter id="5,10">E</letter> <letter id="5,11">W</letter> <letter id="5,12">M</letter> <letter id="5,13">A</letter> <letter id="5,14">T</letter> <letter id="5,15">H</letter> <letter id="6,1">C</letter> <letter id="6,2">O</letter> <letter id="6,3">A</letter> <letter id="6,4">T</letter> <letter id="6,5">R</letter> <letter id="6,6">O</letter> <letter id="6,7">O</letter> <letter id="6,8">M</letter> <blank></blank> <letter id="6,10">S</letter> <letter id="6,11">C</letter> <letter id="6,12">A</letter> <letter id="6,13">L</letter> <letter id="6,14">I</letter> <letter id="6,15">A</letter> <letter id="7,1">O</letter> <letter id="7,2">D</letter> <letter id="7,3">D</letter> <blank></blank> <letter id="7,5">M</letter> <letter id="7,6">A</letter> <letter id="7,7">L</letter> <letter id="7,8">I</letter> <letter id="7,9">C</letter> <blank></blank> <letter id="7,11">A</letter> <letter id="7,12">R</letter> <letter id="7,13">L</letter> <letter id="7,14">E</letter> <letter id="7,15">N</letter> <letter id="8,1">M</letter> <letter id="8,2">E</letter> <letter id="8,3">A</letter> <letter id="8,4">T</letter> <blank></blank> <letter id="8,6">D</letter> <letter id="8,7">E</letter> <letter id="8,8">M</letter> <letter id="8,9">O</letter> <letter id="8,10">N</letter> <blank></blank> <letter id="8,12">L</letter> <letter id="8,13">A</letter> <letter id="8,14">P</letter> <letter id="8,15">D</letter> <letter id="9,1">P</letter> <letter id="9,2">R</letter> <letter id="9,3">M</letter> <letter id="9,4">E</letter> <letter id="9,5">N</letter> 
<blank></blank> <letter id="9,7">M</letter> <letter id="9,8">E</letter> <letter id="9,9">D</letter> <letter id="9,10">O</letter> <letter id="9,11">C</letter> <blank></blank> <letter id="9,13">C</letter> <letter id="9,14">I</letter> <letter id="9,15">E</letter> <letter id="10,1">U</letter> <letter id="10,2">N</letter> <letter id="10,3">S</letter> <letter id="10,4">E</letter> <letter id="10,5">A</letter> <letter id="10,6">T</letter> <blank></blank> <letter id="10,8">D</letter> <letter id="10,9">E</letter> <letter id="10,10">T</letter> <letter id="10,11">A</letter> <letter id="10,12">C</letter> <letter id="10,13">H</letter> <letter id="10,14">E</letter> <letter id="10,15">D</letter> <letter id="11,1">L</letter> <letter id="11,2">E</letter> <letter id="11,3">A</letter> <letter id="11,4">N</letter> <letter id="11,5">T</letter> <letter id="11,6">O</letter> <letter id="11,7">S</letter> <blank></blank> <letter id="11,9">D</letter> <letter id="11,10">A</letter> <letter id="11,11">T</letter> <letter id="11,12">A</letter> <blank></blank> <blank></blank> <blank></blank> <letter id="12,1">S</letter> <letter id="12,2">I</letter> <letter id="12,3">P</letter> <blank></blank> <letter id="12,5">O</letter> <letter id="12,6">M</letter> <letter id="12,7">N</letter> <letter id="12,8">I</letter> <blank></blank> <letter id="12,10">B</letter> <letter id="12,11">A</letter> <letter id="12,12">R</letter> <letter id="12,13">T</letter> <letter id="12,14">A</letter> <letter id="12,15">B</letter> <letter id="13,1">I</letter> <letter id="13,2">S</letter> <letter id="13,3">P</letter> <letter id="13,4">S</letter> <blank></blank> <letter id="13,6">C</letter> <letter id="13,7">A</letter> <letter id="13,8">N</letter> <letter id="13,9">N</letter> <letter id="13,10">E</letter> <letter id="13,11">L</letter> <letter id="13,12">L</letter> <letter id="13,13">O</letter> <letter id="13,14">N</letter> <letter id="13,15">I</letter> <letter id="14,1">O</letter> <letter id="14,2">S</letter> <letter 
id="14,3">L</letter> <letter id="14,4">O</letter> <blank></blank> <letter id="14,6">A</letter> <letter id="14,7">T</letter> <letter id="14,8">G</letter> <letter id="14,9">U</letter> <letter id="14,10">N</letter> <letter id="14,11">P</letter> <letter id="14,12">O</letter> <letter id="14,13">I</letter> <letter id="14,14">N</letter> <letter id="14,15">T</letter> <letter id="15,1">N</letter> <letter id="15,2">E</letter> <letter id="15,3">E</letter> <letter id="15,4">D</letter> <blank></blank> <letter id="15,6">T</letter> <letter id="15,7">H</letter> <letter id="15,8">E</letter> <letter id="15,9">B</letter> <letter id="15,10">E</letter> <letter id="15,11">A</letter> <letter id="15,12">T</letter> <letter id="15,13">L</letter> <letter id="15,14">E</letter> <letter id="15,15">S</letter> </grid> <clues> <across cellid="1,1">"The Verdict" actor</across> <across cellid="1,12">1979 Nastassja Kinski role</across> <across cellid="2,1">Boardwalk buy</across> <across cellid="2,12">Labor org. born on the Pacific coast in the late '30s</across> <across cellid="3,1">Catherine Zeta-Jones's character in "Intolerable Cruelty," e.g.</across> <across cellid="3,12">Walk on water?</across> <across cellid="4,1">Lymphocyte producer</across> <across cellid="4,8">Blue hue</across> <across cellid="4,13">Barely perceptible</across> <across cellid="5,4">Marionette maker Tony</across> <across cellid="5,9">It was based on set theory</across> <across cellid="6,1">You can check your hang-ups here</across> <across cellid="6,10">Reagan Court appointee</across> <across cellid="7,1">Matchless</across> <across cellid="7,5">Aging acid found in fruit</across> <across cellid="7,11">Composer of "The Wizard of Oz"</across> <across cellid="8,1">Charcuterie offering</across> <across cellid="8,6">Speedy one?</across> <across cellid="8,12">"Rush Hour" org.</across> <across cellid="9,1">Flacks</across> <across cellid="9,7">Red Bordeaux</across> <across cellid="9,13">Cannes concern (abbr.)</across> <across 
cellid="10,1">Topple</across> <across cellid="10,8">Icy</across> <across cellid="11,1">Sheds</across> <across cellid="11,9">Apple cookie, e.g.</across> <across cellid="12,1">Small swallow</across> <across cellid="12,5">All at first?</across> <across cellid="12,10">Rounds keeper?</across> <across cellid="13,1">Gatekeepers with connections (abbr.)</across> <across cellid="13,6">Tubular invention of the late teens or early twenties</across> <across cellid="14,1">Capital at the center of Czechoslovakia?</across> <across cellid="14,6">Bad way to be married</across> <across cellid="15,1">Call for</across> <across cellid="15,6">'60s invaders</across> <down cellid="1,1">Feathered fishing hooks</down> <down cellid="1,2">Straddling</down> <down cellid="1,3">Oniomaniac's mecca</down> <down cellid="1,4">First to arrive?</down> <down cellid="1,5">Pistol or saber</down> <down cellid="1,6">Artery</down> <down cellid="1,7">Emma's "Sense and Sensibility" director</down> <down cellid="1,8">Ratify</down> <down cellid="1,9">Key location?</down> <down cellid="1,10">Butterflies</down> <down cellid="1,12">Begin to upset</down> <down cellid="1,13">"Baby Doll" was his film debut</down> <down cellid="1,14">Sugar</down> <down cellid="1,15">Displaying the skill and experience of an expert</down> <down cellid="4,11">Blue Triangle gp.</down> <down cellid="5,7">Automaton</down> <down cellid="5,12">Earthy deposit of clay and calcium carbonate</down> <down cellid="6,1">It's usually irresistible</down> <down cellid="6,2">___ Line (post-WWII German-Polish border)</down> <down cellid="6,3">It's more prominent in men than in women</down> <down cellid="6,8">Went without saying?</down> <down cellid="7,9">In C, perhaps</down> <down cellid="8,4">Nancy Drew or King Tut</down> <down cellid="8,10">"Read this!"</down> <down cellid="9,5">Partnership for Peace gp.</down> <down cellid="9,11">Indian bean</down> <down cellid="10,6">F-14 fighter</down> <down cellid="10,12">Where to find both cream puffs and 
lemons</down> <down cellid="11,7">Scythe handle</down> <down cellid="12,8">"Where's Daddy?" dramatist</down> <down cellid="12,13">Drudge or trudge</down> <down cellid="12,14">Archer without a quiver?</down> <down cellid="12,15">They're straight from the horse's mouth</down> <down cellid="13,4">Intentional grounding?</down> <down cellid="13,9">Heart of the matter</down> </clues> </american> </crossword> </puzzle>
schemas
outline in xml
Outline of the XML structure in XML. The example crossword XML above follows this form.
    <?xml version="1.0" encoding="utf-8"?>
    <puzzle>
      <crossword language="en">
        <metadata>
          <title>"Title of Crossword"</title>
          <date>month DD, YYYY</date>
          <creator>By John Smith</creator>
          <rights>Copyright, ...</rights>
          <publisher>Houston Chronical</publisher>
          <identifier>http://www.example.com/puzzles/crossword.puz</identifier>
          <description>Puzzle of type Crossword and style American in language English. Format .xml following schema http://www.kooonts.com/some/dir/crossword.</description>
        </metadata>
        <american>
          <grid rows="15" columns="15">
            <letter id="1,1">W</letter>
            <letter id="1,2">O</letter>
            <letter id="1,3">R</letter>
            <letter id="1,4">D</letter>
            <letter id="1,5">A</letter>
            <letter id="1,6">C</letter>
            <letter id="1,7">R</letter>
            <letter id="1,8">O</letter>
            <letter id="1,9">S</letter>
            <letter id="1,10">S</letter>
            <blank></blank>
            <letter id="1,12">W</letter>
            ...
          </grid>
          <clues>
            <across cellid="1,1">Clue text</across>
            <across cellid="1,12">Clue text</across>
            <across cellid="2,1">Clue text</across>
            ...
            <down cellid="1,1">Clue text</down>
            <down cellid="1,2">Clue text</down>
            <down cellid="1,3">Clue text</down>
            ...
          </clues>
        </american>
      </crossword>
    </puzzle>
A puzzle can be of type 'crossword'; its language is specified in the language attribute, with values as defined for XML Language Identification (the xml:lang tag). A crossword has 'metadata' and a 'style'. The metadata can contain various fields and must have a title and a creator. The 'style' contains the grid and clue information. A style can be 'American', 'French', and so on.
The letter and blank cells appear in the order of the cells in the puzzle, starting at the top row and going left to right. The only remaining information is which clue, in which direction, goes with which cell. Each letter cell is assigned an id (the schemas below require one on every letter), and each clue references the id of the first letter of its word. So the letter at position row=1, column=1 could have id="foo", and the corresponding across and down clues would then have cellid="foo". In practice, for human clarity in the XML, the characters representing the row,column position are used (id="1,12" is the twelfth cell of the first row), but they have no meaning other than matching the letter cell with its across and down clues: letter id="1,1" matches clue cellid="1,1".
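The matching of letter ids to clue cellids can be sketched with Python's standard xml.etree.ElementTree module. The tiny 2x2 puzzle in `SAMPLE`, its clue texts, and the function name `link_clues` are made up for this illustration; only the element and attribute names come from the outline above.

```python
import xml.etree.ElementTree as ET

# Hypothetical 2x2 puzzle following the outline above.
SAMPLE = """
<puzzle>
  <crossword language="en">
    <metadata><title>Tiny</title><creator>Nobody</creator></metadata>
    <american>
      <grid rows="2" columns="2">
        <letter id="1,1">A</letter><letter id="1,2">T</letter>
        <blank></blank><letter id="2,2">O</letter>
      </grid>
      <clues>
        <across cellid="1,1">Hypothetical across clue</across>
        <down cellid="1,2">Hypothetical down clue</down>
      </clues>
    </american>
  </crossword>
</puzzle>
"""


def link_clues(xml_text):
    """Return (direction, cellid, first letter, clue text) for every clue."""
    root = ET.fromstring(xml_text)
    american = root.find("./crossword/american")

    # Map each letter id to its letter; the id is treated as an opaque key.
    letters = {el.get("id"): el.text for el in american.find("grid").findall("letter")}

    linked = []
    for clue in american.find("clues"):
        cellid = clue.get("cellid")
        if cellid not in letters:
            raise ValueError(f"clue refers to unknown cell id {cellid!r}")
        linked.append((clue.tag, cellid, letters[cellid], clue.text))
    return linked
```

Because the ids are only compared for equality, the same code would work if the ids were "foo" and "bar" instead of row,column pairs.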
When data is entered by humans for humans, the same data may appear in multiple places to help the human keep track of what goes where. This can lead to problems if the copies conflict. An XML representation of a crossword, while still human-readable, can avoid these errors by keeping each piece of information in one place, on the assumption that some software will generate a playable puzzle from the XML file. This is why, for example, the clue numbers are not in the XML: they are generated while building the puzzle.
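Generating the clue numbers while building the puzzle could look like the following Python sketch. It assumes the usual American-style convention: cells are scanned row by row, and a letter cell gets the next number when it starts an across or a down word of at least two letters. The grid is given as a list of strings with '.' for blank cells, and the function name `number_grid` is made up for this example; this is not the CrosswordPlayer code.

```python
def number_grid(rows):
    """Assign clue numbers to an American-style grid.

    rows is a list of equal-length strings; '.' marks a blank cell.
    Returns (numbers, across, down): numbers maps "row,column" ids to clue
    numbers; across and down list (number, id) pairs in clue order.
    """
    height, width = len(rows), len(rows[0])

    def is_letter(r, c):
        return 0 <= r < height and 0 <= c < width and rows[r][c] != "."

    numbers, across, down = {}, [], []
    next_number = 1
    for r in range(height):
        for c in range(width):
            if not is_letter(r, c):
                continue
            # A word starts where the previous cell is blank or off the grid
            # and the following cell is a letter.
            starts_across = not is_letter(r, c - 1) and is_letter(r, c + 1)
            starts_down = not is_letter(r - 1, c) and is_letter(r + 1, c)
            if starts_across or starts_down:
                cell_id = f"{r + 1},{c + 1}"
                numbers[cell_id] = next_number
                if starts_across:
                    across.append((next_number, cell_id))
                if starts_down:
                    down.append((next_number, cell_id))
                next_number += 1
    return numbers, across, down
```

The ids produced here use the same row,column form as the id and cellid attributes in the example, so the numbers can be attached to the clues by a simple lookup.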
While the XML outline shows the general structure, it does not describe all of the information necessary for a schema. For this a schema language for XML is helpful. RELAX NG is a schema language for XML with an XML syntax (.rng) and a compact non-XML syntax (.rnc) that can be used to represent schemas. The W3C schema language is XML Schema, written as an XML Schema Document (.xsd).
.rnc - RELAX NG Compact Syntax
    default namespace = ""

    start =
      element puzzle {
        element crossword {
          attribute language { xsd:language },
          element metadata {
            element title { text },
            # xsd:date expects the ISO 8601 form YYYY-MM-DD
            element date { xsd:date },
            element creator { text },
            element rights { text }?,
            element publisher { text }?,
            element identifier { xsd:anyURI }?,
            element description { text }?
          }+,
          element american {
            element grid {
              attribute columns { xsd:positiveInteger },
              attribute rows { xsd:positiveInteger },
              (element blank { empty }
               | element letter {
                   attribute id { text },
                   xsd:string { minLength = "1" maxLength = "1" }
                 })+
            },
            element clues {
              element across { attribute cellid { text }, text }+,
              element down { attribute cellid { text }, text }+
            }
          }
        }
      }
.xsd - XML Schema Document
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
      <xs:element name="puzzle">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="crossword"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="crossword">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="metadata"/>
            <xs:element ref="american"/>
          </xs:sequence>
          <xs:attribute name="language" use="required" type="xs:NCName"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="metadata">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="title"/>
            <xs:element ref="date"/>
            <xs:element ref="creator"/>
            <xs:element ref="rights"/>
            <xs:element ref="publisher"/>
            <xs:element ref="identifier"/>
            <xs:element ref="description"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="date" type="xs:string"/>
      <xs:element name="creator" type="xs:string"/>
      <xs:element name="rights" type="xs:string"/>
      <xs:element name="publisher" type="xs:string"/>
      <xs:element name="identifier" type="xs:anyURI"/>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="american">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="grid"/>
            <xs:element ref="clues"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="grid">
        <xs:complexType>
          <xs:choice maxOccurs="unbounded">
            <xs:element ref="blank"/>
            <xs:element ref="letter"/>
          </xs:choice>
          <xs:attribute name="columns" use="required" type="xs:integer"/>
          <xs:attribute name="rows" use="required" type="xs:integer"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="blank">
        <xs:complexType/>
      </xs:element>
      <xs:element name="letter">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:NCName">
              <xs:attribute name="id" use="required"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="clues">
        <xs:complexType>
          <xs:sequence>
            <xs:element maxOccurs="unbounded" ref="across"/>
            <xs:element maxOccurs="unbounded" ref="down"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="across">
        <xs:complexType mixed="true">
          <xs:attribute name="cellid" use="required"/>
        </xs:complexType>
      </xs:element>
      <xs:element name="down">
        <xs:complexType mixed="true">
          <xs:attribute name="cellid" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:schema>
.dtd
    <?xml encoding="UTF-8"?>
    <!ELEMENT puzzle (crossword)>
    <!ATTLIST puzzle xmlns CDATA #FIXED ''>
    <!ELEMENT crossword (metadata,american)>
    <!ATTLIST crossword xmlns CDATA #FIXED ''
                        language NMTOKEN #REQUIRED>
    <!ELEMENT metadata (title,date,creator,rights,publisher,identifier,description)>
    <!ATTLIST metadata xmlns CDATA #FIXED ''>
    <!ELEMENT american (grid,clues)>
    <!ATTLIST american xmlns CDATA #FIXED ''>
    <!ELEMENT title (#PCDATA)>
    <!ATTLIST title xmlns CDATA #FIXED ''>
    <!ELEMENT date (#PCDATA)>
    <!ATTLIST date xmlns CDATA #FIXED ''>
    <!ELEMENT creator (#PCDATA)>
    <!ATTLIST creator xmlns CDATA #FIXED ''>
    <!ELEMENT rights (#PCDATA)>
    <!ATTLIST rights xmlns CDATA #FIXED ''>
    <!ELEMENT publisher (#PCDATA)>
    <!ATTLIST publisher xmlns CDATA #FIXED ''>
    <!ELEMENT identifier (#PCDATA)>
    <!ATTLIST identifier xmlns CDATA #FIXED ''>
    <!ELEMENT description (#PCDATA)>
    <!ATTLIST description xmlns CDATA #FIXED ''>
    <!ELEMENT grid (blank|letter)+>
    <!ATTLIST grid xmlns CDATA #FIXED ''
                   columns CDATA #REQUIRED
                   rows CDATA #REQUIRED>
    <!ELEMENT clues (across+,down+)>
    <!ATTLIST clues xmlns CDATA #FIXED ''>
    <!ELEMENT blank EMPTY>
    <!ATTLIST blank xmlns CDATA #FIXED ''>
    <!ELEMENT letter (#PCDATA)>
    <!ATTLIST letter xmlns CDATA #FIXED ''
                     id CDATA #REQUIRED>
    <!ELEMENT across (#PCDATA)>
    <!ATTLIST across xmlns CDATA #FIXED ''
                     cellid CDATA #REQUIRED>
    <!ELEMENT down (#PCDATA)>
    <!ATTLIST down xmlns CDATA #FIXED ''
                   cellid CDATA #REQUIRED>
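Some rules are not expressed by the three schemas above: that the number of cells equals rows times columns, and that every clue's cellid matches the id of some letter. A puzzle player could check these after parsing. The following Python sketch uses the standard xml.etree.ElementTree module; the function name `check_puzzle`, the `GOOD` sample document, and the wording of the messages are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical 2x2 puzzle used to exercise the checks.
GOOD = """
<puzzle>
  <crossword language="en">
    <american>
      <grid rows="2" columns="2">
        <letter id="1,1">A</letter><letter id="1,2">T</letter>
        <blank></blank><letter id="2,2">O</letter>
      </grid>
      <clues>
        <across cellid="1,1">Hypothetical across clue</across>
        <down cellid="1,2">Hypothetical down clue</down>
      </clues>
    </american>
  </crossword>
</puzzle>
"""


def check_puzzle(xml_text):
    """Return a list of problems found; an empty list means none."""
    root = ET.fromstring(xml_text)
    american = root.find("./crossword/american")
    grid = american.find("grid")
    columns, rows = int(grid.get("columns")), int(grid.get("rows"))
    cells = list(grid)  # letter and blank elements, in document order

    problems = []
    if len(cells) != columns * rows:
        problems.append(f"expected {columns * rows} cells, found {len(cells)}")

    ids = set()
    for cell in cells:
        if cell.tag != "letter":
            continue
        if len(cell.text or "") != 1:
            problems.append(f"letter {cell.get('id')} must hold exactly one character")
        ids.add(cell.get("id"))

    for clue in american.find("clues"):
        if clue.get("cellid") not in ids:
            problems.append(f"{clue.tag} clue points at unknown cell {clue.get('cellid')}")
    return problems
```

Calling `check_puzzle(GOOD)` returns an empty list; removing the blank cell or changing a cellid makes it report one problem each.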
references
normative
- [RDF]
- Resource Description Framework (RDF)
- [RDFS]
- RDF Schema (RDFS)
- [RELAX NG Compact Syntax Tutorial]
- RELAX NG Compact Syntax Tutorial, OASIS Working Draft, 26 March 2003
- [RNG]
- RELAX NG Specification, OASIS Committee Specification, 3 December 2001. Definitive specification for RELAX NG using the XML syntax.
- [RNC]
- RELAX NG Compact Syntax, OASIS Committee Specification, 21 November 2002. Definitive specification for the compact syntax in terms of the XML syntax.
- [Unicode]
- The Unicode Consortium. The Unicode Standard, Version 3.2 or later
- [XML 1.0]
- Extensible Markup Language (XML) 1.0
- [XML:LANG]
- XML Language Identification, xml:lang tag.
- [XSD]
- XML Schema Document (.xsd)
informative
- [Crossword Puzzles from around the World]
- Crossword Puzzles from around the World. Adapted From: 101 Crossword Puzzles For Dummies, Volume 1 by Dummies.com.
- [CMO]
- Crossword Maestro software has .cmo file format.
- [CWML]
- CRUCIVERB.COM Crossword Constructors Community Center. Mailing list for a "CrossWord Markup Language"
- [Dublin Core]
- Dublin Core Metadata Initiative
- [PUZ]
- Across Lite, from Literate Software Systems, has .puz format.
- [SPECIFICATION]
- List of links to CROSSWORD SPECIFICATION SHEETS for various publications. CRUCIVERB.COM Crossword Constructors Community Center
- [XML]
- W3C Extensible Markup Language (XML)
- [XML Schema]
- W3C XML Schema
- [XWD]
- Crossdown has .xwd format.