XML Tutorial/XML Schema/pattern

A character class expression is simply a character group, enclosed in square brackets

   <source lang="xml">

<!-- For example, to allow any single upper-case ASCII letter: -->
<xs:pattern value="[A-Z]" />

<!-- This uses the "s-e range" form: a contiguous range of character values
     that begins with the min value and runs up to and including the max value.
     Schematically (min and max stand for the two boundary characters): -->
<xs:pattern value="[min-max]" /></source>


Any ASCII letter: adding a second character range to the character group expression

   <source lang="xml">

<!-- Any ASCII letter, upper or lower case: -->
<xs:pattern value="[A-Za-z]" />

<!-- To invert this positive character group into a negative character group,
     precede the character ranges with a caret (^): -->
<xs:pattern value="[^A-Za-z]" /></source>
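
To show where such a facet lives, here is a minimal sketch of a complete simple type that combines a positive and a negative character group. The type name letterThenOtherType is hypothetical, and the xs prefix is assumed to be bound to http://www.w3.org/2001/XMLSchema. A pattern must match the whole value, so no anchors are needed.

   <source lang="xml">

<xs:simpleType name="letterThenOtherType">
  <xs:restriction base="xs:string">
    <!-- one upper-case ASCII letter followed by one character that is
         not an ASCII letter: "A1" and "Z-" match, "AB" and "a1" do not -->
    <xs:pattern value="[A-Z][^A-Za-z]" />
  </xs:restriction>
</xs:simpleType></source>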


Any single normal character will match only that character

   <source lang="xml">

<!-- For example, only a single "A" character matches the following regular expression: -->
<xs:pattern value="A" /></source>


A phone number

   <source lang="xml">

<xsd:simpleType name="phoneType">

 <xsd:restriction base="xsd:string">
  <xsd:pattern value="[0-9]{3}-[0-9]{7}"/>
 </xsd:restriction>

</xsd:simpleType></source>


Character classes

   <source lang="xml">

\s   Whitespace (space, tab, newline, and carriage return).
\S   Characters that are not whitespace.
\d   Digits.
\D   Characters that are not digits.
\w   Extended "word" characters.
\W   Nonword characters.
\i   XML 1.0 initial name characters.
\I   Characters that may not be used as an XML 1.0 initial name character.
\c   XML 1.0 name characters.
\C   Characters that may not be used in an XML 1.0 name.</source>
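
As a rough illustration of these escapes in use, the sketch below defines two simple types. Both type names are hypothetical, and the xs prefix is assumed to be bound to http://www.w3.org/2001/XMLSchema.

   <source lang="xml">

<xs:simpleType name="nameLikeType">
  <xs:restriction base="xs:string">
    <!-- one initial name character followed by any number of name characters -->
    <xs:pattern value="\i\c*" />
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="countAndUnitType">
  <xs:restriction base="xs:string">
    <!-- one or more digits, one whitespace character, one or more word
         characters, for example "42 units" -->
    <xs:pattern value="\d+\s\w+" />
  </xs:restriction>
</xs:simpleType></source>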


Define a pattern that can be used for zip codes

   <source lang="xml">

<xsd:simpleType name="zipType">

 <xsd:restriction base="xsd:string">
  <xsd:pattern value="[0-9]{5}"/>
 </xsd:restriction>

</xsd:simpleType></source>


Getting rid of leading zeros

   <source lang="xml">

<?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"

 targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
 elementFormDefault="qualified">
 <xs:simpleType name="myType">
   <xs:restriction base="xs:integer">
     <xs:pattern value="[+-]?([1-9][0-9]*|0)" />
   </xs:restriction>
 </xs:simpleType>

</xs:schema></source>


List of atoms that match a single character

   <source lang="xml">

\n   New line (can also be written as &#x0A;, since we are in an XML document).
\r   Carriage return (can also be written as &#x0D;).
\t   Tabulation (can also be written as &#x09;).
\\   Character "\"
\|   Character "|"
\.   Character "."
\-   Character "-"
\^   Character "^"
\?   Character "?"
\*   Character "*"
\+   Character "+"
\{   Character "{"
\}   Character "}"
\(   Character "("
\)   Character ")"
\[   Character "["
\]   Character "]"</source>
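
The effect of escaping is easiest to see with the dot. The following sketch uses a hypothetical type name, amountType, and assumes the xs prefix is bound to http://www.w3.org/2001/XMLSchema.

   <source lang="xml">

<xs:simpleType name="amountType">
  <xs:restriction base="xs:string">
    <!-- \. is a literal dot, so "12.50" matches and "12x50" does not;
         an unescaped . would accept any character in that position -->
    <xs:pattern value="[0-9]+\.[0-9]{2}" />
  </xs:restriction>
</xs:simpleType></source>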


Merge our three patterns into one

   <source lang="xml">

<?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:simpleType name="myByte"> 
     <xs:restriction base="xs:byte"> 
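       <!-- 1?5? matches "", "1", "5", and "15"; the empty string is still
            rejected because it is not a valid xs:byte. -->
       <xs:pattern value="1?5?"/>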
     </xs:restriction> 
   </xs:simpleType> 

</xs:schema></source>
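
For comparison, the same set of values can be spelled out one pattern at a time. This sketch assumes the three patterns being merged were 1, 5, and 15, which the text above does not show; the type name myByteExplicit is hypothetical. Pattern facets declared in the same restriction are alternatives: a value is accepted if it matches any one of them.

   <source lang="xml">

<xs:simpleType name="myByteExplicit">
  <xs:restriction base="xs:byte">
    <!-- three alternative patterns; equivalent to the single pattern 1|5|15 -->
    <xs:pattern value="1"/>
    <xs:pattern value="5"/>
    <xs:pattern value="15"/>
  </xs:restriction>
</xs:simpleType></source>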


Meta Characters

   <source lang="xml">

Meta character   Use/Meaning
.                Match any character except end-of-line (#x0D and/or #x0A); same as [^\n\r]
\                Begin escape sequence
?                Zero or one occurrences
*                Zero or more occurrences
+                One or more occurrences
{ }              Enclose a numeric quantifier (x{2,5}) or a category name (\p{Lu})
( )              Enclose a regular expression (may be the atom of another regex)
[ ]              Enclose a character class expression</source>


pattern: constrains the lexical space to literals that match a defined pattern

   <source lang="xml">

<xsd:simpleType name="isbnType">

 <xsd:restriction base="xsd:string">
  <xsd:pattern value="[0-9]{10}"/>
 </xsd:restriction>

</xsd:simpleType></source>


Pattern for a duration

   <source lang="xml">

<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"

 targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
 elementFormDefault="qualified">
 <xsd:element name="gestation">
   <xsd:simpleType>
     <!-- P\d+D (for example P280D) is a duration literal, so the base type
          must be xsd:duration; no xsd:time value could match it. -->
     <xsd:restriction base="xsd:duration">
       <xsd:pattern value="P\d+D" />
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:element>

</xsd:schema></source>


Pattern syntax

   <source lang="xml">

.          any character (except end-of-line)
\d         any digit; \D for any non-digit
\s         any white space (space, tab, newline, and return); \S for any character that is not white space
x*         zero or more x's; (xy)* for zero or more xy's
x?         one or zero x's; (xy)? for one or no xy's
x+         one or more x's; (xy)+ for one or more xy's
[abc]      one of a group of values (a, b, or c)
[0-9]      the range of values from 0 to 9
A|B        A or B in the content
x{5}       exactly 5 x's (in a row)
x{5,}      at least 5 x's (in a row)
x{5,8}     at least 5 and at most 8 x's (in a row)
(xyz){2}   exactly two xyz's (in a row)</source>
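
Several of these constructs can be combined in one pattern. The following is a minimal sketch; the type name productCodeType and the code format itself are made up for illustration, and the xs prefix is assumed to be bound to http://www.w3.org/2001/XMLSchema.

   <source lang="xml">

<xs:simpleType name="productCodeType">
  <xs:restriction base="xs:string">
    <!-- two upper-case letters, a hyphen, 4 to 6 digits, then an optional
         "-X" or "-Y" suffix: "AB-1234" and "AB-123456-X" match,
         "AB-123" and "ab-1234" do not -->
    <xs:pattern value="[A-Z]{2}-\d{4,6}(-(X|Y))?" />
  </xs:restriction>
</xs:simpleType></source>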


pattern: USA_SSN datatype

   <source lang="xml">

File: Schema.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"

 targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
 elementFormDefault="qualified">
 <xs:simpleType name="USA_SSN">
   <xs:restriction base="xs:string">
     <xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}" />
   </xs:restriction>
 </xs:simpleType>

</xs:schema></source>


Special regex characters (-, [ and ]) cannot be used as-is for the single normal character form of a character range; escape them with a backslash.

   <source lang="xml">

<!-- For example, this matches either the opening or the closing square bracket: -->
<xs:pattern value="[\[\]]" /></source>


Specifying a Pattern for a Simple Type

   <source lang="xml">

<?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"

 targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
 elementFormDefault="qualified">
 <xsd:element name="invoice_number">
   <xsd:simpleType>
     <xsd:restriction base="xsd:string">
       <xsd:pattern value="INV #99\d{3}" />
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:element>

</xsd:schema>

File: Data.xml
<?xml version="1.0"?>
<invoice_number xmlns="http://www.wbex.ru">INV #99426</invoice_number></source>


These three characters should be used with caution:

   <source lang="xml">

Hybrid   Use/Meaning
^        Begin a negative character group
-        1) Begin a character class subtraction
         2) Separate the minimum/maximum values that define a range of character values
,        Separate the minimum/maximum values for the number of occurrences of an atom</source>
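
The less familiar of these uses is character class subtraction. The sketch below shows it next to a min/max quantifier; both type names are hypothetical, and the xs prefix is assumed to be bound to http://www.w3.org/2001/XMLSchema.

   <source lang="xml">

<xs:simpleType name="consonantsType">
  <xs:restriction base="xs:string">
    <!-- [a-z-[aeiou]]: lower-case ASCII letters minus the vowels,
         so "xyz" matches and "xyza" does not -->
    <xs:pattern value="[a-z-[aeiou]]+" />
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="shortNonDigitRunType">
  <xs:restriction base="xs:string">
    <!-- [^0-9]: negative group; {2,4}: the comma separates the minimum
         and maximum number of occurrences -->
    <xs:pattern value="[^0-9]{2,4}" />
  </xs:restriction>
</xs:simpleType></source>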


To match a string of any length (including the empty string) that is comprised exclusively of lower-case ASCII letters

   <source lang="xml">

<xs:pattern value="[a-z]*" />

<!-- Element content that matches the above pattern: -->
<example>qwertyuiop</example>

<!-- Structured numeric strings such as US Social Security Numbers (SSNs): -->
<xs:pattern value="\d{3}-\d{2}-\d{4}" />

<!-- \d also matches non-ASCII Unicode digits; to allow only the ten ASCII
     digits, use this character class expression instead: -->
<xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}" /></source>


Unicode character classes

   <source lang="xml">

Unicode Character Class   Includes
C    Other characters (non-letters, non-symbols, non-numbers, non-separators)
Cc   Control characters
Cf   Format characters
Cn   Unassigned code points
Co   Private use characters
L    Letters
Ll   Lowercase letters
Lm   Modifier letters
Lo   Other letters
Lt   Titlecase letters
Lu   Uppercase letters
M    All marks
Mc   Spacing combining marks
Me   Enclosing marks
Mn   Non-spacing marks
N    Numbers
Nd   Decimal digits
Nl   Number letters
No   Other numbers
P    Punctuation
Pc   Connector punctuation
Pd   Dashes
Pe   Closing punctuation
Pf   Final quotes (may behave like Ps or Pe)
Pi   Initial quotes (may behave like Ps or Pe)
Po   Other forms of punctuation
Ps   Opening punctuation
S    Symbols
Sc   Currency symbols
Sk   Modifier symbols
Sm   Mathematical symbols
So   Other symbols
Z    Separators
Zl   Line breaks
Zp   Paragraph breaks
Zs   Spaces</source>
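
These class names are used inside a category escape: \p{Name} matches any character in the class, and \P{Name} matches any character outside it. A minimal sketch follows; the type name capitalizedWordType is hypothetical, and the xs prefix is assumed to be bound to http://www.w3.org/2001/XMLSchema.

   <source lang="xml">

<xs:simpleType name="capitalizedWordType">
  <xs:restriction base="xs:string">
    <!-- one uppercase letter (Lu) followed by zero or more lowercase
         letters (Ll); not limited to ASCII -->
    <xs:pattern value="\p{Lu}\p{Ll}*" />
  </xs:restriction>
</xs:simpleType></source>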


Use quantifiers to limit the number of leading zeros

The following pattern allows at most two leading zeros:

   <source lang="xml">

<?xml version="1.0"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:simpleType name="myByte"> 
     <xs:restriction base="xs:byte"> 
       <xs:pattern value="0{0,2}1?5?"/> 
     </xs:restriction> 
   </xs:simpleType> 

</xs:schema></source>


User-defined character classes

   <source lang="xml">

[azertyuiop]   the letters on the first row of a French keyboard
[a-z]          all the characters between "a" and "z"
[^a-z]         all the characters that are not between "a" and "z"
[-^\\]         the characters "-", "^", and "\"
[-+]           a decimal sign</source>


You can use patterns to offer choices for an element's content.

   <source lang="xml">

File: Schema.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"

 targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
 elementFormDefault="qualified">
 <xsd:element name="language">
   <xsd:simpleType>
     <xsd:restriction base="xsd:string">
       <xsd:pattern value="English|Latin" />
     </xsd:restriction>
   </xsd:simpleType>
 </xsd:element>

</xsd:schema>

File: Data.xml
<?xml version="1.0"?>
<language xmlns="http://www.wbex.ru">English</language></source>