XML Tutorial/XML Schema/pattern

Material from Web эксперт
Revision as of 18:22, 25 May 2010

A character class expression is simply a character group, enclosed in square brackets

For example, if we wanted to allow any single upper case ASCII letter:
<xs:pattern value="[A-Z]" />

This uses the "s-e range" form: a contiguous range of character values that begins with the min value and runs up to and including the max value.
<xs:pattern value="[min-max]" />


Any ASCII letter: adding a second character range to the character group expression

<xs:pattern value="[A-Za-z]" />
We can invert this positive character group into a negative character group simply by preceding the character ranges with a caret (^) character:
 
<xs:pattern value="[^A-Za-z]" />
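As a minimal sketch of how the negative group behaves, the type below accepts exactly one character that is not an ASCII letter. The type name nonLetter is hypothetical and does not come from the original. A pattern facet applies to the whole value, so a two-character value is rejected even when neither character is a letter.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name, used only for illustration -->
  <xs:simpleType name="nonLetter">
    <xs:restriction base="xs:string">
      <!-- Exactly one character outside A-Z and a-z -->
      <xs:pattern value="[^A-Za-z]" />
    </xs:restriction>
  </xs:simpleType>
  <!-- Accepted: "7", "#", "_" -->
  <!-- Rejected: "a", "Z", "77" (two characters), "" (empty) -->
</xs:schema>
```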


Any single normal character will match only that character

For example, only a single "A" character can match the following regular expression:
<xs:pattern value="A" />


A phone number

<xsd:simpleType name="phoneType">
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="[0-9]{3}-[0-9]{7}"/>
  </xsd:restriction>
</xsd:simpleType>
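To show the type in use, here is a sketch that wraps phoneType in a complete schema and declares an element of that type. The element name phone and the sample numbers are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="phoneType">
    <xsd:restriction base="xsd:string">
      <!-- Three digits, a hyphen, then seven digits -->
      <xsd:pattern value="[0-9]{3}-[0-9]{7}"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Hypothetical element declaration that uses the type -->
  <xsd:element name="phone" type="phoneType"/>
  <!-- Valid instance:   <phone>555-1234567</phone> -->
  <!-- Invalid instance: <phone>5551234567</phone> (hyphen missing) -->
  <!-- Invalid instance: <phone>555-123-4567</phone> (second hyphen not allowed) -->
</xsd:schema>
```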


Character classes

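\s  White space: space, tab, newline, and carriage return.
\S  Characters that are not white space.
\d  Digits (any Unicode decimal digit, not only 0-9).
\D  Characters that are not digits.
\w  Extended "word" characters (any character that is not punctuation, a separator, or an "other" character).
\W  Nonword characters.
\i  XML 1.0 initial name characters.
\I  Characters that may not be used as an XML 1.0 initial name character.
\c  XML 1.0 name characters.
\C  Characters that may not be used in an XML 1.0 name.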
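The \i and \c classes are easiest to understand in combination: an initial name character followed by any number of name characters describes an XML name. The sketch below assumes a hypothetical type called xmlNameLike; the sample values are illustrative only.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name, used only for illustration -->
  <xs:simpleType name="xmlNameLike">
    <xs:restriction base="xs:string">
      <!-- One initial name character, then zero or more name characters -->
      <xs:pattern value="\i\c*" />
    </xs:restriction>
  </xs:simpleType>
  <!-- Accepted: "title", "_id", "xs:element" -->
  <!-- Rejected: "1st" (a digit cannot start a name), "two words" (space) -->
</xs:schema>
```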


Define a pattern that can be used for zip codes

<xsd:simpleType name="zipType">
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="[0-9]{5}"/>
  </xsd:restriction>
</xsd:simpleType>


Getting rid of leading zeros

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
  elementFormDefault="qualified">
  <xs:simpleType name="myType">
    <xs:restriction base="xs:integer">
      <xs:pattern value="[+-]?([1-9][0-9]*|0)" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
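The following sketch applies the same pattern to an element so that the effect on instance content is visible. The element name count is an assumption; the listed values follow from reading the pattern as an optional sign followed by either a single 0 or a number that starts with a non-zero digit.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
  elementFormDefault="qualified">
  <!-- Hypothetical element that reuses the pattern from myType -->
  <xs:element name="count">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:pattern value="[+-]?([1-9][0-9]*|0)" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- Valid content:   0, 42, +15, -7 -->
  <!-- Invalid content: 007, 00, +01 (leading zeros are rejected by the pattern) -->
</xs:schema>
```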


list of atoms that match a single character

\n   New line (can also be written as "&#x0A;" since we are in an XML document).
\r   Carriage return (can also be written as "&#x0D;").
\t   Tabulation (can also be written as "&#x09;").
\\   Character "\"  
\|   Character "|"  
\.   Character "."  
\-   Character "-"  
\^   Character "^"  
\?   Character "?"  
\*   Character "*"  
\+   Character "+"  
\{   Character "{"  
\}   Character "}"  
\(   Character "("  
\)   Character ")"  
\[   Character "["  
\]   Character "]"
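A short sketch of why these escapes matter: an unescaped dot matches any character, whereas \. matches only a literal dot. The type name versionNumber and the sample values are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name, used only for illustration -->
  <xs:simpleType name="versionNumber">
    <xs:restriction base="xs:string">
      <!-- Digits, a literal dot, digits -->
      <xs:pattern value="\d+\.\d+" />
    </xs:restriction>
  </xs:simpleType>
  <!-- Accepted: "1.0", "10.25" -->
  <!-- Rejected: "1x0", "1.", ".5" -->
  <!-- With an unescaped dot (\d+.\d+), "1x0" would also be accepted -->
</xs:schema>
```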


Merge our three patterns into one

<?xml version="1.0"?>  
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> 
    <xs:simpleType name="myByte"> 
      <xs:restriction base="xs:byte"> 
        <xs:pattern value="1?5?"/> 
      </xs:restriction> 
    </xs:simpleType> 
</xs:schema>
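The three separate patterns that the heading refers to are not shown here. As an assumption about what they looked like, the sketch below uses one pattern facet per permitted literal. Several pattern facets in the same restriction step are combined as alternatives, so for xs:byte this version should accept the same values (1, 5 and 15) as the merged 1?5? form. The type name myByteUnmerged is hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical, unmerged counterpart of myByte -->
  <xs:simpleType name="myByteUnmerged">
    <xs:restriction base="xs:byte">
      <!-- Sibling pattern facets act like branches of one regex: 1|5|15 -->
      <xs:pattern value="1"/>
      <xs:pattern value="5"/>
      <xs:pattern value="15"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Note that 1?5? on its own also matches the empty string, but an empty value is never a valid xs:byte, so the base type rules it out.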


Meta Characters

Meta character  Use/Meaning
.               Match any character except end-of-line (#x0D or #x0A); same as [^\n\r]
\               Begin escape sequence
?               Zero or one occurrences
*               Zero or more occurrences
+               One or more occurrences
|               Separate alternative branches
{ }             Enclose a numeric quantifier, or the name in a \p{...} category escape
( )             Enclose a regular expression (may be the atom of another regex)
[ ]             Enclose a character class expression


pattern: constrains the lexical space to literals that match a defined pattern

<xsd:simpleType name="isbnType">
  <xsd:restriction base="xsd:string">
   <xsd:pattern value="[0-9]{10}"/>
  </xsd:restriction>
</xsd:simpleType>
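Because the whole literal must match, XML Schema patterns need no ^ or $ anchors. The sketch below contrasts isbnType with a hypothetical variant, containsIsbnType (a name introduced only for this illustration), which uses .* on both sides to allow surrounding text.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- The entire value must consist of exactly ten digits -->
  <xsd:simpleType name="isbnType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]{10}"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Hypothetical variant: ten digits anywhere inside the value -->
  <xsd:simpleType name="containsIsbnType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value=".*[0-9]{10}.*"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- "0596002521"      is accepted by both types -->
  <!-- "ISBN 0596002521" is accepted only by containsIsbnType -->
  <!-- "059600252"       is rejected by both types (nine digits) -->
</xsd:schema>
```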


Pattern for a time period (duration)

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
  elementFormDefault="qualified">
  <xsd:element name="gestation">
    <xsd:simpleType>
      <!-- A whole number of days such as P280D; this lexical form belongs to xsd:duration -->
      <xsd:restriction base="xsd:duration">
        <xsd:pattern value="P\d+D" />
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>


Pattern syntax

.          for any character except newline and carriage return
\d         for any digit
\D         for any non-digit
\s         for any white space (space, tab, newline, and return)
\S         for any character that is not white space
x*         to have zero or more x's
(xy)*      to have zero or more xy's
x?         to have one or zero x's
(xy)?      to have one or no xy's
x+         to have one or more x's
(xy)+      to have one or more xy's
[abc]      to include one of a group of values (a, b, or c)
[0-9]      to include the range of values from 0 to 9
A|B        to have A or B in the content
x{5}       to have exactly 5 x's (in a row)
x{5,}      to have at least 5 x's (in a row)
x{5,8}     to have at least 5 and at most 8 x's (in a row)
(xyz){2}   to have exactly two xyz's (in a row)
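Several of these constructs can be combined in one pattern. The sketch below assumes a made-up product code format (two upper-case letters, a hyphen, four digits, and an optional lower-case suffix); the type name productCode and the format itself are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type and code format, used only for illustration -->
  <xs:simpleType name="productCode">
    <xs:restriction base="xs:string">
      <!-- [A-Z]{2}    exactly two upper-case letters
           \d{4}       exactly four digits
           (-[a-z]+)?  optional suffix: a hyphen and one or more lower-case letters -->
      <xs:pattern value="[A-Z]{2}-\d{4}(-[a-z]+)?" />
    </xs:restriction>
  </xs:simpleType>
  <!-- Accepted: "AB-1234", "AB-1234-red" -->
  <!-- Rejected: "A-1234", "AB-123", "AB-1234-RED" -->
</xs:schema>
```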


pattern: USA_SSN datatype

File: Schema.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
  elementFormDefault="qualified">
  <xs:simpleType name="USA_SSN">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>


The special regex characters -, [ and ] cannot be used unescaped as a single normal character in a character range; precede them with a backslash instead.

For example, we can match either the opening or closing square bracket with the following: 
<xs:pattern value="[\[\]]" />


Specifying a Pattern for a Simple Type

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
  elementFormDefault="qualified">
  <xsd:element name="invoice_number">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="INV #99\d{3}" />
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>

</xsd:schema>
File: Data.xml 
<?xml version="1.0"?>
<invoice_number xmlns="http://www.wbex.ru">INV #99426</invoice_number>


These three characters should be used with caution:

Hybrid  Use/Meaning
^       Begin a negative character group
-       1) Begin a character class subtraction
        2) Separate the minimum/maximum values that define a range of character values
,       Separate the minimum/maximum values for number of occurrences of an atom
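The sketch below shows the hyphen in both of its roles and the comma inside a quantifier. It relies on character class subtraction, where [a-z-[aeiou]] means "a to z, minus the vowels". The type name consonantRun is hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name, used only for illustration -->
  <xs:simpleType name="consonantRun">
    <xs:restriction base="xs:string">
      <!-- First hyphen: range a to z. Second hyphen: subtract [aeiou].
           The comma separates the minimum (2) and maximum (4) count. -->
      <xs:pattern value="[a-z-[aeiou]]{2,4}" />
    </xs:restriction>
  </xs:simpleType>
  <!-- Accepted: "st", "rhy", "xyz" -->
  <!-- Rejected: "b" (too short), "tea" (contains vowels), "rhythm" (too long) -->
</xs:schema>
```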


To match a string of any length (including the empty string) that consists only of lower-case ASCII letters

<xs:pattern value="[a-z]*" />

Below is an example of element content that matches the above pattern:
<example>qwertyuiop</example>

Describing structured numeric strings like US Social Security Numbers (SSNs): 
<xs:pattern value="\d{3}-\d{2}-\d{4}" />

Because \d matches any Unicode decimal digit, use this character class expression instead to allow only the ten ASCII digits:

<xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}" />


Unicode character classes

Unicode Character Class   Includes
C                         Other characters (non-letters, non-symbols, non-numbers, non-separators)
Cc                        Control characters
Cf                        Format characters
Cn                        Unassigned code points
Co                        Private use characters
L                         Letters
Ll                        Lowercase letters
Lm                        Modifier letters
Lo                        Other letters
Lt                        Titlecase letters
Lu                        Uppercase letters
M                         All marks
Mc                        Spacing combining marks
Me                        Enclosing marks
Mn                        Non-spacing marks
N                         Numbers
Nd                        Decimal digits
Nl                        Number letters
No                        Other numbers
P                         Punctuation
Pc                        Connector punctuation
Pd                        Dashes
Pe                        Closing punctuation
Pf                        Final quotes (may behave like Ps or Pe)
Pi                        Initial quotes (may behave like Ps or Pe)
Po                        Other forms of punctuation
Ps                        Opening punctuation
S                         Symbols
Sc                        Currency symbols
Sk                        Modifier symbols
Sm                        Mathematical symbols
So                        Other symbols
Z                         Separators
Zl                        Line breaks
Zp                        Paragraph breaks
Zs                        Spaces
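In a pattern, these class names are written inside a category escape: \p{Name} matches a character in the class, and \P{Name} matches a character outside it. The sketch below assumes a hypothetical type, capitalizedWord, that accepts one uppercase letter followed by lowercase letters in any script.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type name, used only for illustration -->
  <xs:simpleType name="capitalizedWord">
    <xs:restriction base="xs:string">
      <!-- \p{Lu}: one uppercase letter; \p{Ll}*: zero or more lowercase letters -->
      <xs:pattern value="\p{Lu}\p{Ll}*" />
    </xs:restriction>
  </xs:simpleType>
  <!-- Accepted: "Paris", "Élan" -->
  <!-- Rejected: "paris", "PARIS", "Paris1" -->
</xs:schema>
```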


Use quantifiers to limit the number of leading zeros

The following pattern allows at most two leading zeros (for example 15, 015 and 0015, but not 00015):
<?xml version="1.0"?>  
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> 
    <xs:simpleType name="myByte"> 
      <xs:restriction base="xs:byte"> 
        <xs:pattern value="0{0,2}1?5?"/> 
      </xs:restriction> 
    </xs:simpleType> 
</xs:schema>


User-defined character classes

[azertyuiop]   the letters on the first row of a French keyboard
[a-z]          all the characters between "a" and "z"
[^a-z]         all the characters that are not between "a" and "z"
[-^\\]         the characters "-", "^", and "\"
[-+]           a decimal sign ("-" or "+")
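As a sketch of two of these classes inside complete patterns, the schema below defines a word typed using only the first row of a French keyboard and a number with a mandatory sign. Both type names are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: one or more letters from the set azertyuiop -->
  <xs:simpleType name="firstRowWord">
    <xs:restriction base="xs:string">
      <xs:pattern value="[azertyuiop]+" />
    </xs:restriction>
  </xs:simpleType>
  <!-- Accepted: "type", "poterie"   Rejected: "qwerty" (q and w are not in the set) -->

  <!-- Hypothetical type: a required sign followed by ASCII digits -->
  <xs:simpleType name="signedDigits">
    <xs:restriction base="xs:string">
      <xs:pattern value="[-+][0-9]+" />
    </xs:restriction>
  </xs:simpleType>
  <!-- Accepted: "+15", "-7"   Rejected: "15" (sign missing) -->
</xs:schema>
```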


You can use patterns to offer choices for an element's content.

File: Schema.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru"
  elementFormDefault="qualified">
  <xsd:element name="language">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="English|Latin" />
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
 
File: Data.xml
<?xml version="1.0"?>
<language xmlns="http://www.wbex.ru">English</language>