XML Tutorial/XML Schema/pattern
Current revision as of 08:26, 26 May 2010
- 1 A character class expression is simply a character group, enclosed in square brackets
- 2 Any ASCII letter: adding a second character range to the character group expression
- 3 Any single normal character will match only that character
- 4 A phone number
- 5 Character classes
- 6 Define a pattern that can be used for zip codes
- 7 Getting rid of leading zeros
- 8 list of atoms that match a single character
- 9 Merge our three patterns into one
- 10 Meta Characters
- 11 pattern Constrains the lexical space to literals that must match a defined pattern
- 12 Pattern for time
- 13 Pattern syntax
- 14 pattern: USA_SSN datatype
- 15 Special regex characters (-[]) cannot be used for the single normal character form of the character range.
- 16 Specifying a Pattern for a Simple Type
- 17 These three characters should be used with caution:
- 18 To match a string of any length (including the empty string) that is comprised exclusively of lower-case ASCII letters
- 19 Unicode character classes
- 20 Use quantifiers to limit the number of leading zeros-for instance
- 21 User-defined character classes
- 22 You can use patterns to offer choices for an element's content.
A character class expression is simply a character group, enclosed in square brackets
For example, if we wanted to allow any single upper case ASCII letter:
<xs:pattern value="[A-Z]" />
This pattern uses the "s-e range" form: a contiguous range of character values beginning with the min value and running up to and including the max value. In general form:
<xs:pattern value="[min-max]" />
Any ASCII letter: adding a second character range to the character group expression
<xs:pattern value="[A-Za-z]" />
To invert the positive character group into a negative character group, simply precede the character ranges above with a caret (^) character:
<xs:pattern value="[^A-Za-z]" />
Any single normal character will match only that character
For example, only a single "A" character can match the following regular expression:
<xs:pattern value="A" />
A phone number
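<xsd:simpleType name="phoneType">
  <xsd:restriction base="xsd:string">
    <!-- Three digits, a hyphen, then seven digits, e.g. 555-1234567 -->
    <xsd:pattern value="[0-9]{3}-[0-9]{7}"/>
  </xsd:restriction>
</xsd:simpleType>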
Character classes
\s Whitespace characters (space, tab, newline, and carriage return).
\S Characters that are not whitespace.
\d Digits.
\D Characters that are not digits.
\w Extended "word" characters (any character that is not punctuation, a separator, or in the "other" category).
\W Nonword characters.
\i XML 1.0 initial name characters.
\I Characters that may not be used as an XML 1.0 initial name character.
\c XML 1.0 name characters.
\C Characters that may not be used in an XML 1.0 name.
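As a small illustration of the escapes listed above, the following sketch restricts a string to something shaped like an XML name: one initial name character followed by any number of name characters. The type name xmlNameType is hypothetical and does not come from the original tutorial.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: \i is one initial name character,
       \c* is zero or more further name characters. -->
  <xs:simpleType name="xmlNameType">
    <xs:restriction base="xs:string">
      <xs:pattern value="\i\c*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under this sketch a value such as item-1 should be accepted, while 1item should be rejected, because a digit is a name character but not an initial name character.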
Define a pattern that can be used for zip codes
<xsd:simpleType name="zipType">
  <xsd:restriction base="xsd:string">
    <!-- Exactly five digits -->
    <xsd:pattern value="[0-9]{5}"/>
  </xsd:restriction>
</xsd:simpleType>
Getting rid of leading zeros
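<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru">
  <xs:simpleType name="myType">
    <xs:restriction base="xs:integer">
      <!-- Optional sign, then either a number that does not start with 0, or a single 0 -->
      <xs:pattern value="[+-]?([1-9][0-9]*|0)" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>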
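To make the effect of this pattern concrete, here is a hedged sketch of how a few sample values would fare. The wrapper element samples and the element count are hypothetical; count is assumed to be declared with the type myType.

```xml
<samples>
  <!-- Expected to be accepted: no superfluous leading zeros -->
  <count>0</count>
  <count>7</count>
  <count>+42</count>
  <count>-105</count>
  <!-- Expected to be rejected by the pattern, although xs:integer alone allows leading zeros -->
  <count>007</count>
  <count>-012</count>
</samples>
```

The alternative after the optional sign is what does the work: a value either starts with a digit from 1 to 9 or consists of the single digit 0.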
list of atoms that match a single character
\n New line (can also be written as "&#x0A;", since we are in an XML document).
\r Carriage return (can also be written as "&#x0D;").
\t Tab (can also be written as "&#x09;").
\\ Character "\"
\| Character "|"
\. Character "."
\- Character "-"
\^ Character "^"
\? Character "?"
\* Character "*"
\+ Character "+"
\{ Character "{"
\} Character "}"
\( Character "("
\) Character ")"
\[ Character "["
\] Character "]"
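The escapes above matter whenever a pattern needs one of these characters literally. As a minimal sketch, the following type describes an amount with exactly two decimal places; the type name priceType is hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: \. matches a literal dot; an unescaped "."
       would match any character except a line break. -->
  <xs:simpleType name="priceType">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d+\.\d{2}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

With the escaped dot, 19.99 should be accepted and 19x99 rejected; without the backslash both would match.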
Merge our three patterns into one
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="myByte">
    <xs:restriction base="xs:byte">
      <xs:pattern value="1?5?"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
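One way to read the merged pattern: several pattern facets in the same restriction act as alternatives, so the single expression above is assumed to stand in for three separate patterns like the ones sketched below. The values 1, 5, and 15 are inferred from the merged expression and are not stated in the original text.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="myByte">
    <xs:restriction base="xs:byte">
      <!-- Pattern facets in the same restriction are alternatives (OR) -->
      <xs:pattern value="1"/>
      <xs:pattern value="5"/>
      <xs:pattern value="15"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

The merged form 1?5? also matches the empty string, but that makes no practical difference here because an empty string is not a valid xs:byte.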
Meta Characters
Meta character Use/Meaning
. Match any character, except end-of-line (#x0D and/or #x0A) - same as [^\n\r]
\ Begin escape sequence
? Zero or one occurrences
* Zero or more occurrences
+ One or more occurrences
{ } Enclose a numeric quantifier, or a category name after \p or \P
( ) Enclose a regular expression (may be the atom of another regex)
[ ] Enclose a character class expression
pattern Constrains the lexical space to literals that must match a defined pattern
<xsd:simpleType name="isbnType">
  <xsd:restriction base="xsd:string">
    <!-- Exactly ten digits -->
    <xsd:pattern value="[0-9]{10}"/>
  </xsd:restriction>
</xsd:simpleType>
Pattern for time
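<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru">
  <xsd:element name="gestation">
    <xsd:simpleType>
      <!-- P\d+D is a duration in whole days (e.g. P280D), so the base is xsd:duration, not xsd:time -->
      <xsd:restriction base="xsd:duration">
        <xsd:pattern value="P\d+D" />
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>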
Pattern syntax
. for any character (other than a line break)
\d for any digit
\D for any non-digit
\s for any white space (including space, tab, newline, and return)
\S for any character that is not white space
x* to have zero or more x's
(xy)* to have zero or more xy's
x? to have one or zero x's
(xy)? to have one or no xy's
x+ to have one or more x's
(xy)+ to have one or more xy's
[abc] to include one of a group of values (a, b, or c)
[0-9] to include the range of values from 0 to 9
A|B to have A or B in the content
x{5} to have exactly 5 x's (in a row)
x{5,} to have at least 5 x's (in a row)
x{5,8} to have at least 5 and at most 8 x's (in a row)
(xyz){2} to have exactly two xyz's (in a row)
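Several of the constructs listed above can be combined in one expression. The following is only a sketch: the type name productCodeType and the code format itself are hypothetical, chosen to exercise a range, two quantifiers, a group, and a choice.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical code format: two upper-case letters, a hyphen,
       four digits, then an optional size suffix S, M, or L. -->
  <xs:simpleType name="productCodeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}-\d{4}(-(S|M|L))?"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under these assumptions AB-1234 and AB-1234-M should be accepted, while ab-1234 (lower case) and AB-123 (only three digits) should be rejected. The pattern must match the whole value, so no anchors are needed.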
pattern: USA_SSN datatype
File: Schema.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru">
  <xs:simpleType name="USA_SSN">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
Special regex characters (-[]) cannot be used for the single normal character form of the character range.
Inside a character group they must be escaped with a backslash. For example, the following matches either the opening or the closing square bracket:
<xs:pattern value="[\[\]]" />
Specifying a Pattern for a Simple Type
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru">
  <xsd:element name="invoice_number">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="INV #99\d{3}" />
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
File: Data.xml
<?xml version="1.0"?>
<invoice_number xmlns="http://www.wbex.ru">INV #99426</invoice_number>
These three characters should be used with caution:
Hybrid Use/Meaning
^ Begin a negative character group
- 1) Begin a character class subtraction; 2) separate the minimum/maximum values that define a range of character values
, Separate the minimum/maximum values for the number of occurrences of an atom
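The two roles of the hyphen, and the comma in a quantifier, can be seen together in one expression. This is a minimal sketch; the type name consonantsType is hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type. In [a-z-[aeiou]] the first hyphen defines the range a to z,
       the second begins a subtraction that removes the vowels.
       {2,4} then requires between two and four such characters. -->
  <xs:simpleType name="consonantsType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-z-[aeiou]]{2,4}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under this sketch xyz should be accepted, while xay (contains a vowel) and x (too short) should be rejected.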
To match a string of any length (including the empty string) that is comprised exclusively of lower-case ASCII letters
<xs:pattern value="[a-z]*" />
Below is an example of element content that matches the above pattern:
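A minimal sketch, assuming a hypothetical element named word whose type carries the pattern above:

```xml
<!-- Hypothetical element; its type is assumed to carry the pattern [a-z]* -->
<word>schema</word>
```

Because the quantifier is *, an empty word element would match as well, while content such as Schema or xml 1 would not, since upper-case letters, spaces, and digits fall outside the range a to z.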
Describing structured numeric strings like US Social Security Numbers (SSNs):
<xs:pattern value="\d{3}-\d{2}-\d{4}" />
In XML Schema, \d matches any Unicode decimal digit, not only 0 through 9. To allow only the ten ASCII digits, use this character class expression instead:
<xs:pattern value="[0-9]{3}-[0-9]{2}-[0-9]{4}" />
Unicode character classes
Unicode Character Class Includes
C Other characters (non-letters, non-symbols, non-numbers, non-separators)
Cc Control characters
Cf Format characters
Cn Unassigned code points
Co Private use characters
L Letters
Ll Lowercase letters
Lm Modifier letters
Lo Other letters
Lt Titlecase letters
Lu Uppercase letters
M All Marks
Mc Spacing combining marks
Me Enclosing marks
Mn Non-spacing marks
N Numbers
Nd Decimal digits
Nl Number letters
No Other numbers
P Punctuation
Pc Connector punctuation
Pd Dashes
Pe Closing punctuation
Pf Final quotes (may behave like Ps or Pe)
Pi Initial quotes (may behave like Ps or Pe)
Po Other forms of punctuation
Ps Opening punctuation
S Symbols
Sc Currency symbols
Sk Modifier symbols
Sm Mathematical symbols
So Other symbols
Z Separators
Zl Line breaks
Zp Paragraph breaks
Zs Spaces
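In a pattern, the classes in this table are written inside \p{...}, and \P{...} selects the complement. As a hedged sketch, the following type accepts a single capitalized word in any script that distinguishes case; the type name capitalizedWordType is hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: one uppercase letter (Lu)
       followed by zero or more lowercase letters (Ll). -->
  <xs:simpleType name="capitalizedWordType">
    <xs:restriction base="xs:string">
      <xs:pattern value="\p{Lu}\p{Ll}*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under this sketch Paris should be accepted, while paris and PARIS should be rejected.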
Use quantifiers to limit the number of leading zeros-for instance
The following pattern allows at most two leading zeros:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="myByte">
    <xs:restriction base="xs:byte">
      <xs:pattern value="0{0,2}1?5?"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
User-defined character classes
[azertyuiop] defines the list of letters on the first row of a French keyboard.
[a-z] specifies all the characters between "a" and "z".
[^a-z] specifies all the characters that are not between "a" and "z".
[-^\\] defines the characters "-", "^", and "\".
[-+] specifies the sign of a decimal number.
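A user-defined class is normally one piece of a longer expression. As a minimal sketch, the sign class and a digit range can be combined to describe an optionally signed run of digits; the type name signedDigitsType is hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: an optional sign from the class [-+],
       then one or more ASCII digits. -->
  <xs:simpleType name="signedDigitsType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[-+]?[0-9]+"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under this sketch -12, +7, and 42 should be accepted, while 1-2 should be rejected because the sign is only allowed in the first position.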
You can use patterns to offer choices for an element's content.
File: Schema.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://www.wbex.ru" xmlns="http://www.wbex.ru">
  <xsd:element name="language">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="English|Latin" />
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
File: Data.xml
<?xml version="1.0"?>
<language xmlns="http://www.wbex.ru">English</language>