Updated on June 7, 2022
Problem
You've been given the job of implementing an online order form for a business in the European Union.
European tax law stipulates that when a VAT-registered business (your customer) in one EU country purchases from a vendor (your company) in another EU country, the vendor must not charge VAT (Value-Added Tax). If the buyer is not VAT-registered, the vendor must charge VAT and remit it to the local tax office. The vendor must use the buyer's VAT registration number as proof to the tax office that no VAT is due. For the vendor, it is therefore very important to validate the buyer's VAT number before proceeding with a tax-exempt sale.
The most common cause of invalid VAT numbers is a simple typing mistake by the customer. To make the ordering process faster and friendlier, you should use a regular expression to validate the VAT number as soon as the customer fills out your online order form. You can do this with client-side JavaScript or in the CGI script on your web server that receives the order form. If the number does not match the regular expression, the customer can correct the typo right away.
Solution
To keep the implementation simple, this solution is split into two parts. First, strip out whitespace and punctuation. Then validate what remains.
Strip whitespace and punctuation
Store the VAT number entered by the customer in a variable. Before checking whether the number is valid, replace all matches of this regular expression with an empty replacement text (the ● stands for a literal space character):
[-.●]
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Recipe 3.14 explains how to perform this initial replacement. We assume that the customer enters no punctuation other than hyphens, dots, and spaces. Any other stray character will be caught by the validation step that follows.
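As a minimal sketch of this first step in Python (one of the flavors listed above), the replacement could look as follows. The function name `strip_punctuation` is an illustrative choice, not something the recipe prescribes.

```python
import re

# Hyphen, dot, or space: the three separators the recipe assumes customers use.
PUNCTUATION = re.compile(r"[-. ]")

def strip_punctuation(raw: str) -> str:
    """Delete hyphens, dots, and spaces from a VAT number as the customer typed it."""
    return PUNCTUATION.sub("", raw)

print(strip_punctuation("DE 123.456.789"))  # DE123456789
```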
Validate the number
With whitespace and punctuation stripped, this regular expression checks whether the VAT number is valid for any of the 27 EU countries:
^(
  (AT)?U[0-9]{8} |                              # Austria
  (BE)?0[0-9]{9} |                              # Belgium
  (BG)?[0-9]{9,10} |                            # Bulgaria
  (CY)?[0-9]{8}L |                              # Cyprus
  (CZ)?[0-9]{8,10} |                            # Czech Republic
  (DE)?[0-9]{9} |                               # Germany
  (DK)?[0-9]{8} |                               # Denmark
  (EE)?[0-9]{9} |                               # Estonia
  (EL|GR)?[0-9]{9} |                            # Greece
  (ES)?[0-9A-Z][0-9]{7}[0-9A-Z] |               # Spain
  (FI)?[0-9]{8} |                               # Finland
  (FR)?[0-9A-Z]{2}[0-9]{9} |                    # France
  (GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3}) | # United Kingdom
  (HU)?[0-9]{8} |                               # Hungary
  (IE)?[0-9]S[0-9]{5}L |                        # Ireland
  (IT)?[0-9]{11} |                              # Italy
  (LT)?([0-9]{9}|[0-9]{12}) |                   # Lithuania
  (LU)?[0-9]{8} |                               # Luxembourg
  (LV)?[0-9]{11} |                              # Latvia
  (MT)?[0-9]{8} |                               # Malta
  (NL)?[0-9]{9}B[0-9]{2} |                      # Netherlands
  (PL)?[0-9]{10} |                              # Poland
  (PT)?[0-9]{9} |                               # Portugal
  (RO)?[0-9]{2,10} |                            # Romania
  (SE)?[0-9]{12} |                              # Sweden
  (SI)?[0-9]{8} |                               # Slovenia
  (SK)?[0-9]{10}                                # Slovakia
)$
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
The above regular expression uses free-spacing mode to make it easy to edit later. Every now and then, new countries join the European Union, and member countries change their rules for VAT numbers. Unfortunately, JavaScript does not support free-spacing. In this case, you’re stuck putting everything on one line:
^((AT)?U[0-9]{8}|(BE)?0[0-9]{9}|(BG)?[0-9]{9,10}|(CY)?[0-9]{8}L|(CZ)?[0-9]{8,10}|(DE)?[0-9]{9}|(DK)?[0-9]{8}|(EE)?[0-9]{9}|(EL|GR)?[0-9]{9}|(ES)?[0-9A-Z][0-9]{7}[0-9A-Z]|(FI)?[0-9]{8}|(FR)?[0-9A-Z]{2}[0-9]{9}|(GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3})|(HU)?[0-9]{8}|(IE)?[0-9]S[0-9]{5}L|(IT)?[0-9]{11}|(LT)?([0-9]{9}|[0-9]{12})|(LU)?[0-9]{8}|(LV)?[0-9]{11}|(MT)?[0-9]{8}|(NL)?[0-9]{9}B[0-9]{2}|(PL)?[0-9]{10}|(PT)?[0-9]{9}|(RO)?[0-9]{2,10}|(SE)?[0-9]{12}|(SI)?[0-9]{8}|(SK)?[0-9]{10})$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Follow Recipe 3.6 to add this regular expression to your order form.
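The two steps can be combined into one check. The following is a sketch in Python, assuming the free-spacing form of the regex and Python's `re.VERBOSE` and `re.IGNORECASE` flags; the names `VAT_PATTERN` and `is_plausible_vat_number` are made up for this example.

```python
import re

# The recipe's free-spacing regex. re.VERBOSE ignores the whitespace and
# treats everything after # as a comment.
VAT_PATTERN = re.compile(
    r"""
    ^(
      (AT)?U[0-9]{8} |                              # Austria
      (BE)?0[0-9]{9} |                              # Belgium
      (BG)?[0-9]{9,10} |                            # Bulgaria
      (CY)?[0-9]{8}L |                              # Cyprus
      (CZ)?[0-9]{8,10} |                            # Czech Republic
      (DE)?[0-9]{9} |                               # Germany
      (DK)?[0-9]{8} |                               # Denmark
      (EE)?[0-9]{9} |                               # Estonia
      (EL|GR)?[0-9]{9} |                            # Greece
      (ES)?[0-9A-Z][0-9]{7}[0-9A-Z] |               # Spain
      (FI)?[0-9]{8} |                               # Finland
      (FR)?[0-9A-Z]{2}[0-9]{9} |                    # France
      (GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3}) | # United Kingdom
      (HU)?[0-9]{8} |                               # Hungary
      (IE)?[0-9]S[0-9]{5}L |                        # Ireland
      (IT)?[0-9]{11} |                              # Italy
      (LT)?([0-9]{9}|[0-9]{12}) |                   # Lithuania
      (LU)?[0-9]{8} |                               # Luxembourg
      (LV)?[0-9]{11} |                              # Latvia
      (MT)?[0-9]{8} |                               # Malta
      (NL)?[0-9]{9}B[0-9]{2} |                      # Netherlands
      (PL)?[0-9]{10} |                              # Poland
      (PT)?[0-9]{9} |                               # Portugal
      (RO)?[0-9]{2,10} |                            # Romania
      (SE)?[0-9]{12} |                              # Sweden
      (SI)?[0-9]{8} |                               # Slovenia
      (SK)?[0-9]{10}                                # Slovakia
    )$
    """,
    re.VERBOSE | re.IGNORECASE,
)

def is_plausible_vat_number(raw: str) -> bool:
    """Strip hyphens, dots, and spaces, then test the bare number."""
    bare = re.sub(r"[-. ]", "", raw)
    # fullmatch also rejects a trailing newline, which $ alone tolerates in Python.
    return VAT_PATTERN.fullmatch(bare) is not None

print(is_plausible_vat_number("DE 123.456.789"))  # True
print(is_plausible_vat_number("DE 123.456.78"))   # False
```

A `True` result only means the number has a plausible shape; it says nothing about whether the number has actually been issued.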
Discussion
Strip whitespace and punctuation
To make VAT numbers easier for humans to read, people often type them with extra punctuation that splits the digits into groups. For instance, a German customer might enter his VAT number DE123456789 as DE 123.456.789.
Writing a single regular expression that matches VAT numbers from 27 countries in every possible notation is not practical. Since the punctuation is there only for readability, it is much easier to strip all the punctuation first and then validate the bare VAT number.
The regular expression ‹[-.●]› matches a single character that is a hyphen, a dot, or a space (shown here as ●). Replacing all matches of this regular expression with nothing deletes the punctuation characters commonly used in VAT numbers.
TIP
VAT numbers consist only of letters and digits. Instead of using ‹[-.●]› to remove only common punctuation, you could use ‹[^A-Z0-9]› to strip out all invalid characters.
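A sketch of the stricter cleanup from the tip, again in Python. It assumes case-insensitive matching, as in the validation regex, so that lowercase letters are kept; `re.ASCII` is added here so that case folding stays within plain ASCII letters. The function name `strip_invalid` is illustrative.

```python
import re

# Anything that is not an ASCII letter or digit is removed.
# With re.IGNORECASE, lowercase a-z count as part of A-Z and are kept.
INVALID = re.compile(r"[^A-Z0-9]", re.IGNORECASE | re.ASCII)

def strip_invalid(raw: str) -> str:
    """Delete every character that cannot be part of a VAT number."""
    return INVALID.sub("", raw)

print(strip_invalid("DE 123/456_789"))  # DE123456789
```

The trade-off is that a stray character such as a slash is silently dropped rather than reported back to the customer as a typo.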
Validate the number
The two regular expressions for validating the number are identical. The only difference is that the first one uses free-spacing syntax to make the regular expression more readable and to label the countries. JavaScript does not support free-spacing unless you use the XRegExp library. The other flavors give you the choice.
The regex uses alternation to accommodate the VAT numbers of all 27 EU countries. The essential formats are shown in Table 4-3.
Table 4-3. EU VAT number formats
Country | VAT number format |
---|---|
Austria | U99999999 |
Belgium | 0999999999 |
Bulgaria | 999999999 or 9999999999 |
Cyprus | 99999999L |
Czech Republic | 99999999, 999999999, or 9999999999 |
Germany | 999999999 |
Denmark | 99999999 |
Estonia | 999999999 |
Greece | 999999999 |
Spain | X9999999X |
Finland | 99999999 |
France | XX999999999 |
United Kingdom | 999999999, 999999999999, or XX999 |
Hungary | 99999999 |
Ireland | 9S99999L |
Italy | 99999999999 |
Lithuania | 999999999 or 999999999999 |
Luxembourg | 99999999 |
Latvia | 99999999999 |
Malta | 99999999 |
Netherlands | 999999999B99 |
Poland | 9999999999 |
Portugal | 999999999 |
Romania | 99, 999, 9999, 99999, 999999, 9999999, 99999999, 999999999, or 9999999999 |
Sweden | 999999999999 |
Slovenia | 99999999 |
Slovakia | 9999999999 |
Strictly speaking, the two-letter country code is part of the VAT number. However, people often omit it, since the billing address already indicates the country. The regular expression accepts VAT numbers with and without the country code. If you want the country code to be mandatory, remove the question mark that follows each country-code group, such as ‹(AT)?›; leave the one in ‹([0-9]{3})?› in the United Kingdom alternative, which makes the last three digits optional. If you do this, the error message that tells the user the VAT number is invalid should mention that the country code is required.
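One way to make the country code mandatory without editing the regex by hand is sketched below in Python. It works on a three-country subset of the recipe's regex, chosen only for illustration, and removes the question mark only where it directly follows a group of two-letter codes. The United Kingdom alternative contains one question mark that is not attached to a country code; it makes the last three digits optional and has to stay. The names `LENIENT` and `STRICT` are hypothetical.

```python
import re

# Three alternatives copied from the recipe; a subset used for illustration only.
LENIENT = (
    r"^((AT)?U[0-9]{8}"
    r"|(EL|GR)?[0-9]{9}"
    r"|(GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3}))$"
)

# Drop the ? only after groups such as (AT) or (EL|GR).
# The ? after ([0-9]{3}) is left alone because that group holds no country code.
STRICT = re.sub(r"(\([A-Z]{2}(?:\|[A-Z]{2})*\))\?", r"\1", LENIENT)

strict = re.compile(STRICT, re.IGNORECASE)

print(bool(strict.search("GB123456789")))  # True
print(bool(strict.search("123456789")))    # False
```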
If you accept orders only from certain countries, you can leave out the alternatives for countries that don't appear in the country selection on your order form. When you delete an alternative, make sure to also delete the ‹|› operator that separates it from the next or previous one. If you don't, you end up with ‹||› in your regular expression. ‹||› adds an alternative that matches the empty string, which means your order form will accept an empty field as a valid VAT number.
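The effect of a leftover ‹||› is easy to demonstrate. The sketch below uses a hypothetical three-country regex from which the Belgian alternative was deleted, once carelessly and once correctly.

```python
import re

# Belgium deleted, but its | operator left behind: the empty alternative matches "".
broken = re.compile(r"^((AT)?U[0-9]{8}||(DE)?[0-9]{9})$", re.IGNORECASE)

# Belgium deleted together with one of its neighboring | operators.
fixed = re.compile(r"^((AT)?U[0-9]{8}|(DE)?[0-9]{9})$", re.IGNORECASE)

print(bool(broken.search("")))  # True: an empty field passes validation
print(bool(fixed.search("")))   # False
```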
The 27 alternatives are grouped together. The group is placed between a caret and a dollar sign, which anchor the regular expression to the beginning and the end of the string you're validating. The whole input must validate as a VAT number.
If you're searching for VAT numbers in a larger body of text, replace the anchors with ‹\b› word boundaries.
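A short Python sketch of the word-boundary variant follows. To keep it readable it uses only the German and Austrian alternatives, and it assumes the numbers appear in the text without embedded punctuation; the sample sentence is invented.

```python
import re

# Two alternatives from the recipe, with ^ and $ replaced by \b word boundaries.
finder = re.compile(r"\b((DE)?[0-9]{9}|(AT)?U[0-9]{8})\b", re.IGNORECASE)

text = "Our VAT number is ATU12345678; theirs is DE123456789."
print([m.group(0) for m in finder.finditer(text)])
# ['ATU12345678', 'DE123456789']
```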
Variations
Instead of checking the VAT number against all 27 countries at once, you may want to improve your order form by using 27 separate regular expressions. First, check the country that the customer specified in the billing address. Then look up the appropriate regular expression for that country in Table 4-4.
Table 4-4. EU VAT number regular expressions
Country | VAT number regular expression |
---|---|
Austria | ‹^(AT)?U[0-9]{8}$ › |
Belgium | ‹^(BE)?0[0-9]{9}$ › |
Bulgaria | ‹^(BG)?[0-9]{9,10}$ › |
Cyprus | ‹^(CY)?[0-9]{8}L$ › |
Czech Republic | ‹^(CZ)?[0-9]{8,10}$ › |
Germany | ‹^(DE)?[0-9]{9}$ › |
Denmark | ‹^(DK)?[0-9]{8}$ › |
Estonia | ‹^(EE)?[0-9]{9}$ › |
Greece | ‹^(EL|GR)?[0-9]{9}$ › |
Spain | ‹^(ES)?[0-9A-Z][0-9]{7}[0-9A-Z]$ › |
Finland | ‹^(FI)?[0-9]{8}$ › |
France | ‹^(FR)?[0-9A-Z]{2}[0-9]{9}$ › |
United Kingdom | ‹^(GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3})$ › |
Hungary | ‹^(HU)?[0-9]{8}$ › |
Ireland | ‹^(IE)?[0-9]S[0-9]{5}L$ › |
Italy | ‹^(IT)?[0-9]{11}$ › |
Lithuania | ‹^(LT)?([0-9]{9}|[0-9]{12})$ › |
Luxembourg | ‹^(LU)?[0-9]{8}$ › |
Latvia | ‹^(LV)?[0-9]{11}$ › |
Malta | ‹^(MT)?[0-9]{8}$ › |
Netherlands | ‹^(NL)?[0-9]{9}B[0-9]{2}$ › |
Poland | ‹^(PL)?[0-9]{10}$ › |
Portugal | ‹^(PT)?[0-9]{9}$ › |
Romania | ‹^(RO)?[0-9]{2,10}$ › |
Sweden | ‹^(SE)?[0-9]{12}$ › |
Slovenia | ‹^(SI)?[0-9]{8}$ › |
Slovakia | ‹^(SK)?[0-9]{10}$ › |
Implement Recipe 3.6 to validate the VAT number against the selected regular expression. That will tell you whether the number is valid for the country the customer claims to reside in.
The main benefit of using separate regular expressions is that you can force the VAT number to start with the correct country code without asking the customer to type it in. When the regular expression matches the provided number, check the contents of the first capturing group. Recipe 3.9 explains how to do this. If the first capturing group is empty, the customer did not type the country code at the start of the VAT number. You can then add the country code before storing the validated number in your order database.
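The lookup and the country-code check might be sketched in Python as follows. Only three rows of Table 4-4 are included, the dictionary is keyed by the VAT prefix to prepend (EL for Greece), and the uppercasing before storage is an assumption of this sketch rather than part of the recipe. The names `VAT_BY_COUNTRY` and `normalize_vat` are hypothetical.

```python
import re
from typing import Optional

# A few rows of Table 4-4, keyed by the VAT prefix of the billing country.
VAT_BY_COUNTRY = {
    "AT": re.compile(r"^(AT)?U[0-9]{8}$", re.IGNORECASE),
    "DE": re.compile(r"^(DE)?[0-9]{9}$", re.IGNORECASE),
    "EL": re.compile(r"^(EL|GR)?[0-9]{9}$", re.IGNORECASE),
}

def normalize_vat(country: str, raw: str) -> Optional[str]:
    """Return the VAT number including its country code, or None if it is invalid."""
    pattern = VAT_BY_COUNTRY.get(country)
    if pattern is None:
        return None  # country not in this sketch's lookup table
    # Strip punctuation, then uppercase for storage (an assumption of this sketch).
    bare = re.sub(r"[-. ]", "", raw).upper()
    match = pattern.fullmatch(bare)
    if match is None:
        return None
    # In Python, group 1 is None when the optional country code was not typed.
    if match.group(1) is None:
        return country + bare
    return bare

print(normalize_vat("DE", "123.456.789"))  # DE123456789
print(normalize_vat("EL", "GR123456789"))  # GR123456789
```

Note that Python reports a group that did not participate in the match as `None` rather than as an empty string, so the check is written accordingly.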
Two country codes are permitted for Greek VAT numbers. GR is the ISO country code for Greece, but EL has long been the standard for Greek VAT numbers.
See Also
The regular expression only checks whether the number looks like a valid VAT number. That is enough to weed out honest mistakes. A regular expression obviously cannot check whether the VAT number is actually assigned to the business placing the order. The European Union provides a web page at http://ec.europa.eu/taxation_customs/vies/vieshome.do where you can check which business, if any, a particular VAT number belongs to.
The techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.8 explains alternation. Recipe 2.9 explains grouping. Recipe 2.12 explains repetition.