Great Trick: Extracting Domains from Email Addresses in Excel and SQL
By using functions in Excel and SQL that return the text from specific locations within a string combined with ones that can isolate the location of the “@” character in every email address, you can easily extract domain names from lists of email addresses.
The base function for this is RIGHT. RIGHT gets passed two arguments, text, which is the text being parsed, and num_chars, which is the number of characters returned by the function. RIGHT takes the form in Excel of RIGHT(text,[num_chars]).
The text argument is obvious; it is the text of the email address to be parsed.
The num_chars argument is determined using a combination of two other functions, LEN and FIND. We use LEN to determine the length of the overall email address and subtract the position of the @ operator, determined using FIND. The resulting differencewill return the length of the domain portion of the email address.
LEN(text) returns the number of characters in the string.
FIND(find_text,within_text,start_num) returns the postion of the find_text within the within_text. start_num, which we won’t use here, is a way to start selecting text after a certain number of characters.
To put this all together, let’s put my email address in cell A1, place our derived function into the B1, and derive the result.
=RIGHT(A1,((LEN(A1)-FIND(“@”,A1)))) calculates to:
SQL is a little bit trickier, as there isn’t a right-to-left text selection function, instead we are going to use a left-to-right function, SUBSTRING.
SUBSTRING returns a string of text based on definition passed to the function. SUBSTRING takes three arguments, value_expression, which is the text being parsed, start_expression, which is the starting character of the returned string, and length_expression, which is the number of characters returned, starting with the start_expression. This function in SQL looks like this: SUBSTRING(value_expression,start_expression,length_expression).
In this case our email address is the value_expression. The character following the “@” symbol is the start_expression, with the length of the remaining string being the length_expression.
To determine start_expression, we deploy another SQL function, CHARINDEX, which works exactly like FIND in Excel. Using SQL’s version of LEN and the same math, we can determine length_expression.
Putting it all together, let’s assume a table named email_addresses with a column named email:
SELECT SUBSTRING(email, CHARINDEX(‘@’, email) + 1, LEN(email) – CHARINDEX(‘@’, email) + 1) AS domain_name
WHERE email like ‘%@%’
The WHERE clause is in there to prevent malformed email addresses from crashing the CHARINDEX function.
The math works the same way as the Excel, except that you have to remember that we are working from the left, so the need arises to add one to the character counter in the CHARINDEX function.
Simple and powerful. I hope this is useful to someone – the inspiration for this post came from this post at Chandoo.org. Pointy Haird Dilbert is easily my favorite as well as one of the most useful and entertaining Excel blogs out there.