Regexp | Half’s SEO Notebook

Archive for the 'Regexp' Category

Scraping 101: Extracting Anchor Text with Regexp • Friday, February 8th, 2008

There are many ways to skin a cat, but when it comes to scraping websites, I like parsing content with regexp. One of the biggest problems I've bumped into when parsing HTML is matching opening and closing tags.
For example:
(<a [^>]+>)(.*)</a>
OK, let's try that in English:

(<a [^>]+>) matches <a href="….">.
(.*) *should* match anchor text (I’ll […]
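
The excerpt is cut off here, so the following is only a rough sketch of why (.*) might not behave as hoped, not necessarily the point the full post goes on to make. It runs the pattern quoted above against a made-up line of HTML containing two links. The sample markup, the variable names, and the lazy (.*?) variant are all assumptions added for illustration.

```python
import re

# Pattern quoted in the post: group 1 is the opening tag, group 2 the anchor text.
GREEDY = re.compile(r'(<a [^>]+>)(.*)</a>')

# Lazy variant (an assumption, not taken from the post): .*? stops at the first </a>.
LAZY = re.compile(r'(<a [^>]+>)(.*?)</a>')

# Hypothetical sample markup with two links on the same line.
html = '<a href="/one">First</a> and <a href="/two">Second</a>'

# Collect whatever each pattern captures as "anchor text".
greedy_text = [m.group(2) for m in GREEDY.finditer(html)]
lazy_text = [m.group(2) for m in LAZY.finditer(html)]

print(greedy_text)
print(lazy_text)
```

With this sample, the greedy pattern finds a single match whose second group runs from "First" all the way to "Second", swallowing the first closing tag and the second opening tag along the way. The lazy variant returns the two anchor texts separately. Neither version copes with nested tags or with anchors that span several lines, so treat it as a starting point only.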

Posted in Coding, Regexp, Scraping | 6 Comments »



