Revision as of 20:51, 5 Nov 2011

Luis Miguel Morillas <lmorillas at xml3k.org>

identi.ca: lmorillas

import urllib2  # Python 2 standard library

URL = 'http://www.libresoftwareworldconference.com/'
# Download the page and keep its raw HTML as a byte string
source = urllib2.urlopen(URL).read()

...

Scraping the web with amara

@@ Line 24: / Line 24: @@
 <div class="slide">
-===  ===
+=== "Brute-force" search ===
+<source lang="python">
+import urllib2
+URL = 'http://www.libresoftwareworldconference.com/'
+source = urllib2.urlopen(URL).read()
+</source>
+* Text processing
+* Regular expressions
+</div>
+<div class="slide">
+=== Python libraries ===
+* Beautiful Soup
+* mechanize
+* lxml
+* html5lib
+* scrapemark
+* pyquery
+* scrapy
+...
 </div>
 <div class="slide">
-===  ===
+=== amara ===
 </div>
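
The "brute-force" slide pairs a raw download with text processing and regular expressions. A minimal sketch of that idea might look like the following. The `LINK_RE` pattern, the `extract_links` helper and the `SAMPLE` string are illustrative assumptions, not part of the original slides, and a regular expression like this is only a rough heuristic that misses many valid HTML forms.

```python
import re

# Hypothetical pattern (an assumption for illustration): captures the href
# value of anchor tags whose attribute is quoted with " or '.
LINK_RE = re.compile(r'<a\s[^>]*?href=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html):
    """Return the href values of all anchor tags found in an HTML string."""
    return LINK_RE.findall(html)


# Small in-memory sample standing in for a downloaded page.
SAMPLE = (
    '<p><a href="/program">Program</a> '
    "<a class=\"ext\" href='http://example.org/'>Example</a></p>"
)

for link in extract_links(SAMPLE):
    print(link)
```

To run it against the live page, the `source` string obtained with `urllib2.urlopen(URL).read()` on the slide could be passed to `extract_links` in place of `SAMPLE`. The fragility of this approach is presumably what motivates the parsing libraries listed on the next slide.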