Getting the length of a webpage with Haskell

Posted on December 16, 2013

I wanted to see how many words were on a webpage today, so I opened up a Python interpreter and typed in the following code:

# Python 3: download the page and count its whitespace-separated tokens
import urllib.request
webpage_content = urllib.request.urlopen("http://www.reddit.com").read()
words = webpage_content.split()
print(len(words))

It's very succinct code that does what I need. I've been using Haskell lately, though, and wondered what its solution would look like. First I wrote an imperative-style version using do notation:

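import Network.HTTP.Conduit
import qualified Data.ByteString.Lazy.Char8 as BCL

main :: IO ()
main = do
  putStrLn "Enter a url: "
  url <- getLine
  -- simpleHttp downloads the page body as a lazy ByteString
  pageText <- simpleHttp url
  -- unpack to a String so that Prelude's words can split it on whitespace
  let pageTextCount = length . words . BCL.unpack $ pageText
  -- putStrLn prints the message as-is (print would wrap it in quotes)
  putStrLn $ "There are " ++ show pageTextCount ++ " words!"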

This isn't nearly as short as the Python version, but it's still pretty understandable. Having to unpack the lazy ByteString might throw a few people off (simpleHttp returns a lazy ByteString, while Prelude's words works on a String), but everything else is straightforward. After writing this, I realized I could use bind (>>=) and function composition to squeeze all the real work into a single line, arguably even more concise than the Python version!
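
Unpacking isn't strictly required: Data.ByteString.Lazy.Char8 has its own words function that splits the ByteString directly. A minimal sketch, using a made-up sample page instead of a real download, checks that the two approaches give the same count on plain ASCII text (both treat each byte as one character, so neither one decodes UTF-8):

```haskell
import qualified Data.ByteString.Lazy.Char8 as BCL

-- Convert the lazy ByteString to a String, then use Prelude's words
countViaString :: BCL.ByteString -> Int
countViaString = length . words . BCL.unpack

-- Split the lazy ByteString directly with BCL.words
countViaByteString :: BCL.ByteString -> Int
countViaByteString = length . BCL.words

main :: IO ()
main = do
  -- Hypothetical sample text standing in for a downloaded page
  let page = BCL.pack "<p>Hello,  world!</p>\n<p>Two\tparagraphs</p>"
  print (countViaString page, countViaByteString page) -- prints (4,4)
```

Skipping the String conversion is also what lets the next version stay on one line.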

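import Network.HTTP.Conduit
import qualified Data.ByteString.Lazy.Char8 as BCL

main :: IO ()
main = do
    putStrLn "Enter a url: "
    -- read a URL, download it, then count and print the words;
    -- BCL.words splits the lazy ByteString directly, so no unpack is needed
    getLine >>= simpleHttp >>= print . length . BCL.words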
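
The one-liner works because each stage's result type is the next stage's input: getLine is IO String, simpleHttp (used at IO) is String -> IO ByteString, and print . length . BCL.words is ByteString -> IO (). It also relies on precedence: (>>=) is infixl 1 and (.) is infixr 9, so the composition groups before the binds. A small sketch under one assumption: fakeFetch is a hypothetical offline stand-in for simpleHttp, so the code runs without a network. It compares the bind style with the equivalent do notation:

```haskell
import qualified Data.ByteString.Lazy.Char8 as BCL

-- Hypothetical stand-in for simpleHttp: returns a canned page mentioning the URL
fakeFetch :: String -> IO BCL.ByteString
fakeFetch url = return (BCL.pack ("<html>fetched " ++ url ++ "</html>"))

-- Bind style; this parses as fakeFetch url >>= (return . length . BCL.words)
countBind :: String -> IO Int
countBind url = fakeFetch url >>= return . length . BCL.words

-- The same computation in do notation
countDo :: String -> IO Int
countDo url = do
  body <- fakeFetch url
  return (length (BCL.words body))

main :: IO ()
main = do
  a <- countBind "http://example.com"
  b <- countDo "http://example.com"
  print (a, b) -- prints (2,2)
```

Swapping fakeFetch back to simpleHttp gives the shape of the real program.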

After that I wondered whether the Network.HTTP module from the HTTP package (the one that shipped with the Haskell Platform) would be any easier to use, and wrote a very similar version with it.

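import Network.HTTP

main :: IO ()
main = do
    putStrLn "Enter a url: "
    -- getRequest builds a GET request, simpleHTTP sends it, and getResponseBody
    -- extracts the body as a String (failing if the request errored);
    -- this package handles plain http:// URLs only, not https://
    getLine >>= simpleHTTP . getRequest >>= getResponseBody >>= print . length . words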
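
simpleHTTP returns a Result, which is an Either holding a connection error or a response, and getResponseBody turns the error case into an IO failure. If you'd rather report that case yourself, here is a sketch that pattern-matches on the Result and reads the body with the Response record's rspBody field. It only covers errors the library reports as a Left; other IO exceptions, such as DNS failures, can still escape. countWords is a helper name introduced just for this example:

```haskell
import Network.HTTP

-- Hypothetical helper: the pure part of the one-liner
countWords :: String -> Int
countWords = length . words

main :: IO ()
main = do
  putStrLn "Enter a url: "
  url <- getLine
  -- Left carries a connection error, Right the response (plain http:// only)
  result <- simpleHTTP (getRequest url)
  case result of
    Left err  -> putStrLn ("Request failed: " ++ show err)
    Right rsp -> print (countWords (rspBody rsp))
```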

Hope this helps :)