In part 6 we improved the usability of our Scraper. In this post we'll continue in this vein by adding some basic error handling and wrapping up a few loose ends.
I’d encourage you to follow along, but if you want to skip directly to the code, it’s available on GitHub at: https://github.com/riebeekn/elixir-twitter-scraper.
If you followed along with part 6, just continue on with the code you created there. If you'd rather jump right into part 7, you can use a clone of the part 6 code as a starting point.
Clone the Repo
If you're grabbing the code from GitHub instead of continuing on from part 6, the first step is to clone the repo.
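Something along these lines should do the trick (check the repo for a branch or tag corresponding to the part 6 code):

```
git clone https://github.com/riebeekn/elixir-twitter-scraper.git
cd elixir-twitter-scraper
```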
OK, you’ve either gotten the code from GitHub or are using the existing code you created in Part 6, let’s get to it!
Guarding against bad values in the start_after_tweet parameter
Our public API expects two parameters: a handle along with an optional start_after_tweet parameter, i.e.
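something along these lines (the TwitterFeed module name and the example values below are assumptions; check the repo for the actual public module):

```elixir
# Hypothetical usage - module name and values are illustrative only
TwitterFeed.scrape("some_handle")
TwitterFeed.scrape("some_handle", 1025133537503861760)
```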
We already manage an invalid handle by returning a 404 message from within scraper.ex. What about invalid start_after_tweet values, however? Let's try a few and see what happens.
Here we’re passing in a tweet
id that does not exist, and the result is:
So this looks OK; we simply end up with an empty Feed structure, which seems like a reasonable result.
How about a non-numeric value:
Again we end up with an empty Feed structure, which seems reasonable.
Finally, how about a negative value?
Very nasty; we end up with a big dump of error text… I think we can do better than that. Let's add a quick test and improve on what we're seeing.
The first step is to add a new test.
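Something like the following is the idea; the file, module name and exact error message here are assumptions, so adjust to match the project:

```elixir
# test/twitter_feed_test.exs (file / module names are assumptions)
test "returns an error when start_after_tweet is negative" do
  assert {:error, "start_after_tweet must be a positive value"} =
           TwitterFeed.scrape("some_handle", -1)
end
```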
From our test we can see that we're expecting an :error atom and an appropriate message when a negative value is passed in.
The test will of course currently fail.
Let’s update our implementation to get the test to pass.
We’ve added a third
scrape definition to specifically handle the case where
start_after_tweet has a value of less than 0. Pretty simple, and our test is now passing!
As expected, we now have a friendlier response in iex as well.
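For example (the exact message depends on what you return from the new clause):

```elixir
iex> TwitterFeed.scrape("some_handle", -1)
{:error, "start_after_tweet must be a positive value"}
```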
An issue with 302 redirects
A strange issue I ran into about halfway through this series of posts was that calls to Twitter from HTTPoison started to result in 302 redirects. I think something temporarily changed on Twitter's end, as requests from the Scraper were being redirected to the mobile version of Twitter.
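The response I was seeing was roughly of this shape (the headers and URL below are illustrative, not the real output):

```elixir
{:ok,
 %HTTPoison.Response{
   status_code: 302,
   headers: [{"location", "https://mobile.twitter.com/some_handle"}],
   body: ""
 }}
```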
As can be seen above, we're getting a 302 status_code value (i.e. a redirect) which we're not currently matching on in our Scraper case statement. The location part of the response indicates the redirect is to the mobile version of Twitter.
It took me a while to figure out what was going on… and the issue cleared itself up on its own after a few days. However, we can guard against this being a problem in the future. We can specify some headers in http_client.ex to indicate we are not on a mobile device, which should prevent any future problems. In case it doesn't, we should also add a new status code clause in scraper.ex to check for a 302.
Preventing 302 redirects to the mobile version of Twitter
Let’s start out by updating
We’ve added a new private function
get_headers which we pass into
HTTPoison as a second parameter. The second parameter to
HTTPoison.get is an optional header value which we were not previously populating. With the
user-agent string we’re now passing in, this should indicate to Twitter that we are not on a mobile device and thus we should not be redirected.
If you want to see the 302 redirect for yourself, you can replace the user-agent string with Mozilla/5.0 (Linux; <Android Version>; <Build Tag etc.>) AppleWebKit/<WebKit Rev> (KHTML, like Gecko) Chrome/<Chrome Rev> Mobile Safari/<WebKit Rev>. If you recompile and run the Scraper with this string set as the header, you'll see the 302 redirect response being returned.
Handling 302 redirects
With the user-agent string set we don't expect to run into any future 302 redirects, but let's update scraper.ex anyway, just in case.
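A rough sketch of what this might look like in scraper.ex is below; the get_html function name, the other clauses and the exact error message are assumptions:

```elixir
defp get_html(url) do
  case TwitterFeed.HttpClient.get(url) do
    {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
      {:ok, body}

    # New clause: a 302 means we're being redirected (likely to mobile Twitter).
    {:ok, %HTTPoison.Response{status_code: 302}} ->
      return_302()

    {:ok, %HTTPoison.Response{status_code: 404}} ->
      {:error, "404 error, invalid handle"}
  end
end

# New private function to format a friendly 302 error.
defp return_302 do
  {:error, "302 redirect encountered, unable to retrieve the feed"}
end
```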
We’ve added a new private function to format a
302 response (
defp return_302) along with case statements which will handle a status code of
302 from calls to
With this in place, if we hit a 302 we'll see something like:
So this pretty much wraps up our Twitter scraper! Just a few final touches…
Let’s have a quick look at our code coverage.
Not too bad, although with code coverage there always is room for improvement I suppose!
We should also update our documentation. Speaking of which, we've forgotten to add documentation to our public API! So let's do that.
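Something along these lines is the idea; the module name, function head and wording below are assumptions, the point is simply to add @moduledoc and @doc attributes to the public API:

```elixir
defmodule TwitterFeed do
  @moduledoc """
  Provides a simple API for scraping a user's Twitter feed.
  """

  @doc """
  Scrapes the Twitter feed for the given `handle`.

  An optional `start_after_tweet` id can be provided, in which case only
  tweets posted after that tweet are returned.

  Returns `{:ok, %Feed{}}` on success or `{:error, reason}` on failure.
  """
  def scrape(handle, start_after_tweet \\ 0) do
    # ... existing implementation, unchanged ...
  end
end
```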
We’ve kept the code as is, but added some fairly detailed comments. This seems appropriate considering this is our public facing API that we expect people to interact with.
Let’s generate the docs!
Opening the docs we see we’ve got some pretty decent information available for people who may want to use our Scraper.
And with that we are done with our Scraper!
Thanks for reading and I hope you enjoyed this series of posts. In the future I may throw up a few quick posts demonstrating how the Scraper application could be used by other Elixir applications, but for now we're done and dusted!