A Curious Animal

Born to be curious, born to be animal!

Resilient PHP applications with Phystrix

06 July, 2016
- 6 min read

TL;DR: This post talks about how to make our PHP applications more resilient, that is, able to adapt to third party failures and recover quickly from them. You will see why configuring timeouts is not enough to protect your system and what better tools exist. We will briefly explain Hystrix concepts and introduce an alternative for PHP called Phystrix.

No matter how well designed your system is, no matter the number, kind or quality of your tests, there is only one thing for sure: your system will fail. Although we can design, implement and test our own systems well, we have dependencies on third party services: databases, queues, REST APIs on other systems, ... In the decade of microservices, where dependency and communication among systems becomes a main pillar, we need mechanisms to make our systems resistant to third party failures. We need resilience, that is, the ability to recover quickly from disasters.

The problem with dependencies

In the best case you can be sure your own system works fine, but you can't make the same assumption about the third party systems you communicate with (other microservices, databases, queues, ...). So the problems arise when some third party service starts failing. This can happen because:

- The system is near collapse: your requests arrive but take too much time to be answered. Each request that arrives at our system is held until the third party service responds. Because the third party system works slowly, we start holding too many connections, which could collapse our system too. In addition, although the third party system is near collapse, we continue sending requests, so instead of mitigating the problem we are increasing it. The collapse of a third party service can collapse our system.
- The system is down: many drivers try to establish a connection during a given period of time before raising an exception. Why wait for a connection if we know the system is down?
The classic (and not quite right) way to solve it

The classic way, and a good practice, to mitigate this problem is to configure the timeouts for each library and driver you use to communicate with every third party service. As an example, Guzzle allows setting a `timeout` when instantiating a `Client`, and the SncRedisBundle for Symfony allows specifying the timeout both for connection and read/write actions.

As I said, setting timeouts is a good practice, but it does not solve the problem. If the third party is down and we have a timeout of 2 seconds, we guarantee our system will finish the query in two seconds, but why send a request and spend 2 seconds when we know the system is down?

A robust solution

As we said at the beginning, the goal is to make our system resilient, that is, able to adapt to third party failures and recover quickly from them. This means:

- Do not call a third party if we know it is down or takes too much time to respond.
- Fail quickly. If our system receives a request we can't complete due to lack of support from a third party service, we can fail quickly. The client, although it will receive a bad response, will receive it quickly, and our system will be free to attend a new request. You can see how this avoids a cascading failure of our system.

Fortunately, some years ago the Netflix engineers worked on a solution called Hystrix, which is so good that it has become one of the most famous tools when working with microservices in Java:

Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.

Hystrix implements the circuit breaker pattern. Following the electrical circuit analogy, each dependency of our system on a third party service can be seen as a cable with a switch.
While the communication with the third party works fine, the switch is closed (allowing the electricity to flow), but when a problem is detected in the third party system (e.g. due to a timeout) the switch is opened, making it impossible for our system to communicate with the third party. This makes our system fail quickly when a dependency fails.

A robust solution for PHP

After looking at some libraries I finally decided to use Phystrix, a Hystrix port to PHP made by oDesk (now Upwork). Following the Hystrix design, Phystrix requires that every action we want to perform against a third party service (querying a database, inserting into a queue, etc.) be encapsulated as a command (see the command pattern). The library offers an abstract command class from which we can inherit when creating our commands. It is up to us to implement the method responsible for performing the real action (the call to a service, the insert into the database, etc.) and, optionally, the method that returns a fallback result if the third party is not available.

The good part is that each time a command is executed, the base command class collects metrics (like the number of failed calls to a third party) and is responsible for managing the circuit state (open or closed) depending on the current metric values. Note Phystrix stores all the metrics in APC, so you should have this extension enabled.

Additionally, the people from oDesk implemented two more tools:

- phystrix-bundle: a bundle for Symfony2 that helps integrate Phystrix into our Symfony projects.
- phystrix-dashboard: some classes that help create an endpoint in our application to serve the metrics stored by Phystrix. This endpoint can be added to the hystrix-dashboard tool to monitor our metrics in almost real time.

Conclusions

Our systems have dependencies on other systems we can't control. Setting timeouts on our connections is a good practice, but it is not enough.
Hystrix is a tool that implements the circuit breaker pattern, which can help our systems be more resilient. Phystrix is a Hystrix port for PHP. It is easy to integrate into our PHP applications and protects against latency problems, faults and cascading failures when working with third party systems.
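As a taste of the API, a minimal Phystrix command might look like the following sketch. The base class and method names follow the Phystrix README; the wrapped HTTP client and the endpoint are hypothetical:

```php
<?php
// Sketch of a Phystrix command; the injected $httpClient is a hypothetical
// service client, not part of Phystrix itself.
use Odesk\Phystrix\AbstractCommand;

class GetUserCommand extends AbstractCommand
{
    private $httpClient;
    private $userId;

    public function __construct($httpClient, $userId)
    {
        $this->httpClient = $httpClient;
        $this->userId = $userId;
    }

    // The actual call to the third party service, guarded by the circuit breaker.
    protected function run()
    {
        return $this->httpClient->get('/users/' . $this->userId);
    }

    // Optional: returned when run() fails or the circuit is open, so we fail
    // fast with a degraded response instead of waiting on a broken dependency.
    protected function getFallback()
    {
        return null; // e.g. a cached or default user
    }
}
```

Commands are typically obtained through the Phystrix command factory and executed with `execute()`; while the circuit is open, `run()` is never invoked and the fallback is returned immediately.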

Symfony, images and S3

25 March, 2016
- 6 min read

Paraphrasing the movie title Sex, Lies, and Videotape, this post is about how I configured my Symfony project to work with images (including thumbnail generation) and store them all on the Amazon S3 service. There are libraries and bundles to work with images, and others to work with S3, but combining them all can be a tricky experience with not much documentation out there. In addition, there is one more challenge to achieve: while developing I want to store images locally (in a directory), but when deployed I want to use S3 storage.

Currently I'm working on a project where users can upload photos. The application allows users to create collections, drag and drop images, order them and finally upload them to the server. In addition, the user can explore his/her collections, browse images and download collections in a single ZIP file. All this having in mind:

- While developing locally we want to store images in a local server folder. In staging or production environments we want to use S3.
- Original images must remain private, while the thumbnails must be public to be included in the application web pages.
- We need to generate thumbnails with different sizes. When a user accesses a collection, a list of small thumbnails is shown. When the user clicks an image, a medium thumbnail is presented. When the user downloads a collection, the ZIP file must include the original images.

So, in summary, we need to deal with image uploading, thumbnail generation and the S3 service.

Uploading images

For image uploading we have used the VichUploaderBundle. Uploading images isn't a secret, but it can involve some extra tasks. The VichUploaderBundle helps working with file uploads that are attached to ORM entities, that is, it is responsible for moving the images to some configured place and attaching them to your entity. In addition, we want to store images in folders classified by user and collection, using their identifiers as path segments.
A nice feature VichUploaderBundle offers is the possibility to attach a so called directory namer or file namer that determines the final name for the uploaded file. This way, when a file is uploaded, given the current user and the selected collection, we dynamically determine the target folder where the bundle must store the image. Note the ORM entity only stores the image file name. The path to the file is computed through the specified directory and/or file namers. For this reason, the bundle also includes the methods required to get the absolute path to a file given the file name stored within the entity.

Generating thumbnails

To generate thumbnails we have used the LiipImagineBundle. With it, when you reference an image within your templates you don't get the original image but a new one obtained by applying some filters, using the bundle's `imagine_filter` Twig filter. The good thing is LiipImagineBundle generates the thumbnails when images are accessed for the first time and stores them in a cache folder for subsequent calls.

Abstracting the file system

The issue is we want to upload images and generate thumbnails into a local folder at development time and to S3 when running in staging or production. Fortunately for us there is the Gaufrette bundle, which is an abstract filesystem. It offers a common interface to read/write to different filesystems and a bunch of implementations to work against the local filesystem, an FTP server, the Amazon S3 service, ... and many more.

Putting it all together

Arrived here, the main question is how to configure the three bundles to work together. In summary:

- We want to abstract the filesystem to work locally while developing and with S3 in production.
- We need to upload images.
- We need to generate thumbnails for uploaded images and store them in a cache folder to be served later.

We have divided the configuration between two files.
The first contains the configuration for the previous three bundles ready to work locally. The second overrides some properties to work in production, using S3.

The first point is to configure the Gaufrette bundle to abstract our filesystem. In the development configuration we define a filesystem which uses a local adapter; in the production configuration we override it to use an S3 one. Note that for production you need to define an AWS S3 client service, which I do not include here.

The next step is to configure the VichUploaderBundle. Fortunately, it is designed to integrate with Gaufrette, so it is easy to specify that files should be uploaded through it: we declare gaufrette as the storage engine and set the upload destination to the previously defined Gaufrette filesystem. This means all images will be uploaded through the Gaufrette filesystem to that destination. Note that, within the target filesystem, the final folder and file name are determined by a custom directory namer we have implemented (which adds the user ID to the path) and the file namer offered by Gaufrette, which assigns a unique name to each file.

Finally, we need to configure the LiipImagineBundle. For local development we need to specify the cache folder where the thumbnails are generated, in addition to our filter, which produces thumbnails at a given size and half quality. The main properties to configure are the data loader and the cache resolver. The first uses the stream loader backed by the Gaufrette filesystem. The second uses a resolver configured to write to a local cache folder. For production, the configuration changes slightly: we override the cache resolver with one that points to the S3 bucket.

Conclusions

VichUploaderBundle, LiipImagineBundle and Gaufrette are three great Symfony2 bundles. The configuration to make them all work together can be tricky, so I hope this post can help others.
While VichUploaderBundle is designed to work with Gaufrette, and its configuration is almost trivial, LiipImagineBundle is not and requires some extra work. For LiipImagineBundle we need to configure its main components, which are the data loader and the cache resolver.
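Pulling the filesystem abstraction together, a minimal sketch of the development and production Gaufrette sections might look like this. The keys follow KnpGaufretteBundle conventions; the filesystem, directory, bucket and service names are illustrative, not taken from the actual project:

```yaml
# config.yml — development: store uploads on the local filesystem
knp_gaufrette:
    adapters:
        images:
            local:
                directory: '%kernel.root_dir%/../web/uploads/images'
    filesystems:
        images:
            adapter: images

# config_prod.yml — production: override the adapter to use Amazon S3
knp_gaufrette:
    adapters:
        images:
            aws_s3:
                service_id: my_app.aws_s3_client   # an Aws\S3\S3Client service you must define
                bucket_name: my-app-images         # illustrative bucket name
```

Because both files define the same `images` filesystem name, the rest of the configuration (VichUploaderBundle's destination, LiipImagineBundle's data loader) can reference it without caring which adapter backs it.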

Closing 2015

31 December, 2015
- 2 min read

2015 is finishing. It's 18:15 while writing these lines. Looking back on this year, a lot of feelings mix in my mind, some of them great and some sad. But life goes on and we can only move forward.

At a professional level, the most important fact was leaving the Servei Meteorològic de Catalunya (meteo.cat) after nine years. I leave great workmates. We made some incredible things and, in the past year, we worked on some nice projects (a new website, a mobile ready web and an API to access weather data). Although probably the thing I'm most proud of, as a team leader, was helping to create a positive and constructive environment, always looking to learn and improve. Anyway, decisions are not easy to take and never please everybody. Well, I'm human like anyone, I can be wrong, but I do not regret anything I have done.

You have enemies? Good. That means you've stood up for something, sometime in your life. - Winston Churchill

A man with no enemies is a man with no character. - Paul Newman

This year I also started working at CartoDB, a young and really motivated company. I have spent the last couple of months diving into code, learning and helping. I can't say much more, except that we are working on something really awesome for this new year :)

At a personal level this was a really hard year. The death of my father is something I can't seem to overcome. After a sudden and aggressive cancer, he passed away in April. The few months he lived after the cancer was detected were a real hell for the family, and especially for my mother. He always tried to keep his mental strength, but seeing yourself deteriorating at giant steps is hard enough to make you fall, and seeing someone you love dying a bit more every day, while trying to show strength, is something that keeps accumulating in your mind. Simply writing these lines, tears come to my eyes. He taught me to have an open mind. He was the one who showed me what principles mean.
I always thought that if my father hadn't been my father, we would have been great friends. He was always proud of that. See you some day, friend ;)

Life goes on and we must keep moving forward.

Reading very big JSON files in stream mode with GSON

23 October, 2015
- 5 min read

JSON is everywhere; it is the fashionable file format (see you, XML). The JSON format is mainly used in REST APIs because it is easy to read from JavaScript (JSON means JavaScript Object Notation), allowing the development of client side applications.

In Java and, similarly to the old days of XML, we have different ways to read JSON files: the object model way or the streaming way. Neither is better than the other; they are simply designed for different needs and situations. The object model way works by loading the whole file in memory and translating each JavaScript object to a Java object containing all the properties, array objects, etc. It is similar to the DOM way of reading XML files. On the other hand, the streaming way reads the file byte by byte and allows applying actions when an object starts, an array ends, etc. It is similar to the SAX way of reading XML files. There is a third way to read a JSON file, similar to StAX with XML, that combines streaming with the object model. For example, we can read a file in stream mode and, when we find the start of an object, read that whole object into memory.

Which method is better depends on your needs. If your JSON is small, the object model is great because you can load all the data and work with normal Java objects, searching within an array, iterating, etc. When the file is really big you probably don't want to load it all into memory, so the streaming model is the best choice.

The scene

We are working on an API implemented in Java, and one of the operations requires opening a big JSON file (about 10 MB) and returning an object identified by a given string. The file in question is formed by an array of objects, tons of objects, and it makes no sense to read the whole file and create tons of Java objects in memory only to return one. So, this is a good scenario to read the JSON file in stream mode.

The code

Before continuing, it is worth mentioning that the code you will see here is available at the java-json-examples repository.
I have generated a 2.5 MB JSON file using the mockjson tool. The data file is formed by an array of person objects. Each person has an `id`, a `name`, a `married` status and a list of `sons` and `daughters`:

{% highlight json %}
[
  {
    "id": 0,
    "married": true,
    "name": "George Moore",
    "sons": null,
    "daughters": [
      { "age": 25, "name": "Elizabeth" },
      { "age": 28, "name": "Nancy" },
      { "age": 9, "name": "Sandra" }
    ]
  },
  ...
]
{% endhighlight %}

GSON

We are using GSON to read JSON files in Java. Take a look at the Wikipedia link so you can see some of its main features. The other great library to work with JSON data is Jackson. Both have similar features and performance.

Our model class

When GSON reads JSON data it can unmarshal it to a given Java class. For this purpose we have created the `Person` class, which will store the previous information (for the sample we are only storing the person `id` and `name`):

{% highlight java %}
public class Person {
    private int id;
    private String name;
}
{% endhighlight %}

Reading using the object model

The next code shows how to read the JSON file using the object model. Remember, we are loading the whole file in memory, converting the data to Java objects:

{% highlight java %}
public static void readDom() {
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
        Gson gson = new GsonBuilder().create();
        Person[] people = gson.fromJson(reader, Person[].class);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
{% endhighlight %}

The magic occurs in the line `Person[] people = gson.fromJson(reader, Person[].class);`. We are telling GSON to read the file and return an array of `Person` objects. From here, if we want to return a given person, we need to find it within the array. Let's take a look at how we can do the same using stream mode.

Reading using stream mode

What we are really going to do here is use a mixed mode between streaming and the object model. We are going to read the file in stream mode and, each time we find the start of a person object, unmarshal it using the object model, repeating the process until we find the desired object.
{% highlight java %}
public static void readStream() {
    try (JsonReader reader = new JsonReader(new InputStreamReader(stream, "UTF-8"))) {
        Gson gson = new GsonBuilder().create();
        reader.beginArray();
        while (reader.hasNext()) {
            // Unmarshal one person at a time; we can stop as soon as we find the one we want
            Person person = gson.fromJson(reader, Person.class);
        }
        reader.endArray();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
{% endhighlight %}

With this approach we are still loading the objects into memory, but one by one, not the whole file at once.

Conclusion

We have seen that JSON files, like XML, can be read with different techniques: object model, stream mode or a mix of both. It is up to you to choose one depending on your needs. The object model allows reading a whole file and offers greater abstraction, because you can unmarshal the data to your custom Java object model. Stream mode allows reading big files consuming less memory. As a drawback, it implies lower level reading code, but this can be improved by mixing both models (as in the example).
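Putting the pieces together, the early-exit search the post describes might look like this sketch. It assumes GSON on the classpath, a `getName()` accessor on `Person` (not shown in the model class above), and an illustrative file path parameter:

```java
import com.google.gson.Gson;
import com.google.gson.stream.JsonReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class PersonFinder {

    // Scan the array in stream mode and return the first person with the given
    // name, without materializing the rest of the file in memory.
    public static Person findPerson(String path, String name) throws IOException {
        Gson gson = new Gson();
        try (JsonReader reader = new JsonReader(
                new InputStreamReader(new FileInputStream(path), "UTF-8"))) {
            reader.beginArray();
            while (reader.hasNext()) {
                Person person = gson.fromJson(reader, Person.class);
                if (name.equals(person.getName())) {
                    return person; // early exit: the remaining objects are never parsed
                }
            }
            reader.endArray();
        }
        return null; // not found
    }
}
```

On average this parses only half the array, and memory usage stays bounded by a single `Person` at a time.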

Webpack, not another task runner tool

15 October, 2015
- 2 min read

It seems we are living in a golden era for JavaScript and the front end world, with a myriad of frameworks, language improvements and, which is what this article is about, build systems, task runners or whatever you want to call them. We have Grunt adepts, firm believers in Gulp, and purists preferring the old fashioned npm scripts way. Well, I'm sorry for all of you, but there is a new kid on the block and it is (IMO) a really strong competitor. Its name is webpack and it is a module bundler. OMG!!! A module what?

What is webpack?

Webpack is a module bundler. It has nothing to do with a task runner, although in many cases it can replace the need for gulp or grunt. Webpack understands modules and their dependencies (among JavaScript files, CSS or whatever) and generates assets. Probably you don't grasp the importance of that last sentence, so I repeat it again: webpack understands modules and their dependencies, and it is good at generating assets from them.

That is probably the main impact I see when developing with webpack. To get all its potential you need to change your mind from programming a set of JavaScript files, that are finally concatenated and minified, to a set of modules, that export values and have dependencies among them.

A brief presentation

Here I present a short slideshow I prepared to introduce webpack to my team. Any feedback will be appreciated. You can also view it at slid.es. The two samples presented in the slideshow are available on GitHub in the webpack-presentation repository.