Mutable Ideas
sampaversusbuenos:

#são paulo #buenos aires #brigadeiro #alfajor #illustration #sampaversusbuenos #brasil #argentina

Brigadeiro!!! Always!!!!

CURL: How to bypass Proxy / Load Balancer for testing

A proper way to connect directly to a backend web server is to pass the site's hostname to curl in the Host header, so the end server can handle the request correctly. This is useful when you need to test web servers that sit behind a load balancer or proxy.

curl -v -H 'Host: www.arjones.net' 'http://1.2.3.4:8080/'
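The same trick works from a script. Below is a minimal Python sketch using only the standard library's http.client: it opens the TCP connection straight to the backend's IP and port and sends the public hostname in the Host header, like the curl command above. The helper name fetch_from_backend is hypothetical, and the IP, port and hostname are just the placeholder values from the curl example.

```python
import http.client


def fetch_from_backend(ip, port, host, path="/", timeout=10):
    """Send GET `path` directly to ip:port while presenting `host` as the Host header.

    Hypothetical helper: the connection goes to one specific backend, so any
    DNS-resolved proxy or load balancer in front of the site is skipped.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Passing "Host" explicitly stops http.client from adding its own
        # Host header derived from `ip`, so the backend sees the public name
        # and can pick the right virtual host.
        conn.request("GET", path, headers={"Host": host})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

Calling fetch_from_backend("1.2.3.4", 8080, "www.arjones.net") then sends the same request as the curl line, returning the status code and the raw response body.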

mobocracy:

Most companies that operate at a large enough scale run into the problem of efficiently generating IDs. At tumblr this problem was solved with a simple libevent-based HTTP service that essentially generates IDs fast enough to meet our current demand and handles failure/startup by grabbing the…
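The excerpt is cut off before it describes Tumblr's service, so the following is not Tumblr's code. As one illustration of generating unique IDs quickly without a round trip to a central database, here is a minimal Python sketch of a Snowflake-style generator (the scheme popularized by Twitter): each ID packs a millisecond timestamp, a worker number and a per-millisecond sequence counter into one 64-bit integer. The class name, the 41/10/12 bit split and the epoch value are assumptions chosen for the example.

```python
import threading
import time


class SnowflakeLikeGenerator:
    """Hypothetical time-ordered 64-bit ID generator (Snowflake-style layout).

    Assumed layout: 41 bits of milliseconds since EPOCH_MS,
    10 bits of worker id, 12 bits of per-millisecond sequence.
    """

    EPOCH_MS = 1_288_834_974_657  # assumed fixed epoch; any past instant works
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms():
        return time.time_ns() // 1_000_000

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Clock moved backwards: refuse rather than risk duplicate IDs.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )
```

Because each worker embeds its own worker_id, several generator processes can hand out IDs in parallel without coordinating, and IDs from one worker come out strictly increasing.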

There are two ways to build a fast machine learning algorithm: Start with a slow algorithm and speed it up, or develop an intrinsically quick learning algorithm from the ground up. Yahoo! Research…

Running IndexTank

A few hours ago Diego Basch open-sourced IndexTank, a real-time full-text search-and-indexing system.

As soon as I could, I downloaded the code and started playing with it.

The big challenge here is how little documentation is available. I'm pretty sure more will appear as people get their hands on the code and start playing around, discussing it, etc.

Following the instructions in the README, I was able to compile and run the engine. To make it easy for everyone to test, I created a start script, which goes in the indextank-engine/bin folder, and a JSON configuration file, which goes in the indextank-engine/conf folder:

I didn't have enough time to try running the API part. It has even less documentation, and since I don't know much about Python, I haven't been able to run it.

I look forward to continuing my tests, because the whole concept of IndexTank seems really interesting, and knowing the people behind this project, I have no doubt it is great.

Congrats again to the whole team on this milestone.

instagram-engineering:

With more than 25 photos & 90 likes every second, we store a lot of data here at Instagram. To make sure all of our important data fits into memory and is available quickly for our users, we’ve begun to shard our data—in other words, place the data in many smaller buckets, each holding a part of…
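The excerpt stops before describing Instagram's actual scheme, so the following is only a generic illustration of "placing data in many smaller buckets": a minimal Python sketch that maps each user ID to one of a fixed number of logical shards, and each logical shard to a physical database server. The shard count, the modulo rule, the server names and the helper names are all hypothetical.

```python
from bisect import bisect_right

NUM_LOGICAL_SHARDS = 4096  # hypothetical; fixed for the lifetime of the data

# Hypothetical mapping: each physical server owns a contiguous range of
# logical shards, listed by the first shard of its range (sorted, starting at 0).
SHARD_RANGE_STARTS = [0, 1024, 2048, 3072]
SERVERS = ["db1", "db2", "db3", "db4"]


def logical_shard(user_id: int) -> int:
    """Pick the bucket for a user; all of that user's rows live in this shard."""
    return user_id % NUM_LOGICAL_SHARDS


def physical_server(shard: int) -> str:
    """Return the server whose shard range contains `shard`."""
    if not 0 <= shard < NUM_LOGICAL_SHARDS:
        raise ValueError("shard out of range")
    idx = bisect_right(SHARD_RANGE_STARTS, shard) - 1
    return SERVERS[idx]


def server_for_user(user_id: int) -> str:
    return physical_server(logical_shard(user_id))
```

For example, server_for_user(5) and server_for_user(4101) both return "db1", because both users land in logical shard 5. Keeping many more logical shards than physical servers means a shard can later be moved to a new machine by editing the range table, without re-hashing every key.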

Common Crawl is an attempt to create an open and accessible crawl of the web. This document describes the steps required to access the latest Common Crawl corpus.

This talk on using Hadoop and Solr together for a NoSQL-like result was given by Ken Krugler, a friend of DZone who wrote the amazingly popular article, Solr + Hadoop = Big Data Love. The talk was…

The subject came to me while reading Stuart Halloway's book, Programming Clojure. Nice book, full of interesting exercises. One of the exercises illustrates a very amusing use of STM (aka…

Guava is an open source library containing many classes for Java and written by Google. It’s a potentially useful source of miscellaneous utility functions and classes that I’m sure many developers…
