Wednesday, July 3, 2013

CPLEX Remote Object

A very simple way to use CPLEX in an application is to create a CPLEX object, load a model from a file, and solve the problem. This could look like the following (in C++, no error handling):

    
#include <ilcplex/ilocplex.h>

void load_solve(const std::string& modelName) {
    IloEnv   env;
    IloCplex cplex(env);
    IloModel model(env);

    cplex.importModel(model, modelName.c_str());
    cplex.extract(model);
    cplex.solve();

    env.out() << "Solution value  = " 
              << cplex.getObjValue() 
              << std::endl;
    env.end();
}

int main() {
   load_solve("noswot.mps");
   return 0;
}

When that is all you need, you're good. But some challenging problems demand specific algorithms that CPLEX doesn't provide (yet!) and/or would benefit from the power of multiple machines... For example, you may need a Benders decomposition for your very large problem, or you may have come up with a problem-specific algorithm that uses big LPs or MIPs as sub-problems. CPLEX includes APIs in multiple languages (currently C, C++, Java, Python, C# and VB), and using any of these solves the first part of the issue. But what about the 'using multiple computers' part? You probably don't want to invest much time into writing a framework to enable distributed computing. So we did it for you!

CPLEX Optimizer 12.5.1, the Mathematical Programming engine in the latest version of IBM CPLEX Optimization Studio, features what we call the CPLEX Remote Object. With only a few additional parameters given to the first CPLEX call, you turn this CPLEX object into a 'remote' object that does its computations on another machine.

This feature was introduced in version 12.5 for the C API, and we just added the C++ and Java APIs. Let's see how to use this with the example above:

    
#include <ilcplex/ilocplex.h>

void load_solve(const std::string& modelName) {
    IloEnv   env;
    const char* remote = "-address=the_server:12345";
    IloCplex cplex(env, "tcpiptransport", 1, &remote);
    IloModel model(env);

    cplex.importModel(model, modelName.c_str());
    cplex.extract(model);
    cplex.solve();

    env.out() << "Solution value  = " 
              << cplex.getObjValue() 
              << std::endl;
    env.end();
}

int main() {
   load_solve("noswot.mps");
   return 0;
}

Note that the only difference from a purely 'local' computation is in the creation of the CPLEX object. In the case above, your program will try to connect to a distant machine and run CPLEX there. The data will still be read from the same (local) file, but it will be serialized and sent to the remote object. The remote object will execute the 'solve' call and, when asked by the final getObjValue call, will return the objective value of the optimal solution.
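The transport argument in the example above is just a string, so it can be assembled at run time, for instance from a configuration value. Here is a minimal sketch of a helper that does this. The name `make_address_arg` is made up for this post and is not part of CPLEX; only the `-address=host:port` form is taken from the example above, and other transport options are not covered.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of CPLEX): builds the "-address=host:port"
// string that the example above passes to the IloCplex constructor.
std::string make_address_arg(const std::string& host, int port) {
    assert(port > 0 && port <= 65535);  // valid TCP port range
    return "-address=" + host + ":" + std::to_string(port);
}
```

The result can then be used exactly as in the example: keep the returned `std::string` alive, take its `c_str()` as the `remote` pointer, and pass `&remote` to the `IloCplex` constructor.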

This needs some preparation work on the distant machine, of course, and this depends on the protocol you want to use:

- TCP/IP: fire up the CPLEX interactive with some options to listen on a specific port;

- Process: the distant computer must accept SSH sessions in which the CPLEX interactive is in the path;

- MPI: in that case, both machines (local and distant) must belong to the MPI cluster.

In addition to offloading your computations to a distant machine, the CPLEX Remote Object allows you to create fully distributed algorithms, where a 'master' connects to several 'workers' and gives different computations to each. The CPLEX distribution includes two such examples: a Benders decomposition, and a distributed concurrent MIP solver... You will find more information about the Remote Object in Roland Wunderling's presentation. And you can browse the online documentation on the topic.

By the way, this new 12.5.1 version has a number of features, including a 43% performance improvement in the time to solve difficult MIP problems... See Jean-François Puget's blog post for more on this topic.

Thursday, May 16, 2013

A bit about CPLEX…

I’ve been serving the scientists and software engineers in the CPLEX Optimizer team as their manager for almost two years, now. I consider myself very lucky: this is an extremely dedicated and talented team, working on a great piece of software!

Let me explain what CPLEX is about…

Suppose you have decisions to make and there are many possibilities. It could be about choosing locations for warehouses and which customers each will serve, deciding when to produce which item, allocating crews to trains or planes, etc. You don't have the luxury of infinite resources, and you have some constraints to satisfy: all customers must be served, not all machines can produce any item, rest periods must be taken into consideration, etc. And of course not all solutions are equivalent, and you need the best possible solution according to some objective function that may refer to costs, revenue, idle times, etc. As you can see, the range of problems that can be modeled in this framework is very large. And, indeed, all industries and sectors use these technologies to improve their efficiency...

The issue is that for anything but toy problems, there are so many possible solutions that you can't test them all to decide which is the best. Consider for example the problem of ordering a set of 30 tasks. There are so many possible orderings that you would need on the order of 100,000 years to test them all using all CPUs on earth!
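To get a feel for these numbers, here is a rough back-of-the-envelope sketch. Ordering 30 tasks gives 30! possible sequences. The function names are mine, and the testing rate used in the note below is purely an assumption chosen to illustrate the order of magnitude, not a measured figure.

```cpp
#include <cassert>

// Number of ways to order n tasks: n!.
// Returned as a double, since 30! is far beyond 64-bit integers.
double orderings(int n) {
    assert(n >= 0);
    double result = 1.0;
    for (int i = 2; i <= n; ++i) result *= i;
    return result;
}

// Years needed to test `count` solutions at `per_second` solutions per second.
double years_to_enumerate(double count, double per_second) {
    const double seconds_per_year = 365.25 * 24 * 3600;
    return count / per_second / seconds_per_year;
}
```

`orderings(30)` is about 2.65e32. If we assume (hypothetically) that all the machines together could test 1e20 orderings per second, `years_to_enumerate(orderings(30), 1e20)` comes out at roughly 84,000 years.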

Fortunately, there are sophisticated programs that do just that: find the optimal solution for your problem as quickly as possible. IBM has one such product, named ILOG CPLEX Optimization Studio. It features a modeling language (OPL) that allows you to express the problem to solve in an easy way, an IDE to write and run your models, several connectors to access your data (from Excel, DB2 or most other databases) and two computation engines to find the solutions: CPO (Constraint Programming Optimizer) and CPLEX, each dedicated to a particular class of models.

The algorithms included in CPLEX are targeted at solving Mathematical Programming models. They range from Linear Programming (the objective is a linear combination of the variables, and each constraint is an equality or inequality) to Mixed Integer Linear Programming (some of the variables must take integer values – makes the problem much harder to solve), Quadratic Programming (the objective may contain products of two variables) and Quadratically Constrained Programming (the constraints can include quadratic terms). You will find details about these e.g. on Wikipedia.

Most of the team’s work is to improve the performance of these algorithms, or add new ones. Consider for example that CPLEX 12.5, the latest version as of this writing, solves the most difficult MIP problems in our test set more than 190 times faster than version 6.0 (1998) on the same hardware! And the runs are deterministic: on a given platform, for the same data, the program will always run the same way and return the same solution, even if you use a heavily loaded multi-core machine…

As I don't have a technical role anymore, I don't often have much to post about 'how' we are doing things. So stay tuned for more about the 'what'...

Sunday, November 25, 2012

Book review: Continuous Delivery

Highly recommended.

Continuous Delivery, by Jez Humble and David Farley, is subtitled 'Reliable Software Releases through Build, Test and Deployment Automation'. And indeed, all of this can and should be automated!

The authors describe everything you should know about why and how to automate software releases, from the commit to the deployment on live production servers. I already knew about Continuous Integration when I started reading this book, so I didn't learn much about this topic. But if you are not already familiar with this concept, just stop now, get the book, and read the first three chapters! That's the absolute minimum you should do! And that's enough to highly recommend it.

The book is geared toward building applications that get released on the company's servers, rather than shipped to customers. This is very different from the project I'm currently working on, for which we essentially deliver libraries that customers will include in their own applications. But most of the content of the book is still completely relevant: it doesn't really matter whether you deploy or ship: you should still release often!

Chapter 11 is entirely devoted to 'Managing Infrastructure and Environments'... Did you know that you can store your test servers' configuration under source control? That you can turn bare metal into a running server configured for your application completely automatically, and thus reliably? That is something I want to implement!

Chapter 14 deals with 'Advanced Version Control'. As the authors put it, "poor version control practices are one of the most common barriers to fast, low-risk releases". VCS technologies are discussed and compared (centralized, distributed, stream-based), as well as branching patterns (develop on mainline, branch for release, branch by team, and branch by feature). Very important, and very interesting.

This book is definitely a must-read for anyone responsible for delivering software!...

Thursday, September 13, 2012

Suggestions for Jenkins on multi-platform projects

Our team uses Jenkins as our Continuous Integration tool. I would like in this post to describe our usage, and suggest a few ideas that could improve this great tool. But first let me explain what we are building...

What we build

The product that we are building is a Mathematical Programming engine. The most basic usage is that you feed it with the mathematical formulation of the business problem that you want to solve, and ask for the best possible solution to this problem. The program then cranks up all the CPUs/cores it can find on your machine and returns with an answer after a few tenths of a second, or a few hours (some problems are REALLY complicated).

The core of the engine is a library built from 500,000 lines of C code. In addition to this library, we have APIs in half a dozen other languages (C++, Java, Python, etc.), and connectors for several third-party applications. No fewer than 14 platforms (including Windows, Linux on various CPU types, Mac OS X, AIX, HP-UX, etc.) are supported.

We are therefore very glad to use continuous builds: you really don't want to discover a possible compiler bug, or a non-determinism in the code on some exotic platform just before the release!

Our current setup

On our master branch (the one that receives most of the commits from the developers) we use two job families.

The first one builds the software in Debug mode and runs a fully comprehensive suite of tests. We have around 20 such jobs for all the platform/compiler/settings combinations we support. Run times vary widely: some of the jobs are done in 1h30, while others need almost 7 hours.

The second family (the 'distrib' jobs) builds the Release versions of the product. There is one job per platform. Each job builds all the components for this platform (e.g. on Windows32, we support both Visual Studio 2008 and 2010), packages them into some releasable form (could be a Zip, a TarZ or an installer) and tests the basic functionality of the software (e.g. the distributed samples). For those jobs, the run times vary even more: from 30 minutes to 10 hours, depending on the platform.

This setup has been in place for some time now. It works, and it's extremely useful!

Wishes

Although Jenkins is a great tool, it doesn't yet have all the features I'd like. So here are a few ideas, just in case the developers don't already have enough...

Detect stale jobs

We sometimes have jobs that stop running (no new run is triggered, or no nodes are available). This is of course not intended, and it would be nice to be able to detect those easily. Adding a 'Last build' column to the list view, displaying the time since the job entered its current state, would help. Something like 'Ended 8.6 hr' or 'Queued 1.3 hr' or 'Started 12 min'...

Then I'd know that if the code changed 3 hours ago, I shouldn't see any number larger than 3 hours...

Detect hung jobs

We have many jobs running, typically 20 to 30 simultaneously. And some builds last for several hours. It happens that tests hang, or are abnormally slow. These situations should be detected as soon as possible for investigation.

Unfortunately, the 'Build History' list is not very helpful, for two reasons. It has too few entries for us: with 50 builds, only the last 5 hours are covered, which is less than the duration of many of our builds. And if this limit were increased, we'd probably need a list of 200 or so builds, which would not be easy to handle.

I would thus suggest allowing filtering on the 'building' status. When this flag is set, the 'Build History' would only display the jobs that are currently being built.

A view 'by revision'

I often need to check if a given revision of the source has been built by a given job, or what is the latest revision that is good on a set of jobs. For example, I may want to merge this revision to some 'stable' branch for other teams to use.

I think that a grid view with the following attributes would be very useful for this: each line is a commit id or SVN revision, each column is a job, each cell is blue, red or gray (or even empty if this revision has not yet been part of a run of the job, or the run is not finished yet).
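The data behind such a view is simple. Here is a minimal sketch, in C++, of the grid and of the query I care most about: the latest revision that is good on a given set of jobs. All the names (`Status`, `Grid`, `latest_good_revision`) are invented for this illustration; none of this is Jenkins code or a Jenkins API.

```cpp
#include <map>
#include <string>
#include <vector>

// Cell colors of the grid; a missing cell means "not built yet".
enum class Status { Blue, Red, Gray };

// grid[revision][job] = outcome of the run of `job` that included `revision`.
using Grid = std::map<int, std::map<std::string, Status>>;

// Returns the highest revision that is blue on every job in `jobs`,
// or -1 if there is no such revision.
int latest_good_revision(const Grid& grid, const std::vector<std::string>& jobs) {
    // std::map is ordered by key, so iterate from the newest revision down.
    for (auto row = grid.rbegin(); row != grid.rend(); ++row) {
        bool all_blue = true;
        for (const auto& job : jobs) {
            auto cell = row->second.find(job);
            if (cell == row->second.end() || cell->second != Status::Blue) {
                all_blue = false;
                break;
            }
        }
        if (all_blue) return row->first;
    }
    return -1;
}
```

With such a structure, finding the revision to merge to a 'stable' branch is a single call on the set of jobs that matter for that branch.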

Do you think these would be useful additions?

Monday, April 16, 2012

A Taste of C++11

Herb Sutter has an example of what C++11 feels like. Here it is, from the video:

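#include <algorithm>
#include <future>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

string flip(string s) {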
   reverse(s.begin(), s.end());
   return s;
}

int main() {
   vector<future<string>> v;
   
   v.push_back(async([] { return flip(   " ,olleH"); }));
   v.push_back(async([] { return flip(" egdelwonK"); }));
   v.push_back(async([] { return flip("\n!rebmahC"); }));

   for (auto& e : v) {
      cout << e.get();
   }
}

Concurrency, futures and lambda functions... Move semantics, automatic type deduction and the new for loop syntax...

A whole new language, isn't it?

Friday, April 6, 2012

About 'Making Things Happen'

I just finished reading 'Making Things Happen: Mastering Project Management', by Scott Berkun.

Rather than trying to explain why this is a really great book (although it is, and you can find reviews here, here and there), I thought I'd rather just mention a few ideas/topics/quotes I noticed while reading. If you want to know more about these items, you know where to go...

p 11 - PMs have to balance several pairs of forces: ego/no-ego, autocrat/delegator, tolerate ambiguity/pursue perfection, oral/written, acknowledge complexity/champion simplicity, impatient/patient, courage/fear, believer/skeptic. Depending on the phase of the project, or the situation at hand, PMs must balance these forces differently.

p 16 - "PMs have to understand the advantage of their perspective and choose to make use of it"

p 25 - A schedule has a forcing function: people tend to try and stick with it

About schedules: a simple way to build one is to ask people in the team to provide an indented list of one-line tasks with estimates no longer than 2 days

p 137 - Specifications are needed to build a plan, and help define tests:

- ensure the right thing gets built

- create milestones to focus the team

- enable reviews and feedback

They should be in VCS (markdown format?) to allow others to check what changed. PM should make it clear with the team what the goals for the specs are (p 138)

p 143 - "Remember that good feedback comes more easily if you ask for it than if you wait for it."

p 144 - Ask the readers of the spec 'Do you have what you need to do your best work?'

p 145 - When writing a spec, put the questions about the spec itself at the end, or in another document.

p 183 - PM is tough: you have to invest in relationships with people, regardless of how much they're investing in you

p 185 - PM should discuss, with each person, his role, the other person's role, and the common parts. This sets expectations.

p 186 - "What can I do to help you do your best work?"

p 215 - When an urgent issue arises:

- Calm down

- Evaluate the problem

- Calm down again

- Get the right persons in a room (and often, you don't belong to this group -- Offer help, but don't get in the way)

p 221 - "The challenge [of managing projects] isn't sailing in calm, open waters with clear skies. Instead, the challenge is in knowing how to juggle, prioritize and respond to all the unexpected and difficult things that you're confronted with".

"Taking responsibility for something doesn't make it your fault: it means that you will be accountable for resolving the situation"

p 224 - Getting to Yes, by Roger Fisher -- Know your BATNA (Best Alternative To a Negotiated Agreement)

p 232 - It's much more expensive to recover from burnout than to slow the project down

p 232 - Feelings about feelings

If someone says something to you that makes you sad ("You smell funny"), what happens next is a feeling (anger) about this first feeling (sadness), and one can usually only express the second one (the feeling about the feeling). Cf. Virginia Satir

p 233 - Living, loving and learning, by Leo Buscaglia

p 233 - Beware the hero complex (the person creates bad situations to be able to solve them)

p 234 - 'Always' and 'never' are not valid answers to the question of when a process is necessary

p 235 - Beware codependence between bad management and heroes, where the former creates the bad situations from which the latter save the day

p 236 - Exercises for bad situations

p 242 - "To be a good leader, you must learn how to find, build, earn, and grant trust to others - as well as learn how to cultivate trust in yourself"

p 253 - Criticizing others

p 254 - What to do after a mistake, what to learn from your mistakes

p 255 - Never reprimand in real time

p 257 - Self-Reliance, by Ralph Waldo Emerson

p 261 - PMs do ordered lists of stuff

p 265 - Saying no. "If you're asked something, say no and point them to me"

p 354 - Project Management Clinic (closed now, but archives available) - http://www.scottberkun.com/forums/pmclinic

Friday, January 6, 2012

LaTeX-like Project Management

With a project management role comes the need for a project scheduling tool. The obvious choice would be Microsoft Project, but I thought I'd rather look for some free programs first.

I tried two that are very similar to MS-Project: OpenProj and GanttProject. Both are open-source software. Both have a GUI that's centered around a Gantt chart, and allow you to manipulate activities as in MS-Project. But both also lack a very basic feature: resource leveling. They don't have any intelligence whatsoever, and the user must check by himself that resources are not overloaded, and correct issues by adding spurious precedence or 'starts after' constraints. I therefore don't really see how to actually use these tools for any serious work...

But then I discovered The TaskJuggler. It's a command-line program written in Ruby that reads a text file and outputs HTML pages with various reports, like a Gantt chart or a resource allocation graph. Installing the tool is as easy (once/if you have Ruby) as typing 'gem install taskjuggler'.

Using a text file as the input for all your project data has important benefits. The first one is that you can use your source management system to let multiple people collaborate on the file, whereas a binary file (or even a text file that's completely rewritten by your tool when you save it) can't effectively be shared. Another is that changes are much easier to track, as a change to a single part of the project will not touch the rest of the file. So it's possible to revert individual changes...

This very much reminds me of the difference between a word processing program and LaTeX...

Here are some screenshots:

The Gantt chart

The resource allocation chart

And here is an example of a trivial project file. You'll find a much more complete example of a project file in the tutorial that is provided with the project.

project tiny "Example TJ3 Project"  2012-01-09 +12m {
  timezone "Europe/Paris"
}

resource Xavier "Xavier Nodet" {}

resource dev "Developers" {
  managers Xavier
  resource dev1 "Dev1" {}
  resource dev2 "Dev2" {}
}

task Tiny "Our Tiny Project" {
  responsible Xavier

  task t1 "Task 1" {
    
    task sub1 "Sub-task 1.1" {
      effort 30d
      allocate dev1
    }
    task sub2 "Sub-task 1.2" {
      effort 10d
      allocate dev1
    }
  }

  task t2 "Task 2" {
    effort 20d
    allocate dev2
    depends !t1.sub1
  }
  
  task deliveries "Milestones" {

    task start "Project start" {
      start ${projectstart}
    }
    
    task ega "EGA" {
      start 2012-11-01
      depends !!t1, !!t2
    }
  }
}

# Skipping the report generation part...

I only scratched the surface so far, but this seems very promising to me...