John Graham-Cumming's blog: 2015

2015-11-30

The secret message hidden in every HTTP/2 connection

If you spy on an HTTP/2 connection starting up you'll notice that it sends an almost-but-not-quite valid HTTP request at the very start of the connection. Like this:

Written a little more clearly that's:

    PRI * HTTP/2.0

    SM

The HTTP verb is PRI and the body contains just SM. Put them together and you get... PRISM. This occurs right at the start of the connection to ensure that the server really supports HTTP/2.0. It is detailed in Section 3.5 of RFC7540 as follows:

   In HTTP/2, each endpoint is required to send a connection preface as
   a final confirmation of the protocol in use and to establish the
   initial settings for the HTTP/2 connection.  The client and server
   each send a different connection preface.

   The client connection preface starts with a sequence of 24 octets,
   which in hex notation is:

     0x505249202a20485454502f322e300d0a0d0a534d0d0a0d0a

   That is, the connection preface starts with the string "PRI *
   HTTP/2.0\r\n\r\nSM\r\n\r\n").

I tried to find an explanation of the specific letters used and why they spell PRISM. After a bit of spelunking the following comes to light.

May 29, 2013
IETF draft-ietf-httpbis-http2-03 describes this connection mechanism and indicates that the string to send is FOO * HTTP/2.0\r\n\r\nBA\r\n\r\n.

July 8, 2013
IETF draft-ietf-httpbis-http2-04 changes the string to PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n.

Strange.

I wonder what happened between May 29, 2013 and July 8, 2013? Could it be "U.S., British intelligence mining data from nine U.S. Internet companies in broad secret program"?

Thanks to this comment on Hacker News here's the actual commit that introduced this change. On June 14, 2013 the string was changed with the comment "Exercising editorial discretion regarding magic."

2015-07-10

Possibly the machines that injected JavaScript in pre-revolution Tunisia

Back in 2011 as the Tunisian Revolution was underway I blogged about JavaScript that was being injected into web pages visited by Tunisian's using HTTP. The JavaScript was designed to steal usernames and passwords of Facebook and Google Mail users.

With the release by Wikileaks of the archive of email from Hacking Team it's possible to lift the lid on this just a little.

Ten days before my blog post there was an article in Fast Company titled TUNISIAN GOVERNMENT ALLEGEDLY HACKING FACEBOOK, GMAIL ACCOUNTS OF DISSIDENTS AND JOURNALISTS which talked about this piece of JavaScript. This news article is discussed briefly by folks within Hacking Team.

One engineer comments on the JavaScript being injected by sending out a quotation just followed by the smiley ":P". Another responds in English "Truly remarkable" and finally someone responds "La foto l'ha fatta un nostro tecnico con cellulare!!!" (Photo take by one of our engineers with his cellphone).

The photo appears to show three servers labeled Facebook, GMAIL, Hotmail and named ATI.jpg (ATI is the Agence Tunisienne d'Internet; the Tunisian Internet Agency). I wonder if these are the servers that were injecting this JavaScript or receiving the purloined login credentials. It's not 100% clear.

It's unclear from the mails if Hacking Team was involved in this interception and another email discussing the topic of Internet restrictions in Tunisia says "Alcune di queste misure restrittive potrebbero essere state implementate da RESI." ("Some of these restrictive measures could have been
implemented by RESI")

But the photograph is eerie.

2015-07-02

Dirty Go: eliminating vertical space

Having written a lot of Go I sometimes find myself exasperated by functions like this:

because of all the repetition and vertical space. Working on something the other day I realized I could (ab-)use Go's switch like this:

with a small change to the functions being called. Instead of having doFirst (etc.) return an error, have it take a pointer to an error and return a boolean indicating success. That's so much more compact, but feels a little dirty.

PS You can play about with that here.

2015-05-21

The effectiveness of Turing's Vigenère cipher breaking technique

In Turing's paper The Applications of Probability to Cryptography he describes a technique for breaking the Vigenère Cipher using Bayes Theorem. The paper was declassified in 2012 and a computer typeset version has been released (Turing used a typewriter and pen to produce the original).

Having used Bayes Theorem quite a few times (for example in POPFile) I was interested to see how it was being applied to Vigenère by Turing. Bayes Theorem is surprisingly powerful even when great simplifications (such as assuming, for example, independence of the occurrence of words or letters in an English sentence: something that's clearly not the case).

Briefly Turing's technique is to use Bayes Theorem to 'weigh the evidence' that each of the 26 letters of the alphabet could be one of the letters in the key and output weights for each letter. His technique outputs weights for each letter in each position in the key and by picking the most likely it's possible to have a good guess at the key.

Turing gives the following example of a Vigenère enciphered text where the key length has been determined to be 10 (there are techniques for doing this see the relevant Wikipedia page).

D K Q H S H Z N M P
R C V X U H T E A Q
X H P U E P P S B K
T W U J A G D Y O J
T H W C Y D Z H G A
P Z K O X O E Y A E
B O K B U B P I K R
W W A C E J P H L P
T U Z Y F H L R Y C

A common way to attack Vigenère once the key length is determined is to look at each column (which corresponds to a single letter in the key) and try out trial decryptions for each letter A to Z. By looking at the resulting letters (and their frequencies) it's possible to make a guess which key letter is most likely by matching the frequencies of the letters against their corresponding frequencies in English.

For example, the first column is DRXTTPBWT. If decrypted with the key C this becomes BPVRRNZUR. The presence of a Z (which is fairly uncommon letter) makes C less likely to be the key. Decrypted with the letter P as the key it becomes OCIEEAMHE which looks more like letters sampled randomly from English text. So P is a possible key for column 1.

That's quite a laborious task across each column and each letter of the alphabet. But Turing essentially automates this by trying out every key letter for each column and producing a weight. Here's Turing's algorithm as Perl code:

The core idea is that for each key letter and for each encrypted letter a weight (called a ban in Turing's terminology) is calculated as (the log of) the likelihood of that letter being right (which is just the frequency of the corresponding plain text letter in English text) divided by the likelihood of getting that letter with some other key.

For example, if D appears in the cipher then the probability that D appears when the key letter is B is calculated as:

The a priori odds of key B are taken as 1/25 (i.e. the letters of the key are random; this may not actually be true).

The same calculation is done for all the letters in the column to obtain a final probability that a specific key letter is correct for that column. Using logs of probabilities means that the calculations change from being large multiplications to a series of additions (which are fast for both humans and machines to do).

To see Turing's algorithm in action I searched the web for an example of Vigenère and came across this sample which shows the classic way of breaking the cipher. The cipher text is

ANYVG YSTYN RPLWH RDTKX RNYPV QTGHP

HZKFE YUMUS AYWVK ZYEZM EZUDL JKTUL

JLKQB JUQVU ECKBN RCTHP KESXM AZOEN

SXGOL PGNLE EBMMT GCSSV MRSEZ MXHLP

KJEJH TUPZU EDWKN NNRWA GEEXS LKZUD

LJKFI XHTKP IAZMX FACWC TQIDU WBRRL

TTKVN AJWVB REAWT NSEZM OECSS VMRSL

JMLEE BMMTG AYVIY GHPEM YFARW AOAEL

UPIUA YYMGE EMJQK SFCGU GYBPJ BPZYP

JASNN FSTUS STYVG YS

And standard methods determine that the key length is most likely six. Feeding this into the Perl code above results in the following output.

$ perl t.pl ANYVGYSTYNRPLWHRDTKXR

NYPVQTGHPHZKFEYUMUSAYWVKZYEZMEZUD

LJKTULJLKQBJUQVUECKBNRCTHPKESXMAZ

OENSXGOLPGNLEEBMMTGCSSVMRSEZMXHLP

KJEJHTUPZUEDWKNNNRWAGEEXSLKZUDLJK

FIXHTKPIAZMXFACWCTQIDUWBRRLTTKVNA

JWVBREAWTNSEZMOECSSVMRSLJMLEEBMMT

GAYVIYGHPEMYFARWAOAELUPIUAYYMGEEM

JQKSFCGUGYBPJBPZYPJASNNFSTUSSTYVG

YS 6

A: . . . . 143 .

B: . . . . . .

C: . . . . . .

D: . . . . . .

E: . . . . . .

F: . . . . . .

G: . . 132 . . .

H: . . . . . .

I: . 151 . . . .

J: . . . . . .

K: . . . . . .

L: . . . . . 116

M: . . . . . .

N: . . . 135 . .

O: . . . . . .

P: . . . . . .

Q: . . . . . .

R: . . . . . .

S: 101 . . . . .

T: . . 30 . . .

U: . . . . . .

V: . . . . . .

W: . . . . . .

X: . . . . . .

Y: . . . . . .

Z: . . . . . .

Which indicates that the most likely key is SI(GT)NAL. Of the G and the T, G is most likely giving a key of SIGNAL. Which is correct.

Turing (and many others) used Bayesian techniques like this to attack other ciphers. See, for example, an old post of mine on the Japanese JN-25 cipher broken at Bletchley Park.

2015-04-15

Another way to find the value of GNU make's -j parameter

The other day I wrote a blog post about finding the value of -j in GNU make and it received a lot of commentary. Daniel Frey wrote in with a different solution that goes like this. It's worth studying if you are interested in GNU make wrangling.

.PHONY: default FORCE token recursive
default: foo.out
@$(eval JOB_COUNT := $(shell cat $<))
@echo Job count: $(JOB_COUNT)

foo.out: FORCE
@timeout 1 $(MAKE) --no-print-directory token recursive 2>/dev/null | wc -l >$@

FORCE:

token:
@echo X && sleep 2 && false

recursive:
@$(MAKE) --no-print-directory token recursive

2015-04-14

One weird trick that will give you makefile X-ray vision

Sometimes when you are running a make you'd like to see what commands get executed (even if they've been muted by prefixing them with @) and where in the makefile those commands reside. But you can't. There's no way to override @ and make -n doesn't actually run the make.

But here's 'one weird trick' that makes it possible to see inside a make as its running. Just add this to the makefile:

_SHELL := $(SHELL)
SHELL = $(warning [$@])$(_SHELL) -x

Before digging into how that gives you makefile X-ray vision, let's take a look at it in action. Here's an example makefile that builds an executable from foo.o and bar.o (which are built from corresponding foo.c and bar.c files).

.PHONY: all
all: runme

runme: foo.o bar.o ; @$(LINK.o) $^ -o $@

foo.o: foo.c
bar.o: bar.c

%.o: %.c ; @$(COMPILE.C) -o $@ $<

Running simply make on this performs the compiles and link but produces no output because output has been suppressed by the @ signs on the compile and link commands. Running a make -n gives this output:

% make -n
g++ -c -o foo.o foo.c
g++ -c -o bar.o bar.c
cc foo.o bar.o -o runme

That's handy, but doesn't tell the whole story. For example, which of those lines correspond to which targets? And where are the actual commands found in the makefile?

But add the two lines above to the makefile and do a normal make and everything becomes clear:

% make
Makefile:9: [foo.o]
+ g++ -c -o foo.o foo.c
Makefile:9: [bar.o]
+ g++ -c -o bar.o bar.c
Makefile:4: [runme]
+ cc foo.o bar.o -o runme

It's easy to see that foo.o is built by the rule on line 9 of a makefile called Makefile (the %.o: %.c pattern rule) and that the link is on line 4. As each rule runs the location of the rule (with the corresponding target) is output followed by the actual commands.

How that works

The first line of the trick defines a variable called _SHELL and captures the value of the built-in SHELL variable using :=. SHELL contains the path (and parameters) for the shell that will be used to execute commands.

Then SHELL itself is redefined (this time using = and not :=) to contains a $(warning) and the original shell command (from _SHELL) with -x added. The -x causes the shell to print out commands being executed.

Since SHELL gets expanded for every recipe in the makefile, as it runs the $(warning) gets expanded and outputs the specific makefile line where the recipe can be found and $@ will be valid and contain the name of the target being built.

Caveat: doing this will slow down a build as GNU make doesn't use the shell if it can avoid it. If SHELL is untouched in a makefile the GNU make short-circuits the shell and execs commands directly if it can.

More?

If that sort of thing interests you, you might enjoy my book: The GNU Make Book.

2015-04-13

Plain web text offenders: sending my location over HTTP when HTTPS was possible

The BBC Weather App sends the location of the user via HTTP to the BBC in order to determine the best location for weather information. It does this with roughly 100m accuracy, in the parameters of an unencrypted GET and even though the same API endpoint is available using HTTPS.

I discovered this accidentally using Charles Proxy to snoop on traffic from my iPhone at home. Here's the Charles Proxy view of that app interacting with the BBC's APIs:

It's hitting the endpoint http://open.live.bbc.co.uk/locator/locations with parameters la and lo containing the three decimal digit latitude and longitude (which give roughly 100m precision) over HTTP as a GET request.

The API then returns a JSON object containing nearby locations that the app can get weather information about.

Sadly, this API could have been accessed over HTTPS. Just switching the protocol from HTTP to HTTPS works fine. A legitimate wildcard certificate for *.bbc.co.uk is used.

So, the app could have used HTTPS.

Perhaps "This app would like to use HTTP for its API" should be a permission that the user has to explicitly give.

2015-04-10

The one line you should add to every makefile

If you're using GNU make and you need help debugging a makefile then there's a single line your should add. And it's so useful that you should add it to every makefile you create.

It's:

print-%: ; @echo $*=$($*)

It allows you to quickly get the value of any makefile variable. For example, suppose you want to know the value of a variable called SOURCE_FILES. You'd just type:

make print-SOURCE_FILES

If you are using GNU make 3.82 or above it's not even necessary to modify the makefile itself. Just do

make --eval="print-%: ; @echo $*=$($*)" print-SOURCE_FILES

to get the value of SOURCE_FILES. It 'adds' the line above to the makefile by evaluating it. The --eval parameter is a handy way of adding to an existing makefile without modifying it.

How that works

The line

print-%: ; @echo $*=$($*)

defines a pattern-rule that matches any target in the form print-% (the % is the wildcard character). So when you run make print-SOURCE_FILES that rule will execute and the % will match SOURCE_FILES.

The command executed by print-% is @echo $*=$($*). Here I've used a semicolon to separate the target name and the recipe (commands to be executed). That makes this into a one-liner. In more traditional make syntax (where a tab is used to introduce the recipe) that would be written.

print-%:
@echo $*=$($*)

Using semicolon makes this easy to copy and paste.

The automatic variable $* matches the % in print-% (so when executing print-SOURCE_FILES, $* will be SOURCE_FILES). So $* contains the name of the variable that we want to print out.

The $($*) uses gets the value of a variable whose name is stored in $*. For example, $* might expand to SOURCE_FILES and then GNU make would get the value of $(SOURCE_FILES). Getting a variable by storing its name in another variable turns out to be a useful technique in many makefiles.

More?

If that sort of thing interests you, you might enjoy my book: The GNU Make Book.

2015-04-09

"The GNU Make Book": probably more than you ever wanted to know about make

So, it's finally here. My second book, entitled The GNU Make Book and published by No Starch Press later this month (electronically it's available now). If you are a long time reader of my blog then you'll know that I've had a very long interest in GNU make and written quite a bit about it. For a while I had self-published all my articles on GNU make as a print on demand book.

The folks at No Starch Press kindly took my self-published effort and did the serious work of actually editing, organizing, indexing and generally sprucing it up. I added some articles that weren't in the old version and updated for GNU make 4.0.

And now it's here.

The book is not intended for people learning GNU make from scratch. If you need to do that go get the FSF's GNU make manual and read it. It covers all that you need to get started with GNU make.

Once you've done that you'll be ready to read mine. It's based on years of working with real makefiles, on having written a complete emulation of GNU make and having created the GNU Make Standard Library project.

It's thoroughly practical and I hope the ideas and techniques in it will prove useful to real-world makefile writers and maintainers. If you're interested there's sample chapter and Table of Contents.

2015-03-28

GNU make insanity: finding the value of the -j parameter

The other day at work a colleague posed the seemingly innocent question "Can I find out the value of the GNU make -j parameter inside a Makefile?". Simple, right? Recall that -j (or --jobs) specifies the maximum number of jobs that GNU make can run in parallel. You've probably used it to speed up a build by typing something like make -j16.

But it turns out the actual value supplied is not available in any GNU make variable. You can verify that by doing something like make -p -j16 which will dump out pretty much everything defined by GNU make and the Makefile. You won't find any reference to the -j16 anywhere.

What follows falls into the "It's crazy, but it just might work!" category.

But it is possible to calculate its value if you are willing push GNU make hard. Here's a Makefile that gets the value given to -j (or --jobs) into a variable called JOB_COUNT.

There's quite a lot of magic in that Makefile. I'll explain how it works bit by bit. First the general idea. The Makefile attempts to run up to 32 jobs in parallel (the actual value 32 can be adjusted on line 10 if you want to be able to detect -j values greater than 32). It will get GNU make to run as many jobs in parallel as possible and each job will pause for one second and then fail. The result is that exactly N jobs will run for -jN.

Where do the 32 jobs come from? They are targets named par-0, par-1, par-2, ..., par-31 and are created by the pattern rule on line 11. The pattern rule uses $(eval) to append a single x to a variable called PAR_COUNT. Each time GNU make runs one of the rules PAR_COUNT is appended.

Within the rule there's echo $(words $(PAR_COUNT)) which will echo the number of xs in PAR_COUNT when the rule runs. The result is that 1, 2, 3, ..., N is echoed for -jN.

The actual target names par-0, par-1, par-2, ..., par-31 are created on line 10 as prerequisites to a target called par by adding the prefix par- to the sequence of numbers 0 1 2 3 ... 31. Where that sequence comes from is interesting.

The sequence is created by the function to_n defined on line 7. To understand how to_n works consider what happens when $(call to_n,4) is executed (it should return 0 1 2 3).

Inside GNU make there's essentially a functional programming language that operates on space-separated lists. By building up a list of xs it's possible to output the numbers in sequence by using $(words) to count the number of elements in the list and by appending to the list on each recursive call to to_n. to_n finishes when the list has the required length. The equality test is done using $(filter-out) as a kind of 'string not equal function'.

So, that's par explained. To actually retrieve the number of jobs a sub-make is used on line 4. The sub-make will inherit the -jN (without the actual value being passed down) because of the way GNU make's jobserver functionality works (you can read about it here).

The output of the sub-make is written to a file called .parallel (which is always created because of the FORCE target) and then it is read at line 9 and the maximum value retrieved. Because the sub-make will actually fail (failure is used to detect the maximum parallelism available) there's an || true to hide the error from GNU make and stderr is redirected.

One last thing. To get the value into JOB_COUNT it's necessary to actually run the parallel target. I've done that using an order-only prerequisite in line 1 (notice the | before parallel?). That causes the parallel target to execute but its execution doesn't affect whether all runs or not.

The only remaining question remaining is... should you do this type of thing?

PS If you are into this sort of thing you might enjoy my new book from No Starch: The GNU Make Book.

2015-01-19

Failing to RTFM for Go: defer statements

So, a colleague wanders over the says: "You know that commit you just made to project X? It doesn't work" and the proceeds to remind me of that fact that the defer statement in Go is rather special.

The argument of the defer statement is a function that will be called when the defer statement executes. But its arguments are evaluated when the defer statement is encountered (here's where you, or at least I, RTFM).

A function like this won't do what you expect:

func deferPrintf() error {
var err error
defer fmt.Printf("deferPrintf: err has value %v\n", err)

err = errors.New("Error 2")
return err
}

That will always output deferPrintf: err has value because the function being called (fmt.Printf) by the defer has its arguments (which include err) evaluated when the defer is encountered.

Oops. I'd forgotten that.

The solution is pretty simple. Wrap the fmt.Printf in a closure like this:

func useAClosure() error {
var err error
defer func() {
fmt.Printf("useAClosure: err has value %v\n", err)
}()

err = errors.New("Error 1")
return err
}

The err inside the func() is the same err outside (it's a closure after all) and so it does the right thing. You can play with this here.