Tuesday, May 2, 2017

Eurocrypt 2017 - Parallel Implementations of Masking Schemes and the Bounded Moment Leakage Model

Side-channel analysis made its way into Eurocrypt this year thanks to two talks, the first of which was given by François-Xavier Standaert on a new model in which to prove the security of implementations. When talking about provable security in the context of side-channel analysis, there is one prominent model that comes to mind: the d-probing model, where the adversary is allowed to probe d internal variables (somewhat related to d wires inside an actual implementation) and learn them. Another very famous model, introduced ten years later, is the noisy leakage model, in which the adversary is allowed probes on all intermediate variables (or wires) but the learnt values are affected by errors due to noise. To complete the picture, it was proved that security in the probing model implies security in the noisy leakage one.

The work of Barthe, Dupressoir, Faust, Grégoire, Standaert and Strub is motivated precisely by the analysis of these two models in relation to how they deal with parallel implementations of cryptosystems. On one hand, the probing model admits a very simple and elegant description and simple proof techniques, but it is inherently oriented towards serial implementations; on the other hand, the noisy leakage model naturally includes parallel implementations in its scope but, because it admits noise in the leakage functions, it lacks simplicity. The latter is particularly important when circuits are analysed with automated tools and formal methods, because these can rarely deal with errors.

The contribution of the paper can then be summarised as the definition of a new model that tries to combine the advantages of both previous models: the Bounded Moment leakage Model (BMM). The authors show how it relates to the probing model and give constructions that are secure in their model. In particular, they prove that the BMM is strictly weaker than the probing model, in that security in the latter implies security in the former, and they give a counterexample showing that the converse does not hold. The informal definition of the model given during the talk is the following:
An implementation is secure at order o in the BMM if all mixed statistical moments of order up to o of its leakage vectors are independent of any sensitive variable manipulated.
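
To make the definition concrete, here is a minimal sketch of a toy example that is not taken from the talk: a secret bit is split into two Boolean shares that are processed in parallel, and the leakage of that single cycle is assumed to be the sum of the two share values. The leakage function and the helper names are assumptions made purely for illustration.

```python
from fractions import Fraction

def leakage_distribution(secret_bit):
    """All equally likely leakages for a 2-share Boolean masking of secret_bit.

    The shares are (r, secret_bit XOR r) for a uniform bit r. The toy leakage
    (an assumption) is the sum of both share values seen in one parallel cycle.
    """
    return [r + (secret_bit ^ r) for r in (0, 1)]

def raw_moment(samples, order):
    """Exact raw moment E[L^order] for a uniform distribution over samples."""
    return Fraction(sum(s ** order for s in samples), len(samples))

for order in (1, 2):
    m0 = raw_moment(leakage_distribution(0), order)
    m1 = raw_moment(leakage_distribution(1), order)
    verdict = "independent of the secret" if m0 == m1 else "depends on the secret"
    print(f"order {order}: E[L^{order}|x=0]={m0}, E[L^{order}|x=1]={m1} -> {verdict}")
```

In this toy setting the first moment is the same for both values of the secret while the second is not, so the two-share encoding would be secure at order 1 but not at order 2.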

A parallel multiplication algorithm and a parallel refreshing algorithm are given as practical examples to which the reduction between the two models applies. The reduction is stated as follows:
A parallel implementation is secure at order o in the BMM if its serialisation is secure at order o in the probing model.
The falsity of the converse is shown in a slightly different setting, namely the one of continuous leakage: the adversary does not just learn values carried by some wires by probing them, but such an operation can be repeated as many times as desired and the probes can be moved adaptively. Clearly this is a much stronger adversary in that accumulation of knowledge over multiple probing sessions is possible, which is used as a counterexample to show that security in the continuous BMM does not imply security in the continuous probing model. The refreshing scheme mentioned above can easily be broken in the latter after a number of iterations linear in the number of shares, but not in the former as adapting the position of the probes does not help: an adversary in the BMM can already ask for leakage on a bounded function of all the shares.

Both slides and paper are already available.

Eurocrypt 2017: On dual lattice attacks against small-secret LWE and parameter choices in HElib and SEAL

This morning, Martin gave a great talk on lattice attacks and parameter choices for Learning With Errors (LWE) with small and sparse secrets. The work presents new attacks on LWE instances, yielding revised security estimates. In particular, the exponent of the dual lattice attack is revised by a factor of 2L/(2L+1), for log q = Θ(L*log n). The paper exploits the fact that most lattice-based FHE schemes use short and sparse secrets. We will write q to denote the LWE modulus throughout.

Let's first have a look at the set-up. Remember LWE consists of distinguishing between pairs (A, As+e) and (A,b). In the first instance, A is selected uniformly at random and e is selected from a special (usually Gaussian) distribution. In the second one, both A and b are uniformly random. Selecting s, as this work shows, is perhaps trickier than previously thought. Theory says that, in order to preserve security, selecting a short and sparse secret s means the dimension must be increased to n*log_2(q). Practice says just ignore that and pick a small secret anyway. More formally, HElib typically picks a secret s such that exactly h=64 entries are in {-1,1} and all the rest are 0. SEAL picks uniformly random secrets in {-1,0,1}.
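
The two secret distributions and an LWE sample can be sketched as follows. This is only an illustration using Python's standard library: the function names, the rounded Gaussian error and its width `sigma=3.2` are assumptions, not the sampling code of HElib or SEAL.

```python
import random

def helib_style_secret(n, h=64, rng=random):
    """Secret with exactly h entries in {-1, 1} and the remaining n - h equal to 0."""
    s = [0] * n
    for i in rng.sample(range(n), h):
        s[i] = rng.choice((-1, 1))
    return s

def seal_style_secret(n, rng=random):
    """Secret with every entry drawn uniformly from {-1, 0, 1}."""
    return [rng.choice((-1, 0, 1)) for _ in range(n)]

def lwe_sample(s, q, m, sigma=3.2, rng=random):
    """Return (A, b) with A uniform in Z_q^(m x n) and b = A*s + e mod q.

    The error e is a rounded Gaussian of width sigma (an assumed toy choice).
    """
    n = len(s)
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    e = [round(rng.gauss(0, sigma)) for _ in range(m)]
    b = [(sum(a * x for a, x in zip(row, s)) + err) % q for row, err in zip(A, e)]
    return A, b

s = helib_style_secret(n=1024)
A, b = lwe_sample(s, q=2**20, m=4)
print(sum(1 for x in s if x != 0), len(A), len(b))
```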

We also recall that the dual lattice attack consists of finding a short vector w such that wA = 0 (mod q), then checking if
<w, As+e> = <wA, s> + <w,e> = <w,e>
is short. If we are in the presence of an LWE sample, e is short, so the inner product is short. Short*short = short, as any good cryptographer can tell you.

The improvements presented in this paper rely on three main observations. Firstly, a revised dual lattice attack is presented. This step is done by adapting BKW-style algorithms in order to increase efficiency and can be done in general, i.e. it does not depend on either shortness or sparseness of the secret. It is achieved by applying BKZ to the target basis, then re-randomising the result and applying BKZ again, with a different block size.

The second optimisation exploits the fact that we have small secrets. We observe that we can relax the condition on w somewhat. Indeed, if s is short, then finding w such that Aw is short instead of 0 is good enough. Therefore, we look for vectors (v,w) in the lattice

L = {(y,x): yA = x (mod q)}.

Now in small secret LWE instances, ||s||<||e|| and so we may allow ||w||>||v|| such that
||<w,s>|| ≈ ||<v,e>||.
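
The reason both inner products appear can be seen from a one-line derivation. Writing the LWE samples as b = As+e and taking (v,w) in L, so that vA = w (mod q), we get

$$\langle v, b\rangle = \langle v, As+e\rangle = \langle vA, s\rangle + \langle v, e\rangle \equiv \langle w, s\rangle + \langle v, e\rangle \pmod{q}.$$

When vA = 0 the first term vanishes and this is the standard dual attack. Otherwise the result is still short as long as both terms are, and, heuristically, the two terms are balanced when ||w||·||s|| ≈ ||v||·||e||.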

Finally, the sparsity of the small secret is exploited. This essentially relies on the following observation: when s is very sparse, most of the columns of A become irrelevant, so we can just ignore them.
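
One way to quantify "most columns are irrelevant" is to ask how likely it is that a guess of k ignored coordinates hits only zeros of s. The sketch below assumes a secret with exactly h non-zero entries in uniformly random positions; the function name and the sample parameters are illustrative, not taken from the paper.

```python
from math import comb

def prob_all_dropped_zero(n, h, k):
    """Probability that k ignored coordinates of s are all zero.

    Assumes s has exactly h non-zero entries among n coordinates, in uniformly
    random positions, and that the k ignored coordinates are chosen
    independently of s (a hypergeometric probability).
    """
    return comb(n - h, k) / comb(n, k)

# Example with HElib-like sparsity (h = 64) in dimension 2048.
for k in (0, 16, 64, 256):
    print(k, prob_all_dropped_zero(2048, 64, k))
```

When the guess is wrong, the ignored columns contribute an extra shift, which is what the "checking shifted distribution" step in the summary below has to correct for.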

The final algorithm SILKE is the combination of the three above steps. The steps are the following.
  • Perform BKZ twice with different block sizes to produce many short vectors
  • Scale the normal form of the dual lattice
  • If sparse, ignore the presumed zero columns, correct for mistakes by checking shifted distribution

As usual, Martin wins Best Slides Award for including kittens.

Wednesday, April 12, 2017

Is Your Banking App Secure?

Last week I was in Malta for Financial Cryptography and Data Security 2017 to present my recent work on securing the PKCS#11 cryptographic API.

One talk that stood out for me was by researchers from the University of Birmingham, who looked for vulnerabilities in the mobile apps provided by major UK banks.

Sadly, they found major weaknesses in apps from 5 of the 15 banks they investigated.

Several apps use certificate pinning, where the app hard-codes a certificate from a trusted CA and only accepts public keys that are signed by the pinned certificate.
This is good practice, as an attacker can add their own certificate to the phone's trust store, but it won't be accepted by the app.
However, two Android apps (for Natwest and Co-op) accepted any public key signed by the pinned certificate, without checking the domain name!
So the attack works as follows:

  1. Purchase a certificate for a domain you own from the trusted CA
  2. The app will accept your public key with this certificate
  3. Man-in-the-middle all the encrypted traffic between the user and their bank.
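
The missing check can be illustrated with a deliberately simplified model. In the sketch below a certificate is just a (host, issuer) pair and signature verification is reduced to comparing issuer names. The names `Cert` and `app_accepts` are hypothetical and this is not how the affected apps or any TLS library are implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cert:
    subject_host: str  # domain name the certificate was issued for
    issuer: str        # name of the CA that signed it

def app_accepts(cert, pinned_issuer, expected_host, check_hostname):
    """Toy validation: 'signed by the pinned CA' is modelled as an issuer match."""
    if cert.issuer != pinned_issuer:
        return False
    if check_hostname and cert.subject_host != expected_host:
        return False
    return True

bank = Cert("bank.example", "Trusted CA")
attacker = Cert("attacker.example", "Trusted CA")  # bought from the same CA

# Without the hostname check, the attacker's certificate is accepted.
print(app_accepts(attacker, "Trusted CA", "bank.example", check_hostname=False))
# With the hostname check, only the bank's certificate passes.
print(app_accepts(attacker, "Trusted CA", "bank.example", check_hostname=True))
print(app_accepts(bank, "Trusted CA", "bank.example", check_hostname=True))
```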

Curiously, the authors note: "Co-op [...] hired two penetration testing companies to test their apps, both of which had missed this vulnerability". It seems odd that such an obvious mistake wasn't picked up in testing.

The group also found that several banks - Santander, First Trust and Allied Irish - served adverts to their app users over unencrypted HTTP, meaning an attacker could spoof these ads and mount a phishing scam, perhaps by displaying a fake 'security warning' and directing users to re-enter their account details on a malicious page. It was pointed out in the talk that we're much more likely to 'feel safe' within an app (and hence trust all the content we see) than, say, visiting a webpage using a laptop, so this kind of in-app phishing scam could be very effective.

There are even more exploits described in the paper.

It was refreshing to hear that the vulnerable banks responded well to the disclosures made by the Birmingham group and patched their apps as a result. But I'm a little baffled that these basic errors were ever made in such security critical applications.

Wednesday, March 29, 2017

PKC 2017: Kenny Paterson accepting bets on breaking TLS 1.3

The member of the TLS 1.3 working group is willing to bet a beer that the 0-RTT handshake of TLS 1.3 will get broken in the first two years.

In his invited talk, Kenny managed to fill a whole hour on the history of SSL/TLS without even mentioning symmetric cryptography beyond keywords, thus staying within the topic of the conference. Despite all versions of SSL being broken to at least some degree, the standardised TLS became the most important security protocol on the Internet.

The core part of TLS is the handshake protocol, which establishes the choice of ciphers and the session key. Kenny highlighted the high complexity stemming from the many choices (e.g., using a dedicated key exchange protocol or not) and the possible interaction with other protocols in TLS. Together with further weaknesses of the specification, this created the space for the many attacks we have seen. On the upside, these attacks reflect increased attention from academics, which comes together with increased attention from society as a whole. Both have laid the ground for improvements in both the deployment and future versions of TLS. For example, the support of forward secrecy has increased from 12 percent to 86 percent according to SSL Pulse.

Turning to concrete attacks, the most important in the area of PKC is the Bleichenbacher attack, published as early as Crypto 1998 (a human born then would be considered a full adult at the conference venue now). Essentially, it exploits the fact that RSA with the padding used in TLS is not CCA-secure, and it recovers the session key after roughly $2^{20}$ interactions with a server. Nevertheless, the TLS 1.0 specification published shortly after Bleichenbacher's publication incorporates the problematic padding (recommending mitigation measures), and later versions retain it for compatibility. The DROWN attack shows the danger of this by exploiting the fact that many servers still offer SSLv2 (about 8% of the Alexa top 200k) and that it is common to use the same key for several protocol versions. An attacker can recover the session key of a TLS session by replaying a part of it in an SSLv2 session that uses the same key.

On a more positive note, Kenny presented the upcoming TLS 1.3, which has been under development since 2014. It addresses a lot of the weaknesses of previous versions by involving academics from an early stage and doing away with a lot of the complexity (including reducing options and removing ciphers). It furthermore aims to decrease the latency of the handshake by allowing the parties to send encrypted data as early as possible, reducing the round trip time to one in many cases. The goal of low latency has also led to the inclusion of QUIC, which provides zero round trip time, that is, the client can send data already in the first message when resuming a session. However, QUIC is not fully forward-secure and is therefore confined to a separate API. Nevertheless, Kenny predicts that its mere availability will be too tempting for developers, hence the bet offered.

Concluding, he sees three major shifts in TLS thus far: from RSA to elliptic-curve Diffie-Hellman, to Curve25519, and away from SHA-1 in certificates. A fourth shift might happen with the introduction of post-quantum algorithms such as Google's deployment of New Hope. Less optimistically, he expects that implementation vulnerabilities will continue to come up.

Update: An earlier version of this post mentioned the non-existing Curve255199 instead of Curve25519, and it attributed New Hope to Google.

Monday, March 27, 2017

Tools for proofs

Security proofs for even simple cryptographic systems are dangerous and ugly beasts. Luckily, they are only rarely seen: they are usually safely kept in the confines of "future full versions" of papers, or only appear in cartoon-ish form, generically labelled as ... "proof sketch".

The following two quotes frame the problem in less metaphorical terms. 

"In our opinion, many proofs in cryptography have become essentially unverifiable. Our field may be approaching a crisis of rigor."
                                              Bellare and Rogaway (cca 2004)

"Do we have a problem with cryptographic proofs? Yes, we do [...] We generate more proofs than we carefully verify (and as a consequence some of our published proofs are incorrect)."
                                              Halevi (cca 2005)

Solutions developed by cryptographers, e.g. compositional reasoning and the game-hopping technique, help to structure proofs and reduce their complexity, and therefore alleviate to some extent the pain of having to develop rigorous proofs. Yet, more often than not, proofs are still sketchy and shady.

Help comes from the programming languages community, which has long experience with developing tools for proving that programs work correctly and... cryptographic systems are just programs. Recent progress, e.g. automated verification of parts of TLS and fully verified security proofs of implementations of masking schemes to defeat leakage, is impressive and exciting. More work is under way.

If you want to learn more about how you can get someone else to do the proofs for you or, more realistically, learn about what existing tools can currently do, what they could do in the future, and discuss what is needed and which way to go, then you should attend the
Workshop on Models and Tools for Security Analysis and Proofs
 -- Paris, 29th of April; co-located with EuroS&P and Eurocrypt --

which the Bristol Crypto group helps organize. The workshop features as speakers some of the most prominent researchers that are contributing to this direction. You can register for the workshop HERE. Early registration ends March 31st!

But wait...there is more. If you want to explore this area beyond what a one-day workshop allows, then you should consider attending the

Nancy, France, July 10th - 13th --

See you all in Paris and/or Nancy!

Tuesday, February 21, 2017

Homomorphic Encryption API Software Library

The Homomorphic Encryption Application Programming Interface (HE-API) software library is an open source software library being developed as part of the Homomorphic Encryption Applications and Technology (HEAT) project, and is available here. The main purpose of this software library is to provide a common easy-to-use interface for various existing Somewhat Homomorphic Encryption (SHE) libraries. Limited support for fixed-point arithmetic is also provided by this library. Note that the HE-API library is still a work in progress.

Fully Homomorphic Encryption (FHE) is a cryptographic primitive that allows meaningful manipulation of ciphertexts. In spite of several recent advances, FHE remains out of practical reach. Hence a reasonable restriction to make is to limit the set of evaluated circuits to a specified subclass, usually determined by the multiplicative depth of the circuit. Such encryption schemes are called SHE schemes. Various libraries that implement these SHE schemes, such as HElib, SEAL, FV-NFLlib and HElib-MP, are already available.

The purpose of this HE-API software library is to provide a common, generic, easy-to-use interface for various existing libraries that implement SHE schemes. The SHE libraries that are currently integrated in the HE-API library are HElib and FV-NFLlib. It may be noted that the FV-NFLlib library is itself an outcome of the HEAT project. At a high level, the HE-API software library abstracts out the technicalities present in the underlying SHE libraries. For instance, the HElib library implements the BGV SHE scheme, while FV-NFLlib implements the FV SHE scheme. Needless to say, the syntax of the classes and routines in the individual libraries differs, though the underlying semantics are very similar. The HE-API library integrates the underlying SHE libraries under a single interface, thereby shielding the user from syntactic differences. Another feature of the HE-API library is that it contains a minimal, yet complete, set of routines to perform homomorphic computations. The design of this library is motivated by ease of use for non-experts.

Supported Data Types
The following application data types are supported by the HE-API software library. 
  • Boolean
  • Unsigned long integers
  • GMP's arbitrary precision integers class: mpz_class
  • Polynomials with coefficients of type: unsigned long integers or mpz_class
  • Vectors of: unsigned long integers or mpz_class
  • Fixed-point numbers
Note that not all the data types and routines described above may currently be supported by every underlying SHE library.

Friday, January 13, 2017

RWC 2017 - Secure MPC at Google

This talk was given by Ben Kreuter and its focus was on the apparent disparity between what we research in academia versus what is required in the real world, specifically in the field of multi-party computation (MPC). MPC is the idea of allowing multiple parties to compute some function on their combined input without any party revealing anything about their input to the other parties (other than what can be learnt from the output alone).

While significant work has been done on making MPC efficient in practice (for example, the work of Yehuda Lindell et al. on high-throughput MPC which was presented by Lindell in the preceding talk), the focus tends to be on generic protocols (e.g. general logic circuits) with strong security guarantees (e.g. malicious security), which invariably leads to large computational overhead. In practice, we usually require only specific protocols, which can therefore be optimised, and comparatively weak security guarantees.

In the real world, network cost is the salient factor, rather than the speed of the protocol, since the parties who are involved in a computation often have to use networks (such as the Internet) which are being used by many other people at the same time and cannot make the best use of the network's full capabilities. The MPC at Google is about computation amongst, for example, mobile phones, laptops and servers; this introduces issues like battery constraints and the possibility of the computation not completing; these considerations, firmly grounded in the real world, are important when developing MPC techniques in research.

Business applications

A large portion of Google's revenue is generated by advertising: the tech giant, well-known for its aptitude for accurately determining users' desired search results even when queries are expressed ineloquently, specialises in creating personalised adverts for its wide spectrum of users. The efficacy of an advert is generally measured by the proportion of its viewers who later become customers. Clearly this can be measured by businesses comparing their database of customers' transactions with Google's databases of who has been shown which adverts. This, however, would be an invasion of privacy. Instead, Google and the business can do MPC, more specifically a private set intersection protocol.

In a private set intersection protocol, the parties involved compute how large the intersection is amongst the sets input by each party, or even some function on those elements in the intersection. So if the business and Google compute a private set intersection protocol on their data, they can determine how well the advertising went.

Roughly speaking, the MPC Google does in the real world is as follows: Google has a set $\{g_1,g_2,...,g_n\}$ of field elements which encodes a set of people who have been shown an advert for a certain product, and a business has a set $\{b_1,b_2,...,b_m\}$ of field elements which encodes a set of people who have been sold the product in question; Google raises each of its elements to a power $G$ and sends the set $\{g_1^G,g_2^G,...,g_n^G\}$ to the business. The business does the same with its elements for some exponent $B$ to get $\{b_1^B,b_2^B,...,b_m^B\}$, encrypts a set of binary vectors $v_1,...,v_m$ under Paillier encryption (which is additively homomorphic), one corresponding to each element in its set, encoding some other property of the sales (like the amount paid), and also computes the set $\{g_1^{GB},g_2^{GB},...,g_n^{GB}\}$. The business sends Google the set of pairs $\{(b_1^B,P(v_1)),(b_2^B,P(v_2)),...,(b_m^B,P(v_m))\}$ along with $\{g_1^{GB},g_2^{GB},...,g_n^{GB}\}$, and Google computes $\{b_1^{GB},b_2^{GB},...,b_m^{GB}\}$ and adds together all encrypted vectors $P(v_j)$ for which there exists some $i$ such that $g_i^{GB}=b_j^{GB}$. It sends this ciphertext back to the business, which decrypts and interprets the result.
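
The message flow can be sketched end to end in a few lines. Everything concrete here is an assumption chosen for readability rather than a description of Google's implementation: identifiers are encoded with SHA-256 into a toy prime field, the Paillier modulus is built from two small Mersenne primes (far too small to be secure), each sale carries a single integer instead of a binary vector, and the shuffling that a real protocol would need is omitted.

```python
import hashlib
import secrets
from math import gcd

P_FIELD = 2**127 - 1  # toy prime modulus for the blinded identifiers

def encode(identifier):
    """Map an identifier to a field element (assumed encoding)."""
    return int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big") % P_FIELD

def random_exponent():
    """Secret exponent coprime to P_FIELD - 1, so x -> x^k is a permutation."""
    while True:
        k = secrets.randbelow(P_FIELD - 3) + 2
        if gcd(k, P_FIELD - 1) == 1:
            return k

# Toy Paillier key (held by the business); insecure parameter sizes.
P, Q = 2**61 - 1, 2**89 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def paillier_enc(m):
    r = secrets.randbelow(N - 1) + 1
    return (1 + m * N) % N2 * pow(r, N, N2) % N2

def paillier_add(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2

def paillier_dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def psi_sum(google_ids, business_sales):
    G, B = random_exponent(), random_exponent()
    # Google -> business: g_i^G
    g_G = [pow(encode(x), G, P_FIELD) for x in google_ids]
    # Business -> Google: pairs (b_j^B, P(v_j)) and the set of g_i^{GB}
    pairs = [(pow(encode(x), B, P_FIELD), paillier_enc(v))
             for x, v in business_sales.items()]
    g_GB = {pow(y, B, P_FIELD) for y in g_G}
    # Google: compute b_j^{GB} and add up the ciphertexts of matching entries
    total = paillier_enc(0)
    for b_B, ciphertext in pairs:
        if pow(b_B, G, P_FIELD) in g_GB:
            total = paillier_add(total, ciphertext)
    # Google -> business: the business decrypts the aggregate
    return paillier_dec(total)

print(psi_sum(["alice", "bob", "carol"], {"bob": 30, "dave": 99, "carol": 12}))
```

The business only learns the aggregate over the intersection, and Google only sees blinded identifiers and ciphertexts, which matches the passive security level discussed next.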

This protocol is very simple, and it is only passively secure (that is, players are assumed to execute the protocol faithfully but may try to learn things by inspecting their communication transcripts). An interesting concern, perhaps somewhat orthogonal to how we approach research from an academic point of view, is that it is important that we can convey the security and efficiency of our protocols to the lawyers, managers and software engineers who will eventually be sanctioning, authorising or implementing the protocols. "The lawyers are interesting because you can show them a proof, and two plus two equals four is a negotiable statement here... managers usually trust your expertise...and software engineers are the worst because they already assume [the protocol] is impossible."

An alternative solution using garbled circuits was explored in the recent past, but it turned out that their use required some subtle assumptions regarding the computation and communication which would have made the protocol impractical.

Future work would involve getting a (not too much more expensive) maliciously secure protocol and developing the use of the homomorphic encryption to allow different functions to be computed on the data in the intersection.

Consumer applications

The Android keyboard app by Google, Gboard, logs what a user types so that it can guess words for auto-completing in the future. This data could be used for training machine learning models, and merging results from many local models would enable the formation of guessing algorithms that work well for everyone. However, to do this, the server would need to receive a large dataset of words typed by a user from each phone so that this processing could be done. Clearly there is an issue of privacy here; moreover, there is also potentially a differential privacy issue.

This is clearly a good situation in which to use MPC. Each party masks their data using a basic additive secret-sharing scheme: if each party has a vector to input, for every coordinate, every pair of parties agrees on some random field element, one subtracts and one adds this to that coordinate of their vector. When the parties send this to Google, the masks will therefore cancel when added together.
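
The cancellation can be checked with a small sketch. The field modulus `Q` and the function names are illustrative assumptions, and the pairwise agreement on random elements is simulated in one place rather than run between real devices.

```python
import secrets

Q = 2**31 - 1  # toy prime field modulus (assumption)

def mask_inputs(vectors):
    """Return masked copies of each party's vector, with entries mod Q.

    For every pair (i, j) with i < j and every coordinate, a fresh random
    field element is added by party i and subtracted by party j.
    """
    n, dim = len(vectors), len(vectors[0])
    masked = [[x % Q for x in v] for v in vectors]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(dim):
                r = secrets.randbelow(Q)
                masked[i][k] = (masked[i][k] + r) % Q
                masked[j][k] = (masked[j][k] - r) % Q
    return masked

def aggregate(masked):
    """What the server computes: the coordinate-wise sum of all masked vectors."""
    return [sum(column) % Q for column in zip(*masked)]

vectors = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
masked = mask_inputs(vectors)
print(masked[0])          # looks random on its own
print(aggregate(masked))  # the masks cancel, leaving the true sum
```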

In practice, they use a PRG and perform a key exchange (in which one key is given to each pair of parties, for every possible pair) at the beginning to achieve the same effect but with much smaller communication overhead. They also have a trick for dealing with device failures (which is important given the application).
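
A sketch of the PRG variant follows, assuming the pairwise keys have already been agreed by some key exchange that is not shown. SHA-256 in counter mode stands in for the PRG, and all names are hypothetical. The handling of device failures is not modelled.

```python
import hashlib

Q = 2**31 - 1  # toy prime field modulus (assumption)

def prg_mask(pair_key, dim):
    """Expand a shared pairwise key into dim field elements (SHA-256, counter mode)."""
    return [
        int.from_bytes(hashlib.sha256(pair_key + k.to_bytes(4, "big")).digest(), "big") % Q
        for k in range(dim)
    ]

def masked_input(i, vector, pair_keys):
    """Mask party i's vector; pair_keys maps each other party j to the key shared with j.

    The party with the smaller index adds the mask and the other subtracts it,
    so each pair's contribution cancels in the sum.
    """
    out = [x % Q for x in vector]
    for j, key in pair_keys.items():
        sign = 1 if i < j else -1
        for k, r in enumerate(prg_mask(key, len(vector))):
            out[k] = (out[k] + sign * r) % Q
    return out

# Three parties; each pair shares one short key instead of a full random vector.
k01, k02, k12 = b"key-0-1", b"key-0-2", b"key-1-2"
masked = [
    masked_input(0, [1, 2], {1: k01, 2: k02}),
    masked_input(1, [3, 4], {0: k01, 2: k12}),
    masked_input(2, [5, 6], {0: k02, 1: k12}),
]
print([sum(column) % Q for column in zip(*masked)])
```

Only the short keys need to be exchanged, however long the input vectors are, which is where the communication saving comes from.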

This talk provided helpful and relevant insight into the importance of matching what we research with what we require in the real world, which is, after all, one of the main reasons for having conferences such as Real World Crypto. Many of the talks are available to watch online here, and I would highly recommend doing so if interested.