The best kittens, technology, and video games blog in the world.

Tuesday, December 19, 2017

Challenges for November 2017 SecTalks London

Christmas Luke by Nicholas Erwin from flickr (CC-NC-ND)

Following the highly successful September round of London SecTalks, I ran another round in November.

The round consisted of 8 tasks, and they were a bit harder this time - even the winner finished only 7 in time, and a few people completed the remaining challenges only after time ran out.

You can find challenges and code used to generate them in this spoiler-free repository.

This post doesn't contain answers, but it might spoil a bit.

Archive (5 points)

It was nearly identical to the previous round's archive challenge - a 16-level nested archive, with 1 real and 15 fake archives on every level. The only difference was that the distraction files were 0-padded to the same size as the real file, which forced a smarter strategy than simply going for the largest file on every level.

Of course MD5ing to find the unique file, or just unpacking them all and removing duplicate files, still worked.
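Since the fake archives are byte-identical duplicates of each other, the MD5 approach can be sketched in a few lines of Ruby (a sketch, not the actual solution script anyone used):

```ruby
require "digest"

# Find the one file whose MD5 differs from all its siblings -
# this works because the distraction files are identical to each other
def unique_file(paths)
  by_hash = paths.group_by { |path| Digest::MD5.file(path).hexdigest }
  by_hash.values.find { |files| files.size == 1 }&.first
end
```

Running this on each level of the nested archive points at the real file to unpack next.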

CSS (10 points)

The password was encoded within CSS rules. I've never seen this kind of challenge anywhere, so maybe it's the world's first?

It was very short, and every character was independent, so it seems that everyone just brute-forced it manually.

Secret Message (15 points)

The answer was written in one color on a background of an extremely similar color. Everybody managed to finish it so quickly that I didn't even get a chance to see what tools they used to solve it.

EDIT: Oops, it seems that I messed up ImageMagick options and also accidentally left the answer in EXIF.

Python (20 points)

As we all know, Python is a whitespace-sensitive language. So I encoded some secrets in the whitespace.

Quite a few people used editors which cleaned up whitespace automatically, corrupting the file. Once a person figured out what the challenge was about, it usually wasn't too hard to solve.
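One hypothetical scheme of this kind (the actual challenge's encoding may well differ) hides one bit per line in the trailing whitespace - a Ruby sketch:

```ruby
# Hypothetical scheme: trailing tab = 1 bit, trailing space = 0 bit,
# 8 bits per hidden character - not necessarily the challenge's encoding
def decode_whitespace(lines)
  bits = lines.map { |line| line.end_with?("\t") ? "1" : "0" }.join
  [bits].pack("B*")
end

# Round-trip demonstration:
lines = "secret".unpack1("B*").chars.map { |bit| "x = 1" + (bit == "1" ? "\t" : " ") }
decode_whitespace(lines) # => "secret"
```

It also shows why whitespace-cleaning editors destroy the payload - the trailing characters carrying the bits are exactly what they strip.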

Ruby (25 points)

The obfuscated Ruby challenge was the hardest one of the round. It used two layers of Unicode obfuscation, first with emoji, and then with CJK characters. Other than the unusual characters, the obfuscations applied weren't particularly hard.
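As a toy illustration of the idea (not the actual challenge code): Ruby happily accepts non-ASCII identifiers, so methods and variables can be named entirely in CJK characters and the code still runs:

```ruby
# Ruby allows non-ASCII identifiers, so code can be "obfuscated"
# simply by using CJK names everywhere (toy example, not the challenge)
def 加算(左, 右)
  左 + 右
end

加算(40, 2) # => 42
```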

ECB BMP (30 points)

This was a fun one. It was basically a version of the famous ECB penguin from Wikipedia.

People had a lot of trouble figuring out the dimensions and bit depth of the image, which had to be given as a hint, even though they were fairly usual.
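The reason the penguin effect works: ECB encrypts equal plaintext blocks to equal ciphertext blocks, so an image with large flat-color areas keeps its visible structure. The same property can even be used to detect ECB - a sketch:

```ruby
# ECB maps identical plaintext blocks to identical ciphertext blocks,
# so counting repeated 16-byte blocks reveals both ECB mode and structure
def repeated_block_count(data, block_size = 16)
  blocks = data.bytes.each_slice(block_size).to_a
  blocks.size - blocks.uniq.size
end

# 8 blocks, only 2 distinct ones - like a bitmap with flat color areas:
repeated_block_count("\x00" * 64 + "\xFF" * 64) # => 6
```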

XOR GIF (35 points)

This was a two-step challenge. A GIF file was XOR-encrypted with a word from a dictionary.

The challenge was then to find out which Twitter account the image is from.

Since the GIF header is known, it was very easy to figure out the first few letters of the key. However, people had a lot of trouble completing it, as the word I'd chosen was only in some dictionaries. This wasn't intentional.
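The known-plaintext step can be sketched like this - XORing the ciphertext's first bytes with the known GIF magic leaks the start of the repeating key (the key here is made up for the demonstration):

```ruby
# Every GIF starts with "GIF89a" (or "GIF87a"), so XORing the first
# ciphertext bytes with the known header recovers the key's first bytes
KNOWN_HEADER = "GIF89a".bytes

def key_prefix(ciphertext_bytes)
  ciphertext_bytes.take(KNOWN_HEADER.size)
                  .zip(KNOWN_HEADER)
                  .map { |c, p| (c ^ p).chr }
                  .join
end

# Demonstration with a made-up key:
ciphertext = "GIF89a".bytes.zip("wombat".bytes).map { |p, k| p ^ k }
key_prefix(ciphertext) # => "wombat"
```

From there, grepping a dictionary for words starting with those letters finishes the key recovery.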

After getting the image, it turned out only some reverse image search engines could find it properly; others returned bogus matches.

ROT Word (40 points)

I wanted to have a task for statistical analysis of some classical cipher, but all the real ones have online tools you can use to solve them in a few seconds.

So I made one up - it's like a rot cipher with a multi-letter key, except each key letter is applied to a whole word, not to a single letter.

encrypt("All your base are belong to us!", "omg") == "ozz kagd hgyk ofs nqxazs zu ig"
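An implementation consistent with the example above (the actual challenge generator may differ in details like punctuation handling):

```ruby
# Each key letter shifts a whole word; output is lowercased and
# punctuation is dropped, matching the example above
def encrypt(message, key)
  shifts = key.downcase.chars.map { |c| c.ord - "a".ord }
  message.downcase.split.each_with_index.map { |word, i|
    shift = shifts[i % shifts.size]
    word.gsub(/[^a-z]/, "").chars.map { |c|
      ((c.ord - "a".ord + shift) % 26 + "a".ord).chr
    }.join
  }.join(" ")
end

encrypt("All your base are belong to us!", "omg")
# => "ozz kagd hgyk ofs nqxazs zu ig"
```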

For a bit of extra challenge the message was in English, but contained a bunch of non-English proper names.

Final Thoughts

I made this one just a bit harder, and maybe it was a tiny bit too much.

Overall, a lot of fun happened.

I'd definitely recommend CTFd server for this.

Tuesday, November 28, 2017

How to watch high speed let's plays on London Underground

le petit chat by FranekN from flickr (CC-NC-ND)

Apparently the idea that some places - like London trains - are offline never occurred to anyone in California or Seattle, or wherever people who write mobile software tend to live. And even support for high-speed playback is not quite as common as it should be.

So I came up with a process to deal with it - which, even with all the scripts to automate it, still has far too many steps. I'm not saying I recommend it to anyone, but if someone needs to do something similar, they might use it as a starting point.

Download let's plays

First, let's find a bunch of let's plays we want to watch. It's best to use playlists instead of individual videos to reduce URL copy and paste time, but it works for both.

To download them we can use youtube-dl, which is available as a homebrew package (brew install youtube-dl), or you can get it from here.

$ youtube-dl -t -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" \
  "url1" "url2" "url3"

Youtube offers videos in many formats, and the arguments above are what I found to result in the highest quality and best compatibility with various video players. Default settings often caused issues.

Speed up the videos

There are plenty of command line tools to manipulate audio and video, and they tend to have ridiculously complicated interfaces.

I wrote the speedup_mp3 script (available in my unix-utilities repository) which wraps all such tools to provide easy speedup of various formats - including video files.

The script uses ffmpeg to speed up videos - as well as sox and id3v2 to deal with audio files, if you want to do the same to some podcasts. All those dependencies can be satisfied with brew install ffmpeg sox id3v2.

The script can speed up a whole directory of downloaded videos at once by a 2.0x factor:

$ speedup_mp3 -2.0 downloaded_videos/ fast_videos/

Adjust that number to your liking. Factors higher than 2.0 are not currently supported, as ffmpeg requires multiple speedup rounds in that case. I plan to add support for that later.
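Under the hood, a video speedup boils down to an ffmpeg filter pair - setpts for the video stream and atempo for the audio. A hypothetical helper (the real speedup_mp3 script does much more) also shows where the 2.0 ceiling comes from - ffmpeg's atempo filter only accepts factors between 0.5 and 2.0, so higher factors need chained rounds:

```ruby
# Hypothetical sketch of the underlying ffmpeg invocation; the real
# speedup_mp3 script also handles formats, metadata, and directories
def ffmpeg_speedup_command(input, output, factor)
  # atempo only supports 0.5..2.0 - higher factors need chained filters
  raise ArgumentError, "factor must be in 0.5..2.0" unless (0.5..2.0).cover?(factor)
  ["ffmpeg", "-i", input,
   "-filter_complex", "[0:v]setpts=PTS/#{factor}[v];[0:a]atempo=#{factor}[a]",
   "-map", "[v]", "-map", "[a]", output]
end

ffmpeg_speedup_command("episode.mp4", "fast/episode.mp4", 2.0)
```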

The process takes a lot of time, so it's best left overnight to do its thing.

The script skips videos which already exist in the target directory, so you can add more videos to the source, run it again, and it won't redo videos it has already converted.

Put them on Dropbox

Infuriatingly, there doesn't seem to be any good way to just send files from a laptop to an Android tablet over WiFi. There used to be some programs, but they got turned into microtransaction nonsense.

If you already use Dropbox, the easiest way is to put those sped-up files there. This step is a bit awkward of course, as video files are big, people's upload speeds are often low, and the free Dropbox plan is pretty small.

If that doesn't discourage you, open the Dropbox app on your tablet, and use the checkbox to make your files available offline. You don't need to wait for it to finish syncing - Dropbox should keep updating files as they get uploaded.

After that, any video player works - I use VLC. Just open the Dropbox folder and click on a video; it will open in VLC and play right away. The first time you do it, make sure to set VLC as the default app to avoid an extra dialog.

Isn't this ridiculously overcomplicated?

Yeah, it sort of is. Some parts of it will probably get better - for example, speed controls on video/audio playback are getting more common, so you could skip that part (watching at 100% speed is of course totally silly). It still makes some sense to pre-speed-up files to save space and battery on the device, as faster files are proportionally smaller, but if you feel it's not worth the hassle, you can probably find a video player with the appropriate functionality.

TfL has shown zero interest in fixing the lack of connectivity on the London Underground, and the mobile ecosystem assumes you're always online or everything breaks, so this part will probably remain a major pain point for a very long time.

The part I find most embarrassing is the lack of any built-in way to just send files over to a device. Hopefully this gets fixed soon.

Saturday, November 18, 2017

10 Unpopular Opinions

Cat by kimashi tower from flickr (CC-BY)

I posted these on twitter a while back on Robin's request, but I wanted to elaborate a bit and give some context.

The list avoids politics, anything politics-adjacent like economics, and is not about just preferences.

If these turn out to be not controversial enough, I might post another list sometime in the future.

Listening to audio or watching videos at 100% speed is a waste of life

People speak very slowly, and for a good reason. When you talk with another person you need to not just process what they said, you're also preparing your responses, considering their reaction to your responses, and so on. With more than two people involved, it gets even more complicated.

None of this applies when you're just listening to something passively. Using audio speeds designed to leave you with enough brainpower to model your interlocutor when there's no interlocutor to model is just wasting it.

It will probably take a while to get used to it, but just speed it up - 200% should be comfortable for almost all content: podcasts, audiobooks, let's plays, TV etc. At first I used slight speedups like 120%, but I kept increasing it.

A side effect of this is that you might end up listening to music at higher speeds too (I end up using 140%), and people find this super weird.

I recommend this Chrome extension to control exact playback speed. It works with all major video sites.

I also wrote speedup_mp3 command line tool for podcasts and audiobooks, but nowadays most devices have builtin methods.

Oh, and back in analog days, speeding up audio messed up the pitch, so everything sounded funny. That's not true of modern methods.

Any programmer who does not write code recreationally is invariably mediocre at best

This comes up every now and then on sites like reddit and the masses of mediocre programmers are always like "oh it's totally fine to just code at work". It's not.

Coding is unique in its ability to change the world - with even tiny amounts of effort you can affect reality. If someone never codes recreationally, it means one of:
  • They're so content they never needed or wanted to create something that didn't exist before
  • They coded some stuff, but never bothered to Open Source it
  • They'd like to, but they're just not good enough
So when you're hiring, all CVs without a github link should go straight to the bin.

"Couldn't be bothered to Open Source" used to be sort of excusable, but nowadays it's so easy to push something to github that Signaling 101 strongly implies people without a github account are just bad.

And that applies to even junior / graduate roles. Even if you don't have anything amazingly useful to show yet, you can still share as you learn.

Avoidance of suffering can't be basis of morality - if it was, knocking out a few pain genes would be highest moral imperative

Nobody buys morality systems based on "God said so" or "Kant said so", and when people spend too much time on utilitarianism, they run into all kinds of problems.

So it became fashionable to ignore all pleasurable parts of utilitarianism, and just focus on minimizing suffering.

This is a total nonstarter. "Pain" and "suffering" are not exactly the same thing, but if you want to minimize suffering getting rid of pain is pretty much mandatory, and it's just a few simple gene edits to abolish it completely.

So far nobody's interested in researching gene edits for humans or animals to get rid of pain, so by revealed preference they don't actually buy their own stated beliefs that avoidance of suffering is terribly important.

An obvious objection might be that people with congenital insensitivity to pain keep getting themselves into physically dangerous situations, but it's completely irrelevant. They live in a world of pain-sensitive people, which is currently full of objects dangerous to people without pain sensitivity. It would take very modest effort to redesign common risk factors for greater safety, and to establish cultural norms to always seek medical help just in case whenever something unusual is happening to one's body, not just when it's painful (since nothing ever will be).

Or even if that was somehow unachievable, we could simply reduce pain sensitivity without completely losing it as a signal. If it were really key to all morality, science should drop everything and focus on it.

Any takers? No? I thought so.

Mobile "games" are closer to fidget spinner than to real games

As a proud gamer, I find it infuriating that people call those mobile things "games".

It's not that they're bad games - I have no problem with bad games. They are not games.

For a good analogy, let's say you're into movies. And then someone is like "oh I totally love movies, I put news on tv playing in the background every morning while I get ready to go out". Ridiculous, isn't it? Somehow everybody else is spared from this nonsense, except gamers.

A game - like a movie - is something you actually get fully into. In game time, or movie time.

A mobile "game" - like background TV news - is something happening part-time mentally, just to fill otherwise dead time. Like on a train, in a long queue, or otherwise when you can't do anything better.

You know what mobile "games" are closer to? Fidget spinners. Rubik's cubes. Sudokus. Toys. Not games.

That's not to say there aren't some legit games on mobile platforms, like let's say Hearthstone. They have nothing in common with all that fidget spinnery stuff.

Future medicine will develop easy fitness pill/hardware, and modern diet/exercise obsession will be wtf-tier to them

We evolved in a very different world, and recently nearly everyone all over the world has been getting overweight, horribly unfit, and suffering from all kinds of chronic conditions as a result.

Currently the best way people have to deal with it is to go on ever crazier diets, spend billions on "healthy" food and weight loss products, and spend hours every week in gyms - and all that effort has at most a modest effect.

But why is any of that even remotely necessary? You already have all the genes necessary to be fit, healthy, and attractive (and if you don't, most simple genetic problems can be fixed with simple medical interventions). If that fails, it's because something about the current environment messes with your body's regulatory system so much that the result is failure to achieve your biological potential.

Contrary to "calorie" nonsense, all the dieting and exercise is just an attempt to make your regulatory system work more like it was evolutionarily designed to.

At some point we'll inevitably figure out ways to monitor and affect the body's regulatory system directly, skipping this insanity of self-denial and endless wasted hours for very modest results.

For a good example, consider the 20th century's biggest health menace - smoking cigarettes. It led to enormous social campaigns, punitive taxation, and in some specially evil countries like the UK, the government is literally using death panels against smokers. Then vaping came along, and you can get basically all the benefits of smoking cigarettes with basically none of the health risks.

The problem is completely solved. Well, at least it would be if governments and society fully embraced vaping instead of treating it as smoking-tier evilness.

For an older example, people used to have crazy complicated dietary cleanliness rules to reduce their exposure to pathogens. All forgotten now, except among religious nuts. Food sold in supermarkets is pathogen-free; we moved on.

We already have some examples of this direct approach working - stomach surgery has far stronger and more immediate results than all kinds of diets and exercise put together, with zero effort needed - and less invasive methods are in development.

There were also a lot of pills which improved fitness and reduced obesity greatly, but they foolishly keep getting banned due to rare side effects, or as part of the evil War on Drugs.

Or alternatively maybe sexbots are going to get so good everyone is going to get many hours of intense exercise every night without any self-denial. But whichever way, it's going to get solved.

MongoDB figured out the one true way to represent data as JSON documents - now if only everything else about it was any good

Relational databases are sort of insane. They essentially model data as a collection of Excel spreadsheets. There's some irrelevant mathematical nonsense like relational calculus, but it has only the most remote relationship with actual RDBMSs.

Would you consider writing a program where the only data type was the Excel spreadsheet? What kind of question is that - obviously not. Yet a lot of you use relational databases, plus some ORM to make those Excel spreadsheets look kind of like something more useful, and it's painful.

Sure, they have a lot of nice stuff on top of those Excel spreadsheets - like the ability to merge multiple Excel spreadsheets into a new temporary Excel spreadsheet - but that's all they ever do.

We don't need any of that. MongoDB-style storage of data as collections of JSON documents is as close to perfect as it gets. And its performance can be pretty amazing.

It's just not very good at anything else. The lack of a good query language, and the silly business of building JSON query trees, is not even remotely acceptable. Take a look at this website, which translates very simple SQL into MongoDB queries. They are insane.

If we had MongoDB-style data modelling with a good query language on top of it, it would win all database wars.

By the way, programs which literally use Excel as their backend engine are an actual thing.

Farm animals are generally better off than wild animals - enjoy that chicken

Wild animals live on the edge of Malthusian equilibrium - with lives just tolerable enough to survive, generally on the edge of starvation, death by predation, or disease. And in times of abundance, they just fight for status in their pack, with a lot more losers than winners. It's not a great life.

None of that applies to domesticated animals. They have safety, an abundance of food, freedom from disease, and their lives end as painlessly as possible in their prime, saving them from the degenerations of old age.

That's not to say their lives are anywhere near optimized for greatest happiness, but by any dispassionate evaluation the contrast is really one-sided.

And it's not like going vegan would somehow reduce suffering - those cows and chickens would simply never exist.

So enjoy the meat.

Popularity of javascript won't last long - compiling real languages to web assembly is near future

Javascript was never meant to be a "real" general-purpose language. It was created for 10-line hacks to validate online forms and other such trivial things, and it was perfectly adequate for that. Then jQuery happened, and it turned even more into a special-purpose language for browser APIs.

Thanks to the great success of web browsers as a platform, it somehow managed to tag along and is enjoying a temporary period of popularity, being used for things far bigger than is reasonable.

But Javascript has no real competitive advantages. All the advantages are in the browser APIs, and any language which compiles to something browsers can run can use them.

Right now the Web has a mix of:
  • sites with old style trivial Javascript, jQuery, and simple plugins like Facebook buttons - that's close to 99% of the web
  • sites with new Javascript frameworks - they're so rare you can't even see them in popularity statistics, except Angular, which somehow gets over the 1% mark
  • very small number of high profile custom written sites like Google Maps and Gmail
There are two orders of magnitude gaps between these categories.

Anyway, the interesting thing is that in the framework world, people have already abandoned Javascript, and use various Javascript++ languages like CoffeeScript, JSX, TypeScript, whatever Babel does, etc. And it's all compiled, with the browser never seeing raw code.

This is all an intermediate situation, and the only long-term equilibrium is Javascript++ being displaced by actual programming languages like Ruby or Python.

Right now all ways to use them in the browser, like Opal, are in their infancy - but when you look at the numbers, everything about Javascript frameworks is in its infancy.

Widespread piracy alternative motivated game companies to treat gamers well - less piracy led to anti-gamer behaviour like loot boxes

Video game piracy is much less common than it used to be. There are many factors, both positive and negative: Steam and other online retailers made it far easier to buy games without waiting a week for the box; there are many discount sites and promotions, so even people with less money can buy legit games; many games focus on online play, which is harder for pirates to emulate; there's been an aggressive DRM effort that mostly worked on consoles and is even causing some delays on PCs; and popular pirating sites keep getting shut down or infected by malware.

It's still possible to pirate, but it all adds up to much lower rates (unlike, say, TV shows, where piracy is as rampant as ever). Whatever the reasons, the result is horrible for gamers.

Back when everyone had the alternative of easy piracy, companies were essentially forced to treat gamers well, as any bullshit would just lead to an alt-tab to The Pirate Bay. Now that piracy is much more niche, companies can do whatever they want.

Day one DLCs, DLCs which are basically bugfixes, DLCs while the game is still in Early Access, DLCs totalling $500+, all kinds of Pay-to-Win schemes, lootboxes - all that crap is happening not because companies are getting greedier, but because abused gamers are less likely to exercise the pirate option than in the past.

There are no easy fixes. Outrage campaigns just slow down these abusive practices. Platforms like Steam could ban some of the worst abuses, and in theory even game rating agencies and governments could intervene - for example, treating lootboxes as gambling and completely banning them in under-18s games. In practice governments are run by anti-gamer old people, and they're more likely to cause even more harm.

Keeping piracy option alive is the best way we have if we want to be treated with dignity.

For all its historical significance, apt-get is not really a good package manager

Twenty years ago, Linux's main selling point was package managers like apt-get. You didn't need to download software from twenty sites and chase incompatibilities - you just typed one command and it was all set up properly. It even upgraded everything with one command, rarely breaking anything in the process.

It was amazing. It also didn't age well.

Just to cover some differences between apt-get and the modern (mostly OSX) environment:
  • There's no reason for admin access to install most software
  • Programs are self-updating
  • Many programs have some kind of plugin system
  • Pretty much every programming language has its own package system
  • Quite often you need to install multiple versions
apt-get really doesn't deal with any of it.

A while ago I'd have thought it really funny, but OSX-style package managers like Linuxbrew and Nix are now a thing.

On Cloud servers people usually use language-specific package managers, or nowadays even occasionally something like Docker.

Either way, Linux is still not on the desktop. I guess the lack of usable graphics card drivers in any distro might be among the reasons.

Wednesday, November 01, 2017

Architecture of z3 gem

Kitten by www.metaphoricalplatypus.com from flickr (CC-BY)

This post is meant for people who want to dig deep into Z3 gem, or who want to learn from example how to interface with another complex C library. Regular users of Z3 are better off checking out some tutorials I wrote.

The z3 theorem prover is a C library with a quite complex API, and the z3 gem needs to take a lot of steps to provide a good ruby experience with it.

Z3 C API Overview

The API looks conventional at first - a bunch of black-box data types like Z3_context and Z3_ast (Abstract Syntax Tree), and a bunch of functions to operate on them. For example, to create a node representing the equality of two nodes, you call:

Z3_ast Z3_API Z3_mk_eq(Z3_context c, Z3_ast l, Z3_ast r);

A huge problem is that many of those calls claim to accept any Z3_ast, but actually need a particular kind of Z3_ast, otherwise you get a segfault. It's not even a static limitation - l and r can be anything, but they must represent the same type. So any kind of thin wrapper is out of the question.

Very Low Level API

The gem uses ffi to set up Z3::VeryLowLevel with direct C calls. For example, the aforementioned function is attached like this:

attach_function :Z3_mk_eq, [:ctx_pointer, :ast_pointer, :ast_pointer], :ast_pointer

There are 618 API calls, so it would be tedious to do this manually; instead, a tiny subproject lives in api and generates most of it with some regular expressions. A list of C API calls is extracted from the Z3 documentation into api/definitions.h. They look like this:

def_API('Z3_mk_eq', AST, (_in(CONTEXT), _in(AST), _in(AST)))

Then the api/gen_api script translates it into proper ruby code. It might seem like it could be handled by the ffi library, but there are too many Z3-specific hacks needed. A small number of function calls can't be handled automatically, so they're written manually.
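A minimal sketch of what such regular-expression-driven translation looks like (the actual gen_api script is more involved and handles many more types and edge cases):

```ruby
# Hypothetical mini-version of the def_API -> attach_function translation;
# the real api/gen_api handles far more types and special cases
TYPE_MAP = { "CONTEXT" => :ctx_pointer, "AST" => :ast_pointer }

def translate(definition)
  name, ret, args = definition.match(/def_API\('(\w+)',\s*(\w+),\s*\((.*)\)\)/).captures
  arg_types = args.scan(/_in\((\w+)\)/).flatten.map { |t| TYPE_MAP.fetch(t) }
  "attach_function :#{name}, #{arg_types.inspect}, #{TYPE_MAP.fetch(ret).inspect}"
end

translate("def_API('Z3_mk_eq', AST, (_in(CONTEXT), _in(AST), _in(AST)))")
# => "attach_function :Z3_mk_eq, [:ctx_pointer, :ast_pointer, :ast_pointer], :ast_pointer"
```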

For example, the Z3_mk_add function creates a node representing the addition of any number of nodes, and has the signature:

attach_function :Z3_mk_add, [:ctx_pointer, :int, :pointer], :ast_pointer

Low Level API

There's one intermediate level between raw C calls and ruby code. Z3::LowLevel is also mostly generated by api/gen_api. Here's an example of automatically generated code:

def mk_eq(ast1, ast2) #=> :ast_pointer
  VeryLowLevel.Z3_mk_eq(_ctx_pointer, ast1._ast, ast2._ast)
end

And this one is written manually, with proper helpers:

def mk_and(asts) #=> :ast_pointer
  VeryLowLevel.Z3_mk_and(_ctx_pointer, asts.size, asts_vector(asts))
end

A few things are happening here:
  • Z3 API requires Z3_context pointer for almost all of its calls - we automatically provide it with singleton _ctx_pointer.
  • We get ruby objects, and extract C pointers from them.
  • We return C pointers (FFI::Pointer) and leave the responsibility for wrapping them into ruby objects to the caller, as we don't actually have enough information here to do so.
Another thing the Z3::LowLevel API does is set up an error callback, to convert Z3 errors into Ruby exceptions.

Ruby objects

And finally we get to ruby objects like Z3::AST, which is a wrapper for an FFI::Pointer representing a Z3_ast. Other Z3 C data types get similar treatment.

module Z3
  class AST
    attr_reader :_ast
    def initialize(_ast)
      raise Z3::Exception, "AST expected, got #{_ast.class}" unless _ast.is_a?(FFI::Pointer)
      @_ast = _ast
    end

    # ...

    private_class_method :new
  end
end

The first weird thing is the Python-style pseudo-private ._ast. It really shouldn't ever be accessed by users of the gem, but it needs to be accessed by Z3::LowLevel a lot. Ruby doesn't have any concept of C++-style "friend" classes, so I chose the Python pseudo-private convention as opposed to a lot of .instance_eval or similar.

Another weird thing is that the Z3::AST class prevents object creation - only its subclasses representing nodes of specific types can be instantiated.

Sorts

Z3 ASTs represent multiple things, mostly sorts and expressions. Z3 automatically interns ASTs, so two identically-shaped ASTs will be the same underlying object (like two identical Ruby Symbols), saving us memory management hassle.

Sorts are sort of like types. The gem creates a parallel hierarchy, so every underlying sort gets an object of its specific class. For example, here's the whole of Z3::BoolSort, which should only ever have a single instance:

module Z3
  class Sort < AST
    def initialize(_ast)
      super(_ast)
      raise Z3::Exception, "Sorts must have AST kind sort" unless ast_kind == :sort
    end
    # ...

module Z3
  class BoolSort < Sort
    def initialize
      super LowLevel.mk_bool_sort
    end

    def expr_class
      BoolExpr
    end

    def from_const(val)
      if val == true
        BoolExpr.new(LowLevel.mk_true, self)
      elsif val == false
        BoolExpr.new(LowLevel.mk_false, self)
      else
        raise Z3::Exception, "Cannot convert #{val.class} to #{self.class}"
      end
    end

    public_class_method :new
  end
end

The ast_kind check is for additional segfault prevention.

BoolSort.new creates a Ruby object with the instance variable _ast pointing to the Z3_ast describing the Boolean sort.

It seems a bit overkillish to setup so much structure for BoolSort with just two instance values, but some Sort classes have multiple Sort instances. For example Bit Vectors of width n are:

module Z3
  class BitvecSort < Sort
    def initialize(n)
      super LowLevel.mk_bv_sort(n)
    end

    def expr_class
      BitvecExpr
    end    

Expressions

Expressions are also ASTs, but they all carry a reference to the Ruby instance of their sort.

module Z3
  class Expr < AST
    attr_reader :sort
    def initialize(_ast, sort)
      super(_ast)
      @sort = sort
      unless [:numeral, :app].include?(ast_kind)
        raise Z3::Exception, "Values must have AST kind numeral or app"
      end
    end

This again might seem like overkill for expressions representing Bool true, but it's extremely important for a BitvecExpr to know if it's 8-bit or 24-bit. Because if they get mixed up - segfault.

Building Expressions

Expressions can be built from constants:

IntSort.new.from_const(42)

Declared as variables:

IntSort.new.var("x")

Or created from one or more existing expression nodes:

module Z3
  class BitvecExpr < Expr
    def rotate_left(num)
      sort.new(LowLevel.mk_rotate_left(num, self))
    end

As you can see, the low level API doesn't know how to turn those C pointers into Ruby objects.

This interface is a bit tedious for the most common cases, so there are wrappers with a simpler interface, which also allow mixing Z3 expressions with Ruby expressions, with a few limitations:

Z3::Int("a") + 2 == Z3::Int("b")

For some advanced use you actually need the whole interface.

Creating Sorts and Expressions from raw pointers

For ASTs we construct, we track their sorts. Unfortunately, sometimes Z3 gives us raw pointers and we need to guess their types - most obviously when we get a solution to our set of constraints.

Z3's introspection API lets us figure this out and find the proper Ruby objects to connect to.

It has the unfortunate limitation that we can only see the underlying Z3 sorts. I'd prefer to have SignedBitvectorExpr and UnsignedBitvectorExpr as separate types with nice APIs; unfortunately there's no way to infer whether an answer Z3 gave came from a Ruby SignedBitvectorExpr or UnsignedBitvectorExpr, so that idea can't work.

Printer

Expressions need to be turned into Strings for human consumption. Z3 comes with its own printer, but it uses some messy Lisp-like syntax, with a lot of weirdness for edge cases.

The gem instead implements its own printer in traditional math notation. Right now it sometimes overdoes explicit parentheses.

Examples

The gem comes with a set of small and intermediate examples in the examples/ directory. They're a good starting point for learning common use cases.

There are obvious things like sudoku solvers, but also a regular expression crossword solver.

Testing

Testing uses RSpec and has two parts.

Unit tests require a lot of custom matchers, as most objects in the gem override ==.

Some examples:

let(:a) { Z3.Real("a") }
let(:b) { Z3.Real("b") }
let(:c) { Z3.Real("c") }
it "+" do
  expect([a == 2, b == 4, c == a + b]).to have_solution(c => 6)
end

Integration tests run everything in examples/ and verify that the output is exactly as expected. I like reusing other things as test cases like this.

How to properly set up RSpec

kitten by trash world from flickr (CC-NC-ND)

This post is recommended for everyone from total beginners to people who literally created RSpec.

Starting a new project

When you start a new ruby project, it's common to begin with:

$ git init
$ rspec --init

to create a repository and some sensible TDD structure in it.

Or for rails projects:

$ rails new my-app -T
$ cd my-app

Then edit Gemfile adding rspec-rails to the right group:

group :development, :test do
  gem "rspec-rails"
end

And:

$ bundle install
$ bundle exec rails g rspec:install

I feel all those Rails steps really ought to be folded into a single operation. There's no reason why rails new can't take options for a bunch of popular packages like rspec, and there's no reason why we can't have some kind of bundle add-development-dependency rspec-rails to manage simple Gemfile automatically (like npm already does).

But this post is not about any of that.

What test frameworks are for

So why do we even use test frameworks really, instead of using plain ruby? A minimal test suite is just a collection of test cases - which can be simple methods, or functions, or code blocks, or whatever works.

The most important thing test framework provides is a test runner, which runs each test case, gathers results, and reports them. What could be possible results of a test case?
  • Test case could pass
  • Test case could have test assertion which fails
  • Test case could crash with an error
And here's where everything went wrong. For silly historical reasons test frameworks decided to treat test assertion failure as if it was test crashing with an error. This is just insane.

Here's a tiny toy test, it's quite compact, and reads perfectly fine:

it "Simple names are treated as first/last" do
  user = NameParser.parse("Mike Pence")
  expect(user.first_name).to eq("Mike")
  expect(user.middle_name).to eq(nil)
  expect(user.last_name).to eq("Pence")
end

If an assertion failure aborts the test, and the first name assertion fails, then we still have no idea what the code actually returned, and at this point the developer will typically run binding.pry or equivalent just to mindlessly copy and paste checks which are already in the spec!

We want the test case to keep going, and then all assertion failures to be reported afterwards!

Common workarounds

There's a long list of workarounds. Some people go as far as recommending "one assertion per test" - an absolutely awful idea which results in enormous amounts of boilerplate and hard-to-read, disconnected code. Very few real world projects follow this:

describe "Simple names are treated as first/last" do
  let(:user) { NameParser.parse("Mike Pence") }

  it do
    expect(user.first_name).to eq("Mike")
  end

  it do
    expect(user.middle_name).to eq(nil)
  end

  it do
    expect(user.last_name).to eq("Pence")
  end
end

RSpec has some shortcuts for writing this kind of one-assertion test, but the whole idea is just misguided, and very often it's really difficult to twist a test case into a set of reasonable "one assertion per test" cases, even disregarding code bloat, readability, and performance impact.

Another idea is to collect all assertions into one. As the vast majority of assertions are simple equality checks, this usually sort of works:

it "Simple names are treated as first/last" do
  user = NameParser.parse("Mike Pence")
  expect([user.first_name, user.middle_name, user.last_name])
    .to eq(["Mike", nil, "Pence"])
end

Not exactly amazing code, but at least it's compact.

Actually...

What if test framework was smart enough to keep going after assertion failure? Turns out RSpec can do just that, but you need to explicitly tell it to be sane, by putting this in your spec/spec_helper.rb:

RSpec.configure do |config|
  config.define_derived_metadata do |meta|
    meta[:aggregate_failures] = true
  end
end

And now the code we always wanted to write magically works! If parser fails, we see all failed assertions listed. This really should be on by default.

Limitations

This works with expect and should syntax, and doesn't clash with any commonly used RSpec functionality.

It does not work with config.expect_with :minitest, which is how you can use assert_equal syntax with RSpec test driver. It's not a common thing to do, other than to help migration from minitest to RSpec, and there's no reason why it couldn't be made to work in principle.

What else can it do?

You can write a whole loop like:

it "everything works" do
  collection.each do |example|
    expect(example).to be_valid
  end
end

And if it fails somehow, you'll get just the failing examples listed in the test report!

What if I don't like the RSpec syntax?

RSpec syntax is rather controversial, with many fans, but many other people very intensely hating it. It changed multiple times during its existence, including:

user.first_name.should equal("Mike")
user.first_name.should == "Mike"
user.first_name.should eq("Mike")
expect(user.first_name).to eq("Mike")

And in all likelihood it will continue changing. RSpec sort of supports more traditional expectation syntax as a plugin, but it currently doesn't support failure aggregation:

assert_equal "Mike", user.first_name

When I needed to mix them for migration reasons I just defined assert_equal manually, and that was good enough to handle vast majority of tests.
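A minimal sketch of the kind of shim I mean - the module name is made up, and the exact placement would depend on the project:

```ruby
# Hypothetical shim: a minitest-style assert_equal defined on top of RSpec
# expectations, so its failures still participate in failure aggregation
module MinitestStyleAssertions
  def assert_equal(expected, actual)
    expect(actual).to eq(expected)
  end
end

RSpec.configure do |config|
  config.include MinitestStyleAssertions
end
```

With this in spec_helper.rb, old minitest-style specs keep working while the aggregation behavior described above applies to them too.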

In the long term, I'd of course strongly advise every other test framework in every language to abandon the historical mistake of treating test assertion failures as errors, and to switch to this kind of failure aggregation.

Considering how much time a typical developer spends dealing with failing tests, even this modest improvement in the process can result in significantly improved productivity.

Saturday, October 21, 2017

Challenges for September 2017 SecTalks London

20171021_064556 by dejesus54 from flickr (CC-NC-ND)

SecTalks's name suggests that it might have started as "security talks" meetup, but its format evolved and it's primarily about doing security-related challenges.

Usually the person who won previously, or another volunteer, prepares a challenge, and then during the meetup everyone tries to solve it. Whoever finishes first then gets to run the next one.

The challenge usually takes multiple steps - once you solve one, you unlock the next layer.

After my most recent victory I decided to tweak the format a bit. Participants vary a lot in level of experience, so with a layered challenge it's common for many to just get stuck and not do much for the rest of the meetup.

So instead I set up a CTFd server with 8 challenges which can be done in any order. This way if someone is stuck, they can just try another challenge.

You can get the challenges and the source code I used to generate them from this spoiler-free repository.

This post doesn't contain answers, but it might spoil quite a bit.

Archive (5 points)

It was a small bonus challenge, which I expected to be the easiest of all, but it somehow turned out to be hardest, with even the eventual winner solving it only after finishing everything else.

The archive contained a file with the answer and some distraction files. It was packed into another archive together with some distraction archive files. And so on for a few more levels.

It could be done manually, as distraction files were always identical, so you just go for the different one each time. Or it could be done with some simple Unix scripting.

I'm not even sure why people had so much trouble with it, maybe they expected something more devious?
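For reference, the scripted approach can be sketched in a few lines of Ruby (odd_one_out is my name for it, not part of any challenge code):

```ruby
require "digest"

# After unpacking one layer, the distraction files are all identical,
# so hashing contents and picking the file with a unique hash finds the real one
def odd_one_out(paths)
  groups = paths.group_by { |path| Digest::MD5.file(path).hexdigest }
  groups.values.find { |files| files.size == 1 }&.first
end

# odd_one_out(Dir["layer01/*"]) would return the one file that differs
```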

Javascript 1 (10 points)

Most of the challenges were about reverse engineering password validation scripts. Finding your way around messy Javascript is one of the core skills nowadays.

I wrote a simple script with lines like these:

checksum += Math.pow(password.charCodeAt(0) - 115, 2)

And passed it through this obfuscator.

Javascript 2 (15 points)

Second Javascript reverse engineering challenge had more complicated validator.

Every character went through a few variables where it'd get shifted by a constant. The order was reshuffled and then the obfuscator renamed those variables, so it was a small puzzle, but it wasn't too hard to just walk backwards.

var a5 = password.charCodeAt(5)
var b5 = a5 + 80
var c5 = b5 - 194

I passed this script through this more complex obfuscator, with settings: string array encoding Base64, string array threshold 1.

It was a lot harder than the first Javascript challenge.

One Letter (20 points)

This challenge is a cryptography classic - a block of text encoded by a monoalphabetic substitution cipher with spaces and punctuation removed.

In principle it's totally doable manually in a few minutes, but the people who tried to do it this way struggled. Most solved it with online statistical analysis tools.

Python (25 points)

Finding obfuscated code in non-JS languages is rare, but everybody should have basic familiarity with a wide range of popular languages.

Validation script contained tests like these:

    if ord(key[7]) + 97 != 197:
        return False

Then I used an online obfuscator, but it's no longer available. I think it was based on this.

There was an extra step here, as the script only worked in Python 2, and many people naturally tried to run it on the most recent version.

Ruby (30 points)

I couldn't find any current ruby obfuscators, so I took an old one for 1.8 and wrote my own based on it.

People had little trouble getting through the obfuscator, but the next step of dealing with validation script itself turned out to be really difficult.

The validator concatenates the password 4 times, then modifies it by taking out letters one at a time.

  return unless password.slice!(10, 1).ord == 110

Some people wrote scripts to reverse slice! into insert, others tried to do it one character at a time from the top. Either way it was the hardest of all the validation scripts.
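The script-based reversal can be sketched like this - the (index, byte) pairs here are made up for illustration, not taken from the actual validator:

```ruby
# Each forward check removes one character with password.slice!(index, 1) and
# compares it against a known byte, so replaying the recorded (index, byte)
# pairs in reverse order with String#insert rebuilds the validator's input
# (the 4x-concatenated password).
ops = [[1, 98], [0, 97]]  # forward order: slice!(1, 1) == "b", then slice!(0, 1) == "a"
rebuilt = ""
ops.reverse_each { |index, byte| rebuilt.insert(index, byte.chr) }
rebuilt  # => "ab"; the actual password would be the first quarter of the full string
```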

Perl (35 points)

Validation script contained shuffled lines like these:

return if substr($key, 10, 1) ne "s";

And then I used this obfuscator.

Even though Perl isn't very popular any more, it turned out to be quite easy.

RSA (40 points)

And finally, a simple RSA challenge.

With the same message sent to 3 different recipients, and a small exponent e=3, it's probably the easiest of all RSA attacks.
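For anyone curious, here's a sketch of the attack in Ruby, with tiny toy numbers standing in for the actual challenge values: combine the three ciphertexts with the Chinese Remainder Theorem, then take an exact integer cube root.

```ruby
# Extended Euclid, modular inverse, and CRT - standard number theory helpers
def egcd(a, b)
  return [a, 1, 0] if b == 0
  g, x, y = egcd(b, a % b)
  [g, y, x - (a / b) * y]
end

def invmod(a, m)
  g, x, _y = egcd(a % m, m)
  raise "not invertible" unless g == 1
  x % m
end

def crt(remainders, moduli)
  n = moduli.reduce(:*)
  remainders.zip(moduli).sum { |r, m| o = n / m; r * o * invmod(o, m) } % n
end

# Exact integer cube root by binary search
def icbrt(x)
  lo, hi = 1, 1
  hi *= 2 while hi**3 <= x
  while lo < hi
    mid = (lo + hi + 1) / 2
    if mid**3 <= x
      lo = mid
    else
      hi = mid - 1
    end
  end
  lo
end

# Toy example: same message, e = 3, three pairwise coprime moduli,
# and message**3 smaller than the product of the moduli
moduli = [77, 85, 87]
message = 42
ciphertexts = moduli.map { |n| message**3 % n }
recovered = icbrt(crt(ciphertexts, moduli))
# recovered == 42
```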

Final Thoughts


CTFd server was very easy to set up. A basic setup (for a "classroom"-sized userbase) takes just a few minutes.

The format was very successful - at the top end it was a tight race, with the leader changing many times during the meetup, and only a 13 second gap between the first and second finishers at the end.

For everyone else - almost everyone managed to do at least a few challenges, and when people got stuck, they just tried something else instead of giving up.

Friday, June 30, 2017

11 Small Improvements For Ruby - followup

Jo Jo by Garen M. from flickr (CC-NC)

Last week I posted a list of 11 things I'd like to see changed about Ruby.

Time for some followup, as I have workarounds for at least some of the issues.

Pathname#glob - now in a gem

I wrote a gem pathname-glob which provides the missing method so you can code:

Pathname("foo").glob("*.txt")

This doesn't just save you a few characters, it's the only way to get reliable globbing when the path contains special characters like [, ], {, }, ?, *, or \ - which is pretty likely if you're dealing with users' files.

Support for special characters can't be fixed with the Dir.glob("foo/*.txt") style API, as it doesn't know where the name of the directory ends and the globbing pattern starts.

Another workaround would be to use Dir.chdir("foo"){ Dir.glob("*.txt") } - that'd deal with special characters in folder names, but would cause issues with threads.

Of all the issues, this one is probably most useful to get into ruby standard library. It's like jQuery for Pathnames.

Hash#zip - now in a gem

I wrote a gem hash-zip.

You can use it for zipping any number of Hashes together, and it will pad any missing values with nils. Usually the next step is to merge them in some meaningful way (for simple case when one overrides the other Hash#merge already exists).

default_options.zip(user_options).map{|key, (default_value, user_value)| ... }

Technically it overwrites existing #zip method Hash inherits from Enumerable, but Enumerable#zip really doesn't make any sense when applied to Hashes, so it's better this way than introducing a new name.

Missing Hash methods - now in a gem

I wrote a gem hash-polyfill, which contains a bunch of simple methods Hash is bound to get eventually.

These are:
  • Hash#compact
  • Hash#compact!
  • Hash#select_values
  • Hash#select_values!
  • Hash#reject_values
  • Hash#reject_values!
  • Hash#transform_values
  • Hash#transform_values!
As I assume these will be added eventually, the gem only adds methods which don't exist yet, skipping any that are already defined.

In particular the last two already exist in 2.4, but you can get access to them from older Ruby versions with this polyfill.

Naming convention for #compact, #compact! follows Array, and for #select_values etc. follows 2.4's #transform_values and existing Enumerable#select/reject.

Names Ruby will end up using might be different, but these are as good guesses as any.

Enumerable#count_by - easier with 2.4

The example I had last time:

posts.count_by(&:author)

is actually not too bad with 2.4:

posts.group_by(&:author).transform_values(&:size)

For very large collections (like all bytes in some huge file), group_by / transform_values is going to take a lot more memory than counting things directly, but I ran benchmarks and it seems it's usually a lot faster than Ruby-based loop implementation.

If you're not on 2.4, check out hash-polyfill gem.

Tuesday, June 27, 2017

Simple Terrain Mapmode Mod for Hearts of Iron 4

Paradox grand strategy games don't let mods create new map modes without some serious hacks - even something as important as truce map modes for CK2 and EU4 is still missing.

Fortunately at least here a hack worked.

If you just want to skip the explanations and get the mod:

How Hearts of Iron 4 map works

In most strategy games maps are generated dynamically, so there's always simple mapping from map data to what's displayed.

Even a lot of games with static map use similar technique - gameplay map drives what's displayed.

That's not how Hearts of Iron 4 and other Paradox games work. Instead, they contain a bunch of bitmaps driving what's displayed, and almost completely separate gameplay map data.

So it's possible for a tile to look like a forest visually while being a gameplay hill. In some parts of the map like Persia or Tibet majority of tiles are not what they look like.

This separation is supposed to make things look nicer, but I'd say it's not worth it. When players have a choice - like in Europa Universalis 4 - the vast majority instantly disable the default map mode (terrain) and never look back; instead they use political map mode 99% of the time, and if they want to see terrain information they use simple terrain map mode, which is driven by gameplay data.

Unfortunately Hearts of Iron 4 doesn't let us disable visual terrain view. And to make matters worse it contains no way to see gameplay terrain. Instead we're forced to see clutter which might or might not match gameplay terrain in every map mode.

Since visual and gameplay terrain match poorly, the result is that figuring out which way you can send your tanks is a hellish exercise of mouse-overing every tile to see its real terrain, and then trying to remember it all.

What the mod does

The mod changes visual map to make it match game map more accurately, by running some image editing scripts.

It changes terrain and tree layers. It makes everything far more accurate, and while it loses some detail, I'd say the map looks cleaner and better this way.

I experimented with changing heightmap as well, but that made the map look rather ugly, and other changes are usually sufficient to make terrain clear. Taiwan is an example of region where terrain and heightmap mismatch.

It doesn't completely wipe out the detail on the source map, just tweaks it in places where there's a significant mismatch.

There are a few mods that try to make terrain look better by replacing the textures used for various terrain types - but they don't deal with the core issue: the data they're trying to display doesn't match the in-game map.

Compatibility

There are two versions:
You can use them with other mods as long as they don't change the map - in particular Road to 56 mod works just fine, since it uses vanilla map.

As it's all scripted, if there's demand I can run it for other mods. Or you could just use it as is - it might still be better than the vanilla map.

You can use it with mods that change map graphics.

Before


Good luck guessing where the mountains are. Or hills. Or deserts.

After


And these are the answers.

Issues

Some minor rivers flow through a tile, and it's still guesswork whether you get a river penalty when attacking it. The mod doesn't affect river placement in any way.

A number of tiles in vanilla look "urban", even though game files declare them as something else. The mod believes the game files. If anybody has a clue what's going on, I'd like to know. The only tiles affected that I noticed are a few VPs in the Iraq region. Kaiserreich doesn't seem to be affected by this problem.

Sunday, June 25, 2017

11 Small Improvements For Ruby

"HALT! HOO GOEZ THAR?" 😸 by stratman² (2 many pix!) from flickr (CC-NC-ND)

For followup post with some solutions, check this.

A while ago I wrote a list of big ideas for Ruby. But there's also a lot of small things it could do.

Kill 0 octal prefix

I'm limiting this list to backwards compatible changes, but this is one exception. It technically breaks backwards compatibility, but in reality it's far more likely to quietly fix bugs than to introduce them.

Here's a quick quiz. What does this code print:

p 0123
p "0123".to_i
p Integer("0123")

Now check in actual ruby and you'll see my point.

If the result wasn't what you expected, that proves my point. The whole damn thing is far more likely to be an accident than intentional behaviour - especially in user input.

If you actually want octal - which nobody ever uses other than Unix file permissions, use 0o123.

Add missing Hash methods

Ruby has a bad habit of treating parts of the standard library as second class, and nowhere is it more mind-boggling than with Hash, which might be the most commonly used object after String.

It only just got transform_values in 2.4, which was probably the most necessary one.

Some other methods which I remember needing a lot are:
  • Hash#compact
  • Hash#compact!
  • Hash#select_values
  • Hash#select_values!
  • Hash#reject_values
  • Hash#reject_values!
You can probably guess what they ought to do.

Hash#zip

Technically Hash#zip will call Enumerable#zip so it returns something, but that something is completely meaningless.

I needed it crazy often. With a = {x: 1, y: 2} and b = {y: 3, z: 4} I'd want to run a.zip(b) and get {x: [1, nil], y: [2,3], z: [nil, 4]}, which I can then map or transform_values to merge them in a meaningful way.

Current workaround of (a.keys|b.keys).map{|k| [k, [a[k], b[k]]]}.to_h works but good luck understanding this code if you run into it, so most people would probably just loop.
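The behavior I want could be sketched like this (zip_hashes is a placeholder name to avoid clobbering Enumerable#zip):

```ruby
class Hash
  # Pair up values by key across hashes, padding missing entries with nil
  # (named zip_hashes here so the real Enumerable#zip is left alone)
  def zip_hashes(*others)
    all_keys = [self, *others].flat_map(&:keys).uniq
    all_keys.map { |key| [key, [self, *others].map { |h| h[key] }] }.to_h
  end
end

a = {x: 1, y: 2}
b = {y: 3, z: 4}
a.zip_hashes(b)  # => {x: [1, nil], y: [2, 3], z: [nil, 4]}
```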

Enumerable#count_by

Here's a simple SQL:

SELECT author, COUNT(*) count FROM posts GROUP BY author;

Now let's try doing this in ruby:

posts.count_by(&:author)

Well, there's nothing like it, so let's try to do it with existing API:

posts.group_by(&:author).map{|author, posts| [author, posts.size]}.to_h

For such a common operation having to do group_by / map / to_h feels real bad - and most people would just loop and += like we're coding in some javascript and not in a civilized language.

I'm not insisting on count_by - there could be a different solution (maybe some kind of posts.map(&:author).to_counts_hash).
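A minimal sketch of what the hypothetical count_by could look like:

```ruby
module Enumerable
  # Hypothetical count_by: group elements by the block's result
  # and count each group, without building intermediate arrays
  def count_by(&block)
    counts = Hash.new(0)
    each { |element| counts[block.call(element)] += 1 }
    counts
  end
end

%w[a b a c a b].count_by { |letter| letter }  # => {"a" => 3, "b" => 2, "c" => 1}
```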

URI query parameters access

Ruby is an old language, and it added a bunch of networking related APIs back when the internet was young. I don't blame anyone for these APIs not being very good, but by now they really ought to be fixed or replaced.

One mindbogglingly missing feature is access to query parameters in URI objects to extract or modify them. The library treats the whole query as opaque string with no structure, and I guess expects people to use regular expressions and manual URI.encode / URI.decode.

There are gems like Addressable::URI that provide necessary functionality, and URI needs to either adapt or get replaced.
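To be fair, the stdlib does include URI.decode_www_form / URI.encode_www_form for flat query strings, but they operate on the raw query string rather than being part of the URI object, and they don't handle nested parameters the way Addressable or Rack do:

```ruby
require "uri"

uri = URI.parse("https://example.com/search?q=cats&page=2")
# Parse the raw query string into a hash of parameters
params = URI.decode_www_form(uri.query).to_h
# params => {"q" => "cats", "page" => "2"}

# Modify a parameter and write the query back
params["page"] = "3"
uri.query = URI.encode_www_form(params)
uri.to_s  # => "https://example.com/search?q=cats&page=3"
```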

Replace net/http

It's a similar story of an API added back when the internet was young and we didn't know any better. By today's standards the API feels so bad that quite a few people literally use `curl ...`, and a lot more use one of a hundred replacement gems.

Just pick one of those gems, and make it the new official net/http. I doubt you can do worse than what's there now.

Again, I'm not blaming anyone, but it's time to move on. Python had urllib, urllib2, urllib3, and by now it's probably up to urllib42 or so.

Make bundler chill out about binding.pry

For better or worse bundler became the standard dependencies manager for ruby, and pry its standard debugger.

But if you try to use require "pry"; binding.pry somewhere in your bundle exec enabled app, it will raise LoadError: cannot load such file -- pry, so you either need to add pry to every single Gemfile, or edit the Gemfile and bundle install every time you need to debug anything, then undo it afterwards.

I don't really care how that's done - by moving pry to the standard library, by some unbundled_require "pry", or by special-casing pry - the current situation is just too silly.

Actually, just make binding.pry work without any require

I have this ~/.rubyrc.rb:

begin
  require "pry"
rescue LoadError
end

which I load with RUBYOPT=-r/home/taw/.rubyrc.rb shell option.

It's such a nice quality of life improvement to type binding.pry instead of require "pry"; binding.pry, it really ought to be the default, whichever way that's implemented.

Pathname#glob

Pathname suffers from being treated as second class part of the stdlib.

Check out this code for finding all big text files in path = Pathname("some/directory"):

path.glob("*/*.txt").select{|file| file.size > 1000}

Sadly this API is missing.

In this case we can use:
Dir.glob("#{path}/*/*.txt").map{|subpath| Pathname(subpath)}.select{|file| file.size > 1000}

which not only looks ugly, it would also fail if path contains any funny characters.
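The missing method could be sketched like this, using a Dir.chdir workaround so special characters in the directory part can't be misread as glob syntax (a sketch, not the gem's actual implementation):

```ruby
require "pathname"

class Pathname
  # Hypothetical Pathname#glob: chdir into the directory so the path part is
  # never interpreted as a glob pattern, then rebuild full Pathnames.
  # Not thread-safe: Dir.chdir changes the process-wide working directory.
  def glob(pattern)
    Dir.chdir(self) { Dir.glob(pattern) }.map { |match| self + match }
  end
end

# Pathname("some/directory").glob("*/*.txt").select { |file| file.size > 1000 }
```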

system should to_s its argument

If wait_time = 5 and uri = URI.parse("https://en.wikipedia.org/wiki/Fidget_Spinner"), then this code really ought to work:

system "wget", "-w", wait_time, uri

Instead we need to do this:

system "wget", "-w", wait_time.to_s, uri.to_s

There's seriously no need for this silliness.

This is especially annoying with Pathname objects, which naturally are used as command line arguments all the time. Oh and at least for Pathnames it used to work in Ruby 1.8 before they removed Pathname#to_str, so it's not like I'm asking for anything crazy.
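Here's a tiny sketch of the wrapper I'd want (run is a made-up name, not a real API):

```ruby
require "uri"

# Stringify every argument before handing it to Kernel#system,
# so Integers, URIs, Pathnames etc. can be passed directly
def run(*args)
  system(*args.map(&:to_s))
end

wait_time = 5
uri = URI.parse("https://en.wikipedia.org/wiki/Fidget_Spinner")
# run "wget", "-w", wait_time, uri   # now works without manual .to_s calls
run "echo", wait_time, uri           # prints "5 https://en.wikipedia.org/wiki/Fidget_Spinner"
```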

Ruby Object Notation

Serializing data structures to send to another program or save in a text file is a really useful feature, and it's surprising ruby doesn't have such functionality yet.

So people use crazy things like:
  • Marshal - binary code, no guarantees of compatibility, no security, can't use outside Ruby
  • YAML - there's no compatibility between every library's idea of what counts as "YAML", really horrible idea
  • JSON - probably best solution now, but not human readable, no comments, dumb ban on line final commas, and data loss on conversion
  • JSON5 - fixes some of problems with JSON, but still data loss on conversion
What we really need is Ruby Object Notation. It would basically:
  • have strict standard
  • have implementations in different languages
  • with comments allowed, mandatory trailing commas before newline when generated, and other such sanity features
  • Would use same to_rbon / RBON.parse interface.
  • And have some pretty printer.
  • Support all standard Ruby objects which can be supported safely - so it could include Set.new(...), Time.new(...), URI.parse(...) etc., even though it'd actually treat them as grammar and not eval them directly.
  • Optionally allow apps to explicitly support their own classes, and handle missing ones with exceptions.
This is an unproven concept and it should be a gem somewhere, not part of the standard library, but I'm surprised it's not done yet.

Celebrating Cloud's 10th Birthday With Cat Pictures

Cloud is now a serious mature lady cat. Let's celebrate it the best possible way - by posting a lot of cat pictures!

Young Cloud

In her youth "smartphones" took potato-quality pictures which took a minute per picture to transfer over custom USB cables, so unfortunately I don't have many photos from that time.

The few I've got show she already had an affinity for computers, using a laptop power supply as a chair:

Cloud And Computers

Cloud is an indoor cat surrounded by computers, so she took a lot of interest in them.

What a nice keyboard pillow:

I wonder what keyboard pillow tastes like?


Fixing Cabling:


3D Printing:


Using laptops as chairs:


Hooman got me new laptop chair? Did it come in a box too?


This laptop is weird, but if I fits, I sits:



Doing Hooman Things

Cloud was curious about things hoomans do, so she sometimes tried acting like one, but eventually figured out it's much better to just be a cat.

Drinking coffee:

Standing on back paws:


Packing for travel:


Exploring Her World


As an indoor cat she doesn't have far to explore, but she's still doing her best.

Going up:


What is my hooman doing down there?


Visiting vet:



Keeping her fur snow white:


Cloud and Chair

Cloud loves her hooman's chair. Especially when it's warm after use. There's just one problem - the hooman needs that chair too.

Using chair as bed:


Using chair for photoshoots:


OK hooman, you can keep the big one, I like the cat-sized sidechair better anyway:



Sleepy Cloud

As a proper cat, Cloud loves sleeping on everything.

Sleeping next to a computer:


Sleeping in boxes:



Sleeping on her cat tree:


Sleeping on what her hooman is trying to read:


Or write:



Playful Cloud

Unfortunately it's hard to get a good pic of Cloud in action, as she's not a very active cat, and I don't have a Go Pro.

Here's a photo of her catching a bird on a stick:


Just being cute


Like any cat she loves just being cute:




Happy Birthday Cloud!

Tuesday, March 28, 2017

CK2 to EU4 converter - how it works and how to mod it

Pussy by tripleigrek from flickr (CC-SA)

There's very little information about this wonderful thing, so after doing some research I decided to write it all up.

What is a megacampaign

Paradox Grand Strategy games cover a very long time, and most people never even "finish" a single game of CK2 or EU4 - but it's possible to go to the other extreme and play a "megacampaign" - start in one game, then convert to another, and keep playing. Possibly even chain multiple such games, like CK2 to EU4 to Vic2 to HoI4.

Since Paradox games make it hard to avoid blobbing, you pretty much have to play with mods, impose severe restrictions on yourself, switch countries every now and then, or otherwise do silly things - or your second/third/fourth campaigns will be pretty boring.

Please don't ironman megacampaigns

I generally dislike ironman, but for megacampaigns it's even more important not to use it, for a few reasons:
  • Conversion process is never perfect, so you're very likely to want to tweak some things with console commands or save game editing before and after export.
  • It's bad enough when a game bugs out and you need to fix it by console, but when it happens to a campaign you've been playing for literally months, it can be really devastating
  • If any of the games receive updates, you might need to do some fixing. CK2 is notorious for breaking saves any time the trait list expands, because they couldn't be bothered to include a 10kB trait-id-to-trait-name mapping table in each save game.
  • Ironman mode uses special save format which is pretty damn hard to "un-ironman" and edit if necessary.
  • Megacampaign is not a meaningful achievement run - with so much time it's inherently going to be pretty easy past first 100 years, other than for self-imposed challenges with which ironman will be of no help.
Or if you really want to, go ahead, just don't say I didn't warn you.

What is the converter

There's been many converter tools which take save game from one game and create a mod for another.

In theory you could convert save game to save game, but converting to a mod is much more flexible, so that's the generally used method. The mod sets up the starting map, rulers, and whatever else the converter chooses to include.

CK2 to EU4 converter is a builtin feature of Crusader Kings 2. You can only access it if you have converter DLC.

If you want to continue to Vic2 and HoI4, you'll need to use third party tools.

Where are the files

If you want to mod it, or just look at it, you might be surprised. Unlike pretty much all other DLC-locked content - which is included in the base game, just with flags telling the game to turn it off - all converter files are in the DLC zip.

Go to your Crusader Kings 2 game folder, find dlc/dlc030.zip, and unpack it somewhere.

If you want to mod it, you can include modified files directly in your mod, in eu4_converter folder, just as if they were in base game.

What converter does

Converter creates EU4 map matching CK2 map, with proper cultures, religions, dynasties, and government types, adding new countries as necessary.

It creates matching rulers with sensibly mapped stats, traits, and heirs.

It reassigns province development and centers of trade.

In a limited way it also sets up vassals.

It will convert any game to a 1444.11.11 start, no matter at which point you decided to use it.

It will convert your stash of CK2 gold to EU4 gold at 10:1 rate, and your prestige at 50:1 rate.

What converter doesn't do

It ignores any diplomatic relations or wars.

It ignores any family relations, except for dynasty name. So if you're king of England and your son is ruling France, that won't be modeled in any way except giving you same dynasty.

All provinces produce same goods as before (including gold), are in same trade nodes, have same estuaries, and other modifiers - with exception of centers of trade.

It doesn't extrapolate outside the map - so even if you converted everyone to Zoroastrianism, you'll still get Sunnis in Africa and Indonesia after conversion, in any place not included on the CK2 map.

All ongoing rebellions are cancelled.

Does converter randomize

The process seems to be completely deterministic.

If you export multiple times without reloading, you'll get same files.

If you do it after reloading, you'll get trivial changes, like the order of tags (so things like which duke gets X01 and which gets X02 can flip), but there doesn't seem to be any meaningful randomness.

How active DLCs affect converter

If you have Sunset Invasion DLC active, new world map will have very powerful Aztec and Inca empires, will be much better settled, and will be mostly in High American technology group. Otherwise, it will look pretty much the same as vanilla. This is true even if Aztec invasion never happened.

If you have Sons of Abraham DLC active, heresies will convert to individual religions, otherwise they'll get folded back to their base religions.

If you have Conclave DLC, it uses different system to convert government types.

Can I use converter with modded CK2 games

Yes, as long as they don't modify the map significantly.

One notable exception is my Suez Canal mod, which is totally fine to use with the converter, as it doesn't add or remove provinces. You won't have the canal prebuilt in EU4, though.

What affects ruler attributes and traits

Ruler/heir martial, stewardship, and diplomacy attributes, divided by 3 and rounded up, end up as their EU4 attributes - capped at 6 of course.

There's some correction for underage rulers, since in CK2 children grow their attributes with age, while in EU4 toddlers are somehow born 6/6/6 or 0/0/0.

This means any attribute at 16 or higher converts to a 6 - so it might be worth switching focus before conversion, just to get that extra point.
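The formula above is simple enough to sketch in Python (my reconstruction from observed behavior, not the converter's actual code; the function name is mine):

```python
import math

def eu4_attribute(ck2_value: int) -> int:
    """Convert a CK2 attribute to an EU4 monarch skill (0-6):
    divide by 3, round up, cap at 6."""
    return min(6, math.ceil(ck2_value / 3))

# A ruler with 14 martial, 16 stewardship, 3 diplomacy:
print([eu4_attribute(v) for v in (14, 16, 3)])  # [5, 6, 1]
```

You can see why 16 is the magic number: 16/3 rounds up to 6, so anything beyond that is wasted.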

If you're immortal, you'll get the immortal trait in EU4. Other traits are based on trait_conversion.txt. For some reason the converter will only generate 2 traits, even if the ruler reigned long enough to earn 3.

Conversion is based mostly on your traits, and secondarily on your attributes. Some traits that are good in CK2, like greedy, convert to bad EU4 personalities of the same name. It's mostly nothing to worry about unless you have an immortal ruler.

How is map created

CK2 titles are identified by rank and name (like c_jylland, the County of Jylland), while EU4 provinces are identified by number (like 15 Jylland) - their names are just descriptive and occasionally not unique (like 157 Bihar and 558 Bihar).

The map is based on a many-to-many mapping in province_table.csv. Every CK2 title (not just a county!) can map to some number of EU4 provinces.

In the simplest case a single title maps to a single province - for example, CK2 has two counties in Iceland, so CK2's Vestisland maps to EU4's 370 Reykjavik, while Austisland maps to 371 Akureyri.

A title can also map to multiple provinces - the county of Holstein maps to two, 1775 Holstein and 4141 Ditmarschen, so whoever controls it gets both.

Sometimes multiple CK2 titles map to a single EU4 province - the counties of Lyon and Forez both map to 203 Lyonnais. In such cases the country holding the majority gets the province, and anyone else who held part of it gets a permanent claim on it.

Quite often 2 counties and a duchy map to one province - the counties of Coruna and Santiago and the duchy of Galicia all map to 206 Galicia. If the duke holds even one of the counties, he holds 2 of the 3 mapped titles and gets the province.

I haven't investigated what happens in 50:50 splits; presumably it's based on some deterministic ordering, since conversion is not random.
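The majority rule could be modeled like this (a simplified sketch, not the converter's actual code; the tie-breaking is my guess):

```python
from collections import Counter

def assign_province(holders):
    """Decide who gets an EU4 province given the holders of all the CK2
    titles that map to it: the majority holder gets the province, and
    everyone else who held a piece gets a permanent claim.

    Returns (owner, set of claimants)."""
    counts = Counter(holders)
    # most_common() is stable for equal counts, standing in here for the
    # converter's unknown (but deterministic) 50:50 tie-breaking.
    owner = counts.most_common(1)[0][0]
    return owner, set(counts) - {owner}

# Coruna + Santiago + duchy of Galicia all map to 206 Galicia:
owner, claims = assign_province(["Castille", "Castille", "Portugal"])
print(owner, claims)  # Castille {'Portugal'}
```

Here Castille holds two of the three mapped titles, so it gets the province, and Portugal is left with a permanent claim.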

What affects province development

Development seems to be redistributed between provinces based on the total value of buildings in each province.

This means that provinces with more slots will generally get more development, because they'll generally have more buildings in them.

Because development is redistributed rather than mapped directly, games converted early or late will have the same overall EU4 development.

Things like technology levels, saved technology points, whose capital a province is etc. have no effect on converted development.

Development mapping is separate from province mapping - so one very well developed Jylland could give you four 30-development EU4 provinces, while two very well developed counties of Coruna and Santiago would only give you one 30-development EU4 province.
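A proportional redistribution of a fixed development pool would look something like this (my guess at the mechanism, based on the observations above; names and numbers are illustrative):

```python
def redistribute_development(building_value, total_development):
    """Split a fixed development pool between EU4 provinces in
    proportion to the total value of CK2 buildings mapped into each."""
    total_value = sum(building_value.values())
    return {
        province: total_development * value / total_value
        for province, value in building_value.items()
    }

# Heavily built Jylland (15) vs. a modest Galicia (206):
print(redistribute_development({15: 300.0, 206: 100.0}, 40))
# {15: 30.0, 206: 10.0}
```

Since the pool is fixed, building up one province just shifts development away from everywhere else - which matches the observation that total converted development doesn't depend on when you convert.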

What affects centers of trade

Centers of trade are also redistributed - approximately, but not exactly, the same number as in vanilla, and generally to high-development provinces - but I couldn't figure out the exact logic.

It doesn't seem to relate strongly to merchant republics, and sometimes two CoTs spawn right next to each other.

What affects government type

If you have Conclave, the otherwise purely cosmetic feature describing your government as "Hereditary Despotic Monarchy" is actually used to drive the mapping, via government_table.csv.

A few countries - like the Papacy, holy orders, and other religious heads - are hardcoded to specific types, also listed in government_table.csv.

Which government flavor you get is determined by common/government_flavor/00_government_flavor.txt, based on your laws, religion etc.

One oddity of the current table is that merchant republics convert to merchant, oligarchic, or administrative republics based on their laws, which feels really silly - I'd recommend fixing that in a mod.

What affects cultures

Cultures of characters and provinces are mapped by culture_table.csv.

Every mapping is currently one-to-one, so you can even get Horse culture in EU4.

What affects religions

If you have Sons of Abraham, the converter uses heresy_table_soa.csv, where every religion maps one to one.

If you don't, it uses religion_table.csv, where heresies are folded back into their base religions, including some dubious mappings of unreformed pagans to various vanilla EU4 pagan types.
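Either way, the conversion boils down to a table lookup; a minimal sketch (the real CSV layout may differ - the two-column format and the example rows are my assumptions):

```python
import csv, io

def load_religion_map(csv_text):
    """Load a two-column CK2-religion -> EU4-religion table, in the
    style of heresy_table_soa.csv / religion_table.csv."""
    return dict(csv.reader(io.StringIO(csv_text)))

# Without Sons of Abraham, a heresy folds back into its base religion:
table = load_religion_map("catholic,catholic\ncathar,catholic\n")
print(table["cathar"])  # catholic
```

With Sons of Abraham you'd load the one-to-one table instead, so cathar would stay a distinct religion.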

One very notable problem is that all Buddhists map to EU4 Theravada, completely ignoring your character's traits. Off-map Buddhists in EU4 keep the same groups as vanilla.

Another problem for CK2 Buddhists is that your vassals will rarely convert Hindu/Jain provinces, as they're in the same religious group - but in EU4 they're suddenly in a separate group, ever since patch 1.6, which to be honest still feels like a highly questionable choice.

What affects vassals

The converter doesn't generate PUs, marches, protectorates, or tributaries, but it sometimes generates vassals. There's special logic for the HRE.

For non-HRE countries, if you have Conclave, government_table.csv specifies the maximum number of vassals generated - between 0 and 2, depending on your laws.

Without Conclave, the rules are specified by defines.lua and the maximum number of vassals is 4 - but that only happens at zero crown authority.

Generated vassals will often start with very high liberty desire.

For some reason, vassals won't have heirs generated.

How the Holy Roman Empire works

One title, chosen by defines.lua (by default e_hre, but you can change it), uses special Holy Roman Empire converter mechanics.

It will turn all your CK2 vassals into independent countries unless you have absolute crown authority (non-Conclave) or its Conclave equivalent.

This also includes your de jure vassals - so even if you're king of Lombardy and Holy Roman Emperor, you'll be left with just your demesne and duke-tier titles.

Meanwhile even viceroys keep all their vassals' lands, de jure or not, which feels rather silly. I'd recommend destroying all king-level viceroyalties by console if you want to convert the HRE.

All lands in your empire, regardless of de jure status, become part of the EU4 empire, even if they're in Africa or Asia.

You can be the HRE as any religion, like Zoroastrian or Jewish. It will generally result in the HRE starting at religious peace.

The Emperor will have a permanent claim on any province contested between HRE members due to the map conversion rules, which I find rather silly - I'd recommend removing them.

The EU4 interface has a very small area for HRE princes, so a converted game will very likely just overflow it. Then again, it's not uncommon for the interface to overflow during an EU4 campaign if you create a bunch of new princes. It's another example of Paradox games seriously needing bigger interface mods for people with first world monitors.

What affects claims

You'll get permanent claims on any de jure land of yours that you don't own - including land given to your generated vassals (but not to HRE members).

You'll also get a permanent claim on any land that should be partly yours by map conversion, but which another country got.

There are no regular claims.

How are tags converted

Every CK2 country - independent, a generated vassal, or a generated HRE member - gets an EU4 tag.

It's based on its primary title and nation_table.csv. So for example the kingdom of Bulgaria (k_bulgaria) gets the tag BUL (Bulgaria).

The converter also has a small number of unique tags, like ISR Israel, JOM Jomsvikings etc.

If a tag is not listed, a new dynamic tag like X01, X02 etc. will be generated, with an appropriate name.
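The tag assignment could be sketched like this (a hypothetical model of the lookup-then-generate behavior; the dynamic-tag format matches what shows up in converted saves):

```python
import itertools

def make_tag_assigner(nation_table):
    """Assign EU4 tags: titles listed in nation_table.csv get their
    fixed tag; everything else gets a fresh dynamic tag X01, X02, ..."""
    counter = itertools.count(1)

    def assign(ck2_title):
        if ck2_title in nation_table:
            return nation_table[ck2_title]
        return "X%02d" % next(counter)

    return assign

assign = make_tag_assigner({"k_bulgaria": "BUL"})
print(assign("k_bulgaria"))   # BUL
print(assign("k_pomerania"))  # X01
```

This also explains why the tag order can flip between exports: whichever unlisted country the converter processes first gets X01.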

Countries with existing tags get the same national ideas they'd have in vanilla EU4. For dynamically generated countries, ideas are based on their cultures etc., just as for existing EU4 minors.

However, if you use cultural names (like Norge, Danmark etc.), these countries will always convert to dynamic tags. This doesn't affect their idea groups.

What happens if there are duplicate tags

Sometimes two countries map to the same tag, like the duchy of Perm and the kingdom of Perm. One of them gets the proper tag PRM; the other gets a dynamic tag with the same name. In my tests the duchy got the real tag, and it's not clear why.

It's probably a good idea to rename them manually in that case.

If you rename a country in CK2, it keeps that new name on conversion.

What affects government rank

It's directly mapped from CK2.

There are rules within EU4 that can change your rank - for example if you're a vassal or a non-elector HRE member, you'll get reduced to duchy rank on the next monthly tick.

What affects technology groups

Technology groups are based on the location of your capital. You'll get the Western, Eastern, Muslim, Indian, West African, or East African tech group, based on which group your capital's province gets in vanilla EU4.

One exception is that nomads always get the nomad technology group.

I haven't seen any nations in the Anatolian (Ottoman) technology group.

Things like your religion, your technology, your primary tag etc. don't seem to have any impact on the tech group you get - if you're a Sunni France with maxed-out tech but your capital is in Constantinople, you'll get the Eastern group.

The rest of the world gets its predefined groups. If you play with Sunset Invasion, the New World will be mostly in the High American technology group, the strongest group in the game - like Western, except with better units.

None of this matters much since the institution system was added.

How national ideas work in EU4

In a normal EU4 game each set of national ideas has an associated trigger - to get English ideas you need to start as ENG England or GBR Great Britain; to get Rajput ideas you need to be of Rajput or Malvi culture and non-Muslim; and so on.

If no set matches, you get the generic National Ideas, which tend to be awful. By now very few countries in EU4 are stuck with generic National Ideas.

What affects national ideas in converted games

The converter adds some new national ideas with new triggers - for example Karlings, Israel, and the Jomsvikings get ideas of their own.

It also disables a lot of national idea sets from EU4, and changes the trigger rules for others.

For example, to get English ideas you no longer need a specific tag - you need to be an English or Anglo-Saxon culture kingdom or empire with at least 10 provinces (after conversion) and at least 5 coastal provinces. Simply being the kingdom of England or having the ENG tag does not suffice.

You can check these rules in common/ideas/*.txt in generated mod, which comes from copy/common/ideas/*.txt and sunset_invasion/common/ideas/*.txt.

What about nations matching no rules

Because these rules tend to be very restrictive, most countries match no ideas - they'll show as having generic "National Ideas" when you start a new EU4 game.

That's not what they actually get, however - each receives a random set of custom ideas, similar to what you could build with the EU4 nation designer. The converter changes the weights for these custom ideas to work better with generated countries.

Whenever you start a new campaign, EU4 will randomly select a new set of ideas, so there's no way to predict them.

You can check their weights in common/custom_ideas/*.txt in generated mod, which comes from copy/common/custom_ideas/*.txt.

Are any buildings generated

The only buildings generated are basic forts, and I can't see any pattern to them.

Known Bugs

As of CK2 2.8 Jade Dragon, the converter got updated to support EU4 1.23 Cradle of Civilization. There are still some unsolved problems.

With Sunset Invasion, some New World provinces have 0 base production. These are:
  • 367 The Azores - uncolonized
  • 368 Madeira - uncolonized
  • 852 Mexico - Aztec gold
  • 2626 Tullucan - Aztec gold
  • 2628 Tepeacac - Aztec grain
As two of the five bugged provinces are Aztec gold mines, this effectively makes the Aztecs far weaker than they were supposed to be.

Occasionally a province's controller won't have a core on it, for no clear reason.

There are a bunch of province visibility glitches depending on tech group, like seeing Iceland but not the waters around it. These generally don't cause any serious gameplay problems.

Playing with any combination of versions other than the supported ones (and their hotfixes) will invariably cause additional issues. Usually these are minor, like newly added provinces having the wrong setup - either uncolonized, or controlled by whoever controls them in the vanilla EU4 1444 start.

Using converter with map mods

If you're willing to mod province_table.csv, you can have map mods on both ends, but that's a significant project, especially without proper tools.

Then again, the kind of people who play megacampaigns are probably exactly the kind of people who would put in that effort.

If the mod you're using changes the map only somewhat, you can always try converting and then cleaning up a bit manually.

Advanced conversion

You can get a lot from the converter:
  • during the CK2 campaign, play with the converter in mind - if you want certain cultures, religions, the HRE etc., just set things up accordingly
  • when the campaign ends, grab the console and do any cleanup you want - removing bordergore, setting laws appropriately (to spawn or not spawn vassals) etc.
  • run the converter
  • clean up the converted files - removing silly claims, moving CoTs to more sensible places, giving everybody the traits you feel they deserve, fixing Buddhists to the type you want etc.
  • start a new game
  • possibly clean up the map some more before starting to play
It's up to you how much of this you really want to do.

Writing your own converter

The converter doesn't do anything magical - the most complex part is province_table.csv. Based on just that file and a save game, it wouldn't be that hard to create your own converter, customized whichever way you need.
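As a starting point, loading province_table.csv into a usable mapping might look like this (I'm assuming a two-column layout of title and province id - check the real file before relying on it):

```python
import csv, io
from collections import defaultdict

def load_province_table(csv_text):
    """Build a CK2 title -> [EU4 province ids] mapping in the style of
    province_table.csv. A title appearing in several rows maps to
    several provinces, giving the many-to-many relationship."""
    mapping = defaultdict(list)
    for ck2_title, eu4_id in csv.reader(io.StringIO(csv_text)):
        mapping[ck2_title].append(int(eu4_id))
    return dict(mapping)

# County of Holstein mapping to two EU4 provinces:
table = load_province_table("c_holstein,1775\nc_holstein,4141\n")
print(table)  # {'c_holstein': [1775, 4141]}
```

From there, the rest of a minimal converter is walking the save's title hierarchy and applying this mapping plus whatever rules you want.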

Are there any third party converters?

The only one I'm aware of doesn't work with the current version of the game, so effectively no.

It's not terribly hard to write one (especially if you reuse the province table), but then the real work of keeping it updated begins.

Summary

And that's all I discovered about the converter. If you have any corrections, questions, or feedback, please comment and I'll update this post.