We were grabbing a bite of lunch at a small cafe, in a mall, right across from a booth that sold jewelry and where ears could be pierced for a fee. A mother approaches with a little girl of six or seven years old. The little girl is clearly stating that she doesn’t want her ears pierced, that she’s afraid of how much it will hurt, that she doesn’t like earrings much in the first place. Her protests, her clear ‘no’ is simply not heard. The mother and two other women, who work the booth, begin chatting and trying to engage the little girl in picking out a pair of earrings. She has to wear a particular kind when the piercing is first done but she could pick out a fun pair for later.

"I don’t want my ears pierced."

"I don’t want any earrings."

The three adults glance at each other conspiratorially and now the pressure really begins. She will look so nice, all the other girls she knows wear earrings, the pain isn’t bad.

She, the child, sees what’s coming and starts crying. As the adults up the volume so does she, she’s crying and emitting a low wail at the same time. “I DON’T WANT MY EARS PIERCED.”

Her mother leans down and speaks to her, quietly but strongly, the only words we could hear were ‘… embarrassing me.’

We heard, then, two small screams, when the ears were pierced.

Little children learn early and often that ‘no doesn’t mean no.’

Little children learn early that no one will stand with them, even the two old men looking horrified at the events from the cafeteria.

Little girls learn early and often that their will is not their own.

No means no, yeah, right.

Most often, for kids and others without power, “no means force.”

from "No Means Force" at Dave Hingsburger’s blog.

This is important. It doesn’t just apply to little girls and other children, though it often begins there.

For the marginalized, our “no’s” are discounted as frivolous protests, rebelliousness, or anger issues, or we don’t know what we’re talking about, or we don’t understand what’s happening.

When “no means force” we become afraid to say no.

(via k-pagination)

(via villiljos)

Oops. The internet ran out of routes

Yesterday, internet service providers around the world and the services that used their networks started going offline and experiencing abnormal levels of packet loss and latency. The issue was widespread and affected a great many services. What happened?

The internet ran out of routes.

Update: 

Here is a more thorough explanation of the issues from a couple of days ago: http://www.bgpmon.net/what-caused-todays-internet-hiccup/

The IPv4 public address space (the pool of publicly-routable IP addresses) has become increasingly fragmented as companies have broken down the address blocks they own and sold off small chunks of their IP space. Each distinct IP block connected to the internet needs to be routed to and from by routers all over the world, and so requires an entry in the global BGP routing table – data stored by the routers that ISPs operate.

Most routers, especially older pieces of hardware, have a limit of 512,000-odd routes that they can hold in their global routing table. This limit was mostly defined by the memory available to the router, and when those defaults were chosen, 512,000 routes seemed like a ludicrously high number.

As a new address block is carved off in the IPv4 address space, another route is advertised for that netblock. Yesterday, we exceeded the hard limit of 512k routes that most routers could hold.

Date       Prefixes   CIDR Aggregated
06-08-14   511103     280424
07-08-14   511297     280432
08-08-14   511736     280442
09-08-14   511719     280722
10-08-14   511762     280563
11-08-14   511719     280860
12-08-14   511648     280869
13-08-14   512521     280918
Not all modern routers have this limitation – in fact Cisco and other vendors posted advisories about the impending problems that the growing number of IPv4 routes was going to cause – but routers have typically been deploy-and-forget devices that are set up and then run with minimal interaction, and older devices were mostly left at a default of 512k. When the number of advertised routes exceeded 512k, in the words of redditor DiscoDave86,

“Upon further investigation it appears that the IPV4 public address space exceeded 512k routes, and some older routers have that as their limit due to memory constraints, consequently a whole load of routers became expensive doorstops”

The fix is simple enough – reconfigure the routing-table limit on your router – but it requires a reboot of the device, and rebooting a core router is not a task undertaken lightly.
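As a concrete illustration (not from the original post), here is roughly what checking and raising the limit looked like on one commonly-affected platform, the Cisco Catalyst 6500/7600 with a Sup720. The exact commands, units and defaults vary by vendor and platform, so treat this as a sketch rather than a recipe:

    ! Check how many routes the box is currently holding
    show ip route summary
    show ip bgp summary

    ! Re-partition the TCAM to make room for up to ~1M IPv4 routes
    ! (the value is in units of 1,000 routes); only takes effect after a reload
    configure terminal
     mls cef maximum-routes ip 1000
    end
    write memory
    reload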

Many ISPs have been scrambling to reconfigure their hardware to make sure they aren’t stung by this again, but the effects have been far-reaching as a result.

Impact

Update: I’ve drawn some quick & dirty nodegraphs to illustrate what happens when routers reboot.

In this (very simplistic) illustration of the Internet, Node 1 is trying to connect to Node 7. The bold path is the path its network traffic takes across the ‘net.

Everything is normal – traffic is routed according to the hop distance (fewest nodes to target). This isn’t always how it works in reality, but for the purposes of this example, it’ll do.

Node 4’s administrator notices the problem, applies the fix and reboots the router, causing all routes that use Node 4 to fail and be recalculated.

While Node 4 is rebooting, Node 8, which is operated by someone else, also starts to reboot to apply the change to the maximum size of the routing table. N1 > N8 > N7 is no longer valid, so the route is recalculated to N1 > N2 > N6 > N7.

Nodes 4 and 8 are now offline pending a reboot, so the path from N1 to N7 is routed through N2 and N6. Any addresses behind N4 and N8 become un-routable. It’s as though they no longer exist.

N4 is back online but now has to re-create its routing table. It only adds N1 and N7, so it can no longer route to N3 and N5.

N8 is back online and starts to recreate its routing table, adding N1 and N4 as its available nodes.

After the nodes reboot, this is the final state of the network. As you can see, N4 and N8 have not necessarily regained their original routes.

This is a very simplistic representation of what happens when you reboot a core router attached to the Internet, such as those operated by the likes of L3, AboveNet, TiNET and NTT. I haven’t included link costs in this diagram, either.

Last night, when many ISPs were doing this, entire blocks of addresses simply became un-routable. You didn’t get timeouts or dropped packets or lag – they just didn’t exist, as far as the Internet was concerned.

Oops. The internet ran out of routes was originally published on antispin

Import lumberjack events manually with stdin

A typical install of logstash-forwarder (lumberjack) is configured to watch a set of files in specific locations, and playing with that configuration is often impossible. However, you might need to load a file into it that it doesn’t typically monitor.

In another situation, you may need to load historic logfiles into LSF. This can be problematic, as LSF keeps track of its position in a given file, will often recognise the file as one it has already processed, and won’t reimport events it considers “old”.

So here is a quick way of getting events in without interrupting your log shipping.

  1. Create a new config file somewhere the user you run LSF as can read it, e.g. /etc/logstash-forwarder/temp.conf
  2. Add a bare-bones config with your remote server and a single stdin input:
    {
      "network": {
        "servers": [ "10.0.0.10:5043" ],
        "ssl certificate": "/etc/logstashforwarder/ssl/logstashforwarder.crt",
        "ssl ca": "/etc/logstashforwarder/ssl/ca.crt",
        "ssl key": "/etc/logstashforwarder/ssl/logstashforwarder.key",
        "timeout": 15
      },
      "files": [
        {
          "paths": [ "-" ],
          "fields": { "type": "nginx" }
        }
      ]
    }
  3. cat your logfile into a new instance of LSF with the config above, like so:
    cat /var/log/nginx/temp/server.access | /opt/logstash-forwarder/bin/logstash-forwarder -config /etc/logstash-forwarder/temp.conf -spool-size 100 -log-to-syslog
  4. Watch syslog to confirm that your events are being shipped (see the example after this list).
  5. ???
  6. Profit.
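For step 4, something along these lines works on a typical Debian-style box; the exact syslog tag depends on how the LSF binary identifies itself, so adjust the grep to taste:

    # Watch LSF's own log output for connection and shipping messages
    tail -f /var/log/syslog | grep -i logstash-forwarder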

You can shut down the temp instance once the flood of events dies down.

Cheers!

Import lumberjack events manually with stdin was originally published on antispin

Casual Love

carsieblanton:

Friends, put on your flak jackets. It’s time to drop some honesty on yet another uncomfortable topic: love.
We use the word “love” to mean a lot of things. Throughout this post I’ll be referring to the romantic kind of love, the kind that usually involves sexual attraction, AKA “falling in…

Tips for setting up your home NAS

After a conversation with a friend about the best way to go about setting up a NAS for your home, I thought I’d relate some of the things I’ve learned over the few years that I’ve been working with ZFS at home and at work.

Specs

If you’re going to be building a homebrew NAS box to use as your media/file storage, streaming server and porn dump, you’ll need to think about what sort of kit you need. I’m going to be assuming an OpenIndiana-based setup similar to what I currently run.

Easymode

Buy an HP ProLiant MicroServer Gen8. The Gen7 units are also decent bits of kit, but the Gen8 supports more memory, has a faster CPU, is better built and is generally More Better.

DIY

You need a couple of things:

  • A decent CPU. It doesn’t need to be blazing-fast since it’ll be doing mostly a lot of nothing 95% of the time. A moderately cheap i3 or i5 will probably do the trick.
  • RAM. Lots of RAM. As much as you can afford/the motherboard will support/you think is necessary. 8GB minimum. The reason for this is that ZFS loves RAM. The more memory it has, the more data it can load into RAM, meaning faster IO for you.
  • Disks. You want disks with as good IO performance as you can get, since disks with shitty IOPS will make your NAS crawl.
  • SATA/SAS controller. Don’t buy a RAID card. You won’t need it. ZFS paired with a hardware RAID card can result in epic data corruption as the RAID card actually interferes with ZFS’ own data integrity magic and can lead to hilariously broken arrays. If you have a RAID card, set it to JBOD mode or the equivalent. ZFS needs direct access to the disks without a RAID controller being in the way.

Software

I personally run OpenIndiana on my MicroServer and it’s been rock-solid for years now. I’d recommend it. You can get it here. Don’t waste a whole disk on installing it, however, especially if you have a limited number of disks you can use.

The beauty of ZFS-based systems is that the OS is actually trivial; what you care about, primarily, is your data. If the OS dies, you can reinstall, import your pool and you’re back up. With that in mind, I would install the OS on a USB stick or SD card on your motherboard, if it supports it. This way you have the maximum number of disks available for storage.
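As a quick sketch of that recovery path (pool name hypothetical), getting your data back after a reinstall is usually just:

    # List any pools ZFS can see on the attached disks
    zpool import

    # Import the pool by name; -f may be needed if it wasn't exported
    # cleanly before the old OS died
    zpool import -f tank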

Disk Config

Here, you have two options:

Use a set of mirrors. For example, here is zpool iostat -v output from a pool made up of striped mirrors:

                              capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
virt                       1.89T  3.54T      0  1.60K  15.3K  51.3M
  mirror                    324G   604G      0    267      0  8.27M
    c4t5000C50041FDDFEBd0      -      -      0    147      0  8.27M
    c4t5000C50041FDE0DFd0      -      -      0    146      0  8.27M
  mirror                    323G   605G      0    168      0  7.78M
    c4t5000C50041FDEA1Fd0      -      -      0     88      0  7.78M
    c4t5000C50041FDEB2Fd0      -      -      0     87      0  7.78M
  mirror                    323G   605G      0    217      0  9.10M
    c4t5000C500426D38D3d0      -      -      0    118      0  9.10M
    c4t5000C500426D3417d0      -      -      0    118      0  9.10M
  mirror                    323G   605G      0    206  15.3K  7.60M
    c4t5000C50057F93F57d0      -      -      0     80      0  7.60M
    c4t5000C50057FA2D6Fd0      -      -      0     83  15.3K  7.60M
  mirror                    323G   605G      0    295      0  7.69M
    c4t5000C50057FA2DC3d0      -      -      0    127      0  7.69M
    c4t5000C50057FA28C3d0      -      -      0    133      0  7.69M
  mirror                    323G   605G      0    226      0  7.11M
    c4t5000C50057FA273Fd0      -      -      0    142      0  7.11M
    c4t5000C50057FA2847d0      -      -      0    143      0  7.11M

This sort of setup gives you expandability further down the line, so if you decide you only need 4TB of storage, you can create a stripe of two mirrors of 2TB disks initially. If you decide that you need to add an extra 2TB of storage, you simply add a third mirror of 2TB disks, etc, etc. This gives you great flexibility and decent read/write performance to boot.
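Here’s a hedged sketch of that growth path, with hypothetical pool and disk names (use whatever device names your OS gives you):

    # Start with a stripe of two mirrors (4 x 2TB disks, roughly 4TB usable)
    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

    # Later, grow the pool by adding a third mirror of 2TB disks
    zpool add tank mirror c1t4d0 c1t5d0

    # Check the layout and health of the pool
    zpool status tank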

Use a RAIDZn

If you aren’t worried about your IO performance being blazing fast, you can instead opt for a RAIDZ1 or RAIDZ2, which are the equivalents of RAID5 and RAID6 respectively. If you have a limited number of usable disks – for price or other reasons – this is often a good alternative. It will still provide you with some redundancy: you can lose a disk and keep going, provided you replace the failed drive sooner rather than later.

The drawback is that you can’t expand a RAIDZ pool as easily, as ZFS doesn’t support adding disks to a RAIDZ vdev. You can increase the size of the array by sequentially swapping disks out for larger ones – e.g. swapping your 2TB disks for 4TB disks – and letting the pool resilver (rebuild) onto the new drives. Once all disks are replaced, the pool will expand to the full capacity of your drives.
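A rough sketch of that swap-and-resilver dance, again with hypothetical device names; with autoexpand turned on, the extra capacity appears once the last disk in the vdev has been replaced:

    # Allow the pool to grow once every disk in the vdev is bigger
    zpool set autoexpand=on tank

    # Swap one 2TB disk for a 4TB one, and wait for the resilver to
    # finish before moving on to the next drive
    zpool replace tank c1t0d0 c1t6d0
    zpool status tank    # watch resilver progress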

Hope this helps answer some questions about how best to set up your home NAS.

Tips for setting up your home NAS was originally published on antispin

ELK Stack Retrospective

For the past six months or so, I’ve been running an ELK stack setup in our hosting infrastructure at work to monitor, among other things:

  • HTTP requests coming in
  • Nginx response times
  • System loads
  • Sendmail and Postfix activity
  • Disk IO and related metrics

To do this, I’ve had to evolve the infrastructure somewhat. Here’s a brief overview of what happened.

v1

  • Logstash was installed on a single box using its built-in Elasticsearch server to store data.
  • Each host I wanted to monitor had a local instance of Logstash installed on it and configured manually to send events to the indexer.
  • The indexer processed the events and stored them in ES.

v2

I quickly realised that this wouldn’t work in the long term as event rates increased: a single box couldn’t handle the load of both indexing and storing the logs in ES. So I:

  • Commissioned a new, dedicated ES instance to store the data and joined it to the cluster that Logstash had been running on its own.
  • Once the shards had replicated, I shut down the LS-internal Elasticsearch instance and reconfigured LS to write to the dedicated ES cluster (of one device).
  • In addition, I replaced the Logstash instances on the client nodes with a lighter-weight Lumberjack/Logstash-forwarder instance. LSF is built in Go and will run on any platform with minimal requirements, unlike Logstash proper which needs a JVM and is much more memory-intensive.

v3

The setup above was good, but it lacked two things that would be needed in a production environment:

  1. Resilience/availability
  2. Scalability

The main problem was that while the lumberjack instances would queue up events to a certain point, restarting the main Logstash indexer process would cause lost events and load would jump through the roof as soon as the LS process restarted.

Logstash is capable of using Redis or another message queue to handle variable event rates, so I set up a “logstash relay” box running a simple, lightweight instance of LS with minimal configuration (no event processing, just forwarding) that dumped everything into Redis. LS1 (the indexer) would then connect to the Redis instance and grab events off the queue to process them.
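In Logstash 1.x config terms, the two ends of the relay looked roughly like this. The host, port, key and certificate paths are placeholders, and all filtering is omitted:

    # On the relay: accept lumberjack events and push them straight into Redis
    input {
      lumberjack {
        port => 5043
        ssl_certificate => "/etc/logstash/ssl/logstash.crt"
        ssl_key => "/etc/logstash/ssl/logstash.key"
      }
    }
    output {
      redis { host => "10.0.0.20" data_type => "list" key => "logstash" }
    }

    # On the indexer: pull events off the Redis list, filter them, write to ES
    input {
      redis { host => "10.0.0.20" data_type => "list" key => "logstash" }
    }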

This system worked pretty well for a time, until several lumberjack/LSF agents went crazy and started dumping millions of events into the queue from old log files they had decided to parse. This was a Bad Thing because Redis writes its queue to disk by default and, when it runs out of disk space, there is no easy way (that I knew of at the time) to throttle incoming events. This ended up crashing the Redis instance and stopping the relay dead.

So, some further re-architecture was done.

v4

The current incarnation of the ELK stack looks like this:

The main change was that I replaced Redis with RabbitMQ. RMQ is much “heavier” than Redis, but it is also more manageable and configurable and, crucially, it will look after itself: if it runs out of memory and/or disk space, upstream clients will be blocked or throttled until free space climbs back above the warning threshold and events can be written to the queue once more.

This means that the LS relay instance can be made to stop accepting events from client nodes, because it is aware that it is being throttled by RMQ. This wasn’t the case with Redis, where LS was blindly attempting to dump events into Redis and failing.
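For reference, the relevant RabbitMQ knobs live in rabbitmq.config; the values below are purely illustrative (the defaults may well be fine for you):

    %% /etc/rabbitmq/rabbitmq.config
    %% Block publishers when free disk drops below ~5GB, or when RabbitMQ
    %% is using more than 40% of system RAM
    [
      {rabbit, [
        {disk_free_limit, 5000000000},
        {vm_memory_high_watermark, 0.4}
      ]}
    ].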

Performance Notes

Here are some recommendations for building out a resilient ELK stack:

Elasticsearch
  • Memory. ES loves memory. The more RAM you give your ES instances, the better they’ll perform.
  • Set your Java heap size to 50% of the available RAM for ES – so if you have 24GB RAM, set the heap size to 12GB (see the sketch below).
  • Storage: the faster your disks, the better ES does, especially as event rates climb and it’s having to index several thousand records a second.
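  • As a quick sketch of the heap-size point above: on a Debian-style Elasticsearch 1.x package, the heap is usually set in the defaults file and picked up on restart (the file location is an assumption about your packaging):
    # /etc/default/elasticsearch – give ES half of a 24GB box
    ES_HEAP_SIZE=12g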
Logstash
  • CPU. You need lots of CPU and as little contention as possible. Depending on just how much processing you’re doing with Grok and other filters, the LS workers can become saturated, and if your CPU is already stressed, LS is likely to lock up and fail very un-gracefully.
  • You will need to balance fine-grained analysis of your events with CPU use.
  • Playing with the number of workers that LS starts can give you a boost in event rate at the cost of using more CPU. To set this, edit /etc/default/logstash (in Debian) and set the following:
    LS_OPTS="-w 6"

    Change 6 to whatever number of workers you want. You can experiment with this value to find the sweet spot between event rate and CPU use.

  • You can also use this command to get a good overview of what LS is up to.
    alias lstop='top -Hp `cat /var/run/logstash.pid`'

    Once you run that, you’ll be able to issue ‘lstop’ from your command line and get a quick view of all the LS processes. If you begin to see many |worker processes constantly in bold and with high CPU use, the CPU is saturated and you may need to back off the filtering, reduce the number of workers, or both.

Hopefully this helps those of you who are playing with LS and are looking to improve your infrastructure. Let me know in the comments if you have any ideas of your own.

ELK Stack Retrospective was originally published on antispin

Multiple Steam Library Folders

Steam – Valve’s content delivery network, games marketplace and cloud savefile storage system, among other things – supports having multiple install locations for your games library.

As SSDs become more and more prevalent and cheaper, many people are looking to install their stuff on the much faster SSD.

To create a new installation location for future installs – this article doesn’t cover moving your existing Steam library – simply open your Steam settings, hit the “Downloads” tab and click “Steam Library Folders”.

From here, you can click “Add Library Folder” and specify the location of your new library.

However – and this took me about half an hour of searching to figure out – you can’t add a new location while a game is updating or a download is taking place. To get around this, just pause all your downloads!

Multiple Steam Library Folders was originally published on antispin

Wordpress woes

It would seem that my WordPress instance over at antisp.in is flipping the fuck out and isn’t posting things and images and stuff as intended.

When’s Ghost 0.5+ coming out? I want to get off the horrible lumbering behemoth that WP has become. 

Also, it would seem that not using Varnish is actually beneficial to my infrastructure…