NaibofTabr

NaibofTabr@infosec.pub · 3 months

Login with ID form #10-T

Login with PEBKAC token

NaibofTabr@infosec.pub · 3 months

On second thought, let’s not go outside. 'Tis a silly place.

NaibofTabr@infosec.pub · 4 months

First and most important:

In the context of long-term data storage
ALL DRIVES ARE CONSUMABLES

I can’t emphasize this enough. If you only skim the rest of my post, re-read the above line and accept it as fundamental truth. “Long-term” means 1+ years, by the way.

It does not matter what type of drive you buy, how much you spend on it, who manufactured it, etc. The drive will fail at some point, probably when you’re least prepared for it. You need to plan around that. You need to plan for the drive being completely useless and the data on it unrecoverable post-failure. Wasting time and money to acquire the fanciest most bulletproof drives on the market is a pointless resource pit, and has more to do with dick-measuring contests between data-hoarders.

Knife geeks buy $500+ patterned steel chef’s knives with ebony handles and finely ground edges and bla bla bla. Professional kitchens buy the basic Victorinox with the plastic handle. Why? Because they actually use it, not mount it on a wall to look pretty.

The knife is a consumable, not an heirloom. So are your storage drives. We call them “spinning rust” for a reason.

The solution to drive failure is redundancy. Period.

Unfortunately, this reality runs counter to the desire to maximize available storage. Do not follow the path of desire, that way lies data loss and outer darkness. Fault-tolerant is your watchword. Component failure is unpredictable, no matter how much money you spend. A random manufacturing defect will ruin your day when you least expect it.

A minimum safe layout is to have 2 live copies of data (one active, one mirror), hot standby for 1 copy (immediate swap-in when the active or mirror fails), and cold standby on the shelf to replace the hot standby when it enters service.

Note that this does not describe a specific number of disks, but copies of data. The minimum to implement this is 4 disks of identical storage capacity (2 live, 1 hot standby, 1 on the shelf) and a server with slots for 3 disks. If your storage needs expand beyond the capacity of 1 disk, then you need to scale up by the same ratio. A disk is indivisible - having two copies of the same data on a disk does not give you any redundancy value. (I won’t get into striping and mucking about with weird RAID choices in this post because it’s too long already, but basically it’s not worth it - the KISS principle applies, especially in small configurations)

This means you only get to use 25% of the storage capacity that you buy. Them’s the breaks. Anything less and you’re not taking your data longevity seriously, you might as well just get a consumer-grade external drive and call it a day.
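To put numbers on that ratio, here is a rough sketch of the arithmetic in Python. The function name and the returned fields are made up for illustration; the only assumption carried over from the post is the layout itself (active, mirror, hot standby, cold spare, all disks the same size).

```python
import math

def disks_to_buy(data_tb: float, disk_tb: float) -> dict:
    """Count disks for the layout above: active + mirror + hot standby + cold spare.

    Hypothetical helper for illustration; assumes every disk has the same capacity.
    """
    # Disks needed to hold ONE full copy of the data (a disk is indivisible).
    per_copy = math.ceil(data_tb / disk_tb)
    total = per_copy * 4  # four copies' worth of disks
    return {
        "per_copy": per_copy,
        "online": per_copy * 3,   # active + mirror + hot standby need server slots
        "shelf": per_copy,        # cold standby
        "total": total,
        "usable_fraction": (per_copy * disk_tb) / (total * disk_tb),
    }
```

For example, `disks_to_buy(20, 8)` asks for 12 disks and a server with 9 slots, and the usable fraction stays at 0.25 no matter how far you scale.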

Buy 4 disks, it doesn’t matter what they are or how much they cost (though if you’re buying used make sure you get a SMART report from the seller and you understand what it means) but keep in mind that your storage capacity is just 1 of the disks. And buy a server that can keep 3 of them online and automatically swap in the standby when one of the disks fails. Spend more money on the server than the disks, it will last longer.

Remember, long-term is a question of when, not if.

NaibofTabr@infosec.pub · 4 months

Cursor is pouring gasoline on the fire.

NaibofTabr@infosec.pub · 4 months

I think I saw a 2…

NaibofTabr@infosec.pub · 5 months

Really nice overview

NaibofTabr@infosec.pub · 5 months

AI coding tools can do common, simple functions reasonably well, because there are lots of examples of those to steal from real programmers on the Internet. There is a large corpus of data to train with.

AI coding tools can’t do sophisticated, specific-case solutions very well, because there aren’t many examples of those for any given use case to steal from real programmers on the Internet. There is a small corpus of data to train with.

AI coding tools can’t solve new problems at all, because there are no examples of those to steal from real programmers on the Internet. There is no corpus of data to train with.

AI coding tools have already ingested all of the code available on the Internet to train with. There is no more new data to feed in. AI coding tools will not get substantially better than they are now. All of the theft that could be committed has been committed, which is why the AI development companies are attempting to feed generated training material into their models. Every review of this shows that it makes the output from generative models worse rather than better.

Programming is not about writing code. That is what a manager thinks.
Programming is about solving problems. Generative AI doesn’t think, so it cannot solve problems. All it can do is regurgitate material that it has previously ingested which is hopefully close-ish to the problem you’re trying to solve at the moment - material which was written by a real thinking human that solved that problem (or a similar one) at some point in the past.

If you patronize a generative AI system like Claude Code, you are paying into, participating in, and complicit in, the largest example of labor theft in history.

NaibofTabr@infosec.pub · 5 months

I mean… do you know what community you’re in right now?

NaibofTabr@infosec.pub · 5 months

No no, the imperative “get six” overrides the previous “buy a gallon of milk” if the “they have eggs” condition is met.

“get six” implies x = 6, not x = x + 6; that would be “get six more”.

The real problem is that “buy” was only specified in the first case. Because the conditional was met, he should get six gallons of milk but not buy them.
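Spelled out as a sketch in Python (the function and field names are invented for the joke, nothing more):

```python
def run_errand(they_have_eggs: bool) -> dict:
    """Literal reading of: 'buy a gallon of milk; if they have eggs, get six'."""
    order = {"item": "milk (gallons)", "quantity": 1, "action": "buy"}
    if they_have_eggs:
        order["quantity"] = 6    # assignment, not order["quantity"] += 6
        order["action"] = "get"  # "buy" was never repeated in this branch
    return order
```

`run_errand(True)` comes back with six gallons and the action "get", which is the shoplifting outcome described above.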

NaibofTabr@infosec.pub · 5 months

Remember, RAID (or RAID-adjacent) is not a backup.

This. So much this. OP please listen to and understand this.

Even with full mirroring in RAID 1, it’s not a backup. Using the second drive as an independent backup would be so much better than RAID.

NaibofTabr@infosec.pub · 5 months

You SHOULD NOT do software RAID with hard drives in separate external USB enclosures.

There will be absolutely no practical benefit to this setup, and it will just create risk of transcription errors between the mirrored drives due to any kind of problems with the USB connections, plus traffic overhead as the drives constantly update their mirroring. You will kill your USB controller, and/or the IO boards in the enclosures. It will be needlessly slow and not very fault-tolerant.

If this hardware setup is really your best option, what you should do is use 1 of the drives as the active primary for the server, and push backups to the other drive (with a properly configured backup application, not RAID mirroring). That way each drive is fully independent from the other, and the backup drive is not dependent on anything else. This will give you the best possible redundancy with this hardware.
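To show what "independent" means here, below is a bare-bones sketch in Python of a one-way push with a checksum comparison. It is not a substitute for a real backup application (no versioning, no scheduling, no deletion handling), and the function names and paths are placeholders. Note also that reading the copy back immediately may be served from the OS cache, so the check is weaker than a real verify pass.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def push_backup(primary: Path, backup: Path) -> list:
    """Copy every file under `primary` to `backup`; return relative paths that failed the check."""
    failed = []
    for src in sorted(primary.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(primary)
        dst = backup / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # one-way copy; the backup drive never writes back
        if sha256_of(src) != sha256_of(dst):
            failed.append(str(rel))
    return failed
```

The point of the design is that the primary drive never depends on the backup drive being present, and a flaky USB link shows up as a failed copy you can retry rather than as silent divergence between mirrors.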

NaibofTabr@infosec.pub · 6 months

A modern OS running with low RAM (e.g. an RPi with 2 GB) is going to fill the RAM pretty quickly just in normal operation, so a larger swap space will allow it to run more efficiently as it regularly moves things in and out of swap. You still want some headroom for storing the live RAM image for hibernation, which with a small amount of RAM is likely to be near 100% of the RAM. Therefore, running with 3x RAM for swap space is recommended.

it only needs to be at least the size of RAM

Yes, technically it only needs to be the size of the RAM, but no matter how much RAM you have, some of the swap space will be in use at any given time for normal swapping during system operation. If you only have exactly as much swap space as RAM, there won’t be enough available swap space to store the entire live RAM for hibernation.

The size of the swap file and the size of the live RAM image at any point is unpredictable, therefore 1.5x RAM is the lowest recommended value that is probably safe for hibernation, assuming the swap file is not being used heavily enough to be 50% of the RAM. If you can’t provide at least that much disk space for swap, you should disable hibernation.
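The rule of thumb above boils down to "full RAM image plus whatever swap is already in use". A quick sketch of that arithmetic in Python, where the function name and the default fraction are my own choices for illustration:

```python
def min_swap_for_hibernation(ram_gib: float, expected_swap_use_fraction: float = 0.5) -> float:
    """Swap size (GiB) = room for a full RAM image + swap already in use.

    `expected_swap_use_fraction` is how much swap you expect to be occupied
    during normal operation, expressed as a fraction of RAM. It is a guess
    you have to make for your own workload.
    """
    if ram_gib <= 0 or expected_swap_use_fraction < 0:
        raise ValueError("RAM must be positive and the fraction non-negative")
    return ram_gib * (1.0 + expected_swap_use_fraction)
```

With the default of 0.5 you get the 1.5x floor, and a small machine that swaps heavily (fraction 2.0) lands on the 3x figure, e.g. `min_swap_for_hibernation(2, 2.0)` gives 6.0 GiB.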

NaibofTabr@infosec.pub · 6 months

This is the best simple guideline: https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/10/html/managing_storage_devices/getting-started-with-swap#recommended-system-swap-space

Basically, if you want your system to be able to hibernate then you need enough swap space to sustain both the active swap file and a full image of the live system RAM (hibernate = suspend-to-disk, and uses the swap space). The swap file could be as large as the RAM, so a safe value is 2x the RAM. If you don’t want to dedicate that much disk space to swap, the safe option is to disable hibernation, but note that suspend-to-disk is safer for system recovery in the event of a power failure.

If you’ve ever had a Linux system go into hibernation and fail to wake, lack of swap space was probably the reason.

In Red Hat’s chart where they recommend 1.5x RAM for 8-64 GiB, basically you’re hoping that your system is never completely using all of the RAM. If you do cap out the RAM such that the swap file plus the in-use memory is greater than 1.5x RAM, and the system goes into hibernate, it will not recover because there isn’t enough free swap space to store the in-use memory. You have to make a judgment call when you set up your system about how you’re going to use it - whether you expect to be using 100% of the RAM at any point, whether you’ll remember to close some running applications to free up memory every time you leave the system idle long enough to go into hibernate, whether other users will be using the system (if they’re logged in then they are partially using the RAM and the swap), etc.
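If you want to check this on a running system before letting it hibernate, something like the following Python sketch works from the contents of `/proc/meminfo`. It follows the simplified model in this comment (in-use memory must fit in free swap); the function names are mine, and `MemTotal - MemAvailable` is only an approximation of what would actually need to be written out.

```python
def parse_meminfo(text: str) -> dict:
    """Turn /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values

def hibernate_fits(meminfo: dict) -> bool:
    """True if the memory currently in use would fit in the swap that is still free."""
    in_use_kb = meminfo["MemTotal"] - meminfo["MemAvailable"]
    return meminfo["SwapFree"] >= in_use_kb
```

On a Linux box you would call it as `hibernate_fits(parse_meminfo(open("/proc/meminfo").read()))`. A `False` result means it's time to close some applications or accept the risk.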

Deciding how much swap space you need is a risk management decision based on your tolerance for data loss, application stability, and whether or not you need hibernation.

NaibofTabr@infosec.pub · 6 months

Whatever you do, and whoever you end up working with, document document document. Take notes.

And I mean on paper, in a notebook, something that can’t crash or get accidentally deleted and doesn’t require electricity to operate.

You’re doing this for yourself, not for a boss, which means you can take the time to keep track of the details. This will be especially important for ongoing maintenance.

Write down a list of things you imagine having on your network, then classify them as essential vs. desired (needs and wants), then prioritize them.

As you buy hardware, write down the name, model and serial number and the price (so that you can list it on your renter’s/homeowner’s insurance). As you set up the devices, also add the MAC and assigned IP address(es) to each device description, and also list the specific services that are running on that device. If you buy something new that comes with a support contract, write down the information for that.

Draw a network diagram (it doesn’t have to be complicated or super professional, but visualizing the layout and connections between things is very helpful)

When you set up a service, write down what it’s for and what clients will have access to it. Write down the reference(s) you used. And then write down the login details. I don’t care what advice you’ve heard about writing down passwords, just do it in the notebook so that you can get back into the services you’ve set up. Six months from now when you need to log in to that background service to update the software you will have forgotten the password. If a person you don’t trust has physical access to your home network notebook, you have a much more serious problem than worrying about your router password.

NaibofTabr@infosec.pub · 6 months

Because they want step-by-step guidance and support, and design help, and long-term support, not just a few questions answered.

This is a job. The kind of work that IT consultants get paid for. A fair rate would be US$100/hr, minimum, for an independent contractor.

NaibofTabr@infosec.pub · 6 months

You can just use openssl to generate x509 certificates locally. If you only need to do this for a few local connections, the simplest thing to do is create them manually and then manually place them in the certificate stores for the services that need them. You might get warnings about self-signed certificates/unrecognized CA, but obviously you know why that’s the case.
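As a rough example, here is one way to wrap the `openssl req -x509` call in Python. The hostname, file names and key size are placeholders I picked; the flags are standard `openssl req` options (`-addext` needs OpenSSL 1.1.1 or newer, and `-nodes` is spelled `-noenc` in OpenSSL 3.x, though the old name still works).

```python
import subprocess

def selfsigned_cmd(common_name: str, days: int = 365,
                   key_path: str = "key.pem", cert_path: str = "cert.pem") -> list:
    """Build the openssl command line for a self-signed cert (nothing is executed here)."""
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:2048",        # generate a fresh RSA key alongside the cert
        "-nodes",                     # leave the private key unencrypted on disk
        "-keyout", key_path,
        "-out", cert_path,
        "-days", str(days),
        "-subj", f"/CN={common_name}",
        # Most current clients check subjectAltName rather than CN.
        "-addext", f"subjectAltName=DNS:{common_name}",
    ]

def generate_selfsigned(common_name: str, **kwargs) -> None:
    """Run the command; raises CalledProcessError if openssl fails."""
    subprocess.run(selfsigned_cmd(common_name, **kwargs), check=True)
```

Calling `generate_selfsigned("nas.home.arpa", days=365)` would drop `key.pem` and `cert.pem` in the current directory, which you then copy into the certificate stores by hand as described above.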

This method becomes a problem when:

You need to scale - manually transferring certs is fine maybe half a dozen times, after that it gets real tedious and you start to lose track of where they are and why.
You need other people to access your encrypted services - self-signed certs won’t work for public access to an HTTPS website because every visitor will get a warning that you’re signing your own encryption certs, and most will avoid it. For friends and family you might be able to convince them that your personal cert is safe, but you’ll have to have that conversation every time.
You need to implement expiration - the purpose of cert expiration is to mitigate the damage if the cert private key leaks, which happens a lot with big companies that have public-facing infrastructure and bad internal security practices (looking at you, Microsoft). As an individual, it is still worthwhile to update your certs every so often (e.g. every year) if for no other reason than to remind yourself how your SSL infrastructure is connected. It’s up to you whether or not it’s worth the effort to automate the cert distribution.
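For the expiration point, a small reminder script is often enough at home scale. This Python sketch uses the standard library's `ssl.cert_time_to_seconds` to work out how many days a cert has left; the function name and the idea of feeding it the `notAfter` string by hand are my assumptions, not a prescribed workflow.

```python
import ssl
import time
from typing import Optional

def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a cert expires.

    `not_after` is the date string in the form openssl prints it,
    e.g. "Jan  1 00:00:00 2030 GMT". `now` is a Unix timestamp and
    defaults to the current time.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400
```

You can get the date string with `openssl x509 -noout -enddate -in cert.pem`, which prints a line starting with `notAfter=`. Strip that prefix and pass the rest in.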

I’ve used Letsencrypt to get certs for the proxy, but the traffic between the proxy and the backend is plain HTTP still. Do I need to worry about securing that traffic considering its behind a VPN?

In spite of things you may have read, and the marketing of VPN services, a VPN is NOT a security tool. It is a privacy tool, as long as the encryption key for it is private.

I’m not clear on what you mean by “between the proxy and the backend”. Is this referring to the VPS side, or your local network side, or both?

Ultimately the question is, do you trust the other devices/services that might have access to the data before it enters the VPN tunnel? Are you certain that nothing else on the server might be able to read your traffic before it goes into the VPN?

If you’re talking about a rented VPS from a public web host, the answer should be no. You have no idea what else might be running on that server, nor do you have control over the hypervisor or the host system.

NaibofTabr@infosec.pub · 6 months

The problem is that many employers are requiring employees to use them.

NaibofTabr@infosec.pub · 6 months

Sure, then you get outbid by another contractor who is willing to cut corners.

NaibofTabr@infosec.pub · 6 months

What if there are other things wired to those switches?

NaibofTabr@infosec.pub · 6 months

I tend to agree with this line of thinking. If you’re trying to hire an effective problem solver, well the first step to solving any problem is understanding the problem - the whole problem - and often more importantly the context in which the problem exists.

And while my first reaction is to be frustrated with the person asking for a solution to such a vague problem… in the real world problems are rarely clearly stated, and frequently misstated. Investigating the apparent conditions of the problem is always necessary, and generally the fastest path to resolution.