Best Practices for Encrypted Search

tapdattl@lemmy.world · 47 minutes

Maybe do some deduplication first? https://github.com/qarmin/czkawka

tapdattl@lemmy.world · 5 months

I think you need a Windows server for that.

tapdattl@lemmy.world · 5 months

The assignment requires the database contents themselves to be encrypted, not just where they’re stored, unfortunately.

tapdattl@lemmy.world · 5 months

Task

I’m working on my final project for school. We are supposed to make a web app of our choosing, and it has to include specific features. One is that all data must be encrypted; another is that there has to be search functionality. My app (a customer support framework) has a ticket feature where customers can submit help request tickets. The contents of these tickets need to be encrypted at rest, but at the same time admins need to be able to search them.

Current Plan

My current plan is to store an AES-256-encrypted copy of the message in message.content to meet the encryption requirement, and also store a tokenized and hashed version of the message in message.hashed to meet the searchability requirement.
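A minimal Go sketch of the encrypted-copy half, assuming AES-256 in GCM mode (the post names AES-256 but not a mode) and a 32-byte key kept outside the database. The function names are hypothetical. The random nonce is stored in front of the ciphertext, so message.content would hold nonce, ciphertext, and authentication tag together, and tampering is detected on decryption.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// encryptContent seals plaintext with AES-256-GCM (key must be 32 bytes).
// The output layout is nonce || ciphertext || tag.
func encryptContent(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext+tag to its first argument, here the nonce.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decryptContent splits off the nonce and opens the ciphertext; it returns
// an error if the data or tag was modified.
func decryptContent(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	// Demo key only; a real app would load the key from a secrets store.
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, err := encryptContent(key, []byte("my printer is on fire"))
	if err != nil {
		panic(err)
	}
	plain, err := decryptContent(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain)) // my printer is on fire
}
```

Because each encryption uses a fresh random nonce, the same ticket text encrypts differently every time, which is exactly why a separate searchable field is needed.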

The tokenization/hashing method will be:

strip the message to alphanumerics + whitespace ([a-zA-Z0-9 ]),
tokenize by splitting the message on whitespace,
SHA-256 each token,
rejoin all the hashed tokens into a space-separated string and store it in the message.hashed field.

Thus "this is a test string" becomes <hash of this> <hash of is> <hash of a> <hash of test> <hash of string>.

When the user searches, their search string goes through all of the steps in the tokenization/hashing method; then we query the message table for message.hashed LIKE '%<hashed string>%', and if my thinking is right, we should be able to find it.

Concerns

Statistical analysis of hashed tokens
- I really don’t see a way around this; to make the string searchable, the hashing needs to be predictable.
The message.hashed field could potentially be huge: if each word gets its own SHA-256 hash (64 characters when hex-encoded), a large message could result in a very large hash string.
- Maybe we just store the last 4 of the hash?
  - This would increase collisions, but the likelihood of multiple last 4’s colliding in a given search string should be pretty dang small, and any collisions would likely not be valid language.
  - Would this help with the statistical analysis concern? Increasing collisions would decrease the effectiveness of statistical analysis. It would be a performance hit, but after returning all matches against the hashes, I could decrypt the message.content data, search the raw search query against the unencrypted text, and remove any incorrect returns caused by collisions.

I’m interested in hearing everyone’s thoughts: am I being logical in my reasoning?

tapdattl@lemmy.world · 1 year

That happens in my RSS reader. I haven’t looked into it too deeply, but I’m assuming he’s using JavaScript to populate the entries, so they aren’t being populated in non-browser clients.

tapdattl@lemmy.world · 1 year

Spotube is an Android app that provides a frontend to Spotify and lets you download the songs you listen to onto your device. I’m guessing you could sync those files to your server and store them in a different system.

tapdattl@lemmy.world · 1 year

FreeIPA and Keycloak will give you directory management (LDAP and Kerberos), identity management, and single sign-on (OIDC and SAML), which, if all your computers are also running Linux, will give you centralized user management.

You can then set up other FOSS business management/productivity applications, like Nextcloud, Odoo, Seafile, OnlyOffice, LibreOffice, CryptPad, etc., to use Keycloak as their authentication mechanism.

A lot of this will depend on what kind of work the business does.

You’ll also want to look into log management and SIEM for security monitoring: Wazuh, Graylog, and others. This is especially true if the business has any data compliance obligations in the country it operates in.

tapdattl@lemmy.world · 1 year

I think the general consensus for homelabbers is a mesh network: Tailscale and Netbird are the two most popular options.

tapdattl@lemmy.world · 1 year

I would love any comments/criticism, as this is the first project I’ve written where I actually felt comfortable with what I was doing.

Thanks!

tapdattl@lemmy.world · 1 year

That’s a bingo! Yeah I decided to dip my toes into Go by writing a simple library on a topic I was learning about

tapdattl@lemmy.world · 1 year

I don’t think libraries should log by default

That’s a fair point. Interfaces are still a concept that boggles my mind a bit, but maybe this is the problem that will help me actually grasp them. Thanks!

tapdattl@lemmy.world · 1 year

there’s no explanation of what this is supposed to do.

Totally right, sorry about that. I’ll update the GitHub, but in brief, this is a library that’s supposed to help a developer set up a Role Based Access Control system for an API for a web service. Role Based Access Control is a method of access control whereby (and this is my very beginner’s understanding of it) users are assigned roles, and these roles are in turn issued different permissions based on what each role is supposed to have access to. When checking whether a user is authorized to access a certain resource, the roles assigned to them are checked for the permissions the resource requires. If they have permission, they are granted access to the resource; otherwise, they are denied access.

This library manages roles and the permissions assigned to them, and checks permissions against roles via an HTTP middleware.

Then, there’s no main function. Where’s the entry point? This is a bit where I’m doubting myself now. Maybe Go has changed, but when I was writing it, it required a main function to even run.

Well, this is supposed to be a library that’s used by other people, so it has no main function of its own; rather, it’s imported and called by other people’s programs.

tapdattl@lemmy.world · 1 year

Permanently Deleted

tapdattl@lemmy.world · 1 year

The Homelab Show was a good one, though they haven’t posted a new episode in almost a year. It’s made by Lawrence Systems and Learn Linux TV, who have their own content as well.

tapdattl@lemmy.world · 1 year

He did

[…] Why does the radius need to be reactive? What do you stand to gain over just setting to like 3 or 4px and moving on with your life?

Junior webdev points

AKA you gain nothing.

tapdattl@lemmy.world · 1 year

What’s your solution? PiHole? The thing I don’t like about PiHole is its lack of wildcard domain rewrites. I’ve been playing with AdGuard Home and Unbound; not sure what my final solution will be, though.

tapdattl@lemmy.world · 1 year

Yeah, I’ve been toying with FreeIPA for IdM, Keycloak for SSO, and Netbird to create a zero-trust internal network. DNS is the hurdle I’m currently figuring my way over.

tapdattl@lemmy.world · 2 years

I’ve been playing with Stalwart-Email as a combined SMTP/IMAP server. It’s open source and written in Rust. It’s still pretty early in development, and I haven’t played with it enough to give any real opinion on its pluses or minuses compared to other software, but it’s worth taking a look at.

tapdattl@lemmy.world · 2 years

You could self-host a web client.

tapdattl@lemmy.world · 2 years

Well, the internet-down scenario has only happened once: I returned home to no internet, booted up my laptop, and could not connect to any of my services since I couldn’t reach my control server. I haven’t deliberately forced the issue by disconnecting my internet and testing connectivity; I just did the lazy thing and connected to the services I wanted via their IPv4 addresses.


A little learning project for me, a Role Based Access Control library in Go
