Commit 5ceeb319 authored by Michał Woźniak

Documentation

parent 8b3dd7e5
# Samizdat
A browser-based solution to Web censorship, implemented as a JavaScript library to be deployed easily on any website. Samizdat uses [ServiceWorkers](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API/Using_Service_Workers) and a suite of non-standard in-browser delivery mechanisms, with a strong focus on decentralized tools like [IPFS](https://ipfs.io/).
Ideally, users should not need to install any special software nor change any settings to continue being able to access a blocked Samizdat-enabled site as soon as they are able to access it *once*.
## Current status
Samizdat is currently considered *alpha*: the code works, but major rewrites and API changes are coming. It has been tested on Firefox, Chromium and Chrome on desktop, as well as Firefox for mobile on Android, but it should work in any browser implementing the ServiceWorker API.
Feel free to test it, but be aware that it might not work as expected. If you'd like to get in touch, please email us at `rysiek+samizdat[at]hackerspace.pl`, create an [issue](https://0xacab.org/rysiek/samizdat/-/issues/new), or contact us [on the Fediverse](https://mastodon.social/tags/samizdat).
## Rationale
While a number of censorship circumvention technologies exist, these typically require those who want to access the blocked content (readers) to install specific tools (applications, browser extensions, VPN software, etc.), or change their settings (DNS servers, HTTP proxies, etc.). This approach does not scale.
At the same time, large-scale Internet censorship solutions are deployed in places like the UK, Azerbaijan, or Tajikistan, effectively blocking whole nations from accessing information deemed *non grata* by the relevant governments. And with the ever-increasing centralization of the Web, censorship has never been easier.
This project explores the possibility of solving this in a way that would not require visitors to install any special software or change any settings; the only things that are needed are a modern Web browser and the ability to visit a website once, so that the JavaScript ServiceWorker kicks in.
You can read a more in-depth overview of Samizdat [here](./docs/OVERVIEW.md). And [here](./docs/PHILOSOPHY.md) is a document describing the philosophy influencing project goals and relevant technical decisions.
## Architecture
After the ServiceWorker is downloaded and activated, it handles all `fetch()` events by first trying to use the regular HTTPS request to the original website. If that fails for whatever reason (be it a timeout or a `4xx`/`5xx` error), the plugins kick in, attempting to fetch the content via any means available.
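The HTTPS-first, plugins-as-fallback strategy described above could be sketched as follows. This is a minimal illustration, not the actual Samizdat implementation; `samizdatFetch` and the `plugins` list are hypothetical names, and each plugin is assumed to expose a `fetch(url)` method returning a Promise:

```javascript
// Minimal sketch of the fallback chain: try each plugin in order
// (the regular HTTPS fetch() would simply be the first plugin), and
// treat rejections and non-OK responses (4xx/5xx) as failures.
async function samizdatFetch(url, plugins) {
  for (const plugin of plugins) {
    try {
      const response = await plugin.fetch(url);
      if (response.ok) {
        return response; // first successful plugin wins
      }
    } catch (e) {
      // timeout, network error, etc. -- fall through to the next plugin
    }
  }
  throw new Error('all plugins failed for ' + url);
}
```

In the real ServiceWorker, a function along these lines would be invoked from the `fetch` event handler for every request originating from the page.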
## Proof of concept
The proof of concept is being implemented using Gun as a resolution strategy and IPFS for content delivery.
- Content delivery:
- [x] ServiceWorker being activated and handling `fetch()` events
- [x] resolution working via Gun
- [x] content delivery working via IPFS
- [x] automatic Gun-and-IPFS-based delivery (handled by the ServiceWorker) of content deployed via the CI/CD pipeline in case it's not accessible via HTTPS
- [x] caching content locally (via the [`cache` API](https://developer.mozilla.org/en-US/docs/Web/API/Cache))
- [ ] caching content addressing locally (via the `cache` API)
- [ ] Gun-and-IPFS-based delivery triggered manually by the reader
- Content deployment:
- [x] content pushed to IPFS as part of the CI/CD pipeline
- [x] verification and preloading of IPFS content pushed in CI/CD
- [x] content addressing pushed to Gun as part of the CI/CD pipeline
- [x] verification and preloading of addresses pushed to Gun in CI/CD
- [x] deployment to IPFS directly from the browser
- [x] verification of IPFS content pushed from the browser
- [x] pushing IPFS addresses to Gun directly from the browser
- [x] verification of Gun addresses pushed from the browser
- [x] user-facing UI allowing the user to log into Gun and trigger the browser-based deployment
- Status display and user control panel:
- [x] basic status data available (short commit SHA, whether or not ServiceWorker was used, and whether or not Gun and IPFS were used)
- [x] more advanced status data available (which URLs were fetched using Gun and IPFS, and which went through HTTPS)
- [x] "control panel"-type status display with user-friendly information on what content was pulled from cache, what was downloaded via GUN and IPFS, etc.
Currently tested on Firefox, Chromium and Chrome on GNU/Linux, as well as Firefox for mobile on Android.
A more complete overview of the architecture and technicalities of Samizdat is available [here](./docs/ARCHITECTURE.md).
## Draft API
The plan is to have an API to enable the use of different strategies for getting content:
- **resolution**
*where* to get it
- **delivery**
*how* to get it
These need to be closely integrated. For example, if using Gun and IPFS, resolution is performed using Gun, and delivery is performed using IPFS. However, Gun needs to resolve content to something that is usable with IPFS. If, alternatively, we're also using Gun to resolve content available on BitTorrent, that will have to be a separate namespace in the Gun graph, since it will have to resolve to magnet links.
Therefore, it doesn't seem to make sense to separate resolution and delivery. Thus, a Samizdat plugin would need to implement the whole pipeline, and work by receiving a URL and returning a Promise that resolves to a valid Response object containing the content.
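The contract described above could be sketched like this. The object shape and field names here are illustrative assumptions, not the project's actual API; the resolution and delivery steps are stubbed out:

```javascript
// Hypothetical shape of a combined resolution+delivery plugin:
// take a URL, return a Promise resolving to a Response-like object
// containing the content.
const examplePlugin = {
  name: 'example-transport',
  fetch: function (url) {
    // 1. resolution: map the URL to a content address (stubbed here;
    //    a real plugin might look this up in Gun)
    const address = 'content-address-for:' + url;
    // 2. delivery: retrieve the content for that address (stubbed here;
    //    a real plugin might pull it from IPFS)
    const body = 'retrieved:' + address;
    return Promise.resolve({ ok: true, url: url, body: body });
  },
};
```

Because the whole pipeline lives behind a single `fetch(url)` call, the ServiceWorker can treat every plugin uniformly, regardless of which resolution and delivery technologies it combines.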
- **[Gun](https://gun.eco)**
Better suited for resolution than for delivery, although it could handle both. A fairly new, dynamically developed project. No global network of public peers is currently available. Content is cryptographically signed.
- **[IPNS](https://docs.ipfs.io/guides/concepts/ipns/)**
Only suitable for resolution. Experimental, not fully functional in the browser yet. Fits like a hand in a glove with IPFS.
- **[DNSLink](https://docs.ipfs.io/guides/concepts/dnslink/)**
Only suitable for resolution. Deployed, stable, and well-documented. Fits like a hand in a glove with IPFS. The downside is that it requires the publishing of DNS records every time new content is published, which means it might not be useful in most situations where censorship is involved – depending on where the DNSLink-to-IPFS address resolution happens.
- **[IPFS](https://ipfs.io/)**
Only suitable for delivery, since it is content-addressed. Resolution of a content URI to an IPFS address needs to be handled by some other technology (like Gun or IPNS, or using [gateways](https://ipfs.github.io/public-gateway-checker/)). Deployed and well-documented, with a large community of developers. Redeploying a new content package with certain files unchanged does not change the addresses of the unchanged files, meaning that small changes in content do not lead to the whole content tree needing to be re-seeded.
- **[WebTorrent](https://github.com/webtorrent/webtorrent)**
Only suitable for content delivery. It seems possible to fetch a particular file from a given torrent, so as not to have to download a torrent of the whole website just to display a single page with some CSS and JS. Requires a resolver to point to the newest torrent since torrents are immutable. Even small changes (for example, only a few files changed in the whole website tree) require creating a new torrent and re-seeding, which is obviously less than ideal.
- **Plain files via HTTPS**
This delivery method is obvious if we're talking simply about the originating site serving the files, but it can also mean non-standard strategies like pushing static HTML+CSS+JS to CloudFront or Wasabi, and having a minimal resolver kick in if the originating site is blocked, to fetch content seamlessly from alternative locations (effectively implementing domain fronting and collateral freedom in the browser). However, some thought will need to be put into signing content deployed to third-party locations – perhaps the resolver (like Gun) could be responsible for keeping SHA sums of known good content, or perhaps we should just address it using the hashes, effectively imitating IPFS.
Plus, the ever-increasing adoption of IPv6 will also partially fix this.
Finally, [NetBlocks](https://netblocks.org/) deployed a very similar tool (a ServiceWorker pulling content from a few specific IP addresses in case of upstream domain blocking) and reportedly it worked rather well; the fallback IP addresses apparently were not blocked, suggesting that censors are slow to react.
## Related developments
- https://netblocks.org/
Former(?) Lazarus project. Basically the same idea as Samizdat: a ServiceWorker that tries to fetch content from the website, and if it's unavailable, fetches it from somewhere else (in the case of Lazarus, from a few specified IP addresses). It was deployed in production and used successfully in the field.
## Special thanks and acknowledgements
The name "Samizdat" [was suggested for the project by Doc Edward Morbius](https://mastodon.cloud/@dredmorbius/102949927295700792) and was clearly the right choice. There were many other great suggestions (see [the relevant thread](https://mastodon.social/@rysiek/102916750160299480)). We'd like to thank everyone who suggested names, or took part in the poll!
Eventually this will document the architecture of Samizdat.
There are two kinds of plugins:
- **Transport plugins**
Plugins that *retrieve* website content, e.g. by using regular HTTPS [`fetch()`](https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API), or by going through [IPFS](https://js.ipfs.io/). They *should* also offer a way to *publish* content by website admins (if relevant credentials or encryption keys are provided, depending on the method).
Methods these plugins implement:
- `fetch` - fetch content from an external source (e.g., from IPFS)
- `publish` - publish the content to the external source (e.g., to IPFS)
- **Stashing plugins**
Plugins that *stash* content locally (e.g., in the [browser cache](https://developer.mozilla.org/en-US/docs/Web/API/Cache)) for displaying when no *transport plugin* works, or before content is received via one of them.
Methods these plugins implement:
- `fetch` - fetch the locally stored content (e.g., from cache)
- `stash` - stash the content locally (e.g., in cache)
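A stashing plugin could be sketched as below. To keep the sketch self-contained, an in-memory `Map` stands in for the browser's Cache API; the plugin name and field layout are illustrative, not Samizdat's actual identifiers:

```javascript
// Hypothetical stashing plugin implementing the two methods listed
// above: fetch (from local storage) and stash (into local storage).
const memoryStashPlugin = (function () {
  const stash = new Map(); // url -> stored response

  return {
    name: 'memory-stash',
    // fetch the locally stored content, rejecting on a miss so that
    // the ServiceWorker can fall through to other plugins
    fetch: function (url) {
      return stash.has(url)
        ? Promise.resolve(stash.get(url))
        : Promise.reject(new Error('not stashed: ' + url));
    },
    // stash the content locally for later use
    stash: function (url, response) {
      stash.set(url, response);
      return Promise.resolve();
    },
  };
})();
```

A real implementation would use `caches.open(...)` and `cache.put(...)` from the Cache API instead of a `Map`, but the interface toward the ServiceWorker stays the same.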
```javascript
self.SamizdatPlugins.push({
    // plugin definition goes here
})
```
### Transport plugins
Transport plugins *must* add `X-Samizdat-Method` and `X-Samizdat-ETag` headers to the response they return, so as to facilitate informing the user about new content after content was displayed using a stashing plugin.
- **`X-Samizdat-Method`**:
contains the name of the plugin used to fetch the content.
## Stashed versions invalidation
The invalidation heuristic is rather naïve, and boils down to checking whether either of `X-Samizdat-Method` or `X-Samizdat-ETag` differs between the response from a transport plugin and whatever has already been stashed by a stashing plugin. If either differs, the transport plugin response is considered "*fresher*".
This is far from ideal and will need improvements in the long term. The difficulty is that different transport plugins can provide different ways of determining the "*freshness*" of fetched content -- HTTPS-based requests offer `ETag`, `Date`, `Last-Modified`, and other headers that can help with that; whereas IPFS can't really offer much apart from the address, which is itself a hash of the content, so at least we know the content is *different* (but is it *fresher*?).
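The heuristic boils down to a two-header comparison. A sketch, with responses modelled as plain objects carrying a `headers` map purely for illustration:

```javascript
// Naive freshness check: the transport response is "fresher" than the
// stashed one if either X-Samizdat-Method or X-Samizdat-ETag differs.
function isFresher(transportResponse, stashedResponse) {
  const keys = ['X-Samizdat-Method', 'X-Samizdat-ETag'];
  return keys.some(
    (k) => transportResponse.headers[k] !== stashedResponse.headers[k]
  );
}
```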
## Messaging
### Messages
This section is a work in progress.
# Samizdat Overview
[Samizdat](https://samizdat.is/) is a browser-based solution to Web censorship, implemented as a JavaScript library to be deployed easily on any website. Samizdat uses [Service Workers](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API/Using_Service_Workers) and a suite of non-standard in-browser delivery mechanisms, with a strong focus on decentralized tools like [IPFS](https://ipfs.io/).
This is a high-level overview of the project. It is not supposed to be technical, although some technical details will necessarily get included.
## What is Samizdat?
Samizdat is a tool to make websites more resilient to censorship without requiring visitors to install any software or change their settings or habits.
Traditionally, censorship circumvention is a responsibility of those who want to access blocked resources. While effective tools exist for that ([Tor Browser](https://www.torproject.org/download/), [Psiphon](https://psiphon3.com/en/index.html), [Lantern](https://getlantern.org/en_US/index.html), to name a few), it is not reasonable to expect whole national populations to switch to using them daily.
State-level web censorship remains effective because it only needs to be *good enough*; a few activists slipping through do not matter, if most of the population cannot access the blocked content. That's where Samizdat comes in.
### Project status
Samizdat is currently considered `alpha` software: the code works and the concept has been proven, but it's not ready for production use. There is a [`beta` milestone](https://0xacab.org/rysiek/samizdat/-/milestones/1/), which tracks issues that need to be resolved before a `beta` version can be released.
The code has been tested on Firefox, Chromium, and Chrome (on desktop); as well as Firefox for mobile on Android. It should, however, work in any browser implementing the Service Worker API.
### Goals
Samizdat aims to give *website owners* a tool that works in any modern browser, in its default configuration. The process should not be more complicated than:
1. Website admin deploys Samizdat on a website.
1. A visitor visits the website *once* while it is not blocked.
1. From that point on, the visitor gets the website's content (including new content) even if the website itself is unavailable to them.
Explicit project goals are:
- **Compatibility for visitors:**
Samizdat needs to work on any modern browser on default settings.
No additional software should be required for Samizdat to work.
No specific action on the part of website visitors should be needed.
- **Control for website owners:**
Website owners shouldn't have to relinquish any control over their content.
No single central gatekeeper should exist for Samizdat.
Whenever possible, decentralized technologies should be used.
### Specific use-cases
Due to its architecture, Samizdat has three distinct use cases:
1. **Censorship circumvention**
The primary use case and focus of the project.
1. **Visitor privacy protection**
Retrieving content from locations other than the target domain can obscure the fact that a visitor is trying to access the target domain.
1. **Seamless CDN with multiple back-ends**
Load can be distributed using any available transport plugins (regular HTTPS `fetch()`, `IPFS`, etc.)
They are discussed in more detail further down.
### Project's Philosophy
Samizdat's philosophy can be boiled down to a single sentence:
**Information must remain easily accessible.**
The choice of words here is very deliberate: Samizdat focuses on keeping websites available, but does not concern itself with live two-way communication. This also means that we make intentional architectural decisions to make Samizdat less useful in activities aimed at bullying and silencing diverse voices online (which usually requires live two-way communication).
This is covered in more depth [here](./PHILOSOPHY.md).
### What Samizdat is not?
**Samizdat is not a *personal* censorship circumvention tool**;
it will not help you, as a user, access specific blocked content, nor will it help website admins access administration panels of their blocked websites.
**Samizdat is also not a hosting provider, nor a CDN**,
although it can be used by hosting providers (especially those providing managed CMS hosting), including to create an impromptu CDN using different back-ends.
**Samizdat is not a security tool**;
it could be used to help secure a website from certain kinds of attacks, but that is somewhat beyond the core focus of the project.
**Samizdat very deliberately focuses on *unidirectional information flow*** — from the website to the visitor.
That's why it only implements `GET` requests. `POST` requests, WebSockets, and other means of bidirectional communication are beyond the scope of the project, for two main reasons:
1. *It frees Samizdat from relying on the original server being accessible*, makes it possible to make certain assumptions about content, and enables the use of very varied back-ends (from distributed, like IPFS; to centralized, like CloudFront).
1. *It makes Samizdat less useful to those interested in silencing diversity*; while discriminatory content does also come in the form of articles on websites, it becomes truly toxic when live two-way communication can be employed in an aggressive manner.
## Assumptions
Certain assumptions are made related to the content served via Samizdat, threats Samizdat is deployed against, and technical capabilities of website admins deploying Samizdat.
As long as these assumptions hold, Samizdat should be able to do its job. There are obviously plenty of scenarios where these assumptions do not hold, and thus Samizdat will not be useful. We have to choose our battles.
Finally, there are additional assumptions and limitations depending on which transport plugins are used and how they are used. More on this below.
### Website administrator
**The administrator of the website is assumed to have *unfettered and unblocked access* to the administration panel of the website.** Samizdat cannot provide means of accessing admin panels of blocked websites — for this, Tor Browser is a better tool.
### Content
**Content is assumed to be *public*.** We do not and cannot make any guarantees about who gains access to any content published using any Samizdat plugin. Specifically, using any IPFS-based plugins means that content pushed will potentially remain available indefinitely.
That is not really different from what can be expected of a regular website that makes its content available without a paywall or a login step. Any content on such a website can be scraped by anyone, including the WebArchive. For regular website content, a [`ROBOTS.txt` file](https://developer.mozilla.org/en-US/docs/Glossary/Robots.txt) offers some level of control; the Samizdat equivalent is simply not publishing content one does not want to remain available indefinitely via plugins that use IPFS and similar technologies.
**It is also assumed to be *static-like*.** This means that it is meaningful to treat it as a collection of files, and that there is no need to send information back to the original server.
For example, a standard WordPress website, even though it's dynamically served, can be scraped and the resulting HTML and media can be served as static files without serious loss of functionality (apart from the comment section, removal of which can often be treated as a feature). In this sense, a standard WordPress site is *static-like*, and can benefit from Samizdat.
On the other hand, an on-line chat loses almost all of its usefulness if saved to a file. The point of an on-line chat is real-time communication between multiple parties, and in such a case Samizdat would have very limited usability.
**The website deploying Samizdat needs to be *served over HTTPS* with a valid certificate.** Samizdat relies on [Service Workers](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API), and the specification explicitly requires HTTPS for delivery of the Service Worker script.
### Visitors
**Visitors need to use a browser that implements the web APIs used by the project;** any modern browser does. The browser cannot be running in private browsing mode, since Service Workers are disabled there as per the specification.
**A visitor has to be able to *visit the website once*** so that the Service Worker script gets loaded and run — every subsequent visit will be handled by Samizdat. This is not an unreasonable ask even in places where web censorship is pervasive: these are complicated systems and from time to time there are slip-ups.
### Adversaries
Samizdat assumes (and is designed to impede) an adversary that is able to make a website unavailable by:
- DDoSsing it;
- domain or hosting takedowns;
- blocking traffic to/from specific IP addresses;
- blocking or hijacking DNS queries for specific domains;
- blocking based on deep packet inspection (the `Host` header for HTTP, the SNI field in the TLS `ClientHello` for HTTPS).
However, Samizdat assumes that the adversary ***does not*** have the capability to gain write-access to the website itself, for example by:
- exploiting a vulnerability in the CMS;
- stealing or guessing admin credentials;
- redirecting traffic to adversary-owned infrastructure *with a valid certificate for the targeted site*.
As long as the adversary cannot deploy a piece of JavaScript to the targeted domain that would remove or otherwise disable the Service Worker script (which is only possible from the relevant domain using a valid SSL certificate), Samizdat will continue to work for visitors who have already visited the targeted site.
## Architecture and operation
Samizdat is divided into a few parts:
- **[`service-worker.js` script](./../service-worker.js)**
once loaded in a visitor's browser, it takes over handling all requests originating from the website, using plugins for content retrieval;
- **[plugins](./../plugins/)**
which implement different means of content retrieval, of which there are two kinds:
- *transport plugins*, handling retrieval of content from remote locations; this can be achieved by using a regular [HTTPS `fetch()`](./../plugins/fetch.js) to the original domain, or [via `IPFS`](./../plugins/gun-ipfs.js), or by requesting content from any other pre-configured location, or through any other means, as long as it's possible to implement it in JavaScript running in the browser;
- *stashing plugins*, which handle saving successfully retrieved content locally (for example using the [Cache API](https://developer.mozilla.org/en-US/docs/Web/API/Cache)) to be used in case the website is blocked.
You can read more about Samizdat's architecture [here](./ARCHITECTURE.md).
### Content retrieval
When a visitor visits a Samizdat-enabled website for the first time, the `service-worker.js` script gets loaded, cached, and registered by the browser. During every subsequent visit, every request originating from that website (be it to the website's original domain, or to any third-party domains) is going to be handled by the Service Worker.
When a request is made, plugins are used to handle it in the order defined in code. By default, that is:
1. regular HTTPS `fetch()` to the original domain;
1. retrieval from local `cache`;
1. any other transport plugins (`IPFS`, or fetching content from other pre-configured endpoints).
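The ordered handling described above could be sketched as follows. The plugin objects here are stubs with hypothetical names, not Samizdat's actual identifiers; the point is only that order in the array decides which plugin gets to answer first:

```javascript
// Stub plugins, in the default order listed above. The first one
// simulates the original site being blocked.
const httpsFetchPlugin = {
  name: 'fetch',
  fetch: async (url) => { throw new Error('site blocked'); },
};
const cachePlugin = {
  name: 'cache',
  fetch: async (url) => ({ ok: true, body: 'cached copy of ' + url }),
};
const ipfsPlugin = {
  name: 'ipfs',
  fetch: async (url) => ({ ok: true, body: 'ipfs copy of ' + url }),
};

// order matters: the first plugin that succeeds handles the request
const orderedPlugins = [httpsFetchPlugin, cachePlugin, ipfsPlugin];

async function handleRequest(url) {
  for (const plugin of orderedPlugins) {
    try {
      return { plugin: plugin.name, response: await plugin.fetch(url) };
    } catch (e) {
      // this plugin failed; try the next one in order
    }
  }
  throw new Error('no plugin could handle ' + url);
}
```

With the HTTPS plugin failing, the request above falls through to the cache plugin; if the cache also missed, the remaining transports would be tried in turn.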
It is up to the website admin to configure the plugins in a way that makes sense. For example, if using the [`gun+ipfs`](./../plugins/gun-ipfs.js) plugin, the admin needs to create a Gun account and populate the plugin's Gun public key variable. If using `IPNS`, the admin needs to populate the `IPNS` public key in the respective plugin. And if alternative HTTPS endpoints are used, it's up to the admin to populate their URLs.
Examples of possible alternative HTTPS locations:
- an IP address or a domain controlled by the admin that is not linked to the original website and thus can be expected not to be blocked;
- WebArchive URL;
- a CloudFront location;
- a Google Drive public folder containing the content.
### Content publishing
For content to be available for retrieval by transport plugins, it first needs to be published to whatever locations it is going to be retrieved from.
For the `gun+ipfs` plugin, for instance, content needs to first be published in IPFS, and then the IPFS addresses for each file need to be published to Gun. Currently this is an involved process (an example is available in the [CI/CD configuration for the project](./../.gitlab-ci.yml)); making it simpler and easier is the focus of current development.
For plugins relying on fetching content from alternative HTTPS locations, this can be as simple as deploying the content to the alternative IP address or domain name, pushing the content to WebArchive, putting the content in the Google Drive folder, or uploading it to the CloudFront location.
Eventually, Samizdat will have examples and at least some tools to automate this; [`samizdat-cli`](./../samizdat-cli/) already implements some elements of that.
### Note on threat-models
Each transport plugin will make its own assumptions and have its own weaknesses. For example:
- `IPFS`-based plugins still necessarily rely on IPFS entry nodes; if the adversary blocks these, the plugins will not work;
- if content is to be fetched from an alternative IP address or domain name, the adversary could block these as well;
- using a service like Google Drive to host the content means that the adversary would have to block all of Google, which might not be something they're willing to do; however, Google will have control over that content and might take it down themselves, for whatever reason.
The good news is that government censors move slowly, so it might be possible to switch endpoints quickly enough to always stay ahead, even if using just IP addresses or alternative domains. The benefit of Samizdat in such a case is that visitors do not need to switch to a new domain; Samizdat handles this for them under the hood.
## Use-cases
Samizdat can be deployed in three different ways to fit three different usage scenarios. These are not exclusive, which means Samizdat configured and deployed for one of them can still be useful for others.
### Censorship circumvention
Deployment for censorship circumvention needs to balance usability (which boils down to how quickly the visitor gets the content) and resilience.
This means that a regular HTTPS `fetch()` should happen first. If it succeeds, content should be stashed using a stashing plugin so that it's available in case the site gets blocked in the future; if it fails, content should be displayed using a stashing plugin (if already stashed), while additional transport plugins are used to try to retrieve the content.
Once any of these succeeds, content gets stashed and the user informed to reload the page to see it.
```mermaid
graph TD;
https_fetch["HTTPS fetch()"]-->https_result{{Request succeeded?}};
https_result-->|yes|stash[Stash the response];
https_result-->|no|in_stash{{Was stashed already?}};
stash-->display[Display to the user];
in_stash-->|no|alternative_fetch[Try alternative transports];
alternative_fetch-->alternative_result{{Any request succeeded?}}
alternative_result-->|yes|stash;
alternative_result-->|no|error[Return an error]
in_stash-->|yes|display_from_stash[Display stashed];
display_from_stash-->alternative_fetch_for_later[Try alternative transports in the background];
alternative_fetch_for_later-->alternative_fetch_for_later_result{{Any request succeeded?}}
alternative_fetch_for_later_result-->|yes|stash_for_later[Stash for future use]
```
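The flow above can be sketched in JavaScript. In a real deployment this logic would live inside a ServiceWorker `fetch` handler; here it is written as plain functions with injected dependencies, and the helper names (`httpsFetch`, `stash`, `transports`) are hypothetical stand-ins for Samizdat's plugin API, not its actual interface.

```javascript
// Sketch of the censorship-circumvention flow. All helper names are
// hypothetical; the stash and transports are injected as dependencies.
async function resolveRequest(url, { httpsFetch, stash, transports }) {
  try {
    // 1. Try a regular HTTPS fetch() first.
    const response = await httpsFetch(url);
    await stash.put(url, response); // keep a copy in case the site gets blocked
    return { source: 'https', response };
  } catch (err) {
    // 2. HTTPS failed: serve the stashed copy if one exists...
    const stashed = await stash.get(url);
    if (stashed) {
      // ...while trying alternative transports in the background.
      tryTransports(url, transports).then((r) => r && stash.put(url, r));
      return { source: 'stash', response: stashed };
    }
    // 3. Nothing stashed: block on the alternative transports.
    const response = await tryTransports(url, transports);
    if (response) {
      await stash.put(url, response);
      return { source: 'transport', response };
    }
    throw new Error(`all transports failed for ${url}`);
  }
}

// Try each alternative transport in turn, returning the first success.
async function tryTransports(url, transports) {
  for (const t of transports) {
    try { return await t(url); } catch (_) { /* try the next one */ }
  }
  return null;
}
```

The returned `source` field is only there to make the decision visible; a real handler would return a `Response` object via `event.respondWith()`.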
### Visitor privacy protection
We can skip the regular HTTPS `fetch()` plugin completely and rely only on a stashing plugin and some transport plugins that do not make requests to the original site.
This way, as soon as Samizdat kicks in (after the visitor's first visit to the site), there is no direct communication indicating that the visitor is trying to access the site (not even DNS traffic): all content is pulled from IPFS or other locations that are either unknown to the adversary (and thus not monitored), or impossible for the adversary to monitor (for example, content pulled directly from a public Google Drive folder).
This is, of course, a trade-off: it trades the adversary's ability to notice the visitor accessing the site for the ability of whoever serves the content (IPFS nodes, Google Drive) to notice it.
```mermaid
graph TD;
stash[Stash the response]-->display[Display to the user];
in_stash{{Was stashed already?}}-->|no|alternative_fetch[Try alternative transports];
alternative_fetch-->alternative_result{{Any request succeeded?}}
alternative_result-->|yes|stash;
alternative_result-->|no|error[Return an error]
in_stash-->|yes|display_from_stash[Display stashed];
display_from_stash-->alternative_fetch_for_later[Try alternative transports in the background];
alternative_fetch_for_later-->alternative_fetch_for_later_result{{Any request succeeded?}}
alternative_fetch_for_later_result-->|yes|stash_for_later[Stash for future use]
```
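The privacy-protecting flow is the same decision tree minus the initial HTTPS `fetch()`. A minimal sketch, again with hypothetical helper names (`stash`, `transports`) standing in for Samizdat's plugin API:

```javascript
// Privacy-protecting variant: the origin server is never contacted
// directly, so after the first visit the adversary sees no traffic to
// the site at all. Helper names are hypothetical.
async function resolvePrivately(url, { stash, transports }) {
  const stashed = await stash.get(url);
  if (stashed) {
    // Serve instantly from the stash; refresh in the background over
    // transports the adversary is not watching (IPFS, etc.).
    refreshInBackground(url, stash, transports);
    return stashed;
  }
  // Nothing stashed yet: try the alternative transports in turn.
  for (const t of transports) {
    try {
      const response = await t(url);
      await stash.put(url, response);
      return response;
    } catch (_) { /* try the next transport */ }
  }
  throw new Error(`no stashed copy and all transports failed for ${url}`);
}

// Fire-and-forget refresh: stash the first transport response that arrives.
function refreshInBackground(url, stash, transports) {
  Promise.resolve().then(async () => {
    for (const t of transports) {
      try { return await stash.put(url, await t(url)); } catch (_) { /* keep trying */ }
    }
  });
}
```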
### Seamless CDN with multiple back-ends
In this scenario, the focus is on loading the content from different locations (to limit the load on the main server) and displaying it as quickly as possible.
In this case Samizdat could be configured such that all transport plugins, including regular HTTPS `fetch()`, are used simultaneously, and content is displayed from whichever returns a non-error response first. The response is then stashed by a stashing plugin (like `cache`). If all transport plugins return errors, content is displayed using the stashing plugin (if already stashed).
Samizdat could also potentially be configured in a way such that some kinds of content are *always* retrieved using specific plugins. For example, an IPFS-based plugin could be used for static resources, while HTTPS `fetch()` directly to the site and separately to a different location (CloudFront, for example) could race to get the content that is expected to change more often.
```mermaid
graph TD;
https_fetch["HTTPS fetch() and alternative transports simultaneously"]-->https_result{{Any request succeeded?}};
https_result-->|yes|stash[Stash the response];
https_result-->|no|in_stash{{Was stashed already?}};
stash-->display[Display to the user];
in_stash-->|yes|display;
in_stash-->|no|error[Return an error];
```
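Racing all transports at once maps naturally onto `Promise.any`, which resolves with the first fulfilled promise and rejects only if every transport fails. A sketch, with the same caveat that `stash` and `transports` are hypothetical stand-ins for Samizdat's plugin API:

```javascript
// CDN-style variant: fire every transport at once (a plain HTTPS
// fetch() would simply be one entry in the transports array) and serve
// whichever answers first. Helper names are hypothetical.
async function resolveFastest(url, { stash, transports }) {
  try {
    // Promise.any resolves with the first fulfilled transport and only
    // rejects once all of them have failed.
    const response = await Promise.any(transports.map((t) => t(url)));
    await stash.put(url, response); // keep the winner for offline fallback
    return response;
  } catch (err) {
    // Every transport failed: fall back to a previously stashed copy.
    const stashed = await stash.get(url);
    if (stashed) return stashed;
    throw new Error(`all transports failed for ${url}`);
  }
}
```

Routing some content types to a fixed plugin, as described above, would then just mean selecting a different `transports` array per request.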
# Project's Philosophy
Samizdat's philosophy can be boiled down to a single sentence:
**Information must remain easily accessible.**
The choice of words here is very deliberate.
## Information vs. communication
Samizdat purposely focuses on ***making information accessible***, as opposed to facilitating *live two-way communication flow*.
There is plenty of misinformation to go around, and plenty of communication that is meant solely to muddy the waters and create a toxic information environment. Those who organize such disingenuous communication and participate in it often rely on it being two-way, fast-paced, and emotional, intentionally leaving as little space for calm rational thought as possible.
There is a meaningful difference between a debate of ideas, and a shouting match or a lynch mob. Samizdat is not interested in supporting the latter. While discriminatory content does also come in the form of articles on websites, it becomes truly toxic when live mass communication can be employed in an aggressive manner.
This is where Samizdat draws a line by making specific architectural decisions. We cannot stop bigots from using Samizdat on their websites, but we can make Samizdat less useful for specific strategies often employed by them.
## Centralization as a dis-service
Samizdat grew out of the experience of managing websites that are blocked in some places, and the frustration with the options available to website admins who find their websites made unavailable, entirely or only to certain groups of visitors, whether through direct malicious action (like exploiting CMS vulnerabilities), DDoS attacks, or state-level web censorship.
These options tend to be limited to a few massive gatekeepers like CloudFlare, and a few smaller ethical providers like [Deflect](https://deflect.ca/).
In practice, website owners are incentivised to use the massive gatekeepers' services, which [gradually centralizes the Internet](https://iscloudflaresafeyet.com/). Such centralization then becomes a problem itself, when these gatekeepers [find themselves under pressure to drop protection for specific sites](https://www.techrepublic.com/article/as-google-and-aws-kill-domain-fronting-users-must-find-a-new-way-to-fight-censorship/), leaving website owners with nowhere to go.
Samizdat explicitly focuses on decentralized tools like [IPFS](https://ipfs.io); in some cases, and for certain very specific threats, using the biggest gatekeepers might still make sense, and Samizdat might facilitate that. But whenever that is the case, care will be taken to do it in a way that is not tied to a particular service or company.
</div>
<div id="description">
<p><em>Samizdat</em> is a browser-based Web censorship circumvention library, easily deployable on any website.</p>
<p>Implemented in JavaScript, it uses <a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API/Using_Service_Workers">ServiceWorkers</a> and a set of non-standard in-browser content delivery mechanisms (with a strong focus on decentralized ones like <a href="https://github.com/ipfs/js-ipfs">JS-IPFS</a>).</p>
<p>Ideally, as soon as users are able to access a blocked <em>Samizdat</em>-enabled site <em>once</em>, they would not need to install any special software nor change any settings in order to continue to access that site.</p>
<p><em>Samizdat</em> is currently considered <code>alpha</code> software. We would love to hear if you'd like to test it &ndash; you can contact us at <code>rysiek+samizdat[at]hackerspace.pl</code>.</p>
</div>