# Samizdat Overview
[Samizdat](https://samizdat.is/) is a browser-based solution to Web censorship, implemented as a JavaScript library to be deployed easily on any website. Samizdat uses [Service Workers](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API/Using_Service_Workers) and a suite of non-standard in-browser delivery mechanisms, with a strong focus on decentralized tools like [`IPFS`](https://ipfs.io/).
This is a high-level overview of the project. It is not supposed to be technical, although some technical details will necessarily get included.
Due to its architecture, Samizdat has three distinct use cases:
1. **Censorship circumvention**
The primary use case and focus of the project.
1. **Visitor privacy protection**
Retrieving content from locations other than the original domain can obscure the fact that a visitor is trying to access the original domain.
1. **Seamless CDN with multiple back-ends**
Load can be distributed using any available transport plugins (regular HTTPS `fetch()`, `IPFS`, etc.)
This is covered in more depth [here](./PHILOSOPHY.md).
**Samizdat is not an end-user circumvention tool**;
it will not help you, as a user, access specific blocked content, nor will it help website admins access administration panels of their blocked websites.
**Samizdat is also not a hosting provider, nor a CDN**,
although it can be used by hosting providers (especially those providing managed CMS hosting), including to create an impromptu CDN using different back-ends.
**Samizdat is not a security tool**;
it could be used to help secure a website from certain kinds of attacks, but that is somewhat outside the core focus of the project.
**Samizdat very deliberately focuses on *unidirectional information flow*** — from the website to the visitor.
That's why it only implements `GET` requests. `POST` requests, WebSockets, and other means of bidirectional communication are beyond the scope of the project, for two main reasons:
1. *It frees Samizdat from relying on the original server being accessible*, makes it possible to make certain assumptions about content, and enables the use of very varied back-ends (from distributed, like `IPFS`; to centralized, like CloudFront).
1. *It makes Samizdat less useful to those interested in silencing diversity*; while discriminatory content does also come in the form of articles on websites, it becomes truly toxic when live two-way communication can be employed in an aggressive manner.
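To illustrate the `GET`-only design described above (a simplified sketch, not the project's actual `service-worker.js` logic):
```js
// Sketch: apply special handling only to GET requests; anything else
// (POST, WebSocket upgrades, etc.) falls through to the browser's
// default network handling, untouched.
self.addEventListener('fetch', (event) => {
  if (event.request.method !== 'GET') return;  // default handling applies
  event.respondWith(fetch(event.request));     // plugin-based retrieval would go here
});
```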
Finally, there are additional assumptions and limitations depending on which transport plugins are used.
### Content
**Content is assumed to be *public*.** We do not and cannot make any guarantees about who gains access to any content published using any Samizdat plugin. Specifically, using any `IPFS`-based plugins means that content will potentially remain available indefinitely.
That is not really different from what can be expected of a regular website which makes its content available without a paywall or a login step. Any content on a public website can be scraped by anyone (for example by the [WebArchive](https://web.archive.org/)). For public website content the [`ROBOTS.txt` file](https://developer.mozilla.org/en-US/docs/Glossary/Robots.txt) offers some level of control; the Samizdat equivalent is simply not publishing content one does not want to remain available indefinitely using `IPFS`-based plugins and similar technologies.
**It is also assumed to be *static-like*.** This means that it is meaningful to treat it as a collection of files, and that there is no need to send information back to the original server.
For example, a standard WordPress website, even though it is dynamically served, can be scraped, and the resulting HTML and media can be served as static files without serious loss of functionality (apart from the comment section, removal of which can often be treated as a feature). In this sense, a standard WordPress site is *static-like*, and can benefit from Samizdat.
On the other hand, an on-line chat loses almost all of its usefulness if saved to a file. The point of an on-line chat is real-time communication between multiple parties, and in such a case Samizdat would have very limited usability.
**The website deploying Samizdat needs to be *served over HTTPS* with a valid certificate.** Samizdat relies on the [Service Workers API](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API), and its specification explicitly requires that HTTPS is used for delivery of the Service Worker script.
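As a brief illustration (a generic sketch of Service Worker registration, not Samizdat's actual bootstrap code), this is the step the HTTPS requirement gates; browsers refuse to register a Service Worker on a plain-HTTP page (with a `localhost` exception for development):
```js
// Generic Service Worker registration; on a plain-HTTP page the returned
// promise rejects, which is why a valid HTTPS setup is a hard requirement.
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/service-worker.js')
    .then((reg) => console.log('Service Worker registered, scope:', reg.scope))
    .catch((err) => console.error('Service Worker registration failed:', err));
}
```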
### Visitors
### Adversaries
Samizdat assumes (and is designed to impede) an adversary that is able to make a website unavailable by:
- DDoSing it;
- domain or hosting takedowns;
- blocking traffic to/from specific IP addresses;
- blocking based on deep packet inspection (the `Host` header for HTTP, the SNI field in the TLS `ClientHello` for HTTPS).
However, Samizdat assumes that the adversary ***does not*** have the capability to gain write-access to the website itself, for example by:
- exploiting a vulnerability in the CMS;
- stealing or guessing admin credentials;
- redirecting traffic to adversary-owned infrastructure *with a valid certificate for the targeted site*.
As long as the adversary cannot deploy a piece of JavaScript to the targeted domain, Samizdat can continue doing its job.
## Architecture and operation
Samizdat is divided into a few parts:
- **[`service-worker.js` script](./../service-worker.js)**
once loaded in a visitor's browser, it takes over handling all requests originating from the website, using plugins for content retrieval;
- **[plugins](./../plugins/)**
which implement different means of content retrieval, of which there are three kinds:
- *transport plugins*, handling retrieval of content from remote locations; this can be achieved by using a regular [HTTPS `fetch()`](./../plugins/fetch.js) to the original domain, or [via `IPFS`](./../plugins/gun-ipfs.js), or by requesting content from any other pre-configured location, or through any other means, as long as it's possible to implement it in JavaScript running in the browser;
- *stashing plugins*, which handle saving successfully retrieved content locally (for example using the [Cache API](https://developer.mozilla.org/en-US/docs/Web/API/Cache)) to be used in case the website is blocked;
- *composing plugins*, which compose other plugins in some specific way (for example, making a request using several *transport plugins* simultaneously and returning the first result).
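For a rough sense of the shape of a transport plugin (a hypothetical sketch only; the names `fetchContent` and `EXAMPLE_ENDPOINT` are illustrative, not the project's actual plugin API):
```js
// Hypothetical transport plugin: retrieve a resource from a pre-configured
// alternative HTTPS location instead of the original domain.
const EXAMPLE_ENDPOINT = 'https://mirror.example.com'; // illustrative URL

const exampleTransportPlugin = {
  name: 'example-https-endpoint',
  fetchContent: async (url) => {
    const alternativeUrl = EXAMPLE_ENDPOINT + new URL(url).pathname;
    const response = await fetch(alternativeUrl);
    if (!response.ok) {
      throw new Error(`${alternativeUrl} returned HTTP ${response.status}`);
    }
    return response;
  },
};
```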
You can read more about Samizdat's architecture [here](./ARCHITECTURE.md).
When a visitor visits a Samizdat-enabled website for the first time, the `service-worker.js` script gets loaded, cached, and registered by the browser. During every subsequent visit, every request originating from that website (be it to the website's original domain, or to any third-party domains) is handled by the Service Worker.
When a request for a resource from the original domain is made, plugins are used to handle it in the order defined in code. By default, that is:
1. regular HTTPS `fetch()` to the original domain;
1. retrieval from local `cache`;
1. any other transport plugins (`IPFS`, or fetching content from other pre-configured endpoints).
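A sketch of that fallback order (assuming hypothetical plugin objects with a `fetchContent(url)` method, as in the earlier sketch; this is not the actual dispatch code):
```js
// Try each plugin in the configured order; return the first successful
// response, or an error response if every plugin fails.
async function handleRequest(url, plugins) {
  for (const plugin of plugins) {
    try {
      return await plugin.fetchContent(url);
    } catch (error) {
      console.warn(`plugin ${plugin.name} failed for ${url}:`, error);
    }
  }
  return new Response('All Samizdat plugins failed', { status: 502 });
}
```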
It is up to the website admin to configure the plugins. For example, if using the [`gun-ipfs`](./../plugins/gun-ipfs.js) plugin, the admin needs to create a Gun account and populate the plugin's Gun public key variable. If using `IPNS`, the admin needs to populate the `IPNS` public key in the respective plugin. And if alternative HTTPS endpoints are used, it's up to the admin to populate their URLs.
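Sketched as a single configuration object (key names here are illustrative only; each real plugin keeps its own settings):
```js
// Hypothetical deployment configuration; the values are placeholders the
// admin would fill in before deploying.
const samizdatConfig = {
  gunPublicKey: 'REPLACE_WITH_GUN_PUBLIC_KEY', // for the gun-ipfs plugin
  ipnsKey: 'REPLACE_WITH_IPNS_PUBLIC_KEY',     // for an IPNS-based plugin
  alternativeEndpoints: [                      // for HTTPS fetch plugins
    'https://203.0.113.42',
    'https://d1234abcd.cloudfront.net',
  ],
};
```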
Examples of possible alternative HTTPS locations:
- an IP address or a domain controlled by the admin that is not linked to the original website and thus can be expected not to be blocked;
- a WebArchive URL;
- a CloudFront location;
For content to be available for retrieval by transport plugins, it first needs to be published to whatever locations it is going to be retrieved from.
For the `gun-ipfs` plugin, for instance, content needs to first be published in `IPFS`, and then the `IPFS` addresses for each file need to be published to Gun. Currently this is an involved process (an example is available in the [CI/CD configuration for the project](./../.gitlab-ci.yml)); making it simpler and easier is the focus of current development.
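For a flavour of the first step (a simplified sketch, not the project's actual publishing tooling; it assumes a local IPFS/Kubo node with its HTTP RPC API on the default `127.0.0.1:5001`, and Node.js 18+):
```js
// Add a single file to a local IPFS node and get back its content address
// (CID); publishing that CID to Gun would be the second, separate step.
const fs = require('fs');

async function publishFileToIpfs(path) {
  const form = new FormData();
  form.append('file', new Blob([fs.readFileSync(path)]));
  const response = await fetch('http://127.0.0.1:5001/api/v0/add', {
    method: 'POST',
    body: form,
  });
  const { Hash } = await response.json(); // the file's CID
  return Hash;
}
```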
For plugins relying on fetching content from alternative HTTPS locations, this can be as simple as deploying the content to the alternative IP address or domain name, pushing the content to the WebArchive, putting the content in a Google Drive folder, or uploading it to the CloudFront location.
Eventually, Samizdat will have examples and at least some tools to automate this process.
### Note on threat models
Each transport plugin will make its own assumptions and have its own weaknesses. For example:
- `IPFS`-based plugins still necessarily rely on `IPFS` entry nodes; if the adversary blocks these, they will not work;
- if content is to be fetched from an alternative IP address or domain name, the adversary could block these;
- using a service like Google Drive to host the content means that the adversary will have to block all of Google, which might not be something they're willing to do; however, Google will have control over that content and might take it down themselves, for whatever reason.
The good news is that government censors tend to move slowly, so it might be possible to switch endpoints quickly enough to always stay ahead, even if using just IP addresses or alternative domains. The benefit of Samizdat in such a case is that visitors do not need to switch to a new domain; Samizdat handles this for them under the hood.
We can skip the regular HTTPS `fetch()` plugin completely and rely only on a stashing plugin and some transport plugins that do not make requests to the original site.
This way, as soon as Samizdat kicks in (after the first visit to the site), there will be no direct communication indicating that the visitor is trying to access the site (not even DNS traffic), since all content is pulled from `IPFS` or other locations that are not known to the adversary (and thus are not being monitored), or which are not possible for the adversary to monitor (if, for example, content is pulled directly from a public Google Drive folder).
This is of course a trade-off — it trades the ability of the adversary to track the visitor going to the site for the ability of wherever we are pulling the content (`IPFS` nodes; Google Drive) to track it.
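A much-simplified sketch of such a setup (the mirror URL is hypothetical; the real logic lives in the plugins): the handler answers from the local stash first and, on a miss, asks an alternative location, never the original domain.
```js
// Never contact the original domain: serve from the local cache when
// possible, otherwise fall back to a pre-configured alternative location.
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => {
      if (cached) return cached; // no network traffic at all
      const alternative =
        'https://mirror.example.com' + new URL(event.request.url).pathname;
      return fetch(alternative); // the adversary sees only this location
    })
  );
});
```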
In this scenario, the focus is on loading the content from different locations (to limit the load on the main server) and displaying it as quickly as possible.
In this case Samizdat would be configured such that all transport plugins, including regular HTTPS `fetch()`, are used simultaneously, and content is displayed from whichever returns a non-error response first. The response is then stashed by a stashing plugin (like `cache`). If all transport plugins return errors, content is displayed using the stashing plugin (if already stashed).
Samizdat could also potentially be configured in a way such that some kinds of content are *always* retrieved using specific plugins. For example, an `IPFS`-based plugin could be used for static resources, while HTTPS `fetch()` directly to the site and separately to a different location (CloudFront, for example) could race to get the content that is expected to change more often.
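The race described above might look roughly like this (a sketch using the same hypothetical `fetchContent(url)` plugin shape as the earlier sketches; `stash` stands in for a stashing plugin such as `cache`):
```js
// Fire all transport plugins at once; resolve with the first non-error
// response and keep a local copy of it for later use. If every transport
// rejects, Promise.any rejects, and the caller falls back to the stash.
async function raceTransports(url, transports, stash) {
  const response = await Promise.any(
    transports.map((plugin) => plugin.fetchContent(url))
  );
  await stash.put(url, response.clone()); // stash a copy, serve the original
  return response;
}
```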